Risk prediction model: Statistical and artificial neural network approach
Paiman, Nuur Azreen; Hariri, Azian; Masood, Ibrahim
2017-04-01
Prediction models are increasingly gaining popularity and had been used in numerous areas of studies to complement and fulfilled clinical reasoning and decision making nowadays. The adoption of such models assist physician's decision making, individual's behavior, and consequently improve individual outcomes and the cost-effectiveness of care. The objective of this paper is to reviewed articles related to risk prediction model in order to understand the suitable approach, development and the validation process of risk prediction model. A qualitative review of the aims, methods and significant main outcomes of the nineteen published articles that developed risk prediction models from numerous fields were done. This paper also reviewed on how researchers develop and validate the risk prediction models based on statistical and artificial neural network approach. From the review done, some methodological recommendation in developing and validating the prediction model were highlighted. According to studies that had been done, artificial neural network approached in developing the prediction model were more accurate compared to statistical approach. However currently, only limited published literature discussed on which approach is more accurate for risk prediction model development.
Model output statistics applied to wind power prediction
Energy Technology Data Exchange (ETDEWEB)
Joensen, A; Giebel, G; Landberg, L [Risoe National Lab., Roskilde (Denmark); Madsen, H; Nielsen, H A [The Technical Univ. of Denmark, Dept. of Mathematical Modelling, Lyngby (Denmark)
1999-03-01
Being able to predict the output of a wind farm online for a day or two in advance has significant advantages for utilities, such as better possibility to schedule fossil fuelled power plants and a better position on electricity spot markets. In this paper prediction methods based on Numerical Weather Prediction (NWP) models are considered. The spatial resolution used in NWP models implies that these predictions are not valid locally at a specific wind farm. Furthermore, due to the non-stationary nature and complexity of the processes in the atmosphere, and occasional changes of NWP models, the deviation between the predicted and the measured wind will be time dependent. If observational data is available, and if the deviation between the predictions and the observations exhibits systematic behavior, this should be corrected for; if statistical methods are used, this approaches is usually referred to as MOS (Model Output Statistics). The influence of atmospheric turbulence intensity, topography, prediction horizon length and auto-correlation of wind speed and power is considered, and to take the time-variations into account, adaptive estimation methods are applied. Three estimation techniques are considered and compared, Extended Kalman Filtering, recursive least squares and a new modified recursive least squares algorithm. (au) EU-JOULE-3. 11 refs.
Statistical models for expert judgement and wear prediction
International Nuclear Information System (INIS)
Pulkkinen, U.
1994-01-01
This thesis studies the statistical analysis of expert judgements and prediction of wear. The point of view adopted is the one of information theory and Bayesian statistics. A general Bayesian framework for analyzing both the expert judgements and wear prediction is presented. Information theoretic interpretations are given for some averaging techniques used in the determination of consensus distributions. Further, information theoretic models are compared with a Bayesian model. The general Bayesian framework is then applied in analyzing expert judgements based on ordinal comparisons. In this context, the value of information lost in the ordinal comparison process is analyzed by applying decision theoretic concepts. As a generalization of the Bayesian framework, stochastic filtering models for wear prediction are formulated. These models utilize the information from condition monitoring measurements in updating the residual life distribution of mechanical components. Finally, the application of stochastic control models in optimizing operational strategies for inspected components are studied. Monte-Carlo simulation methods, such as the Gibbs sampler and the stochastic quasi-gradient method, are applied in the determination of posterior distributions and in the solution of stochastic optimization problems. (orig.) (57 refs., 7 figs., 1 tab.)
Statistical model for prediction of hearing loss in patients receiving cisplatin chemotherapy.
Johnson, Andrew; Tarima, Sergey; Wong, Stuart; Friedland, David R; Runge, Christina L
2013-03-01
This statistical model might be used to predict cisplatin-induced hearing loss, particularly in patients undergoing concomitant radiotherapy. To create a statistical model based on pretreatment hearing thresholds to provide an individual probability for hearing loss from cisplatin therapy and, secondarily, to investigate the use of hearing classification schemes as predictive tools for hearing loss. Retrospective case-control study. Tertiary care medical center. A total of 112 subjects receiving chemotherapy and audiometric evaluation were evaluated for the study. Of these subjects, 31 met inclusion criteria for analysis. The primary outcome measurement was a statistical model providing the probability of hearing loss following the use of cisplatin chemotherapy. Fifteen of the 31 subjects had significant hearing loss following cisplatin chemotherapy. American Academy of Otolaryngology-Head and Neck Society and Gardner-Robertson hearing classification schemes revealed little change in hearing grades between pretreatment and posttreatment evaluations for subjects with or without hearing loss. The Chang hearing classification scheme could effectively be used as a predictive tool in determining hearing loss with a sensitivity of 73.33%. Pretreatment hearing thresholds were used to generate a statistical model, based on quadratic approximation, to predict hearing loss (C statistic = 0.842, cross-validated = 0.835). The validity of the model improved when only subjects who received concurrent head and neck irradiation were included in the analysis (C statistic = 0.91). A calculated cutoff of 0.45 for predicted probability has a cross-validated sensitivity and specificity of 80%. Pretreatment hearing thresholds can be used as a predictive tool for cisplatin-induced hearing loss, particularly with concomitant radiotherapy.
A neighborhood statistics model for predicting stream pathogen indicator levels.
Pandey, Pramod K; Pasternack, Gregory B; Majumder, Mahbubul; Soupir, Michelle L; Kaiser, Mark S
2015-03-01
Because elevated levels of water-borne Escherichia coli in streams are a leading cause of water quality impairments in the U.S., water-quality managers need tools for predicting aqueous E. coli levels. Presently, E. coli levels may be predicted using complex mechanistic models that have a high degree of unchecked uncertainty or simpler statistical models. To assess spatio-temporal patterns of instream E. coli levels, herein we measured E. coli, a pathogen indicator, at 16 sites (at four different times) within the Squaw Creek watershed, Iowa, and subsequently, the Markov Random Field model was exploited to develop a neighborhood statistics model for predicting instream E. coli levels. Two observed covariates, local water temperature (degrees Celsius) and mean cross-sectional depth (meters), were used as inputs to the model. Predictions of E. coli levels in the water column were compared with independent observational data collected from 16 in-stream locations. The results revealed that spatio-temporal averages of predicted and observed E. coli levels were extremely close. Approximately 66 % of individual predicted E. coli concentrations were within a factor of 2 of the observed values. In only one event, the difference between prediction and observation was beyond one order of magnitude. The mean of all predicted values at 16 locations was approximately 1 % higher than the mean of the observed values. The approach presented here will be useful while assessing instream contaminations such as pathogen/pathogen indicator levels at the watershed scale.
Prediction of lacking control power in power plants using statistical models
DEFF Research Database (Denmark)
Odgaard, Peter Fogh; Mataji, B.; Stoustrup, Jakob
2007-01-01
Prediction of the performance of plants like power plants is of interest, since the plant operator can use these predictions to optimize the plant production. In this paper the focus is addressed on a special case where a combination of high coal moisture content and a high load limits the possible...... plant load, meaning that the requested plant load cannot be met. The available models are in this case uncertain. Instead statistical methods are used to predict upper and lower uncertainty bounds on the prediction. Two different methods are used. The first relies on statistics of recent prediction...... errors; the second uses operating point depending statistics of prediction errors. Using these methods on the previous mentioned case, it can be concluded that the second method can be used to predict the power plant performance, while the first method has problems predicting the uncertain performance...
Estimating Predictive Variance for Statistical Gas Distribution Modelling
International Nuclear Information System (INIS)
Lilienthal, Achim J.; Asadi, Sahar; Reggente, Matteo
2009-01-01
Recent publications in statistical gas distribution modelling have proposed algorithms that model mean and variance of a distribution. This paper argues that estimating the predictive concentration variance entails not only a gradual improvement but is rather a significant step to advance the field. This is, first, since the models much better fit the particular structure of gas distributions, which exhibit strong fluctuations with considerable spatial variations as a result of the intermittent character of gas dispersal. Second, because estimating the predictive variance allows to evaluate the model quality in terms of the data likelihood. This offers a solution to the problem of ground truth evaluation, which has always been a critical issue for gas distribution modelling. It also enables solid comparisons of different modelling approaches, and provides the means to learn meta parameters of the model, to determine when the model should be updated or re-initialised, or to suggest new measurement locations based on the current model. We also point out directions of related ongoing or potential future research work.
A statistical model for predicting muscle performance
Byerly, Diane Leslie De Caix
The objective of these studies was to develop a capability for predicting muscle performance and fatigue to be utilized for both space- and ground-based applications. To develop this predictive model, healthy test subjects performed a defined, repetitive dynamic exercise to failure using a Lordex spinal machine. Throughout the exercise, surface electromyography (SEMG) data were collected from the erector spinae using a Mega Electronics ME3000 muscle tester and surface electrodes placed on both sides of the back muscle. These data were analyzed using a 5th order Autoregressive (AR) model and statistical regression analysis. It was determined that an AR derived parameter, the mean average magnitude of AR poles, significantly correlated with the maximum number of repetitions (designated Rmax) that a test subject was able to perform. Using the mean average magnitude of AR poles, a test subject's performance to failure could be predicted as early as the sixth repetition of the exercise. This predictive model has the potential to provide a basis for improving post-space flight recovery, monitoring muscle atrophy in astronauts and assessing the effectiveness of countermeasures, monitoring astronaut performance and fatigue during Extravehicular Activity (EVA) operations, providing pre-flight assessment of the ability of an EVA crewmember to perform a given task, improving the design of training protocols and simulations for strenuous International Space Station assembly EVA, and enabling EVA work task sequences to be planned enhancing astronaut performance and safety. Potential ground-based, medical applications of the predictive model include monitoring muscle deterioration and performance resulting from illness, establishing safety guidelines in the industry for repetitive tasks, monitoring the stages of rehabilitation for muscle-related injuries sustained in sports and accidents, and enhancing athletic performance through improved training protocols while reducing
Output from Statistical Predictive Models as Input to eLearning Dashboards
Directory of Open Access Journals (Sweden)
Marlene A. Smith
2015-06-01
Full Text Available We describe how statistical predictive models might play an expanded role in educational analytics by giving students automated, real-time information about what their current performance means for eventual success in eLearning environments. We discuss how an online messaging system might tailor information to individual students using predictive analytics. The proposed system would be data-driven and quantitative; e.g., a message might furnish the probability that a student will successfully complete the certificate requirements of a massive open online course. Repeated messages would prod underperforming students and alert instructors to those in need of intervention. Administrators responsible for accreditation or outcomes assessment would have ready documentation of learning outcomes and actions taken to address unsatisfactory student performance. The article’s brief introduction to statistical predictive models sets the stage for a description of the messaging system. Resources and methods needed to develop and implement the system are discussed.
Wang, Ming; Long, Qi
2016-09-01
Prediction models for disease risk and prognosis play an important role in biomedical research, and evaluating their predictive accuracy in the presence of censored data is of substantial interest. The standard concordance (c) statistic has been extended to provide a summary measure of predictive accuracy for survival models. Motivated by a prostate cancer study, we address several issues associated with evaluating survival prediction models based on c-statistic with a focus on estimators using the technique of inverse probability of censoring weighting (IPCW). Compared to the existing work, we provide complete results on the asymptotic properties of the IPCW estimators under the assumption of coarsening at random (CAR), and propose a sensitivity analysis under the mechanism of noncoarsening at random (NCAR). In addition, we extend the IPCW approach as well as the sensitivity analysis to high-dimensional settings. The predictive accuracy of prediction models for cancer recurrence after prostatectomy is assessed by applying the proposed approaches. We find that the estimated predictive accuracy for the models in consideration is sensitive to NCAR assumption, and thus identify the best predictive model. Finally, we further evaluate the performance of the proposed methods in both settings of low-dimensional and high-dimensional data under CAR and NCAR through simulations. © 2016, The International Biometric Society.
Statistical Model Predictions for p+p and Pb+Pb Collisions at LHC
Kraus, I; Oeschler, H; Redlich, K; Wheaton, S
2009-01-01
Particle production in p+p and central Pb+Pb collisions at LHC is discussed in the context of the statistical thermal model. For heavy-ion collisions, predictions of various particle ratios are presented. The sensitivity of several ratios on the temperature and the baryon chemical potential is studied in detail, and some of them, which are particularly appropriate to determine the chemical freeze-out point experimentally, are indicated. Considering elementary interactions on the other hand, we focus on strangeness production and its possible suppression. Extrapolating the thermal parameters to LHC energy, we present predictions of the statistical model for particle yields in p+p collisions. We quantify the strangeness suppression by the correlation volume parameter and discuss its influence on particle production. We propose observables that can provide deeper insight into the mechanism of strangeness production and suppression at LHC.
Sparse Power-Law Network Model for Reliable Statistical Predictions Based on Sampled Data
Directory of Open Access Journals (Sweden)
Alexander P. Kartun-Giles
2018-04-01
Full Text Available A projective network model is a model that enables predictions to be made based on a subsample of the network data, with the predictions remaining unchanged if a larger sample is taken into consideration. An exchangeable model is a model that does not depend on the order in which nodes are sampled. Despite a large variety of non-equilibrium (growing and equilibrium (static sparse complex network models that are widely used in network science, how to reconcile sparseness (constant average degree with the desired statistical properties of projectivity and exchangeability is currently an outstanding scientific problem. Here we propose a network process with hidden variables which is projective and can generate sparse power-law networks. Despite the model not being exchangeable, it can be closely related to exchangeable uncorrelated networks as indicated by its information theory characterization and its network entropy. The use of the proposed network process as a null model is here tested on real data, indicating that the model offers a promising avenue for statistical network modelling.
Statistical model predictions for p+p and Pb+Pb collisions at LHC
Kraus, I.; Cleymans, J.; Oeschler, H.; Redlich, K.; Wheaton, S.
2009-01-01
Particle production in p+p and central collisions at LHC is discussed in the context of the statistical thermal model. For heavy-ion collisions, predictions of various particle ratios are presented. The sensitivity of several ratios on the temperature and the baryon chemical potential is studied in
Pokhrel, Prafulla; Wang, Q. J.; Robertson, David E.
2013-10-01
Seasonal streamflow forecasts are valuable for planning and allocation of water resources. In Australia, the Bureau of Meteorology employs a statistical method to forecast seasonal streamflows. The method uses predictors that are related to catchment wetness at the start of a forecast period and to climate during the forecast period. For the latter, a predictor is selected among a number of lagged climate indices as candidates to give the "best" model in terms of model performance in cross validation. This study investigates two strategies for further improvement in seasonal streamflow forecasts. The first is to combine, through Bayesian model averaging, multiple candidate models with different lagged climate indices as predictors, to take advantage of different predictive strengths of the multiple models. The second strategy is to introduce additional candidate models, using rainfall and sea surface temperature predictions from a global climate model as predictors. This is to take advantage of the direct simulations of various dynamic processes. The results show that combining forecasts from multiple statistical models generally yields more skillful forecasts than using only the best model and appears to moderate the worst forecast errors. The use of rainfall predictions from the dynamical climate model marginally improves the streamflow forecasts when viewed over all the study catchments and seasons, but the use of sea surface temperature predictions provide little additional benefit.
Qi, D.; Majda, A.
2017-12-01
A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty in statistical sensitivity and intermittency in principal model directions with largest variability in high-dimensional turbulent system and turbulent transport models. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, where information theory and linear statistical response theory are combined in a systematic fashion to achieve the optimal model performance. The idea in the reduced-order method is from a self-consistent mathematical framework for general systems with quadratic nonlinearity, where crucial high-order statistics are approximated by a systematic model calibration procedure. Model efficiency is improved through additional damping and noise corrections to replace the expensive energy-conserving nonlinear interactions. Model errors due to the imperfect nonlinear approximation are corrected by tuning the model parameters using linear response theory with an information metric in a training phase before prediction. A statistical energy principle is adopted to introduce a global scaling factor in characterizing the higher-order moments in a consistent way to improve model sensitivity. Stringent models of barotropic and baroclinic turbulence are used to display the feasibility of the reduced-order methods. Principal statistical responses in mean and variance can be captured by the reduced-order models with accuracy and efficiency. Besides, the reduced-order models are also used to capture crucial passive tracer field that is advected by the baroclinic turbulent flow. It is demonstrated that crucial principal statistical quantities like the tracer spectrum and fat-tails in the tracer probability density functions in the most important large scales can be captured efficiently with accuracy using the reduced-order tracer model in various dynamical regimes of the flow field with
Statistical Model of Extreme Shear
DEFF Research Database (Denmark)
Larsen, Gunner Chr.; Hansen, Kurt Schaldemose
2004-01-01
In order to continue cost-optimisation of modern large wind turbines, it is important to continously increase the knowledge on wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... by a model that, on a statistically consistent basis, describe the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of high-sampled full-scale time series measurements...... are consistent, given the inevitabel uncertainties associated with model as well as with the extreme value data analysis. Keywords: Statistical model, extreme wind conditions, statistical analysis, turbulence, wind loading, statistical analysis, turbulence, wind loading, wind shear, wind turbines....
Predicting The Exit Time Of Employees In An Organization Using Statistical Model
Directory of Open Access Journals (Sweden)
Ahmed Al Kuwaiti
2015-08-01
Full Text Available Employees are considered as an asset to any organization and each organization provide a better and flexible working environment to retain its best and resourceful workforce. As such continuous efforts are being taken to avoid or extend the exitwithdrawal of employees from the organization. Human resource managers are facing a challenge to predict the exit time of employees and there is no precise model existing at present in the literature. This study has been conducted to predict the probability of exit of an employee in an organization using appropriate statistical model. Accordingly authors designed a model using Additive Weibull distribution to predict the expected exit time of employee in an organization. In addition a Shock model approach is also executed to check how well the Additive Weibull distribution suits in an organization. The analytical results showed that when the inter-arrival time increases the expected time for the employees to exit also increases. This study concluded that Additive Weibull distribution can be considered as an alternative in the place of Shock model approach to predict the exit time of employee in an organization.
Statistical prediction of Late Miocene climate
Digital Repository Service at National Institute of Oceanography (India)
Fernandes, A.A; Gupta, S.M.
by making certain simplifying assumptions; for example in modelling ocean 4 currents, the geostrophic approximation is made. In case of statistical prediction no such a priori assumption need be made. statistical prediction comprises of using observed data... the number of equations. In this case the equations are overdetermined, and therefore one has to look for a solution that best fits the sample data in a least squares sense. To this end we express the sample .... (2.1)+ ry = y + data as follows: n L c. (x...
Ren, Anna N; Neher, Robert E; Bell, Tyler; Grimm, James
2018-06-01
Preoperative planning is important to achieve successful implantation in primary total knee arthroplasty (TKA). However, traditional TKA templating techniques are not accurate enough to predict the component size to a very close range. With the goal of developing a general predictive statistical model using patient demographic information, ordinal logistic regression was applied to build a proportional odds model to predict the tibia component size. The study retrospectively collected the data of 1992 primary Persona Knee System TKA procedures. Of them, 199 procedures were randomly selected as testing data and the rest of the data were randomly partitioned between model training data and model evaluation data with a ratio of 7:3. Different models were trained and evaluated on the training and validation data sets after data exploration. The final model had patient gender, age, weight, and height as independent variables and predicted the tibia size within 1 size difference 96% of the time on the validation data, 94% of the time on the testing data, and 92% on a prospective cadaver data set. The study results indicated the statistical model built by ordinal logistic regression can increase the accuracy of tibia sizing information for Persona Knee preoperative templating. This research shows statistical modeling may be used with radiographs to dramatically enhance the templating accuracy, efficiency, and quality. In general, this methodology can be applied to other TKA products when the data are applicable. Copyright © 2018 Elsevier Inc. All rights reserved.
Predicting tube repair at French nuclear steam generators using statistical modeling
Energy Technology Data Exchange (ETDEWEB)
Mathon, C., E-mail: cedric.mathon@edf.fr [EDF Generation, Basic Design Department (SEPTEN), 69628 Villeurbanne (France); Chaudhary, A. [EDF Generation, Basic Design Department (SEPTEN), 69628 Villeurbanne (France); Gay, N.; Pitner, P. [EDF Generation, Nuclear Operation Division (UNIE), Saint-Denis (France)
2014-04-01
Electricité de France (EDF) currently operates a total of 58 Nuclear Pressurized Water Reactors (PWR) which are composed of 34 units of 900 MWe, 20 units of 1300 MWe and 4 units of 1450 MWe. This report provides an overall status of SG tube bundles on the 1300 MWe units. These units are 4 loop reactors using the AREVA 68/19 type SG model which are equipped either with Alloy 600 thermally treated (TT) tubes or Alloy 690 TT tubes. As of 2011, the effective full power years of operation (EFPY) ranges from 13 to 20 and during this time, the main degradation mechanisms observed on SG tubes are primary water stress corrosion cracking (PWSCC) and wear at anti-vibration bars (AVB) level. Statistical models have been developed for each type of degradation in order to predict the growth rate and number of affected tubes. Additional plugging is also performed to prevent other degradations such as tube wear due to foreign objects or high-cycle flow-induced fatigue. The contribution of these degradation mechanisms on the rate of tube plugging is described. The results from the statistical models are then used in predicting the long-term life of the steam generators and therefore providing a useful tool toward their effective life management and possible replacement.
Ratner, Bruce
2011-01-01
The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has
International Nuclear Information System (INIS)
Lee, Jae Bong; Park, Jae Hak; Kim, Hong Deok; Chung, Han Sub; Kim, Tae Ryong
2005-01-01
The growth of AVB wear in Model F steam generator tubes is predicted using the Monte Carlo Method and statistical approaches. The statistical parameters that represent the characteristics of wear growth and wear initiation are derived from In-Service Inspection (ISI) Non-Destructive Evaluation (NDE) data. Based on the statistical approaches, wear growth model are proposed and applied to predict wear distribution at the End Of Cycle (EOC). Probabilistic distributions of the number of wear flaws and maximum wear depth at EOC are obtained from the analysis. Comparing the predicted EOC wear flaw data with the known EOC data the usefulness of the proposed method is examined and satisfactory results are obtained
Energy Technology Data Exchange (ETDEWEB)
Lee, Jae Bong; Park, Jae Hak [Chungbuk National Univ., Cheongju (Korea, Republic of); Kim, Hong Deok; Chung, Han Sub; Kim, Tae Ryong [Korea Electtric Power Research Institute, Daejeon (Korea, Republic of)
2005-07-01
The growth of AVB wear in Model F steam generator tubes is predicted using the Monte Carlo Method and statistical approaches. The statistical parameters that represent the characteristics of wear growth and wear initiation are derived from In-Service Inspection (ISI) Non-Destructive Evaluation (NDE) data. Based on the statistical approaches, wear growth model are proposed and applied to predict wear distribution at the End Of Cycle (EOC). Probabilistic distributions of the number of wear flaws and maximum wear depth at EOC are obtained from the analysis. Comparing the predicted EOC wear flaw data with the known EOC data the usefulness of the proposed method is examined and satisfactory results are obtained.
Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann
2003-01-01
Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
Manning, Robert M.
1990-01-01
A static and dynamic rain-attenuation model is presented which describes the statistics of attenuation on an arbitrarily specified satellite link for any location for which there are long-term rainfall statistics. The model may be used in the design of the optimal stochastic control algorithms to mitigate the effects of attenuation and maintain link reliability. A rain-statistics data base is compiled, which makes it possible to apply the model to any location in the continental U.S. with a resolution of 0-5 degrees in latitude and longitude. The model predictions are compared with experimental observations, showing good agreement.
Statistical Modelling of Wind Proles - Data Analysis and Modelling
DEFF Research Database (Denmark)
Jónsson, Tryggvi; Pinson, Pierre
The aim of the analysis presented in this document is to investigate whether statistical models can be used to make very short-term predictions of wind profiles.......The aim of the analysis presented in this document is to investigate whether statistical models can be used to make very short-term predictions of wind profiles....
Statistical Model of Extreme Shear
DEFF Research Database (Denmark)
Hansen, Kurt Schaldemose; Larsen, Gunner Chr.
2005-01-01
In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge of wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of full-scale measurements recorded with a high sampling rate...
Wind gust estimation by combining numerical weather prediction model and statistical post-processing
Patlakas, Platon; Drakaki, Eleni; Galanis, George; Spyrou, Christos; Kallos, George
2017-04-01
The continuous rise of off-shore and near-shore activities as well as the development of structures, such as wind farms and various offshore platforms, requires the employment of state-of-the-art risk assessment techniques. Such analysis is used to set the safety standards and can be characterized as a climatologically oriented approach. Nevertheless, a reliable operational support is also needed in order to minimize cost drawbacks and human danger during the construction and the functioning stage as well as during maintenance activities. One of the most important parameters for this kind of analysis is the wind speed intensity and variability. A critical measure associated with this variability is the presence and magnitude of wind gusts as estimated in the reference level of 10m. The latter can be attributed to different processes that vary among boundary-layer turbulence, convection activities, mountain waves and wake phenomena. The purpose of this work is the development of a wind gust forecasting methodology combining a Numerical Weather Prediction model and a dynamical statistical tool based on Kalman filtering. To this end, the parameterization of Wind Gust Estimate method was implemented to function within the framework of the atmospheric model SKIRON/Dust. The new modeling tool combines the atmospheric model with a statistical local adaptation methodology based on Kalman filters. This has been tested over the offshore west coastline of the United States. The main purpose is to provide a useful tool for wind analysis and prediction and applications related to offshore wind energy (power prediction, operation and maintenance). The results have been evaluated by using observational data from the NOAA's buoy network. As it was found, the predicted output shows a good behavior that is further improved after the local adjustment post-process.
Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu
2015-09-01
Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to Weibull distribution, we discovered Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior and we can use the λ parameter to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment were discussed. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Levy, R.; Mcginness, H.
1976-01-01
Investigations were performed to predict the power available from the wind at the Goldstone, California, antenna site complex. The background for power prediction was derived from a statistical evaluation of available wind speed data records at this location and at nearby locations similarly situated within the Mojave desert. In addition to a model for power prediction over relatively long periods of time, an interim simulation model that produces sample wind speeds is described. The interim model furnishes uncorrelated sample speeds at hourly intervals that reproduce the statistical wind distribution at Goldstone. A stochastic simulation model to provide speed samples representative of both the statistical speed distributions and correlations is also discussed.
Walz, M. A.; Donat, M.; Leckebusch, G. C.
2017-12-01
As extreme wind speeds are responsible for large socio-economic losses in Europe, a skillful prediction would be of great benefit for disaster prevention as well as for the actuarial community. Here we evaluate patterns of large-scale atmospheric variability and the seasonal predictability of extreme wind speeds (e.g. >95th percentile) in the European domain in the dynamical seasonal forecast system ECMWF System 4, and compare to the predictability based on a statistical prediction model. The dominant patterns of atmospheric variability show distinct differences between reanalysis and ECMWF System 4, with most patterns in System 4 extended downstream in comparison to ERA-Interim. The dissimilar manifestations of the patterns within the two models lead to substantially different drivers associated with the occurrence of extreme winds in the respective model. While the ECMWF System 4 is shown to provide some predictive power over Scandinavia and the eastern Atlantic, only very few grid cells in the European domain have significant correlations for extreme wind speeds in System 4 compared to ERA-Interim. In contrast, a statistical model predicts extreme wind speeds during boreal winter in better agreement with the observations. Our results suggest that System 4 does not seem to capture the potential predictability of extreme winds that exists in the real world, and therefore fails to provide reliable seasonal predictions for lead months 2-4. This is likely related to the unrealistic representation of large-scale patterns of atmospheric variability. Hence our study points to potential improvements of dynamical prediction skill by improving the simulation of large-scale atmospheric dynamics.
A RANS knock model to predict the statistical occurrence of engine knock
International Nuclear Information System (INIS)
D'Adamo, Alessandro; Breda, Sebastiano; Fontanesi, Stefano; Irimescu, Adrian; Merola, Simona Silvia; Tornatore, Cinzia
2017-01-01
Highlights: • Development of a new RANS model for SI engine knock probability. • Turbulence-derived transport equations for variances of mixture fraction and enthalpy. • Gasoline autoignition delay times calculated from detailed chemical kinetics. • Knock probability validated against experiments on optically accessible GDI unit. • PDF-based knock model accounting for the random nature of SI engine knock in RANS simulations. - Abstract: In the recent past engine knock emerged as one of the main limiting aspects for the achievement of higher efficiency targets in modern spark-ignition (SI) engines. To attain these requirements, engine operating points must be moved as close as possible to the onset of abnormal combustions, although the turbulent nature of flow field and SI combustion leads to possibly ample fluctuations between consecutive engine cycles. This forces engine designers to distance the target condition from its theoretical optimum in order to prevent abnormal combustion, which can potentially damage engine components because of few individual heavy-knocking cycles. A statistically based RANS knock model is presented in this study, whose aim is the prediction not only of the ensemble average knock occurrence, poorly meaningful in such a stochastic event, but also of a knock probability. The model is based on look-up tables of autoignition times from detailed chemistry, coupled with transport equations for the variance of mixture fraction and enthalpy. The transported perturbations around the ensemble average value are based on variable gradients and on a local turbulent time scale. A multi-variate cell-based Gaussian-PDF model is proposed for the unburnt mixture, resulting in a statistical distribution for the in-cell reaction rate. An average knock precursor and its variance are independently calculated and transported; this results in the prediction of an earliest knock probability preceding the ensemble average knock onset, as confirmed by
Energy Technology Data Exchange (ETDEWEB)
Granderson, Jessica [Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Energy Technologies Area Div.; Price, Phillip N. [Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Energy Technologies Area Div.
2014-03-01
This paper documents the development and application of a general statistical methodology to assess the accuracy of baseline energy models, focusing on its application to Measurement and Verification (M&V) of whole-building energy savings. The methodology complements the principles addressed in resources such as ASHRAE Guideline 14 and the International Performance Measurement and Verification Protocol. It requires fitting a baseline model to data from a ``training period’’ and using the model to predict total electricity consumption during a subsequent ``prediction period.’’ We illustrate the methodology by evaluating five baseline models using data from 29 buildings. The training period and prediction period were varied, and model predictions of daily, weekly, and monthly energy consumption were compared to meter data to determine model accuracy. Several metrics were used to characterize the accuracy of the predictions, and in some cases the best-performing model as judged by one metric was not the best performer when judged by another metric.
Pearce, Marcus T
2018-05-11
Music perception depends on internal psychological models derived through exposure to a musical culture. It is hypothesized that this musical enculturation depends on two cognitive processes: (1) statistical learning, in which listeners acquire internal cognitive models of statistical regularities present in the music to which they are exposed; and (2) probabilistic prediction based on these learned models that enables listeners to organize and process their mental representations of music. To corroborate these hypotheses, I review research that uses a computational model of probabilistic prediction based on statistical learning (the information dynamics of music (IDyOM) model) to simulate data from empirical studies of human listeners. The results show that a broad range of psychological processes involved in music perception-expectation, emotion, memory, similarity, segmentation, and meter-can be understood in terms of a single, underlying process of probabilistic prediction using learned statistical models. Furthermore, IDyOM simulations of listeners from different musical cultures demonstrate that statistical learning can plausibly predict causal effects of differential cultural exposure to musical styles, providing a quantitative model of cultural distance. Understanding the neural basis of musical enculturation will benefit from close coordination between empirical neuroimaging and computational modeling of underlying mechanisms, as outlined here. © 2018 The Authors. Annals of the New York Academy of Sciences published by Wiley Periodicals, Inc. on behalf of New York Academy of Sciences.
Bootstrap prediction and Bayesian prediction under misspecified models
Fushiki, Tadayoshi
2005-01-01
We consider a statistical prediction problem under misspecified models. In a sense, Bayesian prediction is an optimal prediction method when an assumed model is true. Bootstrap prediction is obtained by applying Breiman's `bagging' method to a plug-in prediction. Bootstrap prediction can be considered to be an approximation to the Bayesian prediction under the assumption that the model is true. However, in applications, there are frequently deviations from the assumed model. In this paper, bo...
Statistical and Machine Learning Models to Predict Programming Performance
Bergin, Susan
2006-01-01
This thesis details a longitudinal study on factors that influence introductory programming success and on the development of machine learning models to predict incoming student performance. Although numerous studies have developed models to predict programming success, the models struggled to achieve high accuracy in predicting the likely performance of incoming students. Our approach overcomes this by providing a machine learning technique, using a set of three significant...
Statistical and Biophysical Models for Predicting Total and Outdoor Water Use in Los Angeles
Mini, C.; Hogue, T. S.; Pincetl, S.
2012-04-01
Modeling water demand is a complex exercise in the choice of the functional form, techniques and variables to integrate in the model. The goal of the current research is to identify the determinants that control total and outdoor residential water use in semi-arid cities and to utilize that information in the development of statistical and biophysical models that can forecast spatial and temporal urban water use. The City of Los Angeles is unique in its highly diverse socio-demographic, economic and cultural characteristics across neighborhoods, which introduces significant challenges in modeling water use. Increasing climate variability also contributes to uncertainties in water use predictions in urban areas. Monthly individual water use records were acquired from the Los Angeles Department of Water and Power (LADWP) for the 2000 to 2010 period. Study predictors of residential water use include socio-demographic, economic, climate and landscaping variables at the zip code level collected from US Census database. Climate variables are estimated from ground-based observations and calculated at the centroid of each zip code by inverse-distance weighting method. Remotely-sensed products of vegetation biomass and landscape land cover are also utilized. Two linear regression models were developed based on the panel data and variables described: a pooled-OLS regression model and a linear mixed effects model. Both models show income per capita and the percentage of landscape areas in each zip code as being statistically significant predictors. The pooled-OLS model tends to over-estimate higher water use zip codes and both models provide similar RMSE values.Outdoor water use was estimated at the census tract level as the residual between total water use and indoor use. This residual is being compared with the output from a biophysical model including tree and grass cover areas, climate variables and estimates of evapotranspiration at very high spatial resolution. A
Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A.; van t Veld, Aart A.
2012-01-01
PURPOSE: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. METHODS AND MATERIALS: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator
Pedretti, Daniele; Bianchi, Marco
2018-03-01
Breakthrough curves (BTCs) observed during tracer tests in highly heterogeneous aquifers display strong tailing. Power laws are popular models for both the empirical fitting of these curves, and the prediction of transport using upscaling models based on best-fitted estimated parameters (e.g. the power law slope or exponent). The predictive capacity of power law based upscaling models can be however questioned due to the difficulties to link model parameters with the aquifers' physical properties. This work analyzes two aspects that can limit the use of power laws as effective predictive tools: (a) the implication of statistical subsampling, which often renders power laws undistinguishable from other heavily tailed distributions, such as the logarithmic (LOG); (b) the difficulties to reconcile fitting parameters obtained from models with different formulations, such as the presence of a late-time cutoff in the power law model. Two rigorous and systematic stochastic analyses, one based on benchmark distributions and the other on BTCs obtained from transport simulations, are considered. It is found that a power law model without cutoff (PL) results in best-fitted exponents (αPL) falling in the range of typical experimental values reported in the literature (1.5 tailing becomes heavier. Strong fluctuations occur when the number of samples is limited, due to the effects of subsampling. On the other hand, when the power law model embeds a cutoff (PLCO), the best-fitted exponent (αCO) is insensitive to the degree of tailing and to the effects of subsampling and tends to a constant αCO ≈ 1. In the PLCO model, the cutoff rate (λ) is the parameter that fully reproduces the persistence of the tailing and is shown to be inversely correlated to the LOG scale parameter (i.e. with the skewness of the distribution). The theoretical results are consistent with the fitting analysis of a tracer test performed during the MADE-5 experiment. It is shown that a simple
Statistical Approaches for Spatiotemporal Prediction of Low Flows
Fangmann, A.; Haberlandt, U.
2017-12-01
An adequate assessment of regional climate change impacts on streamflow requires the integration of various sources of information and modeling approaches. This study proposes simple statistical tools for inclusion into model ensembles, which are fast and straightforward in their application, yet able to yield accurate streamflow predictions in time and space. Target variables for all approaches are annual low flow indices derived from a data set of 51 records of average daily discharge for northwestern Germany. The models require input of climatic data in the form of meteorological drought indices, derived from observed daily climatic variables, averaged over the streamflow gauges' catchments areas. Four different modeling approaches are analyzed. Basis for all pose multiple linear regression models that estimate low flows as a function of a set of meteorological indices and/or physiographic and climatic catchment descriptors. For the first method, individual regression models are fitted at each station, predicting annual low flow values from a set of annual meteorological indices, which are subsequently regionalized using a set of catchment characteristics. The second method combines temporal and spatial prediction within a single panel data regression model, allowing estimation of annual low flow values from input of both annual meteorological indices and catchment descriptors. The third and fourth methods represent non-stationary low flow frequency analyses and require fitting of regional distribution functions. Method three is subject to a spatiotemporal prediction of an index value, method four to estimation of L-moments that adapt the regional frequency distribution to the at-site conditions. The results show that method two outperforms successive prediction in time and space. Method three also shows a high performance in the near future period, but since it relies on a stationary distribution, its application for prediction of far future changes may be
Prediction of dimethyl disulfide levels from biosolids using statistical modeling.
Gabriel, Steven A; Vilalai, Sirapong; Arispe, Susanna; Kim, Hyunook; McConnell, Laura L; Torrents, Alba; Peot, Christopher; Ramirez, Mark
2005-01-01
Two statistical models were used to predict the concentration of dimethyl disulfide (DMDS) released from biosolids produced by an advanced wastewater treatment plant (WWTP) located in Washington, DC, USA. The plant concentrates sludge from primary sedimentation basins in gravity thickeners (GT) and sludge from secondary sedimentation basins in dissolved air flotation (DAF) thickeners. The thickened sludge is pumped into blending tanks and then fed into centrifuges for dewatering. The dewatered sludge is then conditioned with lime before trucking out from the plant. DMDS, along with other volatile sulfur and nitrogen-containing chemicals, is known to contribute to biosolids odors. These models identified oxidation/reduction potential (ORP) values of a GT and DAF, the amount of sludge dewatered by centrifuges, and the blend ratio between GT thickened sludge and DAF thickened sludge in blending tanks as control variables. The accuracy of the developed regression models was evaluated by checking the adjusted R2 of the regression as well as the signs of coefficients associated with each variable. In general, both models explained observed DMDS levels in sludge headspace samples. The adjusted R2 value of the regression models 1 and 2 were 0.79 and 0.77, respectively. Coefficients for each regression model also had the correct sign. Using the developed models, plant operators can adjust the controllable variables to proactively decrease this odorant. Therefore, these models are a useful tool in biosolids management at WWTPs.
Nonparametric predictive inference in statistical process control
Arts, G.R.J.; Coolen, F.P.A.; Laan, van der P.
2004-01-01
Statistical process control (SPC) is used to decide when to stop a process as confidence in the quality of the next item(s) is low. Information to specify a parametric model is not always available, and as SPC is of a predictive nature, we present a control chart developed using nonparametric
Manning, Robert M.
1986-01-01
A rain attenuation prediction model is described for use in calculating satellite communication link availability for any specific location in the world that is characterized by an extended record of rainfall. Such a formalism is necessary for the accurate assessment of such availability predictions in the case of the small user-terminal concept of the Advanced Communication Technology Satellite (ACTS) Project. The model employs the theory of extreme value statistics to generate the necessary statistical rainrate parameters from rain data in the form compiled by the National Weather Service. These location dependent rain statistics are then applied to a rain attenuation model to obtain a yearly prediction of the occurrence of attenuation on any satellite link at that location. The predictions of this model are compared to those of the Crane Two-Component Rain Model and some empirical data and found to be very good. The model is then used to calculate rain attenuation statistics at 59 locations in the United States (including Alaska and Hawaii) for the 20 GHz downlinks and 30 GHz uplinks of the proposed ACTS system. The flexibility of this modeling formalism is such that it allows a complete and unified treatment of the temporal aspects of rain attenuation that leads to the design of an optimum stochastic power control algorithm, the purpose of which is to efficiently counter such rain fades on a satellite link.
Multivariate statistical models for disruption prediction at ASDEX Upgrade
International Nuclear Information System (INIS)
Aledda, R.; Cannas, B.; Fanni, A.; Sias, G.; Pautasso, G.
2013-01-01
In this paper, a disruption prediction system for ASDEX Upgrade has been proposed that does not require disruption terminated experiments to be implemented. The system consists of a data-based model, which is built using only few input signals coming from successfully terminated pulses. A fault detection and isolation approach has been used, where the prediction is based on the analysis of the residuals of an auto regressive exogenous input model. The prediction performance of the proposed system is encouraging when it is applied to the same set of campaigns used to implement the model. However, the false alarms significantly increase when we tested the system on discharges coming from experimental campaigns temporally far from those used to train the model. This is due to the well know aging effect inherent in the data-based models. The main advantage of the proposed method, with respect to other data-based approaches in literature, is that it does not need data on experiments terminated with a disruption, as it uses a normal operating conditions model. This is a big advantage in the prospective of a prediction system for ITER, where a limited number of disruptions can be allowed
Statistical short-term earthquake prediction.
Kagan, Y Y; Knopoff, L
1987-06-19
A statistical procedure, derived from a theoretical model of fracture growth, is used to identify a foreshock sequence while it is in progress. As a predictor, the procedure reduces the average uncertainty in the rate of occurrence for a future strong earthquake by a factor of more than 1000 when compared with the Poisson rate of occurrence. About one-third of all main shocks with local magnitude greater than or equal to 4.0 in central California can be predicted in this way, starting from a 7-year database that has a lower magnitude cut off of 1.5. The time scale of such predictions is of the order of a few hours to a few days for foreshocks in the magnitude range from 2.0 to 5.0.
Statistical model of natural stimuli predicts edge-like pooling of spatial frequency channels in V2
Directory of Open Access Journals (Sweden)
Gutmann Michael
2005-02-01
Full Text Available Abstract Background It has been shown that the classical receptive fields of simple and complex cells in the primary visual cortex emerge from the statistical properties of natural images by forcing the cell responses to be maximally sparse or independent. We investigate how to learn features beyond the primary visual cortex from the statistical properties of modelled complex-cell outputs. In previous work, we showed that a new model, non-negative sparse coding, led to the emergence of features which code for contours of a given spatial frequency band. Results We applied ordinary independent component analysis to modelled outputs of complex cells that span different frequency bands. The analysis led to the emergence of features which pool spatially coherent across-frequency activity in the modelled primary visual cortex. Thus, the statistically optimal way of processing complex-cell outputs abandons separate frequency channels, while preserving and even enhancing orientation tuning and spatial localization. As a technical aside, we found that the non-negativity constraint is not necessary: ordinary independent component analysis produces essentially the same results as our previous work. Conclusion We propose that the pooling that emerges allows the features to code for realistic low-level image features related to step edges. Further, the results prove the viability of statistical modelling of natural images as a framework that produces quantitative predictions of visual processing.
Confidence scores for prediction models
DEFF Research Database (Denmark)
Gerds, Thomas Alexander; van de Wiel, MA
2011-01-01
In medical statistics, many alternative strategies are available for building a prediction model based on training data. Prediction models are routinely compared by means of their prediction performance in independent validation data. If only one data set is available for training and validation,...
Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A; van't Veld, Aart A
2012-03-15
To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended. Copyright Â© 2012 Elsevier Inc. All rights reserved.
Energy Technology Data Exchange (ETDEWEB)
Xu Chengjian, E-mail: c.j.xu@umcg.nl [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van' t [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands)
2012-03-15
Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.
International Nuclear Information System (INIS)
Xu Chengjian; Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van’t
2012-01-01
Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.
Kabeshova, Anastasiia; Launay, Cyrille P; Gromov, Vasilii A; Fantino, Bruno; Levinoff, Elise J; Allali, Gilles; Beauchet, Olivier
2016-01-01
To compare performance criteria (i.e., sensitivity, specificity, positive predictive value, negative predictive value, area under receiver operating characteristic curve and accuracy) of linear and non-linear statistical models for fall risk in older community-dwellers. Participants were recruited in two large population-based studies, "Prévention des Chutes, Réseau 4" (PCR4, n=1760, cross-sectional design, retrospective collection of falls) and "Prévention des Chutes Personnes Agées" (PCPA, n=1765, cohort design, prospective collection of falls). Six linear statistical models (i.e., logistic regression, discriminant analysis, Bayes network algorithm, decision tree, random forest, boosted trees), three non-linear statistical models corresponding to artificial neural networks (multilayer perceptron, genetic algorithm and neuroevolution of augmenting topologies [NEAT]) and the adaptive neuro fuzzy interference system (ANFIS) were used. Falls ≥1 characterizing fallers and falls ≥2 characterizing recurrent fallers were used as outcomes. Data of studies were analyzed separately and together. NEAT and ANFIS had better performance criteria compared to other models. The highest performance criteria were reported with NEAT when using PCR4 database and falls ≥1, and with both NEAT and ANFIS when pooling data together and using falls ≥2. However, sensitivity and specificity were unbalanced. Sensitivity was higher than specificity when identifying fallers, whereas the converse was found when predicting recurrent fallers. Our results showed that NEAT and ANFIS were non-linear statistical models with the best performance criteria for the prediction of falls but their sensitivity and specificity were unbalanced, underscoring that models should be used respectively for the screening of fallers and the diagnosis of recurrent fallers. Copyright © 2015 European Federation of Internal Medicine. Published by Elsevier B.V. All rights reserved.
Statistical model based gender prediction for targeted NGS clinical panels
Directory of Open Access Journals (Sweden)
Palani Kannan Kandavel
2017-12-01
The reference test dataset are being used to test the model. The sensitivity on predicting the gender has been increased from the current “genotype composition in ChrX” based approach. In addition, the prediction score given by the model can be used to evaluate the quality of clinical dataset. The higher prediction score towards its respective gender indicates the higher quality of sequenced data.
Development of the statistical ARIMA model: an application for predicting the upcoming of MJO index
Hermawan, Eddy; Nurani Ruchjana, Budi; Setiawan Abdullah, Atje; Gede Nyoman Mindra Jaya, I.; Berliana Sipayung, Sinta; Rustiana, Shailla
2017-10-01
This study is mainly concerned in development one of the most important equatorial atmospheric phenomena that we call as the Madden Julian Oscillation (MJO) which having strong impacts to the extreme rainfall anomalies over the Indonesian Maritime Continent (IMC). In this study, we focused to the big floods over Jakarta and surrounded area that suspecting caused by the impacts of MJO. We concentrated to develop the MJO index using the statistical model that we call as Box-Jenkis (ARIMA) ini 1996, 2002, and 2007, respectively. They are the RMM (Real Multivariate MJO) index as represented by RMM1 and RMM2, respectively. There are some steps to develop that model, starting from identification of data, estimated, determined model, before finally we applied that model for investigation some big floods that occurred at Jakarta in 1996, 2002, and 2007 respectively. We found the best of estimated model for the RMM1 and RMM2 prediction is ARIMA (2,1,2). Detailed steps how that model can be extracted and applying to predict the rainfall anomalies over Jakarta for 3 to 6 months later is discussed at this paper.
Spatial statistics for predicting flow through a rock fracture
International Nuclear Information System (INIS)
Coakley, K.J.
1989-03-01
Fluid flow through a single rock fracture depends on the shape of the space between the upper and lower pieces of rock which define the fracture. In this thesis, the normalized flow through a fracture, i.e. the equivalent permeability of a fracture, is predicted in terms of spatial statistics computed from the arrangement of voids, i.e. open spaces, and contact areas within the fracture. Patterns of voids and contact areas, with complexity typical of experimental data, are simulated by clipping a correlated Gaussian process defined on a N by N pixel square region. The voids have constant aperture; the distance between the upper and lower surfaces which define the fracture is either zero or a constant. Local flow is assumed to be proportional to local aperture cubed times local pressure gradient. The flow through a pattern of voids and contact areas is solved using a finite-difference method. After solving for the flow through simulated 10 by 10 by 30 pixel patterns of voids and contact areas, a model to predict equivalent permeability is developed. The first model is for patterns with 80% voids where all voids have the same aperture. The equivalent permeability of a pattern is predicted in terms of spatial statistics computed from the arrangement of voids and contact areas within the pattern. Four spatial statistics are examined. The change point statistic measures how often adjacent pixel alternate from void to contact area (or vice versa ) in the rows of the patterns which are parallel to the overall flow direction. 37 refs., 66 figs., 41 tabs
Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R
2009-12-01
To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of a SNBassociated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures though minimizing the error rate. After cross-validation logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPV (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reduction (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients ( approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.
Selby-Pham, Sophie N B; Howell, Kate S; Dunshea, Frank R; Ludbey, Joel; Lutz, Adrian; Bennett, Louise
2018-04-15
A diet rich in phytochemicals confers benefits for health by reducing the risk of chronic diseases via regulation of oxidative stress and inflammation (OSI). For optimal protective bio-efficacy, the time required for phytochemicals and their metabolites to reach maximal plasma concentrations (T max ) should be synchronised with the time of increased OSI. A statistical model has been reported to predict T max of individual phytochemicals based on molecular mass and lipophilicity. We report the application of the model for predicting the absorption profile of an uncharacterised phytochemical mixture, herein referred to as the 'functional fingerprint'. First, chemical profiles of phytochemical extracts were acquired using liquid chromatography mass spectrometry (LC-MS), then the molecular features for respective components were used to predict their plasma absorption maximum, based on molecular mass and lipophilicity. This method of 'functional fingerprinting' of plant extracts represents a novel tool for understanding and optimising the health efficacy of plant extracts. Copyright © 2017 Elsevier Ltd. All rights reserved.
Statistical Validation of Engineering and Scientific Models: Background
International Nuclear Information System (INIS)
Hills, Richard G.; Trucano, Timothy G.
1999-01-01
A tutorial is presented discussing the basic issues associated with propagation of uncertainty analysis and statistical validation of engineering and scientific models. The propagation of uncertainty tutorial illustrates the use of the sensitivity method and the Monte Carlo method to evaluate the uncertainty in predictions for linear and nonlinear models. Four example applications are presented; a linear model, a model for the behavior of a damped spring-mass system, a transient thermal conduction model, and a nonlinear transient convective-diffusive model based on Burger's equation. Correlated and uncorrelated model input parameters are considered. The model validation tutorial builds on the material presented in the propagation of uncertainty tutoriaI and uses the damp spring-mass system as the example application. The validation tutorial illustrates several concepts associated with the application of statistical inference to test model predictions against experimental observations. Several validation methods are presented including error band based, multivariate, sum of squares of residuals, and optimization methods. After completion of the tutorial, a survey of statistical model validation literature is presented and recommendations for future work are made
Monthly to seasonal low flow prediction: statistical versus dynamical models
Ionita-Scholz, Monica; Klein, Bastian; Meissner, Dennis; Rademacher, Silke
2016-04-01
the Alfred Wegener Institute a purely statistical scheme to generate streamflow forecasts for several months ahead. Instead of directly using teleconnection indices (e.g. NAO, AO) the idea is to identify regions with stable teleconnections between different global climate information (e.g. sea surface temperature, geopotential height etc.) and streamflow at different gauges relevant for inland waterway transport. So-called stability (correlation) maps are generated showing regions where streamflow and climate variable from previous months are significantly correlated in a 21 (31) years moving window. Finally, the optimal forecast model is established based on a multiple regression analysis of the stable predictors. We will present current results of the aforementioned approaches with focus on the River Rhine (being one of the world's most frequented waterways and the backbone of the European inland waterway network) and the Elbe River. Overall, our analysis reveals the existence of a valuable predictability of the low flows at monthly and seasonal time scales, a result that may be useful to water resources management. Given that all predictors used in the models are available at the end of each month, the forecast scheme can be used operationally to predict extreme events and to provide early warnings for upcoming low flows.
Wang, Tongfei; Kang, Xiaomin; He, Liying; Liu, Zhilan; Xu, Haijing; Zhao, Aimin
2017-09-01
To establish a statistical model to predict thrombophilia in patients with unexplained recurrent pregnancy loss (URPL). A retrospective case-control study was conducted at Ren Ji Hospital, Shanghai, China, from March 2014 to October 2016. The levels of D-dimer (DD), fibrinogen degradation products (FDP), activated partial thromboplastin time (APTT), prothrombin time (PT), thrombin time (TT), fibrinogen (Fg), and platelet aggregation in response to arachidonic acid (AA) and adenosine diphosphate (ADP) were collected. Receiver operating characteristic curve analysis was used to analyze data from 158 UPRL patients (≥3 previous first trimester pregnancy losses with unexplained etiology) and 131 non-RPL patients (no history of recurrent pregnancy loss). A logistic regression model (LRM) was built and the model was externally validated in another group of patients. The LRM included AA, DD, FDP, TT, APTT, and PT. The overall accuracy of the LRM was 80.9%, with sensitivity and specificity of 78.5% and 78.3%, respectively. The diagnostic threshold of the possibility of the LRM was 0.6492, with a sensitivity of 78.5% and a specificity of 78.3%. Subsequently, the LRM was validated with an overall accuracy of 83.6%. The LRM is a valuable model for prediction of thrombophilia in URPL patients. © 2017 International Federation of Gynecology and Obstetrics.
Simple statistical model for branched aggregates
DEFF Research Database (Denmark)
Lemarchand, Claire; Hansen, Jesper Schmidt
2015-01-01
, given that it already has bonds with others. The model is applied here to asphaltene nanoaggregates observed in molecular dynamics simulations of Cooee bitumen. The variation with temperature of the probabilities deduced from this model is discussed in terms of statistical mechanics arguments....... The relevance of the statistical model in the case of asphaltene nanoaggregates is checked by comparing the predicted value of the probability for one molecule to have exactly i bonds with the same probability directly measured in the molecular dynamics simulations. The agreement is satisfactory......We propose a statistical model that can reproduce the size distribution of any branched aggregate, including amylopectin, dendrimers, molecular clusters of monoalcohols, and asphaltene nanoaggregates. It is based on the conditional probability for one molecule to form a new bond with a molecule...
International Nuclear Information System (INIS)
Carvajal Escobar Yesid; Munoz, Flor Matilde
2007-01-01
The project this centred in the revision of the state of the art of the ocean-atmospheric phenomena that you affect the Colombian hydrology especially The Phenomenon Enos that causes a socioeconomic impact of first order in our country, it has not been sufficiently studied; therefore it is important to approach the thematic one, including the variable macroclimates associated to the Enos in the analyses of water planning. The analyses include revision of statistical techniques of analysis of consistency of hydrological data with the objective of conforming a database of monthly flow of the river reliable and homogeneous Cauca. Statistical methods are used (Analysis of data multivariante) specifically The analysis of principal components to involve them in the development of models of prediction of flows monthly means in the river Cauca involving the Lineal focus as they are the model autoregressive AR, ARX and Armax and the focus non lineal Net Artificial Network.
Predicting statistical properties of open reading frames in bacterial genomes.
Directory of Open Access Journals (Sweden)
Katharina Mir
Full Text Available An analytical model based on the statistical properties of Open Reading Frames (ORFs of eubacterial genomes such as codon composition and sequence length of all reading frames was developed. This new model predicts the average length, maximum length as well as the length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, the number of annotated genes is predicted with high accordance. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot completely be explained by a biased codon usage in the +1 frame. While it is unknown if the stop codon depletion has a biological function, it could be due to a protein coding capacity of alternative ORFs exerting a selection pressure which prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes, therefore, leads to a hypothesis suggesting novel gene candidates which can now be investigated in subsequent wet lab experiments.
Statistical Analysis of CFD Solutions from the Fourth AIAA Drag Prediction Workshop
Morrison, Joseph H.
2010-01-01
A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from the U.S., Europe, Asia, and Russia using a variety of grid systems and turbulence models for the June 2009 4th Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was a new subsonic transport model, the Common Research Model, designed using a modern approach for the wing and included a horizontal tail. The fourth workshop focused on the prediction of both absolute and incremental drag levels for wing-body and wing-body-horizontal tail configurations. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with earlier workshops using the statistical framework.
Spatial Economics Model Predicting Transport Volume
Directory of Open Access Journals (Sweden)
Lu Bo
2016-10-01
Full Text Available It is extremely important to predict the logistics requirements in a scientific and rational way. However, in recent years, the improvement effect on the prediction method is not very significant and the traditional statistical prediction method has the defects of low precision and poor interpretation of the prediction model, which cannot only guarantee the generalization ability of the prediction model theoretically, but also cannot explain the models effectively. Therefore, in combination with the theories of the spatial economics, industrial economics, and neo-classical economics, taking city of Zhuanghe as the research object, the study identifies the leading industry that can produce a large number of cargoes, and further predicts the static logistics generation of the Zhuanghe and hinterlands. By integrating various factors that can affect the regional logistics requirements, this study established a logistics requirements potential model from the aspect of spatial economic principles, and expanded the way of logistics requirements prediction from the single statistical principles to an new area of special and regional economics.
Statistical Modeling and Prediction for Tourism Economy Using Dendritic Neural Network.
Yu, Ying; Wang, Yirui; Gao, Shangce; Tang, Zheng
2017-01-01
With the impact of global internationalization, tourism economy has also been a rapid development. The increasing interest aroused by more advanced forecasting methods leads us to innovate forecasting methods. In this paper, the seasonal trend autoregressive integrated moving averages with dendritic neural network model (SA-D model) is proposed to perform the tourism demand forecasting. First, we use the seasonal trend autoregressive integrated moving averages model (SARIMA model) to exclude the long-term linear trend and then train the residual data by the dendritic neural network model and make a short-term prediction. As the result showed in this paper, the SA-D model can achieve considerably better predictive performances. In order to demonstrate the effectiveness of the SA-D model, we also use the data that other authors used in the other models and compare the results. It also proved that the SA-D model achieved good predictive performances in terms of the normalized mean square error, absolute percentage of error, and correlation coefficient.
Understanding and forecasting polar stratospheric variability with statistical models
Directory of Open Access Journals (Sweden)
C. Blume
2012-07-01
Full Text Available The variability of the north-polar stratospheric vortex is a prominent aspect of the middle atmosphere. This work investigates a wide class of statistical models with respect to their ability to model geopotential and temperature anomalies, representing variability in the polar stratosphere. Four partly nonstationary, nonlinear models are assessed: linear discriminant analysis (LDA; a cluster method based on finite elements (FEM-VARX; a neural network, namely the multi-layer perceptron (MLP; and support vector regression (SVR. These methods model time series by incorporating all significant external factors simultaneously, including ENSO, QBO, the solar cycle, volcanoes, to then quantify their statistical importance. We show that variability in reanalysis data from 1980 to 2005 is successfully modeled. The period from 2005 to 2011 can be hindcasted to a certain extent, where MLP performs significantly better than the remaining models. However, variability remains that cannot be statistically hindcasted within the current framework, such as the unexpected major warming in January 2009. Finally, the statistical model with the best generalization performance is used to predict a winter 2011/12 with warm and weak vortex conditions. A vortex breakdown is predicted for late January, early February 2012.
Learning Predictive Statistics: Strategies and Brain Mechanisms.
Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe
2017-08-30
When immersed in a new environment, we are challenged to decipher initially incomprehensible streams of sensory information. However, quite rapidly, the brain finds structure and meaning in these incoming signals, helping us to predict and prepare ourselves for future actions. This skill relies on extracting the statistics of event streams in the environment that contain regularities of variable complexity from simple repetitive patterns to complex probabilistic combinations. Here, we test the brain mechanisms that mediate our ability to adapt to the environment's statistics and predict upcoming events. By combining behavioral training and multisession fMRI in human participants (male and female), we track the corticostriatal mechanisms that mediate learning of temporal sequences as they change in structure complexity. We show that learning of predictive structures relates to individual decision strategy; that is, selecting the most probable outcome in a given context (maximizing) versus matching the exact sequence statistics. These strategies engage distinct human brain regions: maximizing engages dorsolateral prefrontal, cingulate, sensory-motor regions, and basal ganglia (dorsal caudate, putamen), whereas matching engages occipitotemporal regions (including the hippocampus) and basal ganglia (ventral caudate). Our findings provide evidence for distinct corticostriatal mechanisms that facilitate our ability to extract behaviorally relevant statistics to make predictions. SIGNIFICANCE STATEMENT Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. Past work has studied how humans identify repetitive patterns and associative pairings. However, the natural environment contains regularities that vary in complexity from simple repetition to complex probabilistic combinations. Here, we combine behavior and multisession fMRI to track the brain mechanisms that mediate our ability to adapt to
Müller, M. F.; Thompson, S. E.
2015-09-01
The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drives of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by a strong wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are strongly favored over statistical models.
On Extrapolating Past the Range of Observed Data When Making Statistical Predictions in Ecology.
Directory of Open Access Journals (Sweden)
Paul B Conn
Full Text Available Ecologists are increasingly using statistical models to predict animal abundance and occurrence in unsampled locations. The reliability of such predictions depends on a number of factors, including sample size, how far prediction locations are from the observed data, and similarity of predictive covariates in locations where data are gathered to locations where predictions are desired. In this paper, we propose extending Cook's notion of an independent variable hull (IVH, developed originally for application with linear regression models, to generalized regression models as a way to help assess the potential reliability of predictions in unsampled areas. Predictions occurring inside the generalized independent variable hull (gIVH can be regarded as interpolations, while predictions occurring outside the gIVH can be regarded as extrapolations worthy of additional investigation or skepticism. We conduct a simulation study to demonstrate the usefulness of this metric for limiting the scope of spatial inference when conducting model-based abundance estimation from survey counts. In this case, limiting inference to the gIVH substantially reduces bias, especially when survey designs are spatially imbalanced. We also demonstrate the utility of the gIVH in diagnosing problematic extrapolations when estimating the relative abundance of ribbon seals in the Bering Sea as a function of predictive covariates. We suggest that ecologists routinely use diagnostics such as the gIVH to help gauge the reliability of predictions from statistical models (such as generalized linear, generalized additive, and spatio-temporal regression models.
A Stochastic Fractional Dynamics Model of Rainfall Statistics
Kundu, Prasun; Travis, James
2013-04-01
Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, that allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is designed to faithfully reflect the scale dependence and is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and times scales. The main restriction is the assumption that the statistics of the precipitation field is spatially homogeneous and isotropic and stationary in time. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of the radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment. Some data sets containing periods of non-stationary behavior that involves occasional anomalously correlated rain events, present a challenge for the model.
Statistical Modeling and Prediction for Tourism Economy Using Dendritic Neural Network
Directory of Open Access Journals (Sweden)
Ying Yu
2017-01-01
Full Text Available With the impact of global internationalization, tourism economy has also been a rapid development. The increasing interest aroused by more advanced forecasting methods leads us to innovate forecasting methods. In this paper, the seasonal trend autoregressive integrated moving averages with dendritic neural network model (SA-D model is proposed to perform the tourism demand forecasting. First, we use the seasonal trend autoregressive integrated moving averages model (SARIMA model to exclude the long-term linear trend and then train the residual data by the dendritic neural network model and make a short-term prediction. As the result showed in this paper, the SA-D model can achieve considerably better predictive performances. In order to demonstrate the effectiveness of the SA-D model, we also use the data that other authors used in the other models and compare the results. It also proved that the SA-D model achieved good predictive performances in terms of the normalized mean square error, absolute percentage of error, and correlation coefficient.
Statistical models to predict flows at monthly level in Salvajina
International Nuclear Information System (INIS)
Gonzalez, Harold O
1994-01-01
It thinks about and models of lineal regression evaluate at monthly level that they allow to predict flows in Salvajina, with base in predictions variable, like the difference of pressure between Darwin and Tahiti, precipitation in Piendamo Cauca), temperature in Port Chicama (Peru) and pressure in Tahiti
Predicting radiotherapy outcomes using statistical learning techniques
International Nuclear Information System (INIS)
El Naqa, Issam; Bradley, Jeffrey D; Deasy, Joseph O; Lindsay, Patricia E; Hope, Andrew J
2009-01-01
Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture potential complexity of heterogeneous variable interactions and applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to unseen data before. In this work, several types of linear and nonlinear kernels to generate interaction terms and approximate the treatment-response function are evaluated. Examples of institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used. Furthermore, an independent RTOG dataset was used for 'generalizabilty' validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principle components analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizabilty beyond institutional data in contrast with other models. This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among model
Statistical prediction of biomethane potentials based on the composition of lignocellulosic biomass
DEFF Research Database (Denmark)
Thomsen, Sune Tjalfe; Spliid, Henrik; Østergård, Hanne
2014-01-01
Mixture models are introduced as a new and stronger methodology for statistical prediction of biomethane potentials (BPM) from lignocellulosic biomass compared to the linear regression models previously used. A large dataset from literature combined with our own data were analysed using canonical...
Directory of Open Access Journals (Sweden)
Gambo Anthony VICTOR
2017-06-01
Full Text Available A model to predict the dry sliding wear behaviour of Aluminium-Jute bast ash particulate composites produced by double stir-casting method was developed in terms of weight fraction of jute bast ash (JBA. Experiments were designed on the basis of the Design of Experiments (DOE technique. A 2k factorial, where k is the number of variables, with central composite second-order rotatable design was used to improve the reliability of results and to reduce the size of experimentation without loss of accuracy. The factors considered in this study were sliding velocity, sliding distance, normal load and mass fraction of JBA reinforcement in the matrix. The developed regression model was validated by statistical software MINITAB-R14 and statistical tool such as analysis of variance (ANOVA. It was found that the developed regression model could be effectively used to predict the wear rate at 95% confidence level. The wear rate of cast Al-JBAp composite decreased with an increase in the mass fraction of JBA and increased with an increase of the sliding velocity, sliding distance and normal load acting on the composite specimen.
Uncertainty propagation for statistical impact prediction of space debris
Hoogendoorn, R.; Mooij, E.; Geul, J.
2018-01-01
Predictions of the impact time and location of space debris in a decaying trajectory are highly influenced by uncertainties. The traditional Monte Carlo (MC) method can be used to perform accurate statistical impact predictions, but requires a large computational effort. A method is investigated that directly propagates a Probability Density Function (PDF) in time, which has the potential to obtain more accurate results with less computational effort. The decaying trajectory of Delta-K rocket stages was used to test the methods using a six degrees-of-freedom state model. The PDF of the state of the body was propagated in time to obtain impact-time distributions. This Direct PDF Propagation (DPP) method results in a multi-dimensional scattered dataset of the PDF of the state, which is highly challenging to process. No accurate results could be obtained, because of the structure of the DPP data and the high dimensionality. Therefore, the DPP method is less suitable for practical uncontrolled entry problems and the traditional MC method remains superior. Additionally, the MC method was used with two improved uncertainty models to obtain impact-time distributions, which were validated using observations of true impacts. For one of the two uncertainty models, statistically more valid impact-time distributions were obtained than in previous research.
A Statistical Approach For Modeling Tropical Cyclones. Synthetic Hurricanes Generator Model
Energy Technology Data Exchange (ETDEWEB)
Pasqualini, Donatella [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
2016-05-11
This manuscript brie y describes a statistical ap- proach to generate synthetic tropical cyclone tracks to be used in risk evaluations. The Synthetic Hur- ricane Generator (SynHurG) model allows model- ing hurricane risk in the United States supporting decision makers and implementations of adaptation strategies to extreme weather. In the literature there are mainly two approaches to model hurricane hazard for risk prediction: deterministic-statistical approaches, where the storm key physical parameters are calculated using physi- cal complex climate models and the tracks are usually determined statistically from historical data; and sta- tistical approaches, where both variables and tracks are estimated stochastically using historical records. SynHurG falls in the second category adopting a pure stochastic approach.
Eloqayli, Haytham; Al-Yousef, Ali; Jaradat, Raid
2018-02-15
Despite the high prevalence of chronic neck pain, there is limited consensus about the primary etiology, risk factors, diagnostic criteria and therapeutic outcome. Here, we aimed to determine if Ferritin and Vitamin D are modifiable risk factors with chronic neck pain using slandered statistics and artificial intelligence neural network (ANN). Fifty-four patients with chronic neck pain treated between February 2016 and August 2016 in King Abdullah University Hospital and 54 patients age matched controls undergoing outpatient or minor procedures were enrolled. Patients and control demographic parameters, height, weight and single measurement of serum vitamin D, Vitamin B12, ferritin, calcium, phosphorus, zinc were obtained. An ANN prediction model was developed. The statistical analysis reveals that patients with chronic neck pain have significantly lower serum Vitamin D and Ferritin (p-value artificial neural network can be of future benefit in classification and prediction models for chronic neck pain. We hope this initial work will encourage a future larger cohort study addressing vitamin D and iron correction as modifiable factors and the application of artificial intelligence models in clinical practice.
Predicting energy performance of a net-zero energy building: A statistical approach
International Nuclear Information System (INIS)
Kneifel, Joshua; Webb, David
2016-01-01
Highlights: • A regression model is applied to actual energy data from a net-zero energy building. • The model is validated through a rigorous statistical analysis. • Comparisons are made between model predictions and those of a physics-based model. • The model is a viable baseline for evaluating future models from the energy data. - Abstract: Performance-based building requirements have become more prevalent because it gives freedom in building design while still maintaining or exceeding the energy performance required by prescriptive-based requirements. In order to determine if building designs reach target energy efficiency improvements, it is necessary to estimate the energy performance of a building using predictive models and different weather conditions. Physics-based whole building energy simulation modeling is the most common approach. However, these physics-based models include underlying assumptions and require significant amounts of information in order to specify the input parameter values. An alternative approach to test the performance of a building is to develop a statistically derived predictive regression model using post-occupancy data that can accurately predict energy consumption and production based on a few common weather-based factors, thus requiring less information than simulation models. A regression model based on measured data should be able to predict energy performance of a building for a given day as long as the weather conditions are similar to those during the data collection time frame. This article uses data from the National Institute of Standards and Technology (NIST) Net-Zero Energy Residential Test Facility (NZERTF) to develop and validate a regression model to predict the energy performance of the NZERTF using two weather variables aggregated to the daily level, applies the model to estimate the energy performance of hypothetical NZERTFs located in different cities in the Mixed-Humid Climate Zone, and compares these
Statistical Analysis of CFD Solutions From the Fifth AIAA Drag Prediction Workshop
Morrison, Joseph H.
2013-01-01
A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from North America, Europe, Asia, and South America using a common grid sequence and multiple turbulence models for the June 2012 fifth Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was the Common Research Model subsonic transport wing-body previously used for the 4th Drag Prediction Workshop. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with previous workshops.
Predicting future protection of respirator users: Statistical approaches and practical implications.
Hu, Chengcheng; Harber, Philip; Su, Jing
2016-01-01
The purpose of this article is to describe a statistical approach for predicting a respirator user's fit factor in the future based upon results from initial tests. A statistical prediction model was developed based upon joint distribution of multiple fit factor measurements over time obtained from linear mixed effect models. The model accounts for within-subject correlation as well as short-term (within one day) and longer-term variability. As an example of applying this approach, model parameters were estimated from a research study in which volunteers were trained by three different modalities to use one of two types of respirators. They underwent two quantitative fit tests at the initial session and two on the same day approximately six months later. The fitted models demonstrated correlation and gave the estimated distribution of future fit test results conditional on past results for an individual worker. This approach can be applied to establishing a criterion value for passing an initial fit test to provide reasonable likelihood that a worker will be adequately protected in the future; and to optimizing the repeat fit factor test intervals individually for each user for cost-effective testing.
Modelling bankruptcy prediction models in Slovak companies
Directory of Open Access Journals (Sweden)
Kovacova Maria
2017-01-01
Full Text Available An intensive research from academics and practitioners has been provided regarding models for bankruptcy prediction and credit risk management. In spite of numerous researches focusing on forecasting bankruptcy using traditional statistics techniques (e.g. discriminant analysis and logistic regression and early artificial intelligence models (e.g. artificial neural networks, there is a trend for transition to machine learning models (support vector machines, bagging, boosting, and random forest to predict bankruptcy one year prior to the event. Comparing the performance of this with unconventional approach with results obtained by discriminant analysis, logistic regression, and neural networks application, it has been found that bagging, boosting, and random forest models outperform the others techniques, and that all prediction accuracy in the testing sample improves when the additional variables are included. On the other side the prediction accuracy of old and well known bankruptcy prediction models is quiet high. Therefore, we aim to analyse these in some way old models on the dataset of Slovak companies to validate their prediction ability in specific conditions. Furthermore, these models will be modelled according to new trends by calculating the influence of elimination of selected variables on the overall prediction ability of these models.
Hu, Yijia; Zhong, Zhong; Zhu, Yimin; Ha, Yao
2018-04-01
In this paper, a statistical forecast model using the time-scale decomposition method is established to do the seasonal prediction of the rainfall during flood period (FPR) over the middle and lower reaches of the Yangtze River Valley (MLYRV). This method decomposites the rainfall over the MLYRV into three time-scale components, namely, the interannual component with the period less than 8 years, the interdecadal component with the period from 8 to 30 years, and the interdecadal component with the period larger than 30 years. Then, the predictors are selected for the three time-scale components of FPR through the correlation analysis. At last, a statistical forecast model is established using the multiple linear regression technique to predict the three time-scale components of the FPR, respectively. The results show that this forecast model can capture the interannual and interdecadal variation of FPR. The hindcast of FPR during 14 years from 2001 to 2014 shows that the FPR can be predicted successfully in 11 out of the 14 years. This forecast model performs better than the model using traditional scheme without time-scale decomposition. Therefore, the statistical forecast model using the time-scale decomposition technique has good skills and application value in the operational prediction of FPR over the MLYRV.
Müller, M. F.; Thompson, S. E.
2016-02-01
The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by frequent wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are favored over statistical models.
Statistical modeling of geopressured geothermal reservoirs
Ansari, Esmail; Hughes, Richard; White, Christopher D.
2017-06-01
Identifying attractive candidate reservoirs for producing geothermal energy requires predictive models. In this work, inspectional analysis and statistical modeling are used to create simple predictive models for a line drive design. Inspectional analysis on the partial differential equations governing this design yields a minimum number of fifteen dimensionless groups required to describe the physics of the system. These dimensionless groups are explained and confirmed using models with similar dimensionless groups but different dimensional parameters. This study models dimensionless production temperature and thermal recovery factor as the responses of a numerical model. These responses are obtained by a Box-Behnken experimental design. An uncertainty plot is used to segment the dimensionless time and develop a model for each segment. The important dimensionless numbers for each segment of the dimensionless time are identified using the Boosting method. These selected numbers are used in the regression models. The developed models are reduced to have a minimum number of predictors and interactions. The reduced final models are then presented and assessed using testing runs. Finally, applications of these models are offered. The presented workflow is generic and can be used to translate the output of a numerical simulator into simple predictive models in other research areas involving numerical simulation.
Prediction of Frost Occurrences Using Statistical Modeling Approaches
Directory of Open Access Journals (Sweden)
Hyojin Lee
2016-01-01
Full Text Available We developed the frost prediction models in spring in Korea using logistic regression and decision tree techniques. Hit Rate (HR, Probability of Detection (POD, and False Alarm Rate (FAR from both models were calculated and compared. Threshold values for the logistic regression models were selected to maximize HR and POD and minimize FAR for each station, and the split for the decision tree models was stopped when change in entropy was relatively small. Average HR values were 0.92 and 0.91 for logistic regression and decision tree techniques, respectively, average POD values were 0.78 and 0.80 for logistic regression and decision tree techniques, respectively, and average FAR values were 0.22 and 0.28 for logistic regression and decision tree techniques, respectively. The average numbers of selected explanatory variables were 5.7 and 2.3 for logistic regression and decision tree techniques, respectively. Fewer explanatory variables can be more appropriate for operational activities to provide a timely warning for the prevention of the frost damages to agricultural crops. We concluded that the decision tree model can be more useful for the timely warning system. It is recommended that the models should be improved to reflect local topological features.
Decoding β-decay systematics: A global statistical model for β- half-lives
International Nuclear Information System (INIS)
Costiris, N. J.; Mavrommatis, E.; Gernoth, K. A.; Clark, J. W.
2009-01-01
Statistical modeling of nuclear data provides a novel approach to nuclear systematics complementary to established theoretical and phenomenological approaches based on quantum theory. Continuing previous studies in which global statistical modeling is pursued within the general framework of machine learning theory, we implement advances in training algorithms designed to improve generalization, in application to the problem of reproducing and predicting the half-lives of nuclear ground states that decay 100% by the β - mode. More specifically, fully connected, multilayer feed-forward artificial neural network models are developed using the Levenberg-Marquardt optimization algorithm together with Bayesian regularization and cross-validation. The predictive performance of models emerging from extensive computer experiments is compared with that of traditional microscopic and phenomenological models as well as with the performance of other learning systems, including earlier neural network models as well as the support vector machines recently applied to the same problem. In discussing the results, emphasis is placed on predictions for nuclei that are far from the stability line, and especially those involved in r-process nucleosynthesis. It is found that the new statistical models can match or even surpass the predictive performance of conventional models for β-decay systematics and accordingly should provide a valuable additional tool for exploring the expanding nuclear landscape.
Cameron, Enrico; Pilla, Giorgio; Stella, Fabio A.
2018-01-01
The application of statistical classification methods is investigated—in comparison also to spatial interpolation methods—for predicting the acceptability of well-water quality in a situation where an effective quantitative model of the hydrogeological system under consideration cannot be developed. In the example area in northern Italy, in particular, the aquifer is locally affected by saline water and the concentration of chloride is the main indicator of both saltwater occurrence and groundwater quality. The goal is to predict if the chloride concentration in a water well will exceed the allowable concentration so that the water is unfit for the intended use. A statistical classification algorithm achieved the best predictive performances and the results of the study show that statistical classification methods provide further tools for dealing with groundwater quality problems concerning hydrogeological systems that are too difficult to describe analytically or to simulate effectively.
International Nuclear Information System (INIS)
Nedic, Vladimir; Despotovic, Danijela; Cvetanovic, Slobodan; Despotovic, Milan; Babic, Sasa
2014-01-01
Traffic is the main source of noise in urban environments and significantly affects human mental and physical health and labor productivity. Therefore it is very important to model the noise produced by various vehicles. Techniques for traffic noise prediction are mainly based on regression analysis, which generally is not good enough to describe the trends of noise. In this paper the application of artificial neural networks (ANNs) for the prediction of traffic noise is presented. As input variables of the neural network, the proposed structure of the traffic flow and the average speed of the traffic flow are chosen. The output variable of the network is the equivalent noise level in the given time period L eq . Based on these parameters, the network is modeled, trained and tested through a comparative analysis of the calculated values and measured levels of traffic noise using the originally developed user friendly software package. It is shown that the artificial neural networks can be a useful tool for the prediction of noise with sufficient accuracy. In addition, the measured values were also used to calculate equivalent noise level by means of classical methods, and comparative analysis is given. The results clearly show that ANN approach is superior in traffic noise level prediction to any other statistical method. - Highlights: • We proposed an ANN model for prediction of traffic noise. • We developed originally designed user friendly software package. • The results are compared with classical statistical methods. • The results are much better predictive capabilities of ANN model
Energy Technology Data Exchange (ETDEWEB)
Nedic, Vladimir, E-mail: vnedic@kg.ac.rs [Faculty of Philology and Arts, University of Kragujevac, Jovana Cvijića bb, 34000 Kragujevac (Serbia); Despotovic, Danijela, E-mail: ddespotovic@kg.ac.rs [Faculty of Economics, University of Kragujevac, Djure Pucara Starog 3, 34000 Kragujevac (Serbia); Cvetanovic, Slobodan, E-mail: slobodan.cvetanovic@eknfak.ni.ac.rs [Faculty of Economics, University of Niš, Trg kralja Aleksandra Ujedinitelja, 18000 Niš (Serbia); Despotovic, Milan, E-mail: mdespotovic@kg.ac.rs [Faculty of Engineering, University of Kragujevac, Sestre Janjic 6, 34000 Kragujevac (Serbia); Babic, Sasa, E-mail: babicsf@yahoo.com [College of Applied Mechanical Engineering, Trstenik (Serbia)
2014-11-15
Traffic is the main source of noise in urban environments and significantly affects human mental and physical health and labor productivity. Therefore it is very important to model the noise produced by various vehicles. Techniques for traffic noise prediction are mainly based on regression analysis, which generally is not good enough to describe the trends of noise. In this paper the application of artificial neural networks (ANNs) for the prediction of traffic noise is presented. As input variables of the neural network, the proposed structure of the traffic flow and the average speed of the traffic flow are chosen. The output variable of the network is the equivalent noise level in the given time period L{sub eq}. Based on these parameters, the network is modeled, trained and tested through a comparative analysis of the calculated values and measured levels of traffic noise using the originally developed user friendly software package. It is shown that the artificial neural networks can be a useful tool for the prediction of noise with sufficient accuracy. In addition, the measured values were also used to calculate equivalent noise level by means of classical methods, and comparative analysis is given. The results clearly show that ANN approach is superior in traffic noise level prediction to any other statistical method. - Highlights: • We proposed an ANN model for prediction of traffic noise. • We developed originally designed user friendly software package. • The results are compared with classical statistical methods. • The results are much better predictive capabilities of ANN model.
PREDICTIVE CAPACITY OF ARCH FAMILY MODELS
Directory of Open Access Journals (Sweden)
Raphael Silveira Amaro
2016-03-01
Full Text Available In the last decades, a remarkable number of models, variants from the Autoregressive Conditional Heteroscedastic family, have been developed and empirically tested, making extremely complex the process of choosing a particular model. This research aim to compare the predictive capacity, using the Model Confidence Set procedure, than five conditional heteroskedasticity models, considering eight different statistical probability distributions. The financial series which were used refers to the log-return series of the Bovespa index and the Dow Jones Industrial Index in the period between 27 October 2008 and 30 December 2014. The empirical evidences showed that, in general, competing models have a great homogeneity to make predictions, either for a stock market of a developed country or for a stock market of a developing country. An equivalent result can be inferred for the statistical probability distributions that were used.
Spiliopoulou, Athina; Nagy, Reka; Bermingham, Mairead L.; Huffman, Jennifer E.; Hayward, Caroline; Vitart, Veronique; Rudan, Igor; Campbell, Harry; Wright, Alan F.; Wilson, James F.; Pong-Wong, Ricardo; Agakov, Felix; Navarro, Pau; Haley, Chris S.
2015-01-01
We explore the prediction of individuals' phenotypes for complex traits using genomic data. We compare several widely used prediction models, including Ridge Regression, LASSO and Elastic Nets estimated from cohort data, and polygenic risk scores constructed using published summary statistics from genome-wide association meta-analyses (GWAMA). We evaluate the interplay between relatedness, trait architecture and optimal marker density, by predicting height, body mass index (BMI) and high-density lipoprotein level (HDL) in two data cohorts, originating from Croatia and Scotland. We empirically demonstrate that dense models are better when all genetic effects are small (height and BMI) and target individuals are related to the training samples, while sparse models predict better in unrelated individuals and when some effects have moderate size (HDL). For HDL sparse models achieved good across-cohort prediction, performing similarly to the GWAMA risk score and to models trained within the same cohort, which indicates that, for predicting traits with moderately sized effects, large sample sizes and familial structure become less important, though still potentially useful. Finally, we propose a novel ensemble of whole-genome predictors with GWAMA risk scores and demonstrate that the resulting meta-model achieves higher prediction accuracy than either model on its own. We conclude that although current genomic predictors are not accurate enough for diagnostic purposes, performance can be improved without requiring access to large-scale individual-level data. Our methodologically simple meta-model is a means of performing predictive meta-analysis for optimizing genomic predictions and can be easily extended to incorporate multiple population-level summary statistics or other domain knowledge. PMID:25918167
Can spatial statistical river temperature models be transferred between catchments?
Jackson, Faye L.; Fryer, Robert J.; Hannah, David M.; Malcolm, Iain A.
2017-09-01
There has been increasing use of spatial statistical models to understand and predict river temperature (Tw) from landscape covariates. However, it is not financially or logistically feasible to monitor all rivers and the transferability of such models has not been explored. This paper uses Tw data from four river catchments collected in August 2015 to assess how well spatial regression models predict the maximum 7-day rolling mean of daily maximum Tw (Twmax) within and between catchments. Models were fitted for each catchment separately using (1) landscape covariates only (LS models) and (2) landscape covariates and an air temperature (Ta) metric (LS_Ta models). All the LS models included upstream catchment area and three included a river network smoother (RNS) that accounted for unexplained spatial structure. The LS models transferred reasonably to other catchments, at least when predicting relative levels of Twmax. However, the predictions were biased when mean Twmax differed between catchments. The RNS was needed to characterise and predict finer-scale spatially correlated variation. Because the RNS was unique to each catchment and thus non-transferable, predictions were better within catchments than between catchments. A single model fitted to all catchments found no interactions between the landscape covariates and catchment, suggesting that the landscape relationships were transferable. The LS_Ta models transferred less well, with particularly poor performance when the relationship with the Ta metric was physically implausible or required extrapolation outside the range of the data. A single model fitted to all catchments found catchment-specific relationships between Twmax and the Ta metric, indicating that the Ta metric was not transferable. These findings improve our understanding of the transferability of spatial statistical river temperature models and provide a foundation for developing new approaches for predicting Tw at unmonitored locations across
Snell, Kym Ie; Ensor, Joie; Debray, Thomas Pa; Moons, Karel Gm; Riley, Richard D
2017-01-01
If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
Learning predictive statistics from temporal sequences: Dynamics and strategies.
Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe
2017-10-01
Human behavior is guided by our expectations about the future. Often, we make predictions by monitoring how event sequences unfold, even though such sequences may appear incomprehensible. Event structures in the natural environment typically vary in complexity, from simple repetition to complex probabilistic combinations. How do we learn these structures? Here we investigate the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants. Participants were asked to predict the upcoming item following a probabilistic sequence of symbols. Using a Markov process, we created a family of sequences, from simple frequency statistics (e.g., some symbols are more probable than others) to context-based statistics (e.g., symbol probability is contingent on preceding symbols). We demonstrate the dynamics with which individuals adapt to changes in the environment's statistics-that is, they extract the behaviorally relevant structures to make predictions about upcoming events. Further, we show that this structure learning relates to individual decision strategy; faster learning of complex structures relates to selection of the most probable outcome in a given context (maximizing) rather than matching of the exact sequence statistics. Our findings provide evidence for alternate routes to learning of behaviorally relevant statistics that facilitate our ability to predict future events in variable environments.
Statistical approach to predict compressive strength of high workability slag-cement mortars
International Nuclear Information System (INIS)
Memon, N.A.; Memon, N.A.; Sumadi, S.R.
2009-01-01
This paper reports an attempt made to develop empirical expressions to estimate/ predict the compressive strength of high workability slag-cement mortars. Experimental data of 54 mix mortars were used. The mortars were prepared with slag as cement replacement of the order of 0, 50 and 60%. The flow (workability) was maintained at 136+-3%. The numerical and statistical analysis was performed by using database computer software Microsoft Office Excel 2003. Three empirical mathematical models were developed to estimate/predict 28 days compressive strength of high workability slag cement-mortars with 0, 50 and 60% slag which predict the values accurate between 97 and 98%. Finally a generalized empirical mathematical model was proposed which can predict 28 days compressive strength of high workability mortars up to degree of accuracy 95%. (author)
An improved mixing model providing joint statistics of scalar and scalar dissipation
Energy Technology Data Exchange (ETDEWEB)
Meyer, Daniel W. [Department of Energy Resources Engineering, Stanford University, Stanford, CA (United States); Jenny, Patrick [Institute of Fluid Dynamics, ETH Zurich (Switzerland)
2008-11-15
For the calculation of nonpremixed turbulent flames with thin reaction zones the joint probability density function (PDF) of the mixture fraction and its dissipation rate plays an important role. The corresponding PDF transport equation involves a mixing model for the closure of the molecular mixing term. Here, the parameterized scalar profile (PSP) mixing model is extended to provide the required joint statistics. Model predictions are validated using direct numerical simulation (DNS) data of a passive scalar mixing in a statistically homogeneous turbulent flow. Comparisons between the DNS and the model predictions are provided, which involve different initial scalar-field lengthscales. (author)
Bustamante, Javier; Seoane, Javier
2004-01-01
Aim To test the effectiveness of statistical models based on explanatory environmental variables vs. existing distribution information (maps and breeding atlas), for predicting the distribution of four species of raptors (family Accipitridae): common buzzard Buteo buteo (Linnaeus, 1758), short-toed eagle Circaetus gallicus (Gmelin, 1788), booted eagle Hieraaetus pennatus (Gmelin, 1788) and black kite Milvus migrans (Boddaert, 1783). Location Andalusia, southe...
A statistical model for aggregating judgments by incorporating peer predictions
McCoy, John; Prelec, Drazen
2017-01-01
We propose a probabilistic model to aggregate the answers of respondents answering multiple-choice questions. The model does not assume that everyone has access to the same information, and so does not assume that the consensus answer is correct. Instead, it infers the most probable world state, even if only a minority vote for it. Each respondent is modeled as receiving a signal contingent on the actual world state, and as using this signal to both determine their own answer and predict the ...
Statistical Models for Predicting Threat Detection From Human Behavior
Kelley, Timothy; Amon, Mary J.; Bertenthal, Bennett I.
2018-01-01
Users must regularly distinguish between secure and insecure cyber platforms in order to preserve their privacy and safety. Mouse tracking is an accessible, high-resolution measure that can be leveraged to understand the dynamics of perception, categorization, and decision-making in threat detection. Researchers have begun to utilize measures like mouse tracking in cyber security research, including in the study of risky online behavior. However, it remains an empirical question to what extent real-time information about user behavior is predictive of user outcomes and demonstrates added value compared to traditional self-report questionnaires. Participants navigated through six simulated websites, which resembled either secure “non-spoof” or insecure “spoof” versions of popular websites. Websites also varied in terms of authentication level (i.e., extended validation, standard validation, or partial encryption). Spoof websites had modified Uniform Resource Locator (URL) and authentication level. Participants chose to “login” to or “back” out of each website based on perceived website security. Mouse tracking information was recorded throughout the task, along with task performance. After completing the website identification task, participants completed a questionnaire assessing their security knowledge and degree of familiarity with the websites simulated during the experiment. Despite being primed to the possibility of website phishing attacks, participants generally showed a bias for logging in to websites versus backing out of potentially dangerous sites. Along these lines, participant ability to identify spoof websites was around the level of chance. Hierarchical Bayesian logistic models were used to compare the accuracy of two-factor (i.e., website security and encryption level), survey-based (i.e., security knowledge and website familiarity), and real-time measures (i.e., mouse tracking) in predicting risky online behavior during phishing
Statistical Models for Predicting Threat Detection From Human Behavior
Directory of Open Access Journals (Sweden)
Timothy Kelley
2018-04-01
Full Text Available Users must regularly distinguish between secure and insecure cyber platforms in order to preserve their privacy and safety. Mouse tracking is an accessible, high-resolution measure that can be leveraged to understand the dynamics of perception, categorization, and decision-making in threat detection. Researchers have begun to utilize measures like mouse tracking in cyber security research, including in the study of risky online behavior. However, it remains an empirical question to what extent real-time information about user behavior is predictive of user outcomes and demonstrates added value compared to traditional self-report questionnaires. Participants navigated through six simulated websites, which resembled either secure “non-spoof” or insecure “spoof” versions of popular websites. Websites also varied in terms of authentication level (i.e., extended validation, standard validation, or partial encryption. Spoof websites had modified Uniform Resource Locator (URL and authentication level. Participants chose to “login” to or “back” out of each website based on perceived website security. Mouse tracking information was recorded throughout the task, along with task performance. After completing the website identification task, participants completed a questionnaire assessing their security knowledge and degree of familiarity with the websites simulated during the experiment. Despite being primed to the possibility of website phishing attacks, participants generally showed a bias for logging in to websites versus backing out of potentially dangerous sites. Along these lines, participant ability to identify spoof websites was around the level of chance. Hierarchical Bayesian logistic models were used to compare the accuracy of two-factor (i.e., website security and encryption level, survey-based (i.e., security knowledge and website familiarity, and real-time measures (i.e., mouse tracking in predicting risky online behavior
Statistical Models for Predicting Threat Detection From Human Behavior.
Kelley, Timothy; Amon, Mary J; Bertenthal, Bennett I
2018-01-01
Users must regularly distinguish between secure and insecure cyber platforms in order to preserve their privacy and safety. Mouse tracking is an accessible, high-resolution measure that can be leveraged to understand the dynamics of perception, categorization, and decision-making in threat detection. Researchers have begun to utilize measures like mouse tracking in cyber security research, including in the study of risky online behavior. However, it remains an empirical question to what extent real-time information about user behavior is predictive of user outcomes and demonstrates added value compared to traditional self-report questionnaires. Participants navigated through six simulated websites, which resembled either secure "non-spoof" or insecure "spoof" versions of popular websites. Websites also varied in terms of authentication level (i.e., extended validation, standard validation, or partial encryption). Spoof websites had modified Uniform Resource Locator (URL) and authentication level. Participants chose to "login" to or "back" out of each website based on perceived website security. Mouse tracking information was recorded throughout the task, along with task performance. After completing the website identification task, participants completed a questionnaire assessing their security knowledge and degree of familiarity with the websites simulated during the experiment. Despite being primed to the possibility of website phishing attacks, participants generally showed a bias for logging in to websites versus backing out of potentially dangerous sites. Along these lines, participant ability to identify spoof websites was around the level of chance. Hierarchical Bayesian logistic models were used to compare the accuracy of two-factor (i.e., website security and encryption level), survey-based (i.e., security knowledge and website familiarity), and real-time measures (i.e., mouse tracking) in predicting risky online behavior during phishing attacks
Predictive statistical modelling of cadmium content in durum wheat grain based on soil parameters.
Viala, Yoann; Laurette, Julien; Denaix, Laurence; Gourdain, Emmanuelle; Méléard, Benoit; Nguyen, Christophe; Schneider, André; Sappin-Didier, Valérie
2017-09-01
Regulatory limits on cadmium (Cd) content in food products are tending to become stricter, especially in cereals, which are a major contributor to dietary intake of Cd by humans. This is of particular importance for durum wheat, which accumulates more Cd than bread wheat. The contamination of durum wheat grain by Cd depends not only on the genotype but also to a large extent on soil Cd availability. Assessing the phytoavailability of Cd for durum wheat is thus crucial, and appropriate methods are required. For this purpose, we propose a statistical model to predict Cd accumulation in durum wheat grain based on soil geochemical properties related to Cd availability in French agricultural soils with low Cd contents and neutral to alkaline pH (soils commonly used to grow durum wheat). The best model is based on the concentration of total Cd in the soil solution, the pH of a soil CaCl 2 extract, the cation exchange capacity (CEC), and the content of manganese oxides (Tamm's extraction) in the soil. The model variables suggest a major influence of cadmium buffering power of the soil and of Cd speciation in solution. The model successfully explains 88% of Cd variability in grains with, generally, below 0.02 mg Cd kg -1 prediction error in wheat grain. Monte Carlo cross-validation indicated that model accuracy will suffice for the European Community project to reduce the regulatory limit from 0.2 to 0.15 mg Cd kg -1 grain, but not for the intermediate step at 0.175 mg Cd kg -1 . The model will help farmers assess the risk that the Cd content of their durum wheat grain will exceed regulatory limits, and help food safety authorities test different regulatory thresholds to find a trade-off between food safety and the negative impact a too strict regulation could have on farmers.
Multimesonic decays of charmonium states in the statistical quark model
International Nuclear Information System (INIS)
Montvay, I.; Toth, J.D.
1978-01-01
The data known at present of multimesonic decays of chi and psi states are fitted in a statistical quark model, in which the matrix elements are assumed to be constant and resonances as well as both strong and second order electromagnetic processes are taken into account. The experimental data are well reproduced by the model. Unknown branching ratios for the rest of multimesonic channels are predicted. The fit leaves about 40% for baryonic and radiative channels in the case of J/psi(3095). The fitted parameters of the J/psi decays are used to predict the mesonic decays of the pseudoscalar eta c. The statistical quark model seems to allow the calculation of competitive multiparticle processes for the studied decays. (D.P.)
Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.
2017-12-01
Hyperparameterization, of statistical models, i.e. automated model scoring and selection, such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There Elm is using the NSGA-2 multiobjective optimization algorithm for optimizing statistical preprocessing of forcing data to improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used for automate selection of soil moisture forecast statistical models for North America.
Mann, Michael E.; Steinman, Byron A.; Miller, Sonya K.; Frankcombe, Leela M.; England, Matthew H.; Cheung, Anson H.
2016-04-01
The temporary slowdown in large-scale surface warming during the early 2000s has been attributed to both external and internal sources of climate variability. Using semiempirical estimates of the internal low-frequency variability component in Northern Hemisphere, Atlantic, and Pacific surface temperatures in concert with statistical hindcast experiments, we investigate whether the slowdown and its recent recovery were predictable. We conclude that the internal variability of the North Pacific, which played a critical role in the slowdown, does not appear to have been predictable using statistical forecast methods. An additional minor contribution from the North Atlantic, by contrast, appears to exhibit some predictability. While our analyses focus on combining semiempirical estimates of internal climatic variability with statistical hindcast experiments, possible implications for initialized model predictions are also discussed.
Statistical Basis for Predicting Technological Progress
Nagy, Béla; Farmer, J. Doyne; Bui, Quan M.; Trancik, Jessika E.
2013-01-01
Forecasting technological progress is of great interest to engineers, policy makers, and private investors. Several models have been proposed for predicting technological improvement, but how well do these models perform? An early hypothesis made by Theodore Wright in 1936 is that cost decreases as a power law of cumulative production. An alternative hypothesis is Moore's law, which can be generalized to say that technologies improve exponentially with time. Other alternatives were proposed by Goddard, Sinclair et al., and Nordhaus. These hypotheses have not previously been rigorously tested. Using a new database on the cost and production of 62 different technologies, which is the most expansive of its kind, we test the ability of six different postulated laws to predict future costs. Our approach involves hindcasting and developing a statistical model to rank the performance of the postulated laws. Wright's law produces the best forecasts, but Moore's law is not far behind. We discover a previously unobserved regularity that production tends to increase exponentially. A combination of an exponential decrease in cost and an exponential increase in production would make Moore's law and Wright's law indistinguishable, as originally pointed out by Sahal. We show for the first time that these regularities are observed in data to such a degree that the performance of these two laws is nearly the same. Our results show that technological progress is forecastable, with the square root of the logarithmic error growing linearly with the forecasting horizon at a typical rate of 2.5% per year. These results have implications for theories of technological change, and assessments of candidate technologies and policies for climate change mitigation. PMID:23468837
Statistical basis for predicting technological progress.
Directory of Open Access Journals (Sweden)
Béla Nagy
Full Text Available Forecasting technological progress is of great interest to engineers, policy makers, and private investors. Several models have been proposed for predicting technological improvement, but how well do these models perform? An early hypothesis made by Theodore Wright in 1936 is that cost decreases as a power law of cumulative production. An alternative hypothesis is Moore's law, which can be generalized to say that technologies improve exponentially with time. Other alternatives were proposed by Goddard, Sinclair et al., and Nordhaus. These hypotheses have not previously been rigorously tested. Using a new database on the cost and production of 62 different technologies, which is the most expansive of its kind, we test the ability of six different postulated laws to predict future costs. Our approach involves hindcasting and developing a statistical model to rank the performance of the postulated laws. Wright's law produces the best forecasts, but Moore's law is not far behind. We discover a previously unobserved regularity that production tends to increase exponentially. A combination of an exponential decrease in cost and an exponential increase in production would make Moore's law and Wright's law indistinguishable, as originally pointed out by Sahal. We show for the first time that these regularities are observed in data to such a degree that the performance of these two laws is nearly the same. Our results show that technological progress is forecastable, with the square root of the logarithmic error growing linearly with the forecasting horizon at a typical rate of 2.5% per year. These results have implications for theories of technological change, and assessments of candidate technologies and policies for climate change mitigation.
Mixed deterministic statistical modelling of regional ozone air pollution
Kalenderski, Stoitchko
2011-03-17
We develop a physically motivated statistical model for regional ozone air pollution by separating the ground-level pollutant concentration field into three components, namely: transport, local production and large-scale mean trend mostly dominated by emission rates. The model is novel in the field of environmental spatial statistics in that it is a combined deterministic-statistical model, which gives a new perspective to the modelling of air pollution. The model is presented in a Bayesian hierarchical formalism, and explicitly accounts for advection of pollutants, using the advection equation. We apply the model to a specific case of regional ozone pollution-the Lower Fraser valley of British Columbia, Canada. As a predictive tool, we demonstrate that the model vastly outperforms existing, simpler modelling approaches. Our study highlights the importance of simultaneously considering different aspects of an air pollution problem as well as taking into account the physical bases that govern the processes of interest. © 2011 John Wiley & Sons, Ltd..
Adaptive Maneuvering Frequency Method of Current Statistical Model
Institute of Scientific and Technical Information of China (English)
Wei Sun; Yongjian Yang
2017-01-01
Current statistical model(CSM) has a good performance in maneuvering target tracking. However, the fixed maneuvering frequency will deteriorate the tracking results, such as a serious dynamic delay, a slowly converging speedy and a limited precision when using Kalman filter(KF) algorithm. In this study, a new current statistical model and a new Kalman filter are proposed to improve the performance of maneuvering target tracking. The new model which employs innovation dominated subjection function to adaptively adjust maneuvering frequency has a better performance in step maneuvering target tracking, while a fluctuant phenomenon appears. As far as this problem is concerned, a new adaptive fading Kalman filter is proposed as well. In the new Kalman filter, the prediction values are amended in time by setting judgment and amendment rules,so that tracking precision and fluctuant phenomenon of the new current statistical model are improved. The results of simulation indicate the effectiveness of the new algorithm and the practical guiding significance.
van Klaveren, David; Steyerberg, Ewout W; Serruys, Patrick W; Kent, David M
2018-02-01
Clinical prediction models that support treatment decisions are usually evaluated for their ability to predict the risk of an outcome rather than treatment benefit-the difference between outcome risk with vs. without therapy. We aimed to define performance metrics for a model's ability to predict treatment benefit. We analyzed data of the Synergy between Percutaneous Coronary Intervention with Taxus and Cardiac Surgery (SYNTAX) trial and of three recombinant tissue plasminogen activator trials. We assessed alternative prediction models with a conventional risk concordance-statistic (c-statistic) and a novel c-statistic for benefit. We defined observed treatment benefit by the outcomes in pairs of patients matched on predicted benefit but discordant for treatment assignment. The 'c-for-benefit' represents the probability that from two randomly chosen matched patient pairs with unequal observed benefit, the pair with greater observed benefit also has a higher predicted benefit. Compared to a model without treatment interactions, the SYNTAX score II had improved ability to discriminate treatment benefit (c-for-benefit 0.590 vs. 0.552), despite having similar risk discrimination (c-statistic 0.725 vs. 0.719). However, for the simplified stroke-thrombolytic predictive instrument (TPI) vs. the original stroke-TPI, the c-for-benefit (0.584 vs. 0.578) was similar. The proposed methodology has the potential to measure a model's ability to predict treatment benefit not captured with conventional performance metrics. Copyright © 2017 Elsevier Inc. All rights reserved.
Modeling pitting growth data and predicting degradation trend
International Nuclear Information System (INIS)
Viglasky, T.; Awad, R.; Zeng, Z.; Riznic, J.
2007-01-01
A non-statistical modeling approach to predict material degradation is presented in this paper. In this approach, the original data series is processed using Accumulated Generating Operation (AGO). With the aid of the AGO which weakens the random fluctuation embedded in the data series, an approximately exponential curve is established. The generated data series described by the exponential curve is then modeled by a differential equation. The coefficients of the differential equation can be deduced by approximate difference formula based on least-squares algorithm. By solving the differential equation and processing an inverse AGO, a predictive model can be obtained. As this approach is not established on the basis of statistics, the prediction can be performed with a limited amount of data. Implementation of this approach is demonstrated by predicting the pitting growth rate in specimens and wear trend in steam generator tubes. The analysis results indicate that this approach provides a powerful tool with reasonable precision to predict material degradation. (author)
Manning, Robert M.
1987-01-01
A dynamic rain attenuation prediction model is developed for use in obtaining the temporal characteristics, on time scales of minutes or hours, of satellite communication link availability. Analagous to the associated static rain attenuation model, which yields yearly attenuation predictions, this dynamic model is applicable at any location in the world that is characterized by the static rain attenuation statistics peculiar to the geometry of the satellite link and the rain statistics of the location. Such statistics are calculated by employing the formalism of Part I of this report. In fact, the dynamic model presented here is an extension of the static model and reduces to the static model in the appropriate limit. By assuming that rain attenuation is dynamically described by a first-order stochastic differential equation in time and that this random attenuation process is a Markov process, an expression for the associated transition probability is obtained by solving the related forward Kolmogorov equation. This transition probability is then used to obtain such temporal rain attenuation statistics as attenuation durations and allowable attenuation margins versus control system delay.
Nateghi, Roshanak; Guikema, Seth D; Quiring, Steven M
2011-12-01
This article compares statistical methods for modeling power outage durations during hurricanes and examines the predictive accuracy of these methods. Being able to make accurate predictions of power outage durations is valuable because the information can be used by utility companies to plan their restoration efforts more efficiently. This information can also help inform customers and public agencies of the expected outage times, enabling better collective response planning, and coordination of restoration efforts for other critical infrastructures that depend on electricity. In the long run, outage duration estimates for future storm scenarios may help utilities and public agencies better allocate risk management resources to balance the disruption from hurricanes with the cost of hardening power systems. We compare the out-of-sample predictive accuracy of five distinct statistical models for estimating power outage duration times caused by Hurricane Ivan in 2004. The methods compared include both regression models (accelerated failure time (AFT) and Cox proportional hazard models (Cox PH)) and data mining techniques (regression trees, Bayesian additive regression trees (BART), and multivariate additive regression splines). We then validate our models against two other hurricanes. Our results indicate that BART yields the best prediction accuracy and that it is possible to predict outage durations with reasonable accuracy. © 2011 Society for Risk Analysis.
Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.
2017-01-01
Climate envelope models are widely used to describe potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method and there was low overlap in the variable sets (0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable
Statistical shape and appearance models of bones.
Sarkalkan, Nazli; Weinans, Harrie; Zadpoor, Amir A
2014-03-01
When applied to bones, statistical shape models (SSM) and statistical appearance models (SAM) respectively describe the mean shape and mean density distribution of bones within a certain population as well as the main modes of variations of shape and density distribution from their mean values. The availability of this quantitative information regarding the detailed anatomy of bones provides new opportunities for diagnosis, evaluation, and treatment of skeletal diseases. The potential of SSM and SAM has been recently recognized within the bone research community. For example, these models have been applied for studying the effects of bone shape on the etiology of osteoarthritis, improving the accuracy of clinical osteoporotic fracture prediction techniques, design of orthopedic implants, and surgery planning. This paper reviews the main concepts, methods, and applications of SSM and SAM as applied to bone. Copyright © 2013 Elsevier Inc. All rights reserved.
Predicting the lung compliance of mechanically ventilated patients via statistical modeling
International Nuclear Information System (INIS)
Ganzert, Steven; Kramer, Stefan; Guttmann, Josef
2012-01-01
To avoid ventilator associated lung injury (VALI) during mechanical ventilation, the ventilator is adjusted with reference to the volume distensibility or ‘compliance’ of the lung. For lung-protective ventilation, the lung should be inflated at its maximum compliance, i.e. when during inspiration a maximal intrapulmonary volume change is achieved by a minimal change of pressure. To accomplish this, one of the main parameters is the adjusted positive end-expiratory pressure (PEEP). As changing the ventilator settings usually produces an effect on patient's lung mechanics with a considerable time delay, the prediction of the compliance change associated with a planned change of PEEP could assist the physician at the bedside. This study introduces a machine learning approach to predict the nonlinear lung compliance for the individual patient by Gaussian processes, a probabilistic modeling technique. Experiments are based on time series data obtained from patients suffering from acute respiratory distress syndrome (ARDS). With a high hit ratio of up to 93%, the learned models could predict whether an increase/decrease of PEEP would lead to an increase/decrease of the compliance. However, the prediction of the complete pressure–volume relation for an individual patient has to be improved. We conclude that the approach is well suitable for the given problem domain but that an individualized feature selection should be applied for a precise prediction of individual pressure–volume curves. (paper)
Modelling diversity in building occupant behaviour: a novel statistical approach
DEFF Research Database (Denmark)
Haldi, Frédéric; Calì, Davide; Andersen, Rune Korsholm
2016-01-01
We propose an advanced modelling framework to predict the scope and effects of behavioural diversity regarding building occupant actions on window openings, shading devices and lighting. We develop a statistical approach based on generalised linear mixed models to account for the longitudinal nat...
Dėdelė, Audrius; Miškinytė, Auksė
2015-09-01
In many countries, road traffic is one of the main sources of air pollution associated with adverse effects on human health and environment. Nitrogen dioxide (NO2) is considered to be a measure of traffic-related air pollution, with concentrations tending to be higher near highways, along busy roads, and in the city centers, and the exceedances are mainly observed at measurement stations located close to traffic. In order to assess the air quality in the city and the air pollution impact on public health, air quality models are used. However, firstly, before the model can be used for these purposes, it is important to evaluate the accuracy of the dispersion modelling as one of the most widely used method. The monitoring and dispersion modelling are two components of air quality monitoring system (AQMS), in which statistical comparison was made in this research. The evaluation of the Atmospheric Dispersion Modelling System (ADMS-Urban) was made by comparing monthly modelled NO2 concentrations with the data of continuous air quality monitoring stations in Kaunas city. The statistical measures of model performance were calculated for annual and monthly concentrations of NO2 for each monitoring station site. The spatial analysis was made using geographic information systems (GIS). The calculation of statistical parameters indicated a good ADMS-Urban model performance for the prediction of NO2. The results of this study showed that the agreement of modelled values and observations was better for traffic monitoring stations compared to the background and residential stations.
A statistical approach to the prediction of pressure tube fracture toughness
International Nuclear Information System (INIS)
Pandey, M.D.; Radford, D.D.
2008-01-01
The fracture toughness of the zirconium alloy (Zr-2.5Nb) is an important parameter in determining the flaw tolerance for operation of pressure tubes in a nuclear reactor. Fracture toughness data have been generated by performing rising pressure burst tests on sections of pressure tubes removed from operating reactors. The test data were used to generate a lower-bound fracture toughness curve, which is used in defining the operational limits of pressure tubes. The paper presents a comprehensive statistical analysis of burst test data and develops a multivariate statistical model to relate toughness with material chemistry, mechanical properties, and operational history. The proposed model can be useful in predicting fracture toughness of specific in-service pressure tubes, thereby minimizing conservatism associated with a generic lower-bound approach
Statistical Modeling of Energy Production by Photovoltaic Farms
Czech Academy of Sciences Publication Activity Database
Brabec, Marek; Pelikán, Emil; Krč, Pavel; Eben, Kryštof; Musílek, P.
2011-01-01
Roč. 5, č. 9 (2011), s. 785-793 ISSN 1934-8975 Grant - others:GA AV ČR(CZ) M100300904 Institutional research plan: CEZ:AV0Z10300504 Keywords : electrical energy * solar energy * numerical weather prediction model * nonparametric regression * beta regression Subject RIV: BB - Applied Statistics, Operational Research
Models for predicting compressive strength and water absorption of ...
African Journals Online (AJOL)
This work presents a mathematical model for predicting the compressive strength and water absorption of laterite-quarry dust cement block using augmented Scheffe's simplex lattice design. The statistical models developed can predict the mix proportion that will yield the desired property. The models were tested for lack of ...
Applied systems ecology: models, data, and statistical methods
Energy Technology Data Exchange (ETDEWEB)
Eberhardt, L L
1976-01-01
In this report, systems ecology is largely equated to mathematical or computer simulation modelling. The need for models in ecology stems from the necessity to have an integrative device for the diversity of ecological data, much of which is observational, rather than experimental, as well as from the present lack of a theoretical structure for ecology. Different objectives in applied studies require specialized methods. The best predictive devices may be regression equations, often non-linear in form, extracted from much more detailed models. A variety of statistical aspects of modelling, including sampling, are discussed. Several aspects of population dynamics and food-chain kinetics are described, and it is suggested that the two presently separated approaches should be combined into a single theoretical framework. It is concluded that future efforts in systems ecology should emphasize actual data and statistical methods, as well as modelling.
Fatigue crack initiation and growth life prediction with statistical consideration
International Nuclear Information System (INIS)
Kwon, J.D.; Choi, S.H.; Kwak, S.G.; Chun, K.O.
1991-01-01
Life prediction or residual life prediction of structures or machines is one of the most strongly world wide needed problems as requirement in the stage of slowly developing economy which comes after rapidly and highly developing stage. For the purpose of statistical life prediction, fatigue test was conducted under the 3 stress levels, and for each stress level, 20 specimens are used. The statistical properties of the crack growth parameter m and C in the fatigue crack growth law of da/dN = C(ΔK) m , and the relationship between m and C, and the statistical distribution pattern of fatigue crack initiation, growth and fracture lives can be obtained by experimental results
Deep Flare Net (DeFN) Model for Solar Flare Prediction
Nishizuka, N.; Sugiura, K.; Kubo, Y.; Den, M.; Ishii, M.
2018-05-01
We developed a solar flare prediction model using a deep neural network (DNN) named Deep Flare Net (DeFN). This model can calculate the probability of flares occurring in the following 24 hr in each active region, which is used to determine the most likely maximum classes of flares via a binary classification (e.g., ≥M class versus statistically predict flares, the DeFN model was trained to optimize the skill score, i.e., the true skill statistic (TSS). As a result, we succeeded in predicting flares with TSS = 0.80 for ≥M-class flares and TSS = 0.63 for ≥C-class flares. Note that in usual DNN models, the prediction process is a black box. However, in the DeFN model, the features are manually selected, and it is possible to analyze which features are effective for prediction after evaluation.
DEFF Research Database (Denmark)
Rosthøj, Susanne; Keiding, Niels
2004-01-01
When studying a regression model measures of explained variation are used to assess the degree to which the covariates determine the outcome of interest. Measures of predictive accuracy are used to assess the accuracy of the predictions based on the covariates and the regression model. We give a ...... a detailed and general introduction to the two measures and the estimation procedures. The framework we set up allows for a study of the effect of misspecification on the quantities estimated. We also introduce a generalization to survival analysis....
Non-Gaussianity and statistical anisotropy from vector field populated inflationary models
Dimastrogiovanni, Emanuela; Matarrese, Sabino; Riotto, Antonio
2010-01-01
We present a review of vector field models of inflation and, in particular, of the statistical anisotropy and non-Gaussianity predictions of models with SU(2) vector multiplets. Non-Abelian gauge groups introduce a richer amount of predictions compared to the Abelian ones, mostly because of the presence of vector fields self-interactions. Primordial vector fields can violate isotropy leaving their imprint in the comoving curvature fluctuations zeta at late times. We provide the analytic expressions of the correlation functions of zeta up to fourth order and an analysis of their amplitudes and shapes. The statistical anisotropy signatures expected in these models are important and, potentially, the anisotropic contributions to the bispectrum and the trispectrum can overcome the isotropic parts.
International Nuclear Information System (INIS)
Pham, Binh T.; Hawkes, Grant L.; Einerson, Jeffrey J.
2014-01-01
As part of the High Temperature Reactors (HTR) R and D program, a series of irradiation tests, designated as Advanced Gas-cooled Reactor (AGR), have been defined to support development and qualification of fuel design, fabrication process, and fuel performance under normal operation and accident conditions. The AGR tests employ fuel compacts placed in a graphite cylinder shrouded by a steel capsule and instrumented with thermocouples (TC) embedded in graphite blocks enabling temperature control. While not possible to obtain by direct measurements in the tests, crucial fuel conditions (e.g., temperature, neutron fast fluence, and burnup) are calculated using core physics and thermal modeling codes. This paper is focused on AGR test fuel temperature predicted by the ABAQUS code's finite element-based thermal models. The work follows up on a previous study, in which several statistical analysis methods were adapted, implemented in the NGNP Data Management and Analysis System (NDMAS), and applied for qualification of AGR-1 thermocouple data. Abnormal trends in measured data revealed by the statistical analysis are traced to either measuring instrument deterioration or physical mechanisms in capsules that may have shifted the system thermal response. The main thrust of this work is to exploit the variety of data obtained in irradiation and post-irradiation examination (PIE) for assessment of modeling assumptions. As an example, the uneven reduction of the control gas gap in Capsule 5 found in the capsule metrology measurements in PIE helps identify mechanisms other than TC drift causing the decrease in TC readings. This suggests a more physics-based modification of the thermal model that leads to a better fit with experimental data, thus reducing model uncertainty and increasing confidence in the calculated fuel temperatures of the AGR-1 test
Energy Technology Data Exchange (ETDEWEB)
Pham, Binh T., E-mail: Binh.Pham@inl.gov [Human Factor, Controls and Statistics Department, Nuclear Science and Technology, Idaho National Laboratory, Idaho Falls, ID 83415 (United States); Hawkes, Grant L. [Thermal Science and Safety Analysis Department, Nuclear Science and Technology, Idaho National Laboratory, Idaho Falls, ID 83415 (United States); Einerson, Jeffrey J. [Human Factor, Controls and Statistics Department, Nuclear Science and Technology, Idaho National Laboratory, Idaho Falls, ID 83415 (United States)
2014-05-01
As part of the High Temperature Reactors (HTR) R and D program, a series of irradiation tests, designated as Advanced Gas-cooled Reactor (AGR), have been defined to support development and qualification of fuel design, fabrication process, and fuel performance under normal operation and accident conditions. The AGR tests employ fuel compacts placed in a graphite cylinder shrouded by a steel capsule and instrumented with thermocouples (TC) embedded in graphite blocks enabling temperature control. While not possible to obtain by direct measurements in the tests, crucial fuel conditions (e.g., temperature, neutron fast fluence, and burnup) are calculated using core physics and thermal modeling codes. This paper is focused on AGR test fuel temperature predicted by the ABAQUS code's finite element-based thermal models. The work follows up on a previous study, in which several statistical analysis methods were adapted, implemented in the NGNP Data Management and Analysis System (NDMAS), and applied for qualification of AGR-1 thermocouple data. Abnormal trends in measured data revealed by the statistical analysis are traced to either measuring instrument deterioration or physical mechanisms in capsules that may have shifted the system thermal response. The main thrust of this work is to exploit the variety of data obtained in irradiation and post-irradiation examination (PIE) for assessment of modeling assumptions. As an example, the uneven reduction of the control gas gap in Capsule 5 found in the capsule metrology measurements in PIE helps identify mechanisms other than TC drift causing the decrease in TC readings. This suggests a more physics-based modification of the thermal model that leads to a better fit with experimental data, thus reducing model uncertainty and increasing confidence in the calculated fuel temperatures of the AGR-1 test.
Energy Technology Data Exchange (ETDEWEB)
Tran, A; Yu, V; Nguyen, D; Woods, K; Low, D; Sheng, K [UCLA, Los Angeles, CA (United States)
2015-06-15
Purpose: Knowledge learned from previous plans can be used to guide future treatment planning. Existing knowledge-based treatment planning methods study the correlation between organ geometry and dose volume histogram (DVH), which is a lossy representation of the complete dose distribution. A statistical voxel dose learning (SVDL) model was developed that includes the complete dose volume information. Its accuracy of predicting volumetric-modulated arc therapy (VMAT) and non-coplanar 4π radiotherapy was quantified. SVDL provided more isotropic dose gradients and may improve knowledge-based planning. Methods: 12 prostate SBRT patients originally treated using two full-arc VMAT techniques were re-planned with 4π using 20 intensity-modulated non-coplanar fields to a prescription dose of 40 Gy. The bladder and rectum voxels were binned based on their distances to the PTV. The dose distribution in each bin was resampled by convolving to a Gaussian kernel, resulting in 1000 data points in each bin that predicted the statistical dose information of a voxel with unknown dose in a new patient without triaging information that may be collectively important to a particular patient. We used this method to predict the DVHs, mean and max doses in a leave-one-out cross validation (LOOCV) test and compared its performance against lossy estimators including mean, median, mode, Poisson and Rayleigh of the voxelized dose distributions. Results: SVDL predicted the bladder and rectum doses more accurately than other estimators, giving mean percentile errors ranging from 13.35–19.46%, 4.81–19.47%, 22.49–28.69%, 23.35–30.5%, 21.05–53.93% for predicting mean, max dose, V20, V35, and V40 respectively, to OARs in both planning techniques. The prediction errors were generally lower for 4π than VMAT. Conclusion: By employing all dose volume information in the SVDL model, the OAR doses were more accurately predicted. 4π plans are better suited for knowledge-based planning than
The prediction of epidemics through mathematical modeling.
Schaus, Catherine
2014-01-01
Mathematical models may be resorted to in an endeavor to predict the development of epidemics. The SIR model is one of the applications. Still too approximate, the use of statistics awaits more data in order to come closer to reality.
A statistical model for radar images of agricultural scenes
Frost, V. S.; Shanmugan, K. S.; Holtzman, J. C.; Stiles, J. A.
1982-01-01
The presently derived and validated statistical model for radar images containing many different homogeneous fields predicts the probability density functions of radar images of entire agricultural scenes, thereby allowing histograms of large scenes composed of a variety of crops to be described. Seasat-A SAR images of agricultural scenes are accurately predicted by the model on the basis of three assumptions: each field has the same SNR, all target classes cover approximately the same area, and the true reflectivity characterizing each individual target class is a uniformly distributed random variable. The model is expected to be useful in the design of data processing algorithms and for scene analysis using radar images.
Directory of Open Access Journals (Sweden)
Haydee Salmun
2015-02-01
Full Text Available The present study extends the applicability of a statistical model for prediction of storm surge originally developed for The Battery, NY in two ways: I. the statistical model is used as a biascorrection for operationally produced dynamical surge forecasts, and II. the statistical model is applied to the region of the east coast of the U.S. susceptible to winter extratropical storms. The statistical prediction is based on a regression relation between the “storm maximum” storm surge and the storm composite significant wave height predicted ata nearby location. The use of the statistical surge prediction as an alternative bias correction for the National Oceanic and Atmospheric Administration (NOAA operational storm surge forecasts is shownhere to be statistically equivalent to the existing bias correctiontechnique and potentially applicable for much longer forecast lead times as well as for storm surge climate prediction. Applying the statistical model to locations along the east coast shows that the regression relation can be “trained” with data from tide gauge measurements and near-shore buoys along the coast from North Carolina to Maine, and that it provides accurate estimates of storm surge.
Statistical Models to Assess the Health Effects and to Forecast Ground Level Ozone
Czech Academy of Sciences Publication Activity Database
Schlink, U.; Herbath, O.; Richter, M.; Dorling, S.; Nunnari, G.; Cawley, G.; Pelikán, Emil
2006-01-01
Roč. 21, č. 4 (2006), s. 547-558 ISSN 1364-8152 R&D Projects: GA AV ČR 1ET400300414 Institutional research plan: CEZ:AV0Z10300504 Keywords : statistical models * ground level ozone * health effects * logistic model * forecasting * prediction performance * neural network * generalised additive model * integrated assessment Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 1.992, year: 2006
Comparison of statistical and clinical predictions of functional outcome after ischemic stroke.
Directory of Open Access Journals (Sweden)
Douglas D Thompson
Full Text Available To determine whether the predictions of functional outcome after ischemic stroke made at the bedside using a doctor's clinical experience were more or less accurate than the predictions made by clinical prediction models (CPMs.A prospective cohort study of nine hundred and thirty one ischemic stroke patients recruited consecutively at the outpatient, inpatient and emergency departments of the Western General Hospital, Edinburgh between 2002 and 2005. Doctors made informal predictions of six month functional outcome on the Oxford Handicap Scale (OHS. Patients were followed up at six months with a validated postal questionnaire. For each patient we calculated the absolute predicted risk of death or dependence (OHS≥3 using five previously described CPMs. The specificity of a doctor's informal predictions of OHS≥3 at six months was good 0.96 (95% CI: 0.94 to 0.97 and similar to CPMs (range 0.94 to 0.96; however the sensitivity of both informal clinical predictions 0.44 (95% CI: 0.39 to 0.49 and clinical prediction models (range 0.38 to 0.45 was poor. The prediction of the level of disability after stroke was similar for informal clinical predictions (ordinal c-statistic 0.74 with 95% CI 0.72 to 0.76 and CPMs (range 0.69 to 0.75. No patient or clinician characteristic affected the accuracy of informal predictions, though predictions were more accurate in outpatients.CPMs are at least as good as informal clinical predictions in discriminating between good and bad functional outcome after ischemic stroke. The place of these models in clinical practice has yet to be determined.
Pseudo-dynamic source modelling with 1-point and 2-point statistics of earthquake source parameters
Song, S. G.
2013-12-24
Ground motion prediction is an essential element in seismic hazard and risk analysis. Empirical ground motion prediction approaches have been widely used in the community, but efficient simulation-based ground motion prediction methods are needed to complement empirical approaches, especially in the regions with limited data constraints. Recently, dynamic rupture modelling has been successfully adopted in physics-based source and ground motion modelling, but it is still computationally demanding and many input parameters are not well constrained by observational data. Pseudo-dynamic source modelling keeps the form of kinematic modelling with its computational efficiency, but also tries to emulate the physics of source process. In this paper, we develop a statistical framework that governs the finite-fault rupture process with 1-point and 2-point statistics of source parameters in order to quantify the variability of finite source models for future scenario events. We test this method by extracting 1-point and 2-point statistics from dynamically derived source models and simulating a number of rupture scenarios, given target 1-point and 2-point statistics. We propose a new rupture model generator for stochastic source modelling with the covariance matrix constructed from target 2-point statistics, that is, auto- and cross-correlations. Our sensitivity analysis of near-source ground motions to 1-point and 2-point statistics of source parameters provides insights into relations between statistical rupture properties and ground motions. We observe that larger standard deviation and stronger correlation produce stronger peak ground motions in general. The proposed new source modelling approach will contribute to understanding the effect of earthquake source on near-source ground motion characteristics in a more quantitative and systematic way.
Incorporating uncertainty in predictive species distribution modelling.
Beale, Colin M; Lennon, Jack J
2012-01-19
Motivated by the need to solve ecological problems (climate change, habitat fragmentation and biological invasions), there has been increasing interest in species distribution models (SDMs). Predictions from these models inform conservation policy, invasive species management and disease-control measures. However, predictions are subject to uncertainty, the degree and source of which is often unrecognized. Here, we review the SDM literature in the context of uncertainty, focusing on three main classes of SDM: niche-based models, demographic models and process-based models. We identify sources of uncertainty for each class and discuss how uncertainty can be minimized or included in the modelling process to give realistic measures of confidence around predictions. Because this has typically not been performed, we conclude that uncertainty in SDMs has often been underestimated and a false precision assigned to predictions of geographical distribution. We identify areas where development of new statistical tools will improve predictions from distribution models, notably the development of hierarchical models that link different types of distribution model and their attendant uncertainties across spatial scales. Finally, we discuss the need to develop more defensible methods for assessing predictive performance, quantifying model goodness-of-fit and for assessing the significance of model covariates.
Nonparametric predictive inference in statistical process control
Arts, G.R.J.; Coolen, F.P.A.; Laan, van der P.
2000-01-01
New methods for statistical process control are presented, where the inferences have a nonparametric predictive nature. We consider several problems in process control in terms of uncertainties about future observable random quantities, and we develop inferences for these random quantities hased on
Risk predictive modelling for diabetes and cardiovascular disease.
Kengne, Andre Pascal; Masconi, Katya; Mbanya, Vivian Nchanchou; Lekoubou, Alain; Echouffo-Tcheugui, Justin Basile; Matsha, Tandi E
2014-02-01
Absolute risk models or clinical prediction models have been incorporated in guidelines, and are increasingly advocated as tools to assist risk stratification and guide prevention and treatments decisions relating to common health conditions such as cardiovascular disease (CVD) and diabetes mellitus. We have reviewed the historical development and principles of prediction research, including their statistical underpinning, as well as implications for routine practice, with a focus on predictive modelling for CVD and diabetes. Predictive modelling for CVD risk, which has developed over the last five decades, has been largely influenced by the Framingham Heart Study investigators, while it is only ∼20 years ago that similar efforts were started in the field of diabetes. Identification of predictive factors is an important preliminary step which provides the knowledge base on potential predictors to be tested for inclusion during the statistical derivation of the final model. The derived models must then be tested both on the development sample (internal validation) and on other populations in different settings (external validation). Updating procedures (e.g. recalibration) should be used to improve the performance of models that fail the tests of external validation. Ultimately, the effect of introducing validated models in routine practice on the process and outcomes of care as well as its cost-effectiveness should be tested in impact studies before wide dissemination of models beyond the research context. Several predictions models have been developed for CVD or diabetes, but very few have been externally validated or tested in impact studies, and their comparative performance has yet to be fully assessed. A shift of focus from developing new CVD or diabetes prediction models to validating the existing ones will improve their adoption in routine practice.
Statistical shear lag model - unraveling the size effect in hierarchical composites.
Wei, Xiaoding; Filleter, Tobin; Espinosa, Horacio D
2015-05-01
Numerous experimental and computational studies have established that the hierarchical structures encountered in natural materials, such as the brick-and-mortar structure observed in sea shells, are essential for achieving defect tolerance. Due to this hierarchy, the mechanical properties of natural materials have a different size dependence compared to that of typical engineered materials. This study aimed to explore size effects on the strength of bio-inspired staggered hierarchical composites and to define the influence of the geometry of constituents in their outstanding defect tolerance capability. A statistical shear lag model is derived by extending the classical shear lag model to account for the statistics of the constituents' strength. A general solution emerges from rigorous mathematical derivations, unifying the various empirical formulations for the fundamental link length used in previous statistical models. The model shows that the staggered arrangement of constituents grants composites a unique size effect on mechanical strength in contrast to homogenous continuous materials. The model is applied to hierarchical yarns consisting of double-walled carbon nanotube bundles to assess its predictive capabilities for novel synthetic materials. Interestingly, the model predicts that yarn gauge length does not significantly influence the yarn strength, in close agreement with experimental observations. Copyright © 2015 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.
Predicting Protein Secondary Structure with Markov Models
DEFF Research Database (Denmark)
Fischer, Paul; Larsen, Simon; Thomsen, Claus
2004-01-01
we are considering here, is to predict the secondary structure from the primary one. To this end we train a Markov model on training data and then use it to classify parts of unknown protein sequences as sheets, helices or coils. We show how to exploit the directional information contained...... in the Markov model for this task. Classifications that are purely based on statistical models might not always be biologically meaningful. We present combinatorial methods to incorporate biological background knowledge to enhance the prediction performance....
Assessing Discriminative Performance at External Validation of Clinical Prediction Models.
Directory of Open Access Journals (Sweden)
Daan Nieboer
Full Text Available External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting.We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated them in the validation set. We concentrated on two scenarios: 1 the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2 the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury.The permutation test indicated that the validation and development set were homogenous in scenario 1 (in almost all simulated samples and heterogeneous in scenario 2 (in 17%-39% of simulated samples. Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2.The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients.
Exploiting linkage disequilibrium in statistical modelling in quantitative genomics
DEFF Research Database (Denmark)
Wang, Lei
Alleles at two loci are said to be in linkage disequilibrium (LD) when they are correlated or statistically dependent. Genomic prediction and gene mapping rely on the existence of LD between gentic markers and causul variants of complex traits. In the first part of the thesis, a novel method...... to quantify and visualize local variation in LD along chromosomes in describet, and applied to characterize LD patters at the local and genome-wide scale in three Danish pig breeds. In the second part, different ways of taking LD into account in genomic prediction models are studied. One approach is to use...... the recently proposed antedependence models, which treat neighbouring marker effects as correlated; another approach involves use of haplotype block information derived using the program Beagle. The overall conclusion is that taking LD information into account in genomic prediction models potentially improves...
Energy Technology Data Exchange (ETDEWEB)
Marzouk, Youssef [Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
2016-08-31
Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computional expense. This project intends to make rigorous predictive modeling *feasible* in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale {\\em sequential} data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.
Nearing, G. S.
2014-12-01
Statistical models consistently out-perform conceptual models in the short term, however to account for a nonstationary future (or an unobserved past) scientists prefer to base predictions on unchanging and commutable properties of the universe - i.e., physics. The problem with physically-based hydrology models is, of course, that they aren't really based on physics - they are based on statistical approximations of physical interactions, and we almost uniformly lack an understanding of the entropy associated with these approximations. Thermodynamics is successful precisely because entropy statistics are computable for homogeneous (well-mixed) systems, and ergodic arguments explain the success of Newton's laws to describe systems that are fundamentally quantum in nature. Unfortunately, similar arguments do not hold for systems like watersheds that are heterogeneous at a wide range of scales. Ray Solomonoff formalized the situation in 1968 by showing that given infinite evidence, simultaneously minimizing model complexity and entropy in predictions always leads to the best possible model. The open question in hydrology is about what happens when we don't have infinite evidence - for example, when the future will not look like the past, or when one watershed does not behave like another. How do we isolate stationary and commutable components of watershed behavior? I propose that one possible answer to this dilemma lies in a formal combination of physics and statistics. In this talk I outline my recent analogue (Solomonoff's theorem was digital) of Solomonoff's idea that allows us to quantify the complexity/entropy tradeoff in a way that is intuitive to physical scientists. I show how to formally combine "physical" and statistical methods for model development in a way that allows us to derive the theoretically best possible model given any given physics approximation(s) and available observations. Finally, I apply an analogue of Solomonoff's theorem to evaluate the
International Nuclear Information System (INIS)
Yahya, Noorazrul; Ebert, Martin A.; Bulsara, Max; House, Michael J.; Kennedy, Angel; Joseph, David J.; Denham, James W.
2016-01-01
Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. Conclusions
Energy Technology Data Exchange (ETDEWEB)
Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au [School of Physics, University of Western Australia, Western Australia 6009, Australia and School of Health Sciences, National University of Malaysia, Bangi 43600 (Malaysia); Ebert, Martin A. [School of Physics, University of Western Australia, Western Australia 6009, Australia and Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008 (Australia); Bulsara, Max [Institute for Health Research, University of Notre Dame, Fremantle, Western Australia 6959 (Australia); House, Michael J. [School of Physics, University of Western Australia, Western Australia 6009 (Australia); Kennedy, Angel [Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008 (Australia); Joseph, David J. [Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008, Australia and School of Surgery, University of Western Australia, Western Australia 6009 (Australia); Denham, James W. [School of Medicine and Public Health, University of Newcastle, New South Wales 2308 (Australia)
2016-05-15
Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. Conclusions
Prediction Model for Gastric Cancer Incidence in Korean Population.
Directory of Open Access Journals (Sweden)
Bang Wool Eom
Full Text Available Predicting high risk groups for gastric cancer and motivating these groups to receive regular checkups is required for the early detection of gastric cancer. The aim of this study is was to develop a prediction model for gastric cancer incidence based on a large population-based cohort in Korea.Based on the National Health Insurance Corporation data, we analyzed 10 major risk factors for gastric cancer. The Cox proportional hazards model was used to develop gender specific prediction models for gastric cancer development, and the performance of the developed model in terms of discrimination and calibration was also validated using an independent cohort. Discrimination ability was evaluated using Harrell's C-statistics, and the calibration was evaluated using a calibration plot and slope.During a median of 11.4 years of follow-up, 19,465 (1.4% and 5,579 (0.7% newly developed gastric cancer cases were observed among 1,372,424 men and 804,077 women, respectively. The prediction models included age, BMI, family history, meal regularity, salt preference, alcohol consumption, smoking and physical activity for men, and age, BMI, family history, salt preference, alcohol consumption, and smoking for women. This prediction model showed good accuracy and predictability in both the developing and validation cohorts (C-statistics: 0.764 for men, 0.706 for women.In this study, a prediction model for gastric cancer incidence was developed that displayed a good performance.
Physics-based statistical model and simulation method of RF propagation in urban environments
Pao, Hsueh-Yuan; Dvorak, Steven L.
2010-09-14
A physics-based statistical model and simulation/modeling method and system of electromagnetic wave propagation (wireless communication) in urban environments. In particular, the model is a computationally efficient close-formed parametric model of RF propagation in an urban environment which is extracted from a physics-based statistical wireless channel simulation method and system. The simulation divides the complex urban environment into a network of interconnected urban canyon waveguides which can be analyzed individually; calculates spectral coefficients of modal fields in the waveguides excited by the propagation using a database of statistical impedance boundary conditions which incorporates the complexity of building walls in the propagation model; determines statistical parameters of the calculated modal fields; and determines a parametric propagation model based on the statistical parameters of the calculated modal fields from which predictions of communications capability may be made.
A weighted generalized score statistic for comparison of predictive values of diagnostic tests.
Kosinski, Andrzej S
2013-03-15
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.
White, H; Racine, J
2001-01-01
We propose tests for individual and joint irrelevance of network inputs. Such tests can be used to determine whether an input or group of inputs "belong" in a particular model, thus permitting valid statistical inference based on estimated feedforward neural-network models. The approaches employ well-known statistical resampling techniques. We conduct a small Monte Carlo experiment showing that our tests have reasonable level and power behavior, and we apply our methods to examine whether there are predictable regularities in foreign exchange rates. We find that exchange rates do appear to contain information that is exploitable for enhanced point prediction, but the nature of the predictive relations evolves through time.
Hybrid perturbation methods based on statistical time series models
San-Juan, Juan Félix; San-Martín, Montserrat; Pérez, Iván; López, Rosario
2016-04-01
In this work we present a new methodology for orbit propagation, the hybrid perturbation theory, based on the combination of an integration method and a prediction technique. The former, which can be a numerical, analytical or semianalytical theory, generates an initial approximation that contains some inaccuracies derived from the fact that, in order to simplify the expressions and subsequent computations, not all the involved forces are taken into account and only low-order terms are considered, not to mention the fact that mathematical models of perturbations not always reproduce physical phenomena with absolute precision. The prediction technique, which can be based on either statistical time series models or computational intelligence methods, is aimed at modelling and reproducing missing dynamics in the previously integrated approximation. This combination results in the precision improvement of conventional numerical, analytical and semianalytical theories for determining the position and velocity of any artificial satellite or space debris object. In order to validate this methodology, we present a family of three hybrid orbit propagators formed by the combination of three different orders of approximation of an analytical theory and a statistical time series model, and analyse their capability to process the effect produced by the flattening of the Earth. The three considered analytical components are the integration of the Kepler problem, a first-order and a second-order analytical theories, whereas the prediction technique is the same in the three cases, namely an additive Holt-Winters method.
ARSENIC CONTAMINATION IN GROUNDWATER: A STATISTICAL MODELING
Palas Roy; Naba Kumar Mondal; Biswajit Das; Kousik Das
2013-01-01
High arsenic in natural groundwater in most of the tubewells of the Purbasthali- Block II area of Burdwan district (W.B, India) has recently been focused as a serious environmental concern. This paper is intending to illustrate the statistical modeling of the arsenic contaminated groundwater to identify the interrelation of that arsenic contain with other participating groundwater parameters so that the arsenic contamination level can easily be predicted by analyzing only such parameters. Mul...
Energy based prediction models for building acoustics
DEFF Research Database (Denmark)
Brunskog, Jonas
2012-01-01
In order to reach robust and simplified yet accurate prediction models, energy based principle are commonly used in many fields of acoustics, especially in building acoustics. This includes simple energy flow models, the framework of statistical energy analysis (SEA) as well as more elaborated...... principles as, e.g., wave intensity analysis (WIA). The European standards for building acoustic predictions, the EN 12354 series, are based on energy flow and SEA principles. In the present paper, different energy based prediction models are discussed and critically reviewed. Special attention is placed...... on underlying basic assumptions, such as diffuse fields, high modal overlap, resonant field being dominant, etc., and the consequences of these in terms of limitations in the theory and in the practical use of the models....
Farmer, William H.; Knight, Rodney R.; Eash, David A.; Kasey J. Hutchinson,; Linhart, S. Mike; Christiansen, Daniel E.; Archfield, Stacey A.; Over, Thomas M.; Kiang, Julie E.
2015-08-24
Daily records of streamflow are essential to understanding hydrologic systems and managing the interactions between human and natural systems. Many watersheds and locations lack streamgages to provide accurate and reliable records of daily streamflow. In such ungaged watersheds, statistical tools and rainfall-runoff models are used to estimate daily streamflow. Previous work compared 19 different techniques for predicting daily streamflow records in the southeastern United States. Here, five of the better-performing methods are compared in a different hydroclimatic region of the United States, in Iowa. The methods fall into three classes: (1) drainage-area ratio methods, (2) nonlinear spatial interpolations using flow duration curves, and (3) mechanistic rainfall-runoff models. The first two classes are each applied with nearest-neighbor and map-correlated index streamgages. Using a threefold validation and robust rank-based evaluation, the methods are assessed for overall goodness of fit of the hydrograph of daily streamflow, the ability to reproduce a daily, no-fail storage-yield curve, and the ability to reproduce key streamflow statistics. As in the Southeast study, a nonlinear spatial interpolation of daily streamflow using flow duration curves is found to be a method with the best predictive accuracy. Comparisons with previous work in Iowa show that the accuracy of mechanistic models with at-site calibration is substantially degraded in the ungaged framework.
Adding propensity scores to pure prediction models fails to improve predictive performance
Directory of Open Access Journals (Sweden)
Amy S. Nowacki
2013-08-01
Full Text Available Background. Propensity score usage seems to be growing in popularity leading researchers to question the possible role of propensity scores in prediction modeling, despite the lack of a theoretical rationale. It is suspected that such requests are due to the lack of differentiation regarding the goals of predictive modeling versus causal inference modeling. Therefore, the purpose of this study is to formally examine the effect of propensity scores on predictive performance. Our hypothesis is that a multivariable regression model that adjusts for all covariates will perform as well as or better than those models utilizing propensity scores with respect to model discrimination and calibration.Methods. The most commonly encountered statistical scenarios for medical prediction (logistic and proportional hazards regression were used to investigate this research question. Random cross-validation was performed 500 times to correct for optimism. The multivariable regression models adjusting for all covariates were compared with models that included adjustment for or weighting with the propensity scores. The methods were compared based on three predictive performance measures: (1 concordance indices; (2 Brier scores; and (3 calibration curves.Results. Multivariable models adjusting for all covariates had the highest average concordance index, the lowest average Brier score, and the best calibration. Propensity score adjustment and inverse probability weighting models without adjustment for all covariates performed worse than full models and failed to improve predictive performance with full covariate adjustment.Conclusion. Propensity score techniques did not improve prediction performance measures beyond multivariable adjustment. Propensity scores are not recommended if the analytical goal is pure prediction modeling.
Statistical surrogate models for prediction of high-consequence climate change.
Energy Technology Data Exchange (ETDEWEB)
Constantine, Paul; Field, Richard V., Jr.; Boslough, Mark Bruce Elrick
2011-09-01
In safety engineering, performance metrics are defined using probabilistic risk assessments focused on the low-probability, high-consequence tail of the distribution of possible events, as opposed to best estimates based on central tendencies. We frame the climate change problem and its associated risks in a similar manner. To properly explore the tails of the distribution requires extensive sampling, which is not possible with existing coupled atmospheric models due to the high computational cost of each simulation. We therefore propose the use of specialized statistical surrogate models (SSMs) for the purpose of exploring the probability law of various climate variables of interest. A SSM is different than a deterministic surrogate model in that it represents each climate variable of interest as a space/time random field. The SSM can be calibrated to available spatial and temporal data from existing climate databases, e.g., the Program for Climate Model Diagnosis and Intercomparison (PCMDI), or to a collection of outputs from a General Circulation Model (GCM), e.g., the Community Earth System Model (CESM) and its predecessors. Because of its reduced size and complexity, the realization of a large number of independent model outputs from a SSM becomes computationally straightforward, so that quantifying the risk associated with low-probability, high-consequence climate events becomes feasible. A Bayesian framework is developed to provide quantitative measures of confidence, via Bayesian credible intervals, in the use of the proposed approach to assess these risks.
Statistical modelling of networked human-automation performance using working memory capacity.
Ahmed, Nisar; de Visser, Ewart; Shaw, Tyler; Mohamed-Ameen, Amira; Campbell, Mark; Parasuraman, Raja
2014-01-01
This study examines the challenging problem of modelling the interaction between individual attentional limitations and decision-making performance in networked human-automation system tasks. Analysis of real experimental data from a task involving networked supervision of multiple unmanned aerial vehicles by human participants shows that both task load and network message quality affect performance, but that these effects are modulated by individual differences in working memory (WM) capacity. These insights were used to assess three statistical approaches for modelling and making predictions with real experimental networked supervisory performance data: classical linear regression, non-parametric Gaussian processes and probabilistic Bayesian networks. It is shown that each of these approaches can help designers of networked human-automated systems cope with various uncertainties in order to accommodate future users by linking expected operating conditions and performance from real experimental data to observable cognitive traits like WM capacity. Practitioner Summary: Working memory (WM) capacity helps account for inter-individual variability in operator performance in networked unmanned aerial vehicle supervisory tasks. This is useful for reliable performance prediction near experimental conditions via linear models; robust statistical prediction beyond experimental conditions via Gaussian process models and probabilistic inference about unknown task conditions/WM capacities via Bayesian network models.
Wind speed prediction using statistical regression and neural network
Indian Academy of Sciences (India)
Prediction of wind speed in the atmospheric boundary layer is important for wind energy assess- ment,satellite launching and aviation,etc.There are a few techniques available for wind speed prediction,which require a minimum number of input parameters.Four different statistical techniques,viz.,curve ﬁtting,Auto Regressive ...
Sampling, Probability Models and Statistical Reasoning Statistical
Indian Academy of Sciences (India)
Home; Journals; Resonance – Journal of Science Education; Volume 1; Issue 5. Sampling, Probability Models and Statistical Reasoning Statistical Inference. Mohan Delampady V R Padmawar. General Article Volume 1 Issue 5 May 1996 pp 49-58 ...
Artificial neural network models for prediction of intestinal permeability of oligopeptides
Directory of Open Access Journals (Sweden)
Kim Min-Kook
2007-07-01
Full Text Available Abstract Background Oral delivery is a highly desirable property for candidate drugs under development. Computational modeling could provide a quick and inexpensive way to assess the intestinal permeability of a molecule. Although there have been several studies aimed at predicting the intestinal absorption of chemical compounds, there have been no attempts to predict intestinal permeability on the basis of peptide sequence information. To develop models for predicting the intestinal permeability of peptides, we adopted an artificial neural network as a machine-learning algorithm. The positive control data consisted of intestinal barrier-permeable peptides obtained by the peroral phage display technique, and the negative control data were prepared from random sequences. Results The capacity of our models to make appropriate predictions was validated by statistical indicators including sensitivity, specificity, enrichment curve, and the area under the receiver operating characteristic (ROC curve (the ROC score. The training and test set statistics indicated that our models were of strikingly good quality and could discriminate between permeable and random sequences with a high level of confidence. Conclusion We developed artificial neural network models to predict the intestinal permeabilities of oligopeptides on the basis of peptide sequence information. Both binary and VHSE (principal components score Vectors of Hydrophobic, Steric and Electronic properties descriptors produced statistically significant training models; the models with simple neural network architectures showed slightly greater predictive power than those with complex ones. We anticipate that our models will be applicable to the selection of intestinal barrier-permeable peptides for generating peptide drugs or peptidomimetics.
Digital Repository Service at National Institute of Oceanography (India)
Srinivas, K.; Das, V.K.; DineshKumar, P.K.
This study investigates the suitability of statistical models for their predictive potential for the monthly mean sea level at different stations along the west and east coasts of the Indian subcontinent. Statistical modelling of the monthly mean...
Massive Predictive Modeling using Oracle R Enterprise
CERN. Geneva
2014-01-01
R is fast becoming the lingua franca for analyzing data via statistics, visualization, and predictive analytics. For enterprise-scale data, R users have three main concerns: scalability, performance, and production deployment. Oracle's R-based technologies - Oracle R Distribution, Oracle R Enterprise, Oracle R Connector for Hadoop, and the R package ROracle - address these concerns. In this talk, we introduce Oracle's R technologies, highlighting how each enables R users to achieve scalability and performance while making production deployment of R results a natural outcome of the data analyst/scientist efforts. The focus then turns to Oracle R Enterprise with code examples using the transparency layer and embedded R execution, targeting massive predictive modeling. One goal behind massive predictive modeling is to build models per entity, such as customers, zip codes, simulations, in an effort to understand behavior and tailor predictions at the entity level. Predictions...
A statistical skull geometry model for children 0-3 years old.
Directory of Open Access Journals (Sweden)
Zhigang Li
Full Text Available Head injury is the leading cause of fatality and long-term disability for children. Pediatric heads change rapidly in both size and shape during growth, especially for children under 3 years old (YO. To accurately assess the head injury risks for children, it is necessary to understand the geometry of the pediatric head and how morphologic features influence injury causation within the 0-3 YO population. In this study, head CT scans from fifty-six 0-3 YO children were used to develop a statistical model of pediatric skull geometry. Geometric features important for injury prediction, including skull size and shape, skull thickness and suture width, along with their variations among the sample population, were quantified through a series of image and statistical analyses. The size and shape of the pediatric skull change significantly with age and head circumference. The skull thickness and suture width vary with age, head circumference and location, which will have important effects on skull stiffness and injury prediction. The statistical geometry model developed in this study can provide a geometrical basis for future development of child anthropomorphic test devices and pediatric head finite element models.
A statistical skull geometry model for children 0-3 years old.
Li, Zhigang; Park, Byoung-Keon; Liu, Weiguo; Zhang, Jinhuan; Reed, Matthew P; Rupp, Jonathan D; Hoff, Carrie N; Hu, Jingwen
2015-01-01
Head injury is the leading cause of fatality and long-term disability for children. Pediatric heads change rapidly in both size and shape during growth, especially for children under 3 years old (YO). To accurately assess the head injury risks for children, it is necessary to understand the geometry of the pediatric head and how morphologic features influence injury causation within the 0-3 YO population. In this study, head CT scans from fifty-six 0-3 YO children were used to develop a statistical model of pediatric skull geometry. Geometric features important for injury prediction, including skull size and shape, skull thickness and suture width, along with their variations among the sample population, were quantified through a series of image and statistical analyses. The size and shape of the pediatric skull change significantly with age and head circumference. The skull thickness and suture width vary with age, head circumference and location, which will have important effects on skull stiffness and injury prediction. The statistical geometry model developed in this study can provide a geometrical basis for future development of child anthropomorphic test devices and pediatric head finite element models.
IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics.
Hoyt, Robert Eugene; Snider, Dallas; Thompson, Carla; Mantravadi, Sarita
2016-10-11
We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix
Energy Technology Data Exchange (ETDEWEB)
Tratnyek, P. G.; Bylaska, Eric J.; Weber, Eric J.
2017-01-01
Quantitative structure–activity relationships (QSARs) have long been used in the environmental sciences. More recently, molecular modeling and chemoinformatic methods have become widespread. These methods have the potential to expand and accelerate advances in environmental chemistry because they complement observational and experimental data with “in silico” results and analysis. The opportunities and challenges that arise at the intersection between statistical and theoretical in silico methods are most apparent in the context of properties that determine the environmental fate and effects of chemical contaminants (degradation rate constants, partition coefficients, toxicities, etc.). The main example of this is the calibration of QSARs using descriptor variable data calculated from molecular modeling, which can make QSARs more useful for predicting property data that are unavailable, but also can make them more powerful tools for diagnosis of fate determining pathways and mechanisms. Emerging opportunities for “in silico environmental chemical science” are to move beyond the calculation of specific chemical properties using statistical models and toward more fully in silico models, prediction of transformation pathways and products, incorporation of environmental factors into model predictions, integration of databases and predictive models into more comprehensive and efficient tools for exposure assessment, and extending the applicability of all the above from chemicals to biologicals and materials.
Tratnyek, Paul G; Bylaska, Eric J; Weber, Eric J
2017-03-22
Quantitative structure-activity relationships (QSARs) have long been used in the environmental sciences. More recently, molecular modeling and chemoinformatic methods have become widespread. These methods have the potential to expand and accelerate advances in environmental chemistry because they complement observational and experimental data with "in silico" results and analysis. The opportunities and challenges that arise at the intersection between statistical and theoretical in silico methods are most apparent in the context of properties that determine the environmental fate and effects of chemical contaminants (degradation rate constants, partition coefficients, toxicities, etc.). The main example of this is the calibration of QSARs using descriptor variable data calculated from molecular modeling, which can make QSARs more useful for predicting property data that are unavailable, but also can make them more powerful tools for diagnosis of fate determining pathways and mechanisms. Emerging opportunities for "in silico environmental chemical science" are to move beyond the calculation of specific chemical properties using statistical models and toward more fully in silico models, prediction of transformation pathways and products, incorporation of environmental factors into model predictions, integration of databases and predictive models into more comprehensive and efficient tools for exposure assessment, and extending the applicability of all the above from chemicals to biologicals and materials.
Statistical emulation of a tsunami model for sensitivity analysis and uncertainty quantification
Directory of Open Access Journals (Sweden)
A. Sarri
2012-06-01
Full Text Available Due to the catastrophic consequences of tsunamis, early warnings need to be issued quickly in order to mitigate the hazard. Additionally, there is a need to represent the uncertainty in the predictions of tsunami characteristics corresponding to the uncertain trigger features (e.g. either position, shape and speed of a landslide, or sea floor deformation associated with an earthquake. Unfortunately, computer models are expensive to run. This leads to significant delays in predictions and makes the uncertainty quantification impractical. Statistical emulators run almost instantaneously and may represent well the outputs of the computer model. In this paper, we use the outer product emulator to build a fast statistical surrogate of a landslide-generated tsunami computer model. This Bayesian framework enables us to build the emulator by combining prior knowledge of the computer model properties with a few carefully chosen model evaluations. The good performance of the emulator is validated using the leave-one-out method.
Aqua/Aura Updated Inclination Adjust Maneuver Performance Prediction Model
Boone, Spencer
2017-01-01
This presentation will discuss the updated Inclination Adjust Maneuver (IAM) performance prediction model that was developed for Aqua and Aura following the 2017 IAM series. This updated model uses statistical regression methods to identify potential long-term trends in maneuver parameters, yielding improved predictions when re-planning past maneuvers. The presentation has been reviewed and approved by Eric Moyer, ESMO Deputy Project Manager.
Statistical modelling of space-time processes with application to wind power
DEFF Research Database (Denmark)
Lenzi, Amanda
. This thesis aims at contributing to the wind power literature by building and evaluating new statistical techniques for producing forecasts at multiple locations and lead times using spatio-temporal information. By exploring the features of a rich portfolio of wind farms in western Denmark, we investigate...... propose spatial models for predicting wind power generation at two different time scales: for annual average wind power generation and for a high temporal resolution (typically wind power averages over 15-min time steps). In both cases, we use a spatial hierarchical statistical model in which spatial...
Statistical Models for Inferring Vegetation Composition from Fossil Pollen
Paciorek, C.; McLachlan, J. S.; Shang, Z.
2011-12-01
Fossil pollen provide information about vegetation composition that can be used to help understand how vegetation has changed over the past. However, these data have not traditionally been analyzed in a way that allows for statistical inference about spatio-temporal patterns and trends. We build a Bayesian hierarchical model called STEPPS (Spatio-Temporal Empirical Prediction from Pollen in Sediments) that predicts forest composition in southern New England, USA, over the last two millenia based on fossil pollen. The critical relationships between abundances of tree taxa in the pollen record and abundances in actual vegetation are estimated using modern (Forest Inventory Analysis) data and (witness tree) data from colonial records. This gives us two time points at which both pollen and direct vegetation data are available. Based on these relationships, and incorporating our uncertainty about them, we predict forest composition using fossil pollen. We estimate the spatial distribution and relative abundances of tree species and draw inference about how these patterns have changed over time. Finally, we describe ongoing work to extend the modeling to the upper Midwest of the U.S., including an approach to infer tree density and thereby estimate the prairie-forest boundary in Minnesota and Wisconsin. This work is part of the PalEON project, which brings together a team of ecosystem modelers, paleoecologists, and statisticians with the goal of reconstructing vegetation responses to climate during the last two millenia in the northeastern and midwestern United States. The estimates from the statistical modeling will be used to assess and calibrate ecosystem models that are used to project ecological changes in response to global change.
Connection between weighted LPC and higher-order statistics for AR model estimation
Kamp, Y.; Ma, C.
1993-01-01
This paper establishes the relationship between a weighted linear prediction method used for robust analysis of voiced speech and the autoregressive modelling based on higher-order statistics, known as cumulants
Kafle, Gopi Krishna; Chen, Lide
2016-02-01
There is a lack of literature reporting the methane potential of several livestock manures under the same anaerobic digestion conditions (same inoculum, temperature, time, and size of the digester). To the best of our knowledge, no previous study has reported biochemical methane potential (BMP) predicting models developed and evaluated by solely using at least five different livestock manure tests results. The goal of this study was to evaluate the BMP of five different livestock manures (dairy manure (DM), horse manure (HM), goat manure (GM), chicken manure (CM) and swine manure (SM)) and to predict the BMP using different statistical models. Nutrients of the digested different manures were also monitored. The BMP tests were conducted under mesophilic temperatures with a manure loading factor of 3.5g volatile solids (VS)/L and a feed to inoculum ratio (F/I) of 0.5. Single variable and multiple variable regression models were developed using manure total carbohydrate (TC), crude protein (CP), total fat (TF), lignin (LIG) and acid detergent fiber (ADF), and measured BMP data. Three different kinetic models (first order kinetic model, modified Gompertz model and Chen and Hashimoto model) were evaluated for BMP predictions. The BMPs of DM, HM, GM, CM and SM were measured to be 204, 155, 159, 259, and 323mL/g VS, respectively and the VS removals were calculated to be 58.6%, 52.9%, 46.4%, 81.4%, 81.4%, respectively. The technical digestion time (T80-90, time required to produce 80-90% of total biogas production) for DM, HM, GM, CM and SM was calculated to be in the ranges of 19-28, 27-37, 31-44, 13-18, 12-17days, respectively. The effluents from the HM showed the lowest nitrogen, phosphorus and potassium concentrations. The effluents from the CM digesters showed highest nitrogen and phosphorus concentrations and digested SM showed highest potassium concentration. Based on the results of the regression analysis, the model using the variable of LIG showed the best (R(2
Statistical modeling to support power system planning
Staid, Andrea
This dissertation focuses on data-analytic approaches that improve our understanding of power system applications to promote better decision-making. It tackles issues of risk analysis, uncertainty management, resource estimation, and the impacts of climate change. Tools of data mining and statistical modeling are used to bring new insight to a variety of complex problems facing today's power system. The overarching goal of this research is to improve the understanding of the power system risk environment for improved operation, investment, and planning decisions. The first chapter introduces some challenges faced in planning for a sustainable power system. Chapter 2 analyzes the driving factors behind the disparity in wind energy investments among states with a goal of determining the impact that state-level policies have on incentivizing wind energy. Findings show that policy differences do not explain the disparities; physical and geographical factors are more important. Chapter 3 extends conventional wind forecasting to a risk-based focus of predicting maximum wind speeds, which are dangerous for offshore operations. Statistical models are presented that issue probabilistic predictions for the highest wind speed expected in a three-hour interval. These models achieve a high degree of accuracy and their use can improve safety and reliability in practice. Chapter 4 examines the challenges of wind power estimation for onshore wind farms. Several methods for wind power resource assessment are compared, and the weaknesses of the Jensen model are demonstrated. For two onshore farms, statistical models outperform other methods, even when very little information is known about the wind farm. Lastly, chapter 5 focuses on the power system more broadly in the context of the risks expected from tropical cyclones in a changing climate. Risks to U.S. power system infrastructure are simulated under different scenarios of tropical cyclone behavior that may result from climate
Evaluation of burst pressure prediction models for line pipes
Energy Technology Data Exchange (ETDEWEB)
Zhu, Xian-Kui, E-mail: zhux@battelle.org [Battelle Memorial Institute, 505 King Avenue, Columbus, OH 43201 (United States); Leis, Brian N. [Battelle Memorial Institute, 505 King Avenue, Columbus, OH 43201 (United States)
2012-01-15
Accurate prediction of burst pressure plays a central role in engineering design and integrity assessment of oil and gas pipelines. Theoretical and empirical solutions for such prediction are evaluated in this paper relative to a burst pressure database comprising more than 100 tests covering a variety of pipeline steel grades and pipe sizes. Solutions considered include three based on plasticity theory for the end-capped, thin-walled, defect-free line pipe subjected to internal pressure in terms of the Tresca, von Mises, and ZL (or Zhu-Leis) criteria, one based on a cylindrical instability stress (CIS) concept, and a large group of analytical and empirical models previously evaluated by Law and Bowie (International Journal of Pressure Vessels and Piping, 84, 2007: 487-492). It is found that these models can be categorized into either a Tresca-family or a von Mises-family of solutions, except for those due to Margetson and Zhu-Leis models. The viability of predictions is measured via statistical analyses in terms of a mean error and its standard deviation. Consistent with an independent parallel evaluation using another large database, the Zhu-Leis solution is found best for predicting burst pressure, including consideration of strain hardening effects, while the Tresca strength solutions including Barlow, Maximum shear stress, Turner, and the ASME boiler code provide reasonably good predictions for the class of line-pipe steels with intermediate strain hardening response. - Highlights: Black-Right-Pointing-Pointer This paper evaluates different burst pressure prediction models for line pipes. Black-Right-Pointing-Pointer The existing models are categorized into two major groups of Tresca and von Mises solutions. Black-Right-Pointing-Pointer Prediction quality of each model is assessed statistically using a large full-scale burst test database. Black-Right-Pointing-Pointer The Zhu-Leis solution is identified as the best predictive model.
Evaluation of burst pressure prediction models for line pipes
International Nuclear Information System (INIS)
Zhu, Xian-Kui; Leis, Brian N.
2012-01-01
Accurate prediction of burst pressure plays a central role in engineering design and integrity assessment of oil and gas pipelines. Theoretical and empirical solutions for such prediction are evaluated in this paper relative to a burst pressure database comprising more than 100 tests covering a variety of pipeline steel grades and pipe sizes. Solutions considered include three based on plasticity theory for the end-capped, thin-walled, defect-free line pipe subjected to internal pressure in terms of the Tresca, von Mises, and ZL (or Zhu-Leis) criteria, one based on a cylindrical instability stress (CIS) concept, and a large group of analytical and empirical models previously evaluated by Law and Bowie (International Journal of Pressure Vessels and Piping, 84, 2007: 487–492). It is found that these models can be categorized into either a Tresca-family or a von Mises-family of solutions, except for those due to Margetson and Zhu-Leis models. The viability of predictions is measured via statistical analyses in terms of a mean error and its standard deviation. Consistent with an independent parallel evaluation using another large database, the Zhu-Leis solution is found best for predicting burst pressure, including consideration of strain hardening effects, while the Tresca strength solutions including Barlow, Maximum shear stress, Turner, and the ASME boiler code provide reasonably good predictions for the class of line-pipe steels with intermediate strain hardening response. - Highlights: ► This paper evaluates different burst pressure prediction models for line pipes. ► The existing models are categorized into two major groups of Tresca and von Mises solutions. ► Prediction quality of each model is assessed statistically using a large full-scale burst test database. ► The Zhu-Leis solution is identified as the best predictive model.
Comparison of four statistical and machine learning methods for crash severity prediction.
Iranitalab, Amirfarrokh; Khattak, Aemal
2017-11-01
Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparison of the performance of four statistical and machine learning methods including Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; developing a crash costs-based approach for comparison of crash severity prediction methods; and investigating the effects of data clustering methods comprising K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States was obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated using the training/estimation dataset and the correct prediction rates for each crash severity level, overall correct prediction rate and a proposed crash costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed NNC had the best prediction performance in overall and in more severe crashes. RF and SVM had the next two sufficient performances and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC caused improvement in MNL and RF but weakened the performance of NNC. Overall correct prediction rate had almost the exact opposite results compared to the proposed approach, showing that neglecting the crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.
Seasonal predictability of Kiremt rainfall in coupled general circulation models
Gleixner, Stephanie; Keenlyside, Noel S.; Demissie, Teferi D.; Counillon, François; Wang, Yiguo; Viste, Ellen
2017-11-01
The Ethiopian economy and population is strongly dependent on rainfall. Operational seasonal predictions for the main rainy season (Kiremt, June-September) are based on statistical approaches with Pacific sea surface temperatures (SST) as the main predictor. Here we analyse dynamical predictions from 11 coupled general circulation models for the Kiremt seasons from 1985-2005 with the forecasts starting from the beginning of May. We find skillful predictions from three of the 11 models, but no model beats a simple linear prediction model based on the predicted Niño3.4 indices. The skill of the individual models for dynamically predicting Kiremt rainfall depends on the strength of the teleconnection between Kiremt rainfall and concurrent Pacific SST in the models. Models that do not simulate this teleconnection fail to capture the observed relationship between Kiremt rainfall and the large-scale Walker circulation.
Statistical models of petrol engines vehicles dynamics
Ilie, C. O.; Marinescu, M.; Alexa, O.; Vilău, R.; Grosu, D.
2017-10-01
This paper focuses on studying statistical models of vehicles dynamics. It was design and perform a one year testing program. There were used many same type cars with gasoline engines and different mileage. Experimental data were collected of onboard sensors and those on the engine test stand. A database containing data of 64th tests was created. Several mathematical modelling were developed using database and the system identification method. Each modelling is a SISO or a MISO linear predictive ARMAX (AutoRegressive-Moving-Average with eXogenous inputs) model. It represents a differential equation with constant coefficients. It were made 64th equations for each dependency like engine torque as output and engine’s load and intake manifold pressure, as inputs. There were obtained strings with 64 values for each type of model. The final models were obtained using average values of the coefficients. The accuracy of models was assessed.
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate
Directory of Open Access Journals (Sweden)
Minh Vu Trieu
2017-03-01
Full Text Available This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS, Brazilian tensile strength (BTS, rock brittleness index (BI, the distance between planes of weakness (DPW, and the alpha angle (Alpha between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP. Four (4 statistical regression models (two linear and two nonlinear are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2 of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate
Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno
2017-03-01
This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2) of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.
An explicit statistical model of learning lexical segmentation using multiple cues
Çöltekin, Ça ̆grı; Nerbonne, John; Lenci, Alessandro; Padró, Muntsa; Poibeau, Thierry; Villavicencio, Aline
2014-01-01
This paper presents an unsupervised and incremental model of learning segmentation that combines multiple cues whose use by children and adults were attested by experimental studies. The cues we exploit in this study are predictability statistics, phonotactics, lexical stress and partial lexical
Kanegae, Hiroshi; Oikawa, Takamitsu; Suzuki, Kenji; Okawara, Yukie; Kario, Kazuomi
2018-03-31
No integrated risk assessment tools that include lifestyle factors and uric acid have been developed. In accordance with the Industrial Safety and Health Law in Japan, a follow-up examination of 63 495 normotensive individuals (mean age 42.8 years) who underwent a health checkup in 2010 was conducted every year for 5 years. The primary endpoint was new-onset hypertension (systolic blood pressure [SBP]/diastolic blood pressure [DBP] ≥ 140/90 mm Hg and/or the initiation of antihypertensive medications with self-reported hypertension). During the mean 3.4 years of follow-up, 7402 participants (11.7%) developed hypertension. The prediction model included age, sex, body mass index (BMI), SBP, DBP, low-density lipoprotein cholesterol, uric acid, proteinuria, current smoking, alcohol intake, eating rate, DBP by age, and BMI by age at baseline and was created by using Cox proportional hazards models to calculate 3-year absolute risks. The derivation analysis confirmed that the model performed well both with respect to discrimination and calibration (n = 63 495; C-statistic = 0.885, 95% confidence interval [CI], 0.865-0.903; χ 2 statistic = 13.6, degree of freedom [df] = 7). In the external validation analysis, moreover, the model performed well both in its discrimination and calibration characteristics (n = 14 168; C-statistic = 0.846; 95%CI, 0.775-0.905; χ 2 statistic = 8.7, df = 7). Adding LDL cholesterol, uric acid, proteinuria, alcohol intake, eating rate, and BMI by age to the base model yielded a significantly higher C-statistic, net reclassification improvement (NRI), and integrated discrimination improvement, especially NRI non-event (NRI = 0.127, 95%CI = 0.100-0.152; NRI non-event = 0.108, 95%CI = 0.102-0.117). In conclusion, a highly precise model with good performance was developed for predicting incident hypertension using the new parameters of eating rate, uric acid, proteinuria, and BMI by age. ©2018 Wiley Periodicals, Inc.
Exclusion statistics and integrable models
International Nuclear Information System (INIS)
Mashkevich, S.
1998-01-01
The definition of exclusion statistics that was given by Haldane admits a 'statistical interaction' between distinguishable particles (multispecies statistics). For such statistics, thermodynamic quantities can be evaluated exactly; explicit expressions are presented here for cluster coefficients. Furthermore, single-species exclusion statistics is realized in one-dimensional integrable models of the Calogero-Sutherland type. The interesting questions of generalizing this correspondence to the higher-dimensional and the multispecies cases remain essentially open; however, our results provide some hints as to searches for the models in question
Prediction of Chemical Function: Model Development and Application
The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (...
4K Video Traffic Prediction using Seasonal Autoregressive Modeling
Directory of Open Access Journals (Sweden)
D. R. Marković
2017-06-01
Full Text Available From the perspective of average viewer, high definition video streams such as HD (High Definition and UHD (Ultra HD are increasing their internet presence year over year. This is not surprising, having in mind expansion of HD streaming services, such as YouTube, Netflix etc. Therefore, high definition video streams are starting to challenge network resource allocation with their bandwidth requirements and statistical characteristics. Need for analysis and modeling of this demanding video traffic has essential importance for better quality of service and experience support. In this paper we use an easy-to-apply statistical model for prediction of 4K video traffic. Namely, seasonal autoregressive modeling is applied in prediction of 4K video traffic, encoded with HEVC (High Efficiency Video Coding. Analysis and modeling were performed within R programming environment using over 17.000 high definition video frames. It is shown that the proposed methodology provides good accuracy in high definition video traffic modeling.
2013-01-01
Background The present study aimed to develop an artificial neural network (ANN) based prediction model for cardiovascular autonomic (CA) dysfunction in the general population. Methods We analyzed a previous dataset based on a population sample consisted of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN analysis. Performances of these prediction models were evaluated in the validation set. Results Univariate analysis indicated that 14 risk factors showed statistically significant association with CA dysfunction (P < 0.05). The mean area under the receiver-operating curve was 0.762 (95% CI 0.732–0.793) for prediction model developed using ANN analysis. The mean sensitivity, specificity, positive and negative predictive values were similar in the prediction models was 0.751, 0.665, 0.330 and 0.924, respectively. All HL statistics were less than 15.0. Conclusion ANN is an effective tool for developing prediction models with high value for predicting CA dysfunction among the general population. PMID:23902963
Modeling of asphalt-rubber rotational viscosity by statistical analysis and neural networks
Directory of Open Access Journals (Sweden)
Luciano Pivoto Specht
2007-03-01
Full Text Available It is of a great importance to know binders' viscosity in order to perform handling, mixing, application processes and asphalt mixes compaction in highway surfacing. This paper presents the results of viscosity measurement in asphalt-rubber binders prepared in laboratory. The binders were prepared varying the rubber content, rubber particle size, duration and temperature of mixture, all following a statistical design plan. The statistical analysis and artificial neural networks were used to create mathematical models for prediction of the binders viscosity. The comparison between experimental data and simulated results with the generated models showed best performance of the neural networks analysis in contrast to the statistic models. The results indicated that the rubber content and duration of mixture have major influence on the observed viscosity for the considered interval of parameters variation.
Preoperative prediction model of outcome after cholecystectomy for symptomatic gallstones
DEFF Research Database (Denmark)
Borly, L; Anderson, I B; Bardram, L
1999-01-01
and sonography evaluated gallbladder motility, gallstones, and gallbladder volume. Preoperative variables in patients with or without postcholecystectomy pain were compared statistically, and significant variables were combined in a logistic regression model to predict the postoperative outcome. RESULTS: Eighty...... and by the absence of 'agonizing' pain and of symptoms coinciding with pain (P model 15 of 18 predicted patients had postoperative pain (PVpos = 0.83). Of 62 patients predicted as having no pain postoperatively, 56 were pain-free (PVneg = 0.90). Overall accuracy...... was 89%. CONCLUSION: From this prospective study a model based on preoperative symptoms was developed to predict postcholecystectomy pain. Since intrastudy reclassification may give too optimistic results, the model should be validated in future studies....
Statistical modelling with quantile functions
Gilchrist, Warren
2000-01-01
Galton used quantiles more than a hundred years ago in describing data. Tukey and Parzen used them in the 60s and 70s in describing populations. Since then, the authors of many papers, both theoretical and practical, have used various aspects of quantiles in their work. Until now, however, no one put all the ideas together to form what turns out to be a general approach to statistics.Statistical Modelling with Quantile Functions does just that. It systematically examines the entire process of statistical modelling, starting with using the quantile function to define continuous distributions. The author shows that by using this approach, it becomes possible to develop complex distributional models from simple components. A modelling kit can be developed that applies to the whole model - deterministic and stochastic components - and this kit operates by adding, multiplying, and transforming distributions rather than data.Statistical Modelling with Quantile Functions adds a new dimension to the practice of stati...
A Statistical Programme Assignment Model
DEFF Research Database (Denmark)
Rosholm, Michael; Staghøj, Jonas; Svarer, Michael
When treatment effects of active labour market programmes are heterogeneous in an observable way across the population, the allocation of the unemployed into different programmes becomes a particularly important issue. In this paper, we present a statistical model designed to improve the present...... duration of unemployment spells may result if a statistical programme assignment model is introduced. We discuss several issues regarding the plementation of such a system, especially the interplay between the statistical model and case workers....
Agur, Zvia; Elishmereni, Moran; Kheifetz, Yuri
2014-01-01
Despite its great promise, personalized oncology still faces many hurdles, and it is increasingly clear that targeted drugs and molecular biomarkers alone yield only modest clinical benefit. One reason is the complex relationships between biomarkers and the patient's response to drugs, obscuring the true weight of the biomarkers in the overall patient's response. This complexity can be disentangled by computational models that integrate the effects of personal biomarkers into a simulator of drug-patient dynamic interactions, for predicting the clinical outcomes. Several computational tools have been developed for personalized oncology, notably evidence-based tools for simulating pharmacokinetics, Bayesian-estimated tools for predicting survival, etc. We describe representative statistical and mathematical tools, and discuss their merits, shortcomings and preliminary clinical validation attesting to their potential. Yet, the individualization power of mathematical models alone, or statistical models alone, is limited. More accurate and versatile personalization tools can be constructed by a new application of the statistical/mathematical nonlinear mixed effects modeling (NLMEM) approach, which until recently has been used only in drug development. Using these advanced tools, clinical data from patient populations can be integrated with mechanistic models of disease and physiology, for generating personal mathematical models. Upon a more substantial validation in the clinic, this approach will hopefully be applied in personalized clinical trials, P-trials, hence aiding the establishment of personalized medicine within the main stream of clinical oncology. © 2014 Wiley Periodicals, Inc.
Statistically Modeling I-V Characteristics of CNT-FET with LASSO
Ma, Dongsheng; Ye, Zuochang; Wang, Yan
2017-08-01
With the advent of internet of things (IOT), the need for studying new material and devices for various applications is increasing. Traditionally we build compact models for transistors on the basis of physics. But physical models are expensive and need a very long time to adjust for non-ideal effects. As the vision for the application of many novel devices is not certain or the manufacture process is not mature, deriving generalized accurate physical models for such devices is very strenuous, whereas statistical modeling is becoming a potential method because of its data oriented property and fast implementation. In this paper, one classical statistical regression method, LASSO, is used to model the I-V characteristics of CNT-FET and a pseudo-PMOS inverter simulation based on the trained model is implemented in Cadence. The normalized relative mean square prediction error of the trained model versus experiment sample data and the simulation results show that the model is acceptable for digital circuit static simulation. And such modeling methodology can extend to general devices.
Risk prediction model for colorectal cancer: National Health Insurance Corporation study, Korea.
Shin, Aesun; Joo, Jungnam; Yang, Hye-Ryung; Bak, Jeongin; Park, Yunjin; Kim, Jeongseon; Oh, Jae Hwan; Nam, Byung-Ho
2014-01-01
Incidence and mortality rates of colorectal cancer have been rapidly increasing in Korea during last few decades. Development of risk prediction models for colorectal cancer in Korean men and women is urgently needed to enhance its prevention and early detection. Gender specific five-year risk prediction models were developed for overall colorectal cancer, proximal colon cancer, distal colon cancer, colon cancer and rectal cancer. The model was developed using data from a population of 846,559 men and 479,449 women who participated in health examinations by the National Health Insurance Corporation. Examinees were 30-80 years old and free of cancer in the baseline years of 1996 and 1997. An independent population of 547,874 men and 415,875 women who participated in 1998 and 1999 examinations was used to validate the model. Model validation was done by evaluating its performance in terms of discrimination and calibration ability using the C-statistic and Hosmer-Lemeshow-type chi-square statistics. Age, body mass index, serum cholesterol, family history of cancer, and alcohol consumption were included in all models for men, whereas age, height, and meat intake frequency were included in all models for women. Models showed moderately good discrimination ability with C-statistics between 0.69 and 0.78. The C-statistics were generally higher in the models for men, whereas the calibration abilities were generally better in the models for women. Colorectal cancer risk prediction models were developed from large-scale, population-based data. Those models can be used for identifying high risk groups and developing preventive intervention strategies for colorectal cancer.
Hierarchical modelling for the environmental sciences statistical methods and applications
Clark, James S
2006-01-01
New statistical tools are changing the way in which scientists analyze and interpret data and models. Hierarchical Bayes and Markov Chain Monte Carlo methods for analysis provide a consistent framework for inference and prediction where information is heterogeneous and uncertain, processes are complicated, and responses depend on scale. Nowhere are these methods more promising than in the environmental sciences.
Hydrogen-bond coordination in organic crystal structures: statistics, predictions and applications.
Galek, Peter T A; Chisholm, James A; Pidcock, Elna; Wood, Peter A
2014-02-01
Statistical models to predict the number of hydrogen bonds that might be formed by any donor or acceptor atom in a crystal structure have been derived using organic structures in the Cambridge Structural Database. This hydrogen-bond coordination behaviour has been uniquely defined for more than 70 unique atom types, and has led to the development of a methodology to construct hypothetical hydrogen-bond arrangements. Comparing the constructed hydrogen-bond arrangements with known crystal structures shows promise in the assessment of structural stability, and some initial examples of industrially relevant polymorphs, co-crystals and hydrates are described.
Statistical prediction of nanoparticle delivery: from culture media to cell
Rowan Brown, M.; Hondow, Nicole; Brydson, Rik; Rees, Paul; Brown, Andrew P.; Summers, Huw D.
2015-04-01
The application of nanoparticles (NPs) within medicine is of great interest; their innate physicochemical characteristics provide the potential to enhance current technology, diagnostics and therapeutics. Recently a number of NP-based diagnostic and therapeutic agents have been developed for treatment of various diseases, where judicious surface functionalization is exploited to increase efficacy of administered therapeutic dose. However, quantification of heterogeneity associated with absolute dose of a nanotherapeutic (NP number), how this is trafficked across biological barriers has proven difficult to achieve. The main issue being the quantitative assessment of NP number at the spatial scale of the individual NP, data which is essential for the continued growth and development of the next generation of nanotherapeutics. Recent advances in sample preparation and the imaging fidelity of transmission electron microscopy (TEM) platforms provide information at the required spatial scale, where individual NPs can be individually identified. High spatial resolution however reduces the sample frequency and as a result dynamic biological features or processes become opaque. However, the combination of TEM data with appropriate probabilistic models provide a means to extract biophysical information that imaging alone cannot. Previously, we demonstrated that limited cell sampling via TEM can be statistically coupled to large population flow cytometry measurements to quantify exact NP dose. Here we extended this concept to link TEM measurements of NP agglomerates in cell culture media to that encapsulated within vesicles in human osteosarcoma cells. By construction and validation of a data-driven transfer function, we are able to investigate the dynamic properties of NP agglomeration through endocytosis. In particular, we statistically predict how NP agglomerates may traverse a biological barrier, detailing inter-agglomerate merging events providing the basis for
Statistical prediction of nanoparticle delivery: from culture media to cell
International Nuclear Information System (INIS)
Brown, M Rowan; Rees, Paul; Summers, Huw D; Hondow, Nicole; Brydson, Rik; Brown, Andrew P
2015-01-01
The application of nanoparticles (NPs) within medicine is of great interest; their innate physicochemical characteristics provide the potential to enhance current technology, diagnostics and therapeutics. Recently a number of NP-based diagnostic and therapeutic agents have been developed for treatment of various diseases, where judicious surface functionalization is exploited to increase efficacy of administered therapeutic dose. However, quantification of heterogeneity associated with absolute dose of a nanotherapeutic (NP number), how this is trafficked across biological barriers has proven difficult to achieve. The main issue being the quantitative assessment of NP number at the spatial scale of the individual NP, data which is essential for the continued growth and development of the next generation of nanotherapeutics. Recent advances in sample preparation and the imaging fidelity of transmission electron microscopy (TEM) platforms provide information at the required spatial scale, where individual NPs can be individually identified. High spatial resolution however reduces the sample frequency and as a result dynamic biological features or processes become opaque. However, the combination of TEM data with appropriate probabilistic models provide a means to extract biophysical information that imaging alone cannot. Previously, we demonstrated that limited cell sampling via TEM can be statistically coupled to large population flow cytometry measurements to quantify exact NP dose. Here we extended this concept to link TEM measurements of NP agglomerates in cell culture media to that encapsulated within vesicles in human osteosarcoma cells. By construction and validation of a data-driven transfer function, we are able to investigate the dynamic properties of NP agglomeration through endocytosis. In particular, we statistically predict how NP agglomerates may traverse a biological barrier, detailing inter-agglomerate merging events providing the basis for
A new method to determine the number of experimental data using statistical modeling methods
Energy Technology Data Exchange (ETDEWEB)
Jung, Jung-Ho; Kang, Young-Jin; Lim, O-Kaung; Noh, Yoojeong [Pusan National University, Busan (Korea, Republic of)
2017-06-15
For analyzing the statistical performance of physical systems, statistical characteristics of physical parameters such as material properties need to be estimated by collecting experimental data. For accurate statistical modeling, many such experiments may be required, but data are usually quite limited owing to the cost and time constraints of experiments. In this study, a new method for determining a rea- sonable number of experimental data is proposed using an area metric, after obtaining statistical models using the information on the underlying distribution, the Sequential statistical modeling (SSM) approach, and the Kernel density estimation (KDE) approach. The area metric is used as a convergence criterion to determine the necessary and sufficient number of experimental data to be acquired. The pro- posed method is validated in simulations, using different statistical modeling methods, different true models, and different convergence criteria. An example data set with 29 data describing the fatigue strength coefficient of SAE 950X is used for demonstrating the performance of the obtained statistical models that use a pre-determined number of experimental data in predicting the probability of failure for a target fatigue life.
Development of a prognostic model for predicting spontaneous singleton preterm birth.
Schaaf, Jelle M; Ravelli, Anita C J; Mol, Ben Willem J; Abu-Hanna, Ameen
2012-10-01
To develop and validate a prognostic model for prediction of spontaneous preterm birth. Prospective cohort study using data of the nationwide perinatal registry in The Netherlands. We studied 1,524,058 singleton pregnancies between 1999 and 2007. We developed a multiple logistic regression model to estimate the risk of spontaneous preterm birth based on maternal and pregnancy characteristics. We used bootstrapping techniques to internally validate our model. Discrimination (AUC), accuracy (Brier score) and calibration (calibration graphs and Hosmer-Lemeshow C-statistic) were used to assess the model's predictive performance. Our primary outcome measure was spontaneous preterm birth at model included 13 variables for predicting preterm birth. The predicted probabilities ranged from 0.01 to 0.71 (IQR 0.02-0.04). The model had an area under the receiver operator characteristic curve (AUC) of 0.63 (95% CI 0.63-0.63), the Brier score was 0.04 (95% CI 0.04-0.04) and the Hosmer Lemeshow C-statistic was significant (pvalues of predicted probability. The positive predictive value was 26% (95% CI 20-33%) for the 0.4 probability cut-off point. The model's discrimination was fair and it had modest calibration. Previous preterm birth, drug abuse and vaginal bleeding in the first half of pregnancy were the most important predictors for spontaneous preterm birth. Although not applicable in clinical practice yet, this model is a next step towards early prediction of spontaneous preterm birth that enables caregivers to start preventive therapy in women at higher risk. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Nonparametric Bayesian predictive distributions for future order statistics
Richard A. Johnson; James W. Evans; David W. Green
1999-01-01
We derive the predictive distribution for a specified order statistic, determined from a future random sample, under a Dirichlet process prior. Two variants of the approach are treated and some limiting cases studied. A practical application to monitoring the strength of lumber is discussed including choices of prior expectation and comparisons made to a Bayesian...
Predictive modelling of noise level generated during sawing of rocks
Indian Academy of Sciences (India)
This paper presents an experimental and statistical study on noise level generated during of rock sawing by circular diamond sawblades. Inﬂuence of the operating variables and rock properties on the noise level are investigated and analysed. Statistical analyses are then employed and models are built for the prediction of ...
International Nuclear Information System (INIS)
Sun, Yanan; Dong, Jizhe; Ding, Lijuan
2017-01-01
Highlights: • A day–ahead wind–thermal unit commitment model is presented. • Wind speed transfer matrix is formed to depict the sequential wind features. • Spinning reserve setting considering wind power accuracy and variation is proposed. • Verified study is performed to check the correctness of the program. - Abstract: The increasing penetration of intermittent wind power affects the secure operation of power systems and leads to a requirement of robust and economic generation scheduling. This paper presents an optimal day–ahead wind–thermal generation scheduling method that considers the statistical and predicted features of wind speeds. In this method, the statistical analysis of historical wind data, which represents the local wind regime, is first implemented. Then, according to the statistical results and the predicted wind power, the spinning reserve requirements for the scheduling period are calculated. Based on the calculated spinning reserve requirements, the wind–thermal generation scheduling is finally conducted. To validate the program, a verified study is performed on a test system. Then, numerical studies to demonstrate the effectiveness of the proposed method are conducted.
Teubner, Wera; Mehling, Anette; Schuster, Paul Xaver; Guth, Katharina; Worth, Andrew; Burton, Julien; van Ravenzwaay, Bennard; Landsiedel, Robert
2013-12-01
National legislations for the assessment of the skin sensitization potential of chemicals are increasingly based on the globally harmonized system (GHS). In this study, experimental data on 55 non-sensitizing and 45 sensitizing chemicals were evaluated according to GHS criteria and used to test the performance of computer (in silico) models for the prediction of skin sensitization. Statistic models (Vega, Case Ultra, TOPKAT), mechanistic models (Toxtree, OECD (Q)SAR toolbox, DEREK) or a hybrid model (TIMES-SS) were evaluated. Between three and nine of the substances evaluated were found in the individual training sets of various models. Mechanism based models performed better than statistical models and gave better predictivities depending on the stringency of the domain definition. Best performance was achieved by TIMES-SS, with a perfect prediction, whereby only 16% of the substances were within its reliability domain. Some models offer modules for potency; however predictions did not correlate well with the GHS sensitization subcategory derived from the experimental data. In conclusion, although mechanistic models can be used to a certain degree under well-defined conditions, at the present, the in silico models are not sufficiently accurate for broad application to predict skin sensitization potentials. Copyright © 2013 Elsevier Inc. All rights reserved.
Shape-correlated deformation statistics for respiratory motion prediction in 4D lung
Liu, Xiaoxiao; Oguz, Ipek; Pizer, Stephen M.; Mageras, Gig S.
2010-02-01
4D image-guided radiation therapy (IGRT) for free-breathing lungs is challenging due to the complicated respiratory dynamics. Effective modeling of respiratory motion is crucial to account for the motion affects on the dose to tumors. We propose a shape-correlated statistical model on dense image deformations for patient-specic respiratory motion estimation in 4D lung IGRT. Using the shape deformations of the high-contrast lungs as the surrogate, the statistical model trained from the planning CTs can be used to predict the image deformation during delivery verication time, with the assumption that the respiratory motion at both times are similar for the same patient. Dense image deformation fields obtained by diffeomorphic image registrations characterize the respiratory motion within one breathing cycle. A point-based particle optimization algorithm is used to obtain the shape models of lungs with group-wise surface correspondences. Canonical correlation analysis (CCA) is adopted in training to maximize the linear correlation between the shape variations of the lungs and the corresponding dense image deformations. Both intra- and inter-session CT studies are carried out on a small group of lung cancer patients and evaluated in terms of the tumor location accuracies. The results suggest potential applications using the proposed method.
Predictive models for PEM-electrolyzer performance using adaptive neuro-fuzzy inference systems
Energy Technology Data Exchange (ETDEWEB)
Becker, Steffen [University of Tasmania, Hobart 7001, Tasmania (Australia); Karri, Vishy [Australian College of Kuwait (Kuwait)
2010-09-15
Predictive models were built using neural network based Adaptive Neuro-Fuzzy Inference Systems for hydrogen flow rate, electrolyzer system-efficiency and stack-efficiency respectively. A comprehensive experimental database forms the foundation for the predictive models. It is argued that, due to the high costs associated with the hydrogen measuring equipment; these reliable predictive models can be implemented as virtual sensors. These models can also be used on-line for monitoring and safety of hydrogen equipment. The quantitative accuracy of the predictive models is appraised using statistical techniques. These mathematical models are found to be reliable predictive tools with an excellent accuracy of {+-}3% compared with experimental values. The predictive nature of these models did not show any significant bias to either over prediction or under prediction. These predictive models, built on a sound mathematical and quantitative basis, can be seen as a step towards establishing hydrogen performance prediction models as generic virtual sensors for wider safety and monitoring applications. (author)
Moment based model predictive control for systems with additive uncertainty
Saltik, M.B.; Ozkan, L.; Weiland, S.; Ludlage, J.H.A.
2017-01-01
In this paper, we present a model predictive control (MPC) strategy based on the moments of the state variables and the cost functional. The statistical properties of the state predictions are calculated through the open loop iteration of dynamics and used in the formulation of MPC cost function. We
Directory of Open Access Journals (Sweden)
M.M. Mohie El-Din
2011-10-01
Full Text Available In this paper, two sample Bayesian prediction intervals for order statistics (OS are obtained. This prediction is based on a certain class of the inverse exponential-type distributions using a right censored sample. A general class of prior density functions is used and the predictive cumulative function is obtained in the two samples case. The class of the inverse exponential-type distributions includes several important distributions such the inverse Weibull distribution, the inverse Burr distribution, the loglogistic distribution, the inverse Pareto distribution and the inverse paralogistic distribution. Special cases of the inverse Weibull model such as the inverse exponential model and the inverse Rayleigh model are considered.
A Statistical Evaluation of Atmosphere-Ocean General Circulation Models: Complexity vs. Simplicity
Robert K. Kaufmann; David I. Stern
2004-01-01
The principal tools used to model future climate change are General Circulation Models which are deterministic high resolution bottom-up models of the global atmosphere-ocean system that require large amounts of supercomputer time to generate results. But are these models a cost-effective way of predicting future climate change at the global level? In this paper we use modern econometric techniques to evaluate the statistical adequacy of three general circulation models (GCMs) by testing thre...
Predicting Statistical Distributions of Footbridge Vibrations
DEFF Research Database (Denmark)
Pedersen, Lars; Frier, Christian
2009-01-01
The paper considers vibration response of footbridges to pedestrian loading. Employing Newmark and Monte Carlo simulation methods, a statistical distribution of bridge vibration levels is calculated modelling walking parameters such as step frequency and stride length as random variables...
BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.
Kaur, Harpreet; Raghava, G P S
2002-03-01
beta-turns play an important role from a structural and functional point of view. beta-turns are the most common type of non-repetitive structures in proteins and comprise on average, 25% of the residues. In the past numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows to predict different types of beta-TURNS e.g. type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/
Predictive modeling of coral disease distribution within a reef system.
Directory of Open Access Journals (Sweden)
Gareth J Williams
2010-02-01
Full Text Available Diseases often display complex and distinct associations with their environment due to differences in etiology, modes of transmission between hosts, and the shifting balance between pathogen virulence and host resistance. Statistical modeling has been underutilized in coral disease research to explore the spatial patterns that result from this triad of interactions. We tested the hypotheses that: 1 coral diseases show distinct associations with multiple environmental factors, 2 incorporating interactions (synergistic collinearities among environmental variables is important when predicting coral disease spatial patterns, and 3 modeling overall coral disease prevalence (the prevalence of multiple diseases as a single proportion value will increase predictive error relative to modeling the same diseases independently. Four coral diseases: Porites growth anomalies (PorGA, Porites tissue loss (PorTL, Porites trematodiasis (PorTrem, and Montipora white syndrome (MWS, and their interactions with 17 predictor variables were modeled using boosted regression trees (BRT within a reef system in Hawaii. Each disease showed distinct associations with the predictors. Environmental predictors showing the strongest overall associations with the coral diseases were both biotic and abiotic. PorGA was optimally predicted by a negative association with turbidity, PorTL and MWS by declines in butterflyfish and juvenile parrotfish abundance respectively, and PorTrem by a modal relationship with Porites host cover. Incorporating interactions among predictor variables contributed to the predictive power of our models, particularly for PorTrem. Combining diseases (using overall disease prevalence as the model response, led to an average six-fold increase in cross-validation predictive deviance over modeling the diseases individually. We therefore recommend coral diseases to be modeled separately, unless known to have etiologies that respond in a similar manner to
Chakraborty, Prithviraj; Parcha, Versha; Chakraborty, Debarupa D; Ghosh, Amitava
2016-01-01
Buccoadhesive wafer dosage form containing Loratadine is formulated utilizing Formulation by Design (FbD) approach incorporating sodium alginate and lactose monohydrate as independent variable employing solvent casting method. The wafers were statistically optimized using Response Surface Methodology (RSM) and Artificial Neural Network algorithm (ANN) for predicting physicochemical and physico-mechanical properties of the wafers as responses. Morphologically wafers were tested using SEM. Quick disintegration of the samples was examined employing Optical Contact Angle (OCA). The comparison of the predictability of RSM and ANN showed a high prognostic capacity of RSM model over ANN model in forecasting mechanical and physicochemical properties of the wafers. The in vivo assessment of the optimized buccoadhesive wafer exhibits marked increase in bioavailability justifying the administration of Loratadine through buccal route, bypassing hepatic first pass metabolism.
Risk prediction model for colorectal cancer: National Health Insurance Corporation study, Korea.
Directory of Open Access Journals (Sweden)
Aesun Shin
Full Text Available PURPOSE: Incidence and mortality rates of colorectal cancer have been rapidly increasing in Korea during last few decades. Development of risk prediction models for colorectal cancer in Korean men and women is urgently needed to enhance its prevention and early detection. METHODS: Gender specific five-year risk prediction models were developed for overall colorectal cancer, proximal colon cancer, distal colon cancer, colon cancer and rectal cancer. The model was developed using data from a population of 846,559 men and 479,449 women who participated in health examinations by the National Health Insurance Corporation. Examinees were 30-80 years old and free of cancer in the baseline years of 1996 and 1997. An independent population of 547,874 men and 415,875 women who participated in 1998 and 1999 examinations was used to validate the model. Model validation was done by evaluating its performance in terms of discrimination and calibration ability using the C-statistic and Hosmer-Lemeshow-type chi-square statistics. RESULTS: Age, body mass index, serum cholesterol, family history of cancer, and alcohol consumption were included in all models for men, whereas age, height, and meat intake frequency were included in all models for women. Models showed moderately good discrimination ability with C-statistics between 0.69 and 0.78. The C-statistics were generally higher in the models for men, whereas the calibration abilities were generally better in the models for women. CONCLUSIONS: Colorectal cancer risk prediction models were developed from large-scale, population-based data. Those models can be used for identifying high risk groups and developing preventive intervention strategies for colorectal cancer.
Diffeomorphic Statistical Deformation Models
DEFF Research Database (Denmark)
Hansen, Michael Sass; Hansen, Mads/Fogtman; Larsen, Rasmus
2007-01-01
In this paper we present a new method for constructing diffeomorphic statistical deformation models in arbitrary dimensional images with a nonlinear generative model and a linear parameter space. Our deformation model is a modified version of the diffeomorphic model introduced by Cootes et al....... The modifications ensure that no boundary restriction has to be enforced on the parameter space to prevent folds or tears in the deformation field. For straightforward statistical analysis, principal component analysis and sparse methods, we assume that the parameters for a class of deformations lie on a linear...... with ground truth in form of manual expert annotations, and compared to Cootes's model. We anticipate applications in unconstrained diffeomorphic synthesis of images, e.g. for tracking, segmentation, registration or classification purposes....
Prediction skill of rainstorm events over India in the TIGGE weather prediction models
Karuna Sagar, S.; Rajeevan, M.; Vijaya Bhaskara Rao, S.; Mitra, A. K.
2017-12-01
Extreme rainfall events pose a serious threat of leading to severe floods in many countries worldwide. Therefore, advance prediction of its occurrence and spatial distribution is very essential. In this paper, an analysis has been made to assess the skill of numerical weather prediction models in predicting rainstorms over India. Using gridded daily rainfall data set and objective criteria, 15 rainstorms were identified during the monsoon season (June to September). The analysis was made using three TIGGE (THe Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble) models. The models considered are the European Centre for Medium-Range Weather Forecasts (ECMWF), National Centre for Environmental Prediction (NCEP) and the UK Met Office (UKMO). Verification of the TIGGE models for 43 observed rainstorm days from 15 rainstorm events has been made for the period 2007-2015. The comparison reveals that rainstorm events are predictable up to 5 days in advance, however with a bias in spatial distribution and intensity. The statistical parameters like mean error (ME) or Bias, root mean square error (RMSE) and correlation coefficient (CC) have been computed over the rainstorm region using the multi-model ensemble (MME) mean. The study reveals that the spread is large in ECMWF and UKMO followed by the NCEP model. Though the ensemble spread is quite small in NCEP, the ensemble member averages are not well predicted. The rank histograms suggest that the forecasts are under prediction. The modified Contiguous Rain Area (CRA) technique was used to verify the spatial as well as the quantitative skill of the TIGGE models. Overall, the contribution from the displacement and pattern errors to the total RMSE is found to be more in magnitude. The volume error increases from 24 hr forecast to 48 hr forecast in all the three models.
DEFF Research Database (Denmark)
Siriwardhana, Chathura; Fang, Rui; Salanti, Ali
2017-01-01
Background Plasmodium falciparum infections are especially severe in pregnant women because infected erythrocytes (IE) express VAR2CSA, a ligand that binds to placental trophoblasts, causing IE to accumulate in the placenta. Resulting inflammation and pathology increases a woman’s risk of anemia...... to 28 malarial antigens and used the data to develop statistical models for predicting if a woman has sufficient immunity to prevent PM. Methods Archival plasma samples from 1377 women were screened in a bead-based multiplex assay for Ab to 17 VAR2CSA-associated antigens (full length VAR2CSA (FV2), DBL...... in the following seven statistical approaches: logistic regression full model, logistic regression reduced model, recursive partitioning, random forests, linear discriminant analysis, quadratic discriminant analysis, and support vector machine. Results The best and simplest model proved to be the logistic...
Atmospheric corrosion: statistical validation of models
International Nuclear Information System (INIS)
Diaz, V.; Martinez-Luaces, V.; Guineo-Cobs, G.
2003-01-01
In this paper we discuss two different methods for validation of regression models, applied to corrosion data. One of them is based on the correlation coefficient and the other one is the statistical test of lack of fit. Both methods are used here to analyse fitting of bi logarithmic model in order to predict corrosion for very low carbon steel substrates in rural and urban-industrial atmospheres in Uruguay. Results for parameters A and n of the bi logarithmic model are reported here. For this purpose, all repeated values were used instead of using average values as usual. Modelling is carried out using experimental data corresponding to steel substrates under the same initial meteorological conditions ( in fact, they are put in the rack at the same time). Results of correlation coefficient are compared with the lack of it tested at two different signification levels (α=0.01 and α=0.05). Unexpected differences between them are explained and finally, it is possible to conclude, at least in the studied atmospheres, that the bi logarithmic model does not fit properly the experimental data. (Author) 18 refs
Stochastic Spatial Models in Ecology: A Statistical Physics Approach
Pigolotti, Simone; Cencini, Massimo; Molina, Daniel; Muñoz, Miguel A.
2017-11-01
Ecosystems display a complex spatial organization. Ecologists have long tried to characterize them by looking at how different measures of biodiversity change across spatial scales. Ecological neutral theory has provided simple predictions accounting for general empirical patterns in communities of competing species. However, while neutral theory in well-mixed ecosystems is mathematically well understood, spatial models still present several open problems, limiting the quantitative understanding of spatial biodiversity. In this review, we discuss the state of the art in spatial neutral theory. We emphasize the connection between spatial ecological models and the physics of non-equilibrium phase transitions and how concepts developed in statistical physics translate in population dynamics, and vice versa. We focus on non-trivial scaling laws arising at the critical dimension D = 2 of spatial neutral models, and their relevance for biological populations inhabiting two-dimensional environments. We conclude by discussing models incorporating non-neutral effects in the form of spatial and temporal disorder, and analyze how their predictions deviate from those of purely neutral theories.
International Nuclear Information System (INIS)
Gridneva, S.A.; Rus'kin, V.I.
1980-01-01
Basic features of the statistical model of multiple hadron production based on microcanonical distribution and taking into account the laws of conservation of total angular momentum, isotopic spin, p-, G-, C-eveness and Bose-Einstein statistics requirements are given. The model predictions are compared with experimental data on anti NN annihilation at rest and e + e - annihilation in hadrons at annihilation total energy from 2 to 3 GeV [ru
Andersson, C. David; Hillgren, J. Mikael; Lindgren, Cecilia; Qian, Weixing; Akfur, Christine; Berg, Lotta; Ekström, Fredrik; Linusson, Anna
2015-03-01
Scientific disciplines such as medicinal- and environmental chemistry, pharmacology, and toxicology deal with the questions related to the effects small organic compounds exhort on biological targets and the compounds' physicochemical properties responsible for these effects. A common strategy in this endeavor is to establish structure-activity relationships (SARs). The aim of this work was to illustrate benefits of performing a statistical molecular design (SMD) and proper statistical analysis of the molecules' properties before SAR and quantitative structure-activity relationship (QSAR) analysis. Our SMD followed by synthesis yielded a set of inhibitors of the enzyme acetylcholinesterase (AChE) that had very few inherent dependencies between the substructures in the molecules. If such dependencies exist, they cause severe errors in SAR interpretation and predictions by QSAR-models, and leave a set of molecules less suitable for future decision-making. In our study, SAR- and QSAR models could show which molecular sub-structures and physicochemical features that were advantageous for the AChE inhibition. Finally, the QSAR model was used for the prediction of the inhibition of AChE by an external prediction set of molecules. The accuracy of these predictions was asserted by statistical significance tests and by comparisons to simple but relevant reference models.
Abut, Fatih; Akay, Mehmet Fatih
2015-01-01
Maximal oxygen uptake (VO2max) indicates how many milliliters of oxygen the body can consume in a state of intense exercise per minute. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, a lot of studies have been conducted in the last years to predict VO2max of various target audiences, ranging from soccer athletes, nonexpert swimmers, cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview about the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in related literature in terms of two well-known metrics, namely, multiple correlation coefficient (R) and standard error of estimate. The survey results reveal that with respect to regression methods used to develop prediction models, support vector machine, in general, shows better performance than other methods, whereas multiple linear regression exhibits the worst performance.
Tornadoes and related damage costs: statistical modeling with a semi-Markov approach
Corini, Chiara; D'Amico, Guglielmo; Petroni, Filippo; Prattico, Flavio; Manca, Raimondo
2015-01-01
We propose a statistical approach to tornadoes modeling for predicting and simulating occurrences of tornadoes and accumulated cost distributions over a time interval. This is achieved by modeling the tornadoes intensity, measured with the Fujita scale, as a stochastic process. Since the Fujita scale divides tornadoes intensity into six states, it is possible to model the tornadoes intensity by using Markov and semi-Markov models. We demonstrate that the semi-Markov approach is able to reprod...
A statistical mechanics model for free-for-all airplane passenger boarding
Steffen, Jason H.
2008-12-01
I discuss a model for free-for-all passenger boarding which is employed by some discount air carriers. The model is based on the principles of statistical mechanics, where each seat in the aircraft has an associated energy which reflects the preferences of travelers. As each passenger enters the airplane they select their seats using Boltzmann statistics, proceed to that location, load their luggage, sit down, and the partition function seen by remaining passengers is modified to reflect this fact. I discuss the various model parameters and make qualitative comparisons of this passenger boarding model with those that involve assigned seats. The model can be used to predict the probability that certain seats will be occupied at different times during the boarding process. These results might provide a useful description of this boarding method. The model is a relatively unusual application of undergraduate level physics and describes a situation familiar to many students and faculty.
A statistical mechanics model for free-for-all airplane passenger boarding
International Nuclear Information System (INIS)
Steffen, Jason H.; Fermilab
2008-01-01
I discuss a model for free-for-all passenger boarding which is employed by some discount air carriers. The model is based on the principles of statistical mechanics where each seat in the aircraft has an associated energy which reflects the preferences of travelers. As each passenger enters the airplane they select their seats using Boltzmann statistics, proceed to that location, load their luggage, sit down, and the partition function seen by remaining passengers is modified to reflect this fact. I discuss the various model parameters and make qualitative comparisons of this passenger boarding model with those that involve assigned seats. The model can be used to predict the probability that certain seats will be occupied at different times during the boarding process. These results might provide a useful description of this boarding method. The model is a relatively unusual application of undergraduate level physics and describes a situation familiar to many students and faculty
A statistical mechanics model for free-for-all airplane passenger boarding
Energy Technology Data Exchange (ETDEWEB)
Steffen, Jason H.; /Fermilab
2008-08-01
I discuss a model for free-for-all passenger boarding which is employed by some discount air carriers. The model is based on the principles of statistical mechanics where each seat in the aircraft has an associated energy which reflects the preferences of travelers. As each passenger enters the airplane they select their seats using Boltzmann statistics, proceed to that location, load their luggage, sit down, and the partition function seen by remaining passengers is modified to reflect this fact. I discuss the various model parameters and make qualitative comparisons of this passenger boarding model with those that involve assigned seats. The model can be used to predict the probability that certain seats will be occupied at different times during the boarding process. These results might provide a useful description of this boarding method. The model is a relatively unusual application of undergraduate level physics and describes a situation familiar to many students and faculty.
Directory of Open Access Journals (Sweden)
Abut F
2015-08-01
Full Text Available Fatih Abut, Mehmet Fatih AkayDepartment of Computer Engineering, Çukurova University, Adana, TurkeyAbstract: Maximal oxygen uptake (VO2max indicates how many milliliters of oxygen the body can consume in a state of intense exercise per minute. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, a lot of studies have been conducted in the last years to predict VO2max of various target audiences, ranging from soccer athletes, nonexpert swimmers, cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview about the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in related literature in terms of two well-known metrics, namely, multiple correlation coefficient (R and standard error of estimate. The survey results reveal that with respect to regression methods used to develop prediction models, support vector machine, in general, shows better performance than other methods, whereas multiple linear regression exhibits the worst performance
Statistical characterization of pitting corrosion process and life prediction
International Nuclear Information System (INIS)
Sheikh, A.K.; Younas, M.
1995-01-01
In order to prevent corrosion failures of machines and structures, it is desirable to know in advance when the corrosion damage will take place, and appropriate measures are needed to mitigate the damage. The corrosion predictions are needed both at development as well as operational stage of machines and structures. There are several forms of corrosion process through which varying degrees of damage can occur. Under certain conditions these corrosion processes at alone and in other set of conditions, several of these processes may occur simultaneously. For a certain type of machine elements and structures, such as gears, bearing, tubes, pipelines, containers, storage tanks etc., are particularly prone to pitting corrosion which is an insidious form of corrosion. The corrosion predictions are usually based on experimental results obtained from test coupons and/or field experiences of similar machines or parts of a structure. Considerable scatter is observed in corrosion processes. The probabilities nature and kinetics of pitting process makes in necessary to use statistical method to forecast the residual life of machine of structures. The focus of this paper is to characterization pitting as a time-dependent random process, and using this characterization the prediction of life to reach a critical level of pitting damage can be made. Using several data sets from literature on pitting corrosion, the extreme value modeling of pitting corrosion process, the evolution of the extreme value distribution in time, and their relationship to the reliability of machines and structure are explained. (author)
Nolan, Bernard T.; Fienen, Michael N.; Lorenz, David L.
2015-01-01
We used a statistical learning framework to evaluate the ability of three machine-learning methods to predict nitrate concentration in shallow groundwater of the Central Valley, California: boosted regression trees (BRT), artificial neural networks (ANN), and Bayesian networks (BN). Machine learning methods can learn complex patterns in the data but because of overfitting may not generalize well to new data. The statistical learning framework involves cross-validation (CV) training and testing data and a separate hold-out data set for model evaluation, with the goal of optimizing predictive performance by controlling for model overfit. The order of prediction performance according to both CV testing R2 and that for the hold-out data set was BRT > BN > ANN. For each method we identified two models based on CV testing results: that with maximum testing R2 and a version with R2 within one standard error of the maximum (the 1SE model). The former yielded CV training R2 values of 0.94–1.0. Cross-validation testing R2 values indicate predictive performance, and these were 0.22–0.39 for the maximum R2 models and 0.19–0.36 for the 1SE models. Evaluation with hold-out data suggested that the 1SE BRT and ANN models predicted better for an independent data set compared with the maximum R2 versions, which is relevant to extrapolation by mapping. Scatterplots of predicted vs. observed hold-out data obtained for final models helped identify prediction bias, which was fairly pronounced for ANN and BN. Lastly, the models were compared with multiple linear regression (MLR) and a previous random forest regression (RFR) model. Whereas BRT results were comparable to RFR, MLR had low hold-out R2 (0.07) and explained less than half the variation in the training data. Spatial patterns of predictions by the final, 1SE BRT model agreed reasonably well with previously observed patterns of nitrate occurrence in groundwater of the Central Valley.
Statistical modeling for degradation data
Lio, Yuhlong; Ng, Hon; Tsai, Tzong-Ru
2017-01-01
This book focuses on the statistical aspects of the analysis of degradation data. In recent years, degradation data analysis has come to play an increasingly important role in different disciplines such as reliability, public health sciences, and finance. For example, information on products’ reliability can be obtained by analyzing degradation data. In addition, statistical modeling and inference techniques have been developed on the basis of different degradation measures. The book brings together experts engaged in statistical modeling and inference, presenting and discussing important recent advances in degradation data analysis and related applications. The topics covered are timely and have considerable potential to impact both statistics and reliability engineering.
Exclusion statistics and integrable models
International Nuclear Information System (INIS)
Mashkevich, S.
1998-01-01
The definition of exclusion statistics, as given by Haldane, allows for a statistical interaction between distinguishable particles (multi-species statistics). The thermodynamic quantities for such statistics ca be evaluated exactly. The explicit expressions for the cluster coefficients are presented. Furthermore, single-species exclusion statistics is realized in one-dimensional integrable models. The interesting questions of generalizing this correspondence onto the higher-dimensional and the multi-species cases remain essentially open
A survey on computational intelligence approaches for predictive modeling in prostate cancer
Cosma, G; Brown, D; Archer, M; Khan, M; Pockley, AG
2017-01-01
Predictive modeling in medicine involves the development of computational models which are capable of analysing large amounts of data in order to predict healthcare outcomes for individual patients. Computational intelligence approaches are suitable when the data to be modelled are too complex forconventional statistical techniques to process quickly and eciently. These advanced approaches are based on mathematical models that have been especially developed for dealing with the uncertainty an...
Wessler, Benjamin S; Lai Yh, Lana; Kramer, Whitney; Cangelosi, Michael; Raman, Gowri; Lutz, Jennifer S; Kent, David M
2015-07-01
Clinical prediction models (CPMs) estimate the probability of clinical outcomes and hold the potential to improve decision making and individualize care. For patients with cardiovascular disease, there are numerous CPMs available although the extent of this literature is not well described. We conducted a systematic review for articles containing CPMs for cardiovascular disease published between January 1990 and May 2012. Cardiovascular disease includes coronary heart disease, heart failure, arrhythmias, stroke, venous thromboembolism, and peripheral vascular disease. We created a novel database and characterized CPMs based on the stage of development, population under study, performance, covariates, and predicted outcomes. There are 796 models included in this database. The number of CPMs published each year is increasing steadily over time. Seven hundred seventeen (90%) are de novo CPMs, 21 (3%) are CPM recalibrations, and 58 (7%) are CPM adaptations. This database contains CPMs for 31 index conditions, including 215 CPMs for patients with coronary artery disease, 168 CPMs for population samples, and 79 models for patients with heart failure. There are 77 distinct index/outcome pairings. Of the de novo models in this database, 450 (63%) report a c-statistic and 259 (36%) report some information on calibration. There is an abundance of CPMs available for a wide assortment of cardiovascular disease conditions, with substantial redundancy in the literature. The comparative performance of these models, the consistency of effects and risk estimates across models and the actual and potential clinical impact of this body of literature is poorly understood. © 2015 American Heart Association, Inc.
REMAINING LIFE TIME PREDICTION OF BEARINGS USING K-STAR ALGORITHM – A STATISTICAL APPROACH
Directory of Open Access Journals (Sweden)
R. SATISHKUMAR
2017-01-01
Full Text Available The role of bearings is significant in reducing the down time of all rotating machineries. The increasing trend of bearing failures in recent times has triggered the need and importance of deployment of condition monitoring. There are multiple factors associated to a bearing failure while it is in operation. Hence, a predictive strategy is required to evaluate the current state of the bearings in operation. In past, predictive models with regression techniques were widely used for bearing lifetime estimations. The Objective of this paper is to estimate the remaining useful life of bearings through a machine learning approach. The ultimate objective of this study is to strengthen the predictive maintenance. The present study was done using classification approach following the concepts of machine learning and a predictive model was built to calculate the residual lifetime of bearings in operation. Vibration signals were acquired on a continuous basis from an experiment wherein the bearings are made to run till it fails naturally. It should be noted that the experiment was carried out with new bearings at pre-defined load and speed conditions until the bearing fails on its own. In the present work, statistical features were deployed and feature selection process was carried out using J48 decision tree and selected features were used to develop the prognostic model. The K-Star classification algorithm, a supervised machine learning technique is made use of in building a predictive model to estimate the lifetime of bearings. The performance of classifier was cross validated with distinct data. The result shows that the K-Star classification model gives 98.56% classification accuracy with selected features.
An Intelligent Model for Stock Market Prediction
Directory of Open Access Journals (Sweden)
IbrahimM. Hamed
2012-08-01
Full Text Available This paper presents an intelligent model for stock market signal prediction using Multi-Layer Perceptron (MLP Artificial Neural Networks (ANN. Blind source separation technique, from signal processing, is integrated with the learning phase of the constructed baseline MLP ANN to overcome the problems of prediction accuracy and lack of generalization. Kullback Leibler Divergence (KLD is used, as a learning algorithm, because it converges fast and provides generalization in the learning mechanism. Both accuracy and efficiency of the proposed model were confirmed through the Microsoft stock, from wall-street market, and various data sets, from different sectors of the Egyptian stock market. In addition, sensitivity analysis was conducted on the various parameters of the model to ensure the coverage of the generalization issue. Finally, statistical significance was examined using ANOVA test.
The non-equilibrium statistical mechanics of a simple geophysical fluid dynamics model
Verkley, Wim; Severijns, Camiel
2014-05-01
Lorenz [1] has devised a dynamical system that has proved to be very useful as a benchmark system in geophysical fluid dynamics. The system in its simplest form consists of a periodic array of variables that can be associated with an atmospheric field on a latitude circle. The system is driven by a constant forcing, is damped by linear friction and has a simple advection term that causes the model to behave chaotically if the forcing is large enough. Our aim is to predict the statistics of Lorenz' model on the basis of a given average value of its total energy - obtained from a numerical integration - and the assumption of statistical stationarity. Our method is the principle of maximum entropy [2] which in this case reads: the information entropy of the system's probability density function shall be maximal under the constraints of normalization, a given value of the average total energy and statistical stationarity. Statistical stationarity is incorporated approximately by using `stationarity constraints', i.e., by requiring that the average first and possibly higher-order time-derivatives of the energy are zero in the maximization of entropy. The analysis [3] reveals that, if the first stationarity constraint is used, the resulting probability density function rather accurately reproduces the statistics of the individual variables. If the second stationarity constraint is used as well, the correlations between the variables are also reproduced quite adequately. The method can be generalized straightforwardly and holds the promise of a viable non-equilibrium statistical mechanics of the forced-dissipative systems of geophysical fluid dynamics. [1] E.N. Lorenz, 1996: Predictability - A problem partly solved, in Proc. Seminar on Predictability (ECMWF, Reading, Berkshire, UK), Vol. 1, pp. 1-18. [2] E.T. Jaynes, 2003: Probability Theory - The Logic of Science (Cambridge University Press, Cambridge). [3] W.T.M. Verkley and C.A. Severijns, 2014: The maximum entropy
Statistical predictions from anarchic field theory landscapes
International Nuclear Information System (INIS)
Balasubramanian, Vijay; Boer, Jan de; Naqvi, Asad
2010-01-01
Consistent coupling of effective field theories with a quantum theory of gravity appears to require bounds on the rank of the gauge group and the amount of matter. We consider landscapes of field theories subject to such to boundedness constraints. We argue that appropriately 'coarse-grained' aspects of the randomly chosen field theory in such landscapes, such as the fraction of gauge groups with ranks in a given range, can be statistically predictable. To illustrate our point we show how the uniform measures on simple classes of N=1 quiver gauge theories localize in the vicinity of theories with certain typical structures. Generically, this approach would predict a high energy theory with very many gauge factors, with the high rank factors largely decoupled from the low rank factors if we require asymptotic freedom for the latter.
Hart, Carl R; Reznicek, Nathan J; Wilson, D Keith; Pettit, Chris L; Nykaza, Edward T
2016-05-01
Many outdoor sound propagation models exist, ranging from highly complex physics-based simulations to simplified engineering calculations, and more recently, highly flexible statistical learning methods. Several engineering and statistical learning models are evaluated by using a particular physics-based model, namely, a Crank-Nicholson parabolic equation (CNPE), as a benchmark. Narrowband transmission loss values predicted with the CNPE, based upon a simulated data set of meteorological, boundary, and source conditions, act as simulated observations. In the simulated data set sound propagation conditions span from downward refracting to upward refracting, for acoustically hard and soft boundaries, and low frequencies. Engineering models used in the comparisons include the ISO 9613-2 method, Harmonoise, and Nord2000 propagation models. Statistical learning methods used in the comparisons include bagged decision tree regression, random forest regression, boosting regression, and artificial neural network models. Computed skill scores are relative to sound propagation in a homogeneous atmosphere over a rigid ground. Overall skill scores for the engineering noise models are 0.6%, -7.1%, and 83.8% for the ISO 9613-2, Harmonoise, and Nord2000 models, respectively. Overall skill scores for the statistical learning models are 99.5%, 99.5%, 99.6%, and 99.6% for bagged decision tree, random forest, boosting, and artificial neural network regression models, respectively.
Quantifying uncertainty for predictions with model error in non-Gaussian systems with intermittency
International Nuclear Information System (INIS)
Branicki, Michal; Majda, Andrew J
2012-01-01
This paper discusses a range of important mathematical issues arising in applications of a newly emerging stochastic-statistical framework for quantifying and mitigating uncertainties associated with prediction of partially observed and imperfectly modelled complex turbulent dynamical systems. The need for such a framework is particularly severe in climate science where the true climate system is vastly more complicated than any conceivable model; however, applications in other areas, such as neural networks and materials science, are just as important. The mathematical tools employed here rely on empirical information theory and fluctuation–dissipation theorems (FDTs) and it is shown that they seamlessly combine into a concise systematic framework for measuring and optimizing consistency and sensitivity of imperfect models. Here, we utilize a simple statistically exactly solvable ‘perfect’ system with intermittent hidden instabilities and with time-periodic features to address a number of important issues encountered in prediction of much more complex dynamical systems. These problems include the role and mitigation of model error due to coarse-graining, moment closure approximations, and the memory of initial conditions in producing short, medium and long-range predictions. Importantly, based on a suite of increasingly complex imperfect models of the perfect test system, we show that the predictive skill of the imperfect models and their sensitivity to external perturbations is improved by ensuring their consistency on the statistical attractor (i.e. the climate) with the perfect system. Furthermore, the discussed link between climate fidelity and sensitivity via the FDT opens up an enticing prospect of developing techniques for improving imperfect model sensitivity based on specific tests carried out in the training phase of the unperturbed statistical equilibrium/climate. (paper)
Using machine learning, neural networks and statistics to predict bankruptcy
Pompe, P.P.M.; Feelders, A.J.; Feelders, A.J.
1997-01-01
Recent literature strongly suggests that machine learning approaches to classification outperform "classical" statistical methods. We make a comparison between the performance of linear discriminant analysis, classification trees, and neural networks in predicting corporate bankruptcy. Linear
Statistical methods for mechanistic model validation: Salt Repository Project
International Nuclear Information System (INIS)
Eggett, D.L.
1988-07-01
As part of the Department of Energy's Salt Repository Program, Pacific Northwest Laboratory (PNL) is studying the emplacement of nuclear waste containers in a salt repository. One objective of the SRP program is to develop an overall waste package component model which adequately describes such phenomena as container corrosion, waste form leaching, spent fuel degradation, etc., which are possible in the salt repository environment. The form of this model will be proposed, based on scientific principles and relevant salt repository conditions with supporting data. The model will be used to predict the future characteristics of the near field environment. This involves several different submodels such as the amount of time it takes a brine solution to contact a canister in the repository, how long it takes a canister to corrode and expose its contents to the brine, the leach rate of the contents of the canister, etc. These submodels are often tested in a laboratory and should be statistically validated (in this context, validate means to demonstrate that the model adequately describes the data) before they can be incorporated into the waste package component model. This report describes statistical methods for validating these models. 13 refs., 1 fig., 3 tabs
Energy Technology Data Exchange (ETDEWEB)
Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com [Academy of Scientific and Innovative Research, Council of Scientific and Industrial Research, New Delhi (India); Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001 (India); Gupta, Shikha; Rai, Premanjali [Academy of Scientific and Innovative Research, Council of Scientific and Industrial Research, New Delhi (India); Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001 (India)
2013-10-15
Robust global models capable of discriminating positive and non-positive carcinogens; and predicting carcinogenic potency of chemicals in rodents were developed. The dataset of 834 structurally diverse chemicals extracted from Carcinogenic Potency Database (CPDB) was used which contained 466 positive and 368 non-positive carcinogens. Twelve non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals and nonlinearity in the data were evaluated using Tanimoto similarity index and Brock–Dechert–Scheinkman statistics. Probabilistic neural network (PNN) and generalized regression neural network (GRNN) models were constructed for classification and function optimization problems using the carcinogenicity end point in rat. Validation of the models was performed using the internal and external procedures employing a wide series of statistical checks. PNN constructed using five descriptors rendered classification accuracy of 92.09% in complete rat data. The PNN model rendered classification accuracies of 91.77%, 80.70% and 92.08% in mouse, hamster and pesticide data, respectively. The GRNN constructed with nine descriptors yielded correlation coefficient of 0.896 between the measured and predicted carcinogenic potency with mean squared error (MSE) of 0.44 in complete rat data. The rat carcinogenicity model (GRNN) applied to the mouse and hamster data yielded correlation coefficient and MSE of 0.758, 0.71 and 0.760, 0.46, respectively. The results suggest for wide applicability of the inter-species models in predicting carcinogenic potency of chemicals. Both the PNN and GRNN (inter-species) models constructed here can be useful tools in predicting the carcinogenicity of new chemicals for regulatory purposes. - Graphical abstract: Figure (a) shows classification accuracies (positive and non-positive carcinogens) in rat, mouse, hamster, and pesticide data yielded by optimal PNN model. Figure (b) shows generalization and predictive
Singh, Kunwar P; Gupta, Shikha; Ojha, Priyanka; Rai, Premanjali
2013-04-01
The research aims to develop artificial intelligence (AI)-based model to predict the adsorptive removal of 2-chlorophenol (CP) in aqueous solution by coconut shell carbon (CSC) using four operational variables (pH of solution, adsorbate concentration, temperature, and contact time), and to investigate their effects on the adsorption process. Accordingly, based on a factorial design, 640 batch experiments were conducted. Nonlinearities in experimental data were checked using Brock-Dechert-Scheimkman (BDS) statistics. Five nonlinear models were constructed to predict the adsorptive removal of CP in aqueous solution by CSC using four variables as input. Performances of the constructed models were evaluated and compared using statistical criteria. BDS statistics revealed strong nonlinearity in experimental data. Performance of all the models constructed here was satisfactory. Radial basis function network (RBFN) and multilayer perceptron network (MLPN) models performed better than generalized regression neural network, support vector machines, and gene expression programming models. Sensitivity analysis revealed that the contact time had highest effect on adsorption followed by the solution pH, temperature, and CP concentration. The study concluded that all the models constructed here were capable of capturing the nonlinearity in data. A better generalization and predictive performance of RBFN and MLPN models suggested that these can be used to predict the adsorption of CP in aqueous solution using CSC.
Apel, Heiko; Baimaganbetov, Azamat; Kalashnikova, Olga; Gavrilenko, Nadejda; Abdykerimova, Zharkinay; Agalhanova, Marina; Gerlitz, Lars; Unger-Shayesteh, Katy; Vorogushyn, Sergiy; Gafurov, Abror
2017-04-01
The semi-arid regions of Central Asia crucially depend on the water resources supplied by the mountainous areas of the Tien-Shan and Pamirs. During the summer months the snow and glacier melt dominated river discharge originating in the mountains provides the main water resource available for agricultural production, but also for storage in reservoirs for energy generation during the winter months. Thus a reliable seasonal forecast of the water resources is crucial for a sustainable management and planning of water resources. In fact, seasonal forecasts are mandatory tasks of all national hydro-meteorological services in the region. In order to support the operational seasonal forecast procedures of hydromet services, this study aims at the development of a generic tool for deriving statistical forecast models of seasonal river discharge. The generic model is kept as simple as possible in order to be driven by available hydrological and meteorological data, and be applicable for all catchments with their often limited data availability in the region. As snowmelt dominates summer runoff, the main meteorological predictors for the forecast models are monthly values of winter precipitation and temperature as recorded by climatological stations in the catchments. These data sets are accompanied by snow cover predictors derived from the operational ModSnow tool, which provides cloud free snow cover data for the selected catchments based on MODIS satellite images. In addition to the meteorological data antecedent streamflow is used as a predictor variable. This basic predictor set was further extended by multi-monthly means of the individual predictors, as well as composites of the predictors. Forecast models are derived based on these predictors as linear combinations of up to 3 or 4 predictors. A user selectable number of best models according to pre-defined performance criteria is extracted automatically by the developed model fitting algorithm, which includes a test
Reliability prediction system based on the failure rate model for electronic components
International Nuclear Information System (INIS)
Lee, Seung Woo; Lee, Hwa Ki
2008-01-01
Although many methodologies for predicting the reliability of electronic components have been developed, their reliability might be subjective according to a particular set of circumstances, and therefore it is not easy to quantify their reliability. Among the reliability prediction methods are the statistical analysis based method, the similarity analysis method based on an external failure rate database, and the method based on the physics-of-failure model. In this study, we developed a system by which the reliability of electronic components can be predicted by creating a system for the statistical analysis method of predicting reliability most easily. The failure rate models that were applied are MILHDBK- 217F N2, PRISM, and Telcordia (Bellcore), and these were compared with the general purpose system in order to validate the effectiveness of the developed system. Being able to predict the reliability of electronic components from the stage of design, the system that we have developed is expected to contribute to enhancing the reliability of electronic components
Population activity statistics dissect subthreshold and spiking variability in V1.
Bányai, Mihály; Koman, Zsombor; Orbán, Gergő
2017-07-01
Response variability, as measured by fluctuating responses upon repeated performance of trials, is a major component of neural responses, and its characterization is key to interpret high dimensional population recordings. Response variability and covariability display predictable changes upon changes in stimulus and cognitive or behavioral state, providing an opportunity to test the predictive power of models of neural variability. Still, there is little agreement on which model to use as a building block for population-level analyses, and models of variability are often treated as a subject of choice. We investigate two competing models, the doubly stochastic Poisson (DSP) model assuming stochasticity at spike generation, and the rectified Gaussian (RG) model tracing variability back to membrane potential variance, to analyze stimulus-dependent modulation of both single-neuron and pairwise response statistics. Using a pair of model neurons, we demonstrate that the two models predict similar single-cell statistics. However, DSP and RG models have contradicting predictions on the joint statistics of spiking responses. To test the models against data, we build a population model to simulate stimulus change-related modulations in pairwise response statistics. We use single-unit data from the primary visual cortex (V1) of monkeys to show that while model predictions for variance are qualitatively similar to experimental data, only the RG model's predictions are compatible with joint statistics. These results suggest that models using Poisson-like variability might fail to capture important properties of response statistics. We argue that membrane potential-level modeling of stochasticity provides an efficient strategy to model correlations. NEW & NOTEWORTHY Neural variability and covariability are puzzling aspects of cortical computations. For efficient decoding and prediction, models of information encoding in neural populations hinge on an appropriate model of
Stone, Wesley W.; Gilliom, Robert J.
2012-01-01
Watershed Regressions for Pesticides (WARP) models, previously developed for atrazine at the national scale, are improved for application to the United States (U.S.) Corn Belt region by developing region-specific models that include watershed characteristics that are influential in predicting atrazine concentration statistics within the Corn Belt. WARP models for the Corn Belt (WARP-CB) were developed for annual maximum moving-average (14-, 21-, 30-, 60-, and 90-day durations) and annual 95th-percentile atrazine concentrations in streams of the Corn Belt region. The WARP-CB models accounted for 53 to 62% of the variability in the various concentration statistics among the model-development sites. Model predictions were within a factor of 5 of the observed concentration statistic for over 90% of the model-development sites. The WARP-CB residuals and uncertainty are lower than those of the National WARP model for the same sites. Although atrazine-use intensity is the most important explanatory variable in the National WARP models, it is not a significant variable in the WARP-CB models. The WARP-CB models provide improved predictions for Corn Belt streams draining watersheds with atrazine-use intensities of 17 kg/km2 of watershed area or greater.
Online Statistical Modeling (Regression Analysis) for Independent Responses
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical analmodelling) are among statistical methods which are frequently needed in analyzing quantitative data, especially to model relationship between response and explanatory variables. Nowadays, statistical models have been developed into various directions to model various type and complex relationship of data. Rich varieties of advanced and recent statistical modelling are mostly available on open source software (one of them is R). However, these advanced statistical modelling, are not very friendly to novice R users, since they are based on programming script or command line interface. Our research aims to developed web interface (based on R and shiny), so that most recent and advanced statistical modelling are readily available, accessible and applicable on web. We have previously made interface in the form of e-tutorial for several modern and advanced statistical modelling on R especially for independent responses (including linear models/LM, generalized linier models/GLM, generalized additive model/GAM and generalized additive model for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including model using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/ MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web (interface) make the statistical modeling becomes easier to apply and easier to compare them in order to find the most appropriate model for the data.
Directory of Open Access Journals (Sweden)
Dongdong Sun
Full Text Available Hypertension is a leading global health threat and a major cardiovascular disease. Since clinical interventions are effective in delaying the disease progression from prehypertension to hypertension, diagnostic prediction models to identify patient populations at high risk for hypertension are imperative.Both PubMed and Embase databases were searched for eligible reports of either prediction models or risk scores of hypertension. The study data were collected, including risk factors, statistic methods, characteristics of study design and participants, performance measurement, etc.From the searched literature, 26 studies reporting 48 prediction models were selected. Among them, 20 reports studied the established models using traditional risk factors, such as body mass index (BMI, age, smoking, blood pressure (BP level, parental history of hypertension, and biochemical factors, whereas 6 reports used genetic risk score (GRS as the prediction factor. AUC ranged from 0.64 to 0.97, and C-statistic ranged from 60% to 90%.The traditional models are still the predominant risk prediction models for hypertension, but recently, more models have begun to incorporate genetic factors as part of their model predictors. However, these genetic predictors need to be well selected. The current reported models have acceptable to good discrimination and calibration ability, but whether the models can be applied in clinical practice still needs more validation and adjustment.
Model-free prediction and regression a transformation-based approach to inference
Politis, Dimitris N
2015-01-01
The Model-Free Prediction Principle expounded upon in this monograph is based on the simple notion of transforming a complex dataset to one that is easier to work with, e.g., i.i.d. or Gaussian. As such, it restores the emphasis on observable quantities, i.e., current and future data, as opposed to unobservable model parameters and estimates thereof, and yields optimal predictors in diverse settings such as regression and time series. Furthermore, the Model-Free Bootstrap takes us beyond point prediction in order to construct frequentist prediction intervals without resort to unrealistic assumptions such as normality. Prediction has been traditionally approached via a model-based paradigm, i.e., (a) fit a model to the data at hand, and (b) use the fitted model to extrapolate/predict future data. Due to both mathematical and computational constraints, 20th century statistical practice focused mostly on parametric models. Fortunately, with the advent of widely accessible powerful computing in the late 1970s, co...
ARSENIC CONTAMINATION IN GROUNDWATER: A STATISTICAL MODELING
Directory of Open Access Journals (Sweden)
Palas Roy
2013-01-01
Full Text Available High arsenic in natural groundwater in most of the tubewells of the Purbasthali- Block II area of Burdwan district (W.B, India has recently been focused as a serious environmental concern. This paper is intending to illustrate the statistical modeling of the arsenic contaminated groundwater to identify the interrelation of that arsenic contain with other participating groundwater parameters so that the arsenic contamination level can easily be predicted by analyzing only such parameters. Multivariate data analysis was done with the collected groundwater samples from the 132 tubewells of this contaminated region shows that three variable parameters are significantly related with the arsenic. Based on these relationships, a multiple linear regression model has been developed that estimated the arsenic contamination by measuring such three predictor parameters of the groundwater variables in the contaminated aquifer. This model could also be a suggestive tool while designing the arsenic removal scheme for any affected groundwater.
Grosveld, Ferdinand W.
1990-01-01
The feasibility of predicting interior noise due to random acoustic or turbulent boundary layer excitation was investigated in experiments in which a statistical energy analysis model (VAPEPS) was used to analyze measurements of the acceleration response and sound transmission of flat aluminum, lucite, and graphite/epoxy plates exposed to random acoustic or turbulent boundary layer excitation. The noise reduction of the plate, when backed by a shallow cavity and excited by a turbulent boundary layer, was predicted using a simplified theory based on the assumption of adiabatic compression of the fluid in the cavity. The predicted plate acceleration response was used as input in the noise reduction prediction. Reasonable agreement was found between the predictions and the measured noise reduction in the frequency range 315-1000 Hz.
Comparison of the models of financial distress prediction
Directory of Open Access Journals (Sweden)
Jiří Omelka
2013-01-01
Full Text Available Prediction of the financial distress is generally supposed as approximation if a business entity is closed on bankruptcy or at least on serious financial problems. Financial distress is defined as such a situation when a company is not able to satisfy its liabilities in any forms, or when its liabilities are higher than its assets. Classification of financial situation of business entities represents a multidisciplinary scientific issue that uses not only the economic theoretical bases but interacts to the statistical, respectively to econometric approaches as well.The first models of financial distress prediction have originated in the sixties of the 20th century. One of the most known is the Altman’s model followed by a range of others which are constructed on more or less conformable bases. In many existing models it is possible to find common elements which could be marked as elementary indicators of potential financial distress of a company. The objective of this article is, based on the comparison of existing models of prediction of financial distress, to define the set of basic indicators of company’s financial distress at conjoined identification of their critical aspects. The sample defined this way will be a background for future research focused on determination of one-dimensional model of financial distress prediction which would subsequently become a basis for construction of multi-dimensional prediction model.
Towards a Statistical Model of Tropical Cyclone Genesis
Fernandez, A.; Kashinath, K.; McAuliffe, J.; Prabhat, M.; Stark, P. B.; Wehner, M. F.
2017-12-01
Tropical Cyclones (TCs) are important extreme weather phenomena that have a strong impact on humans. TC forecasts are largely based on global numerical models that produce TC-like features. Aspects of Tropical Cyclones such as their formation/genesis, evolution, intensification and dissipation over land are important and challenging problems in climate science. This study investigates the environmental conditions associated with Tropical Cyclone Genesis (TCG) by testing how accurately a statistical model can predict TCG in the CAM5.1 climate model. TCG events are defined using TECA software @inproceedings{Prabhat2015teca, title={TECA: Petascale Pattern Recognition for Climate Science}, author={Prabhat and Byna, Surendra and Vishwanath, Venkatram and Dart, Eli and Wehner, Michael and Collins, William D}, booktitle={Computer Analysis of Images and Patterns}, pages={426-436}, year={2015}, organization={Springer}} to extract TC trajectories from CAM5.1. L1-regularized logistic regression (L1LR) is applied to the CAM5.1 output. The predictions have nearly perfect accuracy for data not associated with TC tracks and high accuracy differentiating between high vorticity and low vorticity systems. The model's active variables largely correspond to current hypotheses about important factors for TCG, such as wind field patterns and local pressure minima, and suggests new routes for investigation. Furthermore, our model's predictions of TC activity are competitive with the output of an instantaneous version of Emanuel and Nolan's Genesis Potential Index (GPI) @inproceedings{eman04, title = "Tropical cyclone activity and the global climate system", author = "Kerry Emanuel and Nolan, {David S.}", year = "2004", pages = "240-241", booktitle = "26th Conference on Hurricanes and Tropical Meteorology"}.
Sapsis, Themistoklis P; Majda, Andrew J
2013-08-20
A framework for low-order predictive statistical modeling and uncertainty quantification in turbulent dynamical systems is developed here. These reduced-order, modified quasilinear Gaussian (ROMQG) algorithms apply to turbulent dynamical systems in which there is significant linear instability or linear nonnormal dynamics in the unperturbed system and energy-conserving nonlinear interactions that transfer energy from the unstable modes to the stable modes where dissipation occurs, resulting in a statistical steady state; such turbulent dynamical systems are ubiquitous in geophysical and engineering turbulence. The ROMQG method involves constructing a low-order, nonlinear, dynamical system for the mean and covariance statistics in the reduced subspace that has the unperturbed statistics as a stable fixed point and optimally incorporates the indirect effect of non-Gaussian third-order statistics for the unperturbed system in a systematic calibration stage. This calibration procedure is achieved through information involving only the mean and covariance statistics for the unperturbed equilibrium. The performance of the ROMQG algorithm is assessed on two stringent test cases: the 40-mode Lorenz 96 model mimicking midlatitude atmospheric turbulence and two-layer baroclinic models for high-latitude ocean turbulence with over 125,000 degrees of freedom. In the Lorenz 96 model, the ROMQG algorithm with just a single mode captures the transient response to random or deterministic forcing. For the baroclinic ocean turbulence models, the inexpensive ROMQG algorithm with 252 modes, less than 0.2% of the total, captures the nonlinear response of the energy, the heat flux, and even the one-dimensional energy and heat flux spectra.
Petersen, Japke F; Stuiver, Martijn M; Timmermans, Adriana J; Chen, Amy; Zhang, Hongzhen; O'Neill, James P; Deady, Sandra; Vander Poorten, Vincent; Meulemans, Jeroen; Wennerberg, Johan; Skroder, Carl; Day, Andrew T; Koch, Wayne; van den Brekel, Michiel W M
2018-05-01
TNM-classification inadequately estimates patient-specific overall survival (OS). We aimed to improve this by developing a risk-prediction model for patients with advanced larynx cancer. Cohort study. We developed a risk prediction model to estimate the 5-year OS rate based on a cohort of 3,442 patients with T3T4N0N+M0 larynx cancer. The model was internally validated using bootstrapping samples and externally validated on patient data from five external centers (n = 770). The main outcome was performance of the model as tested by discrimination, calibration, and the ability to distinguish risk groups based on tertiles from the derivation dataset. The model performance was compared to a model based on T and N classification only. We included age, gender, T and N classification, and subsite as prognostic variables in the standard model. After external validation, the standard model had a significantly better fit than a model based on T and N classification alone (C statistic, 0.59 vs. 0.55, P statistic to 0.68. A risk prediction model for patients with advanced larynx cancer, consisting of readily available clinical variables, gives more accurate estimations of the estimated 5-year survival rate when compared to a model based on T and N classification alone. 2c. Laryngoscope, 128:1140-1145, 2018. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.
Verification of some numerical models for operationally predicting mesoscale winds aloft
International Nuclear Information System (INIS)
Cornett, J.S.; Randerson, D.
1977-01-01
Four numerical models are described for predicting mesoscale winds aloft for a 6 h period. These models are all tested statistically against persistence as the control forecast and against predictions made by operational forecasters. Mesoscale winds aloft data were used to initialize the models and to verify the predictions on an hourly basis. The model yielding the smallest root-mean-square vector errors (RMSVE's) was the one based on the most physics which included advection, ageostrophic acceleration, vertical mixing and friction. Horizontal advection was found to be the most important term in reducing the RMSVE's followed by ageostrophic acceleration, vertical advection, surface friction and vertical mixing. From a comparison of the mean absolute errors based on up to 72 independent wind-profile predictions made by operational forecasters, by the most complete model, and by persistence, we conclude that the model is the best wind predictor in the free air. In the boundary layer, the results tend to favor the forecaster for direction predictions. The speed predictions showed no overall superiority in any of these three models
Sahle, Berhe W; Owen, Alice J; Chin, Ken Lee; Reid, Christopher M
2017-09-01
Numerous models predicting the risk of incident heart failure (HF) have been developed; however, evidence of their methodological rigor and reporting remains unclear. This study critically appraises the methods underpinning incident HF risk prediction models. EMBASE and PubMed were searched for articles published between 1990 and June 2016 that reported at least 1 multivariable model for prediction of HF. Model development information, including study design, variable coding, missing data, and predictor selection, was extracted. Nineteen studies reporting 40 risk prediction models were included. Existing models have acceptable discriminative ability (C-statistics > 0.70), although only 6 models were externally validated. Candidate variable selection was based on statistical significance from a univariate screening in 11 models, whereas it was unclear in 12 models. Continuous predictors were retained in 16 models, whereas it was unclear how continuous variables were handled in 16 models. Missing values were excluded in 19 of 23 models that reported missing data, and the number of events per variable was models. Only 2 models presented recommended regression equations. There was significant heterogeneity in discriminative ability of models with respect to age (P prediction models that had sufficient discriminative ability, although few are externally validated. Methods not recommended for the conduct and reporting of risk prediction modeling were frequently used, and resulting algorithms should be applied with caution. Copyright © 2017 Elsevier Inc. All rights reserved.
Statistical modelling of railway track geometry degradation using Hierarchical Bayesian models
International Nuclear Information System (INIS)
Andrade, A.R.; Teixeira, P.F.
2015-01-01
Railway maintenance planners require a predictive model that can assess the railway track geometry degradation. The present paper uses a Hierarchical Bayesian model as a tool to model the main two quality indicators related to railway track geometry degradation: the standard deviation of longitudinal level defects and the standard deviation of horizontal alignment defects. Hierarchical Bayesian Models (HBM) are flexible statistical models that allow specifying different spatially correlated components between consecutive track sections, namely for the deterioration rates and the initial qualities parameters. HBM are developed for both quality indicators, conducting an extensive comparison between candidate models and a sensitivity analysis on prior distributions. HBM is applied to provide an overall assessment of the degradation of railway track geometry, for the main Portuguese railway line Lisbon–Oporto. - Highlights: • Rail track geometry degradation is analysed using Hierarchical Bayesian models. • A Gibbs sampling strategy is put forward to estimate the HBM. • Model comparison and sensitivity analysis find the most suitable model. • We applied the most suitable model to all the segments of the main Portuguese line. • Tackling spatial correlations using CAR structures lead to a better model fit
A statistical model to predict total column ozone in Peninsular Malaysia
Tan, K. C.; Lim, H. S.; Mat Jafri, M. Z.
2016-03-01
This study aims to predict monthly columnar ozone in Peninsular Malaysia based on concentrations of several atmospheric gases. Data pertaining to five atmospheric gases (CO2, O3, CH4, NO2, and H2O vapor) were retrieved by satellite scanning imaging absorption spectrometry for atmospheric chartography from 2003 to 2008 and used to develop a model to predict columnar ozone in Peninsular Malaysia. Analyses of the northeast monsoon (NEM) and the southwest monsoon (SWM) seasons were conducted separately. Based on the Pearson correlation matrices, columnar ozone was negatively correlated with H2O vapor but positively correlated with CO2 and NO2 during both the NEM and SWM seasons from 2003 to 2008. This result was expected because NO2 is a precursor of ozone. Therefore, an increase in columnar ozone concentration is associated with an increase in NO2 but a decrease in H2O vapor. In the NEM season, columnar ozone was negatively correlated with H2O (-0.847), NO2 (0.754), and CO2 (0.477); columnar ozone was also negatively but weakly correlated with CH4 (-0.035). In the SWM season, columnar ozone was highly positively correlated with NO2 (0.855), CO2 (0.572), and CH4 (0.321) and also highly negatively correlated with H2O (-0.832). Both multiple regression and principal component analyses were used to predict the columnar ozone value in Peninsular Malaysia. We obtained the best-fitting regression equations for the columnar ozone data using four independent variables. Our results show approximately the same R value (≈ 0.83) for both the NEM and SWM seasons.
Schroeder, Emily B; Xu, Stan; Goodrich, Glenn K; Nichols, Gregory A; O'Connor, Patrick J; Steiner, John F
2017-07-01
To develop and externally validate a prediction model for the 6-month risk of a severe hypoglycemic event among individuals with pharmacologically treated diabetes. The development cohort consisted of 31,674 Kaiser Permanente Colorado members with pharmacologically treated diabetes (2007-2015). The validation cohorts consisted of 38,764 Kaiser Permanente Northwest members and 12,035 HealthPartners members. Variables were chosen that would be available in electronic health records. We developed 16-variable and 6-variable models, using a Cox counting model process that allows for the inclusion of multiple 6-month observation periods per person. Across the three cohorts, there were 850,992 6-month observation periods, and 10,448 periods with at least one severe hypoglycemic event. The six-variable model contained age, diabetes type, HgbA1c, eGFR, history of a hypoglycemic event in the prior year, and insulin use. Both prediction models performed well, with good calibration and c-statistics of 0.84 and 0.81 for the 16-variable and 6-variable models, respectively. In the external validation cohorts, the c-statistics were 0.80-0.84. We developed and validated two prediction models for predicting the 6-month risk of hypoglycemia. The 16-variable model had slightly better performance than the 6-variable model, but in some practice settings, use of the simpler model may be preferred. Copyright © 2017 Elsevier Inc. All rights reserved.
International Nuclear Information System (INIS)
Heinrich, S.
2006-01-01
Nucleus fission process is a very complex phenomenon and, even nowadays, no realistic models describing the overall process are available. The work presented here deals with a theoretical description of fission fragments distributions in mass, charge, energy and deformation. We have reconsidered and updated the B.D. Wilking Scission Point model. Our purpose was to test if this statistic model applied at the scission point and by introducing new results of modern microscopic calculations allows to describe quantitatively the fission fragments distributions. We calculate the surface energy available at the scission point as a function of the fragments deformations. This surface is obtained from a Hartree Fock Bogoliubov microscopic calculation which guarantee a realistic description of the potential dependence on the deformation for each fragment. The statistic balance is described by the level densities of the fragment. We have tried to avoid as much as possible the input of empirical parameters in the model. Our only parameter, the distance between each fragment at the scission point, is discussed by comparison with scission configuration obtained from full dynamical microscopic calculations. Also, the comparison between our results and experimental data is very satisfying and allow us to discuss the success and limitations of our approach. We finally proposed ideas to improve the model, in particular by applying dynamical corrections. (author)
A statistical model for field emission in superconducting cavities
International Nuclear Information System (INIS)
Padamsee, H.; Green, K.; Jost, W.; Wright, B.
1993-01-01
A statistical model is used to account for several features of performance of an ensemble of superconducting cavities. The input parameters are: the number of emitters/area, a distribution function for emitter β values, a distribution function for emissive areas, and a processing threshold. The power deposited by emitters is calculated from the field emission current and electron impact energy. The model can successfully account for the fraction of tests that reach the maximum field Epk in an ensemble of cavities, for eg, 1-cells at sign 3 GHz or 5-cells at sign 1.5 GHz. The model is used to predict the level of power needed to successfully process cavities of various surface areas with high pulsed power processing (HPP)
Classical model of intermediate statistics
International Nuclear Information System (INIS)
Kaniadakis, G.
1994-01-01
In this work we present a classical kinetic model of intermediate statistics. In the case of Brownian particles we show that the Fermi-Dirac (FD) and Bose-Einstein (BE) distributions can be obtained, just as the Maxwell-Boltzmann (MD) distribution, as steady states of a classical kinetic equation that intrinsically takes into account an exclusion-inclusion principle. In our model the intermediate statistics are obtained as steady states of a system of coupled nonlinear kinetic equations, where the coupling constants are the transmutational potentials η κκ' . We show that, besides the FD-BE intermediate statistics extensively studied from the quantum point of view, we can also study the MB-FD and MB-BE ones. Moreover, our model allows us to treat the three-state mixing FD-MB-BE intermediate statistics. For boson and fermion mixing in a D-dimensional space, we obtain a family of FD-BE intermediate statistics by varying the transmutational potential η BF . This family contains, as a particular case when η BF =0, the quantum statistics recently proposed by L. Wu, Z. Wu, and J. Sun [Phys. Lett. A 170, 280 (1992)]. When we consider the two-dimensional FD-BE statistics, we derive an analytic expression of the fraction of fermions. When the temperature T→∞, the system is composed by an equal number of bosons and fermions, regardless of the value of η BF . On the contrary, when T=0, η BF becomes important and, according to its value, the system can be completely bosonic or fermionic, or composed both by bosons and fermions
Hybrid ATDL-gamma distribution model for predicting area source acid gas concentrations
Energy Technology Data Exchange (ETDEWEB)
Jakeman, A J; Taylor, J A
1985-01-01
An air quality model is developed to predict the distribution of concentrations of acid gas in an urban airshed. The model is hybrid in character, combining reliable features of a deterministic ATDL-based model with statistical distributional approaches. The gamma distribution was identified from a range of distributional models as the best model. The paper shows that the assumptions of a previous hybrid model may be relaxed and presents a methodology for characterizing the uncertainty associated with model predictions. Results are demonstrated for the 98-percentile predictions of 24-h average data over annual periods at six monitoring sites. This percentile relates to the World Health Organization goal for acid gas concentrations.
Statistical timing for parametric yield prediction of digital integrated circuits
Jess, J.A.G.; Kalafala, K.; Naidu, S.R.; Otten, R.H.J.M.; Visweswariah, C.
2006-01-01
Uncertainty in circuit performance due to manufacturing and environmental variations is increasing with each new generation of technology. It is therefore important to predict the performance of a chip as a probabilistic quantity. This paper proposes three novel path-based algorithms for statistical
Directory of Open Access Journals (Sweden)
Mark V Albert
2012-12-01
Full Text Available Due to multiple factors such as fatigue, muscle strengthening, and neural plasticity, the responsiveness of the motor apparatus to neural commands changes over time. To enable precise movements the nervous system must adapt to compensate for these changes. Recent models of motor adaptation derive from assumptions about the way the motor apparatus changes. Characterizing these changes is difficult because motor adaptation happens at the same time, masking most of the effects of ongoing changes. Here, we analyze eye movements of monkeys with lesions to the posterior cerebellar vermis that impair adaptation. Their fluctuations better reveal the underlying changes of the motor system over time. When these measured, unadapted changes are used to derive optimal motor adaptation rules the prediction precision significantly improves. Among three models that similarly fit single-day adaptation results, the model that also matches the temporal correlations of the nonadapting saccades most accurately predicts multiple day adaptation. Saccadic gain adaptation is well matched to the natural statistics of fluctuations of the oculomotor plant.
Probing NWP model deficiencies by statistical postprocessing
DEFF Research Database (Denmark)
Rosgaard, Martin Haubjerg; Nielsen, Henrik Aalborg; Nielsen, Torben S.
2016-01-01
The objective in this article is twofold. On one hand, a Model Output Statistics (MOS) framework for improved wind speed forecast accuracy is described and evaluated. On the other hand, the approach explored identifies unintuitive explanatory value from a diagnostic variable in an operational....... Based on the statistical model candidates inferred from the data, the lifted index NWP model diagnostic is consistently found among the NWP model predictors of the best performing statistical models across sites....
Development of Interpretable Predictive Models for BPH and Prostate Cancer.
Bermejo, Pablo; Vivo, Alicia; Tárraga, Pedro J; Rodríguez-Montes, J A
2015-01-01
Traditional methods for deciding whether to recommend a patient for a prostate biopsy are based on cut-off levels of stand-alone markers such as prostate-specific antigen (PSA) or any of its derivatives. However, in the last decade we have seen the increasing use of predictive models that combine, in a non-linear manner, several predictives that are better able to predict prostate cancer (PC), but these fail to help the clinician to distinguish between PC and benign prostate hyperplasia (BPH) patients. We construct two new models that are capable of predicting both PC and BPH. An observational study was performed on 150 patients with PSA ≥3 ng/mL and age >50 years. We built a decision tree and a logistic regression model, validated with the leave-one-out methodology, in order to predict PC or BPH, or reject both. Statistical dependence with PC and BPH was found for prostate volume (P-value BPH prediction. PSA and volume together help to build predictive models that accurately distinguish among PC, BPH, and patients without any of these pathologies. Our decision tree and logistic regression models outperform the AUC obtained in the compared studies. Using these models as decision support, the number of unnecessary biopsies might be significantly reduced.
Statistical parameters as a means to a priori assess the accuracy of solar forecasting models
International Nuclear Information System (INIS)
Voyant, Cyril; Soubdhan, Ted; Lauret, Philippe; David, Mathieu; Muselli, Marc
2015-01-01
In this paper we propose to determinate and to test a set of 20 statistical parameters in order to estimate the short term predictability of the global horizontal irradiation time series and thereby to propose a new prospective tool indicating the expected error regardless the forecasting methods used. The mean absolute log return, which is a tool usually used in econometrics but never in global radiation prediction, proves to be a very good estimator. Some examples of the use of this tool are exposed, showing the interest of this statistical parameter in concrete cases of predictions or optimizations. This study gives a judgment for engineers and researchers on the installation or management of solar plants and could help in minimizing the energy crisis allowing to improve the renewable energy part of the energy mix. - Highlights: • Use of statistical parameter never used for the global radiation forecasting. • A priori index allowing to optimize easily and quickly a clear sky model. • New methodology allowing to quantify the prediction error regardless the predictor used. • The prediction error depends more on the location and the time series type than the machine Learning method used.
Frequency weighted model predictive control of wind turbine
DEFF Research Database (Denmark)
Klauco, Martin; Poulsen, Niels Kjølstad; Mirzaei, Mahmood
2013-01-01
This work is focused on applying frequency weighted model predictive control (FMPC) on three blade horizontal axis wind turbine (HAWT). A wind turbine is a very complex, non-linear system influenced by a stochastic wind speed variation. The reduced dynamics considered in this work are the rotatio...... predictive controller are presented. Statistical comparison between frequency weighted MPC, standard MPC and baseline PI controller is shown as well.......This work is focused on applying frequency weighted model predictive control (FMPC) on three blade horizontal axis wind turbine (HAWT). A wind turbine is a very complex, non-linear system influenced by a stochastic wind speed variation. The reduced dynamics considered in this work...... are the rotational degree of freedom of the rotor and the tower for-aft movement. The MPC design is based on a receding horizon policy and a linearised model of the wind turbine. Due to the change of dynamics according to wind speed, several linearisation points must be considered and the control design adjusted...
Aspects of statistical model for multifragmentation
International Nuclear Information System (INIS)
Bhattacharyya, P.; Das Gupta, S.; Mekjian, A. Z.
1999-01-01
We deal with two different aspects of an exactly soluble statistical model of fragmentation. First we show, using zero range force and finite temperature Thomas-Fermi theory, that a common link can be found between finite temperature mean field theory and the statistical fragmentation model. We show the latter naturally arises in the spinodal region. Next we show that although the exact statistical model is a canonical model and uses temperature, microcanonical results which use constant energy rather than constant temperature can also be obtained from the canonical model using saddle-point approximation. The methodology is extremely simple to implement and at least in all the examples studied in this work is very accurate. (c) 1999 The American Physical Society
Statistical Compression for Climate Model Output
Hammerling, D.; Guinness, J.; Soh, Y. J.
2017-12-01
Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus is it important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. We decompress the data by computing conditional expectations and conditional simulations from the model given the summary statistics. Conditional expectations represent our best estimate of the original data but are subject to oversmoothing in space and time. Conditional simulations introduce realistic small-scale noise so that the decompressed fields are neither too smooth nor too rough compared with the original data. Considerable attention is paid to accurately modeling the original dataset-one year of daily mean temperature data-particularly with regard to the inherent spatial nonstationarity in global fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured, while allowing for fast decompression and conditional emulation on modest computers.
Gaussian covariance graph models accounting for correlated marker effects in genome-wide prediction.
Martínez, C A; Khare, K; Rahman, S; Elzo, M A
2017-10-01
Several statistical models used in genome-wide prediction assume uncorrelated marker allele substitution effects, but it is known that these effects may be correlated. In statistics, graphical models have been identified as a useful tool for covariance estimation in high-dimensional problems and it is an area that has recently experienced a great expansion. In Gaussian covariance graph models (GCovGM), the joint distribution of a set of random variables is assumed to be Gaussian and the pattern of zeros of the covariance matrix is encoded in terms of an undirected graph G. In this study, methods adapting the theory of GCovGM to genome-wide prediction were developed (Bayes GCov, Bayes GCov-KR and Bayes GCov-H). In simulated data sets, improvements in correlation between phenotypes and predicted breeding values and accuracies of predicted breeding values were found. Our models account for correlation of marker effects and permit to accommodate general structures as opposed to models proposed in previous studies, which consider spatial correlation only. In addition, they allow incorporation of biological information in the prediction process through its use when constructing graph G, and their extension to the multi-allelic loci case is straightforward. © 2017 Blackwell Verlag GmbH.
Automated statistical modeling of analytical measurement systems
International Nuclear Information System (INIS)
Jacobson, J.J.
1992-01-01
The statistical modeling of analytical measurement systems at the Idaho Chemical Processing Plant (ICPP) has been completely automated through computer software. The statistical modeling of analytical measurement systems is one part of a complete quality control program used by the Remote Analytical Laboratory (RAL) at the ICPP. The quality control program is an integration of automated data input, measurement system calibration, database management, and statistical process control. The quality control program and statistical modeling program meet the guidelines set forth by the American Society for Testing Materials and American National Standards Institute. A statistical model is a set of mathematical equations describing any systematic bias inherent in a measurement system and the precision of a measurement system. A statistical model is developed from data generated from the analysis of control standards. Control standards are samples which are made up at precise known levels by an independent laboratory and submitted to the RAL. The RAL analysts who process control standards do not know the values of those control standards. The object behind statistical modeling is to describe real process samples in terms of their bias and precision and, to verify that a measurement system is operating satisfactorily. The processing of control standards gives us this ability
Spatio-temporal statistical models with applications to atmospheric processes
International Nuclear Information System (INIS)
Wikle, C.K.
1996-01-01
This doctoral dissertation is presented as three self-contained papers. An introductory chapter considers traditional spatio-temporal statistical methods used in the atmospheric sciences from a statistical perspective. Although this section is primarily a review, many of the statistical issues considered have not been considered in the context of these methods and several open questions are posed. The first paper attempts to determine a means of characterizing the semiannual oscillation (SAO) spatial variation in the northern hemisphere extratropical height field. It was discovered that the midlatitude SAO in 500hPa geopotential height could be explained almost entirely as a result of spatial and temporal asymmetries in the annual variation of stationary eddies. It was concluded that the mechanism for the SAO in the northern hemisphere is a result of land-sea contrasts. The second paper examines the seasonal variability of mixed Rossby-gravity waves (MRGW) in lower stratospheric over the equatorial Pacific. Advanced cyclostationary time series techniques were used for analysis. It was found that there are significant twice-yearly peaks in MRGW activity. Analyses also suggested a convergence of horizontal momentum flux associated with these waves. In the third paper, a new spatio-temporal statistical model is proposed that attempts to consider the influence of both temporal and spatial variability. This method is mainly concerned with prediction in space and time, and provides a spatially descriptive and temporally dynamic model
Kaplan, David; Lee, Chansoon
2018-01-01
This article provides a review of Bayesian model averaging as a means of optimizing the predictive performance of common statistical models applied to large-scale educational assessments. The Bayesian framework recognizes that in addition to parameter uncertainty, there is uncertainty in the choice of models themselves. A Bayesian approach to addressing the problem of model uncertainty is the method of Bayesian model averaging. Bayesian model averaging searches the space of possible models for a set of submodels that satisfy certain scientific principles and then averages the coefficients across these submodels weighted by each model's posterior model probability (PMP). Using the weighted coefficients for prediction has been shown to yield optimal predictive performance according to certain scoring rules. We demonstrate the utility of Bayesian model averaging for prediction in education research with three examples: Bayesian regression analysis, Bayesian logistic regression, and a recently developed approach for Bayesian structural equation modeling. In each case, the model-averaged estimates are shown to yield better prediction of the outcome of interest than any submodel based on predictive coverage and the log-score rule. Implications for the design of large-scale assessments when the goal is optimal prediction in a policy context are discussed.
Directory of Open Access Journals (Sweden)
Kyle A McQuisten
2009-10-01
Full Text Available Exogenous short interfering RNAs (siRNAs induce a gene knockdown effect in cells by interacting with naturally occurring RNA processing machinery. However not all siRNAs induce this effect equally. Several heterogeneous kinds of machine learning techniques and feature sets have been applied to modeling siRNAs and their abilities to induce knockdown. There is some growing agreement to which techniques produce maximally predictive models and yet there is little consensus for methods to compare among predictive models. Also, there are few comparative studies that address what the effect of choosing learning technique, feature set or cross validation approach has on finding and discriminating among predictive models.Three learning techniques were used to develop predictive models for effective siRNA sequences including Artificial Neural Networks (ANNs, General Linear Models (GLMs and Support Vector Machines (SVMs. Five feature mapping methods were also used to generate models of siRNA activities. The 2 factors of learning technique and feature mapping were evaluated by complete 3x5 factorial ANOVA. Overall, both learning techniques and feature mapping contributed significantly to the observed variance in predictive models, but to differing degrees for precision and accuracy as well as across different kinds and levels of model cross-validation.The methods presented here provide a robust statistical framework to compare among models developed under distinct learning techniques and feature sets for siRNAs. Further comparisons among current or future modeling approaches should apply these or other suitable statistically equivalent methods to critically evaluate the performance of proposed models. ANN and GLM techniques tend to be more sensitive to the inclusion of noisy features, but the SVM technique is more robust under large numbers of features for measures of model precision and accuracy. Features found to result in maximally predictive models are
Kerr, Kathleen F; Meisner, Allison; Thiessen-Philbrook, Heather; Coca, Steven G; Parikh, Chirag R
2014-08-07
The field of nephrology is actively involved in developing biomarkers and improving models for predicting patients' risks of AKI and CKD and their outcomes. However, some important aspects of evaluating biomarkers and risk models are not widely appreciated, and statistical methods are still evolving. This review describes some of the most important statistical concepts for this area of research and identifies common pitfalls. Particular attention is paid to metrics proposed within the last 5 years for quantifying the incremental predictive value of a new biomarker. Copyright © 2014 by the American Society of Nephrology.
Statistical modelling for ship propulsion efficiency
DEFF Research Database (Denmark)
Petersen, Jóan Petur; Jacobsen, Daniel J.; Winther, Ole
2012-01-01
This paper presents a state-of-the-art systems approach to statistical modelling of fuel efficiency in ship propulsion, and also a novel and publicly available data set of high quality sensory data. Two statistical model approaches are investigated and compared: artificial neural networks...
A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress.
Cheng, Ching-Hsue; Chan, Chia-Pang; Yang, Jun-He
2018-01-01
The issue of financial distress prediction plays an important and challenging research topic in the financial field. Currently, there have been many methods for predicting firm bankruptcy and financial crisis, including the artificial intelligence and the traditional statistical methods, and the past studies have shown that the prediction result of the artificial intelligence method is better than the traditional statistical method. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed the nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages including the following: (i) the proposed model is different from the previous models lacking the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high dimensional data; and (iii) the proposed model can generate the rules and mathematical formulas of financial distress for providing references to the investors and decision makers. The result shows that the proposed method is better than the listing classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies.
Statistical sampling approaches for soil monitoring
Brus, D.J.
2014-01-01
This paper describes three statistical sampling approaches for regional soil monitoring, a design-based, a model-based and a hybrid approach. In the model-based approach a space-time model is exploited to predict global statistical parameters of interest such as the space-time mean. In the hybrid
Energy Technology Data Exchange (ETDEWEB)
Siljander, M.
2010-07-01
This thesis presents novel modelling applications for environmental geospatial data using remote sensing, GIS and statistical modelling techniques. The studied themes can be classified into four main themes: (i) to develop advanced geospatial databases. Paper (I) demonstrates the creation of a geospatial database for the Glanville fritillary butterfly (Melitaea cinxia) in the Aaland Islands, south-western Finland; (ii) to analyse species diversity and distribution using GIS techniques. Paper (II) presents a diversity and geographical distribution analysis for Scopulini moths at a world-wide scale; (iii) to study spatiotemporal forest cover change. Paper (III) presents a study of exotic and indigenous tree cover change detection in Taita Hills Kenya using airborne imagery and GIS analysis techniques; (iv) to explore predictive modelling techniques using geospatial data. In Paper (IV) human population occurrence and abundance in the Taita Hills highlands was predicted using the generalized additive modelling (GAM) technique. Paper (V) presents techniques to enhance fire prediction and burned area estimation at a regional scale in East Caprivi Namibia. Paper (VI) compares eight state-of-the-art predictive modelling methods to improve fire prediction, burned area estimation and fire risk mapping in East Caprivi Namibia. The results in Paper (I) showed that geospatial data can be managed effectively using advanced relational database management systems. Metapopulation data for Melitaea cinxia butterfly was successfully combined with GPS-delimited habitat patch information and climatic data. Using the geospatial database, spatial analyses were successfully conducted at habitat patch level or at more coarse analysis scales. Moreover, this study showed it appears evident that at a large-scale spatially correlated weather conditions are one of the primary causes of spatially correlated changes in Melitaea cinxia population sizes. In Paper (II) spatiotemporal characteristics
Validating an Air Traffic Management Concept of Operation Using Statistical Modeling
He, Yuning; Davies, Misty Dawn
2013-01-01
Validating a concept of operation for a complex, safety-critical system (like the National Airspace System) is challenging because of the high dimensionality of the controllable parameters and the infinite number of states of the system. In this paper, we use statistical modeling techniques to explore the behavior of a conflict detection and resolution algorithm designed for the terminal airspace. These techniques predict the robustness of the system simulation to both nominal and off-nominal behaviors within the overall airspace. They also can be used to evaluate the output of the simulation against recorded airspace data. Additionally, the techniques carry with them a mathematical value of the worth of each prediction-a statistical uncertainty for any robustness estimate. Uncertainty Quantification (UQ) is the process of quantitative characterization and ultimately a reduction of uncertainties in complex systems. UQ is important for understanding the influence of uncertainties on the behavior of a system and therefore is valuable for design, analysis, and verification and validation. In this paper, we apply advanced statistical modeling methodologies and techniques on an advanced air traffic management system, namely the Terminal Tactical Separation Assured Flight Environment (T-TSAFE). We show initial results for a parameter analysis and safety boundary (envelope) detection in the high-dimensional parameter space. For our boundary analysis, we developed a new sequential approach based upon the design of computer experiments, allowing us to incorporate knowledge from domain experts into our modeling and to determine the most likely boundary shapes and its parameters. We carried out the analysis on system parameters and describe an initial approach that will allow us to include time-series inputs, such as the radar track data, into the analysis
α -induced reactions on 115In: Cross section measurements and statistical model analysis
Kiss, G. G.; Szücs, T.; Mohr, P.; Török, Zs.; Huszánk, R.; Gyürky, Gy.; Fülöp, Zs.
2018-05-01
Background: α -nucleus optical potentials are basic ingredients of statistical model calculations used in nucleosynthesis simulations. While the nucleon+nucleus optical potential is fairly well known, for the α +nucleus optical potential several different parameter sets exist and large deviations, reaching sometimes even an order of magnitude, are found between the cross section predictions calculated using different parameter sets. Purpose: A measurement of the radiative α -capture and the α -induced reaction cross sections on the nucleus 115In at low energies allows a stringent test of statistical model predictions. Since experimental data are scarce in this mass region, this measurement can be an important input to test the global applicability of α +nucleus optical model potentials and further ingredients of the statistical model. Methods: The reaction cross sections were measured by means of the activation method. The produced activities were determined by off-line detection of the γ rays and characteristic x rays emitted during the electron capture decay of the produced Sb isotopes. The 115In(α ,γ )119Sb and 115In(α ,n )Sb118m reaction cross sections were measured between Ec .m .=8.83 and 15.58 MeV, and the 115In(α ,n )Sb118g reaction was studied between Ec .m .=11.10 and 15.58 MeV. The theoretical analysis was performed within the statistical model. Results: The simultaneous measurement of the (α ,γ ) and (α ,n ) cross sections allowed us to determine a best-fit combination of all parameters for the statistical model. The α +nucleus optical potential is identified as the most important input for the statistical model. The best fit is obtained for the new Atomki-V1 potential, and good reproduction of the experimental data is also achieved for the first version of the Demetriou potentials and the simple McFadden-Satchler potential. The nucleon optical potential, the γ -ray strength function, and the level density parametrization are also
Calibration plots for risk prediction models in the presence of competing risks
DEFF Research Database (Denmark)
Gerds, Thomas A; Andersen, Per K; Kattan, Michael W
2014-01-01
A predicted risk of 17% can be called reliable if it can be expected that the event will occur to about 17 of 100 patients who all received a predicted risk of 17%. Statistical models can predict the absolute risk of an event such as cardiovascular death in the presence of competing risks...... prediction model is well calibrated. The first is lack of independent validation data, the second is right censoring, and the third is that when the risk scale is continuous, the estimation problem is as difficult as density estimation. To deal with these problems, we propose to estimate calibration curves...
Shah, Neomi; Hanna, David B; Teng, Yanping; Sotres-Alvarez, Daniela; Hall, Martica; Loredo, Jose S; Zee, Phyllis; Kim, Mimi; Yaggi, H Klar; Redline, Susan; Kaplan, Robert C
2016-06-01
We developed and validated the first-ever sleep apnea (SA) risk calculator in a large population-based cohort of Hispanic/Latino subjects. Cross-sectional data on adults from the Hispanic Community Health Study/Study of Latinos (2008-2011) were analyzed. Subjective and objective sleep measurements were obtained. Clinically significant SA was defined as an apnea-hypopnea index ≥ 15 events per hour. Using logistic regression, four prediction models were created: three sex-specific models (female-only, male-only, and a sex × covariate interaction model to allow differential predictor effects), and one overall model with sex included as a main effect only. Models underwent 10-fold cross-validation and were assessed by using the C statistic. SA and its predictive variables; a total of 17 variables were considered. A total of 12,158 participants had complete sleep data available; 7,363 (61%) were women. The population-weighted prevalence of SA (apnea-hypopnea index ≥ 15 events per hour) was 6.1% in female subjects and 13.5% in male subjects. Male-only (C statistic, 0.808) and female-only (C statistic, 0.836) prediction models had the same predictor variables (ie, age, BMI, self-reported snoring). The sex-interaction model (C statistic, 0.836) contained sex, age, age × sex, BMI, BMI × sex, and self-reported snoring. The final overall model (C statistic, 0.832) contained age, BMI, snoring, and sex. We developed two websites for our SA risk calculator: one in English (https://www.montefiore.org/sleepapneariskcalc.html) and another in Spanish (http://www.montefiore.org/sleepapneariskcalc-es.html). We created an internally validated, highly discriminating, well-calibrated, and parsimonious prediction model for SA. Contrary to the study hypothesis, the variables did not have different predictive magnitudes in male and female subjects. Copyright © 2016 American College of Chest Physicians. Published by Elsevier Inc. All rights reserved.
Sensometrics: Thurstonian and Statistical Models
DEFF Research Database (Denmark)
Christensen, Rune Haubo Bojesen
. sensR is a package for sensory discrimination testing with Thurstonian models and ordinal supports analysis of ordinal data with cumulative link (mixed) models. While sensR is closely connected to the sensometrics field, the ordinal package has developed into a generic statistical package applicable......This thesis is concerned with the development and bridging of Thurstonian and statistical models for sensory discrimination testing as applied in the scientific discipline of sensometrics. In sensory discrimination testing sensory differences between products are detected and quantified by the use...... and sensory discrimination testing in particular in a series of papers by advancing Thurstonian models for a range of sensory discrimination protocols in addition to facilitating their application by providing software for fitting these models. The main focus is on identifying Thurstonian models...
Rivas, Elena; Lang, Raymond; Eddy, Sean R
2012-02-01
The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.
Multiple-point statistical prediction on fracture networks at Yucca Mountain
International Nuclear Information System (INIS)
Liu, X.Y; Zhang, C.Y.; Liu, Q.S.; Birkholzer, J.T.
2009-01-01
In many underground nuclear waste repository systems, such as at Yucca Mountain, water flow rate and amount of water seepage into the waste emplacement drifts are mainly determined by hydrological properties of fracture network in the surrounding rock mass. Natural fracture network system is not easy to describe, especially with respect to its connectivity which is critically important for simulating the water flow field. In this paper, we introduced a new method for fracture network description and prediction, termed multi-point-statistics (MPS). The process of the MPS method is to record multiple-point statistics concerning the connectivity patterns of a fracture network from a known fracture map, and to reproduce multiple-scale training fracture patterns in a stochastic manner, implicitly and directly. It is applied to fracture data to study flow field behavior at the Yucca Mountain waste repository system. First, the MPS method is used to create a fracture network with an original fracture training image from Yucca Mountain dataset. After we adopt a harmonic and arithmetic average method to upscale the permeability to a coarse grid, THM simulation is carried out to study near-field water flow in the surrounding waste emplacement drifts. Our study shows that connectivity or patterns of fracture networks can be grasped and reconstructed by MPS methods. In theory, it will lead to better prediction of fracture system characteristics and flow behavior. Meanwhile, we can obtain variance from flow field, which gives us a way to quantify model uncertainty even in complicated coupled THM simulations. It indicates that MPS can potentially characterize and reconstruct natural fracture networks in a fractured rock mass with advantages of quantifying connectivity of fracture system and its simulation uncertainty simultaneously.
International Nuclear Information System (INIS)
Weathers, J.B.; Luck, R.; Weathers, J.W.
2009-01-01
The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions associated with the construction of the noise level using multivariate statistics exists in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.
Energy Technology Data Exchange (ETDEWEB)
Weathers, J.B. [Shock, Noise, and Vibration Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: James.Weathers@ngc.com; Luck, R. [Department of Mechanical Engineering, Mississippi State University, 210 Carpenter Engineering Building, P.O. Box ME, Mississippi State, MS 39762-5925 (United States)], E-mail: Luck@me.msstate.edu; Weathers, J.W. [Structural Analysis Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: Jeffrey.Weathers@ngc.com
2009-11-15
The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions associated with the construction of the noise level using multivariate statistics exists in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.
An approach to model validation and model-based prediction -- polyurethane foam case study.
Energy Technology Data Exchange (ETDEWEB)
Dowding, Kevin J.; Rutherford, Brian Milne
2003-07-01
Enhanced software methodology and improved computing hardware have advanced the state of simulation technology to a point where large physics-based codes can be a major contributor in many systems analyses. This shift toward the use of computational methods has brought with it new research challenges in a number of areas including characterization of uncertainty, model validation, and the analysis of computer output. It is these challenges that have motivated the work described in this report. Approaches to and methods for model validation and (model-based) prediction have been developed recently in the engineering, mathematics and statistical literatures. In this report we have provided a fairly detailed account of one approach to model validation and prediction applied to an analysis investigating thermal decomposition of polyurethane foam. A model simulates the evolution of the foam in a high temperature environment as it transforms from a solid to a gas phase. The available modeling and experimental results serve as data for a case study focusing our model validation and prediction developmental efforts on this specific thermal application. We discuss several elements of the ''philosophy'' behind the validation and prediction approach: (1) We view the validation process as an activity applying to the use of a specific computational model for a specific application. We do acknowledge, however, that an important part of the overall development of a computational simulation initiative is the feedback provided to model developers and analysts associated with the application. (2) We utilize information obtained for the calibration of model parameters to estimate the parameters and quantify uncertainty in the estimates. We rely, however, on validation data (or data from similar analyses) to measure the variability that contributes to the uncertainty in predictions for specific systems or units (unit-to-unit variability). (3) We perform statistical
Statistical Modeling of Large-Scale Signal Path Loss in Underwater Acoustic Networks
Directory of Open Access Journals (Sweden)
Manuel Perez Malumbres
2013-02-01
Full Text Available In an underwater acoustic channel, the propagation conditions are known to vary in time, causing the deviation of the received signal strength from the nominal value predicted by a deterministic propagation model. To facilitate a large-scale system design in such conditions (e.g., power allocation, we have developed a statistical propagation model in which the transmission loss is treated as a random variable. By applying repetitive computation to the acoustic field, using ray tracing for a set of varying environmental conditions (surface height, wave activity, small node displacements around nominal locations, etc., an ensemble of transmission losses is compiled and later used to infer the statistical model parameters. A reasonable agreement is found with log-normal distribution, whose mean obeys a log-distance increases, and whose variance appears to be constant for a certain range of inter-node distances in a given deployment location. The statistical model is deemed useful for higher-level system planning, where simulation is needed to assess the performance of candidate network protocols under various resource allocation policies, i.e., to determine the transmit power and bandwidth allocation necessary to achieve a desired level of performance (connectivity, throughput, reliability, etc..
Analytical model of SiPM time resolution and order statistics with crosstalk
International Nuclear Information System (INIS)
Vinogradov, S.
2015-01-01
Time resolution is the most important parameter of photon detectors in a wide range of time-of-flight and time correlation applications within the areas of high energy physics, medical imaging, and others. Silicon photomultipliers (SiPM) have been initially recognized as perfect photon-number-resolving detectors; now they also provide outstanding results in the scintillator timing resolution. However, crosstalk and afterpulsing introduce false secondary non-Poissonian events, and SiPM time resolution models are experiencing significant difficulties with that. This study presents an attempt to develop an analytical model of the timing resolution of an SiPM taking into account statistics of secondary events resulting from a crosstalk. Two approaches have been utilized to derive an analytical expression for time resolution: the first one based on statistics of independent identically distributed detection event times and the second one based on order statistics of these times. The first approach is found to be more straightforward and “analytical-friendly” to model analog SiPMs. Comparisons of coincidence resolving times predicted by the model with the known experimental results from a LYSO:Ce scintillator and a Hamamatsu MPPC are presented
Analytical model of SiPM time resolution and order statistics with crosstalk
Energy Technology Data Exchange (ETDEWEB)
Vinogradov, S., E-mail: Sergey.Vinogradov@liverpool.ac.uk [University of Liverpool and Cockcroft Institute, Sci-Tech Daresbury, Keckwick Lane, Warrington WA4 4AD (United Kingdom); P.N. Lebedev Physical Institute of the Russian Academy of Sciences, 119991 Leninskiy Prospekt 53, Moscow (Russian Federation)
2015-07-01
Time resolution is the most important parameter of photon detectors in a wide range of time-of-flight and time correlation applications within the areas of high energy physics, medical imaging, and others. Silicon photomultipliers (SiPM) have been initially recognized as perfect photon-number-resolving detectors; now they also provide outstanding results in the scintillator timing resolution. However, crosstalk and afterpulsing introduce false secondary non-Poissonian events, and SiPM time resolution models are experiencing significant difficulties with that. This study presents an attempt to develop an analytical model of the timing resolution of an SiPM taking into account statistics of secondary events resulting from a crosstalk. Two approaches have been utilized to derive an analytical expression for time resolution: the first one based on statistics of independent identically distributed detection event times and the second one based on order statistics of these times. The first approach is found to be more straightforward and “analytical-friendly” to model analog SiPMs. Comparisons of coincidence resolving times predicted by the model with the known experimental results from a LYSO:Ce scintillator and a Hamamatsu MPPC are presented.
DEFF Research Database (Denmark)
Ranaboldo, Matteo; Giebel, Gregor; Codina, Bernat
2013-01-01
A combination of physical and statistical treatments to post‐process numerical weather predictions (NWP) outputs is needed for successful short‐term wind power forecasts. One of the most promising and effective approaches for statistical treatment is the Model Output Statistics (MOS) technique....... The proposed MOS performed well in both wind farms, and its forecasts compare positively with an actual operative model in use at Risø DTU and other MOS types, showing minimum BIAS and improving NWP power forecast of around 15% in terms of root mean square error. Further improvements could be obtained...
Statistical model of exotic rotational correlations in emergent space-time
Energy Technology Data Exchange (ETDEWEB)
Hogan, Craig; Kwon, Ohkyung; Richardson, Jonathan
2017-06-06
A statistical model is formulated to compute exotic rotational correlations that arise as inertial frames and causal structure emerge on large scales from entangled Planck scale quantum systems. Noncommutative quantum dynamics are represented by random transverse displacements that respect causal symmetry. Entanglement is represented by covariance of these displacements in Planck scale intervals defined by future null cones of events on an observer's world line. Light that propagates in a nonradial direction inherits a projected component of the exotic rotational correlation that accumulates as a random walk in phase. A calculation of the projection and accumulation leads to exact predictions for statistical properties of exotic Planck scale correlations in an interferometer of any configuration. The cross-covariance for two nearly co-located interferometers is shown to depart only slightly from the autocovariance. Specific examples are computed for configurations that approximate realistic experiments, and show that the model can be rigorously tested.
Statistical modelling for social researchers principles and practice
Tarling, Roger
2008-01-01
This book explains the principles and theory of statistical modelling in an intelligible way for the non-mathematical social scientist looking to apply statistical modelling techniques in research. The book also serves as an introduction for those wishing to develop more detailed knowledge and skills in statistical modelling. Rather than present a limited number of statistical models in great depth, the aim is to provide a comprehensive overview of the statistical models currently adopted in social research, in order that the researcher can make appropriate choices and select the most suitable model for the research question to be addressed. To facilitate application, the book also offers practical guidance and instruction in fitting models using SPSS and Stata, the most popular statistical computer software which is available to most social researchers. Instruction in using MLwiN is also given. Models covered in the book include; multiple regression, binary, multinomial and ordered logistic regression, log-l...
Vaginal birth after caesarean section prediction models: a UK comparative observational study.
Mone, Fionnuala; Harrity, Conor; Mackie, Adam; Segurado, Ricardo; Toner, Brenda; McCormick, Timothy R; Currie, Aoife; McAuliffe, Fionnuala M
2015-10-01
Primarily, to assess the performance of three statistical models in predicting successful vaginal birth in patients attempting a trial of labour after one previous lower segment caesarean section (TOLAC). The statistically most reliable models were subsequently subjected to validation testing in a local antenatal population. A retrospective observational study was performed with study data collected from the Northern Ireland Maternity Service Database (NIMATs). The study population included all women that underwent a TOLAC (n=385) from 2010 to 2012 in a regional UK obstetric unit. Data was collected from the Northern Ireland Maternity Service Database (NIMATs). Area under the curve (AUC) and correlation analysis was performed. Of the three prediction models evaluated, AUC calculations for the Smith et al., Grobman et al. and Troyer and Parisi Models were 0.74, 0.72 and 0.65, respectively. Using the Smith et al. model, 52% of women had a low risk of caesarean section (CS) (predicted VBAC >72%) and 20% had a high risk of CS (predicted VBAC <60%), of whom 20% and 63% had delivery by CS. The fit between observed and predicted outcome in this study cohort using the Smith et al. and Grobman et al. models were greatest (Chi-square test, p=0.228 and 0.904), validating both within the population. The Smith et al. and Grobman et al. models could potentially be utilized within the UK to provide women with an informed choice when deciding on mode of delivery after a previous CS. Crown Copyright © 2015. Published by Elsevier Ireland Ltd. All rights reserved.
Topology for statistical modeling of petascale data.
Energy Technology Data Exchange (ETDEWEB)
Pascucci, Valerio (University of Utah, Salt Lake City, UT); Mascarenhas, Ajith Arthur; Rusek, Korben (Texas A& M University, College Station, TX); Bennett, Janine Camille; Levine, Joshua (University of Utah, Salt Lake City, UT); Pebay, Philippe Pierre; Gyulassy, Attila (University of Utah, Salt Lake City, UT); Thompson, David C.; Rojas, Joseph Maurice (Texas A& M University, College Station, TX)
2011-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled 'Topology for Statistical Modeling of Petascale Data', funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program. Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is thus to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, our approach is based on the complementary techniques of combinatorial topology and statistical modeling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modeling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. This document summarizes the technical advances we have made to date that were made possible in whole or in part by MAPD funding. These technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modeling, and (3) new integrated topological and statistical methods.
Automated parameter estimation for biological models using Bayesian statistical model checking.
Hussain, Faraz; Langmead, Christopher J; Mi, Qi; Dutta-Moscato, Joyeeta; Vodovotz, Yoram; Jha, Sumit K
2015-01-01
Probabilistic models have gained widespread acceptance in the systems biology community as a useful way to represent complex biological systems. Such models are developed using existing knowledge of the structure and dynamics of the system, experimental observations, and inferences drawn from statistical analysis of empirical data. A key bottleneck in building such models is that some system variables cannot be measured experimentally. These variables are incorporated into the model as numerical parameters. Determining values of these parameters that justify existing experiments and provide reliable predictions when model simulations are performed is a key research problem. Using an agent-based model of the dynamics of acute inflammation, we demonstrate a novel parameter estimation algorithm by discovering the amount and schedule of doses of bacterial lipopolysaccharide that guarantee a set of observed clinical outcomes with high probability. We synthesized values of twenty-eight unknown parameters such that the parameterized model instantiated with these parameter values satisfies four specifications describing the dynamic behavior of the model. We have developed a new algorithmic technique for discovering parameters in complex stochastic models of biological systems given behavioral specifications written in a formal mathematical logic. Our algorithm uses Bayesian model checking, sequential hypothesis testing, and stochastic optimization to automatically synthesize parameters of probabilistic biological models.
Bayesian models: A statistical primer for ecologists
Hobbs, N. Thompson; Hooten, Mevin B.
2015-01-01
Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods—in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach.Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals.This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management.Presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticiansCovers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and moreDeemphasizes computer coding in favor of basic principlesExplains how to write out properly factored statistical expressions representing Bayesian models
Multi-fidelity machine learning models for accurate bandgap predictions of solids
International Nuclear Information System (INIS)
Pilania, Ghanshyam; Gubernatis, James E.; Lookman, Turab
2016-01-01
Here, we present a multi-fidelity co-kriging statistical learning framework that combines variable-fidelity quantum mechanical calculations of bandgaps to generate a machine-learned model that enables low-cost accurate predictions of the bandgaps at the highest fidelity level. Additionally, the adopted Gaussian process regression formulation allows us to predict the underlying uncertainties as a measure of our confidence in the predictions. In using a set of 600 elpasolite compounds as an example dataset and using semi-local and hybrid exchange correlation functionals within density functional theory as two levels of fidelities, we demonstrate the excellent learning performance of the method against actual high fidelity quantum mechanical calculations of the bandgaps. The presented statistical learning method is not restricted to bandgaps or electronic structure methods and extends the utility of high throughput property predictions in a significant way.
Statistical Model-Based Face Pose Estimation
Institute of Scientific and Technical Information of China (English)
GE Xinliang; YANG Jie; LI Feng; WANG Huahua
2007-01-01
A robust face pose estimation approach is proposed by using face shape statistical model approach and pose parameters are represented by trigonometric functions. The face shape statistical model is firstly built by analyzing the face shapes from different people under varying poses. The shape alignment is vital in the process of building the statistical model. Then, six trigonometric functions are employed to represent the face pose parameters. Lastly, the mapping function is constructed between face image and face pose by linearly relating different parameters. The proposed approach is able to estimate different face poses using a few face training samples. Experimental results are provided to demonstrate its efficiency and accuracy.
Tornadoes and related damage costs: statistical modelling with a semi-Markov approach
Directory of Open Access Journals (Sweden)
Guglielmo D’Amico
2016-09-01
Full Text Available We propose a statistical approach to modelling for predicting and simulating occurrences of tornadoes and accumulated cost distributions over a time interval. This is achieved by modelling the tornado intensity, measured with the Fujita scale, as a stochastic process. Since the Fujita scale divides tornado intensity into six states, it is possible to model the tornado intensity by using Markov and semi-Markov models. We demonstrate that the semi-Markov approach is able to reproduce the duration effect that is detected in tornado occurrence. The superiority of the semi-Markov model as compared to the Markov chain model is also affirmed by means of a statistical test of hypothesis. As an application, we compute the expected value and the variance of the costs generated by the tornadoes over a given time interval in a given area. The paper contributes to the literature by demonstrating that semi-Markov models represent an effective tool for physical analysis of tornadoes as well as for the estimation of the economic damages to human things.
McCormick, Keith; Wei, Bowen
2017-01-01
IBM SPSS Modeler allows quick, efficient predictive analytics and insight building from your data, and is a popularly used data mining tool. This book will guide you through the data mining process, and presents relevant statistical methods which are used to build predictive models and conduct other analytic tasks using IBM SPSS Modeler. From ...
International Nuclear Information System (INIS)
Grotch, S.L.
1991-01-01
This study is a detailed intercomparison of the results produced by four general circulation models (GCMs) that have been used to estimate the climatic consequences of a doubling of the CO 2 concentration. Two variables, surface air temperature and precipitation, annually and seasonally averaged, are compared for both the current climate and for the predicted equilibrium changes after a doubling of the atmospheric CO 2 concentration. The major question considered here is: how well do the predictions from different GCMs agree with each other and with historical climatology over different areal extents, from the global scale down to the range of only several gridpoints? Although the models often agree well when estimating averages over large areas, substantial disagreements become apparent as the spatial scale is reduced. At scales below continental, the correlations observed between different model predictions are often very poor. The implications of this work for investigation of climatic impacts on a regional scale are profound. For these two important variables, at least, the poor agreement between model simulations of the current climate on the regional scale calls into question the ability of these models to quantitatively estimate future climatic change on anything approaching the scale of a few (< 10) gridpoints, which is essential if these results are to be used in meaningful resource-assessment studies. A stronger cooperative effort among the different modeling groups will be necessary to assure that we are getting better agreement for the right reasons, a prerequisite for improving confidence in model projections. 11 refs.; 10 figs
International Nuclear Information System (INIS)
Grotch, S.L.
1990-01-01
This study is a detailed intercomparison of the results produced by four general circulation models (GCMs) that have been used to estimate the climatic consequences of a doubling of the CO 2 concentration. Two variables, surface air temperature and precipitation, annually and seasonally averaged, are compared for both the current climate and for the predicted equilibrium changes after a doubling of the atmospheric CO 2 concentration. The major question considered here is: how well do the predictions from different GCMs agree with each other and with historical climatology over different areal extents, from the global scale down to the range of only several gridpoints? Although the models often agree well when estimating averages over large areas, substantial disagreements become apparent as the spatial scale is reduced. At scales below continental, the correlations observed between different model predictions are often very poor. The implications of this work for investigation of climatic impacts on a regional scale are profound. For these two important variables, at least, the poor agreement between model simulations of the current climate on the regional scale calls into question the ability of these models to quantitatively estimate future climatic change on anything approaching the scale of a few (< 10) gridpoints, which is essential if these results are to be used in meaningful resource-assessment studies. A stronger cooperative effort among the different modeling groups will be necessary to assure that we are getting better agreement for the right reasons, a prerequisite for improving confidence in model projections
Application of a novel cellular automaton porosity prediction model to aluminium castings
International Nuclear Information System (INIS)
Atwood, R.C.; Chirazi, A.; Lee, P.D.
2002-01-01
A multiscale model was developed to predict the formation of porosity within a solidifying aluminium-silicon alloy. The diffusion of silicon and dissolved gas was simulated on a microscopic scale combined with cellular automaton models of gas porosity formation within the growing three-dimensional solidification microstructure. However, due to high computational cost, the modelled volume is limited to the millimetre range. This renders the application of direct modelling of complex shape castings unfeasible. Combining the microstructural modelling with a statistical response-surface prediction method allows application of the microstructural model results to industrial scale casts by incorporating them in commercial solidification software. (author)
Development and Validation of a Prediction Model to Estimate Individual Risk of Pancreatic Cancer.
Yu, Ami; Woo, Sang Myung; Joo, Jungnam; Yang, Hye-Ryung; Lee, Woo Jin; Park, Sang-Jae; Nam, Byung-Ho
2016-01-01
There is no reliable screening tool to identify people with high risk of developing pancreatic cancer even though pancreatic cancer represents the fifth-leading cause of cancer-related death in Korea. The goal of this study was to develop an individualized risk prediction model that can be used to screen for asymptomatic pancreatic cancer in Korean men and women. Gender-specific risk prediction models for pancreatic cancer were developed using the Cox proportional hazards model based on an 8-year follow-up of a cohort study of 1,289,933 men and 557,701 women in Korea who had biennial examinations in 1996-1997. The performance of the models was evaluated with respect to their discrimination and calibration ability based on the C-statistic and Hosmer-Lemeshow type χ2 statistic. A total of 1,634 (0.13%) men and 561 (0.10%) women were newly diagnosed with pancreatic cancer. Age, height, BMI, fasting glucose, urine glucose, smoking, and age at smoking initiation were included in the risk prediction model for men. Height, BMI, fasting glucose, urine glucose, smoking, and drinking habit were included in the risk prediction model for women. Smoking was the most significant risk factor for developing pancreatic cancer in both men and women. The risk prediction model exhibited good discrimination and calibration ability, and in external validation it had excellent prediction ability. Gender-specific risk prediction models for pancreatic cancer were developed and validated for the first time. The prediction models will be a useful tool for detecting high-risk individuals who may benefit from increased surveillance for pancreatic cancer.
Identified state-space prediction model for aero-optical wavefronts
Faghihi, Azin; Tesch, Jonathan; Gibson, Steve
2013-07-01
A state-space disturbance model and associated prediction filter for aero-optical wavefronts are described. The model is computed by system identification from a sequence of wavefronts measured in an airborne laboratory. Estimates of the statistics and flow velocity of the wavefront data are shown and can be computed from the matrices in the state-space model without returning to the original data. Numerical results compare velocity values and power spectra computed from the identified state-space model with those computed from the aero-optical data.
Prediction of noise in ships by the application of “statistical energy analysis.”
DEFF Research Database (Denmark)
Jensen, John Ødegaard
1979-01-01
If it will be possible effectively to reduce the noise level in the accomodation on board ships, by introducing appropriate noise abatement measures already at an early design stage, it is quite essential that sufficiently accurate prediction methods are available for the naval architects...... or for a special noise abatement measure, e.g., increased structural damping. The paper discusses whether it might be possible to derive an alternative calculation model based on the “statistical energy analysis” approach (SEA). By considering the hull of a ship to be constructed from plate elements connected...
Relevance of the c-statistic when evaluating risk-adjustment models in surgery.
Merkow, Ryan P; Hall, Bruce L; Cohen, Mark E; Dimick, Justin B; Wang, Edward; Chow, Warren B; Ko, Clifford Y; Bilimoria, Karl Y
2012-05-01
The measurement of hospital quality based on outcomes requires risk adjustment. The c-statistic is a popular tool used to judge model performance, but can be limited, particularly when evaluating specific operations in focused populations. Our objectives were to examine the interpretation and relevance of the c-statistic when used in models with increasingly similar case mix and to consider an alternative perspective on model calibration based on a graphical depiction of model fit. From the American College of Surgeons National Surgical Quality Improvement Program (2008-2009), patients were identified who underwent a general surgery procedure, and procedure groups were increasingly restricted: colorectal-all, colorectal-elective cases only, and colorectal-elective cancer cases only. Mortality and serious morbidity outcomes were evaluated using logistic regression-based risk adjustment, and model c-statistics and calibration curves were used to compare model performance. During the study period, 323,427 general, 47,605 colorectal-all, 39,860 colorectal-elective, and 21,680 colorectal cancer patients were studied. Mortality ranged from 1.0% in general surgery to 4.1% in the colorectal-all group, and serious morbidity ranged from 3.9% in general surgery to 12.4% in the colorectal-all procedural group. As case mix was restricted, c-statistics progressively declined from the general to the colorectal cancer surgery cohorts for both mortality and serious morbidity (mortality: 0.949 to 0.866; serious morbidity: 0.861 to 0.668). Calibration was evaluated graphically by examining predicted vs observed number of events over risk deciles. For both mortality and serious morbidity, there was no qualitative difference in calibration identified between the procedure groups. In the present study, we demonstrate how the c-statistic can become less informative and, in certain circumstances, can lead to incorrect model-based conclusions, as case mix is restricted and patients become
Impact of modellers' decisions on hydrological a priori predictions
Holländer, H. M.; Bormann, H.; Blume, T.; Buytaert, W.; Chirico, G. B.; Exbrayat, J.-F.; Gustafsson, D.; Hölzel, H.; Krauße, T.; Kraft, P.; Stoll, S.; Blöschl, G.; Flühler, H.
2014-06-01
added information. In this qualitative analysis of a statistically small number of predictions we learned (i) that soft information such as the modeller's system understanding is as important as the model itself (hard information), (ii) that the sequence of modelling steps matters (field visit, interactions between differently experienced experts, choice of model, selection of available data, and methods for parameter guessing), and (iii) that added process understanding can be as efficient as adding data for improving parameters needed to satisfy model requirements.
Statistical shape modeling based renal volume measurement using tracked ultrasound
Pai Raikar, Vipul; Kwartowitz, David M.
2017-03-01
Autosomal dominant polycystic kidney disease (ADPKD) is the fourth most common cause of kidney transplant worldwide accounting for 7-10% of all cases. Although ADPKD usually progresses over many decades, accurate risk prediction is an important task.1 Identifying patients with progressive disease is vital to providing new treatments being developed and enable them to enter clinical trials for new therapy. Among other factors, total kidney volume (TKV) is a major biomarker predicting the progression of ADPKD. Consortium for Radiologic Imaging Studies in Polycystic Kidney Disease (CRISP)2 have shown that TKV is an early, and accurate measure of cystic burden and likely growth rate. It is strongly associated with loss of renal function.3 While ultrasound (US) has proven as an excellent tool for diagnosing the disease; monitoring short-term changes using ultrasound has been shown to not be accurate. This is attributed to high operator variability and reproducibility as compared to tomographic modalities such as CT and MR (Gold standard). Ultrasound has emerged as one of the standout modality for intra-procedural imaging and with methods for spatial localization has afforded us the ability to track 2D ultrasound in physical space which it is being used. In addition to this, the vast amount of recorded tomographic data can be used to generate statistical shape models that allow us to extract clinical value from archived image sets. In this work, we aim at improving the prognostic value of US in managing ADPKD by assessing the accuracy of using statistical shape model augmented US data, to predict TKV, with the end goal of monitoring short-term changes.
Predicting Power Outages Using Multi-Model Ensemble Forecasts
Cerrai, D.; Anagnostou, E. N.; Yang, J.; Astitha, M.
2017-12-01
Power outages affect every year millions of people in the United States, affecting the economy and conditioning the everyday life. An Outage Prediction Model (OPM) has been developed at the University of Connecticut for helping utilities to quickly restore outages and to limit their adverse consequences on the population. The OPM, operational since 2015, combines several non-parametric machine learning (ML) models that use historical weather storm simulations and high-resolution weather forecasts, satellite remote sensing data, and infrastructure and land cover data to predict the number and spatial distribution of power outages. A new methodology, developed for improving the outage model performances by combining weather- and soil-related variables using three different weather models (WRF 3.7, WRF 3.8 and RAMS/ICLAMS), will be presented in this study. First, we will present a performance evaluation of each model variable, by comparing historical weather analyses with station data or reanalysis over the entire storm data set. Hence, each variable of the new outage model version is extracted from the best performing weather model for that variable, and sensitivity tests are performed for investigating the most efficient variable combination for outage prediction purposes. Despite that the final variables combination is extracted from different weather models, this ensemble based on multi-weather forcing and multi-statistical model power outage prediction outperforms the currently operational OPM version that is based on a single weather forcing variable (WRF 3.7), because each model component is the closest to the actual atmospheric state.
Prediction of Hydrologic Characteristics for Ungauged Catchments to Support Hydroecological Modeling
Bond, Nick R.; Kennard, Mark J.
2017-11-01
Hydrologic variability is a fundamental driver of ecological processes and species distribution patterns within river systems, yet the paucity of gauges in many catchments means that streamflow data are often unavailable for ecological survey sites. Filling this data gap is an important challenge in hydroecological research. To address this gap, we first test the ability to spatially extrapolate hydrologic metrics calculated from gauged streamflow data to ungauged sites as a function of stream distance and catchment area. Second, we examine the ability of statistical models to predict flow regime metrics based on climate and catchment physiographic variables. Our assessment focused on Australia's largest catchment, the Murray-Darling Basin (MDB). We found that hydrologic metrics were predictable only between sites within ˜25 km of one another. Beyond this, correlations between sites declined quickly. We found less than 40% of fish survey sites from a recent basin-wide monitoring program (n = 777 sites) to fall within this 25 km range, thereby greatly limiting the ability to utilize gauge data for direct spatial transposition of hydrologic metrics to biological survey sites. In contrast, statistical model-based transposition proved effective in predicting ecologically relevant aspects of the flow regime (including metrics describing central tendency, high- and low-flows intermittency, seasonality, and variability) across the entire gauge network (median R2 ˜ 0.54, range 0.39-0.94). Modeled hydrologic metrics thus offer a useful alternative to empirical data when examining biological survey data from ungauged sites. More widespread use of these statistical tools and modeled metrics could expand our understanding of flow-ecology relationships.
Woo, Yanghee; Goldner, Bryan; Son, Taeil; Song, Kijun; Noh, Sung Hoon; Fong, Yuman; Hyung, Woo Jin
2018-03-01
A novel prediction model for accurate determination of 5-year overall survival of gastric cancer patients was developed by an international collaborative group (G6+). This prediction model was created using a single institution's database of 11,851 Korean patients and included readily available and clinically relevant factors. Already validated using external East Asian cohorts, its applicability in the American population was yet to be determined. Using the Surveillance, Epidemiology, and End Results (SEER) dataset, 2014 release, all patients diagnosed with gastric adenocarcinoma who underwent surgical resection between 2002 and 2012, were selected. Characteristics for analysis included: age, sex, depth of tumor invasion, number of positive lymph nodes, total lymph nodes retrieved, presence of distant metastasis, extent of resection, and histology. Concordance index (C-statistic) was assessed using the novel prediction model and compared with the prognostic index, the seventh edition of the TNM staging system. Of the 26,019 gastric cancer patients identified from the SEER database, 15,483 had complete datasets. Validation of the novel prediction tool revealed a C-statistic of 0.762 (95% CI 0.754 to 0.769) compared with the seventh TNM staging model, C-statistic 0.683 (95% CI 0.677 to 0.689), (p prediction model for gastric cancer in the American patient population. Its superior prediction of the 5-year survival of gastric cancer patients in a large Western cohort strongly supports its global applicability. Importantly, this model allows for accurate prognosis for an increasing number of gastric cancer patients worldwide, including those who received inadequate lymphadenectomy or underwent a noncurative resection. Copyright © 2017 American College of Surgeons. Published by Elsevier Inc. All rights reserved.
Predictive Modelling and Time: An Experiment in Temporal Archaeological Predictive Models
David Ebert
2006-01-01
One of the most common criticisms of archaeological predictive modelling is that it fails to account for temporal or functional differences in sites. However, a practical solution to temporal or functional predictive modelling has proven to be elusive. This article discusses temporal predictive modelling, focusing on the difficulties of employing temporal variables, then introduces and tests a simple methodology for the implementation of temporal modelling. The temporal models thus created ar...
A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress
2018-01-01
The issue of financial distress prediction plays an important and challenging research topic in the financial field. Currently, there have been many methods for predicting firm bankruptcy and financial crisis, including the artificial intelligence and the traditional statistical methods, and the past studies have shown that the prediction result of the artificial intelligence method is better than the traditional statistical method. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed the nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages including the following: (i) the proposed model is different from the previous models lacking the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high dimensional data; and (iii) the proposed model can generate the rules and mathematical formulas of financial distress for providing references to the investors and decision makers. The result shows that the proposed method is better than the listing classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies. PMID:29765399
A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress
Directory of Open Access Journals (Sweden)
Ching-Hsue Cheng
2018-01-01
Full Text Available The issue of financial distress prediction plays an important and challenging research topic in the financial field. Currently, there have been many methods for predicting firm bankruptcy and financial crisis, including the artificial intelligence and the traditional statistical methods, and the past studies have shown that the prediction result of the artificial intelligence method is better than the traditional statistical method. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed the nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages including the following: (i the proposed model is different from the previous models lacking the concept of time series; (ii the proposed integrated attribute selection method can find the core attributes and reduce high dimensional data; and (iii the proposed model can generate the rules and mathematical formulas of financial distress for providing references to the investors and decision makers. The result shows that the proposed method is better than the listing classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies.
Matrix Tricks for Linear Statistical Models
Puntanen, Simo; Styan, George PH
2011-01-01
In teaching linear statistical models to first-year graduate students or to final-year undergraduate students there is no way to proceed smoothly without matrices and related concepts of linear algebra; their use is really essential. Our experience is that making some particular matrix tricks very familiar to students can substantially increase their insight into linear statistical models (and also multivariate statistical analysis). In matrix algebra, there are handy, sometimes even very simple "tricks" which simplify and clarify the treatment of a problem - both for the student and
Statistical Model Checking of Rich Models and Properties
DEFF Research Database (Denmark)
Poulsen, Danny Bøgsted
in undecidability issues for the traditional model checking approaches. Statistical model checking has proven itself a valuable supplement to model checking and this thesis is concerned with extending this software validation technique to stochastic hybrid systems. The thesis consists of two parts: the first part...... motivates why existing model checking technology should be supplemented by new techniques. It also contains a brief introduction to probability theory and concepts covered by the six papers making up the second part. The first two papers are concerned with developing online monitoring techniques...... systems. The fifth paper shows how stochastic hybrid automata are useful for modelling biological systems and the final paper is concerned with showing how statistical model checking is efficiently distributed. In parallel with developing the theory contained in the papers, a substantial part of this work...
International Nuclear Information System (INIS)
Yokoyama, Masayuki
2014-01-01
A statistical approach is proposed to predict thermal diffusivity profiles as a transport “model” in fusion plasmas. It can provide regression expressions for the ion and electron heat diffusivities (χ i and χ e ), separately, to construct their radial profiles. An approach that this letter is proposing outstrips the conventional scaling laws for the global confinement time (τ E ) since it also deals with profiles (temperature, density, heating depositions etc.). This approach has become possible with the analysis database accumulated by the extensive application of the integrated transport analysis suite to experiment data. In this letter, TASK3D-a analysis database for high-ion-temperature (high-T i ) plasmas in the LHD (Large Helical Device) is used as an example to describe an approach. (author)
Machine learning modelling for predicting soil liquefaction susceptibility
Directory of Open Access Journals (Sweden)
P. Samui
2011-01-01
Full Text Available This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on the standard penetration test (SPT data from the 1999 Chi-Chi, Taiwan earthquake. The first machine learning technique which uses Artificial Neural Network (ANN based on multi-layer perceptions (MLP that are trained with Levenberg-Marquardt backpropagation algorithm. The second machine learning technique uses the Support Vector machine (SVM that is firmly based on the theory of statistical learning theory, uses classification technique. ANN and SVM have been developed to predict liquefaction susceptibility using corrected SPT [(N_{1}_{60}] and cyclic stress ratio (CSR. Further, an attempt has been made to simplify the models, requiring only the two parameters [(N_{1}_{60} and peck ground acceleration (a_{max}/g], for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.
Directory of Open Access Journals (Sweden)
Demosthenes B Panagiotakos
2006-09-01
Full Text Available Demosthenes B Panagiotakos, Vassilis StavrinosOffice of Biostatistics, Epidemiology, Department of Dietetics, Nutrition, Harokopio University, Athens, GreeceAbstract: During the past years there has been increasing interest in the development of cardiovascular disease functions that predict future events at individual level. However, this effort has not been so far very successful, since several investigators have reported large differences in the estimation of the absolute risk among different populations. For example, it seems that predictive models that have been derived from US or north European populations overestimate the incidence of cardiovascular events in south European and Japanese populations. A potential explanation could be attributed to several factors such as geographical, cultural, social, behavioral, as well as genetic variations between the investigated populations in addition to various methodological, statistical, issues relating to the estimation of these predictive models. Based on current literature it can be concluded that, while risk prediction of future cardiovascular events is a useful tool and might be valuable in controlling the burden of the disease in a population, further work is required to improve the accuracy of the present predictive models.Keywords: cardiovascular disease, risk, models
Models for predicting objective function weights in prostate cancer IMRT
International Nuclear Information System (INIS)
Boutilier, Justin J.; Lee, Taewoo; Craig, Tim; Sharpe, Michael B.; Chan, Timothy C. Y.
2015-01-01
Purpose: To develop and evaluate the clinical applicability of advanced machine learning models that simultaneously predict multiple optimization objective function weights from patient geometry for intensity-modulated radiation therapy of prostate cancer. Methods: A previously developed inverse optimization method was applied retrospectively to determine optimal objective function weights for 315 treated patients. The authors used an overlap volume ratio (OV) of bladder and rectum for different PTV expansions and overlap volume histogram slopes (OVSR and OVSB for the rectum and bladder, respectively) as explanatory variables that quantify patient geometry. Using the optimal weights as ground truth, the authors trained and applied three prediction models: logistic regression (LR), multinomial logistic regression (MLR), and weighted K-nearest neighbor (KNN). The population average of the optimal objective function weights was also calculated. Results: The OV at 0.4 cm and OVSR at 0.1 cm features were found to be the most predictive of the weights. The authors observed comparable performance (i.e., no statistically significant difference) between LR, MLR, and KNN methodologies, with LR appearing to perform the best. All three machine learning models outperformed the population average by a statistically significant amount over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and dose to the bladder, rectum, CTV, and PTV. When comparing the weights directly, the LR model predicted bladder and rectum weights that had, on average, a 73% and 74% relative improvement over the population average weights, respectively. The treatment plans resulting from the LR weights had, on average, a rectum V70Gy that was 35% closer to the clinical plan and a bladder V70Gy that was 29% closer, compared to the population average weights. Similar results were observed for all other clinical metrics. Conclusions: The authors demonstrated that the KNN and MLR
Models for predicting objective function weights in prostate cancer IMRT
Energy Technology Data Exchange (ETDEWEB)
Boutilier, Justin J., E-mail: j.boutilier@mail.utoronto.ca; Lee, Taewoo [Department of Mechanical and Industrial Engineering, University of Toronto, 5 King’s College Road, Toronto, Ontario M5S 3G8 (Canada); Craig, Tim [Radiation Medicine Program, UHN Princess Margaret Cancer Centre, 610 University of Avenue, Toronto, Ontario M5T 2M9, Canada and Department of Radiation Oncology, University of Toronto, 148 - 150 College Street, Toronto, Ontario M5S 3S2 (Canada); Sharpe, Michael B. [Radiation Medicine Program, UHN Princess Margaret Cancer Centre, 610 University of Avenue, Toronto, Ontario M5T 2M9 (Canada); Department of Radiation Oncology, University of Toronto, 148 - 150 College Street, Toronto, Ontario M5S 3S2 (Canada); Techna Institute for the Advancement of Technology for Health, 124 - 100 College Street, Toronto, Ontario M5G 1P5 (Canada); Chan, Timothy C. Y. [Department of Mechanical and Industrial Engineering, University of Toronto, 5 King’s College Road, Toronto, Ontario M5S 3G8, Canada and Techna Institute for the Advancement of Technology for Health, 124 - 100 College Street, Toronto, Ontario M5G 1P5 (Canada)
2015-04-15
Purpose: To develop and evaluate the clinical applicability of advanced machine learning models that simultaneously predict multiple optimization objective function weights from patient geometry for intensity-modulated radiation therapy of prostate cancer. Methods: A previously developed inverse optimization method was applied retrospectively to determine optimal objective function weights for 315 treated patients. The authors used an overlap volume ratio (OV) of bladder and rectum for different PTV expansions and overlap volume histogram slopes (OVSR and OVSB for the rectum and bladder, respectively) as explanatory variables that quantify patient geometry. Using the optimal weights as ground truth, the authors trained and applied three prediction models: logistic regression (LR), multinomial logistic regression (MLR), and weighted K-nearest neighbor (KNN). The population average of the optimal objective function weights was also calculated. Results: The OV at 0.4 cm and OVSR at 0.1 cm features were found to be the most predictive of the weights. The authors observed comparable performance (i.e., no statistically significant difference) between LR, MLR, and KNN methodologies, with LR appearing to perform the best. All three machine learning models outperformed the population average by a statistically significant amount over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and dose to the bladder, rectum, CTV, and PTV. When comparing the weights directly, the LR model predicted bladder and rectum weights that had, on average, a 73% and 74% relative improvement over the population average weights, respectively. The treatment plans resulting from the LR weights had, on average, a rectum V70Gy that was 35% closer to the clinical plan and a bladder V70Gy that was 29% closer, compared to the population average weights. Similar results were observed for all other clinical metrics. Conclusions: The authors demonstrated that the KNN and MLR
A stepwise model to predict monthly streamflow
Mahmood Al-Juboori, Anas; Guven, Aytac
2016-12-01
In this study, a stepwise model empowered with genetic programming is developed to predict the monthly flows of Hurman River in Turkey and Diyalah and Lesser Zab Rivers in Iraq. The model divides the monthly flow data to twelve intervals representing the number of months in a year. The flow of a month, t is considered as a function of the antecedent month's flow (t - 1) and it is predicted by multiplying the antecedent monthly flow by a constant value called K. The optimum value of K is obtained by a stepwise procedure which employs Gene Expression Programming (GEP) and Nonlinear Generalized Reduced Gradient Optimization (NGRGO) as alternative to traditional nonlinear regression technique. The degree of determination and root mean squared error are used to evaluate the performance of the proposed models. The results of the proposed model are compared with the conventional Markovian and Auto Regressive Integrated Moving Average (ARIMA) models based on observed monthly flow data. The comparison results based on five different statistic measures show that the proposed stepwise model performed better than Markovian model and ARIMA model. The R2 values of the proposed model range between 0.81 and 0.92 for the three rivers in this study.
Energy Technology Data Exchange (ETDEWEB)
Bard, D.; Chang, C.; Kahn, S. M.; Gilmore, K.; Marshall, S. [KIPAC, Stanford University, 452 Lomita Mall, Stanford, CA 94309 (United States); Kratochvil, J. M.; Huffenberger, K. M. [Department of Physics, University of Miami, Coral Gables, FL 33124 (United States); May, M. [Physics Department, Brookhaven National Laboratory, Upton, NY 11973 (United States); AlSayyad, Y.; Connolly, A.; Gibson, R. R.; Jones, L.; Krughoff, S. [Department of Astronomy, University of Washington, Seattle, WA 98195 (United States); Ahmad, Z.; Bankert, J.; Grace, E.; Hannel, M.; Lorenz, S. [Department of Physics, Purdue University, West Lafayette, IN 47907 (United States); Haiman, Z.; Jernigan, J. G., E-mail: djbard@slac.stanford.edu [Department of Astronomy and Astrophysics, Columbia University, New York, NY 10027 (United States); and others
2013-09-01
We study the effect of galaxy shape measurement errors on predicted cosmological constraints from the statistics of shear peak counts with the Large Synoptic Survey Telescope (LSST). We use the LSST Image Simulator in combination with cosmological N-body simulations to model realistic shear maps for different cosmological models. We include both galaxy shape noise and, for the first time, measurement errors on galaxy shapes. We find that the measurement errors considered have relatively little impact on the constraining power of shear peak counts for LSST.
Statistical physics of pairwise probability models
DEFF Research Database (Denmark)
Roudi, Yasser; Aurell, Erik; Hertz, John
2009-01-01
(dansk abstrakt findes ikke) Statistical models for describing the probability distribution over the states of biological systems are commonly used for dimensional reduction. Among these models, pairwise models are very attractive in part because they can be fit using a reasonable amount of data......: knowledge of the means and correlations between pairs of elements in the system is sufficient. Not surprisingly, then, using pairwise models for studying neural data has been the focus of many studies in recent years. In this paper, we describe how tools from statistical physics can be employed for studying...
Cai, Y.
2017-12-01
Accurately forecasting crop yields has broad implications for economic trading, food production monitoring, and global food security. However, the variation of environmental variables presents challenges to model yields accurately, especially when the lack of highly accurate measurements creates difficulties in creating models that can succeed across space and time. In 2016, we developed a sequence of machine-learning based models forecasting end-of-season corn yields for the US at both the county and national levels. We combined machine learning algorithms in a hierarchical way, and used an understanding of physiological processes in temporal feature selection, to achieve high precision in our intra-season forecasts, including in very anomalous seasons. During the live run, we predicted the national corn yield within 1.40% of the final USDA number as early as August. In the backtesting of the 2000-2015 period, our model predicts national yield within 2.69% of the actual yield on average already by mid-August. At the county level, our model predicts 77% of the variation in final yield using data through the beginning of August and improves to 80% by the beginning of October, with the percentage of counties predicted within 10% of the average yield increasing from 68% to 73%. Further, the lowest errors are in the most significant producing regions, resulting in very high precision national-level forecasts. In addition, we identify the changes of important variables throughout the season, specifically early-season land surface temperature, and mid-season land surface temperature and vegetation index. For the 2017 season, we feed 2016 data to the training set, together with additional geospatial data sources, aiming to make the current model even more precise. We will show how our 2017 US corn yield forecasts converges in time, which factors affect the yield the most, as well as present our plans for 2018 model adjustments.
Kourgialas, Nektarios N; Dokou, Zoi; Karatzas, George P
2015-05-01
The purpose of this study was to create a modeling management tool for the simulation of extreme flow events under current and future climatic conditions. This tool is a combination of different components and can be applied in complex hydrogeological river basins, where frequent flood and drought phenomena occur. The first component is the statistical analysis of the available hydro-meteorological data. Specifically, principal components analysis was performed in order to quantify the importance of the hydro-meteorological parameters that affect the generation of extreme events. The second component is a prediction-forecasting artificial neural network (ANN) model that simulates, accurately and efficiently, river flow on an hourly basis. This model is based on a methodology that attempts to resolve a very difficult problem related to the accurate estimation of extreme flows. For this purpose, the available measurements (5 years of hourly data) were divided in two subsets: one for the dry and one for the wet periods of the hydrological year. This way, two ANNs were created, trained, tested and validated for a complex Mediterranean river basin in Crete, Greece. As part of the second management component a statistical downscaling tool was used for the creation of meteorological data according to the higher and lower emission climate change scenarios A2 and B1. These data are used as input in the ANN for the forecasting of river flow for the next two decades. The final component is the application of a meteorological index on the measured and forecasted precipitation and flow data, in order to assess the severity and duration of extreme events. Copyright © 2015 Elsevier Ltd. All rights reserved.
Energy Technology Data Exchange (ETDEWEB)
McKenna-Lawlor, S.M.P. [National Univ. of Ireland, Maynooth, Co. Kildare (Ireland). Space Technology Ireland; Fry, C.D. [Exploration Physics International, Inc., Huntsville, AL (United States); Dryer, M. [Exploration Physics International, Inc., Huntsville, AL (United States); NOAA Space Environment Center, Boulder, CO (United States); Heynderickx, D. [D-H Consultancy, Leuven (Belgium); Kecskemety, K. [KFKI Research Institute for Particle and Nuclear Physics, Budapest (Hungary); Kudela, K. [Institute of Experimental Physics, Kosice (Slovakia); Balaz, J. [National Univ. of Ireland, Maynooth, Co. Kildare (Ireland). Space Technology Ireland; Institute of Experimental Physics, Kosice (Slovakia)
2012-07-01
The performance of the Hakamada Akasofu-Fry, version 2 (HAFv.2) numerical model, which provides predictions of solar shock arrival times at Earth, was subjected to a statistical study to investigate those solar/interplanetary circumstances under which the model performed well/poorly during key phases (rise/maximum/decay) of solar cycle 23. In addition to analyzing elements of the overall data set (584 selected events) associated with particular cycle phases, subsets were formed such that those events making up a particular sub-set showed common characteristics. The statistical significance of the results obtained using the various sets/subsets was generally very low and these results were not significant as compared with the hit by chance rate (50 %). This implies a low level of confidence in the predictions of the model with no compelling result encouraging its use. However, the data suggested that the success rates of HAFv.2 were higher when the background solar wind speed at the time of shock initiation was relatively fast. Thus, in scenarios where the background solar wind speed is elevated and the calculated success rate significantly exceeds the rate by chance, the forecasts could provide potential value to the customer. With the composite statistics available for solar cycle 23, the calculated success rate at high solar wind speed, although clearly above 50 %, was indicative rather than conclusive. The RMS error estimated for shock arrival times for every cycle phase and for the composite sample was in each case significantly better than would be expected for a random data set. Also, the parameter ''Probability of Detection, yes'' (PODy) which presents the Proportion of Yes observations that were correctly forecast (i.e. the ratio between the shocks correctly predicted and all the shocks observed), yielded values for the rise/maximum/decay phases of the cycle and using the composite sample of 0.85, 0.64, 0.79 and 0.77, respectively. The
Directory of Open Access Journals (Sweden)
S. M. P. McKenna-Lawlor
2012-02-01
Full Text Available The performance of the Hakamada Akasofu-Fry, version 2 (HAFv.2 numerical model, which provides predictions of solar shock arrival times at Earth, was subjected to a statistical study to investigate those solar/interplanetary circumstances under which the model performed well/poorly during key phases (rise/maximum/decay of solar cycle 23. In addition to analyzing elements of the overall data set (584 selected events associated with particular cycle phases, subsets were formed such that those events making up a particular sub-set showed common characteristics. The statistical significance of the results obtained using the various sets/subsets was generally very low and these results were not significant as compared with the hit by chance rate (50%. This implies a low level of confidence in the predictions of the model with no compelling result encouraging its use. However, the data suggested that the success rates of HAFv.2 were higher when the background solar wind speed at the time of shock initiation was relatively fast. Thus, in scenarios where the background solar wind speed is elevated and the calculated success rate significantly exceeds the rate by chance, the forecasts could provide potential value to the customer. With the composite statistics available for solar cycle 23, the calculated success rate at high solar wind speed, although clearly above 50%, was indicative rather than conclusive. The RMS error estimated for shock arrival times for every cycle phase and for the composite sample was in each case significantly better than would be expected for a random data set. Also, the parameter "Probability of Detection, yes" (PODy which presents the Proportion of Yes observations that were correctly forecast (i.e. the ratio between the shocks correctly predicted and all the shocks observed, yielded values for the rise/maximum/decay phases of the cycle and using the composite sample of 0.85, 0.64, 0.79 and 0.77, respectively. The statistical
Optimal model-free prediction from multivariate time series
Runge, Jakob; Donner, Reik V.; Kurths, Jürgen
2015-05-01
Forecasting a time series from multivariate predictors constitutes a challenging problem, especially using model-free approaches. Most techniques, such as nearest-neighbor prediction, quickly suffer from the curse of dimensionality and overfitting for more than a few predictors which has limited their application mostly to the univariate case. Therefore, selection strategies are needed that harness the available information as efficiently as possible. Since often the right combination of predictors matters, ideally all subsets of possible predictors should be tested for their predictive power, but the exponentially growing number of combinations makes such an approach computationally prohibitive. Here a prediction scheme that overcomes this strong limitation is introduced utilizing a causal preselection step which drastically reduces the number of possible predictors to the most predictive set of causal drivers making a globally optimal search scheme tractable. The information-theoretic optimality is derived and practical selection criteria are discussed. As demonstrated for multivariate nonlinear stochastic delay processes, the optimal scheme can even be less computationally expensive than commonly used suboptimal schemes like forward selection. The method suggests a general framework to apply the optimal model-free approach to select variables and subsequently fit a model to further improve a prediction or learn statistical dependencies. The performance of this framework is illustrated on a climatological index of El Niño Southern Oscillation.
Smith, Lauren N; Makam, Anil N; Darden, Douglas; Mayo, Helen; Das, Sandeep R; Halm, Ethan A; Nguyen, Oanh Kieu
2018-01-01
Hospitals are subject to federal financial penalties for excessive 30-day hospital readmissions for acute myocardial infarction (AMI). Prospectively identifying patients hospitalized with AMI at high risk for readmission could help prevent 30-day readmissions by enabling targeted interventions. However, the performance of AMI-specific readmission risk prediction models is unknown. We systematically searched the published literature through March 2017 for studies of risk prediction models for 30-day hospital readmission among adults with AMI. We identified 11 studies of 18 unique risk prediction models across diverse settings primarily in the United States, of which 16 models were specific to AMI. The median overall observed all-cause 30-day readmission rate across studies was 16.3% (range, 10.6%-21.0%). Six models were based on administrative data; 4 on electronic health record data; 3 on clinical hospital data; and 5 on cardiac registry data. Models included 7 to 37 predictors, of which demographics, comorbidities, and utilization metrics were the most frequently included domains. Most models, including the Centers for Medicare and Medicaid Services AMI administrative model, had modest discrimination (median C statistic, 0.65; range, 0.53-0.79). Of the 16 reported AMI-specific models, only 8 models were assessed in a validation cohort, limiting generalizability. Observed risk-stratified readmission rates ranged from 3.0% among the lowest-risk individuals to 43.0% among the highest-risk individuals, suggesting good risk stratification across all models. Current AMI-specific readmission risk prediction models have modest predictive ability and uncertain generalizability given methodological limitations. No existing models provide actionable information in real time to enable early identification and risk-stratification of patients with AMI before hospital discharge, a functionality needed to optimize the potential effectiveness of readmission reduction interventions
Najaf, Pooya; Duddu, Venkata R; Pulugurtha, Srinivas S
2018-03-01
Machine learning (ML) techniques have higher prediction accuracy compared to conventional statistical methods for crash frequency modelling. However, their black-box nature limits the interpretability. The objective of this research is to combine both ML and statistical methods to develop hybrid link-level crash frequency models with high predictability and interpretability. For this purpose, M5' model trees method (M5') is introduced and applied to classify the crash data and then calibrate a model for each homogenous class. The data for 1134 and 345 randomly selected links on urban arterials in the city of Charlotte, North Carolina was used to develop and validate models, respectively. The outputs from the hybrid approach are compared with the outputs from cluster-based negative binomial regression (NBR) and general NBR models. Findings indicate that M5' has high predictability and is very reliable to interpret the role of different attributes on crash frequency compared to other developed models.
Ambler, Graeme K; Gohel, Manjit S; Mitchell, David C; Loftus, Ian M; Boyle, Jonathan R
2015-01-01
Accurate adjustment of surgical outcome data for risk is vital in an era of surgeon-level reporting. Current risk prediction models for abdominal aortic aneurysm (AAA) repair are suboptimal. We aimed to develop a reliable risk model for in-hospital mortality after intervention for AAA, using rigorous contemporary statistical techniques to handle missing data. Using data collected during a 15-month period in the United Kingdom National Vascular Database, we applied multiple imputation methodology together with stepwise model selection to generate preoperative and perioperative models of in-hospital mortality after AAA repair, using two thirds of the available data. Model performance was then assessed on the remaining third of the data by receiver operating characteristic curve analysis and compared with existing risk prediction models. Model calibration was assessed by Hosmer-Lemeshow analysis. A total of 8088 AAA repair operations were recorded in the National Vascular Database during the study period, of which 5870 (72.6%) were elective procedures. Both preoperative and perioperative models showed excellent discrimination, with areas under the receiver operating characteristic curve of .89 and .92, respectively. This was significantly better than any of the existing models (area under the receiver operating characteristic curve for best comparator model, .84 and .88; P AAA repair. These models were carefully developed with rigorous statistical methodology and significantly outperform existing methods for both elective cases and overall AAA mortality. These models will be invaluable for both preoperative patient counseling and accurate risk adjustment of published outcome data. Copyright © 2015 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.
A generic statistical methodology to predict the maximum pit depth of a localized corrosion process
International Nuclear Information System (INIS)
Jarrah, A.; Bigerelle, M.; Guillemot, G.; Najjar, D.; Iost, A.; Nianga, J.-M.
2011-01-01
Highlights: → We propose a methodology to predict the maximum pit depth in a corrosion process. → Generalized Lambda Distribution and the Computer Based Bootstrap Method are combined. → GLD fit a large variety of distributions both in their central and tail regions. → Minimum thickness preventing perforation can be estimated with a safety margin. → Considering its applications, this new approach can help to size industrial pieces. - Abstract: This paper outlines a new methodology to predict accurately the maximum pit depth related to a localized corrosion process. It combines two statistical methods: the Generalized Lambda Distribution (GLD), to determine a model of distribution fitting with the experimental frequency distribution of depths, and the Computer Based Bootstrap Method (CBBM), to generate simulated distributions equivalent to the experimental one. In comparison with conventionally established statistical methods that are restricted to the use of inferred distributions constrained by specific mathematical assumptions, the major advantage of the methodology presented in this paper is that both the GLD and the CBBM enable a statistical treatment of the experimental data without making any preconceived choice neither on the unknown theoretical parent underlying distribution of pit depth which characterizes the global corrosion phenomenon nor on the unknown associated theoretical extreme value distribution which characterizes the deepest pits. Considering an experimental distribution of depths of pits produced on an aluminium sample, estimations of maximum pit depth using a GLD model are compared to similar estimations based on usual Gumbel and Generalized Extreme Value (GEV) methods proposed in the corrosion engineering literature. The GLD approach is shown having smaller bias and dispersion in the estimation of the maximum pit depth than the Gumbel approach both for its realization and mean. This leads to comparing the GLD approach to the GEV one
Overview of statistical methods, models and analysis for predicting equipment end of life
Energy Technology Data Exchange (ETDEWEB)
NONE
2009-07-01
Utility equipment can be operated and maintained for many years following installation. However, as the equipment ages, utility operators must decide whether to extend the service life or replace the equipment. Condition assessment modelling is used by many utilities to determine the condition of equipment and to prioritize the maintenance or repair. Several factors are weighted and combined in assessment modelling, which gives a single index number to rate the equipment. There is speculation that this index alone may not be adequate for a business case to rework or replace an asset because it only ranks an asset into a particular category. For that reason, a new methodology was developed to determine the economic end of life of an asset. This paper described the different statistical methods available and their use in determining the remaining service life of electrical equipment. A newly developed Excel-based demonstration computer tool is also an integral part of the deliverables of this project.
Uncertainty the soul of modeling, probability & statistics
Briggs, William
2016-01-01
This book presents a philosophical approach to probability and probabilistic thinking, considering the underpinnings of probabilistic reasoning and modeling, which effectively underlie everything in data science. The ultimate goal is to call into question many standard tenets and lay the philosophical and probabilistic groundwork and infrastructure for statistical modeling. It is the first book devoted to the philosophy of data aimed at working scientists and calls for a new consideration in the practice of probability and statistics to eliminate what has been referred to as the "Cult of Statistical Significance". The book explains the philosophy of these ideas and not the mathematics, though there are a handful of mathematical examples. The topics are logically laid out, starting with basic philosophy as related to probability, statistics, and science, and stepping through the key probabilistic ideas and concepts, and ending with statistical models. Its jargon-free approach asserts that standard methods, suc...
Use of predictive models and rapid methods to nowcast bacteria levels at coastal beaches
Francy, Donna S.
2009-01-01
The need for rapid assessments of recreational water quality to better protect public health is well accepted throughout the research and regulatory communities. Rapid analytical methods, such as quantitative polymerase chain reaction (qPCR) and immunomagnetic separation/adenosine triphosphate (ATP) analysis, are being tested but are not yet ready for widespread use.Another solution is the use of predictive models, wherein variable(s) that are easily and quickly measured are surrogates for concentrations of fecal-indicator bacteria. Rainfall-based alerts, the simplest type of model, have been used by several communities for a number of years. Deterministic models use mathematical representations of the processes that affect bacteria concentrations; this type of model is being used for beach-closure decisions at one location in the USA. Multivariable statistical models are being developed and tested in many areas of the USA; however, they are only used in three areas of the Great Lakes to aid in notifications of beach advisories or closings. These “operational” statistical models can result in more accurate assessments of recreational water quality than use of the previous day's Escherichia coli (E. coli)concentration as determined by traditional culture methods. The Ohio Nowcast, at Huntington Beach, Bay Village, Ohio, is described in this paper as an example of an operational statistical model. Because predictive modeling is a dynamic process, water-resource managers continue to collect additional data to improve the predictive ability of the nowcast and expand the nowcast to other Ohio beaches and a recreational river. Although predictive models have been shown to work well at some beaches and are becoming more widely accepted, implementation in many areas is limited by funding, lack of coordinated technical leadership, and lack of supporting epidemiological data.
Complex data modeling and computationally intensive methods for estimation and prediction
Secchi, Piercesare; Advances in Complex Data Modeling and Computational Methods in Statistics
2015-01-01
The book is addressed to statisticians working at the forefront of the statistical analysis of complex and high dimensional data and offers a wide variety of statistical models, computer intensive methods and applications: network inference from the analysis of high dimensional data; new developments for bootstrapping complex data; regression analysis for measuring the downsize reputational risk; statistical methods for research on the human genome dynamics; inference in non-euclidean settings and for shape data; Bayesian methods for reliability and the analysis of complex data; methodological issues in using administrative data for clinical and epidemiological research; regression models with differential regularization; geostatistical methods for mobility analysis through mobile phone data exploration. This volume is the result of a careful selection among the contributions presented at the conference "S.Co.2013: Complex data modeling and computationally intensive methods for estimation and prediction" held...
TRAN-STAT: statistics for environmental studies
International Nuclear Information System (INIS)
Gilbert, R.O.
1984-09-01
This issue of TRAN-STAT discusses statistical methods for assessing the uncertainty in predictions of pollutant transport models, particularly for radionuclides. Emphasis is placed on radionuclide transport models but the statistical assessment techniques also apply in general to other types of pollutants. The report begins with an outline of why an assessment of prediction uncertainties is important. This is followed by an introduction to several methods currently used in these assessments. This in turn is followed by more detailed discussion of the methods, including examples. 43 references, 2 figures
A real-time prediction model for post-irradiation malignant cervical lymph nodes.
Lo, W-C; Cheng, P-W; Shueng, P-W; Hsieh, C-H; Chang, Y-L; Liao, L-J
2018-04-01
To establish a real-time predictive scoring model based on sonographic characteristics for identifying malignant cervical lymph nodes (LNs) in cancer patients after neck irradiation. One-hundred forty-four irradiation-treated patients underwent ultrasonography and ultrasound-guided fine-needle aspirations (USgFNAs), and the resultant data were used to construct a real-time and computerised predictive scoring model. This scoring system was further compared with our previously proposed prediction model. A predictive scoring model, 1.35 × (L axis) + 2.03 × (S axis) + 2.27 × (margin) + 1.48 × (echogenic hilum) + 3.7, was generated by stepwise multivariate logistic regression analysis. Neck LNs were considered to be malignant when the score was ≥ 7, corresponding to a sensitivity of 85.5%, specificity of 79.4%, positive predictive value (PPV) of 82.3%, negative predictive value (NPV) of 83.1%, and overall accuracy of 82.6%. When this new model and the original model were compared, the areas under the receiver operating characteristic curve (c-statistic) were 0.89 and 0.81, respectively (P real-time sonographic predictive scoring model was constructed to provide prompt and reliable guidance for USgFNA biopsies to manage cervical LNs after neck irradiation. © 2017 John Wiley & Sons Ltd.
Statistical Analysis of a Method to Predict Drug-Polymer Miscibility
DEFF Research Database (Denmark)
Knopp, Matthias Manne; Olesen, Niels Erik; Huang, Yanbin
2016-01-01
In this study, a method proposed to predict drug-polymer miscibility from differential scanning calorimetry measurements was subjected to statistical analysis. The method is relatively fast and inexpensive and has gained popularity as a result of the increasing interest in the formulation of drug...... as provided in this study. © 2015 Wiley Periodicals, Inc. and the American Pharmacists Association J Pharm Sci....
Spherical Process Models for Global Spatial Statistics
Jeong, Jaehong
2017-11-28
Statistical models used in geophysical, environmental, and climate science applications must reflect the curvature of the spatial domain in global data. Over the past few decades, statisticians have developed covariance models that capture the spatial and temporal behavior of these global data sets. Though the geodesic distance is the most natural metric for measuring distance on the surface of a sphere, mathematical limitations have compelled statisticians to use the chordal distance to compute the covariance matrix in many applications instead, which may cause physically unrealistic distortions. Therefore, covariance functions directly defined on a sphere using the geodesic distance are needed. We discuss the issues that arise when dealing with spherical data sets on a global scale and provide references to recent literature. We review the current approaches to building process models on spheres, including the differential operator, the stochastic partial differential equation, the kernel convolution, and the deformation approaches. We illustrate realizations obtained from Gaussian processes with different covariance structures and the use of isotropic and nonstationary covariance models through deformations and geographical indicators for global surface temperature data. To assess the suitability of each method, we compare their log-likelihood values and prediction scores, and we end with a discussion of related research problems.
Statistical Models for Social Networks
Snijders, Tom A. B.; Cook, KS; Massey, DS
2011-01-01
Statistical models for social networks as dependent variables must represent the typical network dependencies between tie variables such as reciprocity, homophily, transitivity, etc. This review first treats models for single (cross-sectionally observed) networks and then for network dynamics. For
Walter, Donald A.; Starn, J. Jeffrey
2013-01-01
Statistical models of nitrate occurrence in the glacial aquifer system of the northern United States, developed by the U.S. Geological Survey, use observed relations between nitrate concentrations and sets of explanatory variables—representing well-construction, environmental, and source characteristics— to predict the probability that nitrate, as nitrogen, will exceed a threshold concentration. However, the models do not explicitly account for the processes that control the transport of nitrogen from surface sources to a pumped well and use area-weighted mean spatial variables computed from within a circular buffer around the well as a simplified source-area conceptualization. The use of models that explicitly represent physical-transport processes can inform and, potentially, improve these statistical models. Specifically, groundwater-flow models simulate advective transport—predominant in many surficial aquifers— and can contribute to the refinement of the statistical models by (1) providing for improved, physically based representations of a source area to a well, and (2) allowing for more detailed estimates of environmental variables. A source area to a well, known as a contributing recharge area, represents the area at the water table that contributes recharge to a pumped well; a well pumped at a volumetric rate equal to the amount of recharge through a circular buffer will result in a contributing recharge area that is the same size as the buffer but has a shape that is a function of the hydrologic setting. These volume-equivalent contributing recharge areas will approximate circular buffers in areas of relatively flat hydraulic gradients, such as near groundwater divides, but in areas with steep hydraulic gradients will be elongated in the upgradient direction and agree less with the corresponding circular buffers. The degree to which process-model-estimated contributing recharge areas, which simulate advective transport and therefore account for
Dong, Jian-Jun; Li, Qing-Liang; Yin, Hua; Zhong, Cheng; Hao, Jun-Guang; Yang, Pan-Fei; Tian, Yu-Hong; Jia, Shi-Ru
2014-10-15
Sensory evaluation is regarded as a necessary procedure to ensure a reproducible quality of beer. Meanwhile, high-throughput analytical methods provide a powerful tool to analyse various flavour compounds, such as higher alcohol and ester. In this study, the relationship between flavour compounds and sensory evaluation was established by non-linear models such as partial least squares (PLS), genetic algorithm back-propagation neural network (GA-BP), support vector machine (SVM). It was shown that SVM with a Radial Basis Function (RBF) had a better performance of prediction accuracy for both calibration set (94.3%) and validation set (96.2%) than other models. Relatively lower prediction abilities were observed for GA-BP (52.1%) and PLS (31.7%). In addition, the kernel function of SVM played an essential role of model training when the prediction accuracy of SVM with polynomial kernel function was 32.9%. As a powerful multivariate statistics method, SVM holds great potential to assess beer quality. Copyright © 2014 Elsevier Ltd. All rights reserved.
International Nuclear Information System (INIS)
Lü, Xiaoshu; Lu, Tao; Kibert, Charles J.; Viljanen, Martti
2015-01-01
Highlights: • This paper presents a new modeling method to forecast energy demands. • The model is based on physical–statistical approach to improving forecast accuracy. • A new method is proposed to address the heterogeneity challenge. • Comparison with measurements shows accurate forecasts of the model. • The first physical–statistical/heterogeneous building energy modeling approach is proposed and validated. - Abstract: Energy consumption forecasting is a critical and necessary input to planning and controlling energy usage in the building sector which accounts for 40% of the world’s energy use and the world’s greatest fraction of greenhouse gas emissions. However, due to the diversity and complexity of buildings as well as the random nature of weather conditions, energy consumption and loads are stochastic and difficult to predict. This paper presents a new methodology for energy demand forecasting that addresses the heterogeneity challenges in energy modeling of buildings. The new method is based on a physical–statistical approach designed to account for building heterogeneity to improve forecast accuracy. The physical model provides a theoretical input to characterize the underlying physical mechanism of energy flows. Then stochastic parameters are introduced into the physical model and the statistical time series model is formulated to reflect model uncertainties and individual heterogeneity in buildings. A new method of model generalization based on a convex hull technique is further derived to parameterize the individual-level model parameters for consistent model coefficients while maintaining satisfactory modeling accuracy for heterogeneous buildings. The proposed method and its validation are presented in detail for four different sports buildings with field measurements. The results show that the proposed methodology and model can provide a considerable improvement in forecasting accuracy
Hierarchical spatial models for predicting pygmy rabbit distribution and relative abundance
Wilson, T.L.; Odei, J.B.; Hooten, M.B.; Edwards, T.C.
2010-01-01
Conservationists routinely use species distribution models to plan conservation, restoration and development actions, while ecologists use them to infer process from pattern. These models tend to work well for common or easily observable species, but are of limited utility for rare and cryptic species. This may be because honest accounting of known observation bias and spatial autocorrelation are rarely included, thereby limiting statistical inference of resulting distribution maps. We specified and implemented a spatially explicit Bayesian hierarchical model for a cryptic mammal species (pygmy rabbit Brachylagus idahoensis). Our approach used two levels of indirect sign that are naturally hierarchical (burrows and faecal pellets) to build a model that allows for inference on regression coefficients as well as spatially explicit model parameters. We also produced maps of rabbit distribution (occupied burrows) and relative abundance (number of burrows expected to be occupied by pygmy rabbits). The model demonstrated statistically rigorous spatial prediction by including spatial autocorrelation and measurement uncertainty. We demonstrated flexibility of our modelling framework by depicting probabilistic distribution predictions using different assumptions of pygmy rabbit habitat requirements. Spatial representations of the variance of posterior predictive distributions were obtained to evaluate heterogeneity in model fit across the spatial domain. Leave-one-out cross-validation was conducted to evaluate the overall model fit. Synthesis and applications. Our method draws on the strengths of previous work, thereby bridging and extending two active areas of ecological research: species distribution models and multi-state occupancy modelling. Our framework can be extended to encompass both larger extents and other species for which direct estimation of abundance is difficult. ?? 2010 The Authors. Journal compilation ?? 2010 British Ecological Society.
Energy Technology Data Exchange (ETDEWEB)
Mishra, Srikanta; Schuetter, Jared
2014-11-01
We compare two approaches for building a statistical proxy model (metamodel) for CO₂ geologic sequestration from the results of full-physics compositional simulations. The first approach involves a classical Box-Behnken or Augmented Pairs experimental design with a quadratic polynomial response surface. The second approach used a space-filling maxmin Latin Hypercube sampling or maximum entropy design with the choice of five different meta-modeling techniques: quadratic polynomial, kriging with constant and quadratic trend terms, multivariate adaptive regression spline (MARS) and additivity and variance stabilization (AVAS). Simulations results for CO₂ injection into a reservoir-caprock system with 9 design variables (and 97 samples) were used to generate the data for developing the proxy models. The fitted models were validated with using an independent data set and a cross-validation approach for three different performance metrics: total storage efficiency, CO₂ plume radius and average reservoir pressure. The Box-Behnken–quadratic polynomial metamodel performed the best, followed closely by the maximin LHS–kriging metamodel.
Swarm Intelligence-Based Hybrid Models for Short-Term Power Load Prediction
Directory of Open Access Journals (Sweden)
Jianzhou Wang
2014-01-01
Full Text Available Swarm intelligence (SI is widely and successfully applied in the engineering field to solve practical optimization problems because various hybrid models, which are based on the SI algorithm and statistical models, are developed to further improve the predictive abilities. In this paper, hybrid intelligent forecasting models based on the cuckoo search (CS as well as the singular spectrum analysis (SSA, time series, and machine learning methods are proposed to conduct short-term power load prediction. The forecasting performance of the proposed models is augmented by a rolling multistep strategy over the prediction horizon. The test results are representative of the out-performance of the SSA and CS in tuning the seasonal autoregressive integrated moving average (SARIMA and support vector regression (SVR in improving load forecasting, which indicates that both the SSA-based data denoising and SI-based intelligent optimization strategy can effectively improve the model’s predictive performance. Additionally, the proposed CS-SSA-SARIMA and CS-SSA-SVR models provide very impressive forecasting results, demonstrating their strong robustness and universal forecasting capacities in terms of short-term power load prediction 24 hours in advance.
Functional summary statistics for the Johnson-Mehl model
DEFF Research Database (Denmark)
Møller, Jesper; Ghorbani, Mohammad
The Johnson-Mehl germination-growth model is a spatio-temporal point process model which among other things have been used for the description of neurotransmitters datasets. However, for such datasets parametric Johnson-Mehl models fitted by maximum likelihood have yet not been evaluated by means...... of functional summary statistics. This paper therefore invents four functional summary statistics adapted to the Johnson-Mehl model, with two of them based on the second-order properties and the other two on the nuclei-boundary distances for the associated Johnson-Mehl tessellation. The functional summary...... statistics theoretical properties are investigated, non-parametric estimators are suggested, and their usefulness for model checking is examined in a simulation study. The functional summary statistics are also used for checking fitted parametric Johnson-Mehl models for a neurotransmitters dataset....
Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P
1999-01-01
Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview over some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149
Sharma, Sanjib; Siddique, Ridwan; Reed, Seann; Ahnert, Peter; Mendoza, Pablo; Mejia, Alfonso
2018-03-01
The relative roles of statistical weather preprocessing and streamflow postprocessing in hydrological ensemble forecasting at short- to medium-range forecast lead times (day 1-7) are investigated. For this purpose, a regional hydrologic ensemble prediction system (RHEPS) is developed and implemented. The RHEPS is comprised of the following components: (i) hydrometeorological observations (multisensor precipitation estimates, gridded surface temperature, and gauged streamflow); (ii) weather ensemble forecasts (precipitation and near-surface temperature) from the National Centers for Environmental Prediction 11-member Global Ensemble Forecast System Reforecast version 2 (GEFSRv2); (iii) NOAA's Hydrology Laboratory-Research Distributed Hydrologic Model (HL-RDHM); (iv) heteroscedastic censored logistic regression (HCLR) as the statistical preprocessor; (v) two statistical postprocessors, an autoregressive model with a single exogenous variable (ARX(1,1)) and quantile regression (QR); and (vi) a comprehensive verification strategy. To implement the RHEPS, 1 to 7 days weather forecasts from the GEFSRv2 are used to force HL-RDHM and generate raw ensemble streamflow forecasts. Forecasting experiments are conducted in four nested basins in the US Middle Atlantic region, ranging in size from 381 to 12 362 km2. Results show that the HCLR preprocessed ensemble precipitation forecasts have greater skill than the raw forecasts. These improvements are more noticeable in the warm season at the longer lead times (> 3 days). Both postprocessors, ARX(1,1) and QR, show gains in skill relative to the raw ensemble streamflow forecasts, particularly in the cool season, but QR outperforms ARX(1,1). The scenarios that implement preprocessing and postprocessing separately tend to perform similarly, although the postprocessing-alone scenario is often more effective. The scenario involving both preprocessing and postprocessing consistently outperforms the other scenarios. In some cases
Assessing the performance of prediction models: a framework for traditional and novel measures
DEFF Research Database (Denmark)
Steyerberg, Ewout W; Vickers, Andrew J; Cook, Nancy R
2010-01-01
The performance of prediction models can be assessed using a variety of methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the receiver...
Assessing the performance of prediction models: A framework for traditional and novel measures
E.W. Steyerberg (Ewout); A.J. Vickers (Andrew); N.R. Cook (Nancy); T.A. Gerds (Thomas); M. Gonen (Mithat); N. Obuchowski (Nancy); M. Pencina (Michael); M.W. Kattan (Michael)
2010-01-01
textabstractThe performance of prediction models can be assessed using a variety of methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the
An updated prediction model of the global risk of cardiovascular disease in HIV-positive persons
DEFF Research Database (Denmark)
Friis-Møller, Nina; Ryom, Lene; Smith, Colette
2016-01-01
,663 HIV-positive persons from 20 countries in Europe and Australia, who were free of CVD at entry into the Data-collection on Adverse Effects of Anti-HIV Drugs (D:A:D) study. Cox regression models (full and reduced) were developed that predict the risk of a global CVD endpoint. The predictive performance...... significantly predicted risk more accurately than the recalibrated Framingham model (Harrell's c-statistic of 0.791, 0.783 and 0.766 for the D:A:D full, D:A:D reduced, and Framingham models respectively; p models also more accurately predicted five-year CVD-risk for key prognostic subgroups...... to quantify risk and to guide preventive care....
Glass viscosity calculation based on a global statistical modelling approach
Energy Technology Data Exchange (ETDEWEB)
Fluegel, Alex
2007-02-01
A global statistical glass viscosity model was developed for predicting the complete viscosity curve, based on more than 2200 composition-property data of silicate glasses from the scientific literature, including soda-lime-silica container and float glasses, TV panel glasses, borosilicate fiber wool and E type glasses, low expansion borosilicate glasses, glasses for nuclear waste vitrification, lead crystal glasses, binary alkali silicates, and various further compositions from over half a century. It is shown that within a measurement series from a specific laboratory the reported viscosity values are often over-estimated at higher temperatures due to alkali and boron oxide evaporation during the measurement and glass preparation, including data by Lakatos et al. (1972) and the recently published High temperature glass melt property database for process modeling by Seward et al. (2005). Similarly, in the glass transition range many experimental data of borosilicate glasses are reported too high due to phase separation effects. The developed global model corrects those errors. The model standard error was 9-17°C, with R^2 = 0.985-0.989. The prediction 95% confidence interval for glass in mass production largely depends on the glass composition of interest, the composition uncertainty, and the viscosity level. New insights in the mixed-alkali effect are provided.
Hein, H.; Mai, S.; Mayer, B.; Pohlmann, T.; Barjenbruch, U.
2012-04-01
The interactions of tides, external surges, storm surges and waves with an additional role of the coastal bathymetry define the probability of extreme water levels at the coast. Probabilistic analysis and also process based numerical models allow the estimation of future states. From the physical point of view both, deterministic processes and stochastic residuals are the fundamentals of high water statistics. This study uses a so called model chain to reproduce historic statistics of tidal high water levels (Thw) as well as the prediction of future statistics high water levels. The results of the numerical models are post-processed by a stochastic analysis. Recent studies show, that for future extrapolation of extreme Thw nonstationary parametric approaches are required. With the presented methods a better prediction of time depended parameter sets seems possible. The investigation region of this study is the southern German Bright. The model-chain is the representation of a downscaling process, which starts with an emissions scenario. Regional atmospheric and ocean models refine the results of global climate models. The concept of downscaling was chosen to resolve coastal topography sufficiently. The North Sea and estuaries are modeled with the three-dimensional model HAMburg Shelf Ocean Model. The running time includes 150 years (1950 - 2100). Results of four different hindcast runs and also of one future prediction run are validated. Based on multi-scale analysis and the theory of entropy we analyze whether any significant periodicities are represented numerically. Results show that also hindcasting the climate of Thw with a model chain for the last 60 years is a challenging task. For example, an additional modeling activity must be the inclusion of tides into regional climate ocean models. It is found that the statistics of climate variables derived from model results differs from the statistics derived from measurements. E.g. there are considerable shifts in
Monroy, Claire D; Gerson, Sarah A; Hunnius, Sabine
2018-05-01
Humans are sensitive to the statistical regularities in action sequences carried out by others. In the present eyetracking study, we investigated whether this sensitivity can support the prediction of upcoming actions when observing unfamiliar action sequences. In two between-subjects conditions, we examined whether observers would be more sensitive to statistical regularities in sequences performed by a human agent versus self-propelled 'ghost' events. Secondly, we investigated whether regularities are learned better when they are associated with contingent effects. Both implicit and explicit measures of learning were compared between agent and ghost conditions. Implicit learning was measured via predictive eye movements to upcoming actions or events, and explicit learning was measured via both uninstructed reproduction of the action sequences and verbal reports of the regularities. The findings revealed that participants, regardless of condition, readily learned the regularities and made correct predictive eye movements to upcoming events during online observation. However, different patterns of explicit-learning outcomes emerged following observation: Participants were most likely to re-create the sequence regularities and to verbally report them when they had observed an actor create a contingent effect. These results suggest that the shift from implicit predictions to explicit knowledge of what has been learned is facilitated when observers perceive another agent's actions and when these actions cause effects. These findings are discussed with respect to the potential role of the motor system in modulating how statistical regularities are learned and used to modify behavior.
Distributions with given marginals and statistical modelling
Fortiana, Josep; Rodriguez-Lallena, José
2002-01-01
This book contains a selection of the papers presented at the meeting `Distributions with given marginals and statistical modelling', held in Barcelona (Spain), July 17-20, 2000. In 24 chapters, this book covers topics such as the theory of copulas and quasi-copulas, the theory and compatibility of distributions, models for survival distributions and other well-known distributions, time series, categorical models, definition and estimation of measures of dependence, monotonicity and stochastic ordering, shape and separability of distributions, hidden truncation models, diagonal families, orthogonal expansions, tests of independence, and goodness of fit assessment. These topics share the use and properties of distributions with given marginals, this being the fourth specialised text on this theme. The innovative aspect of the book is the inclusion of statistical aspects such as modelling, Bayesian statistics, estimation, and tests.
Ground Motion Prediction Models for Caucasus Region
Jorjiashvili, Nato; Godoladze, Tea; Tvaradze, Nino; Tumanova, Nino
2016-04-01
Ground motion prediction models (GMPMs) relate ground motion intensity measures to variables describing earthquake source, path, and site effects. Estimation of expected ground motion is a fundamental earthquake hazard assessment. The most commonly used parameter for attenuation relation is peak ground acceleration or spectral acceleration because this parameter gives useful information for Seismic Hazard Assessment. Since 2003 development of Georgian Digital Seismic Network has started. In this study new GMP models are obtained based on new data from Georgian seismic network and also from neighboring countries. Estimation of models is obtained by classical, statistical way, regression analysis. In this study site ground conditions are additionally considered because the same earthquake recorded at the same distance may cause different damage according to ground conditions. Empirical ground-motion prediction models (GMPMs) require adjustment to make them appropriate for site-specific scenarios. However, the process of making such adjustments remains a challenge. This work presents a holistic framework for the development of a peak ground acceleration (PGA) or spectral acceleration (SA) GMPE that is easily adjustable to different seismological conditions and does not suffer from the practical problems associated with adjustments in the response spectral domain.
Wang, Yunzhi; Qiu, Yuchen; Thai, Theresa; More, Kathleen; Ding, Kai; Liu, Hong; Zheng, Bin
2016-03-01
How to rationally identify epithelial ovarian cancer (EOC) patients who will benefit from bevacizumab or other antiangiogenic therapies is a critical issue in EOC treatments. The motivation of this study is to quantitatively measure adiposity features from CT images and investigate the feasibility of predicting potential benefit of EOC patients with or without receiving bevacizumab-based chemotherapy treatment using multivariate statistical models built based on quantitative adiposity image features. A dataset involving CT images from 59 advanced EOC patients were included. Among them, 32 patients received maintenance bevacizumab after primary chemotherapy and the remaining 27 patients did not. We developed a computer-aided detection (CAD) scheme to automatically segment subcutaneous fat areas (VFA) and visceral fat areas (SFA) and then extracted 7 adiposity-related quantitative features. Three multivariate data analysis models (linear regression, logistic regression and Cox proportional hazards regression) were performed respectively to investigate the potential association between the model-generated prediction results and the patients' progression-free survival (PFS) and overall survival (OS). The results show that using all 3 statistical models, a statistically significant association was detected between the model-generated results and both of the two clinical outcomes in the group of patients receiving maintenance bevacizumab (p<0.01), while there were no significant association for both PFS and OS in the group of patients without receiving maintenance bevacizumab. Therefore, this study demonstrated the feasibility of using quantitative adiposity-related CT image features based statistical prediction models to generate a new clinical marker and predict the clinical outcome of EOC patients receiving maintenance bevacizumab-based chemotherapy.
Comparison of tree types of models for the prediction of final academic achievement
Directory of Open Access Journals (Sweden)
Silvana Gasar
2002-12-01
Full Text Available For efficient prevention of inappropriate secondary school choices and by that academic failure, school counselors need a tool for the prediction of individual pupil's final academic achievements. Using data mining techniques on pupils' data base and expert modeling, we developed several models for the prediction of final academic achievement in an individual high school educational program. For data mining, we used statistical analyses, clustering and two machine learning methods: developing classification decision trees and hierarchical decision models. Using an expert system shell DEX, an expert system, based on a hierarchical multi-attribute decision model, was developed manually. All the models were validated and evaluated from the viewpoint of their applicability. The predictive accuracy of DEX models and decision trees was equal and very satisfying, as it reached the predictive accuracy of an experienced counselor. With respect on the efficiency and difficulties in developing models, and relatively rapid changing of our education system, we propose that decision trees are used in further development of predictive models.
Pathak, Lakshmi; Singh, Vineeta; Niwas, Ram; Osama, Khwaja; Khan, Saif; Haque, Shafiul; Tripathi, C K M; Mishra, B N
2015-01-01
Cholesterol oxidase (COD) is a bi-functional FAD-containing oxidoreductase which catalyzes the oxidation of cholesterol into 4-cholesten-3-one. The wider biological functions and clinical applications of COD have urged the screening, isolation and characterization of newer microbes from diverse habitats as a source of COD and optimization and over-production of COD for various uses. The practicability of statistical/ artificial intelligence techniques, such as response surface methodology (RSM), artificial neural network (ANN) and genetic algorithm (GA) have been tested to optimize the medium composition for the production of COD from novel strain Streptomyces sp. NCIM 5500. All experiments were performed according to the five factor central composite design (CCD) and the generated data was analysed using RSM and ANN. GA was employed to optimize the models generated by RSM and ANN. Based upon the predicted COD concentration, the model developed with ANN was found to be superior to the model developed with RSM. The RSM-GA approach predicted maximum of 6.283 U/mL COD production, whereas the ANN-GA approach predicted a maximum of 9.93 U/mL COD concentration. The optimum concentrations of the medium variables predicted through ANN-GA approach were: 1.431 g/50 mL soybean, 1.389 g/50 mL maltose, 0.029 g/50 mL MgSO4, 0.45 g/50 mL NaCl and 2.235 ml/50 mL glycerol. The experimental COD concentration was concurrent with the GA predicted yield and led to 9.75 U/mL COD production, which was nearly two times higher than the yield (4.2 U/mL) obtained with the un-optimized medium. This is the very first time we are reporting the statistical versus artificial intelligence based modeling and optimization of COD production by Streptomyces sp. NCIM 5500.
Directory of Open Access Journals (Sweden)
Lakshmi Pathak
Full Text Available Cholesterol oxidase (COD is a bi-functional FAD-containing oxidoreductase which catalyzes the oxidation of cholesterol into 4-cholesten-3-one. The wider biological functions and clinical applications of COD have urged the screening, isolation and characterization of newer microbes from diverse habitats as a source of COD and optimization and over-production of COD for various uses. The practicability of statistical/ artificial intelligence techniques, such as response surface methodology (RSM, artificial neural network (ANN and genetic algorithm (GA have been tested to optimize the medium composition for the production of COD from novel strain Streptomyces sp. NCIM 5500. All experiments were performed according to the five factor central composite design (CCD and the generated data was analysed using RSM and ANN. GA was employed to optimize the models generated by RSM and ANN. Based upon the predicted COD concentration, the model developed with ANN was found to be superior to the model developed with RSM. The RSM-GA approach predicted maximum of 6.283 U/mL COD production, whereas the ANN-GA approach predicted a maximum of 9.93 U/mL COD concentration. The optimum concentrations of the medium variables predicted through ANN-GA approach were: 1.431 g/50 mL soybean, 1.389 g/50 mL maltose, 0.029 g/50 mL MgSO4, 0.45 g/50 mL NaCl and 2.235 ml/50 mL glycerol. The experimental COD concentration was concurrent with the GA predicted yield and led to 9.75 U/mL COD production, which was nearly two times higher than the yield (4.2 U/mL obtained with the un-optimized medium. This is the very first time we are reporting the statistical versus artificial intelligence based modeling and optimization of COD production by Streptomyces sp. NCIM 5500.
Statistical Maps of Ground Magnetic Disturbance Derived from Global Geospace Models
Rigler, E. J.; Wiltberger, M. J.; Love, J. J.
2017-12-01
Electric currents in space are the principal driver of magnetic variations measured at Earth's surface. These in turn induce geoelectric fields that present a natural hazard for technological systems like high-voltage power distribution networks. Modern global geospace models can reasonably simulate large-scale geomagnetic response to solar wind variations, but they are less successful at deterministic predictions of intense localized geomagnetic activity that most impacts technological systems on the ground. Still, recent studies have shown that these models can accurately reproduce the spatial statistical distributions of geomagnetic activity, suggesting that their physics are largely correct. Since the magnetosphere is a largely externally driven system, most model-measurement discrepancies probably arise from uncertain boundary conditions. So, with realistic distributions of solar wind parameters to establish its boundary conditions, we use the Lyon-Fedder-Mobarry (LFM) geospace model to build a synthetic multivariate statistical model of gridded ground magnetic disturbance. From this, we analyze the spatial modes of geomagnetic response, regress on available measurements to fill in unsampled locations on the grid, and estimate the global probability distribution of extreme magnetic disturbance. The latter offers a prototype geomagnetic "hazard map", similar to those used to characterize better-known geophysical hazards like earthquakes and floods.
Actuarial statistics with generalized linear mixed models
Antonio, K.; Beirlant, J.
2007-01-01
Over the last decade the use of generalized linear models (GLMs) in actuarial statistics has received a lot of attention, starting from the actuarial illustrations in the standard text by McCullagh and Nelder [McCullagh, P., Nelder, J.A., 1989. Generalized linear models. In: Monographs on Statistics
Glynn, Robert J; Colditz, Graham A; Tamimi, Rulla M; Chen, Wendy Y; Hankinson, Susan E; Willett, Walter W; Rosner, Bernard
2017-08-01
A breast cancer risk prediction rule previously developed by Rosner and Colditz has reasonable predictive ability. We developed a re-fitted version of this model, based on more than twice as many cases now including women up to age 85, and further extended it to a model that distinguished risk factor prediction of tumors with different estrogen/progesterone receptor status. We compared the calibration and discriminatory ability of the original, the re-fitted, and the type-specific models. Evaluation used data from the Nurses' Health Study during the period 1980-2008, when 4384 incident invasive breast cancers occurred over 1.5 million person-years. Model development used two-thirds of study subjects and validation used one-third. Predicted risks in the validation sample from the original and re-fitted models were highly correlated (ρ = 0.93), but several parameters, notably those related to use of menopausal hormone therapy and age, had different estimates. The re-fitted model was well-calibrated and had an overall C-statistic of 0.65. The extended, type-specific model identified several risk factors with varying associations with occurrence of tumors of different receptor status. However, this extended model relative to the prediction of any breast cancer did not meaningfully reclassify women who developed breast cancer to higher risk categories, nor women remaining cancer free to lower risk categories. The re-fitted Rosner-Colditz model has applicability to risk prediction in women up to age 85, and its discrimination is not improved by consideration of varying associations across tumor subtypes.
Nikolopoulos, E. I.; Destro, E.; Bhuiyan, M. A. E.; Borga, M., Sr.; Anagnostou, E. N.
2017-12-01
Fire disasters affect modern societies at global scale inducing significant economic losses and human casualties. In addition to their direct impacts they have various adverse effects on hydrologic and geomorphologic processes of a region due to the tremendous alteration of the landscape characteristics (vegetation, soil properties etc). As a consequence, wildfires often initiate a cascade of hazards such as flash floods and debris flows that usually follow the occurrence of a wildfire thus magnifying the overall impact in a region. Post-fire debris flows (PFDF) is one such type of hazards frequently occurring in Western United States where wildfires are a common natural disaster. Prediction of PDFD is therefore of high importance in this region and over the last years a number of efforts from United States Geological Survey (USGS) and National Weather Service (NWS) have been focused on the development of early warning systems that will help mitigate PFDF risk. This work proposes a prediction framework that is based on a nonparametric statistical technique (random forests) that allows predicting the occurrence of PFDF at regional scale with a higher degree of accuracy than the commonly used approaches that are based on power-law thresholds and logistic regression procedures. The work presented is based on a recently released database from USGS that reports a total of 1500 storms that triggered and did not trigger PFDF in a number of fire affected catchments in Western United States. The database includes information on storm characteristics (duration, accumulation, max intensity etc) and other auxiliary information of land surface properties (soil erodibility index, local slope etc). Results show that the proposed model is able to achieve a satisfactory prediction accuracy (threat score > 0.6) superior of previously published prediction frameworks highlighting the potential of nonparametric statistical techniques for development of PFDF prediction systems.
Directory of Open Access Journals (Sweden)
Hamid R. Khosravani
2016-01-01
Full Text Available Energy consumption has been increasing steadily due to globalization and industrialization. Studies have shown that buildings are responsible for the biggest proportion of energy consumption; for example in European Union countries, energy consumption in buildings represents around 40% of the total energy consumption. In order to control energy consumption in buildings, different policies have been proposed, from utilizing bioclimatic architectures to the use of predictive models within control approaches. There are mainly three groups of predictive models including engineering, statistical and artificial intelligence models. Nowadays, artificial intelligence models such as neural networks and support vector machines have also been proposed because of their high potential capabilities of performing accurate nonlinear mappings between inputs and outputs in real environments which are not free of noise. The main objective of this paper is to compare a neural network model which was designed utilizing statistical and analytical methods, with a group of neural network models designed benefiting from a multi objective genetic algorithm. Moreover, the neural network models were compared to a naïve autoregressive baseline model. The models are intended to predict electric power demand at the Solar Energy Research Center (Centro de Investigación en Energía SOLar or CIESOL in Spanish bioclimatic building located at the University of Almeria, Spain. Experimental results show that the models obtained from the multi objective genetic algorithm (MOGA perform comparably to the model obtained through a statistical and analytical approach, but they use only 0.8% of data samples and have lower model complexity.
International Nuclear Information System (INIS)
Pumir, Alain; Naso, Aurore
2010-01-01
A proper description of the velocity gradient tensor is crucial for understanding the dynamics of turbulent flows, in particular the energy transfer from large to small scales. Insight into the statistical properties of the velocity gradient tensor and into its coarse-grained generalization can be obtained with the help of a stochastic 'tetrad model' that describes the coarse-grained velocity gradient tensor based on the evolution of four points. Although the solution of the stochastic model can be formally expressed in terms of path integrals, its numerical determination in terms of the Monte-Carlo method is very challenging, as very few configurations contribute effectively to the statistical weight. Here, we discuss a strategy that allows us to solve the tetrad model numerically. The algorithm is based on the importance sampling method, which consists here of identifying and sampling preferentially the configurations that are likely to correspond to a large statistical weight, and selectively rejecting configurations with a small statistical weight. The algorithm leads to an efficient numerical determination of the solutions of the model and allows us to determine their qualitative behavior as a function of scale. We find that the moments of order n≤4 of the solutions of the model scale with the coarse-graining scale and that the scaling exponents are very close to the predictions of the Kolmogorov theory. The model qualitatively reproduces quite well the statistics concerning the local structure of the flow. However, we find that the model generally tends to predict an excess of strain compared to vorticity. Thus, our results show that while some physical aspects are not fully captured by the model, our approach leads to a very good description of several important qualitative properties of real turbulent flows.
Schonberger, Robert B; Dai, Feng; Brandt, Cynthia A; Burg, Matthew M
2015-09-01
Because of uncertainty regarding the reliability of perioperative blood pressures and traditional notions downplaying the role of anesthesiologists in longitudinal patient care, there is no consensus for anesthesiologists to recommend postoperative primary care blood pressure follow-up for patients presenting for surgery with an increased blood pressure. The decision of whom to refer should ideally be based on a predictive model that balances performance with ease-of-use. If an acceptable decision rule was developed, a new practice paradigm integrating the surgical encounter into broader public health efforts could be tested, with the goal of reducing long-term morbidity from hypertension among surgical patients. Using national data from US veterans receiving surgical care, we determined the prevalence of poorly controlled outpatient clinic blood pressures ≥140/90 mm Hg, based on the mean of up to 4 readings in the year after surgery. Four increasingly complex logistic regression models were assessed to predict this outcome. The first included the mean of 2 preoperative blood pressure readings; other models progressively added a broad array of demographic and clinical data. After internal validation, the C-statistics and the Net Reclassification Index between the simplest and most complex models were assessed. The performance characteristics of several simple blood pressure referral thresholds were then calculated. Among 215,621 patients, poorly controlled outpatient clinic blood pressure was present postoperatively in 25.7% (95% confidence interval [CI], 25.5%-25.9%) including 14.2% (95% CI, 13.9%-14.6%) of patients lacking a hypertension history. The most complex prediction model demonstrated statistically significant, but clinically marginal, improvement in discrimination over a model based on preoperative blood pressure alone (C-statistic, 0.736 [95% CI, 0.734-0.739] vs 0.721 [95% CI, 0.718-0.723]; P for difference 1 of 4 patients (95% CI, 25
Directory of Open Access Journals (Sweden)
Soldić-Aleksić Jasna
2009-01-01
Full Text Available Market segmentation presents one of the key concepts of the modern marketing. The main goal of market segmentation is focused on creating groups (segments of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of concrete product/service. Companies can create specific marketing plan for each of these segments and therefore gain short or long term competitive advantage on the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of logistic regression model and CHAID analysis. The logistic regression model was used for the purpose of variables selection (from the initial pool of eleven variables which are statistically significant for explaining the dependent variable. Selected variables were afterwards included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented on the concrete empirical example in the following form: summary model results, CHAID tree, Gain chart, Index chart, risk and classification tables.
Structured statistical models of inductive reasoning.
Kemp, Charles; Tenenbaum, Joshua B
2009-01-01
Everyday inductive inferences are often guided by rich background knowledge. Formal models of induction should aim to incorporate this knowledge and should explain how different kinds of knowledge lead to the distinctive patterns of reasoning found in different inductive contexts. This article presents a Bayesian framework that attempts to meet both goals and describes [corrected] 4 applications of the framework: a taxonomic model, a spatial model, a threshold model, and a causal model. Each model makes probabilistic inferences about the extensions of novel properties, but the priors for the 4 models are defined over different kinds of structures that capture different relationships between the categories in a domain. The framework therefore shows how statistical inference can operate over structured background knowledge, and the authors argue that this interaction between structure and statistics is critical for explaining the power and flexibility of human reasoning.
Meads, Catherine; Ahmed, Ikhlaaq; Riley, Richard D
2012-04-01
A risk prediction model is a statistical tool for estimating the probability that a currently healthy individual with specific risk factors will develop a condition in the future such as breast cancer. Reliably accurate prediction models can inform future disease burdens, health policies and individual decisions. Breast cancer prediction models containing modifiable risk factors, such as alcohol consumption, BMI or weight, condom use, exogenous hormone use and physical activity, are of particular interest to women who might be considering how to reduce their risk of breast cancer and clinicians developing health policies to reduce population incidence rates. We performed a systematic review to identify and evaluate the performance of prediction models for breast cancer that contain modifiable factors. A protocol was developed and a sensitive search in databases including MEDLINE and EMBASE was conducted in June 2010. Extensive use was made of reference lists. Included were any articles proposing or validating a breast cancer prediction model in a general female population, with no language restrictions. Duplicate data extraction and quality assessment were conducted. Results were summarised qualitatively, and where possible meta-analysis of model performance statistics was undertaken. The systematic review found 17 breast cancer models, each containing a different but often overlapping set of modifiable and other risk factors, combined with an estimated baseline risk that was also often different. Quality of reporting was generally poor, with characteristics of included participants and fitted model results often missing. Only four models received independent validation in external data, most notably the 'Gail 2' model with 12 validations. None of the models demonstrated consistently outstanding ability to accurately discriminate between those who did and those who did not develop breast cancer. For example, random-effects meta-analyses of the performance of the
Dynamic Modeling and Very Short-term Prediction of Wind Power Output Using Box-Cox Transformation
Urata, Kengo; Inoue, Masaki; Murayama, Dai; Adachi, Shuichi
2016-09-01
We propose a statistical modeling method of wind power output for very short-term prediction. The modeling method with a nonlinear model has cascade structure composed of two parts. One is a linear dynamic part that is driven by a Gaussian white noise and described by an autoregressive model. The other is a nonlinear static part that is driven by the output of the linear part. This nonlinear part is designed for output distribution matching: we shape the distribution of the model output to match with that of the wind power output. The constructed model is utilized for one-step ahead prediction of the wind power output. Furthermore, we study the relation between the prediction accuracy and the prediction horizon.
2017-09-01
application of statistical inference. Even when human forecasters leverage their professional experience, which is often gained through long periods of... application throughout statistics and Bayesian data analysis. The multivariate form of 2( , ) (e.g., Figure 12) is similarly analytically...data (i.e., no systematic manipulations with analytical functions), it is common in the statistical literature to apply mathematical transformations
Bias in iterative reconstruction of low-statistics PET data: benefits of a resolution model
Energy Technology Data Exchange (ETDEWEB)
Walker, M D; Asselin, M-C; Julyan, P J; Feldmann, M; Matthews, J C [School of Cancer and Enabling Sciences, Wolfson Molecular Imaging Centre, MAHSC, University of Manchester, Manchester M20 3LJ (United Kingdom); Talbot, P S [Mental Health and Neurodegeneration Research Group, Wolfson Molecular Imaging Centre, MAHSC, University of Manchester, Manchester M20 3LJ (United Kingdom); Jones, T, E-mail: matthew.walker@manchester.ac.uk [Academic Department of Radiation Oncology, Christie Hospital, University of Manchester, Manchester M20 4BX (United Kingdom)
2011-02-21
Iterative image reconstruction methods such as ordered-subset expectation maximization (OSEM) are widely used in PET. Reconstructions via OSEM are however reported to be biased for low-count data. We investigated this and considered the impact for dynamic PET. Patient listmode data were acquired in [{sup 11}C]DASB and [{sup 15}O]H{sub 2}O scans on the HRRT brain PET scanner. These data were subsampled to create many independent, low-count replicates. The data were reconstructed and the images from low-count data were compared to the high-count originals (from the same reconstruction method). This comparison enabled low-statistics bias to be calculated for the given reconstruction, as a function of the noise-equivalent counts (NEC). Two iterative reconstruction methods were tested, one with and one without an image-based resolution model (RM). Significant bias was observed when reconstructing data of low statistical quality, for both subsampled human and simulated data. For human data, this bias was substantially reduced by including a RM. For [{sup 11}C]DASB the low-statistics bias in the caudate head at 1.7 M NEC (approx. 30 s) was -5.5% and -13% with and without RM, respectively. We predicted biases in the binding potential of -4% and -10%. For quantification of cerebral blood flow for the whole-brain grey- or white-matter, using [{sup 15}O]H{sub 2}O and the PET autoradiographic method, a low-statistics bias of <2.5% and <4% was predicted for reconstruction with and without the RM. The use of a resolution model reduces low-statistics bias and can hence be beneficial for quantitative dynamic PET.
Underwater Sound Propagation Modeling Methods for Predicting Marine Animal Exposure.
Hamm, Craig A; McCammon, Diana F; Taillefer, Martin L
2016-01-01
The offshore exploration and production (E&P) industry requires comprehensive and accurate ocean acoustic models for determining the exposure of marine life to the high levels of sound used in seismic surveys and other E&P activities. This paper reviews the types of acoustic models most useful for predicting the propagation of undersea noise sources and describes current exposure models. The severe problems caused by model sensitivity to the uncertainty in the environment are highlighted to support the conclusion that it is vital that risk assessments include transmission loss estimates with statistical measures of confidence.
Growth Curve and Structural Equation Modeling : Topics from the Indian Statistical Institute
2015-01-01
This book describes some recent trends in GCM research on different subject areas, both theoretical and applied. This includes tools and possibilities for further work through new techniques and modification of existing ones. A growth curve is an empirical model of the evolution of a quantity over time. Growth curves in longitudinal studies are used in disciplines including biology, statistics, population studies, economics, biological sciences, sociology, nano-biotechnology, and fluid mechanics. The volume includes original studies, theoretical findings and case studies from a wide range of applied work. This volume builds on presentations from a GCM workshop held at the Indian Statistical Institute, Giridih, January 18-19, 2014. This book follows the volume Advances in Growth Curve Models, published by Springer in 2013. The results have meaningful application in health care, prediction of crop yield, child nutrition, poverty measurements, estimation of growth rate, and other research areas.
Statistical models for thermal ageing of steel materials in nuclear power plants
International Nuclear Information System (INIS)
Persoz, M.
1996-01-01
Some category of steel materials in nuclear power plants may be subjected to thermal ageing, whose extent depends on the steel chemical composition and the ageing parameters, i.e. temperature and duration. This ageing affects the 'impact strength' of the materials, which is a mechanical property. In order to assess the residual lifetime of these components, a probabilistic study has been launched, which takes into account the scatter over the input parameters of the mechanical model. Predictive formulae for estimating the impact strength of aged materials are important input data of the model. A data base has been created with impact strength results obtained from an ageing program in laboratory and statistical treatments have been undertaken. Two kinds of model have been developed, with non linear regression methods (PROC NLIN, available in SAS/STAT). The first one, using a hyperbolic tangent function, is partly based on physical considerations, and the second one, of an exponential type, is purely statistically built. The difficulties consist in selecting the significant parameters and attributing initial values to the coefficients, which is a requirement of the NLIN procedure. This global statistical analysis has led to general models that are unction of the chemical variables and the ageing parameters. These models are as precise (if not more) as local models that had been developed earlier for some specific values of ageing temperature and ageing duration. This paper describes the data and the methodology used to build the models and analyses the results given by the SAS system. (author)
Statistical modelling in biostatistics and bioinformatics selected papers
Peng, Defen
2014-01-01
This book presents selected papers on statistical model development related mainly to the fields of Biostatistics and Bioinformatics. The coverage of the material falls squarely into the following categories: (a) Survival analysis and multivariate survival analysis, (b) Time series and longitudinal data analysis, (c) Statistical model development and (d) Applied statistical modelling. Innovations in statistical modelling are presented throughout each of the four areas, with some intriguing new ideas on hierarchical generalized non-linear models and on frailty models with structural dispersion, just to mention two examples. The contributors include distinguished international statisticians such as Philip Hougaard, John Hinde, Il Do Ha, Roger Payne and Alessandra Durio, among others, as well as promising newcomers. Some of the contributions have come from researchers working in the BIO-SI research programme on Biostatistics and Bioinformatics, centred on the Universities of Limerick and Galway in Ireland and fu...
Predictive models reduce talent development costs in female gymnastics.
Pion, Johan; Hohmann, Andreas; Liu, Tianbiao; Lenoir, Matthieu; Segers, Veerle
2017-04-01
This retrospective study focuses on the comparison of different predictive models based on the results of a talent identification test battery for female gymnasts. We studied to what extent these models have the potential to optimise selection procedures, and at the same time reduce talent development costs in female artistic gymnastics. The dropout rate of 243 female elite gymnasts was investigated, 5 years past talent selection, using linear (discriminant analysis) and non-linear predictive models (Kohonen feature maps and multilayer perceptron). The coaches classified 51.9% of the participants correct. Discriminant analysis improved the correct classification to 71.6% while the non-linear technique of Kohonen feature maps reached 73.7% correctness. Application of the multilayer perceptron even classified 79.8% of the gymnasts correctly. The combination of different predictive models for talent selection can avoid deselection of high-potential female gymnasts. The selection procedure based upon the different statistical analyses results in decrease of 33.3% of cost because the pool of selected athletes can be reduced to 92 instead of 138 gymnasts (as selected by the coaches). Reduction of the costs allows the limited resources to be fully invested in the high-potential athletes.
International Nuclear Information System (INIS)
Asim, F.; Mahmood, M.
2017-01-01
Statistical modeling imparts significant role in predicting the impact of potential factors affecting the one step fixation process of reactive printing and easy care finishing. Investigation of significant factors on tear strength of cotton fabric for single step fixation of reactive printing and easy care finishing has been carried out in this research work using experimental design technique. The potential design factors were; concentration of reactive dye, concentration of crease resistant, fixation method and fixation temperature. The experiments were designed using DoE (Design of Experiment) and analyzed through software Design Expert. The detailed analysis of significant factors and interactions including ANOVA (Analysis of Variance), residuals, model accuracy and statistical model for tear strength has been presented. The interaction and contour plots of vital factors has been examined. It has been found from the statistical analysis that each factor has an interaction with other factor. Most of the investigated factors showed curvature effect on other factor. After critical examination of significant plots, quadratic model of tear strength with significant terms and their interaction at alpha = 0.05 has been developed. The calculated correlation coefficient, R2 of the developed model is 0.9056. The high values of correlation coefficient inferred that developed equation of tear strength will precisely predict the tear strength over the range of values. (author)
A Bayesian approach for parameter estimation and prediction using a computationally intensive model
International Nuclear Information System (INIS)
Higdon, Dave; McDonnell, Jordan D; Schunck, Nicolas; Sarich, Jason; Wild, Stefan M
2015-01-01
Bayesian methods have been successful in quantifying uncertainty in physics-based problems in parameter estimation and prediction. In these cases, physical measurements y are modeled as the best fit of a physics-based model η(θ), where θ denotes the uncertain, best input setting. Hence the statistical model is of the form y=η(θ)+ϵ, where ϵ accounts for measurement, and possibly other, error sources. When nonlinearity is present in η(⋅), the resulting posterior distribution for the unknown parameters in the Bayesian formulation is typically complex and nonstandard, requiring computationally demanding computational approaches such as Markov chain Monte Carlo (MCMC) to produce multivariate draws from the posterior. Although generally applicable, MCMC requires thousands (or even millions) of evaluations of the physics model η(⋅). This requirement is problematic if the model takes hours or days to evaluate. To overcome this computational bottleneck, we present an approach adapted from Bayesian model calibration. This approach combines output from an ensemble of computational model runs with physical measurements, within a statistical formulation, to carry out inference. A key component of this approach is a statistical response surface, or emulator, estimated from the ensemble of model runs. We demonstrate this approach with a case study in estimating parameters for a density functional theory model, using experimental mass/binding energy measurements from a collection of atomic nuclei. We also demonstrate how this approach produces uncertainties in predictions for recent mass measurements obtained at Argonne National Laboratory. (paper)
Yang, Jing; Zammit, Christian; Dudley, Bruce
2017-04-01
The phenomenon of losing and gaining in rivers normally takes place in lowland where often there are various, sometimes conflicting uses for water resources, e.g., agriculture, industry, recreation, and maintenance of ecosystem function. To better support water allocation decisions, it is crucial to understand the location and seasonal dynamics of these losses and gains. We present a statistical methodology to predict losing and gaining river reaches in New Zealand based on 1) information surveys with surface water and groundwater experts from regional government, 2) A collection of river/watershed characteristics, including climate, soil and hydrogeologic information, and 3) the random forests technique. The surveys on losing and gaining reaches were conducted face-to-face at 16 New Zealand regional government authorities, and climate, soil, river geometry, and hydrogeologic data from various sources were collected and compiled to represent river/watershed characteristics. The random forests technique was used to build up the statistical relationship between river reach status (gain and loss) and river/watershed characteristics, and then to predict for river reaches at Strahler order one without prior losing and gaining information. Results show that the model has a classification error of around 10% for "gain" and "loss". The results will assist further research, and water allocation decisions in lowland New Zealand.
Perai, A H; Nassiri Moghaddam, H; Asadpour, S; Bahrampour, J; Mansoori, Gh
2010-07-01
There has been a considerable and continuous interest to develop equations for rapid and accurate prediction of the ME of meat and bone meal. In this study, an artificial neural network (ANN), a partial least squares (PLS), and a multiple linear regression (MLR) statistical method were used to predict the TME(n) of meat and bone meal based on its CP, ether extract, and ash content. The accuracy of the models was calculated by R(2) value, MS error, mean absolute percentage error, mean absolute deviation, bias, and Theil's U. The predictive ability of an ANN was compared with a PLS and a MLR model using the same training data sets. The squared regression coefficients of prediction for the MLR, PLS, and ANN models were 0.38, 0.36, and 0.94, respectively. The results revealed that ANN produced more accurate predictions of TME(n) as compared with PLS and MLR methods. Based on the results of this study, ANN could be used as a promising approach for rapid prediction of nutritive value of meat and bone meal.
Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K
2015-01-01
Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95%CI 0.56-0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95%CI 0.60–0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (pmachine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratifying patients with cirrhosis and can be used to accurately identify patients at high-risk for developing HCC. PMID:24169273
FIRE BEHAVIOR PREDICTING MODELS EFFICIENCY IN BRAZILIAN COMMERCIAL EUCALYPT PLANTATIONS
Directory of Open Access Journals (Sweden)
Benjamin Leonardo Alves White
2016-12-01
Full Text Available Knowing how a wildfire will behave is extremely important in order to assist in fire suppression and prevention operations. Since the 1940’s mathematical models to estimate how the fire will behave have been developed worldwide, however, none of them, until now, had their efficiency tested in Brazilian commercial eucalypt plantations nor in other vegetation types in the country. This study aims to verify the accuracy of the Rothermel (1972 fire spread model, the Byram (1959 flame length model, and the fire spread and length equations derived from the McArthur (1962 control burn meters. To meet these objectives, 105 experimental laboratory fires were done and their results compared with the predicted values from the models tested. The Rothermel and Byram models predicted better than McArthur’s, nevertheless, all of them underestimated the fire behavior aspects evaluated and were statistically different from the experimental data.
Directory of Open Access Journals (Sweden)
Mónica F. Díaz
2012-12-01
Full Text Available Volatile organic compounds (VOCs are contained in a variety of chemicals that can be found in household products and may have undesirable effects on health. Thereby, it is important to model blood-to-liver partition coefficients (log Pliver for VOCs in a fast and inexpensive way. In this paper, we present two new quantitative structure-property relationship (QSPR models for the prediction of log Pliver, where we also propose a hybrid approach for the selection of the descriptors. This hybrid methodology combines a machine learning method with a manual selection based on expert knowledge. This allows obtaining a set of descriptors that is interpretable in physicochemical terms. Our regression models were trained using decision trees and neural networks and validated using an external test set. Results show high prediction accuracy compared to previous log Pliver models, and the descriptor selection approach provides a means to get a small set of descriptors that is in agreement with theoretical understanding of the target property.
A consensus approach for estimating the predictive accuracy of dynamic models in biology.
Villaverde, Alejandro F; Bongard, Sophia; Mauch, Klaus; Müller, Dirk; Balsa-Canto, Eva; Schmid, Joachim; Banga, Julio R
2015-04-01
Mathematical models that predict the complex dynamic behaviour of cellular networks are fundamental in systems biology, and provide an important basis for biomedical and biotechnological applications. However, obtaining reliable predictions from large-scale dynamic models is commonly a challenging task due to lack of identifiability. The present work addresses this challenge by presenting a methodology for obtaining high-confidence predictions from dynamic models using time-series data. First, to preserve the complex behaviour of the network while reducing the number of estimated parameters, model parameters are combined in sets of meta-parameters, which are obtained from correlations between biochemical reaction rates and between concentrations of the chemical species. Next, an ensemble of models with different parameterizations is constructed and calibrated. Finally, the ensemble is used for assessing the reliability of model predictions by defining a measure of convergence of model outputs (consensus) that is used as an indicator of confidence. We report results of computational tests carried out on a metabolic model of Chinese Hamster Ovary (CHO) cells, which are used for recombinant protein production. Using noisy simulated data, we find that the aggregated ensemble predictions are on average more accurate than the predictions of individual ensemble models. Furthermore, ensemble predictions with high consensus are statistically more accurate than ensemble predictions with large variance. The procedure provides quantitative estimates of the confidence in model predictions and enables the analysis of sufficiently complex networks as required for practical applications. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Fast Quantum Algorithm for Predicting Descriptive Statistics of Stochastic Processes
Williams Colin P.
1999-01-01
Stochastic processes are used as a modeling tool in several sub-fields of physics, biology, and finance. Analytic understanding of the long term behavior of such processes is only tractable for very simple types of stochastic processes such as Markovian processes. However, in real world applications more complex stochastic processes often arise. In physics, the complicating factor might be nonlinearities; in biology it might be memory effects; and in finance is might be the non-random intentional behavior of participants in a market. In the absence of analytic insight, one is forced to understand these more complex stochastic processes via numerical simulation techniques. In this paper we present a quantum algorithm for performing such simulations. In particular, we show how a quantum algorithm can predict arbitrary descriptive statistics (moments) of N-step stochastic processes in just O(square root of N) time. That is, the quantum complexity is the square root of the classical complexity for performing such simulations. This is a significant speedup in comparison to the current state of the art.
Yuan, Ke-Hai; Tian, Yubin; Yanagihara, Hirokazu
2015-06-01
Survey data typically contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. The most widely used statistic for evaluating the adequacy of a SEM model is T ML, a slight modification to the likelihood ratio statistic. Under normality assumption, T ML approximately follows a chi-square distribution when the number of observations (N) is large and the number of items or variables (p) is small. However, in practice, p can be rather large while N is always limited due to not having enough participants. Even with a relatively large N, empirical results show that T ML rejects the correct model too often when p is not too small. Various corrections to T ML have been proposed, but they are mostly heuristic. Following the principle of the Bartlett correction, this paper proposes an empirical approach to correct T ML so that the mean of the resulting statistic approximately equals the degrees of freedom of the nominal chi-square distribution. Results show that empirically corrected statistics follow the nominal chi-square distribution much more closely than previously proposed corrections to T ML, and they control type I errors reasonably well whenever N ≥ max(50,2p). The formulations of the empirically corrected statistics are further used to predict type I errors of T ML as reported in the literature, and they perform well.
International Nuclear Information System (INIS)
Koch, J.; Peterson, S-R.
1995-10-01
Models used to simulate environmental transfer of radionuclides typically include many parameters, the values of which are uncertain. An estimation of the uncertainty associated with the predictions is therefore essential. Difference methods to quantify the uncertainty in the prediction parameter uncertainties are reviewed. A statistical approach using random sampling techniques is recommended for complex models with many uncertain parameters. In this approach, the probability density function of the model output is obtained from multiple realizations of the model according to a multivariate random sample of the different input parameters. Sampling efficiency can be improved by using a stratified scheme (Latin Hypercube Sampling). Sample size can also be restricted when statistical tolerance limits needs to be estimated. Methods to rank parameters according to their contribution to uncertainty in the model prediction are also reviewed. Recommended are measures of sensitivity, correlation and regression coefficients that can be calculated on values of input and output variables generated during the propagation of uncertainties through the model. A parameter uncertainty analysis is performed for the CHERPAC food chain model which estimates subjective confidence limits and intervals on the predictions at a 95% confidence level. A sensitivity analysis is also carried out using partial rank correlation coefficients. This identified and ranks the parameters which are the main contributors to uncertainty in the predictions, thereby guiding further research efforts. (author). 44 refs., 2 tabs., 4 figs
Energy Technology Data Exchange (ETDEWEB)
Koch, J. [Israel Atomic Energy Commission, Yavne (Israel). Soreq Nuclear Research Center; Peterson, S-R.
1995-10-01
Models used to simulate environmental transfer of radionuclides typically include many parameters, the values of which are uncertain. An estimation of the uncertainty associated with the predictions is therefore essential. Difference methods to quantify the uncertainty in the prediction parameter uncertainties are reviewed. A statistical approach using random sampling techniques is recommended for complex models with many uncertain parameters. In this approach, the probability density function of the model output is obtained from multiple realizations of the model according to a multivariate random sample of the different input parameters. Sampling efficiency can be improved by using a stratified scheme (Latin Hypercube Sampling). Sample size can also be restricted when statistical tolerance limits needs to be estimated. Methods to rank parameters according to their contribution to uncertainty in the model prediction are also reviewed. Recommended are measures of sensitivity, correlation and regression coefficients that can be calculated on values of input and output variables generated during the propagation of uncertainties through the model. A parameter uncertainty analysis is performed for the CHERPAC food chain model which estimates subjective confidence limits and intervals on the predictions at a 95% confidence level. A sensitivity analysis is also carried out using partial rank correlation coefficients. This identified and ranks the parameters which are the main contributors to uncertainty in the predictions, thereby guiding further research efforts. (author). 44 refs., 2 tabs., 4 figs.
Developing models for the prediction of hospital healthcare waste generation rate.
Tesfahun, Esubalew; Kumie, Abera; Beyene, Abebe
2016-01-01
An increase in the number of health institutions, along with frequent use of disposable medical products, has contributed to the increase of healthcare waste generation rate. For proper handling of healthcare waste, it is crucial to predict the amount of waste generation beforehand. Predictive models can help to optimise healthcare waste management systems, set guidelines and evaluate the prevailing strategies for healthcare waste handling and disposal. However, there is no mathematical model developed for Ethiopian hospitals to predict healthcare waste generation rate. Therefore, the objective of this research was to develop models for the prediction of a healthcare waste generation rate. A longitudinal study design was used to generate long-term data on solid healthcare waste composition, generation rate and develop predictive models. The results revealed that the healthcare waste generation rate has a strong linear correlation with the number of inpatients (R(2) = 0.965), and a weak one with the number of outpatients (R(2) = 0.424). Statistical analysis was carried out to develop models for the prediction of the quantity of waste generated at each hospital (public, teaching and private). In these models, the number of inpatients and outpatients were revealed to be significant factors on the quantity of waste generated. The influence of the number of inpatients and outpatients treated varies at different hospitals. Therefore, different models were developed based on the types of hospitals. © The Author(s) 2015.
International Nuclear Information System (INIS)
Zahedi, Gholamreza; Karami, Zohre; Yaghoobi, Hamed
2009-01-01
In this study, various estimation methods have been reviewed for hydrate formation temperature (HFT) and two procedures have been presented. In the first method, two general correlations have been proposed for HFT. One of the correlations has 11 parameters, and the second one has 18 parameters. In order to obtain constants in proposed equations, 203 experimental data points have been collected from literatures. The Engineering Equation Solver (EES) and Statistical Package for the Social Sciences (SPSS) soft wares have been employed for statistical analysis of the data. Accuracy of the obtained correlations also has been declared by comparison with experimental data and some recent common used correlations. In the second method, HFT is estimated by artificial neural network (ANN) approach. In this case, various architectures have been checked using 70% of experimental data for training of ANN. Among the various architectures multi layer perceptron (MLP) network with trainlm training algorithm was found as the best architecture. Comparing the obtained ANN model results with 30% of unseen data confirms ANN excellent estimation performance. It was found that ANN is more accurate than traditional methods and even our two proposed correlations for HFT estimation.
Lunt, Mark
2015-07-01
In the first article in this series we explored the use of linear regression to predict an outcome variable from a number of predictive factors. It assumed that the predictive factors were measured on an interval scale. However, this article shows how categorical variables can also be included in a linear regression model, enabling predictions to be made separately for different groups and allowing for testing the hypothesis that the outcome differs between groups. The use of interaction terms to measure whether the effect of a particular predictor variable differs between groups is also explained. An alternative approach to testing the difference between groups of the effect of a given predictor, which consists of measuring the effect in each group separately and seeing whether the statistical significance differs between the groups, is shown to be misleading. © The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
The prediction of intelligence in preschool children using alternative models to regression.
Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E
2011-12-01
Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children is used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than does the more traditional regression approach. Implications of these results are discussed.
DEFF Research Database (Denmark)
Smith, Todd; Muller, David C; Moons, Karel G M
2018-01-01
in the European Prospective Investigation into Cancer and Nutrition (EPIC) and the UK Biobank. The performance of the models to predict the occurrence of colorectal cancer within 5 or 10 years after study enrolment was assessed by discrimination (C-statistic) and calibration (plots of observed vs predicted......-based colorectal screening programmes. Future work should both evaluate this potential, through modelling and impact studies, and ascertain if further enhancement in their performance can be obtained....
Probably not future prediction using probability and statistical inference
Dworsky, Lawrence N
2008-01-01
An engaging, entertaining, and informative introduction to probability and prediction in our everyday lives Although Probably Not deals with probability and statistics, it is not heavily mathematical and is not filled with complex derivations, proofs, and theoretical problem sets. This book unveils the world of statistics through questions such as what is known based upon the information at hand and what can be expected to happen. While learning essential concepts including "the confidence factor" and "random walks," readers will be entertained and intrigued as they move from chapter to chapter. Moreover, the author provides a foundation of basic principles to guide decision making in almost all facets of life including playing games, developing winning business strategies, and managing personal finances. Much of the book is organized around easy-to-follow examples that address common, everyday issues such as: How travel time is affected by congestion, driving speed, and traffic lights Why different gambling ...
Directory of Open Access Journals (Sweden)
A. E. Sikorska
2012-04-01
Full Text Available Urbanization and the resulting land-use change strongly affect the water cycle and runoff-processes in watersheds. Unfortunately, small urban watersheds, which are most affected by urban sprawl, are mostly ungauged. This makes it intrinsically difficult to assess the consequences of urbanization. Most of all, it is unclear how to reliably assess the predictive uncertainty given the structural deficits of the applied models. In this study, we therefore investigate the uncertainty of flood predictions in ungauged urban basins from structurally uncertain rainfall-runoff models. To this end, we suggest a procedure to explicitly account for input uncertainty and model structure deficits using Bayesian statistics with a continuous-time autoregressive error model. In addition, we propose a concise procedure to derive prior parameter distributions from base data and successfully apply the methodology to an urban catchment in Warsaw, Poland. Based on our results, we are able to demonstrate that the autoregressive error model greatly helps to meet the statistical assumptions and to compute reliable prediction intervals. In our study, we found that predicted peak flows were up to 7 times higher than observations. This was reduced to 5 times with Bayesian updating, using only few discharge measurements. In addition, our analysis suggests that imprecise rainfall information and model structure deficits contribute mostly to the total prediction uncertainty. In the future, flood predictions in ungauged basins will become more important due to ongoing urbanization as well as anthropogenic and climatic changes. Thus, providing reliable measures of uncertainty is crucial to support decision making.
Statistical ecology comes of age
Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric
2014-01-01
The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151
Statistical ecology comes of age.
Gimenez, Olivier; Buckland, Stephen T; Morgan, Byron J T; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric
2014-12-01
The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1-4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data.
Gridded Calibration of Ensemble Wind Vector Forecasts Using Ensemble Model Output Statistics
Lazarus, S. M.; Holman, B. P.; Splitt, M. E.
2017-12-01
A computationally efficient method is developed that performs gridded post processing of ensemble wind vector forecasts. An expansive set of idealized WRF model simulations are generated to provide physically consistent high resolution winds over a coastal domain characterized by an intricate land / water mask. Ensemble model output statistics (EMOS) is used to calibrate the ensemble wind vector forecasts at observation locations. The local EMOS predictive parameters (mean and variance) are then spread throughout the grid utilizing flow-dependent statistical relationships extracted from the downscaled WRF winds. Using data withdrawal and 28 east central Florida stations, the method is applied to one year of 24 h wind forecasts from the Global Ensemble Forecast System (GEFS). Compared to the raw GEFS, the approach improves both the deterministic and probabilistic forecast skill. Analysis of multivariate rank histograms indicate the post processed forecasts are calibrated. Two downscaling case studies are presented, a quiescent easterly flow event and a frontal passage. Strengths and weaknesses of the approach are presented and discussed.
Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.
1999-01-01
Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8 and MSc identified correctly the locations of four of them. The space-time volume of the alarms is 36% and 18%, correspondingly, when estimated with a normalized product measure of empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5 + , 10 out of 19 earthquakes were predicted by M8 in 40% and five were predicted by M8-MSc in 13% of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events has doubled and all of them become exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth and Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res., 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library]. ?? 1999 Elsevier
Directory of Open Access Journals (Sweden)
Amany E. Aly
2016-04-01
Full Text Available When a system consisting of independent components of the same type, some appropriate actions may be done as soon as a portion of them have failed. It is, therefore, important to be able to predict later failure times from earlier ones. One of the well-known failure distributions commonly used to model component life, is the modified Weibull distribution (MWD. In this paper, two pivotal quantities are proposed to construct prediction intervals for future unobservable lifetimes based on generalized order statistics (gos from MWD. Moreover, a pivotal quantity is developed to reconstruct missing observations at the beginning of experiment. Furthermore, Monte Carlo simulation studies are conducted and numerical computations are carried out to investigate the efficiency of presented results. Finally, two illustrative examples for real data sets are analyzed.
Prediction of Chemical Function: Model Development and ...
The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (HT) screening-level exposures developed under ExpoCast can be combined with HT screening (HTS) bioactivity data for the risk-based prioritization of chemicals for further evaluation. The functional role (e.g. solvent, plasticizer, fragrance) that a chemical performs can drive both the types of products in which it is found and the concentration in which it is present and therefore impacting exposure potential. However, critical chemical use information (including functional role) is lacking for the majority of commercial chemicals for which exposure estimates are needed. A suite of machine-learning based models for classifying chemicals in terms of their likely functional roles in products based on structure were developed. This effort required collection, curation, and harmonization of publically-available data sources of chemical functional use information from government and industry bodies. Physicochemical and structure descriptor data were generated for chemicals with function data. Machine-learning classifier models for function were then built in a cross-validated manner from the descriptor/function data using the method of random forests. The models were applied to: 1) predict chemi
Directory of Open Access Journals (Sweden)
Li He
2014-01-01
Full Text Available For the purpose of improving the prediction of cancer prognosis in the clinical researches, various algorithms have been developed to construct the predictive models with the gene signatures detected by DNA microarrays. Due to the heterogeneity of the clinical samples, the list of differentially expressed genes (DEGs generated by the statistical methods or the machine learning algorithms often involves a number of false positive genes, which are not associated with the phenotypic differences between the compared clinical conditions, and subsequently impacts the reliability of the predictive models. In this study, we proposed a strategy, which combined the statistical algorithm with the gene-pathway bipartite networks, to generate the reliable lists of cancer-related DEGs and constructed the models by using support vector machine for predicting the prognosis of three types of cancers, namely, breast cancer, acute myeloma leukemia, and glioblastoma. Our results demonstrated that, combined with the gene-pathway bipartite networks, our proposed strategy can efficiently generate the reliable cancer-related DEG lists for constructing the predictive models. In addition, the model performance in the swap analysis was similar to that in the original analysis, indicating the robustness of the models in predicting the cancer outcomes.
Statistical Models and Methods for Lifetime Data
Lawless, Jerald F
2011-01-01
Praise for the First Edition"An indispensable addition to any serious collection on lifetime data analysis and . . . a valuable contribution to the statistical literature. Highly recommended . . ."-Choice"This is an important book, which will appeal to statisticians working on survival analysis problems."-Biometrics"A thorough, unified treatment of statistical models and methods used in the analysis of lifetime data . . . this is a highly competent and agreeable statistical textbook."-Statistics in MedicineThe statistical analysis of lifetime or response time data is a key tool in engineering,
Can multivariate models based on MOAKS predict OA knee pain? Data from the Osteoarthritis Initiative
Luna-Gómez, Carlos D.; Zanella-Calzada, Laura A.; Galván-Tejada, Jorge I.; Galván-Tejada, Carlos E.; Celaya-Padilla, José M.
2017-03-01
Osteoarthritis is the most common rheumatic disease in the world. Knee pain is the most disabling symptom in the disease, the prediction of pain is one of the targets in preventive medicine, this can be applied to new therapies or treatments. Using the magnetic resonance imaging and the grading scales, a multivariate model based on genetic algorithms is presented. Using a predictive model can be useful to associate minor structure changes in the joint with the future knee pain. Results suggest that multivariate models can be predictive with future knee chronic pain. All models; T0, T1 and T2, were statistically significant, all p values were 0.60.
Manning, Robert M.
1991-01-01
The dynamic and composite nature of propagation impairments that are incurred on Earth-space communications links at frequencies in and above 30/20 GHz Ka band, i.e., rain attenuation, cloud and/or clear air scintillation, etc., combined with the need to counter such degradations after the small link margins have been exceeded, necessitate the use of dynamic statistical identification and prediction processing of the fading signal in order to optimally estimate and predict the levels of each of the deleterious attenuation components. Such requirements are being met in NASA's Advanced Communications Technology Satellite (ACTS) Project by the implementation of optimal processing schemes derived through the use of the Rain Attenuation Prediction Model and nonlinear Markov filtering theory.
Topology for Statistical Modeling of Petascale Data
Energy Technology Data Exchange (ETDEWEB)
Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States); Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States); Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States); Bremer, P. -T. [Univ. of Utah, Salt Lake City, UT (United States)
2013-10-31
Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, the approach of the entire team involving all three institutions is based on the complementary techniques of combinatorial topology and statistical modelling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modelling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. The overall technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modelling, and (3) new integrated topological and statistical methods. Roughly speaking, the division of labor between our 3 groups (Sandia Labs in Livermore, Texas A&M in College Station, and U Utah in Salt Lake City) is as follows: the Sandia group focuses on statistical methods and their formulation in algebraic terms, and finds the application problems (and data sets) most relevant to this project, the Texas A&M Group develops new algebraic geometry algorithms, in particular with fewnomial theory, and the Utah group develops new algorithms in computational topology via Discrete Morse Theory. However, we hasten to point out that our three groups stay in tight contact via videconference every 2 weeks, so there is much synergy of ideas between the groups. The following of this document is focused on the contributions that had grater direct involvement from the team at the University of Utah in Salt Lake City.
Statistical magnetohydrodynamics and reversed-field-pinch quiescence
International Nuclear Information System (INIS)
Turner, L.
1982-01-01
A statistical model of a bounded, incompressible, cylindrical magnetofluid is presented. This model predicts the presence of magnetic fluctuations about a cylindrically-symmetric, Bessel-function-model, mean magnetic field, which satisfies del x = μ . As theta → 1.56, the model predicts that the significant region of the fluctuation spectrum narrows down to a single (coherent) m = 1 mode. An analogy between the Debye length of an electrostatic plasma and μ -1 suggests the physical validity o the model's prediction of when /r - r'/ greater than or equal to μ -1
Statistical models and methods for reliability and survival analysis
Couallier, Vincent; Huber-Carol, Catherine; Mesbah, Mounir; Huber -Carol, Catherine; Limnios, Nikolaos; Gerville-Reache, Leo
2013-01-01
Statistical Models and Methods for Reliability and Survival Analysis brings together contributions by specialists in statistical theory as they discuss their applications providing up-to-date developments in methods used in survival analysis, statistical goodness of fit, stochastic processes for system reliability, amongst others. Many of these are related to the work of Professor M. Nikulin in statistics over the past 30 years. The authors gather together various contributions with a broad array of techniques and results, divided into three parts - Statistical Models and Methods, Statistical
Direct Breakthrough Curve Prediction From Statistics of Heterogeneous Conductivity Fields
Hansen, Scott K.; Haslauer, Claus P.; Cirpka, Olaf A.; Vesselinov, Velimir V.
2018-01-01
This paper presents a methodology to predict the shape of solute breakthrough curves in heterogeneous aquifers at early times and/or under high degrees of heterogeneity, both cases in which the classical macrodispersion theory may not be applicable. The methodology relies on the observation that breakthrough curves in heterogeneous media are generally well described by lognormal distributions, and mean breakthrough times can be predicted analytically. The log-variance of solute arrival is thus sufficient to completely specify the breakthrough curves, and this is calibrated as a function of aquifer heterogeneity and dimensionless distance from a source plane by means of Monte Carlo analysis and statistical regression. Using the ensemble of simulated groundwater flow and solute transport realizations employed to calibrate the predictive regression, reliability estimates for the prediction are also developed. Additional theoretical contributions include heuristics for the time until an effective macrodispersion coefficient becomes applicable, and also an expression for its magnitude that applies in highly heterogeneous systems. It is seen that the results here represent a way to derive continuous time random walk transition distributions from physical considerations rather than from empirical field calibration.
Aegisdottir, Stefania; White, Michael J.; Spengler, Paul M.; Maugherman, Alan S.; Anderson, Linda A.; Cook, Robert S.; Nichols, Cassandra N.; Lampropoulos, Georgios K.; Walker, Blain S.; Cohen, Genna; Rush, Jeffrey D.
2006-01-01
Clinical predictions made by mental health practitioners are compared with those using statistical approaches. Sixty-seven studies were identified from a comprehensive search of 56 years of research; 92 effect sizes were derived from these studies. The overall effect of clinical versus statistical prediction showed a somewhat greater accuracy for…
Qualitative and quantitative guidelines for the comparison of environmental model predictions
International Nuclear Information System (INIS)
Scott, M.
1995-03-01
The question of how to assess or compare predictions from a number of models is one of concern in the validation of models, in understanding the effects of different models and model parameterizations on model output, and ultimately in assessing model reliability. Comparison of model predictions with observed data is the basic tool of model validation while comparison of predictions amongst different models provides one measure of model credibility. The guidance provided here is intended to provide qualitative and quantitative approaches (including graphical and statistical techniques) to such comparisons for use within the BIOMOVS II project. It is hoped that others may find it useful. It contains little technical information on the actual methods but several references are provided for the interested reader. The guidelines are illustrated on data from the VAMP CB scenario. Unfortunately, these data do not permit all of the possible approaches to be demonstrated since predicted uncertainties were not provided. The questions considered are concerned with a) intercomparison of model predictions and b) comparison of model predictions with the observed data. A series of examples illustrating some of the different types of data structure and some possible analyses have been constructed. A bibliography of references on model validation is provided. It is important to note that the results of the various techniques discussed here, whether qualitative or quantitative, should not be considered in isolation. Overall model performance must also include an evaluation of model structure and formulation, i.e. conceptual model uncertainties, and results for performance measures must be interpreted in this context. Consider a number of models which are used to provide predictions of a number of quantities at a number of time points. In the case of the VAMP CB scenario, the results include predictions of total deposition of Cs-137 and time dependent concentrations in various
Yates, Keegan M; Untaroiu, Costin D
2018-04-16
Statistical shape analysis was conducted on 15 pairs (left and right) of human kidneys. It was shown that the left and right kidney were significantly different in size and shape. In addition, several common modes of kidney variation were identified using statistical shape analysis. Semi-automatic mesh morphing techniques have been developed to efficiently create subject specific meshes from a template mesh with a similar geometry. Subject specific meshes as well as probabilistic kidney meshes were created from a template mesh. Mesh quality remained about the same as the template mesh while only taking a fraction of the time to create the mesh from scratch or morph with manually identified landmarks. This technique can help enhance the quality of information gathered from experimental testing with subject specific meshes as well as help to more efficiently predict injury by creating models with the mean shape as well as models at the extremes for each principal component. Copyright © 2018 Elsevier Ltd. All rights reserved.
Haines, Ryan W; Lin, Shih-Pin; Hewson, Russell; Kirwan, Christopher J; Torrance, Hew D; O'Dwyer, Michael J; West, Anita; Brohi, Karim; Pearse, Rupert M; Zolfaghari, Parjam; Prowle, John R
2018-02-26
Acute Kidney Injury (AKI) complicating major trauma is associated with increased mortality and morbidity. Traumatic AKI has specific risk factors and predictable time-course facilitating diagnostic modelling. In a single centre, retrospective observational study we developed risk prediction models for AKI after trauma based on data around intensive care admission. Models predicting AKI were developed using data from 830 patients, using data reduction followed by logistic regression, and were independently validated in a further 564 patients. AKI occurred in 163/830 (19.6%) with 42 (5.1%) receiving renal replacement therapy (RRT). First serum creatinine and phosphate, units of blood transfused in first 24 h, age and Charlson score discriminated need for RRT and AKI early after trauma. For RRT c-statistics were good to excellent: development: 0.92 (0.88-0.96), validation: 0.91 (0.86-0.97). Modelling AKI stage 2-3, c-statistics were also good, development: 0.81 (0.75-0.88) and validation: 0.83 (0.74-0.92). The model predicting AKI stage 1-3 performed moderately, development: c-statistic 0.77 (0.72-0.81), validation: 0.70 (0.64-0.77). Despite good discrimination of need for RRT, positive predictive values (PPV) at the optimal cut-off were only 23.0% (13.7-42.7) in development. However, PPV for the alternative endpoint of RRT and/or death improved to 41.2% (34.8-48.1) highlighting death as a clinically relevant endpoint to RRT.
Thematic and spatial resolutions affect model-based predictions of tree species distribution.
Liang, Yu; He, Hong S; Fraser, Jacob S; Wu, ZhiWei
2013-01-01
Subjective decisions of thematic and spatial resolutions in characterizing environmental heterogeneity may affect the characterizations of spatial pattern and the simulation of occurrence and rate of ecological processes, and in turn, model-based tree species distribution. Thus, this study quantified the importance of thematic and spatial resolutions, and their interaction in predictions of tree species distribution (quantified by species abundance). We investigated how model-predicted species abundances changed and whether tree species with different ecological traits (e.g., seed dispersal distance, competitive capacity) had different responses to varying thematic and spatial resolutions. We used the LANDIS forest landscape model to predict tree species distribution at the landscape scale and designed a series of scenarios with different thematic (different numbers of land types) and spatial resolutions combinations, and then statistically examined the differences of species abundance among these scenarios. Results showed that both thematic and spatial resolutions affected model-based predictions of species distribution, but thematic resolution had a greater effect. Species ecological traits affected the predictions. For species with moderate dispersal distance and relatively abundant seed sources, predicted abundance increased as thematic resolution increased. However, for species with long seeding distance or high shade tolerance, thematic resolution had an inverse effect on predicted abundance. When seed sources and dispersal distance were not limiting, the predicted species abundance increased with spatial resolution and vice versa. Results from this study may provide insights into the choice of thematic and spatial resolutions for model-based predictions of tree species distribution.
Model-generated air quality statistics for application in vegetation response models in Alberta
International Nuclear Information System (INIS)
McVehil, G.E.; Nosal, M.
1990-01-01
To test and apply vegetation response models in Alberta, air pollution statistics representative of various parts of the Province are required. At this time, air quality monitoring data of the requisite accuracy and time resolution are not available for most parts of Alberta. Therefore, there exists a need to develop appropriate air quality statistics. The objectives of the work reported here were to determine the applicability of model generated air quality statistics and to develop by modelling, realistic and representative time series of hourly SO 2 concentrations that could be used to generate the statistics demanded by vegetation response models
International Nuclear Information System (INIS)
Nam, Cheol; Choi, Byeong Kwon; Jeong, Yong Hwan; Jung, Youn Ho
2001-01-01
During the last decade, the failure behavior of high-burnup fuel rods under RIA has been an extensive concern since observations of fuel rod failures at low enthalpy. Of great importance is placed on failure prediction of fuel rod in the point of licensing criteria and safety in extending burnup achievement. To address the issue, a statistics-based methodology is introduced to predict failure probability of irradiated fuel rods. Based on RIA simulation results in literature, a failure enthalpy correlation for irradiated fuel rod is constructed as a function of oxide thickness, fuel burnup, and pulse width. From the failure enthalpy correlation, a single damage parameter, equivalent enthalpy, is defined to reflect the effects of the three primary factors as well as peak fuel enthalpy. Moreover, the failure distribution function with equivalent enthalpy is derived, applying a two-parameter Weibull statistical model. Using these equations, the sensitivity analysis is carried out to estimate the effects of burnup, corrosion, peak fuel enthalpy, pulse width and cladding materials used
Performance modeling, loss networks, and statistical multiplexing
Mazumdar, Ravi
2009-01-01
This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of understanding the phenomenon of statistical multiplexing. The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the important ideas of Palm distributions associated with traffic models and their role in performance measures. Also presented are recent ideas of large buffer, and many sources asymptotics that play an important role in understanding statistical multiplexing. I
Statistical evaluation of diagnostic performance topics in ROC analysis
Zou, Kelly H; Bandos, Andriy I; Ohno-Machado, Lucila; Rockette, Howard E
2016-01-01
Statistical evaluation of diagnostic performance in general and Receiver Operating Characteristic (ROC) analysis in particular are important for assessing the performance of medical tests and statistical classifiers, as well as for evaluating predictive models or algorithms. This book presents innovative approaches in ROC analysis, which are relevant to a wide variety of applications, including medical imaging, cancer research, epidemiology, and bioinformatics. Statistical Evaluation of Diagnostic Performance: Topics in ROC Analysis covers areas including monotone-transformation techniques in parametric ROC analysis, ROC methods for combined and pooled biomarkers, Bayesian hierarchical transformation models, sequential designs and inferences in the ROC setting, predictive modeling, multireader ROC analysis, and free-response ROC (FROC) methodology. The book is suitable for graduate-level students and researchers in statistics, biostatistics, epidemiology, public health, biomedical engineering, radiology, medi...
MASKED AREAS IN SHEAR PEAK STATISTICS: A FORWARD MODELING APPROACH
International Nuclear Information System (INIS)
Bard, D.; Kratochvil, J. M.; Dawson, W.
2016-01-01
The statistics of shear peaks have been shown to provide valuable cosmological information beyond the power spectrum, and will be an important constraint of models of cosmology in forthcoming astronomical surveys. Surveys include masked areas due to bright stars, bad pixels etc., which must be accounted for in producing constraints on cosmology from shear maps. We advocate a forward-modeling approach, where the impacts of masking and other survey artifacts are accounted for in the theoretical prediction of cosmological parameters, rather than correcting survey data to remove them. We use masks based on the Deep Lens Survey, and explore the impact of up to 37% of the survey area being masked on LSST and DES-scale surveys. By reconstructing maps of aperture mass the masking effect is smoothed out, resulting in up to 14% smaller statistical uncertainties compared to simply reducing the survey area by the masked area. We show that, even in the presence of large survey masks, the bias in cosmological parameter estimation produced in the forward-modeling process is ≈1%, dominated by bias caused by limited simulation volume. We also explore how this potential bias scales with survey area and evaluate how much small survey areas are impacted by the differences in cosmological structure in the data and simulated volumes, due to cosmic variance
Kolokythas, Kostantinos; Vasileios, Salamalikis; Athanassios, Argiriou; Kazantzidis, Andreas
2015-04-01
The wind is a result of complex interactions of numerous mechanisms taking place in small or large scales, so, the better knowledge of its behavior is essential in a variety of applications, especially in the field of power production coming from wind turbines. In the literature there is a considerable number of models, either physical or statistical ones, dealing with the problem of simulation and prediction of wind speed. Among others, Artificial Neural Networks (ANNs) are widely used for the purpose of wind forecasting and, in the great majority of cases, outperform other conventional statistical models. In this study, a number of ANNs with different architectures, which have been created and applied in a dataset of wind time series, are compared to Auto Regressive Integrated Moving Average (ARIMA) statistical models. The data consist of mean hourly wind speeds coming from a wind farm on a hilly Greek region and cover a period of one year (2013). The main goal is to evaluate the models ability to simulate successfully the wind speed at a significant point (target). Goodness-of-fit statistics are performed for the comparison of the different methods. In general, the ANN showed the best performance in the estimation of wind speed prevailing over the ARIMA models.
Model for predicting the injury severity score.
Hagiwara, Shuichi; Oshima, Kiyohiro; Murata, Masato; Kaneko, Minoru; Aoki, Makoto; Kanbe, Masahiko; Nakamura, Takuro; Ohyama, Yoshio; Tamura, Jun'ichi
2015-07-01
To determine the formula that predicts the injury severity score from parameters that are obtained in the emergency department at arrival. We reviewed the medical records of trauma patients who were transferred to the emergency department of Gunma University Hospital between January 2010 and December 2010. The injury severity score, age, mean blood pressure, heart rate, Glasgow coma scale, hemoglobin, hematocrit, red blood cell count, platelet count, fibrinogen, international normalized ratio of prothrombin time, activated partial thromboplastin time, and fibrin degradation products, were examined in those patients on arrival. To determine the formula that predicts the injury severity score, multiple linear regression analysis was carried out. The injury severity score was set as the dependent variable, and the other parameters were set as candidate objective variables. IBM spss Statistics 20 was used for the statistical analysis. Statistical significance was set at P Watson ratio was 2.200. A formula for predicting the injury severity score in trauma patients was developed with ordinary parameters such as fibrin degradation products and mean blood pressure. This formula is useful because we can predict the injury severity score easily in the emergency department.
Stochastic models for predicting environmental impact in aquatic ecosystems
International Nuclear Information System (INIS)
Stewart-Oaten, A.
1986-01-01
The purpose of stochastic predictions are discussed in relation to the environmental impacts of nuclear power plants on aquatic ecosystems. One purpose is to aid in making rational decisions about whether a power plant should be built, where, and how it should be designed. The other purpose is to check on the models themselves in the light of what eventually happens. The author discusses the role or statistical decision theory in the decision-making problem. Various types of stochastic models and their problems are presented. In addition some suggestions are made for generating usable stochastic models, and checking and improving on them. 12 references
Prediction of transmission loss through an aircraft sidewall using statistical energy analysis
Ming, Ruisen; Sun, Jincai
1989-06-01
The transmission loss of randomly incident sound through an aircraft sidewall is investigated using statistical energy analysis. Formulas are also obtained for the simple calculation of sound transmission loss through single- and double-leaf panels. Both resonant and nonresonant sound transmissions can be easily calculated using the formulas. The formulas are used to predict sound transmission losses through a Y-7 propeller airplane panel. The panel measures 2.56 m x 1.38 m and has two windows. The agreement between predicted and measured values through most of the frequency ranges tested is quite good.
A residual life prediction model based on the generalized σ -N curved surface
Directory of Open Access Journals (Sweden)
Zongwen AN
2016-06-01
Full Text Available In order to investigate change rule of the residual life of structure under random repeated load, firstly, starting from the statistic meaning of random repeated load, the joint probability density function of maximum stress and minimum stress is derived based on the characteristics of order statistic (maximum order statistic and minimum order statistic; then, based on the equation of generalized σ -N curved surface, considering the influence of load cycles number on fatigue life, a relationship among minimum stress, maximum stress and residual life, that is the σmin(n- σmax(n-Nr(n curved surface model, is established; finally, the validity of the proposed model is demonstrated by a practical case. The result shows that the proposed model can reflect the influence of maximum stress and minimum stress on residual life of structure under random repeated load, which can provide a theoretical basis for life prediction and reliability assessment of structure.
International Nuclear Information System (INIS)
Ayodele, T.R.; Ogunjuyigbe, A.S.O.
2015-01-01
In this paper, probability distribution of clearness index is proposed for the prediction of global solar radiation. First, the clearness index is obtained from the past data of global solar radiation, then, the parameters of the appropriate distribution that best fit the clearness index are determined. The global solar radiation is thereafter predicted from the clearness index using inverse transformation of the cumulative distribution function. To validate the proposed method, eight years global solar radiation data (2000–2007) of Ibadan, Nigeria are used to determine the parameters of appropriate probability distribution for clearness index. The calculated parameters are then used to predict the future monthly average global solar radiation for the following year (2008). The predicted values are compared with the measured values using four statistical tests: the Root Mean Square Error (RMSE), MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error) and the coefficient of determination (R"2). The proposed method is also compared to the existing regression models. The results show that logistic distribution provides the best fit for clearness index of Ibadan and the proposed method is effective in predicting the monthly average global solar radiation with overall RMSE of 0.383 MJ/m"2/day, MAE of 0.295 MJ/m"2/day, MAPE of 2% and R"2 of 0.967. - Highlights: • Distribution of clearnes index is proposed for prediction of global solar radiation. • The clearness index is obtained from the past data of global solar radiation. • The parameters of distribution that best fit the clearness index are determined. • Solar radiation is predicted from the clearness index using inverse transformation. • The method is effective in predicting the monthly average global solar radiation.
Statistical Models of Adaptive Immune populations
Sethna, Zachary; Callan, Curtis; Walczak, Aleksandra; Mora, Thierry
The availability of large (104-106 sequences) datasets of B or T cell populations from a single individual allows reliable fitting of complex statistical models for naïve generation, somatic selection, and hypermutation. It is crucial to utilize a probabilistic/informational approach when modeling these populations. The inferred probability distributions allow for population characterization, calculation of probability distributions of various hidden variables (e.g. number of insertions), as well as statistical properties of the distribution itself (e.g. entropy). In particular, the differences between the T cell populations of embryonic and mature mice will be examined as a case study. Comparing these populations, as well as proposed mixed populations, provides a concrete exercise in model creation, comparison, choice, and validation.
Improving Gastric Cancer Outcome Prediction Using Single Time-Point Artificial Neural Network Models
Nilsaz-Dezfouli, Hamid; Abu-Bakar, Mohd Rizam; Arasan, Jayanthi; Adam, Mohd Bakri; Pourhoseingholi, Mohamad Amin
2017-01-01
In cancer studies, the prediction of cancer outcome based on a set of prognostic variables has been a long-standing topic of interest. Current statistical methods for survival analysis offer the possibility of modelling cancer survivability but require unrealistic assumptions about the survival time distribution or proportionality of hazard. Therefore, attention must be paid in developing nonlinear models with less restrictive assumptions. Artificial neural network (ANN) models are primarily useful in prediction when nonlinear approaches are required to sift through the plethora of available information. The applications of ANN models for prognostic and diagnostic classification in medicine have attracted a lot of interest. The applications of ANN models in modelling the survival of patients with gastric cancer have been discussed in some studies without completely considering the censored data. This study proposes an ANN model for predicting gastric cancer survivability, considering the censored data. Five separate single time-point ANN models were developed to predict the outcome of patients after 1, 2, 3, 4, and 5 years. The performance of ANN model in predicting the probabilities of death is consistently high for all time points according to the accuracy and the area under the receiver operating characteristic curve. PMID:28469384
Tropical geometry of statistical models.
Pachter, Lior; Sturmfels, Bernd
2004-11-16
This article presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, observations generated from a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. Here, we address the question of how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry. The Newton polytope of a statistical model plays a key role. Our results are applied to the hidden Markov model and the general Markov model on a binary tree.
Directory of Open Access Journals (Sweden)
Sun Yong
2014-01-01
Full Text Available The hydrolysis of corn stover using hydrochloric acid was studied. The kinetic parameters of the mathematical models for predicting the yields of xylose, glucose, furfural and acetic acid were obtained, and the corresponding xylose generation activation energy of 100 kJ/mol was determined. The characterization of corn stover using with different techniques during hydrolysis indicated an effective removal of xylan and the slightly alteration on the structures of cellulose and lignin. A 23five levels Central Composite Design (CCD was used to develop a statistical model for the optimization of process variables including acid concentration, pretreatment temperature and time. The optimum conditions determined by this model were found to be 108ºC for 80 minutes with acid concentration of 5.8%. Under these conditions, the maximised results are the following: xylose 19.93 g/L, glucose 1.2 g/L, furfural 1.5 g/L, acetic acid 1.3 g/L. The validation of the model indicates a good agreement between the experimental results and the predicted values.
Relativistic beaming and quasar statistics
International Nuclear Information System (INIS)
Orr, M.J.L.; Browne, I.W.A.
1982-01-01
The statistical predictions of a unified scheme for the radio emission from quasars are explored. This scheme attributes the observed differences between flat- and steep-spectrum quasars to projection and the effects of relativistic beaming of the emission from the nuclear components. We use a simple quasar model consisting of a compact relativistically beamed core with spectral index zero and unbeamed lobes, spectral index - 1, to predict the proportion of flat-spectrum sources in flux-limited samples selected at different frequencies. In our model this fraction depends on the core Lorentz factor, γ and we find that a value of approximately 5 gives satisfactory agreement with observation. In a similar way the model is used to construct the expected number/flux density counts for flat-spectrum quasars from the observed steep-spectrum counts. Again, good agreement with the observations is obtained if the average core Lorentz factor is about 5. Independent estimates of γ from observations of superluminal motion in quasars are of the same order of magnitude. We conclude that the statistical properties of quasars are entirely consistent with the predictions of simple relativistic-beam models. (author)
Shahabi, Himan; Hashim, Mazlan
2015-04-22
This research presents the results of the GIS-based statistical models for generation of landslide susceptibility mapping using geographic information system (GIS) and remote-sensing data for Cameron Highlands area in Malaysia. Ten factors including slope, aspect, soil, lithology, NDVI, land cover, distance to drainage, precipitation, distance to fault, and distance to road were extracted from SAR data, SPOT 5 and WorldView-1 images. The relationships between the detected landslide locations and these ten related factors were identified by using GIS-based statistical models including analytical hierarchy process (AHP), weighted linear combination (WLC) and spatial multi-criteria evaluation (SMCE) models. The landslide inventory map which has a total of 92 landslide locations was created based on numerous resources such as digital aerial photographs, AIRSAR data, WorldView-1 images, and field surveys. Then, 80% of the landslide inventory was used for training the statistical models and the remaining 20% was used for validation purpose. The validation results using the Relative landslide density index (R-index) and Receiver operating characteristic (ROC) demonstrated that the SMCE model (accuracy is 96%) is better in prediction than AHP (accuracy is 91%) and WLC (accuracy is 89%) models. These landslide susceptibility maps would be useful for hazard mitigation purpose and regional planning.
Calibration plots for risk prediction models in the presence of competing risks.
Gerds, Thomas A; Andersen, Per K; Kattan, Michael W
2014-08-15
A predicted risk of 17% can be called reliable if it can be expected that the event will occur to about 17 of 100 patients who all received a predicted risk of 17%. Statistical models can predict the absolute risk of an event such as cardiovascular death in the presence of competing risks such as death due to other causes. For personalized medicine and patient counseling, it is necessary to check that the model is calibrated in the sense that it provides reliable predictions for all subjects. There are three often encountered practical problems when the aim is to display or test if a risk prediction model is well calibrated. The first is lack of independent validation data, the second is right censoring, and the third is that when the risk scale is continuous, the estimation problem is as difficult as density estimation. To deal with these problems, we propose to estimate calibration curves for competing risks models based on jackknife pseudo-values that are combined with a nearest neighborhood smoother and a cross-validation approach to deal with all three problems. Copyright © 2014 John Wiley & Sons, Ltd.
Schliep, E. M.; Gelfand, A. E.; Holland, D. M.
2015-12-01
There is considerable demand for accurate air quality information in human health analyses. The sparsity of ground monitoring stations across the United States motivates the need for advanced statistical models to predict air quality metrics, such as PM2.5, at unobserved sites. Remote sensing technologies have the potential to expand our knowledge of PM2.5 spatial patterns beyond what we can predict from current PM2.5 monitoring networks. Data from satellites have an additional advantage in not requiring extensive emission inventories necessary for most atmospheric models that have been used in earlier data fusion models for air pollution. Statistical models combining monitoring station data with satellite-obtained aerosol optical thickness (AOT), also referred to as aerosol optical depth (AOD), have been proposed in the literature with varying levels of success in predicting PM2.5. The benefit of using AOT is that satellites provide complete gridded spatial coverage. However, the challenges involved with using it in fusion models are (1) the correlation between the two data sources varies both in time and in space, (2) the data sources are temporally and spatially misaligned, and (3) there is extensive missingness in the monitoring data and also in the satellite data due to cloud cover. We propose a hierarchical autoregressive spatially varying coefficients model to jointly model the two data sources, which addresses the foregoing challenges. Additionally, we offer formal model comparison for competing models in terms of model fit and out of sample prediction of PM2.5. The models are applied to daily observations of PM2.5 and AOT in the summer months of 2013 across the conterminous United States. Most notably, during this time period, we find small in-sample improvement incorporating AOT into our autoregressive model but little out-of-sample predictive improvement.
12th Workshop on Stochastic Models, Statistics and Their Applications
Rafajłowicz, Ewaryst; Szajowski, Krzysztof
2015-01-01
This volume presents the latest advances and trends in stochastic models and related statistical procedures. Selected peer-reviewed contributions focus on statistical inference, quality control, change-point analysis and detection, empirical processes, time series analysis, survival analysis and reliability, statistics for stochastic processes, big data in technology and the sciences, statistical genetics, experiment design, and stochastic models in engineering. Stochastic models and related statistical procedures play an important part in furthering our understanding of the challenging problems currently arising in areas of application such as the natural sciences, information technology, engineering, image analysis, genetics, energy and finance, to name but a few. This collection arises from the 12th Workshop on Stochastic Models, Statistics and Their Applications, Wroclaw, Poland.
Seasonal prediction of East Asian summer rainfall using a multi-model ensemble system
Ahn, Joong-Bae; Lee, Doo-Young; Yoo, Jin‑Ho
2015-04-01
Using the retrospective forecasts of seven state-of-the-art coupled models and their multi-model ensemble (MME) for boreal summers, the prediction skills of climate models in the western tropical Pacific (WTP) and East Asian region are assessed. The predicti