Statistical Seasonal Sea Surface based Prediction Model
Suarez, Roberto; Rodriguez-Fonseca, Belen; Diouf, Ibrahima
2014-05-01
The interannual variability of sea surface temperature (SST) plays a key role in the strongly seasonal rainfall regime of the West African region. The predictability of the seasonal cycle of rainfall is widely discussed by the scientific community, with results that remain unsatisfactory due to the difficulty dynamical models have in reproducing the behavior of the Inter-Tropical Convergence Zone (ITCZ). To tackle this problem, a statistical model based on oceanic predictors has been developed at the Universidad Complutense de Madrid (UCM) with the aim of complementing and enhancing the predictability of the West African Monsoon (WAM) as an alternative to coupled models. The model, called S4CAST (SST-based Statistical Seasonal Forecast), is based on discriminant analysis techniques, specifically Maximum Covariance Analysis (MCA) and Canonical Correlation Analysis (CCA). Beyond its application to the prediction of rainfall in West Africa, its use extends to a range of oceanic, atmospheric and health-related parameters influenced by sea surface temperature as a defining factor of variability.
Statistical assessment of predictive modeling uncertainty
Barzaghi, Riccardo; Marotta, Anna Maria
2017-04-01
When the results of geophysical models are compared with data, the uncertainties of the model are typically disregarded. We propose a method for defining the uncertainty of a geophysical model based on a numerical procedure that estimates the empirical auto- and cross-covariances of model-estimated quantities. These empirical values are then fitted by proper covariance functions and used to compute the covariance matrix associated with the model predictions. The method is tested using a geophysical finite element model in the Mediterranean region. Using a novel χ² analysis in which both data and model uncertainties are taken into account, the model's estimated tectonic strain pattern due to the Africa-Eurasia convergence in the area extending from the Calabrian Arc to the Alpine domain is compared with that estimated from GPS velocities, taking into account the model uncertainty through its covariance structure and the covariance of the GPS estimates. The results indicate that including the estimated model covariance in the testing procedure leads to lower observed χ² values with better statistical significance, and may enable sharper identification of the best-fitting geophysical models.
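As a sketch of the χ² test with a combined covariance, the quadratic form r^T (C_data + C_model)^{-1} r can be computed directly; the 2x2 matrices and residuals below are illustrative stand-ins, not values from the study:

```python
def chi2_with_model_cov(residuals, cov_data, cov_model):
    """Chi-square of a residual vector under the combined covariance
    C = C_data + C_model (2x2 case, inverted analytically)."""
    c = [[cov_data[i][j] + cov_model[i][j] for j in range(2)] for i in range(2)]
    det = c[0][0]*c[1][1] - c[0][1]*c[1][0]
    inv = [[ c[1][1]/det, -c[0][1]/det],
           [-c[1][0]/det,  c[0][0]/det]]
    r = residuals
    return sum(r[i]*inv[i][j]*r[j] for i in range(2) for j in range(2))

r = [0.5, -0.3]                      # hypothetical model-minus-GPS residuals
cd = [[0.04, 0.0], [0.0, 0.04]]      # hypothetical data (GPS) covariance
cm = [[0.02, 0.0], [0.0, 0.02]]      # hypothetical estimated model covariance
# Adding the model covariance enlarges C, so the observed chi-square drops:
print(chi2_with_model_cov(r, cd, cm) < chi2_with_model_cov(r, cd, [[0.0, 0.0], [0.0, 0.0]]))  # prints True
```

This mirrors the abstract's qualitative finding: accounting for model covariance lowers the observed χ² values.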
Model output statistics applied to wind power prediction
Energy Technology Data Exchange (ETDEWEB)
Joensen, A.; Giebel, G.; Landberg, L. [Risoe National Lab., Roskilde (Denmark); Madsen, H.; Nielsen, H.A. [The Technical Univ. of Denmark, Dept. of Mathematical Modelling, Lyngby (Denmark)
1999-03-01
Being able to predict the output of a wind farm online one or two days in advance has significant advantages for utilities, such as improved scheduling of fossil-fuelled power plants and a better position on electricity spot markets. In this paper, prediction methods based on Numerical Weather Prediction (NWP) models are considered. The spatial resolution used in NWP models implies that these predictions are not valid locally at a specific wind farm. Furthermore, due to the non-stationary nature and complexity of the processes in the atmosphere, and occasional changes of NWP models, the deviation between the predicted and the measured wind will be time dependent. If observational data are available, and if the deviation between the predictions and the observations exhibits systematic behavior, this should be corrected for; when statistical methods are used, this approach is usually referred to as MOS (Model Output Statistics). The influence of atmospheric turbulence intensity, topography, prediction horizon length, and auto-correlation of wind speed and power is considered, and to take the time variations into account, adaptive estimation methods are applied. Three estimation techniques are considered and compared: extended Kalman filtering, recursive least squares, and a new modified recursive least squares algorithm. EU-JOULE-3. 11 refs.
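A minimal sketch of one of the adaptive estimators mentioned above, recursive least squares with a forgetting factor, applied to a hypothetical MOS correction y = a + b*x_nwp; all data are synthetic:

```python
def rls_step(theta, P, x, y, lam=0.98):
    """One update of recursive least squares with forgetting factor lam,
    fitting the MOS correction y = a + b*x_nwp with theta = [a, b]."""
    Px = [P[0][0]*x[0] + P[0][1]*x[1],
          P[1][0]*x[0] + P[1][1]*x[1]]
    denom = lam + x[0]*Px[0] + x[1]*Px[1]
    k = [Px[0]/denom, Px[1]/denom]                    # gain vector
    err = y - (theta[0]*x[0] + theta[1]*x[1])         # prediction error
    theta = [theta[0] + k[0]*err, theta[1] + k[1]*err]
    P = [[(P[i][j] - k[i]*Px[j])/lam for j in range(2)] for i in range(2)]
    return theta, P

# Synthetic stream where the observed speed is 1.0 + 0.8 * NWP speed:
theta, P = [0.0, 0.0], [[1000.0, 0.0], [0.0, 1000.0]]
for t in range(200):
    nwp = float(t % 10)
    theta, P = rls_step(theta, P, [1.0, nwp], 1.0 + 0.8*nwp)
# theta now approximates [1.0, 0.8]
```

The forgetting factor lam < 1 discounts old data, which is what lets the estimator track the time-dependent NWP bias the abstract describes.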
Statistical procedures for evaluating daily and monthly hydrologic model predictions
Coffey, M.E.; Workman, S.R.; Taraba, J.L.; Fogle, A.W.
2004-01-01
The overall study objective was to evaluate the applicability of different qualitative and quantitative methods for comparing daily and monthly SWAT computer model hydrologic streamflow predictions to observed data, and to recommend statistical methods for use in future model evaluations. Statistical methods were tested using daily streamflows and monthly equivalent runoff depths. The statistical techniques included linear regression, Nash-Sutcliffe efficiency, nonparametric tests, the t-test, objective functions, autocorrelation, and cross-correlation. None of the methods specifically accounted for the non-normal distribution of, and dependence between, data points in the daily predicted and observed data. Of the tested methods, median objective functions, the sign test, autocorrelation, and cross-correlation were most applicable for the daily data. The robust coefficient of determination (CD*) and robust modeling efficiency (EF*) objective functions were the preferred methods for daily model results due to the ease of comparing these values with a fixed ideal reference value of one. Predicted and observed monthly totals were more normally distributed, and there was less dependence between individual monthly totals than between the corresponding predicted and observed daily values. More statistical methods were available for comparing SWAT model-predicted and observed monthly totals. The 1995 monthly SWAT model predictions and observed data had a regression R² of 0.70, a Nash-Sutcliffe efficiency of 0.41, and the t-test failed to reject the equal-data-means hypothesis. The Nash-Sutcliffe coefficient and the R² coefficient were the preferred methods for monthly results due to the ability to compare these coefficients to a set ideal value of one.
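The Nash-Sutcliffe efficiency used above compares the model error with the variance of the observations about their mean; a minimal sketch with made-up streamflow numbers:

```python
def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE/SST, where SST is the total variance of
    the observations about their mean. 1 is a perfect fit; <= 0 means the model
    is no better than predicting the observed mean."""
    mean_obs = sum(obs)/len(obs)
    sse = sum((o - s)**2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs)**2 for o in obs)
    return 1 - sse/sst

obs = [2.0, 4.0, 6.0, 8.0, 10.0]           # invented daily streamflows
sim = [2.5, 3.5, 6.0, 8.5, 9.5]            # invented model predictions
print(round(nash_sutcliffe(obs, sim), 3))  # prints 0.975
```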
Statistical characteristics of irreversible predictability time in regional ocean models
Directory of Open Access Journals (Sweden)
P. C. Chu
2005-01-01
Probabilistic aspects of regional ocean model predictability are analyzed using the probability density function (PDF) of the irreversible predictability time (IPT), called the τ-PDF, computed from an unconstrained ensemble of stochastic perturbations in initial conditions, winds, and open boundary conditions. Two attractors (a chaotic attractor and a small-amplitude stable limit cycle) are found in the wind-driven circulation. The relationship between the attractor's residence time and the IPT determines the τ-PDF for short (up to several weeks) and intermediate (up to two months) predictions. The τ-PDF is usually non-Gaussian but not multi-modal for red-noise perturbations in initial conditions and perturbations in the wind and open boundary conditions. Bifurcation of the τ-PDF occurs as the tolerance level varies. Generally, extremely successful predictions (corresponding to the tail of the τ-PDF toward the large-IPT domain) are not outliers and share the same statistics as the whole ensemble of predictions.
Estimating Predictive Variance for Statistical Gas Distribution Modelling
Lilienthal, Achim J.; Asadi, Sahar; Reggente, Matteo
2009-05-01
Recent publications in statistical gas distribution modelling have proposed algorithms that model both the mean and the variance of a distribution. This paper argues that estimating the predictive concentration variance is not merely a gradual improvement but a significant step forward for the field. First, such models much better fit the particular structure of gas distributions, which exhibit strong fluctuations with considerable spatial variation as a result of the intermittent character of gas dispersal. Second, estimating the predictive variance makes it possible to evaluate model quality in terms of the data likelihood. This offers a solution to the problem of ground-truth evaluation, which has always been a critical issue for gas distribution modelling. It also enables solid comparisons of different modelling approaches, and provides the means to learn meta-parameters of the model, to determine when the model should be updated or re-initialised, or to suggest new measurement locations based on the current model. We also point out directions of related ongoing or potential future research work.
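The likelihood-based evaluation argued for above can be sketched as follows, scoring each held-out measurement under the model's predicted mean and variance (a Gaussian form is assumed here purely for illustration; the numbers are invented):

```python
import math

def gaussian_nll(mean, var, value):
    """Negative log-likelihood of one measurement under a predicted Gaussian;
    summed over held-out samples, lower totals indicate a better model."""
    return 0.5*(math.log(2*math.pi*var) + (value - mean)**2/var)

# An overconfident model (tiny predicted variance) is punished heavily when a
# concentration reading fluctuates far from the predicted mean:
honest = gaussian_nll(0.0, 1.0, 2.0)
overconfident = gaussian_nll(0.0, 0.1, 2.0)
```

Here `honest < overconfident`, which is how variance estimation enables the likelihood-based ground-truth comparison the abstract describes.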
A statistically predictive model for future monsoon failure in India
Schewe, Jacob; Levermann, Anders
2012-12-01
Indian monsoon rainfall is vital for a large share of the world’s population. Both reliably projecting India’s future precipitation and unraveling abrupt cessations of monsoon rainfall found in paleorecords require improved understanding of its stability properties. While details of monsoon circulations and the associated rainfall are complex, full-season failure is dominated by large-scale positive feedbacks within the region. Here we find that in a comprehensive climate model, monsoon failure is possible but very rare under pre-industrial conditions, while under future warming it becomes much more frequent. We identify the fundamental intraseasonal feedbacks that are responsible for monsoon failure in the climate model, relate these to observational data, and build a statistically predictive model for such failure. This model provides a simple dynamical explanation for future changes in the frequency distribution of seasonal mean all-Indian rainfall. Forced only by global mean temperature and the strength of the Pacific Walker circulation in spring, it reproduces the trend as well as the multidecadal variability in the mean and skewness of the distribution, as found in the climate model. The approach offers an alternative perspective on large-scale monsoon variability as the result of internal instabilities modulated by pre-seasonal ambient climate conditions.
Monthly to seasonal low flow prediction: statistical versus dynamical models
Ionita-Scholz, Monica; Klein, Bastian; Meissner, Dennis; Rademacher, Silke
2016-04-01
The Alfred Wegener Institute has developed a purely statistical scheme to generate streamflow forecasts for several months ahead. Instead of directly using teleconnection indices (e.g. NAO, AO), the idea is to identify regions with stable teleconnections between different global climate variables (e.g. sea surface temperature, geopotential height) and streamflow at gauges relevant for inland waterway transport. So-called stability (correlation) maps are generated, showing regions where streamflow and a climate variable from previous months are significantly correlated within a 21-year (31-year) moving window. Finally, the optimal forecast model is established based on a multiple regression analysis of the stable predictors. We present current results of the aforementioned approaches with a focus on the River Rhine (one of the world's most frequented waterways and the backbone of the European inland waterway network) and the Elbe River. Overall, our analysis reveals valuable predictability of low flows at monthly and seasonal time scales, a result that may be useful for water resources management. Given that all predictors used in the models are available at the end of each month, the forecast scheme can be used operationally to predict extreme events and to provide early warnings for upcoming low flows.
Tollenaar, N.; Van der Heijden, P.G.M.
2013-01-01
Using criminal population criminal conviction history information, prediction models are developed that predict three types of criminal recidivism: general recidivism, violent recidivism and sexual recidivism. The research question is whether prediction techniques from modern statistics, data mining
Prediction of Frost Occurrences Using Statistical Modeling Approaches
Directory of Open Access Journals (Sweden)
Hyojin Lee
2016-01-01
We developed frost prediction models for spring in Korea using logistic regression and decision tree techniques. Hit Rate (HR), Probability of Detection (POD), and False Alarm Rate (FAR) from both models were calculated and compared. Threshold values for the logistic regression models were selected to maximize HR and POD and minimize FAR for each station, and splitting in the decision tree models was stopped when the change in entropy became relatively small. Average HR values were 0.92 and 0.91 for the logistic regression and decision tree techniques, respectively; average POD values were 0.78 and 0.80, respectively; and average FAR values were 0.22 and 0.28, respectively. The average numbers of selected explanatory variables were 5.7 and 2.3 for the logistic regression and decision tree techniques, respectively. Fewer explanatory variables can be more appropriate for operational activities to provide a timely warning for the prevention of frost damage to agricultural crops. We concluded that the decision tree model can be more useful for a timely warning system. It is recommended that the models be improved to reflect local topographic features.
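The verification scores above follow from the 2x2 contingency table of forecasts versus observed frost events; a sketch with hypothetical counts (FAR is taken here as the false-alarm ratio, one common convention):

```python
def forecast_scores(hits, misses, false_alarms, correct_negatives):
    """HR, POD and FAR from a 2x2 contingency table of frost forecasts.
    FAR here is the false-alarm ratio: false alarms among all frost forecasts."""
    total = hits + misses + false_alarms + correct_negatives
    hr = (hits + correct_negatives)/total        # fraction of all forecasts correct
    pod = hits/(hits + misses)                   # frost events correctly forecast
    far = false_alarms/(hits + false_alarms)     # frost forecasts that were wrong
    return hr, pod, far

# Invented counts for one station's spring season:
hr, pod, far = forecast_scores(hits=40, misses=10, false_alarms=12, correct_negatives=138)
```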
Prediction and setup of phytoplankton statistical model of Qiandaohu Lake
Institute of Scientific and Technical Information of China (English)
严力蛟; 全为民; 赵晓慧
2004-01-01
This research considers the mathematical relationship between the concentration of Chla and seven environmental factors, i.e. lake water temperature (T), Secchi depth (SD), pH, DO, CODMn, total nitrogen (TN), and total phosphorus (TP). Stepwise linear regression of 1997-1999 monitoring data at each sampling point of Qiandaohu Lake yielded the multivariate regression models presented in this paper. The Chla concentration simulated for the year 2000 by the regression model was similar to the observed value. The suggested mathematical relationship could be used to predict changes in the lake water environment at any point in time. The results showed that SD, TP and pH were the most significant factors affecting Chla concentration.
Pernot, Pascal
2016-01-01
Inference of physical parameters from reference data is a well-studied problem with many intricacies (inconsistent sets of data due to experimental systematic errors, approximate physical models...). The complexity is further increased when the inferred parameters are used to make predictions (virtual measurements), because parameter uncertainty has to be estimated in addition to the parameters' best values. The literature is rich in statistical models for the calibration/prediction problem, each having benefits and limitations. We review and evaluate standard and state-of-the-art statistical models in a common Bayesian framework, and test them on synthetic and real datasets of temperature-dependent viscosity for the calibration of the Lennard-Jones parameters of a Chapman-Enskog model.
Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A.; van t Veld, Aart A.
2012-01-01
PURPOSE: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. METHODS AND MATERIALS: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator
Does Statistical Significance Help to Evaluate Predictive Performance of Competing Models?
Directory of Open Access Journals (Sweden)
Levent Bulut
2016-04-01
In a Monte Carlo experiment with simulated data, we show that, as a point-forecast criterion, Clark and West's (2006) unconditional test of mean squared prediction errors does not reflect the relative performance of a superior model over a relatively weaker one. The simulation results show that even though the mean squared prediction error of a constructed superior model is far below that of a weaker alternative, the Clark-West test does not reflect this in its test statistic. Therefore, studies that use this statistic in testing the predictive accuracy of alternative exchange rate models, stock return predictability, inflation forecasting, and unemployment forecasting should not put too much weight on the magnitude of statistically significant Clark-West test statistics.
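A sketch of the Clark-West MSPE-adjusted statistic discussed above, for a parsimonious model nested in a larger one; the series below are synthetic and purely illustrative:

```python
import math

def clark_west_t(y, f_small, f_big):
    """t-statistic of the Clark-West (2006) MSPE-adjusted loss differential,
    testing a parsimonious model f_small against a larger nesting model f_big."""
    n = len(y)
    # Adjusted loss: (y - f1)^2 - [(y - f2)^2 - (f1 - f2)^2]
    d = [(yi - a)**2 - ((yi - b)**2 - (a - b)**2)
         for yi, a, b in zip(y, f_small, f_big)]
    mean_d = sum(d)/n
    var_d = sum((x - mean_d)**2 for x in d)/(n - 1)
    return mean_d/math.sqrt(var_d/n)

# Synthetic series where the larger model tracks y closely:
t_stat = clark_west_t([1, 2, 3, 4], [0, 0, 0, 0], [1.1, 1.9, 3.2, 3.8])
```

The adjustment term (f1 - f2)^2 removes the estimation-noise penalty of the larger model, which is precisely why, as the abstract argues, the magnitude of the resulting t-statistic need not track the raw MSPE gap.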
Joint multivariate statistical model and its applications to the synthetic earthquake prediction
Institute of Scientific and Technical Information of China (English)
韩天锡; 蒋淳; 魏雪丽; 韩梅; 冯德益
2004-01-01
Considering the problems that need to be solved in synthetic earthquake prediction at present, a new model is proposed in this paper: a joint multivariate statistical model combining principal component analysis with discriminant analysis. Principal component analysis and discriminant analysis are important theories in multivariate statistical analysis, which has developed rapidly over the last thirty years. By means of an information maximization method, we choose, from numerous earthquake prediction factors, several whose cumulative proportions of total sample variance exceed 90%. The paper applies regression analysis and Mahalanobis discrimination to extrapolating synthetic predictions. Furthermore, we use this model to characterize and predict earthquakes in North China (30°-42°N, 108°-125°E), and better prediction results are obtained.
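Mahalanobis discrimination as used above assigns a factor vector to the class whose mean is nearest in the Mahalanobis metric; a two-factor sketch with hypothetical class means and an identity covariance for simplicity:

```python
def mahalanobis2(x, mean, cov_inv):
    """Squared Mahalanobis distance of a factor vector x from a class mean,
    for two prediction factors and a given inverse covariance matrix."""
    d = [x[0] - mean[0], x[1] - mean[1]]
    return (d[0]*(cov_inv[0][0]*d[0] + cov_inv[0][1]*d[1])
            + d[1]*(cov_inv[1][0]*d[0] + cov_inv[1][1]*d[1]))

# Hypothetical two-factor observation classified to the nearer class mean:
x = [1.2, 0.8]
d_quiet = mahalanobis2(x, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
d_active = mahalanobis2(x, [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
label = "active" if d_active < d_quiet else "quiet"
print(label)  # prints active
```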
Tonkin, Matthew J.; Tiedeman, Claire R.; Ely, D. Matthew; Hill, Mary C.
2007-01-01
The OPR-PPR program calculates the Observation-Prediction (OPR) and Parameter-Prediction (PPR) statistics that can be used to evaluate the relative importance of various kinds of data to simulated predictions. The data considered fall into three categories: (1) existing observations, (2) potential observations, and (3) potential information about parameters. The first two are addressed by the OPR statistic; the third is addressed by the PPR statistic. The statistics are based on linear theory and measure the leverage of the data, which depends on the location, the type, and possibly the time of the data being considered. For example, in a ground-water system the type of data might be a head measurement at a particular location and time. As a measure of leverage, the statistics do not take into account the value of the measurement. As linear measures, the OPR and PPR statistics require minimal computational effort once sensitivities have been calculated. Sensitivities need to be calculated for only one set of parameter values; commonly these are the values estimated through model calibration. OPR-PPR can calculate the OPR and PPR statistics for any mathematical model that produces the necessary OPR-PPR input files. In this report, OPR-PPR capabilities are presented in the context of using the ground-water model MODFLOW-2000 and the universal inverse program UCODE_2005. The method used to calculate the OPR and PPR statistics is based on the linear equation for prediction standard deviation. Using sensitivities and other information, OPR-PPR calculates (a) the percent increase in the prediction standard deviation that results when one or more existing observations are omitted from the calibration data set; (b) the percent decrease in the prediction standard deviation that results when one or more potential observations are added to the calibration data set; or (c) the percent decrease in the prediction standard deviation that results when potential information on one
Directory of Open Access Journals (Sweden)
Han Jiang
2016-01-01
Recently, a number of short-term speed prediction approaches have been developed, most of which are based on machine learning and statistical theory. This paper examined the multistep-ahead prediction performance of eight different models using 2-minute travel speed data collected from three Remote Traffic Microwave Sensors located on a southbound segment of the 4th Ring Road in Beijing. Specifically, we consider five machine learning methods as candidates: Back Propagation Neural Network (BPNN), nonlinear autoregressive model with exogenous inputs neural network (NARXNN), Support Vector Machine with radial basis function as kernel function (SVM-RBF), Support Vector Machine with Linear Function (SVM-LIN), and Multilinear Regression (MLR). Three statistical models are also selected: Autoregressive Integrated Moving Average (ARIMA), Vector Autoregression (VAR), and the Space-Time (ST) model. From the prediction results, we find the following: (1) the prediction accuracy of speed deteriorates as the number of prediction time steps increases for all models; (2) BPNN, NARXNN, and SVM-RBF clearly outperform the two traditional statistical models ARIMA and VAR; (3) the prediction performance of the ANNs is superior to that of SVM and MLR; (4) as the time step increases, the ST model consistently provides the lowest MAE compared with ARIMA and VAR.
Lin, Zheng-Zhe
2013-01-01
By molecular dynamics simulations and free energy calculations based on the Monte Carlo method, the detailed balance between Pt cluster isomers was investigated. For clusters of n50. Then, a statistical mechanical model was built to evaluate the unimolecular isomerization rate and to simplify the prediction of isomer formation probability. This model is simpler than transition state theory and can easily be applied to ab initio calculations to predict the lifetime of nanostructures.
Wang, Ming; Long, Qi
2016-09-01
Prediction models for disease risk and prognosis play an important role in biomedical research, and evaluating their predictive accuracy in the presence of censored data is of substantial interest. The standard concordance (c) statistic has been extended to provide a summary measure of predictive accuracy for survival models. Motivated by a prostate cancer study, we address several issues associated with evaluating survival prediction models based on the c-statistic, with a focus on estimators using the technique of inverse probability of censoring weighting (IPCW). Compared to the existing work, we provide complete results on the asymptotic properties of the IPCW estimators under the assumption of coarsening at random (CAR), and propose a sensitivity analysis under the mechanism of noncoarsening at random (NCAR). In addition, we extend the IPCW approach as well as the sensitivity analysis to high-dimensional settings. The predictive accuracy of prediction models for cancer recurrence after prostatectomy is assessed by applying the proposed approaches. We find that the estimated predictive accuracy for the models in consideration is sensitive to the NCAR assumption, and we thus identify the best predictive model. Finally, we further evaluate the performance of the proposed methods in both low-dimensional and high-dimensional settings under CAR and NCAR through simulations.
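For intuition, the unweighted Harrell-type concordance over usable pairs can be sketched as below; the IPCW estimators discussed in the abstract additionally reweight each pair by inverse censoring probabilities, which this toy version omits. All data are invented:

```python
def concordance(times, events, risks):
    """Harrell-type c-statistic without censoring weights: among usable pairs
    (the earlier time must be an observed event), count the pairs in which the
    earlier failure carries the higher predicted risk; ties score 0.5."""
    num = den = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                den += 1
                if risks[i] > risks[j]:
                    num += 1
                elif risks[i] == risks[j]:
                    num += 0.5
    return num/den

# Risks perfectly aligned with failure order give c = 1:
c_perfect = concordance([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1])
```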
Establishment of Statistical Model for Precipitation Prediction in the Flood Season in China
Institute of Scientific and Technical Information of China (English)
Anonymous
2011-01-01
[Objective] The research aimed to establish a regression model to predict precipitation in the flood season in China. [Method] Based on a statistical model, the North Atlantic Oscillation index and sea surface temperature indices in the development and declining stages of ENSO were used to predict the East Asian summer monsoon index. After the stations were divided into 16 zones, the same factors were used to establish the regression model predicting station precipitation in the flood season in Chi...
Statistical model predictions for p+p and Pb+Pb collisions at LHC
Kraus, I.; Cleymans, J.; Oeschler, H.; Redlich, K.; Wheaton, S.
2009-01-01
Particle production in p+p and central Pb+Pb collisions at the LHC is discussed in the context of the statistical thermal model. For heavy-ion collisions, predictions of various particle ratios are presented. The sensitivity of several ratios to the temperature and the baryon chemical potential is studied in
A statistical model including age to predict passenger postures in the rear seats of automobiles.
Park, Jangwoon; Ebert, Sheila M; Reed, Matthew P; Hallman, Jason J
2016-06-01
Few statistical models of rear seat passenger posture have been published, and none has taken into account the effects of occupant age. This study developed new statistical models for predicting passenger postures in the rear seats of automobiles. Postures of 89 adults with a wide range of age and body size were measured in a laboratory mock-up in seven seat configurations. Posture-prediction models for female and male passengers were separately developed by stepwise regression using age, body dimensions, seat configurations and two-way interactions as potential predictors. Passenger posture was significantly associated with age and the effects of other two-way interaction variables depended on age. A set of posture-prediction models are presented for women and men, and the prediction results are compared with previously published models. This study is the first study of passenger posture to include a large cohort of older passengers and the first to report a significant effect of age for adults. The presented models can be used to position computational and physical human models for vehicle design and assessment. Practitioner Summary: The significant effects of age, body dimensions and seat configuration on rear seat passenger posture were identified. The models can be used to accurately position computational human models or crash test dummies for older passengers in known rear seat configurations.
Directory of Open Access Journals (Sweden)
Minal Patel
2016-01-01
Service can be delivered anywhere and at any time in cloud computing using virtualization. The main issue in handling virtualized resources is balancing ongoing workloads. Live migration of virtual machines involves two major techniques: (i) reducing dirty pages using CPU scheduling and (ii) compressing memory pages. Existing techniques for live migration are not able to predict dirty pages in advance. In the proposed framework, time-series-based prediction techniques are developed using historical analysis of past data. The time series is generated as memory pages are transferred iteratively. Two different regression-based time series models are proposed. The first is a statistical probability based regression model built on ARIMA (autoregressive integrated moving average); the second is a statistical learning based regression model using SVR (support vector regression). These models are tested on a real Xen data set to compute downtime, total number of pages transferred, and total migration time. The ARIMA model predicts dirty pages with 91.74% accuracy, and the SVR model with 94.61% accuracy, which is higher than that of ARIMA.
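As a simple stand-in for the ARIMA/SVR predictors, a plain AR(1) fit by ordinary least squares illustrates the idea of forecasting the next iteration's dirty-page count from the series observed so far; the counts below are invented, not from the Xen data set:

```python
def fit_ar1(series):
    """Fit x_t = a + b*x_{t-1} by ordinary least squares, a pure-Python
    stand-in for the paper's ARIMA/SVR predictors of per-iteration dirty pages."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs)/n, sum(ys)/n
    b = sum((x - mx)*(y - my) for x, y in zip(xs, ys)) / sum((x - mx)**2 for x in xs)
    a = my - b*mx
    return a, b

pages = [1000, 820, 690, 585, 500, 430]    # invented dirty-page counts per iteration
a, b = fit_ar1(pages)
predicted_next = a + b*pages[-1]           # forecast for the next migration round
```

Predicting the next round's dirty pages in advance is what would let a migration controller choose when to stop iterating and switch to the stop-and-copy phase.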
Kim, Yoojin; Kim, Ha-Rim; Choi, Yong-Sang; Kim, WonMoo; Kim, Hye-Sil
2016-11-01
Statistical seasonal prediction models for the Arctic sea ice concentration (SIC) were developed for the late summer (August-October), when the downward trend is dramatic. The absorbed solar radiation (ASR) at the top of the atmosphere in June has a significant seasonal leading role for the SIC. Based on the lagged ASR-SIC relationship, two simple statistical models were established: a Markovian stochastic model and a linear regression model. Cross-validated hindcasts of SIC from 1979 to 2014 by the two models were compared with each other and with observations. The hindcasts showed general agreement between the models, as they share a common predictor (ASR in June), and the observed SIC was well reproduced, especially over the relatively thin-ice regions (of one-year or multi-year sea ice). The robust predictability confirms the functional role of ASR in the prediction of SIC. In particular, the SIC prediction in October was quite promising, probably due to the pronounced ice-albedo feedback. The temporal correlation coefficients between the predicted and the observed SIC were 0.79 and 0.82 for the Markovian and regression models, respectively. Small differences were observed between the two models; the regression model performed slightly better in August and September in terms of temporal correlation coefficients. Meanwhile, the prediction skill of the Markovian model in October was higher to the north of the Chukchi, East Siberian, and Laptev Seas. A strong non-linear relationship between ASR in June and SIC in October in these areas would have increased the predictability of the Markovian model.
A Statistical Model for the Prediction of Wind-Speed Probabilities in the Atmospheric Surface Layer
Efthimiou, G. C.; Hertwig, D.; Andronopoulos, S.; Bartzis, J. G.; Coceal, O.
2016-11-01
Wind fields in the atmospheric surface layer (ASL) are highly three-dimensional and characterized by strong spatial and temporal variability. For various applications such as wind-comfort assessments and structural design, an understanding of potentially hazardous wind extremes is important. Statistical models are designed to facilitate conclusions about the occurrence probability of wind speeds based on the knowledge of low-order flow statistics. Being particularly interested in the upper tail regions we show that the statistical behaviour of near-surface wind speeds is adequately represented by the Beta distribution. By using the properties of the Beta probability density function in combination with a model for estimating extreme values based on readily available turbulence statistics, it is demonstrated that this novel modelling approach reliably predicts the upper margins of encountered wind speeds. The model's basic parameter is derived from three substantially different calibrating datasets of flow in the ASL originating from boundary-layer wind-tunnel measurements and direct numerical simulation. Evaluating the model based on independent field observations of near-surface wind speeds shows a high level of agreement between the statistically modelled horizontal wind speeds and measurements. The results show that, based on knowledge of only a few simple flow statistics (mean wind speed, wind-speed fluctuations and integral time scales), the occurrence probability of velocity magnitudes at arbitrary flow locations in the ASL can be estimated with a high degree of confidence.
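The exceedance probability of a Beta-distributed normalized wind speed can be sketched by numerical integration of the Beta density; the shape parameters and threshold below are illustrative, not the calibrated values from the study:

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density on [0, 1]; x is wind speed normalized by its maximum."""
    norm = math.gamma(a)*math.gamma(b)/math.gamma(a + b)
    return x**(a - 1)*(1 - x)**(b - 1)/norm

def exceedance_prob(threshold, a, b, steps=10000):
    """P(U > threshold) by trapezoidal integration of the Beta density."""
    h = (1.0 - threshold)/steps
    total = 0.0
    for i in range(steps):
        x0 = threshold + i*h
        x1 = min(x0 + h, 1.0)
        total += 0.5*(x1 - x0)*(beta_pdf(x0, a, b) + beta_pdf(x1, a, b))
    return total

# For Beta(2, 5) the tail integral P(U > 0.8) is 0.0016 analytically,
# a quick check that the numerical tail estimate behaves:
p = exceedance_prob(0.8, 2, 5)
```

In the modelling approach described above, the Beta shape parameters would be derived from measured low-order statistics (mean, fluctuations, integral time scales) rather than fixed by hand as here.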
Output from Statistical Predictive Models as Input to eLearning Dashboards
Directory of Open Access Journals (Sweden)
Marlene A. Smith
2015-06-01
We describe how statistical predictive models might play an expanded role in educational analytics by giving students automated, real-time information about what their current performance means for eventual success in eLearning environments. We discuss how an online messaging system might tailor information to individual students using predictive analytics. The proposed system would be data-driven and quantitative; e.g., a message might furnish the probability that a student will successfully complete the certificate requirements of a massive open online course. Repeated messages would prod underperforming students and alert instructors to those in need of intervention. Administrators responsible for accreditation or outcomes assessment would have ready documentation of learning outcomes and actions taken to address unsatisfactory student performance. The article’s brief introduction to statistical predictive models sets the stage for a description of the messaging system. Resources and methods needed to develop and implement the system are discussed.
Majda, Andrew J.; Qi, Di
2016-02-01
Turbulent dynamical systems with a large phase space and a high degree of instability are ubiquitous in climate science and engineering applications. Statistical uncertainty quantification (UQ) of the response to changes in forcing or to uncertain initial data in such complex turbulent systems requires the use of imperfect models, due to both the lack of physical understanding and the overwhelming computational demands of Monte Carlo simulation in a large-dimensional phase space. Thus, the systematic development of reduced low-order imperfect statistical models for UQ in turbulent dynamical systems is a grand challenge. This paper applies a recent mathematical strategy for calibrating imperfect models in a training phase and accurately predicting the response by combining information theory and linear statistical response theory in a systematic fashion. A systematic hierarchy of simple imperfect statistical closure schemes for UQ in these problems is designed and tested; these schemes are built through new local and global statistical energy conservation principles combined with statistical equilibrium fidelity. The forty-mode Lorenz 96 (L-96) model, which mimics forced baroclinic turbulence, is utilized as a test bed for the calibration and prediction phases for the hierarchy of computationally cheap imperfect closure models, both in the full phase space and in a reduced three-dimensional subspace containing the most energetic modes. In both phase spaces, the nonlinear response of the true model is captured accurately for the mean and variance by the systematic closure model, while alternative methods based on the fluctuation-dissipation theorem alone are much less accurate. For the reduced-order model for UQ in the three-dimensional subspace of L-96, the systematic low-order imperfect closure models coupled with the training strategy provide the highest predictive skill among existing methods for the general forced response, yet have simple design principles based on a
Wang, Yuexing; Yao, Yao; Keer, Leon M.
2017-02-01
Electromigration is an irreversible mass-diffusion process with damage accumulation in microelectronic materials and components under high current density. Based on experimental observations, cotton-type voids dominate the electromigration damage accumulation prior to cracking in the solder interconnect. To clarify the damage evolution process corresponding to cotton-type void growth, a statistical model is proposed to predict the stochastic character of void growth under high current density. An analytical solution for the cotton-type void volume growth over time is obtained. The accompanying electromigration-induced damage accumulation is predicted by combining the statistical void growth and the entropy increment. A model of electromigration-induced damage evolution in solder joints is developed and applied to the tensile-strength deterioration of solder joints due to electromigration. The predictions agree well with the experimental results.
Performance of statistical models to predict mental health and substance abuse cost
Directory of Open Access Journals (Sweden)
Ettner Susan L
2006-10-01
Background Providers use risk-adjustment systems to help manage healthcare costs. Typically, ordinary least squares (OLS) models on either untransformed or log-transformed cost are used. We examine the predictive ability of several statistical models, demonstrate how model choice depends on the goal for the predictive model, and examine whether building models on samples of the data affects model choice. Methods Our sample consisted of 525,620 Veterans Health Administration patients with mental health (MH) or substance abuse (SA) diagnoses who incurred costs during fiscal year 1999. We tested two models on a transformation of cost: a Log Normal model and a Square-root Normal model, and three generalized linear models on untransformed cost, defined by distributional assumption and link function: Normal with identity link (OLS); Gamma with log link; and Gamma with square-root link. Risk-adjusters included age, sex, and 12 MH/SA categories. To determine the best model among the entire dataset, predictive ability was evaluated using root mean square error (RMSE), mean absolute prediction error (MAPE), and predictive ratios of predicted to observed cost (PR) among deciles of predicted cost, by comparing point estimates and 95% bias-corrected bootstrap confidence intervals. To study the effect of analyzing a random sample of the population on model choice, we re-computed these statistics using random samples beginning with 5,000 patients and ending with the entire sample. Results The Square-root Normal model had the lowest estimates of the RMSE and MAPE, with bootstrap confidence intervals that were always lower than those for the other models. The Gamma with square-root link was best as measured by the PRs. The choice of best model could vary if smaller samples were used, and the Gamma with square-root link model had convergence problems with small samples. Conclusion Models with square-root transformation or link fit the data best. This function
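The Square-root Normal idea can be sketched on synthetic data: fit OLS to the square root of cost, back-transform, and score with RMSE and MAPE. This is a single-covariate toy version under assumed data, not the study's risk-adjustment model; note the naive back-transform ignores retransformation bias:

```python
import random

def fit_sqrt_normal(x, y):
    """OLS of sqrt(cost) on a single covariate: sqrt(y) ~ b0 + b1*x."""
    n = len(x)
    ys = [v ** 0.5 for v in y]
    mx, my = sum(x) / n, sum(ys) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, ys)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

def predict_cost(b0, b1, x):
    """Back-transform to the cost scale (ignoring retransformation bias)."""
    return [(b0 + b1 * v) ** 2 for v in x]

def rmse(obs, pred):
    return (sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)) ** 0.5

def mape(obs, pred):
    return sum(abs(o - p) / o for o, p in zip(obs, pred)) / len(obs)

random.seed(0)
x = [random.uniform(0, 3) for _ in range(500)]             # illustrative risk score
y = [(2 + 1.5 * v + random.gauss(0, 0.3)) ** 2 for v in x]  # skewed synthetic costs
b0, b1 = fit_sqrt_normal(x, y)
pred = predict_cost(b0, b1, x)
```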
Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A; van't Veld, Aart A
2012-03-15
To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. The performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model, as the stepwise method does, in contrast to the less intuitive BMA method. The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.
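The LASSO referred to above can be illustrated with a bare-bones coordinate-descent solver for the linear case; the paper's NTCP models are logistic, so this is only a sketch of the penalisation mechanics on synthetic data:

```python
import random

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO for linear regression.
    Assumes roughly standardised columns; lam is the L1 penalty."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding feature j
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            # soft-thresholding update drives weak coefficients to exactly zero
            if rho > lam:
                beta[j] = (rho - lam) / z
            elif rho < -lam:
                beta[j] = (rho + lam) / z
            else:
                beta[j] = 0.0
    return beta

random.seed(1)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * row[0] + random.gauss(0, 0.1) for row in X]  # only feature 0 matters
beta = lasso_cd(X, y, lam=0.1)
```

The sparse solution (irrelevant coefficients set exactly to zero) is what makes the LASSO model easy to interpret, echoing the paper's conclusion.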
Kim, Ok-Yeon; Kim, Hye-Mi; Lee, Myong-In; Min, Young-Mi
2017-01-01
This study aims at predicting the seasonal number of typhoons (TY) over the western North Pacific with an Asia-Pacific Climate Center (APCC) multi-model ensemble (MME)-based dynamical-statistical hybrid model. The hybrid model uses the statistical relationship between the number of TY during the typhoon season (July-October) and the large-scale key predictors forecasted by the APCC MME for the same season. The cross-validation result from the MME hybrid model demonstrates high prediction skill, with a correlation of 0.67 between the hindcasts and observations for 1982-2008. Cross-validation of the hybrid model with the individual models participating in the MME indicates that there is no single model which consistently outperforms the others in predicting typhoon number. Although the forecast skill of the MME is not always the highest compared to that of each individual model, the MME presents higher average correlations and a smaller variance of correlations. Given the large set of ensemble members from the multiple models, a relative operating characteristic score reveals an 82% (above-normal) and 78% (below-normal) improvement for the probabilistic prediction of the number of TY. This implies that there is an 82% (78%) probability that the forecasts can successfully discriminate above-normal (below-normal) years from other years. The forecast skill of the hybrid model for the past 7 years (2002-2008) is higher than that of the forecast from the Tropical Storm Risk consortium. Using the large set of ensemble members from multiple models, the APCC MME could provide useful deterministic and probabilistic seasonal typhoon forecasts to end-users, in particular the residents of tropical cyclone-prone areas in the Asia-Pacific region.
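The cross-validated correlation skill quoted above can be reproduced in miniature with leave-one-out predictions from a simple linear regression; the predictor index and typhoon counts below are made-up illustrative numbers, not APCC MME output:

```python
def loocv_predictions(x, y):
    """Leave-one-out cross-validated predictions from simple linear regression."""
    preds = []
    for i in range(len(x)):
        xs = [v for j, v in enumerate(x) if j != i]
        ys = [v for j, v in enumerate(y) if j != i]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        b1 = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / \
             sum((a - mx) ** 2 for a in xs)
        preds.append((my - b1 * mx) + b1 * x[i])  # predict the held-out year
    return preds

def correlation(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a) ** 0.5
    vb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (va * vb)

# Hypothetical large-scale predictor index vs. observed typhoon counts
predictor = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1, -0.9, 1.1, -1.5, 0.6]
ty_count = [26, 21, 27, 30, 24, 25, 22, 28, 20, 26]
r = correlation(ty_count, loocv_predictions(predictor, ty_count))
```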
Statistical external validation and consensus modeling: a QSPR case study for Koc prediction.
Gramatica, Paola; Giani, Elisa; Papa, Ester
2007-03-01
The soil sorption partition coefficient (log K(oc)) of a heterogeneous set of 643 organic non-ionic compounds, spanning a range of more than 6 log units, is predicted by a statistically validated QSAR modeling approach. The applied multiple linear regression (ordinary least squares, OLS) is based on a variety of theoretical molecular descriptors selected by the genetic algorithms-variable subset selection (GA-VSS) procedure. The models were validated for predictivity by different internal and external validation approaches. For external validation we applied self-organizing maps (SOM) to split the original data set: the best four-dimensional model, developed on a reduced training set of 93 chemicals, has a predictivity of 78% when applied to 550 validation chemicals (prediction set). The selected molecular descriptors, which can be interpreted through their mechanistic meaning, were compared with the more common physico-chemical descriptors log K(ow) and log S(w). The chemical applicability domain of each model was verified by the leverage approach in order to propose only reliable predictions. The best predictions were obtained by consensus modeling from 10 different models in the genetic algorithm model population.
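The leverage-based applicability-domain check mentioned above is straightforward to sketch: compute the hat-matrix diagonal for the descriptor matrix and flag compounds above the conventional cut-off h* = 3p/n. The descriptor values below are random placeholders, not the paper's descriptors:

```python
import numpy as np

def leverages(X):
    """Leverage h_i = x_i (X'X)^-1 x_i' for the design matrix X
    (rows = compounds, columns = descriptors incl. intercept)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return np.einsum('ij,jk,ik->i', X, XtX_inv, X)

def warning_leverage(n, p):
    """Conventional cut-off h* = 3p/n used in QSAR applicability-domain checks."""
    return 3.0 * p / n

rng = np.random.default_rng(0)
desc = rng.normal(size=(100, 3))          # placeholder molecular descriptors
X = np.hstack([np.ones((100, 1)), desc])  # add intercept column
h = leverages(X)
h_star = warning_leverage(*X.shape)
outside = int((h > h_star).sum())         # compounds outside the domain
```

A useful sanity check is that the leverages sum to the number of model parameters p.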
Water quality management using statistical analysis and time-series prediction model
Parmar, Kulwinder Singh; Bhardwaj, Rashmi
2014-12-01
This paper deals with water quality management using statistical analysis and a time-series prediction model. The monthly variation of water quality standards has been used to compare the statistical mean, median, mode, standard deviation, kurtosis, skewness, and coefficient of variation at the Yamuna River. The model was validated using R-squared, root mean square error, mean absolute percentage error, maximum absolute percentage error, mean absolute error, maximum absolute error, normalized Bayesian information criterion, Ljung-Box analysis, predicted values and confidence limits. Using an autoregressive integrated moving average model, future water quality parameter values have been estimated. It is observed that the predictive model is useful at 95% confidence limits and the distribution is platykurtic for potential of hydrogen (pH), free ammonia, total Kjeldahl nitrogen, dissolved oxygen and water temperature (WT), and leptokurtic for chemical oxygen demand and biochemical oxygen demand. It is also observed that the predicted series is close to the original series, which indicates a good fit. All parameters except pH and WT cross the prescribed limits of the World Health Organization/United States Environmental Protection Agency, and thus the water is not fit for drinking, agricultural or industrial use.
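The kurtosis/skewness characterisation used above reduces to standard sample moments; a minimal sketch with illustrative monthly pH readings (excess kurtosis < 0 indicates a platykurtic series, > 0 leptokurtic):

```python
def moments(data):
    """Mean, standard deviation, skewness and excess kurtosis of a series."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    sd = var ** 0.5
    skew = sum((x - mean) ** 3 for x in data) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in data) / (n * sd ** 4) - 3.0  # excess
    return mean, sd, skew, kurt

# Illustrative monthly pH readings (not the Yamuna data)
ph = [7.1, 7.3, 7.0, 7.4, 7.2, 7.5, 7.1, 7.3, 7.2, 7.4, 7.0, 7.2]
mean, sd, skew, kurt = moments(ph)
```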
Goetz, J. N.; Brenning, A.; Petschko, H.; Leopold, P.
2015-08-01
Statistical and, more recently, machine learning prediction methods have been gaining popularity in the field of landslide susceptibility modeling. In particular, these data-driven approaches show promise when tackling the challenge of mapping landslide-prone areas for large regions that may not have sufficient geotechnical data for physically-based methods. Currently, there is no single best method for empirical susceptibility modeling. Therefore, this study presents a comparison of traditional statistical and novel machine learning models applied to regional-scale landslide susceptibility modeling. These methods were evaluated by spatial k-fold cross-validation estimation of the predictive performance, by assessment of variable importance for gaining insights into model behavior, and by the appearance of the prediction (i.e. susceptibility) map. The modeling techniques applied were logistic regression (GLM), generalized additive models (GAM), weights of evidence (WOE), the support vector machine (SVM), random forest classification (RF), and bootstrap-aggregated classification trees (bundling) with penalized discriminant analysis (BPLDA). These modeling methods were tested for three areas in the province of Lower Austria, Austria, characterized by different geological and morphological settings. Random forest and bundling classification techniques had the overall best predictive performances. However, the performances of all modeling techniques were, for the most part, not significantly different from each other; depending on the area of interest, the overall median estimated area under the receiver operating characteristic curve (AUROC) differences ranged from 2.9 to 8.9 percentage points. The overall median estimated true positive rate (TPR) differences, measured at a 10% false positive rate (FPR), ranged from 11 to 15 percentage points. The relative importance of each predictor was generally different between the modeling methods. However, slope angle, surface roughness and plan
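Spatial k-fold cross-validation, as used above, differs from ordinary k-fold in that folds are spatially contiguous blocks rather than random subsets, so test cells are not immediate neighbours of training cells. A minimal sketch using a few k-means rounds on coordinates (synthetic two-cluster data, not the Lower Austria inventories):

```python
import random

def spatial_kfold(coords, k):
    """Assign each observation to a spatial fold by clustering its coordinates
    with a few rounds of k-means, so that folds are spatially contiguous."""
    centres = [coords[i * len(coords) // k] for i in range(k)]  # spread-out seeds
    for _ in range(10):
        labels = [min(range(k), key=lambda c: (x - centres[c][0]) ** 2 +
                                              (y - centres[c][1]) ** 2)
                  for x, y in coords]
        for c in range(k):  # move each centre to the mean of its members
            members = [coords[i] for i, l in enumerate(labels) if l == c]
            if members:
                centres[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels

random.seed(2)
coords = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)] + \
         [(random.gauss(10, 1), random.gauss(10, 1)) for _ in range(50)]
folds = spatial_kfold(coords, 2)
```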
An Interactive Tool For Semi-automated Statistical Prediction Using Earth Observations and Models
Zaitchik, B. F.; Berhane, F.; Tadesse, T.
2015-12-01
We developed a semi-automated statistical prediction tool applicable to concurrent analysis or seasonal prediction of any time series variable in any geographic location. The tool was developed using Shiny, JavaScript, HTML and CSS. A user can extract a predictand by drawing a polygon over a region of interest on the provided user interface (global map). The user can select the Climatic Research Unit (CRU) precipitation or Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) as predictand. They can also upload their own predictand time series. Predictors can be extracted from sea surface temperature, sea level pressure, winds at different pressure levels, air temperature at various pressure levels, and geopotential height at different pressure levels. By default, reanalysis fields are applied as predictors, but the user can also upload their own predictors, including a wide range of compatible satellite-derived datasets. The package generates correlations of the variables selected with the predictand. The user also has the option to generate composites of the variables based on the predictand. Next, the user can extract predictors by drawing polygons over the regions that show strong correlations (composites). Then, the user can select some or all of the statistical prediction models provided. Provided models include Linear Regression models (GLM, SGLM), Tree-based models (bagging, random forest, boosting), Artificial Neural Network, and other non-linear models such as Generalized Additive Model (GAM) and Multivariate Adaptive Regression Splines (MARS). Finally, the user can download the analysis steps they used, such as the region they selected, the time period they specified, the predictand and predictors they chose and preprocessing options they used, and the model results in PDF or HTML format. Key words: Semi-automated prediction, Shiny, R, GLM, ANN, RF, GAM, MARS
von Ruette, Jonas; Papritz, Andreas; Lehmann, Peter; Rickli, Christian; Or, Dani
2011-10-01
Statistical models that exploit the correlation between landslide occurrence and geomorphic properties are often used to map the spatial occurrence of shallow landslides triggered by heavy rainfalls. In many landslide susceptibility studies, the true predictive power of the statistical model remains unknown because the predictions are not validated with independent data from other events or areas. This study validates statistical susceptibility predictions with independent test data. The spatial incidence of landslides, triggered by an extreme rainfall in a study area, was modeled by logistic regression. The fitted model was then used to generate susceptibility maps for another three study areas, for which event-based landslide inventories were also available. All the study areas lie in the northern foothills of the Swiss Alps. The landslides had been triggered by heavy rainfall either in 2002 or 2005. The validation was designed such that the first validation study area shared the geomorphology and the second the triggering rainfall event with the calibration study area. For the third validation study area, both geomorphology and rainfall were different. All explanatory variables were extracted for the logistic regression analysis from high-resolution digital elevation and surface models (2.5 m grid). The model fitted to the calibration data comprised four explanatory variables: (i) slope angle (effect of gravitational driving forces), (ii) vegetation type (grassland and forest; root reinforcement), (iii) planform curvature (convergent water flow paths), and (iv) contributing area (potential supply of water). The area under the Receiver Operating Characteristic (ROC) curve (AUC) was used to quantify the predictive performance of the logistic regression model. The AUC values were computed for the susceptibility maps of the three validation study areas (validation AUC), the fitted susceptibility map of the calibration study area (apparent AUC: 0.80) and another
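The AUC used above to score susceptibility maps equals the Mann-Whitney probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen stable cell; the labels and scores below are illustrative:

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a random positive scores above a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical susceptibility scores for landslide (1) and stable (0) cells
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5, 0.7, 0.1]
```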
Reuter, M; Netter, P
2001-01-01
The present study proposes a hierarchical multivariate statistical prediction model which makes it possible to determine the most prominent variables (physiological, biochemical and personality factors) related to nicotine craving and dopaminergic activation. Based on animal studies reporting a reduction of the rewarding effects of psychotropic drugs after blockade or destruction of the mesolimbic dopamine (DA) system, changes in nicotine craving after pharmacological manipulation by means of a DA agonist (lisuride 0.2 mg) and a DA antagonist (fluphenazine 2 mg) were assessed in 36 healthy male heavy smokers. The major aim was the development of a multivariate prediction model that is applicable in samples lacking variance homogeneity or a multivariate normal distribution. The proposed model is a combination of multivariate parametric and nonparametric methods that takes advantage of their individual merits. Personality variables in particular, such as sensation seeking, impulsivity, and neuroticism, proved to be important predictors of craving in this responder approach.
Theoretical and Statistical Models for Predicting Flux in Direct Contact Membrane Distillation
Directory of Open Access Journals (Sweden)
Atia E. Khalifa
2014-06-01
A theoretical model has been applied to predict the performance of Direct Contact Membrane Distillation (DCMD) based on an analysis of heat and mass transfer through the membrane. The performance of DCMD under different operating parameters was predicted; feed inlet temperature, coolant inlet temperature, feed flow rate and coolant flow rate were the considered performance variables. Based on the data obtained from the theoretical model, a statistical analysis of variance (ANOVA) was then performed to determine the significance of the effect of each operating factor on DCMD system performance. A new regression model was subsequently developed for predicting the performance of the DCMD system. Results revealed that both the theoretical and regression models were in good agreement with each other and with the selected experimental data used for validation. The maximum percentage error between the two models was found to be 1.098%. Hence, the developed regression model is adequate for predicting the performance of the DCMD system within the domain of the considered analysis.
Kumar, Ramya; Lahann, Joerg
2016-07-06
The performance of polymer interfaces in biology is governed by a wide spectrum of interfacial properties. With the ultimate goal of identifying design parameters for stem cell culture coatings, we developed a statistical model that describes the dependence of brush properties on surface-initiated polymerization (SIP) parameters. Employing a design of experiments (DOE) approach, we identified operating boundaries within which four gel architecture regimes can be realized, including a new regime of associated brushes in thin films. Our statistical model can accurately predict the brush thickness and the degree of intermolecular association of poly[{2-(methacryloyloxy) ethyl} dimethyl-(3-sulfopropyl) ammonium hydroxide] (PMEDSAH), a previously reported synthetic substrate for feeder-free and xeno-free culture of human embryonic stem cells. DOE-based multifunctional predictions offer a powerful quantitative framework for designing polymer interfaces. For example, model predictions can be used to decrease the critical thickness at which the wettability transition occurs by simply increasing the catalyst quantity from 1 to 3 mol %.
A statistical model to predict streamwise turbulent dispersion from the wall at small times
Nguyen, Quoc; Papavassiliou, Dimitrios V.
2016-12-01
Data from simulations are used to develop a statistical model that can provide the streamwise dispersion distribution of passive particles released from the wall of a turbulent flow channel. It is found that a three-point gamma probability density function is the statistical distribution that can describe the dispersion of particles with Schmidt numbers ranging from 6 to 2400 at relatively short times after the release of the particles. Scaling arguments are used to physically justify and predict the parameters of the gamma three-point distribution. The model is used to predict particle separation that can occur in turbulent flow under special conditions. Close to the channel wall, turbulent convection is not the dominant transport mechanism, but molecular diffusion can dominate transport depending on the Schmidt number of the particles. This leads to turbulence-induced separation rather than mixing, and the currently proposed model can be used to predict the level of separation. Practically, these results can be applied for separating very small particles or even macromolecules in dilute suspensions.
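Reading the "three-point gamma" above as a three-parameter (shifted) gamma distribution — an assumption on our part — its density and a sampler can be sketched with the standard library:

```python
import math
import random

def shifted_gamma_pdf(x, shape, scale, loc):
    """Density of a three-parameter (shifted) gamma distribution; the shift
    'loc' models the minimum streamwise displacement of released particles."""
    if x <= loc:
        return 0.0
    z = (x - loc) / scale
    log_pdf = (shape - 1) * math.log(z) - z - math.lgamma(shape) - math.log(scale)
    return math.exp(log_pdf)

def sample_shifted_gamma(shape, scale, loc, n, seed=0):
    """Draw n samples: a plain gamma variate shifted by loc."""
    rng = random.Random(seed)
    return [loc + rng.gammavariate(shape, scale) for _ in range(n)]

samples = sample_shifted_gamma(2.0, 1.5, 0.5, 50000)
mean = sum(samples) / len(samples)  # theoretical mean: loc + shape*scale = 3.5
```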
Directory of Open Access Journals (Sweden)
Jorge E Salamanca Céspedes
2015-02-01
It has been shown that modern traffic in data networks is highly correlated, making it necessary to select models that capture the autocorrelation characteristics governing the data flows on the network [1]. Accurate forecasting of traffic on communication networks is of great importance at present, since it influences decisions as important as network sizing and provisioning. The main purpose of this paper is to put the reader into context regarding the importance of statistical time-series models, which make it possible to estimate future traffic in modern communication networks and have become an essential tool for traffic prediction. Depending on the individual needs of each network, these estimates fall into models with long-range dependence (LRD) and short-range dependence (SRD), each providing a specific, appropriate and efficient control integrated at different levels of the network functional hierarchy [2]. To produce traffic forecasts for a modern communication network, one must first define the type of network to study and the time-series model that fits it. For this case study, a Wi-Fi network was chosen, as its traffic behavior requires the development of a time-series model with advanced statistics that allows integrated observation of the network and thus provides a tool to facilitate its monitoring and management. Accordingly, the type of time-series model used in this case is the ARIMA family.
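The autoregressive core of an ARIMA fit of the kind discussed can be sketched with Yule-Walker estimation of an AR(2) model on a synthetic stationary series; real Wi-Fi traffic would additionally need differencing and seasonal handling:

```python
import random

def autocovariance(x, lag):
    n = len(x)
    m = sum(x) / n
    return sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag)) / n

def fit_ar2(x):
    """Yule-Walker estimates for an AR(2) model (the AR part of ARIMA(2,0,0))."""
    r0, r1, r2 = (autocovariance(x, k) for k in range(3))
    det = r0 * r0 - r1 * r1
    phi1 = (r1 * r0 - r1 * r2) / det
    phi2 = (r0 * r2 - r1 * r1) / det
    return phi1, phi2

def forecast(x, phi1, phi2, steps):
    """Iterated multi-step forecast around the sample mean."""
    hist = list(x)
    m = sum(x) / len(x)
    out = []
    for _ in range(steps):
        nxt = m + phi1 * (hist[-1] - m) + phi2 * (hist[-2] - m)
        hist.append(nxt)
        out.append(nxt)
    return out

# Synthetic AR(2) "traffic" series with known coefficients 0.6 and 0.2
rng = random.Random(3)
x = [0.0, 0.0]
for _ in range(5000):
    x.append(0.6 * x[-1] + 0.2 * x[-2] + rng.gauss(0, 1))
x = x[500:]  # discard burn-in
phi1, phi2 = fit_ar2(x)
ahead = forecast(x, phi1, phi2, 5)
```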
Statistical Model for Prediction of Diabetic Foot Disease in Type 2 Diabetic Patients
Directory of Open Access Journals (Sweden)
Raúl López Fernández
2016-02-01
Background: the need to predict and study diabetic foot problems is a critical issue and represents a major medical challenge. The reduction of its incidence can lead to positive results for improving the quality of life of patients and reducing the impact on the socio-economic sphere, given the high prevalence of diabetes in the working population. Objective: to design a statistical model for prediction of diabetic foot disease in type 2 diabetic patients. Methods: a descriptive study was conducted in patients attending the Diabetes Clinic in Cienfuegos from 2010 to 2013. Significant risk factors for diabetic foot disease were analyzed as variables. To design the model, binary logistic regression analysis and a Chi-squared automatic interaction detection decision tree were used. Results: two models that behaved similarly based on the comparison criteria considered (percentage of correct classification, sensitivity and specificity) were developed. Validation was established through the receiver operating characteristic curve. The model using Chi-squared automatic interaction detection showed the best predictive results. Conclusions: Chi-squared automatic interaction detection decision trees have an adequate predictive capacity and can be used in the Diabetes Clinic of Cienfuegos municipality.
Wind gust estimation by combining numerical weather prediction model and statistical post-processing
Patlakas, Platon; Drakaki, Eleni; Galanis, George; Spyrou, Christos; Kallos, George
2017-04-01
The continuous rise of offshore and near-shore activities, as well as the development of structures such as wind farms and various offshore platforms, requires the employment of state-of-the-art risk assessment techniques. Such analysis is used to set safety standards and can be characterized as a climatologically oriented approach. Nevertheless, reliable operational support is also needed in order to minimize cost drawbacks and human danger during the construction and operational stages, as well as during maintenance activities. One of the most important parameters for this kind of analysis is wind speed intensity and variability. A critical measure associated with this variability is the presence and magnitude of wind gusts as estimated at the reference level of 10 m. The latter can be attributed to different processes ranging from boundary-layer turbulence and convective activity to mountain waves and wake phenomena. The purpose of this work is the development of a wind gust forecasting methodology combining a numerical weather prediction model and a dynamical statistical tool based on Kalman filtering. To this end, the Wind Gust Estimate parameterization method was implemented within the framework of the atmospheric model SKIRON/Dust. The new modeling tool combines the atmospheric model with a statistical local adaptation methodology based on Kalman filters, and has been tested over the offshore west coastline of the United States. The main purpose is to provide a useful tool for wind analysis and prediction and for applications related to offshore wind energy (power prediction, operation and maintenance). The results have been evaluated using observational data from NOAA's buoy network. The predicted output shows good behavior that is further improved by the local adjustment post-processing.
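The Kalman-filter local adaptation can be sketched as a scalar filter tracking the systematic forecast bias as a random-walk state; the noise variances q and r and the gust numbers below are assumed for illustration, not taken from the SKIRON/Dust setup:

```python
def kalman_bias_filter(forecasts, observations, q=0.01, r=1.0):
    """Sequentially estimate the systematic bias b_t of an NWP wind-gust
    forecast with a scalar Kalman filter. State: random-walk bias with
    process variance q; noisy bias observations y_t = obs_t - fc_t with
    variance r."""
    b, p = 0.0, 1.0            # initial bias estimate and its variance
    history = []
    for fc, obs in zip(forecasts, observations):
        p += q                 # predict step: state variance grows
        k = p / (p + r)        # Kalman gain
        b += k * ((obs - fc) - b)  # update with the innovation
        p *= (1.0 - k)
        history.append(b)
    return history

fc = [10.0, 12.0, 9.0, 11.0, 13.0, 10.5, 9.5, 12.5]               # model gusts, m/s
obs = [12.1, 13.9, 11.0, 13.2, 14.8, 12.6, 11.4, 14.4]            # buoy gusts, m/s
bias = kalman_bias_filter(fc, obs)
corrected = [f + bias[-1] for f in fc]  # apply the latest bias estimate
```

Because the observations run roughly 2 m/s above the raw forecasts, the bias estimate converges toward that offset, which is what the local adjustment post-processing exploits.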
DEFF Research Database (Denmark)
Rosthøj, Susanne; Keiding, Niels
2004-01-01
When studying a regression model, measures of explained variation are used to assess the degree to which the covariates determine the outcome of interest. Measures of predictive accuracy are used to assess the accuracy of the predictions based on the covariates and the regression model. We give a detailed and general introduction to the two measures and the estimation procedures. The framework we set up allows for a study of the effect of misspecification on the quantities estimated. We also introduce a generalization to survival analysis.
Development of a Dynamics-Based Statistical Prediction Model for the Changma Onset
Park, H. L.; Seo, K. H.; Son, J. H.
2015-12-01
The timing of the changma onset has high impacts on the Korean Peninsula, yet its seasonal prediction remains a great challenge because the changma undergoes various influences from the tropics, subtropics, and midlatitudes. In this study, a dynamics-based statistical prediction model for the changma onset is proposed. This model utilizes three predictors of slowly varying sea surface temperature anomalies (SSTAs) over the northern tropical central Pacific, the North Atlantic, and the North Pacific occurring in the preceding spring season. SSTAs associated with each predictor persist until June and have an effect on the changma onset by inducing an anticyclonic anomaly to the southeast of the Korean Peninsula earlier than the climatological changma onset date. The persisting negative SSTAs over the northern tropical central Pacific and accompanying anomalous trade winds induce enhanced convection over the far-western tropical Pacific; in turn, these induce a cyclonic anomaly over the South China Sea and an anticyclonic anomaly southeast of the Korean Peninsula. Diabatic heating and cooling tendency related to the North Atlantic dipolar SSTAs induces downstream Rossby wave propagation in the upper troposphere, developing a barotropic anticyclonic anomaly to the south of the Korean Peninsula. A westerly wind anomaly at around 45°N resulting from the developing positive SSTAs over the North Pacific directly reduces the strength of the Okhotsk high and gives rise to an anticyclonic anomaly southeast of the Korean Peninsula. With the dynamics-based statistical prediction model, it is demonstrated that the changma onset has considerable predictability of r = 0.73 for the period from 1982 to 2014.
Heterogeneous Structure of Stem Cells Dynamics: Statistical Models and Quantitative Predictions
Bogdan, Paul; Deasy, Bridget M.; Gharaibeh, Burhan; Roehrs, Timo; Marculescu, Radu
2014-04-01
Understanding stem cell (SC) population dynamics is essential for developing models that can be used in basic science and medicine to aid in predicting cell fate. These models can be used as tools, e.g., in studying patho-physiological events at the cellular and tissue level, predicting (mal)functions along the developmental course, and personalized regenerative medicine. Using time-lapsed imaging and statistical tools, we show that the dynamics of SC populations involve a heterogeneous structure consisting of multiple sub-population behaviors. Using non-Gaussian statistical approaches, we identify the co-existence of fast and slow dividing subpopulations, and quiescent cells, in stem cells from three species. The mathematical analysis also shows that, instead of developing independently, SCs exhibit a time-dependent fractal behavior as they interact with each other through molecular and tactile signals. These findings suggest that more sophisticated models of SC dynamics should view SC populations as a collective and avoid the simplifying homogeneity assumption, by accounting for the presence of more than one dividing sub-population and their multi-fractal characteristics.
Development of statistical prediction models for Changma precipitation: An ensemble approach
Kim, Jin-Yong; Seo, Kyong-Hwan; Son, Jun-Hyeok; Ha, Kyung-Ja
2017-05-01
An ensemble statistical forecast scheme with a one-month lead is developed to predict year-to-year variations of Changma rainfall over the Korean peninsula. Spring sea surface temperature (SST) anomalies over the North Atlantic, the North Pacific and the tropical Pacific Ocean were proposed as useful predictors in a previous study. Through a forward-stepwise regression method, four additional springtime predictors are selected: the northern Indian Ocean (NIO) SST, the North Atlantic SST change (NAC), the snow cover anomaly over the Eurasian continent (EUSC), and the western North Pacific outgoing longwave radiation anomaly (WNP OLR). Using these, three new prediction models are developed. A simple arithmetic ensemble mean produces much improved forecast skill compared to the original prediction model of Lee and Seo (2013). Skill scores measured by temporal correlation and the MSSS (mean square error skill score) are improved by about 9% and 17%, respectively. The GMSS (Gerrity skill score) and hit rate, based on a tercile prediction validation scheme, are also enhanced by about 19% and 13%, respectively. The reversed NIO, reversed WNP OLR, and reversed NAC are all related to the enhancement of a cyclonic circulation anomaly to the south or southwest of the Korean peninsula, which induces a southeasterly moisture flux into the peninsula and increases Changma precipitation. The EUSC predictor induces an enhancement of the Okhotsk Sea high downstream and thus a strengthening of the Changma front.
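Forward-stepwise predictor selection of the kind described can be sketched as a greedy search that repeatedly adds the candidate giving the largest R² gain; the predictor names below echo the abstract, but the data are synthetic:

```python
import random

def r_squared(X_cols, y):
    """R^2 of OLS of y on the given predictor columns (with intercept),
    via the normal equations solved by Gaussian elimination."""
    n = len(y)
    X = [[1.0] + [col[i] for col in X_cols] for i in range(n)]
    p = len(X[0])
    A = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)] for a in range(p)]
    b = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    for col in range(p):  # forward elimination with partial pivoting
        piv = max(range(col, p), key=lambda rr: abs(A[rr][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for rr in range(col + 1, p):
            f = A[rr][col] / A[col][col]
            for c in range(col, p):
                A[rr][c] -= f * A[col][c]
            b[rr] -= f * b[col]
    beta = [0.0] * p
    for rr in range(p - 1, -1, -1):  # back substitution
        beta[rr] = (b[rr] - sum(A[rr][c] * beta[c] for c in range(rr + 1, p))) / A[rr][rr]
    yhat = [sum(X[i][c] * beta[c] for c in range(p)) for i in range(n)]
    my = sum(y) / n
    return 1.0 - sum((yi - yh) ** 2 for yi, yh in zip(y, yhat)) / \
                 sum((yi - my) ** 2 for yi in y)

def forward_stepwise(candidates, y, k):
    """Greedily add the predictor that most improves R^2, k times."""
    chosen = []
    for _ in range(k):
        best = max((name for name in candidates if name not in chosen),
                   key=lambda name: r_squared([candidates[c] for c in chosen] +
                                              [candidates[name]], y))
        chosen.append(best)
    return chosen

rng = random.Random(4)
n = 60
cands = {name: [rng.gauss(0, 1) for _ in range(n)]
         for name in ["NIO", "NAC", "EUSC", "WNP"]}
y = [2 * cands["NIO"][i] - 1.5 * cands["NAC"][i] + rng.gauss(0, 0.3) for i in range(n)]
selected = forward_stepwise(cands, y, 2)
```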
Prediction of climate change in Brunei Darussalam using statistical downscaling model
Hasan, Dk. Siti Nurul Ain binti Pg. Ali; Ratnayake, Uditha; Shams, Shahriar; Nayan, Zuliana Binti Hj; Rahman, Ena Kartina Abdul
2017-06-01
Climate is changing and evidence suggests that the impact of climate change will influence our everyday lives, including agriculture, the built environment, energy management, food security and water resources. Brunei Darussalam, located within the heart of Borneo, will be affected both in terms of precipitation and temperature. Therefore, it is crucial to comprehend and assess how important climate indicators like temperature and precipitation are expected to vary in the future in order to minimise the impact. This study assesses the application of a statistical downscaling model (SDSM) for downscaling General Circulation Model (GCM) results for maximum and minimum temperatures along with precipitation in Brunei Darussalam. It investigates future climate changes based on numerous scenarios using Hadley Centre Coupled Model, version 3 (HadCM3), Canadian Earth System Model (CanESM2) and third-generation Coupled Global Climate Model (CGCM3) outputs. The SDSM outputs were improved by implementing bias correction and by using a monthly sub-model instead of an annual sub-model. The outcomes of this assessment show that the monthly sub-model performed better than the annual sub-model. This study indicates satisfactory applicability for the generation of maximum temperatures, minimum temperatures and precipitation for the future periods of 2017-2046 and 2047-2076. All considered models and scenarios were consistent in predicting an increasing trend of maximum temperature, an increasing trend of minimum temperature and a decreasing trend of precipitation. The maximum overall trend of Tmax was observed for CanESM2 under the Representative Concentration Pathways (RCP) 8.5 scenario. The increasing trend is 0.014 °C per year. Accordingly, by 2076, the highest prediction of average maximum temperature is an increase of 1.4 °C. The same model predicts an increasing trend of Tmin of 0.004 °C per year, while the highest trend is seen under the CGCM3-A2 scenario which is 0.009
Prediction of lacking control power in power plants using statistical models
DEFF Research Database (Denmark)
Odgaard, Peter Fogh; Mataji, B.; Stoustrup, Jakob
2007-01-01
errors; the second uses operating point depending statistics of prediction errors. Using these methods on the previous mentioned case, it can be concluded that the second method can be used to predict the power plant performance, while the first method has problems predicting the uncertain performance...
Pokhrel, Prafulla; Wang, Q. J.; Robertson, David E.
2013-10-01
Seasonal streamflow forecasts are valuable for planning and allocation of water resources. In Australia, the Bureau of Meteorology employs a statistical method to forecast seasonal streamflows. The method uses predictors that are related to catchment wetness at the start of a forecast period and to climate during the forecast period. For the latter, a predictor is selected among a number of lagged climate indices as candidates to give the "best" model in terms of model performance in cross validation. This study investigates two strategies for further improvement in seasonal streamflow forecasts. The first is to combine, through Bayesian model averaging, multiple candidate models with different lagged climate indices as predictors, to take advantage of different predictive strengths of the multiple models. The second strategy is to introduce additional candidate models, using rainfall and sea surface temperature predictions from a global climate model as predictors. This is to take advantage of the direct simulations of various dynamic processes. The results show that combining forecasts from multiple statistical models generally yields more skillful forecasts than using only the best model and appears to moderate the worst forecast errors. The use of rainfall predictions from the dynamical climate model marginally improves the streamflow forecasts when viewed over all the study catchments and seasons, but the use of sea surface temperature predictions provides little additional benefit.
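The first strategy, combining candidate models through Bayesian model averaging, can be sketched as a weighting of member forecasts by their cross-validation predictive likelihood. The log scores and streamflow values below are hypothetical; operational BMA implementations typically estimate the weights more carefully (e.g. via the EM algorithm).

```python
import math

def bma_weights(cv_log_scores):
    """Weights proportional to each model's cross-validation predictive
    likelihood (log scores summed over the left-out events)."""
    m = max(cv_log_scores)
    w = [math.exp(s - m) for s in cv_log_scores]  # subtract max for stability
    total = sum(w)
    return [x / total for x in w]

def bma_forecast(weights, member_forecasts):
    """Weighted average of the member forecasts for one target season."""
    return sum(w * f for w, f in zip(weights, member_forecasts))

# Hypothetical cross-validation log scores for three candidate models, each
# pairing catchment wetness with a different lagged climate index.
log_scores = [-42.1, -40.3, -44.8]
weights = bma_weights(log_scores)
print([round(w, 3) for w in weights])
print(round(bma_forecast(weights, [120.0, 150.0, 90.0]), 1))  # GL, illustrative
```

The better-scoring second model dominates the average without the weaker members being discarded outright, which is how the combination moderates the worst errors.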
A mid-infrared statistical investigation of clumpy torus model predictions
García-González, J.; Alonso-Herrero, A.; Hönig, S. F.; Hernán-Caballero, A.; Ramos Almeida, C.; Levenson, N. A.; Roche, P. F.; González-Martín, O.; Packham, C.; Kishimoto, M.
2017-09-01
We present new calculations of the Clumpy AGN Tori in a 3D geometry (CAT3D) clumpy torus models, which now include a more physical dust sublimation model as well as active galactic nucleus (AGN) anisotropic emission. These new models allow graphite grains to persist at temperatures higher than the silicate dust sublimation temperature. This produces stronger near-infrared emission and bluer mid-infrared (MIR) spectral slopes. We make a statistical comparison of the CAT3D model MIR predictions with a compilation of sub-arcsecond resolution ground-based MIR spectroscopy of 52 nearby Seyfert galaxies (median distance of 36 Mpc) and 10 quasars. We focus on the AGN MIR spectral index αMIR and the strength of the 9.7 μm silicate feature SSil. As with other clumpy torus models, the new CAT3D models do not reproduce the Seyfert galaxies with deep silicate absorption (SSil < -1). The Seyfert 2 galaxies require models with low photon escape probabilities, while the quasars and the Seyfert 1-1.5 generally require models with higher photon escape probabilities. Quasars and Seyfert 1-1.5 tend to show steeper radial cloud distributions and fewer clouds along an equatorial line of sight than Seyfert 2. Introducing AGN anisotropic emission besides the more physical dust sublimation models alleviates the problem of requiring inverted radial cloud distributions (i.e. more clouds towards the outer parts of the torus) to explain the MIR spectral indices of type 2 Seyferts.
A Statistical Cyclone Intensity Prediction (SCIP) model for the Bay of Bengal
Indian Academy of Sciences (India)
S D Kotal; S K Roy Bhowmik; P K Kundu; Ananda Kumar Das
2008-04-01
A statistical model for predicting the intensity of tropical cyclones in the Bay of Bengal is proposed. The model is developed applying a multiple linear regression technique. The model parameters are determined from a database of 62 cyclones that developed over the Bay of Bengal during the period 1981–2000. The parameters selected as predictors are: initial storm intensity, intensity change during the past 12 hours, storm motion speed, initial storm latitude position, vertical wind shear averaged along the storm track, vorticity at 850 hPa, divergence at 200 hPa and sea surface temperature (SST). When the model is tested with the dependent samples of 62 cyclones, the forecast skill of the model for forecasts up to 72 hours is found to be reasonably good. The average absolute errors (AAE) are less than 10 knots for forecasts up to 36 hours, and the maximum forecast error, of order 14 knots, occurs at 60 hours and 72 hours. When the model is tested with the independent samples of 15 cyclones (during 2000 to 2007), the AAE is found to be less than 13 knots (ranging from 5.1 to 12.5 knots) for forecasts up to 72 hours. The model is found to be superior to the empirical model proposed by Roy Bhowmik et al (2007) for the Bay of Bengal.
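A minimal sketch of the multiple-linear-regression fit and the AAE verification statistic, in pure Python via the normal equations. The predictor rows and intensities below are hypothetical stand-ins, not the 62-cyclone database, and only three of the paper's eight predictors are illustrated.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small system Ax = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_mlr(X, y):
    """Least-squares coefficients (with intercept) via the normal equations."""
    Xa = [[1.0] + row for row in X]
    k = len(Xa[0])
    XtX = [[sum(r[i] * r[j] for r in Xa) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xa, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Hypothetical training rows: [initial intensity (kt), 12-h intensity change,
# SST anomaly] -> later intensity (kt). Not the actual cyclone database.
X = [[35, 5, 0.5], [50, 10, 0.8], [45, -5, 0.2],
     [60, 15, 1.0], [40, 0, 0.4], [55, 5, 0.6]]
y = [45, 65, 42, 80, 44, 63]
coefs = fit_mlr(X, y)
# Average absolute error (AAE) on the dependent sample, as in the abstract.
aae = sum(abs(predict(coefs, r) - t) for r, t in zip(X, y)) / len(y)
print(round(aae, 2))
```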
Predictive data-derived Bayesian statistic-transport model and simulator of sunken oil mass
Echavarria Gregory, Maria Angelica
Sunken oil is difficult to locate because remote sensing techniques cannot as yet provide views of sunken oil over large areas. Moreover, the oil may re-suspend and sink with changes in salinity, sediment load, and temperature, making deterministic fate models difficult to deploy and calibrate when even the presence of sunken oil is difficult to assess. For these reasons, together with the expense of field data collection, there is a need for a statistical technique integrating limited data collection with stochastic transport modeling. Predictive Bayesian modeling techniques have been developed and demonstrated for exploiting limited information for decision support in many other applications. These techniques are brought here into a multi-modal Lagrangian modeling framework, representing a near-real time approach to locating and tracking sunken oil driven by intrinsic physical properties of field data collected following a spill after oil has begun collecting on a relatively flat bay bottom. Methods include (1) development of the conceptual predictive Bayesian model and multi-modal Gaussian computational approach based on theory and literature review; (2) development of an object-oriented programming and combinatorial structure capable of managing data, integration and computation over an uncertain and highly dimensional parameter space; (3) creating a new bi-dimensional approach of the method of images to account for curved shoreline boundaries; (4) confirmation of model capability for locating sunken oil patches using available (partial) real field data and capability for temporal projections near curved boundaries using simulated field data; and (5) development of a stand-alone open-source computer application with graphical user interface capable of calibrating instantaneous oil spill scenarios, obtaining sets of maps of relative probability profiles at different prediction times and user-selected geographic areas and resolution, and capable of performing post
A statistical model for water quality predictions from a river discharge using coastal observations
Kim, S.; Terrill, E. J.
2007-12-01
Understanding and predicting coastal ocean water quality has benefits for reducing human health risks, protecting the environment, and improving local economies which depend on clean beaches. Continuous observations of coastal physical oceanography increase the understanding of the processes which control the fate and transport of a riverine plume which potentially contains high levels of contaminants from the upstream watershed. A data-driven model of the fate and transport of river plume water from the Tijuana River has been developed using surface current observations provided by a network of HF radar operated as part of a local coastal observatory that has been in place since 2002. The model outputs are compared with water quality sampling of shoreline indicator bacteria, and the skill of an alarm for low water quality is evaluated using the receiver operating characteristic (ROC) curve. In addition, statistical analysis of beach closures in comparison with environmental variables is also discussed.
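The alarm-skill evaluation via the receiver operating characteristic can be sketched as follows. The exposure scores and exceedance labels are hypothetical; a real evaluation would use the model's plume output against the shoreline bacteria samples.

```python
def roc_points(scores, labels):
    """True/false positive rates as the alarm threshold sweeps over scores."""
    pts = []
    pos = sum(labels)
    neg = len(labels) - pos
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        pts.append((fp / neg, tp / pos))
    return [(0.0, 0.0)] + pts

def auc(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical: model-predicted plume exposure score at a beach site vs.
# whether the bacterial standard was exceeded (1) or not (0).
scores = [0.9, 0.8, 0.7, 0.55, 0.5, 0.4, 0.3, 0.2]
exceed = [1,   1,   0,   1,    0,   0,   1,   0]
pts = roc_points(scores, exceed)
print(round(auc(pts), 3))
```

An AUC near 1 means the low-water-quality alarm separates exceedance days from clean days well; an AUC near 0.5 means it is no better than chance.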
Manning, Robert M.
1990-01-01
A static and dynamic rain-attenuation model is presented which describes the statistics of attenuation on an arbitrarily specified satellite link for any location for which there are long-term rainfall statistics. The model may be used in the design of the optimal stochastic control algorithms to mitigate the effects of attenuation and maintain link reliability. A rain-statistics database is compiled, which makes it possible to apply the model to any location in the continental U.S. with a resolution of 0.5 degrees in latitude and longitude. The model predictions are compared with experimental observations, showing good agreement.
Jones-Farrand, D. Todd; Fearer, Todd M.; Thogmartin, Wayne E.; Thompson, Frank R.; Nelson, Mark D.; Tirpak, John M.
2011-01-01
Selection of a modeling approach is an important step in the conservation planning process, but little guidance is available. We compared two statistical and three theoretical habitat modeling approaches representing those currently being used for avian conservation planning at landscape and regional scales: hierarchical spatial count (HSC), classification and regression tree (CRT), habitat suitability index (HSI), forest structure database (FS), and habitat association database (HA). We focused our comparison on models for five priority forest-breeding species in the Central Hardwoods Bird Conservation Region: Acadian Flycatcher, Cerulean Warbler, Prairie Warbler, Red-headed Woodpecker, and Worm-eating Warbler. Lacking complete knowledge on the distribution and abundance of each species with which we could illuminate differences between approaches and provide strong grounds for recommending one approach over another, we used two approaches to compare models: rank correlations among model outputs and comparison of spatial correspondence. In general, rank correlations were significantly positive among models for each species, indicating general agreement among the models. Worm-eating Warblers had the highest pairwise correlations, all of which were significant (P < 0.05). Red-headed Woodpeckers had the lowest agreement among models, suggesting greater uncertainty in the relative conservation value of areas within the region. We assessed model uncertainty by mapping the spatial congruence in priorities (i.e., top ranks) resulting from each model for each species and calculating the coefficient of variation across model ranks for each location. This allowed identification of areas more likely to be good targets of conservation effort for a species, those areas that were least likely, and those in between where uncertainty is higher and thus conservation action incorporates more risk. Based on our results, models developed independently for the same purpose
Ratner, Bruce
2011-01-01
The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has
Climate Prediction through Statistical Methods
Akgun, Bora; Tuter, Levent; Kurnaz, Mehmet Levent
2008-01-01
Climate change is a reality of today. Paleoclimatic proxies and climate predictions based on coupled atmosphere-ocean general circulation models provide us with temperature data. Using Detrended Fluctuation Analysis, we investigate the statistical connection between present-day climate types and these local temperatures, and we relate this issue to some well-known historic climate shifts. Our main result is that temperature fluctuations, with or without a temperature scale attached to them, can be used to classify climates in the absence of other indicators such as pan evaporation and precipitation.
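A minimal pure-Python sketch of first-order Detrended Fluctuation Analysis, the tool named above. The synthetic series are only stand-ins for temperature records: uncorrelated data should yield a scaling exponent near 0.5, and an integrated (random-walk-like) record one near 1.5.

```python
import math
import random

def dfa_alpha(series, window_sizes):
    """DFA scaling exponent: integrate the series, detrend it linearly inside
    non-overlapping windows of size n, and fit the slope of log F(n) vs log n."""
    mean = sum(series) / len(series)
    profile, acc = [], 0.0
    for x in series:
        acc += x - mean
        profile.append(acc)
    log_n, log_f = [], []
    for n in window_sizes:
        sq, count = 0.0, 0
        for start in range(0, len(profile) - n + 1, n):
            seg = profile[start:start + n]
            t = list(range(n))
            tm, sm = sum(t) / n, sum(seg) / n
            beta = (sum((ti - tm) * (si - sm) for ti, si in zip(t, seg))
                    / sum((ti - tm) ** 2 for ti in t))
            a0 = sm - beta * tm
            sq += sum((si - (a0 + beta * ti)) ** 2 for ti, si in zip(t, seg))
            count += n
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(sq / count))
    lm, fm = sum(log_n) / len(log_n), sum(log_f) / len(log_f)
    return (sum((l - lm) * (f - fm) for l, f in zip(log_n, log_f))
            / sum((l - lm) ** 2 for l in log_n))

random.seed(0)
white = [random.gauss(0, 1) for _ in range(4096)]
walk, acc = [], 0.0
for v in white:
    acc += v
    walk.append(acc)
# Uncorrelated fluctuations give alpha near 0.5; integrated records give
# alpha near 1.5. Real temperature records typically fall in between.
print(round(dfa_alpha(white, [8, 16, 32, 64, 128]), 2),
      round(dfa_alpha(walk, [8, 16, 32, 64, 128]), 2))
```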
Lua, Yuan J.; Liu, Wing K.; Belytschko, Ted
1992-01-01
A stochastic damage model for predicting the rupture of a brittle multiphase material is developed, based on the microcrack-macrocrack interaction. The model, which incorporates uncertainties in locations, orientations, and numbers of microcracks, characterizes damage by microcracking and fracture by macrocracking. A parametric study is carried out to investigate the change of the stress intensity at the macrocrack tip by the configuration of microcracks. The inherent statistical distribution of the fracture toughness arising from the intrinsic random nature of microcracks is explored using a statistical approach. For this purpose, a computer simulation model is introduced, which incorporates a statistical characterization of geometrical parameters of a random microcrack array.
Chen, Ting; Li, Liqing; Huang, Xiubao
2005-06-01
Physical, statistical and artificial neural network (ANN) models are established for predicting the fibre diameter of melt blown nonwovens from the processing parameters. The results show that the ANN model yields a very accurate prediction (average error of 0.013%), and a reasonably good ANN model can be achieved with relatively few data points. Because the physical model is based on the inherent physical principles of the phenomena of interest, it can yield reasonably good prediction results when experimental data are not available and the entire physical procedure is of interest. This area of research has great potential in the field of computer assisted design in melt blowing technology.
Theoretical study of statistical fractal model with applications to mineral resource prediction
Wei, Shen; Pengda, Zhao
2002-04-01
The statistical estimation of fractal dimensions is an important topic of investigation. Current solutions emphasize visual straight-line fitting, but nonlinear statistical modeling has the potential of making valuable contributions in this field. In this paper, we present the concepts of generalized fractal models and generalized fractal dimension and conclude that many geological models are special cases of the generalized models. We show that the power-function distribution possesses the fractal property of scaling invariance under upper truncation, which may be of help in statistical fractal modeling. A new method is developed on the basis of nonlinear regression to estimate fractal parameters. This method has advantages over the traditional method based on linear regression for estimating the fractal dimension. Finally, the new method is illustrated by means of application to a real data set.
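The contrast between traditional log-log straight-line fitting and fitting in the original variables can be sketched as follows. The data are synthetic, generated from an exact power law N(>x) = C·x^(-D); the grid-search nonlinear fit is a simplified stand-in for the paper's nonlinear-regression estimator.

```python
import math

def loglog_fit(x, n):
    """Traditional straight-line fit of log N(>x) = log C - D log x."""
    lx = [math.log(v) for v in x]
    ln = [math.log(v) for v in n]
    mx, mn_ = sum(lx) / len(lx), sum(ln) / len(ln)
    slope = (sum((a - mx) * (b - mn_) for a, b in zip(lx, ln))
             / sum((a - mx) ** 2 for a in lx))
    return -slope  # fractal dimension D

def nonlinear_fit(x, n, d_grid):
    """Least squares in the original (untransformed) variables: for each
    trial D, the optimal C is closed-form; keep the D with smallest RSS."""
    best = None
    for d in d_grid:
        basis = [v ** (-d) for v in x]
        c = sum(b * t for b, t in zip(basis, n)) / sum(b * b for b in basis)
        rss = sum((t - c * b) ** 2 for b, t in zip(basis, n))
        if best is None or rss < best[0]:
            best = (rss, d)
    return best[1]

# Hypothetical cumulative counts following N(>x) = 100 * x^(-1.7), standing
# in for, e.g., sizes of mineral deposits above a threshold.
xs = [1.0, 1.5, 2.0, 3.0, 5.0, 8.0]
ns = [100 * v ** (-1.7) for v in xs]
print(round(loglog_fit(xs, ns), 3))
print(round(nonlinear_fit(xs, ns, [1.0 + 0.01 * i for i in range(101)]), 2))
```

On noise-free data both estimators recover D; the practical difference appears with noisy counts, where the log transform reweights the errors and can bias the straight-line estimate.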
Vathsala, H.; Koolagudi, Shashidhar G.
2017-01-01
In this paper we discuss a data mining application for predicting peninsular Indian summer monsoon rainfall, and propose an algorithm that combines data mining and statistical techniques. We select likely predictors based on association rules that have the highest confidence levels. We then cluster the selected predictors to reduce their dimensions and use cluster membership values for classification. We derive the predictors from local conditions in southern India, including mean sea level pressure, wind speed, and maximum and minimum temperatures. The global condition variables include southern oscillation and Indian Ocean dipole conditions. The algorithm predicts rainfall in five categories: Flood, Excess, Normal, Deficit and Drought. We use closed itemset mining, cluster membership calculations and a multilayer perceptron function in the algorithm to predict monsoon rainfall in peninsular India. Using Indian Institute of Tropical Meteorology data, we found the prediction accuracy of our proposed approach to be exceptionally good.
Multivariate Statistical Models for Predicting Sediment Yields from Southern California Watersheds
Gartner, Joseph E.; Cannon, Susan H.; Helsel, Dennis R.; Bandurraga, Mark
2009-01-01
Debris-retention basins in Southern California are frequently used to protect communities and infrastructure from the hazards of flooding and debris flow. Empirical models that predict sediment yields are used to determine the size of the basins. Such models have been developed using analyses of records of the amount of material removed from debris retention basins, associated rainfall amounts, measures of watershed characteristics, and wildfire extent and history. In this study we used multiple linear regression methods to develop two updated empirical models to predict sediment yields for watersheds located in Southern California. The models are based on both new and existing measures of volume of sediment removed from debris retention basins, measures of watershed morphology, and characterization of burn severity distributions for watersheds located in Ventura, Los Angeles, and San Bernardino Counties. The first model presented reflects conditions in watersheds located throughout the Transverse Ranges of Southern California and is based on volumes of sediment measured following single storm events with known rainfall conditions. The second model presented is specific to conditions in Ventura County watersheds and was developed using volumes of sediment measured following multiple storm events. To relate sediment volumes to triggering storm rainfall, a rainfall threshold was developed to identify storms likely to have caused sediment deposition. A measured volume of sediment deposited by numerous storms was parsed among the threshold-exceeding storms based on relative storm rainfall totals. The predictive strength of the two models developed here, and of previously-published models, was evaluated using a test dataset consisting of 65 volumes of sediment yields measured in Southern California. The evaluation indicated that the model developed using information from single storm events in the Transverse Ranges best predicted sediment yields for watersheds in San
Energy Technology Data Exchange (ETDEWEB)
Han, Kyung Hwa; Choi, Byoung Wook [Dept. of Radiology, and Research Institute of Radiological Science, Severance Hospital, Yonsei University College of Medicine, Seoul (Korea, Republic of); Song, Ki Jun [Dept. of Biostatistics and Medical Informatics, Yonsei University College of Medicine, Seoul (Korea, Republic of)
2016-06-15
Clinical prediction models are developed to calculate estimates of the probability of the presence/occurrence or future course of a particular prognostic or diagnostic outcome from multiple clinical or non-clinical parameters. Radiologic imaging techniques are being developed for accurate detection and early diagnosis of disease, which will eventually affect patient outcomes. Hence, results obtained by radiological means, especially diagnostic imaging, are frequently incorporated into a clinical prediction model as important predictive parameters, and the performance of the prediction model may improve in both diagnostic and prognostic settings. This article explains in a conceptual manner the overall process of developing and validating a clinical prediction model involving radiological parameters in relation to the study design and statistical methods. Collection of a raw dataset; selection of an appropriate statistical model; predictor selection; evaluation of model performance using a calibration plot, Hosmer-Lemeshow test and c-index; internal and external validation; comparison of different models using c-index, net reclassification improvement, and integrated discrimination improvement; and a method to create an easy-to-use prediction score system will be addressed. This article may serve as a practical methodological reference for clinical researchers.
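Among the performance measures listed, the c-index (concordance index) is easy to sketch for a binary outcome: among all pairs with different outcomes, it is the fraction in which the event case received the higher predicted risk. The predicted probabilities and disease labels below are hypothetical.

```python
from itertools import combinations

def c_index(risk_scores, outcomes):
    """Concordance index for a binary outcome; tied scores count one half."""
    concordant = ties = usable = 0
    for i, j in combinations(range(len(outcomes)), 2):
        if outcomes[i] == outcomes[j]:
            continue  # only pairs with discordant outcomes are informative
        usable += 1
        hi = i if outcomes[i] == 1 else j  # index of the event case
        lo = j if hi == i else i
        if risk_scores[hi] > risk_scores[lo]:
            concordant += 1
        elif risk_scores[hi] == risk_scores[lo]:
            ties += 1
    return (concordant + 0.5 * ties) / usable

# Hypothetical predicted probabilities from a model mixing imaging and
# clinical predictors, against observed disease status (1 = present).
pred = [0.9, 0.7, 0.65, 0.4, 0.3, 0.2]
obs  = [1,   1,   0,    1,   0,   0]
print(round(c_index(pred, obs), 3))
```

A c-index of 0.5 is chance-level discrimination and 1.0 is perfect; comparing the c-indices of two candidate models is one of the comparison steps the article describes.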
DEFF Research Database (Denmark)
Devos, A.; Dombovic, M.; Bourgoignie, B.;
Summary: What?: CME geomagnetic forecast tool Context: integrated in COMESEP alert system (www.comesep.eu/alert) Input: positional and physical parameters from detection algorithms CACTus, flaremail and SolarDemon Output: estimation of CME arrival, storm impact and duration How?: statistical model...
Statistical Modeling and Prediction for Tourism Economy Using Dendritic Neural Network
Yu, Ying; Wang, Yirui; Gao, Shangce; Tang, Zheng
2017-01-01
With the impact of global internationalization, the tourism economy has also developed rapidly. The increasing interest aroused by more advanced forecasting methods leads us to innovate forecasting methods. In this paper, the seasonal trend autoregressive integrated moving average with dendritic neural network model (SA-D model) is proposed to perform tourism demand forecasting. First, we use the seasonal trend autoregressive integrated moving average model (SARIMA model) to exclude the long-term linear trend and then train the residual data with the dendritic neural network model to make a short-term prediction. As the results in this paper show, the SA-D model can achieve considerably better predictive performance. To demonstrate the effectiveness of the SA-D model, we also use the data that other authors used in other models and compare the results. This also showed that the SA-D model achieved good predictive performance in terms of the normalized mean square error, absolute percentage of error, and correlation coefficient.
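The hybrid idea, fit a structural seasonal model first and then model its residuals, can be sketched with simple stand-ins: a linear trend plus seasonal means replaces SARIMA, and an AR(1) correction replaces the dendritic neural network. The monthly series is synthetic, not tourism data.

```python
def fit_trend_seasonal(series, period):
    """Stand-in for the SARIMA stage: linear trend plus seasonal means."""
    n = len(series)
    t = list(range(n))
    tm, sm = sum(t) / n, sum(series) / n
    slope = (sum((ti - tm) * (si - sm) for ti, si in zip(t, series))
             / sum((ti - tm) ** 2 for ti in t))
    intercept = sm - slope * tm
    detrended = [s - (intercept + slope * ti) for ti, s in zip(t, series)]
    seasonal = [sum(detrended[k] for k in range(p, n, period))
                / len(range(p, n, period)) for p in range(period)]
    return intercept, slope, seasonal

def structural_forecast(params, t, period):
    intercept, slope, seasonal = params
    return intercept + slope * t + seasonal[t % period]

# Hypothetical monthly arrivals: upward trend plus a summer peak.
series = [100.0 + 2.0 * t + (15.0 if t % 12 in (5, 6, 7) else 0.0)
          for t in range(48)]
train = series[:-1]                      # hold out the final month
params = fit_trend_seasonal(train, 12)
resid = [s - structural_forecast(params, t, 12) for t, s in enumerate(train)]
# AR(1) residual correction stands in for the dendritic-network stage.
den = sum(r * r for r in resid[:-1])
phi = (sum(a * b for a, b in zip(resid, resid[1:])) / den) if den > 1e-12 else 0.0
forecast = structural_forecast(params, len(series) - 1, 12) + phi * resid[-1]
print(round(forecast, 1))  # the actual held-out value is 194.0
```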
DEFF Research Database (Denmark)
Hansen, Niels Christian; Loui, Psyche; Vuust, Peter
Statistical learning underlies the generation of expectations with different degrees of uncertainty. In music, uncertainty applies to expectations for pitches in a melody. This uncertainty can be quantified by Shannon entropy from distributions of expectedness ratings for multiple continuations...... of each melody, as obtained with the probe-tone paradigm. We hypothesised that statistical learning of music can be modelled as a process of entropy reduction. Specifically, implicit learning of statistical regularities allows reduction in the relative entropy (i.e. symmetrised Kullback-Leibler Divergence...... of musical training, and within-participant decreases in entropy after short-term statistical learning of novel music. Thus, whereas inexperienced listeners make high-entropy predictions, following the Principle of Maximum Entropy, statistical learning over varying timescales enables listeners to generate...
Xu, Yue; Xiang, Ping; Xie, Xiaopeng; Huang, Yang
2016-06-01
This paper presents a new modeling and simulation method to predict the important statistical performance of single photon avalanche diode (SPAD) detectors, including photon detection efficiency (PDE), dark count rate (DCR) and afterpulsing probability (AP). Three local electric field models are derived for the PDE, DCR and AP calculations, which show analytical dependence on key parameters such as avalanche triggering probability, impact ionization rate and electric field distributions that can be directly obtained from Geiger-mode Technology Computer Aided Design (TCAD) simulation. The model calculation results are shown to be in good agreement with the reported experimental data in the open literature, suggesting that the proposed modeling and simulation method is well suited to the prediction of SPAD statistical performance.
Directory of Open Access Journals (Sweden)
Kenny Xie
2014-01-01
Statistical models for preseason prediction of annual Atlantic tropical cyclone (TC) and hurricane counts generally include El Niño/Southern Oscillation (ENSO) forecasts as a predictor. As a result, the predictions from such models are often contaminated by the errors in ENSO forecasts. In this study, it is found that the latent heat flux (LHF) over the Eastern Tropical Pacific (ETP), defined as the region 0°–5°N, 115°–125°W, in spring is negatively correlated with the annual Atlantic TC and hurricane counts. By using stepwise backward elimination regression, it is further shown that the March value of ETP LHF is a better predictor than the spring or summer ENSO index for Atlantic TC counts. Leave-one-out cross validation indicates that the annual Atlantic TC counts predicted by this ENSO-independent statistical model show a remarkable correlation with the actual TC counts (R=0.72; P value <0.01). For Atlantic hurricanes, the predictions using March ETP LHF and summer (July–September) ENSO indices show only minor differences except in moderate to strong El Niño years. Thus, March ETP LHF is an excellent predictor for seasonal Atlantic TC prediction and a viable alternative to using an ENSO index for Atlantic hurricane prediction.
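The leave-one-out cross validation used to assess the ENSO-independent model can be sketched with a one-predictor regression. The LHF anomalies and TC counts below are illustrative values with the study's negative correlation built in, not the actual data.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return my - b * mx, b

def loocv_predictions(x, y):
    """Refit with each year held out, then predict that year: the skill
    estimate never sees its own target value."""
    preds = []
    for i in range(len(x)):
        xt = x[:i] + x[i + 1:]
        yt = y[:i] + y[i + 1:]
        a, b = fit_line(xt, yt)
        preds.append(a + b * x[i])
    return preds

# Hypothetical March ETP latent-heat-flux anomalies vs. annual TC counts
# (negatively correlated, as in the study; values are illustrative only).
lhf = [-1.2, -0.8, -0.3, 0.1, 0.4, 0.9, 1.3, -0.5, 0.6, -1.0]
tcs = [19, 17, 15, 13, 12, 10, 8, 16, 11, 18]
preds = loocv_predictions(lhf, tcs)
print([round(p, 1) for p in preds])
```

Correlating `preds` with `tcs` would give the cross-validated skill analogous to the reported R = 0.72.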
An improved statistical model for predicting the deuterium ingress in zirconium alloy pressure tubes
Energy Technology Data Exchange (ETDEWEB)
Pandey, M.D., E-mail: mdpandey@uwaterloo.ca [Department of Civil and Environmental Engineering University of Waterloo, Waterloo, Ontario, N2L 3G1 (Canada); Xin, L. [Department of Civil and Environmental Engineering University of Waterloo, Waterloo, Ontario, N2L 3G1 (Canada)
2012-09-15
In the CANDU pressurized heavy water reactor (PHWR), the nuclear fuel is contained in hundreds of Zr-2.5 Nb alloy pressure tubes. The corrosion of zirconium alloy produces deuterium that is absorbed by the body of the pressure tube. The presence of this deuterium causes hydrogen embrittlement of zirconium alloy with an adverse effect on the integrity of the pressure tube. An accurate prediction of deuterium accumulation over time is an important step for ensuring the fitness-for-service of pressure tubes. Deuterium ingress data collected from in-service inspection of pressure tubes exhibit heteroscedasticity, i.e., the variance of deuterium concentration is dependent on operating time (or exposure) and temperature. The currently used model by the nuclear industry involves a logarithmic regression of deuterium content over time and temperature. Since this approach does not deal with heteroscedasticity precisely, it results in a conservative prediction of the deuterium ingress. The paper presents a new approach for predicting deuterium ingress based on a weighted least-squares (WLS) regression that overcomes the limitations of the existing model, and it provides realistic prediction bounds of deuterium ingress.
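Weighted least squares with weights inversely proportional to the (exposure-dependent) variance is the key device; a one-predictor sketch on hypothetical deuterium data follows. A real application would model the variance as a function of operating time and temperature rather than supplying it directly, as the variances here are.

```python
def wls_fit(x, y, variances):
    """Weighted least squares for y = a + b*x with weights 1/variance, so
    high-variance (high-exposure) observations count less in the fit."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = (sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
         / sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)))
    return my - b * mx, b

# Hypothetical: deuterium concentration (ppm) vs. hot operating hours, with
# scatter that grows with exposure time (heteroscedasticity).
hours = [20, 40, 60, 80, 100, 120]           # thousands of hours
deut  = [8.0, 15.5, 24.5, 31.0, 42.0, 47.0]  # ppm
varis = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]     # assumed variance per point
a, b = wls_fit(hours, deut, varis)
print(round(a, 2), round(b, 3))
```

Because the weights downweight the noisy late-life measurements instead of letting them dominate, the fitted trend and its prediction bounds are tighter than those from an unweighted logarithmic regression.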
Predict! Teaching Statistics Using Informational Statistical Inference
Makar, Katie
2013-01-01
Statistics is one of the most widely used topics for everyday life in the school mathematics curriculum. Unfortunately, the statistics taught in schools focuses on calculations and procedures before students have a chance to see it as a useful and powerful tool. Researchers have found that a dominant view of statistics is as an assortment of tools…
Energy Technology Data Exchange (ETDEWEB)
Granderson, Jessica [Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Energy Technologies Area Div.; Price, Phillip N. [Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Energy Technologies Area Div.
2014-03-01
This paper documents the development and application of a general statistical methodology to assess the accuracy of baseline energy models, focusing on its application to Measurement and Verification (M&V) of whole-building energy savings. The methodology complements the principles addressed in resources such as ASHRAE Guideline 14 and the International Performance Measurement and Verification Protocol. It requires fitting a baseline model to data from a "training period" and using the model to predict total electricity consumption during a subsequent "prediction period." We illustrate the methodology by evaluating five baseline models using data from 29 buildings. The training period and prediction period were varied, and model predictions of daily, weekly, and monthly energy consumption were compared to meter data to determine model accuracy. Several metrics were used to characterize the accuracy of the predictions, and in some cases the best-performing model as judged by one metric was not the best performer when judged by another metric.
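Two accuracy metrics commonly used in M&V baseline evaluation, CV(RMSE) and NMBE, can be sketched directly. The daily meter and baseline values below are hypothetical, and the paper's own metric set may differ from these two.

```python
import math

def cvrmse(actual, predicted):
    """Coefficient of variation of the RMSE, in percent of the mean load."""
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return 100.0 * rmse / (sum(actual) / n)

def nmbe(actual, predicted):
    """Normalized mean bias error, in percent: net over/under-prediction."""
    return 100.0 * sum(p - a for a, p in zip(actual, predicted)) / sum(actual)

# Hypothetical daily kWh for a prediction period vs. a baseline model's output.
meter    = [410, 395, 430, 460, 405, 390, 415]
baseline = [400, 400, 420, 450, 420, 385, 410]
print(round(cvrmse(meter, baseline), 2), round(nmbe(meter, baseline), 2))
```

CV(RMSE) captures scatter while NMBE captures bias, which is one way a model can rank best on one metric and not on another, as the abstract notes.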
Statistical Model for Content Extraction
DEFF Research Database (Denmark)
2011-01-01
We present a statistical model for content extraction from HTML documents. The model operates on the Document Object Model (DOM) tree of the corresponding HTML document. It evaluates each tree node and its associated statistical features to predict the significance of the node towards the overall content...
Statistical model of natural stimuli predicts edge-like pooling of spatial frequency channels in V2
Directory of Open Access Journals (Sweden)
Gutmann Michael
2005-02-01
Full Text Available Abstract Background It has been shown that the classical receptive fields of simple and complex cells in the primary visual cortex emerge from the statistical properties of natural images by forcing the cell responses to be maximally sparse or independent. We investigate how to learn features beyond the primary visual cortex from the statistical properties of modelled complex-cell outputs. In previous work, we showed that a new model, non-negative sparse coding, led to the emergence of features which code for contours of a given spatial frequency band. Results We applied ordinary independent component analysis to modelled outputs of complex cells that span different frequency bands. The analysis led to the emergence of features which pool spatially coherent across-frequency activity in the modelled primary visual cortex. Thus, the statistically optimal way of processing complex-cell outputs abandons separate frequency channels, while preserving and even enhancing orientation tuning and spatial localization. As a technical aside, we found that the non-negativity constraint is not necessary: ordinary independent component analysis produces essentially the same results as our previous work. Conclusion We propose that the pooling that emerges allows the features to code for realistic low-level image features related to step edges. Further, the results prove the viability of statistical modelling of natural images as a framework that produces quantitative predictions of visual processing.
A statistical model to predict total column ozone in Peninsular Malaysia
Institute of Scientific and Technical Information of China (English)
K.C.TAN; H.S.LIM; M.Z.MAT JAFRI
2016-01-01
This study aims to predict monthly columnar ozone in Peninsular Malaysia based on the concentrations of several atmospheric gases. Data pertaining to five atmospheric gases (CO2, O3, CH4, NO2, and H2O vapor) were retrieved by satellite scanning imaging absorption spectrometry for atmospheric chartography from 2003 to 2008 and used to develop a model to predict columnar ozone in Peninsular Malaysia. Analyses of the northeast monsoon (NEM) and southwest monsoon (SWM) seasons were conducted separately. Based on the Pearson correlation matrices, columnar ozone was negatively correlated with H2O vapor but positively correlated with CO2 and NO2 during both the NEM and SWM seasons from 2003 to 2008. This result was expected because NO2 is a precursor of ozone. Therefore, an increase in columnar ozone concentration is associated with an increase in NO2 but a decrease in H2O vapor. In the NEM season, columnar ozone was negatively correlated with H2O (-0.847) but positively correlated with NO2 (0.754) and CO2 (0.477); it was also negatively but weakly correlated with CH4 (-0.035). In the SWM season, columnar ozone was highly positively correlated with NO2 (0.855), positively correlated with CO2 (0.572) and CH4 (0.321), and highly negatively correlated with H2O (-0.832). Both multiple regression and principal component analyses were used to predict the columnar ozone value in Peninsular Malaysia. We obtained the best-fitting regression equations for the columnar ozone data using four independent variables. Our results show approximately the same R value (≈ 0.83) for both the NEM and SWM seasons.
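The correlation-plus-multiple-regression workflow described above can be sketched on synthetic data; the regression coefficients, noise level, and sample size below are illustrative assumptions, not the study's values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 72  # e.g. six years of monthly values

# Synthetic standardized predictors standing in for NO2, CO2, CH4 and H2O vapor
no2, co2, ch4, h2o = rng.normal(size=(4, n))
ozone = 0.8 * no2 + 0.5 * co2 - 0.8 * h2o + rng.normal(0.0, 0.3, n)

# Pearson correlation of ozone with each predictor
r = [np.corrcoef(ozone, g)[0, 1] for g in (no2, co2, ch4, h2o)]

# Multiple regression with four independent variables (plus intercept)
X = np.column_stack([np.ones(n), no2, co2, ch4, h2o])
coef, *_ = np.linalg.lstsq(X, ozone, rcond=None)

# Multiple R: correlation between fitted and observed values
r_multiple = np.corrcoef(X @ coef, ozone)[0, 1]
print(r_multiple)
```

The sign pattern of the simple correlations (positive for NO2 and CO2, negative for H2O) mirrors the structure reported in the abstract, while the multiple R summarizes the joint fit.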
Gallice, A.; Schaefli, B.; Lehning, M.; Parlange, M. B.; Huwald, H.
2015-09-01
The development of stream temperature regression models at regional scales has regained some popularity over the past years. These models are used to predict stream temperature in ungauged catchments to assess the impact of human activities or climate change on riverine fauna over large spatial areas. A comprehensive literature review presented in this study shows that the temperature metrics predicted by the majority of models correspond to yearly aggregates, such as the popular annual maximum weekly mean temperature (MWMT). As a consequence, current models are often unable to predict the annual cycle of stream temperature, nor can the majority of them forecast the inter-annual variation of stream temperature. This study presents a new statistical model to estimate the monthly mean stream temperature of ungauged rivers over multiple years in an Alpine country (Switzerland). Contrary to similar models developed to date, which are mostly based on standard regression approaches, this one attempts to incorporate physical aspects into its structure. It is based on the analytical solution to a simplified version of the energy-balance equation over an entire stream network. Some terms of this solution cannot be readily evaluated at the regional scale due to the lack of appropriate data, and are therefore approximated using classical statistical techniques. This physics-inspired approach presents some advantages: (1) the main model structure is directly obtained from first principles, (2) the spatial extent over which the predictor variables are averaged naturally arises during model development, and (3) most of the regression coefficients can be interpreted from a physical point of view - their values can therefore be constrained to remain within plausible bounds. The evaluation of the model over a new freely available data set shows that the monthly mean stream temperature curve can be reproduced with a root-mean-square error (RMSE) of ±1.3 °C, which is similar in
Ruz, J; Linares, P; Luque de Castro, M D; Caridad, J M; Valcarcel, M
1989-01-01
Blood, saliva and breath samples from a population of males and females subjected to the intake of preselected amounts of ethanol, whilst in different physical conditions (at rest, after physical exertion, on an empty stomach and after eating), were analysed by automatic methods employing immobilized (blood) or dissolved (saliva) enzymes and a breathanalyser. Treatment of the results obtained enabled the development of a statistical model for prediction of the ethanol concentration in blood at a given time from the ethanol concentration in saliva or breath obtained at a later time.
Statistical Model of Extreme Shear
DEFF Research Database (Denmark)
Larsen, Gunner Chr.; Hansen, Kurt Schaldemose
2004-01-01
In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge on wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of high-sampled full-scale time series measurements...... are consistent, given the inevitable uncertainties associated with the model as well as with the extreme value data analysis. Keywords: Statistical model, extreme wind conditions, statistical analysis, turbulence, wind loading, wind shear, wind turbines.
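As an illustration of site-specific extreme-value statistics of this kind, the sketch below fits a Gumbel distribution to synthetic shear maxima by the method of moments. The distribution family, parameters, and design threshold are all assumptions for the example; the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic extreme-shear maxima drawn from a Gumbel distribution
mu_true, beta_true = 8.0, 1.5
u = rng.uniform(size=5000)
maxima = mu_true - beta_true * np.log(-np.log(u))   # inverse-CDF sampling

# Method-of-moments fit of the Gumbel location and scale
euler_gamma = 0.5772156649
beta_hat = np.std(maxima) * np.sqrt(6.0) / np.pi
mu_hat = np.mean(maxima) - euler_gamma * beta_hat

# Exceedance probability of a hypothetical design threshold under the fit
x = 14.0
p_exceed = 1.0 - np.exp(-np.exp(-(x - mu_hat) / beta_hat))
print(mu_hat, beta_hat, p_exceed)
```

Such exceedance probabilities are the quantity a site-specific extreme-shear model ultimately feeds into design-load calculations.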
Hoshyaripour, G.; Brasseur, G.; Andrade, M. F.; Gavidia-Calderón, M.; Bouarar, I.; Ynoue, R. Y.
2016-11-01
Two state-of-the-art models (deterministic: the Weather Research and Forecasting model with Chemistry (WRF-Chem); statistical: Artificial Neural Networks (ANN)) are implemented to predict the ground-level ozone concentration in São Paulo (SP), Brazil. Two domains are set up for the WRF-Chem simulations: a coarse domain (with 50 km horizontal resolution) covering the whole of South America (D1) and a nested domain (with a horizontal resolution of 10 km) covering southeastern Brazil (D2). To evaluate the spatial distribution of the chemical species, model results are compared to the Measurements of Pollution in The Troposphere (MOPITT) data, showing that the model satisfactorily predicts the CO concentrations in both D1 and D2. The model also reproduces the measurements made at three air quality monitoring stations in SP, with correlation coefficients of 0.74, 0.70, and 0.77 for O3 and 0.51, 0.48, and 0.57 for NOx. The input selection for the ANN model is carried out using the Forward Selection (FS) method. FS-ANN is then trained and validated using the data from two air quality monitoring stations, showing correlation coefficients of 0.84 and 0.75 for daily mean and 0.64 and 0.67 for daily peak ozone during the test stage. Then, both WRF-Chem and FS-ANN are deployed to forecast the daily mean and peak concentrations of ozone at the two stations during 5-20 August 2012. Results show that WRF-Chem performs better in predicting mean and peak ozone concentrations, as well as in conducting mechanistic and sensitivity analyses. FS-ANN is advantageous only for predicting mean daily ozone concentrations, considering its significantly lower computational cost and ease of development and implementation compared to WRF-Chem.
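Forward Selection itself is simple to sketch: greedily add the predictor that most reduces the residual sum of squares of an OLS fit. The data below are synthetic (the real FS-ANN used measured pollutant and meteorological inputs), with informative signal planted in columns 4 and 1.

```python
import numpy as np

def forward_select(X, y, k):
    """Greedy forward selection: k times, add the column that most
    reduces the residual sum of squares of an OLS fit."""
    n, p = X.shape
    chosen = []
    for _ in range(k):
        best, best_sse = None, np.inf
        for j in range(p):
            if j in chosen:
                continue
            A = np.column_stack([np.ones(n), X[:, chosen + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = np.sum((y - A @ coef) ** 2)
            if sse < best_sse:
                best, best_sse = j, sse
        chosen.append(best)
    return chosen

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 4] - 1.0 * X[:, 1] + rng.normal(0.0, 0.5, 200)
print(forward_select(X, y, 2))   # picks the informative columns, strongest first
```

In the study this kind of screening decides which inputs feed the ANN; here it correctly recovers the two planted predictors in order of strength.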
2013-09-30
for Extended Range Environmental Prediction. Andrew J. Majda, New York University, Courant Institute of Mathematical Sciences, 251 Mercer Street, New York, NY 10012; with collaborators at UCLA and Dimitri Giannakis (Courant Institute), together with their postdocs and Ph.D. students. Topics include reemergence mechanisms for North Pacific sea ice.
Weaver, Rhiannon
2008-01-01
Model validation in computational cognitive psychology often relies on methods drawn from the testing of theories in experimental physics. However, applications of these methods to computational models in typical cognitive experiments can hide multiple, plausible sources of variation arising from human participants and from stochastic cognitive…
Lindermeir, E.; Beier, K.
2012-08-01
The spectroscopic database HITEMP 2010 is used to upgrade the parameters of the statistical molecular band model that is part of the infrared signature prediction code NIRATAM (NATO InfraRed Air TArget Model). This band model was recommended by NASA and is applied in several codes that determine the infrared emission of combustion gases. The upgrade concerns spectral absorption coefficients and line densities of the gases H2O, CO2, and CO in the spectral region 400-5000 cm-1 (2-25 μm) with a spectral resolution of 5 cm-1. The temperature range 100-3000 K is covered. Two methods to update the database are presented: the usual method as provided in the literature, and an alternative, more laborious procedure that employs least-squares fitting. The improvements achieved with both methods are demonstrated by comparing radiance spectra obtained from the band model with line-by-line results. The performance in a realistic scenario is investigated on the basis of measured and predicted spectra of a jet aircraft plume in afterburner mode.
Directory of Open Access Journals (Sweden)
Abhinandan Madenhalli
2007-08-01
Full Text Available Abstract Background Understanding and predicting protein stability upon point mutations has wide-spread importance in molecular biology. Several prediction models have been developed in the past with various algorithms. Statistical potentials are among the widely used algorithms for the prediction of changes in stability upon point mutations. Although the methods provide flexibility and the capability to develop an accurate and reliable prediction model, this can be achieved only by the right selection of the structural factors and optimization of their parameters for the statistical potentials. In this work, we have selected five atom classification systems and compared their efficiency for the development of amino acid atom potentials. Additionally, torsion angle potentials have been optimized to include the orientation of amino acids in such a way that altered backbone conformations in different secondary structural regions can be included in the prediction model. This study also elaborates on the importance of classifying the mutations according to their solvent accessibility and secondary structure specificity. The prediction efficiency has been calculated individually for the mutations in different secondary structural regions and compared. Results Results show that, in addition to using an advanced atom description, stepwise regression and selection of atoms are necessary to avoid redundancy in the atom distribution and to improve the reliability of the prediction model validation. Compared with other atom classification models, the Melo-Feytmans model shows better prediction efficiency, giving a high correlation of 0.85 between experimental and theoretical ΔΔG, with 84.06% of the mutations correctly predicted out of 1538 mutations. The theoretical ΔΔG values for the mutations in partially buried β-strands generated by the structural training dataset from PISCES gave a correlation of 0.84 without performing the Gaussian apodization of the
Physics Constrained Stochastic-Statistical Models for Extended Range Environmental Prediction
2014-09-30
Describes the ocean through a low-dimensional family of spatiotemporal modes extracted from global circulation model (GCM) output and satellite observations; extends the analysis in [1] to cover the whole of the Arctic, and to include both ocean and atmosphere variables (sea surface temperature (SST) and sea level ...). Informal Co-P.I.: Dimitrios Giannakis, Department of Mathematics and Center for Atmosphere Ocean Science.
Laepple, T; Laepple, Thomas; Jewson, Stephen
2007-01-01
There is a clear positive correlation between boreal summer tropical Atlantic sea-surface temperature and annual hurricane numbers. This motivates the idea of trying to predict the sea-surface temperature in order to be able to predict future hurricane activity. In previous work we have used simple statistical methods to make 5 year predictions of tropical Atlantic sea surface temperatures for this purpose. We now compare these statistical SST predictions with SST predictions made by an ensemble mean of IPCC climate models.
Statistical Model of Extreme Shear
DEFF Research Database (Denmark)
Hansen, Kurt Schaldemose; Larsen, Gunner Chr.
2005-01-01
In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge of wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of full-scale measurements recorded with a high sampling rate...
Energy Technology Data Exchange (ETDEWEB)
Mishra, Srikanta; Schuetter, Jared
2014-11-01
We compare two approaches for building a statistical proxy model (metamodel) for CO₂ geologic sequestration from the results of full-physics compositional simulations. The first approach involves a classical Box-Behnken or Augmented Pairs experimental design with a quadratic polynomial response surface. The second approach uses a space-filling maximin Latin hypercube sampling or maximum entropy design with a choice of five different metamodeling techniques: quadratic polynomial, kriging with constant and quadratic trend terms, multivariate adaptive regression splines (MARS), and additivity and variance stabilization (AVAS). Simulation results for CO₂ injection into a reservoir-caprock system with 9 design variables (and 97 samples) were used to generate the data for developing the proxy models. The fitted models were validated using an independent data set and a cross-validation approach for three different performance metrics: total storage efficiency, CO₂ plume radius, and average reservoir pressure. The Box-Behnken–quadratic polynomial metamodel performed the best, followed closely by the maximin LHS–kriging metamodel.
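A maximin Latin hypercube design of the kind mentioned can be sketched by generating random LHS candidates and keeping the one with the largest minimum pairwise distance. This brute-force candidate search is a simple stand-in for the optimized designs used in the study.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """One random Latin hypercube sample in [0, 1]^d:
    each column places exactly one point in each of the n strata."""
    u = rng.uniform(size=(n, d))
    perms = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (perms + u) / n

def maximin_lhs(n, d, n_candidates=50, seed=0):
    """Keep the candidate design whose minimum pairwise distance is largest."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        x = latin_hypercube(n, d, rng)
        dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        score = dists[np.triu_indices(n, k=1)].min()
        if score > best_score:
            best, best_score = x, score
    return best

design = maximin_lhs(97, 9)   # 97 samples in 9 design variables, as in the study
print(design.shape)
```

The space-filling property is what lets a single modest sample (here 97 runs) support flexible metamodels such as kriging or MARS across the whole design space.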
Liu, Leping; Maddux, Cleborne D.
2008-01-01
This article presents a study of Web 2.0 articles intended to (a) analyze the content of what is written and (b) develop a statistical model to predict whether authors write about the need for new instructional design strategies and models. Eighty-eight technology articles were subjected to lexical analysis and a logistic regression model was…
Diffeomorphic Statistical Deformation Models
DEFF Research Database (Denmark)
Hansen, Michael Sass; Hansen, Mads Fogtmann; Larsen, Rasmus
2007-01-01
In this paper we present a new method for constructing diffeomorphic statistical deformation models in arbitrary dimensional images with a nonlinear generative model and a linear parameter space. Our deformation model is a modified version of the diffeomorphic model introduced by Cootes et al. Th...... with ground truth in form of manual expert annotations, and compared to Cootes's model. We anticipate applications in unconstrained diffeomorphic synthesis of images, e.g. for tracking, segmentation, registration or classification purposes....
Foundational Issues in Statistical Modeling: Statistical Model Specification and Validation
Directory of Open Access Journals (Sweden)
Aris Spanos
2011-01-01
Full Text Available Statistical model specification and validation raise crucial foundational problems whose pertinent resolution holds the key to learning from data by securing the reliability of frequentist inference. The paper questions the judiciousness of several current practices, including the theory-driven approach and the Akaike-type model selection procedures, arguing that they often lead to unreliable inferences. This is primarily because goodness-of-fit/prediction measures and other substantive and pragmatic criteria are of questionable value when the estimated model is statistically misspecified. Foisting one's favorite model on the data often yields estimated models which are both statistically and substantively misspecified, with no way to delineate between the two sources of error and apportion blame. The paper argues that the error statistical approach can address this Duhemian ambiguity by distinguishing between statistical and substantive premises and viewing empirical modeling in a piecemeal way, with a view to delineating the various issues more effectively. It is also argued that Hendry's general-to-specific procedure does a much better job in model selection than the theory-driven and Akaike-type procedures, primarily because of its error statistical underpinnings.
Statistical Basis for Predicting Technological Progress
Nagy, Bela; Bui, Quan M; Trancik, Jessika E
2012-01-01
Forecasting technological progress is of great interest to engineers, policy makers, and private investors. Several models have been proposed for predicting technological improvement, but how well do these models perform? An early hypothesis made by Theodore Wright in 1936 is that cost decreases as a power law of cumulative production. An alternative hypothesis is Moore's law, which can be generalized to say that technologies improve exponentially with time. Other alternatives were proposed by Goddard, Sinclair et al., and Nordhaus. These hypotheses have not previously been rigorously tested. Using a new database on the cost and production of 62 different technologies, which is the most expansive of its kind, we test the ability of six different postulated laws to predict future costs. Our approach involves hindcasting and developing a statistical model to rank the performance of the postulated laws. Wright's law produces the best forecasts, but Moore's law is not far behind. We discover a previously unobserv...
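Hindcasting Wright's law can be sketched as a log-log fit on the early part of a cost series, then forecasting the remainder. The technology history below is synthetic, with an assumed learning exponent of 0.3; the paper's database covers 62 real technologies.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic history obeying Wright's law: cost ~ a * (cumulative production)^-b
cum_prod = np.cumsum(rng.uniform(1.0, 3.0, 40))
cost = 50.0 * cum_prod ** -0.3 * np.exp(rng.normal(0.0, 0.05, 40))

# Hindcast: fit on the first 30 points in log-log space, forecast the rest
train = slice(0, 30)
b1, b0 = np.polyfit(np.log(cum_prod[train]), np.log(cost[train]), 1)
forecast = np.exp(b0) * cum_prod[30:] ** b1

# Mean absolute log-error of the out-of-sample forecast
err = np.abs(np.log(forecast) - np.log(cost[30:])).mean()
print(-b1, err)   # learning-exponent estimate and forecast error
```

Ranking such out-of-sample errors across many technologies and candidate laws (Wright, Moore, and others) is the essence of the statistical comparison the abstract describes.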
Statistical prediction of Late Miocene climate
Digital Repository Service at National Institute of Oceanography (India)
Fernandes, A.A.; Gupta, S.M.
The theory of statistical prediction of paleoclimate (Imbrie and Kipp, 1971), which includes multiple regression analysis and factor analysis is reviewed. Necessary software is listed. An application to predicting palaeo oceanographic parameters...
Levy, R.; Mcginness, H.
1976-01-01
Investigations were performed to predict the power available from the wind at the Goldstone, California, antenna site complex. The background for power prediction was derived from a statistical evaluation of available wind speed data records at this location and at nearby locations similarly situated within the Mojave desert. In addition to a model for power prediction over relatively long periods of time, an interim simulation model that produces sample wind speeds is described. The interim model furnishes uncorrelated sample speeds at hourly intervals that reproduce the statistical wind distribution at Goldstone. A stochastic simulation model to provide speed samples representative of both the statistical speed distributions and correlations is also discussed.
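A minimal sketch of the statistical power-prediction step, assuming a Weibull wind-speed climatology; the distribution choice and its parameters are illustrative assumptions, not values taken from the Goldstone report.

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed Weibull wind-speed climatology (shape k, scale c in m/s)
k, c = 2.0, 7.0
hours = 24 * 365
speeds = c * rng.weibull(k, size=hours)   # uncorrelated hourly samples

# Mean available wind power density per unit swept area: P = 0.5 * rho * v^3
rho = 1.225  # air density, kg/m^3
power = 0.5 * rho * speeds ** 3
print(speeds.mean(), power.mean())
```

This mirrors the report's "interim" model: uncorrelated hourly draws that reproduce the speed distribution; the stochastic model it also discusses would additionally impose the hour-to-hour correlation structure.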
Statistical modeling of program performance
Directory of Open Access Journals (Sweden)
A. P. Karpenko
2014-01-01
Full Text Available The task of evaluating program performance often arises in the design of computer systems or during iterative compilation. The traditional way to solve this problem is to emulate program execution on the target system. A modern alternative is to model program performance statistically on the computer under investigation. This work introduces Velocitas, a statistical method for modeling program performance, and presents its implementation in the Adaptor framework. An investigation of the method's effectiveness showed that it predicts program performance with high accuracy.
Diametral creep prediction of pressure tube using statistical regression methods
Energy Technology Data Exchange (ETDEWEB)
Kim, D. [Korea Advanced Inst. of Science and Technology, Daejeon (Korea, Republic of); Lee, J.Y. [Korea Electric Power Research Inst., Daejeon (Korea, Republic of); Na, M.G. [Chosun Univ., Gwangju (Korea, Republic of); Jang, C. [Korea Advanced Inst. of Science and Technology, Daejeon (Korea, Republic of)
2010-07-01
Diametral creep prediction for pressure tubes in CANDU reactors is an important factor in ROPT calculations. In this study, pressure tube diametral creep prediction models were developed using statistical regression methods, such as the linear mixed model for longitudinal data analysis. Inspection and operating-condition data from the Wolsong unit 1 and 2 reactors were used. A serial correlation model and a random coefficient model were developed for pressure tube diameter prediction; the random coefficient model provided more accurate results than the serial correlation model. (author)
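A crude stand-in for the random coefficient idea is a two-stage fit: estimate each tube's creep rate by OLS, then summarize the population of rates. This omits the shrinkage a true linear mixed model provides, and all data below are simulated, not Wolsong inspection data.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic inspection data: each tube has its own creep rate (random coefficient)
n_tubes, n_inspections = 30, 8
time = np.tile(np.linspace(1.0, 8.0, n_inspections), (n_tubes, 1))
slopes = rng.normal(0.10, 0.02, size=(n_tubes, 1))   # tube-specific creep rates
diam = 103.0 + slopes * time + rng.normal(0.0, 0.05, time.shape)

# Two-stage "random coefficient" estimate: per-tube OLS, then population summary
per_tube = np.array([np.polyfit(time[i], diam[i], 1) for i in range(n_tubes)])
mean_slope = per_tube[:, 0].mean()
slope_var = per_tube[:, 0].var(ddof=1)
print(mean_slope, slope_var)
```

The population mean and between-tube variance of the creep rate are exactly the quantities a mixed model estimates jointly with the within-tube noise, which is why it outperforms a model that ignores tube-to-tube variation.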
Predicting radiotherapy outcomes using statistical learning techniques
Energy Technology Data Exchange (ETDEWEB)
El Naqa, Issam; Bradley, Jeffrey D; Deasy, Joseph O [Washington University, Saint Louis, MO (United States); Lindsay, Patricia E; Hope, Andrew J [Department of Radiation Oncology, Princess Margaret Hospital, Toronto, ON (Canada)
2009-09-21
Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture the potential complexity of heterogeneous variable interactions and applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to unseen data. In this work, several types of linear and nonlinear kernels to generate interaction terms and approximate the treatment-response function are evaluated. Examples of institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used. Furthermore, an independent RTOG dataset was used for 'generalizability' validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principal component analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizability beyond institutional data in contrast with other models. This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among
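The kernel idea can be sketched with a Gaussian (RBF) kernel and kernel ridge regression as a simple stand-in for the modified SVM (this is not the authors' method); the "risk groups" below are synthetic and deliberately nonlinear, so no linear discriminant could separate them.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(7)
X = rng.uniform(-2, 2, size=(120, 2))
y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 2.0)   # circular decision boundary

# Kernel ridge "discriminant": alpha = (K + lam I)^-1 y, f(x) = K(x, X) alpha
lam, gamma = 1e-2, 1.0
K = rbf_kernel(X, X, gamma)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

pred = np.sign(K @ alpha)
accuracy = (pred == y).mean()
print(accuracy)
```

The kernel implicitly supplies the nonlinear interaction terms among variables, which is the property the abstract credits for the improvement over logistic regression on nonlinearly structured endpoints.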
Modeling cosmic void statistics
Hamaus, Nico; Sutter, P. M.; Wandelt, Benjamin D.
2016-10-01
Understanding the internal structure and spatial distribution of cosmic voids is crucial when considering them as probes of cosmology. We present recent advances in modeling void density- and velocity-profiles in real space, as well as void two-point statistics in redshift space, by examining voids identified via the watershed transform in state-of-the-art ΛCDM n-body simulations and mock galaxy catalogs. The simple and universal characteristics that emerge from these statistics indicate the self-similarity of large-scale structure and suggest cosmic voids to be among the most pristine objects to consider for future studies on the nature of dark energy, dark matter and modified gravity.
Predicting Success in Psychological Statistics Courses.
Lester, David
2016-06-01
Many students perform poorly in courses on psychological statistics, and it is useful to be able to predict which students will have difficulties. In a study of 93 undergraduates enrolled in Statistical Methods (18 men, 75 women; M age = 22.0 years, SD = 5.1), performance was significantly associated with sex (female students performed better) and proficiency in algebra in a linear regression analysis. Anxiety about statistics was not associated with course performance, indicating that basic mathematical skills are the best correlate of performance in statistics courses and can be used to stream students into classes by ability.
Græsbøll, Kaare; Kirkeby, Carsten; Nielsen, Søren Saxmose; Halasa, Tariq; Toft, Nils; Christiansen, Lasse Engbo
2017-01-01
The future value of an individual dairy cow depends greatly on its projected milk yield. In developed countries with developed dairy industry infrastructures, facilities exist to record individual cow production and reproduction outcomes consistently and accurately. Accurate prediction of the future value of a dairy cow requires further detailed knowledge of the costs associated with feed, management practices, production systems, and disease. Here, we present a method to predict the future value of the milk production of a dairy cow based on herd recording data only. The method consists of several steps to evaluate lifetime milk production and individual cow somatic cell counts and to finally predict the average production for each day that the cow is alive. Herd recording data from 610 Danish Holstein herds were used to train and test a model predicting milk production (including factors associated with milk yield, somatic cell count, and the survival of individual cows). All estimated parameters were either herd- or cow-specific. The model prediction deviated, on average, less than 0.5 kg from the future average milk production of dairy cows in multiple herds after adjusting for the effect of somatic cell count. We conclude that estimates of future average production can be used on a day-to-day basis to rank cows for culling, or can be implemented in simulation models of within-herd disease spread to make operational decisions, such as culling versus treatment. An advantage of the approach presented in this paper is that it requires no specific knowledge of disease status or any other information beyond herd recorded milk yields, somatic cell counts, and reproductive status. PMID:28261585
Song, Seung Yeob; Lee, Young Koung; Kim, In-Jung
2016-01-01
A high-throughput screening system for Citrus lines with higher sugar and acid contents was established using Fourier transform infrared (FT-IR) spectroscopy in combination with multivariate analysis. FT-IR spectra confirmed typical spectral differences in the frequency regions of 950-1100 cm(-1), 1300-1500 cm(-1), and 1500-1700 cm(-1). Principal component analysis (PCA) and subsequent partial least squares discriminant analysis (PLS-DA) discriminated five Citrus lines into three separate clusters corresponding to their taxonomic relationships. Quantitative predictive modeling of the sugar and acid contents of Citrus fruits was established using partial least squares regression algorithms on the FT-IR spectra. The regression coefficients (R(2)) between predicted and estimated sugar and acid content values were 0.99. These results demonstrate that, by applying quantitative prediction modeling to FT-IR spectra of Citrus sugar and acid contents, superior Citrus lines can be detected early and with greater accuracy.
Directory of Open Access Journals (Sweden)
Mónica F. Díaz
2012-12-01
Full Text Available Volatile organic compounds (VOCs) are contained in a variety of chemicals that can be found in household products and may have undesirable effects on health. It is therefore important to model blood-to-liver partition coefficients (log Pliver) for VOCs in a fast and inexpensive way. In this paper, we present two new quantitative structure-property relationship (QSPR) models for the prediction of log Pliver, for which we also propose a hybrid approach to descriptor selection. This hybrid methodology combines a machine learning method with manual selection based on expert knowledge, yielding a set of descriptors that is interpretable in physicochemical terms. Our regression models were trained using decision trees and neural networks and validated on an external test set. Results show high prediction accuracy compared to previous log Pliver models, and the descriptor selection approach provides a small set of descriptors that agrees with theoretical understanding of the target property.
Learning Predictive Statistics: Strategies and Brain Mechanisms.
Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe
2017-08-30
When immersed in a new environment, we are challenged to decipher initially incomprehensible streams of sensory information. However, quite rapidly, the brain finds structure and meaning in these incoming signals, helping us to predict and prepare ourselves for future actions. This skill relies on extracting the statistics of event streams in the environment that contain regularities of variable complexity, from simple repetitive patterns to complex probabilistic combinations. Here, we test the brain mechanisms that mediate our ability to adapt to the environment's statistics and predict upcoming events. By combining behavioral training and multisession fMRI in human participants (male and female), we track the corticostriatal mechanisms that mediate learning of temporal sequences as they change in structure complexity. We show that learning of predictive structures relates to individual decision strategy; that is, selecting the most probable outcome in a given context (maximizing) versus matching the exact sequence statistics. These strategies engage distinct human brain regions: maximizing engages dorsolateral prefrontal, cingulate, sensory-motor regions, and basal ganglia (dorsal caudate, putamen), whereas matching engages occipitotemporal regions (including the hippocampus) and basal ganglia (ventral caudate). Our findings provide evidence for distinct corticostriatal mechanisms that facilitate our ability to extract behaviorally relevant statistics to make predictions. SIGNIFICANCE STATEMENT: Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. Past work has studied how humans identify repetitive patterns and associative pairings. However, the natural environment contains regularities that vary in complexity from simple repetition to complex probabilistic combinations. Here, we combine behavior and multisession fMRI to track the brain mechanisms that mediate our ability to adapt to the environment's statistics.
Statistical basis for predicting technological progress.
Directory of Open Access Journals (Sweden)
Béla Nagy
Full Text Available Forecasting technological progress is of great interest to engineers, policy makers, and private investors. Several models have been proposed for predicting technological improvement, but how well do these models perform? An early hypothesis made by Theodore Wright in 1936 is that cost decreases as a power law of cumulative production. An alternative hypothesis is Moore's law, which can be generalized to say that technologies improve exponentially with time. Other alternatives were proposed by Goddard, Sinclair et al., and Nordhaus. These hypotheses have not previously been rigorously tested. Using a new database on the cost and production of 62 different technologies, which is the most expansive of its kind, we test the ability of six different postulated laws to predict future costs. Our approach involves hindcasting and developing a statistical model to rank the performance of the postulated laws. Wright's law produces the best forecasts, but Moore's law is not far behind. We discover a previously unobserved regularity that production tends to increase exponentially. A combination of an exponential decrease in cost and an exponential increase in production would make Moore's law and Wright's law indistinguishable, as originally pointed out by Sahal. We show for the first time that these regularities are observed in data to such a degree that the performance of these two laws is nearly the same. Our results show that technological progress is forecastable, with the square root of the logarithmic error growing linearly with the forecasting horizon at a typical rate of 2.5% per year. These results have implications for theories of technological change, and assessments of candidate technologies and policies for climate change mitigation.
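Wright's law is a power law of cumulative production, so its learning exponent can be recovered by a least-squares fit in log-log space. A minimal sketch on synthetic data (all constants below are illustrative, not values from the paper's 62-technology database):

```python
import numpy as np

# Synthetic technology history: cumulative production grows exponentially
# and unit cost follows Wright's law, c = c0 * x**(-alpha), with noise.
rng = np.random.default_rng(0)
years = np.arange(1970, 2010)
x = 100 * np.exp(0.08 * (years - years[0]))   # cumulative production (assumed growth)
alpha_true = 0.35                             # learning exponent (assumed)
cost = 50 * x**(-alpha_true) * np.exp(rng.normal(0, 0.05, x.size))

# Wright's law is linear in log-log space: log c = log c0 - alpha * log x,
# so the exponent is recovered with an ordinary least-squares fit.
slope, intercept = np.polyfit(np.log(x), np.log(cost), 1)
alpha_hat = -slope
```

Because production here grows exponentially in time, the fitted power law in cumulative production also implies an exponential cost decline in time, which is exactly the Wright/Moore indistinguishability the abstract attributes to Sahal.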
Snell, Kym Ie; Ensor, Joie; Debray, Thomas Pa; Moons, Karel Gm; Riley, Richard D
2017-01-01
If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
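The recommended transformation can be sketched as follows, with hypothetical C-statistics from five clusters. A full random-effects meta-analysis would additionally weight studies by their within-study variances and estimate between-study heterogeneity; this sketch only shows pooling on the logit scale and back-transforming:

```python
import math

# Hypothetical validation C-statistics from five clusters (assumed values).
c_stats = [0.72, 0.68, 0.75, 0.80, 0.70]

def logit(p):
    """Map a probability-scale statistic to the unbounded logit scale."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Back-transform a pooled logit value to the probability scale."""
    return 1 / (1 + math.exp(-x))

# Pool on the logit scale, where the between-study distribution is closer
# to normal, then back-transform the summary estimate.
pooled_logit = sum(logit(c) for c in c_stats) / len(c_stats)
pooled_c = inv_logit(pooled_logit)
```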
Dėdelė, Audrius; Miškinytė, Auksė
2015-09-01
In many countries, road traffic is one of the main sources of air pollution associated with adverse effects on human health and environment. Nitrogen dioxide (NO2) is considered to be a measure of traffic-related air pollution, with concentrations tending to be higher near highways, along busy roads, and in the city centers, and the exceedances are mainly observed at measurement stations located close to traffic. In order to assess the air quality in the city and the air pollution impact on public health, air quality models are used. However, firstly, before the model can be used for these purposes, it is important to evaluate the accuracy of the dispersion modelling as one of the most widely used method. The monitoring and dispersion modelling are two components of air quality monitoring system (AQMS), in which statistical comparison was made in this research. The evaluation of the Atmospheric Dispersion Modelling System (ADMS-Urban) was made by comparing monthly modelled NO2 concentrations with the data of continuous air quality monitoring stations in Kaunas city. The statistical measures of model performance were calculated for annual and monthly concentrations of NO2 for each monitoring station site. The spatial analysis was made using geographic information systems (GIS). The calculation of statistical parameters indicated a good ADMS-Urban model performance for the prediction of NO2. The results of this study showed that the agreement of modelled values and observations was better for traffic monitoring stations compared to the background and residential stations.
Predicting Statistical Distributions of Footbridge Vibrations
DEFF Research Database (Denmark)
Pedersen, Lars; Frier, Christian
2009-01-01
The paper considers vibration response of footbridges to pedestrian loading. Employing Newmark and Monte Carlo simulation methods, a statistical distribution of bridge vibration levels is calculated, modelling walking parameters such as step frequency and stride length as random variables.
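The approach can be illustrated with a minimal Monte Carlo sketch: a single bridge mode excited by the first harmonic of walking, with step frequency and force amplitude drawn as random variables. All parameter values are illustrative and are not taken from the bridge studied in the paper:

```python
import numpy as np

# Monte Carlo sketch of footbridge vibration levels (illustrative values).
rng = np.random.default_rng(1)
n = 20000
f_n, zeta, m = 2.0, 0.005, 40e3        # bridge mode: frequency [Hz], damping, modal mass [kg]
f_s = rng.normal(1.9, 0.1, n)          # random pedestrian step frequency [Hz]
F0 = 0.4 * rng.normal(750, 100, n)     # harmonic force amplitude [N], DLF 0.4 assumed

# Steady-state acceleration amplitude of a damped SDOF oscillator under
# harmonic forcing at frequency f_s: acc = (F0/m) * beta^2 * H(beta).
beta = f_s / f_n
H = 1.0 / np.sqrt((1 - beta**2) ** 2 + (2 * zeta * beta) ** 2)
acc = (F0 / m) * H * beta**2           # acceleration amplitude [m/s^2]

# The sample of responses yields a statistical distribution of vibration levels.
p50, p95 = np.percentile(acc, [50, 95])
```

The point of the exercise is the output distribution itself: instead of a single deterministic response, randomized walking parameters produce percentiles of vibration level that can be compared against comfort criteria.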
Energy Technology Data Exchange (ETDEWEB)
Marzouk, Youssef [Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
2016-08-31
Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computational expense. This project intends to make rigorous predictive modeling feasible in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale sequential data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.
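As a baseline for the kind of computation such a project seeks to accelerate, a textbook random-walk Metropolis sampler for a toy one-parameter posterior looks like the following. The model and all values are illustrative only; the project's contribution is making this class of inference scalable, not this sampler itself:

```python
import numpy as np

# Random-walk Metropolis for the posterior of a normal mean.
# Prior: mu ~ N(0, 10^2); likelihood: y_i ~ N(mu, 1). (Toy model, assumed.)
rng = np.random.default_rng(2)
y = rng.normal(3.0, 1.0, 50)           # synthetic "observed" data

def log_post(mu):
    """Unnormalized log posterior: log prior + log likelihood."""
    return -0.5 * (mu / 10.0) ** 2 - 0.5 * np.sum((y - mu) ** 2)

mu, samples = 0.0, []
lp = log_post(mu)
for _ in range(20000):
    prop = mu + 0.5 * rng.normal()     # symmetric random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
        mu, lp = prop, lp_prop
    samples.append(mu)

post_mean = float(np.mean(samples[2000:]))     # discard burn-in
```

Each posterior evaluation here is trivially cheap; in the physical systems the abstract describes, every `log_post` call hides an expensive forward simulation, which is why surrogate and dimension-reduction methods matter.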
Breinholt, Anders; Møller, Jan Kloppenborg; Madsen, Henrik; Mikkelsen, Peter Steen
2012-11-01
While there seems to be consensus that hydrological model outputs should be accompanied by an uncertainty estimate, the appropriate method for uncertainty estimation is not agreed upon, and a debate is ongoing between advocates of formal statistical methods, who consider errors as stochastic, and GLUE advocates, who consider errors as epistemic, arguing that the basis of formal statistical approaches, which requires the residuals to be stationary and conform to a statistical distribution, is unrealistic. In this paper we take a formal frequentist approach to parameter estimation and uncertainty evaluation of the modelled output, and we attach particular importance to inspecting the residuals of the model outputs and improving the model uncertainty description. We also introduce the probabilistic performance measures sharpness, reliability, and interval skill score for model comparison and for checking the reliability of the confidence bounds. Using point rainfall and evaporation data as input and flow measurements from a sewer system for model conditioning, a state space model is formulated that accounts for three different flow contributions: wastewater from households, fast rainfall-runoff from paved areas, and slow rainfall-dependent infiltration-inflow from unknown sources. We consider two different approaches to evaluate the model output uncertainty: the output error method, which lumps all uncertainty into the observation noise term, and a method based on Stochastic Differential Equations (SDEs) that separates input and model structure uncertainty from observation uncertainty and allows updating of model states in real time. The results show that the optimal simulation (off-line) model is based on the output error method, whereas the optimal prediction (on-line) model is based on the SDE method, and the skill scoring criterion proved that significant predictive improvements of the output can be gained from updating the states continuously. In an effort to attain residual stationarity, transformations of the observations were applied for both the output error method and the SDE method.
Institute of Scientific and Technical Information of China (English)
ZHU Congwen; Chung-Kyu PARK; Woo-Sung LEE; Won-Tae YUN
2008-01-01
The 21-yr ensemble predictions of model precipitation and circulation in the East Asian and western North Pacific (Asia-Pacific) summer monsoon region (0°-50°N, 100°-150°E) were evaluated for nine different AGCMs used in the Asia-Pacific Economic Cooperation Climate Center (APCC) multi-model ensemble seasonal prediction system. The analysis indicates that the precipitation anomaly patterns of the model ensemble predictions are substantially different from the observed counterparts in this region, but the summer monsoon circulations are reasonably predicted. For example, all models reproduce the interannual variability of the western North Pacific monsoon index (WNPMI) defined by 850 hPa winds, but they fail to predict the relationship between the WNPMI and precipitation anomalies. The interannual variability of the 500 hPa geopotential height (GPH) is well predicted by the models, in contrast to the precipitation anomalies. On the basis of these model performances and the relationship between the interannual variations of 500 hPa GPH and precipitation anomalies, we developed a statistical scheme to downscale the summer monsoon precipitation anomaly using EOF analysis and singular value decomposition (SVD). In this scheme, the three leading EOF modes of the 500 hPa GPH anomaly fields predicted by the models are first corrected by linear regression between the principal components of each model and the observations, respectively. The corrected model GPH is then used as the predictor to downscale the precipitation anomaly field, which is assembled from the forecast expansion coefficients of the model 500 hPa GPH and the three leading SVD modes of the observed precipitation anomaly corresponding to the model 500 hPa GPH prediction during a 19-year training period. The cross-validated forecasts suggest that this downscaling scheme has the potential to improve the forecast skill for the precipitation anomaly in the South China Sea, the western North Pacific, and the East Asia-Pacific regions.
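The coupled-mode idea behind such a downscaling scheme can be sketched with synthetic fields. Everything below is hypothetical (dimensions, data, and the number of modes); the scheme's per-model regression correction and cross-validation steps are omitted:

```python
import numpy as np

# Predictor X: model 500 hPa GPH anomalies (time x grid points).
# Predictand Y: observed precipitation anomalies (time x grid points).
rng = np.random.default_rng(3)
nt, nx, ny = 19, 40, 30                  # 19-yr training period, toy grid sizes
signal = rng.normal(size=(nt, 3))        # three shared modes of variability
X = signal @ rng.normal(size=(3, nx)) + 0.1 * rng.normal(size=(nt, nx))
Y = signal @ rng.normal(size=(3, ny)) + 0.1 * rng.normal(size=(nt, ny))
X -= X.mean(0)
Y -= Y.mean(0)

# SVD of the cross-covariance matrix gives pairs of coupled spatial modes.
U, s, Vt = np.linalg.svd(X.T @ Y / (nt - 1), full_matrices=False)
k = 3
a = X @ U[:, :k]                         # expansion coefficients of the predictor
b = Y @ Vt[:k].T                         # expansion coefficients of the predictand
coef = (a * b).sum(0) / (a * a).sum(0)   # regress b on a, mode by mode
Y_hat = (a * coef) @ Vt[:k]              # downscaled precipitation anomaly field

corr = np.corrcoef(Y_hat.ravel(), Y.ravel())[0, 1]
```

In the paper's scheme the predictor coefficients come from bias-corrected model GPH forecasts rather than from observed fields, but the assembly of the prediction from SVD expansion coefficients follows the same pattern.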
Directory of Open Access Journals (Sweden)
Kathleen M O'Reilly
2011-10-01
Full Text Available Outbreaks of poliomyelitis in African countries that were previously free of wild-type poliovirus cost the Global Polio Eradication Initiative US$850 million during 2003-2009, and have limited the ability of the program to focus on endemic countries. A quantitative understanding of the factors that predict the distribution and timing of outbreaks will enable their prevention and facilitate the completion of global eradication. Children with poliomyelitis in Africa from 1 January 2003 to 31 December 2010 were identified through routine surveillance of cases of acute flaccid paralysis, and separate outbreaks associated with importation of wild-type poliovirus were defined using the genetic relatedness of these viruses in the VP1/2A region. Potential explanatory variables were examined for their association with the number, size, and duration of poliomyelitis outbreaks in 6-mo periods using multivariable regression analysis. The predictive ability of 6-mo-ahead forecasts of poliomyelitis outbreaks in each country based on the regression model was assessed. A total of 142 genetically distinct outbreaks of poliomyelitis were recorded in 25 African countries, resulting in 1-228 cases (median of two cases). The estimated number of people arriving from infected countries and <5-y childhood mortality were independently associated with the number of outbreaks. Immunisation coverage based on the reported vaccination history of children with non-polio acute flaccid paralysis was associated with the duration and size of each outbreak, as well as the number of outbreaks. Six-month-ahead forecasts of the number of outbreaks in a country or region changed over time and had a predictive ability of 82%. Outbreaks of poliomyelitis resulted primarily from continued transmission in Nigeria and the poor immunisation status of populations in neighbouring countries. From 1 January 2010 to 30 June 2011, reduced transmission in Nigeria and increased incidence in reinfected
Algebraic Statistics for Network Models
2014-02-19
AFRL-OSR-VA-TR-2014-0070, Final Report, 02/19/2014. DARPA GRAPHS Phase I, grant FA9550-12-1-0392. PI: Sonja Petrović (petrovic@psu.edu), Department of Statistics, Pennsylvania State University, with collaborators in the Department of Statistics, Heinz College, Machine Learning Department, and CyLab at Carnegie Mellon University. Abstract: This project focused on the family of
Statistical Modelling of Wind Profiles - Data Analysis and Modelling
DEFF Research Database (Denmark)
Jónsson, Tryggvi; Pinson, Pierre
The aim of the analysis presented in this document is to investigate whether statistical models can be used to make very short-term predictions of wind profiles.
Wildfire Prediction to Inform Fire Management: Statistical Science Challenges
Taylor, S. W.; Douglas G. Woolford; Dean, C. B.; Martell, David L.
2013-01-01
Wildfire is an important system process of the earth that occurs across a wide range of spatial and temporal scales. A variety of methods have been used to predict wildfire phenomena during the past century to better our understanding of fire processes and to inform fire and land management decision-making. Statistical methods have an important role in wildfire prediction due to the inherent stochastic nature of fire phenomena at all scales. Predictive models have exploited several so...
Methods of statistical model estimation
Hilbe, Joseph
2013-01-01
Methods of Statistical Model Estimation examines the most important and popular methods used to estimate parameters for statistical models and provide informative model summary statistics. Designed for R users, the book is also ideal for anyone wanting to better understand the algorithms used for statistical model fitting. The text presents algorithms for the estimation of a variety of regression procedures using maximum likelihood estimation, iteratively reweighted least squares regression, the EM algorithm, and MCMC sampling. Fully developed, working R code is constructed for each method.
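One of the algorithms the book develops, iteratively reweighted least squares (IRLS) for logistic regression, can be sketched as follows. The book's implementations are in R; this sketch uses Python/NumPy on synthetic data:

```python
import numpy as np

# IRLS (equivalently, Newton-Raphson) for logistic regression on toy data.
rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
beta_true = np.array([-0.5, 1.5])                       # assumed coefficients
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

beta = np.zeros(2)
for _ in range(25):                       # IRLS iterations
    p = 1 / (1 + np.exp(-X @ beta))       # fitted probabilities
    W = p * (1 - p)                       # weights from the binomial variance
    z = X @ beta + (y - p) / W            # working response
    # Weighted least-squares step: solve (X'WX) beta = X'Wz.
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

# At the MLE the score (gradient of the log likelihood) vanishes.
grad = X.T @ (y - 1 / (1 + np.exp(-X @ beta)))
```

Each iteration is an ordinary weighted least-squares solve, which is exactly why IRLS fits naturally into the generalized linear model machinery the book describes.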
Farmer, William H.; Knight, Rodney R.; Eash, David A.; Kasey J. Hutchinson,; Linhart, S. Mike; Christiansen, Daniel E.; Archfield, Stacey A.; Over, Thomas M.; Kiang, Julie E.
2015-08-24
Daily records of streamflow are essential to understanding hydrologic systems and managing the interactions between human and natural systems. Many watersheds and locations lack streamgages to provide accurate and reliable records of daily streamflow. In such ungaged watersheds, statistical tools and rainfall-runoff models are used to estimate daily streamflow. Previous work compared 19 different techniques for predicting daily streamflow records in the southeastern United States. Here, five of the better-performing methods are compared in a different hydroclimatic region of the United States, in Iowa. The methods fall into three classes: (1) drainage-area ratio methods, (2) nonlinear spatial interpolations using flow duration curves, and (3) mechanistic rainfall-runoff models. The first two classes are each applied with nearest-neighbor and map-correlated index streamgages. Using a threefold validation and robust rank-based evaluation, the methods are assessed for overall goodness of fit of the hydrograph of daily streamflow, the ability to reproduce a daily, no-fail storage-yield curve, and the ability to reproduce key streamflow statistics. As in the Southeast study, a nonlinear spatial interpolation of daily streamflow using flow duration curves is found to be a method with the best predictive accuracy. Comparisons with previous work in Iowa show that the accuracy of mechanistic models with at-site calibration is substantially degraded in the ungaged framework.
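The simplest of the three classes, the drainage-area ratio method, scales flow at an index streamgage by the ratio of drainage areas. A minimal sketch with hypothetical values (the exponent is 1 in the classical method; regional studies sometimes fit a different exponent):

```python
# Drainage-area ratio estimate of daily streamflow at an ungaged site.
def area_ratio_flow(q_index, area_ungaged, area_index, exponent=1.0):
    """Scale daily flows at an index gage by the drainage-area ratio.

    q_index: daily flows at the index streamgage (same units returned).
    """
    return [q * (area_ungaged / area_index) ** exponent for q in q_index]

# Hypothetical example: index gage drains 500 km^2, ungaged site 250 km^2.
q_index = [12.0, 30.0, 8.0]              # daily flows at the index gage, m^3/s
q_est = area_ratio_flow(q_index, 250.0, 500.0)
```

The study's better-performing alternative, nonlinear spatial interpolation via flow duration curves, replaces this single multiplicative scaling with a mapping between exceedance probabilities at the index and ungaged sites.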
Directory of Open Access Journals (Sweden)
Gambo Anthony VICTOR
2017-06-01
Full Text Available A model to predict the dry sliding wear behaviour of aluminium-jute bast ash particulate composites produced by the double stir-casting method was developed in terms of the weight fraction of jute bast ash (JBA). Experiments were designed on the basis of the Design of Experiments (DOE) technique. A 2^k factorial design, where k is the number of variables, with a central composite second-order rotatable design was used to improve the reliability of results and to reduce the size of the experimentation without loss of accuracy. The factors considered in this study were sliding velocity, sliding distance, normal load, and mass fraction of JBA reinforcement in the matrix. The developed regression model was validated using the statistical software MINITAB-R14 and statistical tools such as analysis of variance (ANOVA). It was found that the developed regression model could be effectively used to predict the wear rate at the 95% confidence level. The wear rate of the cast Al-JBAp composite decreased with an increase in the mass fraction of JBA and increased with increases in the sliding velocity, sliding distance, and normal load acting on the composite specimen.
LP Approach to Statistical Modeling
Mukhopadhyay, Subhadeep; Parzen, Emanuel
2014-01-01
We present an approach to statistical data modeling and exploratory data analysis called `LP Statistical Data Science.' It aims to generalize and unify traditional and novel statistical measures, methods, and exploratory tools. This article outlines fundamental concepts along with real-data examples to illustrate how the `LP Statistical Algorithm' can systematically tackle different varieties of data types, data patterns, and data structures under a coherent theoretical framework. A fundament...
Statistical models for trisomic phenotypes
Energy Technology Data Exchange (ETDEWEB)
Lamb, N.E.; Sherman, S.L.; Feingold, E. [Emory Univ., Atlanta, GA (United States)
1996-01-01
Certain genetic disorders are rare in the general population but more common in individuals with specific trisomies, which suggests that the genes involved in the etiology of these disorders may be located on the trisomic chromosome. As with all aneuploid syndromes, however, a considerable degree of variation exists within each phenotype so that any given trait is present only among a subset of the trisomic population. We have previously presented a simple gene-dosage model to explain this phenotypic variation and developed a strategy to map genes for such traits. The mapping strategy does not depend on the simple model but works in theory under any model that predicts that affected individuals have an increased likelihood of disomic homozygosity at the trait locus. This paper explores the robustness of our mapping method by investigating what kinds of models give an expected increase in disomic homozygosity. We describe a number of basic statistical models for trisomic phenotypes. Some of these are logical extensions of standard models for disomic phenotypes, and some are more specific to trisomy. Where possible, we discuss genetic mechanisms applicable to each model. We investigate which models and which parameter values give an expected increase in disomic homozygosity in individuals with the trait. Finally, we determine the sample sizes required to identify the increased disomic homozygosity under each model. Most of the models we explore yield detectable increases in disomic homozygosity for some reasonable range of parameter values, usually corresponding to smaller trait frequencies. It therefore appears that our mapping method should be effective for a wide variety of moderately infrequent traits, even though the exact mode of inheritance is unlikely to be known. 21 refs., 8 figs., 1 tab.
A prediction model of the optimum statistical unit of relief
Institute of Scientific and Technical Information of China (English)
张锦明; 游雄
2013-01-01
DEM data from 78 experimental areas randomly selected within China were used to compute terrain relief over a series of analysis-region scales, and a prediction model of the optimum analysis region for relief was established based on micro-topographic feature factors. The experimental results show that relief extracted from DEM data of the same area at different scales differs to some extent; the smaller the difference in DEM scale, the smaller the difference in relief. Strong correlations exist between relief and terrain feature factors of the experimental area, such as the mean elevation, elevation range, mean slope, and mean rate of slope change, and the prediction model was built on this basis. At a confidence level of 0.05, the accuracy of the model's fitted parameters exceeds 95%, demonstrating that the prediction model can effectively determine the range of the optimum analysis region (statistical unit) for relief.
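The underlying quantity is simple: relief is the elevation range within a square analysis window, and its value depends on the window size, which is why an optimum statistical unit must be chosen. A sketch on toy synthetic terrain (not the paper's DEM data, and using a brute-force window scan rather than a production raster routine):

```python
import numpy as np

def mean_relief(dem, w):
    """Mean relief (max minus min elevation) over all w-by-w windows of `dem`."""
    rows, cols = dem.shape
    vals = []
    for i in range(rows - w + 1):
        for j in range(cols - w + 1):
            block = dem[i:i + w, j:j + w]
            vals.append(block.max() - block.min())
    return float(np.mean(vals))

# Synthetic spatially correlated "terrain": cumulative sums of random steps.
rng = np.random.default_rng(5)
dem = np.cumsum(rng.normal(size=(60, 60)), axis=0)

r3, r9, r15 = mean_relief(dem, 3), mean_relief(dem, 9), mean_relief(dem, 15)
```

Relief grows with the analysis window, so comparing relief values is only meaningful at a fixed, well-chosen window size; the paper's model predicts that size from terrain feature factors.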
Energy Technology Data Exchange (ETDEWEB)
Tran, A; Yu, V; Nguyen, D; Woods, K; Low, D; Sheng, K [UCLA, Los Angeles, CA (United States)
2015-06-15
Purpose: Knowledge learned from previous plans can be used to guide future treatment planning. Existing knowledge-based treatment planning methods study the correlation between organ geometry and the dose volume histogram (DVH), which is a lossy representation of the complete dose distribution. A statistical voxel dose learning (SVDL) model was developed that includes the complete dose volume information, and its accuracy in predicting volumetric-modulated arc therapy (VMAT) and non-coplanar 4π radiotherapy doses was quantified. SVDL provided more isotropic dose gradients and may improve knowledge-based planning. Methods: 12 prostate SBRT patients originally treated using two full-arc VMAT techniques were re-planned with 4π using 20 intensity-modulated non-coplanar fields to a prescription dose of 40 Gy. The bladder and rectum voxels were binned based on their distances to the PTV. The dose distribution in each bin was resampled by convolving with a Gaussian kernel, resulting in 1000 data points in each bin that predict the statistical dose information of a voxel with unknown dose in a new patient, without triaging information that may be collectively important to a particular patient. We used this method to predict the DVHs and mean and max doses in a leave-one-out cross-validation (LOOCV) test and compared its performance against lossy estimators, including the mean, median, mode, Poisson, and Rayleigh estimates of the voxelized dose distributions. Results: SVDL predicted the bladder and rectum doses more accurately than the other estimators, giving mean percentile errors ranging from 13.35-19.46%, 4.81-19.47%, 22.49-28.69%, 23.35-30.5%, and 21.05-53.93% for predicting the mean dose, max dose, V20, V35, and V40, respectively, of the OARs in both planning techniques. The prediction errors were generally lower for 4π than for VMAT. Conclusion: By employing all dose volume information in the SVDL model, the OAR doses were more accurately predicted. 4π plans are better suited for knowledge-based planning than
Statistical modeling for degradation data
Lio, Yuhlong; Ng, Hon; Tsai, Tzong-Ru
2017-01-01
This book focuses on the statistical aspects of the analysis of degradation data. In recent years, degradation data analysis has come to play an increasingly important role in different disciplines such as reliability, public health sciences, and finance. For example, information on products’ reliability can be obtained by analyzing degradation data. In addition, statistical modeling and inference techniques have been developed on the basis of different degradation measures. The book brings together experts engaged in statistical modeling and inference, presenting and discussing important recent advances in degradation data analysis and related applications. The topics covered are timely and have considerable potential to impact both statistics and reliability engineering.
Directory of Open Access Journals (Sweden)
Lapierre FabianD
2010-01-01
Full Text Available Abstract Maritime vessels longer than 45 meters are required to operate an Automatic Identification System (AIS) used by vessel traffic services. However, when a boat shuts down its AIS, there are no means to detect it in open sea. In this paper, we use Electro-Optical (EO) imagers for noncooperative vessel detection when the AIS is not operational. Compared to radar sensors, EO sensors have lower cost, lower payload, and lower computational processing load. The EO sensors are mounted on LEO microsatellites. We propose a real-time statistical methodology to estimate sensor Receiver Operating Characteristic (ROC) curves. It does not require the computation of the entire image received at the sensor. We then illustrate the use of this methodology to design a simple simulator that can help sensor manufacturers in optimizing the design of EO sensors for maritime applications.
Statistical modelling with quantile functions
Gilchrist, Warren
2000-01-01
Galton used quantiles more than a hundred years ago in describing data. Tukey and Parzen used them in the 60s and 70s in describing populations. Since then, the authors of many papers, both theoretical and practical, have used various aspects of quantiles in their work. Until now, however, no one had put all the ideas together to form what turns out to be a general approach to statistics. Statistical Modelling with Quantile Functions does just that. It systematically examines the entire process of statistical modelling, starting with using the quantile function to define continuous distributions. The author shows that by using this approach, it becomes possible to develop complex distributional models from simple components. A modelling kit can be developed that applies to the whole model - deterministic and stochastic components - and this kit operates by adding, multiplying, and transforming distributions rather than data. Statistical Modelling with Quantile Functions adds a new dimension to the practice of stati...
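A minimal sketch of the quantile-function modelling idea: simple component quantile functions are combined by adding and scaling, and the resulting Q(p) directly yields both summary quantiles and samples. The particular components and parameter values are assumptions for illustration, not taken from the book.

```python
import numpy as np

# Component quantile functions, defined on p in (0, 1).
def Q_uniform(p):
    return p                      # U(0, 1)

def Q_exponential(p):
    return -np.log(1 - p)         # Exp(1), contributing a long right tail

# Build a model by adding and scaling quantile functions
# (position + scale * shape mixture); parameter values are arbitrary.
def Q_model(p, position=2.0, scale=1.5, shape=0.5):
    return position + scale * ((1 - shape) * Q_uniform(p)
                               + shape * Q_exponential(p))

# The median needs no numerics: it is just Q(0.5) ...
median = Q_model(0.5)

# ... and sampling is immediate: push uniforms through the quantile function.
rng = np.random.default_rng(1)
sample = Q_model(rng.uniform(size=100_000))
```

Working with Q(p) rather than the density is what lets the "modelling kit" operate on distributions (add, scale, transform) instead of on data.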
Wildfire Prediction to Inform Fire Management: Statistical Science Challenges
Taylor, S W; Dean, C B; Martell, David L
2013-01-01
Wildfire is an important system process of the earth that occurs across a wide range of spatial and temporal scales. A variety of methods have been used to predict wildfire phenomena during the past century to better our understanding of fire processes and to inform fire and land management decision-making. Statistical methods have an important role in wildfire prediction due to the inherent stochastic nature of fire phenomena at all scales. Predictive models have exploited several sources of data describing fire phenomena. Experimental data are scarce; observational data are dominated by statistics compiled by government fire management agencies, primarily for administrative purposes and increasingly from remote sensing observations. Fires are rare events at many scales. The data describing fire phenomena can be zero-heavy and nonstationary over both space and time. Users of fire modeling methodologies are mainly fire management agencies often working under great time constraints, thus, complex models have t...
Sensometrics: Thurstonian and Statistical Models
DEFF Research Database (Denmark)
Christensen, Rune Haubo Bojesen
of human senses. Thurstonian models provide a stochastic model for the data-generating mechanism through a psychophysical model for the cognitive processes and in addition provides an independent measure for quantification of sensory differences. In the interest of cost-reduction and health...... of generalized linear mixed models, cumulative link models and cumulative link mixed models. The relation between the Wald, likelihood and score statistics is expanded upon using the shape of the (profile) likelihood function as common reference....
Lepilleur, Carole; Mullay, John; Kyer, Carol; McCalister, Pam; Clifford, Ted
2011-01-01
Formulation composition has a dramatic influence on coacervate formation in conditioning shampoo. The purpose of this study is to correlate the amount of coacervate formed by novel cationic cassia polymers with the corresponding conditioning profiles on European brown hair using silicone deposition, cationic polymer deposition, and sensory evaluation. A design of experiments was conducted by varying the levels of three surfactants (sodium lauryl ether sulfate, sodium lauryl sulfate, and cocamidopropyl betaine) in formulations containing cationic cassia polymers of different cationic charge density (1.7 and 3.0 mEq/g). The results show that formulation composition dramatically affects physical properties, coacervation, silicone deposition, cationic polymer deposition, and hair sensory attributes. In particular, three parameters are of importance in determining silicone deposition: polymer charge, surfactant (micelle) charge, and total amount of surfactant (micelle aspect ratio). Both sensory panel testing and silicone deposition results can be predicted with a high confidence level using statistical models that incorporate these parameters.
McKee, Richard H; Nicolich, Mark; Roy, Timothy; White, Russell; Daughtrey, Wayne C
2014-01-01
Petroleum (commonly called crude oil) is a complex substance primarily composed of hydrocarbon constituents. Based on the results of previous toxicological studies as well as occupational experience, the principal acute toxicological hazards are those associated with exposure by inhalation to volatile hydrocarbon constituents and hydrogen sulfide, and chronic hazards are associated with inhalation exposure to benzene and dermal exposure to polycyclic aromatic compounds. The current assessment was an attempt to characterize the potential for repeated dose and/or developmental effects of crude oils following dermal exposures and to generalize the conclusions across a broad range of crude oils from different sources. Statistical models were used to predict the potential for repeated dose and developmental toxicity from compositional information. The model predictions indicated that the empirical data from previously tested crude oils approximated a "worst case" situation, and that the data from previously tested crude oils could be used as a reasonable basis for characterizing the repeated dose and developmental toxicological hazards of crude oils in general.
Comparison of prediction performance using statistical postprocessing methods
Han, Keunhee; Choi, JunTae; Kim, Chansoo
2016-11-01
As the 2018 Winter Olympics are to be held in Pyeongchang, both general weather information on Pyeongchang and specific weather information on this region, which can affect game operation and athletic performance, are required. An ensemble prediction system has been applied to provide more accurate weather information, but it has bias and dispersion errors due to the limitations and uncertainty of its model. In this study, homogeneous and nonhomogeneous regression models as well as Bayesian model averaging (BMA) were used to reduce the bias and dispersion existing in the ensemble prediction and to provide probabilistic forecasts. Prior to applying the prediction methods, the reliability of the ensemble forecasts was tested by using a rank histogram and a residual quantile-quantile plot to compare the ensemble forecasts with the corresponding verifications. The ensemble forecasts had a consistent positive bias, indicating over-forecasting, and were under-dispersed. To correct such biases, statistical post-processing methods were applied using fixed and sliding windows. The prediction skills of the methods were compared by using the mean absolute error, root mean square error, continuous ranked probability score, and continuous ranked probability skill score. Under the fixed window, BMA exhibited better prediction skill than the other methods at most observation stations. Under the sliding window, on the other hand, homogeneous and nonhomogeneous regression models with positive regression coefficients exhibited better prediction skill than BMA. In particular, the homogeneous regression model with positive regression coefficients exhibited the best prediction skill.
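A toy version of the sliding-window post-processing can be sketched as below: a biased, under-dispersed synthetic ensemble is corrected by homogeneous regression on a moving training window, and the gain is measured with the closed-form Gaussian CRPS. All data and window sizes are assumptions; the study's actual BMA and nonhomogeneous variants are not reproduced here.

```python
import numpy as np
from math import erf, exp, pi, sqrt

rng = np.random.default_rng(2)

# Synthetic setup: a 20-member ensemble with a warm bias (over-forecasting)
# and too little spread, mimicking the behavior reported in the abstract.
T, members, window = 300, 20, 60
truth = 10 + 5 * np.sin(np.arange(T) / 20)
obs = truth + rng.normal(0, 1, T)
ens = truth[:, None] + 2.0 + rng.normal(0, 0.4, (T, members))

def crps_gaussian(mu, sd, y):
    """Closed-form CRPS of a N(mu, sd^2) forecast against observation y."""
    z = (y - mu) / sd
    pdf = exp(-z * z / 2) / sqrt(2 * pi)
    cdf = 0.5 * (1 + erf(z / sqrt(2)))
    return sd * (z * (2 * cdf - 1) + 2 * pdf - 1 / sqrt(pi))

crps_raw, crps_corr = [], []
for t in range(window, T):
    # Homogeneous regression obs ~ a + b * (ensemble mean) on a sliding window,
    # with the predictive spread set to the training residual spread.
    m = ens[t - window:t].mean(axis=1)
    A = np.column_stack([np.ones(window), m])
    (a, b), *_ = np.linalg.lstsq(A, obs[t - window:t], rcond=None)
    resid_sd = (obs[t - window:t] - A @ (a, b)).std()
    crps_raw.append(crps_gaussian(ens[t].mean(), ens[t].std(), obs[t]))
    crps_corr.append(crps_gaussian(a + b * ens[t].mean(), resid_sd, obs[t]))

crpss = 1 - np.mean(crps_corr) / np.mean(crps_raw)  # skill score vs raw ensemble
```

The correction both removes the bias (via a, b) and widens the spread (via the residual standard deviation), which is exactly what a calibrated probabilistic forecast requires.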
A Statistical Programme Assignment Model
DEFF Research Database (Denmark)
Rosholm, Michael; Staghøj, Jonas; Svarer, Michael
When treatment effects of active labour market programmes are heterogeneous in an observable way across the population, the allocation of the unemployed into different programmes becomes a particularly important issue. In this paper, we present a statistical model designed to improve the present...... assignment mechanism, which is based on the discretionary choice of case workers. This is done in a duration model context, using the timing-of-events framework to identify causal effects. We compare different assignment mechanisms, and the results suggest that a significant reduction in the average...... duration of unemployment spells may result if a statistical programme assignment model is introduced. We discuss several issues regarding the implementation of such a system, especially the interplay between the statistical model and case workers....
Schläppy, Romain; Eckert, Nicolas; Jomelli, Vincent; Grancher, Delphine; Brunstein, Daniel; Stoffel, Markus; Naaim, Mohamed
2013-04-01
Documenting past avalanche activity represents an indispensable step in avalanche hazard assessment. Nevertheless, (i) archival records of past avalanche events do not normally yield data with satisfying spatial and temporal resolution and (ii) precision concerning runout distance is generally poorly defined. In addition, historic documentation is most often (iii) biased toward events that caused damage to structures or loss of life on the one hand and (iv) undersampled in unpopulated areas on the other hand. On forested paths, dendrogeomorphology has been demonstrated to represent a powerful tool to reconstruct past avalanche activity with annual resolution and for periods covering the past decades to centuries. This method is based on the fact that living trees may be affected by snow avalanches during their flow and deposition phases. Affected trees will react upon these disturbances with a certain growth response. An analysis of the responses recorded in tree rings, coupled with an evaluation of the position of reacting trees within the path, allows the dendrogeomorphic expert to identify past snow avalanche events and deduce their minimum runout distances. The objective of the work presented here is first to dendrochronologically reconstruct snow avalanche activity in the Château Jouan path located near Montgenèvre in the French Alps. Minimum runout distances are then determined for each reconstructed event by considering the point of farthest reach along the topographic profile. Related empirical return intervals are evaluated by combining the extent of each event with the average local frequency of the dendrological record. In a second step, the runout distance distribution derived from the dendrochronological reconstruction is compared to the one derived from historical archives and to high return period avalanches predicted by an up-to-date, locally calibrated statistical-numerical model. It appears that dendrochronological reconstructions correspond mostly to
Image quantization: statistics and modeling
Whiting, Bruce R.; Muka, Edward
1998-07-01
A method for analyzing the effects of quantization, developed for temporal one-dimensional signals, is extended to two- dimensional radiographic images. By calculating the probability density function for the second order statistics (the differences between nearest neighbor pixels) and utilizing its Fourier transform (the characteristic function), the effect of quantization on image statistics can be studied by the use of standard communication theory. The approach is demonstrated by characterizing the noise properties of a storage phosphor computed radiography system and the image statistics of a simple radiographic object (cylinder) and by comparing the model to experimental measurements. The role of quantization noise and the onset of contouring in image degradation are explained.
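The second-order statistics described above can be sketched numerically: quantize a toy image, tabulate the PDF of nearest-neighbor pixel differences, and take its Fourier transform as the characteristic function. The object, noise level, and quantizer steps are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy radiographic object (a cylinder seen in projection) plus detector noise.
x = np.linspace(-1, 1, 256)
image = 100 * np.sqrt(np.clip(1 - x[None, :] ** 2, 0, 1)) \
        + rng.normal(0, 2, (256, 256))

def second_order_stats(img, step):
    """PDF of nearest-neighbor pixel differences after quantization with bin
    width `step`, plus its Fourier transform (the characteristic function)."""
    q = np.round(img / step) * step            # uniform quantizer
    d = np.diff(q, axis=1).ravel()             # horizontal neighbor differences
    vals, counts = np.unique(d, return_counts=True)
    pdf = counts / counts.sum()
    omega = np.linspace(-np.pi / step, np.pi / step, 101)
    charfun = (pdf * np.exp(1j * np.outer(omega, vals))).sum(axis=1)
    zero_frac = float(pdf[np.isclose(vals, 0.0)].sum())
    return pdf, charfun, zero_frac

# A fine quantizer barely alters the difference statistics; a coarse one piles
# probability onto zero differences -- the onset of visible contouring.
_, cf_fine, zero_fine = second_order_stats(image, step=0.5)
_, cf_coarse, zero_coarse = second_order_stats(image, step=8.0)
```

The growth of the zero-difference mass with quantizer step is a simple numerical marker for the contouring onset discussed in the abstract.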
Cestari, Andrea
2013-01-01
Predictive modeling is emerging as an important knowledge-based technology in healthcare. The interest in the use of predictive modeling reflects advances on different fronts, such as the availability of health information from increasingly complex databases and electronic health records, a better understanding of causal or statistical predictors of health, disease processes and multifactorial models of ill-health, and developments in nonlinear computer models using artificial intelligence or neural networks. These new computer-based forms of modeling are increasingly able to establish technical credibility in clinical contexts. The current state of knowledge is still quite young in understanding the likely future direction of how this so-called 'machine intelligence' will evolve and therefore how current relatively sophisticated predictive models will evolve in response to improvements in technology, which is advancing along a wide front. Predictive models in urology are gaining progressive popularity not only for academic and scientific purposes but also in clinical practice, with the introduction of several nomograms dealing with the main fields of onco-urology.
Kourgialas, Nektarios N; Dokou, Zoi; Karatzas, George P
2015-05-01
The purpose of this study was to create a modeling management tool for the simulation of extreme flow events under current and future climatic conditions. This tool is a combination of different components and can be applied in complex hydrogeological river basins, where frequent flood and drought phenomena occur. The first component is the statistical analysis of the available hydro-meteorological data. Specifically, principal components analysis was performed in order to quantify the importance of the hydro-meteorological parameters that affect the generation of extreme events. The second component is a prediction-forecasting artificial neural network (ANN) model that simulates, accurately and efficiently, river flow on an hourly basis. This model is based on a methodology that attempts to resolve a very difficult problem related to the accurate estimation of extreme flows. For this purpose, the available measurements (5 years of hourly data) were divided in two subsets: one for the dry and one for the wet periods of the hydrological year. This way, two ANNs were created, trained, tested and validated for a complex Mediterranean river basin in Crete, Greece. As part of the second management component a statistical downscaling tool was used for the creation of meteorological data according to the higher and lower emission climate change scenarios A2 and B1. These data are used as input in the ANN for the forecasting of river flow for the next two decades. The final component is the application of a meteorological index on the measured and forecasted precipitation and flow data, in order to assess the severity and duration of extreme events.
Apel, Heiko; Baimaganbetov, Azamat; Kalashnikova, Olga; Gavrilenko, Nadejda; Abdykerimova, Zharkinay; Agalhanova, Marina; Gerlitz, Lars; Unger-Shayesteh, Katy; Vorogushyn, Sergiy; Gafurov, Abror
2017-04-01
The semi-arid regions of Central Asia crucially depend on the water resources supplied by the mountainous areas of the Tien-Shan and Pamirs. During the summer months the snow and glacier melt dominated river discharge originating in the mountains provides the main water resource available for agricultural production, but also for storage in reservoirs for energy generation during the winter months. Thus a reliable seasonal forecast of the water resources is crucial for a sustainable management and planning of water resources. In fact, seasonal forecasts are mandatory tasks of all national hydro-meteorological services in the region. In order to support the operational seasonal forecast procedures of hydromet services, this study aims at the development of a generic tool for deriving statistical forecast models of seasonal river discharge. The generic model is kept as simple as possible in order to be driven by available hydrological and meteorological data, and be applicable for all catchments with their often limited data availability in the region. As snowmelt dominates summer runoff, the main meteorological predictors for the forecast models are monthly values of winter precipitation and temperature as recorded by climatological stations in the catchments. These data sets are accompanied by snow cover predictors derived from the operational ModSnow tool, which provides cloud free snow cover data for the selected catchments based on MODIS satellite images. In addition to the meteorological data antecedent streamflow is used as a predictor variable. This basic predictor set was further extended by multi-monthly means of the individual predictors, as well as composites of the predictors. Forecast models are derived based on these predictors as linear combinations of up to 3 or 4 predictors. A user selectable number of best models according to pre-defined performance criteria is extracted automatically by the developed model fitting algorithm, which includes a test
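The best-subset fitting of linear forecast models with up to three or four predictors can be sketched as follows; the predictor names, synthetic data, and the R² selection criterion are assumptions standing in for the study's catchment data and performance criteria.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)

# Hypothetical predictors for 30 years of one catchment: winter precipitation,
# winter temperature, ModSnow snow-cover fraction, antecedent streamflow.
names = ["precip", "temp", "snow", "q_antecedent"]
X = rng.normal(size=(30, 4))
# Toy seasonal discharge driven mainly by precipitation and snow cover.
y = 1.2 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(0, 0.3, 30)

def fit_r2(cols):
    """Least-squares fit of y on the chosen predictor columns; returns R^2."""
    A = np.column_stack([np.ones(30), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

# Enumerate all linear combinations of up to 3 predictors and rank by R^2,
# mirroring the automatic extraction of a user-selectable number of best models.
models = sorted(
    ((fit_r2(c), [names[i] for i in c])
     for k in range(1, 4) for c in combinations(range(4), k)),
    reverse=True,
)
best_r2, best_predictors = models[0]
```

With only four candidate predictors the exhaustive enumeration is cheap; for larger predictor sets (multi-monthly means and composites) the same loop simply runs over more combinations.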
Manning, Robert M.
1987-01-01
A dynamic rain attenuation prediction model is developed for use in obtaining the temporal characteristics, on time scales of minutes or hours, of satellite communication link availability. Analogous to the associated static rain attenuation model, which yields yearly attenuation predictions, this dynamic model is applicable at any location in the world that is characterized by the static rain attenuation statistics peculiar to the geometry of the satellite link and the rain statistics of the location. Such statistics are calculated by employing the formalism of Part I of this report. In fact, the dynamic model presented here is an extension of the static model and reduces to the static model in the appropriate limit. By assuming that rain attenuation is dynamically described by a first-order stochastic differential equation in time and that this random attenuation process is a Markov process, an expression for the associated transition probability is obtained by solving the related forward Kolmogorov equation. This transition probability is then used to obtain such temporal rain attenuation statistics as attenuation durations and allowable attenuation margins versus control system delay.
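A minimal sketch of such a first-order Markov attenuation model: the log-attenuation follows an Ornstein-Uhlenbeck process, so the stationary statistics are lognormal (the static limit), and fade durations above a margin can be estimated from the simulated series. The relaxation rate, volatility, and median attenuation are assumed values, not the paper's calibrated statistics.

```python
import numpy as np

rng = np.random.default_rng(5)

# Log-attenuation as an Ornstein-Uhlenbeck (first-order Markov) process.
beta, sigma = 1 / 600.0, 0.8      # relaxation rate (1/s) and volatility: assumed
mu_ln, n = np.log(2.0), 200_000   # log of median attenuation (dB); 1 s samples
x = np.empty(n)
x[0] = mu_ln
for i in range(1, n):
    x[i] = x[i - 1] + beta * (mu_ln - x[i - 1]) \
         + sigma * np.sqrt(2 * beta) * rng.normal()
atten_db = np.exp(x)

def mean_fade_duration(a, margin_db):
    """Average duration (s) of fades, i.e. intervals with attenuation > margin."""
    above = a > margin_db
    edges = np.diff(above.astype(int))
    starts = np.where(edges == 1)[0]
    ends = np.where(edges == -1)[0]
    if above[0]:
        starts = np.r_[-1, starts]
    if above[-1]:
        ends = np.r_[ends, len(a) - 1]
    return float(np.mean(ends - starts))

# Deeper fade margins are exceeded more rarely and for shorter intervals.
dur_3db = mean_fade_duration(atten_db, 3.0)
dur_6db = mean_fade_duration(atten_db, 6.0)
```

The paper derives such duration statistics analytically from the forward Kolmogorov equation; the simulation above is only a numerical stand-in for that machinery.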
Simple statistical model for branched aggregates
DEFF Research Database (Denmark)
Lemarchand, Claire; Hansen, Jesper Schmidt
2015-01-01
We propose a statistical model that can reproduce the size distribution of any branched aggregate, including amylopectin, dendrimers, molecular clusters of monoalcohols, and asphaltene nanoaggregates. It is based on the conditional probability for one molecule to form a new bond with a molecule, given that it already has bonds with others. The model is applied here to asphaltene nanoaggregates observed in molecular dynamics simulations of Cooee bitumen. The variation with temperature of the probabilities deduced from this model is discussed in terms of statistical mechanics arguments. The relevance of the statistical model in the case of asphaltene nanoaggregates is checked by comparing the predicted value of the probability for one molecule to have exactly i bonds with the same probability directly measured in the molecular dynamics simulations. The agreement is satisfactory...
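The conditional-bond-probability idea can be sketched with a simple branching simulation; the functional form p0·q^i, its parameter values, and the bond cap are assumptions for illustration, not the probabilities fitted to Cooee bitumen.

```python
import numpy as np

rng = np.random.default_rng(6)

# Conditional probability for a molecule that already holds i bonds to accept
# one more; p0, q and the bond cap are illustrative, not fitted values.
p0, q, max_bonds = 0.6, 0.4, 4

def p_new_bond(i):
    return p0 * q ** i if i < max_bonds else 0.0

def grow_aggregate():
    """Grow one branched aggregate as a tree and return its size."""
    frontier, size = [0], 1           # bond counts of molecules still growing
    while frontier:
        newcomers = []
        for bonds in frontier:
            while bonds < max_bonds and rng.random() < p_new_bond(bonds):
                bonds += 1            # this molecule gains a bond ...
                size += 1
                newcomers.append(1)   # ... to a fresh molecule with one bond
        frontier = newcomers
    return size

sizes = np.array([grow_aggregate() for _ in range(20_000)])
mean_size = sizes.mean()
p_monomer = (sizes == 1).mean()       # P(aggregate of exactly one molecule)
```

Because the bond probability decays with i, the branching process is subcritical and all aggregates stay finite, reproducing a rapidly decaying size distribution.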
Textual information access statistical models
Gaussier, Eric
2013-01-01
This book presents statistical models that have recently been developed within several research communities to access information contained in text collections. The problems considered are linked to applications aiming at facilitating information access: information extraction and retrieval; text classification and clustering; opinion mining; and comprehension aids (automatic summarization, machine translation, visualization). In order to give the reader as complete a description as possible, the focus is placed on the probability models used in the applications.
Wind speed prediction using statistical regression and neural network
Indian Academy of Sciences (India)
Makarand A Kulkarni; Sunil Patil; G V Rama; P N Sen
2008-08-01
Prediction of wind speed in the atmospheric boundary layer is important for wind energy assessment, satellite launching, aviation, etc. There are a few techniques available for wind speed prediction, which require a minimum number of input parameters. Four different statistical techniques, viz., curve fitting, the Auto Regressive Integrated Moving Average (ARIMA) model, extrapolation with a periodic function, and Artificial Neural Networks (ANN), are employed to predict wind speed. These methods require wind speeds of previous hours as input. It has been found that wind speed can be predicted with a reasonable degree of accuracy using two methods, viz., extrapolation using periodic curve fitting and ANN, while the other two methods are not very useful.
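Two of the techniques (periodic curve fitting, and a plain autoregression standing in for the ARIMA variant) can be sketched on synthetic hourly wind data; all data, coefficients, and model orders are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic hourly wind speed: diurnal harmonic plus AR(1) fluctuations.
t = np.arange(2000)
noise = np.zeros(2000)
for i in range(1, 2000):
    noise[i] = 0.7 * noise[i - 1] + rng.normal(0, 0.5)
wind = 6 + 2 * np.sin(2 * np.pi * t / 24) + noise

train, test = slice(0, 1500), slice(1500, 2000)

# Technique 1: extrapolation with a periodic function (24 h harmonic fit).
A = np.column_stack([np.ones(2000), np.sin(2 * np.pi * t / 24),
                     np.cos(2 * np.pi * t / 24)])
coef, *_ = np.linalg.lstsq(A[train], wind[train], rcond=None)
pred_periodic = A[test] @ coef

# Technique 2: autoregression on the previous p hourly wind speeds.
p = 3
Xtr = np.column_stack([np.ones(1500 - p)]
                      + [wind[p - k:1500 - k] for k in range(1, p + 1)])
ar_coef, *_ = np.linalg.lstsq(Xtr, wind[p:1500], rcond=None)
Xte = np.column_stack([np.ones(500)]
                      + [wind[1500 - k:2000 - k] for k in range(1, p + 1)])
pred_ar = Xte @ ar_coef

rmse_periodic = np.sqrt(np.mean((wind[test] - pred_periodic) ** 2))
rmse_ar = np.sqrt(np.mean((wind[test] - pred_ar) ** 2))
climatology = wind[test].std()   # baseline spread of the test series
```

Both predictors use only past wind speeds, matching the paper's minimal-input setting; beating the climatological spread is the usual minimum bar for such forecasts.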
Jet Noise Diagnostics Supporting Statistical Noise Prediction Methods
Bridges, James E.
2006-01-01
The primary focus of my presentation is the development of the jet noise prediction code JeNo with most examples coming from the experimental work that drove the theoretical development and validation. JeNo is a statistical jet noise prediction code, based upon the Lilley acoustic analogy. Our approach uses time-average 2-D or 3-D mean and turbulent statistics of the flow as input. The output is source distributions and spectral directivity. NASA has been investing in development of statistical jet noise prediction tools because these seem to fit the middle ground that allows enough flexibility and fidelity for jet noise source diagnostics while having reasonable computational requirements. These tools rely on Reynolds-averaged Navier-Stokes (RANS) computational fluid dynamics (CFD) solutions as input for computing far-field spectral directivity using an acoustic analogy. There are many ways acoustic analogies can be created, each with a series of assumptions and models, many often taken unknowingly. And the resulting prediction can be easily reverse-engineered by altering the models contained within. However, only an approach which is mathematically sound, with assumptions validated and modeled quantities checked against direct measurement will give consistently correct answers. Many quantities are modeled in acoustic analogies precisely because they have been impossible to measure or calculate, making this requirement a difficult task. The NASA team has spent considerable effort identifying all the assumptions and models used to take the Navier-Stokes equations to the point of a statistical calculation via an acoustic analogy very similar to that proposed by Lilley. Assumptions have been identified and experiments have been developed to test these assumptions. In some cases this has resulted in assumptions being changed. Beginning with the CFD used as input to the acoustic analogy, models for turbulence closure used in RANS CFD codes have been explored and
Institute of Scientific and Technical Information of China (English)
无
2006-01-01
In the first paper in this series, a variational data assimilation of ideal tropical cyclone (TC) tracks was performed for the statistical-dynamical prediction model SD-90 by the adjoint method, and a prediction of TC tracks was made with good accuracy for tracks containing no sharp turns. In the present paper, the cases of real TC tracks are studied. Due to the complexity of TC motion, attention is paid to the diagnostic research of TC motion. First, five TC tracks are studied. Using the data of each entire TC track, by the adjoint method, five TC tracks are fitted well, and the forces acting on the TCs are retrieved. For a given TC, the distribution of the resultant of the retrieved force and the Coriolis force well matches the corresponding TC track, i.e., when a TC turns, the resultant of the retrieved force and the Coriolis force acts as a centripetal force, which means that the TC indeed moves like a particle; in particular, for TC 9911, the clockwise looping motion is also fitted well. The distribution of the resultant appears to be periodic in some cases. Then, the present method is carried out for a portion of the track data for TC 9804, which indicates that when the amount of data for a TC track is sufficient, the algorithm is stable. Finally, the same algorithm is implemented for TCs with a double-eyewall structure, namely Bilis (2000) and Winnie (1997), and the results prove the applicability of the algorithm to TCs with complicated mesoscale structures if the TC track data are obtained every three hours.
Prediction models in complex terrain
DEFF Research Database (Denmark)
Marti, I.; Nielsen, Torben Skov; Madsen, Henrik
2001-01-01
The objective of the work is to investigate the performance of HIRLAM in complex terrain when used as input to energy production forecasting models, and to develop a statistical model to adapt HIRLAM predictions to the wind farm. The features of the terrain, especially the topography, influence...... are calculated using on-line measurements of power production as well as HIRLAM predictions as input, thus taking advantage of the auto-correlation which is present in the power production for shorter prediction horizons. Statistical models are used to describe the relationship between observed energy production...... and HIRLAM predictions. The statistical models belong to the class of conditional parametric models. The models are estimated using local polynomial regression, but the estimation method is here extended to be adaptive in order to allow for slow changes in the system, e.g. caused by the annual variations...
Improved model for statistical alignment
Energy Technology Data Exchange (ETDEWEB)
Miklos, I.; Toroczkai, Z. (Zoltan)
2001-01-01
The statistical approach to molecular sequence evolution involves the stochastic modeling of the substitution, insertion and deletion processes. Substitution has been modeled in a reliable way for more than three decades by using finite Markov processes. Insertion and deletion, however, seem to be more difficult to model, and the recent approaches cannot acceptably deal with multiple insertions and deletions. A new method based on a generating function approach is introduced to describe the multiple insertion process. The presented algorithm computes the approximate joint probability of two sequences in O(l³) running time, where l is the geometric mean of the sequence lengths.
[Statistical prediction of radioactive contamination impacts on agricultural pasture lands].
Spiridonov, S I; Ivanov, V V
2014-01-01
Based on an analysis of the literature data, the rationale is given for the use of probabilistic approaches to solve the problems of estimating long-lived radionuclide uptake in animal products. Methods for the statistical prediction of the consequences of radioactive contamination of agricultural pasture lands have been devised and implemented in the form of models and program modules. These allow the estimation of radionuclide transfer between the links of an agricultural chain, taking into account variability in the migration parameters, the estimation of soil contamination limits based on preset risk levels for the foodstuffs produced, and the statistical coordination of standards. An illustration is given of the application of the above methods using statistical characteristics of 137Cs migration parameters in the soil-plant-animal produce chain. Further trends have been formulated in the development of the risk concept as applied to the assessment of radioecological situations arising from radioactive contamination of agricultural land.
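A sketch of the probabilistic approach: Monte Carlo propagation of 137Cs through a soil-grass-milk chain with lognormal variability in the transfer parameters, yielding an exceedance risk and a risk-based soil contamination limit. All parameter values and the permissible level are illustrative assumptions, not regulatory figures.

```python
import numpy as np

rng = np.random.default_rng(8)

n = 100_000
soil = 50.0                                   # kBq/m^2 deposition (assumed)
# Soil-to-grass aggregated transfer factor, (Bq/kg)/(kBq/m^2), with
# lognormal variability standing in for the migration-parameter spread.
tf_grass = rng.lognormal(mean=np.log(1.0), sigma=0.5, size=n)
# Grass-to-milk transfer coefficient (d/L) times daily grass intake (kg/d).
tf_milk = rng.lognormal(mean=np.log(0.005), sigma=0.4, size=n) * 60.0

milk_bq = soil * tf_grass * tf_milk           # 137Cs activity in milk, Bq/L

limit = 100.0                                 # permissible level, Bq/L (assumed)
risk = float((milk_bq > limit).mean())        # probability of exceeding it

# Inverted problem: maximum deposition keeping exceedance risk below 5%.
q95 = np.quantile(tf_grass * tf_milk, 0.95)
soil_limit = limit / q95                      # kBq/m^2
```

The inversion step mirrors the abstract's "soil contamination limits based on preset risk levels": the limit is set where the 95th percentile of the chain transfer still keeps milk below the permissible level.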
Statistical corrections to numerical predictions. IV. [of weather
Schemm, Jae-Kyung; Faller, Alan J.
1986-01-01
The National Meteorological Center Barotropic-Mesh Model has been used to test a statistical correction procedure, designated as M-II, that was developed in Schemm et al. (1981). In the present application, statistical corrections at 12 h resulted in significant reductions of the mean-square errors of both vorticity and the Laplacian of thickness. Predictions to 48 h demonstrated the feasibility of applying corrections at every 12 h in extended forecasts. In addition to these improvements, however, the statistical corrections resulted in a shift of error from smaller to larger-scale motions, improving the smallest scales dramatically but deteriorating the largest scales. This effect is shown to be a consequence of randomization of the residual errors by the regression equations and can be corrected by spatially high-pass filtering the field of corrections before they are applied.
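The scale-shift effect and its high-pass-filter remedy can be sketched spectrally: build a corrections field whose useful signal is small-scale plus spurious large-scale components, then zero the low-wavenumber part before applying it. Grid size, cutoff, and amplitudes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 64
k = np.fft.fftfreq(n)
KX, KY = np.meshgrid(k, k)
wavenum = np.sqrt(KX ** 2 + KY ** 2)

def synth(mask):
    """Unit-variance random field with Fourier support restricted to `mask`."""
    phases = np.exp(2j * np.pi * rng.random((n, n)))
    f = np.real(np.fft.ifft2(np.where(mask, phases, 0)))
    return f / f.std()

useful = synth(wavenum > 0.2)                       # genuine small-scale corrections
spurious = synth((wavenum > 0) & (wavenum < 0.05))  # large-scale regression artifact
corrections = useful + 0.8 * spurious

def high_pass(field, cutoff=0.1):
    """Zero all Fourier components below the cutoff wavenumber."""
    F = np.fft.fft2(field)
    return np.real(np.fft.ifft2(np.where(wavenum >= cutoff, F, 0)))

filtered = high_pass(corrections)

# Filtering keeps the small-scale signal and removes the large-scale damage.
err_raw = np.sqrt(np.mean((corrections - useful) ** 2))
err_filtered = np.sqrt(np.mean((filtered - useful) ** 2))
```

This is the mechanism the abstract describes: the regression randomizes residual errors into the large scales, and spatially high-pass filtering the correction field before application prevents the large-scale deterioration.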
Predicting statistical properties of open reading frames in bacterial genomes.
Directory of Open Access Journals (Sweden)
Katharina Mir
Full Text Available An analytical model based on the statistical properties of Open Reading Frames (ORFs) of eubacterial genomes, such as codon composition and the sequence length of all reading frames, was developed. This new model predicts the average length, maximum length, and length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, the number of annotated genes is predicted in good accordance. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot be completely explained by a biased codon usage in the +1 frame. While it is unknown whether the stop codon depletion has a biological function, it could be due to a protein-coding capacity of alternative ORFs exerting a selection pressure which prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes therefore leads to a hypothesis suggesting novel gene candidates which can now be investigated in subsequent wet-lab experiments.
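The model's core step can be sketched directly: under an i.i.d. nucleotide model parameterized by GC content, the stop-codon probability determines a geometric ORF length distribution. The example GC values are assumptions for illustration.

```python
# Per-base probabilities under an i.i.d. nucleotide model with GC content gc:
# P(A) = P(T) = (1 - gc) / 2 and P(G) = P(C) = gc / 2.
def stop_probability(gc):
    """Probability that a random codon is a stop codon (TAA, TAG or TGA)."""
    at = (1 - gc) / 2
    gcs = gc / 2
    return at * at * at + 2 * at * at * gcs  # TAA + (TAG and TGA)

def mean_orf_codons(gc):
    """Expected ORF length in codons: geometric waiting time for a stop."""
    return 1 / stop_probability(gc)

# AT-rich genomes are stop-rich, so random ORFs stay short; GC-rich genomes
# deplete A/T and random ORFs grow several-fold longer.
low_gc = mean_orf_codons(0.25)    # roughly an AT-rich endosymbiont
high_gc = mean_orf_codons(0.70)   # roughly a high-GC actinobacterium
```

ORFs in real alternative frames that are much longer than this geometric expectation are exactly the "unexpected" cases the abstract flags as potential novel gene candidates.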
Statistical modeling of geopressured geothermal reservoirs
Ansari, Esmail; Hughes, Richard; White, Christopher D.
2017-06-01
Identifying attractive candidate reservoirs for producing geothermal energy requires predictive models. In this work, inspectional analysis and statistical modeling are used to create simple predictive models for a line drive design. Inspectional analysis on the partial differential equations governing this design yields a minimum number of fifteen dimensionless groups required to describe the physics of the system. These dimensionless groups are explained and confirmed using models with similar dimensionless groups but different dimensional parameters. This study models dimensionless production temperature and thermal recovery factor as the responses of a numerical model. These responses are obtained by a Box-Behnken experimental design. An uncertainty plot is used to segment the dimensionless time and develop a model for each segment. The important dimensionless numbers for each segment of the dimensionless time are identified using the Boosting method. These selected numbers are used in the regression models. The developed models are reduced to have a minimum number of predictors and interactions. The reduced final models are then presented and assessed using testing runs. Finally, applications of these models are offered. The presented workflow is generic and can be used to translate the output of a numerical simulator into simple predictive models in other research areas involving numerical simulation.
Statistical bootstrap model and annihilations
Möhring, H J
1974-01-01
The statistical bootstrap model (SBM) describes the decay of single, high-mass hadronic states (fireballs, clusters) into stable particles. Coupling constants B, one for each isospin multiplet of stable particles, are the only free parameters of the model. They are related to the maximum temperature parameter T₀. The various versions of the SBM can be classified into two groups: full statistical bootstrap models and linear ones. The main results of the model are the following: i) All momentum spectra are isotropic; in particular, the exclusive ones are described by invariant phase space. The inclusive and semi-inclusive single-particle distributions are asymptotically of pure exponential shape; the slope is governed by T₀ only. ii) The model parameter B for pions has been obtained by fitting the multiplicity distribution in pp and pn at rest, and corresponds to T₀ = 0.167 GeV in the full SBM with exotics. The average π⁻ multiplicity for the linear and the full SBM (both with exotics) is c...
Predicting recreational water quality advisories: A comparison of statistical methods
Brooks, Wesley R.; Corsi, Steven R.; Fienen, Michael N.; Carvin, Rebecca B.
2016-01-01
Epidemiological studies indicate that fecal indicator bacteria (FIB) in beach water are associated with illnesses among people having contact with the water. In order to mitigate public health impacts, many beaches are posted with an advisory when the concentration of FIB exceeds a beach action value. The most commonly used method of measuring FIB concentration takes 18–24 h to return a result. To avoid the 24 h lag, it has become common to "nowcast" the FIB concentration using statistical regressions on environmental surrogate variables. Most commonly, nowcast models are estimated using ordinary least squares regression, but other regression methods from the statistical and machine learning literature are sometimes used. This study compares 14 regression methods across 7 Wisconsin beaches to identify which consistently produces the most accurate predictions. A random forest model is identified as the most accurate, followed by a multiple regression fit using the adaptive LASSO.
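A minimal nowcast of the kind compared in the study, an ordinary least squares regression on surrogate variables with an advisory threshold, might look as follows. The variable names, coefficients, and action value are synthetic placeholders, not values from the Wisconsin data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: log10 FIB concentration regressed on three
# environmental surrogates (e.g. turbidity, rainfall, water temperature).
n = 200
X = rng.normal(size=(n, 3))
beta_true = np.array([0.8, 0.5, -0.2])
y = 2.0 + X @ beta_true + rng.normal(scale=0.3, size=n)

# Ordinary least squares fit with an intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Nowcast for today's conditions (leading 1.0 is the intercept term)
# and the advisory decision against a hypothetical action value.
today = np.array([1.0, 1.2, 0.4, -0.1])
pred_log_fib = today @ coef
ACTION_VALUE = 2.37  # log10 of an assumed beach action value
print("advisory" if pred_log_fib > ACTION_VALUE else "open")  # advisory
```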
Spatial statistics for predicting flow through a rock fracture
Energy Technology Data Exchange (ETDEWEB)
Coakley, K.J.
1989-03-01
Fluid flow through a single rock fracture depends on the shape of the space between the upper and lower pieces of rock which define the fracture. In this thesis, the normalized flow through a fracture, i.e. the equivalent permeability of a fracture, is predicted in terms of spatial statistics computed from the arrangement of voids, i.e. open spaces, and contact areas within the fracture. Patterns of voids and contact areas, with complexity typical of experimental data, are simulated by clipping a correlated Gaussian process defined on an N by N pixel square region. The voids have constant aperture; the distance between the upper and lower surfaces which define the fracture is either zero or a constant. Local flow is assumed to be proportional to local aperture cubed times local pressure gradient. The flow through a pattern of voids and contact areas is solved using a finite-difference method. After solving for the flow through simulated 10 by 10 by 30 pixel patterns of voids and contact areas, a model to predict equivalent permeability is developed. The first model is for patterns with 80% voids where all voids have the same aperture. The equivalent permeability of a pattern is predicted in terms of spatial statistics computed from the arrangement of voids and contact areas within the pattern. Four spatial statistics are examined. The change point statistic measures how often adjacent pixels alternate from void to contact area (or vice versa) in the rows of the patterns which are parallel to the overall flow direction. 37 refs., 66 figs., 41 tabs.
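The pattern-generation step, clipping a correlated Gaussian process to obtain roughly 80% voids, and the change point statistic can be sketched as follows. The Gaussian-smoothing correlation model and grid size are assumptions; the thesis may use a different covariance structure:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)

# Correlated Gaussian field on an N-by-N grid, built by smoothing white
# noise; clipping at the 20% quantile yields a pattern with ~80% voids.
N = 128
field = gaussian_filter(rng.normal(size=(N, N)), sigma=4)
threshold = np.quantile(field, 0.20)   # bottom 20% become contact area
voids = field > threshold              # True = void, False = contact

# Change point statistic: how often adjacent pixels alternate between
# void and contact along rows parallel to the overall flow direction.
changes = np.sum(voids[:, 1:] != voids[:, :-1])
change_point_stat = changes / (N * (N - 1))

print(round(voids.mean(), 2))  # 0.8 void fraction by construction
```

A smoother field (larger sigma) alternates less often, so the change point statistic also encodes the correlation length of the void pattern.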
Statistical shape and appearance models of bones.
Sarkalkan, Nazli; Weinans, Harrie; Zadpoor, Amir A
2014-03-01
When applied to bones, statistical shape models (SSM) and statistical appearance models (SAM) respectively describe the mean shape and mean density distribution of bones within a certain population as well as the main modes of variations of shape and density distribution from their mean values. The availability of this quantitative information regarding the detailed anatomy of bones provides new opportunities for diagnosis, evaluation, and treatment of skeletal diseases. The potential of SSM and SAM has been recently recognized within the bone research community. For example, these models have been applied for studying the effects of bone shape on the etiology of osteoarthritis, improving the accuracy of clinical osteoporotic fracture prediction techniques, design of orthopedic implants, and surgery planning. This paper reviews the main concepts, methods, and applications of SSM and SAM as applied to bone.
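The core construction behind an SSM, a mean shape plus principal modes of variation learned from registered landmark configurations, can be sketched with PCA via the SVD. The landmark data below are synthetic stand-ins for segmented bone surfaces:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy training set: 30 bone outlines, each with 20 2-D landmarks,
# flattened to 40-vectors. Real SSMs use registered surface meshes.
n_shapes, n_coords = 30, 40
mean_true = rng.normal(size=n_coords)
mode_true = rng.normal(size=n_coords)
weights = rng.normal(size=(n_shapes, 1))
shapes = mean_true + weights * mode_true + 0.01 * rng.normal(size=(n_shapes, n_coords))

# SSM: mean shape plus principal modes of variation, from the SVD of
# the centered data matrix (equivalent to PCA on the shapes).
mean_shape = shapes.mean(axis=0)
U, s, Vt = np.linalg.svd(shapes - mean_shape, full_matrices=False)
explained = s**2 / np.sum(s**2)

# New plausible shapes are synthesized as mean + sum_i b_i * mode_i.
new_shape = mean_shape + 1.5 * s[0] / np.sqrt(n_shapes - 1) * Vt[0]
print(round(explained[0], 2))  # first mode dominates by construction
```

An SAM follows the same recipe with density values appended to the shape vector, so shape and appearance co-vary in the learned modes.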
Prediction models in complex terrain
DEFF Research Database (Denmark)
Marti, I.; Nielsen, Torben Skov; Madsen, Henrik
2001-01-01
The objective of the work is to investigate the performance of HIRLAM in complex terrain when used as input to energy production forecasting models, and to develop a statistical model to adapt HIRLAM predictions to the wind farm. The features of the terrain, especially the topography, influence...
Pitfalls in statistical landslide susceptibility modelling
Schröder, Boris; Vorpahl, Peter; Märker, Michael; Elsenbeer, Helmut
2010-05-01
The use of statistical methods is a well-established approach to predict landslide occurrence probabilities and to assess landslide susceptibility. This is achieved by applying statistical methods relating historical landslide inventories to topographic indices as predictor variables. In our contribution, we compare several new and powerful methods developed in machine learning and well-established in landscape ecology and macroecology for predicting the distribution of shallow landslides in tropical mountain rainforests in southern Ecuador (among others: boosted regression trees, multivariate adaptive regression splines, maximum entropy). Although these methods are powerful, we think it is necessary to follow a basic set of guidelines to avoid some pitfalls regarding data sampling, predictor selection, and model quality assessment, especially if a comparison of different models is contemplated. We therefore suggest applying a novel toolbox to evaluate approaches to the statistical modelling of landslide susceptibility. Additionally, we propose some methods to open the "black box" that is an inherent part of machine learning methods, in order to achieve further explanatory insights into the preparatory factors that control landslides. Sampling of training data should be guided by hypotheses regarding the processes that lead to slope failure, taking into account their respective spatial scales. This approach leads to the selection of a set of candidate predictor variables considered on adequate spatial scales. This set should be checked for multicollinearity in order to facilitate interpretation of model response curves. Model quality assessment measures how well a model is able to reproduce independent observations of its response variable. This includes criteria to evaluate different aspects of model performance, i.e. model discrimination, model calibration, and model refinement. In order to assess a possible violation of the assumption of independency in the training samples or a possible
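The multicollinearity check recommended for the candidate predictor set can be sketched with variance inflation factors, one standard diagnostic; the authors' toolbox may rely on other criteria, and the predictor names here are hypothetical:

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factors: VIF_j = 1 / (1 - R2_j), where R2_j is
    from regressing column j on all other columns. Values above ~5-10
    flag problematic multicollinearity among candidate predictors."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(3)
slope = rng.normal(size=100)
curvature = rng.normal(size=100)
wetness = 0.95 * slope + 0.05 * rng.normal(size=100)  # nearly collinear
vifs = vif(np.column_stack([slope, curvature, wetness]))
print(np.round(vifs, 1))  # slope and wetness inflate; curvature does not
```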
Grudinin, Sergei; Kadukova, Maria; Eisenbarth, Andreas; Marillet, Simon; Cazals, Frédéric
2016-09-01
The 2015 D3R Grand Challenge provided an opportunity to test our new model for the binding free energy of small molecules, as well as to assess our protocol to predict binding poses for protein-ligand complexes. Our pose predictions were ranked 3-9 for the HSP90 dataset, depending on the assessment metric. For the MAP4K dataset the ranks are very dispersed, 2-35 depending on the assessment metric, which does not provide any insight into the accuracy of the method. The main success of our pose prediction protocol was the re-scoring stage using the recently developed Convex-PL potential. We make a thorough analysis of our docking predictions made with AutoDock Vina and discuss the effect of the choice of rigid receptor templates, the number of flexible residues in the binding pocket, the binding pocket size, and the benefits of re-scoring. However, the main challenge was to predict experimentally determined binding affinities for two blind test sets. Our affinity prediction model consisted of two terms: a pairwise-additive enthalpy and a non-pairwise-additive entropy. We trained the free parameters of the model with a regularized regression using affinity and structural data from the PDBBind database. Our model performed very well on the training set but failed on the two test sets. We explain the drawbacks and pitfalls of our model, in particular in terms of the relative coverage of the test set by the training set and dynamical properties missed by crystal structures, and discuss different routes to improve it.
Bayesian Model Selection and Statistical Modeling
Ando, Tomohiro
2010-01-01
Bayesian model selection is a fundamental part of the Bayesian statistical modeling process. The quality of these solutions usually depends on the goodness of the constructed Bayesian model. Realizing how crucial this issue is, many researchers and practitioners have been extensively investigating the Bayesian model selection problem. This book provides comprehensive explanations of the concepts and derivations of the Bayesian approach for model selection and related criteria, including the Bayes factor, the Bayesian information criterion (BIC), the generalized BIC, and the pseudo-marginal likelihood.
Graphics and statistics for cardiology: clinical prediction rules.
Woodward, Mark; Tunstall-Pedoe, Hugh; Peters, Sanne Ae
2017-04-01
Graphs and tables are indispensable aids to quantitative research. When developing a clinical prediction rule that is based on a cardiovascular risk score, there are many visual displays that can assist in developing the underlying statistical model, testing the assumptions made in this model, and evaluating and presenting the resultant score. All too often, researchers in this field follow formulaic recipes without exploring the issues of model selection and data presentation in a meaningful and thoughtful way. Some ideas are given on how to use visual displays to make wise decisions and present results that will both inform and attract the reader. Ideas are developed, and results tested, using subsets of the data that were used to develop the ASSIGN cardiovascular risk score, as used in Scotland.
Predicting weak lensing statistics from halo mass reconstructions - Final Paper
Energy Technology Data Exchange (ETDEWEB)
Everett, Spencer [SLAC National Accelerator Lab., Menlo Park, CA (United States)
2015-08-20
As dark matter does not absorb or emit light, its distribution in the universe must be inferred through indirect effects such as the gravitational lensing of distant galaxies. While most sources are only weakly lensed, the systematic alignment of background galaxies around a foreground lens can constrain the mass of the lens, which is largely in the form of dark matter. In this paper, I have implemented a framework to reconstruct all of the mass along lines of sight using a best-case dark matter halo model in which the halo mass is known. This framework is then used to make predictions of the weak lensing of 3,240 generated source galaxies through a 324 arcmin² field of the Millennium Simulation. The lensed source ellipticities are characterized by the ellipticity-ellipticity and galaxy-mass correlation functions and compared to the same statistic for the intrinsic and ray-traced ellipticities. In the ellipticity-ellipticity correlation function, I find that the framework systematically underpredicts the shear power by an average factor of 2.2 and fails to capture correlation from dark matter structure at scales larger than 1 arcminute. The model-predicted galaxy-mass correlation function is in agreement with the ray-traced statistic from scales 0.2 to 0.7 arcminutes, but systematically underpredicts shear power at scales larger than 0.7 arcminutes by an average factor of 1.2. Optimization of the framework code has reduced the mean CPU time per lensing prediction by 70% to 24 ± 5 ms. Physical and computational shortcomings of the framework are discussed, as well as potential improvements for upcoming work.
Precipitation Prediction in North Africa Based on Statistical Downscaling
Molina, J. M.; Zaitchik, B.
2013-12-01
Although Global Climate Model (GCM) outputs should not be used directly to predict precipitation variability and change at the local scale, GCM projections of large-scale features in the ocean and atmosphere can be applied to infer future statistical properties of climate at finer resolutions through empirical statistical downscaling techniques. A number of such downscaling methods have been proposed in the literature, and although all of them have advantages and limitations depending on the specific downscaling problem, most of them have been developed and tested in developed countries. In this research, we explore the use of statistical downscaling to generate future local precipitation scenarios at different locations in Northern Africa, where available data are sparse and missing values are frequently observed in the historical records. The presence of arid and semiarid regions in North African countries and the persistence of long periods with no rain pose challenges to the downscaling exercise, since normality assumptions may be a serious limitation in the application of traditional linear regression methods. In our work, the development of monthly statistical relationships between the local precipitation and the large-scale predictors considers common Empirical Orthogonal Functions (EOFs) from different NCAR/Reanalysis climate fields (e.g., Sea Level Pressure (SLP) and Global Precipitation). GCM/CMIP5 data are considered in the predictor data set to analyze the future local precipitation. Both parametric (e.g., Generalized Linear Models (GLM)) and nonparametric (e.g., bootstrapping) approaches are considered in the regression analysis, and different spatial windows in the predictor fields are tested in the prediction experiments. In the latter, seasonal spatial cross-covariance between predictand and predictors is estimated by means of a teleconnections algorithm which was implemented to define the regions in the predictor domain that better captures the
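The EOF step can be sketched as an SVD of the time-centered predictor field; the grid, record length, and random field below are synthetic stand-ins for the Reanalysis data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic predictor field: 120 monthly SLP anomaly maps on a 10x20 grid,
# flattened to (time, space). Real inputs would be Reanalysis fields.
t, ny, nx = 120, 10, 20
data = rng.normal(size=(t, ny * nx))

# EOFs: spatial patterns (rows of Vt) and principal-component time series,
# obtained from the SVD of the time-centered data matrix.
anom = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(anom, full_matrices=False)
pcs = U * s                     # (time, mode) expansion coefficients
eofs = Vt.reshape(-1, ny, nx)   # (mode, lat, lon) spatial patterns
var_frac = s**2 / np.sum(s**2)  # variance explained per mode

# The leading PCs would then serve as large-scale predictors in a
# GLM or bootstrap regression for local precipitation.
print(pcs.shape, eofs.shape)  # (120, 120) (120, 10, 20)
```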
Ning, D.; Zhang, M.; Ren, S.; Hou, Y.; Yu, L.; Meng, Z.
2017-01-01
Forest plays an important role in the hydrological cycle, and forest changes will inevitably affect runoff across multiple spatial scales. The selection of a suitable indicator for forest changes is essential for predicting forest-related hydrological response. This study used the Meijiang River, one of the headwaters of the Poyang Lake, as an example to identify the best indicator of forest changes for predicting forest change-induced hydrological responses. Correlation analysis was conducted first to detect the relationships between monthly runoff and its predictive variables, including antecedent monthly precipitation and indicators of forest changes (forest coverage and the vegetation indices EVI, NDVI, and NDWI); using the predictive variables most correlated with monthly runoff, multiple linear regression models were then developed. The best-performing model identified in this study included two independent variables: antecedent monthly precipitation and NDWI. This indicates that NDWI is the best indicator of forest change in hydrological prediction, while forest coverage, the most commonly used indicator of forest change, is insignificantly related to monthly runoff. This highlights the use of vegetation indices such as NDWI to indicate forest changes in hydrological studies. This study will provide us with an efficient way to quantify the hydrological impact of large-scale forest changes in the Meijiang River watershed, which is crucial for downstream water resource management and ecological protection in the Poyang Lake basin.
Statistical Analysis by Statistical Physics Model for the STOCK Markets
Wang, Tiansong; Wang, Jun; Fan, Bingli
A new stochastic stock price model of stock markets, based on the contact process of statistical physics systems, is presented in this paper. The contact model is a continuous-time Markov process; one interpretation of this model is as a model for the spread of an infection. Through this model, the statistical properties of the Shanghai Stock Exchange (SSE) and Shenzhen Stock Exchange (SZSE) are studied. In the present paper, the data of the SSE Composite Index and the data of the SZSE Component Index are analyzed, and the corresponding simulations are performed by computer. Further, we investigate the statistical properties, fat-tail phenomena, power-law distributions, and long memory of returns for these indices. The techniques of the skewness-kurtosis test, Kolmogorov-Smirnov test, and R/S analysis are applied to study the fluctuation characteristics of the stock price returns.
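The R/S analysis used to probe long memory can be sketched as follows; the window sizes and series length are arbitrary choices, and the input is white noise rather than actual index returns:

```python
import numpy as np

def rs_hurst(returns: np.ndarray) -> float:
    """Estimate the Hurst exponent by rescaled-range (R/S) analysis:
    fit log(R/S) against log(n) over a range of window sizes n.
    H ~ 0.5 indicates no long memory; H > 0.5 indicates persistence."""
    ns = np.unique(np.logspace(3, 8, 12, base=2).astype(int))
    log_n, log_rs = [], []
    for n in ns:
        k = len(returns) // n
        if k < 1:
            continue
        chunks = returns[: k * n].reshape(k, n)
        # Cumulative deviation from each window's mean.
        dev = np.cumsum(chunks - chunks.mean(axis=1, keepdims=True), axis=1)
        R = dev.max(axis=1) - dev.min(axis=1)   # range per window
        S = chunks.std(axis=1, ddof=1)          # std dev per window
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(R / S)))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return float(slope)

rng = np.random.default_rng(5)
iid = rng.normal(size=4096)   # memoryless benchmark series
print(round(rs_hurst(iid), 2))
```

Small-sample R/S estimates are biased slightly above 0.5 even for memoryless series, which is why careful studies compare against a simulated null rather than the nominal 0.5.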
Visualizing statistical models and concepts
Farebrother, RW
2002-01-01
Examines classic algorithms, geometric diagrams, and mechanical principles for enhancing visualization of statistical estimation procedures and mathematical concepts in physics, engineering, and computer programming.
Statistical Language Model for Chinese Text Proofreading
Institute of Scientific and Technical Information of China (English)
张仰森; 曹元大
2003-01-01
Statistical language modeling techniques are investigated so as to construct a language model for Chinese text proofreading. After the defects of the n-gram model are analyzed, a novel statistical language model for Chinese text proofreading is proposed. This model takes full account of the information located before and after the target word wi, and of the relationship between non-neighboring words wi and wj in the linguistic environment (LE). First, the word association degree between wi and wj, where wj is l words apart from wi in the LE, is defined using a distance-weighted factor; then the Bayes formula is used to calculate the LE related degree of the word wi; and lastly, the LE related degree is taken as the criterion for predicting the reasonability of the word wi appearing in context. Comparing the proposed model with the traditional n-gram in a Chinese text automatic error detection system, the experimental results show that the error detection recall rate and precision rate of the system have been improved.
Lee, Gwang-Se; Cheong, Cheolung
2014-12-01
Despite increasing concern about low-frequency noise of modern large horizontal-axis wind turbines (HAWTs), few studies have focused on its origin or its prediction methods. In this paper, infra- and low-frequency (ILF) wind turbine noise is closely examined and an efficient method is developed for its prediction. Although most previous studies have assumed that the ILF noise consists primarily of blade passing frequency (BPF) noise components, these tonal noise components are seldom identified in the measured noise spectrum, except for the case of downwind wind turbines. In reality, since modern HAWTs are very large, during rotation, a single blade of the turbine experiences inflow with variation in wind speed in time as well as in space, breaking periodic perturbations of the BPF. Consequently, this transforms acoustic contributions at the BPF harmonics into broadband noise components. In this study, the ILF noise of wind turbines is predicted by combining Lowson's acoustic analogy with the stochastic wind model, which is employed to reproduce realistic wind speed conditions. In order to predict the effects of these wind conditions on pressure variation on the blade surface, unsteadiness in the incident wind speed is incorporated into the XFOIL code by varying incident flow velocities on each blade section, which depend on the azimuthal locations of the rotating blade. The calculated surface pressure distribution is subsequently used to predict acoustic pressure at an observing location by using Lowson's analogy. These predictions are compared with measured data, which ensures that the present method can reproduce the broadband characteristics of the measured low-frequency noise spectrum. Further investigations are carried out to characterize the ILF noise in terms of pressure loading on blade surface, narrow-band noise spectrum and noise maps around the turbine.
Statistical modeling of a considering work-piece
Directory of Open Access Journals (Sweden)
Cornelia Victoria Anghel
2008-10-01
This article presents stochastic predictive models for properly controlling the independent variables of the drilling operation, using a combined approach of statistical design and Response Surface Methodology (RSM).
Comparison of Statistical Models for Regional Crop Trial Analysis
Institute of Scientific and Technical Information of China (English)
ZHANG Qun-yuan; KONG Fan-ling
2002-01-01
Based on a review and comparison of the main statistical analysis models for estimating variety-environment cell means in regional crop trials, a new statistical model, the LR-PCA composite model, was proposed, and the predictive precision of these models was compared by cross-validation on example data. Results showed that the order of model precision was LR-PCA model > AMMI model > PCA model > Treatment Means (TM) model > Linear Regression (LR) model > Additive Main Effects ANOVA model. The precision gain factor of the LR-PCA model was 1.55, an increase of 8.4% compared with AMMI.
Statistical tests of simple earthquake cycle models
DeVries, Phoebe M. R.; Evans, Eileen L.
2016-12-01
A central goal of observing and modeling the earthquake cycle is to forecast when a particular fault may generate an earthquake: a fault late in its earthquake cycle may be more likely to generate an earthquake than a fault early in its earthquake cycle. Models that can explain geodetic observations throughout the entire earthquake cycle may be required to gain a more complete understanding of relevant physics and phenomenology. Previous efforts to develop unified earthquake models for strike-slip faults have largely focused on explaining both preseismic and postseismic geodetic observations available across a few faults in California, Turkey, and Tibet. An alternative approach leverages the global distribution of geodetic and geologic slip rate estimates on strike-slip faults worldwide. Here we use the Kolmogorov-Smirnov test for similarity of distributions to infer, in a statistically rigorous manner, viscoelastic earthquake cycle models that are inconsistent with 15 sets of observations across major strike-slip faults. We reject a large subset of two-layer models incorporating Burgers rheologies at a significance level of α = 0.05 (those with long-term Maxwell viscosities ηM ≤ 4.6 × 10²⁰ Pa s) but cannot reject models on the basis of transient Kelvin viscosity ηK. Finally, we examine the implications of these results for the predicted earthquake cycle timing of the 15 faults considered and compare these predictions to the geologic and historical record.
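The distribution-comparison machinery here is the standard two-sample Kolmogorov-Smirnov test; a sketch with synthetic slip-rate samples follows (the fault values are invented, not those of the 15 observation sets):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(6)

# Hypothetical geologic vs. model-predicted slip-rate samples (mm/yr)
# for one strike-slip fault; the paper's observation sets are analogous.
geologic = rng.normal(loc=25.0, scale=3.0, size=40)
model_a = rng.normal(loc=25.5, scale=3.0, size=40)   # near the data
model_b = rng.normal(loc=32.0, scale=3.0, size=40)   # shifted well away

# Two-sample Kolmogorov-Smirnov test for similarity of distributions;
# a model is rejected when p < alpha = 0.05.
for name, sample in [("A", model_a), ("B", model_b)]:
    stat, p = ks_2samp(geologic, sample)
    print(name, "rejected" if p < 0.05 else "not rejected")
```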
ARSENIC CONTAMINATION IN GROUNDWATER: A STATISTICAL MODELING
Directory of Open Access Journals (Sweden)
Palas Roy
2013-01-01
High arsenic concentrations in natural groundwater in most of the tubewells of the Purbasthali Block II area of Burdwan district (W.B., India) have recently become a serious environmental concern. This paper illustrates the statistical modeling of the arsenic-contaminated groundwater to identify the interrelation of the arsenic content with other groundwater parameters, so that the arsenic contamination level can easily be predicted by analyzing only such parameters. Multivariate analysis of groundwater samples collected from the 132 tubewells of this contaminated region shows that three parameters are significantly related to arsenic. Based on these relationships, a multiple linear regression model has been developed that estimates the arsenic contamination by measuring these three predictor parameters of the groundwater in the contaminated aquifer. This model could also serve as a suggestive tool when designing an arsenic removal scheme for any affected groundwater.
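A multiple linear regression of the kind described, arsenic content on three predictor parameters, can be sketched as follows. The data are synthetic and the predictors deliberately generic, since the paper's specific parameters are not listed here:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-in for the 132 tubewell samples: arsenic (ppb) modeled
# from three groundwater parameters (generically named here).
n = 132
params = rng.normal(size=(n, 3))
arsenic = 50 + params @ np.array([12.0, -8.0, 5.0]) + rng.normal(scale=4, size=n)

# Multiple linear regression: arsenic ~ b0 + b1*p1 + b2*p2 + b3*p3.
A = np.column_stack([np.ones(n), params])
coef, *_ = np.linalg.lstsq(A, arsenic, rcond=None)

# Goodness of fit of the fitted surface.
fitted = A @ coef
r2 = 1 - np.sum((arsenic - fitted) ** 2) / np.sum((arsenic - arsenic.mean()) ** 2)
print(np.round(coef, 1), round(r2, 2))
```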
Institute of Scientific and Technical Information of China (English)
张强; 李少远
2006-01-01
A statistic-based benchmark was proposed for performance assessment and monitoring of model predictive control; the benchmark is straightforward and achievable by recording a set of output data only when the control performance is good according to the user's selection. A principal component model was built and an autoregressive moving average filter was identified to monitor the performance; an improved T2 statistic was selected as the performance monitoring index. When performance changes were detected, diagnosis was carried out by model validation using recursive analysis and the generalized likelihood ratio (GLR) method. This distinguishes whether the performance change is due to plant-model mismatch or to the disturbance term. Simulations of a heavy oil fractionator system gave good results. The diagnosis result is helpful for the operator to improve the system performance.
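A minimal version of the benchmark-building and T2 monitoring steps (omitting the ARMA filter and GLR diagnosis stages) might look like this; the process variables are illustrative:

```python
import numpy as np

rng = np.random.default_rng(10)

# Output data recorded during a period of good control performance
# (six illustrative process variables) form the benchmark set.
train = rng.normal(size=(500, 6))
mean, std = train.mean(axis=0), train.std(axis=0)
Z = (train - mean) / std

# Principal component model: retain the leading components of the
# correlation structure of the benchmark data.
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigvals)[::-1][:3]
P, lam = eigvecs[:, order], eigvals[order]

def t2(sample: np.ndarray) -> float:
    """Hotelling T2 of one observation in the retained PC subspace;
    large values signal degraded control performance."""
    scores = (sample - mean) / std @ P
    return float(np.sum(scores**2 / lam))

ok = rng.normal(size=6)           # behaves like the benchmark period
bad = ok + 8.0 * std * P[:, 0]    # shifted along the leading PC
print(round(t2(ok), 1), round(t2(bad), 1))
```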
Statistical predictability in the atmosphere and other dynamical systems
Kleeman, Richard
2007-06-01
Ensemble predictions are an integral part of routine weather and climate prediction because of the sensitivity of such projections to the specification of the initial state. In many discussions it is tacitly assumed that ensembles are equivalent to probability distribution functions (p.d.f.s) of the random variables of interest. In general for vector valued random variables this is not the case (not even approximately) since practical ensembles do not adequately sample the high dimensional state spaces of dynamical systems of practical relevance. In this contribution we place these ideas on a rigorous footing using concepts derived from Bayesian analysis and information theory. In particular we show that ensembles must imply a coarse graining of state space and that this coarse graining implies loss of information relative to the converged p.d.f. To cope with the needed coarse graining in the context of practical applications, we introduce a hierarchy of entropic functionals. These measure the information content of multivariate marginal distributions of increasing order. For fully converged distributions (i.e. p.d.f.s) these functionals form a strictly ordered hierarchy. As one proceeds up the hierarchy with ensembles instead however, increasingly coarser partitions are required by the functionals which implies that the strict ordering of the p.d.f. based functionals breaks down. This breakdown is symptomatic of the necessarily limited sampling by practical ensembles of high dimensional state spaces and is unavoidable for most practical applications. In the second part of the paper the theoretical machinery developed above is applied to the practical problem of mid-latitude weather prediction. We show that the functionals derived in the first part all decline essentially linearly with time and there appears in fact to be a fairly well defined cut off time (roughly 45 days for the model analyzed) beyond which initial condition information is unimportant to
Statistical model semiquantitatively approximates arabinoxylooligosaccharides' structural diversity
DEFF Research Database (Denmark)
Dotsenko, Gleb; Nielsen, Michael Krogsgaard; Lange, Lene
2016-01-01
A statistical model describing the random distribution of substituted xylopyranosyl residues in arabinoxylooligosaccharides is suggested and compared with existing experimental data. The structural diversity of arabinoxylooligosaccharides of various lengths, originating from different arabinoxylans (wheat flour arabinoxylan (arabinose/xylose, A/X = 0.47); grass arabinoxylan (A/X = 0.24); wheat straw arabinoxylan (A/X = 0.15); and hydrothermally pretreated wheat straw arabinoxylan (A/X = 0.05)), is semiquantitatively approximated using the proposed model. The suggested approach can be applied not only for prediction and quantification of the structural diversity of arabinoxylooligosaccharides, but also for estimating the yield and selecting the optimal source of arabinoxylan for production of arabinoxylooligosaccharides with desired structural features.
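If each xylopyranosyl residue is independently substituted with probability equal to the A/X ratio, the random-distribution assumption yields a binomial substitution profile. This per-residue independence reading is our simplification of the model, not necessarily its exact formulation:

```python
from math import comb

def substitution_pmf(n_xyl: int, a_x: float):
    """Probability that an arabinoxylooligosaccharide with n_xyl xylose
    residues carries k arabinose substituents (k = 0..n_xyl), assuming
    each residue is independently substituted with probability A/X."""
    return [comb(n_xyl, k) * a_x**k * (1 - a_x) ** (n_xyl - k)
            for k in range(n_xyl + 1)]

# Wheat flour arabinoxylan, A/X = 0.47, oligosaccharide of six xyloses:
pmf = substitution_pmf(6, 0.47)
print([round(p, 3) for p in pmf])  # peaks at k = 3
print(round(sum(pmf), 6))          # 1.0
```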
Fermi breakup and the statistical multifragmentation model
Energy Technology Data Exchange (ETDEWEB)
Carlson, B.V., E-mail: brett@ita.br [Departamento de Fisica, Instituto Tecnologico de Aeronautica - CTA, 12228-900 Sao Jose dos Campos (Brazil); Donangelo, R. [Instituto de Fisica, Universidade Federal do Rio de Janeiro, Cidade Universitaria, CP 68528, 21941-972, Rio de Janeiro (Brazil); Instituto de Fisica, Facultad de Ingenieria, Universidad de la Republica, Julio Herrera y Reissig 565, 11.300 Montevideo (Uruguay); Souza, S.R. [Instituto de Fisica, Universidade Federal do Rio de Janeiro, Cidade Universitaria, CP 68528, 21941-972, Rio de Janeiro (Brazil); Instituto de Fisica, Universidade Federal do Rio Grande do Sul, Av. Bento Goncalves 9500, CP 15051, 91501-970, Porto Alegre (Brazil); Lynch, W.G.; Steiner, A.W.; Tsang, M.B. [Joint Institute for Nuclear Astrophysics, National Superconducting Cyclotron Laboratory and the Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824 (United States)
2012-02-15
We demonstrate the equivalence of a generalized Fermi breakup model, in which densities of excited states are taken into account, to the microcanonical statistical multifragmentation model used to describe the disintegration of highly excited fragments of nuclear reactions. We argue that such a model better fulfills the hypothesis of statistical equilibrium than the Fermi breakup model generally used to describe statistical disintegration of light mass nuclei.
Statistical Prediction of Heavy Rain in South Korea
Institute of Scientific and Technical Information of China (English)
Anonymous
2005-01-01
This study is aimed at the development of a statistical model for forecasting heavy rain in South Korea. For the 3-hour weather forecast system, the 10 km × 10 km area-mean amounts of rainfall at 6 stations (Seoul, Daejeon, Gangreung, Gwangju, Busan, and Jeju) in South Korea are used, and the corresponding 45 synoptic factors generated by the numerical model are used as potential predictors. Four statistical forecast models (linear regression, logistic regression, neural network, and decision tree) for the occurrence of heavy rain are based on the model output statistics (MOS) method. They are estimated separately from the same training data. Thresholds are applied when forecasting the occurrence of heavy rain because the distribution of the values generated by each model is too skewed. The results of the four models are compared via Heidke skill scores. As a result, the logistic regression model is recommended.
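The Heidke skill score used to rank the four models is computed from a 2×2 contingency table of forecast versus observed events. A minimal numpy sketch (the threshold value and any inputs are illustrative, not taken from the study):

```python
import numpy as np

def heidke_skill_score(forecast, observed):
    """Heidke skill score for binary heavy-rain forecasts.
    forecast, observed: boolean/0-1 arrays (True = heavy-rain event)."""
    f = np.asarray(forecast, bool)
    o = np.asarray(observed, bool)
    a = np.sum(f & o)        # hits
    b = np.sum(f & ~o)       # false alarms
    c = np.sum(~f & o)       # misses
    d = np.sum(~f & ~o)      # correct negatives
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    return 2.0 * (a * d - b * c) / denom

def forecast_from_scores(scores, threshold):
    """Apply a tuned threshold to skewed model outputs (e.g. logistic
    regression probabilities) to produce binary event forecasts."""
    return np.asarray(scores) >= threshold
```

A perfect forecast scores 1, a no-skill forecast 0, which is why a threshold tuned on the skewed score distributions matters before scoring.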
Melanoma risk prediction models
Directory of Open Access Journals (Sweden)
Nikolić Jelena
2014-01-01
Full Text Available Background/Aim. The lack of effective therapy for advanced stages of melanoma emphasizes the importance of preventive measures and screening of the population at risk. Identifying individuals at high risk should allow targeted screening and follow-up involving those who would benefit most. The aim of this study was to identify the most significant factors for melanoma prediction in our population and to create prognostic models for the identification and differentiation of individuals at risk. Methods. This case-control study included 697 participants (341 patients and 356 controls) who underwent an extensive interview and skin examination in order to assess risk factors for melanoma. Pairwise univariate statistical comparison was used for the coarse selection of the most significant risk factors. These factors were fed into logistic regression (LR) and alternating decision tree (ADT) prognostic models that were assessed for their usefulness in identifying patients at risk of developing melanoma. Validation of the LR model was done by the Hosmer-Lemeshow test, whereas the ADT was validated by 10-fold cross-validation. The achieved sensitivity, specificity, accuracy and AUC for both models were calculated. The melanoma risk score (MRS) based on the outcome of the LR model was presented. Results. The LR model showed that the following risk factors were associated with melanoma: sunbeds (OR = 4.018; 95% CI 1.724-9.366 for those that sometimes used sunbeds); solar damage of the skin (OR = 8.274; 95% CI 2.661-25.730 for those with severe solar damage); hair color (OR = 3.222; 95% CI 1.984-5.231 for light brown/blond hair); the number of common naevi (over 100 naevi had OR = 3.57; 95% CI 1.427-8.931); the number of dysplastic naevi (from 1 to 10 dysplastic naevi the OR was 2.672; 95% CI 1.572-4.540; for more than 10 naevi the OR was 6.487; 95% CI 1.993-21.119); Fitzpatrick's phototype; and the presence of congenital naevi. Red hair, phototype I and large congenital naevi were
Madadgar, Shahrbanou; AghaKouchak, Amir; Shukla, Shraddhanand; Wood, Andrew W.; Cheng, Linyin; Hsu, Kou-Lin; Svoboda, Mark
2016-07-01
Improving water management in water-stressed regions requires reliable seasonal precipitation prediction, which remains a grand challenge. Numerous statistical and dynamical model simulations have been developed for predicting precipitation. However, both types of models offer limited seasonal predictability. This study outlines a hybrid statistical-dynamical modeling framework for predicting seasonal precipitation. The dynamical component relies on the physically based North American Multi-Model Ensemble (NMME) model simulations (99 ensemble members). The statistical component relies on a multivariate Bayesian-based model that relates precipitation to atmosphere-ocean teleconnections (also known as an analog-year statistical model). Here the Pacific Decadal Oscillation (PDO), Multivariate ENSO Index (MEI), and Atlantic Multidecadal Oscillation (AMO) are used in the statistical component. The dynamical and statistical predictions are linked using the so-called Expert Advice algorithm, which offers an ensemble response (as an alternative to the ensemble mean) leading to the best precipitation prediction based on the contributing statistical and dynamical ensembles. It combines the strength of physically based dynamical simulations with the capability of an analog-year model. An application of the framework in the southwestern United States, which has suffered from major droughts over the past decade, improves seasonal precipitation predictions (3-5 month lead time) by 5-60% relative to the NMME simulations. Overall, the hybrid framework performs better in predicting negative precipitation anomalies (10-60% improvement over NMME) than positive precipitation anomalies (5-25% improvement over NMME). The results indicate that the framework would likely improve our ability to predict droughts such as the 2012-2014 event in the western United States that resulted in significant socioeconomic impacts.
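The Expert Advice combination step can be sketched as an exponentially weighted average forecaster: each ensemble member (statistical or dynamical) is an "expert" whose weight decays with its past squared error. This is a generic sketch of the algorithm family, with an illustrative learning rate, not the paper's exact configuration.

```python
import numpy as np

def expert_advice(predictions, observations, eta=0.5):
    """Exponentially weighted average forecaster ("Expert Advice").
    predictions: (n_experts, n_times) forecasts from the statistical and
    dynamical ensemble members; observations: (n_times,) verifying values.
    Returns the combined forecast at each time and the final expert weights."""
    n_experts, n_times = predictions.shape
    w = np.ones(n_experts) / n_experts
    combined = np.empty(n_times)
    for t in range(n_times):
        combined[t] = w @ predictions[:, t]          # weighted ensemble response
        loss = (predictions[:, t] - observations[t]) ** 2
        w *= np.exp(-eta * loss)                     # downweight poor experts
        w /= w.sum()
    return combined, w
```

The combined forecast thus adapts toward whichever component (dynamical or analog-year statistical) has recently performed better, which is the sense in which the hybrid offers "an alternative to the ensemble mean".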
Statistical Model of the 3-D Braided Composites Strength
Institute of Scientific and Technical Information of China (English)
XIAO Laiyuan; ZUO Weiwei; CAI Ganwei; LIAO Daoxun
2007-01-01
Based on the statistical model for the tensile statistical strength of unidirectional composite materials and the stress analysis of 3-D braided composites, a new method is proposed to calculate the tensile statistical strength of 3-D braided composites. With this method, the strength of 3-D braided composites can be calculated with high accuracy, and the statistical parameters of 3-D braided composites can be determined. The numerical results show that the tensile statistical strength of 3-D braided composites can be predicted using this method.
Seasonal drought predictability in Portugal using statistical-dynamical techniques
Ribeiro, A. F. S.; Pires, C. A. L.
2016-08-01
Atmospheric forecasting and predictability are important to promote adaptation and mitigation measures in order to minimize drought impacts. This study estimates hybrid (statistical-dynamical) long-range forecasts of the regional drought index SPI (3-months) over homogeneous regions of mainland Portugal, based on forecasts from the UKMO operational forecasting system, with lead times of up to 6 months. ERA-Interim reanalysis data are used for the purpose of building a set of SPI predictors integrating recent past information prior to the forecast launch. Then, the advantage of combining predictors with both dynamical and statistical background in the prediction of drought conditions at different lags is evaluated. A two-step hybridization procedure is performed, in which both forecasted and observed 500 hPa geopotential height fields are subjected to a PCA in order to use forecasted PCs and persistent PCs as predictors. The second hybridization step consists of a statistical/hybrid downscaling to the regional SPI, based on regression techniques, after the pre-selection of the statistically significant predictors. The SPI forecasts and the added value of combining dynamical and statistical methods are evaluated in cross-validation mode, using the R2 and binary event scores. Results are obtained for the four seasons, and it was found that winter is the most predictable season and that most of the predictive power lies in the large-scale fields from past observations. The hybridization improves the downscaling based on the forecasted PCs, since they provide complementary (though modest) information beyond that of the persistent PCs. These findings provide clues about the predictability of the SPI, particularly in Portugal, and may contribute to the predictability of crop yields and to some guidance for users (such as farmers) in their decision-making process.
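The two-step hybridization (PCA of the geopotential fields, then a regression-based downscaling to the regional SPI using both forecasted and persistent PCs) can be sketched with numpy. The synthetic field shapes, number of PCs, and plain least-squares fit are illustrative assumptions; the study uses pre-selection of significant predictors and cross-validated scores.

```python
import numpy as np

def pca(fields, n_pc):
    """Leading principal component time series of anomaly fields
    (rows = times, columns = gridpoints), via SVD."""
    anom = fields - fields.mean(axis=0)
    u, s, vt = np.linalg.svd(anom, full_matrices=False)
    return u[:, :n_pc] * s[:n_pc]

def hybrid_spi_forecast(forecast_z500, observed_z500, spi, n_pc=3):
    """Downscaling sketch: regress the regional SPI on both forecasted PCs
    and persistent (observed) PCs of the 500 hPa geopotential height."""
    X = np.column_stack([pca(forecast_z500, n_pc),
                         pca(observed_z500, n_pc),
                         np.ones(len(spi))])
    coef, *_ = np.linalg.lstsq(X, spi, rcond=None)
    fitted = X @ coef
    ss_res = np.sum((spi - fitted) ** 2)
    ss_tot = np.sum((spi - spi.mean()) ** 2)
    return fitted, 1.0 - ss_res / ss_tot    # fitted SPI and in-sample R^2
```

Including both predictor sets in one design matrix is what lets the persistent PCs supply the large-scale past-observation signal that the study found dominates, while the forecasted PCs add their modest complement.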
Calculation of precise firing statistics in a neural network model
Cho, Myoung Won
2017-08-01
A precise prediction of neural firing dynamics is requisite to understand the function of, and the learning process in, a biological neural network, which works depending on exact spike timings. Fundamentally, the prediction of firing statistics is a delicate many-body problem because the firing probability of a neuron at a given time is determined by the summation over all effects from past firing states. A neural network model with the Feynman path integral formulation was recently introduced. In this paper, we present several methods to calculate firing statistics in the model. We apply the methods to some cases and compare the theoretical predictions with simulation results.
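The history dependence described above (firing probability determined by a sum over past firing states) can be illustrated with a simple simulation that stands in for the full path-integral treatment. The logistic link, exponential history kernel, and two-neuron network are illustrative assumptions, not the paper's model.

```python
import numpy as np

def simulate_firing(weights, bias, kernel, steps, rng):
    """Simulate binary spikes where each neuron's firing probability at time t
    depends on the summed, exponentially decaying effect of past spikes."""
    n = len(bias)
    h = np.zeros(n)                                # accumulated synaptic history
    spikes = np.zeros((steps, n), dtype=int)
    for t in range(steps):
        p = 1.0 / (1.0 + np.exp(-(h + bias)))      # logistic firing probability
        spikes[t] = rng.random(n) < p
        h = kernel * h + weights @ spikes[t]       # decay plus new spike input
    return spikes

rng = np.random.default_rng(1)
W = np.array([[0.0, 0.8], [0.8, 0.0]])             # two mutually exciting neurons
s = simulate_firing(W, bias=np.array([-1.0, -1.0]), kernel=0.7,
                    steps=1000, rng=rng)
rates = s.mean(axis=0)                             # empirical firing statistics
```

Predicting `rates` analytically, rather than by such simulation, is exactly the many-body problem the paper attacks: the probability at each step conditions on the whole spike history.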
Energy Technology Data Exchange (ETDEWEB)
Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au [School of Physics, University of Western Australia, Western Australia 6009, Australia and School of Health Sciences, National University of Malaysia, Bangi 43600 (Malaysia); Ebert, Martin A. [School of Physics, University of Western Australia, Western Australia 6009, Australia and Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008 (Australia); Bulsara, Max [Institute for Health Research, University of Notre Dame, Fremantle, Western Australia 6959 (Australia); House, Michael J. [School of Physics, University of Western Australia, Western Australia 6009 (Australia); Kennedy, Angel [Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008 (Australia); Joseph, David J. [Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008, Australia and School of Surgery, University of Western Australia, Western Australia 6009 (Australia); Denham, James W. [School of Medicine and Public Health, University of Newcastle, New South Wales 2308 (Australia)
2016-05-15
Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG 03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2, and longitudinal) with event rates between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized for endpoints with rare events. Parameter optimization was performed on the training data. The area under the receiver operating characteristic curve (AUROC) was used to compare performance, with the sample size chosen to detect differences of ≥ 0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1, with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC > 0.6, while all haematuria endpoints and longitudinal incontinence models produced AUROC < 0.6. Conclusions
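The AUROC used to compare the strategies can be computed without any ROC-curve construction via the rank-sum (Mann-Whitney) identity. A minimal numpy sketch, independent of the study's data:

```python
import numpy as np

def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney identity:
    the probability that a randomly chosen event outranks a non-event.
    Ties are handled with midranks."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for v in np.unique(scores):         # assign midranks to tied scores
        m = scores == v
        ranks[m] = ranks[m].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

An AUROC of 0.5 is chance-level discrimination, which is why the paper's cut at 0.6 separates the weakly informative endpoints (longitudinal frequency, dysuria grade ≥ 1) from the uninformative ones (haematuria, longitudinal incontinence).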
Individual Differences in Statistical Learning Predict Children's Comprehension of Syntax
Kidd, Evan; Arciuli, Joanne
2016-01-01
Variability in children's language acquisition is likely due to a number of cognitive and social variables. The current study investigated whether individual differences in statistical learning (SL), which has been implicated in language acquisition, independently predicted 6- to 8-year-olds' comprehension of syntax. Sixty-eight (N = 68)…
Manning, Robert M.
1991-01-01
The dynamic and composite nature of propagation impairments that are incurred on Earth-space communications links at frequencies in and above 30/20 GHz Ka band, i.e., rain attenuation, cloud and/or clear air scintillation, etc., combined with the need to counter such degradations after the small link margins have been exceeded, necessitate the use of dynamic statistical identification and prediction processing of the fading signal in order to optimally estimate and predict the levels of each of the deleterious attenuation components. Such requirements are being met in NASA's Advanced Communications Technology Satellite (ACTS) Project by the implementation of optimal processing schemes derived through the use of the Rain Attenuation Prediction Model and nonlinear Markov filtering theory.
Madelung rule violation statistics and superheavy elements electron shell prediction
Loza, E
2012-01-01
The paper presents a tetrahedral periodic table that conveniently includes superheavy elements. Madelung rule violation statistics are discussed, and a model for calculating the probability of Madelung rule violation is proposed. On this basis, the probable electron shell structure of superheavy elements is determined.
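The Madelung rule that the paper tests orders subshell filling by n + l, with ties broken by smaller n. A short sketch generating that canonical order (the rule itself, not the paper's violation model):

```python
def madelung_order(n_max=7):
    """Orbital filling order under the Madelung (n + l) rule:
    subshells sorted by n + l, ties broken by smaller n."""
    letters = "spdfghik"     # standard l-value letters (j is skipped)
    subshells = [(n, l) for n in range(1, n_max + 1) for l in range(n)]
    subshells.sort(key=lambda nl: (nl[0] + nl[1], nl[0]))
    return [f"{n}{letters[l]}" for n, l in subshells]

order = madelung_order()
# the familiar sequence begins 1s, 2s, 2p, 3s, 3p, 4s, 3d, 4p, 5s, ...
```

Violations of this rule in observed ground-state configurations (e.g. among transition metals) are the statistics the paper models in order to extrapolate to superheavy elements.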
A new learning statistic for adaptive filter based on predicted residuals
Institute of Scientific and Technical Information of China (English)
YANG Yuanxi; GAO Weiguang
2006-01-01
A key problem for an adaptive filter is to establish a suitable adaptive factor for balancing the contributions of the measurements and the state information predicted from some kinematic model. A reasonable adaptive factor needs a reliable learning statistic to judge the kinematic model errors of the state. After analyzing the two existing kinds of learning statistics, based on the state discrepancy and on the variance component ratio, a new learning statistic based on predicted residuals is set up, which differs from the existing learning statistics. The new learning statistic does not need to estimate the kinematic state parameters before the filtering process; consequently, it does not require sufficient measurements to estimate the state parameters at every observation epoch. The new learning statistic can be applied together with the learning factor constructed from the state discrepancy. The advantages and shortcomings of the new learning statistic are analyzed, and an example is given.
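A sketch of the general idea: normalise the predicted residuals v = z − H·x_pred by their predicted covariance to form a learning statistic, then map it through a two-segment weighting function. The specific two-segment form and the constants c0, c1 below are a common choice in the adaptive-filtering literature, used here as an assumption; the paper's exact statistic may differ.

```python
import numpy as np

def residual_statistic(v, H, P_pred, R):
    """Learning statistic built from predicted residuals v = z - H @ x_pred,
    normalised by their predicted covariance (no prior state estimate needed)."""
    Qv = H @ P_pred @ H.T + R
    m = len(v)
    return float(np.sqrt(v @ np.linalg.solve(Qv, v) / m))

def adaptive_factor(delta, c0=1.0, c1=3.0):
    """A common two-segment weighting: trust the kinematic model fully while
    the statistic is small, downweight it smoothly as the residuals grow."""
    d = abs(delta)
    if d <= c0:
        return 1.0
    if d <= c1:
        return (c0 / d) * ((c1 - d) / (c1 - c0)) ** 2
    return 0.0
```

Because the statistic is built purely from the innovation and its predicted covariance, no state estimate from the current measurements is required, which is the advantage the abstract claims over the state-discrepancy statistic.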
Statistical modelling of fish stocks
DEFF Research Database (Denmark)
Kvist, Trine
1999-01-01
for modelling the dynamics of a fish population is suggested. A new approach is introduced to analyse the sources of variation in age composition data, which is one of the most important sources of information in the cohort-based models for estimation of stock abundances and mortalities. The approach combines...... and it is argued that an approach utilising stochastic differential equations might be advantageous in fish stock assessments....
Statistical modelling for ship propulsion efficiency
DEFF Research Database (Denmark)
Petersen, Jóan Petur; Jacobsen, Daniel J.; Winther, Ole
2012-01-01
This paper presents a state-of-the-art systems approach to statistical modelling of fuel efficiency in ship propulsion, and also a novel and publicly available data set of high quality sensory data. Two statistical model approaches are investigated and compared: artificial neural networks...
An Order Statistics Approach to the Halo Model for Galaxies
Paul, Niladri; Paranjape, Aseem; Sheth, Ravi K.
2017-01-01
We use the Halo Model to explore the implications of assuming that galaxy luminosities in groups are randomly drawn from an underlying luminosity function. We show that even the simplest of such order statistics models - one in which this luminosity function p(L) is universal - naturally produces a number of features associated with previous analyses based on the `central plus Poisson satellites' hypothesis. These include the monotonic relation of mean central luminosity with halo mass, the Lognormal distribution around this mean, and the tight relation between the central and satellite mass scales. In stark contrast to observations of galaxy clustering, however, this model predicts no luminosity dependence of large scale clustering. We then show that an extended version of this model, based on the order statistics of a halo mass dependent luminosity function p(L|m), is in much better agreement with the clustering data as well as satellite luminosities, but systematically under-predicts central luminosities. This brings into focus the idea that central galaxies constitute a distinct population that is affected by different physical processes than are the satellites. We model this physical difference as a statistical brightening of the central luminosities, over and above the order statistics prediction. The magnitude gap between the brightest and second brightest group galaxy is predicted as a by-product, and is also in good agreement with observations. We propose that this order statistics framework provides a useful language in which to compare the Halo Model for galaxies with more physically motivated galaxy formation models.
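The simplest version of the order-statistics hypothesis is easy to simulate: draw group-galaxy luminosities from a universal p(L) and identify the brightest draw as the central; the magnitude gap to the second-brightest comes out as a by-product. The Schechter-like gamma form for p(L), the group richness, and all parameter values below are illustrative assumptions.

```python
import numpy as np

def sample_group(n_gal, rng, alpha=1.0):
    """Draw n_gal luminosities from a toy universal p(L) (a gamma distribution
    standing in for a Schechter-like form) and identify the brightest as the
    central, per the order-statistics hypothesis."""
    lum = rng.gamma(shape=alpha, scale=1.0, size=n_gal)
    lum.sort()
    central, second = lum[-1], lum[-2]
    gap = 2.5 * np.log10(central / second)   # magnitude gap statistic
    return central, gap

rng = np.random.default_rng(2)
draws = [sample_group(20, rng) for _ in range(2000)]
centrals, gaps = map(np.array, zip(*draws))
# the mean central luminosity grows with richness n_gal, a generic
# order-statistics effect that underlies the central-mass relation
```

Replacing the universal p(L) with a halo-mass-dependent p(L|m), and adding a statistical brightening of the centrals, are exactly the two extensions the paper introduces to match clustering and central luminosities.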
Statistical Models and Methods for Lifetime Data
Lawless, Jerald F
2011-01-01
Praise for the First Edition: "An indispensable addition to any serious collection on lifetime data analysis and . . . a valuable contribution to the statistical literature. Highly recommended . . ." -Choice. "This is an important book, which will appeal to statisticians working on survival analysis problems." -Biometrics. "A thorough, unified treatment of statistical models and methods used in the analysis of lifetime data . . . this is a highly competent and agreeable statistical textbook." -Statistics in Medicine. The statistical analysis of lifetime or response time data is a key tool in engineering,
A statistical model for the excitation of cavities through apertures
Gradoni, Gabriele; Anlage, Steven M; Ott, Edward
2015-01-01
In this paper, a statistical model for the coupling of electromagnetic radiation into enclosures through apertures is presented. The model gives a unified picture bridging deterministic theories of aperture radiation and the statistical models necessary for capturing the properties of irregularly shaped enclosures. A Monte Carlo technique based on random matrix theory is used to predict and study the power transmitted through the aperture into the enclosure. Universal behavior of the net power entering the aperture is found. The results are of interest for predicting the coupling of external radiation through openings in irregular enclosures and reverberation chambers.
Statistical learning for predictive targeting in online advertising
DEFF Research Database (Denmark)
Fruergaard, Bjarne Ørum
The particular use-case used as a benchmark for our results is click-through rate prediction. In this task one aims to predict the probability that a user will click on an advertisement, based on attributes of the user, the advertisement, the context, and other signals, such as time. This has its main...... probabilities of clicks, which is a methodological extension of the current model in production at Adform. Our findings confirm that latent features can increase predictive performance in the setup of click-through rate prediction. They also reveal a tedious process for tuning the model for optimal performance......
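The baseline for such click-through rate (CTR) prediction is a logistic regression over user, ad, and context attributes; the thesis's latent-feature extension would add learned embeddings as extra columns. A minimal numpy sketch on synthetic data (the feature construction, learning rate, and data-generating weights are illustrative assumptions, not Adform's production model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ctr_model(X, y, lr=0.1, epochs=200, l2=1e-3):
    """L2-regularised logistic regression fit by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y) + l2 * w
        w -= lr * grad
    return w

# toy data: clicks depend on one user attribute; the second column is noise
rng = np.random.default_rng(3)
X = np.column_stack([rng.standard_normal(1000),
                     rng.standard_normal(1000),
                     np.ones(1000)])                 # intercept column
y = (rng.random(1000) < sigmoid(2 * X[:, 0] - 1)).astype(float)
w = train_ctr_model(X, y)
ctr = sigmoid(X @ w)        # predicted click probabilities
```

In the latent-feature setup, columns of X would include per-user and per-ad embedding products learned jointly with w, which is the source of the performance gain the findings report.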
Fast Quantum Algorithm for Predicting Descriptive Statistics of Stochastic Processes
Williams Colin P.
1999-01-01
Stochastic processes are used as a modeling tool in several sub-fields of physics, biology, and finance. Analytic understanding of the long-term behavior of such processes is only tractable for very simple types of stochastic processes, such as Markovian processes. However, in real-world applications more complex stochastic processes often arise. In physics, the complicating factor might be nonlinearities; in biology it might be memory effects; and in finance it might be the non-random intentional behavior of participants in a market. In the absence of analytic insight, one is forced to understand these more complex stochastic processes via numerical simulation techniques. In this paper we present a quantum algorithm for performing such simulations. In particular, we show how a quantum algorithm can predict arbitrary descriptive statistics (moments) of N-step stochastic processes in just O(√N) time. That is, the quantum complexity is the square root of the classical complexity for performing such simulations. This is a significant speedup in comparison to the current state of the art.
IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics.
Hoyt, Robert Eugene; Snider, Dallas; Thompson, Carla; Mantravadi, Sarita
2016-10-11
We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix
Integrated statistical modelling of spatial landslide probability
Mergili, M.; Chu, H.-J.
2015-09-01
Statistical methods are commonly employed to estimate spatial probabilities of landslide release at the catchment or regional scale. Travel distances and impact areas are often computed by means of conceptual mass point models. The present work introduces a fully automated procedure extending and combining both concepts to compute an integrated spatial landslide probability: (i) the landslide inventory is subset into release and deposition zones. (ii) We employ a simple statistical approach to estimate the pixel-based landslide release probability. (iii) We use the cumulative probability density function of the angle of reach of the observed landslide pixels to assign an impact probability to each pixel. (iv) We introduce the zonal probability, i.e. the spatial probability that at least one landslide pixel occurs within a zone of defined size. We quantify this relationship by a set of empirical curves. (v) The integrated spatial landslide probability is defined as the maximum of the release probability and the product of the impact probability and the zonal release probability relevant for each pixel. We demonstrate the approach with a 637 km² study area in southern Taiwan, using an inventory of 1399 landslides triggered by Typhoon Morakot in 2009. We observe that (i) the average integrated spatial landslide probability over the entire study area corresponds reasonably well to the fraction of the observed landslide area; (ii) the model performs moderately well in predicting the observed spatial landslide distribution; (iii) the size of the release zone (or any other zone of spatial aggregation) influences the integrated spatial landslide probability to a much higher degree than the pixel-based release probability; (iv) removing the largest landslides from the analysis leads to an enhanced model performance.
Statistical Modeling of Bivariate Data.
1982-08-01
Keywords: joint density-quantile function, dependence-density, non-parametric bivariate density estimation, entropy, exponential model. Estimation by autoregressive or exponential model estimators with maximum entropy properties is investigated in this thesis. The results provide important and useful procedures for nonparametric bivariate density estimation. The thesis discusses estimators of the entropy H(d) of the dependence-density.
Melanoma Risk Prediction Models
Developing statistical models that estimate the probability of developing melanoma cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.
Improving the statistical reliability of stream heat assimilation prediction. Final report
Energy Technology Data Exchange (ETDEWEB)
McLay, R.W.; Hundal, M.S.; Lamborn, K.R.
1975-06-01
In response to an increased interest in water quality by the public, a large effort has been mounted to develop mathematical models for predicting heat assimilation in bodies of water. The accuracy of these models has recently come under scrutiny due to the need for temperature predictions within 1°C of the ambient temperature. This work is an evaluation of existing one-dimensional stream temperature prediction techniques for accuracy and precision. The approach is through error estimates on a general model that encompasses all of the models presently used. A sensitivity analysis of this general model is used in conjunction with statistical methods to determine the solution errors. (GRA)
Directory of Open Access Journals (Sweden)
S. M. P. McKenna-Lawlor
2012-02-01
Full Text Available The performance of the Hakamada-Akasofu-Fry version 2 (HAFv.2) numerical model, which provides predictions of solar shock arrival times at Earth, was subjected to a statistical study to investigate those solar/interplanetary circumstances under which the model performed well or poorly during key phases (rise/maximum/decay) of solar cycle 23. In addition to analyzing elements of the overall data set (584 selected events) associated with particular cycle phases, subsets were formed such that the events making up a particular subset showed common characteristics. The statistical significance of the results obtained using the various sets/subsets was generally very low, and these results were not significant as compared with the hit-by-chance rate (50%). This implies a low level of confidence in the predictions of the model, with no compelling result encouraging its use. However, the data suggested that the success rates of HAFv.2 were higher when the background solar wind speed at the time of shock initiation was relatively fast. Thus, in scenarios where the background solar wind speed is elevated and the calculated success rate significantly exceeds the rate by chance, the forecasts could provide potential value to the customer. With the composite statistics available for solar cycle 23, the calculated success rate at high solar wind speed, although clearly above 50%, was indicative rather than conclusive. The RMS error estimated for shock arrival times for every cycle phase and for the composite sample was in each case significantly better than would be expected for a random data set. Also, the parameter "Probability of Detection, yes" (PODy), which represents the proportion of Yes observations that were correctly forecast (i.e. the ratio between the shocks correctly predicted and all the shocks observed), yielded values for the rise/maximum/decay phases of the cycle and for the composite sample of 0.85, 0.64, 0.79 and 0.77, respectively. The statistical
On Extrapolating Past the Range of Observed Data When Making Statistical Predictions in Ecology.
Directory of Open Access Journals (Sweden)
Paul B Conn
Full Text Available Ecologists are increasingly using statistical models to predict animal abundance and occurrence in unsampled locations. The reliability of such predictions depends on a number of factors, including sample size, how far prediction locations are from the observed data, and the similarity of predictive covariates in locations where data are gathered to locations where predictions are desired. In this paper, we propose extending Cook's notion of an independent variable hull (IVH), developed originally for application with linear regression models, to generalized regression models as a way to help assess the potential reliability of predictions in unsampled areas. Predictions occurring inside the generalized independent variable hull (gIVH) can be regarded as interpolations, while predictions occurring outside the gIVH can be regarded as extrapolations worthy of additional investigation or skepticism. We conduct a simulation study to demonstrate the usefulness of this metric for limiting the scope of spatial inference when conducting model-based abundance estimation from survey counts. In this case, limiting inference to the gIVH substantially reduces bias, especially when survey designs are spatially imbalanced. We also demonstrate the utility of the gIVH in diagnosing problematic extrapolations when estimating the relative abundance of ribbon seals in the Bering Sea as a function of predictive covariates. We suggest that ecologists routinely use diagnostics such as the gIVH to help gauge the reliability of predictions from statistical models (such as generalized linear, generalized additive, and spatio-temporal regression models).
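For the original linear-regression case, Cook's IVH has a simple closed form: a prediction point lies inside the hull if its "leverage" x₀'(X'X)⁻¹x₀ does not exceed the maximum leverage of the design points. A minimal numpy sketch of that check (the gIVH of the paper generalizes this to the prediction variance of generalized models; the design below is illustrative):

```python
import numpy as np

def leverage(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X'."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return np.einsum('ij,jk,ik->i', X, XtX_inv, X)

def in_ivh(X, X_new):
    """Cook's independent variable hull test for a linear model: a new point
    is an interpolation if its leverage does not exceed the maximum leverage
    of the observed design points."""
    h_max = leverage(X).max()
    XtX_inv = np.linalg.inv(X.T @ X)
    h_new = np.einsum('ij,jk,ik->i', X_new, XtX_inv, X_new)
    return h_new <= h_max
```

Points flagged False are the extrapolations that, per the abstract, deserve additional investigation or skepticism before their predictions are trusted.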
2016-05-31
Final Report: Technical Topic 3.2.2.d, Bayesian and Non-parametric Statistics: Integration of Neural ... (reporting period 15-Apr-2014 to 14-Jan-2015; report date 31-05-2016; distribution unlimited).
Modelling diversity in building occupant behaviour: a novel statistical approach
DEFF Research Database (Denmark)
Haldi, Frédéric; Calì, Davide; Andersen, Rune Korsholm
2016-01-01
We propose an advanced modelling framework to predict the scope and effects of behavioural diversity regarding building occupant actions on window openings, shading devices and lighting. We develop a statistical approach based on generalised linear mixed models to account for the longitudinal nat...
Uncertainty the soul of modeling, probability & statistics
Briggs, William
2016-01-01
This book presents a philosophical approach to probability and probabilistic thinking, considering the underpinnings of probabilistic reasoning and modeling, which effectively underlie everything in data science. The ultimate goal is to call into question many standard tenets and lay the philosophical and probabilistic groundwork and infrastructure for statistical modeling. It is the first book devoted to the philosophy of data aimed at working scientists and calls for a new consideration in the practice of probability and statistics to eliminate what has been referred to as the "Cult of Statistical Significance". The book explains the philosophy of these ideas and not the mathematics, though there are a handful of mathematical examples. The topics are logically laid out, starting with basic philosophy as related to probability, statistics, and science, and stepping through the key probabilistic ideas and concepts, and ending with statistical models. Its jargon-free approach asserts that standard methods, suc...
Statistical Model-Based Face Pose Estimation
Institute of Scientific and Technical Information of China (English)
GE Xinliang; YANG Jie; LI Feng; WANG Huahua
2007-01-01
A robust face pose estimation approach is proposed in which the face shape is described by a statistical model and the pose parameters are represented by trigonometric functions. The face shape statistical model is first built by analyzing face shapes from different people under varying poses; shape alignment is vital in building this statistical model. Then, six trigonometric functions are employed to represent the face pose parameters. Lastly, a mapping function between face image and face pose is constructed by linearly relating the different parameters. The proposed approach is able to estimate different face poses using only a few face training samples. Experimental results are provided to demonstrate its efficiency and accuracy.
ZERODUR strength modeling with Weibull statistical distributions
Hartmann, Peter
2016-07-01
The decisive influence on the breakage strength of brittle materials such as the low-expansion glass ceramic ZERODUR is the surface condition. For polished or etched surfaces it is essential whether micro cracks are present and how deep they are. Ground surfaces have many micro cracks caused by the generation process; here only the depths of the micro cracks are relevant. In any case, the presence and depths of micro cracks are statistical by nature. The Weibull distribution is the model used traditionally for the representation of such data sets. It is based on the weakest-link ansatz. The use of the two- or three-parameter Weibull distribution for data representation and reliability prediction depends on the underlying crack generation mechanisms. Before choosing the model for a specific evaluation, some checks should be done. Is there only one mechanism present, or is it to be expected that an additional mechanism might contribute deviating results? For ground surfaces the main mechanism is the diamond grains' action on the surface. However, grains breaking from their bonding might be moved by the tool across the surface, introducing a slightly deeper crack. It is not to be expected that these scratches follow the same statistical distribution as the grinding process; hence, their description with the same distribution parameters is not adequate. Before including them, a dedicated discussion should be performed. If there is additional information available influencing the selection of the model, for example the existence of a maximum crack depth, this should also be taken into account. Micro cracks introduced by small diamond grains on tools working with limited forces cannot be arbitrarily deep. For data obtained with such surfaces, the existence of a threshold breakage stress should be part of the hypothesis. This leads to the use of the three-parameter Weibull distribution. A differentiation based on the data set alone without preexisting information is possible but requires a
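The two- versus three-parameter choice can be sketched with a standard maximum-likelihood fit; the strength sample below is synthetic, and `scipy.stats.weibull_min` is one common implementation, with its `loc` parameter playing the role of the threshold stress:

```python
from scipy import stats

# Hypothetical breakage-stress sample (MPa): Weibull with a 40 MPa threshold.
stresses = stats.weibull_min.rvs(c=5.0, loc=40.0, scale=60.0,
                                 size=200, random_state=1)

# Two-parameter fit: threshold (location) fixed at zero.
c2, loc2, scale2 = stats.weibull_min.fit(stresses, floc=0)

# Three-parameter fit: threshold stress estimated from the data.
c3, loc3, scale3 = stats.weibull_min.fit(stresses)

print(f"2-par: shape={c2:.2f}, scale={scale2:.1f}")
print(f"3-par: shape={c3:.2f}, threshold={loc3:.1f}, scale={scale3:.1f}")
```

In practice the fits would be compared via likelihood or goodness-of-fit criteria, together with the physical argument for a threshold made in the abstract.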
Accelerated life models modeling and statistical analysis
Bagdonavicius, Vilijandas
2001-01-01
Failure Time Distributions: Introduction; Parametric Classes of Failure Time Distributions. Accelerated Life Models: Introduction; Generalized Sedyakin's Model; Accelerated Failure Time Model; Proportional Hazards Model; Generalized Proportional Hazards Models; Generalized Additive and Additive-Multiplicative Hazards Models; Changing Shape and Scale Models; Generalizations; Models Including Switch-Up and Cycling Effects; Heredity Hypothesis; Summary. Accelerated Degradation Models: Introduction; Degradation Models; Modeling the Influence of Explanatory Varia...
Multistructure Statistical Model Applied To Factor Analysis
Bentler, Peter M.
1976-01-01
A general statistical model for the multivariate analysis of mean and covariance structures is described. Matrix calculus is used to develop the statistical aspects of one new special case in detail. This special case separates the confounding of principal components and factor analysis. (DEP)
Topology for statistical modeling of petascale data.
Energy Technology Data Exchange (ETDEWEB)
Pascucci, Valerio (University of Utah, Salt Lake City, UT); Mascarenhas, Ajith Arthur; Rusek, Korben (Texas A& M University, College Station, TX); Bennett, Janine Camille; Levine, Joshua (University of Utah, Salt Lake City, UT); Pebay, Philippe Pierre; Gyulassy, Attila (University of Utah, Salt Lake City, UT); Thompson, David C.; Rojas, Joseph Maurice (Texas A& M University, College Station, TX)
2011-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled 'Topology for Statistical Modeling of Petascale Data', funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program. Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is thus to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, our approach is based on the complementary techniques of combinatorial topology and statistical modeling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modeling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. This document summarizes the technical advances we have made to date that were made possible in whole or in part by MAPD funding. These technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modeling, and (3) new integrated topological and statistical methods.
Mixed deterministic statistical modelling of regional ozone air pollution
Kalenderski, Stoitchko Dimitrov
2011-03-17
We develop a physically motivated statistical model for regional ozone air pollution by separating the ground-level pollutant concentration field into three components, namely: transport, local production and large-scale mean trend mostly dominated by emission rates. The model is novel in the field of environmental spatial statistics in that it is a combined deterministic-statistical model, which gives a new perspective to the modelling of air pollution. The model is presented in a Bayesian hierarchical formalism, and explicitly accounts for advection of pollutants, using the advection equation. We apply the model to a specific case of regional ozone pollution: the Lower Fraser valley of British Columbia, Canada. As a predictive tool, we demonstrate that the model vastly outperforms existing, simpler modelling approaches. Our study highlights the importance of simultaneously considering different aspects of an air pollution problem as well as taking into account the physical bases that govern the processes of interest. © 2011 John Wiley & Sons, Ltd.
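The deterministic transport component that such a combined model builds on, advection of a pollutant field by wind, can be sketched with a minimal 1-D first-order upwind step (the grid, wind speed and initial plume below are illustrative, not taken from the paper):

```python
import numpy as np

nx, dx, dt, u = 100, 1.0, 0.5, 1.0        # u > 0: wind blowing in +x
c = np.exp(-0.5 * ((np.arange(nx) - 20) / 5.0) ** 2)   # initial plume at cell 20

def upwind_step(c, u, dx, dt):
    """One first-order upwind step of dc/dt + u dc/dx = 0 (stable for u*dt/dx <= 1)."""
    cn = c.copy()
    cn[1:] -= u * dt / dx * (c[1:] - c[:-1])
    return cn

for _ in range(40):
    c = upwind_step(c, u, dx, dt)

# After 40 steps the plume centre has been advected u*dt*40 = 20 cells downstream.
print(f"plume centre near cell {np.argmax(c)}")
```

In the paper this deterministic operator sits inside a Bayesian hierarchical model; the sketch only shows the physics half of that combination.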
MODEL PREDICTIVE CONTROL FUNDAMENTALS
African Journals Online (AJOL)
2012-07-02
... paper, we will present an introduction to the theory and application of MPC with Matlab codes written to ... model predictive control, linear systems, discrete-time systems, ... and then compute very rapidly for this open-loop con-
Semantic Importance Sampling for Statistical Model Checking
2015-01-16
approach called Statistical Model Checking (SMC) [16], which relies on Monte-Carlo-based simulations to solve this verification task more scalably ... Statistical model checking (SMC) is a prominent approach for rigorous analysis of stochastic systems using Monte-Carlo simulations. In this ... Monte-Carlo simulations, for computing the bounded probability that a specific event occurs during a stochastic system's execution. Estimating the
Probabilistic Quantitative Precipitation Forecasting Using Ensemble Model Output Statistics
Scheuerer, Michael
2013-01-01
Statistical post-processing of dynamical forecast ensembles is an essential component of weather forecasting. In this article, we present a post-processing method that generates full predictive probability distributions for precipitation accumulations based on ensemble model output statistics (EMOS). We model precipitation amounts by a generalized extreme value distribution that is left-censored at zero. This distribution permits modelling precipitation on the original scale without prior transformation of the data. A closed form expression for its continuous rank probability score can be derived and permits computationally efficient model fitting. We discuss an extension of our approach that incorporates further statistics characterizing the spatial variability of precipitation amounts in the vicinity of the location of interest. The proposed EMOS method is applied to daily 18-h forecasts of 6-h accumulated precipitation over Germany in 2011 using the COSMO-DE ensemble prediction system operated by the Germa...
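The left-censored GEV idea can be sketched as follows. The EMOS link from ensemble statistics to distribution parameters is shown with made-up coefficients (in practice they are estimated by minimising the CRPS over training data); note that SciPy's `genextreme` uses the shape convention c = -ξ:

```python
import numpy as np
from scipy import stats

# Hypothetical ensemble of precipitation forecasts (mm) for one location.
ens = np.array([0.0, 0.4, 1.2, 2.5, 0.8])

# Illustrative EMOS regressions: location from the ensemble mean, scale from the spread.
mu = 0.2 + 0.9 * ens.mean()
sigma = 0.5 + 0.3 * ens.std()
xi = 0.1                                  # GEV shape parameter

gev = stats.genextreme(c=-xi, loc=mu, scale=sigma)   # SciPy: c = -xi

# Left-censoring at zero puts a point mass P(precip = 0) = F(0) on "no rain".
p_dry = gev.cdf(0.0)
print(f"P(no precipitation)     = {p_dry:.3f}")

# For y > 0 the predictive CDF is simply the GEV CDF, on the original scale.
print(f"P(accumulation <= 2 mm) = {gev.cdf(2.0):.3f}")
```

The censoring is what lets a continuous distribution assign positive probability to exactly-zero accumulations without transforming the data.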
Statistical prediction of biomethane potentials based on the composition of lignocellulosic biomass
DEFF Research Database (Denmark)
Thomsen, Sune Tjalfe; Spliid, Henrik; Østergård, Hanne
2014-01-01
Mixture models are introduced as a new and stronger methodology for statistical prediction of biomethane potentials (BMP) from lignocellulosic biomass than the linear regression models previously used. A large dataset from the literature, combined with our own data, was analysed using canonical...
Infinite Random Graphs as Statistical Mechanical Models
DEFF Research Database (Denmark)
Durhuus, Bergfinnur Jøgvan; Napolitano, George Maria
2011-01-01
We discuss two examples of infinite random graphs obtained as limits of finite statistical mechanical systems: a model of two-dimensional discretized quantum gravity defined in terms of causal triangulated surfaces, and the Ising model on generic random trees. For the former model we describe...
Probability and Statistics in Sensor Performance Modeling
2010-12-01
Acoustic or electromagnetic waves are scattered by both objects and turbulent wind. A version of the Rice-Nakagami model (specifically with a ... Gaussian, lognormal, exponential, gamma, and the transformed Rice-Nakagami, as well as a discrete model. (Other examples of statistical models
Statistical physics of pairwise probability models
DEFF Research Database (Denmark)
Roudi, Yasser; Aurell, Erik; Hertz, John
2009-01-01
Statistical models for describing the probability distribution over the states of biological systems are commonly used for dimensional reduction. Among these models, pairwise models are very attractive in part because they can be fit using a reasonable amount of data...
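A pairwise model of the kind discussed assigns each binary state s a probability proportional to exp(h·s + (1/2) s'Js); for a small system the partition function can be computed by exhaustive enumeration. A toy sketch with made-up parameters:

```python
import itertools
import numpy as np

# Pairwise (Ising-type) model over three binary units; parameters are illustrative.
h = np.array([0.2, -0.1, 0.0])
J = np.array([[0.0,  0.5,  0.0],
              [0.5,  0.0, -0.3],
              [0.0, -0.3,  0.0]])       # symmetric couplings, zero diagonal

def unnorm(s):
    """Unnormalised probability exp(h.s + 0.5 * s'Js) of a state s in {-1, +1}^3."""
    s = np.asarray(s)
    return np.exp(h @ s + 0.5 * s @ J @ s)

states = list(itertools.product([-1, 1], repeat=3))
Z = sum(unnorm(s) for s in states)      # partition function by enumeration
probs = {s: unnorm(s) / Z for s in states}
print(f"P(+1,+1,+1) = {probs[(1, 1, 1)]:.4f}")
```

For realistic system sizes the enumeration of 2^n states is infeasible, which is where the statistical-physics fitting methods the abstract alludes to come in.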
Matrix Tricks for Linear Statistical Models
Puntanen, Simo; Styan, George PH
2011-01-01
In teaching linear statistical models to first-year graduate students or to final-year undergraduate students there is no way to proceed smoothly without matrices and related concepts of linear algebra; their use is really essential. Our experience is that making some particular matrix tricks very familiar to students can substantially increase their insight into linear statistical models (and also multivariate statistical analysis). In matrix algebra, there are handy, sometimes even very simple "tricks" which simplify and clarify the treatment of a problem - both for the student and
Distributions with given marginals and statistical modelling
Fortiana, Josep; Rodriguez-Lallena, José
2002-01-01
This book contains a selection of the papers presented at the meeting `Distributions with given marginals and statistical modelling', held in Barcelona (Spain), July 17-20, 2000. In 24 chapters, this book covers topics such as the theory of copulas and quasi-copulas, the theory and compatibility of distributions, models for survival distributions and other well-known distributions, time series, categorical models, definition and estimation of measures of dependence, monotonicity and stochastic ordering, shape and separability of distributions, hidden truncation models, diagonal families, orthogonal expansions, tests of independence, and goodness of fit assessment. These topics share the use and properties of distributions with given marginals, this being the fourth specialised text on this theme. The innovative aspect of the book is the inclusion of statistical aspects such as modelling, Bayesian statistics, estimation, and tests.
Probably not future prediction using probability and statistical inference
Dworsky, Lawrence N
2008-01-01
An engaging, entertaining, and informative introduction to probability and prediction in our everyday lives Although Probably Not deals with probability and statistics, it is not heavily mathematical and is not filled with complex derivations, proofs, and theoretical problem sets. This book unveils the world of statistics through questions such as what is known based upon the information at hand and what can be expected to happen. While learning essential concepts including "the confidence factor" and "random walks," readers will be entertained and intrigued as they move from chapter to chapter. Moreover, the author provides a foundation of basic principles to guide decision making in almost all facets of life including playing games, developing winning business strategies, and managing personal finances. Much of the book is organized around easy-to-follow examples that address common, everyday issues such as: How travel time is affected by congestion, driving speed, and traffic lights Why different gambling ...
The estimation of yearly probability gain for seismic statistical model
Institute of Scientific and Technical Information of China (English)
[No author listed]
2000-01-01
Based on the method for calculating information gain in stochastic processes presented by Vere-Jones, the relation between information gain and probability gain, which is very common in earthquake prediction, is studied, and the yearly probability gain for seismic statistical models is proposed. The method is applied to a non-stationary Poisson model with whole-process exponential rate increase and to the stress release model. In addition, a prediction method for the stress release model is obtained based on inverse-function simulation of random variables.
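For a Poisson occurrence model, the yearly probability gain of a non-stationary model over a stationary reference follows directly from the integrated rates. A minimal sketch; the exponential-increase form echoes the model mentioned above, but the rate values are invented:

```python
import numpy as np

lam0 = 0.2          # background yearly event rate (events/yr), hypothetical
a = 0.5             # exponential growth rate of the non-stationary model, hypothetical

def yearly_prob(rate_integral):
    """Poisson: P(at least one event in the year) = 1 - exp(-integrated rate)."""
    return 1.0 - np.exp(-rate_integral)

# Stationary reference over one year: integrated rate = lam0.
p_ref = yearly_prob(lam0)

# Non-stationary model lam(t) = lam0 * exp(a*t) integrated over [0, 1].
integral = lam0 * (np.exp(a) - 1.0) / a
p_model = yearly_prob(integral)

gain = p_model / p_ref          # yearly probability gain over the reference
info_gain = np.log(gain)        # corresponding (log) information gain, in nats
print(f"probability gain = {gain:.3f}, information gain = {info_gain:.3f} nats")
```

The abstract's point is precisely this correspondence: the probability gain and the information gain are two views of the same ratio of model to reference probabilities.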
Performance modeling, loss networks, and statistical multiplexing
Mazumdar, Ravi
2009-01-01
This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of understanding the phenomenon of statistical multiplexing. The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the important ideas of Palm distributions associated with traffic models and their role in performance measures. Also presented are recent ideas of large buffer, and many sources asymptotics that play an important role in understanding statistical multiplexing. I
Statistical Modeling for Radiation Hardness Assurance
Ladbury, Raymond L.
2014-01-01
We cover the models and statistics associated with single event effects (and total ionizing dose), why we need them, and how to use them: What models are used, what errors exist in real test data, and what the model allows us to say about the DUT will be discussed. In addition, how to use other sources of data such as historical, heritage, and similar part and how to apply experience, physics, and expert opinion to the analysis will be covered. Also included will be concepts of Bayesian statistics, data fitting, and bounding rates.
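One standard ingredient of such rate-bounding arguments is the Poisson upper limit on a single-event-effect cross-section after a test with zero observed events; the fluence value below is hypothetical:

```python
import numpy as np

# Heavy-ion test outcome: zero single-event effects observed (fluence is illustrative).
fluence = 1e7          # ions/cm^2 delivered to the device under test
cl = 0.95              # desired confidence level

# For a Poisson process with zero counts, the CL upper limit on the mean count is
# -ln(1 - CL) (about 3.0 events at 95%), so sigma_max = -ln(1 - CL) / fluence.
lam_max = -np.log(1 - cl)
sigma_max = lam_max / fluence
print(f"sigma < {sigma_max:.2e} cm^2/device at {cl:.0%} CL")
```

Bayesian treatments of the same data (with an informative prior from heritage or similar-part data, as the abstract describes) tighten or loosen this bound accordingly.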
Hierarchical modelling for the environmental sciences statistical methods and applications
Clark, James S
2006-01-01
New statistical tools are changing the way in which scientists analyze and interpret data and models. Hierarchical Bayes and Markov Chain Monte Carlo methods for analysis provide a consistent framework for inference and prediction where information is heterogeneous and uncertain, processes are complicated, and responses depend on scale. Nowhere are these methods more promising than in the environmental sciences.
Nominal model predictive control
Grüne, Lars
2013-01-01
5 p., to appear in Encyclopedia of Systems and Control, Tariq Samad, John Baillieul (eds.); International audience; Model Predictive Control is a controller design method which synthesizes a sampled data feedback controller from the iterative solution of open loop optimal control problems.We describe the basic functionality of MPC controllers, their properties regarding feasibility, stability and performance and the assumptions needed in order to rigorously ensure these properties in a nomina...
Advances in statistical models for data analysis
Minerva, Tommaso; Vichi, Maurizio
2015-01-01
This edited volume focuses on recent research results in classification, multivariate statistics and machine learning and highlights advances in statistical models for data analysis. The volume provides both methodological developments and contributions to a wide range of application areas such as economics, marketing, education, social sciences and environment. The papers in this volume were first presented at the 9th biannual meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in September 2013 at the University of Modena and Reggio Emilia, Italy.
Statistical Model Checking for Stochastic Hybrid Systems
DEFF Research Database (Denmark)
David, Alexandre; Du, Dehui; Larsen, Kim Guldstrand
2012-01-01
This paper presents novel extensions and applications of the UPPAAL-SMC model checker. The extensions allow for statistical model checking of stochastic hybrid systems. We show how our race-based stochastic semantics extends to networks of hybrid systems, and indicate the integration technique ap...
Dielectronic recombination rate in statistical model
Demura A.V.; Leontyev D.S.; Lisitsa V.S.; Shurigyn V.A.
2017-01-01
The dielectronic recombination rate of multielectron ions was calculated by means of the statistical approach. It is based on an idea of collective excitations of atomic electrons with the local plasma frequencies. These frequencies are expressed via the Thomas-Fermi model electron density distribution. The statistical approach provides fast computation of DR rates that are compared with the modern quantum mechanical calculations. The results are important for current studies of thermonuclear plasmas with the tungsten impurities.
A Headway to QoS on Traffic Prediction over VANETs using RRSCM Statistical Classifier
Directory of Open Access Journals (Sweden)
ISHTIAQUE MAHMOOD
2016-07-01
Full Text Available In this paper, a novel throughput measurement forecast model is recommended for VANETs. The model is based on a statistical technique adopted and deployed over high-speed IP network traffic. Network traffic always experiences QoS (Quality of Service) issues such as jitter, delay, packet loss and degradation due to very low bit-rate codification. Despite these issues, the traffic throughput is to be predicted with utmost accuracy using a proposed multivariate analysis scheme, the RRSCM (Refined Regression Statistical Classifier Model), which optimizes the parting parameters. Henceforth, the focus is on a measurement methodology that estimates the traffic parameters in order to predict the traffic accurately and improve QoS for end-users. Finally, the end results of the proposed RRSCM classification model are compared with an ANN (Artificial Neural Network) classification model to demonstrate the better performance of the proposed model.
Applying the luminosity function statistics in the fireshell model
Rangel Lemos, L. J.; Bianco, C. L.; Ruffini, R.
2015-12-01
The luminosity function (LF) statistics applied to the data of BATSE, GBM/Fermi and BAT/Swift is the theme approached in this work. The LF is a strong statistical tool for extracting useful information from astrophysical samples, and the key point of this statistical analysis is the detector sensitivity, for which we have performed a careful analysis. We applied the LF statistics to three GRB classes predicted by the Fireshell model, producing predicted distributions of peak flux N(Fpk), redshift N(z) and peak luminosity N(Lpk) for the three GRB classes; we also used three GRB rates. We looked for differences among the distributions, and indeed we found some. We performed a comparison between the predicted and observed distributions (with and without redshifts), for which we compiled a list of 217 GRBs with known redshifts. Our goal is to turn GRBs into standard candles; one alternative is to find a correlation between the isotropic luminosity and the Band peak spectral energy (Liso - Epk).
Candidate Prediction Models and Methods
DEFF Research Database (Denmark)
Nielsen, Henrik Aalborg; Nielsen, Torben Skov; Madsen, Henrik
2005-01-01
This document lists candidate prediction models for Work Package 3 (WP3) of the PSO-project called ``Intelligent wind power prediction systems'' (FU4101). The main focus is on the models transforming numerical weather predictions into predictions of power production. The document also outlines...... the possibilities w.r.t. different numerical weather predictions actually available to the project....
Can spatial statistical river temperature models be transferred between catchments?
Jackson, Faye L.; Fryer, Robert J.; Hannah, David M.; Malcolm, Iain A.
2017-09-01
There has been increasing use of spatial statistical models to understand and predict river temperature (Tw) from landscape covariates. However, it is not financially or logistically feasible to monitor all rivers and the transferability of such models has not been explored. This paper uses Tw data from four river catchments collected in August 2015 to assess how well spatial regression models predict the maximum 7-day rolling mean of daily maximum Tw (Twmax) within and between catchments. Models were fitted for each catchment separately using (1) landscape covariates only (LS models) and (2) landscape covariates and an air temperature (Ta) metric (LS_Ta models). All the LS models included upstream catchment area and three included a river network smoother (RNS) that accounted for unexplained spatial structure. The LS models transferred reasonably to other catchments, at least when predicting relative levels of Twmax. However, the predictions were biased when mean Twmax differed between catchments. The RNS was needed to characterise and predict finer-scale spatially correlated variation. Because the RNS was unique to each catchment and thus non-transferable, predictions were better within catchments than between catchments. A single model fitted to all catchments found no interactions between the landscape covariates and catchment, suggesting that the landscape relationships were transferable. The LS_Ta models transferred less well, with particularly poor performance when the relationship with the Ta metric was physically implausible or required extrapolation outside the range of the data. A single model fitted to all catchments found catchment-specific relationships between Twmax and the Ta metric, indicating that the Ta metric was not transferable. These findings improve our understanding of the transferability of spatial statistical river temperature models and provide a foundation for developing new approaches for predicting Tw at unmonitored locations across
Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah
2013-11-01
A decision tree (DT) machine learning algorithm was used to map flood-susceptible areas in Kelantan, Malaysia. An ensemble of frequency ratio (FR) and logistic regression (LR) models was used to overcome the weak points of LR. The combined FR-LR method was used to map the susceptible areas in Kelantan. The results of both methods were compared and their efficiency assessed. The conditioning factors most influencing flooding were identified.
Statistical modelling of fine red wine production
María Rosa Castro; Marcelo Eduardo Echegaray; Rosa Ana Rodríguez; Stella Maris Udaquiola
2010-01-01
Producing wine is a very important economic activity in the province of San Juan in Argentina; it is therefore most important to predict production regarding the quantity of raw material needed. This work was aimed at obtaining a model relating kilograms of crushed grape to the litres of wine so produced. Such a model will be used for predicting precise future values and confidence intervals for given quantities of crushed grapes. Data from a vineyard in the province of San Juan was ...
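A model of this kind, simple linear regression of litres of wine on kilograms of crushed grape with a prediction interval for a new input, can be sketched as follows (the paired data below are invented for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical paired data: kilograms of crushed grape vs litres of wine produced.
kg = np.array([1000., 1500., 2000., 2500., 3000., 3500., 4000.])
litres = np.array([640., 980., 1310., 1600., 1950., 2270., 2580.])

res = stats.linregress(kg, litres)
n = len(kg)
resid = litres - (res.intercept + res.slope * kg)
s = np.sqrt(np.sum(resid**2) / (n - 2))          # residual standard error

def predict_with_interval(x0, alpha=0.05):
    """Point prediction and (1 - alpha) prediction interval at a new x0."""
    y0 = res.intercept + res.slope * x0
    se = s * np.sqrt(1 + 1/n + (x0 - kg.mean())**2 / np.sum((kg - kg.mean())**2))
    t = stats.t.ppf(1 - alpha/2, n - 2)
    return y0, y0 - t*se, y0 + t*se

y0, lo, hi = predict_with_interval(2200.)
print(f"predicted litres: {y0:.0f}  (95% PI: {lo:.0f} to {hi:.0f})")
```

The prediction interval, rather than the narrower confidence interval for the mean, is the appropriate statement for a single future harvest.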
Mesoscopic full counting statistics and exclusion models
Roche, P.-E.; Derrida, B.; Douçot, B.
2005-02-01
We calculate the distribution of current fluctuations in two simple exclusion models. Although these models are classical, we recover, even for small systems such as a simple or a double barrier, the same distribution of current as given by traditional formalisms for quantum mesoscopic conductors. Due to their simplicity, the full counting statistics in exclusion models can be reduced to the calculation of the largest eigenvalue of a matrix whose size is the number of internal configurations of the system. As examples, we derive the shot noise power and higher-order statistics of current fluctuations (skewness, full counting statistics, ...) of various conductors, including multiple barriers, diffusive islands between tunnel barriers and diffusive media. Special attention is dedicated to the third cumulant, whose experimental measurability has recently been demonstrated.
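The reduction described, obtaining cumulants of the transferred charge from the largest eigenvalue of a counting-field-tilted matrix, can be sketched for the simplest case of a single site between two reservoirs (the rates are illustrative):

```python
import numpy as np

alpha, beta = 1.0, 2.0   # entry and exit rates (illustrative)

def mu(lam):
    """Largest eigenvalue of the tilted generator for one site between two reservoirs.

    State 0 = empty, state 1 = occupied; exits (rate beta) carry the counting
    factor exp(lam), so mu(lam) generates the cumulants of transferred charge.
    """
    M = np.array([[-alpha, beta * np.exp(lam)],
                  [ alpha, -beta            ]])
    return np.max(np.linalg.eigvals(M).real)

# Cumulants are derivatives of mu at lam = 0, here taken by finite differences.
h = 1e-4
mean_current = (mu(h) - mu(-h)) / (2 * h)
second_cumulant = (mu(h) - 2 * mu(0.0) + mu(-h)) / h**2

print(f"mean current = {mean_current:.4f} (exact: {alpha*beta/(alpha+beta):.4f})")
print(f"Fano factor  = {second_cumulant/mean_current:.4f} "
      f"(exact: {(alpha**2+beta**2)/(alpha+beta)**2:.4f})")
```

For larger exclusion models the matrix grows with the number of internal configurations, exactly as the abstract states, but the recipe is unchanged.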
Statistical tests for equal predictive ability across multiple forecasting methods
DEFF Research Database (Denmark)
Borup, Daniel; Thyrsgaard, Martin
as non-stationarity of the data. We introduce two finite-sample corrections, leading to good size and power properties. We also provide a two-step Model Confidence Set-type decision rule for ranking the forecasting methods into sets of indistinguishable conditional predictive ability, particularly...
Predictive Surface Complexation Modeling
Energy Technology Data Exchange (ETDEWEB)
Sverjensky, Dimitri A. [Johns Hopkins Univ., Baltimore, MD (United States). Dept. of Earth and Planetary Sciences
2016-11-29
Surface complexation plays an important role in the equilibria and kinetics of processes controlling the compositions of soilwaters and groundwaters, the fate of contaminants in groundwaters, and the subsurface storage of CO2 and nuclear waste. Over the last several decades, many dozens of individual experimental studies have addressed aspects of surface complexation that have contributed to an increased understanding of its role in natural systems. However, there has been no previous attempt to develop a model of surface complexation that can be used to link all the experimental studies in order to place them on a predictive basis. Overall, my research has successfully integrated the results of the work of many experimentalists published over several decades. For the first time in studies of the geochemistry of the mineral-water interface, a practical predictive capability for modeling has become available. The predictive correlations developed in my research now enable extrapolations of experimental studies to provide estimates of surface chemistry for systems not yet studied experimentally and for natural and anthropogenically perturbed systems.
Growth curve models and statistical diagnostics
Pan, Jian-Xin
2002-01-01
Growth-curve models are generalized multivariate analysis-of-variance models. These models are especially useful for investigating growth problems on short times in economics, biology, medical research, and epidemiology. This book systematically introduces the theory of the GCM with particular emphasis on their multivariate statistical diagnostics, which are based mainly on recent developments made by the authors and their collaborators. The authors provide complete proofs of theorems as well as practical data sets and MATLAB code.
Three Generative, Lexicalised Models for Statistical Parsing
Collins, M
1997-01-01
In this paper we first propose a new statistical parsing model, which is a generative model of lexicalised context-free grammar. We then extend the model to include a probabilistic treatment of both subcategorisation and wh-movement. Results on Wall Street Journal text show that the parser performs at 88.1/87.5% constituent precision/recall, an average improvement of 2.3% over (Collins 96).
Topology for Statistical Modeling of Petascale Data
Energy Technology Data Exchange (ETDEWEB)
Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States); Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States); Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States); Bremer, P. -T. [Univ. of Utah, Salt Lake City, UT (United States)
2017-03-23
Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, the approach of the entire team involving all three institutions is based on the complementary techniques of combinatorial topology and statistical modelling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modelling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. The overall technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modelling, and (3) new integrated topological and statistical methods. Roughly speaking, the division of labor between our three groups (Sandia Labs in Livermore, Texas A&M in College Station, and U Utah in Salt Lake City) is as follows: the Sandia group focuses on statistical methods and their formulation in algebraic terms, and identifies the application problems (and data sets) most relevant to this project; the Texas A&M group develops new algebraic geometry algorithms, in particular with fewnomial theory; and the Utah group develops new algorithms in computational topology via Discrete Morse Theory. However, we hasten to point out that our three groups stay in tight contact via videoconference every 2 weeks, so there is much synergy of ideas between the groups. The remainder of this document focuses on the contributions with the greatest direct involvement from the team at the University of Utah in Salt Lake City.
Statistical Analysis of CFD Solutions from the Fourth AIAA Drag Prediction Workshop
Morrison, Joseph H.
2010-01-01
A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from the U.S., Europe, Asia, and Russia using a variety of grid systems and turbulence models for the June 2009 4th Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was a new subsonic transport model, the Common Research Model, designed using a modern approach for the wing and included a horizontal tail. The fourth workshop focused on the prediction of both absolute and incremental drag levels for wing-body and wing-body-horizontal tail configurations. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with earlier workshops using the statistical framework.
Bayesian models a statistical primer for ecologists
Hobbs, N Thompson
2015-01-01
Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods-in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability...
An R companion to linear statistical models
Hay-Jahans, Christopher
2011-01-01
Focusing on user-developed programming, An R Companion to Linear Statistical Models serves two audiences: those who are familiar with the theory and applications of linear statistical models and wish to learn or enhance their skills in R; and those who are enrolled in an R-based course on regression and analysis of variance. For those who have never used R, the book begins with a self-contained introduction to R that lays the foundation for later chapters. This book includes extensive and carefully explained examples of how to write programs using the R programming language. These examples cover...
Statistical transmutation in doped quantum dimer models.
Lamas, C A; Ralko, A; Cabra, D C; Poilblanc, D; Pujol, P
2012-07-06
We prove a "statistical transmutation" symmetry of doped quantum dimer models on the square, triangular, and kagome lattices: the energy spectrum is invariant under a simultaneous change of statistics (i.e., bosonic into fermionic or vice versa) of the holes and of the signs of all the dimer resonance loops. This exact transformation enables us to define the duality equivalence between doped quantum dimer Hamiltonians and provides the analytic framework to analyze dynamical statistical transmutations. We investigate numerically the doping of the triangular quantum dimer model with special focus on the topological Z(2) dimer liquid. Doping leads to four (instead of two for the square lattice) inequivalent families of Hamiltonians. Competition between phase separation, superfluidity, supersolidity, and fermionic phases is investigated in the four families.
STATISTICAL MODELS OF REPRESENTING INTELLECTUAL CAPITAL
Directory of Open Access Journals (Sweden)
Andreea Feraru
2016-07-01
Full Text Available This article, entitled Statistical Models of Representing Intellectual Capital, approaches and analyses the concept of intellectual capital, as well as the main models which can support entrepreneurs/managers in evaluating and quantifying the advantages of intellectual capital. Most authors examine intellectual capital from a static perspective and focus on the development of its various evaluation models. In this chapter we surveyed the classical static models: Sveiby, Edvinsson, Balanced Scorecard, as well as the canonical model of intellectual capital. Among the group of static models for evaluating organisational intellectual capital, the canonical model stands out. This model enables the structuring of organisational intellectual capital into human capital, structural capital and relational capital. Although the model is widespread, it is static and can thus introduce a series of errors in the evaluation process, because the three entities mentioned above are not independent in their contents, as any logic of structuring complex entities requires.
Statistical Methods for Predicting Malaria Incidences Using Data from Sudan
Awadalla, Khidir E.
2017-01-01
Malaria is the leading cause of illness and death in Sudan. The entire population is at risk of malaria epidemics, which place a very high burden on the government and the population. Reliable forecasts of future incidence are needed to motivate the development of a system that can predict future incidence levels. The objective of this paper is to develop applicable and interpretable time series models and to determine which method performs best at predicting future incidence levels. We used monthly incidence data collected from five states in Sudan with unstable malaria transmission. We tested four forecasting methods: (1) autoregressive integrated moving average (ARIMA); (2) exponential smoothing; (3) a transformation model; and (4) a moving average. The results showed that the transformation method performed significantly better than the other methods for Gadaref, Gazira, North Kordofan, and Northern, while the moving average model performed significantly better for Khartoum. Future research should combine a number of different and dissimilar time series methods to improve forecast accuracy, with the ultimate aim of developing a simple and useful model for producing reasonably reliable forecasts of malaria incidence in the study area.
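The forecast comparison described above can be sketched with two of the simpler candidate methods, a trailing moving average and simple exponential smoothing, scored by out-of-sample error. The seasonal series, smoothing constant, and window below are illustrative assumptions, not the Sudan data:

```python
import numpy as np

def exp_smooth_forecast(y, alpha=0.3):
    """One-step-ahead forecasts from simple exponential smoothing."""
    f = np.empty(len(y))
    f[0] = y[0]
    for t in range(1, len(y)):
        f[t] = alpha * y[t - 1] + (1 - alpha) * f[t - 1]
    return f

def moving_average_forecast(y, window=3):
    """One-step-ahead forecasts from a trailing moving average."""
    f = np.full(len(y), np.nan)
    for t in range(window, len(y)):
        f[t] = y[t - window:t].mean()
    return f

rng = np.random.default_rng(0)
months = np.arange(60)
# synthetic monthly incidence with an annual seasonal peak (illustrative only)
y = 100 + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 5, 60)

test = slice(48, 60)  # hold out the final year
for name, f in [("exp. smoothing", exp_smooth_forecast(y)),
                ("moving average", moving_average_forecast(y))]:
    mae = np.nanmean(np.abs(y[test] - f[test]))
    print(f"{name}: out-of-sample MAE = {mae:.1f}")
```

The same scoring loop extends directly to ARIMA or transformation models fitted on the training window.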
Topology for Statistical Modeling of Petascale Data
Energy Technology Data Exchange (ETDEWEB)
Bennett, Janine Camille [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Pebay, Philippe Pierre [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States); Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States); Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States); Rojas, Maurice [Texas A & M Univ., College Station, TX (United States)
2014-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled "Topology for Statistical Modeling of Petascale Data", funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program.
Statistical Modeling Efforts for Headspace Gas
Energy Technology Data Exchange (ETDEWEB)
Weaver, Brian Phillip [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
2016-03-17
The purpose of this document is to describe the statistical modeling effort for gas concentrations in WIPP storage containers. The concentration (in ppm) of CO_{2} in the headspace volume of standard waste box (SWB) 68685 is shown. A Bayesian approach and an adaptive Metropolis-Hastings algorithm were used.
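A minimal sketch of the Bayesian sampling approach named above: a plain random-walk Metropolis-Hastings sampler for a mean concentration (without the adaptive step-size tuning). The readings, prior, and noise level are hypothetical placeholders, not the WIPP measurements:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(400.0, 20.0, size=30)   # hypothetical CO2 ppm readings

def log_post(mu, sigma=20.0, prior_mu=350.0, prior_sd=100.0):
    # Gaussian likelihood with known sigma, Gaussian prior on the mean
    ll = -0.5 * np.sum((data - mu) ** 2) / sigma ** 2
    lp = -0.5 * (mu - prior_mu) ** 2 / prior_sd ** 2
    return ll + lp

mu, step, chain = 350.0, 10.0, []
for _ in range(5000):
    prop = mu + rng.normal(0, step)       # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
        mu = prop                         # accept
    chain.append(mu)

posterior = np.array(chain[1000:])        # drop burn-in
print(f"posterior mean concentration: {posterior.mean():.0f} ppm")
```

An adaptive variant, as used in the report, would additionally rescale `step` from the running acceptance rate or chain covariance.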
Nonperturbative approach to the modified statistical model
Energy Technology Data Exchange (ETDEWEB)
Magdy, M.A.; Bekmezci, A.; Sever, R. [Middle East Technical Univ., Ankara (Turkey)
1993-12-01
The modified form of the statistical model is used without making any perturbation. The mass spectra of the lowest S, P and D levels of the (Q{bar Q}) and the non-self-conjugate (Q{bar q}) mesons are studied with the Song-Lin potential. The authors' results are in good agreement with the experimental and theoretical findings.
Statistical Model Checking for Stochastic Hybrid Systems
DEFF Research Database (Denmark)
David, Alexandre; Du, Dehui; Larsen, Kim Guldstrand
2012-01-01
This paper presents novel extensions and applications of the UPPAAL-SMC model checker. The extensions allow for statistical model checking of stochastic hybrid systems. We show how our race-based stochastic semantics extends to networks of hybrid systems, and indicate the integration technique...... applied for implementing this semantics in the UPPAAL-SMC simulation engine. We report on two applications of the resulting tool-set coming from systems biology and energy aware buildings....
Statistical modeling of space shuttle environmental data
Tubbs, J. D.; Brewer, D. W.
1983-01-01
Statistical models which use a class of bivariate gamma distributions are examined. Topics discussed include: (1) the ratio of positively correlated gamma variates; (2) a method to determine whether unequal shape parameters are necessary in a bivariate gamma distribution; (3) differential equations for the modal location of a family of bivariate gamma distributions; and (4) analysis of some wind gust data using the analytical results developed for modeling applications.
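The first topic, the ratio of positively correlated gamma variates, can be illustrated with the classical shared-component construction of a bivariate gamma pair, in which a common gamma variable added to two independent ones induces the correlation. The shape parameters below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# shared-component construction: a common gamma term induces positive correlation
z0 = rng.gamma(shape=1.0, size=n)   # common component
z1 = rng.gamma(shape=2.0, size=n)
z2 = rng.gamma(shape=2.0, size=n)
x, y = z0 + z1, z0 + z2             # each marginal is Gamma(3, 1)

rho = np.corrcoef(x, y)[0, 1]       # theoretical value: 1/sqrt(3*3) = 1/3
ratio = x / y                        # the positively correlated gamma ratio
print(f"corr = {rho:.2f}, median ratio = {np.median(ratio):.2f}")
```

By symmetry of the construction the ratio's median is 1, and the empirical correlation matches the theoretical 1/3.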
Performance modeling, stochastic networks, and statistical multiplexing
Mazumdar, Ravi R
2013-01-01
This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of introducing an appropriate mathematical framework for modeling and analysis as well as understanding the phenomenon of statistical multiplexing. The models, techniques, and results presented form the core of traffic engineering methods used to design, control and allocate resources in communication networks. The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the importance...
Statistical physical models of cellular motility
Banigan, Edward J.
Cellular motility is required for a wide range of biological behaviors and functions, and the topic poses a number of interesting physical questions. In this work, we construct and analyze models of various aspects of cellular motility using tools and ideas from statistical physics. We begin with a Brownian dynamics model for actin-polymerization-driven motility, which is responsible for cell crawling and "rocketing" motility of pathogens. Within this model, we explore the robustness of self-diffusiophoresis, which is a general mechanism of motility. Using this mechanism, an object such as a cell catalyzes a reaction that generates a steady-state concentration gradient that propels the object in a particular direction. We then apply these ideas to a model for depolymerization-driven motility during bacterial chromosome segregation. We find that depolymerization and protein-protein binding interactions alone are sufficient to robustly pull a chromosome, even against large loads. Next, we investigate how forces and kinetics interact during eukaryotic mitosis with a many-microtubule model. Microtubules exert forces on chromosomes, but since individual microtubules grow and shrink in a force-dependent way, these forces lead to bistable collective microtubule dynamics, which provides a mechanism for chromosome oscillations and microtubule-based tension sensing. Finally, we explore kinematic aspects of cell motility in the context of the immune system. We develop quantitative methods for analyzing cell migration statistics collected during imaging experiments. We find that during chronic infection in the brain, T cells run and pause stochastically, following the statistics of a generalized Levy walk. These statistics may contribute to immune function by mimicking an evolutionarily conserved efficient search strategy. Additionally, we find that naive T cells migrating in lymph nodes also obey non-Gaussian statistics. Altogether, our work demonstrates how physical
Statistical physics of pairwise probability models
Directory of Open Access Journals (Sweden)
Yasser Roudi
2009-11-01
Full Text Available Statistical models for describing the probability distribution over the states of biological systems are commonly used for dimensional reduction. Among these models, pairwise models are very attractive in part because they can be fit using a reasonable amount of data: knowledge of the means and correlations between pairs of elements in the system is sufficient. Not surprisingly, then, using pairwise models for studying neural data has been the focus of many studies in recent years. In this paper, we describe how tools from statistical physics can be employed for studying and using pairwise models. We build on our previous work on the subject and study the relation between different methods for fitting these models and evaluating their quality. In particular, using data from simulated cortical networks we study how the quality of various approximate methods for inferring the parameters in a pairwise model depends on the time bin chosen for binning the data. We also study the effect of the size of the time bin on the model quality itself, again using simulated data. We show that using finer time bins increases the quality of the pairwise model. We offer new ways of deriving the expressions reported in our previous work for assessing the quality of pairwise models.
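The fitting problem described above, matching the means and pairwise correlations of binary units, can be sketched for a toy system small enough for exact enumeration. The couplings and learning rate below are illustrative; real neural data with many units would require the approximate inference methods the paper studies:

```python
import itertools
import numpy as np

N = 3
states = np.array(list(itertools.product([-1, 1], repeat=N)), float)

def model_stats(h, J):
    # exact enumeration of all 2^N states (feasible only for tiny N)
    E = states @ h + 0.5 * np.einsum("si,ij,sj->s", states, J, states)
    p = np.exp(E); p /= p.sum()
    m = p @ states                                   # means
    C = np.einsum("s,si,sj->ij", p, states, states)  # pairwise correlations
    return m, C

rng = np.random.default_rng(3)
h_true = rng.normal(0, 0.5, N)
J_true = rng.normal(0, 0.5, (N, N)); J_true = (J_true + J_true.T) / 2
np.fill_diagonal(J_true, 0)
m_data, C_data = model_stats(h_true, J_true)   # target means and correlations

h, J = np.zeros(N), np.zeros((N, N))
for _ in range(5000):                          # Boltzmann-learning gradient ascent
    m, C = model_stats(h, J)
    h += 0.1 * (m_data - m)
    J += 0.1 * (C_data - C); np.fill_diagonal(J, 0)

print("max |h - h_true| =", np.abs(h - h_true).max())
```

Because the log-likelihood is concave in (h, J), this iteration converges to the unique pairwise model matching the target statistics.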
Candidate Prediction Models and Methods
DEFF Research Database (Denmark)
Nielsen, Henrik Aalborg; Nielsen, Torben Skov; Madsen, Henrik
2005-01-01
This document lists candidate prediction models for Work Package 3 (WP3) of the PSO-project called ``Intelligent wind power prediction systems'' (FU4101). The main focus is on the models transforming numerical weather predictions into predictions of power production. The document also outlines...
Survival Predictions of Ceramic Crowns Using Statistical Fracture Mechanics.
Nasrin, S; Katsube, N; Seghi, R R; Rokhlin, S I
2017-01-01
This work establishes a survival probability methodology for interface-initiated fatigue failures of monolithic ceramic crowns under simulated masticatory loading. A complete 3-dimensional (3D) finite element analysis model of a minimally reduced molar crown was developed using commercially available hardware and software. Estimates of material surface flaw distributions and fatigue parameters for 3 reinforced glass-ceramics (fluormica [FM], leucite [LR], and lithium disilicate [LD]) and a dense sintered yttrium-stabilized zirconia (YZ) were obtained from the literature and incorporated into the model. Utilizing the proposed fracture mechanics-based model, crown survival probability as a function of loading cycles was obtained from simulations performed on the 4 ceramic materials utilizing identical crown geometries and loading conditions. The weaker ceramic materials (FM and LR) resulted in lower survival rates than the more recently developed higher-strength ceramic materials (LD and YZ). The simulated 10-y survival rate of crowns fabricated from YZ was only slightly better than those fabricated from LD. In addition, 2 of the model crown systems (FM and LD) were expanded to determine regional-dependent failure probabilities. This analysis predicted that the LD-based crowns were more likely to fail from fractures initiating from margin areas, whereas the FM-based crowns showed a slightly higher probability of failure from fractures initiating from the occlusal table below the contact areas. These 2 predicted fracture initiation locations have some agreement with reported fractographic analyses of failed crowns. In this model, we considered the maximum tensile stress tangential to the interfacial surface, as opposed to the more universally reported maximum principal stress, because it more directly impacts crack propagation. While the accuracy of these predictions needs to be experimentally verified, the model can provide a fundamental understanding of the
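The flaw-statistics idea underlying such survival predictions can be sketched with a generic Weibull strength distribution combined with power-law fatigue degradation, in place of the paper's full 3D finite element model. All parameter values below are illustrative placeholders, not the paper's fitted ceramic parameters:

```python
import numpy as np

def survival(stress, cycles, m=8.0, s0=400.0, n_exp=20.0):
    """Weibull survival probability under cyclic fatigue.

    m, s0  : Weibull modulus and characteristic strength (MPa), illustrative
    n_exp  : slow-crack-growth (fatigue) exponent, illustrative
    """
    s_eff = stress * cycles ** (1.0 / n_exp)   # effective stress after fatigue
    return np.exp(-((s_eff / s0) ** m))

cycles = np.logspace(0, 7, 8)                  # up to ~10 years of mastication
for s in (150.0, 250.0):
    ps = survival(s, cycles)
    print(f"stress {s:.0f} MPa: long-term survival = {ps[-1]:.2f}")
```

The sketch reproduces the qualitative behavior reported above: survival decays with loading cycles, and weaker-material/higher-stress combinations fail far sooner.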
Equilibrium statistical mechanics of lattice models
Lavis, David A
2015-01-01
Most interesting and difficult problems in equilibrium statistical mechanics concern models which exhibit phase transitions. For graduate students and more experienced researchers this book provides an invaluable reference source of approximate and exact solutions for a comprehensive range of such models. Part I contains background material on classical thermodynamics and statistical mechanics, together with a classification and survey of lattice models. The geometry of phase transitions is described and scaling theory is used to introduce critical exponents and scaling laws. An introduction is given to finite-size scaling, conformal invariance and Schramm-Loewner evolution. Part II contains accounts of classical mean-field methods. The parallels between Landau expansions and catastrophe theory are discussed and Ginzburg-Landau theory is introduced. The extension of mean-field theory to higher-orders is explored using the Kikuchi-Hijmans-De Boer hierarchy of approximations. In Part III the use of alge...
Statistical Compressed Sensing of Gaussian Mixture Models
Yu, Guoshen
2011-01-01
A novel framework of compressed sensing, namely statistical compressed sensing (SCS), that aims at efficiently sampling a collection of signals that follow a statistical distribution, and achieving accurate reconstruction on average, is introduced. SCS based on Gaussian models is investigated in depth. For signals that follow a single Gaussian model, with Gaussian or Bernoulli sensing matrices of O(k) measurements (considerably fewer than the O(k log(N/k)) required by conventional CS based on sparse models, where N is the signal dimension), and with an optimal decoder implemented via linear filtering (significantly faster than the pursuit decoders applied in conventional CS), the error of SCS is shown to be tightly upper bounded by a constant times the best k-term approximation error, with overwhelming probability. The failure probability is also significantly smaller than that of conventional sparsity-oriented CS. Stronger yet simpler results further show that for any sensing matrix, the error of Gaussian SCS is u...
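For a noiseless single-Gaussian model, the linear-filtering decoder mentioned above has the closed form x_hat = Sigma A^T (A Sigma A^T)^{-1} y. A sketch with an assumed fast-decaying covariance spectrum (the dimensions and spectrum are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 64, 16                       # signal dimension, number of measurements

# Gaussian signal model with a fast-decaying spectrum (compressible on average)
U = np.linalg.qr(rng.normal(size=(N, N)))[0]
Sigma = U @ np.diag(1.0 / (1 + np.arange(N)) ** 2) @ U.T
x = rng.multivariate_normal(np.zeros(N), Sigma)

A = rng.normal(size=(M, N)) / np.sqrt(M)     # Gaussian sensing matrix
y = A @ x                                    # compressed measurements

# optimal decoder for a Gaussian model: a single linear (Wiener-type) filter
x_hat = Sigma @ A.T @ np.linalg.solve(A @ Sigma @ A.T, y)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"relative reconstruction error = {rel_err:.2f}")
```

No iterative pursuit is needed; the decoder is one matrix-vector solve, which is the speed advantage the abstract points to.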
Numerical weather prediction model tuning via ensemble prediction system
Jarvinen, H.; Laine, M.; Ollinaho, P.; Solonen, A.; Haario, H.
2011-12-01
This paper discusses a novel approach to tuning the predictive skill of numerical weather prediction (NWP) models. NWP models contain tunable parameters which appear in parameterization schemes for sub-grid scale physical processes. Currently, numerical values of these parameters are specified manually. In a recent dual manuscript (QJRMS, revised) we developed a new concept and method for on-line estimation of the NWP model parameters. The EPPES ("Ensemble prediction and parameter estimation system") method requires only minimal changes to the existing operational ensemble prediction infrastructure and seems very cost-effective because practically no new computations are introduced. The approach provides an algorithmic decision-making tool for model parameter optimization in operational NWP. In EPPES, statistical inference about the NWP model tunable parameters is made by (i) generating each member of the ensemble of predictions using different model parameter values, drawn from a proposal distribution, and (ii) feeding back the relative merits of the parameter values to the proposal distribution, based on evaluation of a suitable likelihood function against verifying observations. In the presentation, the method is first illustrated in low-order numerical tests using a stochastic version of the Lorenz-95 model, which effectively emulates the principal features of ensemble prediction systems. The EPPES method correctly detects the unknown and wrongly specified parameter values, and leads to improved forecast skill. Second, results with an ensemble prediction system based on an atmospheric general circulation model show that the NWP model tuning capacity of EPPES scales up to realistic models and ensemble prediction systems. Finally, preliminary results from a global top-end NWP model tuning exercise are presented.
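The two EPPES steps listed above, (i) drawing member-specific parameter values from a proposal distribution and (ii) feeding their likelihood-based merits back into the proposal, can be sketched on a deliberately trivial stand-in model. The forecast function, observation noise, and ensemble size are invented for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(5)
theta_true = 2.5                      # "unknown" model parameter

def forecast(theta, x):               # toy stand-in for an NWP ensemble member
    return theta * x

mean, sd = 0.0, 2.0                   # Gaussian proposal over theta
for cycle in range(20):               # successive assimilation windows
    x = rng.uniform(1, 2, size=8)
    obs = forecast(theta_true, x) + rng.normal(0, 0.1, size=8)
    thetas = rng.normal(mean, sd, size=50)           # one draw per member
    ll = np.array([-0.5 * np.sum((obs - forecast(t, x)) ** 2) / 0.1 ** 2
                   for t in thetas])                 # member log-likelihoods
    w = np.exp(ll - ll.max()); w /= w.sum()
    mean = np.sum(w * thetas)                        # feed merits back
    sd = max(np.sqrt(np.sum(w * (thetas - mean) ** 2)), 0.05)

print(f"estimated parameter: {mean:.2f}")
```

Over the cycles the proposal contracts around the value that best explains the verifying observations, which is the behavior the Lorenz-95 tests demonstrate.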
Energy Technology Data Exchange (ETDEWEB)
Heinrich, S
2006-07-01
Nucleus fission is a very complex phenomenon and, even nowadays, no realistic models describing the overall process are available. The work presented here deals with a theoretical description of fission fragment distributions in mass, charge, energy and deformation. We have reconsidered and updated the B.D. Wilkins scission-point model. Our purpose was to test whether this statistical model, applied at the scission point and supplied with the results of modern microscopic calculations, can describe the fission fragment distributions quantitatively. We calculate the surface energy available at the scission point as a function of the fragment deformations. This surface is obtained from a Hartree-Fock-Bogoliubov microscopic calculation, which guarantees a realistic description of the potential's dependence on the deformation of each fragment. The statistical balance is described by the level densities of the fragments. We have tried to avoid as much as possible the input of empirical parameters in the model. Our only parameter, the distance between the fragments at the scission point, is discussed by comparison with scission configurations obtained from fully dynamical microscopic calculations. The comparison between our results and experimental data is very satisfying and allows us to discuss the successes and limitations of our approach. We finally propose ideas to improve the model, in particular by applying dynamical corrections. (author)
Energy Technology Data Exchange (ETDEWEB)
Nedic, Vladimir, E-mail: vnedic@kg.ac.rs [Faculty of Philology and Arts, University of Kragujevac, Jovana Cvijića bb, 34000 Kragujevac (Serbia); Despotovic, Danijela, E-mail: ddespotovic@kg.ac.rs [Faculty of Economics, University of Kragujevac, Djure Pucara Starog 3, 34000 Kragujevac (Serbia); Cvetanovic, Slobodan, E-mail: slobodan.cvetanovic@eknfak.ni.ac.rs [Faculty of Economics, University of Niš, Trg kralja Aleksandra Ujedinitelja, 18000 Niš (Serbia); Despotovic, Milan, E-mail: mdespotovic@kg.ac.rs [Faculty of Engineering, University of Kragujevac, Sestre Janjic 6, 34000 Kragujevac (Serbia); Babic, Sasa, E-mail: babicsf@yahoo.com [College of Applied Mechanical Engineering, Trstenik (Serbia)
2014-11-15
Traffic is the main source of noise in urban environments and significantly affects human mental and physical health and labor productivity. It is therefore very important to model the noise produced by various vehicles. Techniques for traffic noise prediction are mainly based on regression analysis, which is generally not good enough to describe the trends of noise. In this paper the application of artificial neural networks (ANNs) to the prediction of traffic noise is presented. The structure of the traffic flow and the average speed of the traffic flow are chosen as input variables of the neural network. The output variable of the network is the equivalent noise level in the given time period L{sub eq}. Based on these parameters, the network is modeled, trained and tested through a comparative analysis of the calculated values and measured levels of traffic noise, using an originally developed user-friendly software package. It is shown that artificial neural networks can be a useful tool for the prediction of noise with sufficient accuracy. In addition, the measured values were also used to calculate the equivalent noise level by means of classical methods, and a comparative analysis is given. The results clearly show that the ANN approach is superior to the classical statistical methods in traffic noise level prediction. - Highlights: • We propose an ANN model for the prediction of traffic noise. • We developed an originally designed user-friendly software package. • The results are compared with classical statistical methods. • The ANN model shows much better predictive capability.
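A hedged sketch of the ANN idea on synthetic data: a single random hidden layer with a least-squares output fit (an extreme-learning-machine-style shortcut) stands in for the fully trained network. The noise formula, variable ranges, and network size below are invented, not the measured traffic data:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300

# hypothetical training data: flow (veh/h) and mean speed (km/h) -> Leq (dB)
flow = rng.uniform(200, 2000, n)
speed = rng.uniform(20, 90, n)
leq = 10 * np.log10(flow) + 0.1 * speed + 30 + rng.normal(0, 0.5, n)  # synthetic

X = np.column_stack([flow / 2000, speed / 90])          # scale inputs to [0, 1]
W = rng.normal(size=(2, 40)) * 2.0                      # random hidden weights
b = rng.normal(size=40)
H = np.tanh(X @ W + b)                                  # hidden activations

# fit only the output layer by linear least squares
design = np.column_stack([H, np.ones(n)])
beta, *_ = np.linalg.lstsq(design, leq, rcond=None)

pred = design @ beta
rmse = np.sqrt(np.mean((pred - leq) ** 2))
print(f"training RMSE = {rmse:.2f} dB")
```

A fully backpropagation-trained network, as in the paper, additionally adapts the hidden weights; the sketch only illustrates why a nonlinear hidden layer captures the logarithmic flow-to-noise relation that plain regression struggles with.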
How to Establish Clinical Prediction Models
Directory of Open Access Journals (Sweden)
Yong-ho Lee
2016-03-01
Full Text Available A clinical prediction model can be applied to several challenging clinical scenarios: screening high-risk individuals for asymptomatic disease, predicting future events such as disease or death, and assisting medical decision-making and health education. Despite the impact of clinical prediction models on practice, prediction modeling is a complex process requiring careful statistical analyses and sound clinical judgement. Although there is no definite consensus on the best methodology for model development and validation, a few recommendations and checklists have been proposed. In this review, we summarize five steps for developing and validating a clinical prediction model: preparation for establishing clinical prediction models; dataset selection; handling variables; model generation; and model evaluation and validation. We also review several studies that detail methods for developing clinical prediction models with comparable examples from real practice. After model development and rigorous validation in relevant settings, possibly with evaluation of utility/usability and fine-tuning, good models can be ready for use in practice. We anticipate that this framework will revitalize the use of predictive or prognostic research in endocrinology, leading to active applications in real clinical practice.
Energy based prediction models for building acoustics
DEFF Research Database (Denmark)
Brunskog, Jonas
2012-01-01
In order to reach robust and simplified yet accurate prediction models, energy-based principles are commonly used in many fields of acoustics, especially in building acoustics. This includes simple energy flow models, the framework of statistical energy analysis (SEA) as well as more elaborated...... principles as, e.g., wave intensity analysis (WIA). The European standards for building acoustic predictions, the EN 12354 series, are based on energy flow and SEA principles. In the present paper, different energy-based prediction models are discussed and critically reviewed. Special attention is placed...
Regional temperature models are needed for characterizing and mapping stream thermal regimes, establishing reference conditions, predicting future impacts and identifying critical thermal refugia. Spatial statistical models have been developed to improve regression modeling techn...
Physics-based statistical learning approach to mesoscopic model selection
Taverniers, Søren; Haut, Terry S.; Barros, Kipton; Alexander, Francis J.; Lookman, Turab
2015-11-01
In materials science and many other research areas, models are frequently inferred without considering their generalization to unseen data. We apply statistical learning using cross-validation to obtain an optimally predictive coarse-grained description of a two-dimensional kinetic nearest-neighbor Ising model with Glauber dynamics (GD) based on the stochastic Ginzburg-Landau equation (sGLE). The latter is learned from GD "training" data using a log-likelihood analysis, and its predictive ability for various model complexities is tested on GD "test" data independent of the data used to train the model. Using two different error metrics, we perform a detailed analysis of the error between magnetization time trajectories simulated using the learned sGLE coarse-grained description and those obtained using the GD model. We show that both for equilibrium and out-of-equilibrium GD training trajectories, the standard phenomenological description using a quartic free energy does not always yield the most predictive coarse-grained model. Moreover, increasing the amount of training data can shift the optimal model complexity to higher values. Our results are promising in that they pave the way for the use of statistical learning as a general tool for materials modeling and discovery.
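The cross-validation-based model selection described above can be sketched by choosing the polynomial order of a free-energy-like fit: held-out error, not training error, decides the complexity. The sixth-order "true" curve and noise level are illustrative stand-ins for the sGLE setting:

```python
import numpy as np

rng = np.random.default_rng(7)
phi = rng.uniform(-1.5, 1.5, 80)                 # order-parameter samples
F = phi ** 2 - 0.8 * phi ** 4 + 0.2 * phi ** 6   # "true" free energy (sextic)
y = F + rng.normal(0, 0.05, phi.size)            # noisy observations

def cv_error(degree, k=5):
    """k-fold cross-validation error of a polynomial fit of given degree."""
    idx = rng.permutation(phi.size)
    folds = np.array_split(idx, k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        coef = np.polyfit(phi[train], y[train], degree)
        errs.append(np.mean((np.polyval(coef, phi[fold]) - y[fold]) ** 2))
    return np.mean(errs)

scores = {d: cv_error(d) for d in (2, 4, 6, 8, 10)}
best = min(scores, key=scores.get)
print("CV-selected polynomial degree:", best)
```

The quadratic underfits badly, while very high orders gain nothing on held-out data, mirroring the paper's finding that the standard quartic is not automatically optimal.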
Nateghi, Roshanak; Guikema, Seth D; Quiring, Steven M
2011-12-01
This article compares statistical methods for modeling power outage durations during hurricanes and examines the predictive accuracy of these methods. Being able to make accurate predictions of power outage durations is valuable because the information can be used by utility companies to plan their restoration efforts more efficiently. This information can also help inform customers and public agencies of the expected outage times, enabling better collective response planning, and coordination of restoration efforts for other critical infrastructures that depend on electricity. In the long run, outage duration estimates for future storm scenarios may help utilities and public agencies better allocate risk management resources to balance the disruption from hurricanes with the cost of hardening power systems. We compare the out-of-sample predictive accuracy of five distinct statistical models for estimating power outage duration times caused by Hurricane Ivan in 2004. The methods compared include both regression models (accelerated failure time (AFT) and Cox proportional hazard models (Cox PH)) and data mining techniques (regression trees, Bayesian additive regression trees (BART), and multivariate additive regression splines). We then validate our models against two other hurricanes. Our results indicate that BART yields the best prediction accuracy and that it is possible to predict outage durations with reasonable accuracy. © 2011 Society for Risk Analysis.
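Of the regression models compared above, the accelerated failure time idea reduces, for a log-normal duration model, to linear regression on log-durations. A sketch on invented storm data (the covariate, coefficients, and split are assumptions, not the Hurricane Ivan dataset):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500

# hypothetical storm data: local wind speed (m/s) and outage duration (hours)
wind = rng.uniform(20, 60, n)
dur = np.exp(0.05 * wind + rng.normal(0, 0.4, n))   # log-normal AFT ground truth

train, test = slice(0, 400), slice(400, 500)

# accelerated failure time fit: linear regression on log-durations
Xtr = np.column_stack([np.ones(400), wind[train]])
coef, *_ = np.linalg.lstsq(Xtr, np.log(dur[train]), rcond=None)
pred_aft = np.exp(np.column_stack([np.ones(100), wind[test]]) @ coef)
pred_base = np.full(100, dur[train].mean())         # no-covariate baseline

mae_aft = np.abs(pred_aft - dur[test]).mean()
mae_base = np.abs(pred_base - dur[test]).mean()
print(f"out-of-sample MAE: AFT {mae_aft:.1f} h vs baseline {mae_base:.1f} h")
```

The same held-out scoring applies unchanged to the tree-based methods (BART, regression trees, splines) the article compares.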
Probing NWP model deficiencies by statistical postprocessing
DEFF Research Database (Denmark)
Rosgaard, Martin Haubjerg; Nielsen, Henrik Aalborg; Nielsen, Torben S.
2016-01-01
numerical weather prediction (NWP) model generating global weather forecasts four times daily, with numerous users worldwide. The analysis is based on two years of hourly wind speed time series measured at three locations; offshore, in coastal and flat terrain, and inland in complex topography, respectively...
Applied systems ecology: models, data, and statistical methods
Energy Technology Data Exchange (ETDEWEB)
Eberhardt, L L
1976-01-01
In this report, systems ecology is largely equated to mathematical or computer simulation modelling. The need for models in ecology stems from the necessity to have an integrative device for the diversity of ecological data, much of which is observational, rather than experimental, as well as from the present lack of a theoretical structure for ecology. Different objectives in applied studies require specialized methods. The best predictive devices may be regression equations, often non-linear in form, extracted from much more detailed models. A variety of statistical aspects of modelling, including sampling, are discussed. Several aspects of population dynamics and food-chain kinetics are described, and it is suggested that the two presently separated approaches should be combined into a single theoretical framework. It is concluded that future efforts in systems ecology should emphasize actual data and statistical methods, as well as modelling.
Statistics, Computation, and Modeling in Cosmology
Jewell, Jeff; Guiness, Joe; SAMSI 2016 Working Group in Cosmology
2017-01-01
Current and future ground and space based missions are designed to not only detect, but map out with increasing precision, details of the universe in its infancy to the present-day. As a result we are faced with the challenge of analyzing and interpreting observations from a wide variety of instruments to form a coherent view of the universe. Finding solutions to a broad range of challenging inference problems in cosmology is one of the goals of the “Statistics, Computation, and Modeling in Cosmology” working groups, formed as part of the year long program on ‘Statistical, Mathematical, and Computational Methods for Astronomy’, hosted by the Statistical and Applied Mathematical Sciences Institute (SAMSI), a National Science Foundation funded institute. Two application areas have emerged for focused development in the cosmology working group involving advanced algorithmic implementations of exact Bayesian inference for the Cosmic Microwave Background, and statistical modeling of galaxy formation. The former includes study and development of advanced Markov Chain Monte Carlo algorithms designed to confront challenging inference problems including inference for spatial Gaussian random fields in the presence of sources of galactic emission (an example of a source separation problem). Extending these methods to future redshift survey data probing the nonlinear regime of large scale structure formation is also included in the working group activities. In addition, the working group is also focused on the study of ‘Galacticus’, a galaxy formation model applied to dark matter-only cosmological N-body simulations operating on time-dependent halo merger trees. The working group is interested in calibrating the Galacticus model to match statistics of galaxy survey observations; specifically stellar mass functions, luminosity functions, and color-color diagrams. The group will use subsampling approaches and fractional factorial designs to statistically and
Directory of Open Access Journals (Sweden)
E. L. Dmitrieva
2016-05-01
Full Text Available Basic peculiarities of a nonlinear Kalman filtering algorithm applied to the processing of interferometric signals are considered. Analytical estimates of the statistical characteristics of signal-value prediction errors were obtained, and the error histograms were analysed taking into account variations of different parameters of the interferometric signal. The signal prediction procedure was modeled both with known fixed parameters and with variable signal parameters in the nonlinear Kalman filtering algorithm. Numerical estimates of the prediction errors for interferometric signal values were obtained by forming and analysing the error histograms under additive noise and random variations of the amplitude and frequency of the interferometric signal. The nonlinear Kalman filter is shown to handle signals with randomly variable parameters; however, it does not directly account for the linearization error of the harmonic function representing the interferometric signal, which is a source of filtering error. The main drawback of the linear prediction is the non-Gaussian statistics of the prediction errors, including cases of random deviations of signal amplitude and/or frequency. When implementing stochastic filtering of interferometric signals, it is reasonable to use prediction procedures based on local statistics of the signal and its parameters.
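The prediction/update cycle underlying the error analysis above can be sketched with a scalar Kalman filter (the random-walk state model and noise levels here are assumptions for illustration, not the paper's interferometric signal model). The innovation z - x_pred is exactly the one-step prediction error whose statistics such a study examines.

```python
# Scalar Kalman filter sketch (assumed random-walk state model and noise
# levels; not the paper's interferometric signal model). The innovation
# z - x_pred is the one-step prediction error analysed statistically.
import random

random.seed(1)

true_level = 5.0        # constant signal level to be tracked
meas_var = 1.0          # measurement noise variance R
process_var = 1e-4      # process noise variance Q (small drift assumed)

x_est, p_est = 0.0, 10.0   # initial state estimate and its variance
innovations = []
for _ in range(100):
    z = true_level + random.gauss(0, meas_var ** 0.5)
    # Predict step for a random-walk state: x_k = x_{k-1} + w_k.
    x_pred, p_pred = x_est, p_est + process_var
    innovations.append(z - x_pred)
    # Update step with the Kalman gain.
    gain = p_pred / (p_pred + meas_var)
    x_est = x_pred + gain * (z - x_pred)
    p_est = (1 - gain) * p_pred
# x_est converges near true_level; p_est shrinks toward a small steady state.
```

In the linear Gaussian case the innovations are Gaussian; the abstract's point is that linearizing a harmonic signal breaks this, making the prediction-error statistics non-Gaussian.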
Exploiting linkage disequilibrium in statistical modelling in quantitative genomics
DEFF Research Database (Denmark)
Wang, Lei
Alleles at two loci are said to be in linkage disequilibrium (LD) when they are correlated or statistically dependent. Genomic prediction and gene mapping rely on the existence of LD between genetic markers and causal variants of complex traits. In the first part of the thesis, a novel method...... to quantify and visualize local variation in LD along chromosomes is described, and applied to characterize LD patterns at the local and genome-wide scale in three Danish pig breeds. In the second part, different ways of taking LD into account in genomic prediction models are studied. One approach is to use...... the recently proposed antedependence models, which treat neighbouring marker effects as correlated; another approach involves use of haplotype block information derived using the program Beagle. The overall conclusion is that taking LD information into account in genomic prediction models potentially improves...
Comparison of statistical and clinical predictions of functional outcome after ischemic stroke.
Directory of Open Access Journals (Sweden)
Douglas D Thompson
Full Text Available To determine whether the predictions of functional outcome after ischemic stroke made at the bedside using a doctor's clinical experience were more or less accurate than the predictions made by clinical prediction models (CPMs). A prospective cohort study of nine hundred and thirty one ischemic stroke patients recruited consecutively at the outpatient, inpatient and emergency departments of the Western General Hospital, Edinburgh between 2002 and 2005. Doctors made informal predictions of six month functional outcome on the Oxford Handicap Scale (OHS). Patients were followed up at six months with a validated postal questionnaire. For each patient we calculated the absolute predicted risk of death or dependence (OHS≥3) using five previously described CPMs. The specificity of a doctor's informal predictions of OHS≥3 at six months was good, 0.96 (95% CI: 0.94 to 0.97), and similar to that of CPMs (range 0.94 to 0.96); however, the sensitivity of both informal clinical predictions, 0.44 (95% CI: 0.39 to 0.49), and clinical prediction models (range 0.38 to 0.45) was poor. The prediction of the level of disability after stroke was similar for informal clinical predictions (ordinal c-statistic 0.74 with 95% CI 0.72 to 0.76) and CPMs (range 0.69 to 0.75). No patient or clinician characteristic affected the accuracy of informal predictions, though predictions were more accurate in outpatients. CPMs are at least as good as informal clinical predictions in discriminating between good and bad functional outcome after ischemic stroke. The place of these models in clinical practice has yet to be determined.
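The sensitivity and specificity figures quoted above come from a standard confusion-matrix calculation, sketched here with made-up labels (not the study's cohort):

```python
# Confusion-matrix sketch with made-up labels (not the study's cohort):
# sensitivity and specificity of binary predictions of poor outcome (OHS >= 3).
def sens_spec(pred, actual):
    tp = sum(1 for p, a in zip(pred, actual) if p and a)
    tn = sum(1 for p, a in zip(pred, actual) if not p and not a)
    fn = sum(1 for p, a in zip(pred, actual) if not p and a)
    fp = sum(1 for p, a in zip(pred, actual) if p and not a)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical pattern echoing the abstract: most negatives correctly called
# (high specificity) but under half of the poor outcomes flagged (low
# sensitivity).
pred   = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
sensitivity, specificity = sens_spec(pred, actual)  # 1/3 and 6/7
```

High specificity with low sensitivity, as in the abstract, means missed poor outcomes rather than false alarms dominate the errors.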
New Statistical PDFs: Predictions and Tests up to LHC Energies
Soffer, Jacques
2016-01-01
The quantum statistical parton distributions approach proposed more than one decade ago is revisited by considering a larger set of recent and accurate Deep Inelastic Scattering experimental results. It enables us to improve the description of the data by means of a new determination of the parton distributions. This global next-to-leading order QCD analysis leads to a good description of several structure functions, involving unpolarized parton distributions and helicity distributions, in a broad range of $x$ and $Q^2$ and in terms of a rather small number of free parameters. There are several challenging issues, in particular the behavior of $\\bar d(x) / \\bar u(x)$ at large $x$, a possible large positive gluon helicity distribution, etc. The predictions of this theoretical approach will be tested for single-jet production and charge asymmetry in $W^{\\pm}$ production in $\\bar p p$ and $p p$ collisions up to LHC energies, using recent data and also for forthcoming experimental results.
Statistical modelling for falls count data.
Ullah, Shahid; Finch, Caroline F; Day, Lesley
2010-03-01
Falls and their injury outcomes have count distributions that are highly skewed toward the right with clumping at zero, posing analytical challenges. Different modelling approaches have been used in the published literature to describe falls count distributions, often without consideration of the underlying statistical and modelling assumptions. This paper compares the use of modified Poisson and negative binomial (NB) models as alternatives to Poisson (P) regression, for the analysis of fall outcome counts. Four different count-based regression models (P, NB, zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB)) were each individually fitted to four separate fall count datasets from Australia, New Zealand and United States. The finite mixtures of P and NB regression models were also compared to the standard NB model. Both analytical (F, Vuong and bootstrap tests) and graphical approaches were used to select and compare models. Simulation studies assessed the size and power of each model fit. This study confirms that falls count distributions are over-dispersed, but not dispersed due to excess zero counts or heterogeneous population. Accordingly, the P model generally provided the poorest fit to all datasets. The fit improved significantly with NB and both zero-inflated models. The fit was also improved with the NB model, compared to finite mixtures of both P and NB regression models. Although there was little difference in fit between NB and ZINB models, in the interests of parsimony it is recommended that future studies involving modelling of falls count data routinely use the NB models in preference to the P or ZINB or finite mixture distribution. The fact that these conclusions apply across four separate datasets from four different samples of older people participating in studies of different methodology, adds strength to this general guiding principle.
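The over-dispersion that motivates the NB model above can be illustrated with a short simulation: counts drawn from a Gamma-Poisson mixture (which is exactly a negative binomial) have variance well above the mean. The parameters here are illustrative, not fitted to the falls datasets.

```python
# Synthetic illustration (illustrative parameters, not the falls datasets):
# counts from a Gamma-Poisson mixture, i.e. a negative binomial, show
# variance well above the mean, the over-dispersion the NB model captures.
import math
import random

random.seed(2)

def poisson(lam):
    # Knuth's multiplication algorithm; fine for the moderate rates used here.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

shape, scale = 0.5, 4.0   # Gamma mixing: mean 2, extra-Poisson variation
counts = [poisson(random.gammavariate(shape, scale)) for _ in range(5000)]

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
dispersion = var / mean   # ~1 for a plain Poisson; well above 1 here
```

A plain Poisson fit forces variance = mean, which is why it performed worst on all four datasets in the study.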
Statistics of predictions with missing higher order corrections
Berthier, Laure
2016-01-01
Effective operators have been used extensively to understand small deviations from the Standard Model in the search for new physics. So far there has been no general method to fit for small parameters when higher order corrections in these parameters are present but unknown. We present a new technique that solves this problem, allowing for an exact p-value calculation under the assumption that higher order theoretical contributions can be treated as Gaussian-distributed random variables. The method we propose is general, and may be used in the analysis of any perturbative theoretical prediction, i.e. truncated power series. We illustrate this new method by performing a fit of the Standard Model Effective Field Theory parameters, which include e.g. anomalous gauge and four-fermion couplings.
Infinite Random Graphs as Statistical Mechanical Models
DEFF Research Database (Denmark)
Durhuus, Bergfinnur Jøgvan; Napolitano, George Maria
2011-01-01
We discuss two examples of infinite random graphs obtained as limits of finite statistical mechanical systems: a model of two-dimensional discretized quantum gravity defined in terms of causal triangulated surfaces, and the Ising model on generic random trees. For the former model we describe...... a relation to the so-called uniform infinite tree and results on the Hausdorff and spectral dimension of two-dimensional space-time obtained in B. Durhuus, T. Jonsson, J.F. Wheater, J. Stat. Phys. 139, 859 (2010) are briefly outlined. For the latter we discuss results on the absence of spontaneous...... magnetization and argue that, in the generic case, the values of the Hausdorff and spectral dimension of the underlying infinite trees are not influenced by the coupling to an Ising model in a constant magnetic field (B. Durhuus, G.M. Napolitano, in preparation)...
A survey of statistical network models
Goldenberg, Anna; Fienberg, Stephen E; Airoldi, Edoardo M
2009-01-01
Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities has intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry poin...
Physical-Statistical Model of Thermal Conductivity of Nanofluids
Directory of Open Access Journals (Sweden)
B. Usowicz
2014-01-01
Full Text Available A physical-statistical model for predicting the effective thermal conductivity of nanofluids is proposed. The volumetric unit of nanofluids in the model consists of solid, liquid, and gas particles and is treated as a system made up of regular geometric figures, spheres, filling the volumetric unit by layers. The model assumes that connections between layers of the spheres and between neighbouring spheres in the layer are represented by serial and parallel connections of thermal resistors, respectively. This model is expressed in terms of thermal resistance of nanoparticles and fluids and the multinomial distribution of particles in the nanofluids. The results for predicted and measured effective thermal conductivity of several nanofluids (Al2O3/ethylene glycol-based and Al2O3/water-based; CuO/ethylene glycol-based and CuO/water-based; and TiO2/ethylene glycol-based are presented. The physical-statistical model shows a reasonably good agreement with the experimental results and gives more accurate predictions for the effective thermal conductivity of nanofluids compared to existing classical models.
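The serial-parallel resistor idea in the abstract reduces to two elementary combination rules, sketched below with illustrative unit conductances (not the paper's multinomial particle statistics): conductances add for parallel paths within a layer, and reciprocals add for layers in series.

```python
# Serial-parallel combination sketch (illustrative unit conductances, not the
# paper's multinomial particle statistics): within a layer, sphere paths act
# in parallel; successive layers act in series.
def parallel(conductances):
    # Parallel paths: conductances add.
    return sum(conductances)

def series(conductances):
    # Series paths: resistances (reciprocal conductances) add.
    return 1.0 / sum(1.0 / g for g in conductances)

# Two layers, each mixing hypothetical solid-, liquid- and gas-like paths.
layer1 = parallel([30.0, 0.6, 0.025])
layer2 = parallel([15.0, 1.2, 0.05])
g_eff = series([layer1, layer2])   # dominated by the worse (lower-g) layer
```

The series rule is why a single poorly conducting layer (e.g. a gas-rich one) can dominate the effective property of the whole stack.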
Statistical Modelling of the Soil Dielectric Constant
Usowicz, Boguslaw; Marczewski, Wojciech; Bogdan Usowicz, Jerzy; Lipiec, Jerzy
2010-05-01
The dielectric constant of soil is a physical property that is highly sensitive to water content. It underlies several electrical measurement techniques for determining water content, both direct (TDR, FDR, and others based on electrical conductance and/or capacitance) and indirect RS (Remote Sensing) methods. This work is devoted to a statistical way of modelling the dielectric constant as a property accounting for a wide range of soil compositions, porosities, and mass densities over the unsaturated water content range. Usually, such models are determined for a few particular soil types; when the soil type changes, one must switch to another model or adjust it by parametrizing the soil compounds, which makes it difficult to compare and transfer results between models. The presented model was developed as a generic representation of soil as a hypothetical mixture of spheres, each representing a soil fraction in its proper phase state. The model generates a serial-parallel mesh of conductive and capacitive paths, which is analysed for the total conductive or capacitive property. The model was first developed to determine thermal conductivity and is now extended to the dielectric constant by analysing the capacitive mesh. The analysis proceeds by statistical means obeying physical laws related to the serial-parallel branching of the representative electrical mesh. The physical relevance of the analysis is established electrically, but the definition of the electrical mesh is controlled statistically by parametrizing the compound fractions, by determining the number of representative spheres per unit volume per fraction, and by determining the number of fractions. In this way the model can cover the properties of nearly all possible soil types and all phase states within recognition of the Lorenz and Knudsen conditions. In effect the model allows generating a hypothetical representative of
Understanding and forecasting polar stratospheric variability with statistical models
Directory of Open Access Journals (Sweden)
C. Blume
2012-02-01
Full Text Available The variability of the north-polar stratospheric vortex is a prominent aspect of the middle atmosphere. This work investigates a wide class of statistical models with respect to their ability to model geopotential and temperature anomalies, representing variability in the polar stratosphere. Four partly nonstationary, nonlinear models are assessed: linear discriminant analysis (LDA; a cluster method based on finite elements (FEM-VARX; a neural network, namely a multi-layer perceptron (MLP; and support vector regression (SVR. These methods model time series by incorporating all significant external factors simultaneously, including ENSO, QBO, the solar cycle, volcanoes, etc., to then quantify their statistical importance. We show that variability in reanalysis data from 1980 to 2005 is successfully modeled. FEM-VARX and MLP even satisfactorily forecast the period from 2005 to 2011. However, internal variability remains that cannot be statistically forecasted, such as the unexpected major warming in January 2009. Finally, the statistical model with the best generalization performance is used to predict a vortex breakdown in late January, early February 2012.
An Order Statistics Approach to the Halo Model for Galaxies
Paul, Niladri; Sheth, Ravi K
2016-01-01
We use the Halo Model to explore the implications of assuming that galaxy luminosities in groups are randomly drawn from an underlying luminosity function. We show that even the simplest of such order statistics models -- one in which this luminosity function $p(L)$ is universal -- naturally produces a number of features associated with previous analyses based on the `central plus Poisson satellites' hypothesis. These include the monotonic relation of mean central luminosity with halo mass, the Lognormal distribution around this mean, and the tight relation between the central and satellite mass scales. In stark contrast to observations of galaxy clustering, however, this model predicts $\\textit{no}$ luminosity dependence of large scale clustering. We then show that an extended version of this model, based on the order statistics of a $\\textit{halo mass dependent}$ luminosity function $p(L|m)$, is in much better agreement with the clustering data as well as satellite luminosities, but systematically under-pre...
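The core order-statistics idea above, treating the central galaxy as the maximum of n i.i.d. draws from a luminosity function, can be sketched with a toy exponential p(L) (a stand-in, not the paper's fitted form): the mean maximum grows monotonically with n, mirroring the central-luminosity versus halo-mass trend.

```python
# Toy order-statistics sketch (exponential stand-in for the luminosity
# function p(L), not the paper's fitted form): the brightest of n i.i.d.
# draws has a mean that grows monotonically with n.
import random

random.seed(4)

def mean_central(n_gal, trials=2000):
    # Central galaxy = maximum order statistic of n_gal draws from p(L).
    return sum(max(random.expovariate(1.0) for _ in range(n_gal))
               for _ in range(trials)) / trials

small_halo = mean_central(3)    # few galaxies: dimmer central on average
big_halo = mean_central(30)     # rich halo: brighter central on average
```

For an exponential p(L), the expected maximum of n draws is the n-th harmonic number, so the monotonic trend follows analytically as well as numerically.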
Botvina, A; Gupta, S Das; Mishustin, I
2008-01-01
The statistical multifragmentation model (SMM) has been widely used to explain experimental data of intermediate energy heavy ion collisions. A later entrant in the field is the canonical thermodynamic model (CTM) which is also being used to fit experimental data. The basic physics of both the models is the same, namely that fragments are produced according to their statistical weights in the available phase space. However, they are based on different statistical ensembles, and the methods of calculation are different: while the SMM uses Monte-Carlo simulations, the CTM solves recursion relations. In this paper we compare the predictions of the two models for a few representative cases.
Institute of Scientific and Technical Information of China (English)
赵宏旭; 吴甦
2012-01-01
A wave equation Gaussian process model was developed to describe complicated wave motion by integrating physical and statistical approaches. The errors between the theoretical solution of the wave equation and the observed data were decomposed into three parts of a Gaussian process model: errors caused by external forcing and by shifts in the boundary and initial conditions were described by a linear combination of orthogonal basis-function predictors; errors caused by inadequate model assumptions and the limited convergence of the numerical solution were modeled as a Gaussian process term; and measurement errors were modeled as white noise. The basis functions, as the model predictors, are intrinsic characteristics of the wave motion, unaffected by external influences, and embody the physical mechanism of the wave. The model was validated using experimental data generated from a vibrating string. The results indicate that both the basis functions and the Gaussian process term significantly improve the prediction accuracy.
Massive Predictive Modeling using Oracle R Enterprise
CERN. Geneva
2014-01-01
R is fast becoming the lingua franca for analyzing data via statistics, visualization, and predictive analytics. For enterprise-scale data, R users have three main concerns: scalability, performance, and production deployment. Oracle's R-based technologies - Oracle R Distribution, Oracle R Enterprise, Oracle R Connector for Hadoop, and the R package ROracle - address these concerns. In this talk, we introduce Oracle's R technologies, highlighting how each enables R users to achieve scalability and performance while making production deployment of R results a natural outcome of the data analyst/scientist efforts. The focus then turns to Oracle R Enterprise with code examples using the transparency layer and embedded R execution, targeting massive predictive modeling. One goal behind massive predictive modeling is to build models per entity, such as customers, zip codes, simulations, in an effort to understand behavior and tailor predictions at the entity level. Predictions...
Experiences with Statistical Methods for Wind Power Prediction
DEFF Research Database (Denmark)
Nielsen, Torben Skov; Madsen, Henrik; Tofting, John
1999-01-01
This paper describes a tool for predicting the power production from wind turbines in an area: the Wind Power Prediction Tool (WPPT). The predictions are based on on-line measurements of power production for a selected set of reference wind farms in the area as well as numerical weather predictions...
Electronic noise modeling in statistical iterative reconstruction.
Xu, Jingyan; Tsui, Benjamin M W
2009-06-01
We consider electronic noise modeling in tomographic image reconstruction when the measured signal is the sum of a Gaussian distributed electronic noise component and another random variable whose log-likelihood function satisfies a certain linearity condition. Examples of such likelihood functions include the Poisson distribution and an exponential dispersion (ED) model that can approximate the signal statistics in integration mode X-ray detectors. We formulate the image reconstruction problem as a maximum-likelihood estimation problem. Using an expectation-maximization approach, we demonstrate that a reconstruction algorithm can be obtained following a simple substitution rule from the one previously derived without electronic noise considerations. To illustrate the applicability of the substitution rule, we present examples of a fully iterative reconstruction algorithm and a sinogram smoothing algorithm both in transmission CT reconstruction when the measured signal contains additive electronic noise. Our simulation studies show the potential usefulness of accurate electronic noise modeling in low-dose CT applications.
Statistical model with a standard Γ distribution
Patriarca, Marco; Chakraborti, Anirban; Kaski, Kimmo
2004-07-01
We study a statistical model consisting of N basic units which interact with each other by exchanging a physical entity, according to a given microscopic random law, depending on a parameter λ . We focus on the equilibrium or stationary distribution of the entity exchanged and verify through numerical fitting of the simulation data that the final form of the equilibrium distribution is that of a standard Gamma distribution. The model can be interpreted as a simple closed economy in which economic agents trade money and a saving criterion is fixed by the saving propensity λ . Alternatively, from the nature of the equilibrium distribution, we show that the model can also be interpreted as a perfect gas at an effective temperature T(λ) , where particles exchange energy in a space with an effective dimension D(λ) .
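A common concrete form of such an exchange rule is sketched below (this specific pairwise update is an assumption for illustration; the abstract does not spell out its microscopic law): each trade redistributes a (1 - λ) share of the pair's combined holdings, so the total quantity is conserved exactly.

```python
# Kinetic exchange sketch (this particular update rule is a common choice for
# such models and is assumed here, as the abstract does not spell it out):
# pairs trade a (1 - lam) share of their combined holdings; the total is
# conserved exactly.
import random

random.seed(3)

N, lam = 500, 0.5
money = [1.0] * N   # everyone starts with one unit

for _ in range(100_000):
    i, j = random.randrange(N), random.randrange(N)
    if i == j:
        continue
    eps = random.random()
    pot = (1 - lam) * (money[i] + money[j])   # amount put up for trade
    money[i], money[j] = (lam * money[i] + eps * pot,
                          lam * money[j] + (1 - eps) * pot)

total = sum(money)   # equals N up to floating-point rounding
```

Exact conservation under every trade is what makes the stationary distribution a meaningful equilibrium object to fit, e.g. by a Gamma form as the abstract reports.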
Statistical model with a standard Gamma distribution
Chakraborti, Anirban; Patriarca, Marco
2005-03-01
We study a statistical model consisting of N basic units which interact with each other by exchanging a physical entity, according to a given microscopic random law, depending on a parameter λ. We focus on the equilibrium or stationary distribution of the entity exchanged and verify through numerical fitting of the simulation data that the final form of the equilibrium distribution is that of a standard Gamma distribution. The model can be interpreted as a simple closed economy in which economic agents trade money and a saving criterion is fixed by the saving propensity λ. Alternatively, from the nature of the equilibrium distribution, we show that the model can also be interpreted as a perfect gas at an effective temperature T (λ), where particles exchange energy in a space with an effective dimension D (λ).
Statistical Decision-Tree Models for Parsing
Magerman, D M
1995-01-01
Syntactic natural language parsers have shown themselves to be inadequate for processing highly-ambiguous large-vocabulary text, as is evidenced by their poor performance on domains like the Wall Street Journal, and by the movement away from parsing-based approaches to text-processing in general. In this paper, I describe SPATTER, a statistical parser based on decision-tree learning techniques which constructs a complete parse for every sentence and achieves accuracy rates far better than any published result. This work is based on the following premises: (1) grammars are too complex and detailed to develop manually for most interesting domains; (2) parsing models must rely heavily on lexical and contextual information to analyze sentences accurately; and (3) existing {$n$}-gram modeling techniques are inadequate for parsing models. In experiments comparing SPATTER with IBM's computer manuals parser, SPATTER significantly outperforms the grammar-based parser. Evaluating SPATTER against the Penn Treebank Wall ...
Statistical Model Checking for Product Lines
DEFF Research Database (Denmark)
ter Beek, Maurice H.; Legay, Axel; Lluch Lafuente, Alberto
2016-01-01
We report on the suitability of statistical model checking for the analysis of quantitative properties of product line models by an extended treatment of earlier work by the authors. The type of analysis that can be performed includes the likelihood of specific product behaviour, the expected average cost of products (in terms of the attributes of the products’ features) and the probability of features to be (un)installed at runtime. The product lines must be modelled in QFLan, which extends the probabilistic feature-oriented language PFLan with novel quantitative constraints among features and on behaviour and with advanced feature installation options. QFLan is a rich process-algebraic specification language whose operational behaviour interacts with a store of constraints, neatly separating product configuration from product behaviour. The resulting probabilistic configurations and probabilistic...
Statistical Analysis of CFD Solutions From the Fifth AIAA Drag Prediction Workshop
Morrison, Joseph H.
2013-01-01
A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from North America, Europe, Asia, and South America using a common grid sequence and multiple turbulence models for the June 2012 fifth Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was the Common Research Model subsonic transport wing-body previously used for the 4th Drag Prediction Workshop. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with previous workshops.
Statistical Analysis of CFD Solutions from the 6th AIAA CFD Drag Prediction Workshop
Derlaga, Joseph M.; Morrison, Joseph H.
2017-01-01
A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from North America, Europe, Asia, and South America using both common and custom grid sequences as well as multiple turbulence models for the June 2016 6th AIAA CFD Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was the Common Research Model subsonic transport wing-body previously used for both the 4th and 5th Drag Prediction Workshops. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with previous workshops.
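The kind of robust scatter summary used in such N-version analyses can be sketched with the standard library's `statistics` module (the drag values below are made up, not workshop results): a median plus a scaled median absolute deviation resists the outlier that would inflate a plain mean and standard deviation.

```python
# Robust scatter summary sketch (made-up drag values, not workshop results):
# median plus a scaled median absolute deviation resists the outlier that
# would inflate a plain mean/standard deviation.
import statistics

def robust_summary(values):
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, 1.4826 * mad   # MAD scaled to estimate a normal sigma

# Hypothetical drag coefficients in counts from six codes, one outlier.
drag_counts = [252.1, 251.8, 252.4, 252.0, 251.9, 260.0]
median, sigma_mad = robust_summary(drag_counts)   # 252.05 and ~0.297
```

With one code reporting 260 counts, the plain standard deviation of this sample would be several counts, while the MAD-based spread stays near the core scatter of the other five codes.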
Spatial Economics Model Predicting Transport Volume
Directory of Open Access Journals (Sweden)
Lu Bo
2016-10-01
Full Text Available It is extremely important to predict logistics requirements in a scientific and rational way. However, in recent years improvements to prediction methods have not been significant, and traditional statistical prediction methods suffer from low precision and poor interpretability: they can neither theoretically guarantee the generalization ability of the prediction model nor explain the model effectively. Therefore, combining theories from spatial economics, industrial economics, and neo-classical economics, and taking the city of Zhuanghe as the research object, the study identifies the leading industries that generate large cargo volumes, and further predicts the static logistics generation of Zhuanghe and its hinterland. By integrating the various factors that affect regional logistics requirements, this study establishes a logistics requirements potential model based on spatial economic principles, expanding logistics requirements prediction from purely statistical principles to the new area of spatial and regional economics.
Challenges in Dental Statistics: Data and Modelling
Directory of Open Access Journals (Sweden)
Domenica Matranga
2013-03-01
Full Text Available The aim of this work is to present the reflections and proposals derived from the first Workshop of the SISMEC STATDENT working group on statistical methods and applications in dentistry, held in Ancona (Italy) on 28th September 2011. STATDENT began as a forum of comparison and discussion for statisticians working in the field of dental research in order to suggest new and improve existing biostatistical and clinical epidemiological methods. During the meeting, we dealt with very important topics of statistical methodology for the analysis of dental data, covering the analysis of hierarchically structured and over-dispersed data, the issue of calibration and reproducibility, as well as some problems related to survey methodology, such as the design and construction of unbiased statistical indicators and of well conducted clinical trials. This paper gathers some of the methodological topics discussed during the meeting, concerning multilevel and zero-inflated models for the analysis of caries data and methods for the training and calibration of raters in dental epidemiology.
Statistical Model Checking for Biological Systems
DEFF Research Database (Denmark)
David, Alexandre; Larsen, Kim Guldstrand; Legay, Axel
2014-01-01
Statistical Model Checking (SMC) is a highly scalable simulation-based verification approach for testing and estimating the probability that a stochastic system satisfies a given linear temporal property. The technique has been applied to (discrete and continuous time) Markov chains, stochastic...... proved very useful for identifying interesting properties of biological systems. Our aim is to offer the best of the two worlds: optimal domain specific interfaces and formalisms suited to biology combined with powerful SMC analysis techniques for stochastic and hybrid systems. This goal is obtained...
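The core SMC estimate described here, namely simulating many runs of a stochastic system and counting how many satisfy a property, can be sketched in Python. The toy birth-death chain and the bounded-reachability property below are illustrative assumptions of mine, not the biological models or the temporal-logic formalism used by the authors:

```python
import random

def satisfies(trajectory, threshold=20):
    """A simple linear-time property: does the molecule count ever reach threshold?"""
    return any(x >= threshold for x in trajectory)

def simulate(steps=100, p_gain=0.55, rng=None):
    """One run of a toy birth-death chain (a hypothetical stand-in for a
    stochastic biological model); counts cannot go below zero."""
    rng = rng or random
    x, path = 10, [10]
    for _ in range(steps):
        x += 1 if rng.random() < p_gain else -1
        x = max(x, 0)
        path.append(x)
    return path

def smc_estimate(n_runs=2000, seed=42):
    """Monte Carlo estimate of P(property holds) with a 95% normal-approximation
    confidence half-width, the basic quantity an SMC engine reports."""
    rng = random.Random(seed)
    hits = sum(satisfies(simulate(rng=rng)) for _ in range(n_runs))
    p_hat = hits / n_runs
    half_width = 1.96 * (p_hat * (1 - p_hat) / n_runs) ** 0.5
    return p_hat, half_width
```

Scalability comes from the fact that only independent simulations are needed, so the number of runs, not the state space, controls the cost.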
Statistical shape and appearance models in osteoporosis.
Castro-Mateos, Isaac; Pozo, Jose M; Cootes, Timothy F; Wilkinson, J Mark; Eastell, Richard; Frangi, Alejandro F
2014-06-01
Statistical models (SMs) of shape (SSM) and appearance (SAM) have been acquiring popularity in medical image analysis since they were introduced in the early 1990s. They have been primarily used for segmentation, but they are also a powerful tool for 3D reconstruction and classification. All these tasks may be required in the osteoporosis domain, where fracture detection and risk estimation are key to reducing the mortality and/or morbidity of this bone disease. In this article, we review the different applications of SSMs and SAMs in the context of osteoporosis, and conclude with a discussion of their advantages and disadvantages for this application.
A Statistical Model of Skewed Associativity
Michaud, Pierre
2002-01-01
This paper presents a statistical model of set-associativity, victim caching and skewed-associativity, with an emphasis on skewed-associativity. We show that set-associativity is not efficient when the working-set size is close to the cache size. We refer to this as the unit working-set problem. We show that victim-caching is not a practical solution to the unit working-set problem either, although victim caching emulates full associativity for working-sets much larger than the victim buffe...
Structural Characterization and Statistical-Mechanical Model of Epidermal Patterns.
Chen, Duyu; Aw, Wen Yih; Devenport, Danelle; Torquato, Salvatore
2016-12-06
In proliferating epithelia of mammalian skin, cells of irregular polygon-like shapes pack into complex, nearly flat two-dimensional structures that are pliable to deformations. In this work, we employ various sensitive correlation functions to quantitatively characterize structural features of evolving packings of epithelial cells across length scales in mouse skin. We find that the pair statistics in direct space (correlation function) and Fourier space (structure factor) of the cell centroids in the early stages of embryonic development show structural directional dependence (statistical anisotropy), which is a reflection of the fact that cells are stretched, which promotes uniaxial growth along the epithelial plane. In the late stages, the patterns tend toward statistically isotropic states, as cells attain global polarization and epidermal growth shifts to produce the skin's outer stratified layers. We construct a minimalist four-component statistical-mechanical model involving effective isotropic pair interactions consisting of hard-core repulsion and extra short-range soft-core repulsion beyond the hard core, whose length scale is roughly the same as the hard core. The model parameters are optimized to match the sample pair statistics in both direct and Fourier spaces. By doing this, the parameters are biologically constrained. In contrast with many vertex-based models, our statistical-mechanical model does not explicitly incorporate information about the cell shapes and interfacial energy between cells; nonetheless, our model predicts essentially the same polygonal shape distribution and size disparity of cells found in experiments, as measured by Voronoi statistics. Moreover, our simulated equilibrium liquid-like configurations are able to match other nontrivial unconstrained statistics, which is a testament to the power and novelty of the model. The array of structural descriptors that we deploy enable us to distinguish between normal, mechanically
PREDICTIVE CAPACITY OF ARCH FAMILY MODELS
Directory of Open Access Journals (Sweden)
Raphael Silveira Amaro
2016-03-01
Full Text Available In the last decades, a remarkable number of models, variants of the Autoregressive Conditional Heteroscedastic family, have been developed and empirically tested, making the process of choosing a particular model extremely complex. This research aims to compare the predictive capacity of five conditional heteroskedasticity models, using the Model Confidence Set procedure and considering eight different statistical probability distributions. The financial series used refer to the log-return series of the Bovespa index and the Dow Jones Industrial Index in the period between 27 October 2008 and 30 December 2014. The empirical evidence showed that, in general, the competing models have great homogeneity in making predictions, whether for the stock market of a developed country or for that of a developing country. An equivalent result can be inferred for the statistical probability distributions that were used.
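A minimal member of the ARCH family can be sketched directly from its defining recursion; the block below simulates an ARCH(1) process and checks its hallmark fat tails. The parameter values are illustrative assumptions, not estimates from the Bovespa or Dow Jones series:

```python
import random

def simulate_arch1(n=5000, omega=0.1, alpha=0.5, seed=7):
    """Simulate r_t = sigma_t * z_t with sigma_t^2 = omega + alpha * r_{t-1}^2,
    z_t ~ N(0, 1). alpha < sqrt(1/3) keeps the fourth moment finite."""
    rng = random.Random(seed)
    returns, r_prev = [], 0.0
    for _ in range(n):
        sigma2 = omega + alpha * r_prev ** 2
        r_prev = (sigma2 ** 0.5) * rng.gauss(0.0, 1.0)
        returns.append(r_prev)
    return returns

def excess_kurtosis(xs):
    """Sample excess kurtosis; positive values indicate fatter-than-Gaussian tails,
    the stylized fact ARCH models are built to capture."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0
```

For ARCH(1) with Gaussian innovations the population excess kurtosis is 6*alpha^2 / (1 - 3*alpha^2), so the simulated series should show clearly positive sample values.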
Predictive Modeling of Cardiac Ischemia
Anderson, Gary T.
1996-01-01
The goal of the Contextual Alarms Management System (CALMS) project is to develop sophisticated models to predict the onset of clinical cardiac ischemia before it occurs. The system will continuously monitor cardiac patients and set off an alarm when they appear about to suffer an ischemic episode. The models take as inputs information from patient history and combine it with continuously updated information extracted from blood pressure, oxygen saturation and ECG lines. Expert system, statistical, neural network and rough set methodologies are then used to forecast the onset of clinical ischemia before it transpires, thus allowing early intervention aimed at preventing morbid complications from occurring. The models will differ from previous attempts by including combinations of continuous and discrete inputs. A commercial medical instrumentation and software company has invested funds in the project with a goal of commercialization of the technology. The end product will be a system that analyzes physiologic parameters and produces an alarm when myocardial ischemia is present. If proven feasible, a CALMS-based system will be added to existing heart monitoring hardware.
Predicting Energy Performance of a Net-Zero Energy Building: A Statistical Approach.
Kneifel, Joshua; Webb, David
2016-09-01
Performance-based building requirements have become more prevalent because they give freedom in building design while still maintaining or exceeding the energy performance required by prescriptive-based requirements. In order to determine if building designs reach target energy efficiency improvements, it is necessary to estimate the energy performance of a building using predictive models and different weather conditions. Physics-based whole building energy simulation modeling is the most common approach. However, these physics-based models include underlying assumptions and require significant amounts of information in order to specify the input parameter values. An alternative approach to test the performance of a building is to develop a statistically derived predictive regression model using post-occupancy data that can accurately predict energy consumption and production based on a few common weather-based factors, thus requiring less information than simulation models. A regression model based on measured data should be able to predict energy performance of a building for a given day as long as the weather conditions are similar to those during the data collection time frame. This article uses data from the National Institute of Standards and Technology (NIST) Net-Zero Energy Residential Test Facility (NZERTF) to develop and validate a regression model to predict the energy performance of the NZERTF using two weather variables aggregated to the daily level, applies the model to estimate the energy performance of hypothetical NZERTFs located in different cities in the Mixed-Humid climate zone, and compares these estimates to the results from already existing EnergyPlus whole building energy simulations. This regression model exhibits agreement with EnergyPlus predictive trends in energy production and net consumption, but differs greatly in energy consumption. The model can be used as a framework for alternative and more complex models based on the
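A daily regression on two weather variables reduces to ordinary least squares with an intercept. The sketch below solves the normal equations directly; the predictor names (temperature, solar) and the synthetic coefficients in the test are my assumptions, not the NZERTF model:

```python
def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations.
    X is a list of (x1, x2) rows (e.g. daily temperature and solar radiation);
    the 3x3 system is solved by Gaussian elimination with partial pivoting."""
    rows = [[1.0, x1, x2] for x1, x2 in X]
    k = 3
    # Normal equations A b = c, with A = R^T R and c = R^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for j in range(i, k):
                A[r][j] -= f * A[i][j]
            c[r] -= f * c[i]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b  # [intercept, slope for x1, slope for x2]
```

On noiseless synthetic data the fit recovers the generating coefficients exactly, which is a quick sanity check before applying it to measured daily data.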
Statistical pairwise interaction model of stock market
Bury, Thomas
2013-03-01
Financial markets are a classical example of complex systems as they are compound by many interacting stocks. As such, we can obtain a surprisingly good description of their structure by making the rough simplification of binary daily returns. Spin glass models have been applied and gave some valuable results but at the price of restrictive assumptions on the market dynamics or they are agent-based models with rules designed in order to recover some empirical behaviors. Here we show that the pairwise model is actually a statistically consistent model with the observed first and second moments of the stocks orientation without making such restrictive assumptions. This is done with an approach only based on empirical data of price returns. Our data analysis of six major indices suggests that the actual interaction structure may be thought as an Ising model on a complex network with interaction strengths scaling as the inverse of the system size. This has potentially important implications since many properties of such a model are already known and some techniques of the spin glass theory can be straightforwardly applied. Typical behaviors, as multiple equilibria or metastable states, different characteristic time scales, spatial patterns, order-disorder, could find an explanation in this picture.
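The statistics that such a pairwise model is constrained to reproduce, the first and second moments of the binarized (spin) returns, are easy to compute from data. The toy return values in the test are hypothetical, not taken from the six indices studied:

```python
def binarize(returns):
    """Map daily returns to spin orientations s_t = +1 (up) or -1 (down),
    the rough simplification of binary daily returns used in the abstract."""
    return [1 if r >= 0 else -1 for r in returns]

def moments(spins_by_stock):
    """Empirical magnetizations <s_i> and pairwise correlations <s_i s_j>:
    the two sets of observables a pairwise (Ising-type) model must match."""
    n = len(spins_by_stock)
    T = len(spins_by_stock[0])
    mags = [sum(s) / T for s in spins_by_stock]
    corr = {(i, j): sum(a * b for a, b in zip(spins_by_stock[i], spins_by_stock[j])) / T
            for i in range(n) for j in range(i + 1, n)}
    return mags, corr
```

Fitting the Ising couplings themselves (e.g. by maximum entropy) is a separate inverse problem; this sketch only shows the empirical side of the moment-matching condition.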
Projecting Policy Effects with Statistical Models
Directory of Open Access Journals (Sweden)
Christopher Sims
1988-03-01
Full Text Available This paper attempts to briefly discuss the current frontiers in quantitative modeling for forecasting and policy analysis. It does so by summarizing some recent developments in three areas: reduced form forecasting models; theoretical models including elements of stochastic optimization; and identification. In the process, the paper tries to provide some remarks on the direction we seem to be headed.
Statistical modelling of fine red wine production
Directory of Open Access Journals (Sweden)
María Rosa Castro
2010-05-01
Full Text Available Producing wine is a very important economic activity in the province of San Juan in Argentina; it is therefore most important to predict production regarding the quantity of raw material needed. This work was aimed at obtaining a model relating kilograms of crushed grape to the litres of wine so produced. Such a model will be used for predicting precise future values and confidence intervals for determined quantities of crushed grapes. Data from a vineyard in the province of San Juan was thus used in this work. The sampling coefficient of correlation was calculated and a dispersion diagram was then constructed; this indicated a linear relationship between the litres of wine obtained and the kilograms of crushed grape. Two linear models were then adopted and variance analysis was carried out because the data came from normal populations having the same variance. The most appropriate model was obtained from this analysis; it was validated with experimental values, a good approach being obtained.
Hybrid perturbation methods based on statistical time series models
San-Juan, Juan Félix; San-Martín, Montserrat; Pérez, Iván; López, Rosario
2016-04-01
In this work we present a new methodology for orbit propagation, the hybrid perturbation theory, based on the combination of an integration method and a prediction technique. The former, which can be a numerical, analytical or semianalytical theory, generates an initial approximation that contains some inaccuracies derived from the fact that, in order to simplify the expressions and subsequent computations, not all the involved forces are taken into account and only low-order terms are considered, not to mention the fact that mathematical models of perturbations do not always reproduce physical phenomena with absolute precision. The prediction technique, which can be based on either statistical time series models or computational intelligence methods, is aimed at modelling and reproducing missing dynamics in the previously integrated approximation. This combination results in the precision improvement of conventional numerical, analytical and semianalytical theories for determining the position and velocity of any artificial satellite or space debris object. In order to validate this methodology, we present a family of three hybrid orbit propagators formed by the combination of three different orders of approximation of an analytical theory and a statistical time series model, and analyse their capability to process the effect produced by the flattening of the Earth. The three considered analytical components are the integration of the Kepler problem, a first-order and a second-order analytical theory, whereas the prediction technique is the same in the three cases, namely an additive Holt-Winters method.
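The additive Holt-Winters method named as the prediction component can be sketched in a few lines. The smoothing constants and the synthetic seasonal series in the test are my own illustrative assumptions, not the settings or residual series used by the authors:

```python
def holt_winters_additive(y, m, alpha=0.3, beta=0.1, gamma=0.3, horizon=1):
    """Additive Holt-Winters: level l, trend b and seasonal terms s with period m.
    Initialized from the first two seasons; returns `horizon` step-ahead forecasts
    l + h*b + s (the standard additive forecast equation)."""
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]
    for t in range(m, len(y)):
        last_level = level
        level = alpha * (y[t] - season[t % m]) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * season[t % m]
    n = len(y)
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]
```

On a noiseless trend-plus-seasonal series the recursions converge quickly, so the forecasts should land close to the true continuation.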
Statistical Mechanical Models of Integer Factorization Problem
Nakajima, Chihiro H.; Ohzeki, Masayuki
2017-01-01
We formulate the integer factorization problem as a search for the ground state of a statistical mechanical Hamiltonian. The first passage time required to find a correct divisor of a composite number signifies the exponential computational hardness. The analysis of the density of states of two macroscopic quantities, i.e., the energy and the Hamming distance from the correct solutions, leads to the conclusion that the ground state (correct solution) is completely isolated from the other low-energy states, with the distance being proportional to the system size. In addition, the profile of the microcanonical entropy of the model has two peculiar features, each related to marked changes in the energy region sampled via Monte Carlo simulation or simulated annealing. Hence, we find a peculiar first-order phase transition in our model.
Experiences with Statistical Methods for Wind Power Prediction
DEFF Research Database (Denmark)
Nielsen, Torben Skov; Madsen, Henrik; Tofting, John
1999-01-01
This paper describes a tool for predicting the power production from wind turbines in an area: the Wind Power Prediction Tool (WPPT). The predictions are based on on-line measurements of power production for a selected set of reference wind farms in the area as well as numerical weather predictions...... covering the locations of the reference wind farms. WPPT is in operational use in the Western part of Denmark and the utilities' experiences with the tool are presented....
A statistical model of facial attractiveness.
Said, Christopher P; Todorov, Alexander
2011-09-01
Previous research has identified facial averageness and sexual dimorphism as important factors in facial attractiveness. The averageness and sexual dimorphism accounts provide important first steps in understanding what makes faces attractive, and should be valued for their parsimony. However, we show that they explain relatively little of the variance in facial attractiveness, particularly for male faces. As an alternative to these accounts, we built a regression model that defines attractiveness as a function of a face's position in a multidimensional face space. The model provides much more predictive power than the averageness and sexual dimorphism accounts and reveals previously unreported components of attractiveness. The model shows that averageness is attractive in some dimensions but not in others and resolves previous contradictory reports about the effects of sexual dimorphism on the attractiveness of male faces.
Paiement, Jean-François; Grandvalet, Yves; Bengio, Samy
2008-01-01
Modeling long-term dependencies in time series has proved very difficult to achieve with traditional machine learning methods. This problem occurs when considering music data. In this paper, we introduce generative models for melodies. We decompose melodic modeling into two subtasks. We first propose a rhythm model based on the distributions of distances between subsequences. Then, we define a generative model for melodies given chords and rhythms based on modeling sequences of Narmour featur...
Hybrid Perturbation methods based on Statistical Time Series models
San-Juan, Juan Félix; Pérez, Iván; López, Rosario
2016-01-01
In this work we present a new methodology for orbit propagation, the hybrid perturbation theory, based on the combination of an integration method and a prediction technique. The former, which can be a numerical, analytical or semianalytical theory, generates an initial approximation that contains some inaccuracies derived from the fact that, in order to simplify the expressions and subsequent computations, not all the involved forces are taken into account and only low-order terms are considered, not to mention the fact that mathematical models of perturbations not always reproduce physical phenomena with absolute precision. The prediction technique, which can be based on either statistical time series models or computational intelligence methods, is aimed at modelling and reproducing missing dynamics in the previously integrated approximation. This combination results in the precision improvement of conventional numerical, analytical and semianalytical theories for determining the position and velocity of a...
Real-Time Statistical Modeling of Blood Sugar.
Otoom, Mwaffaq; Alshraideh, Hussam; Almasaeid, Hisham M; López-de-Ipiña, Diego; Bravo, José
2015-10-01
Diabetes is considered a chronic disease that incurs various types of cost to the world. One major challenge in the control of Diabetes is the real time determination of the proper insulin dose. In this paper, we develop a prototype for real time blood sugar control, integrated with the cloud. Our system controls blood sugar by observing the blood sugar level and accordingly determining the appropriate insulin dose based on patient's historical data, all in real time and automatically. To determine the appropriate insulin dose, we propose two statistical models for modeling blood sugar profiles, namely ARIMA and Markov-based model. Our experiment used to evaluate the performance of the two models shows that the ARIMA model outperforms the Markov-based model in terms of prediction accuracy.
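The ARIMA side of this comparison, in its simplest form an AR(1) model (ARIMA(1,0,0)), can be fit and used for one-step-ahead prediction as sketched below. The glucose-like numbers are simulated assumptions, not patient data, and this is not the authors' full model:

```python
import random

def simulate_ar1(n=2000, mu=100.0, phi=0.7, sigma=5.0, seed=3):
    """Hypothetical mean-reverting glucose-like series for illustration:
    x_t = mu + phi * (x_{t-1} - mu) + noise."""
    rng = random.Random(seed)
    x = [mu]
    for _ in range(n - 1):
        x.append(mu + phi * (x[-1] - mu) + rng.gauss(0.0, sigma))
    return x

def fit_ar1(series):
    """Least-squares AR(1) fit: regress deviations on their lag-1 values."""
    mu = sum(series) / len(series)
    dev = [x - mu for x in series]
    num = sum(a * b for a, b in zip(dev[1:], dev[:-1]))
    den = sum(d * d for d in dev[:-1])
    return mu, num / den

def predict_next(series, mu, phi):
    """One-step-ahead forecast: pull the last reading back toward the mean."""
    return mu + phi * (series[-1] - mu)
```

A Markov-based alternative would instead discretize the readings into states and predict from a transition matrix; the AR(1) forecast above is the continuous-valued counterpart.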
Parton distribution of nucleon and nuclear EMC effect in a statistical model
Yu, Xian-Qiao
2016-01-01
We study the parton distribution of the nucleon and the nuclear EMC effect in a statistical model. We find that when the parameters are chosen appropriately, the predictions given by pure statistical laws fit the experimental data well over most of the range of $x$, revealing that statistical laws play an important role in the parton distribution of the nucleon.
MSMBuilder: Statistical Models for Biomolecular Dynamics.
Harrigan, Matthew P; Sultan, Mohammad M; Hernández, Carlos X; Husic, Brooke E; Eastman, Peter; Schwantes, Christian R; Beauchamp, Kyle A; McGibbon, Robert T; Pande, Vijay S
2017-01-10
MSMBuilder is a software package for building statistical models of high-dimensional time-series data. It is designed with a particular focus on the analysis of atomistic simulations of biomolecular dynamics such as protein folding and conformational change. MSMBuilder is named for its ability to construct Markov state models (MSMs), a class of models that has gained favor among computational biophysicists. In addition to both well-established and newer MSM methods, the package includes complementary algorithms for understanding time-series data such as hidden Markov models and time-structure based independent component analysis. MSMBuilder boasts an easy to use command-line interface, as well as clear and consistent abstractions through its Python application programming interface. MSMBuilder was developed with careful consideration for compatibility with the broader machine learning community by following the design of scikit-learn. The package is used primarily by practitioners of molecular dynamics, but is just as applicable to other computational or experimental time-series measurements. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
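The central object MSMBuilder constructs, a Markov state model, is at heart a row-normalized transition-count matrix estimated from a discretized trajectory. The sketch below is a toy version of that estimation step, not MSMBuilder's API (which follows scikit-learn conventions):

```python
def transition_matrix(dtraj, n_states, lag=1):
    """Estimate a Markov state model: count transitions at the given lag time
    in a discrete state trajectory, then row-normalize the counts."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1.0
    T = []
    for row in counts:
        total = sum(row)
        T.append([c / total if total else 1.0 / n_states for c in row])
    return T

def stationary_distribution(T, n_iter=200):
    """Power iteration toward the left eigenvector of T with eigenvalue 1,
    i.e. the equilibrium populations of the states."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi
```

In practice the discrete trajectory comes from clustering high-dimensional simulation data, and the lag time is chosen where implied timescales level off; both steps are omitted here.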
On Wiener filtering and the physics behind statistical modeling.
Marbach, Ralf
2002-01-01
The closed-form solution of the so-called statistical multivariate calibration model is given in terms of the pure component spectral signal, the spectral noise, and the signal and noise of the reference method. The "statistical" calibration model is shown to be as much grounded on the physics of the pure component spectra as any of the "physical" models. There are no fundamental differences between the two approaches since both are merely different attempts to realize the same basic idea, viz., the spectrometric Wiener filter. The concept of the application-specific signal-to-noise ratio (SNR) is introduced, which is a combination of the two SNRs from the reference and the spectral data. Both are defined and the central importance of the latter for the assessment and development of spectroscopic instruments and methods is explained. Other statistics like the correlation coefficient, prediction error, slope deficiency, etc., are functions of the SNR. Spurious correlations and other practically important issues are discussed in quantitative terms. Most important, it is shown how to use a priori information about the pure component spectra and the spectral noise in an optimal way, thereby making the distinction between statistical and physical calibrations obsolete and combining the best of both worlds. Companies and research groups can use this article to realize significant savings in cost and time for development efforts.
Chernyavskaya, Ekaterina A; Golden, Kenneth M; Timokhov, Leonid A
2014-01-01
Significant salinity anomalies have been observed in the Arctic Ocean surface layer during the last decade. Using gridded data of winter salinity in the upper 50 m layer of the Arctic Ocean for the period 1950-1993 and 2007-2012, we investigated the inter-annual variability of the salinity fields, attempted to identify patterns and anomalies, and developed a statistical model for the prediction of surface layer salinity. The statistical model is based on linear regression equations linking the principal components with environmental factors, such as atmospheric circulation, river runoff, ice processes, and water exchange with neighboring oceans. Using this model, we obtained prognostic fields of the surface layer salinity for the winter period 2013-2014. The prognostic fields demonstrated the same tendencies of surface layer freshening that were observed previously. A phase portrait analysis involving the first two principal components exhibits a dramatic shift in behavior of the 2007-2012 data in comparison ...
Directory of Open Access Journals (Sweden)
Abut F
2015-08-01
Full Text Available Fatih Abut, Mehmet Fatih Akay, Department of Computer Engineering, Çukurova University, Adana, Turkey. Abstract: Maximal oxygen uptake (VO2max) indicates how many milliliters of oxygen the body can consume per minute in a state of intense exercise. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, many studies have been conducted in recent years to predict the VO2max of various target audiences, ranging from soccer athletes, nonexpert swimmers, and cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview of the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in the related literature in terms of two well-known metrics, namely, the multiple correlation coefficient (R and standard error of estimate. The survey results reveal that with respect to regression methods used to develop prediction models, support vector machine, in general, shows better performance than other methods, whereas multiple linear regression exhibits the worst performance
Statistical model for OCT image denoising
Li, Muxingzi
2017-08-01
Optical coherence tomography (OCT) is a non-invasive technique with a large array of applications in clinical imaging and biological tissue visualization. However, the presence of speckle noise affects the analysis of OCT images and their diagnostic utility. In this article, we introduce a new OCT denoising algorithm. The proposed method is founded on a numerical optimization framework based on maximum-a-posteriori estimate of the noise-free OCT image. It combines a novel speckle noise model, derived from local statistics of empirical spectral domain OCT (SD-OCT) data, with a Huber variant of total variation regularization for edge preservation. The proposed approach exhibits satisfying results in terms of speckle noise reduction as well as edge preservation, at reduced computational cost.
Physical and Statistical Modeling of Saturn's Troposphere
Yanamandra-Fisher, Padmavati A.; Braverman, Amy J.; Orton, Glenn S.
2002-12-01
The 5.2-μm atmospheric window on Saturn is dominated by thermal radiation and weak gaseous absorption, with a 20% contribution from sunlight reflected from clouds. The striking variability displayed by Saturn's clouds at 5.2 μm and the detection of PH3 (an atmospheric tracer) variability near or below the 2-bar level and possibly at lower pressures provide salient constraints on the dynamical organization of Saturn's atmosphere by constraining the strength of vertical motions at two levels across the disk. We analyse the 5.2-μm spectra of Saturn by utilising two independent methods: (a) physical models based on the relevant atmospheric parameters and (b) statistical analysis, based on principal components analysis (PCA), to determine the influence of the variation of phosphine and the opacity of clouds deep within Saturn's atmosphere to understand the dynamics in its atmosphere.
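The PCA-based statistical analysis mentioned here reduces, for two correlated quantities (say, phosphine abundance and cloud opacity along a scan), to diagonalizing a 2x2 covariance matrix, which has a closed form. The two-channel setup is my simplification for illustration, not the authors' full spectral analysis:

```python
import math

def principal_axis(xs, ys):
    """Leading principal component of 2-D data. For covariance [[a, b], [b, c]]
    the major-axis angle satisfies tan(2*theta) = 2b / (a - c); atan2 picks
    the branch belonging to the larger eigenvalue."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cxx = sum((x - mx) ** 2 for x in xs) / n
    cyy = sum((y - my) ** 2 for y in ys) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return math.cos(theta), math.sin(theta)  # unit vector along the major axis
```

For a full spectrum the same idea applies channel by channel via eigendecomposition (or SVD) of the covariance matrix; the 2-D case simply makes the geometry visible.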
Zephyr - the prediction models
DEFF Research Database (Denmark)
Nielsen, Torben Skov; Madsen, Henrik; Nielsen, Henrik Aalborg
2001-01-01
This paper briefly describes new models and methods for predicting the wind power output from wind farms. The system is being developed in a project which has the research organization Risø and the department of Informatics and Mathematical Modelling (IMM) as the modelling team and all the Dani...
New advances in statistical modeling and applications
Santos, Rui; Oliveira, Maria; Paulino, Carlos
2014-01-01
This volume presents selected papers from the XIXth Congress of the Portuguese Statistical Society, held in the town of Nazaré, Portugal, from September 28 to October 1, 2011. All contributions were selected after a thorough peer-review process. It covers a broad range of papers in the areas of statistical science, probability and stochastic processes, extremes and statistical applications.
ENSO Prediction using Vector Autoregressive Models
Chapman, D. R.; Cane, M. A.; Henderson, N.; Lee, D.; Chen, C.
2013-12-01
A recent comparison (Barnston et al, 2012 BAMS) shows the ENSO forecasting skill of dynamical models now exceeds that of statistical models, but the best statistical models are comparable to all but the very best dynamical models. In this comparison the leading statistical model is the one based on the Empirical Model Reduction (EMR) method. Here we report on experiments with multilevel Vector Autoregressive models using only sea surface temperatures (SSTs) as predictors. VAR(L) models generalize Linear Inverse Models (LIM), which are a VAR(1) method, as well as multilevel univariate autoregressive models. Optimal forecast skill is achieved using 12 to 14 months of prior state information (i.e., 12-14 levels), which allows SSTs alone to capture the effects of other variables such as heat content as well as seasonality. The use of multiple levels allows the model advancing one month at a time to perform at least as well for a 6 month forecast as a model constructed to explicitly forecast 6 months ahead. We infer that the multilevel model has fully captured the linear dynamics (cf. Penland and Magorian, 1993 J. Climate). Finally, while VAR(L) is equivalent to L-level EMR, we show in a 150 year cross validated assessment that we can increase forecast skill by improving on the EMR initialization procedure. The greatest benefit of this change is in allowing the prediction to make effective use of information over many more months.
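The VAR(1) / LIM building block underlying these experiments is a least-squares fit of a linear propagator A in x_{t+1} = A x_t. The 2-variable sketch below shows the estimation step; real applications use many SST principal components and L lags, which this toy omits:

```python
def fit_var1(pairs):
    """Least-squares VAR(1) propagator for 2-D states:
    A = C1 @ inv(C0), with C0 = sum x x^T and C1 = sum x_next x^T,
    the standard normal-equation solution for x_next ~ A x."""
    C0 = [[0.0, 0.0], [0.0, 0.0]]
    C1 = [[0.0, 0.0], [0.0, 0.0]]
    for x, x_next in pairs:
        for i in range(2):
            for j in range(2):
                C0[i][j] += x[i] * x[j]
                C1[i][j] += x_next[i] * x[j]
    det = C0[0][0] * C0[1][1] - C0[0][1] * C0[1][0]
    inv = [[C0[1][1] / det, -C0[0][1] / det],
           [-C0[1][0] / det, C0[0][0] / det]]
    return [[sum(C1[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

With noiseless data generated by a known propagator, the fit recovers it exactly, which makes for a clean correctness check.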
Predicting sulphur and nitrogen deposition using a simple statistical method
Oulehle, Filip; Kopáček, Jiří; Chuman, Tomáš; Černohous, Vladimír; Hůnová, Iva; Hruška, Jakub; Krám, Pavel; Lachmanová, Zora; Navrátil, Tomáš; Štěpánek, Petr; Tesař, Miroslav; Evans, Christopher D.
2016-09-01
Data from 32 long-term (1994-2012) monitoring sites were used to assess temporal development and spatial variability of sulphur (S) and inorganic nitrogen (N) concentrations in bulk precipitation, and S in throughfall, for the Czech Republic. Despite large variance in absolute S and N concentration/deposition among sites, temporal coherence using standardised data (Z score) was demonstrated. Overall significant declines of SO4 concentration in bulk and throughfall precipitation, as well as NO3 and NH4 concentration in bulk precipitation, were observed. Median Z score values of bulk SO4, NO3 and NH4 and throughfall SO4 derived from observations and the respective emission rates of SO2, NOx and NH3 in the Czech Republic and Slovakia showed highly significant correlations. Z score values were calculated for the whole period 1900-2012 and then back-transformed to give estimates of concentration for the individual sites. Uncertainty associated with the concentration calculations was estimated as 20% for SO4 bulk precipitation, 22% for throughfall SO4, 18% for bulk NO3 and 28% for bulk NH4. The application of the method suggested that it is effective in the long-term reconstruction and prediction of S and N deposition at a variety of sites. Multiple regression modelling was used to extrapolate site characteristics (mean precipitation chemistry and its standard deviation) from monitored to unmonitored sites. Spatially distributed temporal development of S and N depositions were calculated since 1900. The method allows spatio-temporal estimation of the acid deposition in regions with extensive monitoring of precipitation chemistry.
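The Z-score standardization and back-transformation at the core of this method is a simple pair of operations: standardize each site's series by its own mean and standard deviation, model or extend the common Z-score signal, then back-transform per site. The concentration values in the test are hypothetical:

```python
def to_z_scores(series):
    """Standardize a site's concentration series: z = (x - mean) / std
    (population standard deviation), returning the site statistics as well."""
    n = len(series)
    mean = sum(series) / n
    std = (sum((x - mean) ** 2 for x in series) / n) ** 0.5
    return [(x - mean) / std for x in series], mean, std

def from_z_scores(z_series, mean, std):
    """Back-transform a (possibly modelled or extended) Z-score series
    to concentration units using the site's mean and standard deviation."""
    return [z * std + mean for z in z_series]
```

Because the transform is affine and invertible, reconstructed historical values inherit the site's own scale while sharing the regional temporal signal.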
Spatio-temporal statistical models with applications to atmospheric processes
Energy Technology Data Exchange (ETDEWEB)
Wikle, C.K.
1996-12-31
This doctoral dissertation is presented as three self-contained papers. An introductory chapter considers traditional spatio-temporal statistical methods used in the atmospheric sciences from a statistical perspective. Although this section is primarily a review, many of the statistical issues considered have not been considered in the context of these methods and several open questions are posed. The first paper attempts to determine a means of characterizing the semiannual oscillation (SAO) spatial variation in the northern hemisphere extratropical height field. It was discovered that the midlatitude SAO in 500 hPa geopotential height could be explained almost entirely as a result of spatial and temporal asymmetries in the annual variation of stationary eddies. It was concluded that the mechanism for the SAO in the northern hemisphere is a result of land-sea contrasts. The second paper examines the seasonal variability of mixed Rossby-gravity waves (MRGW) in the lower stratosphere over the equatorial Pacific. Advanced cyclostationary time series techniques were used for analysis. It was found that there are significant twice-yearly peaks in MRGW activity. Analyses also suggested a convergence of horizontal momentum flux associated with these waves. In the third paper, a new spatio-temporal statistical model is proposed that attempts to consider the influence of both temporal and spatial variability. This method is mainly concerned with prediction in space and time, and provides a spatially descriptive and temporally dynamic model.
Bard, D; Chang, C; May, M; Kahn, S M; AlSayyad, Y; Ahmad, Z; Bankert, J; Connolly, A; Gibson, R R; Gilmore, K; Grace, E; Haiman, Z; Hannel, M; Huffenberger, K M; Jernigan, J G; Jones, L; Krughoff, S; Lorenz, S; Marshall, S; Meert, A; Nagarajan, S; Peng, E; Peterson, J; Rasmussen, A P; Shmakova, M; Sylvestre, N; Todd, N; Young, M
2013-01-01
The statistics of peak counts in reconstructed shear maps contain information beyond the power spectrum, and can improve cosmological constraints from measurements of the power spectrum alone if systematic errors can be controlled. We study the effect of galaxy shape measurement errors on predicted cosmological constraints from the statistics of shear peak counts with the Large Synoptic Survey Telescope (LSST). We use the LSST image simulator in combination with cosmological N-body simulations to model realistic shear maps for different cosmological models. We include both galaxy shape noise and, for the first time, measurement errors on galaxy shapes. We find that the measurement errors considered have relatively little impact on the constraining power of shear peak counts for LSST.
Confidence scores for prediction models
DEFF Research Database (Denmark)
Gerds, Thomas Alexander; van de Wiel, MA
2011-01-01
…modelling strategy is applied to different training sets. For each modelling strategy we estimate a confidence score based on the same repeated bootstraps. A new decomposition of the expected Brier score is obtained, as well as estimates of population-average confidence scores. The latter can be used to distinguish rival prediction models with similar prediction performance. Furthermore, on the subject level a confidence score may provide useful supplementary information for new patients who want to base a medical decision on predicted risk. The ideas are illustrated and discussed using data from cancer…
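The repeated-bootstrap idea behind a confidence score can be sketched as follows. This is a simplified stand-in for the paper's decomposition: the Brier score measures prediction performance, and the spread of the score across bootstrap resamples serves as a crude confidence estimate. The predicted risks and outcomes are invented.

```python
import random

def brier(probs, outcomes):
    """Mean squared difference between predicted risk and binary outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def bootstrap_confidence(probs, outcomes, n_boot=1000):
    """Mean and spread of the Brier score over repeated bootstraps,
    a rough stand-in for the paper's confidence score machinery."""
    n = len(probs)
    scores = []
    for _ in range(n_boot):
        idx = [random.randrange(n) for _ in range(n)]
        scores.append(brier([probs[i] for i in idx], [outcomes[i] for i in idx]))
    mean = sum(scores) / n_boot
    sd = (sum((s - mean) ** 2 for s in scores) / (n_boot - 1)) ** 0.5
    return mean, sd

random.seed(4)
# Hypothetical predicted risks for eight patients and their observed outcomes
probs    = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.1]
outcomes = [1,   1,   0,   0,   0,   1,   1,   0]
m, sd = bootstrap_confidence(probs, outcomes)
print(round(m, 2))  # near the plain Brier score of 0.15
```

Two rival models with similar mean Brier scores but different bootstrap spreads would then be distinguishable by the confidence score.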
Modelling earthquake interaction and seismicity statistics
Steacy, S.; Hetherington, A.
2009-04-01
The effects of earthquake interaction and fault complexity on seismicity statistics are investigated in a 3D model composed of a number of cellular automata (each representing an individual fault) distributed in a volume. Each automaton is assigned a fractal distribution of strength. Failure occurs when the 3D Coulomb stress on any cell exceeds its strength and stress transfer during simulated earthquake rupture is via nearest-neighbor rules formulated to give realistic stress concentrations. An event continues until all neighboring cells whose stresses exceed their strengths have ruptured and the size of the event is determined from its area and stress drop. Long-range stress interactions are computed following the termination of simulated ruptures using a boundary element code. In practice, these stress perturbations are only computed for events above a certain size (e.g. a threshold length of 10 km) and stresses are updated on nearby structures. Events which occur as a result of these stress interactions are considered to be "triggered" earthquakes and they, in turn, can trigger further seismic activity. The threshold length for computing interaction stresses is a free parameter and hence interaction can be "turned off" by setting this to an unrealistically high value. We consider 3 synthetic fault networks of increasing degrees of complexity - modelled on the North Anatolian fault system, the structures in the San Francisco Bay Area, and the Southern California fault network. We find that the effect of interaction is dramatically different in networks of differing complexity. In the North Anatolian analogue, for example, interaction leads to a decreased number of events, increased b-values, and an increase in recurrence intervals. In the Bay Area model, by contrast, we observe that interaction increases the number of events, decreases the b-values, and has little effect on recurrence intervals. For all networks, we find that interaction can activate mis
Yu, Seong Jae; Keenan, Susan M; Tong, Weida; Welsh, William J
2002-10-01
Federal legislation has resulted in the two-tiered in vitro and in vivo screening of some 80 000 structurally diverse chemicals for possible endocrine disrupting effects. To maximize efficiency and minimize expense, prioritization of these chemicals with respect to their estrogenic disrupting potential prior to this time-consuming and labor-intensive screening process is essential. Computer-based quantitative structure-activity relationship (QSAR) models, such as those obtained using comparative molecular field analysis (CoMFA), have been demonstrated as useful for risk assessment in this application. In general, however, CoMFA models to predict estrogenicity have been developed from data sets with limited structural diversity. In this study, we constructed CoMFA models based on biological data for a structurally diverse set of compounds spanning eight chemical families. We also compared two standard alignment schemes employed in CoMFA, namely, atom-fit and flexible field-fit, with respect to the predictive capabilities of their respective models for structurally diverse data sets. The present analysis indicates that flexible field-fit alignment fares better than atom-fit alignment as the structural diversity of the data set increases. Values of log(RP), where RP = relative potency, predicted by the final flexible field-fit CoMFA models are in good agreement with the corresponding experimental values. These models should be effective for predicting the endocrine disrupting potential of existing chemicals as well as prospective and newly prepared chemicals before they enter the environment.
Pathway Model and Nonextensive Statistical Mechanics
Mathai, A. M.; Haubold, H. J.; Tsallis, C.
2015-12-01
The established technique of eliminating upper or lower parameters in a general hypergeometric series is profitably exploited to create pathways among confluent hypergeometric functions, binomial functions, Bessel functions, and exponential series. One such pathway, from the mathematical statistics point of view, results in distributions which naturally emerge within nonextensive statistical mechanics and Beck-Cohen superstatistics, as pursued in generalizations of Boltzmann-Gibbs statistics.
Statistical Ensemble Theory of Gompertz Growth Model
Directory of Open Access Journals (Sweden)
Takuya Yamano
2009-11-01
Full Text Available An ensemble formulation for the Gompertz growth function within the framework of statistical mechanics is presented, where the two growth parameters are assumed to be statistically distributed. The growth can be viewed as a self-referential process, which enables us to use the Bose-Einstein statistics picture. The analytical entropy expression pertaining to the law can be obtained in terms of the growth velocity distribution as well as the Gompertz function itself for the whole process.
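For reference, the Gompertz growth function itself is a double exponential. The sketch below uses one common parameterisation, N(t) = K exp(-b exp(-ct)); the parameter names and values are illustrative, not taken from the abstract.

```python
import math

def gompertz(t, K, b, c):
    """Gompertz growth N(t) = K * exp(-b * exp(-c*t)): K is the carrying
    capacity, b the initial displacement, c the growth rate.
    (Illustrative parameterisation, not the paper's notation.)"""
    return K * math.exp(-b * math.exp(-c * t))

# Growth starts near K*exp(-b) and saturates at K for large t
print(round(gompertz(0.0, 100.0, 5.0, 0.5), 3))   # 100*exp(-5) ≈ 0.674
print(round(gompertz(20.0, 100.0, 5.0, 0.5), 3))  # ≈ 99.977
```

The ensemble formulation of the paper treats b and c (in this notation) as random variables rather than fixed constants.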
Modelling, controlling, predicting blackouts
Wang, Chengwei; Baptista, Murilo S
2016-01-01
The electric power system is one of the cornerstones of modern society. One of its most serious malfunctions is the blackout, a catastrophic event that may disrupt a substantial portion of the system, wreaking havoc on human life and causing great economic losses. Thus, understanding the mechanisms leading to blackouts and creating a reliable and resilient power grid has been a major issue, attracting the attention of scientists, engineers and stakeholders. In this paper, we study the blackout problem in power grids by considering a practical phase-oscillator model. This model allows one to simultaneously consider different types of power sources (e.g., traditional AC power plants and renewable power sources connected by DC/AC inverters) and different types of loads (e.g., consumers connected to distribution networks and consumers directly connected to power plants). We propose two new control strategies based on our model, one for traditional power grids, and another one for smart grids. The control strategies…
Abut, Fatih; Akay, Mehmet Fatih
2015-01-01
Maximal oxygen uptake (VO2max) indicates how many milliliters of oxygen the body can consume per minute in a state of intense exercise. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, many studies have been conducted in recent years to predict the VO2max of various target audiences, ranging from soccer athletes, nonexpert swimmers and cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview of the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in the related literature in terms of two well-known metrics, namely, the multiple correlation coefficient (R) and the standard error of estimate. The survey results reveal that with respect to regression methods used to develop prediction models, support vector machine, in general, shows better performance than other methods, whereas multiple linear regression exhibits the worst performance.
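The two survey metrics, R and the standard error of estimate (SEE), are straightforward to compute once a model's predictions are in hand. The sketch below compares two hypothetical models against invented measured VO2max values; the predictor count p = 3 and all numbers are illustrative.

```python
import math

def pearson(x, y):
    """Correlation between measured and predicted values (R)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def see(actual, predicted, p):
    """Standard error of estimate with p predictor variables."""
    sse = sum((a - f) ** 2 for a, f in zip(actual, predicted))
    return math.sqrt(sse / (len(actual) - p - 1))

# Hypothetical measured VO2max (mL/kg/min) and two competing models' outputs
measured  = [42.0, 48.5, 39.0, 55.2, 46.1, 50.3, 44.7, 41.8]
model_svm = [41.2, 47.9, 40.1, 54.0, 46.8, 49.5, 45.3, 42.5]
model_mlr = [44.0, 45.0, 43.5, 50.0, 47.5, 47.0, 46.0, 44.5]
for name, pred in [("SVM", model_svm), ("MLR", model_mlr)]:
    print(name, round(pearson(measured, pred), 3), round(see(measured, pred, 3), 2))
```

A better model shows higher R and lower SEE, which is how the survey ranks the methods.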
Energy Technology Data Exchange (ETDEWEB)
Weathers, J.B. [Shock, Noise, and Vibration Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: James.Weathers@ngc.com; Luck, R. [Department of Mechanical Engineering, Mississippi State University, 210 Carpenter Engineering Building, P.O. Box ME, Mississippi State, MS 39762-5925 (United States)], E-mail: Luck@me.msstate.edu; Weathers, J.W. [Structural Analysis Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: Jeffrey.Weathers@ngc.com
2009-11-15
The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions associated with the construction of the noise level using multivariate statistics exist in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.
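The Monte Carlo covariance estimation step can be sketched simply: a systematic error is drawn once per trial and shared across outputs, while random errors are drawn independently, so the shared component shows up as off-diagonal covariance. All magnitudes and names here are illustrative, not from the paper.

```python
import random

def monte_carlo_covariance(model, systematic_sd, random_sd, n_draws=5000):
    """Estimate the covariance matrix of two model outputs by Monte Carlo:
    the systematic uncertainty is one draw shared by both outputs, the
    random uncertainty is independent per output. (Illustrative sketch.)"""
    samples = []
    for _ in range(n_draws):
        bias = random.gauss(0, systematic_sd)  # shared systematic error
        q1 = model[0] + bias + random.gauss(0, random_sd)
        q2 = model[1] + bias + random.gauss(0, random_sd)
        samples.append((q1, q2))
    m1 = sum(s[0] for s in samples) / n_draws
    m2 = sum(s[1] for s in samples) / n_draws
    c11 = sum((s[0] - m1) ** 2 for s in samples) / (n_draws - 1)
    c22 = sum((s[1] - m2) ** 2 for s in samples) / (n_draws - 1)
    c12 = sum((s[0] - m1) * (s[1] - m2) for s in samples) / (n_draws - 1)
    return [[c11, c12], [c12, c22]]

random.seed(1)
cov = monte_carlo_covariance((10.0, 12.0), systematic_sd=0.5, random_sd=0.2)
# Off-diagonal ≈ systematic variance (0.25); diagonal ≈ 0.25 + 0.04
print([[round(v, 2) for v in row] for row in cov])
```

The resulting matrix is what feeds the 95% confidence constant-probability contours for the comparison error.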
Predicting Student Performance: A Statistical and Data Mining Approach
National Research Council Canada - National Science Library
V Ramesh; P Parkavi; K Ramar
2013-01-01
…The obtained results from hypothesis testing reveal that type of school does not influence student performance and that parents' occupation plays a major role in predicting grades. This work will help educational institutions identify students who are at risk and provide additional training for the weaker students.
Exploring Explanations of Subglacial Bedform Sizes Using Statistical Models.
Directory of Open Access Journals (Sweden)
John K Hillier
Full Text Available Sediments beneath modern ice sheets exert a key control on their flow, but are largely inaccessible except through geophysics or boreholes. In contrast, palaeo-ice sheet beds are accessible, and typically characterised by numerous bedforms. However, the interaction between bedforms and ice flow is poorly constrained and it is not clear how bedform sizes might reflect ice flow conditions. To better understand this link we present a first exploration of a variety of statistical models to explain the size distribution of some common subglacial bedforms (i.e., drumlins, ribbed moraine, MSGL. By considering a range of models, constructed to reflect key aspects of the physical processes, it is possible to infer that the size distributions are most effectively explained when the dynamics of ice-water-sediment interaction associated with bedform growth is fundamentally random. A 'stochastic instability' (SI) model, which integrates random bedform growth and shrinking through time with exponential growth, is preferred and is consistent with other observations of palaeo-bedforms and geophysical surveys of active ice sheets. Furthermore, we give a proof-of-concept demonstration that our statistical approach can bridge the gap between geomorphological observations and physical models, directly linking measurable size-frequency parameters to properties of ice sheet flow (e.g., ice velocity. Moreover, statistically developing existing models as proposed allows quantitative predictions to be made about sizes, making the models testable; a first illustration of this is given for a hypothesised repeat geophysical survey of bedforms under active ice. Thus, we further demonstrate the potential of size-frequency distributions of subglacial bedforms to assist the elucidation of subglacial processes and better constrain ice sheet models.
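A stochastic-instability style process is easy to simulate in miniature: each bedform's log-size takes random steps with a small positive drift (exponential growth on average plus random growth and shrinking), which yields a roughly log-normal size distribution. All parameter values are invented and this is only a caricature of the SI model.

```python
import random

def stochastic_instability(n_bedforms=2000, steps=200, growth=0.02, noise=0.15):
    """Toy SI-style run: each bedform's log relief drifts upward
    (exponential mean growth) while taking random growth/shrink steps.
    Parameter values are illustrative, not from the paper."""
    sizes = []
    for _ in range(n_bedforms):
        log_h = 0.0  # log of relief, starting from a unit seed
        for _ in range(steps):
            log_h += growth + random.gauss(0, noise)
        sizes.append(log_h)
    return sizes

random.seed(2)
logs = stochastic_instability()
mean_log = sum(logs) / len(logs)
print(round(mean_log, 1))  # ≈ drift * steps = 0.02 * 200 = 4.0
```

In the paper, fitting the resulting size-frequency distribution is what links measurable parameters back to ice flow properties.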
Should I Pack My Umbrella? Clinical versus Statistical Prediction of Mental Health Decisions
Aegisdottir, Stefania; Spengler, Paul M.; White, Michael J.
2006-01-01
In this rejoinder, the authors respond to the insightful commentary of Strohmer and Arm, Chwalisz, and Hilton, Harris, and Rice about the meta-analysis on statistical versus clinical prediction techniques for mental health judgments. The authors address issues including the availability of statistical prediction techniques for real-life psychology…
STATISTICAL ANALYSIS OF THE TM- MODEL VIA BAYESIAN APPROACH
Directory of Open Access Journals (Sweden)
Muhammad Aslam
2012-11-01
Full Text Available The method of paired comparisons calls for the comparison of treatments presented in pairs to judges who prefer the better one based on their sensory evaluations. Thurstone (1927) and Mosteller (1951) employ the method of maximum likelihood to estimate the parameters of the Thurstone-Mosteller model for paired comparisons. A Bayesian analysis of the said model using the non-informative reference (Jeffreys) prior is presented in this study. The posterior estimates (means and joint modes) of the parameters and the posterior probabilities comparing the two parameters are obtained for the analysis. The predictive probabilities that one treatment (Ti) is preferred to any other treatment (Tj) in a future single comparison are also computed. In addition, the graphs of the marginal posterior distributions of the individual parameters are drawn. The appropriateness of the model is also tested using the Chi-Square test statistic.
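In the Thurstone-Mosteller model, the probability that Ti is preferred to Tj in a single comparison is the standard normal CDF of the difference of their worth parameters. The sketch below uses point estimates in place of the paper's posterior quantities; the parameter values are invented.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def preference_prob(theta_i, theta_j):
    """Thurstone-Mosteller predictive probability that treatment Ti is
    preferred to Tj in one future comparison, given worth parameters
    (point estimates standing in for the paper's posterior summaries)."""
    return phi(theta_i - theta_j)

# Hypothetical worth parameters for three treatments
theta = {"T1": 0.8, "T2": 0.3, "T3": -0.2}
print(round(preference_prob(theta["T1"], theta["T2"]), 3))  # Φ(0.5) ≈ 0.691
```

The Bayesian analysis in the paper would average this probability over the joint posterior of the parameters rather than plug in point estimates.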
Modeling, dependence, classification, united statistical science, many cultures
Parzen, Emanuel
2012-01-01
Breiman (2001) urged statisticians to be aware of two cultures: 1. the parametric modeling culture, pioneered by R. A. Fisher and Jerzy Neyman; 2. the algorithmic predictive culture, pioneered by machine learning research. Parzen (2001), as part of discussing Breiman (2001), proposed that researchers be aware of many cultures, including the focus of our research: 3. nonparametric, quantile-based, information-theoretic modeling. Our research seeks to unify statistical problem solving in terms of comparison density, copula density, measures of dependence, correlation, information, and new measures (called LP score comoments) that apply to long-tailed distributions without finite second-order moments. A very important goal is to unify methods for discrete and continuous random variables. We are actively developing these ideas, which have a history of many decades, since Parzen (1979, 1983) and Eubank et al. (1987). Our research extends these methods to modern high-dimensional data modeling.
Bassis, J. N.
2009-12-01
One of the discoveries in glaciology over the past decade with far reaching consequences is the realization that iceberg calving provides an efficient mechanism to transfer large amounts of ice to the ocean in a near instantaneous fashion. Most attempts at formulating models of the fracture process that precedes iceberg calving have focused on developing criteria that predict when an isolated crevasse can penetrate the entire ice thickness. In this presentation we argue that because the distribution of pre-existing flaws within the ice is something which is largely unknown, the statistical nature of fracture must be considered. We propose a statistical model of iceberg calving based on a combination of extreme value statistics - an approach increasingly common in modeling fracture of quasi-brittle materials - and statistical-thermodynamics - an approach which allows macroscopic, large-scale state variables to influence statistical dynamics of individual flaws. In the theory that we present, the probability of fracture and hence of iceberg calving is a function of the applied stress, ice thickness and fracture density. The size-distribution of fractures is determined using a statistical-thermodynamic approach specifically developed for fracture of disordered media. We compare predictions of our model against observed advance and retreat rates of Greenland outlet glaciers as well as Antarctic ice shelves. A key prediction of our model is the existence of fluctuations in the position of ice fronts, stable or otherwise. These fluctuations can occasionally bump an otherwise healthy tidewater glacier into a phase of irreversible, rapid retreat.
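The statistical flavour of the argument can be illustrated with a generic weakest-link (extreme value) failure law: with many flaws, the chance that at least one becomes critical grows with applied stress, ice thickness and flaw density. The Weibull-type form, the exponent, and all values below are illustrative assumptions, not the paper's actual statistical-thermodynamic model.

```python
import math

def calving_probability(stress, thickness, flaw_density, s0=100.0, m=4.0):
    """Weakest-link sketch: P = 1 - exp(-N * (stress/s0)^m), where
    N = flaw_density * thickness counts flaws. s0 (reference stress)
    and m (Weibull modulus) are hypothetical parameters."""
    n_flaws = flaw_density * thickness
    return 1.0 - math.exp(-n_flaws * (stress / s0) ** m)

# Failure probability rises steeply with applied stress (units hypothetical)
for s in (50, 80, 110):
    print(s, round(calving_probability(s, thickness=300.0, flaw_density=0.01), 3))
```

The steep, probabilistic dependence on stress and flaw statistics is what produces the fluctuating ice-front positions the model predicts.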
Foundations of Complex Systems Nonlinear Dynamics, Statistical Physics, and Prediction
Nicolis, Gregoire
2007-01-01
Complexity is emerging as a post-Newtonian paradigm for approaching a large body of phenomena of concern at the crossroads of physical, engineering, environmental, life and human sciences from a unifying point of view. This book outlines the foundations of modern complexity research as it arose from the cross-fertilization of ideas and tools from nonlinear science, statistical physics and numerical simulation. It is shown how these developments lead to an understanding, both qualitative and quantitative, of the complex systems encountered in nature and in everyday experience and, conversely, h
Online Statistical Modeling (Regression Analysis) for Independent Responses
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical modelling) is among the statistical methods frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Nowadays, statistical models have been developed in various directions to model diverse types of data and complex relationships. Rich varieties of advanced and recent statistical modelling are mostly available in open source software (one of them is R). However, these advanced statistical modelling tools are not very friendly to novice R users, since they are based on programming scripts or a command line interface. Our research aims to develop a web interface (based on R and shiny), so that the most recent and advanced statistical modelling is readily available, accessible and applicable on the web. We previously made an interface in the form of an e-tutorial for several modern and advanced statistical modelling techniques in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including models using computer-intensive statistics (bootstrap and Markov Chain Monte Carlo/MCMC). All are readily accessible in our online Virtual Statistics Laboratory. The web interface makes statistical modelling easier to apply and easier to compare in order to find the most appropriate model for the data.
Spatial Statistical Procedures to Validate Input Data in Energy Models
Energy Technology Data Exchange (ETDEWEB)
Johannesson, G.; Stewart, J.; Barr, C.; Brady Sabeff, L.; George, R.; Heimiller, D.; Milbrandt, A.
2006-01-01
Energy modeling and analysis often relies on data collected for other purposes such as census counts, atmospheric and air quality observations, economic trends, and other primarily non-energy related uses. Systematic collection of empirical data solely for regional, national, and global energy modeling has not been established as in the abovementioned fields. Empirical and modeled data relevant to energy modeling is reported and available at various spatial and temporal scales that might or might not be those needed and used by the energy modeling community. The incorrect representation of spatial and temporal components of these data sets can result in energy models producing misleading conclusions, especially in cases of newly evolving technologies with spatial and temporal operating characteristics different from the dominant fossil and nuclear technologies that powered the energy economy over the last two hundred years. Increased private and government research and development and public interest in alternative technologies that have a benign effect on the climate and the environment have spurred interest in wind, solar, hydrogen, and other alternative energy sources and energy carriers. Many of these technologies require much finer spatial and temporal detail to determine optimal engineering designs, resource availability, and market potential. This paper presents exploratory and modeling techniques in spatial statistics that can improve the usefulness of empirical and modeled data sets that do not initially meet the spatial and/or temporal requirements of energy models. In particular, we focus on (1) aggregation and disaggregation of spatial data, (2) predicting missing data, and (3) merging spatial data sets. In addition, we introduce relevant statistical software models commonly used in the field for various sizes and types of data sets.
REMAINING LIFE TIME PREDICTION OF BEARINGS USING K-STAR ALGORITHM – A STATISTICAL APPROACH
Directory of Open Access Journals (Sweden)
R. SATISHKUMAR
2017-01-01
Full Text Available The role of bearings is significant in reducing the downtime of all rotating machinery. The increasing trend of bearing failures in recent times has triggered the need for and importance of deploying condition monitoring. Multiple factors are associated with a bearing failure while it is in operation. Hence, a predictive strategy is required to evaluate the current state of the bearings in operation. In the past, predictive models with regression techniques were widely used for bearing lifetime estimation. The objective of this paper is to estimate the remaining useful life of bearings through a machine learning approach, with the ultimate aim of strengthening predictive maintenance. The present study used a classification approach following the concepts of machine learning, and a predictive model was built to calculate the residual lifetime of bearings in operation. Vibration signals were acquired on a continuous basis from an experiment wherein the bearings are run until they fail naturally. It should be noted that the experiment was carried out with new bearings at pre-defined load and speed conditions until the bearing fails on its own. In the present work, statistical features were deployed and feature selection was carried out using a J48 decision tree; the selected features were used to develop the prognostic model. The K-Star classification algorithm, a supervised machine learning technique, is used to build a predictive model to estimate the lifetime of bearings. The performance of the classifier was cross-validated with distinct data. The result shows that the K-Star classification model gives 98.56% classification accuracy with the selected features.
Multiple-point statistical prediction on fracture networks at Yucca Mountain
Liu, Xiaoyan; Zhang, Chengyuan; Liu, Quansheng; Birkholzer, Jens
2009-05-01
In many underground nuclear waste repository systems, such as the Yucca Mountain project, the water flow rate and the amount of water seepage into the waste emplacement drifts are mainly determined by the hydrological properties of the fracture network in the surrounding rock mass. A natural fracture network system is not easy to describe, especially with respect to its connectivity, which is critically important for simulating the water flow field. In this paper, we introduce a new method for fracture network description and prediction, termed multi-point statistics (MPS). The MPS method records multiple-point statistics concerning the connectivity patterns of a fracture network from a known fracture map, and reproduces multiple-scale training fracture patterns in a stochastic manner, implicitly and directly. It is applied to fracture data to study flow-field behavior at the Yucca Mountain waste repository system. First, the MPS method is used to create a fracture network from the original fracture training image of the Yucca Mountain dataset. After adopting harmonic and arithmetic averaging to upscale the permeability to a coarse grid, a coupled THM simulation is carried out to study near-field water flow in the rock surrounding the waste emplacement drifts. Our study shows that the connectivity or pattern of a fracture network can be grasped and reconstructed by the MPS method. In theory, this will lead to better prediction of fracture system characteristics and flow behavior. Meanwhile, we can obtain the variance of the flow field, which gives us a way to quantify the uncertainty of models even in complicated coupled THM simulations. This indicates that multi-point statistics is a promising method for characterizing and reconstructing natural fracture networks in a fractured rock mass, with the advantage of quantifying the connectivity of the fracture system and its simulation uncertainty simultaneously.
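The record-and-reproduce core of MPS can be sketched in miniature: scan a binary training image for small pattern windows, count their frequencies, and draw new patterns in proportion to those frequencies. Real MPS simulation conditions each draw on already-simulated neighbours; this sketch, with its invented tiny training image, only shows the pattern-statistics step.

```python
import random
from collections import Counter

def pattern_counts(image, size=2):
    """Record multiple-point statistics: frequencies of size x size
    binary patterns (1 = fracture cell) in a training image."""
    counts = Counter()
    for i in range(len(image) - size + 1):
        for j in range(len(image[0]) - size + 1):
            pat = tuple(tuple(image[i + di][j + dj] for dj in range(size))
                        for di in range(size))
            counts[pat] += 1
    return counts

def sample_pattern(counts):
    """Draw a pattern with probability proportional to its training frequency."""
    pats = list(counts)
    weights = [counts[p] for p in pats]
    return random.choices(pats, weights=weights)[0]

# Tiny hypothetical fracture training image (1 = fracture cell)
training = [[1, 1, 0, 0],
            [0, 1, 1, 0],
            [0, 0, 1, 1]]
counts = pattern_counts(training)
random.seed(3)
print(sample_pattern(counts) in counts)  # True: drawn from training patterns
```

Because whole multi-cell patterns are reproduced rather than single-cell statistics, connectivity of the training fractures is preserved in the simulated network.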
Arockia Bazil Raj, A.; Padmavathi, S.
2016-07-01
Atmospheric parameters strongly affect the performance of Free Space Optical Communication (FSOC) system when the optical wave is propagating through the inhomogeneous turbulent medium. Developing a model to get an accurate prediction of optical attenuation according to meteorological parameters becomes significant to understand the behaviour of FSOC channel during different seasons. A dedicated free space optical link experimental set-up is developed for the range of 0.5 km at an altitude of 15.25 m. The diurnal profile of received power and corresponding meteorological parameters are continuously measured using the developed optoelectronic assembly and weather station, respectively, and stored in a data logging computer. Measured meteorological parameters (as input factors) and optical attenuation (as response factor) of size [177147 × 4] are used for linear regression analysis and to design the mathematical model that is more suitable to predict the atmospheric optical attenuation at our test field. A model that exhibits the R2 value of 98.76% and average percentage deviation of 1.59% is considered for practical implementation. The prediction accuracy of the proposed model is investigated along with the comparative results obtained from some of the existing models in terms of Root Mean Square Error (RMSE) during different local seasons in one-year period. The average RMSE value of 0.043-dB/km is obtained in the longer range dynamic of meteorological parameters variations.
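The paper's full model regresses attenuation on several measured meteorological factors; as a minimal sketch of the same idea, the snippet below fits a single-predictor ordinary least squares model. The humidity/attenuation pairs are hypothetical, not the paper's 177147-sample dataset.

```python
# Minimal sketch of fitting an attenuation model by ordinary least squares.
# All data below are hypothetical, not the paper's measurements.

def ols_fit(x, y):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical relative-humidity (%) vs. optical attenuation (dB/km) pairs
humidity = [40.0, 55.0, 60.0, 70.0, 85.0, 90.0]
atten = [1.1, 1.6, 1.8, 2.1, 2.7, 2.9]

slope, intercept = ols_fit(humidity, atten)

def predict_attenuation(h):
    """Predicted attenuation (dB/km) at relative humidity h (%)."""
    return intercept + slope * h
```

In practice one would extend this to multiple regressors and validate with R-squared and RMSE per season, as the paper does.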
Statistical Model Checking of Rich Models and Properties
DEFF Research Database (Denmark)
Poulsen, Danny Bøgsted
Software is in increasing fashion embedded within safety- and business-critical processes of society. Errors in these embedded systems can lead to human casualties or severe monetary loss. Model checking technology has proven formal methods capable of finding and correcting errors in software. However, software is approaching the boundary in terms of the complexity and size that model checking can handle. Furthermore, software systems nowadays more frequently interact with their environment, hence accurately modelling such systems requires modelling the environment as well, resulting in undecidability issues for the traditional model checking approaches. Statistical model checking has proven itself a valuable supplement to model checking, and this thesis is concerned with extending this software validation technique to stochastic hybrid systems. The thesis consists of two parts: the first part …
A Statistical Quality Model for Data-Driven Speech Animation.
Ma, Xiaohan; Deng, Zhigang
2012-11-01
In recent years, data-driven speech animation approaches have achieved significant successes in terms of animation quality. However, how to automatically evaluate the realism of novel synthesized speech animations has been an important yet unsolved research problem. In this paper, we propose a novel statistical model (called SAQP) to automatically predict the quality of on-the-fly synthesized speech animations by various data-driven techniques. Its essential idea is to construct a phoneme-based, Speech Animation Trajectory Fitting (SATF) metric to describe speech animation synthesis errors and then build a statistical regression model to learn the association between the obtained SATF metric and the objective speech animation synthesis quality. Through delicately designed user studies, we evaluate the effectiveness and robustness of the proposed SAQP model. To the best of our knowledge, this work is the first-of-its-kind, quantitative quality model for data-driven speech animation. We believe it is the important first step to remove a critical technical barrier for applying data-driven speech animation techniques to numerous online or interactive talking avatar applications.
Liver Cancer Risk Prediction Models
Developing statistical models that estimate the probability of developing liver cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.
Colorectal Cancer Risk Prediction Models
Developing statistical models that estimate the probability of developing colorectal cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.
Cervical Cancer Risk Prediction Models
Developing statistical models that estimate the probability of developing cervical cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.
Prostate Cancer Risk Prediction Models
Developing statistical models that estimate the probability of developing prostate cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.
Pancreatic Cancer Risk Prediction Models
Developing statistical models that estimate the probability of developing pancreatic cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.
Bladder Cancer Risk Prediction Models
Developing statistical models that estimate the probability of developing bladder cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.
Esophageal Cancer Risk Prediction Models
Developing statistical models that estimate the probability of developing esophageal cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.
Lung Cancer Risk Prediction Models
Developing statistical models that estimate the probability of developing lung cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.
Breast Cancer Risk Prediction Models
Developing statistical models that estimate the probability of developing breast cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.
Ovarian Cancer Risk Prediction Models
Developing statistical models that estimate the probability of developing ovarian cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.
Testicular Cancer Risk Prediction Models
Developing statistical models that estimate the probability of developing testicular cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.
A statistical permafrost distribution model for the European Alps
Directory of Open Access Journals (Sweden)
L. Boeckli
2011-05-01
Permafrost distribution modeling in densely populated mountain regions is an important task to support the construction of infrastructure and for the assessment of climate change effects on permafrost and related natural systems. In order to analyze permafrost distribution and evolution on an Alpine-wide scale, one consistent model for the entire domain is needed.
We present a statistical permafrost model for the entire Alps based on rock glacier inventories and rock surface temperatures. Starting from an integrated model framework, two different sub-models were developed, one for debris covered areas (debris model and one for steep rock faces (rock model. For the debris model a generalized linear mixed-effect model (GLMM was used to predict the probability of a rock glacier being intact as opposed to relict. The model is based on the explanatory variables mean annual air temperature (MAAT, potential incoming solar radiation (PISR and the mean annual sum of precipitation (PRECIP, and achieves an excellent discrimination (area under the receiver-operating characteristic, AUROC = 0.91. Surprisingly, the probability of a rock glacier being intact is positively associated with increasing PRECIP for given MAAT and PISR conditions. The rock model was calibrated with mean annual rock surface temperatures (MARST and is based on MAAT and PISR. The linear regression achieves a root mean square error (RMSE of 1.6 °C. The final model combines the two sub-models and accounts for the different scales used for model calibration. Further steps to transfer this model into a map-based product are outlined.
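The debris sub-model is a GLMM; random effects are beyond a short sketch, but its fixed-effects core (a logistic regression of intact vs. relict status on MAAT, PISR, and PRECIP) can be illustrated. The data below are synthetic standardized predictors with a toy rule echoing the paper's finding (colder, wetter sites more likely intact); nothing here uses the actual inventories.

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=500):
    """Plain gradient-descent logistic regression; returns weights, w[0] = intercept."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi
            grad[0] += err / n
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

random.seed(0)
# Synthetic standardized predictors: MAAT, PISR, PRECIP (illustrative only).
X, y = [], []
for _ in range(200):
    maat, pisr, precip = (random.gauss(0, 1) for _ in range(3))
    p = sigmoid(-2.0 * maat - 0.5 * pisr + 1.0 * precip)
    X.append([maat, pisr, precip])
    y.append(1 if random.random() < p else 0)

w = fit_logistic(X, y)
```

The fitted signs mirror the abstract: intact probability falls with MAAT and, for given MAAT and PISR, rises with PRECIP.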
Directory of Open Access Journals (Sweden)
Amany E. Aly
2016-04-01
In a system consisting of independent components of the same type, appropriate actions may need to be taken as soon as a portion of the components has failed. It is, therefore, important to be able to predict later failure times from earlier ones. One of the well-known failure distributions commonly used to model component life is the modified Weibull distribution (MWD). In this paper, two pivotal quantities are proposed to construct prediction intervals for future unobservable lifetimes based on generalized order statistics (gos) from the MWD. Moreover, a pivotal quantity is developed to reconstruct missing observations at the beginning of the experiment. Furthermore, Monte Carlo simulation studies are conducted and numerical computations are carried out to investigate the efficiency of the presented results. Finally, two illustrative examples for real data sets are analyzed.
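As a hedged illustration (not the paper's pivotal-quantity construction), the sketch below simulates lifetimes from one common parametrization of the MWD, F(t) = 1 - exp(-a t^b e^{lam t}) (the Lai-Xie-Murthy form), and forms an empirical 90% prediction interval for a future lifetime. The parametrization choice and all parameter values are assumptions.

```python
import math, random

def mwd_cdf(t, a, b, lam):
    # Lai-Xie-Murthy modified Weibull: F(t) = 1 - exp(-a * t**b * exp(lam*t))
    return 1.0 - math.exp(-a * t**b * math.exp(lam * t))

def mwd_sample(a, b, lam, u):
    """Invert F at u via bisection on the monotone transform H(t) = a t^b e^{lam t}."""
    target = -math.log(1.0 - u)
    lo, hi = 0.0, 1.0
    while a * hi**b * math.exp(lam * hi) < target:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if a * mid**b * math.exp(lam * mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(1)
a, b, lam = 0.5, 1.5, 0.2          # hypothetical MWD parameters
samples = sorted(mwd_sample(a, b, lam, random.random()) for _ in range(2000))
# Empirical 90% equal-tail prediction interval for a future lifetime
lo_q, hi_q = samples[100], samples[1899]
```

The paper's pivotal quantities yield exact intervals under generalized order statistics; this Monte Carlo interval is only the crude empirical analogue.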
Integer Set Compression and Statistical Modeling
DEFF Research Database (Denmark)
Larsson, N. Jesper
2014-01-01
Compression of integer sets and sequences has been extensively studied for settings where elements follow a uniform probability distribution. In addition, methods exist that exploit clustering of elements in order to achieve higher compression performance. In this work, we address the case where enumeration of elements may be arbitrary or random, but where statistics is kept in order to estimate probabilities of elements. We present a recursive subset-size encoding method that is able to benefit from statistics, and explore the effects of permuting the enumeration order based on element probabilities …
Statistical Modeling of Robotic Random Walks on Different Terrain
Naylor, Austin; Kinnaman, Laura
Issues of public safety, especially with crowd dynamics and pedestrian movement, have been modeled by physicists using methods from statistical mechanics over the last few years. Complex decision making of humans moving on different terrains can be modeled using random walks (RW) and correlated random walks (CRW). The effect of different terrains, such as a constant increasing slope, on RW and CRW was explored. LEGO robots were programmed to make RW and CRW with uniform step sizes. Level ground tests demonstrated that the robots had the expected step size distribution and correlation angles (for CRW). The mean square displacement was calculated for each RW and CRW on different terrains and matched expected trends. The step size distribution was determined to change based on the terrain; theoretical predictions for the step size distribution were made for various simple terrains.
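A correlated random walk of the kind programmed into the robots can be sketched directly: each step keeps a uniform length, and turning angles are drawn with a tunable correlation. The parametrization below (turning-angle standard deviation shrinking with a correlation parameter kappa) is an illustrative choice, not the robots' actual controller.

```python
import math, random

def correlated_random_walk(n_steps, step, kappa, rng):
    """2-D correlated random walk with uniform step size: each turning angle
    is Gaussian with s.d. (1 - kappa) * pi. kappa = 1 gives a straight
    (fully correlated) path, kappa = 0 a nearly uncorrelated one."""
    x = y = 0.0
    heading = 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        heading += rng.gauss(0.0, (1.0 - kappa) * math.pi)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        path.append((x, y))
    return path

def squared_displacement(path):
    (x0, y0), (x1, y1) = path[0], path[-1]
    return (x1 - x0) ** 2 + (y1 - y0) ** 2

rng = random.Random(42)
straight = correlated_random_walk(100, 1.0, 1.0, rng)  # ballistic limit
wiggly = correlated_random_walk(100, 1.0, 0.2, rng)
```

Averaging `squared_displacement` over many realizations at each step count gives the mean square displacement curves compared against theory in the abstract.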
Statistical models and methods for reliability and survival analysis
Couallier, Vincent; Huber-Carol, Catherine; Mesbah, Mounir; Huber -Carol, Catherine; Limnios, Nikolaos; Gerville-Reache, Leo
2013-01-01
Statistical Models and Methods for Reliability and Survival Analysis brings together contributions by specialists in statistical theory as they discuss their applications, providing up-to-date developments in methods used in survival analysis, statistical goodness of fit, and stochastic processes for system reliability, among others. Many of these are related to the work of Professor M. Nikulin in statistics over the past 30 years. The authors gather together various contributions with a broad array of techniques and results, divided into three parts: Statistical Models and Methods, Statistical …
Glass viscosity calculation based on a global statistical modelling approach
Energy Technology Data Exchange (ETDEWEB)
Fluegel, Alex
2007-02-01
A global statistical glass viscosity model was developed for predicting the complete viscosity curve, based on more than 2200 composition-property data of silicate glasses from the scientific literature, including soda-lime-silica container and float glasses, TV panel glasses, borosilicate fiber wool and E type glasses, low expansion borosilicate glasses, glasses for nuclear waste vitrification, lead crystal glasses, binary alkali silicates, and various further compositions from over half a century. It is shown that within a measurement series from a specific laboratory the reported viscosity values are often overestimated at higher temperatures due to alkali and boron oxide evaporation during the measurement and glass preparation, including data by Lakatos et al. (1972) and the recently published "High temperature glass melt property database for process modeling" by Seward et al. (2005). Similarly, in the glass transition range many experimental data of borosilicate glasses are reported too high due to phase separation effects. The developed global model corrects those errors. The model standard error was 9-17°C, with R^2 = 0.985-0.989. The prediction 95% confidence interval for glass in mass production largely depends on the glass composition of interest, the composition uncertainty, and the viscosity level. New insights into the mixed-alkali effect are provided.
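The paper's composition-based global model cannot be reproduced here, but the "complete viscosity curve" it predicts is commonly parametrized with the Vogel-Fulcher-Tammann (VFT) form, log10(eta) = A + B / (T - T0). The sketch below fits VFT by scanning T0 and solving the remaining linear problem by least squares; all data and coefficients are synthetic, not the paper's.

```python
def fit_vft(T, logeta):
    """Fit log10(viscosity) = A + B / (T - T0) by scanning integer T0
    candidates and solving the remaining linear fit by ordinary least squares."""
    best = None
    for t0 in range(0, 500, 5):
        if any(t <= t0 for t in T):
            continue
        x = [1.0 / (t - t0) for t in T]
        n = len(x)
        mx = sum(x) / n
        my = sum(logeta) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, logeta))
        b = sxy / sxx
        a = my - b * mx
        sse = sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, logeta))
        if best is None or sse < best[0]:
            best = (sse, a, b, t0)
    return best[1], best[2], best[3]

# Hypothetical data generated from A = -2.5, B = 4500, T0 = 250 (T in K)
true_a, true_b, true_t0 = -2.5, 4500.0, 250.0
T = [700.0, 800.0, 900.0, 1000.0, 1100.0, 1200.0, 1400.0]
logeta = [true_a + true_b / (t - true_t0) for t in T]

a, b, t0 = fit_vft(T, logeta)
```

The global model in the paper goes one step further, expressing A, B, and T0 (or their equivalents) as functions of glass composition.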
A Statistical Approach For Modeling Tropical Cyclones. Synthetic Hurricanes Generator Model
Energy Technology Data Exchange (ETDEWEB)
Pasqualini, Donatella [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
2016-05-11
This manuscript briefly describes a statistical approach to generate synthetic tropical cyclone tracks to be used in risk evaluations. The Synthetic Hurricane Generator (SynHurG) model allows modeling hurricane risk in the United States, supporting decision makers and implementations of adaptation strategies to extreme weather. In the literature there are mainly two approaches to model hurricane hazard for risk prediction: deterministic-statistical approaches, where the storm key physical parameters are calculated using complex physical climate models and the tracks are usually determined statistically from historical data; and statistical approaches, where both variables and tracks are estimated stochastically using historical records. SynHurG falls in the second category, adopting a pure stochastic approach.
Masked areas in shear peak statistics. A forward modeling approach
Energy Technology Data Exchange (ETDEWEB)
Bard, D.; Kratochvil, J. M.; Dawson, W.
2016-03-09
The statistics of shear peaks have been shown to provide valuable cosmological information beyond the power spectrum, and will be an important constraint of models of cosmology in forthcoming astronomical surveys. Surveys include masked areas due to bright stars, bad pixels etc., which must be accounted for in producing constraints on cosmology from shear maps. We advocate a forward-modeling approach, where the impacts of masking and other survey artifacts are accounted for in the theoretical prediction of cosmological parameters, rather than correcting survey data to remove them. We use masks based on the Deep Lens Survey, and explore the impact of up to 37% of the survey area being masked on LSST and DES-scale surveys. By reconstructing maps of aperture mass the masking effect is smoothed out, resulting in up to 14% smaller statistical uncertainties compared to simply reducing the survey area by the masked area. We show that, even in the presence of large survey masks, the bias in cosmological parameter estimation produced in the forward-modeling process is ≈1%, dominated by bias caused by limited simulation volume. We also explore how this potential bias scales with survey area and evaluate how much small survey areas are impacted by the differences in cosmological structure in the data and simulated volumes, due to cosmic variance.
How Good Are Statistical Models at Approximating Complex Fitness Landscapes?
du Plessis, Louis; Leventhal, Gabriel E.; Bonhoeffer, Sebastian
2016-01-01
Fitness landscapes determine the course of adaptation by constraining and shaping evolutionary trajectories. Knowledge of the structure of a fitness landscape can thus predict evolutionary outcomes. Empirical fitness landscapes, however, have so far only offered limited insight into real-world questions, as the high dimensionality of sequence spaces makes it impossible to exhaustively measure the fitness of all variants of biologically meaningful sequences. We must therefore revert to statistical descriptions of fitness landscapes that are based on a sparse sample of fitness measurements. It remains unclear, however, how much data are required for such statistical descriptions to be useful. Here, we assess the ability of regression models accounting for single and pairwise mutations to correctly approximate a complex quasi-empirical fitness landscape. We compare approximations based on various sampling regimes of an RNA landscape and find that the sampling regime strongly influences the quality of the regression. On the one hand it is generally impossible to generate sufficient samples to achieve a good approximation of the complete fitness landscape, and on the other hand systematic sampling schemes can only provide a good description of the immediate neighborhood of a sequence of interest. Nevertheless, we obtain a remarkably good and unbiased fit to the local landscape when using sequences from a population that has evolved under strong selection. Thus, current statistical methods can provide a good approximation to the landscape of naturally evolving populations. PMID:27189564
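The regression models the authors assess (single plus pairwise mutation effects) can be sketched on a toy landscape. Below, all 2^3 binary genotypes get a fitness from a known additive-plus-pairwise model, and the coefficients are recovered by least squares via the normal equations; the coefficients and the noise-free setting are illustrative, not the paper's RNA data or sampling regimes.

```python
from itertools import combinations, product

def features(genotype):
    """Intercept, single-site, and pairwise-interaction features."""
    sites = list(genotype)
    pairs = [si * sj for si, sj in combinations(sites, 2)]
    return [1.0] + sites + pairs

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_pairwise_model(genotypes, fitness):
    """Least squares via the normal equations X'X w = X'y."""
    X = [features(g) for g in genotypes]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * f for row, f in zip(X, fitness)) for i in range(p)]
    return solve(XtX, Xty)

# Toy landscape over all 2^3 binary genotypes (hypothetical coefficients):
# intercept, 3 single-site effects, 3 pairwise epistatic effects.
true_w = [1.0, 0.4, -0.3, 0.2, 0.5, -0.2, 0.1]
genotypes = [list(g) for g in product([0.0, 1.0], repeat=3)]
fitness = [sum(wi * fi for wi, fi in zip(true_w, features(g))) for g in genotypes]

w = fit_pairwise_model(genotypes, fitness)
```

On real landscapes one fits from a sparse, noisy sample of a vastly larger genotype space, which is exactly where the paper shows the sampling regime dominates regression quality.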
Predictive models of forest dynamics.
Purves, Drew; Pacala, Stephen
2008-06-13
Dynamic global vegetation models (DGVMs) have shown that forest dynamics could dramatically alter the response of the global climate system to increased atmospheric carbon dioxide over the next century. But there is little agreement between different DGVMs, making forest dynamics one of the greatest sources of uncertainty in predicting future climate. DGVM predictions could be strengthened by integrating the ecological realities of biodiversity and height-structured competition for light, facilitated by recent advances in the mathematics of forest modeling, ecological understanding of diverse forest communities, and the availability of forest inventory data.
Statistic Approach versus Artificial Intelligence for Rainfall Prediction Based on Data Series
Directory of Open Access Journals (Sweden)
Indrabayu
2013-04-01
This paper proposes a new idea in comparing two common predictors, i.e., a statistical method and artificial intelligence (AI), for rainfall prediction using empirical data series. The statistical method uses Auto-Regressive Integrated Moving Average (ARIMA) and Adaptive Splines Threshold Autoregressive (ASTAR), two widely used statistical tools, while for AI a combination of Genetic Algorithm and Neural Network (GA-NN) is chosen. The results show that ASTAR gives the best prediction compared to the others, in terms of root mean square error (RMSE) and of how closely the prediction follows the actual trend.
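The paper's ARIMA/ASTAR/GA-NN comparison cannot be reproduced here, but the autoregressive core of ARIMA is easy to sketch: an AR(1) model fitted by conditional least squares and iterated forward for forecasts. The "rainfall anomaly" series below is synthetic, generated from a known AR(1) process.

```python
import random

def fit_ar1(series):
    """Conditional least squares for x[t] = c + phi * x[t-1],
    the autoregressive core of an ARIMA(1, 0, 0) model."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi

def forecast(c, phi, last, horizon):
    """Iterate the fitted recursion to produce multi-step-ahead forecasts."""
    out = []
    for _ in range(horizon):
        last = c + phi * last
        out.append(last)
    return out

# Synthetic 'rainfall anomaly' series from a known AR(1) process (illustrative).
rng = random.Random(7)
series = [0.0]
for _ in range(500):
    series.append(2.0 + 0.7 * series[-1] + rng.gauss(0.0, 0.5))

c, phi = fit_ar1(series)
```

As expected for a stationary AR(1), long-horizon forecasts relax toward the process mean c / (1 - phi), which is one reason nonlinear methods such as ASTAR or GA-NN can track trends the linear model flattens out.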
Statistical Compressive Sensing of Gaussian Mixture Models
Yu, Guoshen
2010-01-01
A new framework of compressive sensing (CS), namely statistical compressive sensing (SCS), is introduced; it aims at efficiently sampling a collection of signals that follow a statistical distribution and achieving accurate reconstruction on average. For signals following a Gaussian distribution, SCS uses Gaussian or Bernoulli sensing matrices with O(k) measurements, considerably fewer than the O(k log(N/k)) required by conventional CS, where N is the signal dimension, and an optimal decoder implemented with linear filtering, significantly faster than the pursuit decoders applied in conventional CS. The error of SCS is shown to be tightly upper bounded by a constant times the best k-term approximation error, with overwhelming probability. The failure probability is also significantly smaller than that of conventional CS. Stronger yet simpler results further show that for any sensing matrix, the error of Gaussian SCS is upper bounded by a constant times the best k-term approximation with probability one, and the …
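The "decoder implemented with linear filtering" for a Gaussian prior is the linear MMSE estimator: for noiseless measurements y = A x with prior covariance S, x_hat = S A' (A S A')^{-1} y. The tiny sketch below (N = 4, M = 2, identity prior) is an assumption-laden illustration of that filter, not the paper's construction for Gaussian mixtures.

```python
# Sketch of a linear-filtering decoder for a Gaussian signal prior:
# x_hat = S A' (A S A')^{-1} y. Sizes and values are illustrative.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

N, M = 4, 2
S = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]  # prior cov
A = [[1.0, 0.5, 0.0, -0.5],
     [0.0, 1.0, 1.0, 0.5]]          # sensing matrix (M x N)
x = [[0.8], [-0.2], [0.5], [1.0]]   # unknown signal (column vector)
y = matmul(A, x)                    # noiseless measurements

ASAt = matmul(matmul(A, S), transpose(A))
W = matmul(matmul(S, transpose(A)), inv2(ASAt))  # linear MMSE filter
x_hat = matmul(W, y)
```

With an identity prior this reduces to the minimum-norm solution consistent with the measurements; the mixture-model case applies such a filter per Gaussian component.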
Enhanced surrogate models for statistical design exploiting space mapping technology
DEFF Research Database (Denmark)
Koziel, Slawek; Bandler, John W.; Mohamed, Achmed S.;
2005-01-01
We present advances in microwave and RF device modeling exploiting Space Mapping (SM) technology. We propose new SM modeling formulations utilizing input mappings, output mappings, frequency scaling and quadratic approximations. Our aim is to enhance circuit models for statistical analysis...
Borsboom, D.; Haig, B.D.
2013-01-01
Unlike most other statistical frameworks, Bayesian statistical inference is wedded to a particular approach in the philosophy of science (see Howson & Urbach, 2006); this approach is called Bayesianism. Rather than being concerned with model fitting, this position in the philosophy of science primar
STATISTICAL MECHANICS MODELING OF MESOSCALE DEFORMATION IN METALS
Energy Technology Data Exchange (ETDEWEB)
Anter El-Azab
2013-04-08
The research under this project focused on theoretical and computational modeling of dislocation dynamics of mesoscale deformation of metal single crystals. Specifically, the work aimed to implement a continuum statistical theory of dislocations to understand strain hardening and cell structure formation under monotonic loading. These aspects of crystal deformation are manifestations of the evolution of the underlying dislocation system under mechanical loading. The project had three research tasks: 1) investigating the statistical characteristics of dislocation systems in deformed crystals; 2) formulating kinetic equations of dislocations and coupling these kinetic equations with crystal mechanics; 3) computational solution of the coupled crystal mechanics and dislocation kinetics. Comparison of dislocation dynamics predictions with experimental results in the area of statistical properties of dislocations and their field was also a part of the proposed effort. In the first research task, the dislocation dynamics simulation method was used to investigate the spatial, orientation, velocity, and temporal statistics of dynamical dislocation systems, and the results from this investigation were used to complete the kinetic description of dislocations. The second task focused on completing the formulation of a kinetic theory of dislocations that respects the discrete nature of crystallographic slip and the physics of dislocation motion and dislocation interaction in the crystal. Part of this effort also targeted the theoretical basis for establishing the connection between discrete and continuum representations of dislocations and the analysis of discrete dislocation simulation results within the continuum framework. This part of the research enables the enrichment of the kinetic description with information representing the behavior of discrete dislocation systems. The third task focused on the development of physics-inspired numerical methods for the solution of the coupled …
Mickley, Loretta J.
2017-01-01
We develop a statistical model to predict June–July–August (JJA) daily maximum 8-h average (MDA8) ozone concentrations in the eastern United States based on large-scale climate patterns during the previous spring. We find that anomalously high JJA ozone in the East is correlated with these springtime patterns: warm tropical Atlantic and cold northeast Pacific sea surface temperatures (SSTs), as well as positive sea level pressure (SLP) anomalies over Hawaii and negative SLP anomalies over the Atlantic and North America. We then develop a linear regression model to predict JJA MDA8 ozone from 1980 to 2013, using the identified SST and SLP patterns from the previous spring. The model explains ∼45% of the variability in JJA MDA8 ozone concentrations and ∼30% variability in the number of JJA ozone episodes (>70 ppbv) when averaged over the eastern United States. This seasonal predictability results from large-scale ocean–atmosphere interactions. Warm tropical Atlantic SSTs can trigger diabatic heating in the atmosphere and influence the extratropical climate through stationary wave propagation, leading to greater subsidence, less precipitation, and higher temperatures in the East, which increases surface ozone concentrations there. Cooler SSTs in the northeast Pacific are also associated with more summertime heatwaves and high ozone in the East. On average, models participating in the Atmospheric Model Intercomparison Project fail to capture the influence of this ocean–atmosphere interaction on temperatures in the eastern United States, implying that such models would have difficulty simulating the interannual variability of surface ozone in this region. PMID:28223483
Statistical models of shape optimisation and evaluation
Davies, Rhodri; Taylor, Chris
2014-01-01
Deformable shape models have wide application in computer vision and biomedical image analysis. This book addresses a key issue in shape modelling: establishment of a meaningful correspondence between a set of shapes. Full implementation details are provided.
Steady state statistical correlations predict bistability in reaction motifs.
Chakravarty, Suchana; Barik, Debashis
2017-03-01
Various cellular decision making processes are regulated by bistable switches that take graded input signals and convert them to binary all-or-none responses. Traditionally, a bistable switch generated by a positive feedback loop is characterized either by a hysteretic signal response curve with two distinct signaling thresholds or by characterizing the bimodality of the response distribution in the bistable region. To identify the intrinsic bistability of a feedback regulated network, here we propose that bistability can be determined by correlating higher order moments and cumulants (≥2) of the joint steady state distributions of two components connected in a positive feedback loop. We performed stochastic simulations of four feedback regulated models with intrinsic bistability and we show that for a bistable switch with variation of the signal dose, the steady state variance vs. covariance adopts a signatory cusp-shaped curve. Further, we find that the (n + 1)th order cross-cumulant vs. nth order cross-cumulant adopts a closed loop structure for at least n = 3. We also propose that our method is capable of identifying systems without intrinsic bistability even though the system may show bimodality in the marginal response distribution. The proposed method can be used to analyze single cell protein data measured at steady state from experiments such as flow cytometry.
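The quantities the authors correlate are estimated from paired steady-state samples; for total order up to three, joint cumulants coincide with central cross-moments, so a plain moment estimator suffices. The sketch below computes such cross-statistics on synthetic correlated samples (not the authors' reaction-network simulations), illustrating the estimator rather than the cusp-shaped bistability signature itself.

```python
import random

def central_cross_moment(xs, ys, p, q):
    """Estimate E[(X - mu_x)^p (Y - mu_y)^q] from paired samples.
    For total order p + q <= 3 this equals the joint cumulant of that order."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) ** p * (y - my) ** q for x, y in zip(xs, ys)) / n

rng = random.Random(3)
# Synthetic stand-in for two species in a positive feedback loop: strongly
# coupled, jointly Gaussian samples (illustrative only).
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]

cov = central_cross_moment(xs, ys, 1, 1)     # second-order cross-cumulant
var_x = central_cross_moment(xs, xs, 1, 1)   # equals Var(X)
c21 = central_cross_moment(xs, ys, 2, 1)     # a third-order cross-cumulant
```

In the paper's procedure, these estimators would be evaluated across a sweep of signal doses, and the variance-vs-covariance and cumulant-vs-cumulant curves inspected for the cusp and closed-loop signatures of intrinsic bistability.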
Analysis and Evaluation of Statistical Models for Integrated Circuits Design
Directory of Open Access Journals (Sweden)
Sáenz-Noval J.J.
2011-10-01
Statistical models for integrated circuits (IC) allow us to estimate the percentage of acceptable devices in a batch before fabrication. Currently, Pelgrom's model is the statistical model most accepted in the industry; however, it was derived for micrometer-scale technologies, which does not guarantee reliability in nanometric manufacturing processes. This work considers three of the most relevant statistical models in the industry and evaluates their limitations and advantages in analog design, so that the designer has a better criterion to make a choice. Moreover, it shows how several statistical models can be used for each one of the stages and design purposes.
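Pelgrom's model states that the standard deviation of the mismatch of a parameter between two identically drawn devices scales inversely with the square root of device area, sigma(dP) = A_P / sqrt(W * L). A minimal sketch, with a hypothetical coefficient value:

```python
import math

def pelgrom_sigma(a_p, area_um2):
    """Pelgrom's law: mismatch s.d. = A_P / sqrt(W * L).
    a_p in mV*um, area in um^2; returns sigma in mV.
    The coefficient used below is illustrative, not from a real process."""
    return a_p / math.sqrt(area_um2)

A_VT = 3.5  # hypothetical threshold-voltage Pelgrom coefficient (mV*um)
sigma_small = pelgrom_sigma(A_VT, 0.5 * 0.5)  # 0.5 um x 0.5 um device
sigma_large = pelgrom_sigma(A_VT, 1.0 * 1.0)  # quadrupled area
```

Quadrupling the area halves the mismatch sigma, which is the design trade-off (matching vs. area) the compared statistical models refine for nanometric nodes.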
Statistics-based investigation on typhoon transition modeling
DEFF Research Database (Denmark)
Zhang, Shuoyun; Nishijima, Kazuyoshi
The present study revisits the statistical modeling of typhoon transition. The objective of the study is to provide insights on plausible statistical typhoon transition models based on extensive statistical analysis. First, the correlation structures of the typhoon transition are estimated in terms … and the seasonality are taken into account by developing the models for different spatial grids and seasons separately. An appropriate size of spatial grids is investigated. The statistical characteristics of the random residual terms in the models are also examined. Finally, Monte Carlo simulations are performed …
Statistical image processing and multidimensional modeling
Fieguth, Paul
2010-01-01
Images are all around us! The proliferation of low-cost, high-quality imaging devices has led to an explosion in acquired images. When these images are acquired from a microscope, telescope, satellite, or medical imaging device, there is a statistical image processing task: the inference of something - an artery, a road, a DNA marker, an oil spill - from imagery, possibly noisy, blurry, or incomplete. A great many textbooks have been written on image processing. However this book does not so much focus on images, per se, but rather on spatial data sets, with one or more measurements taken over
Statistical Tests for Mixed Linear Models
Khuri, André I; Sinha, Bimal K
2011-01-01
An advanced discussion of linear models with mixed or random effects. In recent years a breakthrough has occurred in our ability to draw inferences from exact and optimum tests of variance component models, generating much research activity that relies on linear models with mixed and random effects. This volume covers the most important research of the past decade as well as the latest developments in hypothesis testing. It compiles all currently available results in the area of exact and optimum tests for variance component models and offers the only comprehensive treatment for these models a
Seeking Temporal Predictability in Speech: Comparing Statistical Approaches on 18 World Languages
Jadoul, Yannick; Ravignani, Andrea; Thompson, Bill; Filippi, Piera; de Boer, Bart
2016-01-01
Temporal regularities in speech, such as interdependencies in the timing of speech events, are thought to scaffold early acquisition of the building blocks in speech. By providing on-line clues to the location and duration of upcoming syllables, temporal structure may aid segmentation and clustering of continuous speech into separable units. This hypothesis tacitly assumes that learners exploit predictability in the temporal structure of speech. Existing measures of speech timing tend to focus on first-order regularities among adjacent units, and are overly sensitive to idiosyncrasies in the data they describe. Here, we compare several statistical methods on a sample of 18 languages, testing whether syllable occurrence is predictable over time. Rather than looking for differences between languages, we aim to find across languages (using clearly defined acoustic, rather than orthographic, measures), temporal predictability in the speech signal which could be exploited by a language learner. First, we analyse distributional regularities using two novel techniques: a Bayesian ideal learner analysis, and a simple distributional measure. Second, we model higher-order temporal structure—regularities arising in an ordered series of syllable timings—testing the hypothesis that non-adjacent temporal structures may explain the gap between subjectively-perceived temporal regularities, and the absence of universally-accepted lower-order objective measures. Together, our analyses provide limited evidence for predictability at different time scales, though higher-order predictability is difficult to reliably infer. We conclude that temporal predictability in speech may well arise from a combination of individually weak perceptual cues at multiple structural levels, but is challenging to pinpoint. PMID:27994544
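In the spirit of the simple distributional measures discussed above (this is an illustrative measure of our own, not the paper's exact metric), temporal predictability can be quantified as the Shannon entropy of binned inter-onset intervals; a perfectly isochronous syllable sequence scores zero, and more irregular timing scores higher:

```python
import math
from collections import Counter

def interval_entropy(onsets, bin_width=0.05):
    """Shannon entropy (bits) of binned inter-onset intervals.

    onsets    : syllable onset times in seconds
    bin_width : duration bin size; lower entropy = more predictable timing
    """
    intervals = [b - a for a, b in zip(onsets, onsets[1:])]
    counts = Counter(round(iv / bin_width) for iv in intervals)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```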
Yang, Jing; Zammit, Christian; Dudley, Bruce
2017-04-01
The phenomenon of losing and gaining in rivers normally takes place in lowlands, where there are often various, sometimes conflicting uses for water resources, e.g., agriculture, industry, recreation, and maintenance of ecosystem function. To better support water allocation decisions, it is crucial to understand the location and seasonal dynamics of these losses and gains. We present a statistical methodology to predict losing and gaining river reaches in New Zealand based on 1) information surveys with surface water and groundwater experts from regional government, 2) a collection of river/watershed characteristics, including climate, soil and hydrogeologic information, and 3) the random forests technique. The surveys on losing and gaining reaches were conducted face-to-face at 16 New Zealand regional government authorities, and climate, soil, river geometry, and hydrogeologic data from various sources were collected and compiled to represent river/watershed characteristics. The random forests technique was used to build the statistical relationship between river reach status (gain and loss) and river/watershed characteristics, and then to predict for river reaches at Strahler order one without prior losing and gaining information. Results show that the model has a classification error of around 10% for "gain" and "loss". The results will assist further research, and water allocation decisions in lowland New Zealand.
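The random-forests idea used here — bootstrap resampling plus randomized trees with majority voting — can be sketched in miniature with single-feature decision stumps (feature names, labels, and data below are invented for illustration; a production classifier would use a full library implementation such as scikit-learn):

```python
import random
from collections import Counter

def best_stump(rows, labels, feat):
    """Pick the threshold on one feature that maximizes training accuracy."""
    best_acc, best = -1.0, None
    for thr in sorted({r[feat] for r in rows}):
        for left, right in (("gain", "loss"), ("loss", "gain")):
            acc = sum((lab == left) if r[feat] <= thr else (lab == right)
                      for r, lab in zip(rows, labels)) / len(rows)
            if acc > best_acc:
                best_acc, best = acc, (thr, left, right)
    return best

def fit_forest(rows, labels, n_trees=25, seed=1):
    """Bag of single-feature stumps, each fit on a bootstrap resample."""
    rng = random.Random(seed)
    feats = sorted(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample
        feat = rng.choice(feats)                          # random feature choice
        thr, left, right = best_stump([rows[i] for i in idx],
                                      [labels[i] for i in idx], feat)
        forest.append((feat, thr, left, right))
    return forest

def predict(forest, row):
    """Majority vote across the ensemble."""
    votes = Counter(left if row[feat] <= thr else right
                    for feat, thr, left, right in forest)
    return votes.most_common(1)[0][0]

# Toy, invented training data: shallow groundwater + high rainfall -> "gain"
rows = [{"depth_to_gw": d, "annual_rain": r}
        for d, r in [(1, 950), (2, 900), (3, 980), (8, 420), (9, 450), (10, 400)]]
labels = ["gain", "gain", "gain", "loss", "loss", "loss"]
forest = fit_forest(rows, labels)
```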
Hayslett, H T
1991-01-01
Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the
Statistical models for nuclear decay from evaporation to vaporization
Cole, A J
2000-01-01
Elements of equilibrium statistical mechanics: Introduction. Microstates and macrostates. Sub-systems and convolution. The Boltzmann distribution. Statistical mechanics and thermodynamics. The grand canonical ensemble. Equations of state for ideal and real gases. Pseudo-equilibrium. Statistical models of nuclear decay. Nuclear physics background: Introduction. Elements of the theory of nuclear reactions. Quantum mechanical description of scattering from a potential. Decay rates and widths. Level and state densities in atomic nuclei. Angular momentum in quantum mechanics. History of statistical
Bremner, Paul G.; Vazquez, Gabriel; Christiano, Daniel J.; Trout, Dawn H.
2016-01-01
Prediction of the maximum expected electromagnetic pick-up of conductors inside a realistic shielding enclosure is an important canonical problem for system-level EMC design of spacecraft, launch vehicles, aircraft and automobiles. This paper introduces a simple statistical power balance model for prediction of the maximum expected current in a wire conductor inside an aperture enclosure. It calculates both the statistical mean and variance of the immission from the physical design parameters of the problem. Familiar probability density functions can then be used to predict the maximum expected immission for design purposes. The statistical power balance model requires minimal EMC design information and solves orders of magnitude faster than existing numerical models, making it ultimately viable for scaled-up, full system-level modeling. Both experimental test results and full wave simulation results are used to validate the foundational model.
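To illustrate the "mean and variance to maximum expected value" step: assuming, as is common for overmoded enclosures, that the received power is exponentially distributed about its statistical mean (our simplifying assumption, not necessarily the paper's exact formulation), the expected maximum over n independent samples is the mean times the n-th harmonic number:

```python
def expected_max_power(mean_power, n_samples):
    """Expected maximum of n iid exponential power samples: mean * H_n.

    Follows from the memoryless property: the k-th spacing between order
    statistics has mean mean_power / (n - k + 1), and these sum to H_n.
    """
    harmonic = sum(1.0 / k for k in range(1, n_samples + 1))
    return mean_power * harmonic
```

For n = 1 this reduces to the mean; the design margin then grows roughly like ln(n) with the number of independent stir states or frequency samples.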
Evaluation of Fast-Time Wake Vortex Prediction Models
Proctor, Fred H.; Hamilton, David W.
2009-01-01
Current fast-time wake models are reviewed and three basic types are defined. Predictions from several of the fast-time models are compared. Previous statistical evaluations of the APA-Sarpkaya and D2P fast-time models are discussed. Root Mean Square errors between fast-time model predictions and Lidar wake measurements are examined for a 24 hr period at Denver International Airport. Shortcomings in current methodology for evaluating wake errors are also discussed.
Physics-based statistical model and simulation method of RF propagation in urban environments
Pao, Hsueh-Yuan; Dvorak, Steven L.
2010-09-14
A physics-based statistical model and simulation/modeling method and system of electromagnetic wave propagation (wireless communication) in urban environments. In particular, the model is a computationally efficient closed-form parametric model of RF propagation in an urban environment which is extracted from a physics-based statistical wireless channel simulation method and system. The simulation divides the complex urban environment into a network of interconnected urban canyon waveguides which can be analyzed individually; calculates spectral coefficients of modal fields in the waveguides excited by the propagation using a database of statistical impedance boundary conditions which incorporates the complexity of building walls in the propagation model; determines statistical parameters of the calculated modal fields; and determines a parametric propagation model based on the statistical parameters of the calculated modal fields from which predictions of communications capability may be made.
Multivariate statistical modelling based on generalized linear models
Fahrmeir, Ludwig
1994-01-01
This book is concerned with the use of generalized linear models for univariate and multivariate regression analysis. Its emphasis is to provide a detailed introductory survey of the subject based on the analysis of real data drawn from a variety of subjects including the biological sciences, economics, and the social sciences. Where possible, technical details and proofs are deferred to an appendix in order to provide an accessible account for non-experts. Topics covered include: models for multi-categorical responses, model checking, time series and longitudinal data, random effects models, and state-space models. Throughout, the authors have taken great pains to discuss the underlying theoretical ideas in ways that relate well to the data at hand. As a result, numerous researchers whose work relies on the use of these models will find this an invaluable account to have on their desks. "The basic aim of the authors is to bring together and review a large part of recent advances in statistical modelling of m...
12th Workshop on Stochastic Models, Statistics and Their Applications
Rafajłowicz, Ewaryst; Szajowski, Krzysztof
2015-01-01
This volume presents the latest advances and trends in stochastic models and related statistical procedures. Selected peer-reviewed contributions focus on statistical inference, quality control, change-point analysis and detection, empirical processes, time series analysis, survival analysis and reliability, statistics for stochastic processes, big data in technology and the sciences, statistical genetics, experiment design, and stochastic models in engineering. Stochastic models and related statistical procedures play an important part in furthering our understanding of the challenging problems currently arising in areas of application such as the natural sciences, information technology, engineering, image analysis, genetics, energy and finance, to name but a few. This collection arises from the 12th Workshop on Stochastic Models, Statistics and Their Applications, Wroclaw, Poland.
Directory of Open Access Journals (Sweden)
Mark V Albert
2012-12-01
Full Text Available Due to multiple factors such as fatigue, muscle strengthening, and neural plasticity, the responsiveness of the motor apparatus to neural commands changes over time. To enable precise movements, the nervous system must adapt to compensate for these changes. Recent models of motor adaptation derive from assumptions about the way the motor apparatus changes. Characterizing these changes is difficult because motor adaptation happens at the same time, masking most of the effects of ongoing changes. Here, we analyze eye movements of monkeys with lesions to the posterior cerebellar vermis that impair adaptation. Their fluctuations better reveal the underlying changes of the motor system over time. When these measured, unadapted changes are used to derive optimal motor adaptation rules, the prediction precision improves significantly. Among three models that similarly fit single-day adaptation results, the model that also matches the temporal correlations of the nonadapting saccades most accurately predicts multiple-day adaptation. Saccadic gain adaptation is well matched to the natural statistics of fluctuations of the oculomotor plant.
Functional summary statistics for the Johnson-Mehl model
DEFF Research Database (Denmark)
Møller, Jesper; Ghorbani, Mohammad
of functional summary statistics. This paper therefore invents four functional summary statistics adapted to the Johnson-Mehl model, with two of them based on the second-order properties and the other two on the nuclei-boundary distances for the associated Johnson-Mehl tessellation. The functional summary...... statistics theoretical properties are investigated, non-parametric estimators are suggested, and their usefulness for model checking is examined in a simulation study. The functional summary statistics are also used for checking fitted parametric Johnson-Mehl models for a neurotransmitters dataset....
Statistical modeling and recognition of surgical workflow.
Padoy, Nicolas; Blum, Tobias; Ahmadi, Seyed-Ahmad; Feussner, Hubertus; Berger, Marie-Odile; Navab, Nassir
2012-04-01
In this paper, we contribute to the development of context-aware operating rooms by introducing a novel approach to modeling and monitoring the workflow of surgical interventions. We first propose a new representation of interventions in terms of multidimensional time-series formed by synchronized signals acquired over time. We then introduce methods based on Dynamic Time Warping and Hidden Markov Models to analyze and process this data. This results in workflow models combining low-level signals with high-level information such as predefined phases, which can be used to detect actions and trigger an event. Two methods are presented to train these models, using either fully or partially labeled training surgeries. Results are given based on tool usage recordings from sixteen laparoscopic cholecystectomies performed by several surgeons.
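The Dynamic Time Warping step mentioned above can be sketched as the classic O(nm) dynamic program; for simplicity this version aligns scalar signals, whereas the surgical application would compare multidimensional time series with a per-step vector distance:

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences.

    d[i][j] holds the cheapest alignment cost of a[:i] against b[:j];
    each cell extends a match, an insertion, or a deletion.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # deletion
                                 d[i][j - 1],      # insertion
                                 d[i - 1][j - 1])  # match
    return d[n][m]
```

Because DTW tolerates local time stretching, two recordings of the same surgical phase performed at different speeds can still align with low cost.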
On the Logical Development of Statistical Models.
1983-12-01
parameters t2. Type I models include scalar and vectorial probability distributions. Usually, the noise has an expected value equal to zero, so that... qualitative variables. As might be expected, the vectorial representation of all these types of models lagged behind the scalar forms. The first... (1978). "Modelos con parametros variables en el analisis de series temporales" [Models with variable parameters in the analysis of time series], Questiio, 4, 2, 75-87. [25] Seal, H. L. (1967). "The historical
Book review: Statistical Analysis and Modelling of Spatial Point Patterns
DEFF Research Database (Denmark)
Møller, Jesper
2009-01-01
Statistical Analysis and Modelling of Spatial Point Patterns by J. Illian, A. Penttinen, H. Stoyan and D. Stoyan. Wiley (2008), ISBN 9780470014912.
A hybrid random field model for scalable statistical learning.
Freno, A; Trentin, E; Gori, M
2009-01-01
This paper introduces hybrid random fields, which are a class of probabilistic graphical models aimed at allowing for efficient structure learning in high-dimensional domains. Hybrid random fields, along with the learning algorithm we develop for them, are especially useful as a pseudo-likelihood estimation technique (rather than a technique for estimating strict joint probability distributions). In order to assess the generality of the proposed model, we prove that the class of pseudo-likelihood distributions representable by hybrid random fields strictly includes the class of joint probability distributions representable by Bayesian networks. Once we establish this result, we develop a scalable algorithm for learning the structure of hybrid random fields, which we call 'Markov Blanket Merging'. On the one hand, we characterize some complexity properties of Markov Blanket Merging both from a theoretical and from the experimental point of view, using a series of synthetic benchmarks. On the other hand, we evaluate the accuracy of hybrid random fields (as learned via Markov Blanket Merging) by comparing them to various alternative statistical models in a number of pattern classification and link-prediction applications. As the results show, learning hybrid random fields by the Markov Blanket Merging algorithm not only reduces significantly the computational cost of structure learning with respect to several considered alternatives, but it also leads to models that are highly accurate as compared to the alternative ones.
Caries risk assessment models in caries prediction
Directory of Open Access Journals (Sweden)
Amila Zukanović
2013-11-01
Full Text Available Objective. The aim of this research was to assess the efficiency of different multifactor models in caries prediction. Material and methods. Data from the questionnaire and objective examination of 109 examinees were entered into the Cariogram, Previser and Caries-Risk Assessment Tool (CAT) multifactor risk assessment models. Caries risk was assessed with the help of all three models for each patient, classifying them as low-, medium- or high-risk patients. The development of new caries lesions over a period of three years [Decay Missing Filled Tooth (DMFT) increment = difference between Decay Missing Filled Tooth Surface (DMFTS) index at baseline and follow-up] allowed the predictive capacity of the different multifactor models to be examined. Results. The data gathered showed that the different multifactor risk assessment models give significantly different results (Friedman test: Chi square = 100.073, p<0.001). Cariogram is the model which identified the majority of examinees as medium-risk patients (70%). The other two models were more radical in risk assessment, giving more unfavorable risk profiles for patients. In only 12% of the patients did the three multifactor models assess the risk in the same way. Previser and CAT gave the same results in 63% of cases; the Wilcoxon test showed that there is no statistically significant difference in caries risk assessment between these two models (Z = -1.805, p=0.071). Conclusions. Evaluation of three different multifactor caries risk assessment models (Cariogram, PreViser and CAT) showed that only the Cariogram can successfully predict new caries development in 12-year-old Bosnian children.
Improving statistical forecasts of seasonal streamflows using hydrological model output
Directory of Open Access Journals (Sweden)
D. E. Robertson
2013-02-01
Full Text Available Statistical methods traditionally applied for seasonal streamflow forecasting use predictors that represent the initial catchment condition and future climate influences on future streamflows. Observations of antecedent streamflows or rainfall commonly used to represent the initial catchment conditions are surrogates for the true source of predictability and can potentially have limitations. This study investigates a hybrid seasonal forecasting system that uses the simulations from a dynamic hydrological model as a predictor to represent the initial catchment condition in a statistical seasonal forecasting method. We compare the skill and reliability of forecasts made using the hybrid forecasting approach to those made using the existing operational practice of the Australian Bureau of Meteorology for 21 catchments in eastern Australia. We investigate the reasons for differences. In general, the hybrid forecasting system produces forecasts that are more skilful than the existing operational practice and as reliable. The greatest increases in forecast skill tend to be (1) when the catchment is wetting up but antecedent streamflows have not responded to antecedent rainfall, (2) when the catchment is drying and the dominant source of antecedent streamflow is in transition between surface runoff and base flow, and (3) when the initial catchment condition is near saturation intermittently throughout the historical record.
Improving statistical forecasts of seasonal streamflows using hydrological model output
Robertson, D. E.; Pokhrel, P.; Wang, Q. J.
2013-02-01
Statistical methods traditionally applied for seasonal streamflow forecasting use predictors that represent the initial catchment condition and future climate influences on future streamflows. Observations of antecedent streamflows or rainfall commonly used to represent the initial catchment conditions are surrogates for the true source of predictability and can potentially have limitations. This study investigates a hybrid seasonal forecasting system that uses the simulations from a dynamic hydrological model as a predictor to represent the initial catchment condition in a statistical seasonal forecasting method. We compare the skill and reliability of forecasts made using the hybrid forecasting approach to those made using the existing operational practice of the Australian Bureau of Meteorology for 21 catchments in eastern Australia. We investigate the reasons for differences. In general, the hybrid forecasting system produces forecasts that are more skilful than the existing operational practice and as reliable. The greatest increases in forecast skill tend to be (1) when the catchment is wetting up but antecedent streamflows have not responded to antecedent rainfall, (2) when the catchment is drying and the dominant source of antecedent streamflow is in transition between surface runoff and base flow, and (3) when the initial catchment condition is near saturation intermittently throughout the historical record.
Fluctuations of offshore wind generation: Statistical modelling
DEFF Research Database (Denmark)
Pinson, Pierre; Christensen, Lasse E.A.; Madsen, Henrik
2007-01-01
The magnitude of power fluctuations at large offshore wind farms has a significant impact on the control and management strategies of their power output. If focusing on the minute scale, one observes successive periods with smaller and larger power fluctuations. It seems that different regimes...... production averaged at a 1, 5, and 10-minute rate. The exercise consists in one-step ahead forecasting of these time-series with the various regime-switching models. It is shown that the MSAR model, for which the succession of regimes is represented by a hidden Markov chain, significantly outperforms...
Statistical modelling of traffic safety development
DEFF Research Database (Denmark)
Christens, Peter
2004-01-01
: - Statistisk modellering af trafikuheld [Statistical modelling of traffic accidents], Trafikdage på Aalborg Universitet, 2001. - Sociale karakteristika hos trafikofre [Social characteristics of traffic victims], Danish Transport Research Institute, 2001. - Models for traffic accidents, FERSI Young Researchers' Seminar, 2001. - Evaluation of the Danish Automatic Mobile Speed Camera Project...... In 2000, traffic accidents killed over 40,000 people in the EU and injured over 1.7 million. In Denmark in 2001 there were 6,861 police-reported traffic accidents with injuries, resulting in 4,519 slightly injured, 3,946 seriously injured, and 431 killed. The overall aim of this research is to improve...
Exponential order statistic models of software reliability growth
Miller, D. R.
1986-01-01
Failure times of a software reliability growth process are modeled as order statistics of independent, nonidentically distributed exponential random variables. The Jelinsky-Moranda, Goel-Okumoto, Littlewood, Musa-Okumoto Logarithmic, and Power Law models are all special cases of Exponential Order Statistic Models, but there are many additional examples also. Various characterizations, properties and examples of this class of models are developed and presented.
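The order-statistic construction described above is straightforward to simulate: draw one exponential per latent fault and sort. A sketch (in the Jelinski-Moranda special case all rates are equal; unequal rates give other members of the family):

```python
import random

def eos_failure_times(rates, seed=0):
    """Failure times as order statistics of independent exponentials.

    rates : per-fault hazard rates; equal rates recover the
            Jelinski-Moranda model as a special case.
    """
    rng = random.Random(seed)
    return sorted(rng.expovariate(r) for r in rates)

# Jelinski-Moranda: 10 faults, common rate 0.5 per unit time
times = eos_failure_times([0.5] * 10)
```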
Statistical Modeling of Large-Scale Scientific Simulation Data
Energy Technology Data Exchange (ETDEWEB)
Eliassi-Rad, T; Baldwin, C; Abdulla, G; Critchlow, T
2003-11-15
With the advent of massively parallel computer systems, scientists are now able to simulate complex phenomena (e.g., explosions of stars). Such scientific simulations typically generate large-scale data sets over the spatio-temporal space. Unfortunately, the sheer sizes of the generated data sets make efficient exploration of them impossible. Constructing queriable statistical models is an essential step in helping scientists glean new insight from their computer simulations. We define queriable statistical models to be descriptive statistics that (1) summarize and describe the data within a user-defined modeling error, and (2) are able to answer complex range-based queries over the spatiotemporal dimensions. In this chapter, we describe systems that build queriable statistical models for large-scale scientific simulation data sets. In particular, we present our Ad-hoc Queries for Simulation (AQSim) infrastructure, which reduces the data storage requirements and query access times by (1) creating and storing queriable statistical models of the data at multiple resolutions, and (2) evaluating queries on these models of the data instead of the entire data set. Within AQSim, we focus on three simple but effective statistical modeling techniques. AQSim's first modeling technique (called univariate mean modeler) computes the "true" (unbiased) mean of systematic partitions of the data. AQSim's second statistical modeling technique (called univariate goodness-of-fit modeler) uses the Anderson-Darling goodness-of-fit method on systematic partitions of the data. Finally, AQSim's third statistical modeling technique (called multivariate clusterer) utilizes the cosine similarity measure to cluster the data into similar groups. Our experimental evaluations on several scientific simulation data sets illustrate the value of using these statistical models on large-scale simulation data sets.
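The "univariate mean modeler" idea — answer range queries from stored partition summaries instead of the raw data — can be sketched as follows (the partition layout and the two function names are invented for illustration, not AQSim's actual API):

```python
def build_mean_model(values, n_parts):
    """Store (start index, length, mean) for each systematic partition."""
    size = len(values) // n_parts
    model = []
    for p in range(n_parts):
        chunk = values[p * size:(p + 1) * size] if p < n_parts - 1 else values[p * size:]
        model.append((p * size, len(chunk), sum(chunk) / len(chunk)))
    return model

def query_mean(model, lo, hi):
    """Approximate mean over index range [lo, hi) using only the model."""
    tot = cnt = 0.0
    for start, length, mean in model:
        overlap = max(0, min(hi, start + length) - max(lo, start))
        tot += overlap * mean
        cnt += overlap
    return tot / cnt

model = build_mean_model(list(range(8)), 2)  # partitions [0..3] and [4..7]
```

The model is exact for queries aligned to partition boundaries and approximate otherwise, with error bounded by the within-partition variability — the trade-off the chapter's user-defined modeling error captures.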
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Advanced data analysis in neuroscience integrating statistical and computational models
Durstewitz, Daniel
2017-01-01
This book is intended for use in advanced graduate courses in statistics / machine learning, as well as for all experimental neuroscientists seeking to understand statistical methods at a deeper level, and theoretical neuroscientists with a limited background in statistics. It reviews almost all areas of applied statistics, from basic statistical estimation and test theory, linear and nonlinear approaches for regression and classification, to model selection and methods for dimensionality reduction, density estimation and unsupervised clustering. Its focus, however, is linear and nonlinear time series analysis from a dynamical systems perspective, based on which it aims to convey an understanding also of the dynamical mechanisms that could have generated observed time series. Further, it integrates computational modeling of behavioral and neural dynamics with statistical estimation and hypothesis testing. This way computational models in neuroscience are not only explanatory frameworks, but become powerful...
Eigenfunction statistics in the localized Anderson model
Killip, R
2006-01-01
We consider the localized region of the Anderson model and study the distribution of eigenfunctions simultaneously in space and energy. In a natural scaling limit, we prove convergence to a Poisson process. This provides a counterpoint to recent work, which proves repulsion of the localization centres in a subtly different regime.
Structured Statistical Models of Inductive Reasoning
Kemp, Charles; Tenenbaum, Joshua B.
2009-01-01
Everyday inductive inferences are often guided by rich background knowledge. Formal models of induction should aim to incorporate this knowledge and should explain how different kinds of knowledge lead to the distinctive patterns of reasoning found in different inductive contexts. This article presents a Bayesian framework that attempts to meet…
Better predictions when models are wrong or underspecified
Ommen, Matthijs van
2015-01-01
Many statistical methods rely on models of reality in order to learn from data and to make predictions about future data. By necessity, these models usually do not match reality exactly, but are either wrong (none of the hypotheses in the model provides an accurate description of reality) or underspecified.
Quantitative plant ecology:statistical and ecological modelling of plant abundance
Damgaard, Christian
2014-01-01
This e-book is written in the Wolfram CDF format (download the free CDF player from Wolfram.com). The objective of this e-book is to introduce the population ecological concepts for measuring and predicting the ecological success of plant species. This will be done by focusing on the measurement and statistical modelling of plant species abundance and the relevant ecological processes that control species abundance. The focus on statistical modelling and likelihood function based methods also means...
Network Data: Statistical Theory and New Models
2016-02-17
...Using AERONET DRAGON Campaign Data, IEEE Transactions on Geoscience and Remote Sensing (Aug. 2015), doi: 10.1109/TGRS.2015.2395722. Geoffrey... are not viable, i.e., the fruit fly dies after the knock-out of the gene. Further examination of the ftz-stained embryos indicates that the lack of... our approach for spatial gene expression analysis for early-stage fruit fly embryos; we are in the process of extending it to model later-stage gene
Combining logistic regression and neural networks to create predictive models.
Spackman, K. A.
1992-01-01
Neural networks are being used widely in medicine and other areas to create predictive models from data. The statistical method that most closely parallels neural networks is logistic regression. This paper outlines some ways in which neural networks and logistic regression are similar, shows how a small modification of logistic regression can be used in the training of neural network models, and illustrates the use of this modification for variable selection and predictive model building wit...
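The parallel the paper draws can be made concrete: a logistic regression is exactly a one-unit "neural network" with a sigmoid activation trained on the cross-entropy loss. A minimal stochastic-gradient sketch with toy data (this illustrates the equivalence only, not the paper's variable-selection modification):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Logistic regression as a single sigmoid unit trained by SGD."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the cross-entropy loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_prob(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy 1-D data: outcome switches from 0 to 1 between x=1 and x=2
w, b = fit_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
```

Stacking several such units with a hidden layer turns the same training loop into a multilayer network, which is the sense in which the two methods are close relatives.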
Safaie, Ammar; Wendzel, Aaron; Ge, Zhongfu; Nevers, Meredith; Whitman, Richard L.; Corsi, Steven R.; Phanikumar, Mantha S.
2016-01-01
Statistical and mechanistic models are popular tools for predicting the levels of indicator bacteria at recreational beaches. Researchers tend to use one class of model or the other, and it is difficult to generalize statements about their relative performance due to differences in how the models are developed, tested, and used. We describe a cooperative modeling approach for freshwater beaches impacted by point sources in which insights derived from mechanistic modeling were used to further improve the statistical models and vice versa. The statistical models provided a basis for assessing the mechanistic models which were further improved using probability distributions to generate high-resolution time series data at the source, long-term “tracer” transport modeling based on observed electrical conductivity, better assimilation of meteorological data, and the use of unstructured-grids to better resolve nearshore features. This approach resulted in improved models of comparable performance for both classes including a parsimonious statistical model suitable for real-time predictions based on an easily measurable environmental variable (turbidity). The modeling approach outlined here can be used at other sites impacted by point sources and has the potential to improve water quality predictions resulting in more accurate estimates of beach closures.
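In its simplest form, a parsimonious statistical model of the kind described — an indicator-bacteria level regressed on one easily measured predictor such as turbidity — reduces to ordinary least squares (toy numbers below; the operational model is more elaborate, e.g., with log-transformed concentrations and additional covariates):

```python
def fit_line(x, y):
    """Ordinary least squares for a single predictor: y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx           # slope
    a = my - b * mx         # intercept
    return a, b

# Toy calibration: bacteria level exactly twice the turbidity reading
a, b = fit_line([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```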
The statistical multifragmentation model: Origins and recent advances
Donangelo, R.; Souza, S. R.
2016-07-01
We review the Statistical Multifragmentation Model (SMM), which considers a generalization of the liquid-drop model for hot nuclei and allows one to calculate thermodynamic quantities characterizing the nuclear ensemble at the disassembly stage. We show how to determine probabilities of definite partitions of finite nuclei and how to determine, through Monte Carlo calculations, observables such as the caloric curve, multiplicity distributions, and heat capacity, among others. Some experimental measurements of the caloric curve confirmed SMM predictions made over 10 years earlier, leading to a surge in interest in the model. However, the experimental determination of the fragmentation temperatures relies on the yields of different isotopic species, which were not correctly calculated in the schematic liquid-drop picture employed in the SMM. This led to a series of improvements in the SMM, in particular to a more careful choice of nuclear masses and energy densities, especially for the lighter nuclei. With these improvements the SMM is able to make quantitative determinations of isotope production. We show the application of the SMM to the production of exotic nuclei through multifragmentation. These preliminary calculations demonstrate the need for a careful choice of the system size and excitation energy to attain maximum yields.
The statistical multifragmentation model: Origins and recent advances
Energy Technology Data Exchange (ETDEWEB)
Donangelo, R., E-mail: donangel@fing.edu.uy [Instituto de Física, Facultad de Ingeniería, Universidad de la República, Julio Herrera y Reissig 565, 11300, Montevideo (Uruguay); Instituto de Física, Universidade Federal do Rio de Janeiro, C.P. 68528, 21941-972 Rio de Janeiro - RJ (Brazil); Souza, S. R., E-mail: srsouza@if.ufrj.br [Instituto de Física, Universidade Federal do Rio de Janeiro, C.P. 68528, 21941-972 Rio de Janeiro - RJ (Brazil); Instituto de Física, Universidade Federal do Rio Grande do Sul, C.P. 15051, 91501-970 Porto Alegre - RS (Brazil)
2016-07-07
Behavioral and Statistical Models of Educational Inequality
DEFF Research Database (Denmark)
Holm, Anders; Breen, Richard
2016-01-01
This article addresses the question of how students and their families make educational decisions. We describe three types of behavioral model that might underlie decision-making, and we show that they have consequences for what decisions are made. Our study thus has policy implications if we wish to encourage students and their families to make better educational choices. We also establish the conditions under which empirical analysis can distinguish between the three sorts of decision-making, and we illustrate our arguments using data from the National Educational Longitudinal Study.
Hall, T; Hall, Tim; Jewson, Stephen
2005-01-01
We describe results from the second stage of a project to build a statistical model for hurricane tracks. In the first stage we modelled the unconditional mean track. We now attempt to model the unconditional variance of fluctuations around the mean. The variance models we describe use a semi-parametric nearest neighbours approach in which the optimal averaging length-scale is estimated using a jack-knife out-of-sample fitting procedure. We test three different models. These models consider the variance structure of the deviations from the unconditional mean track to be isotropic, anisotropic but uncorrelated, and anisotropic and correlated, respectively. The results show that, of these models, the anisotropic correlated model gives the best predictions of the distribution of future positions of hurricanes.
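As an illustration of the semi-parametric nearest-neighbours idea, the sketch below estimates a position-dependent variance by averaging squared deviations over the k nearest neighbours of a point. The synthetic data, the fixed k, and the one-dimensional coordinate are all illustrative assumptions; the paper itself selects the averaging length-scale by a jack-knife out-of-sample procedure.

```python
import random

def knn_variance(x, dev, x0, k):
    """Estimate Var(dev | x near x0) by averaging squared deviations
    over the k nearest neighbours of x0 (a crude local estimator)."""
    nearest = sorted(range(len(x)), key=lambda i: abs(x[i] - x0))[:k]
    return sum(dev[i] ** 2 for i in nearest) / k

random.seed(0)
# Synthetic mean-zero track deviations whose true standard deviation
# grows with the along-track coordinate x: sd(x) = 0.5 + x.
xs = [random.uniform(0.0, 1.0) for _ in range(5000)]
devs = [random.gauss(0.0, 0.5 + x) for x in xs]

v_lo = knn_variance(xs, devs, 0.1, 300)  # true variance ~ 0.36
v_hi = knn_variance(xs, devs, 0.9, 300)  # true variance ~ 1.96
```

Once the variance is allowed to vary along the mean track like this, anisotropy and correlation can be layered on top, as in the three variance structures compared in the abstract.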
Apel, Heiko; Gafurov, Abror; Gerlitz, Lars; Unger-Shayesteh, Katy; Vorogushyn, Sergiy; Merkushkin, Aleksandr; Merz, Bruno
2016-04-01
The semi-arid regions of Central Asia crucially depend on the water resources supplied by the mountainous areas of the Tien Shan and Pamirs. During the summer months the snow and glacier melt water of the rivers originating in the mountains provides the only water resource available for agricultural production, as well as for storage in reservoirs for energy production in the winter months. A reliable seasonal forecast of the water resources is therefore crucial for sustainable management and planning of water resources. In fact, seasonal forecasts are mandatory tasks of the national hydro-meteorological services in the region. This study therefore aims at a statistical forecast of seasonal water availability, with a focus on freely available data in order to facilitate operational use without data-access limitations. The study takes the Naryn basin as a test case; at its outlet the Toktogul reservoir stores the discharge of the Naryn River. As most of the water originates from snow and glacier melt, a statistical forecast model should use data sets that can serve as proxies for the snow masses and snow water equivalent in late spring, which essentially determine the bulk of the seasonal discharge. CRU climate data describing the precipitation and temperature in the basin during winter and spring were used as base information, complemented by MODIS snow cover data processed with the ModSnow tool, discharge during the spring, and GRACE gravimetry anomalies. For the construction of linear forecast models, monthly as well as multi-monthly means over the period January to April were used to predict the seasonal mean discharge of May-September at the station Uchterek. An automatic model selection was performed in multiple steps, with the best models selected according to several performance measures and their robustness in a leave-one-out cross validation. It could be shown that the seasonal discharge can be predicted with
Statistical modelling in biostatistics and bioinformatics selected papers
Peng, Defen
2014-01-01
This book presents selected papers on statistical model development related mainly to the fields of Biostatistics and Bioinformatics. The coverage of the material falls squarely into the following categories: (a) Survival analysis and multivariate survival analysis, (b) Time series and longitudinal data analysis, (c) Statistical model development and (d) Applied statistical modelling. Innovations in statistical modelling are presented throughout each of the four areas, with some intriguing new ideas on hierarchical generalized non-linear models and on frailty models with structural dispersion, just to mention two examples. The contributors include distinguished international statisticians such as Philip Hougaard, John Hinde, Il Do Ha, Roger Payne and Alessandra Durio, among others, as well as promising newcomers. Some of the contributions have come from researchers working in the BIO-SI research programme on Biostatistics and Bioinformatics, centred on the Universities of Limerick and Galway in Ireland and fu...
Isoscaling in Statistical Sequential Decay Model
Institute of Scientific and Technical Information of China (English)
TIAN Wen-Dong; SU Qian-Min; WANG Hong-Wei; WANG Kun; YAN Ting-ZHi; MA Yu-Gang; CAI Xiang-Zhou; FANG De-Qing; GUO Wei; MA Chun-Wang; LIU Gui-Hua; SHEN Wen-Qing; SHI Yu
2007-01-01
A sequential decay model is used to study isoscaling, i.e., the factorization of the isotope ratios from sources of different isospins and sizes, over a broad range of excitation energies, into fugacity terms of proton and neutron number, R21(N, Z) = Y2(N, Z)/Y1(N, Z) = C exp(αN + βZ). It is found that the isoscaling parameters α and β depend strongly on the isospin difference of the equilibrated source and on the excitation energy, while no significant influence of the source size on α and β has been observed. It is found that α and β decrease with the excitation energy and are linear functions of 1/T and of Δ(Z/A)² or Δ(N/A)² of the sources. The symmetry energy coefficient Csym is constrained from the relationship between α and the source Δ(Z/A)², and between β and the source Δ(N/A)².
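The isoscaling relation R21(N, Z) = C exp(αN + βZ) becomes linear after taking logarithms, so α and β can be recovered by ordinary least squares on ln R21. The sketch below does this on synthetic yield ratios; the values of C, α, β and the noise level are illustrative assumptions, not results from the paper.

```python
import math
import random

def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * t for a, t in zip(M[r], M[c])]
    return [M[i][3] / M[i][i] for i in range(3)]

def fit_isoscaling(ratios):
    """Least-squares fit of ln R21(N, Z) = ln C + alpha*N + beta*Z."""
    XtX = [[0.0] * 3 for _ in range(3)]
    Xty = [0.0] * 3
    for (N, Z), R in ratios.items():
        row = [1.0, float(N), float(Z)]
        y = math.log(R)
        for i in range(3):
            Xty[i] += row[i] * y
            for j in range(3):
                XtX[i][j] += row[i] * row[j]
    lnC, alpha, beta = solve3(XtX, Xty)
    return math.exp(lnC), alpha, beta

# Synthetic ratios obeying R21 = C exp(alpha*N + beta*Z), with noise.
random.seed(1)
C0, a0, b0 = 1.2, 0.35, -0.40
ratios = {(N, Z): C0 * math.exp(a0 * N + b0 * Z) * math.exp(random.gauss(0, 0.02))
          for N in range(1, 8) for Z in range(1, 8)}
C, alpha, beta = fit_isoscaling(ratios)
```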
Process Model Construction and Optimization Using Statistical Experimental Design,
1988-04-01
Memo No. 88-442, March 1988. Process Model Construction and Optimization Using Statistical Experimental Design, by Emmanuel Sachs (Assistant Professor) and George Prueger. Abstract: A methodology is presented for the construction of process models by the combination of physically based mechanistic...
Nacher, Jose C; Ochiai, Tomoshiro
2012-05-01
Increasingly accessible financial data allow researchers to infer market-dynamics-based laws and to propose models that are able to reproduce them. In recent years, several stylized facts have been uncovered. Here we perform an extensive analysis of foreign exchange data that leads to the unveiling of a statistical financial law. First, our findings show that, on average, volatility increases more when the price exceeds the highest (or lowest) value, i.e., breaks the resistance line. We call this the breaking-acceleration effect. Second, our results show that the probability P(T) to break the resistance line set over the past time T follows a power law in both real data and theoretically simulated data. However, the probability calculated using real data is considerably lower than that obtained using a traditional Black-Scholes (BS) model. Taken together, the present analysis characterizes a different stylized fact of financial markets and shows that the market exceeds a past (historical) extreme price fewer times than expected by the BS model (the resistance effect). However, when the market does, we predict that the average volatility at that time point will be much higher. These findings indicate that no Markovian model faithfully captures the market dynamics.
Daisy Models Semi-Poisson statistics and beyond
Hernández-Saldaña, H; Seligman, T H
1999-01-01
Semi-Poisson statistics are shown to be obtained by removing every other number from a random sequence. Retaining every (r+1)-th level, we obtain a family of sequences which we call daisy models. Their statistical properties coincide with those of Bogomolny's nearest-neighbour interaction Coulomb gas if the inverse temperature coincides with the integer r. In particular, the case r=2 closely reproduces the statistics of quasi-optimal solutions of the traveling salesman problem.
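The construction above, retaining every (r+1)-th level of a Poisson sequence, can be sketched directly. For r = 1 the resulting semi-Poisson spacing distribution P(s) = 4s exp(-2s) has unit mean and variance 1/2, which the sample statistics reproduce; the sequence length and seed below are arbitrary choices.

```python
import random

def daisy_spacings(r, n=200000, seed=3):
    """Sort a Poisson-like (uniform) spectrum with unit mean spacing,
    retain every (r+1)-th level, and return the re-unfolded spacings."""
    rng = random.Random(seed)
    levels = sorted(rng.uniform(0, n) for _ in range(n))
    kept = levels[::r + 1]
    return [(b - a) / (r + 1) for a, b in zip(kept, kept[1:])]

s = daisy_spacings(1)  # r = 1: semi-Poisson, P(s) = 4 s exp(-2 s)
mean = sum(s) / len(s)
var = sum((x - mean) ** 2 for x in s) / len(s)
```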
Development of statistical models for data analysis
Energy Technology Data Exchange (ETDEWEB)
Downham, D.Y.
2000-07-01
Incidents that cause, or could cause, injury to personnel, and that satisfy specific criteria, are reported to the Offshore Safety Division (OSD) of the Health and Safety Executive (HSE). The underlying purpose of this report is to improve ways of quantifying risk, a recommendation in Lord Cullen's report into the Piper Alpha disaster. Records of injuries and hydrocarbon releases from 1 January 1991 to 31 March 1996 are analysed, because the reporting of incidents was standardised after 1990. Models are identified for risk assessment and some are applied. The appropriate analyses of one or two factors (or variables) are tests of uniformity or of independence. Radar graphs are used to represent some temporal variables. Cusums are applied for the analysis of incident frequencies over time, and could be applied for regular monitoring. Log-linear models for Poisson-distributed data are identified as being suitable for identifying 'non-random' combinations of more than two factors. Some questions cannot be addressed with the available data: for example, more data are needed to assess the risk of injury per employee in a time interval. If the questions are considered sufficiently important, resources could be assigned to obtain the data. Some of the main results from the analyses are as follows: the cusum analyses identified a change-point at the end of July 1993, when the reported number of injuries fell by 40%. Injuries were more likely to occur between 8am and 12am or between 2pm and 5pm than at other times: between 2pm and 3pm the number of injuries was almost twice the average and more than threefold the smallest. No seasonal effects in the numbers of injuries were identified. Three-day injuries occurred more frequently on the 5th, 6th and 7th days into a tour of duty than on other days, and less frequently on the 13th and 14th days of a tour of duty. An injury classified as 'lifting or craning' was
Kuić, Domagoj
2016-07-01
In previous papers (Kuić et al. in Found Phys 42:319-339, 2012; Kuić in arXiv:1506.02622, 2015), it was demonstrated that applying the principle of maximum information entropy by maximizing the conditional information entropy, subject to the constraint given by the Liouville equation averaged over the phase space, leads to a definition of the rate of entropy change for closed Hamiltonian systems without any additional assumptions. Here, we generalize this basic model and, with the introduction of additional constraints which are equivalent to the hydrodynamic continuity equations, show that the results obtained are consistent with known results from nonequilibrium statistical mechanics and the thermodynamics of irreversible processes. In this way, as part of the approach developed in this paper, the rate of entropy change and the entropy production density for the classical Hamiltonian fluid are obtained. The results suggest the general applicability of the foundational principles of predictive statistical mechanics and their importance for the theory of irreversibility.
Multiple-point statistical prediction on fracture networks at Yucca Mountain
Energy Technology Data Exchange (ETDEWEB)
Liu, X.Y; Zhang, C.Y.; Liu, Q.S.; Birkholzer, J.T.
2009-05-01
In many underground nuclear waste repository systems, such as at Yucca Mountain, the water flow rate and the amount of water seepage into the waste emplacement drifts are mainly determined by the hydrological properties of the fracture network in the surrounding rock mass. A natural fracture network is not easy to describe, especially with respect to its connectivity, which is critically important for simulating the water flow field. In this paper, we introduce a new method for fracture network description and prediction, termed multiple-point statistics (MPS). The MPS method records multiple-point statistics concerning the connectivity patterns of a fracture network from a known fracture map, and reproduces multiple-scale training fracture patterns in a stochastic manner, implicitly and directly. It is applied to fracture data to study flow field behavior at the Yucca Mountain waste repository system. First, the MPS method is used to create a fracture network with an original fracture training image from the Yucca Mountain dataset. After adopting harmonic and arithmetic averaging to upscale the permeability to a coarse grid, a coupled thermal-hydrological-mechanical (THM) simulation is carried out to study near-field water flow around the waste emplacement drifts. Our study shows that the connectivity or patterns of fracture networks can be grasped and reconstructed by MPS methods. In theory, this will lead to better prediction of fracture system characteristics and flow behavior. Meanwhile, we can obtain the variance of the flow field, which gives us a way to quantify model uncertainty even in complicated coupled THM simulations. This indicates that MPS can potentially characterize and reconstruct natural fracture networks in a fractured rock mass, with the advantage of quantifying the connectivity of the fracture system and its simulation uncertainty simultaneously.
Statistical properties of several models of fractional random point processes
Bendjaballah, C.
2011-08-01
Statistical properties of several models of fractional random point processes have been analyzed from the counting and time interval statistics points of view. Based on the criterion of the reduced variance, it is seen that such processes exhibit nonclassical properties. The conditions for these processes to be treated as conditional Poisson processes are examined. Numerical simulations illustrate part of the theoretical calculations.
The Importance of Statistical Modeling in Data Analysis and Inference
Rollins, Derrick, Sr.
2017-01-01
Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…
Selection of statistical distributions for prediction of steam generator tube degradation
Energy Technology Data Exchange (ETDEWEB)
Stavropoulos, K.D.; Gorman, J.A. [Dominion Engr., Inc., McLean, VA (United States); Staehle, R.W. [Univ. of Minnesota, Minneapolis, MN (United States); Welty, C.S. Jr. [Electric Power Research Institute, Palo Alto, CA (United States)
1992-12-31
This paper presents the first part of a project directed at developing methods for characterizing and predicting the progression of degradation of PWR steam generator tubes. This first part covers the evaluation of statistical distributions for use in such analyses. The data used in the evaluation included data for primary water stress corrosion cracking (PWSCC) at roll transitions and U-bends, and for intergranular attack/stress corrosion cracking (IGA/SCC) at tube sheet and tube support plate crevices. Laboratory data for PWSCC of reverse U-bends were also used. The review of statistical distributions indicated that the Weibull distribution provides an easy-to-use and effective method. Another statistical function, the log-normal, was found to provide essentially equivalent results. Two-parameter fits, without an initiation time, were found to provide the most reliable predictions.
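A two-parameter Weibull fit without an initiation time can be sketched with median-rank regression: ln(-ln(1-F)) is linear in ln t with slope equal to the shape parameter. The degradation times below are synthetic, drawn from a known Weibull law, not steam generator data.

```python
import math
import random

def weibull_fit(times):
    """Two-parameter Weibull fit (no initiation time) by median-rank
    regression: ln(-ln(1 - F_i)) is linear in ln(t_i) with slope beta."""
    t = sorted(times)
    n = len(t)
    xs = [math.log(ti) for ti in t]
    # Bernard's approximation to the median rank of the i-th failure.
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4)))
          for i in range(1, n + 1)]
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    eta = math.exp(mx - my / beta)  # scale (characteristic life)
    return beta, eta

# Synthetic degradation times from a known Weibull(beta=3, eta=10) law.
rng = random.Random(5)
data = [10.0 * (-math.log(1.0 - rng.random())) ** (1.0 / 3.0)
        for _ in range(2000)]
beta, eta = weibull_fit(data)
```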
Mathematical-statistical models of generated hazardous hospital solid waste.
Awad, A R; Obeidat, M; Al-Shareef, M
2004-01-01
This research work was carried out under the assumption that wastes generated from hospitals in Irbid, Jordan, are hazardous. The hazardous and non-hazardous wastes generated from the different divisions in the three hospitals under consideration were not separated during the collection process. Three hospitals in Irbid were selected for this study: Princess Basma hospital (public), Princess Bade'ah hospital (teaching), and Ibn Al-Nafis hospital (private). The research work took into account the amounts of solid waste accumulated from each division and also determined the total amount generated from each hospital. Generation rates (kilogram per patient per day; kilogram per bed per day) were determined for the three hospitals and compared with those of similar hospitals in Europe. The evaluation suggested that the current management of these wastes in the three studied hospitals needs revision, as these hospitals do not follow the waste disposal methods practiced in developed countries that would reduce risk to human health and the environment. Statistical analysis was carried out to develop models for the prediction of the quantity of waste generated at each hospital (public, teaching, private). In these models, the number of patients, the number of beds, and the type of hospital were revealed to be significant factors in the quantity of waste generated. Multiple regression was also used to estimate the quantities of wastes generated from similar divisions in the three hospitals (surgery, internal diseases, and maternity).
Flashover of a vacuum-insulator interface: A statistical model
Directory of Open Access Journals (Sweden)
W. A. Stygar
2004-07-01
We have developed a statistical model for the flashover of a 45° vacuum-insulator interface (such as would be found in an accelerator) subject to a pulsed electric field. The model assumes that the initiation of a flashover plasma is a stochastic process, that the characteristic statistical component of the flashover delay time is much greater than the plasma formative time, and that the average rate at which flashovers occur is a power-law function of the instantaneous value of the electric field. Under these conditions, we find that the flashover probability is given by 1 − exp(−E_p^β t_eff C / k^β), where E_p is the peak value in time of the spatially averaged electric field E(t), t_eff ≡ ∫[E(t)/E_p]^β dt is the effective pulse width, C is the insulator circumference, k ∝ exp(λ/d), and β and λ are constants. We define E(t) as V(t)/d, where V(t) is the voltage across the insulator and d is the insulator thickness. Since the model assumes that flashovers occur at random azimuthal locations along the insulator, it does not apply to systems that have a significant defect, i.e., a location contaminated with debris or compromised by an imperfection at which flashovers repeatedly take place, and which prevents a random spatial distribution. The model is consistent with flashover measurements to within 7% for pulse widths between 0.5 ns and 10 μs, and to within a factor of 2 between 0.5 ns and 90 s (a span of over 11 orders of magnitude). For these measurements, E_p ranges from 64 to 651 kV/cm, d from 0.50 to 4.32 cm, and C from 4.96 to 95.74 cm. The model is significantly more accurate, and is valid over a wider range of parameters, than the J. C. Martin flashover relation that has been in use since 1971 [J. C. Martin on Pulsed Power, edited by T. H. Martin, A. H. Guenther, and M. Kristiansen (Plenum, New York, 1996)]. We have generalized the statistical model to estimate the total-flashover probability of an
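The flashover probability 1 − exp(−E_p^β t_eff C/k^β) is straightforward to evaluate numerically once a pulse shape is specified. In the sketch below, the half-sine pulse and the constants C, k and β are illustrative assumptions, not the fitted values from the paper.

```python
import math

def flashover_probability(E_p, field, dt, C, k, beta):
    """P = 1 - exp(-E_p**beta * t_eff * C / k**beta), with the effective
    pulse width t_eff = integral of (E(t)/E_p)**beta dt done numerically."""
    t_eff = sum((E / E_p) ** beta for E in field) * dt
    return 1.0 - math.exp(-(E_p ** beta) * t_eff * C / (k ** beta))

# Half-sine pulse sampled every 0.5 ns; field amplitudes in kV/cm.
# C (circumference, cm), k, and beta below are illustrative only.
dt = 0.5
shape = [math.sin(math.pi * i / 200) for i in range(201)]
p_lo = flashover_probability(300.0, [300.0 * s for s in shape], dt,
                             C=30.0, k=580.0, beta=10.0)
p_hi = flashover_probability(350.0, [350.0 * s for s in shape], dt,
                             C=30.0, k=580.0, beta=10.0)
```

As expected from the large exponent β, the probability rises steeply with the peak field E_p.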
Directory of Open Access Journals (Sweden)
Kim Hyun-Sil
2014-12-01
Insertion loss prediction of large acoustical enclosures using the Statistical Energy Analysis (SEA) method is presented. The SEA model consists of three elements: the sound field inside the enclosure, the vibration energy of the enclosure panel, and the sound field outside the enclosure. It is assumed that the space surrounding the enclosure is sufficiently large that there is no energy flow from the outside to the wall panel or to the air cavity inside the enclosure. The comparison of the predicted insertion loss with measured data for typical large acoustical enclosures shows good agreement. It is found that if the critical frequency of the wall panel falls above the frequency region of interest, the insertion loss is dominated by the sound transmission loss of the wall panel and the averaged sound absorption coefficient inside the enclosure. However, if the critical frequency of the wall panel falls into the frequency region of interest, the acoustic power radiated by the wall panel must be added to the acoustic power transmitted through the panel.
Improving statistical reasoning theoretical models and practical implications
Sedlmeier, Peter
1999-01-01
This book focuses on how statistical reasoning works and on training programs that can exploit people's natural cognitive capabilities to improve their statistical reasoning. Training programs that take into account findings from evolutionary psychology and instructional theory are shown to have substantially larger effects, more stable over time, than previous training regimens. The theoretical implications are traced in a neural network model of human performance on statistical reasoning problems. This book appeals to judgment and decision making researchers and other cognitive scientists, as well as to teachers of statistics and probabilistic reasoning.
Powerline Communications Channel Modelling Methodology Based on Statistical Features
Tan, Bo
2012-01-01
This paper proposes a new channel modelling method for powerline communications networks based on the multipath profile in the time domain. The new channel model is developed to be applied in a range of Powerline Communications (PLC) research topics such as impulse noise modelling, deployment and coverage studies, and communications theory analysis. To develop the methodology, channels are categorised according to their propagation distance and power delay profile. The statistical multipath parameters such as path arrival time, magnitude and interval for each category are analysed to build the model. Each channel generated from the proposed statistical model represents a different realisation of a PLC network. Simulation results in the time and frequency domains show that the proposed statistical modelling method, which integrates the impact of network topology, reproduces the PLC channel features of the underlying transmission line theory model. Furthermore, two potential application scenarios are d...
Lawrence, Stephen J.
2012-01-01
Water-based recreation—such as rafting, canoeing, and fishing—is popular among visitors to the Chattahoochee River National Recreation Area (CRNRA) in north Georgia. The CRNRA is a 48-mile reach of the Chattahoochee River upstream from Atlanta, Georgia, managed by the National Park Service (NPS). Historically, high densities of fecal-indicator bacteria have been documented in the Chattahoochee River and its tributaries at levels that commonly exceeded Georgia water-quality standards. In October 2000, the NPS partnered with the U.S. Geological Survey (USGS), State and local agencies, and non-governmental organizations to monitor Escherichia coli bacteria (E. coli) density and develop a system to alert river users when E. coli densities exceeded the U.S. Environmental Protection Agency (USEPA) single-sample beach criterion of 235 colonies (most probable number) per 100 milliliters (MPN/100 mL) of water. This program, called BacteriALERT, monitors E. coli density, turbidity, and water temperature at two sites on the Chattahoochee River upstream from Atlanta, Georgia. This report summarizes E. coli bacteria density and turbidity values in water samples collected between 2000 and 2008 as part of the BacteriALERT program; describes the relations between E. coli density and turbidity, streamflow characteristics, and season; and describes the regression analyses used to develop predictive models that estimate E. coli density in real time at both sampling sites.
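The regression models referred to above relate log-transformed E. coli density to log-transformed turbidity. A minimal least-squares sketch on synthetic data (the slope, intercept, and scatter below are made-up values, not the BacteriALERT results):

```python
import math
import random

def fit_loglinear(turbidity, ecoli):
    """Ordinary least squares for log10(E. coli) = b0 + b1*log10(turbidity)."""
    xs = [math.log10(t) for t in turbidity]
    ys = [math.log10(e) for e in ecoli]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1

# Synthetic data: turbidity 3-316 NTU, log-linear E. coli response
# with scatter (true slope 0.9, true intercept 0.5).
rng = random.Random(11)
turb = [10 ** rng.uniform(0.5, 2.5) for _ in range(300)]
ecoli = [10 ** (0.5 + 0.9 * math.log10(t) + rng.gauss(0.0, 0.2))
         for t in turb]
b0, b1 = fit_loglinear(turb, ecoli)
```

In operation, a fitted relation of this form turns a real-time turbidity reading into an estimated E. coli density that can be checked against the 235 MPN/100 mL criterion.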
Isospin dependence of nuclear multifragmentation in statistical model
Institute of Scientific and Technical Information of China (English)
张蕾; 谢东珠; 张艳萍; 高远
2011-01-01
The evolution of nuclear disintegration mechanisms with increasing excitation energy, from compound nucleus to multifragmentation, has been studied using the Statistical Multifragmentation Model (SMM) within a micro-canonical ensemble. We discuss the o
Statistical validation of normal tissue complication probability models
Xu, Cheng-Jian; van der Schaaf, Arjen; van t Veld, Aart; Langendijk, Johannes A.; Schilstra, Cornelis
2012-01-01
PURPOSE: To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. METHODS AND MATERIALS: A penalized regression method, LASSO (least absolute shrinkage
A prediction model for Clostridium difficile recurrence
Directory of Open Access Journals (Sweden)
Francis D. LaBarbera
2015-02-01
Background: Clostridium difficile infection (CDI) is a growing problem in the community and hospital setting. Its incidence has been on the rise over the past two decades, and it is quickly becoming a major concern for the health care system. A high rate of recurrence is one of the major hurdles in the successful treatment of C. difficile infection. There have been few studies that have looked at patterns of recurrence. The studies currently available have shown a number of risk factors associated with C. difficile recurrence (CDR); however, there is little consensus on the impact of most of the identified risk factors. Methods: Our study was a retrospective chart review of 198 patients diagnosed with CDI via polymerase chain reaction (PCR) from February 2009 to June 2013. We used a machine learning algorithm called the Random Forest (RF) to analyze all of the factors proposed to be associated with CDR. This model is capable of making predictions based on a large number of variables, and has outperformed numerous other models and statistical methods. Results: We obtained a model that was able to predict CDR with a sensitivity of 83.3%, a specificity of 63.1%, and an area under the curve of 82.6%. As in other studies that have used the RF model, the results were strong. Conclusions: We hope that in the future, machine learning algorithms such as the RF will see wider application.
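The sensitivity and specificity quoted for the model come from a confusion matrix of predicted versus observed recurrence. A minimal sketch of how such metrics are computed (the labels here are invented, not the study data):

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP),
    with 1 = recurrence, 0 = no recurrence."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: 4 true recurrences, 6 non-recurrences.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = confusion_metrics(y_true, y_pred)  # 0.75, 0.667
```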
Ground Motion Prediction Models for Caucasus Region
Jorjiashvili, Nato; Godoladze, Tea; Tvaradze, Nino; Tumanova, Nino
2016-04-01
Ground motion prediction models (GMPMs) relate ground motion intensity measures to variables describing earthquake source, path, and site effects. Estimation of expected ground motion is a fundamental part of earthquake hazard assessment. The most commonly used parameters in attenuation relations are peak ground acceleration and spectral acceleration, because these parameters give useful information for seismic hazard assessment. Development of the Georgian Digital Seismic Network began in 2003. In this study, new GMP models are obtained based on new data from the Georgian seismic network and from neighboring countries. Model estimation is carried out in the classical statistical way, by regression analysis. Site ground conditions are additionally considered, because the same earthquake recorded at the same distance may cause different damage depending on ground conditions. Empirical ground-motion prediction models require adjustment to make them appropriate for site-specific scenarios; however, the process of making such adjustments remains a challenge. This work presents a holistic framework for the development of a peak ground acceleration (PGA) or spectral acceleration (SA) model that is easily adjustable to different seismological conditions and does not suffer from the practical problems associated with adjustments in the response spectral domain.
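A minimal sketch of the classical regression step: fitting a simple attenuation form ln PGA = c0 + c1·M + c2·ln R by least squares to synthetic records. Real GMPMs include site terms, magnitude saturation, and more elaborate distance metrics; the coefficients and noise below are illustrative assumptions.

```python
import math
import random

def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * t for a, t in zip(M[r], M[c])]
    return [M[i][3] / M[i][i] for i in range(3)]

def fit_gmpe(records):
    """OLS fit of ln(PGA) = c0 + c1*M + c2*ln(R) via normal equations."""
    XtX = [[0.0] * 3 for _ in range(3)]
    Xty = [0.0] * 3
    for mag, R, pga in records:
        row = [1.0, mag, math.log(R)]
        y = math.log(pga)
        for i in range(3):
            Xty[i] += row[i] * y
            for j in range(3):
                XtX[i][j] += row[i] * row[j]
    return solve3(XtX, Xty)

# Synthetic records with assumed "true" coefficients and lognormal scatter.
rng = random.Random(2)
c_true = (-4.0, 1.1, -1.3)
recs = []
for _ in range(400):
    mag, R = rng.uniform(4, 7), rng.uniform(5, 150)  # magnitude, distance (km)
    lnpga = c_true[0] + c_true[1] * mag + c_true[2] * math.log(R) \
            + rng.gauss(0.0, 0.3)
    recs.append((mag, R, math.exp(lnpga)))
c0, c1, c2 = fit_gmpe(recs)
```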
A generative model for predicting terrorist incidents
Verma, Dinesh C.; Verma, Archit; Felmlee, Diane; Pearson, Gavin; Whitaker, Roger
2017-05-01
A major concern in coalition peace-support operations is the incidence of terrorist activity. In this paper, we propose a generative model for the occurrence of terrorist incidents, and illustrate that an increase in diversity, as measured by the number of different social groups to which an individual belongs, is inversely correlated with the likelihood of a terrorist incident in the society. A generative model is one that can predict the likelihood of events in new contexts, as opposed to statistical models, which predict future incidents based on the history of incidents in an existing context. Generative models can be useful in planning for persistent Intelligence, Surveillance and Reconnaissance (ISR), since they allow an estimation of the regions in the theater of operation where terrorist incidents may arise, and thus can be used to better allocate the assignment and deployment of ISR assets. In this paper, we present a taxonomy of terrorist incidents, identify factors related to the occurrence of terrorist incidents, and provide a mathematical analysis calculating the likelihood of occurrence of terrorist incidents in three common real-life scenarios arising in peace-keeping operations.
From p+p to Pb+Pb Collisions: Wounded Nucleon versus Statistical Models
Gazdzicki, Marek
2013-01-01
System size dependence of hadron production properties is discussed within the Wounded Nucleon Model and the Statistical Model in the grand canonical, canonical and micro-canonical formulations. Similarities and differences between predictions of the models related to the treatment of conservation laws are exposed. A need for models which would combine a hydrodynamical-like expansion with conservation laws obeyed in individual collisions is stressed.
A nonextensive statistical model for the nucleon structure function
Trevisan, Luis A.; Mirez, Carlos
2013-03-01
We studied an application of nonextensive thermodynamics to describe the structure function of the nucleon, in a model where the usual Fermi-Dirac and Bose-Einstein energy distributions are replaced by the equivalent functions of q-statistics. The parameters of the model are an effective temperature T, the q parameter (from Tsallis statistics), and two chemical potentials given by the corresponding up (u) and down (d) quark normalizations in the nucleon.
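The abstract above hinges on replacing the Boltzmann exponential in the Fermi-Dirac distribution with the Tsallis q-exponential. A minimal sketch of that substitution; the temperature, chemical potential, and q values used below are purely illustrative and not taken from the paper:

```python
import math

def q_exponential(x, q):
    """Tsallis q-exponential e_q(x); reduces to exp(x) as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    # Standard cutoff convention: e_q(x) = 0 where the base is non-positive.
    return base ** (1.0 / (1.0 - q)) if base > 0 else 0.0

def q_fermi_dirac(energy, mu, temperature, q):
    """Occupation number with exp replaced by its q-analogue (illustrative)."""
    return 1.0 / (q_exponential((energy - mu) / temperature, q) + 1.0)
```

As q approaches 1 the q-exponential reduces to the ordinary exponential, so the usual Fermi-Dirac occupation is recovered; at the chemical potential the occupation is 1/2 for any q.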
Model of risk assessment under ballistic statistical tests
Gabrovski, Ivan; Karakaneva, Juliana
This material presents the application of a mathematical method for risk assessment in the statistical determination of the ballistic limits of protection equipment. The authors have implemented a mathematical model based on Pearson's criteria. The software implementation of the model allows evaluation of the V50 indicator and assessment of the reliability of the statistical hypotheses. The results supply specialists with information about interval estimates of the probability determined during the testing process.
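The abstract does not give the authors' algorithm, but a common simplified way to estimate a V50 ballistic limit is to average an equal number of the highest non-penetrating and lowest penetrating shot velocities. A hedged sketch with hypothetical firing data:

```python
def v50_estimate(shots, k=3):
    """Mixed-results V50 estimate: average the k highest stopped-shot
    velocities with the k lowest complete-penetration velocities.
    shots: list of (velocity, penetrated) tuples."""
    stops = sorted(v for v, penetrated in shots if not penetrated)
    pens = sorted(v for v, penetrated in shots if penetrated)
    sample = stops[-k:] + pens[:k]
    return sum(sample) / len(sample)

# Hypothetical firing record: (velocity in m/s, penetrated?)
shots = [(400, False), (420, False), (430, True), (435, False),
         (440, True), (450, True), (460, True)]
v50 = v50_estimate(shots)
```

A statistical treatment like the one in the paper would instead fit a penetration-probability curve and attach confidence intervals to the 50% point; the averaging above is only the classical quick estimate.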
Statistical methods in joint modeling of longitudinal and survival data
Dempsey, Walter
Survival studies often generate not only a survival time for each patient but also a sequence of health measurements at annual or semi-annual check-ups while the patient remains alive. Such a sequence of random length accompanied by a survival time is called a survival process. Ordinarily robust health is associated with longer survival, so the two parts of a survival process cannot be assumed independent. The first part of the thesis is concerned with a general technique---reverse alignment---for constructing statistical models for survival processes. A revival model is a regression model in the sense that it incorporates covariate and treatment effects into both the distribution of survival times and the joint distribution of health outcomes. The revival model also determines a conditional survival distribution given the observed history, which describes how the subsequent survival distribution is determined by the observed progression of health outcomes. The second part of the thesis explores the concept of a consistent exchangeable survival process---a joint distribution of survival times in which the risk set evolves as a continuous-time Markov process with homogeneous transition rates. A correspondence with the de Finetti approach of constructing an exchangeable survival process by generating iid survival times conditional on a completely independent hazard measure is shown. Several specific processes are detailed, showing how the number of blocks of tied failure times grows asymptotically with the number of individuals in each case. In particular, we show that the set of Markov survival processes with weakly continuous predictive distributions can be characterized by a two-dimensional family called the harmonic process. The outlined methods are then applied to data, showing how they can be easily extended to handle censoring and inhomogeneity among patients.
Baran, Sándor; Möller, Annette
2017-02-01
Forecast ensembles are typically employed to account for prediction uncertainties in numerical weather prediction models. However, ensembles often exhibit biases and dispersion errors, so they require statistical post-processing to improve their predictive performance. Two popular univariate post-processing models are Bayesian model averaging (BMA) and ensemble model output statistics (EMOS). In the last few years, increased interest has emerged in developing multivariate post-processing models that incorporate dependencies between weather quantities, such as a bivariate distribution for wind vectors or a more general setting that allows any types of weather variables to be combined. In line with a recently proposed approach to modelling temperature and wind speed jointly by a bivariate BMA model, this paper introduces an EMOS model for these weather quantities based on a bivariate truncated normal distribution. The bivariate EMOS model is applied to temperature and wind speed forecasts of the 8-member University of Washington mesoscale ensemble and the 11-member ALADIN-HUNEPS ensemble of the Hungarian Meteorological Service, and its predictive performance is compared to that of the bivariate BMA model and a multivariate Gaussian copula approach that post-processes the margins with univariate EMOS. While the predictive skills of the compared methods are similar, the bivariate EMOS model requires considerably lower computation times than the bivariate BMA method.
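A univariate, zero-truncated-normal EMOS sketch conveys the idea behind the bivariate model described above: the predictive location and scale are affine functions of the ensemble mean and variance, and truncation at zero suits non-negative quantities such as wind speed. The coefficients a, b, c, d below are placeholders that would normally be fitted by minimizing the CRPS or maximizing likelihood over training data:

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def emos_zero_truncated_normal(ens, a, b, c, d):
    """EMOS predictive law: mu = a + b*mean(ens), sigma^2 = c + d*var(ens),
    with the normal distribution truncated at zero."""
    m = sum(ens) / len(ens)
    s2 = sum((x - m) ** 2 for x in ens) / len(ens)
    mu = a + b * m
    sigma = math.sqrt(c + d * s2)
    alpha = mu / sigma
    # Mean of N(mu, sigma^2) restricted to [0, inf)
    mean_trunc = mu + sigma * norm_pdf(alpha) / norm_cdf(alpha)
    return mu, sigma, mean_trunc

# Toy 4-member wind-speed ensemble with identity coefficients (illustrative):
mu, sigma, mean_w = emos_zero_truncated_normal([2.0, 3.0, 4.0, 5.0],
                                               0.0, 1.0, 0.25, 0.5)
```

Truncation pushes the predictive mean slightly above the untruncated location mu, since probability mass below zero is removed.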
Thiessen, Erik D
2017-01-05
Statistical learning has been studied in a variety of different tasks, including word segmentation, object identification, category learning, artificial grammar learning and serial reaction time tasks (e.g. Saffran et al. 1996 Science 274, 1926-1928; Orban et al. 2008 Proceedings of the National Academy of Sciences 105, 2745-2750; Thiessen & Yee 2010 Child Development 81, 1287-1303; Saffran 2002 Journal of Memory and Language 47, 172-196; Misyak & Christiansen 2012 Language Learning 62, 302-331). The difference among these tasks raises questions about whether they all depend on the same kinds of underlying processes and computations, or whether they are tapping into different underlying mechanisms. Prior theoretical approaches to statistical learning have often tried to explain or model learning in a single task. However, in many cases these approaches appear inadequate to explain performance in multiple tasks. For example, explaining word segmentation via the computation of sequential statistics (such as transitional probability) provides little insight into the nature of sensitivity to regularities among simultaneously presented features. In this article, we will present a formal computational approach that we believe is a good candidate to provide a unifying framework to explore and explain learning in a wide variety of statistical learning tasks. This framework suggests that statistical learning arises from a set of processes that are inherent in memory systems, including activation, interference, integration of information and forgetting (e.g. Perruchet & Vinter 1998 Journal of Memory and Language 39, 246-263; Thiessen et al. 2013 Psychological Bulletin 139, 792-814). From this perspective, statistical learning does not involve explicit computation of statistics, but rather the extraction of elements of the input into memory traces, and subsequent integration across those memory traces that emphasize consistent information (Thiessen and Pavlik
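Transitional probability, the sequential statistic mentioned above, can be computed directly from a syllable stream. A toy sketch with made-up syllables forming the "words" ba-bi-du and go-la-tu:

```python
from collections import Counter

def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) from a syllable stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

stream = "ba bi du ba bi du go la tu ba bi du".split()
tp = transitional_probabilities(stream)
# Within-word transitions are high (tp[('ba', 'bi')] == 1.0),
# while across-word transitions are lower (tp[('du', 'go')] == 0.5),
# which is the cue infants are thought to exploit for segmentation.
```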
PREDICT : model for prediction of survival in localized prostate cancer
Kerkmeijer, Linda G W; Monninkhof, Evelyn M.; van Oort, Inge M.; van der Poel, Henk G.; de Meerleer, Gert; van Vulpen, Marco
2016-01-01
Purpose: Current models for prediction of prostate cancer-specific survival do not incorporate all present-day interventions. In the present study, a pre-treatment prediction model for patients with localized prostate cancer was developed. Methods: From 1989 to 2008, 3383 patients were treated with I
Pseudo-dynamic source modelling with 1-point and 2-point statistics of earthquake source parameters
Song, S. G.
2013-12-24
Ground motion prediction is an essential element in seismic hazard and risk analysis. Empirical ground motion prediction approaches have been widely used in the community, but efficient simulation-based ground motion prediction methods are needed to complement empirical approaches, especially in the regions with limited data constraints. Recently, dynamic rupture modelling has been successfully adopted in physics-based source and ground motion modelling, but it is still computationally demanding and many input parameters are not well constrained by observational data. Pseudo-dynamic source modelling keeps the form of kinematic modelling with its computational efficiency, but also tries to emulate the physics of source process. In this paper, we develop a statistical framework that governs the finite-fault rupture process with 1-point and 2-point statistics of source parameters in order to quantify the variability of finite source models for future scenario events. We test this method by extracting 1-point and 2-point statistics from dynamically derived source models and simulating a number of rupture scenarios, given target 1-point and 2-point statistics. We propose a new rupture model generator for stochastic source modelling with the covariance matrix constructed from target 2-point statistics, that is, auto- and cross-correlations. Our sensitivity analysis of near-source ground motions to 1-point and 2-point statistics of source parameters provides insights into relations between statistical rupture properties and ground motions. We observe that larger standard deviation and stronger correlation produce stronger peak ground motions in general. The proposed new source modelling approach will contribute to understanding the effect of earthquake source on near-source ground motion characteristics in a more quantitative and systematic way.
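The rupture-generator idea above, drawing source parameters with target 1-point statistics (mean, standard deviation) and 2-point statistics (a covariance matrix built from auto-correlations), can be sketched in one dimension using a Cholesky factor of the covariance matrix. The exponential correlation form and all numbers below are illustrative assumptions, not the paper's parameterization:

```python
import math, random

def correlated_slip_field(n, mean, std, corr_len, seed=0):
    """1-D field matching target 1-point stats (mean, std) and 2-point
    statistics given by an exponential autocorrelation exp(-|i-j|/corr_len)."""
    cov = [[std * std * math.exp(-abs(i - j) / corr_len) for j in range(n)]
           for i in range(n)]
    # Cholesky factorization cov = L L^T, done by hand to stay dependency-free.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Colour the white noise: field = mean + L z has the target covariance.
    return [mean + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

field = correlated_slip_field(50, 1.0, 0.3, 5.0)
```

Larger std or longer correlation length produces rougher-amplitude, smoother-shaped fields, which is consistent with the sensitivity the abstract reports (stronger variability and correlation giving stronger peak ground motions).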
Shape-correlated Deformation Statistics for Respiratory Motion Prediction in 4D Lung
Liu, Xiaoxiao; Oguz, Ipek; Pizer, Stephen M.; Mageras, Gig S.
2010-01-01
4D image-guided radiation therapy (IGRT) for free-breathing lungs is challenging due to the complicated respiratory dynamics. Effective modeling of respiratory motion is crucial to account for the motion effects on the dose to tumors. We propose a shape-correlated statistical model on dense image deformations for patient-specific respiratory motion estimation in 4D lung IGRT. Using the shape deformations of the high-contrast lungs as the surrogate, the statistical model trained from the plannin...
Sumnall, Matthew; Peduzzi, Alicia; Fox, Thomas R.; Wynne, Randolph H.; Thomas, Valerie A.; Cook, Bruce
2016-01-01
Leaf area is an important forest structural variable which serves as the primary means of mass and energy exchange within vegetated ecosystems. The objective of the current study was to determine if leaf area index (LAI) could be estimated accurately and consistently in five intensively managed pine plantation forests using two multiple-return airborne LiDAR datasets. Field measurements of LAI were made using the LiCOR LAI2000 and LAI2200 instruments within 116 plots of varying size, established within a variety of stand conditions (i.e. stand age, nutrient regime and stem density) in North Carolina and Virginia in 2008 and 2013. A number of common LiDAR return height and intensity distribution metrics were calculated (e.g. average return height), in addition to ten indices (with two additional variants) from the surrounding literature that have been used to estimate LAI and fractional cover, calculated from return heights and intensity for each plot extent. The indices were assessed for correlation with each other and used as independent variables in linear regression analysis with field LAI as the dependent variable. All LiDAR-derived metrics were also entered into a forward stepwise linear regression. The results for the indices varied from an R2 of 0.33 (S.E. 0.87) to 0.89 (S.E. 0.36). Indices calculated using ratios of all returns produced the strongest correlations, such as the Above and Below Ratio Index (ABRI) and Laser Penetration Index 1 (LPI1). The regression model produced from a combination of three metrics did not improve correlations greatly (R2 0.90; S.E. 0.35). The results indicate that LAI can be predicted accurately over a range of intensively managed pine plantation forest environments when using different LiDAR sensor designs. Indices which incorporated counts of specific return numbers (e.g. first returns) or return intensity correlated poorly with field measurements. There were
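A laser penetration index of the kind mentioned above is, in essence, the fraction of returns reaching below the canopy; one common follow-up (assumed here for illustration, not necessarily the authors' regression approach) is to invert a Beer-Lambert gap-fraction relation for LAI. A sketch with hypothetical return heights and an assumed extinction coefficient:

```python
import math

def laser_penetration_index(return_heights, canopy_threshold=1.0):
    """Fraction of LiDAR returns below a height threshold (gap-fraction proxy)."""
    below = sum(1 for h in return_heights if h < canopy_threshold)
    return below / len(return_heights)

def lai_beer_lambert(lpi, k=0.5):
    """Invert the Beer-Lambert relation, gap fraction = exp(-k * LAI)."""
    return -math.log(lpi) / k

# Hypothetical per-plot return heights in metres (4 of 10 reach the ground).
heights = [0.2, 0.5, 12.0, 15.0, 0.8, 20.0, 18.0, 14.0, 0.3, 16.0]
lpi = laser_penetration_index(heights)
lai = lai_beer_lambert(lpi)
```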
Models for probability and statistical inference theory and applications
Stapleton, James H
2007-01-01
This concise, yet thorough, book is enhanced with simulations and graphs to build the intuition of readers. Models for Probability and Statistical Inference was written over a five-year period and serves as a comprehensive treatment of the fundamentals of probability and statistical inference. With detailed theoretical coverage found throughout the book, readers acquire the fundamentals needed to advance to more specialized topics, such as sampling, linear models, design of experiments, statistical computing, survival analysis, and bootstrapping. Ideal as a textbook for a two-semester sequence on probability and statistical inference, early chapters provide coverage on probability and include discussions of: discrete models and random variables; discrete distributions including binomial, hypergeometric, geometric, and Poisson; continuous, normal, gamma, and conditional distributions; and limit theory. Since limit theory is usually the most difficult topic for readers to master, the author thoroughly discusses mo...
Electron impact ionization of tungsten ions in a statistical model
Demura, A. V.; Kadomtsev, M. B.; Lisitsa, V. S.; Shurygin, V. A.
2015-01-01
The statistical model for calculations of the electron impact ionization cross sections of multielectron ions is developed for the first time. The model is based on the idea of collective excitations of atomic electrons with the local plasma frequency, while the Thomas-Fermi model is used for the atomic electron density distribution. The electron impact ionization cross sections and related ionization rates of tungsten ions from W+ up to W63+ are calculated and then compared with a vast collection of modern experimental and modeling results. The reasonable correspondence between experimental and theoretical data demonstrates the universal nature of the statistical approach to the description of atomic processes in multielectron systems.
Modelling malaria treatment practices in Bangladesh using spatial statistics
Directory of Open Access Journals (Sweden)
Haque Ubydul
2012-03-01
Background: Malaria treatment-seeking practices vary worldwide, and Bangladesh is no exception. Individuals from 88 villages in Rajasthali were asked about their treatment-seeking practices. A portion of these households preferred malaria treatment from the National Control Programme, but a large number of households continued to use drug vendors, and approximately one fourth of the individuals surveyed relied exclusively on non-control-programme treatments. The risks of low control-programme usage include incomplete malaria treatment, possible misuse of anti-malarial drugs, and an increased potential for drug resistance. Methods: The spatial patterns of treatment-seeking practices were first examined using hot-spot analysis (the local Getis-Ord Gi statistic) and then modelled using regression. Ordinary least squares (OLS) regression identified key factors explaining more than 80% of the variation in control-programme and vendor treatment preferences. Geographically weighted regression (GWR) was then used to assess where each factor was a strong predictor of treatment-seeking preferences. Results: Several factors, including tribal affiliation, housing materials, household densities, education levels, and proximity to the regional urban centre, were found to be effective predictors of malaria treatment-seeking preferences. The predictive strength of each of these factors, however, varied across the study area. While education, for example, was a strong predictor in some villages, it was less important for predicting treatment-seeking outcomes in other villages. Conclusion: Understanding where each factor is a strong predictor of treatment-seeking outcomes may help in planning targeted interventions aimed at increasing control-programme usage. Suggested strategies include providing additional training for the Building Resources Across Communities (BRAC) health workers, implementing educational programmes, and addressing economic factors.
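The hot-spot step mentioned in the abstract, the local Getis-Ord Gi* statistic, has a compact closed form. A self-contained sketch on a toy one-dimensional study area, with binary contiguity weights and hypothetical values:

```python
import math

def getis_ord_gi_star(values, weights, i):
    """Local Getis-Ord Gi* statistic for site i (weights[i][i] included)."""
    n = len(values)
    xbar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - xbar * xbar)
    wi = sum(weights[i])                          # sum of weights at site i
    s1 = sum(w * w for w in weights[i])           # sum of squared weights
    num = sum(w * v for w, v in zip(weights[i], values)) - xbar * wi
    den = s * math.sqrt((n * s1 - wi * wi) / (n - 1))
    return num / den

# Six sites on a line; the last three form a high-value cluster.
values = [1.0, 1.0, 1.0, 9.0, 9.0, 9.0]
w = [[1 if abs(i - j) <= 1 else 0 for j in range(6)] for i in range(6)]
gi_hot = getis_ord_gi_star(values, w, 4)   # inside the hot spot: positive
gi_cold = getis_ord_gi_star(values, w, 1)  # inside the cold spot: negative
```

Positive Gi* values flag clusters of high values and negative values flag clusters of low values; significance is usually judged against the standard normal distribution.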
Bouchet, Freddy; Dauxois, Thierry
2005-10-01
We explain the ubiquity and extremely slow evolution of non-Gaussian out-of-equilibrium distributions for the Hamiltonian mean-field model by means of traditional kinetic theory. Deriving the Fokker-Planck equation for a test particle, one also unambiguously explains and predicts the striking slow algebraic relaxation of the momenta autocorrelation previously found in numerical simulations. Finally, anomalous angular diffusion is predicted for a large class of initial distributions. Non-extensive statistical mechanics is shown to be unnecessary for the interpretation of these phenomena.
Tarasova, Irina A; Goloborodko, Anton A; Perlova, Tatyana Y; Pridatchenko, Marina L; Gorshkov, Alexander V; Evreinov, Victor V; Ivanov, Alexander R; Gorshkov, Mikhail V
2015-07-07
The theory of critical chromatography for biomacromolecules (BioLCCC) describes polypeptide retention in reversed-phase HPLC using the basic principles of statistical thermodynamics. However, whether this theory correctly depicts a variety of empirical observations and laws introduced for peptide chromatography over the last decades remains to be determined. In this study, by comparing theoretical results with experimental data, we demonstrate that the BioLCCC: (1) fits the empirical dependence of the polypeptide retention on the amino acid sequence length with R(2) > 0.99 and allows in silico determination of the linear regression coefficients of the log-length correction in the additive model for arbitrary sequences and lengths and (2) predicts the distribution coefficients of polypeptides with an accuracy from 0.98 to 0.99 R(2). The latter enables direct calculation of the retention factors for given solvent compositions and modeling of the migration dynamics of polypeptides separated under isocratic or gradient conditions. The obtained results demonstrate that the suggested theory correctly relates the main aspects of polypeptide separation in reversed-phase HPLC.
Prediction of quantiles by statistical learning and application to GDP forecasting
Alquier, Pierre
2012-01-01
In this paper, we tackle the problem of prediction and confidence intervals for time series using a statistical learning approach and quantile loss functions. First, we show that the Gibbs estimator (also known as the Exponentially Weighted Aggregate) is able to predict as well as the best predictor in a given family for a wide set of loss functions. In particular, using the quantile loss function of Koenker and Bassett (1978), this allows us to build confidence intervals. We apply these results to the problem of prediction and confidence regions for the French Gross Domestic Product (GDP) growth, with promising results.
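The quantile loss of Koenker and Bassett penalizes under- and over-prediction asymmetrically, and an empirical tau-quantile of past observations minimizes its average. A minimal sketch of both facts (this is the loss the paper uses, not its Gibbs estimator):

```python
import math

def pinball_loss(y_true, y_pred, tau):
    """Koenker-Bassett quantile ('pinball') loss: asymmetric absolute error."""
    diff = y_true - y_pred
    return tau * diff if diff >= 0 else (tau - 1.0) * diff

def empirical_quantile(history, tau):
    """A tau-quantile of past values, which minimizes average pinball loss."""
    xs = sorted(history)
    k = max(0, min(len(xs) - 1, math.ceil(tau * len(xs)) - 1))
    return xs[k]
```

Predicting the 0.9-quantile and the 0.1-quantile of a forecast distribution this way yields an 80% confidence interval, which is the construction the abstract alludes to.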
Information Geometric Complexity of a Trivariate Gaussian Statistical Model
Directory of Open Access Journals (Sweden)
Domenico Felice
2014-05-01
We evaluate the information geometric complexity of entropic motion on low-dimensional Gaussian statistical manifolds in order to quantify how difficult it is to make macroscopic predictions about systems in the presence of limited information. Specifically, we observe that the complexity of such entropic inferences not only depends on the amount of available pieces of information but also on the manner in which such pieces are correlated. Finally, we uncover that, for certain correlational structures, the impossibility of reaching the most favorable configuration from an entropic inference viewpoint seems to lead to an information geometric analog of the well-known frustration effect that occurs in statistical physics.
Prediction of permeability for porous media reconstructed using multiple-point statistics.
Okabe, Hiroshi; Blunt, Martin J
2004-12-01
To predict multiphase flow through geologically realistic porous media, it is necessary to have a three-dimensional (3D) representation of the pore space. We use multiple-point statistics based on two-dimensional (2D) thin sections as training images to generate geologically realistic 3D pore-space representations. Thin-section images can provide multiple-point statistics, which describe the statistical relation between multiple spatial locations and use the probability of occurrence of particular patterns. Assuming that the medium is isotropic, a 3D image can be generated that preserves typical patterns of the void space seen in the thin sections. The method is tested on Berea sandstone for which a 3D image from micro-CT (Computerized Tomography) scanning is available and shows that the use of multiple-point statistics allows the long-range connectivity of the structure to be preserved, in contrast to two-point statistics methods that tend to underestimate the connectivity. Furthermore, a high-resolution 2D thin-section image of a carbonate reservoir rock is used to reconstruct 3D structures by the proposed method. The permeabilities of the statistical images are computed using the lattice-Boltzmann method (LBM). The results are similar to the measured values, to the permeability directly computed on the micro-CT image for Berea and to predictions using analysis of the 2D images and the effective medium approximation.
Optimizing the prediction process: from statistical concepts to the case study of soccer.
Directory of Open Access Journals (Sweden)
Andreas Heuer
We present a systematic approach for prediction purposes based on panel data, involving information about different interacting subjects and different times (here: two). The corresponding bivariate regression problem can be solved analytically for the final statistical estimation error. Furthermore, this expression is simplified for the special case that the subjects do not change their properties between the last measurement and the prediction period. This statistical framework is applied to the prediction of soccer matches, based on information from the previous and the present season. It is determined how well the outcome of soccer matches can be predicted theoretically. This optimum limit is compared with the actual quality of the prediction, taking the German premier league as an example. As a key step for the actual prediction process one has to identify appropriate observables which reflect the strength of the individual teams as closely as possible. A criterion to distinguish different observables is presented. Surprisingly, chances for goals turn out to be much better suited than the goals themselves to characterize the strength of a team. Routes towards further improvement of the prediction are indicated. Finally, two specific applications are discussed.
Predictability in models of the atmospheric circulation.
Houtekamer, P.L.
1992-01-01
It will be clear from the above discussions that skill forecasts are still in their infancy. Operational skill predictions do not exist. One is still struggling to prove that skill predictions, at any range, have any quality at all. It is not clear what the statistics of the analysis error are. The
Decoding Beta-Decay Systematics: A Global Statistical Model for Beta^- Halflives
Costiris, N J; Gernoth, K A; Clark, J W
2008-01-01
Statistical modeling of nuclear data provides a novel approach to nuclear systematics, complementary to established theoretical and phenomenological approaches based on quantum theory. Continuing previous studies in which global statistical modeling is pursued within the general framework of machine learning theory, we implement advances in training algorithms designed to improve generalization, in application to the problem of reproducing and predicting the halflives of nuclear ground states that decay 100% by the beta^- mode. More specifically, fully-connected, multilayer feedforward artificial neural network models are developed using the Levenberg-Marquardt optimization algorithm together with Bayesian regularization and cross-validation. The predictive performance of models emerging from extensive computer experiments is compared with that of traditional microscopic and phenomenological models as well as with the performance of other learning systems, including earlier neural network models as well as th...
Validation of Biomarker-based risk prediction models
Taylor, Jeremy M.G.; Ankerst, Donna P.; Andridge, Rebecca R.
2008-01-01
The increasing availability and use of predictive models to facilitate informed decision making highlights the need for careful assessment of the validity of these models. In particular, models involving biomarkers require careful validation for two reasons: issues with overfitting when complex models involve a large number of biomarkers, and inter-laboratory variation in assays used to measure biomarkers. In this paper we distinguish between internal and external statistical validation. Inte...
Equilibrium Statistical-Thermal Models in High-Energy Physics
Tawfik, Abdel Nasser
2014-01-01
We review some recent highlights from the applications of statistical-thermal models to different experimental measurements and to lattice QCD thermodynamics that have been made during the last decade. We start with a short review of the historical milestones on the path of constructing statistical-thermal models for heavy-ion physics. We discovered that Heinz Koppe formulated in 1948 an almost complete recipe for the statistical-thermal models. In 1950, Enrico Fermi generalized this statistical approach, in which he started with a general cross-section formula and inserted into it simplifying assumptions about the matrix element of the interaction process that likely reflect many features of the high-energy reactions dominated by density in the phase space of final states. In 1964, Hagedorn systematically analysed the high-energy phenomena using all tools of statistical physics and introduced the concept of limiting temperature based on the statistical bootstrap model. It turns out quite often that many-par...
Foundations of statistical methods for multiple sequence alignment and structure prediction
Energy Technology Data Exchange (ETDEWEB)
Lawrence, C. [New York State Dept. of Health, Albany, NY (United States). Wadsworth Center for Labs. and Research
1995-12-31
Statistical algorithms have proven to be useful in computational molecular biology. Many statistical problems are most easily addressed by pretending that critical missing data are available. For some problems, statistical inference is facilitated by creating a set of latent variables, none of which are observed. A key observation is that conditional probabilities for the values of the missing data can be inferred by application of Bayes theorem to the observed data. The statistical framework described in this paper employs Boltzmann-like models, permuted data likelihood, EM, and Gibbs sampler algorithms. This tutorial reviews the common statistical framework behind all of these algorithms, largely in tabular or graphical terms, illustrates its application, and describes the biological underpinnings of the models used.
Zielke, Olaf; McDougall, Damon; Mai, Martin; Babuska, Ivo
2014-05-01
Seismic data, often augmented with geodetic data, are frequently used to invert for the spatio-temporal evolution of slip along a rupture plane. The resulting images of the slip evolution for a single event, inferred by different research teams, often vary distinctly, depending on the adopted inversion approach and rupture model parameterization. This observation raises the question which of the provided kinematic source inversion solutions is most reliable and most robust, and, more generally, how accurate fault parameterization and solution predictions are. These issues are not included in "standard" source inversion approaches. Here, we present a statistical inversion approach to constrain kinematic rupture parameters from teleseismic body waves. The approach is based (a) on a forward-modeling scheme that computes synthetic (body-)waves for a given kinematic rupture model, and (b) on the QUESO (Quantification of Uncertainty for Estimation, Simulation, and Optimization) library that uses MCMC algorithms and Bayes theorem for sample selection. We present Bayesian inversions for rupture parameters in synthetic earthquakes (i.e. for which the exact rupture history is known) in an attempt to identify the cross-over at which further model discretization (spatial and temporal resolution of the parameter space) is no longer attributed to a decreasing misfit. Identification of this cross-over is of importance as it reveals the resolution power of the studied data set (i.e. teleseismic body waves), enabling one to constrain kinematic earthquake rupture histories of real earthquakes at a resolution that is supported by data. In addition, the Bayesian approach allows for mapping complete posterior probability density functions of the desired kinematic source parameters, thus enabling us to rigorously assess the uncertainties in earthquake source inversions.
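The MCMC sampling underlying such Bayesian inversions can be illustrated with a random-walk Metropolis sampler on a one-parameter toy posterior. The "rupture velocity" target below is an invented stand-in for a real waveform-misfit likelihood, and all numbers are hypothetical:

```python
import math, random

def metropolis(log_post, x0, step, n, seed=1):
    """Random-walk Metropolis sampler, the core of MCMC-based inversion."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio).
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Toy posterior for a rupture-velocity-like parameter:
# Gaussian centred at 2.8 (km/s) with standard deviation 0.2.
log_post = lambda v: -0.5 * ((v - 2.8) / 0.2) ** 2
draws = metropolis(log_post, 2.0, 0.3, 5000)
posterior_mean = sum(draws[1000:]) / len(draws[1000:])  # discard burn-in
```

In a real inversion the log-posterior would compare synthetic body waves from the forward model against observed waveforms, and the chain would explore many parameters jointly, but the accept/reject mechanics are the same.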
Statistical Model and the mesonic-baryonic transition region
Oeschler, H.; Redlich, K.; Wheaton, S.
2009-01-01
The statistical model assuming chemical equilibrium and local strangeness conservation describes most of the observed features of strange particle production from SIS up to RHIC. Deviations are found, as the maximum in the measured K+/pi+ ratio is much sharper than in the model calculations. At the incident energy of the maximum, the statistical model shows that freeze-out changes regime from one dominated by baryons at the lower energies toward one dominated by mesons. It will be shown how deviations from the usual freeze-out curve influence the various particle ratios. Furthermore, other observables also exhibit changes just in this energy regime.
Linear mixed models a practical guide using statistical software
West, Brady T; Galecki, Andrzej T
2006-01-01
Simplifying the often confusing array of software programs for fitting linear mixed models (LMMs), Linear Mixed Models: A Practical Guide Using Statistical Software provides a basic introduction to primary concepts, notation, software implementation, model interpretation, and visualization of clustered and longitudinal data. This easy-to-navigate reference details the use of procedures for fitting LMMs in five popular statistical software packages: SAS, SPSS, Stata, R/S-plus, and HLM. The authors introduce basic theoretical concepts, present a heuristic approach to fitting LMMs based on bo
Multiple commodities in statistical microeconomics: Model and market
Baaquie, Belal E.; Yu, Miao; Du, Xin
2016-11-01
A statistical generalization of microeconomics was made in Baaquie (2013). In Baaquie et al. (2015), the market behavior of single commodities was analyzed, and it was shown that market data provide strong support for the statistical microeconomic description of commodity prices. Here the case of multiple commodities is studied, and a parsimonious generalization of the single-commodity model is made for the multiple-commodity case. Market data show that the generalization can accurately model the simultaneous correlation functions of up to four commodities. To accurately model five or more commodities, further terms have to be included in the model. This study shows that the statistical microeconomics approach is a comprehensive and complete formulation of microeconomics, one that is independent of the mainstream formulation of microeconomics.
Speech emotion recognition based on statistical pitch model
Institute of Scientific and Technical Information of China (English)
WANG Zhiping; ZHAO Li; ZOU Cairong
2006-01-01
A modified Parzen-window method, which keeps high resolution at low frequencies and smoothness at high frequencies, is proposed to obtain the statistical model. A gender classification method utilizing this statistical model is then proposed, achieving 98% accuracy in gender classification when long sentences are dealt with. After separating the male and female voices, the means and standard deviations of speech training samples with different emotions are used to create the corresponding emotion models. The Bhattacharyya distances between the test sample and the statistical pitch models are then utilized for emotion recognition in speech. Normalization of pitch for the male and female voices is also considered, in order to map them into a uniform space. Finally, a speech emotion recognition experiment based on K Nearest Neighbor shows that a correct rate of 81% is achieved, compared with only 73.85% if the traditional parameters are utilized.
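The matching rule this record applies to pitch statistics, the Bhattacharyya distance between two univariate Gaussians, has a simple closed form; the sketch below implements it (the pitch values in Hz are illustrative, not the paper's data).

```python
import math

def bhattacharyya_gaussian(mu1, s1, mu2, s2):
    """Bhattacharyya distance between N(mu1, s1^2) and N(mu2, s2^2)."""
    v1, v2 = s1 * s1, s2 * s2
    return (0.25 * math.log(0.25 * (v1 / v2 + v2 / v1 + 2.0))
            + 0.25 * (mu1 - mu2) ** 2 / (v1 + v2))

# Identical pitch models have distance 0; separated means increase it.
print(bhattacharyya_gaussian(200.0, 30.0, 200.0, 30.0))  # 0.0
print(bhattacharyya_gaussian(120.0, 20.0, 210.0, 35.0))  # large: dissimilar emotions
```

A test sample would be assigned to the emotion model with the smallest distance to its pitch statistics.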
What is the meaning of the statistical hadronization model?
Becattini, F
2005-01-01
The statistical model of hadronization succeeds in reproducing particle abundances and transverse momentum spectra in high energy collisions of elementary particles as well as of heavy ions. Despite its apparent success, the interpretation of these results is controversial and the validity of the approach very often questioned. In this paper, we would like to summarize the whole issue by first outlining a basic formulation of the model and then comment on the main criticisms and different kinds of interpretations, with special emphasis on the so-called "phase space dominance". While the ultimate answer to the question why the statistical model works should certainly be pursued, we stress that it is a priority to confirm or disprove the fundamental scheme of the statistical model by performing some detailed tests on the rates of exclusive channels at lower energy.
An explicit statistical model of learning lexical segmentation using multiple cues
Çöltekin, Çağrı; Nerbonne, John; Lenci, Alessandro; Padró, Muntsa; Poibeau, Thierry; Villavicencio, Aline
2014-01-01
This paper presents an unsupervised and incremental model of learning segmentation that combines multiple cues whose use by children and adults were attested by experimental studies. The cues we exploit in this study are predictability statistics, phonotactics, lexical stress and partial lexical inf
Complex Data Modeling and Computationally Intensive Statistical Methods
Mantovan, Pietro
2010-01-01
Recent years have seen the advent and development of many devices able to record and store an ever-increasing amount of complex and high-dimensional data: 3D images generated by medical scanners or satellite remote sensing, DNA microarrays, real-time financial data, system control datasets. The analysis of these data poses new and challenging problems and requires the development of novel statistical models and computational methods, fueling many fascinating and fast-growing research areas of modern statistics. The book offers a wide variety of statistical methods and is addressed to statistici
In all likelihood: statistical modelling and inference using likelihood
Pawitan, Yudi
2001-01-01
Based on a course in the theory of statistics, this text concentrates on what can be achieved using the likelihood/Fisherian method of taking account of uncertainty when studying a statistical problem. It takes the concept of the likelihood as providing the best methods for unifying the demands of statistical modelling and the theory of inference. Every likelihood concept is illustrated by realistic examples, which are not compromised by computational problems. Examples range from a simple comparison of two accident rates to complex studies that require generalised linear or semiparametric mode
Binary and Ternary Fission Within the Statistical Model
Adamian, Gurgen G.; Andreev, Alexander V.; Antonenko, Nikolai V.; Scheid, Werner
Binary and ternary nuclear fission are treated within the statistical model. At the scission point we calculate the potentials as functions of the deformations of the fragments in the dinuclear model. The potentials give the mass and charge distributions of the fission fragments. Ternary fission is assumed to occur during binary fission.
Statistical model of the classification of shale in a hydrocyclone
Energy Technology Data Exchange (ETDEWEB)
Lopachenok, L.V.; Punin, A.E.; Belyanin, Yu.I.; Proskuryakov, V.A.
1977-10-01
The mathematical model obtained by experimental and statistical methods for the classification of shale in a hydrocyclone is adequate for a real industrial-scale process, as indicated by the statistical analysis carried out for it. Together with the material-balance relationships, it permits the calculation of the engineering parameters for any classification conditions within the investigated region of the factor space, as well as the search for the optimum conditions for the industrial realization of the process.
General Linear Models: An Integrated Approach to Statistics
Andrew Faulkner; Sylvain Chartier
2008-01-01
Generally, in psychology, the various statistical analyses are taught independently from each other. As a consequence, students struggle to learn new statistical analyses in contexts that differ from their textbooks. This paper gives a short introduction to the general linear model (GLM), in which it is shown that ANOVA (one-way, factorial, repeated-measure and analysis of covariance) is simply a multiple correlation/regression analysis (MCRA). Generalizations to other cases, such as multiv...
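The paper's central claim, that one-way ANOVA is a multiple regression in disguise, can be checked numerically: the F statistic from group sums of squares equals the F computed from the R² of an OLS fit on dummy-coded predictors. A minimal sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
groups = [rng.normal(m, 1.0, 30) for m in (0.0, 0.5, 1.2)]  # three conditions
y = np.concatenate(groups)
k, n = len(groups), len(y)

# Classical one-way ANOVA: F = (SSB/(k-1)) / (SSW/(n-k))
grand = y.mean()
ssb = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_anova = (ssb / (k - 1)) / (ssw / (n - k))

# Same test as MCRA: F = (R^2/(k-1)) / ((1-R^2)/(n-k)), R^2 from OLS
# on dummy-coded group membership
X = np.zeros((n, k))
X[np.arange(n), np.repeat(np.arange(k), [len(g) for g in groups])] = 1.0
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = 1.0 - ((y - X @ beta) ** 2).sum() / ((y - grand) ** 2).sum()
f_ols = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

print(abs(f_anova - f_ols) < 1e-8)  # True: ANOVA is a regression
```

The algebra is exact: with full dummy coding, the OLS residual sum of squares equals SSW, so R² = SSB/(SSB + SSW) and the two F formulas coincide.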
Model-based uncertainty in species range prediction
DEFF Research Database (Denmark)
Pearson, R. G.; Thuiller, Wilfried; Bastos Araujo, Miguel;
2006-01-01
Aim Many attempts to predict the potential range of species rely on environmental niche (or 'bioclimate envelope') modelling, yet the effects of using different niche-based methodologies require further investigation. Here we investigate the impact that the choice of model can have on predictions...... day (using the area under the receiver operating characteristic curve (AUC) and kappa statistics) and by assessing consistency in predictions of range size changes under future climate (using cluster analysis). Results Our analyses show significant differences between predictions from different models......, with predicted changes in range size by 2030 differing in both magnitude and direction (e.g. from 92% loss to 322% gain). We explain differences with reference to two characteristics of the modelling techniques: data input requirements (presence/absence vs. presence-only approaches) and assumptions made by each...
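The evaluation statistics this record relies on, AUC and kappa, are straightforward to compute from first principles; the sketch below uses synthetic presence/absence data (not the study's) to implement both.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic truth (presence/absence) and model-predicted suitability scores
presence = rng.binomial(1, 0.4, size=500)
scores = np.where(presence == 1,
                  rng.normal(0.7, 0.2, 500),   # presences score higher
                  rng.normal(0.4, 0.2, 500))

def auc_rank(y, s):
    """AUC via the rank-sum (Mann-Whitney) formulation."""
    ranks = np.empty(len(s))
    ranks[np.argsort(s)] = np.arange(1, len(s) + 1)
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def kappa(y, yhat):
    """Cohen's kappa for a thresholded presence/absence map."""
    po = np.mean(y == yhat)                      # observed agreement
    pe = (np.mean(y) * np.mean(yhat)             # chance agreement
          + (1 - np.mean(y)) * (1 - np.mean(yhat)))
    return (po - pe) / (1 - pe)

auc = auc_rank(presence, scores)
kap = kappa(presence, (scores > 0.55).astype(int))
print(f"AUC = {auc:.2f}, kappa = {kap:.2f}")  # both well above chance
```

AUC is threshold-free while kappa depends on the chosen presence threshold, which is one reason studies report both.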
Validation of statistical models for creep rupture by parametric analysis
Energy Technology Data Exchange (ETDEWEB)
Bolton, J., E-mail: john.bolton@uwclub.net [65, Fisher Ave., Rugby, Warks CV22 5HW (United Kingdom)
2012-01-15
Statistical analysis is an efficient method for the optimisation of any candidate mathematical model of creep rupture data, and for the comparative ranking of competing models. However, when a series of candidate models has been examined and the best of the series has been identified, there is no statistical criterion to determine whether a yet more accurate model might be devised. Hence there remains some uncertainty that the best of any series examined is sufficiently accurate to be considered reliable as a basis for extrapolation. This paper proposes that models should be validated primarily by parametric graphical comparison to rupture data and rupture gradient data. It proposes that no mathematical model should be considered reliable for extrapolation unless the visible divergence between model and data is so small as to leave no apparent scope for further reduction. This study is based on the data for a 12% Cr alloy steel used in BS PD6605:1998 to exemplify its recommended statistical analysis procedure. The models considered in this paper include a) a relatively simple model, b) the PD6605 recommended model and c) a more accurate model of somewhat greater complexity. - Highlights: • The paper discusses the validation of creep rupture models derived from statistical analysis. • It demonstrates that models can be satisfactorily validated by a visual-graphic comparison of models to data. • The method proposed utilises test data both as conventional rupture stress and as rupture stress gradient. • The approach is shown to be more reliable than a well-established and widely used method (BS PD6605).
Statistical Design Model (SDM) of satellite thermal control subsystem
Mirshams, Mehran; Zabihian, Ehsan; Aarabi Chamalishahi, Mahdi
2016-07-01
Thermal control is the satellite subsystem whose main task is keeping satellite components at their survival and operating temperatures. The capability of satellite thermal control plays a key role in satisfying a satellite's operational requirements, and designing this subsystem is part of satellite design. On the other hand, owing to the lack of information provided by companies and designers, this fundamental subsystem still does not have a specific design process. The aim of this paper is to identify and extract statistical design models of the spacecraft thermal control subsystem using the SDM design method, which analyzes statistical data with a particular procedure. To implement the SDM method, a complete database is required. Therefore, we first collect spacecraft data and create a database; we then extract statistical graphs using Microsoft Excel, from which we further extract mathematical models. The input parameters of the method are the mass, mission, and lifetime of the satellite. To this end, the thermal control subsystem is first introduced, and the hardware used in this subsystem and its variants is investigated. Next, different statistical models are presented and briefly compared. Finally, a particular statistical model is extracted from the collected statistical data. The accuracy of the method is tested and verified with a case study: comparisons between the specifications of the thermal control subsystem of a fabricated satellite and the analysis results prove the methodology effective. Key Words: Thermal control subsystem design, Statistical design model (SDM), Satellite conceptual design, Thermal hardware
Majda, Andrew J; Gershgorin, Boris
2011-08-02
Understanding and improving the predictive skill of imperfect models for complex systems in their response to external forcing is a crucial issue in diverse applications, such as climate change science. Equilibrium statistical fidelity of the imperfect model on suitable coarse-grained variables is a necessary but not sufficient condition for this predictive skill, and elementary examples are given here demonstrating this. Given equilibrium statistical fidelity of the imperfect model, a direct link is developed between the predictive fidelity of specific test problems in the training phase, where the perfect natural system is observed, and the predictive skill for the forced response of the imperfect model, by combining appropriate concepts from information theory with other concepts based on the fluctuation dissipation theorem. A suite of mathematically tractable models with nontrivial eddy diffusivity, variance, and intermittent non-Gaussian statistics, mimicking crucial features of atmospheric tracers, together with a stochastically forced standard eddy diffusivity approximation with model error, is utilized to illustrate this link.
Statistical emulation of a tsunami model for sensitivity analysis and uncertainty quantification
Sarri, A; Dias, F
2012-01-01
Due to the catastrophic consequences of tsunamis, early warnings need to be issued quickly in order to mitigate the hazard. Additionally, there is a need to represent the uncertainty in the predictions of tsunami characteristics corresponding to the uncertain trigger features (e.g. either position, shape and speed of a landslide, or sea floor deformation associated with an earthquake). Unfortunately, computer models are expensive to run. This leads to significant delays in predictions and makes the uncertainty quantification impractical. Statistical emulators run almost instantaneously and may represent well the outputs of the computer model. In this paper, we use the Outer Product Emulator to build a fast statistical surrogate of a landslide-generated tsunami computer model. This Bayesian framework enables us to build the emulator by combining prior knowledge of the computer model properties with a few carefully chosen model evaluations. The good performance of the emulator is validated using the Leave-One-O...
Statistical emulation of a tsunami model for sensitivity analysis and uncertainty quantification
Directory of Open Access Journals (Sweden)
A. Sarri
2012-06-01
Full Text Available Due to the catastrophic consequences of tsunamis, early warnings need to be issued quickly in order to mitigate the hazard. Additionally, there is a need to represent the uncertainty in the predictions of tsunami characteristics corresponding to the uncertain trigger features (e.g. either position, shape and speed of a landslide, or sea floor deformation associated with an earthquake). Unfortunately, computer models are expensive to run. This leads to significant delays in predictions and makes the uncertainty quantification impractical. Statistical emulators run almost instantaneously and may represent well the outputs of the computer model. In this paper, we use the outer product emulator to build a fast statistical surrogate of a landslide-generated tsunami computer model. This Bayesian framework enables us to build the emulator by combining prior knowledge of the computer model properties with a few carefully chosen model evaluations. The good performance of the emulator is validated using the leave-one-out method.
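A statistical emulator of this kind can be sketched with an ordinary Gaussian-process surrogate (a simplification: the paper uses the outer product emulator, and the `simulator` below is a stand-in toy function, not a tsunami code), including the leave-one-out check used for validation.

```python
import numpy as np

def simulator(x):
    # Stand-in for an expensive computer model run
    return np.sin(3.0 * x) + 0.5 * x

def rbf(a, b, length=0.5):
    # Squared-exponential covariance between two 1-D point sets
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_predict(xtr, ytr, xte, jitter=1e-6):
    # GP posterior mean: interpolates the training runs
    K = rbf(xtr, xtr) + jitter * np.eye(len(xtr))
    return rbf(xte, xtr) @ np.linalg.solve(K, ytr)

# A few carefully chosen design runs of the expensive model
X = np.linspace(0.0, 2.0, 8)
y = simulator(X)

# Leave-one-out validation: hold out each design run and predict it
loo_err = [abs(gp_predict(np.delete(X, i), np.delete(y, i), X[i:i + 1])[0] - y[i])
           for i in range(len(X))]
print(f"max leave-one-out error: {max(loo_err):.3f}")
```

Once built, the surrogate answers queries in microseconds, which is what makes emulator-based uncertainty quantification of slow simulators practical.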
Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romanach, Stephanie; Watling, James I.; Mazzotti, Frank J.
2017-01-01
Climate envelope models are widely used to describe the potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods. Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method, and there was low overlap in the variable sets. Models had high performance metrics (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS)). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. The difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using
Critical conceptualism in environmental modeling and prediction.
Christakos, G
2003-10-15
Many important problems in environmental science and engineering are of a conceptual nature. Research and development, however, often becomes so preoccupied with technical issues, which are themselves fascinating, that it neglects essential methodological elements of conceptual reasoning and theoretical inquiry. This work suggests that valuable insight into environmental modeling can be gained by means of critical conceptualism which focuses on the software of human reason and, in practical terms, leads to a powerful methodological framework of space-time modeling and prediction. A knowledge synthesis system develops the rational means for the epistemic integration of various physical knowledge bases relevant to the natural system of interest in order to obtain a realistic representation of the system, provide a rigorous assessment of the uncertainty sources, generate meaningful predictions of environmental processes in space-time, and produce science-based decisions. No restriction is imposed on the shape of the distribution model or the form of the predictor (non-Gaussian distributions, multiple-point statistics, and nonlinear models are automatically incorporated). The scientific reasoning structure underlying knowledge synthesis involves teleologic criteria and stochastic logic principles which have important advantages over the reasoning method of conventional space-time techniques. Insight is gained in terms of real world applications, including the following: the study of global ozone patterns in the atmosphere using data sets generated by instruments on board the Nimbus 7 satellite and secondary information in terms of total ozone-tropopause pressure models; the mapping of arsenic concentrations in the Bangladesh drinking water by assimilating hard and soft data from an extensive network of monitoring wells; and the dynamic imaging of probability distributions of pollutants across the Kalamazoo river.
Return Predictability, Model Uncertainty, and Robust Investment
DEFF Research Database (Denmark)
Lukas, Manuel
Stock return predictability is subject to great uncertainty. In this paper we use the model confidence set approach to quantify uncertainty about expected utility from investment, accounting for potential return predictability. For monthly US data and six representative return prediction models, we...
Statistical Inference of Biometrical Genetic Model With Cultural Transmission.
Guo, Xiaobo; Ji, Tian; Wang, Xueqin; Zhang, Heping; Zhong, Shouqiang
2013-01-01
Twin and family studies establish the foundation for studying genetic, environmental and cultural transmission effects on phenotypes. In this work, we make use of well-established statistical methods and theory for mixed models to assess cultural transmission in twin and family studies. Specifically, we address two critical yet poorly understood issues: model identifiability in assessing cultural transmission for twin and family data, and the biases in the estimates when sub-models are used. We apply our models and theory to two real data sets. A simulation is conducted to verify the bias in the estimates of genetic effects when the working model is a sub-model.
Analyzing sickness absence with statistical models for survival data
DEFF Research Database (Denmark)
Christensen, Karl Bang; Andersen, Per Kragh; Smith-Hansen, Lars;
2007-01-01
absence data deal with events occurring over time, the use of statistical models for survival data has been reviewed, and the use of frailty models has been proposed for the analysis of such data. METHODS: Three methods for analyzing data on sickness absences were compared using a simulation study...... involving the following: (i) Poisson regression using a single outcome variable (number of sickness absences), (ii) analysis of time to first event using the Cox proportional hazards model, and (iii) frailty models, which are random effects proportional hazards models. Data from a study of the relation...... between the psychosocial work environment and sickness absence were used to illustrate the results. RESULTS: Standard methods were found to underestimate true effect sizes by approximately one-tenth [method i] and one-third [method ii] and to have lower statistical power than frailty models. CONCLUSIONS...
Statistical validation of normal tissue complication probability models.
Xu, Cheng-Jian; van der Schaaf, Arjen; Van't Veld, Aart A; Langendijk, Johannes A; Schilstra, Cornelis
2012-09-01
To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use. Copyright © 2012 Elsevier Inc. All rights reserved.
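The validation recipe in this record, a penalized model assessed by cross-validation and permutation testing, can be sketched with scikit-learn on synthetic data (an assumption: the study's NTCP data and exact LASSO setup are not reproduced here; `C=0.5` and the feature structure are illustrative).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
# Complication outcome depends on 2 of 10 candidate dose/clinical features
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]
ycomp = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# L1-penalized (LASSO-type) logistic regression for feature selection
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)

def cv_auc(labels):
    """Cross-validated AUC of the penalized model for given labels."""
    proba = cross_val_predict(lasso, X, labels, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(labels, proba)

auc_obs = cv_auc(ycomp)
# Permutation test: refit on shuffled labels to build a null AUC distribution
null_aucs = [cv_auc(rng.permutation(ycomp)) for _ in range(20)]
p_value = np.mean([a >= auc_obs for a in null_aucs])
print(f"CV AUC = {auc_obs:.2f}, permutation p = {p_value:.2f}")
```

If the observed cross-validated AUC sits far outside the permuted-label distribution, model performance is unlikely to be an artifact of overfitting, which is precisely the safeguard the authors recommend before clinical use.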