Unbalanced Regressions and the Predictive Equation
DEFF Research Database (Denmark)
Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo
Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoreti......Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness...... in the theoretical predictive equation by suggesting a data generating process, where returns are generated as linear functions of a lagged latent I(0) risk process. The observed predictor is a function of this latent I(0) process, but it is corrupted by a fractionally integrated noise. Such a process may arise due...... to aggregation or unexpected level shifts. In this setup, the practitioner estimates a misspecified, unbalanced, and endogenous predictive regression. We show that the OLS estimate of this regression is inconsistent, but standard inference is possible. To obtain a consistent slope estimate, we then suggest...
Unbalanced Regressions and the Predictive Equation
DEFF Research Database (Denmark)
Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo
Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoreti......Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness...
Regression equations to predict 6-minute walk distance in Chinese adults aged 55–85 years
Shirley P.C. Ngai, PhD; Alice Y.M. Jones, PhD; Sue C. Jenkins, PhD
2014-01-01
The 6-minute walk distance (6MWD) is used as a measure of functional exercise capacity in clinical populations and research. Reference equations to predict 6MWD in different populations have been established, however, available equations for Chinese population are scarce. This study aimed to develop regression equations to predict the 6MWD for a Hong Kong Chinese population. Fifty-three healthy individuals (25 men, 28 women; mean age = 69.3 ± 6.5 years) participated in this cross-sectional st...
Regression Equations for Birth Weight Estimation using ...
African Journals Online (AJOL)
In this study, Birth Weight has been estimated from anthropometric measurements of hand and foot. Linear regression equations were formed from each of the measured variables. These simple equations can be used to estimate Birth Weight of new born babies, in order to identify those with low birth weight and referred to ...
Hierarchical regression analysis in structural Equation Modeling
de Jong, P.F.
1999-01-01
In a hierarchical or fixed-order regression analysis, the independent variables are entered into the regression equation in a prespecified order. Such an analysis is often performed when the extra amount of variance accounted for in a dependent variable by a specific independent variable is the main
Directory of Open Access Journals (Sweden)
S V Soumya
2013-01-01
Conclusion: The observations obtained from our study would not only pave the way in predicting the mesiodistal width of unerupted canine and premolar in Chennai population but also give normative odontometric data which can be used for anthropological use and for diagnosis and treatment planning
Prediction, Regression and Critical Realism
DEFF Research Database (Denmark)
Næss, Petter
2004-01-01
This paper considers the possibility of prediction in land use planning, and the use of statistical research methods in analyses of relationships between urban form and travel behaviour. Influential writers within the tradition of critical realism reject the possibility of predicting social...... phenomena. This position is fundamentally problematic to public planning. Without at least some ability to predict the likely consequences of different proposals, the justification for public sector intervention into market mechanisms will be frail. Statistical methods like regression analyses are commonly...... seen as necessary in order to identify aggregate level effects of policy measures, but are questioned by many advocates of critical realist ontology. Using research into the relationship between urban structure and travel as an example, the paper discusses relevant research methods and the kinds...
ANTHROPOMETRIC PREDICTIVE EQUATIONS FOR ...
African Journals Online (AJOL)
Keywords: Anthropometry, Predictive Equations, Percentage Body Fat, Nigerian Women, Bioelectric Impedance ... such as Asians and Indians (Pranav et al., 2009), ... size (n) of at least 3o is adjudged as sufficient for the ..... of people, gender and age (Vogel eta/., 1984). .... Fish Sold at Ile-Ife Main Market, South West Nigeria.
Who Will Win?: Predicting the Presidential Election Using Linear Regression
Lamb, John H.
2007-01-01
This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus[TM] graphing calculators, Microsoft[C] Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…
Predicting Social Trust with Binary Logistic Regression
Adwere-Boamah, Joseph; Hufstedler, Shirley
2015-01-01
This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…
SPE dose prediction using locally weighted regression
International Nuclear Information System (INIS)
Hines, J. W.; Townsend, L. W.; Nichols, T. F.
2005-01-01
When astronauts are outside earth's protective magnetosphere, they are subject to large radiation doses resulting from solar particle events (SPEs). The total dose received from a major SPE in deep space could cause severe radiation poisoning. The dose is usually received over a 20-40 h time interval but the event's effects may be mitigated with an early warning system. This paper presents a method to predict the total dose early in the event. It uses a locally weighted regression model, which is easier to train and provides predictions as accurate as neural network models previously used. (authors)
SPE dose prediction using locally weighted regression
International Nuclear Information System (INIS)
Hines, J. W.; Townsend, L. W.; Nichols, T. F.
2005-01-01
When astronauts are outside Earth's protective magnetosphere, they are subject to large radiation doses resulting from solar particle events. The total dose received from a major solar particle event in deep space could cause severe radiation poisoning. The dose is usually received over a 20-40 h time interval but the event's effects may be reduced with an early warning system. This paper presents a method to predict the total dose early in the event. It uses a locally weighted regression model, which is easier to train, and provides predictions as accurate as the neural network models that were used previously. (authors)
Predictive Temperature Equations for Three Sites at the Grand Canyon
McLaughlin, Katrina Marie Neitzel
Climate data collected at a number of automated weather stations were used to create a series of predictive equations spanning from December 2009 to May 2010 in order to better predict the temperatures along hiking trails within the Grand Canyon. The central focus of this project is how atmospheric variables interact and can be combined to predict the weather in the Grand Canyon at the Indian Gardens, Phantom Ranch, and Bright Angel sites. Through the use of statistical analysis software and data regression, predictive equations were determined. The predictive equations are simple or multivariable best fits that reflect the curvilinear nature of the data. With data analysis software curves resulting from the predictive equations were plotted along with the observed data. Each equation's reduced chi2 was determined to aid the visual examination of the predictive equations' ability to reproduce the observed data. From this information an equation or pair of equations was determined to be the best of the predictive equations. Although a best predictive equation for each month and season was determined for each site, future work may refine equations to result in a more accurate predictive equation.
DNBR Prediction Using a Support Vector Regression
International Nuclear Information System (INIS)
Yang, Heon Young; Na, Man Gyun
2008-01-01
PWRs (Pressurized Water Reactors) generally operate in the nucleate boiling state. However, the conversion of nucleate boiling into film boiling with conspicuously reduced heat transfer induces a boiling crisis that may cause the fuel clad melting in the long run. This type of boiling crisis is called Departure from Nucleate Boiling (DNB) phenomena. Because the prediction of minimum DNBR in a reactor core is very important to prevent the boiling crisis such as clad melting, a lot of research has been conducted to predict DNBR values. The object of this research is to predict minimum DNBR applying support vector regression (SVR) by using the measured signals of a reactor coolant system (RCS). The SVR has extensively and successfully been applied to nonlinear function approximation like the proposed problem for estimating DNBR values that will be a function of various input variables such as reactor power, reactor pressure, core mass flowrate, control rod positions and so on. The minimum DNBR in a reactor core is predicted using these various operating condition data as the inputs to the SVR. The minimum DBNR values predicted by the SVR confirm its correctness compared with COLSS values
Crawford, John R.; Garthwaite, Paul H.; Denham, Annie K.; Chelune, Gordon J.
2012-01-01
Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because…
BANK FAILURE PREDICTION WITH LOGISTIC REGRESSION
Directory of Open Access Journals (Sweden)
Taha Zaghdoudi
2013-04-01
Full Text Available In recent years the economic and financial world is shaken by a wave of financial crisis and resulted in violent bank fairly huge losses. Several authors have focused on the study of the crises in order to develop an early warning model. It is in the same path that our work takes its inspiration. Indeed, we have tried to develop a predictive model of Tunisian bank failures with the contribution of the binary logistic regression method. The specificity of our prediction model is that it takes into account microeconomic indicators of bank failures. The results obtained using our provisional model show that a bank's ability to repay its debt, the coefficient of banking operations, bank profitability per employee and leverage financial ratio has a negative impact on the probability of failure.
Keith, Timothy Z
2014-01-01
Multiple Regression and Beyond offers a conceptually oriented introduction to multiple regression (MR) analysis and structural equation modeling (SEM), along with analyses that flow naturally from those methods. By focusing on the concepts and purposes of MR and related methods, rather than the derivation and calculation of formulae, this book introduces material to students more clearly, and in a less threatening way. In addition to illuminating content necessary for coursework, the accessibility of this approach means students are more likely to be able to conduct research using MR or SEM--and more likely to use the methods wisely. Covers both MR and SEM, while explaining their relevance to one another Also includes path analysis, confirmatory factor analysis, and latent growth modeling Figures and tables throughout provide examples and illustrate key concepts and techniques For additional resources, please visit: http://tzkeith.com/.
Gaussian process regression for tool wear prediction
Kong, Dongdong; Chen, Yongjie; Li, Ning
2018-05-01
To realize and accelerate the pace of intelligent manufacturing, this paper presents a novel tool wear assessment technique based on the integrated radial basis function based kernel principal component analysis (KPCA_IRBF) and Gaussian process regression (GPR) for real-timely and accurately monitoring the in-process tool wear parameters (flank wear width). The KPCA_IRBF is a kind of new nonlinear dimension-increment technique and firstly proposed for feature fusion. The tool wear predictive value and the corresponding confidence interval are both provided by utilizing the GPR model. Besides, GPR performs better than artificial neural networks (ANN) and support vector machines (SVM) in prediction accuracy since the Gaussian noises can be modeled quantitatively in the GPR model. However, the existence of noises will affect the stability of the confidence interval seriously. In this work, the proposed KPCA_IRBF technique helps to remove the noises and weaken its negative effects so as to make the confidence interval compressed greatly and more smoothed, which is conducive for monitoring the tool wear accurately. Moreover, the selection of kernel parameter in KPCA_IRBF can be easily carried out in a much larger selectable region in comparison with the conventional KPCA_RBF technique, which helps to improve the efficiency of model construction. Ten sets of cutting tests are conducted to validate the effectiveness of the presented tool wear assessment technique. The experimental results show that the in-process flank wear width of tool inserts can be monitored accurately by utilizing the presented tool wear assessment technique which is robust under a variety of cutting conditions. This study lays the foundation for tool wear monitoring in real industrial settings.
Poisson Mixture Regression Models for Heart Disease Prediction.
Mufudza, Chipo; Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.
Linear regression crash prediction models : issues and proposed solutions.
2010-05-01
The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...
Dynamic prediction of cumulative incidence functions by direct binomial regression.
Grand, Mia K; de Witte, Theo J M; Putter, Hein
2018-03-25
In recent years there have been a series of advances in the field of dynamic prediction. Among those is the development of methods for dynamic prediction of the cumulative incidence function in a competing risk setting. These models enable the predictions to be updated as time progresses and more information becomes available, for example when a patient comes back for a follow-up visit after completing a year of treatment, the risk of death, and adverse events may have changed since treatment initiation. One approach to model the cumulative incidence function in competing risks is by direct binomial regression, where right censoring of the event times is handled by inverse probability of censoring weights. We extend the approach by combining it with landmarking to enable dynamic prediction of the cumulative incidence function. The proposed models are very flexible, as they allow the covariates to have complex time-varying effects, and we illustrate how to investigate possible time-varying structures using Wald tests. The models are fitted using generalized estimating equations. The method is applied to bone marrow transplant data and the performance is investigated in a simulation study. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Prediction Equations for Spirometry for Children from Northern India.
Chhabra, Sunil K; Kumar, Rajeev; Mittal, Vikas
2016-09-08
To develop prediction equations for spirometry for children from northern India using current international guidelines for standardization. Re-analysis of cross-sectional data from a single school. 670 normal children (age 6-17 y; 365 boys) of northern Indian parentage. After screening for normal health, we carried out spirometry with recommended quality assurance according to current guidelines. We developed linear and nonlinear prediction equations using multiple regression analysis. We selected the final models on the basis of the highest coefficient of multiple determination (R2) and statistical validity. Spirometry parameters: FVC, FEV1, PEFR, FEF50, FEF75 and FEF25-75. The equations for the main parameters were as follows: Boys, Ln FVC = -1.687+0.016*height +0.022*age; Ln FEV1 = -1.748+0.015*height+0.031*age. Girls, Ln FVC = -9.989 +(2.018*Ln(height)) + (0.324*Ln(age)); Ln FEV1 = -10.055 +(1.990*Ln(height))+(0.358*Ln(age)). Nonlinear regression yielded substantially greater R2 values compared to linear models except for FEF50 for girls. Height and age were found to be the significant explanatory variables for all parameters on multiple regression with weight making no significant contribution. We developed prediction equations for spirometry for children from northern India. Nonlinear equations were superior to linear equations.
Corporate prediction models, ratios or regression analysis?
Bijnen, E.J.; Wijn, M.F.C.M.
1994-01-01
The models developed in the literature with respect to the prediction of a company s failure are based on ratios. It has been shown before that these models should be rejected on theoretical grounds. Our study of industrial companies in the Netherlands shows that the ratios which are used in
Evaluation of peak power prediction equations in male basketball players.
Duncan, Michael J; Lyons, Mark; Nevill, Alan M
2008-07-01
This study compared peak power estimated using 4 commonly used regression equations with actual peak power derived from force platform data in a group of adolescent basketball players. Twenty-five elite junior male basketball players (age, 16.5 +/- 0.5 years; mass, 74.2 +/- 11.8 kg; height, 181.8 +/- 8.1 cm) volunteered to participate in the study. Actual peak power was determined using a countermovement vertical jump on a force platform. Estimated peak power was determined using countermovement jump height and body mass. All 4 prediction equations were significantly related to actual peak power (all p jump prediction equations, 12% for the Canavan and Vescovi equation, and 6% for the Sayers countermovement jump equation. In all cases peak power was underestimated.
Gaussian Process Regression for WDM System Performance Prediction
DEFF Research Database (Denmark)
Wass, Jesper; Thrane, Jakob; Piels, Molly
2017-01-01
Gaussian process regression is numerically and experimentally investigated to predict the bit error rate of a 24 x 28 CiBd QPSK WDM system. The proposed method produces accurate predictions from multi-dimensional and sparse measurement data.......Gaussian process regression is numerically and experimentally investigated to predict the bit error rate of a 24 x 28 CiBd QPSK WDM system. The proposed method produces accurate predictions from multi-dimensional and sparse measurement data....
Height - Diameter predictive equations for Rubber (Hevea ...
African Journals Online (AJOL)
BUKOLA
They proffer logistic data for modeling and futuristic prediction for sustainable forest management. Diameter is one of the most ... in various quantitative estimation following the intricacy of time, availability of modern equipments .... growth functions. This trend is shown in Figure 1 where the prediction equations are plotted.
Poisson Mixture Regression Models for Heart Disease Prediction
Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611
Sunspot Cycle Prediction Using Multivariate Regression and Binary ...
Indian Academy of Sciences (India)
49
Multivariate regression model has been derived based on the available cycles 1 .... The flare index correlates well with various parameters of the solar activity. ...... 32) Sabarinath A and Anilkumar A K 2011 A stochastic prediction model for the.
Predicting Word Reading Ability: A Quantile Regression Study
McIlraith, Autumn L.
2018-01-01
Predictors of early word reading are well established. However, it is unclear if these predictors hold for readers across a range of word reading abilities. This study used quantile regression to investigate predictive relationships at different points in the distribution of word reading. Quantile regression analyses used preschool and…
Regression formulae for predicting hematologic and liver functions ...
African Journals Online (AJOL)
African Journal of Biomedical Research ... On the other hand platelet and white blood cell (WBC) counts in these workers correlated positively with years of service [r = 0.342 (P <0.001) and r = 0.130 (P<0.0001) ... The regression equation defining this relationship is: ALP concentration = 33.68 – 0.075 x years of service.
Modeling and Prediction Using Stochastic Differential Equations
DEFF Research Database (Denmark)
Juhl, Rune; Møller, Jan Kloppenborg; Jørgensen, John Bagterp
2016-01-01
Pharmacokinetic/pharmakodynamic (PK/PD) modeling for a single subject is most often performed using nonlinear models based on deterministic ordinary differential equations (ODEs), and the variation between subjects in a population of subjects is described using a population (mixed effects) setup...... deterministic and can predict the future perfectly. A more realistic approach would be to allow for randomness in the model due to e.g., the model be too simple or errors in input. We describe a modeling and prediction setup which better reflects reality and suggests stochastic differential equations (SDEs...
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.
Sintering equation: determination of its coefficients by experiments - using multiple regression
International Nuclear Information System (INIS)
Windelberg, D.
1999-01-01
Sintering is a method for volume-compression (or volume-contraction) of powdered or grained material applying high temperature (less than the melting point of the material). Maekipirtti tried to find an equation which describes the process of sintering by its main parameters sintering time, sintering temperature and volume contracting. Such equation is called a sintering equation. It also contains some coefficients which characterise the behaviour of the material during the process of sintering. These coefficients have to be determined by experiments. Here we show that some linear regressions will produce wrong coefficients, but multiple regression results in an useful sintering equation. (orig.)
Fault trend prediction of device based on support vector regression
International Nuclear Information System (INIS)
Song Meicun; Cai Qi
2011-01-01
The research condition of fault trend prediction and the basic theory of support vector regression (SVR) were introduced. SVR was applied to the fault trend prediction of roller bearing, and compared with other methods (BP neural network, gray model, and gray-AR model). The results show that BP network tends to overlearn and gets into local minimum so that the predictive result is unstable. It also shows that the predictive result of SVR is stabilization, and SVR is superior to BP neural network, gray model and gray-AR model in predictive precision. SVR is a kind of effective method of fault trend prediction. (authors)
Wind speed prediction using statistical regression and neural network
Indian Academy of Sciences (India)
Prediction of wind speed in the atmospheric boundary layer is important for wind energy assess- ment,satellite launching and aviation,etc.There are a few techniques available for wind speed prediction,which require a minimum number of input parameters.Four different statistical techniques,viz.,curve ﬁtting,Auto Regressive ...
Smith, Paul F; Ganesh, Siva; Liu, Ping
2013-10-30
Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR), in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R(2) values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R(2) values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.
International Nuclear Information System (INIS)
Jafri, Y.Z.; Kamal, L.
2007-01-01
Various statistical techniques was used on five-year data from 1998-2002 of average humidity, rainfall, maximum and minimum temperatures, respectively. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit, to our polynomial regression analysis time series (PRATS). The correlation to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rand correlation and Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, respectively which yielded homoscedasticity. We also employed Bartlett's test for homogeneity of variances on a five-year data of rainfall and humidity, respectively which showed that the variances in rainfall data were not homogenous while in case of humidity, were homogenous. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)
A regression approach for Zircaloy-2 in-reactor creep constitutive equations
International Nuclear Information System (INIS)
Yung Liu, Y.; Bement, A.L.
1977-01-01
In this paper the methodology of multiple regressions as applied to Zircaloy-2 in-reactor creep data analysis and construction of constitutive equation are illustrated. While the resulting constitutive equation can be used in creep analysis of in-reactor Zircaloy structural components, the methodology itself is entirely general and can be applied to any creep data analysis. The promising aspects of multiple regression creep data analysis are briefly outlined as follows: (1) When there are more than one variable involved, there is no need to make the assumption that each variable affects the response independently. No separate normalizations are required either and the estimation of parameters is obtained by solving many simultaneous equations. The number of simultaneous equations is equal to the number of data sets. (2) Regression statistics such as R 2 - and F-statistics provide measures of the significance of regression creep equation in correlating the overall data. The relative weights of each variable on the response can also be obtained. (3) Special regression techniques such as step-wise, ridge, and robust regressions and residual plots, etc., provide diagnostic tools for model selections. Multiple regression analysis performed on a set of carefully selected Zircaloy-2 in-reactor creep data leads to a model which provides excellent correlations for the data. (Auth.)
Real estate value prediction using multivariate regression models
Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav
2017-11-01
The real estate market is one of the most competitive in terms of pricing and the same tends to vary significantly based on a lot of factors, hence it becomes one of the prime fields to apply the concepts of machine learning to optimize and predict the prices with high accuracy. Therefore in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to have lower Residual Sum of Squares error. While using features in a regression model some feature engineering is required for better prediction. Often a set of features (multiple regressions) or polynomial regression (applying a various set of powers in the features) is used for making better model fit. For these models are expected to be susceptible towards over fitting ridge regression is used to reduce it. This paper thus directs to the best application of regression models in addition to other techniques to optimize the result.
Tax Evasion, Information Reporting, and the Regressive Bias Prediction
DEFF Research Database (Denmark)
Boserup, Simon Halphen; Pinje, Jori Veng
2013-01-01
evasion and audit probabilities once we account for information reporting in the tax compliance game. When conditioning on information reporting, we find that both reduced-form evidence and simulations exhibit the predicted regressive bias. However, in the overall economy, this bias is negated by the tax......Models of rational tax evasion and optimal enforcement invariably predict a regressive bias in the effective tax system, which reduces redistribution in the economy. Using Danish administrative data, we show that a calibrated structural model of this type replicates moments and correlations of tax...
Morales, Esteban; de Leon, John Mark S; Abdollahi, Niloufar; Yu, Fei; Nouri-Mahdavi, Kouros; Caprioli, Joseph
2016-03-01
The study was conducted to evaluate threshold smoothing algorithms to enhance prediction of the rates of visual field (VF) worsening in glaucoma. We studied 798 patients with primary open-angle glaucoma and 6 or more years of follow-up who underwent 8 or more VF examinations. Thresholds at each VF location for the first 4 years or first half of the follow-up time (whichever was greater) were smoothed with clusters defined by the nearest neighbor (NN), Garway-Heath, Glaucoma Hemifield Test (GHT), and weighting by the correlation of rates at all other VF locations. Thresholds were regressed with a pointwise exponential regression (PER) model and a pointwise linear regression (PLR) model. Smaller root mean square error (RMSE) values of the differences between the observed and the predicted thresholds at last two follow-ups indicated better model predictions. The mean (SD) follow-up times for the smoothing and prediction phase were 5.3 (1.5) and 10.5 (3.9) years. The mean RMSE values for the PER and PLR models were unsmoothed data, 6.09 and 6.55; NN, 3.40 and 3.42; Garway-Heath, 3.47 and 3.48; GHT, 3.57 and 3.74; and correlation of rates, 3.59 and 3.64. Smoothed VF data predicted better than unsmoothed data. Nearest neighbor provided the best predictions; PER also predicted consistently more accurately than PLR. Smoothing algorithms should be used when forecasting VF results with PER or PLR. The application of smoothing algorithms on VF data can improve forecasting in VF points to assist in treatment decisions.
Construction of risk prediction model of type 2 diabetes mellitus based on logistic regression
Directory of Open Access Journals (Sweden)
Li Jian
2017-01-01
Full Text Available Objective: to construct multi factor prediction model for the individual risk of T2DM, and to explore new ideas for early warning, prevention and personalized health services for T2DM. Methods: using logistic regression techniques to screen the risk factors for T2DM and construct the risk prediction model of T2DM. Results: Male’s risk prediction model logistic regression equation: logit(P=BMI × 0.735+ vegetables × (−0.671 + age × 0.838+ diastolic pressure × 0.296+ physical activity× (−2.287 + sleep ×(−0.009 +smoking ×0.214; Female’s risk prediction model logistic regression equation: logit(P=BMI ×1.979+ vegetables× (−0.292 + age × 1.355+ diastolic pressure× 0.522+ physical activity × (−2.287 + sleep × (−0.010.The area under the ROC curve of male was 0.83, the sensitivity was 0.72, the specificity was 0.86, the area under the ROC curve of female was 0.84, the sensitivity was 0.75, the specificity was 0.90. Conclusion: This study model data is from a compared study of nested case, the risk prediction model has been established by using the more mature logistic regression techniques, and the model is higher predictive sensitivity, specificity and stability.
Approximating prediction uncertainty for random forest regression models
John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne
2016-01-01
Machine learning approaches such as random forest haveÂ increased for the spatial modeling and mapping of continuousÂ variables. Random forest is a non-parametric ensembleÂ approach, and unlike traditional regression approaches thereÂ is no direct quantification of prediction error. UnderstandingÂ prediction uncertainty is important when using model-basedÂ continuous maps as...
Predicting Dropouts of University Freshmen: A Logit Regression Analysis.
Lam, Y. L. Jack
1984-01-01
Stepwise discriminant analysis coupled with logit regression analysis of freshmen data from Brandon University (Manitoba) indicated that six tested variables drawn from research on university dropouts were useful in predicting attrition: student status, residence, financial sources, distance from home town, goal fulfillment, and satisfaction with…
Prediction of Concrete Mix Cost Using Modified Regression Theory ...
African Journals Online (AJOL)
The cost of concrete production which largely depends on the cost of the constituent materials, affects the overall cost of construction. In this paper, a model based on modified regression theory is formulated to optimise concrete mix cost (in Naira). Using the model, one can predict the cost per cubic meter of concrete if the ...
On the estimation and testing of predictive panel regressions
Karabiyik, H.; Westerlund, Joakim; Narayan, Paresh
2016-01-01
Hjalmarsson (2010) considers an OLS-based estimator of predictive panel regressions that is argued to be mixed normal under very general conditions. In a recent paper, Westerlund et al. (2016) show that while consistent, the estimator is generally not mixed normal, which invalidates standard normal
Ding, A Adam; Wu, Hulin
2014-10-01
We propose a new method to use a constrained local polynomial regression to estimate the unknown parameters in ordinary differential equation models with a goal of improving the smoothing-based two-stage pseudo-least squares estimate. The equation constraints are derived from the differential equation model and are incorporated into the local polynomial regression in order to estimate the unknown parameters in the differential equation model. We also derive the asymptotic bias and variance of the proposed estimator. Our simulation studies show that our new estimator is clearly better than the pseudo-least squares estimator in estimation accuracy with a small price of computational cost. An application example on immune cell kinetics and trafficking for influenza infection further illustrates the benefits of the proposed new method.
A regression approach for zircaloy-2 in-reactor creep constitutive equations
International Nuclear Information System (INIS)
Yung Liu, Y.; Bement, A.L.
1977-01-01
In this paper the methodology of multiple regressions as applied to zircaloy-2 in-reactor creep data analysis and construction of constitutive equation are illustrated. While the resulting constitutive equation can be used in creep analysis of in-reactor zircaloy structural components, the methodology itself is entirely general and can be applied to any creep data analysis. From data analysis and model development point of views, both the assumption of independence and prior committment to specific model forms are unacceptable. One would desire means which can not only estimate the required parameters directly from data but also provide basis for model selections, viz., one model against others. Basic understanding of the physics of deformation is important in choosing the forms of starting physical model equations, but the justifications must rely on their abilities in correlating the overall data. The promising aspects of multiple regression creep data analysis are briefly outlined as follows: (1) when there are more than one variable involved, there is no need to make the assumption that each variable affects the response independently. No separate normalizations are required either and the estimation of parameters is obtained by solving many simultaneous equations. The number of simultaneous equations is equal to the number of data sets, (2) regression statistics such as R 2 - and F-statistics provide measures of the significance of regression creep equation in correlating the overall data. The relative weights of each variable on the response can also be obtained. (3) Special regression techniques such as step-wise, ridge, and robust regressions and residual plots, etc., provide diagnostic tools for model selections
International Nuclear Information System (INIS)
Mamikonyan, S.V.; Berezkin, V.V.; Lyubimova, S.V.; Svetajlo, Yu.N.; Shchekin, K.I.
1978-01-01
A method to derive multiple regression equations for X-ray radiometric analysis is described. Te method is realized in the form of the REGRA program in an algorithmic language. The subprograms included in the program are describe. In analyzing cement for Mg, Al, Si, Ca and Fe contents as an example, the obtainment of working equations in the course of calculations by the program is shown to simpliy the realization of computing devices in instruments for X-ray radiometric analysis
Directory of Open Access Journals (Sweden)
Sangkertadi Sangkertadi
2012-05-01
Full Text Available This study is about field experimentation in order to construct regression equations of perception of thermalcomfort for outdoor activities under hot and humid environment. Relationships between thermal-comfort perceptions, micro climate variables (temperatures and humidity and body parameters (activity, clothing, body measure have been observed and analyzed. 180 adults, men, and women participated as samples/respondents. This study is limited for situation where wind velocity is about 1 m/s, which touch the body of the respondents/samples. From questionnaires and field measurements, three regression equations have been developed, each for activity of normal walking, brisk walking, and sitting.
Modeling a Predictive Energy Equation Specific for Maintenance Hemodialysis.
Byham-Gray, Laura D; Parrott, J Scott; Peters, Emily N; Fogerite, Susan Gould; Hand, Rosa K; Ahrens, Sean; Marcus, Andrea Fleisch; Fiutem, Justin J
2017-03-01
Hypermetabolism is theorized in patients diagnosed with chronic kidney disease who are receiving maintenance hemodialysis (MHD). We aimed to distinguish key disease-specific determinants of resting energy expenditure to create a predictive energy equation that more precisely establishes energy needs with the intent of preventing protein-energy wasting. For this 3-year multisite cross-sectional study (N = 116), eligible participants were diagnosed with chronic kidney disease and were receiving MHD for at least 3 months. Predictors for the model included weight, sex, age, C-reactive protein (CRP), glycosylated hemoglobin, and serum creatinine. The outcome variable was measured resting energy expenditure (mREE). Regression modeling was used to generate predictive formulas and Bland-Altman analyses to evaluate accuracy. The majority were male (60.3%), black (81.0%), and non-Hispanic (76.7%), and 23% were ≥65 years old. After screening for multicollinearity, the best predictive model of mREE ( R 2 = 0.67) included weight, age, sex, and CRP. Two alternative models with acceptable predictability ( R 2 = 0.66) were derived with glycosylated hemoglobin or serum creatinine. Based on Bland-Altman analyses, the maintenance hemodialysis equation that included CRP had the best precision, with the highest proportion of participants' predicted energy expenditure classified as accurate (61.2%) and with the lowest number of individuals with underestimation or overestimation. This study confirms disease-specific factors as key determinants of mREE in patients on MHD and provides a preliminary predictive energy equation. Further prospective research is necessary to test the reliability and validity of this equation across diverse populations of patients who are receiving MHD.
Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki
2014-12-01
This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.
Modeling and prediction of flotation performance using support vector regression
Directory of Open Access Journals (Sweden)
Despotović Vladimir
2017-01-01
Full Text Available Continuous efforts have been made in recent year to improve the process of paper recycling, as it is of critical importance for saving the wood, water and energy resources. Flotation deinking is considered to be one of the key methods for separation of ink particles from the cellulose fibres. Attempts to model the flotation deinking process have often resulted in complex models that are difficult to implement and use. In this paper a model for prediction of flotation performance based on Support Vector Regression (SVR, is presented. Representative data samples were created in laboratory, under a variety of practical control variables for the flotation deinking process, including different reagents, pH values and flotation residence time. Predictive model was created that was trained on these data samples, and the flotation performance was assessed showing that Support Vector Regression is a promising method even when dataset used for training the model is limited.
Validation of the mortality prediction equation for damage control ...
African Journals Online (AJOL)
, preoperative lowest pH and lowest core body temperature to derive an equation for the purpose of predicting mortality in damage control surgery. It was shown to reliably predict death despite damage control surgery. The equation derivation ...
Predicting Performance on MOOC Assessments using Multi-Regression Models
Ren, Zhiyun; Rangwala, Huzefa; Johri, Aditya
2016-01-01
The past few years has seen the rapid growth of data min- ing approaches for the analysis of data obtained from Mas- sive Open Online Courses (MOOCs). The objectives of this study are to develop approaches to predict the scores a stu- dent may achieve on a given grade-related assessment based on information, considered as prior performance or prior ac- tivity in the course. We develop a personalized linear mul- tiple regression (PLMR) model to predict the grade for a student, prior to attempt...
Directory of Open Access Journals (Sweden)
Jichul Ryu
2016-04-01
Full Text Available In this study, 52 asymptotic Curve Number (CN regression equations were developed for combinations of representative land covers and hydrologic soil groups. In addition, to overcome the limitations of the original Long-term Hydrologic Impact Assessment (L-THIA model when it is applied to larger watersheds, a watershed-scale L-THIA Asymptotic CN (ACN regression equation model (watershed-scale L-THIA ACN model was developed by integrating the asymptotic CN regressions and various modules for direct runoff/baseflow/channel routing. The watershed-scale L-THIA ACN model was applied to four watersheds in South Korea to evaluate the accuracy of its streamflow prediction. The coefficient of determination (R2 and Nash–Sutcliffe Efficiency (NSE values for observed versus simulated streamflows over intervals of eight days were greater than 0.6 for all four of the watersheds. The watershed-scale L-THIA ACN model, including the asymptotic CN regression equation method, can simulate long-term streamflow sufficiently well with the ten parameters that have been added for the characterization of streamflow.
Li, Spencer D.
2011-01-01
Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…
Intelligent Quality Prediction Using Weighted Least Square Support Vector Regression
Yu, Yaojun
A novel quality prediction method with mobile time window is proposed for small-batch producing process based on weighted least squares support vector regression (LS-SVR). The design steps and learning algorithm are also addressed. In the method, weighted LS-SVR is taken as the intelligent kernel, with which the small-batch learning is solved well and the nearer sample is set a larger weight, while the farther is set the smaller weight in the history data. A typical machining process of cutting bearing outer race is carried out and the real measured data are used to contrast experiment. The experimental results demonstrate that the prediction accuracy of the weighted LS-SVR based model is only 20%-30% that of the standard LS-SVR based one in the same condition. It provides a better candidate for quality prediction of small-batch producing process.
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate
Directory of Open Access Journals (Sweden)
Minh Vu Trieu
2017-03-01
Full Text Available This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS, Brazilian tensile strength (BTS, rock brittleness index (BI, the distance between planes of weakness (DPW, and the alpha angle (Alpha between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP. Four (4 statistical regression models (two linear and two nonlinear are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2 of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate
Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno
2017-03-01
This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2) of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.
DEFF Research Database (Denmark)
D'Souza, Sonia; Rasmussen, John; Schwirtz, Ansgar
2012-01-01
and valuable ergonomic tool. Objective: To investigate age and gender effects on the torque-producing ability in the knee and elbow in older adults. To create strength scaled equations based on age, gender, upper/lower limb lengths and masses using multiple linear regression. To reduce the number of dependent...... flexors. Results: Males were signifantly stronger than females across all age groups. Elbow peak torque (EPT) was better preserved from 60s to 70s whereas knee peak torque (KPT) reduced significantly (PGender, thigh mass and age best...... predicted KPT (R2=0.60). Gender, forearm mass and age best predicted EPT (R2=0.75). Good crossvalidation was established for both elbow and knee models. Conclusion: This cross-sectional study of muscle strength created and validated strength scaled equations of EPT and KPT using only gender, segment mass...
Hierarchical Neural Regression Models for Customer Churn Prediction
Directory of Open Access Journals (Sweden)
Golshan Mohammadi
2013-01-01
Full Text Available As customers are the main assets of each industry, customer churn prediction is becoming a major task for companies to remain in competition with competitors. In the literature, the better applicability and efficiency of hierarchical data mining techniques has been reported. This paper considers three hierarchical models by combining four different data mining techniques for churn prediction, which are backpropagation artificial neural networks (ANN, self-organizing maps (SOM, alpha-cut fuzzy c-means (α-FCM, and Cox proportional hazards regression model. The hierarchical models are ANN + ANN + Cox, SOM + ANN + Cox, and α-FCM + ANN + Cox. In particular, the first component of the models aims to cluster data in two churner and nonchurner groups and also filter out unrepresentative data or outliers. Then, the clustered data as the outputs are used to assign customers to churner and nonchurner groups by the second technique. Finally, the correctly classified data are used to create Cox proportional hazards model. To evaluate the performance of the hierarchical models, an Iranian mobile dataset is considered. The experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Types I and II errors, RMSE, and MAD metrics. In addition, the α-FCM + ANN + Cox model significantly performs better than the two other hierarchical models.
Staley, Dennis M.; Negri, Jacquelyn A.; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.
2016-06-30
Wildfire can significantly alter the hydrologic response of a watershed to the extent that even modest rainstorms can generate dangerous flash floods and debris flows. To reduce public exposure to hazard, the U.S. Geological Survey produces post-fire debris-flow hazard assessments for select fires in the western United States. We use publicly available geospatial data describing basin morphology, burn severity, soil properties, and rainfall characteristics to estimate the statistical likelihood that debris flows will occur in response to a storm of a given rainfall intensity. Using an empirical database and refined geospatial analysis methods, we defined new equations for the prediction of debris-flow likelihood using logistic regression methods. We showed that the new logistic regression model outperformed previous models used to predict debris-flow likelihood.
Sargolzaie, Narjes; Miri-Moghaddam, Ebrahim
2014-01-01
The most common differential diagnosis of β-thalassemia (β-thal) trait is iron deficiency anemia. Several red blood cell equations were introduced during different studies for differential diagnosis between β-thal trait and iron deficiency anemia. Due to genetic variations in different regions, these equations cannot be useful in all population. The aim of this study was to determine a native equation with high accuracy for differential diagnosis of β-thal trait and iron deficiency anemia for the Sistan and Baluchestan population by logistic regression analysis. We selected 77 iron deficiency anemia and 100 β-thal trait cases. We used binary logistic regression analysis and determined best equations for probability prediction of β-thal trait against iron deficiency anemia in our population. We compared diagnostic values and receiver operative characteristic (ROC) curve related to this equation and another 10 published equations in discriminating β-thal trait and iron deficiency anemia. The binary logistic regression analysis determined the best equation for best probability prediction of β-thal trait against iron deficiency anemia with area under curve (AUC) 0.998. Based on ROC curves and AUC, Green & King, England & Frazer, and then Sirdah indices, respectively, had the most accuracy after our equation. We suggest that to get the best equation and cut-off in each region, one needs to evaluate specific information of each region, specifically in areas where populations are homogeneous, to provide a specific formula for differentiating between β-thal trait and iron deficiency anemia.
Prediction equations for spirometry in adults from northern India.
Chhabra, S K; Kumar, R; Gupta, U; Rahman, M; Dash, D J
2014-01-01
Most of the Indian studies on prediction equations for spirometry in adults are several decades old and may have lost their utility as these were carried out with equipment and standardisation protocols that have since changed. Their validity is further questionable as the lung health of the population is likely to have changed over time. To develop prediction equations for spirometry in adults of north Indian origin using the 2005 American Thoracic Society/European Respiratory Society (ATS/ERS) recommendations on standardisation. Normal healthy non-smoker subjects, both males and females, aged 18 years and above underwent spirometry using a non-heated Fleisch Pneumotach spirometer calibrated daily. The dataset was randomly divided into training (70%) and test (30%) sets and the former was used to develop the equations. These were validated on the test data set. Prediction equations were developed separately for males and females for forced vital capacity (FVC), forced expiratory volume in first second (FEV1), FEV1/FVC ratio, and instantaneous expiratory flow rates using multiple linear regression procedure with different transformations of dependent and/or independent variables to achieve the best-fitting models for the data. The equations were compared with the previous ones developed in the same population in the 1960s. In all, 685 (489 males, 196 females) subjects performed spirometry that was technically acceptable and repeatable. All the spirometry parameters were significantly higher among males except the FEV1/FVC ratio that was significantly higher in females. Overall, age had a negative relationship with the spirometry parameters while height was positively correlated with each, except for the FEV1/FVC ratio that was related only to age. Weight was included in the models for FVC, forced expiratory flow (FEF75) and FEV1/FVC ratio in males, but its contribution was very small. Standard errors of estimate were provided to enable calculation of the lower
Predicting company growth using logistic regression and neural networks
Directory of Open Access Journals (Sweden)
Marijana Zekić-Sušac
2016-12-01
Full Text Available The paper aims to establish an efficient model for predicting company growth by leveraging the strengths of logistic regression and neural networks. A real dataset of Croatian companies was used which described the relevant industry sector, financial ratios, income, and assets in the input space, with a dependent binomial variable indicating whether a company had high-growth if it had annualized growth in assets by more than 20% a year over a three-year period. Due to a large number of input variables, factor analysis was performed in the pre -processing stage in order to extract the most important input components. Building an efficient model with a high classification rate and explanatory ability required application of two data mining methods: logistic regression as a parametric and neural networks as a non -parametric method. The methods were tested on the models with and without variable reduction. The classification accuracy of the models was compared using statistical tests and ROC curves. The results showed that neural networks produce a significantly higher classification accuracy in the model when incorporating all available variables. The paper further discusses the advantages and disadvantages of both approaches, i.e. logistic regression and neural networks in modelling company growth. The suggested model is potentially of benefit to investors and economic policy makers as it provides support for recognizing companies with growth potential, especially during times of economic downturn.
Regression Model to Predict Global Solar Irradiance in Malaysia
Directory of Open Access Journals (Sweden)
Hairuniza Ahmed Kutty
2015-01-01
Full Text Available A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitate, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE, mean bias error (MBE, and the coefficient of determination (R2 with other models available from literature studies. Seven models based on single parameters (PM1 to PM7 and five multiple-parameter models (PM7 to PM12 are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R2 ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included into multiple-parameter models although it performs fairly well in single-parameter prediction models.
Expanded prediction equations of human sweat loss and water needs.
Gonzalez, R R; Cheuvront, S N; Montain, S J; Goodman, D A; Blanchard, L A; Berglund, L G; Sawka, M N
2009-08-01
The Institute of Medicine expressed a need for improved sweating rate (msw) prediction models that calculate hourly and daily water needs based on metabolic rate, clothing, and environment. More than 25 years ago, the original Shapiro prediction equation (OSE) was formulated as msw (g.m(-2).h(-1))=27.9.Ereq.(Emax)(-0.455), where Ereq is required evaporative heat loss and Emax is maximum evaporative power of the environment; OSE was developed for a limited set of environments, exposures times, and clothing systems. Recent evidence shows that OSE often overpredicts fluid needs. Our study developed a corrected OSE and a new msw prediction equation by using independent data sets from a wide range of environmental conditions, metabolic rates (rest to losses were carefully measured in 101 volunteers (80 males and 21 females; >500 observations) by using a variety of metabolic rates over a range of environmental conditions (ambient temperature, 15-46 degrees C; water vapor pressure, 0.27-4.45 kPa; wind speed, 0.4-2.5 m/s), clothing, and equipment combinations and durations (2-8 h). Data are expressed as grams per square meter per hour and were analyzed using fuzzy piecewise regression. OSE overpredicted sweating rates (Pdata (21 males and 9 females; >200 observations). OSEC and PW were more accurate predictors of sweating rate (58 and 65% more accurate, Perror (standard error estimate<100 g.m(-2).h(-1)) for conditions both within and outside the original OSE domain of validity. The new equations provide for more accurate sweat predictions over a broader range of conditions with applications to public health, military, occupational, and sports medicine settings.
Estimation of monthly solar exposure on horizontal surface by Angstrom-type regression equation
International Nuclear Information System (INIS)
Ravanshid, S.H.
1981-01-01
To obtain solar flux intensity, solar radiation measuring instruments are the best. In the absence of instrumental data there are other meteorological measurements which are related to solar energy and also it is possible to use empirical relationships to estimate solar flux intensit. One of these empirical relationships to estimate monthly averages of total solar radiation on a horizontal surface is the modified angstrom-type regression equation which has been employed in this report in order to estimate the solar flux intensity on a horizontal surface for Tehran. By comparing the results of this equation with four years measured valued by Tehran's meteorological weather station the values of meteorological constants (a,b) in the equation were obtained for Tehran. (author)
Rock, N. M. S.; Duffy, T. R.
REGRES allows a range of regression equations to be calculated for paired sets of data values in which both variables are subject to error (i.e. neither is the "independent" variable). Nonparametric regressions, based on medians of all possible pairwise slopes and intercepts, are treated in detail. Estimated slopes and intercepts are output, along with confidence limits, Spearman and Kendall rank correlation coefficients. Outliers can be rejected with user-determined stringency. Parametric regressions can be calculated for any value of λ (the ratio of the variances of the random errors for y and x)—including: (1) major axis ( λ = 1); (2) reduced major axis ( λ = variance of y/variance of x); (3) Y on Xλ = infinity; or (4) X on Y ( λ = 0) solutions. Pearson linear correlation coefficients also are output. REGRES provides an alternative to conventional isochron assessment techniques where bivariate normal errors cannot be assumed, or weighting methods are inappropriate.
Revised predictive equations for salt intrusion modelling in estuaries
Gisen, J.I.A.; Savenije, H.H.G.; Nijzink, R.C.
2015-01-01
For one-dimensional salt intrusion models to be predictive, we need predictive equations to link model parameters to observable hydraulic and geometric variables. The one-dimensional model of Savenije (1993b) made use of predictive equations for the Van der Burgh coefficient $K$ and the dispersion
Regression Models for Predicting Force Coefficients of Aerofoils
Directory of Open Access Journals (Sweden)
Mohammed ABDUL AKBAR
2015-09-01
Full Text Available Renewable sources of energy are attractive and advantageous in a lot of different ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, Vertical axis wind turbines (VAWTs have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of lift and drag coefficients which is a function of shape of the aerofoil, ‘angle of attack’ of wind and Reynolds’s number of flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for the Reynolds number in the range of those experienced by VAWTs. The volume of experimental data thus obtained is huge. The current paper discusses three Regression analysis models developed wherein lift and drag coefficients can be found out using simple formula without having to deal with the bulk of the data. Drag coefficients and Lift coefficients were being successfully estimated by regression models with R2 values as high as 0.98.
Is adult gait less susceptible than paediatric gait to hip joint centre regression equation error?
Kiernan, D; Hosking, J; O'Brien, T
2016-03-01
Hip joint centre (HJC) regression equation error during paediatric gait has recently been shown to have clinical significance. In relation to adult gait, it has been inferred that comparable errors with children in absolute HJC position may in fact result in less significant kinematic and kinetic error. This study investigated the clinical agreement of three commonly used regression equation sets (Bell et al., Davis et al. and Orthotrak) for adult subjects against the equations of Harrington et al. The relationship between HJC position error and subject size was also investigated for the Davis et al. set. Full 3-dimensional gait analysis was performed on 12 healthy adult subjects with data for each set compared to Harrington et al. The Gait Profile Score, Gait Variable Score and GDI-kinetic were used to assess clinical significance while differences in HJC position between the Davis and Harrington sets were compared to leg length and subject height using regression analysis. A number of statistically significant differences were present in absolute HJC position. However, all sets fell below the clinically significant thresholds (GPS <1.6°, GDI-Kinetic <3.6 points). Linear regression revealed a statistically significant relationship for both increasing leg length and increasing subject height with decreasing error in anterior/posterior and superior/inferior directions. Results confirm a negligible clinical error for adult subjects suggesting that any of the examined sets could be used interchangeably. Decreasing error with both increasing leg length and increasing subject height suggests that the Davis set should be used cautiously on smaller subjects. Copyright © 2016 Elsevier B.V. All rights reserved.
Validity of one-repetition maximum predictive equations in men with spinal cord injury.
Ribeiro Neto, F; Guanais, P; Dornelas, E; Coutinho, A C B; Costa, R R G
2017-10-01
Cross-sectional study. The study aimed (a) to test the cross-validation of current one-repetition maximum (1RM) predictive equations in men with spinal cord injury (SCI); (b) to compare the current 1RM predictive equations to a newly developed equation based on the 4- to 12-repetition maximum test (4-12RM). SARAH Rehabilitation Hospital Network, Brasilia, Brazil. Forty-five men aged 28.0 years with SCI between C6 and L2 causing complete motor impairment were enrolled in the study. Volunteers were tested, in a random order, in 1RM test or 4-12RM with 2-3 interval days. Multiple regression analysis was used to generate an equation for predicting 1RM. There were no significant differences between 1RM test and the current predictive equations. ICC values were significant and were classified as excellent for all current predictive equations. The predictive equation of Lombardi presented the best Bland-Altman results (0.5 kg and 12.8 kg for mean difference and interval range around the differences, respectively). The two created equation models for 1RM demonstrated the same and a high adjusted R 2 (0.971, Ppredictive equations are accurate to assess individuals with SCI at the bench press exercise. However, the predictive equation of Lombardi presented the best associated cross-validity results. A specific 1RM prediction equation was also elaborated for individuals with SCI. The created equation should be tested in order to verify whether it presents better accuracy than the current ones.
Evaluation of abutment scour prediction equations with field data
Benedict, S.T.; Deshpande, N.; Aziz, N.M.
2007-01-01
The U.S. Geological Survey, in cooperation with FHWA, compared predicted abutment scour depths, computed with selected predictive equations, with field observations collected at 144 bridges in South Carolina and at eight bridges from the National Bridge Scour Database. Predictive equations published in the 4th edition of Evaluating Scour at Bridges (Hydraulic Engineering Circular 18) were used in this comparison, including the original Froehlich, the modified Froehlich, the Sturm, the Maryland, and the HIRE equations. The comparisons showed that most equations tended to provide conservative estimates of scour that at times were excessive (as large as 158 ft). Equations also produced underpredictions of scour, but with less frequency. Although the equations provide an important resource for evaluating abutment scour at bridges, the results of this investigation show the importance of using engineering judgment in conjunction with these equations.
Wheat flour dough Alveograph characteristics predicted by Mixolab regression models.
Codină, Georgiana Gabriela; Mironeasa, Silvia; Mironeasa, Costel; Popa, Ciprian N; Tamba-Berehoiu, Radiana
2012-02-01
In Romania, the Alveograph is the most used device to evaluate the rheological properties of wheat flour dough, but lately the Mixolab device has begun to play an important role in the breadmaking industry. These two instruments are based on different principles but there are some correlations that can be found between the parameters determined by the Mixolab and the rheological properties of wheat dough measured with the Alveograph. Statistical analysis on 80 wheat flour samples using the backward stepwise multiple regression method showed that Mixolab values using the ‘Chopin S’ protocol (40 samples) and ‘Chopin + ’ protocol (40 samples) can be used to elaborate predictive models for estimating the value of the rheological properties of wheat dough: baking strength (W), dough tenacity (P) and extensibility (L). The correlation analysis confirmed significant findings (P 0.70 for P, R²(adjusted) > 0.70 for W and R²(adjusted) > 0.38 for L, at a 95% confidence interval. Copyright © 2011 Society of Chemical Industry.
Mathur, Praveen; Sharma, Sarita; Soni, Bhupendra
2010-01-01
In the present work, an attempt is made to formulate multiple regression equations using all possible regressions method for groundwater quality assessment of Ajmer-Pushkar railway line region in pre- and post-monsoon seasons. Correlation studies revealed the existence of linear relationships (r 0.7) for electrical conductivity (EC), total hardness (TH) and total dissolved solids (TDS) with other water quality parameters. The highest correlation was found between EC and TDS (r = 0.973). EC showed highly significant positive correlation with Na, K, Cl, TDS and total solids (TS). TH showed highest correlation with Ca and Mg. TDS showed significant correlation with Na, K, SO4, PO4 and Cl. The study indicated that most of the contamination present was water soluble or ionic in nature. Mg was present as MgCl2; K mainly as KCl and K2SO4, and Na was present as the salts of Cl, SO4 and PO4. On the other hand, F and NO3 showed no significant correlations. The r2 values and F values (at 95% confidence limit, alpha = 0.05) for the modelled equations indicated high degree of linearity among independent and dependent variables. Also the error % between calculated and experimental values was contained within +/- 15% limit.
Quirk, Meghan E.; Schmotzer, Brian J.; Schmotzer, Brian J.; Singh, Rani H.
2010-01-01
Resting energy expenditure (REE) is often used to estimate total energy needs. The Schofield equation based on weight and height has been reported to underestimate REE in female children with phenylketonuria (PKU). The objective of this observational, cross-sectional study was to evaluate the agreement of measured REE with predicted REE for female adolescents with PKU. A total of 36 females (aged 11.5-18.7 years) with PKU attending Emory University’s Metabolic Camp (June 2002 – June 2008) underwent indirect calorimetry. Measured REE was compared to six predictive equations using paired Student’s t-tests, regression-based analysis, and assessment of clinical accuracy. The differences between measured and predicted REE were modeled against clinical parameters to determine to if a relationship existed. All six selected equations significantly under predicted measured REE (P< 0.005). The Schofield equation based on weight had the greatest level of agreement, with the lowest mean prediction bias (144 kcal) and highest concordance correlation coefficient (0.626). However, the Schofield equation based on weight lacked clinical accuracy, predicting measured REE within ±10% in only 14 of 36 participants. Clinical parameters were not associated with bias for any of the equations. Predictive equations underestimated measured REE in this group of female adolescents with PKU. Currently, there is no accurate and precise alternative for indirect calorimetry in this population. PMID:20497783
Prediction accuracy and stability of regression with optimal scaling transformations
Kooij, van der Anita J.
2007-01-01
The central topic of this thesis is the CATREG approach to nonlinear regression. This approach finds optimal quantifications for categorical variables and/or nonlinear transformations for numerical variables in regression analysis. (CATREG is implemented in SPSS Categories by the author of the
Prediction equations of forced oscillation technique: the insidious role of collinearity.
Narchi, Hassib; AlBlooshi, Afaf
2018-03-27
Many studies have reported reference data for forced oscillation technique (FOT) in healthy children. The prediction equation of FOT parameters were derived from a multivariable regression model examining the effect of age, gender, weight and height on each parameter. As many of these variables are likely to be correlated, collinearity might have affected the accuracy of the model, potentially resulting in misleading, erroneous or difficult to interpret conclusions.The aim of this work was: To review all FOT publications in children since 2005 to analyze whether collinearity was considered in the construction of the published prediction equations. Then to compare these prediction equations with our own study. And to analyse, in our study, how collinearity between the explanatory variables might affect the predicted equations if it was not considered in the model. The results showed that none of the ten reviewed studies had stated whether collinearity was checked for. Half of the reports had also included in their equations variables which are physiologically correlated, such as age, weight and height. The predicted resistance varied by up to 28% amongst these studies. And in our study, multicollinearity was identified between the explanatory variables initially considered for the regression model (age, weight and height). Ignoring it would have resulted in inaccuracies in the coefficients of the equation, their signs (positive or negative), their 95% confidence intervals, their significance level and the model goodness of fit. In Conclusion with inaccurately constructed and improperly reported models, understanding the results and reproducing the models for future research might be compromised.
Equations for predicting biomass of six introduced tree species, island of Hawaii
Thomas H. Schukrt; Robert F. Strand; Thomas G. Cole; Katharine E. McDuffie
1988-01-01
Regression equations to predict total and stem-only above-ground dry biomass for six species (Acacia melanoxylon, Albizio falcataria, Eucalyptus globulus, E. grandis, E. robusta, and E. urophylla) were developed by felling and measuring 2- to 6-year-old...
Buchner, Florian; Wasem, Jürgen; Schillo, Sonja
2017-01-01
Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group-split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R 2 from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R 2 improvement detected is only marginal. According to the sample level performance measures used, not involving a considerable number of morbidity interactions forms no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd. Copyright © 2015 John Wiley & Sons, Ltd.
Biomass estimates of freshwater zooplankton from length-carbon regression equations
Directory of Open Access Journals (Sweden)
Patrizia COMOLI
2000-02-01
Full Text Available We present length/carbon regression equations of zooplankton species collected from Lake Maggiore (N. Italy during 1992. The results are discussed in terms of the environmental factors, e.g. food availability, predation, controlling biomass production of particle- feeders and predators in the pelagic system of lakes. The marked seasonality in the length-standardized carbon content of Daphnia, and its time-specific trend suggest that from spring onward food availability for Daphnia population may be regarded as a simple decay function. Seasonality does not affect the carbon content/unit length of the two predator Cladocera Leptodora kindtii and Bythotrephes longimanus. Predation is probably the most important regulating factor for the seasonal dynamics of their carbon biomass. The existence of a constant factor to convert the diameter of Conochilus colonies into carbon seems reasonable for an organism whose population comes on quickly and just as quickly disappears.
Ground Motion Prediction Equations Empowered by Stress Drop Measurement
Miyake, H.; Oth, A.
2015-12-01
Significant variation of stress drop is a crucial issue for ground motion prediction equations and probabilistic seismic hazard assessment, since only a few ground motion prediction equations take into account stress drop. In addition to average and sigma studies of stress drop and ground motion prediction equations (e.g., Cotton et al., 2013; Baltay and Hanks, 2014), we explore 1-to-1 relationship for each earthquake between stress drop and between-event residual of a ground motion prediction equation. We used the stress drop dataset of Oth (2013) for Japanese crustal earthquakes ranging 0.1 to 100 MPa and K-NET/KiK-net ground motion dataset against for several ground motion prediction equations with volcanic front treatment. Between-event residuals for ground accelerations and velocities are generally coincident with stress drop, as investigated by seismic intensity measures of Oth et al. (2015). Moreover, we found faster attenuation of ground acceleration and velocities for large stress drop events for the similar fault distance range and focal depth. It may suggest an alternative parameterization of stress drop to control attenuation distance rate for ground motion prediction equations. We also investigate 1-to-1 relationship and sigma for regional/national-scale stress drop variation and current national-scale ground motion equations.
Shield Optimization and Formulation of Regression Equations for Split-Ring Resonator
Directory of Open Access Journals (Sweden)
Tahir Ejaz
2016-01-01
Full Text Available Microwave resonators are widely used for numerous applications including communication, biomedical and chemical applications, material testing, and food grading. Split-ring resonators in both planar and nonplanar forms are a simple structure which has been in use for several decades. This type of resonator is characterized with low cost, ease of fabrication, moderate quality factor, low external noise interference, high stability, and so forth. Due to these attractive features and ease in handling, nonplanar form of structure has been utilized for material characterization in 1–5 GHz range. Resonant frequency and quality factor are two important parameters for determination of material properties utilizing perturbation theory. Shield made of conducting material is utilized to enclose split-ring resonator which enhances quality factor. This work presents a novel technique to develop shield around a predesigned nonplanar split-ring resonator to yield optimized quality factor. Based on this technique and statistical analysis regression equations have also been formulated for resonant frequency and quality factor which is a major outcome of this work. These equations quantify dependence of output parameters on various factors of shield made of different materials. Such analysis is instrumental in development of devices/designs where improved/optimum result is required.
Predicting logging residues: an interim equation for Appalachian oak sawtimber
A. Jeff Martin
1975-01-01
An equation, using dbh, dbh², bole length, and sawlog height to predict the cubic-foot volume of logging residue per tree, was developed from data collected on 36 mixed oaks in southwestern Virginia. The equation produced reliable results for small sawtimber trees, but additional research is needed for other species, sites, and utilization practices.
Predicting Career Advancement with Structural Equation Modelling
Heimler, Ronald; Rosenberg, Stuart; Morote, Elsa-Sofia
2012-01-01
Purpose: The purpose of this paper is to use the authors' prior findings concerning basic employability skills in order to determine which skills best predict career advancement potential. Design/methodology/approach: Utilizing survey responses of human resource managers, the employability skills showing the largest relationships to career…
International Nuclear Information System (INIS)
Janssen, I.; Stebbings, J.H.
1990-01-01
In environmental epidemiology, trace and toxic substance concentrations frequently have very highly skewed distributions ranging over one or more orders of magnitude, and prediction by conventional regression is often poor. Classification and Regression Tree Analysis (CART) is an alternative in such contexts. To compare the techniques, two Pennsylvania data sets and three independent variables are used: house radon progeny (RnD) and gamma levels as predicted by construction characteristics in 1330 houses; and ∼200 house radon (Rn) measurements as predicted by topographic parameters. CART may identify structural variables of interest not identified by conventional regression, and vice versa, but in general the regression models are similar. CART has major advantages in dealing with other common characteristics of environmental data sets, such as missing values, continuous variables requiring transformations, and large sets of potential independent variables. CART is most useful in the identification and screening of independent variables, greatly reducing the need for cross-tabulations and nested breakdown analyses. There is no need to discard cases with missing values for the independent variables because surrogate variables are intrinsic to CART. The tree-structured approach is also independent of the scale on which the independent variables are measured, so that transformations are unnecessary. CART identifies important interactions as well as main effects. The major advantages of CART appear to be in exploring data. Once the important variables are identified, conventional regressions seem to lead to results similar but more interpretable by most audiences. 12 refs., 8 figs., 10 tabs
Prediction of diffuse solar irradiance using machine learning and multivariable regression
International Nuclear Information System (INIS)
Lou, Siwei; Li, Danny H.W.; Lam, Joseph C.; Chan, Wilco W.H.
2016-01-01
Highlights: • 54.9% of the annual global irradiance is composed by its diffuse part in HK. • Hourly diffuse irradiance was predicted by accessible variables. • The importance of variable in prediction was assessed by machine learning. • Simple prediction equations were developed with the knowledge of variable importance. - Abstract: The paper studies the horizontal global, direct-beam and sky-diffuse solar irradiance data measured in Hong Kong from 2008 to 2013. A machine learning algorithm was employed to predict the horizontal sky-diffuse irradiance and conduct sensitivity analysis for the meteorological variables. Apart from the clearness index (horizontal global/extra atmospheric solar irradiance), we found that predictors including solar altitude, air temperature, cloud cover and visibility are also important in predicting the diffuse component. The mean absolute error (MAE) of the logistic regression using the aforementioned predictors was less than 21.5 W/m"2 and 30 W/m"2 for Hong Kong and Denver, USA, respectively. With the systematic recording of the five variables for more than 35 years, the proposed model would be appropriate to estimate of long-term diffuse solar radiation, study climate change and develope typical meteorological year in Hong Kong and places with similar climates.
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separate from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind- off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
Guo, Huey-Ming; Shyu, Yea-Ing Lotus; Chang, Her-Kun
2006-01-01
In this article, the authors provide an overview of a research method to predict quality of care in home health nursing data set. The results of this study can be visualized through classification an regression tree (CART) graphs. The analysis was more effective, and the results were more informative since the home health nursing dataset was analyzed with a combination of the logistic regression and CART, these two techniques complete each other. And the results more informative that more patients' characters were related to quality of care in home care. The results contributed to home health nurse predict patient outcome in case management. Improved prediction is needed for interventions to be appropriately targeted for improved patient outcome and quality of care.
Evaluation of random forest regression for prediction of breeding ...
Indian Academy of Sciences (India)
have been widely used for prediction of breeding values of genotypes from genomewide association studies. However, appli- ... tolerance to biotic and abiotic stresses. But due to ..... School, IARI, New Delhi, during his Ph.D. References.
Wanvarie, Samkaew; Sathapatayavongs, Boonmee
2007-09-01
The aim of this paper was to assess factors that predict students' performance in the Medical Licensing Examination of Thailand (MLET) Step1 examination. The hypothesis was that demographic factors and academic records would predict the students' performance in the Step1 Licensing Examination. A logistic regression analysis of demographic factors (age, sex and residence) and academic records [high school grade point average (GPA), National University Entrance Examination Score and GPAs of the pre-clinical years] with the MLET Step1 outcome was accomplished using the data of 117 third-year Ramathibodi medical students. Twenty-three (19.7%) students failed the MLET Step1 examination. Stepwise logistic regression analysis showed that the significant predictors of MLET Step1 success/failure were residence background and GPAs of the second and third preclinical years. For students whose sophomore and third-year GPAs increased by an average of 1 point, the odds of passing the MLET Step1 examination increased by a factor of 16.3 and 12.8 respectively. The minimum GPAs for students from urban and rural backgrounds to pass the examination were estimated from the equation (2.35 vs 2.65 from 4.00 scale). Students from rural backgrounds and/or low-grade point averages in their second and third preclinical years of medical school are at risk of failing the MLET Step1 examination. They should be given intensive tutorials during the second and third pre-clinical years.
J.B. St. Clair
1993-01-01
Logarithmic regression equations were developed to predict component biomass and leaf area for an 18-yr-old genetic test of Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco var. menziesii) based on stem diameter or cross-sectional sapwood area. Equations did not differ among open-pollinated families in slope, but intercepts...
Reservoir rock permeability prediction using support vector regression in an Iranian oil field
International Nuclear Information System (INIS)
Saffarzadeh, Sadegh; Shadizadeh, Seyed Reza
2012-01-01
Reservoir permeability is a critical parameter for the evaluation of hydrocarbon reservoirs. It is often measured in the laboratory from reservoir core samples or evaluated from well test data. The prediction of reservoir rock permeability utilizing well log data is important because the core analysis and well test data are usually only available from a few wells in a field and have high coring and laboratory analysis costs. Since most wells are logged, the common practice is to estimate permeability from logs using correlation equations developed from limited core data; however, these correlation formulae are not universally applicable. Recently, support vector machines (SVMs) have been proposed as a new intelligence technique for both regression and classification tasks. The theory has a strong mathematical foundation for dependence estimation and predictive learning from finite data sets. The ultimate test for any technique that bears the claim of permeability prediction from well log data is the accurate and verifiable prediction of permeability for wells where only the well log data are available. The main goal of this paper is to develop the SVM method to obtain reservoir rock permeability based on well log data. (paper)
Development of 1RM Prediction Equations for Bench Press in Moderately Trained Men.
Macht, Jordan W; Abel, Mark G; Mullineaux, David R; Yates, James W
2016-10-01
Macht, JW, Abel, MG, Mullineaux, DR, and Yates, JW. Development of 1RM prediction equations for bench press in moderately trained men. J Strength Cond Res 30(10): 2901-2906, 2016-There are a variety of established 1 repetition maximum (1RM) prediction equations, however, very few prediction equations use anthropometric characteristics exclusively or in part, to estimate 1RM strength. Therefore, the purpose of this study was to develop an original 1RM prediction equation for bench press using anthropometric and performance characteristics in moderately trained male subjects. Sixty male subjects (21.2 ± 2.4 years) completed a 1RM bench press and were randomly assigned a load to complete as many repetitions as possible. In addition, body composition, upper-body anthropometric characteristics, and handgrip strength were assessed. Regression analysis was used to develop a performance-based 1RM prediction equation: 1RM = 1.20 repetition weight + 2.19 repetitions to fatigue - 0.56 biacromial width (cm) + 9.6 (R = 0.99, standard error of estimate [SEE] = 3.5 kg). Regression analysis to develop a nonperformance-based 1RM prediction equation yielded: 1RM (kg) = 0.997 cross-sectional area (CSA) (cm) + 0.401 chest circumference (cm) - 0.385%fat - 0.185 arm length (cm) + 36.7 (R = 0.81, SEE = 13.0 kg). The performance prediction equations developed in this study had high validity coefficients, minimal mean bias, and small limits of agreement. The anthropometric equations had moderately high validity coefficient but larger limits of agreement. The practical applications of this study indicate that the inclusion of anthropometric characteristics and performance variables produce a valid prediction equation for 1RM strength. In addition, the CSA of the arm uses a simple nonperformance method of estimating the lifter's 1RM. This information may be used to predict the starting load for a lifter performing a 1RM prediction protocol or a 1RM testing protocol.
Background or Experience? Using Logistic Regression to Predict College Retention
Synco, Tracee M.
2012-01-01
Tinto, Astin and countless others have researched the retention and attrition of students from college for more than thirty years. However, the six year graduation rate for all first-time full-time freshmen for the 2002 cohort was 57%. This study sought to determine the retention variables that predicted continued enrollment of entering freshmen…
prediction of concrete mix cost using modified regression theory
African Journals Online (AJOL)
Kambula
2013-07-02
Jul 2, 2013 ... one can predict the cost per cubic meter of concrete if the mix ratios are given. The model can also give possible mix ratios for a specified cost. Statistical tool was used to verify the adequacy of this model. The concrete cost analysis is based on the current market prices of concrete constituent materials.
Conditional mode regression: Application to functional time series prediction
Dabo-Niang, Sophie; Laksaci, Ali
2008-01-01
We consider $\\alpha$-mixing observations and deal with the estimation of the conditional mode of a scalar response variable $Y$ given a random variable $X$ taking values in a semi-metric space. We provide a convergence rate in $L^p$ norm of the estimator. A useful and typical application to functional times series prediction is given.
Adjustment of equations to predict the metabolizable energy of corn for meat type quails
Directory of Open Access Journals (Sweden)
Tiago Junior Pasquetti
2015-08-01
Full Text Available The metabolizable energy (ME determination for foods used in quail diets, through metabolism assays, takes time, infrastructure and financial resources, which makes the development of prediction equations based on proximal composition of foods to estimate the ME values of particular interest. The objective of this study was to adjust the prediction equations of metabolizable energy (ME of corn for quail. The chemical compositions of 12 maize varieties were determined and a metabolism assay was carried out in order to determine the apparent metabolizable energy (AME and nitrogen-corrected apparent metabolizable energy (AMEn of these corn varieties. The values of chemical composition, AME and AMEn, converted to dry matter, were used to adjust the prediction equations. The initial adjustment of simple and multiple linear regression of the AME and AMEn was performed using the values of crude protein (CP, ether extract (EE, neutral (NDF and acid (ADF detergent fiber, mineral matter (MM, calcium (Ca and phosphorus (P as regressors (full model. To adjust the prediction equations the statistical procedure of simple and multiple linear regression was used, with the technique of indirect elimination (Backward. There was adjustment of 10 prediction equations, in which five were for AME and another five for AMEn, the R² values of which ranged from 0.20 to 0.75 and from 0.21 to 0.78, respectively. For all adjusted equations, negative correlations for MM were observed, which may be related to its dilutive effect of the gross energy contained in corn. In conclusion, the equations that showed better adjustment were AME= 5605.46 - 385.074CP + 111.648EE + 48.1133NDF + 303.924ADF - 929.931MM (R²= 0.75 and AMEn= 5878.16 - 403.937CP + 81.9618EE + 41.8954NDF + 303.506FDA - 901.621MM (R²= 0.78.
A dynamic particle filter-support vector regression method for reliability prediction
International Nuclear Information System (INIS)
Wei, Zhao; Tao, Tao; ZhuoShu, Ding; Zio, Enrico
2013-01-01
Support vector regression (SVR) has been applied to time series prediction and some works have demonstrated the feasibility of its use to forecast system reliability. For accuracy of reliability forecasting, the selection of SVR's parameters is important. The existing research works on SVR's parameters selection divide the example dataset into training and test subsets, and tune the parameters on the training data. However, these fixed parameters can lead to poor prediction capabilities if the data of the test subset differ significantly from those of training. Differently, the novel method proposed in this paper uses particle filtering to estimate the SVR model parameters according to the whole measurement sequence up to the last observation instance. By treating the SVR training model as the observation equation of a particle filter, our method allows updating the SVR model parameters dynamically when a new observation comes. Because of the adaptability of the parameters to dynamic data pattern, the new PF–SVR method has superior prediction performance over that of standard SVR. Four application results show that PF–SVR is more robust than SVR to the decrease of the number of training data and the change of initial SVR parameter values. Also, even if there are trends in the test data different from those in the training data, the method can capture the changes, correct the SVR parameters and obtain good predictions. -- Highlights: •A dynamic PF–SVR method is proposed to predict the system reliability. •The method can adjust the SVR parameters according to the change of data. •The method is robust to the size of training data and initial parameter values. •Some cases based on both artificial and real data are studied. •PF–SVR shows superior prediction performance over standard SVR
Development and validation of a predictive equation for lean body mass in children and adolescents.
Foster, Bethany J; Platt, Robert W; Zemel, Babette S
2012-05-01
Lean body mass (LBM) is not easy to measure directly in the field or clinical setting. Equations to predict LBM from simple anthropometric measures, which account for the differing contributions of fat and lean to body weight at different ages and levels of adiposity, would be useful to both human biologists and clinicians. To develop and validate equations to predict LBM in children and adolescents across the entire range of the adiposity spectrum. Dual energy X-ray absorptiometry was used to measure LBM in 836 healthy children (437 females) and linear regression was used to develop sex-specific equations to estimate LBM from height, weight, age, body mass index (BMI) for age z-score and population ancestry. Equations were validated using bootstrapping methods and in a local independent sample of 332 children and in national data collected by NHANES. The mean difference between measured and predicted LBM was - 0.12% (95% limits of agreement - 11.3% to 8.5%) for males and - 0.14% ( - 11.9% to 10.9%) for females. Equations performed equally well across the entire adiposity spectrum, as estimated by BMI z-score. Validation indicated no over-fitting. LBM was predicted within 5% of measured LBM in the validation sample. The equations estimate LBM accurately from simple anthropometric measures.
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei
2013-01-01
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
Bayesian logistic regression approaches to predict incorrect DRG assignment.
Suleiman, Mani; Demirhan, Haydar; Boyd, Leanne; Girosi, Federico; Aksakalli, Vural
2018-05-07
Episodes of care involving similar diagnoses and treatments and requiring similar levels of resource utilisation are grouped to the same Diagnosis-Related Group (DRG). In jurisdictions which implement DRG based payment systems, DRGs are a major determinant of funding for inpatient care. Hence, service providers often dedicate auditing staff to the task of checking that episodes have been coded to the correct DRG. The use of statistical models to estimate an episode's probability of DRG error can significantly improve the efficiency of clinical coding audits. This study implements Bayesian logistic regression models with weakly informative prior distributions to estimate the likelihood that episodes require a DRG revision, comparing these models with each other and to classical maximum likelihood estimates. All Bayesian approaches had more stable model parameters than maximum likelihood. The best performing Bayesian model improved overall classification per- formance by 6% compared to maximum likelihood, with a 34% gain compared to random classification, respectively. We found that the original DRG, coder and the day of coding all have a significant effect on the likelihood of DRG error. Use of Bayesian approaches has improved model parameter stability and classification accuracy. This method has already lead to improved audit efficiency in an operational capacity.
An Ordered Regression Model to Predict Transit Passengers’ Behavioural Intentions
Energy Technology Data Exchange (ETDEWEB)
Oña, J. de; Oña, R. de; Eboli, L.; Forciniti, C.; Mazzulla, G.
2016-07-01
Passengers’ behavioural intentions after experiencing transit services can be viewed as signals that show if a customer continues to utilise a company’s service. Users’ behavioural intentions can depend on a series of aspects that are difficult to measure directly. More recently, transit passengers’ behavioural intentions have been just considered together with the concepts of service quality and customer satisfaction. Due to the characteristics of the ways for evaluating passengers’ behavioural intentions, service quality and customer satisfaction, we retain that this kind of issue could be analysed also by applying ordered regression models. This work aims to propose just an ordered probit model for analysing service quality factors that can influence passengers’ behavioural intentions towards the use of transit services. The case study is the LRT of Seville (Spain), where a survey was conducted in order to collect the opinions of the passengers about the existing transit service, and to have a measure of the aspects that can influence the intentions of the users to continue using the transit service in the future. (Author)
Multi-fidelity Gaussian process regression for prediction of random fields
Energy Technology Data Exchange (ETDEWEB)
Parussini, L. [Department of Engineering and Architecture, University of Trieste (Italy); Venturi, D., E-mail: venturi@ucsc.edu [Department of Applied Mathematics and Statistics, University of California Santa Cruz (United States); Perdikaris, P. [Department of Mechanical Engineering, Massachusetts Institute of Technology (United States); Karniadakis, G.E. [Division of Applied Mathematics, Brown University (United States)
2017-05-01
We propose a new multi-fidelity Gaussian process regression (GPR) approach for prediction of random fields based on observations of surrogate models or hierarchies of surrogate models. Our method builds upon recent work on recursive Bayesian techniques, in particular recursive co-kriging, and extends it to vector-valued fields and various types of covariances, including separable and non-separable ones. The framework we propose is general and can be used to perform uncertainty propagation and quantification in model-based simulations, multi-fidelity data fusion, and surrogate-based optimization. We demonstrate the effectiveness of the proposed recursive GPR techniques through various examples. Specifically, we study the stochastic Burgers equation and the stochastic Oberbeck–Boussinesq equations describing natural convection within a square enclosure. In both cases we find that the standard deviation of the Gaussian predictors as well as the absolute errors relative to benchmark stochastic solutions are very small, suggesting that the proposed multi-fidelity GPR approaches can yield highly accurate results.
Multi-fidelity Gaussian process regression for prediction of random fields
International Nuclear Information System (INIS)
Parussini, L.; Venturi, D.; Perdikaris, P.; Karniadakis, G.E.
2017-01-01
We propose a new multi-fidelity Gaussian process regression (GPR) approach for prediction of random fields based on observations of surrogate models or hierarchies of surrogate models. Our method builds upon recent work on recursive Bayesian techniques, in particular recursive co-kriging, and extends it to vector-valued fields and various types of covariances, including separable and non-separable ones. The framework we propose is general and can be used to perform uncertainty propagation and quantification in model-based simulations, multi-fidelity data fusion, and surrogate-based optimization. We demonstrate the effectiveness of the proposed recursive GPR techniques through various examples. Specifically, we study the stochastic Burgers equation and the stochastic Oberbeck–Boussinesq equations describing natural convection within a square enclosure. In both cases we find that the standard deviation of the Gaussian predictors as well as the absolute errors relative to benchmark stochastic solutions are very small, suggesting that the proposed multi-fidelity GPR approaches can yield highly accurate results.
Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.
2013-06-01
This study encompasses columnar ozone modelling in the peninsular Malaysia. Data of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)] data set, retrieved from NASA's Atmospheric Infrared Sounder (AIRS), for the entire period (2003-2008) was employed to develop models to predict the value of columnar ozone (O3) in study area. The combined method, which is based on using both multiple regressions combined with principal component analysis (PCA) modelling, was used to predict columnar ozone. This combined approach was utilized to improve the prediction accuracy of columnar ozone. Separate analysis was carried out for north east monsoon (NEM) and south west monsoon (SWM) seasons. The O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM season periods. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameter's variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to acquire subsets of the predictor variables to be comprised in the linear regression model of the atmospheric parameter's variables. It was found that the increase in columnar O3 value is associated with an increase in the values of AST, SSKT, AT, and CO and with a drop in the levels of CH4, H2Ovapour, RH, and MSP. The result of fitting the best models for the columnar O3 value using eight of the independent variables gave about the same values of the R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.
Collision prediction models using multivariate Poisson-lognormal regression.
El-Basyouny, Karim; Sayed, Tarek
2009-07-01
This paper advocates the use of multivariate Poisson-lognormal (MVPLN) regression to develop models for collision count data. The MVPLN approach presents an opportunity to incorporate the correlations across collision severity levels and their influence on safety analyses. The paper introduces a new multivariate hazardous location identification technique, which generalizes the univariate posterior probability of excess that has been commonly proposed and applied in the literature. In addition, the paper presents an alternative approach for quantifying the effect of the multivariate structure on the precision of expected collision frequency. The MVPLN approach is compared with the independent (separate) univariate Poisson-lognormal (PLN) models with respect to model inference, goodness-of-fit, identification of hot spots and precision of expected collision frequency. The MVPLN is modeled using the WinBUGS platform which facilitates computation of posterior distributions as well as providing a goodness-of-fit measure for model comparisons. The results indicate that the estimates of the extra Poisson variation parameters were considerably smaller under MVPLN leading to higher precision. The improvement in precision is due mainly to the fact that MVPLN accounts for the correlation between the latent variables representing property damage only (PDO) and injuries plus fatalities (I+F). This correlation was estimated at 0.758, which is highly significant, suggesting that higher PDO rates are associated with higher I+F rates, as the collision likelihood for both types is likely to rise due to similar deficiencies in roadway design and/or other unobserved factors. In terms of goodness-of-fit, the MVPLN model provided a superior fit than the independent univariate models. The multivariate hazardous location identification results demonstrated that some hazardous locations could be overlooked if the analysis was restricted to the univariate models.
Whole-genome regression and prediction methods applied to plant and animal breeding
Los Campos, De G.; Hickey, J.M.; Pong-Wong, R.; Daetwyler, H.D.; Calus, M.P.L.
2013-01-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding, and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of
Predicting and Modelling of Survival Data when Cox's Regression Model does not hold
DEFF Research Database (Denmark)
Scheike, Thomas H.; Zhang, Mei-Jie
2002-01-01
Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects......Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects...
Santorelli, Gillian; Petherick, Emily S; Wright, John; Wilson, Brad; Samiei, Haider; Cameron, Noël; Johnson, William
2013-01-01
Advancements in knowledge of obesity aetiology and mobile phone technology have created the opportunity to develop an electronic tool to predict an infant's risk of childhood obesity. The study aims were to develop and validate equations for the prediction of childhood obesity and integrate them into a mobile phone application (App). Anthropometry and childhood obesity risk data were obtained for 1868 UK-born White or South Asian infants in the Born in Bradford cohort. Logistic regression was used to develop prediction equations (at 6 ± 1.5, 9 ± 1.5 and 12 ± 1.5 months) for risk of childhood obesity (BMI at 2 years >91(st) centile and weight gain from 0-2 years >1 centile band) incorporating sex, birth weight, and weight gain as predictors. The discrimination accuracy of the equations was assessed by the area under the curve (AUC); internal validity by comparing area under the curve to those obtained in bootstrapped samples; and external validity by applying the equations to an external sample. An App was built to incorporate six final equations (two at each age, one of which included maternal BMI). The equations had good discrimination (AUCs 86-91%), with the addition of maternal BMI marginally improving prediction. The AUCs in the bootstrapped and external validation samples were similar to those obtained in the development sample. The App is user-friendly, requires a minimum amount of information, and provides a risk assessment of low, medium, or high accompanied by advice and website links to government recommendations. Prediction equations for risk of childhood obesity have been developed and incorporated into a novel App, thereby providing proof of concept that childhood obesity prediction research can be integrated with advancements in technology.
International Nuclear Information System (INIS)
Lopez, R.C.; Gonzalez, L.M.; Garcia, D.
1993-01-01
In the present work, dose-effect regression equations for energy and percentage germination, size, root length and dry mass of plantules from which values of DL-50 middle lethal dose were calculated and likely or unlikely equivalencies among them were established
Akilli, Mustafa
2015-01-01
The aim of this study is to demonstrate the science success regression levels of chosen emotional features of 8th grade students using Structural Equation Model. The study was conducted by the analysis of students' questionnaires and science success in TIMSS 2011 data using SEM. Initially, the factors that are thought to have an effect on science…
Cell membrane temperature rate sensitivity predicted from the Nernst equation.
Barnes, F S
1984-01-01
A hyperpolarized current is predicted from the Nernst equation for conditions of positive temperature derivatives with respect to time. This ion current, coupled with changes in membrane channel conductivities, is expected to contribute to a transient potential shift across the cell membrane for silent cells and to a change in firing rate for pacemaker cells.
Soil loss prediction using universal soil loss equation (USLE ...
African Journals Online (AJOL)
Soil loss prediction using universal soil loss equation (USLE) simulation model in a mountainous area in Ag lasun district, Turkey. ... The need for sufficient knowledge and data for decision makers is obvious hence the present study was carried out. The study area, the Alasun district, is in the middle west of Turkey and is ...
Directory of Open Access Journals (Sweden)
C. Xu
2003-01-01
Full Text Available There is an ever increasing need to apply hydrological models to catchments where streamflow data are unavailable or to large geographical regions where calibration is not feasible. Estimation of model parameters from spatial physical data is the key issue in the development and application of hydrological models at various scales. To investigate the suitability of transferring the regression equations relating model parameters to physical characteristics developed from small sub-catchments to a large region for estimating model parameters, a conceptual snow and water balance model was optimised on all the sub-catchments in the region. A multiple regression analysis related model parameters to physical data for the catchments and the regression equations derived from the small sub-catchments were used to calculate regional parameter values for the large basin using spatially aggregated physical data. For the model tested, the results support the suitability of transferring the regression equations to the larger region. Keywords: water balance modelling,large scale, multiple regression, regionalisation
Energy Technology Data Exchange (ETDEWEB)
Luukkonen, A.; Korkealaakso, J.; Pitkaenen, P. [VTT Communities and Infrastructure, Espoo (Finland)
1997-11-01
Teollisuuden Voima Oy selected five investigation areas for preliminary site studies (1987Ae1992). The more detailed site investigation project, launched at the beginning of 1993 and presently supervised by Posiva Oy, is concentrated to three investigation areas. Romuvaara at Kuhmo is one of the present target areas, and the geochemical, structural and hydrological data used in this study are extracted from there. The aim of the study is to develop suitable methods for groundwater composition estimation based on a group of known hydrogeological variables. The input variables used are related to the host type of groundwater, hydrological conditions around the host location, mixing potentials between different types of groundwater, and minerals equilibrated with the groundwater. The output variables are electrical conductivity, Ca, Mg, Mn, Na, K, Fe, Cl, S, HS, SO{sub 4}, alkalinity, {sup 3}H, {sup 14}C, {sup 13}C, Al, Sr, F, Br and I concentrations, and pH of the groundwater. The methodology is to associate the known hydrogeological conditions (i.e. input variables), with the known water compositions (output variables), and to evaluate mathematical relations between these groups. Output estimations are done with two separate procedures: partial least squares regressions on the principal components of input variables, and by training neural networks with input-output pairs. Coefficients of linear equations and trained networks are optional methods for actual predictions. The quality of output predictions are monitored with confidence limit estimations, evaluated from input variable covariances and output variances, and with charge balance calculations. Groundwater compositions in Romuvaara borehole KR10 are predicted at 10 metre intervals with both prediction methods. 46 refs.
Directory of Open Access Journals (Sweden)
Mini Joseph
2017-01-01
Full Text Available Background: The accuracy of existing predictive equations to determine the resting energy expenditure (REE of professional weightlifters remains scarcely studied. Our study aimed at assessing the REE of male Asian Indian weightlifters with indirect calorimetry and to compare the measured REE (mREE with published equations. A new equation using potential anthropometric variables to predict REE was also evaluated. Materials and Methods: REE was measured on 30 male professional weightlifters aged between 17 and 28 years using indirect calorimetry and compared with the eight formulas predicted by Harris–Benedicts, Mifflin-St. Jeor, FAO/WHO/UNU, ICMR, Cunninghams, Owen, Katch-McArdle, and Nelson. Pearson correlation coefficient, intraclass correlation coefficient, and multiple linear regression analysis were carried out to study the agreement between the different methods, association with anthropometric variables, and to formulate a new prediction equation for this population. Results: Pearson correlation coefficients between mREE and the anthropometric variables showed positive significance with suprailiac skinfold thickness, lean body mass (LBM, waist circumference, hip circumference, bone mineral mass, and body mass. All eight predictive equations underestimated the REE of the weightlifters when compared with the mREE. The highest mean difference was 636 kcal/day (Owen, 1986 and the lowest difference was 375 kcal/day (Cunninghams, 1980. Multiple linear regression done stepwise showed that LBM was the only significant determinant of REE in this group of sportspersons. A new equation using LBM as the independent variable for calculating REE was computed. REE for weightlifters = −164.065 + 0.039 (LBM (confidence interval −1122.984, 794.854]. This new equation reduced the mean difference with mREE by 2.36 + 369.15 kcal/day (standard error = 67.40. Conclusion: The significant finding of this study was that all the prediction equations
Prediction equations for spirometry in four- to six-year-old children.
França, Danielle Corrêa; Camargos, Paulo Augusto Moreira; Jones, Marcus Herbert; Martins, Jocimar Avelar; Vieira, Bruna da Silva Pinto Pinheiro; Colosimo, Enrico Antônio; de Mendonça, Karla Morganna Pereira Pinto; Borja, Raíssa de Oliveira; Britto, Raquel Rodrigues; Parreira, Verônica Franco
2016-01-01
To generate prediction equations for spirometry in 4- to 6-year-old children. Forced vital capacity, forced expiratory volume in 0.5s, forced expiratory volume in one second, peak expiratory flow, and forced expiratory flow at 25-75% of the forced vital capacity were assessed in 195 healthy children residing in the town of Sete Lagoas, state of Minas Gerais, Southeastern Brazil. The least mean squares method was used to derive the prediction equations. The level of significance was established as p<0.05. Overall, 85% of the children succeeded in performing the spirometric maneuvers. In the prediction equation, height was the single predictor of the spirometric variables as follows: forced vital capacity=exponential [(-2.255)+(0.022×height)], forced expiratory volume in 0.5s=exponential [(-2.288)+(0.019×height)], forced expiratory volume in one second=exponential [(-2.767)+(0.026×height)], peak expiratory flow=exponential [(-2.908)+(0.019×height)], and forced expiratory flow at 25-75% of the forced vital capacity=exponential [(-1.404)+(0.016×height)]. Neither age nor weight influenced the regression equations. No significant differences in the predicted values for boys and girls were observed. The predicted values obtained in the present study are comparable to those reported for preschoolers from both Brazil and other countries. Copyright © 2016 Sociedade Brasileira de Pediatria. Published by Elsevier Editora Ltda. All rights reserved.
Stochastic Optimal Prediction with Application to Averaged Euler Equations
Energy Technology Data Exchange (ETDEWEB)
Bell, John [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Chorin, Alexandre J. [Univ. of California, Berkeley, CA (United States); Crutchfield, William [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
2017-04-24
Optimal prediction (OP) methods compensate for a lack of resolution in the numerical solution of complex problems through the use of an invariant measure as a prior measure in the Bayesian sense. In first-order OP, unresolved information is approximated by its conditional expectation with respect to the invariant measure. In higher-order OP, unresolved information is approximated by a stochastic estimator, leading to a system of random or stochastic differential equations. We explain the ideas through a simple example, and then apply them to the solution of Averaged Euler equations in two space dimensions.
The Current and Future Use of Ridge Regression for Prediction in Quantitative Genetics
Directory of Open Access Journals (Sweden)
Ronald de Vlaming
2015-01-01
Full Text Available In recent years, there has been a considerable amount of research on the use of regularization methods for inference and prediction in quantitative genetics. Such research mostly focuses on selection of markers and shrinkage of their effects. In this review paper, the use of ridge regression for prediction in quantitative genetics using single-nucleotide polymorphism data is discussed. In particular, we consider (i the theoretical foundations of ridge regression, (ii its link to commonly used methods in animal breeding, (iii the computational feasibility, and (iv the scope for constructing prediction models with nonlinear effects (e.g., dominance and epistasis. Based on a simulation study we gauge the current and future potential of ridge regression for prediction of human traits using genome-wide SNP data. We conclude that, for outcomes with a relatively simple genetic architecture, given current sample sizes in most cohorts (i.e., N<10,000 the predictive accuracy of ridge regression is slightly higher than the classical genome-wide association study approach of repeated simple regression (i.e., one regression per SNP. However, both capture only a small proportion of the heritability. Nevertheless, we find evidence that for large-scale initiatives, such as biobanks, sample sizes can be achieved where ridge regression compared to the classical approach improves predictive accuracy substantially.
Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W
2015-08-01
Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression
Abdul Jameel, Abdul Gani; Naser, Nimal; Emwas, Abdul-Hamid M.; Dooley, Stephen; Sarathy, Mani
2016-01-01
An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN
Directory of Open Access Journals (Sweden)
Ivanka Jerić
2011-11-01
Full Text Available Predicting antitumor activity of compounds using regression models trained on a small number of compounds with measured biological activity is an ill-posed inverse problem. Yet, it occurs very often within the academic community. To counteract, up to some extent, overfitting problems caused by a small training data, we propose to use consensus of six regression models for prediction of biological activity of virtual library of compounds. The QSAR descriptors of 22 compounds related to the opioid growth factor (OGF, Tyr-Gly-Gly-Phe-Met with known antitumor activity were used to train regression models: the feed-forward artificial neural network, the k-nearest neighbor, sparseness constrained linear regression, the linear and nonlinear (with polynomial and Gaussian kernel support vector machine. Regression models were applied on a virtual library of 429 compounds that resulted in six lists with candidate compounds ranked by predicted antitumor activity. The highly ranked candidate compounds were synthesized, characterized and tested for an antiproliferative activity. Some of prepared peptides showed more pronounced activity compared with the native OGF; however, they were less active than highly ranked compounds selected previously by the radial basis function support vector machine (RBF SVM regression model. The ill-posedness of the related inverse problem causes unstable behavior of trained regression models on test data. These results point to high complexity of prediction based on the regression models trained on a small data sample.
The current and future use of ridge regression for prediction in quantitative genetics
Vlaming, Ronald; Groenen, Patrick
2015-01-01
textabstractIn recent years, there has been a considerable amount of research on the use of regularization methods for inference and prediction in quantitative genetics. Such research mostly focuses on selection of markers and shrinkage of their effects. In this review paper, the use of ridge regression for prediction in quantitative genetics using single-nucleotide polymorphism data is discussed. In particular, we consider (i) the theoretical foundations of ridge regression, (ii) its link to...
On the calibration process of film dosimetry: OLS inverse regression versus WLS inverse prediction
International Nuclear Information System (INIS)
Crop, F; Thierens, H; Rompaye, B Van; Paelinck, L; Vakaet, L; Wagter, C De
2008-01-01
The purpose of this study was both putting forward a statistically correct model for film calibration and the optimization of this process. A reliable calibration is needed in order to perform accurate reference dosimetry with radiographic (Gafchromic) film. Sometimes, an ordinary least squares simple linear (in the parameters) regression is applied to the dose-optical-density (OD) curve with the dose as a function of OD (inverse regression) or sometimes OD as a function of dose (inverse prediction). The application of a simple linear regression fit is an invalid method because heteroscedasticity of the data is not taken into account. This could lead to erroneous results originating from the calibration process itself and thus to a lower accuracy. In this work, we compare the ordinary least squares (OLS) inverse regression method with the correct weighted least squares (WLS) inverse prediction method to create calibration curves. We found that the OLS inverse regression method could lead to a prediction bias of up to 7.3 cGy at 300 cGy and total prediction errors of 3% or more for Gafchromic EBT film. Application of the WLS inverse prediction method resulted in a maximum prediction bias of 1.4 cGy and total prediction errors below 2% in a 0-400 cGy range. We developed a Monte-Carlo-based process to optimize calibrations, depending on the needs of the experiment. This type of thorough analysis can lead to a higher accuracy for film dosimetry
Bootstrap Prediction Intervals in Non-Parametric Regression with Applications to Anomaly Detection
Kumar, Sricharan; Srivistava, Ashok N.
2012-01-01
Prediction intervals provide a measure of the probable interval in which the outputs of a regression model can be expected to occur. Subsequently, these prediction intervals can be used to determine if the observed output is anomalous or not, conditioned on the input. In this paper, a procedure for determining prediction intervals for outputs of nonparametric regression models using bootstrap methods is proposed. Bootstrap methods allow for a non-parametric approach to computing prediction intervals with no specific assumptions about the sampling distribution of the noise or the data. The asymptotic fidelity of the proposed prediction intervals is theoretically proved. Subsequently, the validity of the bootstrap based prediction intervals is illustrated via simulations. Finally, the bootstrap prediction intervals are applied to the problem of anomaly detection on aviation data.
Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer
2013-01-01
Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
Comparison of ν-support vector regression and logistic equation for ...
African Journals Online (AJOL)
Due to the complexity and high non-linearity of bioprocess, most simple mathematical models fail to describe the exact behavior of biochemistry systems. As a novel type of learning method, support vector regression (SVR) owns the powerful capability to characterize problems via small sample, nonlinearity, high dimension ...
Application of support vector regression (SVR) for stream flow prediction on the Amazon basin
CSIR Research Space (South Africa)
Du Toit, Melise
2016-10-01
Full Text Available regression technique is used in this study to analyse historical stream flow occurrences and predict stream flow values for the Amazon basin. Up to twelve month predictions are made and the coefficient of determination and root-mean-square error are used...
The prediction of intelligence in preschool children using alternative models to regression.
Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E
2011-12-01
Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children is used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than does the more traditional regression approach. Implications of these results are discussed.
Stochastic Ocean Predictions with Dynamically-Orthogonal Primitive Equations
Subramani, D. N.; Haley, P., Jr.; Lermusiaux, P. F. J.
2017-12-01
The coastal ocean is a prime example of multiscale nonlinear fluid dynamics. Ocean fields in such regions are complex and intermittent with unstationary heterogeneous statistics. Due to the limited measurements, there are multiple sources of uncertainties, including the initial conditions, boundary conditions, forcing, parameters, and even the model parameterizations and equations themselves. For efficient and rigorous quantification and prediction of these uncertainities, the stochastic Dynamically Orthogonal (DO) PDEs for a primitive equation ocean modeling system with a nonlinear free-surface are derived and numerical schemes for their space-time integration are obtained. Detailed numerical studies with idealized-to-realistic regional ocean dynamics are completed. These include consistency checks for the numerical schemes and comparisons with ensemble realizations. As an illustrative example, we simulate the 4-d multiscale uncertainty in the Middle Atlantic/New York Bight region during the months of Jan to Mar 2017. To provide intitial conditions for the uncertainty subspace, uncertainties in the region were objectively analyzed using historical data. The DO primitive equations were subsequently integrated in space and time. The probability distribution function (pdf) of the ocean fields is compared to in-situ, remote sensing, and opportunity data collected during the coincident POSYDON experiment. Results show that our probabilistic predictions had skill and are 3- to 4- orders of magnitude faster than classic ensemble schemes.
Anak Gisen, Jacqueline Isabella; Nijzink, Remko C.; Savenije, Hubert H. G.
2014-05-01
Dispersion mathematical representation of tidal mixing between sea water and fresh water in The definition of dispersion somehow remains unclear as it is not directly measurable. The role of dispersion is only meaningful if it is related to the appropriate temporal and spatial scale of mixing, which are identified as the tidal period, tidal excursion (longitudinal), width of estuary (lateral) and mixing depth (vertical). Moreover, the mixing pattern determines the salt intrusion length in an estuary. If a physically based description of the dispersion is defined, this would allow the analytical solution of the salt intrusion problem. The objective of this study is to develop a predictive equation for estimating the dispersion coefficient at tidal average (TA) condition, which can be applied in the salt intrusion model to predict the salinity profile for any estuary during different events. Utilizing available data of 72 measurements in 27 estuaries (including 6 recently studied estuaries in Malaysia), regressions analysis has been performed with various combinations of dimensionless parameters . The predictive dispersion equations have been developed for two different locations, at the mouth D0TA and at the inflection point D1TA (where the convergence length changes). Regressions have been carried out with two separated datasets: 1) more reliable data for calibration; and 2) less reliable data for validation. The combination of dimensionless ratios that give the best performance is selected as the final outcome which indicates that the dispersion coefficient is depending on the tidal excursion, tidal range, tidal velocity amplitude, friction and the Richardson Number. A limitation of the newly developed equation is that the friction is generally unknown. In order to compensate this problem, further analysis has been performed adopting the hydraulic model of Cai et. al. (2012) to estimate the friction and depth. Keywords: dispersion, alluvial estuaries, mixing, salt
Spatial Regression and Prediction of Water Quality in a Watershed with Complex Pollution Sources.
Yang, Xiaoying; Liu, Qun; Luo, Xingzhang; Zheng, Zheng
2017-08-16
Fast economic development, burgeoning population growth, and rapid urbanization have led to complex pollution sources contributing to water quality deterioration simultaneously in many developing countries including China. This paper explored the use of spatial regression to evaluate the impacts of watershed characteristics on ambient total nitrogen (TN) concentration in a heavily polluted watershed and make predictions across the region. Regression results have confirmed the substantial impact on TN concentration by a variety of point and non-point pollution sources. In addition, spatial regression has yielded better performance than ordinary regression in predicting TN concentrations. Due to its best performance in cross-validation, the river distance based spatial regression model was used to predict TN concentrations across the watershed. The prediction results have revealed a distinct pattern in the spatial distribution of TN concentrations and identified three critical sub-regions in priority for reducing TN loads. Our study results have indicated that spatial regression could potentially serve as an effective tool to facilitate water pollution control in watersheds under diverse physical and socio-economical conditions.
Yu, Donghai; Du, Ruobing; Xiao, Ji-Chang
2016-07-05
Ninety-six acidic phosphorus-containing molecules with pKa 1.88 to 6.26 were collected and divided into training and test sets by random sampling. Structural parameters were obtained by density functional theory calculation of the molecules. The relationship between the experimental pKa values and structural parameters was obtained by multiple linear regression fitting for the training set, and tested with the test set; the R(2) values were 0.974 and 0.966 for the training and test sets, respectively. This regression equation, which quantitatively describes the influence of structural parameters on pKa , and can be used to predict pKa values of similar structures, is significant for the design of new acidic phosphorus-containing extractants. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Directory of Open Access Journals (Sweden)
Nataša Šarlija
2017-01-01
Full Text Available This study sheds light on the most common issues related to applying logistic regression in prediction models for company growth. The purpose of the paper is 1 to provide a detailed demonstration of the steps in developing a growth prediction model based on logistic regression analysis, 2 to discuss common pitfalls and methodological errors in developing a model, and 3 to provide solutions and possible ways of overcoming these issues. Special attention is devoted to the question of satisfying logistic regression assumptions, selecting and defining dependent and independent variables, using classification tables and ROC curves, for reporting model strength, interpreting odds ratios as effect measures and evaluating performance of the prediction model. Development of a logistic regression model in this paper focuses on a prediction model of company growth. The analysis is based on predominantly financial data from a sample of 1471 small and medium-sized Croatian companies active between 2009 and 2014. The financial data is presented in the form of financial ratios divided into nine main groups depicting following areas of business: liquidity, leverage, activity, profitability, research and development, investing and export. The growth prediction model indicates aspects of a business critical for achieving high growth. In that respect, the contribution of this paper is twofold. First, methodological, in terms of pointing out pitfalls and potential solutions in logistic regression modelling, and secondly, theoretical, in terms of identifying factors responsible for high growth of small and medium-sized companies.
Validation of resting metabolic rate prediction equations for teenagers
Directory of Open Access Journals (Sweden)
Paulo Henrique Santos da Fonseca
2007-09-01
Full Text Available The resting metabolic rate (RMR can be defi ned as the minimum rate of energy spent and represents the main component of the energetic outlay. The purpose of this study is to validate equations to predict the resting metabolic rate in teenagers (103 individuals, being 51 girls and 52 boys, with age between 10 and 17 years from Florianópolis – SC – Brazil. It was measured: the body weight, body height, skinfolds and obtained the lean and body fat mass through bioimpedance. The nonproteic RMR was measured by Weir’s equation (1949, utilizing AeroSport TEEM-100 gas analyzer. The studied equations were: Harry and Benedict (1919, Schofi eld (1985, WHO/FAO/UNU (1985, Henry and Rees (1991, Molnár et al. (1998, Tverskaya et al. (1998 and Müller et al. (2004. In order to study the cross-validation of the RMR prediction equations and its standard measure (Weir 1949, the following statistics procedure were calculated: Pearson’s correlation (r ≥ 0.70, the “t” test with the signifi cance level of p0.05 in relation to the standard measure, with exception of the equations suggested for Tverskaya et al. (1998, and the two models of Müller et al (2004. Even though there was not a signifi cant difference, only the models considered for Henry and Rees (1991, and Molnár et al. (1995 had gotten constant error variation under 5%. All the equations analyzed in the study in girls had not reached criterion of correlation values of 0.70 with the indirect calorimetry. Analyzing the prediction equations of RMR in boys, all of them had moderate correlation coeffi cients with the indirect calorimetry, however below 0.70. Only the equation developed for Tverskaya et al. (1998 presented differences (p ABSTRACT0,05 em relação à medida padrão (Weir 1949, com exceção das equações sugeridas por Tverskaya et al. (1998 e os dois modelos de Müller et al (2004. Mesmo não havendo diferença signifi cativa, somente os modelos propostos por Henry e Rees (1991
MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
Directory of Open Access Journals (Sweden)
Sugawara N
2014-02-01
Full Text Available Norio Sugawara,1 Norio Yasui-Furukori,1 Tetsu Tomita,1,2 Hanako Furukori,3 Kazutoshi Kubo,1,4 Taku Nakagami,1,4 Sunao Kaneko1 1Department of Neuropsychiatry, Hirosaki University School of Medicine, Hirosaki, 2Department of Psychiatry, Hirosaki-Aiseikai Hospital, Hirosaki, 3Department of Psychiatry, Kuroishi-Akebono Hospital, Kuroishi, 4Department of Psychiatry, Odate Municipal General Hospital, Odate, Japan Background: Recently, a relationship between obesity and schizophrenia has been reported. The prediction of resting energy expenditure (REE is important to determine the energy expenditure of patients with schizophrenia. However, there is a lack of research concerning the most accurate REE predictive equations among Asian patients with schizophrenia. The purpose of the study reported here was to compare the validity of four REE equations for patients with schizophrenia taking antipsychotics. Methods: For this cross-sectional study, we recruited patients (n=110 who had a Diagnostic and Statistical Manual of Mental Disorders, fourth edition, diagnosis of schizophrenia and were admitted to four psychiatric hospitals. The mean (± standard deviation age of these patients was 45.9±13.2 years. Anthropometric measurements (of height, weight, body mass index were taken at the beginning of the study. REE was measured using indirect calorimetry. Comparisons between the measured and estimated REEs from the four equations (Harris–Benedict, Mifflin–St Jeor, Food and Agriculture Organization/World Health Organization/United Nations University, and Schofield were performed using simple linear regression analysis and Bland–Altman analysis. Results: Significant trends were found between the measured and predicted REEs for all four equations (P<0.001, with the Harris–Benedict equation demonstrating the strongest correlation in both men and women (r=0.617, P<0.001. In all participants, Bland–Altman analysis revealed that the Harris–Benedict and
Modeling and prediction of Turkey's electricity consumption using Support Vector Regression
International Nuclear Information System (INIS)
Kavaklioglu, Kadir
2011-01-01
Support Vector Regression (SVR) methodology is used to model and predict Turkey's electricity consumption. Among various SVR formalisms, ε-SVR method was used since the training pattern set was relatively small. Electricity consumption is modeled as a function of socio-economic indicators such as population, Gross National Product, imports and exports. In order to facilitate future predictions of electricity consumption, a separate SVR model was created for each of the input variables using their current and past values; and these models were combined to yield consumption prediction values. A grid search for the model parameters was performed to find the best ε-SVR model for each variable based on Root Mean Square Error. Electricity consumption of Turkey is predicted until 2026 using data from 1975 to 2006. The results show that electricity consumption can be modeled using Support Vector Regression and the models can be used to predict future electricity consumption. (author)
Directory of Open Access Journals (Sweden)
Laura Cornejo-Bueno
2017-11-01
Full Text Available Wind Power Ramp Events (WPREs are large fluctuations of wind power in a short time interval, which lead to strong, undesirable variations in the electric power produced by a wind farm. Its accurate prediction is important in the effort of efficiently integrating wind energy in the electric system, without affecting considerably its stability, robustness and resilience. In this paper, we tackle the problem of predicting WPREs by applying Machine Learning (ML regression techniques. Our approach consists of using variables from atmospheric reanalysis data as predictive inputs for the learning machine, which opens the possibility of hybridizing numerical-physical weather models with ML techniques for WPREs prediction in real systems. Specifically, we have explored the feasibility of a number of state-of-the-art ML regression techniques, such as support vector regression, artificial neural networks (multi-layer perceptrons and extreme learning machines and Gaussian processes to solve the problem. Furthermore, the ERA-Interim reanalysis from the European Center for Medium-Range Weather Forecasts is the one used in this paper because of its accuracy and high resolution (in both spatial and temporal domains. Aiming at validating the feasibility of our predicting approach, we have carried out an extensive experimental work using real data from three wind farms in Spain, discussing the performance of the different ML regression tested in this wind power ramp event prediction problem.
Zhang, Keda; Abraham, Michael H; Liu, Xiangli
2017-04-15
Experimental values of permeability coefficients, as log K p , of chemical compounds across human skin were collected by carefully screening the literature, and adjusted to 37°C for the effect of temperature. The values of log K p for partially ionized acids and bases were separated into those for their neutral and ionic species, forming a total data set of 247 compounds and species (including 35 ionic species). The obtained log K p values have been regressed against Abraham solute descriptors to yield a correlation equation with R 2 =0.866 and SD=0.432 log units. The equation can provide valid predictions for log K p of neutral molecules, ions and ionic species, with predictive R 2 =0.858 and predictive SD=0.445 log units calculated by the leave-one-out statistics. The predicted log K p values for Na + and Et 4 N + are in good agreement with the observed values. We calculated the values of log K p of ketoprofen as a function of the pH of the donor solution, and found that log K p markedly varies only when ketoprofen is largely ionized. This explains why models that neglect ionization of permeants still yield reasonable statistical results. The effect of skin thickness on log K p was investigated by inclusion of two indicator variables, one for intermediate thickness skin and one for full thickness skin, into the above equation. The newly obtained equations were found to be statistically very close to the above equation. Therefore, the thickness of human skin used makes little difference to the experimental values of log K p . Copyright © 2017 Elsevier B.V. All rights reserved.
He, Dan; Kuhn, David; Parida, Laxmi
2016-06-15
Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least square error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. The programs we used are either public or directly from the referred authors, such as MALSAR (http://www.public.asu.edu/~jye02/Software/MALSAR/) package. The Avocado data set has not been published yet and is available upon request. dhe@us.ibm.com. © The Author 2016. Published by Oxford University Press.
Febrian Umbara, Rian; Tarwidi, Dede; Budi Setiawan, Erwin
2018-03-01
The paper discusses the prediction of Jakarta Composite Index (JCI) in Indonesia Stock Exchange. The study is based on JCI historical data for 1286 days to predict the value of JCI one day ahead. This paper proposes predictions done in two stages., The first stage using Fuzzy Time Series (FTS) to predict values of ten technical indicators, and the second stage using Support Vector Regression (SVR) to predict the value of JCI one day ahead, resulting in a hybrid prediction model FTS-SVR. The performance of this combined prediction model is compared with the performance of the single stage prediction model using SVR only. Ten technical indicators are used as input for each model.
Directory of Open Access Journals (Sweden)
Słania J.
2014-10-01
Full Text Available The article presents the process of production of coated electrodes and their welding properties. The factors concerning the welding properties and the currently applied method of assessing are given. The methodology of the testing based on the measuring and recording of instantaneous values of welding current and welding arc voltage is discussed. Algorithm for creation of reference data base of the expert system is shown, aiding the assessment of covered electrodes welding properties. The stability of voltage–current characteristics was discussed. Statistical factors of instantaneous values of welding current and welding arc voltage waveforms used for determining of welding process stability are presented. The results of coated electrodes welding properties are compared. The article presents the results of linear regression as well as the impact of the independent variables on the welding process performance. Finally the conclusions drawn from the research are given.
Drzewiecki, Wojciech
2016-12-01
In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.
Madarang, Krish J; Kang, Joo-Hyon
2014-06-01
Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.
Real-time prediction of respiratory motion based on local regression methods
International Nuclear Information System (INIS)
Ruan, D; Fessler, J A; Balter, J M
2007-01-01
Recent developments in modulation techniques enable conformal delivery of radiation doses to small, localized target volumes. One of the challenges in using these techniques is real-time tracking and predicting target motion, which is necessary to accommodate system latencies. For image-guided-radiotherapy systems, it is also desirable to minimize sampling rates to reduce imaging dose. This study focuses on predicting respiratory motion, which can significantly affect lung tumours. Predicting respiratory motion in real-time is challenging, due to the complexity of breathing patterns and the many sources of variability. We propose a prediction method based on local regression. There are three major ingredients of this approach: (1) forming an augmented state space to capture system dynamics, (2) local regression in the augmented space to train the predictor from previous observation data using semi-periodicity of respiratory motion, (3) local weighting adjustment to incorporate fading temporal correlations. To evaluate prediction accuracy, we computed the root mean square error between predicted tumor motion and its observed location for ten patients. For comparison, we also investigated commonly used predictive methods, namely linear prediction, neural networks and Kalman filtering to the same data. The proposed method reduced the prediction error for all imaging rates and latency lengths, particularly for long prediction lengths
Directory of Open Access Journals (Sweden)
Nikita A. Moiseev
2017-01-01
Full Text Available The paper is devoted to a new randomization method that yields unbiased adjustments of p-values for linear regression models predictors by incorporating the number of potential explanatory variables, their variance-covariance matrix and its uncertainty, based on the number of observations. This adjustment helps to control type I errors in scientific studies, significantly decreasing the number of publications that report false relations to be authentic ones. Comparative analysis with such existing methods as Bonferroni correction and Shehata and White adjustments explicitly shows their imperfections, especially in case when the number of observations and the number of potential explanatory variables are approximately equal. Also during the comparative analysis it was shown that when the variance-covariance matrix of a set of potential predictors is diagonal, i.e. the data are independent, the proposed simple correction is the best and easiest way to implement the method to obtain unbiased corrections of traditional p-values. However, in the case of the presence of strongly correlated data, a simple correction overestimates the true pvalues, which can lead to type II errors. It was also found that the corrected p-values depend on the number of observations, the number of potential explanatory variables and the sample variance-covariance matrix. For example, if there are only two potential explanatory variables competing for one position in the regression model, then if they are weakly correlated, the corrected p-value will be lower than when the number of observations is smaller and vice versa; if the data are highly correlated, the case with a larger number of observations will show a lower corrected p-value. With increasing correlation, all corrections, regardless of the number of observations, tend to the original p-value. This phenomenon is easy to explain: as correlation coefficient tends to one, two variables almost linearly depend on each
Zhang, Zhilin; Savenije, Hubert H. G.
2017-07-01
The practical value of the surprisingly simple Van der Burgh equation in predicting saline water intrusion in alluvial estuaries is well documented, but the physical foundation of the equation is still weak. In this paper we provide a connection between the empirical equation and the theoretical literature, leading to a theoretical range of Van der Burgh's coefficient of 1/2 residual circulation. This type of mixing is relevant in the wider part of alluvial estuaries where preferential ebb and flood channels appear. Subsequently, this dispersion equation is combined with the salt balance equation to obtain a new predictive analytical equation for the longitudinal salinity distribution. Finally, the new equation was tested and applied to a large database of observations in alluvial estuaries, whereby the calibrated K values appeared to correspond well to the theoretical range.
Sabounchi, N S; Rahmandad, H; Ammerman, A
2013-10-01
Basal metabolic rate (BMR) represents the largest component of total energy expenditure and is a major contributor to energy balance. Therefore, accurately estimating BMR is critical for developing rigorous obesity prevention and control strategies. Over the past several decades, numerous BMR formulas have been developed targeted to different population groups. A comprehensive literature search revealed 248 BMR estimation equations developed using diverse ranges of age, gender, race, fat-free mass, fat mass, height, waist-to-hip ratio, body mass index and weight. A subset of 47 studies included enough detail to allow for development of meta-regression equations. Utilizing these studies, meta-equations were developed targeted to 20 specific population groups. This review provides a comprehensive summary of available BMR equations and an estimate of their accuracy. An accompanying online BMR prediction tool (available at http://www.sdl.ise.vt.edu/tutorials.html) was developed to automatically estimate BMR based on the most appropriate equation after user-entry of individual age, race, gender and weight.
A Hybrid Ground-Motion Prediction Equation for Earthquakes in Western Alberta
Spriggs, N.; Yenier, E.; Law, A.; Moores, A. O.
2015-12-01
Estimation of ground-motion amplitudes that may be produced by future earthquakes constitutes the foundation of seismic hazard assessment and earthquake-resistant structural design. This is typically done by using a prediction equation that quantifies amplitudes as a function of key seismological variables such as magnitude, distance and site condition. In this study, we develop a hybrid empirical prediction equation for earthquakes in western Alberta, where evaluation of seismic hazard associated with induced seismicity is of particular interest. We use peak ground motions and response spectra from recorded seismic events to model the regional source and attenuation attributes. The available empirical data is limited in the magnitude range of engineering interest (M>4). Therefore, we combine empirical data with a simulation-based model in order to obtain seismologically informed predictions for moderate-to-large magnitude events. The methodology is two-fold. First, we investigate the shape of geometrical spreading in Alberta. We supplement the seismic data with ground motions obtained from mining/quarry blasts, in order to gain insights into the regional attenuation over a wide distance range. A comparison of ground-motion amplitudes for earthquakes and mining/quarry blasts show that both event types decay at similar rates with distance and demonstrate a significant Moho-bounce effect. In the second stage, we calibrate the source and attenuation parameters of a simulation-based prediction equation to match the available amplitude data from seismic events. We model the geometrical spreading using a trilinear function with attenuation rates obtained from the first stage, and calculate coefficients of anelastic attenuation and site amplification via regression analysis. This provides a hybrid ground-motion prediction equation that is calibrated for observed motions in western Alberta and is applicable to moderate-to-large magnitude events.
Development of a predictive energy equation for maintenance hemodialysis patients: a pilot study.
Byham-Gray, Laura; Parrott, J Scott; Ho, Wai Yin; Sundell, Mary B; Ikizler, T Alp
2014-01-01
The study objectives were to explore the predictors of measured resting energy expenditure (mREE) among a sample of maintenance hemodialysis (MHD) patients, to generate a predictive energy equation (MHDE), and to compare such models to another commonly used predictive energy equation in nutritional care, the Mifflin-St. Jeor equation (MSJE). The study was a retrospective, cross-sectional cohort design conducted at the Vanderbilt University Medical Center. Study subjects were adult MHD patients (N = 67). Data collected from several clinical trials were analyzed using Pearson's correlation and multivariate linear regression procedures. Demographic, anthropometric, clinical, and laboratory data were examined as potential predictors of mREE. Limits of agreement between the MHDE and the MSJE were evaluated using Bland-Altman plots. The a priori α was set at P lean body mass [LBM]) of mREE included (R(2) = 0.489) FFM, ALB, age, and CRP. Two additional models (MHDE-CRP and MHDE-CR) with acceptable predictability (R(2) = 0.460 and R(2) = 0.451) were derived to improve the clinical utility of the developed energy equation (MHDE-LBM). Using Bland-Altman plots, the MHDE over- and underpredicted mREE less often than the MSJE. Predictive models (MHDE) including selective demographic, clinical, and anthropometric data explained less than 50% variance of mREE but had better precision in determining energy requirements for MHD patients when compared with MSJE. Further research is necessary to improve predictive models of mREE in the MHD population and to test its validity and clinical application. Copyright © 2014 National Kidney Foundation, Inc. Published by Elsevier Inc. All rights reserved.
Chen, Wei-Hsin; Hsu, Hung-Jen; Kumar, Gopalakrishnan; Budzianowski, Wojciech M; Ong, Hwai Chyuan
2017-12-01
This study focuses on the biochar formation and torrefaction performance of sugarcane bagasse, and they are predicted using the bilinear interpolation (BLI), inverse distance weighting (IDW) interpolation, and regression analysis. It is found that the biomass torrefied at 275°C for 60min or at 300°C for 30min or longer is appropriate to produce biochar as alternative fuel to coal with low carbon footprint, but the energy yield from the torrefaction at 300°C is too low. From the biochar yield, enhancement factor of HHV, and energy yield, the results suggest that the three methods are all feasible for predicting the performance, especially for the enhancement factor. The power parameter of unity in the IDW method provides the best predictions and the error is below 5%. The second order in regression analysis gives a more reasonable approach than the first order, and is recommended for the predictions. Copyright © 2017 Elsevier Ltd. All rights reserved.
Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling.
Edelman, Eric R; van Kuijk, Sander M J; Hamaekers, Ankie E W; de Korte, Marcel J M; van Merode, Godefridus G; Buhre, Wolfgang F F A
2017-01-01
For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related benefits.
Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling
Directory of Open Access Journals (Sweden)
Eric R. Edelman
2017-06-01
Full Text Available For efficient utilization of operating rooms (ORs, accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT. We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT. TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Jiang, Wei; Xu, Chao-Zhen; Jiang, Si-Zhi; Zhang, Tang-Duo; Wang, Shi-Zhen; Fang, Bai-Shan
2017-04-01
L-tert-Leucine (L-Tle) and its derivatives are extensively used as crucial building blocks for chiral auxiliaries, pharmaceutically active ingredients, and ligands. Combining with formate dehydrogenase (FDH) for regenerating the expensive coenzyme NADH, leucine dehydrogenase (LeuDH) is continually used for synthesizing L-Tle from α-keto acid. A multilevel factorial experimental design was executed for research of this system. In this work, an efficient optimization method for improving the productivity of L-Tle was developed. And the mathematical model between different fermentation conditions and L-Tle yield was also determined in the form of the equation by using uniform design and regression analysis. The multivariate regression equation was conveniently implemented in water, with a space time yield of 505.9 g L -1 day -1 and an enantiomeric excess value of >99 %. These results demonstrated that this method might become an ideal protocol for industrial production of chiral compounds and unnatural amino acids such as chiral drug intermediates.
A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.
Bersabé, Rosa; Rivas, Teresa
2010-05-01
The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.
Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding
de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.
2013-01-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228
Phase Space Prediction of Chaotic Time Series with Nu-Support Vector Machine Regression
International Nuclear Information System (INIS)
Ye Meiying; Wang Xiaodong
2005-01-01
A new class of support vector machine, nu-support vector machine, is discussed which can handle both classification and regression. We focus on nu-support vector machine regression and use it for phase space prediction of chaotic time series. The effectiveness of the method is demonstrated by applying it to the Henon map. This study also compares nu-support vector machine with back propagation (BP) networks in order to better evaluate the performance of the proposed methods. The experimental results show that the nu-support vector machine regression obtains lower root mean squared error than the BP networks and provides an accurate chaotic time series prediction. These results can be attributable to the fact that nu-support vector machine implements the structural risk minimization principle and this leads to better generalization than the BP networks.
Generic global regression models for growth prediction of Salmonella in ground pork and pork cuts
DEFF Research Database (Denmark)
Buschhardt, Tasja; Hansen, Tina Beck; Bahl, Martin Iain
2017-01-01
Introduction and Objectives Models for the prediction of bacterial growth in fresh pork are primarily developed using two-step regression (i.e. primary models followed by secondary models). These models are also generally based on experiments in liquids or ground meat and neglect surface growth....... It has been shown that one-step global regressions can result in more accurate models and that bacterial growth on intact surfaces can substantially differ from growth in liquid culture. Material and Methods We used a global-regression approach to develop predictive models for the growth of Salmonella....... One part of obtained logtransformed cell counts was used for model development and another for model validation. The Ratkowsky square root model and the relative lag time (RLT) model were integrated into the logistic model with delay. Fitted parameter estimates were compared to investigate the effect...
Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research
He, Lingjun; Levine, Richard A.; Fan, Juanjuan; Beemer, Joshua; Stronach, Jeanne
2018-01-01
In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of…
Genomic prediction based on data from three layer lines using non-linear regression models
Huang, H.; Windig, J.J.; Vereijken, A.; Calus, M.P.L.
2014-01-01
Background - Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. Methods - In an attempt to alleviate
Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M
2014-12-01
Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.
Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard
2010-01-01
The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
Directory of Open Access Journals (Sweden)
Anita Nordenson
2010-09-01
Full Text Available Anita Nordenson2, Anne Marie Grönberg1,2, Lena Hulthén1, Sven Larsson2, Frode Slinde11Department of Clinical Nutrition, Sahlgrenska Academy at University of Gothenburg, Göteborg, Sweden; 2Department of Internal Medicine/Respiratory Medicine and Allergology, Sahlgrenska Academy at University of Gothenburg, SwedenAbstract: Malnutrition is a serious condition in chronic obstructive pulmonary disease (COPD. Successful dietary intervention calls for calculations of resting metabolic rate (RMR. One disease-specific prediction equation for RMR exists based on mainly male patients. To construct a disease-specific equation for RMR based on measurements in underweight or weight-losing women and men with COPD, RMR was measured by indirect calorimetry in 30 women and 11 men with a diagnosis of COPD and body mass index <21 kg/m2. The following variables, possibly influencing RMR were measured: length, weight, middle upper arm circumference, triceps skinfold, body composition by dual energy x-ray absorptiometry and bioelectrical impedance, lung function, and markers of inflammation. Relations between RMR and measured variables were studied using univariate analysis according to Pearson. Gender and variables that were associated with RMR with a P value <0.15 were included in a forward multiple regression analysis. The best-fit multiple regression equation included only fat-free mass (FFM: RMR (kJ/day = 1856 + 76.0 FFM (kg. To conclude, FFM is the dominating factor influencing RMR. The developed equation can be used for prediction of RMR in underweight COPD patients.Keywords: pulmonary disease, chronic obstructive, basal metabolic rate, malnutrition, body composition
Wang, Shan-Shan; Zhang, Yu-Qi; Chen, Shu-Bao; Huang, Guo-Ying; Zhang, Hong-Yan; Zhang, Zhi-Fang; Wu, Lan-Ping; Hong, Wen-Jing; Shen, Rong; Liu, Yi-Qing; Zhu, Jun-Xue
2017-06-01
Clinical decision making in children with congenital and acquired heart disease relies on measurements of cardiac structures using two-dimensional echocardiography. We aimed to establish z-score regression equations for right heart structures in healthy Chinese Han children. Two-dimensional and M-mode echocardiography was performed in 515 patients. We measured the dimensions of the pulmonary valve annulus (PVA), main pulmonary artery (MPA), left pulmonary artery (LPA), right pulmonary artery (RPA), right ventricular outflow tract at end-diastole (RVOTd) and at end-systole (RVOTs), tricuspid valve annulus (TVA), right ventricular inflow tract at end-diastole (RVIDd) and at end-systole (RVIDs), and right atrium (RA). Regression analyses were conducted to relate the measurements of right heart structures to 4body surface area (BSA). Right ventricular outflow-tract fractional shortening (RVOTFS) was also calculated. Several models were used, and the best model was chosen to establish a z-score calculator. PVA, MPA, LPA, RPA, RVOTd, RVOTs, TVA, RVIDd, RVIDs, and RA (R 2 = 0.786, 0.705, 0.728, 0.701, 0.706, 0.824, 0.804, 0.663, 0.626, and 0.793, respectively) had a cubic polynomial relationship with BSA; specifically, measurement (M) = β0 + β1 × BSA + β2 × BSA 2 + β3 × BSA. 3 RVOTFS (0.28 ± 0.02) fell within a narrow range (0.12-0.51). Our results provide reference values for z scores and regression equations for right heart structures in Han Chinese children. These data may help interpreting the routine clinical measurement of right heart structures in children with congenital or acquired heart disease. © 2016 Wiley Periodicals, Inc. J Clin Ultrasound 45:293-303, 2017. © 2017 Wiley Periodicals, Inc.
International Nuclear Information System (INIS)
Riaz, Nadeem; Wiersma, Rodney; Mao Weihua; Xing Lei; Shanker, Piyush; Gudmundsson, Olafur; Widrow, Bernard
2009-01-01
Intra-fraction tumor tracking methods can improve radiation delivery during radiotherapy sessions. Image acquisition for tumor tracking and subsequent adjustment of the treatment beam with gating or beam tracking introduces time latency and necessitates predicting the future position of the tumor. This study evaluates the use of multi-dimensional linear adaptive filters and support vector regression to predict the motion of lung tumors tracked at 30 Hz. We expand on the prior work of other groups who have looked at adaptive filters by using a general framework of a multiple-input single-output (MISO) adaptive system that uses multiple correlated signals to predict the motion of a tumor. We compare the performance of these two novel methods to conventional methods like linear regression and single-input, single-output adaptive filters. At 400 ms latency the average root-mean-square-errors (RMSEs) for the 14 treatment sessions studied using no prediction, linear regression, single-output adaptive filter, MISO and support vector regression are 2.58, 1.60, 1.58, 1.71 and 1.26 mm, respectively. At 1 s, the RMSEs are 4.40, 2.61, 3.34, 2.66 and 1.93 mm, respectively. We find that support vector regression most accurately predicts the future tumor position of the methods studied and can provide a RMSE of less than 2 mm at 1 s latency. Also, a multi-dimensional adaptive filter framework provides improved performance over single-dimension adaptive filters. Work is underway to combine these two frameworks to improve performance.
Suzuki, Makoto; Sugimura, Yuko; Yamada, Sumio; Omori, Yoshitsugu; Miyamoto, Masaaki; Yamamoto, Jun-ichi
2013-01-01
Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm and linear of time. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R(2) = 0.676, PLogarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.
Prediction equation for estimating total daily energy requirements of special operations personnel.
Barringer, N D; Pasiakos, S M; McClung, H L; Crombie, A P; Margolis, L M
2018-01-01
Special Operations Forces (SOF) engage in a variety of military tasks with many producing high energy expenditures, leading to undesired energy deficits and loss of body mass. Therefore, the ability to accurately estimate daily energy requirements would be useful for accurate logistical planning. Generate a predictive equation estimating energy requirements of SOF. Retrospective analysis of data collected from SOF personnel engaged in 12 different SOF training scenarios. Energy expenditure and total body water were determined using the doubly-labeled water technique. Physical activity level was determined as daily energy expenditure divided by resting metabolic rate. Physical activity level was broken into quartiles (0 = mission prep, 1 = common warrior tasks, 2 = battle drills, 3 = specialized intense activity) to generate a physical activity factor (PAF). Regression analysis was used to construct two predictive equations (Model A; body mass and PAF, Model B; fat-free mass and PAF) estimating daily energy expenditures. Average measured energy expenditure during SOF training was 4468 (range: 3700 to 6300) Kcal·d- 1 . Regression analysis revealed that physical activity level ( r = 0.91; P plan appropriate feeding regimens to meet SOF nutritional requirements across their mission profile.
Prediction of tissue-specific cis-regulatory modules using Bayesian networks and regression trees
Directory of Open Access Journals (Sweden)
Chen Xiaoyu
2007-12-01
Full Text Available Abstract Background In vertebrates, a large part of gene transcriptional regulation is operated by cis-regulatory modules. These modules are believed to be regulating much of the tissue-specificity of gene expression. Results We develop a Bayesian network approach for identifying cis-regulatory modules likely to regulate tissue-specific expression. The network integrates predicted transcription factor binding site information, transcription factor expression data, and target gene expression data. At its core is a regression tree modeling the effect of combinations of transcription factors bound to a module. A new unsupervised EM-like algorithm is developed to learn the parameters of the network, including the regression tree structure. Conclusion Our approach is shown to accurately identify known human liver and erythroid-specific modules. When applied to the prediction of tissue-specific modules in 10 different tissues, the network predicts a number of important transcription factor combinations whose concerted binding is associated to specific expression.
Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat
2015-01-01
Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical AbstractDecision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
Application of General Regression Neural Network to the Prediction of LOD Change
Zhang, Xiao-Hong; Wang, Qi-Jie; Zhu, Jian-Jun; Zhang, Hao
2012-01-01
Traditional methods for predicting the change in length of day (LOD change) are mainly based on some linear models, such as the least square model and autoregression model, etc. However, the LOD change comprises complicated non-linear factors and the prediction effect of the linear models is always not so ideal. Thus, a kind of non-linear neural network — general regression neural network (GRNN) model is tried to make the prediction of the LOD change and the result is compared with the predicted results obtained by taking advantage of the BP (back propagation) neural network model and other models. The comparison result shows that the application of the GRNN to the prediction of the LOD change is highly effective and feasible.
Bry, X; Verron, T; Cazes, P
2009-05-29
In this work, we consider chemical and physical variable groups describing a common set of observations (cigarettes). One of the groups, minor smoke compounds (minSC), is assumed to depend on the others (minSC predictors). PLS regression (PLSR) of m inSC on the set of all predictors appears not to lead to a satisfactory analytic model, because it does not take into account the expert's knowledge. PLS path modeling (PLSPM) does not use the multidimensional structure of predictor groups. Indeed, the expert needs to separate the influence of several pre-designed predictor groups on minSC, in order to see what dimensions this influence involves. To meet these needs, we consider a multi-group component-regression model, and propose a method to extract from each group several strong uncorrelated components that fit the model. Estimation is based on a global multiple covariance criterion, used in combination with an appropriate nesting approach. Compared to PLSR and PLSPM, the structural equation exploratory regression (SEER) we propose fully uses predictor group complementarity, both conceptually and statistically, to predict the dependent group.
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict what patients are at risk to be readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex to understand among the hospital practitioners. Explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaning decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of
Spontaneous regression of retinopathy of prematurity:incidence and predictive factors
Directory of Open Access Journals (Sweden)
Rui-Hong Ju
2013-08-01
Full Text Available AIM:To evaluate the incidence of spontaneous regression of changes in the retina and vitreous in active stage of retinopathy of prematurity(ROP and identify the possible relative factors during the regression.METHODS: This was a retrospective, hospital-based study. The study consisted of 39 premature infants with mild ROP showed spontaneous regression (Group A and 17 with severe ROP who had been treated before naturally involuting (Group B from August 2008 through May 2011. Data on gender, single or multiple pregnancy, gestational age, birth weight, weight gain from birth to the sixth week of life, use of oxygen in mechanical ventilation, total duration of oxygen inhalation, surfactant given or not, need for and times of blood transfusion, 1,5,10-min Apgar score, presence of bacterial or fungal or combined infection, hyaline membrane disease (HMD, patent ductus arteriosus (PDA, duration of stay in the neonatal intensive care unit (NICU and duration of ROP were recorded.RESULTS: The incidence of spontaneous regression of ROP with stage 1 was 86.7%, and with stage 2, stage 3 was 57.1%, 5.9%, respectively. With changes in zone Ⅲ regression was detected 100%, in zoneⅡ 46.2% and in zoneⅠ 0%. The mean duration of ROP in spontaneous regression group was 5.65±3.14 weeks, lower than that of the treated ROP group (7.34±4.33 weeks, but this difference was not statistically significant (P=0.201. GA, 1min Apgar score, 5min Apgar score, duration of NICU stay, postnatal age of initial screening and oxygen therapy longer than 10 days were significant predictive factors for the spontaneous regression of ROP (P＜0.05. Retinal hemorrhage was the only independent predictive factor the spontaneous regression of ROP (OR 0.030, 95%CI 0.001-0.775, P=0.035.CONCLUSION:This study showed most stage 1 and 2 ROP and changes in zone Ⅲ can spontaneously regression in the end. Retinal hemorrhage is weakly inversely associated with the spontaneous regression.
Directory of Open Access Journals (Sweden)
Kritski Afrânio
2006-02-01
Full Text Available Abstract Background Smear negative pulmonary tuberculosis (SNPT accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods The study enrolled 551 patients with clinical-radiological suspicion of SNPT, in Rio de Janeiro, Brazil. The original data was divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used for constructing logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operator characteristic curve, sensitivity, and specificity were used to evaluate the model's performance in both generation and validation samples. Results It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion The results suggest that those models might be useful as screening tools for estimating the risk of SNPT, optimizing the utilization of more expensive tests, and avoiding costs of unnecessary anti-tuberculosis treatment. Those models might be cost-effective tools in a health care network with hierarchical distribution of scarce resources.
Directory of Open Access Journals (Sweden)
Massoud Tabesh
2011-07-01
Full Text Available Optimum operation of water distribution networks is one of the priorities of sustainable development of water resources, considering the issues of increasing efficiency and decreasing the water losses. One of the key subjects in optimum operational management of water distribution systems is preparing rehabilitation and replacement schemes, prediction of pipes break rate and evaluation of their reliability. Several approaches have been presented in recent years regarding prediction of pipe failure rates which each one requires especial data sets. Deterministic models based on age and deterministic multi variables and stochastic group modeling are examples of the solutions which relate pipe break rates to parameters like age, material and diameters. In this paper besides the mentioned parameters, more factors such as pipe depth and hydraulic pressures are considered as well. Then using multi variable regression method, intelligent approaches (Artificial neural network and neuro fuzzy models and Evolutionary polynomial Regression method (EPR pipe burst rate are predicted. To evaluate the results of different approaches, a case study is carried out in a part ofMashhadwater distribution network. The results show the capability and advantages of ANN and EPR methods to predict pipe break rates, in comparison with neuro fuzzy and multi-variable regression methods.
7 CFR 610.12 - Equations for predicting soil loss due to water erosion.
2010-01-01
.... (a) The equation for predicting soil loss due to erosion for both the USLE and the RUSLE is A = R × K... 22161.) (b) The factors in the USLE equation are: (1) A is the estimation of average annual soil loss in... 7 Agriculture 6 2010-01-01 2010-01-01 false Equations for predicting soil loss due to water...
Joshi, Shuchi N; Srinivas, Nuggehally R; Parmar, Deven V
2018-03-01
Our aim was to develop and validate the extrapolative performance of a regression model using a limited sampling strategy for accurate estimation of the area under the plasma concentration versus time curve for saroglitazar. Healthy subject pharmacokinetic data from a well-powered food-effect study (fasted vs fed treatments; n = 50) was used in this work. The first 25 subjects' serial plasma concentration data up to 72 hours and corresponding AUC 0-t (ie, 72 hours) from the fasting group comprised a training dataset to develop the limited sampling model. The internal datasets for prediction included the remaining 25 subjects from the fasting group and all 50 subjects from the fed condition of the same study. The external datasets included pharmacokinetic data for saroglitazar from previous single-dose clinical studies. Limited sampling models were composed of 1-, 2-, and 3-concentration-time points' correlation with AUC 0-t of saroglitazar. Only models with regression coefficients (R 2 ) >0.90 were screened for further evaluation. The best R 2 model was validated for its utility based on mean prediction error, mean absolute prediction error, and root mean square error. Both correlations between predicted and observed AUC 0-t of saroglitazar and verification of precision and bias using Bland-Altman plot were carried out. None of the evaluated 1- and 2-concentration-time points models achieved R 2 > 0.90. Among the various 3-concentration-time points models, only 4 equations passed the predefined criterion of R 2 > 0.90. Limited sampling models with time points 0.5, 2, and 8 hours (R 2 = 0.9323) and 0.75, 2, and 8 hours (R 2 = 0.9375) were validated. Mean prediction error, mean absolute prediction error, and root mean square error were prediction of saroglitazar. The same models, when applied to the AUC 0-t prediction of saroglitazar sulfoxide, showed mean prediction error, mean absolute prediction error, and root mean square error model predicts the exposure of
Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris
2016-09-01
Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have
Equations of prediction for abdominal fat in brown egg-laying hens fed different diets.
Souza, C; Jaimes, J J B; Gewehr, C E
2017-06-01
The objective was to use noninvasive measurements to formulate equations for predicting the abdominal fat weight of laying hens in a noninvasive manner. Hens were fed with different diets; the external body measurements of birds were used as regressors. We used 288 Hy-Line Brown laying hens, distributed in a completely randomized design in a factorial arrangement, submitted for 16 wk to 2 metabolizable energy levels (2,550 and 2,800 kcal/kg) and 3 levels of crude protein in the diet (150, 160, and 170 g/kg), totaling 6 treatments, with 48 hens each. Sixteen hens per treatment of 92 wk age were utilized to evaluate body weight, bird length, tarsus and sternum, greater and lesser diameter of the tarsus, and abdominal fat weight, after slaughter. The equations were obtained by using measures evaluated with regressors through simple and multiple linear regression with the stepwise method of indirect elimination (backward), with P abdominal fat as predicted by the equations and observed values for each bird were subjected to Pearson's correlation analysis. The equations generated by energy levels showed coefficients of determination of 0.50 and 0.74 for 2,800 and 2,550 kcal/kg of metabolizable energy, respectively, with correlation coefficients of 0.71 and 0.84, with a highly significant correlation between the calculated and observed values of abdominal fat. For protein levels of 150, 160, and 170 g/kg in the diet, it was possible to obtain coefficients of determination of 0.75, 0.57, and 0.61, with correlation coefficients of 0.86, 0.75, and 0.78, respectively. Regarding the general equation for predicting abdominal fat weight, the coefficient of determination was 0.62; the correlation coefficient was 0.79. The equations for predicting abdominal fat weight in laying hens, based on external measurements of the birds, showed positive coefficients of determination and correlation coefficients, thus allowing researchers to determine abdominal fat weight in vivo.
A computational approach to compare regression modelling strategies in prediction research.
Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H
2016-08-25
It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
DEFF Research Database (Denmark)
Gebetsberger, Manuel; Messner, Jakob W.; Mayr, Georg J.
2017-01-01
functions for the optimization of regression coefficients for the scale parameter. These three refinements are tested for 10 stations in a small area of the European Alps for lead times from +24 to +144 h and accumulation periods of 24 and 6 h. Together, they improve probabilistic forecasts...... to obtain automatically corrected weather forecasts. This study applies the nonhomogenous regression framework as a state-of-the-art ensemble postprocessing technique to predict a full forecast distribution and improves its forecast performance with three statistical refinements. First of all, a novel split...... for precipitation amounts as well as the probability of precipitation events over the default postprocessing method. The improvements are largest for the shorter accumulation periods and shorter lead times, where the information of unanimous ensemble predictions is more important....
Fuzzy Regression Prediction and Application Based on Multi-Dimensional Factors of Freight Volume
Xiao, Mengting; Li, Cheng
2018-01-01
Based on the reality of the development of air cargo, the multi-dimensional fuzzy regression method is used to determine the influencing factors, and the three most important influencing factors of GDP, total fixed assets investment and regular flight route mileage are determined. The system’s viewpoints and analogy methods, the use of fuzzy numbers and multiple regression methods to predict the civil aviation cargo volume. In comparison with the 13th Five-Year Plan for China’s Civil Aviation Development (2016-2020), it is proved that this method can effectively improve the accuracy of forecasting and reduce the risk of forecasting. It is proved that this model predicts civil aviation freight volume of the feasibility, has a high practical significance and practical operation.
Ling, Ru; Liu, Jiawang
2011-12-01
To construct prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires,and multiple linear regression analysis with 20 quotas selected by literature view was done. Independent variables in the multiple linear regression model on medical personnels in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the the population of aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory and fitting, and may be used for short- and mid-term forecasting.
ENHANCED PREDICTION OF STUDENT DROPOUTS USING FUZZY INFERENCE SYSTEM AND LOGISTIC REGRESSION
Directory of Open Access Journals (Sweden)
A. Saranya
2016-01-01
Full Text Available Predicting college and school dropouts is a major problem in educational system and has complicated challenge due to data imbalance and multi dimensionality, which can affect the low performance of students. In this paper, we have collected different database from various colleges, among these 500 best real attributes are identified in order to identify the factor that affecting dropout students using neural based classification algorithm and different mining technique are implemented for data processing. We also propose a Dropout Prediction Algorithm (DPA using fuzzy logic and Logistic Regression based inference system because the weighted average will improve the performance of whole system. We are experimented our proposed work with all other classification systems and documented as the best outcomes. The aggregated data is given to the decision trees for better dropout prediction. The accuracy of overall system 98.6% it shows the proposed work depicts efficient prediction.
An Application to the Prediction of LOD Change Based on General Regression Neural Network
Zhang, X. H.; Wang, Q. J.; Zhu, J. J.; Zhang, H.
2011-07-01
Traditional prediction of the LOD (length of day) change was based on linear models, such as the least square model and the autoregressive technique, etc. Due to the complex non-linear features of the LOD variation, the performances of the linear model predictors are not fully satisfactory. This paper applies a non-linear neural network - general regression neural network (GRNN) model to forecast the LOD change, and the results are analyzed and compared with those obtained with the back propagation neural network and other models. The comparison shows that the performance of the GRNN model in the prediction of the LOD change is efficient and feasible.
MULTIPLE LINEAR REGRESSION ANALYSIS FOR PREDICTION OF BOILER LOSSES AND BOILER EFFICIENCY
Chayalakshmi C.L
2018-01-01
MULTIPLE LINEAR REGRESSION ANALYSIS FOR PREDICTION OF BOILER LOSSES AND BOILER EFFICIENCY ABSTRACT Calculation of boiler efficiency is essential if its parameters need to be controlled for either maintaining or enhancing its efficiency. But determination of boiler efficiency using conventional method is time consuming and very expensive. Hence, it is not recommended to find boiler efficiency frequently. The work presented in this paper deals with establishing the statistical mo...
Predicting Student Success on the Texas Chemistry STAAR Test: A Logistic Regression Analysis
Johnson, William L.; Johnson, Annabel M.; Johnson, Jared
2012-01-01
Background: The context is the new Texas STAAR end-of-course testing program. Purpose: The authors developed a logistic regression model to predict who would pass-or-fail the new Texas chemistry STAAR end-of-course exam. Setting: Robert E. Lee High School (5A) with an enrollment of 2700 students, Tyler, Texas. Date of the study was the 2011-2012…
Blood glucose level prediction based on support vector regression using mobile platforms.
Reymann, Maximilian P; Dorschky, Eva; Groh, Benjamin H; Martindale, Christine; Blank, Peter; Eskofier, Bjoern M
2016-08-01
The correct treatment of diabetes is vital to a patient's health: Staying within defined blood glucose levels prevents dangerous short- and long-term effects on the body. Mobile devices informing patients about their future blood glucose levels could enable them to take counter-measures to prevent hypo or hyper periods. Previous work addressed this challenge by predicting the blood glucose levels using regression models. However, these approaches required a physiological model, representing the human body's response to insulin and glucose intake, or are not directly applicable to mobile platforms (smart phones, tablets). In this paper, we propose an algorithm for mobile platforms to predict blood glucose levels without the need for a physiological model. Using an online software simulator program, we trained a Support Vector Regression (SVR) model and exported the parameter settings to our mobile platform. The prediction accuracy of our mobile platform was evaluated with pre-recorded data of a type 1 diabetes patient. The blood glucose level was predicted with an error of 19 % compared to the true value. Considering the permitted error of commercially used devices of 15 %, our algorithm is the basis for further development of mobile prediction algorithms.
Using support vector regression to predict PM10 and PM2.5
International Nuclear Information System (INIS)
Weizhen, Hou; Zhengqiang, Li; Yuhuan, Zhang; Hua, Xu; Ying, Zhang; Kaitao, Li; Donghui, Li; Peng, Wei; Yan, Ma
2014-01-01
Support vector machine (SVM), as a novel and powerful machine learning tool, can be used for the prediction of PM 10 and PM 2.5 (particulate matter less or equal than 10 and 2.5 micrometer) in the atmosphere. This paper describes the development of a successive over relaxation support vector regress (SOR-SVR) model for the PM 10 and PM 2.5 prediction, based on the daily average aerosol optical depth (AOD) and meteorological parameters (atmospheric pressure, relative humidity, air temperature, wind speed), which were all measured in Beijing during the year of 2010–2012. The Gaussian kernel function, as well as the k-fold crosses validation and grid search method, are used in SVR model to obtain the optimal parameters to get a better generalization capability. The result shows that predicted values by the SOR-SVR model agree well with the actual data and have a good generalization ability to predict PM 10 and PM 2.5 . In addition, AOD plays an important role in predicting particulate matter with SVR model, which should be included in the prediction model. If only considering the meteorological parameters and eliminating AOD from the SVR model, the prediction results of predict particulate matter will be not satisfying
Zeng, Fangfang; Li, Zhongtao; Yu, Xiaoling; Zhou, Linuo
2013-01-01
Background This study aimed to develop the artificial neural network (ANN) and multivariable logistic regression (LR) analyses for prediction modeling of cardiovascular autonomic (CA) dysfunction in the general population, and compare the prediction models using the two approaches. Methods and Materials We analyzed a previous dataset based on a Chinese population sample consisting of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN and LR analysis, and were tested in the validation set. Performances of these prediction models were then compared. Results Univariate analysis indicated that 14 risk factors showed statistically significant association with the prevalence of CA dysfunction (P<0.05). The mean area under the receiver-operating curve was 0.758 (95% CI 0.724–0.793) for LR and 0.762 (95% CI 0.732–0.793) for ANN analysis, but noninferiority result was found (P<0.001). The similar results were found in comparisons of sensitivity, specificity, and predictive values in the prediction models between the LR and ANN analyses. Conclusion The prediction models for CA dysfunction were developed using ANN and LR. ANN and LR are two effective tools for developing prediction models based on our dataset. PMID:23940593
Directory of Open Access Journals (Sweden)
Lassi Rieppo
Full Text Available Fourier Transform Infrared (FT-IR spectroscopic imaging has been earlier applied for the spatial estimation of the collagen and the proteoglycan (PG contents of articular cartilage (AC. However, earlier studies have been limited to the use of univariate analysis techniques. Current analysis methods lack the needed specificity for collagen and PGs. The aim of the present study was to evaluate the suitability of partial least squares regression (PLSR and principal component regression (PCR methods for the analysis of the PG content of AC. Multivariate regression models were compared with earlier used univariate methods and tested with a sample material consisting of healthy and enzymatically degraded steer AC. Chondroitinase ABC enzyme was used to increase the variation in PG content levels as compared to intact AC. Digital densitometric measurements of Safranin O-stained sections provided the reference for PG content. The results showed that multivariate regression models predict PG content of AC significantly better than earlier used absorbance spectrum (i.e. the area of carbohydrate region with or without amide I normalization or second derivative spectrum univariate parameters. Increased molecular specificity favours the use of multivariate regression models, but they require more knowledge of chemometric analysis and extended laboratory resources for gathering reference data for establishing the models. When true molecular specificity is required, the multivariate models should be used.
ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.
Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won
2016-07-01
In our previous study, our input data set consisted of 78 rats, the blood loss in percent as a dependent variable, and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and new index (lactate concentration/perfusion)). The machine learning methods for multicategory classification were applied to a rat model in acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage in our previous study. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying ATLS hypovolaemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of support vector regression and MLR models with relative values by predicting blood loss in percent were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% of the direct multicategory classification using the support vector machine one-versus-one model in our previous study for the same validation data set. Moreover, the simple MLR models with both absolute and relative values could provide possibility of the future clinical decision support system for ATLS classification. The perfusion index and new index were more appropriate with relative changes than absolute values.
Adachi, Daiki; Nishiguchi, Shu; Fukutani, Naoto; Hotta, Takayuki; Tashiro, Yuto; Morino, Saori; Shirooka, Hidehiko; Nozaki, Yuma; Hirata, Hinako; Yamaguchi, Moe; Yorozu, Ayanori; Takahashi, Masaki; Aoyama, Tomoki
2017-05-01
The purpose of this study was to investigate which spatial and temporal parameters of the Timed Up and Go (TUG) test are associated with motor function in elderly individuals. This study included 99 community-dwelling women aged 72.9 ± 6.3 years. Step length, step width, single support time, variability of the aforementioned parameters, gait velocity, cadence, reaction time from starting signal to first step, and minimum distance between the foot and a marker placed to 3 in front of the chair were measured using our analysis system. The 10-m walk test, five times sit-to-stand (FTSTS) test, and one-leg standing (OLS) test were used to assess motor function. Stepwise multivariate linear regression analysis was used to determine which TUG test parameters were associated with each motor function test. Finally, we calculated a predictive model for each motor function test using each regression coefficient. In stepwise linear regression analysis, step length and cadence were significantly associated with the 10-m walk test, FTSTS and OLS test. Reaction time was associated with the FTSTS test, and step width was associated with the OLS test. Each predictive model showed a strong correlation with the 10-m walk test and OLS test (P motor function test. Moreover, the TUG test time regarded as the lower extremity function and mobility has strong predictive ability in each motor function test. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.
Ridge regression for predicting elastic moduli and hardness of calcium aluminosilicate glasses
Deng, Yifan; Zeng, Huidan; Jiang, Yejia; Chen, Guorong; Chen, Jianding; Sun, Luyi
2018-03-01
It is of great significance to design glasses with satisfactory mechanical properties predictively through modeling. Among various modeling methods, data-driven modeling is such a reliable approach that can dramatically shorten research duration, cut research cost and accelerate the development of glass materials. In this work, the ridge regression (RR) analysis was used to construct regression models for predicting the compositional dependence of CaO-Al2O3-SiO2 glass elastic moduli (Shear, Bulk, and Young’s moduli) and hardness based on the ternary diagram of the compositions. The property prediction over a large glass composition space was accomplished with known experimental data of various compositions in the literature, and the simulated results are in good agreement with the measured ones. This regression model can serve as a facile and effective tool for studying the relationship between the compositions and the property, enabling high-efficient design of glasses to meet the requirements for specific elasticity and hardness.
Application of Soft Computing Techniques and Multiple Regression Models for CBR prediction of Soils
Directory of Open Access Journals (Sweden)
Fatimah Khaleel Ibrahim
2017-08-01
Full Text Available The techniques of soft computing technique such as Artificial Neutral Network (ANN have improved the predicting capability and have actually discovered application in Geotechnical engineering. The aim of this research is to utilize the soft computing technique and Multiple Regression Models (MLR for forecasting the California bearing ratio CBR( of soil from its index properties. The indicator of CBR for soil could be predicted from various soils characterizing parameters with the assist of MLR and ANN methods. The data base that collected from the laboratory by conducting tests on 86 soil samples that gathered from different projects in Basrah districts. Data gained from the experimental result were used in the regression models and soft computing techniques by using artificial neural network. The liquid limit, plastic index , modified compaction test and the CBR test have been determined. In this work, different ANN and MLR models were formulated with the different collection of inputs to be able to recognize their significance in the prediction of CBR. The strengths of the models that were developed been examined in terms of regression coefficient (R2, relative error (RE% and mean square error (MSE values. From the results of this paper, it absolutely was noticed that all the proposed ANN models perform better than that of MLR model. In a specific ANN model with all input parameters reveals better outcomes than other ANN models.
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
Comparison of equations for predicting energy expenditure from accelerometer counts in children
DEFF Research Database (Denmark)
Nilsson, A; Brage, S; Riddoch, C
2008-01-01
calorimeter-based (CAL) equation (mixture of activities). Predicted physical activity energy expenditure (PAEE) was the main outcome variable. In comparison with DLW-predicted PAEE, both laboratory-derived equations significantly (PPAEE by 17% and 83%, respectively, when based on a 24-h...... prediction, while the TM equation significantly (PPAEE by 46%, when based on awake time only. In contrast, the CAL equation agreed better with the DLW equation under the awake time assumption. Predicted PAEE differ substantially between equations, depending on time-frame assumptions......, and interpretations of average levels of PAEE in children from available equations should be made with caution. Further development of equations applicable to free-living scenarios is needed....
Genome-wide prediction of discrete traits using bayesian regressions and machine learning
Directory of Open Access Journals (Sweden)
Forni Selma
2011-02-01
Full Text Available Abstract Background Genomic selection has gained much attention and the main goal is to increase the predictive accuracy and the genetic gain in livestock using dense marker information. Most methods dealing with the large p (number of covariates small n (number of observations problem have dealt only with continuous traits, but there are many important traits in livestock that are recorded in a discrete fashion (e.g. pregnancy outcome, disease resistance. It is necessary to evaluate alternatives to analyze discrete traits in a genome-wide prediction context. Methods This study shows two threshold versions of Bayesian regressions (Bayes A and Bayesian LASSO and two machine learning algorithms (boosting and random forest to analyze discrete traits in a genome-wide prediction context. These methods were evaluated using simulated and field data to predict yet-to-be observed records. Performances were compared based on the models' predictive ability. Results The simulation showed that machine learning had some advantages over Bayesian regressions when a small number of QTL regulated the trait under pure additivity. However, differences were small and disappeared with a large number of QTL. Bayesian threshold LASSO and boosting achieved the highest accuracies, whereas Random Forest presented the highest classification performance. Random Forest was the most consistent method in detecting resistant and susceptible animals, phi correlation was up to 81% greater than Bayesian regressions. Random Forest outperformed other methods in correctly classifying resistant and susceptible animals in the two pure swine lines evaluated. Boosting and Bayes A were more accurate with crossbred data. Conclusions The results of this study suggest that the best method for genome-wide prediction may depend on the genetic basis of the population analyzed. All methods were less accurate at correctly classifying intermediate animals than extreme animals. Among the different
A review of a priori regression models for warfarin maintenance dose prediction.
Directory of Open Access Journals (Sweden)
Ben Francis
Full Text Available A number of a priori warfarin dosing algorithms, derived using linear regression methods, have been proposed. Although these dosing algorithms may have been validated using patients derived from the same centre, rarely have they been validated using a patient cohort recruited from another centre. In order to undertake external validation, two cohorts were utilised. One cohort formed by patients from a prospective trial and the second formed by patients in the control arm of the EU-PACT trial. Of these, 641 patients were identified as having attained stable dosing and formed the dataset used for validation. Predicted maintenance doses from six criterion fulfilling regression models were then compared to individual patient stable warfarin dose. Predictive ability was assessed with reference to several statistics including the R-square and mean absolute error. The six regression models explained different amounts of variability in the stable maintenance warfarin dose requirements of the patients in the two validation cohorts; adjusted R-squared values ranged from 24.2% to 68.6%. An overview of the summary statistics demonstrated that no one dosing algorithm could be considered optimal. The larger validation cohort from the prospective trial produced more consistent statistics across the six dosing algorithms. The study found that all the regression models performed worse in the validation cohort when compared to the derivation cohort. Further, there was little difference between regression models that contained pharmacogenetic coefficients and algorithms containing just non-pharmacogenetic coefficients. The inconsistency of results between the validation cohorts suggests that unaccounted population specific factors cause variability in dosing algorithm performance. Better methods for dosing that take into account inter- and intra-individual variability, at the initiation and maintenance phases of warfarin treatment, are needed.
A review of a priori regression models for warfarin maintenance dose prediction.
Francis, Ben; Lane, Steven; Pirmohamed, Munir; Jorgensen, Andrea
2014-01-01
A number of a priori warfarin dosing algorithms, derived using linear regression methods, have been proposed. Although these dosing algorithms may have been validated using patients derived from the same centre, rarely have they been validated using a patient cohort recruited from another centre. In order to undertake external validation, two cohorts were utilised. One cohort formed by patients from a prospective trial and the second formed by patients in the control arm of the EU-PACT trial. Of these, 641 patients were identified as having attained stable dosing and formed the dataset used for validation. Predicted maintenance doses from six criterion fulfilling regression models were then compared to individual patient stable warfarin dose. Predictive ability was assessed with reference to several statistics including the R-square and mean absolute error. The six regression models explained different amounts of variability in the stable maintenance warfarin dose requirements of the patients in the two validation cohorts; adjusted R-squared values ranged from 24.2% to 68.6%. An overview of the summary statistics demonstrated that no one dosing algorithm could be considered optimal. The larger validation cohort from the prospective trial produced more consistent statistics across the six dosing algorithms. The study found that all the regression models performed worse in the validation cohort when compared to the derivation cohort. Further, there was little difference between regression models that contained pharmacogenetic coefficients and algorithms containing just non-pharmacogenetic coefficients. The inconsistency of results between the validation cohorts suggests that unaccounted population specific factors cause variability in dosing algorithm performance. Better methods for dosing that take into account inter- and intra-individual variability, at the initiation and maintenance phases of warfarin treatment, are needed.
Prediction of Spirometric Forced Expiratory Volume (FEV1) Data Using Support Vector Regression
Kavitha, A.; Sujatha, C. M.; Ramakrishnan, S.
2010-01-01
In this work, prediction of forced expiratory volume in 1 second (FEV1) in pulmonary function test is carried out using the spirometer and support vector regression analysis. Pulmonary function data are measured with flow volume spirometer from volunteers (N=175) using a standard data acquisition protocol. The acquired data are then used to predict FEV1. Support vector machines with polynomial kernel function with four different orders were employed to predict the values of FEV1. The performance is evaluated by computing the average prediction accuracy for normal and abnormal cases. Results show that support vector machines are capable of predicting FEV1 in both normal and abnormal cases and the average prediction accuracy for normal subjects was higher than that of abnormal subjects. Accuracy in prediction was found to be high for a regularization constant of C=10. Since FEV1 is the most significant parameter in the analysis of spirometric data, it appears that this method of assessment is useful in diagnosing the pulmonary abnormalities with incomplete data and data with poor recording.
Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X
2016-09-01
The paper highlights the use of the logistic regression (LR) method in the construction of acceptable statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. Essentials accounting for a reliable model were all considered carefully. The model predictors were selected by stepwise forward discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics was done through the influence plot which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed.
Directory of Open Access Journals (Sweden)
Bita Najafian
2015-02-01
Full Text Available Background:Respiratory Distress syndrome is the most common respiratory disease in premature neonate and the most important cause of death among them. We aimed to investigate factors to predict successful or failure of INSURE method as a therapeutic method of RDS.Methods:In a cohort study,45 neonates with diagnosed RDS and birth weight lower than 1500g were included and they underwent INSURE followed by NCPAP(Nasal Continuous Positive Airway Pressure. The patients were divided into failure or successful groups and factors which can predict success of INSURE were investigated by logistic regression in SPSS 16th version.Results:29 and16 neonates were observed in successful and failure groups, respectively. Birth weight was the only variable with significant difference between two groups (P=0.002. Finally logistic regression test showed that birth weight is only predicting factor for success (P: 0.001, EXP[β]: 0.009, CI [95%]: 1.003-0.014 and mortality (P: 0.029, EXP[β]: 0.993, CI [95%]: 0.987-0.999 of neonates treated with INSURE method.Conclusion:Predicting factors which affect on success rate of INSURE can be useful for treating and reducing charge of neonate with RDS and the birth weight is one of the effective factor on INSURE Success in this study.
Directory of Open Access Journals (Sweden)
Bita Najafian
2015-02-01
Full Text Available Background:Respiratory Distress syndrome is the most common respiratory disease in premature neonate and the most important cause of death among them. We aimed to investigate factors to predict successful or failure of INSURE method as a therapeutic method of RDS. Methods:In a cohort study,45 neonates with diagnosed RDS and birth weight lower than 1500g were included and they underwent INSURE followed by NCPAP(Nasal Continuous Positive Airway Pressure. The patients were divided into failure or successful groups and factors which can predict success of INSURE were investigated by logistic regression in SPSS 16th version. Results:29 and16 neonates were observed in successful and failure groups, respectively. Birth weight was the only variable with significant difference between two groups (P=0.002. Finally logistic regression test showed that birth weight is only predicting factor for success (P: 0.001, EXP[β]: 0.009, CI [95%]: 1.003-0.014 and mortality (P: 0.029, EXP[β]: 0.993, CI [95%]: 0.987-0.999 of neonates treated with INSURE method. Conclusion:Predicting factors which affect on success rate of INSURE can be useful for treating and reducing charge of neonate with RDS and the birth weight is one of the effective factor on INSURE Success in this study.
Cunha, B C N; Belk, K E; Scanga, J A; LeValley, S B; Tatum, J D; Smith, G C
2004-07-01
This study was performed to validate previous equations and to develop and evaluate new regression equations for predicting lamb carcass fabrication yields using outputs from a lamb vision system-hot carcass component (LVS-HCC) and the lamb vision system-chilled carcass LM imaging component (LVS-CCC). Lamb carcasses (n = 149) were selected after slaughter, imaged hot using the LVS-HCC, and chilled for 24 to 48 h at -3 to 1 degrees C. Chilled carcasses yield grades (YG) were assigned on-line by USDA graders and by expert USDA grading supervisors with unlimited time and access to the carcasses. Before fabrication, carcasses were ribbed between the 12th and 13th ribs and imaged using the LVS-CCC. Carcasses were fabricated into bone-in subprimal/primal cuts. Yields calculated included 1) saleable meat yield (SMY); 2) subprimal yield (SPY); and 3) fat yield (FY). On-line (whole-number) USDA YG accounted for 59, 58, and 64%; expert (whole-number) USDA YG explained 59, 59, and 65%; and expert (nearest-tenth) USDA YG accounted for 60, 60, and 67% of the observed variation in SMY, SPY, and FY, respectively. The best prediction equation developed in this trial using LVS-HCC output and hot carcass weight as independent variables explained 68, 62, and 74% of the variation in SMY, SPY, and FY, respectively. Addition of output from LVS-CCC improved predictive accuracy of the equations; the combined output equations explained 72 and 66% of the variability in SMY and SPY, respectively. Accuracy and repeatability of measurement of LM area made with the LVS-CCC also was assessed, and results suggested that use of LVS-CCC provided reasonably accurate (R2 = 0.59) and highly repeatable (repeatability = 0.98) measurements of LM area. Compared with USDA YG, use of the dual-component lamb vision system to predict cut yields of lamb carcasses improved accuracy and precision, suggesting that this system could have an application as an objective means for pricing carcasses in a value
Directory of Open Access Journals (Sweden)
Victor Aredo
2017-01-01
Full Text Available The aim of this study was to build a model to predict the beef marbling using HSI and Partial Least Squares Regression (PLSR. Totally 58 samples of longissmus dorsi muscle were scanned by a HSI system (400 - 1000 nm in reflectance mode, using 44 samples to build t he PLSR model and 14 samples to model validation. The Japanese Beef Marbling Standard (BMS was used as reference by 15 middle - trained judges for the samples evaluation. The scores were assigned as continuous values and varied from 1.2 to 5.3 BMS. The PLSR model showed a high correlation coefficient in the prediction (r = 0.95, a low Standard Error of Calibration (SEC of 0.2 BMS score, and a low Standard Error of Prediction (SEP of 0.3 BMS score.
Support vector regression model based predictive control of water level of U-tube steam generators
Energy Technology Data Exchange (ETDEWEB)
Kavaklioglu, Kadir, E-mail: kadir.kavaklioglu@pau.edu.tr
2014-10-15
Highlights: • Water level of U-tube steam generators was controlled in a model predictive fashion. • Models for steam generator water level were built using support vector regression. • Cost function minimization for future optimal controls was performed by using the steepest descent method. • The results indicated the feasibility of the proposed method. - Abstract: A predictive control algorithm using support vector regression based models was proposed for controlling the water level of U-tube steam generators of pressurized water reactors. Steam generator data were obtained using a transfer function model of U-tube steam generators. Support vector regression based models were built using a time series type model structure for five different operating powers. Feedwater flow controls were calculated by minimizing a cost function that includes the level error, the feedwater change and the mismatch between feedwater and steam flow rates. Proposed algorithm was applied for a scenario consisting of a level setpoint change and a steam flow disturbance. The results showed that steam generator level can be controlled at all powers effectively by the proposed method.
Prediction-Oriented Marker Selection (PROMISE): With Application to High-Dimensional Regression.
Kim, Soyeon; Baladandayuthapani, Veerabhadran; Lee, J Jack
2017-06-01
In personalized medicine, biomarkers are used to select therapies with the highest likelihood of success based on an individual patient's biomarker/genomic profile. Two goals are to choose important biomarkers that accurately predict treatment outcomes and to cull unimportant biomarkers to reduce the cost of biological and clinical verifications. These goals are challenging due to the high dimensionality of genomic data. Variable selection methods based on penalized regression (e.g., the lasso and elastic net) have yielded promising results. However, selecting the right amount of penalization is critical to simultaneously achieving these two goals. Standard approaches based on cross-validation (CV) typically provide high prediction accuracy with high true positive rates but at the cost of too many false positives. Alternatively, stability selection (SS) controls the number of false positives, but at the cost of yielding too few true positives. To circumvent these issues, we propose prediction-oriented marker selection (PROMISE), which combines SS with CV to conflate the advantages of both methods. Our application of PROMISE with the lasso and elastic net in data analysis shows that, compared to CV, PROMISE produces sparse solutions, few false positives, and small type I + type II error, and maintains good prediction accuracy, with a marginal decrease in the true positive rates. Compared to SS, PROMISE offers better prediction accuracy and true positive rates. In summary, PROMISE can be applied in many fields to select regularization parameters when the goals are to minimize false positives and maximize prediction accuracy.
Directory of Open Access Journals (Sweden)
Marjan Čeh
2018-05-01
Full Text Available The goal of this study is to analyse the predictive performance of the random forest machine learning technique in comparison to commonly used hedonic models based on multiple regression for the prediction of apartment prices. A data set that includes 7407 records of apartment transactions referring to real estate sales from 2008–2013 in the city of Ljubljana, the capital of Slovenia, was used in order to test and compare the predictive performances of both models. Apparent challenges faced during modelling included (1 the non-linear nature of the prediction assignment task; (2 input data being based on transactions occurring over a period of great price changes in Ljubljana whereby a 28% decline was noted in six consecutive testing years; and (3 the complex urban form of the case study area. Available explanatory variables, organised as a Geographic Information Systems (GIS ready dataset, including the structural and age characteristics of the apartments as well as environmental and neighbourhood information were considered in the modelling procedure. All performance measures (R2 values, sales ratios, mean average percentage error (MAPE, coefficient of dispersion (COD revealed significantly better results for predictions obtained by the random forest method, which confirms the prospective of this machine learning technique on apartment price prediction.
Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G
2015-12-01
The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer's disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer's Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM.
Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.
2017-06-01
The recent popularity of the use of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods of generating and analyzing such data, but almost always relying in asymptotic distributions and in consequence being not adequate for small sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based in exact distributions this procedure may even be used in small sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with the results obtained from an adaptation of Reiters combination rule to multiply imputed synthetic datasets and an application to the 2000 Current Population Survey is discussed.
van Veen, S H C M; van Kleef, R C; van de Ven, W P M M; van Vliet, R C J A
2018-02-01
This study explores the predictive power of interaction terms between the risk adjusters in the Dutch risk equalization (RE) model of 2014. Due to the sophistication of this RE-model and the complexity of the associations in the dataset (N = ~16.7 million), there are theoretically more than a million interaction terms. We used regression tree modelling, which has been applied rarely within the field of RE, to identify interaction terms that statistically significantly explain variation in observed expenses that is not already explained by the risk adjusters in this RE-model. The interaction terms identified were used as additional risk adjusters in the RE-model. We found evidence that interaction terms can improve the prediction of expenses overall and for specific groups in the population. However, the prediction of expenses for some other selective groups may deteriorate. Thus, interactions can reduce financial incentives for risk selection for some groups but may increase them for others. Furthermore, because regression trees are not robust, additional criteria are needed to decide which interaction terms should be used in practice. These criteria could be the right incentive structure for risk selection and efficiency or the opinion of medical experts. Copyright © 2017 John Wiley & Sons, Ltd.
Prediction of hourly PM2.5 using a space-time support vector regression model
Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang
2018-05-01
Real-time air quality prediction has been an active field of research in atmospheric environmental science. The existing methods of machine learning are widely used to predict pollutant concentrations because of their enhanced ability to handle complex non-linear relationships. However, because pollutant concentration data, as typical geospatial data, also exhibit spatial heterogeneity and spatial dependence, they may violate the assumptions of independent and identically distributed random variables in most of the machine learning methods. As a result, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is executed to divide the study area into several homogeneous or quasi-homogeneous subareas. To handle spatial dependence, a Gauss vector weight function is then developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the results of the proposed model are superior to those of other methods.
Abad, Cesar C C; Barros, Ronaldo V; Bertuzzi, Romulo; Gagliardi, João F L; Lima-Silva, Adriano E; Lambert, Mike I; Pires, Flavio O
2016-06-01
The aim of this study was to verify the power of VO 2max , peak treadmill running velocity (PTV), and running economy (RE), unadjusted or allometrically adjusted, in predicting 10 km running performance. Eighteen male endurance runners performed: 1) an incremental test to exhaustion to determine VO 2max and PTV; 2) a constant submaximal run at 12 km·h -1 on an outdoor track for RE determination; and 3) a 10 km running race. Unadjusted (VO 2max , PTV and RE) and adjusted variables (VO 2max 0.72 , PTV 0.72 and RE 0.60 ) were investigated through independent multiple regression models to predict 10 km running race time. There were no significant correlations between 10 km running time and either the adjusted or unadjusted VO 2max . Significant correlations (p 0.84 and power > 0.88. The allometrically adjusted predictive model was composed of PTV 0.72 and RE 0.60 and explained 83% of the variance in 10 km running time with a standard error of the estimate (SEE) of 1.5 min. The unadjusted model composed of a single PVT accounted for 72% of the variance in 10 km running time (SEE of 1.9 min). Both regression models provided powerful estimates of 10 km running time; however, the unadjusted PTV may provide an uncomplicated estimation.
Directory of Open Access Journals (Sweden)
Guan Lian
2018-01-01
Full Text Available Accurate prediction of taxi-out time is significant precondition for improving the operationality of the departure process at an airport, as well as reducing the long taxi-out time, congestion, and excessive emission of greenhouse gases. Unfortunately, several of the traditional methods of predicting taxi-out time perform unsatisfactorily at congested airports. This paper describes and tests three of those conventional methods which include Generalized Linear Model, Softmax Regression Model, and Artificial Neural Network method and two improved Support Vector Regression (SVR approaches based on swarm intelligence algorithm optimization, which include Particle Swarm Optimization (PSO and Firefly Algorithm. In order to improve the global searching ability of Firefly Algorithm, adaptive step factor and Lévy flight are implemented simultaneously when updating the location function. Six factors are analysed, of which delay is identified as one significant factor in congested airports. Through a series of specific dynamic analyses, a case study of Beijing International Airport (PEK is tested with historical data. The performance measures show that the proposed two SVR approaches, especially the Improved Firefly Algorithm (IFA optimization-based SVR method, not only perform as the best modelling measures and accuracy rate compared with the representative forecast models, but also can achieve a better predictive performance when dealing with abnormal taxi-out time states.
Efficient Prediction of Low-Visibility Events at Airports Using Machine-Learning Regression
Cornejo-Bueno, L.; Casanova-Mateo, C.; Sanz-Justo, J.; Cerro-Prada, E.; Salcedo-Sanz, S.
2017-11-01
We address the prediction of low-visibility events at airports using machine-learning regression. The proposed model successfully forecasts low-visibility events in terms of the runway visual range at the airport, with the use of support-vector regression, neural networks (multi-layer perceptrons and extreme-learning machines) and Gaussian-process algorithms. We assess the performance of these algorithms based on real data collected at the Valladolid airport, Spain. We also propose a study of the atmospheric variables measured at a nearby tower related to low-visibility atmospheric conditions, since they are considered as the inputs of the different regressors. A pre-processing procedure of these input variables with wavelet transforms is also described. The results show that the proposed machine-learning algorithms are able to predict low-visibility events well. The Gaussian process is the best algorithm among those analyzed, obtaining over 98% of the correct classification rate in low-visibility events when the runway visual range is {>}1000 m, and about 80% under this threshold. The performance of all the machine-learning algorithms tested is clearly affected in extreme low-visibility conditions ({algorithm performance in daytime and nighttime conditions, and for different prediction time horizons.
Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E
2017-06-14
Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.
Ebell, Mark H; Afonso, Anna M; Geocadin, Romergryko G
2013-12-01
To predict the likelihood that an inpatient who experiences cardiopulmonary arrest and undergoes cardiopulmonary resuscitation survives to discharge with good neurologic function or with mild deficits (Cerebral Performance Category score = 1). Classification and Regression Trees were used to develop branching algorithms that optimize the ability of a series of tests to correctly classify patients into two or more groups. Data from 2007 to 2008 (n = 38,092) were used to develop candidate Classification and Regression Trees models to predict the outcome of inpatient cardiopulmonary resuscitation episodes and data from 2009 (n = 14,435) to evaluate the accuracy of the models and judge the degree of over fitting. Both supervised and unsupervised approaches to model development were used. 366 hospitals participating in the Get With the Guidelines-Resuscitation registry. Adult inpatients experiencing an index episode of cardiopulmonary arrest and undergoing cardiopulmonary resuscitation in the hospital. The five candidate models had between 8 and 21 nodes and an area under the receiver operating characteristic curve from 0.718 to 0.766 in the derivation group and from 0.683 to 0.746 in the validation group. One of the supervised models had 14 nodes and classified 27.9% of patients as very unlikely to survive neurologically intact or with mild deficits (Tree models that predict survival to discharge with good neurologic function or with mild deficits following in-hospital cardiopulmonary arrest. Models like this can assist physicians and patients who are considering do-not-resuscitate orders.
Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen. Fitzgerald
2012-01-01
Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed bums and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
Directory of Open Access Journals (Sweden)
BUDIMAN
2012-01-01
Full Text Available Budiman, Arisoesilaningsih E. 2012. Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis. Biodiversitas 13: 18-22. The aims of this research was to determine the multiple regression models of vegetative and corm growth of Amorphophallus muelleri Blume in some age variations and habitat conditions of agroforestry in East Java. Descriptive exploratory research method was conducted by systematic random sampling at five agroforestries on four plantations in East Java: Saradan, Bojonegoro, Nganjuk and Blitar. In each agroforestry, we observed A. muelleri vegetative and corm growth on four growing age (1, 2, 3 and 4 years old respectively as well as environmental variables such as altitude, vegetation, climate and soil conditions. Data were analyzed using descriptive statistics to compare A. muelleri habitat in five agroforestries. Meanwhile, the influence and contribution of each environmental variable to the growth of A. muelleri vegetative and corm were determined using multiple regression analysis of SPSS 17.0. The multiple regression models of A. muelleri vegetative and corm growth were generated based on some characteristics of agroforestries and age showed high validity with R2 = 88-99%. Regression model showed that age, monthly temperatures, percentage of radiation and soil calcium (Ca content either simultaneously or partially determined the growth of A. muelleri vegetative and corm. Based on these models, the A. muelleri corm reached the optimal growth after four years of cultivation and they will be ready to be harvested. Additionally, the soil Ca content should reach 25.3 me.hg-1 as Sugihwaras agroforestry, with the maximal radiation of 60%.
Support vector regression to predict porosity and permeability: Effect of sample size
Al-Anazi, A. F.; Gates, I. D.
2012-02-01
Porosity and permeability are key petrophysical parameters obtained from laboratory core analysis. Cores, obtained from drilled wells, are often few in number for most oil and gas fields. Porosity and permeability correlations based on conventional techniques such as linear regression or neural networks trained with core and geophysical logs suffer poor generalization to wells with only geophysical logs. The generalization problem of correlation models often becomes pronounced when the training sample size is small. This is attributed to the underlying assumption that conventional techniques employing the empirical risk minimization (ERM) inductive principle converge asymptotically to the true risk values as the number of samples increases. In small sample size estimation problems, the available training samples must span the complexity of the parameter space so that the model is able both to match the available training samples reasonably well and to generalize to new data. This is achieved using the structural risk minimization (SRM) inductive principle by matching the capability of the model to the available training data. One method that uses SRM is support vector regression (SVR) network. In this research, the capability of SVR to predict porosity and permeability in a heterogeneous sandstone reservoir under the effect of small sample size is evaluated. Particularly, the impact of Vapnik's ɛ-insensitivity loss function and least-modulus loss function on generalization performance was empirically investigated. The results are compared to the multilayer perception (MLP) neural network, a widely used regression method, which operates under the ERM principle. The mean square error and correlation coefficients were used to measure the quality of predictions. The results demonstrate that SVR yields consistently better predictions of the porosity and permeability with small sample size than the MLP method. Also, the performance of SVR depends on both kernel function
Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling.
Kawashima, Issaku; Kumano, Hiroaki
2017-01-01
Mind-wandering (MW), task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR) to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.
Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K
2015-01-01
Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95%CI 0.56-0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95%CI 0.60–0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (pmachine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratifying patients with cirrhosis and can be used to accurately identify patients at high-risk for developing HCC. PMID:24169273
Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling
Directory of Open Access Journals (Sweden)
Issaku Kawashima
2017-07-01
Full Text Available Mind-wandering (MW, task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.
Predictive densities for day-ahead electricity prices using time-adaptive quantile regression
DEFF Research Database (Denmark)
Jónsson, Tryggvi; Pinson, Pierre; Madsen, Henrik
2014-01-01
A large part of the decision-making problems actors of the power system are facing on a daily basis requires scenarios for day-ahead electricity market prices. These scenarios are most likely to be generated based on marginal predictive densities for such prices, then enhanced with a temporal...... dependence structure. A semi-parametric methodology for generating such densities is presented: it includes: (i) a time-adaptive quantile regression model for the 5%–95% quantiles; and (ii) a description of the distribution tails with exponential distributions. The forecasting skill of the proposed model...
Predictive based monitoring of nuclear plant component degradation using support vector regression
International Nuclear Information System (INIS)
Agarwal, Vivek; Alamaniotis, Miltiadis; Tsoukalas, Lefteri H.
2015-01-01
Nuclear power plants (NPPs) are large installations comprised of many active and passive assets. Degradation monitoring of all these assets is expensive (labor cost) and highly demanding task. In this paper a framework based on Support Vector Regression (SVR) for online surveillance of critical parameter degradation of NPP components is proposed. In this case, on time replacement or maintenance of components will prevent potential plant malfunctions, and reduce the overall operational cost. In the current work, we apply SVR equipped with a Gaussian kernel function to monitor components. Monitoring includes the one-step-ahead prediction of the component's respective operational quantity using the SVR model, while the SVR model is trained using a set of previous recorded degradation histories of similar components. Predictive capability of the model is evaluated upon arrival of a sensor measurement, which is compared to the component failure threshold. A maintenance decision is based on a fuzzy inference system that utilizes three parameters: (i) prediction evaluation in the previous steps, (ii) predicted value of the current step, (iii) and difference of current predicted value with components failure thresholds. The proposed framework will be tested on turbine blade degradation data.
Model-free prediction and regression a transformation-based approach to inference
Politis, Dimitris N
2015-01-01
The Model-Free Prediction Principle expounded upon in this monograph is based on the simple notion of transforming a complex dataset to one that is easier to work with, e.g., i.i.d. or Gaussian. As such, it restores the emphasis on observable quantities, i.e., current and future data, as opposed to unobservable model parameters and estimates thereof, and yields optimal predictors in diverse settings such as regression and time series. Furthermore, the Model-Free Bootstrap takes us beyond point prediction in order to construct frequentist prediction intervals without resort to unrealistic assumptions such as normality. Prediction has been traditionally approached via a model-based paradigm, i.e., (a) fit a model to the data at hand, and (b) use the fitted model to extrapolate/predict future data. Due to both mathematical and computational constraints, 20th century statistical practice focused mostly on parametric models. Fortunately, with the advent of widely accessible powerful computing in the late 1970s, co...
Gaussian Process Regression (GPR) Representation in Predictive Model Markup Language (PMML).
Park, J; Lechevalier, D; Ak, R; Ferguson, M; Law, K H; Lee, Y-T T; Rachuri, S
2017-01-01
This paper describes Gaussian process regression (GPR) models presented in predictive model markup language (PMML). PMML is an extensible-markup-language (XML) -based standard language used to represent data-mining and predictive analytic models, as well as pre- and post-processed data. The previous PMML version, PMML 4.2, did not provide capabilities for representing probabilistic (stochastic) machine-learning algorithms that are widely used for constructing predictive models taking the associated uncertainties into consideration. The newly released PMML version 4.3, which includes the GPR model, provides new features: confidence bounds and distribution for the predictive estimations. Both features are needed to establish the foundation for uncertainty quantification analysis. Among various probabilistic machine-learning algorithms, GPR has been widely used for approximating a target function because of its capability of representing complex input and output relationships without predefining a set of basis functions, and predicting a target output with uncertainty quantification. GPR is being employed to various manufacturing data-analytics applications, which necessitates representing this model in a standardized form for easy and rapid employment. In this paper, we present a GPR model and its representation in PMML. Furthermore, we demonstrate a prototype using a real data set in the manufacturing domain.
Liu, Xian; Engel, Charles C
2012-12-20
Researchers often encounter longitudinal health data characterized with three or more ordinal or nominal categories. Random-effects multinomial logit models are generally applied to account for potential lack of independence inherent in such clustered data. When parameter estimates are used to describe longitudinal processes, however, random effects, both between and within individuals, need to be retransformed for correctly predicting outcome probabilities. This study attempts to go beyond existing work by developing a retransformation method that derives longitudinal growth trajectories of unbiased health probabilities. We estimated variances of the predicted probabilities by using the delta method. Additionally, we transformed the covariates' regression coefficients on the multinomial logit function, not substantively meaningful, to the conditional effects on the predicted probabilities. The empirical illustration uses the longitudinal data from the Asset and Health Dynamics among the Oldest Old. Our analysis compared three sets of the predicted probabilities of three health states at six time points, obtained from, respectively, the retransformation method, the best linear unbiased prediction, and the fixed-effects approach. The results demonstrate that neglect of retransforming random errors in the random-effects multinomial logit model results in severely biased longitudinal trajectories of health probabilities as well as overestimated effects of covariates on the probabilities. Copyright © 2012 John Wiley & Sons, Ltd.
Xie, Yang; Schreier, Günter; Chang, David C W; Neubauer, Sandra; Redmond, Stephen J; Lovell, Nigel H
2014-01-01
Healthcare administrators worldwide are striving to both lower the cost of care whilst improving the quality of care given. Therefore, better clinical and administrative decision making is needed to improve these issues. Anticipating outcomes such as number of hospitalization days could contribute to addressing this problem. In this paper, a method was developed, using large-scale health insurance claims data, to predict the number of hospitalization days in a population. We utilized a regression decision tree algorithm, along with insurance claim data from 300,000 individuals over three years, to provide predictions of number of days in hospital in the third year, based on medical admissions and claims data from the first two years. Our method performs well in the general population. For the population aged 65 years and over, the predictive model significantly improves predictions over a baseline method (predicting a constant number of days for each patient), and achieved a specificity of 70.20% and sensitivity of 75.69% in classifying these subjects into two categories of 'no hospitalization' and 'at least one day in hospital'.
Tian, Y.; Xu, Y. P.
2017-12-01
In this paper, the Support Vector Regression (SVR) model incorporating climate indices and drought indices are developed to predict agriculture drought in Xiangjiang River basin, Central China. The agriculture droughts are presented with the Precipitation-Evapotranspiration Index (SPEI). According to the analysis of the relationship between SPEI with different time scales and soil moisture, it is found that SPEI of six months time scales (SPEI-6) could reflect the soil moisture better than that of three and one month time scale from the drought features including drought duration, severity and peak. Climate forcing like El Niño Southern Oscillation and western Pacific subtropical high (WPSH) are represented by climate indices such as MEI and series indices of WPSH. Ridge Point of WPSH is found to be the key factor that influences the agriculture drought mainly through the control of temperature. Based on the climate indices analysis, the predictions of SPEI-6 are conducted using the SVR model. The results show that the SVR model incorperating climate indices, especially ridge point of WPSH, could improve the prediction accuracy compared to that using drought index only. The improvement was more significant for the prediction of one month lead time than that of three months lead time. However, it needs to be cautious in selection of the input parameters, since adding more useless information could have a counter effect in attaining a better prediction.
DEFF Research Database (Denmark)
Marcussen, Lis; Aasberg-Petersen, K.; Krøll, Annette Elisabeth
2000-01-01
An adsorption isotherm equation for nonideal pure component adsorption based on vacancy solution theory and the Non-Random-Two-Liquid (NRTL) equation is found to be useful for predicting pure component adsorption equilibria at a variety of conditions. The isotherm equation is evaluated successfully...... adsorption systems, spreading pressure and isosteric heat of adsorption are also calculated....
Directory of Open Access Journals (Sweden)
M Taki
2017-05-01
. To measure the temperature and the relative humidity of the air, soil and roof inside and outside the greenhouse, the SHT 11 sensors were used. The accuracy of the measurement of temperature was ±0.4% at 20 °C and the precision measurement of the moisture was ±3% for a clear sky. We used these sensors in soil, on the roof (inside greenhouse and in the air of greenhouse and outside to measure the temperature and relative humidity. At a 1 m height above the ground outside the greenhouse, we used a pyranometre type TES 1333. Its sensitivity was proportional to the cosine of the incidence angle of the radiation. It is a measure of global radiation of the spectral band solar in the 400–1110 nm. Its measurement accuracy was approximately ±5%. Some heat transfer models used to predict the inside and roof temperature are according to equation (1 and (5: Results and Discussion Results showed that solar radiation on the roof of semi-solar greenhouse was higher after noon so this shape can receive high amounts of solar energy during a day. From statistical point of view, both desired and predicted test data have been analyzed to determine whether there are statistically significant differences between them. The null hypothesis assumes that statistical parameters of both series are equal. P value was used to check each hypothesis. Its threshold value was 0.05. If p value is greater than the threshold, the null hypothesis is then fulfilled. To check the differences between the data series, different tests were performed and p value was calculated for each case. The so called t-test was used to compare the means of both series. It was also assumed that the variance of both samples could be considered equal. The variance was analyzed using the F-test. Here, a normal distribution of samples was assumed. The results showed that the p values for heat model in all 2 statistical factors (Comparison of means, and variance is lower than regression model and so the heat model did not
Directory of Open Access Journals (Sweden)
Esperanza García-Gonzalo
2016-01-01
Full Text Available Predicting paper properties based on a limited number of measured variables can be an important tool for the industry. Mathematical models were developed to predict mechanical and optical properties from the corresponding paper density for some softwood papers using support vector machine regression with the Radial Basis Function Kernel. A dataset of different properties of paper handsheets produced from pulps of pine (Pinus pinaster and P. sylvestris and cypress species (Cupressus lusitanica, C. sempervirens, and C. arizonica beaten at 1000, 4000, and 7000 revolutions was used. The results show that it is possible to obtain good models (with high coefficient of determination with two variables: the numerical variable density and the categorical variable species.
Stone, Wesley W.; Gilliom, Robert J.
2011-01-01
Watershed Regressions for Pesticides (WARP) models, previously developed for atrazine at the national scale, can be improved for application to the U.S. Corn Belt region by developing region-specific models that include important watershed characteristics that are influential in predicting atrazine concentration statistics within the Corn Belt. WARP models for the Corn Belt (WARP-CB) were developed for predicting annual maximum moving-average (14-, 21-, 30-, 60-, and 90-day durations) and annual 95th-percentile atrazine concentrations in streams of the Corn Belt region. All streams used in development of WARP-CB models drain watersheds with atrazine use intensity greater than 17 kilograms per square kilometer (kg/km2). The WARP-CB models accounted for 53 to 62 percent of the variability in the various concentration statistics among the model-development sites.
FUZZY REGRESSION MODEL TO PREDICT THE BEAD GEOMETRY IN THE ROBOTIC WELDING PROCESS
Institute of Scientific and Technical Information of China (English)
B.S. Sung; I.S. Kim; Y. Xue; H.H. Kim; Y.H. Cha
2007-01-01
Recently, there has been a rapid development in computer technology, which has in turn led todevelop the fully robotic welding system using artificial intelligence (AI) technology. However, therobotic welding system has not been achieved due to difficulties of the mathematical model andsensor technologies. The possibilities of the fuzzy regression method to predict the bead geometry,such as bead width, bead height, bead penetration and bead area in the robotic GMA (gas metalarc) welding process is presented. The approach, a well-known method to deal with the problemswith a high degree of fuzziness, is used to build the relationship between four process variablesand the four quality characteristics, respectively. Using these models, the proper prediction of theprocess variables for obtaining the optimal bead geometry can be determined.
Directory of Open Access Journals (Sweden)
Avval Zhila Mohajeri
2015-01-01
Full Text Available This paper deals with developing a linear quantitative structure-activity relationship (QSAR model for predicting the RSK inhibition activity of some new compounds. A dataset consisting of 62 pyrazino [1,2-α] indole, diazepino [1,2-α] indole, and imidazole derivatives with known inhibitory activities was used. Multiple linear regressions (MLR technique combined with the stepwise (SW and the genetic algorithm (GA methods as variable selection tools was employed. For more checking stability, robustness and predictability of the proposed models, internal and external validation techniques were used. Comparison of the results obtained, indicate that the GA-MLR model is superior to the SW-MLR model and that it isapplicable for designing novel RSK inhibitors.
Ding, Bo; Fang, Huajing
2017-05-01
This paper is concerned with the fault prediction for the nonlinear stochastic system with incipient faults. Based on the particle filter and the reasonable assumption about the incipient faults, the modified fault estimation algorithm is proposed, and the system state is estimated simultaneously. According to the modified fault estimation, an intuitive fault detection strategy is introduced. Once each of the incipient fault is detected, the parameters of which are identified by a nonlinear regression method. Then, based on the estimated parameters, the future fault signal can be predicted. Finally, the effectiveness of the proposed method is verified by the simulations of the Three-tank system. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Asencio-Cortés, G.; Morales-Esteban, A.; Shang, X.; Martínez-Álvarez, F.
2018-06-01
Earthquake magnitude prediction is a challenging problem that has been widely studied during the last decades. Statistical, geophysical and machine learning approaches can be found in literature, with no particularly satisfactory results. In recent years, powerful computational techniques to analyze big data have emerged, making possible the analysis of massive datasets. These new methods make use of physical resources like cloud based architectures. California is known for being one of the regions with highest seismic activity in the world and many data are available. In this work, the use of several regression algorithms combined with ensemble learning is explored in the context of big data (1 GB catalog is used), in order to predict earthquakes magnitude within the next seven days. Apache Spark framework, H2 O library in R language and Amazon cloud infrastructure were been used, reporting very promising results.
DeHart, Russell
2017-01-01
This study determines the feasibility of creating a tool that can accurately predict Lunar Reconnaissance Orbiter (LRO) reaction wheel assembly (RWA) angular momentum, weeks or even months into the future. LRO is a three-axis stabilized spacecraft that was launched on June 18, 2009. While typically nadir-pointing, LRO conducts many types of slews to enable novel science collection. Momentum unloads have historically been performed approximately once every two weeks with the goal of maintaining system total angular momentum below 70 Nms; however flight experience shows the models developed before launch are overly conservative, with many momentum unloads being performed before system angular momentum surpasses 50 Nms. A more accurate model of RWA angular momentum growth would improve momentum unload scheduling and decrease the frequency of these unloads. Since some LRO instruments must be deactivated during momentum unloads and in the case of one instrument, decontaminated for 24 hours there after a decrease in the frequency of unloads increases science collection. This study develops a new model to predict LRO RWA angular momentum. Regression analysis of data from October 2014 to October 2015 was used to develop relationships between solar beta angle, slew specifications, and RWA angular momentum growth. The resulting model predicts RWA angular momentum using input solar beta angle and mission schedule data. This model was used to predict RWA angular momentum from October 2013 to October 2014. Predictions agree well with telemetry; of the 23 momentum unloads performed from October 2013 to October 2014, the mean and median magnitude of the RWA total angular momentum prediction error at the time of the momentum unloads were 3.7 and 2.7 Nms, respectively. The magnitude of the largest RWA total angular momentum prediction error was 10.6 Nms. Development of a tool that uses the models presented herein is currently underway.
Aguiar, Fabio S; Almeida, Luciana L; Ruffino-Netto, Antonio; Kritski, Afranio Lineu; Mello, Fernanda Cq; Werneck, Guilherme L
2012-08-07
Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a predictive model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Cross sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear) and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. Prospective validation is still necessary, but our model offer an alternative for decision making in whether to isolate patients with clinical suspicion of TB in tertiary health facilities in
Uca; Toriman, Ekhwan; Jaafar, Othman; Maru, Rosmini; Arfan, Amal; Saleh Ahmar, Ansari
2018-01-01
Prediction of suspended sediment discharge in a catchments area is very important because it can be used to evaluation the erosion hazard, management of its water resources, water quality, hydrology project management (dams, reservoirs, and irrigation) and to determine the extent of the damage that occurred in the catchments. Multiple Linear Regression analysis and artificial neural network can be used to predict the amount of daily suspended sediment discharge. Regression analysis using the least square method, whereas artificial neural networks using Radial Basis Function (RBF) and feedforward multilayer perceptron with three learning algorithms namely Levenberg-Marquardt (LM), Scaled Conjugate Descent (SCD) and Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton (BFGS). The number neuron of hidden layer is three to sixteen, while in output layer only one neuron because only one output target. The mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2 ) and coefficient of efficiency (CE) of the multiple linear regression (MLRg) value Model 2 (6 input variable independent) has the lowest the value of MAE and RMSE (0.0000002 and 13.6039) and highest R2 and CE (0.9971 and 0.9971). When compared between LM, SCG and RBF, the BFGS model structure 3-7-1 is the better and more accurate to prediction suspended sediment discharge in Jenderam catchment. The performance value in testing process, MAE and RMSE (13.5769 and 17.9011) is smallest, meanwhile R2 and CE (0.9999 and 0.9998) is the highest if it compared with the another BFGS Quasi-Newton model (6-3-1, 9-10-1 and 12-12-1). Based on the performance statistics value, MLRg, LM, SCG, BFGS and RBF suitable and accurately for prediction by modeling the non-linear complex behavior of suspended sediment responses to rainfall, water depth and discharge. The comparison between artificial neural network (ANN) and MLRg, the MLRg Model 2 accurately for to prediction suspended sediment discharge (kg
Correlation and prediction equations for eight-week bodyweight in ...
African Journals Online (AJOL)
Journal Home · ABOUT THIS JOURNAL · Advanced Search · Current Issue · Archives ... Cubic; Compound; Power; Sigmoidal; Growth; and Exponential equation. ... logarithmic, inverse, compound, growth and exponential) have significant ...
Energy Technology Data Exchange (ETDEWEB)
Atsumi, Kazushige [Department of Clinical Radiology, Graduate School of Medical Sciences, Kyushu University, Fukuoka (Japan); Shioyama, Yoshiyuki, E-mail: shioyama@radiol.med.kyushu-u.ac.jp [Department of Clinical Radiology, Graduate School of Medical Sciences, Kyushu University, Fukuoka (Japan); Arimura, Hidetaka [Department of Health Sciences, Kyushu University, Fukuoka (Japan); Terashima, Kotaro [Department of Clinical Radiology, Graduate School of Medical Sciences, Kyushu University, Fukuoka (Japan); Matsuki, Takaomi [Department of Health Sciences, Kyushu University, Fukuoka (Japan); Ohga, Saiji; Yoshitake, Tadamasa; Nonoshita, Takeshi; Tsurumaru, Daisuke; Ohnishi, Kayoko; Asai, Kaori; Matsumoto, Keiji [Department of Clinical Radiology, Graduate School of Medical Sciences, Kyushu University, Fukuoka (Japan); Nakamura, Katsumasa [Department of Radiology, Kyushu University Hospital at Beppu, Oita (Japan); Honda, Hiroshi [Department of Clinical Radiology, Graduate School of Medical Sciences, Kyushu University, Fukuoka (Japan)
2012-04-01
Purpose: To determine clinical factors for predicting the frequency and severity of esophageal stenosis associated with tumor regression in radiotherapy for esophageal cancer. Methods and Materials: The study group consisted of 109 patients with esophageal cancer of T1-4 and Stage I-III who were treated with definitive radiotherapy and achieved a complete response of their primary lesion at Kyushu University Hospital between January 1998 and December 2007. Esophageal stenosis was evaluated using esophagographic images within 3 months after completion of radiotherapy. We investigated the correlation between esophageal stenosis after radiotherapy and each of the clinical factors with regard to tumors and therapy. For validation of the correlative factors for esophageal stenosis, an artificial neural network was used to predict the esophageal stenotic ratio. Results: Esophageal stenosis tended to be more severe and more frequent in T3-4 cases than in T1-2 cases. Esophageal stenosis in cases with full circumference involvement tended to be more severe and more frequent than that in cases without full circumference involvement. Increases in wall thickness tended to be associated with increases in esophageal stenosis severity and frequency. In the multivariate analysis, T stage, extent of involved circumference, and wall thickness of the tumor region were significantly correlated to esophageal stenosis (p = 0.031, p < 0.0001, and p = 0.0011, respectively). The esophageal stenotic ratio predicted by the artificial neural network, which learned these three factors, was significantly correlated to the actual observed stenotic ratio, with a correlation coefficient of 0.864 (p < 0.001). Conclusion: Our study suggested that T stage, extent of involved circumference, and esophageal wall thickness of the tumor region were useful to predict the frequency and severity of esophageal stenosis associated with tumor regression in radiotherapy for esophageal cancer.
International Nuclear Information System (INIS)
Silvia Dulanska; Lubomir Matel; Milan Meloun
2010-01-01
The decommissioning of the nuclear power plant (NPP) A1 Jaslovske Bohunice (Slovakia) is a complicated set of problems that is highly demanding both technically and financially. The basic goal of the decommissioning process is the total elimination of radioactive materials from the nuclear power plant area, and radwaste treatment to a form suitable for its safe disposal. The initial conditions of decommissioning also include elimination of the operational events, preparation and transport of the fuel from the plant territory, radiochemical and physical-chemical characterization of the radioactive wastes. One of the problems was and still is the processing of the liquid radioactive wastes. Such media is also the cooling water of the long-term storage of spent fuel. A suitable scaling model for predicting the activity of hard-to-detect radionuclides 239,240 Pu, 90 Sr and summary beta in cooling water using a regression triplet technique has been built using the regression triplet analysis and regression diagnostics. (author)
Yang, J.; Astitha, M.; Anagnostou, E. N.; Hartman, B.; Kallos, G. B.
2015-12-01
Weather prediction accuracy has become very important for the Northeast U.S. given the devastating effects of extreme weather events in the recent years. Weather forecasting systems are used towards building strategies to prevent catastrophic losses for human lives and the environment. Concurrently, weather forecast tools and techniques have evolved with improved forecast skill as numerical prediction techniques are strengthened by increased super-computing resources. In this study, we examine the combination of two state-of-the-science atmospheric models (WRF and RAMS/ICLAMS) by utilizing a Bayesian regression approach to improve the prediction of extreme weather events for NE U.S. The basic concept behind the Bayesian regression approach is to take advantage of the strengths of two atmospheric modeling systems and, similar to the multi-model ensemble approach, limit their weaknesses which are related to systematic and random errors in the numerical prediction of physical processes. The first part of this study is focused on retrospective simulations of seventeen storms that affected the region in the period 2004-2013. Optimal variances are estimated by minimizing the root mean square error and are applied to out-of-sample weather events. The applicability and usefulness of this approach are demonstrated by conducting an error analysis based on in-situ observations from meteorological stations of the National Weather Service (NWS) for wind speed and wind direction, and NCEP Stage IV radar data, mosaicked from the regional multi-sensor for precipitation. The preliminary results indicate a significant improvement in the statistical metrics of the modeled-observed pairs for meteorological variables using various combinations of the sixteen events as predictors of the seventeenth. This presentation will illustrate the implemented methodology and the obtained results for wind speed, wind direction and precipitation, as well as set the research steps that will be
Jacobsen, R. T.; Stewart, R. B.; Crain, R. W., Jr.; Rose, G. L.; Myers, A. F.
1976-01-01
A method was developed for establishing a rational choice of the terms to be included in an equation of state with a large number of adjustable coefficients. The methods presented were developed for use in the determination of an equation of state for oxygen and nitrogen. However, a general application of the methods is possible in studies involving the determination of an optimum polynomial equation for fitting a large number of data points. The data considered in the least squares problem are experimental thermodynamic pressure-density-temperature data. Attention is given to a description of stepwise multiple regression and the use of stepwise regression in the determination of an equation of state for oxygen and nitrogen.
Regression tree analysis for predicting body weight of Nigerian Muscovy duck (Cairina moschata
Directory of Open Access Journals (Sweden)
Oguntunji Abel Olusegun
2017-01-01
Full Text Available Morphometric parameters and their indices are central to the understanding of the type and function of livestock. The present study was conducted to predict body weight (BWT of adult Nigerian Muscovy ducks from nine (9 morphometric parameters and seven (7 body indices and also to identify the most important predictor of BWT among them using regression tree analysis (RTA. The experimental birds comprised of 1,020 adult male and female Nigerian Muscovy ducks randomly sampled in Rain Forest (203, Guinea Savanna (298 and Derived Savanna (519 agro-ecological zones. Result of RTA revealed that compactness; body girth and massiveness were the most important independent variables in predicting BWT and were used in constructing RT. The combined effect of the three predictors was very high and explained 91.00% of the observed variation of the target variable (BWT. The optimal regression tree suggested that Muscovy ducks with compactness >5.765 would be fleshy and have highest BWT. The result of the present study could be exploited by animal breeders and breeding companies in selection and improvement of BWT of Muscovy ducks.
Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days
Energy Technology Data Exchange (ETDEWEB)
Bramer, L. M.; Rounds, J.; Burleyson, C. D.; Fortin, D.; Hathaway, J.; Rice, J.; Kraucunas, I.
2017-11-01
Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and the relationship with weather conditions is examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and datasets were examined. A penalized logistic regression model fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at different time scales were examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. The methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
International Nuclear Information System (INIS)
Gupta, N
2008-01-01
3013 containers are designed in accordance with the DOE-STD-3013-2004. These containers are qualified to store plutonium (Pu) bearing materials such as PuO2 for 50 years. DOT shipping packages such as the 9975 are used to store the 3013 containers in the K-Area Material Storage (KAMS) facility at Savannah River Site (SRS). DOE-STD-3013-2004 requires that a comprehensive surveillance program be set up to ensure that the 3013 container design parameters are not violated during the long term storage. To ensure structural integrity of the 3013 containers, thermal analyses using finite element models were performed to predict the contents and component temperatures for different but well defined parameters such as storage ambient temperature, PuO 2 density, fill heights, weights, and thermal loading. Interpolation is normally used to calculate temperatures if the actual parameter values are different from the analyzed values. A statistical analysis technique using regression methods is proposed to develop simple polynomial relations to predict temperatures for the actual parameter values found in the containers. The analysis shows that regression analysis is a powerful tool to develop simple relations to assess component temperatures
Directory of Open Access Journals (Sweden)
Soldić-Aleksić Jasna
2009-01-01
Full Text Available Market segmentation presents one of the key concepts of the modern marketing. The main goal of market segmentation is focused on creating groups (segments of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of concrete product/service. Companies can create specific marketing plan for each of these segments and therefore gain short or long term competitive advantage on the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of logistic regression model and CHAID analysis. The logistic regression model was used for the purpose of variables selection (from the initial pool of eleven variables which are statistically significant for explaining the dependent variable. Selected variables were afterwards included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented on the concrete empirical example in the following form: summary model results, CHAID tree, Gain chart, Index chart, risk and classification tables.
The Chaotic Prediction for Aero-Engine Performance Parameters Based on Nonlinear PLS Regression
Directory of Open Access Journals (Sweden)
Chunxiao Zhang
2012-01-01
Full Text Available The prediction of the aero-engine performance parameters is very important for aero-engine condition monitoring and fault diagnosis. In this paper, the chaotic phase space of engine exhaust temperature (EGT time series which come from actual air-borne ACARS data is reconstructed through selecting some suitable nearby points. The partial least square (PLS based on the cubic spline function or the kernel function transformation is adopted to obtain chaotic predictive function of EGT series. The experiment results indicate that the proposed PLS chaotic prediction algorithm based on biweight kernel function transformation has significant advantage in overcoming multicollinearity of the independent variables and solve the stability of regression model. Our predictive NMSE is 16.5 percent less than that of the traditional linear least squares (OLS method and 10.38 percent less than that of the linear PLS approach. At the same time, the forecast error is less than that of nonlinear PLS algorithm through bootstrap test screening.
Groundwater level prediction of landslide based on classification and regression tree
Directory of Open Access Journals (Sweden)
Yannan Zhao
2016-09-01
Full Text Available According to groundwater level monitoring data of Shuping landslide in the Three Gorges Reservoir area, based on the response relationship between influential factors such as rainfall and reservoir level and the change of groundwater level, the influential factors of groundwater level were selected. Then the classification and regression tree (CART model was constructed by the subset and used to predict the groundwater level. Through the verification, the predictive results of the test sample were consistent with the actually measured values, and the mean absolute error and relative error is 0.28 m and 1.15% respectively. To compare the support vector machine (SVM model constructed using the same set of factors, the mean absolute error and relative error of predicted results is 1.53 m and 6.11% respectively. It is indicated that CART model has not only better fitting and generalization ability, but also strong advantages in the analysis of landslide groundwater dynamic characteristics and the screening of important variables. It is an effective method for prediction of ground water level in landslides.
International Nuclear Information System (INIS)
Lins, Isis Didier; Droguett, Enrique López; Moura, Márcio das Chagas; Zio, Enrico; Jacinto, Carlos Magno
2015-01-01
Data-driven learning methods for predicting the evolution of the degradation processes affecting equipment are becoming increasingly attractive in reliability and prognostics applications. Among these, we consider here Support Vector Regression (SVR), which has provided promising results in various applications. Nevertheless, the predictions provided by SVR are point estimates whereas in order to take better informed decisions, an uncertainty assessment should be also carried out. For this, we apply bootstrap to SVR so as to obtain confidence and prediction intervals, without having to make any assumption about probability distributions and with good performance even when only a small data set is available. The bootstrapped SVR is first verified on Monte Carlo experiments and then is applied to a real case study concerning the prediction of degradation of a component from the offshore oil industry. The results obtained indicate that the bootstrapped SVR is a promising tool for providing reliable point and interval estimates, which can inform maintenance-related decisions on degrading components. - Highlights: • Bootstrap (pairs/residuals) and SVR are used as an uncertainty analysis framework. • Numerical experiments are performed to assess accuracy and coverage properties. • More bootstrap replications does not significantly improve performance. • Degradation of equipment of offshore oil wells is estimated by bootstrapped SVR. • Estimates about the scale growth rate can support maintenance-related decisions
Directory of Open Access Journals (Sweden)
Mustakim Mustakim
2016-02-01
Full Text Available The largest region that produces oil palm in Indonesia has an important role in improving the welfare of society and economy. Oil palm has increased significantly in Riau Province in every period, to determine the production development for the next few years with the functions and benefits of oil palm carried prediction production results that were seen from time series data last 8 years (2005-2013. In its prediction implementation, it was done by comparing the performance of Support Vector Regression (SVR method and Artificial Neural Network (ANN. From the experiment, SVR produced the best model compared with ANN. It is indicated by the correlation coefficient of 95% and 6% for MSE in the kernel Radial Basis Function (RBF, whereas ANN produced only 74% for R2 and 9% for MSE on the 8th experiment with hiden neuron 20 and learning rate 0,1. SVR model generates predictions for next 3 years which increased between 3% - 6% from actual data and RBF model predictions.
Snedden, Gregg A.; Steyer, Gregory D.
2013-01-01
Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007–Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.
Snedden, Gregg A.; Steyer, Gregory D.
2013-02-01
Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007-Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.
A Regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data
Directory of Open Access Journals (Sweden)
Ruzzo Walter L
2006-03-01
Full Text Available Abstract Background As a variety of functional genomic and proteomic techniques become available, there is an increasing need for functional analysis methodologies that integrate heterogeneous data sources. Methods In this paper, we address this issue by proposing a general framework for gene function prediction based on the k-nearest-neighbor (KNN algorithm. The choice of KNN is motivated by its simplicity, flexibility to incorporate different data types and adaptability to irregular feature spaces. A weakness of traditional KNN methods, especially when handling heterogeneous data, is that performance is subject to the often ad hoc choice of similarity metric. To address this weakness, we apply regression methods to infer a similarity metric as a weighted combination of a set of base similarity measures, which helps to locate the neighbors that are most likely to be in the same class as the target gene. We also suggest a novel voting scheme to generate confidence scores that estimate the accuracy of predictions. The method gracefully extends to multi-way classification problems. Results We apply this technique to gene function prediction according to three well-known Escherichia coli classification schemes suggested by biologists, using information derived from microarray and genome sequencing data. We demonstrate that our algorithm dramatically outperforms the naive KNN methods and is competitive with support vector machine (SVM algorithms for integrating heterogenous data. We also show that by combining different data sources, prediction accuracy can improve significantly. Conclusion Our extension of KNN with automatic feature weighting, multi-class prediction, and probabilistic inference, enhance prediction accuracy significantly while remaining efficient, intuitive and flexible. This general framework can also be applied to similar classification problems involving heterogeneous datasets.
Nusdwinuringtyas, Nury; Widjajalaksmi; Yunus, Faisal; Alwi, Idrus
2014-04-01
to develop a reference equation for prediction of the total distance walk using Indonesian anthropometrics of sedentary healthy subjects. Subsequently, the prediction obtained was compared to those calculated by the Caucasian-based Enright prediction equation. the cross-sectional study was conducted among 123 healthy Indonesian adults with sedentary life style (58 male and 65 female subjects in an age range between 18 and 50 years). Heart rate was recorded using Polar with expectation in the sub-maximal zone (120-170 beats per minute). The subjects performed two six-minute walk tests, the first one on a 15-meter track according to the protocol developed by the investigator. The second walk was carried out on Biodex®gait trainer as gold standard. an average total distance of 547±54.24 m was found, not significantly different from the gold standard of 544.72±54.11 m (p>0.05). Multiple regression analysis was performed to develop the new equation. the reference equation for prediction of the total distance using Indonesian anthropometrics is more applicable in Indonesia.
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
Validity of predictive equations for basal metabolic rate in Japanese adults.
Miyake, Rieko; Tanaka, Shigeho; Ohkawara, Kazunori; Ishikawa-Takata, Kazuko; Hikihara, Yuki; Taguri, Emiko; Kayashita, Jun; Tabata, Izumi
2011-01-01
Many predictive equations for basal metabolic rate (BMR) based on anthropometric measurements, age, and sex have been developed, mainly for healthy Caucasians. However, it has been reported that many of these equations, used widely, overestimate BMR not only for Asians, but also for Caucasians. The present study examined the accuracy of several predictive equations for BMR in Japanese subjects. In 365 healthy Japanese male and female subjects, aged 18 to 79 y, BMR was measured in the post-absorptive state using a mask and Douglas bag. Six predictive equations were examined. Total error was used as an index of the accuracy of each equation's prediction. Predicted BMR values by Dietary Reference Intakes for Japanese (Japan-DRI), Adjusted Dietary Reference Intakes for Japanese (Adjusted-DRI), and Ganpule equations were not significantly different from the measured BMR in either sex. On the other hand, Harris-Benedict, Schofield, and Food and Agriculture Organization of the United Nations/World Health Organization/United Nations University equations were significantly higher than the measured BMR in both sexes. The prediction error by Japan-DRI, Adjusted-DRI, and Harris-Benedict equations was significantly correlated with body weight in both sexes. Total error using the Ganpule equation was low in both males and females (125 and 99 kcal/d, respectively). In addition, total error using the Adjusted-DRI equation was low in females (95 kcal/d). Thus, the Ganpule equation was the most accurate in predicting BMR in our healthy Japanese subjects, because the difference between the predicted and measured BMR was relatively small, and body weight had no effect on the prediction error.
ten Haaf, Twan; Weijs, Peter J M
2014-01-01
Resting energy expenditure (REE) is expected to be higher in athletes because of their relatively high fat free mass (FFM). Therefore, REE predictive equation for recreational athletes may be required. The aim of this study was to validate existing REE predictive equations and to develop a new recreational athlete specific equation. 90 (53 M, 37 F) adult athletes, exercising on average 9.1 ± 5.0 hours a week and 5.0 ± 1.8 times a week, were included. REE was measured using indirect calorimetry (Vmax Encore n29), FFM and FM were measured using air displacement plethysmography. Multiple linear regression analysis was used to develop a new FFM-based and weight-based REE predictive equation. The percentage accurate predictions (within 10% of measured REE), percentage bias, root mean square error and limits of agreement were calculated. Results: The Cunningham equation and the new weight-based equation REE(kJ / d) = 49.940* weight(kg) + 2459.053* height(m) - 34.014* age(y) + 799.257* sex(M = 1,F = 0) + 122.502 and the new FFM-based equation REE(kJ / d) = 95.272*FFM(kg) + 2026.161 performed equally well. De Lorenzo's equation predicted REE less accurate, but better than the other generally used REE predictive equations. Harris-Benedict, WHO, Schofield, Mifflin and Owen all showed less than 50% accuracy. For a population of (Dutch) recreational athletes, the REE can accurately be predicted with the existing Cunningham equation. Since body composition measurement is not always possible, and other generally used equations fail, the new weight-based equation is advised for use in sports nutrition.
Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression
Abdul Jameel, Abdul Gani
2016-09-14
An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN) of 71 pure hydrocarbons and 54 hydrocarbon blends were utilized as a data set to study the relationship between ignition quality and molecular structure. CN and DCN are functional equivalents and collectively referred to as D/CN, herein. The effect of molecular weight and weight percent of structural parameters such as paraffinic CH3 groups, paraffinic CH2 groups, paraffinic CH groups, olefinic CH–CH2 groups, naphthenic CH–CH2 groups, and aromatic C–CH groups on D/CN was studied. A particular emphasis on the effect of branching (i.e., methyl substitution) on the D/CN was studied, and a new parameter denoted as the branching index (BI) was introduced to quantify this effect. A new formula was developed to calculate the BI of hydrocarbon fuels using 1H NMR spectroscopy. Multiple linear regression (MLR) modeling was used to develop an empirical relationship between D/CN and the eight structural parameters. This was then used to predict the DCN of many hydrocarbon fuels. The developed model has a high correlation coefficient (R2 = 0.97) and was validated with experimentally measured DCN of twenty-two real fuel mixtures (e.g., gasolines and diesels) and fifty-nine blends of known composition, and the predicted values matched well with the experimental data.
Support vector regression for porosity prediction in a heterogeneous reservoir: A comparative study
Al-Anazi, A. F.; Gates, I. D.
2010-12-01
In wells with limited log and core data, porosity, a fundamental and essential property to characterize reservoirs, is challenging to estimate by conventional statistical methods from offset well log and core data in heterogeneous formations. Beyond simple regression, neural networks have been used to develop more accurate porosity correlations. Unfortunately, neural network-based correlations have limited generalization ability and global correlations for a field are usually less accurate compared to local correlations for a sub-region of the reservoir. In this paper, support vector machines are explored as an intelligent technique to correlate porosity to well log data. Recently, support vector regression (SVR), based on the statistical learning theory, have been proposed as a new intelligence technique for both prediction and classification tasks. The underlying formulation of support vector machines embodies the structural risk minimization (SRM) principle which has been shown to be superior to the traditional empirical risk minimization (ERM) principle employed by conventional neural networks and classical statistical methods. This new formulation uses margin-based loss functions to control model complexity independently of the dimensionality of the input space, and kernel functions to project the estimation problem to a higher dimensional space, which enables the solution of more complex nonlinear problem optimization methods to exist for a globally optimal solution. SRM minimizes an upper bound on the expected risk using a margin-based loss function ( ɛ-insensitivity loss function for regression) in contrast to ERM which minimizes the error on the training data. Unlike classical learning methods, SRM, indexed by margin-based loss function, can also control model complexity independent of dimensionality. The SRM inductive principle is designed for statistical estimation with finite data where the ERM inductive principle provides the optimal solution (the
Multi-omics facilitated variable selection in Cox-regression model for cancer prognosis prediction.
Liu, Cong; Wang, Xujun; Genchev, Georgi Z; Lu, Hui
2017-07-15
New developments in high-throughput genomic technologies have enabled the measurement of diverse types of omics biomarkers in a cost-efficient and clinically-feasible manner. Developing computational methods and tools for analysis and translation of such genomic data into clinically-relevant information is an ongoing and active area of investigation. For example, several studies have utilized an unsupervised learning framework to cluster patients by integrating omics data. Despite such recent advances, predicting cancer prognosis using integrated omics biomarkers remains a challenge. There is also a shortage of computational tools for predicting cancer prognosis by using supervised learning methods. The current standard approach is to fit a Cox regression model by concatenating the different types of omics data in a linear manner, while penalty could be added for feature selection. A more powerful approach, however, would be to incorporate data by considering relationships among omics datatypes. Here we developed two methods: a SKI-Cox method and a wLASSO-Cox method to incorporate the association among different types of omics data. Both methods fit the Cox proportional hazards model and predict a risk score based on mRNA expression profiles. SKI-Cox borrows the information generated by these additional types of omics data to guide variable selection, while wLASSO-Cox incorporates this information as a penalty factor during model fitting. We show that SKI-Cox and wLASSO-Cox models select more true variables than a LASSO-Cox model in simulation studies. We assess the performance of SKI-Cox and wLASSO-Cox using TCGA glioblastoma multiforme and lung adenocarcinoma data. In each case, mRNA expression, methylation, and copy number variation data are integrated to predict the overall survival time of cancer patients. Our methods achieve better performance in predicting patients' survival in glioblastoma and lung adenocarcinoma. Copyright © 2017. Published by Elsevier
Tian, Ye; Xu, Yue-Ping; Wang, Guoqing
2018-05-01
Drought can have a substantial impact on the ecosystem and agriculture of the affected region and does harm to local economy. This study aims to analyze the relation between soil moisture and drought and predict agricultural drought in Xiangjiang River basin. The agriculture droughts are presented with the Precipitation-Evapotranspiration Index (SPEI). The Support Vector Regression (SVR) model incorporating climate indices is developed to predict the agricultural droughts. Analysis of climate forcing including El Niño Southern Oscillation and western Pacific subtropical high (WPSH) are carried out to select climate indices. The results show that SPEI of six months time scales (SPEI-6) represents the soil moisture better than that of three and one month time scale on drought duration, severity and peaks. The key factor that influences the agriculture drought is the Ridge Point of WPSH, which mainly controls regional temperature. The SVR model incorporating climate indices, especially ridge point of WPSH, could improve the prediction accuracy compared to that solely using drought index by 4.4% in training and 5.1% in testing measured by Nash Sutcliffe efficiency coefficient (NSE) for three month lead time. The improvement is more significant for the prediction with one month lead (15.8% in training and 27.0% in testing) than that with three months lead time. However, it needs to be cautious in selection of the input parameters, since adding redundant information could have a counter effect in attaining a better prediction. Copyright © 2017 Elsevier B.V. All rights reserved.
Directory of Open Access Journals (Sweden)
Zohreh Razzaghi
2011-07-01
Full Text Available Objectives: Vitamin D deficiency is one of the most important health problems of any society. It is more common in elderly even in those dwelling in rest homes. By now, several studies have been conducted on vitamin D deficiency using current statistical models. In this study, corresponding proportional odds and stereotype regression methods were used to identify threatening factors related to vitamin D deficiency in elderly living in rest homes and comparing them with those who live out of the mentioned places. Methods & Materials: In this case-control study, there were 140 older persons living in rest homes and 140 ones not dwelling in these centers. In the present study, 25(OHD serum level variable and age, sex, body mass index, duration of exposure to sunlight variables were regarded as response and predictive variables to vitamin D deficiency, respectively. The analyses were carried out using corresponding proportional odds and stereotype regression methods and estimating parameters of these two models. Deviation statistics (AIC was used to evaluate and compare the mentioned methods. Stata.9.1 software was elected to conduct the analyses. Results: Average serum level of 25(OHD was 16.10±16.65 ng/ml and 39.62±24.78 ng/ml in individuals living in rest homes and those not living there, respectively (P=0.001. Prevalence of vitamin D deficiency (less than 20 ng/ml was observed in 75% of members of the group consisting of those living in rest homes and 23.78% of members of another group. Using corresponding proportional odds and stereotype regression methods, age, sex, body mass index, duration of exposure to sunlight variables and whether they are member of rest home were fitted. In both models, variables of group and duration of exposure to sunlight were regarded as meaningful (P<0.001. Stereotype regression model included group variable (odd ratio for a group suffering from severe vitamin D deficiency was 42.85, 95%CI:9.93-185.67 and
Directory of Open Access Journals (Sweden)
Po-Ru Loh
Full Text Available A major goal of large-scale genomics projects is to enable the use of data from high-throughput experimental methods to predict complex phenotypes such as disease susceptibility. The DREAM5 Systems Genetics B Challenge solicited algorithms to predict soybean plant resistance to the pathogen Phytophthora sojae from training sets including phenotype, genotype, and gene expression data. The challenge test set was divided into three subcategories, one requiring prediction based on only genotype data, another on only gene expression data, and the third on both genotype and gene expression data. Here we present our approach, primarily using regularized regression, which received the best-performer award for subchallenge B2 (gene expression only. We found that despite the availability of 941 genotype markers and 28,395 gene expression features, optimal models determined by cross-validation experiments typically used fewer than ten predictors, underscoring the importance of strong regularization in noisy datasets with far more features than samples. We also present substantial analysis of the training and test setup of the challenge, identifying high variance in performance on the gold standard test sets.
Stone, Wesley W.; Gilliom, Robert J.
2012-01-01
Watershed Regressions for Pesticides (WARP) models, previously developed for atrazine at the national scale, are improved for application to the United States (U.S.) Corn Belt region by developing region-specific models that include watershed characteristics that are influential in predicting atrazine concentration statistics within the Corn Belt. WARP models for the Corn Belt (WARP-CB) were developed for annual maximum moving-average (14-, 21-, 30-, 60-, and 90-day durations) and annual 95th-percentile atrazine concentrations in streams of the Corn Belt region. The WARP-CB models accounted for 53 to 62% of the variability in the various concentration statistics among the model-development sites. Model predictions were within a factor of 5 of the observed concentration statistic for over 90% of the model-development sites. The WARP-CB residuals and uncertainty are lower than those of the National WARP model for the same sites. Although atrazine-use intensity is the most important explanatory variable in the National WARP models, it is not a significant variable in the WARP-CB models. The WARP-CB models provide improved predictions for Corn Belt streams draining watersheds with atrazine-use intensities of 17 kg/km2 of watershed area or greater.
Directory of Open Access Journals (Sweden)
C. Makendran
2015-01-01
Full Text Available Prediction models for low volume village roads in India are developed to evaluate the progression of different types of distress such as roughness, cracking, and potholes. Even though the Government of India is investing huge quantum of money on road construction every year, poor control over the quality of road construction and its subsequent maintenance is leading to the faster road deterioration. In this regard, it is essential that scientific maintenance procedures are to be evolved on the basis of performance of low volume flexible pavements. Considering the above, an attempt has been made in this research endeavor to develop prediction models to understand the progression of roughness, cracking, and potholes in flexible pavements exposed to least or nil routine maintenance. Distress data were collected from the low volume rural roads covering about 173 stretches spread across Tamil Nadu state in India. Based on the above collected data, distress prediction models have been developed using multiple linear regression analysis. Further, the models have been validated using independent field data. It can be concluded that the models developed in this study can serve as useful tools for the practicing engineers maintaining flexible pavements on low volume roads.
EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.
Lian, Yao; Ge, Meng; Pan, Xian-Ming
2014-12-19
B-cell epitopes have been studied extensively due to their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task. In this work, based on the antigen's primary sequence information, a novel linear B-cell epitope prediction model was developed using the multiple linear regression (MLR). A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the performance of our model. To alleviate the problem caused by the noise of negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728. We have presented a reliable method for the identification of linear B cell epitope using antigen's primary sequence information. Moreover, a web server EPMLR has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ .
Shayan, Zahra; Mohammad Gholi Mezerji, Naser; Shayan, Leila; Naseri, Parisa
2015-11-03
Logistic regression (LR) and linear discriminant analysis (LDA) are two popular statistical models for prediction of group membership. Although they are very similar, the LDA makes more assumptions about the data. When categorical and continuous variables used simultaneously, the optimal choice between the two models is questionable. In most studies, classification error (CE) is used to discriminate between subjects in several groups, but this index is not suitable to predict the accuracy of the outcome. The present study compared LR and LDA models using classification indices. This cross-sectional study selected 243 cancer patients. Sample sets of different sizes (n = 50, 100, 150, 200, 220) were randomly selected and the CE, B, and Q classification indices were calculated by the LR and LDA models. CE revealed the a lack of superiority for one model over the other, but the results showed that LR performed better than LDA for the B and Q indices in all situations. No significant effect for sample size on CE was noted for selection of an optimal model. Assessment of the accuracy of prediction of real data indicated that the B and Q indices are appropriate for selection of an optimal model. The results of this study showed that LR performs better in some cases and LDA in others when based on CE. The CE index is not appropriate for classification, although the B and Q indices performed better and offered more efficient criteria for comparison and discrimination between groups.
A Gaussian process regression based hybrid approach for short-term wind speed prediction
International Nuclear Information System (INIS)
Zhang, Chi; Wei, Haikun; Zhao, Xin; Liu, Tianhong; Zhang, Kanjian
2016-01-01
Highlights: • A novel hybrid approach is proposed for short-term wind speed prediction. • This method combines the parametric AR model with the non-parametric GPR model. • The relative importance of different inputs is considered. • Different types of covariance functions are considered and combined. • It can provide both accurate point forecasts and satisfactory prediction intervals. - Abstract: This paper proposes a hybrid model based on autoregressive (AR) model and Gaussian process regression (GPR) for probabilistic wind speed forecasting. In the proposed approach, the AR model is employed to capture the overall structure from wind speed series, and the GPR is adopted to extract the local structure. Additionally, automatic relevance determination (ARD) is used to take into account the relative importance of different inputs, and different types of covariance functions are combined to capture the characteristics of the data. The proposed hybrid model is compared with the persistence model, artificial neural network (ANN), and support vector machine (SVM) for one-step ahead forecasting, using wind speed data collected from three wind farms in China. The forecasting results indicate that the proposed method can not only improve point forecasts compared with other methods, but also generate satisfactory prediction intervals.
International Nuclear Information System (INIS)
Bunyamin, Muhammad Afif; Yap, Keem Siah; Aziz, Nur Liyana Afiqah Abdul; Tiong, Sheih Kiong; Wong, Shen Yuong; Kamal, Md Fauzan
2013-01-01
This paper presents a new approach of gas emission estimation in power generation plant using a hybrid Genetic Algorithm (GA) and Linear Regression (LR) (denoted as GA-LR). The LR is one of the approaches that model the relationship between an output dependant variable, y, with one or more explanatory variables or inputs which denoted as x. It is able to estimate unknown model parameters from inputs data. On the other hand, GA is used to search for the optimal solution until specific criteria is met causing termination. These results include providing good solutions as compared to one optimal solution for complex problems. Thus, GA is widely used as feature selection. By combining the LR and GA (GA-LR), this new technique is able to select the most important input features as well as giving more accurate prediction by minimizing the prediction errors. This new technique is able to produce more consistent of gas emission estimation, which may help in reducing population to the environment. In this paper, the study's interest is focused on nitrous oxides (NOx) prediction. The results of the experiment are encouraging.
Cho, Eunsoo; Capin, Philip; Roberts, Greg; Vaughn, Sharon
2017-07-01
Within multitiered instructional delivery models, progress monitoring is a key mechanism for determining whether a child demonstrates an adequate response to instruction. One measure commonly used to monitor the reading progress of students is oral reading fluency (ORF). This study examined the extent to which ORF slope predicts reading comprehension outcomes for fifth-grade struggling readers ( n = 102) participating in an intensive reading intervention. Quantile regression models showed that ORF slope significantly predicted performance on a sentence-level fluency and comprehension assessment, regardless of the students' reading skills, controlling for initial ORF performance. However, ORF slope was differentially predictive of a passage-level comprehension assessment based on students' reading skills when controlling for initial ORF status. Results showed that ORF explained unique variance for struggling readers whose posttest performance was at the upper quantiles at the end of the reading intervention, but slope was not a significant predictor of passage-level comprehension for students whose reading problems were the most difficult to remediate.
A suspended sediment yield predictive equation for river basins in ...
African Journals Online (AJOL)
The fit was found to be better than those relating mean annual specific suspended sediment yield to basin area or runoff only. Since many stream gauging stations in the country have no records on fluvial sediment, the empirical equation can be used to obtain preliminary estimates of expected sediment load of streams for ...
Predictive equations for spirometric reference values in a healthy ...
African Journals Online (AJOL)
men and 98 women were selected to the reference value group. ... adult population in Dar es Salaam, Tanzania, and compare these equations to already ... which has proven to be suitable in field work as it operates on batteries and requires no ..... thought to be partly due to difference in body build and that Blacks have ...
Regression and artificial neural network modeling for the prediction of gray leaf spot of maize.
Paul, P A; Munkvold, G P
2005-04-01
ABSTRACT Regression and artificial neural network (ANN) modeling approaches were combined to develop models to predict the severity of gray leaf spot of maize, caused by Cercospora zeae-maydis. In all, 329 cases consisting of environmental, cultural, and location-specific variables were collected for field plots in Iowa between 1998 and 2002. Disease severity on the ear leaf at the dough to dent plant growth stage was used as the response variable. Correlation and regression analyses were performed to select potentially useful predictor variables. Predictors from the best 9 of 80 regression models were used to develop ANN models. A random sample of 60% of the cases was used to train the networks, and 20% each for testing and validation. Model performance was evaluated based on coefficient of determination (R(2)) and mean square error (MSE) for the validation data set. The best models had R(2) ranging from 0.70 to 0.75 and MSE ranging from 174.7 to 202.8. The most useful predictor variables were hours of daily temperatures between 22 and 30 degrees C (85.50 to 230.50 h) and hours of nightly relative humidity >/=90% (122 to 330 h) for the period between growth stages V4 and V12, mean nightly temperature (65.26 to 76.56 degrees C) for the period between growth stages V12 and R2, longitude (90.08 to 95.14 degrees W), maize residue on the soil surface (0 to 100%), planting date (in day of the year; 112 to 182), and gray leaf spot resistance rating (2 to 7; based on a 1-to-9 scale, where 1 = most susceptible to 9 = most resistant).
Bayesian binary regression model: an application to in-hospital death after AMI prediction
Directory of Open Access Journals (Sweden)
Aparecida D. P. Souza
2004-08-01
Full Text Available A Bayesian binary regression model is developed to predict death of patients after acute myocardial infarction (AMI. Markov Chain Monte Carlo (MCMC methods are used to make inference and to evaluate Bayesian binary regression models. A model building strategy based on Bayes factor is proposed and aspects of model validation are extensively discussed in the paper, including the posterior distribution for the c-index and the analysis of residuals. Risk assessment, based on variables easily available within minutes of the patients' arrival at the hospital, is very important to decide the course of the treatment. The identified model reveals itself strongly reliable and accurate, with a rate of correct classification of 88% and a concordance index of 83%.Um modelo bayesiano de regressão binária é desenvolvido para predizer óbito hospitalar em pacientes acometidos por infarto agudo do miocárdio. Métodos de Monte Carlo via Cadeias de Markov (MCMC são usados para fazer inferência e validação. Uma estratégia para construção de modelos, baseada no uso do fator de Bayes, é proposta e aspectos de validação são extensivamente discutidos neste artigo, incluindo a distribuição a posteriori para o índice de concordância e análise de resíduos. A determinação de fatores de risco, baseados em variáveis disponíveis na chegada do paciente ao hospital, é muito importante para a tomada de decisão sobre o curso do tratamento. O modelo identificado se revela fortemente confiável e acurado, com uma taxa de classificação correta de 88% e um índice de concordância de 83%.
Wills, John M.; Mattsson, Ann E.
2012-02-01
Density functional theory (DFT) provides a formally predictive base for equation of state properties. Available approximations to the exchange/correlation functional provide accurate predictions for many materials in the periodic table. For heavy materials however, DFT calculations, using available functionals, fail to provide quantitative predictions, and often fail to be even qualitative. This deficiency is due both to the lack of the appropriate confinement physics in the exchange/correlation functional and to approximations used to evaluate the underlying equations. In order to assess and develop accurate functionals, it is essential to eliminate all other sources of error. In this talk we describe an efficient first-principles electronic structure method based on the Dirac equation and compare the results obtained with this method with other methods generally used. Implications for high-pressure equation of state of relativistic materials are demonstrated in application to Ce and the light actinides. Sandia National Laboratories is a multi-program laboratory managed andoperated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
Genomic prediction based on data from three layer lines using non-linear regression models.
Huang, Heyun; Windig, Jack J; Vereijken, Addie; Calus, Mario P L
2014-11-06
Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values. When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction. Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional
Directory of Open Access Journals (Sweden)
Aguiar Fabio S
2012-08-01
Full Text Available Abstract Background Tuberculosis (TB remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a predictive model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Methods Cross sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART model was generated and validated. The area under the ROC curve (AUC, sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. Results We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. Conclusions The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. Prospective validation is still necessary, but our model offer an alternative for decision making in whether to isolate patients with
Failure and reliability prediction by support vector machines regression of time series data
International Nuclear Information System (INIS)
Chagas Moura, Marcio das; Zio, Enrico; Lins, Isis Didier; Droguett, Enrique
2011-01-01
Support Vector Machines (SVMs) are kernel-based learning methods, which have been successfully adopted for regression problems. However, their use in reliability applications has not been widely explored. In this paper, a comparative analysis is presented in order to evaluate the SVM effectiveness in forecasting time-to-failure and reliability of engineered components based on time series data. The performance on literature case studies of SVM regression is measured against other advanced learning methods such as the Radial Basis Function, the traditional MultiLayer Perceptron model, Box-Jenkins autoregressive-integrated-moving average and the Infinite Impulse Response Locally Recurrent Neural Networks. The comparison shows that in the analyzed cases, SVM outperforms or is comparable to other techniques. - Highlights: → Realistic modeling of reliability demands complex mathematical formulations. → SVM is proper when the relation input/output is unknown or very costly to be obtained. → Results indicate the potential of SVM for reliability time series prediction. → Reliability estimates support the establishment of adequate maintenance strategies.
Koeneman, Margot M; van Lint, Freyja H M; van Kuijk, Sander M J; Smits, Luc J M; Kooreman, Loes F S; Kruitwagen, Roy F P M; Kruse, Arnold J
2017-01-01
This study aims to develop a prediction model for spontaneous regression of cervical intraepithelial neoplasia grade 2 (CIN 2) lesions based on simple clinicopathological parameters. The study was conducted at Maastricht University Medical Center, the Netherlands. The prediction model was developed in a retrospective cohort of 129 women with a histologic diagnosis of CIN 2 who were managed by watchful waiting for 6 to 24months. Five potential predictors for spontaneous regression were selected based on the literature and expert opinion and were analyzed in a multivariable logistic regression model, followed by backward stepwise deletion based on the Wald test. The prediction model was internally validated by the bootstrapping method. Discriminative capacity and accuracy were tested by assessing the area under the receiver operating characteristic curve (AUC) and a calibration plot. Disease regression within 24months was seen in 91 (71%) of 129 patients. A prediction model was developed including the following variables: smoking, Papanicolaou test outcome before the CIN 2 diagnosis, concomitant CIN 1 diagnosis in the same biopsy, and more than 1 biopsy containing CIN 2. Not smoking, Papanicolaou class predictive of disease regression. The AUC was 69.2% (95% confidence interval, 58.5%-79.9%), indicating a moderate discriminative ability of the model. The calibration plot indicated good calibration of the predicted probabilities. This prediction model for spontaneous regression of CIN 2 may aid physicians in the personalized management of these lesions. Copyright © 2016 Elsevier Inc. All rights reserved.
DEFF Research Database (Denmark)
Sharif, Behzad; Makowski, David; Plauborg, Finn
2017-01-01
Statistical regression models represent alternatives to process-based dynamic models for predicting the response of crop yields to variation in climatic conditions. Regression models can be used to quantify the effect of change in temperature and precipitation on yields. However, it is difficult ...
de Luis, D A; Aller, R; Izaola, O; Romero, E
2006-01-01
The aim of our study was to evaluate the accuracy of the equations to estimate REE in obese patents and develop a new equation in our obese population. A population of 200 obesity outpatients was analyzed in a prospective way. The following variables were specifically recorded: age, weight, body mass index (BMI), waist circumference, and waist-to-hip ratio. Basal glucose, insulin, and TSH (thyroid-stimulating hormone) were measured. An indirect calorimetry and a tetrapolar electrical bioimpedance were performed. REE measured by indirect calorimetry was compared with REE obtained by prediction equations to obese or nonobese patients. The mean age was 44.8 +/- 16.81 years and the mean BMI 34.4 +/- 5.3. Indirect calorimetry showed that, as compared to women, men had higher resting energy expenditure (REE) (1,998.1 +/- 432 vs. 1,663.9 +/- 349 kcal/day; p consumption (284.6 +/- 67.7 vs. 238.6 +/- 54.3 ml/min; p predicted by prediction equations showed the next data; Berstein's equation (r = 0.65; p prediction equation was REE = 58.6 + (6.1 x weight (kg)) + (1,023.7 x height (m)) - (9.5 x age). The female model was REE = 1,272.5 + (9.8 x weight (kg)) - (61.6 x height (m)) - (8.2 x age). Our prediction equations showed a nonsignificant difference with REE measured (-3.7 kcal/day) with a significant correlation coefficient (r = 0.67; p prediction equations overestimated and underestimated REE measured. WHO equation developed in normal weight individuals provided the closest values. The two new equations (male and female equations) developed in our study had a good accuracy. Copyright 2006 S. Karger AG, Basel.
Tokunaga, Makoto; Watanabe, Susumu; Sonoda, Shigeru
2017-09-01
Multiple linear regression analysis is often used to predict the outcome of stroke rehabilitation. However, the predictive accuracy may not be satisfactory. The objective of this study was to elucidate the predictive accuracy of a method of calculating motor Functional Independence Measure (mFIM) at discharge from mFIM effectiveness predicted by multiple regression analysis. The subjects were 505 patients with stroke who were hospitalized in a convalescent rehabilitation hospital. The formula "mFIM at discharge = mFIM effectiveness × (91 points - mFIM at admission) + mFIM at admission" was used. By including the predicted mFIM effectiveness obtained through multiple regression analysis in this formula, we obtained the predicted mFIM at discharge (A). We also used multiple regression analysis to directly predict mFIM at discharge (B). The correlation between the predicted and the measured values of mFIM at discharge was compared between A and B. The correlation coefficients were .916 for A and .878 for B. Calculating mFIM at discharge from mFIM effectiveness predicted by multiple regression analysis had a higher degree of predictive accuracy of mFIM at discharge than that directly predicted. Copyright © 2017 National Stroke Association. Published by Elsevier Inc. All rights reserved.
Kiss, I.; Cioată, V. G.; Ratiu, S. A.; Rackov, M.; Penčić, M.
2018-01-01
Multivariate research is important in areas of cast-iron brake shoes manufacturing, because many variables interact with each other simultaneously. This article focuses on expressing the multiple linear regression model related to the hardness assurance by the chemical composition of the phosphorous cast irons destined to the brake shoes, having in view that the regression coefficients will illustrate the unrelated contributions of each independent variable towards predicting the dependent variable. In order to settle the multiple correlations between the hardness of the cast-iron brake shoes, and their chemical compositions several regression equations has been proposed. Is searched a mathematical solution which can determine the optimum chemical composition for the hardness desirable values. Starting from the above-mentioned affirmations two new statistical experiments are effectuated related to the values of Phosphorus [P], Manganese [Mn] and Silicon [Si]. Therefore, the regression equations, which describe the mathematical dependency between the above-mentioned elements and the hardness, are determined. As result, several correlation charts will be revealed.
Directory of Open Access Journals (Sweden)
Shahrbanoo Goli
2016-01-01
Full Text Available The Support Vector Regression (SVR model has been broadly used for response prediction. However, few researchers have used SVR for survival analysis. In this study, a new SVR model is proposed and SVR with different kernels and the traditional Cox model are trained. The models are compared based on different performance measures. We also select the best subset of features using three feature selection methods: combination of SVR and statistical tests, univariate feature selection based on concordance index, and recursive feature elimination. The evaluations are performed using available medical datasets and also a Breast Cancer (BC dataset consisting of 573 patients who visited the Oncology Clinic of Hamadan province in Iran. Results show that, for the BC dataset, survival time can be predicted more accurately by linear SVR than nonlinear SVR. Based on the three feature selection methods, metastasis status, progesterone receptor status, and human epidermal growth factor receptor 2 status are the best features associated to survival. Also, according to the obtained results, performance of linear and nonlinear kernels is comparable. The proposed SVR model performs similar to or slightly better than other models. Also, SVR performs similar to or better than Cox when all features are included in model.
Microbiome Data Accurately Predicts the Postmortem Interval Using Random Forest Regression Models
Directory of Open Access Journals (Sweden)
Aeriel Belk
2018-02-01
Full Text Available Death investigations often include an effort to establish the postmortem interval (PMI in cases in which the time of death is uncertain. The postmortem interval can lead to the identification of the deceased and the validation of witness statements and suspect alibis. Recent research has demonstrated that microbes provide an accurate clock that starts at death and relies on ecological change in the microbial communities that normally inhabit a body and its surrounding environment. Here, we explore how to build the most robust Random Forest regression models for prediction of PMI by testing models built on different sample types (gravesoil, skin of the torso, skin of the head, gene markers (16S ribosomal RNA (rRNA, 18S rRNA, internal transcribed spacer regions (ITS, and taxonomic levels (sequence variants, species, genus, etc.. We also tested whether particular suites of indicator microbes were informative across different datasets. Generally, results indicate that the most accurate models for predicting PMI were built using gravesoil and skin data using the 16S rRNA genetic marker at the taxonomic level of phyla. Additionally, several phyla consistently contributed highly to model accuracy and may be candidate indicators of PMI.
Energy Technology Data Exchange (ETDEWEB)
Brown, Joshua D.; Summers, Michael F. [University of Maryland Baltimore County, Howard Hughes Medical Institute (United States); Johnson, Bruce A., E-mail: bruce.johnson@asrc.cuny.edu [University of Maryland Baltimore County, Department of Chemistry and Biochemistry (United States)
2015-09-15
The Biological Magnetic Resonance Data Bank (BMRB) contains NMR chemical shift depositions for over 200 RNAs and RNA-containing complexes. We have analyzed the {sup 1}H NMR and {sup 13}C chemical shifts reported for non-exchangeable protons of 187 of these RNAs. Software was developed that downloads BMRB datasets and corresponding PDB structure files, and then generates residue-specific attributes based on the calculated secondary structure. Attributes represent properties present in each sequential stretch of five adjacent residues and include variables such as nucleotide type, base-pair presence and type, and tetraloop types. Attributes and {sup 1}H and {sup 13}C NMR chemical shifts of the central nucleotide are then used as input to train a predictive model using support vector regression. These models can then be used to predict shifts for new sequences. The new software tools, available as stand-alone scripts or integrated into the NMR visualization and analysis program NMRViewJ, should facilitate NMR assignment and/or validation of RNA {sup 1}H and {sup 13}C chemical shifts. In addition, our findings enabled the re-calibration a ring-current shift model using published NMR chemical shifts and high-resolution X-ray structural data as guides.
Predicting the "graduate on time (GOT)" of PhD students using binary logistics regression model
Shariff, S. Sarifah Radiah; Rodzi, Nur Atiqah Mohd; Rahman, Kahartini Abdul; Zahari, Siti Meriam; Deni, Sayang Mohd
2016-10-01
Malaysian government has recently set a new goal to produce 60,000 Malaysian PhD holders by the year 2023. As a Malaysia's largest institution of higher learning in terms of size and population which offers more than 500 academic programmes in a conducive and vibrant environment, UiTM has taken several initiatives to fill up the gap. Strategies to increase the numbers of graduates with PhD are a process that is challenging. In many occasions, many have already identified that the struggle to get into the target set is even more daunting, and that implementation is far too ideal. This has further being progressing slowly as the attrition rate increases. This study aims to apply the proposed models that incorporates several factors in predicting the number PhD students that will complete their PhD studies on time. Binary Logistic Regression model is proposed and used on the set of data to determine the number. The results show that only 6.8% of the 2014 PhD students are predicted to graduate on time and the results are compared wih the actual number for validation purpose.
International Nuclear Information System (INIS)
Jiang, B.T.; Zhao, F.Y.
2013-01-01
Highlights: ► CHF data are collected from the published literature. ► Less training data are used to train the LSSVR model. ► PSO is adopted to optimize the key parameters to improve the model precision. ► The reliability of LSSVR is proved through parametric trends analysis. - Abstract: In view of practical importance of critical heat flux (CHF) for design and safety of nuclear reactors, accurate prediction of CHF is of utmost significance. This paper presents a novel approach using least squares support vector regression (LSSVR) and particle swarm optimization (PSO) to predict CHF. Two available published datasets are used to train and test the proposed algorithm, in which PSO is employed to search for the best parameters involved in LSSVR model. The CHF values obtained by the LSSVR model are compared with the corresponding experimental values and those of a previous method, adaptive neuro fuzzy inference system (ANFIS). This comparison is also carried out in the investigation of parametric trends of CHF. It is found that the proposed method can achieve the desired performance and yields a more satisfactory fit with experimental results than ANFIS. Therefore, LSSVR method is likely to be suitable for other parameters processing such as CHF
Kanbayashi, Yuko; Ishikawa, Takeshi; Kanazawa, Motohiro; Nakajima, Yuki; Kawano, Rumi; Tabuchi, Yusuke; Yoshioka, Tomoko; Ihara, Norihiko; Hosokawa, Toyoshi; Takayama, Koichi; Shikata, Keisuke; Taguchi, Tetsuya
2018-03-16
Although pegfilgrastim prophylaxis is expected to maintain the relative dose intensity (RDI) of chemotherapy and improve safety, information is limited. However, the optimal selection of patients eligible for pegfilgrastim prophylaxis is an important issue from a medical economics viewpoint. Therefore, this retrospective study identified factors that could predict these eligible patients to maintain the RDI. The participants included 166 cancer patients undergoing pegfilgrastim prophylaxis combined with chemotherapy in our outpatient chemotherapy center between March 2015 and April 2017. Variables were extracted from clinical records for regression analysis of factors related to maintenance of the RDI. RDI was classified into four categories: 100% = 0, 85% or predictive factors in patients eligible for pegfilgrastim prophylaxis to maintain the RDI. Threshold measures were examined using a receiver operating characteristic (ROC) analysis curve. Age [odds ratio (OR) 1.07, 95% confidence interval (CI) 1.04-1.11; P maintenance. ROC curve analysis of the group that failed to maintain the RDI indicated that the threshold for age was 70 years and above, with a sensitivity of 60.0% and specificity of 80.2% (area under the curve: 0.74). In conclusion, younger age, anemia (less), and administration of pegfilgrastim 24-72 h after chemotherapy were significant factors for RDI maintenance.
Koestel, John; Bechtold, Michel; Jorda, Helena; Jarvis, Nicholas
2015-04-01
The saturated and near-saturated hydraulic conductivity of soil is of key importance for modelling water and solute fluxes in the vadose zone. Hydraulic conductivity measurements are cumbersome at the Darcy scale and practically impossible at larger scales where water and solute transport models are mostly applied. Hydraulic conductivity must therefore be estimated from proxy variables. Such pedotransfer functions are known to work decently well for e.g. water retention curves but rather poorly for near-saturated and saturated hydraulic conductivities. Recently, Weynants et al. (2009, Revisiting Vereecken pedotransfer functions: Introducing a closed-form hydraulic model. Vadose Zone Journal, 8, 86-95) reported a coefficients of determination of 0.25 (validation with an independent data set) for the saturated hydraulic conductivity from lab-measurements of Belgian soil samples. In our study, we trained boosted regression trees on a global meta-database containing tension-disk infiltrometer data (see Jarvis et al. 2013. Influence of soil, land use and climatic factors on the hydraulic conductivity of soil. Hydrology & Earth System Sciences, 17, 5185-5195) to predict the saturated hydraulic conductivity (Ks) and the conductivity at a tension of 10 cm (K10). We found coefficients of determination of 0.39 and 0.62 under a simple 10-fold cross-validation for Ks and K10. When carrying out the validation folded over the data-sources, i.e. the source publications, we found that the corresponding coefficients of determination reduced to 0.15 and 0.36, respectively. We conclude that the stricter source-wise cross-validation should be applied in future pedotransfer studies to prevent overly optimistic validation results. The boosted regression trees also allowed for an investigation of relevant predictors for estimating the near-saturated hydraulic conductivity. We found that land use and bulk density were most important to predict Ks. We also observed that Ks is large in fine
Nowicki, M. A.; Hearne, M.; Thompson, E.; Wald, D. J.
2012-12-01
Seismically induced landslides present a costly and often fatal threats in many mountainous regions. Substantial effort has been invested to understand where seismically induced landslides may occur in the future. Both slope-stability methods and, more recently, statistical approaches to the problem are described throughout the literature. Though some regional efforts have succeeded, no uniformly agreed-upon method is available for predicting the likelihood and spatial extent of seismically induced landslides. For use in the U. S. Geological Survey (USGS) Prompt Assessment of Global Earthquakes for Response (PAGER) system, we would like to routinely make such estimates, in near-real time, around the globe. Here we use the recently produced USGS ShakeMap Atlas of historic earthquakes to develop an empirical landslide probability model. We focus on recent events, yet include any digitally-mapped landslide inventories for which well-constrained ShakeMaps are also available. We combine these uniform estimates of the input shaking (e.g., peak acceleration and velocity) with broadly available susceptibility proxies, such as topographic slope and surface geology. The resulting database is used to build a predictive model of the probability of landslide occurrence with logistic regression. The landslide database includes observations from the Northridge, California (1994); Wenchuan, China (2008); ChiChi, Taiwan (1999); and Chuetsu, Japan (2004) earthquakes; we also provide ShakeMaps for moderate-sized events without landslide for proper model testing and training. The performance of the regression model is assessed with both statistical goodness-of-fit metrics and a qualitative review of whether or not the model is able to capture the spatial extent of landslides for each event. Part of our goal is to determine which variables can be employed based on globally-available data or proxies, and whether or not modeling results from one region are transferrable to
Zounemat-Kermani, Mohammad
2012-08-01
In this study, the ability of two models of multi linear regression (MLR) and Levenberg-Marquardt (LM) feed-forward neural network was examined to estimate the hourly dew point temperature. Dew point temperature is the temperature at which water vapor in the air condenses into liquid. This temperature can be useful in estimating meteorological variables such as fog, rain, snow, dew, and evapotranspiration and in investigating agronomical issues as stomatal closure in plants. The availability of hourly records of climatic data (air temperature, relative humidity and pressure) which could be used to predict dew point temperature initiated the practice of modeling. Additionally, the wind vector (wind speed magnitude and direction) and conceptual input of weather condition were employed as other input variables. The three quantitative standard statistical performance evaluation measures, i.e. the root mean squared error, mean absolute error, and absolute logarithmic Nash-Sutcliffe efficiency coefficient ( {| {{{Log}}({{NS}})} |} ) were employed to evaluate the performances of the developed models. The results showed that applying wind vector and weather condition as input vectors along with meteorological variables could slightly increase the ANN and MLR predictive accuracy. The results also revealed that LM-NN was superior to MLR model and the best performance was obtained by considering all potential input variables in terms of different evaluation criteria.
Wang, J; Wang, F; Liu, Y; Xu, J; Lin, H; Jia, B; Zuo, W; Jiang, Y; Hu, L; Lin, F
2016-01-01
Overweight individuals are at higher risk for developing type II diabetes than the general population. We conducted this study to analyze the correlation between blood glucose and biochemical parameters, and developed a blood glucose prediction model tailored to overweight patients. A total of 346 overweight Chinese people patients ages 18-81 years were involved in this study. Their levels of fasting glucose (fs-GLU), blood lipids, and hepatic and renal functions were measured and analyzed by multiple linear regression (MLR). Based the MLR results, we developed a back propagation artificial neural network (BP-ANN) model by selecting tansig as the transfer function of the hidden layers nodes, and purelin for the output layer nodes, with training goal of 0.5×10(-5). There was significant correlation between fs-GLU with age, BMI, and blood biochemical indexes (P<0.05). The results of MLR analysis indicated that age, fasting alanine transaminase (fs-ALT), blood urea nitrogen (fs-BUN), total protein (fs-TP), uric acid (fs-BUN), and BMI are 6 independent variables related to fs-GLU. Based on these parameters, the BP-ANN model was performed well and reached high prediction accuracy when training 1 000 epoch (R=0.9987). The level of fs-GLU was predictable using the proposed BP-ANN model based on 6 related parameters (age, fs-ALT, fs-BUN, fs-TP, fs-UA and BMI) in overweight patients. © Georg Thieme Verlag KG Stuttgart · New York.
Al-Mudhafar, W. J.
2013-12-01
Precisely prediction of rock facies leads to adequate reservoir characterization by improving the porosity-permeability relationships to estimate the properties in non-cored intervals. It also helps to accurately identify the spatial facies distribution to perform an accurate reservoir model for optimal future reservoir performance. In this paper, the facies estimation has been done through Multinomial logistic regression (MLR) with respect to the well logs and core data in a well in upper sandstone formation of South Rumaila oil field. The entire independent variables are gamma rays, formation density, water saturation, shale volume, log porosity, core porosity, and core permeability. Firstly, Robust Sequential Imputation Algorithm has been considered to impute the missing data. This algorithm starts from a complete subset of the dataset and estimates sequentially the missing values in an incomplete observation by minimizing the determinant of the covariance of the augmented data matrix. Then, the observation is added to the complete data matrix and the algorithm continues with the next observation with missing values. The MLR has been chosen to estimate the maximum likelihood and minimize the standard error for the nonlinear relationships between facies & core and log data. The MLR is used to predict the probabilities of the different possible facies given each independent variable by constructing a linear predictor function having a set of weights that are linearly combined with the independent variables by using a dot product. Beta distribution of facies has been considered as prior knowledge and the resulted predicted probability (posterior) has been estimated from MLR based on Baye's theorem that represents the relationship between predicted probability (posterior) with the conditional probability and the prior knowledge. To assess the statistical accuracy of the model, the bootstrap should be carried out to estimate extra-sample prediction error by randomly
Predictive model for early math skills based on structural equations.
Aragón, Estíbaliz; Navarro, José I; Aguilar, Manuel; Cerda, Gamal; García-Sedeño, Manuel
2016-12-01
Early math skills are determined by higher cognitive processes that are particularly important for acquiring and developing skills during a child's early education. Such processes could be a critical target for identifying students at risk for math learning difficulties. Few studies have considered the use of a structural equation method to rationalize these relations. Participating in this study were 207 preschool students ages 59 to 72 months, 108 boys and 99 girls. Performance with respect to early math skills, early literacy, general intelligence, working memory, and short-term memory was assessed. A structural equation model explaining 64.3% of the variance in early math skills was applied. Early literacy exhibited the highest statistical significance (β = 0.443, p < 0.05), followed by intelligence (β = 0.286, p < 0.05), working memory (β = 0.220, p < 0.05), and short-term memory (β = 0.213, p < 0.05). Correlations between the independent variables were also significant (p < 0.05). According to the results, cognitive variables should be included in remedial intervention programs. © 2016 Scandinavian Psychological Associations and John Wiley & Sons Ltd.
A prediction equation for enteric methane emission from dairy cows for use in NorFor
DEFF Research Database (Denmark)
Nielsen, N I; Volden, H; Åkerlind, M
2013-01-01
A data-set with 47 treatment means (N = 211) was compiled from research institutions in Denmark, Norway, and Sweden in order to develop a prediction equation for enteric methane (CH4) emissions from dairy cows. The aim was to implement the equation in the Nordic feed evaluation system NorFor. The...
Trends of Abutment-Scour Prediction Equations Applied to 144 Field Sites in South Carolina
Benedict, Stephen T.; Deshpande, Nikhil; Aziz, Nadim M.; Conrads, Paul
2006-01-01
The U.S. Geological Survey conducted a study in cooperation with the Federal Highway Administration in which predicted abutment-scour depths computed with selected predictive equations were compared with field measurements of abutment-scour depth made at 144 bridges in South Carolina. The assessment used five equations published in the Fourth Edition of 'Evaluating Scour at Bridges,' (Hydraulic Engineering Circular 18), including the original Froehlich, the modified Froehlich, the Sturm, the Maryland, and the HIRE equations. An additional unpublished equation also was assessed. Comparisons between predicted and observed scour depths are intended to illustrate general trends and order-of-magnitude differences for the prediction equations. Field measurements were taken during non-flood conditions when the hydraulic conditions that caused the scour generally are unknown. The predicted scour depths are based on hydraulic conditions associated with the 100-year flow at all sites and the flood of record for 35 sites. Comparisons showed that predicted scour depths frequently overpredict observed scour and at times were excessive. The comparison also showed that underprediction occurred, but with less frequency. The performance of these equations indicates that they are poor predictors of abutment-scour depth in South Carolina, and it is probable that poor performance will occur when the equations are applied in other geographic regions. Extensive data and graphs used to compare predicted and observed scour depths in this study were compiled into spreadsheets and are included in digital format with this report. In addition to the equation-comparison data, Water-Surface Profile Model tube-velocity data, soil-boring data, and selected abutment-scour data are included in digital format with this report. The digital database was developed as a resource for future researchers and is especially valuable for evaluating the reasonableness of future equations that may be developed.
Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine
2018-01-01
Statistical downscaling models (SDMs) are often used to produce local weather scenarios from large-scale atmospheric information. SDMs include transfer functions which are based on a statistical link identified from observations between local weather and a set of large-scale predictors. As physical processes driving surface weather vary in time, the most relevant predictors and the regression link are likely to vary in time too. This is well known for precipitation for instance and the link is thus often estimated after some seasonal stratification of the data. In this study, we present a two-stage analog/regression model where the regression link is estimated from atmospheric analogs of the current prediction day. Atmospheric analogs are identified from fields of geopotential heights at 1000 and 500 hPa. For the regression stage, two generalized linear models are further used to model the probability of precipitation occurrence and the distribution of non-zero precipitation amounts, respectively. The two-stage model is evaluated for the probabilistic prediction of small-scale precipitation over France. It noticeably improves the skill of the prediction for both precipitation occurrence and amount. As the analog days vary from one prediction day to another, the atmospheric predictors selected in the regression stage and the value of the corresponding regression coefficients can vary from one prediction day to another. The model allows thus for a day-to-day adaptive and tailored downscaling. It can also reveal specific predictors for peculiar and non-frequent weather configurations.
Predictive Model Equations for Palm Kernel (Elaeis guneensis J ...
African Journals Online (AJOL)
Estimated error of ± 0.18 and ± 0.2 are envisaged while applying the models for predicting palm kernel and sesame oil colours respectively. Keywords: Palm kernel, Sesame, Palm kernel, Oil Colour, Process Parameters, Model. Journal of Applied Science, Engineering and Technology Vol. 6 (1) 2006 pp. 34-38 ...
Equations for predicting biomass in 2- to 6-year-old Eucalyptus saligna in Hawaii
Craig D. Whitesell; Susan C. Miyasaka; Robert F. Strand; Thomas H. Schubert; Katharine E. McDuffie
1988-01-01
Eucalyptus saligna trees grown in short-rotation plantations on the island of Hawaii were measured, harvested, and weighed to provide data for developing regression equations using non-destructive stand measurements. Regression analysis of the data from 190 trees in the 2.0- to 3.5-year range and 96 trees in the 4- to 6-year range related stem-only...
Saturated properties prediction in critical region by a quartic equation of state
Directory of Open Access Journals (Sweden)
Yong Wang
2011-08-01
Full Text Available A diverse substance library containing extensive PVT data for 77 pure components was used to critically evaluate the performance of a quartic equation of state and other four famous cubic equations of state in critical region. The quartic EOS studied in this work was found to significantly superior to the others in both vapor pressure prediction and saturated volume prediction in vicinity of critical point.
Aznar, Margarita; López, Ricardo; Cacho, Juan; Ferreira, Vicente
2003-04-23
Partial least squares regression (PLSR) models able to predict some of the wine aroma nuances from its chemical composition have been developed. The aromatic sensory characteristics of 57 Spanish aged red wines were determined by 51 experts from the wine industry. The individual descriptions given by the experts were recorded, and the frequency with which a sensory term was used to define a given wine was taken as a measurement of its intensity. The aromatic chemical composition of the wines was determined by already published gas chromatography (GC)-flame ionization detector and GC-mass spectrometry methods. In the whole, 69 odorants were analyzed. Both matrixes, the sensory and chemical data, were simplified by grouping and rearranging correlated sensory terms or chemical compounds and by the exclusion of secondary aroma terms or of weak aroma chemicals. Finally, models were developed for 18 sensory terms and 27 chemicals or groups of chemicals. Satisfactory models, explaining more than 45% of the original variance, could be found for nine of the most important sensory terms (wood-vanillin-cinnamon, animal-leather-phenolic, toasted-coffee, old wood-reduction, vegetal-pepper, raisin-flowery, sweet-candy-cacao, fruity, and berry fruit). For this set of terms, the correlation coefficients between the measured and predicted Y (determined by cross-validation) ranged from 0.62 to 0.81. Models confirmed the existence of complex multivariate relationships between chemicals and odors. In general, pleasant descriptors were positively correlated to chemicals with pleasant aroma, such as vanillin, beta damascenone, or (E)-beta-methyl-gamma-octalactone, and negatively correlated to compounds showing less favorable odor properties, such as 4-ethyl and vinyl phenols, 3-(methylthio)-1-propanol, or phenylacetaldehyde.
Directory of Open Access Journals (Sweden)
Meiping Wang
2016-01-01
Full Text Available We developed an effective intelligent model to predict the dynamic heat supply of heat source. A hybrid forecasting method was proposed based on support vector regression (SVR model-optimized particle swarm optimization (PSO algorithms. Due to the interaction of meteorological conditions and the heating parameters of heating system, it is extremely difficult to forecast dynamic heat supply. Firstly, the correlations among heat supply and related influencing factors in the heating system were analyzed through the correlation analysis of statistical theory. Then, the SVR model was employed to forecast dynamic heat supply. In the model, the input variables were selected based on the correlation analysis and three crucial parameters, including the penalties factor, gamma of the kernel RBF, and insensitive loss function, were optimized by PSO algorithms. The optimized SVR model was compared with the basic SVR, optimized genetic algorithm-SVR (GA-SVR, and artificial neural network (ANN through six groups of experiment data from two heat sources. The results of the correlation coefficient analysis revealed the relationship between the influencing factors and the forecasted heat supply and determined the input variables. The performance of the PSO-SVR model is superior to those of the other three models. The PSO-SVR method is statistically robust and can be applied to practical heating system.
Directory of Open Access Journals (Sweden)
Drzewiecki Wojciech
2016-12-01
Full Text Available In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques.
Kim, T. W.; Park, G. H.
2014-12-01
Seasonal variation of aragonite saturation state (Ωarag) in the North Pacific Ocean (NPO) was investigated, using multiple linear regression (MLR) models produced from the PACIFICA (Pacific Ocean interior carbon) dataset. Data within depth ranges of 50-1200m were used to derive MLR models, and three parameters (potential temperature, nitrate, and apparent oxygen utilization (AOU)) were chosen as predictor variables because these parameters are associated with vertical mixing, DIC (dissolved inorganic carbon) removal and release which all affect Ωarag in water column directly or indirectly. The PACIFICA dataset was divided into 5° × 5° grids, and a MLR model was produced in each grid, giving total 145 independent MLR models over the NPO. Mean RMSE (root mean square error) and r2 (coefficient of determination) of all derived MLR models were approximately 0.09 and 0.96, respectively. Then the obtained MLR coefficients for each of predictor variables and an intercept were interpolated over the study area, thereby making possible to allocate MLR coefficients to data-sparse ocean regions. Predictability from the interpolated coefficients was evaluated using Hawaiian time-series data, and as a result mean residual between measured and predicted Ωarag values was approximately 0.08, which is less than the mean RMSE of our MLR models. The interpolated MLR coefficients were combined with seasonal climatology of World Ocean Atlas 2013 (1° × 1°) to produce seasonal Ωarag distributions over various depths. Large seasonal variability in Ωarag was manifested in the mid-latitude Western NPO (24-40°N, 130-180°E) and low-latitude Eastern NPO (0-12°N, 115-150°W). In the Western NPO, seasonal fluctuations of water column stratification appeared to be responsible for the seasonal variation in Ωarag (~ 0.5 at 50 m) because it closely followed temperature variations in a layer of 0-75 m. In contrast, remineralization of organic matter was the main cause for the seasonal
Directory of Open Access Journals (Sweden)
Chi-Cheng Huang
2013-01-01
Full Text Available Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiles. Despite recent advancements in machine learning and bioinformatics, most classification tools were limited to the applications of binary responses. Our aim was to apply partial least square (PLS regression for breast cancer intrinsic taxonomy, of which five distinct molecular subtypes were identified. The PAM50 signature genes were used as predictive variables in PLS analysis, and the latent gene component scores were used in binary logistic regression for each molecular subtype. The 139 prototypical arrays for PAM50 development were used as training dataset, and three independent microarray studies with Han Chinese origin were used for independent validation (n=535. The agreement between PAM50 centroid-based single sample prediction (SSP and PLS-regression was excellent (weighted Kappa: 0.988 within the training samples, but deteriorated substantially in independent samples, which could attribute to much more unclassified samples by PLS-regression. If these unclassified samples were removed, the agreement between PAM50 SSP and PLS-regression improved enormously (weighted Kappa: 0.829 as opposed to 0.541 when unclassified samples were analyzed. Our study ascertained the feasibility of PLS-regression in multi-class prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes.
Prediction Equations Overestimate the Energy Requirements More for Obesity-Susceptible Individuals.
McLay-Cooke, Rebecca T; Gray, Andrew R; Jones, Lynnette M; Taylor, Rachael W; Skidmore, Paula M L; Brown, Rachel C
2017-09-13
Predictive equations to estimate resting metabolic rate (RMR) are often used in dietary counseling and by online apps to set energy intake goals for weight loss. It is critical to know whether such equations are appropriate for those susceptible to obesity. We measured RMR by indirect calorimetry after an overnight fast in 26 obesity susceptible (OSI) and 30 obesity resistant (ORI) individuals, identified using a simple 6-item screening tool. Predicted RMR was calculated using the FAO/WHO/UNU (Food and Agricultural Organisation/World Health Organisation/United Nations University), Oxford and Miflin-St Jeor equations. Absolute measured RMR did not differ significantly between OSI versus ORI (6339 vs. 5893 kJ·d -1 , p = 0.313). All three prediction equations over-estimated RMR for both OSI and ORI when measured RMR was ≤5000 kJ·d -1 . For measured RMR ≤7000 kJ·d -1 there was statistically significant evidence that the equations overestimate RMR to a greater extent for those classified as obesity susceptible with biases ranging between around 10% to nearly 30% depending on the equation. The use of prediction equations may overestimate RMR and energy requirements particularly in those who self-identify as being susceptible to obesity, which has implications for effective weight management.
International Nuclear Information System (INIS)
Lee, Jae-Kon; Kim, Jin-Gon
2011-01-01
A governing differential equation for predicting the effective thermal conductivity of composites with spherical inclusions is shown to be simply derived by using the result of the generalized self-consistent model. By applying the equation to composites including spherical inclusions such as graded spherical inclusions, microballoons, mutiply-coated spheres, and spherical inclusions with an interphase, their effective thermal conductivities are easily predicted. The results are compared with those in the literatures to be consistent. It can be stated from the investigations that the effective thermal conductivity of composites with spherical inclusions can be estimated as long as their conductivities are expressed as a function of their radius. -- Highlights: → We derive equation for predicting the effective thermal conductivity of composites. → The equation is derived using the results of the generalized self-consistent model. → The inclusions are graded sphere, microballoons, and mutiply-coated spheres.
Boomer, Kathleen B; Weller, Donald E; Jordan, Thomas E
2008-01-01
The Universal Soil Loss Equation (USLE) and its derivatives are widely used for identifying watersheds with a high potential for degrading stream water quality. We compared sediment yields estimated from regional application of the USLE, the automated revised RUSLE2, and five sediment delivery ratio algorithms to measured annual average sediment delivery in 78 catchments of the Chesapeake Bay watershed. We did the same comparisons for another 23 catchments monitored by the USGS. Predictions exceeded observed sediment yields by more than 100% and were highly correlated with USLE erosion predictions (Pearson r range, 0.73-0.92; p USLE estimates (r = 0.87; p USLE model did not change the results. In ranked comparisons between observed and predicted sediment yields, the models failed to identify catchments with higher yields (r range, -0.28-0.00; p > 0.14). In a multiple regression analysis, soil erodibility, log (stream flow), basin shape (topographic relief ratio), the square-root transformed proportion of forest, and occurrence in the Appalachian Plateau province explained 55% of the observed variance in measured suspended sediment loads, but the model performed poorly (r(2) = 0.06) at predicting loads in the 23 USGS watersheds not used in fitting the model. The use of USLE or multiple regression models to predict sediment yields is not advisable despite their present widespread application. Integrated watershed models based on the USLE may also be unsuitable for making management decisions.
Predicted equations for ventilatory function among Kuching (Sarawak, Malaysia) population.
Djojodibroto, R D; Pratibha, G; Kamaluddin, B; Manjit, S S; Sumitabha, G; Kumar, A Deva; Hashami, B
2009-12-01
Spirometry data of 869 individuals (males and females) between the ages of 10 to 60 years were analyzed. The analysis yielded the following conclusions: 1. The pattern of Forced Vital Capacity (FVC) and Forced Expiratory Volume in One Second (FEV1) for the selected subgroups seems to be gender dependant: in males, the highest values were seen in the Chinese, followed by the Malay, and then the Dayak; in females, the highest values were seen in the Chinese, followed by the Dayak, and then the Malay. 2. Smoking that did not produce respiratory symptom was not associated with a decline in lung function, in fact we noted higher values in smokers as compared to nonsmokers. 3. Prediction formulae (54 in total) are worked out for FVC & FEV1 for the respective gender and each of the selected subgroups.
Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga
2006-08-01
A quantitative-structure activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R (2) (CV) = 0.8160; S (PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.
Directory of Open Access Journals (Sweden)
Vasileios A. Tzanakakis
2014-12-01
Full Text Available Partial Least Squares Regression (PLSR can integrate a great number of variables and overcome collinearity problems, a fact that makes it suitable for intensive agronomical practices such as land application. In the present study a PLSR model was developed to predict important management goals, including biomass production and nutrient recovery (i.e., nitrogen and phosphorus, associated with treatment potential, environmental impacts, and economic benefits. Effluent loading and a considerable number of soil parameters commonly monitored in effluent irrigated lands were considered as potential predictor variables during the model development. All data were derived from a three year field trial including plantations of four different plant species (Acacia cyanophylla, Eucalyptus camaldulensis, Populus nigra, and Arundo donax, irrigated with pre-treated domestic effluent. PLSR method was very effective despite the small sample size and the wide nature of data set (with many highly correlated inputs and several highly correlated responses. Through PLSR method the number of initial predictor variables was reduced and only several variables were remained and included in the final PLSR model. The important input variables maintained were: Effluent loading, electrical conductivity (EC, available phosphorus (Olsen-P, Na+, Ca2+, Mg2+, K2+, SAR, and NO3−-N. Among these variables, effluent loading, EC, and nitrates had the greater contribution to the final PLSR model. PLSR is highly compatible with intensive agronomical practices such as land application, in which a large number of highly collinear and noisy input variables is monitored to assess plant species performance and to detect impacts on the environment.
Predicting Eight Grade Students' Equation Solving Performances via Concepts of Variable and Equality
Ertekin, Erhan
2017-01-01
This study focused on how two algebraic concepts- equality and variable- predicted 8th grade students' equation solving performance. In this study, predictive design as a correlational research design was used. Randomly selected 407 eight-grade students who were from the central districts of a city in the central region of Turkey participated in…
Directory of Open Access Journals (Sweden)
Jun Bi
2018-04-01
Full Text Available Battery electric vehicles (BEVs reduce energy consumption and air pollution as compared with conventional vehicles. However, the limited driving range and potential long charging time of BEVs create new problems. Accurate charging time prediction of BEVs helps drivers determine travel plans and alleviate their range anxiety during trips. This study proposed a combined model for charging time prediction based on regression and time-series methods according to the actual data from BEVs operating in Beijing, China. After data analysis, a regression model was established by considering the charged amount for charging time prediction. Furthermore, a time-series method was adopted to calibrate the regression model, which significantly improved the fitting accuracy of the model. The parameters of the model were determined by using the actual data. Verification results confirmed the accuracy of the model and showed that the model errors were small. The proposed model can accurately depict the charging time characteristics of BEVs in Beijing.
International Nuclear Information System (INIS)
Alvarez Huerta, A.; Gonzalez Miguelez, R.; Garcia Metola, D.; Noriega Gonzalez, A.
2011-01-01
The modelization is carried out through two different techniques, a conventional polynomial regression and other based on an approach by neural networks artificial. He is a comparison between the quality of the forecast would make different models based on the polynomial regression and neural network with generalization by Bayesian regulation, using the indicators of the root of the mean square error and the coefficient of determination, in view of the results, the neural network generates a prediction more accurate and reliable than the polynomial regression.
DEFF Research Database (Denmark)
Puig Arnavat, Maria; López-Villada, Jesús; Bruno, Joan Carles
2010-01-01
Two approaches to the characteristic equation method have been compared in order to find a simple model that best describes the performance of thermal chillers. After comparing the results obtained using experimental data from a single-effect absorption chiller, we concluded that the adaptation o...... chillers. The characteristic parameters for these chillers are given and can be incorporated as a chiller module in thermal modelling and simulation packages....
Zafarani, H.; Luzi, Lucia; Lanzano, Giovanni; Soghrat, M. R.
2018-01-01
A recently compiled, comprehensive, and good-quality strong-motion database of the Iranian earthquakes has been used to develop local empirical equations for the prediction of peak ground acceleration (PGA) and 5%-damped pseudo-spectral accelerations (PSA) up to 4.0 s. The equations account for style of faulting and four site classes and use the horizontal distance from the surface projection of the rupture plane as a distance measure. The model predicts the geometric mean of horizontal components and the vertical-to-horizontal ratio. A total of 1551 free-field acceleration time histories recorded at distances of up to 200 km from 200 shallow earthquakes (depth < 30 km) with moment magnitudes ranging from Mw 4.0 to 7.3 are used to perform regression analysis using the random effects algorithm of Abrahamson and Youngs (Bull Seism Soc Am 82:505-510, 1992), which considers between-events as well as within-events errors. Due to the limited data used in the development of previous Iranian ground motion prediction equations (GMPEs) and strong trade-offs between different terms of GMPEs, it is likely that the previously determined models might have less precision on their coefficients in comparison to the current study. The richer database of the current study allows improving on prior works by considering additional variables that could not previously be adequately constrained. Here, a functional form used by Boore and Atkinson (Earthquake Spect 24:99-138, 2008) and Bindi et al. (Bull Seism Soc Am 9:1899-1920, 2011) has been adopted that allows accounting for the saturation of ground motions at close distances. A regression has been also performed for the V/H in order to retrieve vertical components by scaling horizontal spectra. In order to take into account epistemic uncertainty, the new model can be used along with other appropriate GMPEs through a logic tree framework for seismic hazard assessment in Iran and Middle East region.
Watson, Kara M.; McHugh, Amy R.
2014-01-01
Regional regression equations were developed for estimating monthly flow-duration and monthly low-flow frequency statistics for ungaged streams in Coastal Plain and non-coastal regions of New Jersey for baseline and current land- and water-use conditions. The equations were developed to estimate 87 different streamflow statistics, which include the monthly 99-, 90-, 85-, 75-, 50-, and 25-percentile flow-durations of the minimum 1-day daily flow; the August–September 99-, 90-, and 75-percentile minimum 1-day daily flow; and the monthly 7-day, 10-year (M7D10Y) low-flow frequency. These 87 streamflow statistics were computed for 41 continuous-record streamflow-gaging stations (streamgages) with 20 or more years of record and 167 low-flow partial-record stations in New Jersey with 10 or more streamflow measurements. The regression analyses used to develop equations to estimate selected streamflow statistics were performed by testing the relation between flow-duration statistics and low-flow frequency statistics for 32 basin characteristics (physical characteristics, land use, surficial geology, and climate) at the 41 streamgages and 167 low-flow partial-record stations. The regression analyses determined drainage area, soil permeability, average April precipitation, average June precipitation, and percent storage (water bodies and wetlands) were the significant explanatory variables for estimating the selected flow-duration and low-flow frequency statistics. Streamflow estimates were computed for two land- and water-use conditions in New Jersey—land- and water-use during the baseline period of record (defined as the years a streamgage had little to no change in development and water use) and current land- and water-use conditions (1989–2008)—for each selected station using data collected through water year 2008. The baseline period of record is representative of a period when the basin was unaffected by change in development. The current period is
Directory of Open Access Journals (Sweden)
J. Chardon
2018-01-01
Full Text Available Statistical downscaling models (SDMs are often used to produce local weather scenarios from large-scale atmospheric information. SDMs include transfer functions which are based on a statistical link identified from observations between local weather and a set of large-scale predictors. As physical processes driving surface weather vary in time, the most relevant predictors and the regression link are likely to vary in time too. This is well known for precipitation for instance and the link is thus often estimated after some seasonal stratification of the data. In this study, we present a two-stage analog/regression model where the regression link is estimated from atmospheric analogs of the current prediction day. Atmospheric analogs are identified from fields of geopotential heights at 1000 and 500 hPa. For the regression stage, two generalized linear models are further used to model the probability of precipitation occurrence and the distribution of non-zero precipitation amounts, respectively. The two-stage model is evaluated for the probabilistic prediction of small-scale precipitation over France. It noticeably improves the skill of the prediction for both precipitation occurrence and amount. As the analog days vary from one prediction day to another, the atmospheric predictors selected in the regression stage and the value of the corresponding regression coefficients can vary from one prediction day to another. The model allows thus for a day-to-day adaptive and tailored downscaling. It can also reveal specific predictors for peculiar and non-frequent weather configurations.
Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.
2017-05-01
The braking system is one of the most important and complex subsystems of railway vehicles, especially when it comes for safety. Therefore, installing efficient safe brakes on the modern railway vehicles is essential. Nowadays is devoted attention to solving problems connected with using high performance brake materials and its impact on thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, due to the interaction between the brake block and the wheel produce complex thermos-mechanical phenomena. In this work, the investigated subjects are the cast-iron brake shoes, which are still widely used on freight wagons. Therefore, the cast-iron brake shoes - with lamellar graphite and with a high content of phosphorus (0.8-1.1%) - need a special investigation. In order to establish the optimal condition for the cast-iron brake shoes we proposed a mathematical modelling study by using the statistical analysis and multiple regression equations. Multivariate research is important in areas of cast-iron brake shoes manufacturing, because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulties in comprehending many dimensions at one time. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. In order to settle the multiple correlation between the hardness of the cast-iron brake shoes, and the chemical compositions elements several model of regression equation types has been proposed. Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representation of the regression equations in order to explain interaction of the variables and locate the optimal level of each variable for
Local Prediction Models on Mid-Atlantic Ridge MORB by Principal Component Regression
Ling, X.; Snow, J. E.; Chin, W.
2017-12-01
The isotopic compositions of the daughter isotopes of long-lived radioactive systems (Sr, Nd, Hf and Pb ) can be used to map the scale and history of mantle heterogeneities beneath mid-ocean ridges. Our goal is to relate the multidimensional structure in the existing isotopic dataset with an underlying physical reality of mantle sources. The numerical technique of Principal Component Analysis is useful to reduce the linear dependence of the data to a minimum set of orthogonal eigenvectors encapsulating the information contained (cf Agranier et al 2005). The dataset used for this study covers almost all the MORBs along mid-Atlantic Ridge (MAR), from 54oS to 77oN and 8.8oW to -46.7oW, including replicating the dataset of Agranier et al., 2005 published plus 53 basalt samples dredged and analyzed since then (data from PetDB). The principal components PC1 and PC2 account for 61.56% and 29.21%, respectively, of the total isotope ratios variability. The samples with similar compositions to HIMU and EM and DM are identified to better understand the PCs. PC1 and PC2 are accountable for HIMU and EM whereas PC2 has limited control over the DM source. PC3 is more strongly controlled by the depleted mantle source than PC2. What this means is that all three principal components have a high degree of significance relevant to the established mantle sources. We also tested the relationship between mantle heterogeneity and sample locality. K-means clustering algorithm is a type of unsupervised learning to find groups in the data based on feature similarity. The PC factor scores of each sample are clustered into three groups. Cluster one and three are alternating on the north and south MAR. Cluster two exhibits on 45.18oN to 0.79oN and -27.9oW to -30.40oW alternating with cluster one. The ridge has been preliminarily divided into 16 sections considering both the clusters and ridge segments. The principal component regression models the section based on 6 isotope ratios and PCs. The
Multiple regression approach to predict turbine-generator output for Chinshan nuclear power plant
International Nuclear Information System (INIS)
Chan, Yea-Kuang; Tsai, Yu-Ching
2017-01-01
The objective of this study is to develop a turbine cycle model using the multiple regression approach to estimate the turbine-generator output for the Chinshan Nuclear Power Plant (NPP). The plant operating data was verified using a linear regression model with a corresponding 95% confidence interval for the operating data. In this study, the key parameters were selected as inputs for the multiple regression based turbine cycle model. The proposed model was used to estimate the turbine-generator output. The effectiveness of the proposed turbine cycle model was demonstrated by using plant operating data obtained from the Chinshan NPP Unit 2. The results show that this multiple regression based turbine cycle model can be used to accurately estimate the turbine-generator output. In addition, this study also provides an alternative approach with simple and easy features to evaluate the thermal performance for nuclear power plants.
Multiple regression approach to predict turbine-generator output for Chinshan nuclear power plant
Energy Technology Data Exchange (ETDEWEB)
Chan, Yea-Kuang; Tsai, Yu-Ching [Institute of Nuclear Energy Research, Taoyuan City, Taiwan (China). Nuclear Engineering Division
2017-03-15
The objective of this study is to develop a turbine cycle model using the multiple regression approach to estimate the turbine-generator output for the Chinshan Nuclear Power Plant (NPP). The plant operating data was verified using a linear regression model with a corresponding 95% confidence interval for the operating data. In this study, the key parameters were selected as inputs for the multiple regression based turbine cycle model. The proposed model was used to estimate the turbine-generator output. The effectiveness of the proposed turbine cycle model was demonstrated by using plant operating data obtained from the Chinshan NPP Unit 2. The results show that this multiple regression based turbine cycle model can be used to accurately estimate the turbine-generator output. In addition, this study also provides an alternative approach with simple and easy features to evaluate the thermal performance for nuclear power plants.
Validity of a population-specific BMR predictive equation for adults from an urban tropical setting.
Wahrlich, Vivian; Teixeira, Tatiana Miliante; Anjos, Luiz Antonio Dos
2018-02-01
Basal metabolic rate (BMR) is an important physiologic measure in nutrition research. In many instances it is not measured but estimated by predictive equations. The purpose of this study was to compare measured BMR (BMRm) with estimated BMR (BMRe) obtained by different equations. A convenient sample of 148 (89 women) 20-60 year-old subjects from the metropolitan area of Rio de Janeiro, Brazil participated in the study. BMRm values were measured by an indirect calorimeter and predicted by different equations (Schofield, Henry and Rees, Mifflin-St. Jeor and Anjos. All subjects had their body composition and anthropometric variables also measured. Accuracy of the estimations was established by the percentage of BMRe falling within ±10% of BMRm and bias when the 95% CI of the difference of BMRe and BMRm means did not include zero. Mean BMRm values were 4833.5 (SD 583.3) and 6278.8 (SD 724.0) kJ*day -1 for women and men, respectively. BMRe values were both biased and inaccurate except for values predicted by the Anjos equation. BMR overestimation was approximately 20% for the Schofield equation which was higher comparatively to the Henry and Rees (14.5% and 9.6% for women and men, respectively) and the Mifflin-St. Jeor (approximately 14.0%) equations. BMR estimated by the Anjos equation was unbiased (95% CI = -78.1; 96.3 kJ day -1 for women and -282.6; 30.7 kJ*day -1 for men). Population-specific BMR predictive equations yield unbiased and accurate BMR values in adults from an urban tropical setting. Copyright © 2016 Elsevier Ltd and European Society for Clinical Nutrition and Metabolism. All rights reserved.
Parker, Peter A.; Geoffrey, Vining G.; Wilson, Sara R.; Szarka, John L., III; Johnson, Nels G.
2010-01-01
The calibration of measurement systems is a fundamental but under-studied problem within industrial statistics. The origins of this problem go back to basic chemical analysis based on NIST standards. In today's world these issues extend to mechanical, electrical, and materials engineering. Often, these new scenarios do not provide "gold standards" such as the standard weights provided by NIST. This paper considers the classic "forward regression followed by inverse regression" approach. In this approach the initial experiment treats the "standards" as the regressor and the observed values as the response to calibrate the instrument. The analyst then must invert the resulting regression model in order to use the instrument to make actual measurements in practice. This paper compares this classical approach to "reverse regression," which treats the standards as the response and the observed measurements as the regressor in the calibration experiment. Such an approach is intuitively appealing because it avoids the need for the inverse regression. However, it also violates some of the basic regression assumptions.
Directory of Open Access Journals (Sweden)
Peek Andrew S
2007-06-01
Full Text Available Abstract Background RNA interference (RNAi is a naturally occurring phenomenon that results in the suppression of a target RNA sequence utilizing a variety of possible methods and pathways. To dissect the factors that result in effective siRNA sequences a regression kernel Support Vector Machine (SVM approach was used to quantitatively model RNA interference activities. Results Eight overall feature mapping methods were compared in their abilities to build SVM regression models that predict published siRNA activities. The primary factors in predictive SVM models are position specific nucleotide compositions. The secondary factors are position independent sequence motifs (N-grams and guide strand to passenger strand sequence thermodynamics. Finally, the factors that are least contributory but are still predictive of efficacy are measures of intramolecular guide strand secondary structure and target strand secondary structure. Of these, the site of the 5' most base of the guide strand is the most informative. Conclusion The capacity of specific feature mapping methods and their ability to build predictive models of RNAi activity suggests a relative biological importance of these features. Some feature mapping methods are more informative in building predictive models and overall t-test filtering provides a method to remove some noisy features or make comparisons among datasets. Together, these features can yield predictive SVM regression models with increased predictive accuracy between predicted and observed activities both within datasets by cross validation, and between independently collected RNAi activity datasets. Feature filtering to remove features should be approached carefully in that it is possible to reduce feature set size without substantially reducing predictive models, but the features retained in the candidate models become increasingly distinct. Software to perform feature prediction and SVM training and testing on nucleic acid
Zhang, Hongyang; Welch, William J.; Zamar, Ruben H.
2017-01-01
Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" - a subset of features that work well together for prediction. We propose a novel algorithm which automatically chooses Regression Phalanxes from high-dimensi...
Verreijen, A. M.; Garrido, V.; Engberink, M.F.; Memelink, R. G.; Visser, M.; Weijs, P. J.
2017-01-01
Rationale: Predictive equations for resting energy expenditure (REE) are used in the treatment of overweight and obesity, but the validity of these equations in overweight older adults is unknown. This study evaluates which predictive REE equation is the best alternative to indirect calorimetry in
Wright, Glenn A; Pustina, Andrew A; Mikat, Richard P; Kernozek, Thomas W
2012-03-01
The purpose of this study was to determine the efficacy of estimating peak lower body power from a maximal jump squat using 3 different vertical jump prediction equations. Sixty physically active college students (30 men, 30 women) performed jump squats with a weighted bar's applied load of 20, 40, and 60% of body mass across the shoulders. Each jump squat was simultaneously monitored using a force plate and a contact mat. Peak power (PP) was calculated using vertical ground reaction force from the force plate data. Commonly used equations requiring body mass and vertical jump height to estimate PP were applied such that the system mass (mass of body + applied load) was substituted for body mass. Jump height was determined from flight time as measured with a contact mat during a maximal jump squat. Estimations of PP (PP(est)) for each load and for each prediction equation were compared with criterion PP values from a force plate (PP(FP)). The PP(est) values had high test-retest reliability and were strongly correlated to PP(FP) in both men and women at all relative loads. However, only the Harman equation accurately predicted PP(FP) at all relative loads. It can therefore be concluded that the Harman equation may be used to estimate PP of a loaded jump squat knowing the system mass and peak jump height when more precise (and expensive) measurement equipment is unavailable. Further, high reliability and correlation with criterion values suggest that serial assessment of power production across training periods could be used for relative assessment of change by either of the prediction equations used in this study.
Augmented chaos-multiple linear regression approach for prediction of wave parameters
Directory of Open Access Journals (Sweden)
M.A. Ghorbani
2017-06-01
The inter-comparisons demonstrated that the Chaos-MLR and pure MLR models yield almost the same accuracy in predicting the significant wave heights and the zero-up-crossing wave periods. Whereas, the augmented Chaos-MLR model is performed better results in term of the prediction accuracy vis-a-vis the previous prediction applications of the same case study.
Hofsteenge, Geesje H; Chinapaw, Mai J M; Weijs, Peter J M
2015-10-15
In clinical practice, patient friendly methods to assess body composition in obese adolescents are needed. Therefore, the bioelectrical impedance analysis (BIA) related fat-free mass (FFM) prediction equations (FFM-BIA) were evaluated in obese adolescents (age 11-18 years) compared to FFM measured by dual-energy x-ray absorptiometry (FFM-DXA) and a new population specific FFM-BIA equation is developed. After an overnight fast, the subjects attended the outpatient clinic. After measuring height and weight, a full body scan by dual-energy x-ray absorptiometry (DXA) and a BIA measurement was performed. Thirteen predictive FFM-BIA equations based on weight, height, age, resistance, reactance and/or impedance were systematically selected and compared to FFM-DXA. Accuracy of FFM-BIA equations was evaluated by the percentage adolescents predicted within 5% of FFM-DXA measured, the mean percentage difference between predicted and measured values (bias) and the Root Mean Squared prediction Error (RMSE). Multiple linear regression was conducted to develop a new BIA equation. Validation was based on 103 adolescents (60% girls), age 14.5 (sd1.7) years, weight 94.1 (sd15.6) kg and FFM-DXA of 56.1 (sd9.8) kg. The percentage accurate estimations varied between equations from 0 to 68%; bias ranged from -29.3 to +36.3% and RMSE ranged from 2.8 to 12.4 kg. An alternative prediction equation was developed: FFM = 0.527 * H(cm)(2)/Imp + 0.306 * weight - 1.862 (R(2) = 0.92, SEE = 2.85 kg). Percentage accurate prediction was 76%. Compared to DXA, the Gray equation underestimated the FFM with 0.4 kg (55.7 ± 8.3), had an RMSE of 3.2 kg, 63% accurate prediction and the smallest bias of (-0.1%). When split by sex, the Gray equation had the narrowest range in accurate predictions, bias, and RMSE. For the assessment of FFM with BIA, the Gray-FFM equation appears to be the most accurate, but 63% is still not at an acceptable accuracy level for obese adolescents. The new equation appears to
Directory of Open Access Journals (Sweden)
Kehinde Anthony Mogaji
2016-07-01
Full Text Available This study developed a GIS-based multivariate regression (MVR yield rate prediction model of groundwater resource sustainability in the hard-rock geology terrain of southwestern Nigeria. This model can economically manage the aquifer yield rate potential predictions that are often overlooked in groundwater resources development. The proposed model relates the borehole yield rate inventory of the area to geoelectrically derived parameters. Three sets of borehole yield rate conditioning geoelectrically derived parameters—aquifer unit resistivity (ρ, aquifer unit thickness (D and coefficient of anisotropy (λ—were determined from the acquired and interpreted geophysical data. The extracted borehole yield rate values and the geoelectrically derived parameter values were regressed to develop the MVR relationship model by applying linear regression and GIS techniques. The sensitivity analysis results of the MVR model evaluated at P ⩽ 0.05 for the predictors ρ, D and λ provided values of 2.68 × 10−05, 2 × 10−02 and 2.09 × 10−06, respectively. The accuracy and predictive power tests conducted on the MVR model using the Theil inequality coefficient measurement approach, coupled with the sensitivity analysis results, confirmed the model yield rate estimation and prediction capability. The MVR borehole yield prediction model estimates were processed in a GIS environment to model an aquifer yield potential prediction map of the area. The information on the prediction map can serve as a scientific basis for predicting aquifer yield potential rates relevant in groundwater resources sustainability management. The developed MVR borehole yield rate prediction mode provides a good alternative to other methods used for this purpose.
International Nuclear Information System (INIS)
Yu, Jie; Chen, Kuilin; Mori, Junichi; Rashid, Mudassir M.
2013-01-01
Optimizing wind power generation and controlling the operation of wind turbines to efficiently harness the renewable wind energy is a challenging task due to the intermittency and unpredictable nature of wind speed, which has significant influence on wind power production. A new approach for long-term wind speed forecasting is developed in this study by integrating GMCM (Gaussian mixture copula model) and localized GPR (Gaussian process regression). The time series of wind speed is first classified into multiple non-Gaussian components through the Gaussian mixture copula model and then Bayesian inference strategy is employed to incorporate the various non-Gaussian components using the posterior probabilities. Further, the localized Gaussian process regression models corresponding to different non-Gaussian components are built to characterize the stochastic uncertainty and non-stationary seasonality of the wind speed data. The various localized GPR models are integrated through the posterior probabilities as the weightings so that a global predictive model is developed for the prediction of wind speed. The proposed GMCM–GPR approach is demonstrated using wind speed data from various wind farm locations and compared against the GMCM-based ARIMA (auto-regressive integrated moving average) and SVR (support vector regression) methods. In contrast to GMCM–ARIMA and GMCM–SVR methods, the proposed GMCM–GPR model is able to well characterize the multi-seasonality and uncertainty of wind speed series for accurate long-term prediction. - Highlights: • A novel predictive modeling method is proposed for long-term wind speed forecasting. • Gaussian mixture copula model is estimated to characterize the multi-seasonality. • Localized Gaussian process regression models can deal with the random uncertainty. • Multiple GPR models are integrated through Bayesian inference strategy. • The proposed approach shows higher prediction accuracy and reliability
Dang, Mia; Ramsaran, Kalinda D; Street, Melissa E; Syed, S Noreen; Barclay-Goddard, Ruth; Stratford, Paul W; Miller, Patricia A
2011-01-01
To estimate the predictive accuracy and clinical usefulness of the Chedoke-McMaster Stroke Assessment (CMSA) predictive equations. A longitudinal prognostic study using historical data obtained from 104 patients admitted post cerebrovascular accident was undertaken. Data were abstracted for all patients undergoing rehabilitation post stroke who also had documented admission and discharge CMSA scores. Published predictive equations were used to determine predicted outcomes. To determine the accuracy and clinical usefulness of the predictive model, shrinkage coefficients and predictions with 95% confidence bands were calculated. Complete data were available for 74 patients with a mean age of 65.3±12.4 years. The shrinkage values for the six Impairment Inventory (II) dimensions varied from -0.05 to 0.09; the shrinkage value for the Activity Inventory (AI) was 0.21. The error associated with predictive values was greater than ±1.5 stages for the II dimensions and greater than ±24 points for the AI. This study shows that the large error associated with the predictions (as defined by the confidence band) for the CMSA II and AI limits their clinical usefulness as a predictive measure. Further research to establish predictive models using alternative statistical procedures is warranted.
Dang, Mia; Ramsaran, Kalinda D.; Street, Melissa E.; Syed, S. Noreen; Barclay-Goddard, Ruth; Miller, Patricia A.
2011-01-01
ABSTRACT Purpose: To estimate the predictive accuracy and clinical usefulness of the Chedoke–McMaster Stroke Assessment (CMSA) predictive equations. Method: A longitudinal prognostic study using historical data obtained from 104 patients admitted post cerebrovascular accident was undertaken. Data were abstracted for all patients undergoing rehabilitation post stroke who also had documented admission and discharge CMSA scores. Published predictive equations were used to determine predicted outcomes. To determine the accuracy and clinical usefulness of the predictive model, shrinkage coefficients and predictions with 95% confidence bands were calculated. Results: Complete data were available for 74 patients with a mean age of 65.3±12.4 years. The shrinkage values for the six Impairment Inventory (II) dimensions varied from −0.05 to 0.09; the shrinkage value for the Activity Inventory (AI) was 0.21. The error associated with predictive values was greater than ±1.5 stages for the II dimensions and greater than ±24 points for the AI. Conclusions: This study shows that the large error associated with the predictions (as defined by the confidence band) for the CMSA II and AI limits their clinical usefulness as a predictive measure. Further research to establish predictive models using alternative statistical procedures is warranted. PMID:22654239
Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H
2012-01-01
The purpose of this investigation was to compare empirically predictive ability of an artificial neural network with a logistic regression in prediction of low back pain. Data from the second national health survey were considered in this investigation. This data includes the information of low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 inputs, 3 hidden and 1 output neurons was employed. The efficiency of two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression was 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network was 0.754 (0.004), 0.3770 and 14757.6, respectively. Based on these three criteria, artificial neural network would give better performance than logistic regression. Although, the difference is statistically significant, it does not seem to be clinically significant.
Ground Motion Prediction Equations for Western Saudi Arabia from a Reference Model
Kiuchi, R.; Mooney, W. D.; Mori, J. J.; Zahran, H. M.; Al-Raddadi, W.; Youssef, S.
2017-12-01
Western Saudi Arabia is surrounded by several active seismic zones such as the Red Sea and the Gulf of Aqaba where a destructive magnitude 7.3 event occurred in 1995. Over the last decade, the Saudi Geological Survey (SGS) has deployed a dense seismic network that has made it possible to monitor seismic activity more accurately. For example, the network has detected multiple seismic swarms beneath the volcanic fields in western Saudi Arabia. The most recent damaging event was a M5.7 earthquake that occurred in 2009 at Harrat Lunayyir. In terms of seismic hazard assessment, Zahran et al. (2015; 2016) presented a Probabilistic Seismic Hazard Assessment (PSHA) for western Saudi Arabia that was developed using published Ground Motion Prediction Equations (GMPEs) from areas outside of Saudi Arabia. In this study, we consider 41 earthquakes of M 3.0 - 5.4, recorded on 124 stations of the SGS network, to create a set of 442 peak ground acceleration (PGA) and peak ground velocity (PGV) records with a range of epicentral distances from 3 km to 400 km. We use the GMPE model BSSA14 (Boore et al., 2014) as a reference model to estimate our own best-fitting coefficients from a regression analysis using the events occurred in western Saudi Arabia. For epicentral distances less than 100 km, our best fitting model has different source scaling in comparison with the GMPE of BSSA14 adjusted for the California region. In addition, our model indicates that the peak amplitudes have less attenuation in western Saudi Arabia than in California.
A Predictive Logistic Regression Model of World Conflict Using Open Source Data
2015-03-26
No correlation between the error terms and the independent variables 9. Absence of perfect multicollinearity (Menard, 2001) When assumptions are...some of the variables before initial model building. Multicollinearity , or near-linear dependence among the variables will cause problems in the...model. High multicollinearity tends to produce unreasonably high logistic regression coefficients and can result in coefficients that are not
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
GIS-based spatial regression and prediction of water quality in river networks: A case study in Iowa
Yang, X.; Jin, W.
2010-01-01
Nonpoint source pollution is the leading cause of the U.S.'s water quality problems. One important component of nonpoint source pollution control is an understanding of what and how watershed-scale conditions influence ambient water quality. This paper investigated the use of spatial regression to evaluate the impacts of watershed characteristics on stream NO3NO2-N concentration in the Cedar River Watershed, Iowa. An Arc Hydro geodatabase was constructed to organize various datasets on the watershed. Spatial regression models were developed to evaluate the impacts of watershed characteristics on stream NO3NO2-N concentration and predict NO3NO2-N concentration at unmonitored locations. Unlike the traditional ordinary least square (OLS) method, the spatial regression method incorporates the potential spatial correlation among the observations in its coefficient estimation. Study results show that NO3NO2-N observations in the Cedar River Watershed are spatially correlated, and by ignoring the spatial correlation, the OLS method tends to over-estimate the impacts of watershed characteristics on stream NO3NO2-N concentration. In conjunction with kriging, the spatial regression method not only makes better stream NO3NO2-N concentration predictions than the OLS method, but also gives estimates of the uncertainty of the predictions, which provides useful information for optimizing the design of stream monitoring network. It is a promising tool for better managing and controlling nonpoint source pollution. ?? 2010 Elsevier Ltd.
Directory of Open Access Journals (Sweden)
Luiz Fernando Novack
2014-12-01
Full Text Available This study analyzed classical and developed novel mathematical models to predict body fat percentage (%BF in professional soccer players from the South Brazilian region using skinfold thicknesses measurement. Skinfolds of thirty one male professional soccer players (age of 21.48 ± 3.38 years, body mass of 79.05 ± 9.48 kg and height of 181.97 ± 8.11 cm were introduced into eight mathematical models from the literature for the prediction of %BF; these results were then compared to Dual-energy X-ray Absorptiometry (DXA. The classical equations were able to account from 65% to 79% of the variation of %BF in DXA. Statistical differences between most of the classical equations (seven of the eight classic equations and DXA were found, rendering their widespread use in this population useless. We developed three new equations for prediction of %BF with skinfolds from: axils, abdomen, thighs and calves. Theses equations accounted for 86.5% of the variation in %BF obtained with DXA.
Qiu, Shibin; Lane, Terran
2009-01-01
The cell defense mechanism of RNA interference has applications in gene function analysis and promising potentials in human disease therapy. To effectively silence a target gene, it is desirable to select appropriate initiator siRNA molecules having satisfactory silencing capabilities. Computational prediction for silencing efficacy of siRNAs can assist this screening process before using them in biological experiments. String kernel functions, which operate directly on the string objects representing siRNAs and target mRNAs, have been applied to support vector regression for the prediction and improved accuracy over numerical kernels in multidimensional vector spaces constructed from descriptors of siRNA design rules. To fully utilize information provided by string and numerical data, we propose to unify the two in a kernel feature space by devising a multiple kernel regression framework where a linear combination of the kernels is used. We formulate the multiple kernel learning into a quadratically constrained quadratic programming (QCQP) problem, which although yields global optimal solution, is computationally demanding and requires a commercial solver package. We further propose three heuristics based on the principle of kernel-target alignment and predictive accuracy. Empirical results demonstrate that multiple kernel regression can improve accuracy, decrease model complexity by reducing the number of support vectors, and speed up computational performance dramatically. In addition, multiple kernel regression evaluates the importance of constituent kernels, which for the siRNA efficacy prediction problem, compares the relative significance of the design rules. Finally, we give insights into the multiple kernel regression mechanism and point out possible extensions.
CSIR Research Space (South Africa)
Kirton, A
2010-08-01
Full Text Available for calculating the variance and prediction intervals for biomass estimates obtained from allometric equations A KIRTON B SCHOLES S ARCHIBALD CSIR Ecosystem Processes and Dynamics, Natural Resources and the Environment P.O. BOX 395, Pretoria, 0001, South... intervals (confidence intervals for predicted values) for allometric estimates can be obtained using an example of estimating tree biomass from stem diameter. It explains how to deal with relationships which are in the power function form - a common form...
International Nuclear Information System (INIS)
Arsenault, Louis-François; Millis, Andrew J; Neuberg, Richard; Hannah, Lauren A
2017-01-01
We present a supervised machine learning approach to the inversion of Fredholm integrals of the first kind as they arise, for example, in the analytic continuation problem of quantum many-body physics. The approach provides a natural regularization for the ill-conditioned inverse of the Fredholm kernel, as well as an efficient and stable treatment of constraints. The key observation is that the stability of the forward problem permits the construction of a large database of outputs for physically meaningful inputs. Applying machine learning to this database generates a regression function of controlled complexity, which returns approximate solutions for previously unseen inputs; the approximate solutions are then projected onto the subspace of functions satisfying relevant constraints. Under standard error metrics the method performs as well or better than the Maximum Entropy method for low input noise and is substantially more robust to increased input noise. We suggest that the methodology will be similarly effective for other problems involving a formally ill-conditioned inversion of an integral operator, provided that the forward problem can be efficiently solved. (paper)
Lazaridis, D.C.; Verbesselt, J.; Robinson, A.P.
2011-01-01
Constructing models can be complicated when the available fitting data are highly correlated and of high dimension. However, the complications depend on whether the goal is prediction instead of estimation. We focus on predicting tree mortality (measured as the number of dead trees) from change
Sun, Bo; Liu, Yu; Li, Jing Xian; Li, Haipeng; Chen, Peijie
2013-01-01
Purpose: This study set out to examine the relationship between step frequency and velocity to develop a step frequency-based equation to predict Chinese youth's energy expenditure (EE) during walking and running. Method: A total of 173 boys and girls aged 11 to 18 years old participated in this study. The participants walked and ran on a…
Development of equations for predicting Puerto Rican subtropical dry forest biomass and volume
Thomas J. Brandeis; Matthew Delaney; Bernard R. Parresol; Larry Royer
2006-01-01
Carbon accounting, forest health monitoring and sustainable management of the subtropical dry forests of Puerto Rico and other Caribbean Islands require an accurate assessment of forest aboveground biomass (AGB) and stem volume. One means of improving assessment accuracy is the development of predictive equations derived from locally collected data. Forest inventory...
International Nuclear Information System (INIS)
Abdolmaleki, P.; Yarmohammadi, M.; Gity, M.
2004-01-01
Background: We designed an algorithmic model based on regression analysis and a non-algorithmic model based on the Artificial Neural Network. Materials and methods: The ability of these models was compared together in clinical application to differentiate malignant from benign breast tumors in a study group of 161 patient's records. Each patient's record consisted of 6 subjective features extracted from MRI appearance. These findings were enclosed as features extracted for an Artificial Neural Network as well as a logistic regression model to predict biopsy outcome. After both models had been trained perfectly on samples (n=100), the validation samples (n=61) were presented to the trained network as well as the established logistic regression models. Finally, the diagnostic performance of models were compared to the that of the radiologist in terms of sensitivity, specificity and accuracy, using receiver operating characteristic curve analysis. Results: The average out put of the Artificial Neural Network yielded a perfect sensitivity (98%) and high accuracy (90%) similar to that one of an expert radiologist (96% and 92%) while specificity was smaller than that (67%) verses 80%). The output of the logistic regression model using significant features showed improvement in specificity from 60% for the logistic regression model using all features to 93% for the reduced logistic regression model, keeping the accuracy around 90%. Conclusion: Results show that Artificial Neural Network and logistic regression model prove the relationship between extracted morphological features and biopsy results. Using statistically significant variables reduced logistic regression model outperformed of Artificial Neural Network with remarkable specificity while keeping high sensitivity is achieved
Directory of Open Access Journals (Sweden)
Pełka Paweł
2017-01-01
Full Text Available Electricity demand forecasting is of important role in power system planning and operation. In this work, fuzzy nearest neighbour regression has been utilised to estimate monthly electricity demands. The forecasting model was based on the pre-processed energy consumption time series, where input and output variables were defined as patterns representing unified fragments of the time series. Relationships between inputs and outputs, which were simplified due to patterns, were modelled using nonparametric regression with weighting function defined as a fuzzy membership of learning points to the neighbourhood of a query point. In an experimental part of the work the model was evaluated using real-world data. The results are encouraging and show high performances of the model and its competitiveness compared to other forecasting models.
Water demand prediction using artificial neural networks and support vector regression
CSIR Research Space (South Africa)
Msiza, IS
2008-11-01
Full Text Available Neural Networks and Support Vector Regression Ishmael S. Msiza1, Fulufhelo V. Nelwamondo1,2, Tshilidzi Marwala3 . 1Modelling and Digital Science, CSIR, Johannesburg,SOUTH AFRICA 2Graduate School of Arts and Sciences, Harvard University, Cambridge..., Massachusetts, USA 3School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, SOUTH AFRICA Email: imsiza@csir.co.za, nelwamon@fas.harvard.edu, tshilidzi.marwala@wits.ac.za Abstract— Computational Intelligence techniques...
Geroukis, Asterios; Brorson, Erik
2014-01-01
In this study, we compare the two statistical techniques logistic regression and discriminant analysis to see how well they classify companies based on clusters – made from the solvency ratio – using principal components as independent variables. The principal components are made with different financial ratios. We use cluster analysis to find groups with low, medium and high solvency ratio of 1200 different companies found on the NASDAQ stock market and use this as an apriori definition of ...
Energy Technology Data Exchange (ETDEWEB)
Yoon, Hee Mang; Jung, Ah Young; Cho, Young Ah; Yoon, Chong Hyun; Lee, Jin Seong [Asan Medical Center Children' s Hospital, University of Ulsan College of Medicine, Department of Radiology and Research Institute of Radiology, Songpa-gu, Seoul (Korea, Republic of); Kim, Ellen Ai-Rhan [University of Ulsan College of Medicine, Division of Neonatology, Asan Medical Center Children' s Hospital, Seoul (Korea, Republic of); Chung, Sung-Hoon [Kyung Hee University School of Medicine, Department of Pediatrics, Seoul (Korea, Republic of); Kim, Seon-Ok [Asan Medical Center, Department of Clinical Epidemiology and Biostatistics, Seoul (Korea, Republic of)
2017-06-15
To describe the natural course of extralobar pulmonary sequestration (EPS) and identify factors associated with spontaneous regression of EPS. We retrospectively searched for patients diagnosed with EPS on initial contrast CT scan within 1 month after birth and had a follow-up CT scan without treatment. Spontaneous regression of EPS was assessed by percentage decrease in volume (PDV) and percentage decrease in sum of the diameter of systemic feeding arteries (PDD) by comparing initial and follow-up CT scans. Clinical and CT features were analysed to determine factors associated with PDV and PDD rates. Fifty-one neonates were included. The cumulative proportions of patients reaching PDV > 50 % and PDD > 50 % were 93.0 % and 73.3 % at 4 years, respectively. Tissue attenuation was significantly associated with PDV rate (B = -21.78, P <.001). The tissue attenuation (B = -22.62, P =.001) and diameter of the largest systemic feeding arteries (B = -48.31, P =.011) were significant factors associated with PDD rate. The volume and diameter of systemic feeding arteries of EPS spontaneously decreased within 4 years without treatment. EPSs showing a low tissue attenuation and small diameter of the largest systemic feeding arteries on initial contrast-enhanced CT scans were likely to regress spontaneously. (orig.)
Seasonal prediction of winter extreme precipitation over Canada by support vector regression
Directory of Open Access Journals (Sweden)
Z. Zeng
2011-01-01
Full Text Available For forecasting the maximum 5-day accumulated precipitation over the winter season at lead times of 3, 6, 9 and 12 months over Canada from 1950 to 2007, two nonlinear and two linear regression models were used, where the models were support vector regression (SVR (nonlinear and linear versions, nonlinear Bayesian neural network (BNN and multiple linear regression (MLR. The 118 stations were grouped into six geographic regions by K-means clustering. For each region, the leading principal components of the winter maximum 5-d accumulated precipitation anomalies were the predictands. Potential predictors included quasi-global sea surface temperature anomalies and 500 hPa geopotential height anomalies over the Northern Hemisphere, as well as six climate indices (the Niño-3.4 region sea surface temperature, the North Atlantic Oscillation, the Pacific-North American teleconnection, the Pacific Decadal Oscillation, the Scandinavia pattern, and the East Atlantic pattern. The results showed that in general the two robust SVR models tended to have better forecast skills than the two non-robust models (MLR and BNN, and the nonlinear SVR model tended to forecast slightly better than the linear SVR model. Among the six regions, the Prairies region displayed the highest forecast skills, and the Arctic region the second highest. The strongest nonlinearity was manifested over the Prairies and the weakest nonlinearity over the Arctic.
A linear regression model for predicting PNW estuarine temperatures in a changing climate
Pacific Northwest coastal regions, estuaries, and associated ecosystems are vulnerable to the potential effects of climate change, especially to changes in nearshore water temperature. While predictive climate models simulate future air temperatures, no such projections exist for...
Directory of Open Access Journals (Sweden)
José Gilson Louzada Regadas Filho
2011-09-01
Full Text Available This study aimed at estimating the kinetic parameters of ruminal degradation of neutral detergent fiber from agroindustrial byproducts of cashew (pulp and cashew nut, passion fruit, melon, pineapple, West Indian cherry, grape, annatto and coconut through the gravimetric technique of nylon bag, and to evaluate the prediction equation of indigestible fraction of neutral detergent fiber suggested by the Cornell Net Carbohydrate and Protein System. Samples of feed crushed to 2 mm were placed in 7 × 14 cm nylon bags with porosity of 50 µm in a ratio of 20 g DM/cm² and incubated in duplicate in the rumen of a heifer at 0, 3, 6, 9, 12, 16, 24, 36, 48, 72, 96 and 144 hours. The incubation residues were analyzed for NDF content and evaluated by a non-linear logistic model. The evaluation process of predicting the indigestible fraction of NDF was carried out through adjustment of linear regression models between predicted and observed values. There was a wide variation in the degradation parameters of NDF among byproducts. The degradation rate of NDF ranged from 0.0267 h-1 to 0.0971 h-1 for grape and West Indian cherry, respectively. The potentially digestible fraction of NDF ranged from 4.17 to 90.67%, respectively, for melon and coconut byproducts. The CNCPS equation was sensitive to predict the indigestible fraction of neutral detergent fiber of the byproducts. However, due to the high value of the mean squared error of prediction, such estimates are very variable; hence the most suitable would be estimation by biological methods.
International Nuclear Information System (INIS)
Watanabe, Nami; Sugai Yukio; Komatani, Akio; Yamaguchi, Koichi; Takahashi, Kazuei
1999-01-01
This study was designed to investigate the empirical tubular extraction rate (TER) of the normal renal function in childhood and then propose a new equation to obtain TER theoretically. The empirical TER was calculated using Russell's method for determination of single-sample plasma clearance and 99m Tc-MAG 3 in 40 patients with renal disease younger than 10 years of age who were classified as having normal renal function using diagnostic criteria defined by the Paediatric Task Group of EANM. First, we investigated the relationships of the empirical value of absolute TER to age, body weight, body surface area (BSA) and distribution volume. Next we investigated the relationships of the empirical value of BSA corrected TER to age, body weight, BSA and distribution volume. Linear relationship was indicated between the absolute TER and each body dimensional factors, especially regarding to BSA, its correlation coefficient was 0.90 (p value). The BSA-corrected TER showed a logarithmic relationship with BSA, but linear regression did not show any significant correlation. Therefore, it was thought that the normal value of TER could be calculated theoretically using the body surface area, and here we proposed the following linear regression equation; Theoretical TER (ml/min/1.73 m 2 )=(-39.8+257.2 x BSA)/BSA/1.73. The theoretical TER could be one of the reference values of the renal function in the period of the renal maturation. (author)
A Multi-industry Default Prediction Model using Logistic Regression and Decision Tree
Suresh Ramakrishnan; Maryam Mirzaei; Mahmoud Bekri
2015-01-01
The accurate prediction of corporate bankruptcy for the firms in different industries is of a great concern to investors and creditors, as the reduction of creditors’ risk and a considerable amount of saving for an industry economy can be possible. Financial statements vary between industries. Therefore, economic intuition suggests that industry effects should be an important component in bankruptcy prediction. This study attempts to detail the characteristics of each industry using sector in...
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.
2003-01-01
Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.; Michael, John A.; Helsel, Dennis R.
2008-01-01
Logistic regression was used to develop statistical models that can be used to predict the probability of debris flows in areas recently burned by wildfires by using data from 14 wildfires that burned in southern California during 2003-2006. Twenty-eight independent variables describing the basin morphology, burn severity, rainfall, and soil properties of 306 drainage basins located within those burned areas were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows soon after the 2003 to 2006 fires were delineated from data in the National Elevation Dataset using a geographic information system; (2) Data describing the basin morphology, burn severity, rainfall, and soil properties were compiled for each basin. These data were then input to a statistics software package for analysis using logistic regression; and (3) Relations between the occurrence or absence of debris flows and the basin morphology, burn severity, rainfall, and soil properties were evaluated, and five multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combinations produced the most effective models, and the multivariate models that best predicted the occurrence of debris flows were identified. Percentage of high burn severity and 3-hour peak rainfall intensity were significant variables in all models. Soil organic matter content and soil clay content were significant variables in all models except Model 5. Soil slope was a significant variable in all models except Model 4. The most suitable model can be selected from these five models on the basis of the availability of independent variables in the particular area of interest and field checking of probability maps. The multivariate logistic regression models can be entered into a geographic information system, and maps showing the probability of debris flows can be constructed in recently burned areas of
Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo
2015-05-12
To explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and establish a Logistic regression model of noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG while the independent variables were screened by binary Logistic analysis. Multivariate Logistic regression was used for further analysis of significant noninvasive independent variables. Logistic regression model was established and odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of model were evaluated by the curve of receiver operating characteristic (ROC). According to univariate Logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major occurring factors for PHG. The established regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4. The accuracy of model for PHG was 79.1% with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG and the noninvasive predicted Logistic regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4.
Konheim, Jeremy A; Kon, Zachary N; Pasrija, Chetan; Luo, Qingyang; Sanchez, Pablo G; Garcia, Jose P; Griffith, Bartley P; Jeudy, Jean
2016-04-01
Size matching for lung transplantation is widely accomplished using height comparisons between donors and recipients. This gross approximation allows for wide variation in lung size and, potentially, size mismatch. Three-dimensional computed tomography (3D-CT) volumetry comparisons could offer more accurate size matching. Although recipient CT scans are universally available, donor CT scans are rarely performed. Therefore, predicted donor lung volumes could be used for comparison to measured recipient lung volumes, but no such predictive equations exist. We aimed to use 3D-CT volumetry measurements from a normal patient population to generate equations for predicted total lung volume (pTLV), predicted right lung volume (pRLV), and predicted left lung volume (pLLV), for size-matching purposes. Chest CT scans of 400 normal patients were retrospectively evaluated. 3D-CT volumetry was performed to measure total lung volume, right lung volume, and left lung volume of each patient, and predictive equations were generated. The fitted model was tested in a separate group of 100 patients. The model was externally validated by comparison of total lung volume with total lung capacity from pulmonary function tests in a subset of those patients. Age, gender, height, and race were independent predictors of lung volume. In the test group, there were strong linear correlations between predicted and actual lung volumes measured by 3D-CT volumetry for pTLV (r = 0.72), pRLV (r = 0.72), and pLLV (r = 0.69). A strong linear correlation was also observed when comparing pTLV and total lung capacity (r = 0.82). We successfully created a predictive model for pTLV, pRLV, and pLLV. These may serve as reference standards and predict donor lung volume for size matching in lung transplantation. Copyright © 2016 The American Association for Thoracic Surgery. Published by Elsevier Inc. All rights reserved.
Southard, Rodney E.
2013-01-01
estimates on one of these streams can be calculated at an ungaged location that has a drainage area that is between 40 percent of the drainage area of the farthest upstream streamgage and within 150 percent of the drainage area of the farthest downstream streamgage along the stream of interest. The second method may be used on any stream with a streamgage that has operated for 10 years or longer and for which anthropogenic effects have not changed the low-flow characteristics at the ungaged location since collection of the streamflow data. A ratio of drainage area of the stream at the ungaged location to the drainage area of the stream at the streamgage was computed to estimate the statistic at the ungaged location. The range of applicability is between 40- and 150-percent of the drainage area of the streamgage, and the ungaged location must be located on the same stream as the streamgage. The third method uses regional regression equations to estimate selected low-flow frequency statistics for unregulated streams in Missouri. This report presents regression equations to estimate frequency statistics for the 10-year recurrence interval and for the N-day durations of 1, 2, 3, 7, 10, 30, and 60 days. Basin and climatic characteristics were computed using geographic information system software and digital geospatial data. A total of 35 characteristics were computed for use in preliminary statewide and regional regression analyses based on existing digital geospatial data and previous studies. Spatial analyses for geographical bias in the predictive accuracy of the regional regression equations defined three low-flow regions with the State representing the three major physiographic provinces in Missouri. Region 1 includes the Central Lowlands, Region 2 includes the Ozark Plateaus, and Region 3 includes the Mississippi Alluvial Plain. A total of 207 streamgages were used in the regression analyses for the regional equations. Of the 207 U.S. Geological Survey streamgages, 77 were
International Nuclear Information System (INIS)
Chen, Kuilin; Yu, Jie
2014-01-01
Highlights: • A novel hybrid modeling method is proposed for short-term wind speed forecasting. • Support vector regression model is constructed to formulate nonlinear state-space framework. • Unscented Kalman filter is adopted to recursively update states under random uncertainty. • The new SVR–UKF approach is compared to several conventional methods for short-term wind speed prediction. • The proposed method demonstrates higher prediction accuracy and reliability. - Abstract: Accurate wind speed forecasting is becoming increasingly important to improve and optimize renewable wind power generation. Particularly, reliable short-term wind speed prediction can enable model predictive control of wind turbines and real-time optimization of wind farm operation. However, this task remains challenging due to the strong stochastic nature and dynamic uncertainty of wind speed. In this study, unscented Kalman filter (UKF) is integrated with support vector regression (SVR) based state-space model in order to precisely update the short-term estimation of wind speed sequence. In the proposed SVR–UKF approach, support vector regression is first employed to formulate a nonlinear state-space model and then unscented Kalman filter is adopted to perform dynamic state estimation recursively on wind sequence with stochastic uncertainty. The novel SVR–UKF method is compared with artificial neural networks (ANNs), SVR, autoregressive (AR) and autoregressive integrated with Kalman filter (AR-Kalman) approaches for predicting short-term wind speed sequences collected from three sites in Massachusetts, USA. The forecasting results indicate that the proposed method has much better performance in both one-step-ahead and multi-step-ahead wind speed predictions than the other approaches across all the locations
International Nuclear Information System (INIS)
Sun Zhong-Hua; Jiang Fan
2010-01-01
In this paper a new continuous variable called core-ratio is defined to describe the probability for a residue to be in a binding site, thereby replacing the previous binary description of the interface residue using 0 and 1. So we can use the support vector machine regression method to fit the core-ratio value and predict the protein binding sites. We also design a new group of physical and chemical descriptors to characterize the binding sites. The new descriptors are more effective, with an averaging procedure used. Our test shows that much better prediction results can be obtained by the support vector regression (SVR) method than by the support vector classification method. (rapid communication)
Energy Technology Data Exchange (ETDEWEB)
Costa da Silva, Adilson; Carvalho da Silva, Fernando [COPPE/UFRJ, Programa de Engenharia Nuclear, Caixa Postal 68509, 21941-914, Rio de Janeiro (Brazil); Senra Martinez, Aquilino, E-mail: aquilino@lmp.ufrj.br [COPPE/UFRJ, Programa de Engenharia Nuclear, Caixa Postal 68509, 21941-914, Rio de Janeiro (Brazil)
2011-07-15
Highlights: > We proposed a new neutron diffusion hybrid equation with external neutron source. > A coarse mesh finite difference method for the adjoint flux and reactivity calculation was developed. > 1/M curve to predict the criticality condition is used. - Abstract: We used the neutron diffusion hybrid equation, in cartesian geometry with external neutron sources to predict the subcritical multiplication of neutrons in a pressurized water reactor, using a 1/M curve to predict the criticality condition. A Coarse Mesh Finite Difference Method was developed for the adjoint flux calculation and to obtain the reactivity values of the reactor. The results obtained were compared with benchmark values in order to validate the methodology presented in this paper.
International Nuclear Information System (INIS)
Costa da Silva, Adilson; Carvalho da Silva, Fernando; Senra Martinez, Aquilino
2011-01-01
Highlights: → We proposed a new neutron diffusion hybrid equation with external neutron source. → A coarse mesh finite difference method for the adjoint flux and reactivity calculation was developed. → 1/M curve to predict the criticality condition is used. - Abstract: We used the neutron diffusion hybrid equation, in cartesian geometry with external neutron sources to predict the subcritical multiplication of neutrons in a pressurized water reactor, using a 1/M curve to predict the criticality condition. A Coarse Mesh Finite Difference Method was developed for the adjoint flux calculation and to obtain the reactivity values of the reactor. The results obtained were compared with benchmark values in order to validate the methodology presented in this paper.
Ortiz-Hernández, Luis; Vega López, A Valeria; Ramos-Ibáñez, Norma; Cázares Lara, L Joana; Medina Gómez, R Joab; Pérez-Salgado, Diana
To develop and validate equations to estimate the percentage of body fat of children and adolescents from Mexico using anthropometric measurements. A cross-sectional study was carried out with 601 children and adolescents from Mexico aged 5-19 years. The participants were randomly divided into the following two groups: the development sample (n=398) and the validation sample (n=203). The validity of previously published equations (e.g., Slaughter) was also assessed. The percentage of body fat was estimated by dual-energy X-ray absorptiometry. The anthropometric measurements included height, sitting height, weight, waist and arm circumferences, skinfolds (triceps, biceps, subscapular, supra-iliac, and calf), and elbow and bitrochanteric breadth. Linear regression models were estimated with the percentage of body fat as the dependent variable and the anthropometric measurements as the independent variables. Equations were created based on combinations of six to nine anthropometric variables and had coefficients of determination (r 2 ) equal to or higher than 92.4% for boys and 85.8% for girls. In the validation sample, the developed equations had high r 2 values (≥85.6% in boys and ≥78.1% in girls) in all age groups, low standard errors (SE≤3.05% in boys and ≤3.52% in girls), and the intercepts were not different from the origin (p>0.050). Using the previously published equations, the coefficients of determination were lower, and/or the intercepts were different from the origin. The equations developed in this study can be used to assess the percentage of body fat of Mexican schoolchildren and adolescents, as they demonstrate greater validity and lower error compared with previously published equations. Copyright © 2017 Sociedade Brasileira de Pediatria. Published by Elsevier Editora Ltda. All rights reserved.
Directory of Open Access Journals (Sweden)
Rachid Darnag
2017-02-01
Full Text Available Support vector machines (SVM represent one of the most promising Machine Learning (ML tools that can be applied to develop a predictive quantitative structure–activity relationship (QSAR models using molecular descriptors. Multiple linear regression (MLR and artificial neural networks (ANNs were also utilized to construct quantitative linear and non linear models to compare with the results obtained by SVM. The prediction results are in good agreement with the experimental value of HIV activity; also, the results reveal the superiority of the SVM over MLR and ANN model. The contribution of each descriptor to the structure–activity relationships was evaluated.
Regressive Prediction Approach to Vertical Handover in Fourth Generation Wireless Networks
Directory of Open Access Journals (Sweden)
Abubakar M. Miyim
2014-11-01
Full Text Available The over increasing demand for deployment of wireless access networks has made wireless mobile devices to face so many challenges in choosing the best suitable network from a set of available access networks. Some of the weighty issues in 4G wireless networks are fastness and seamlessness in handover process. This paper therefore, proposes a handover technique based on movement prediction in wireless mobile (WiMAX and LTE-A environment. The technique enables the system to predict signal quality between the UE and Radio Base Stations (RBS/Access Points (APs in two different networks. Prediction is achieved by employing the Markov Decision Process Model (MDPM where the movement of the UE is dynamically estimated and averaged to keep track of the signal strength of mobile users. With the help of the prediction, layer-3 handover activities are able to occur prior to layer-2 handover, and therefore, total handover latency can be reduced. The performances of various handover approaches influenced by different metrics (mobility velocities were evaluated. The results presented demonstrate good accuracy the proposed method was able to achieve in predicting the next signal level by reducing the total handover latency.
Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David
2017-10-01
Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed cell processor accuracy of predicted platelet (PLT) yields with the goal of a better prediction of DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software predicted yield and actual PLT yield were statistically evaluated. Software prediction was optimized by linear regression analysis and its optimal cut-off to obtain a DP assessed by receiver operating characteristic curve (ROC) modeling. Three hundred and two plateletpheresis procedures were performed; in 271 (89.7%) occasions, donors were men and in 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486. P Simple correction derived from linear regression analysis accurately corrected this underestimation and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.
Achamrah, Najate; Jésus, Pierre; Grigioni, Sébastien; Rimbert, Agnès; Petit, André; Déchelotte, Pierre; Folope, Vanessa; Coëffier, Moïse
2018-01-01
Predictive equations have been specifically developed for obese patients to estimate resting energy expenditure (REE). Body composition (BC) assessment is needed for some of these equations. We assessed the impact of BC methods on the accuracy of specific predictive equations developed in obese patients. REE was measured (mREE) by indirect calorimetry and BC assessed by bioelectrical impedance analysis (BIA) and dual-energy X-ray absorptiometry (DXA). mREE, percentages of prediction accuracy (±10% of mREE) were compared. Predictive equations were studied in 2588 obese patients. Mean mREE was 1788 ± 6.3 kcal/24 h. Only the Müller (BIA) and Harris & Benedict (HB) equations provided REE with no difference from mREE. The Huang, Müller, Horie-Waitzberg, and HB formulas provided a higher accurate prediction (>60% of cases). The use of BIA provided better predictions of REE than DXA for the Huang and Müller equations. Inversely, the Horie-Waitzberg and Lazzer formulas provided a higher accuracy using DXA. Accuracy decreased when applied to patients with BMI ≥ 40, except for the Horie-Waitzberg and Lazzer (DXA) formulas. Müller equations based on BIA provided a marked improvement of REE prediction accuracy than equations not based on BC. The interest of BC to improve REE predictive equations accuracy in obese patients should be confirmed. PMID:29320432
Added value of pharmacogenetic testing in predicting statin response: Results from the REGRESS trial
Van Der Baan, F.H.; Knol, M.J.; Maitland-Van Der Zee, A.H.; Regieli, J.J.; Van Iperen, E.P.A.; Egberts, A.C.G.; Klungel, O.H.; Grobbee, D.E.; Jukema, J.W.
2013-01-01
It was investigated whether pharmacogenetic factors, both as single polymorphism and as gene-gene interactions, have an added value over non-genetic factors in predicting statin response. Five common polymorphisms were selected in apolipoprotein E, angiotensin-converting enzyme, hepatic lipase and
Greg C. Liknes; Christopher W. Woodall; Charles H. Perry
2009-01-01
Climate information frequently is included in geospatial modeling efforts to improve the predictive capability of other data sources. The selection of an appropriate climate data source requires consideration given the number of choices available. With regard to climate data, there are a variety of parameters (e.g., temperature, humidity, precipitation), time intervals...
The use of seemingly unrelated regression (SUR) to predict the carcass composition of lambs
DEFF Research Database (Denmark)
Cadavez, Vasco A. P.; Henningsen, Arne
The aim of this study was to develop and evaluate models for predicting the carcass composition of lambs. Forty male lambs of two different breeds were included in our analysis. The lambs were slaughtered and their hot carcass weight was obtained. After cooling for 24 hours, the subcutaneous fat...
The use of seemingly unrelated regression to predict the carcass composition of lambs
DEFF Research Database (Denmark)
Cadavez, V.A.P.; Henningsen, Arne
2012-01-01
The aim of this study was to develop and evaluate models for predicting the carcass composition of lambs. Forty male lambs were slaughtered and their carcasses were cooled for 24 hours. The subcutaneous fat thickness was measured between the 12th and 13th rib and breast bone tissue thickness...
Ellis, J.; Ysebaert, T.; Hume, T.; Norkko, A.; Bult, T.; Herman, P.M.J.; Thrush, S.; Oldman, J.
2006-01-01
There is a growing need to predict ecological responses to long-term habitat change. However, statistical models for marine soft-substratum ecosystems are limited, and consequently there is a need for the development of such models. In order to assess the utility of statistical modelling approaches
International Nuclear Information System (INIS)
Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios, Martha; Baechler, Sébastien
2015-01-01
Purpose: According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. Method: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). Results: The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Conclusion: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables
The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...
A Meta-Heuristic Regression-Based Feature Selection for Predictive Analytics
Directory of Open Access Journals (Sweden)
Bharat Singh
2014-11-01
Full Text Available A high-dimensional feature selection having a very large number of features with an optimal feature subset is an NP-complete problem. Because conventional optimization techniques are unable to tackle large-scale feature selection problems, meta-heuristic algorithms are widely used. In this paper, we propose a particle swarm optimization technique while utilizing regression techniques for feature selection. We then use the selected features to classify the data. Classification accuracy is used as a criterion to evaluate classifier performance, and classification is accomplished through the use of k-nearest neighbour (KNN and Bayesian techniques. Various high dimensional data sets are used to evaluate the usefulness of the proposed approach. Results show that our approach gives better results when compared with other conventional feature selection algorithms.
Christiansen, Bo
2015-04-01
Linear regression methods are without doubt the most used approaches to describe and predict data in the physical sciences. They are often good first order approximations and they are in general easier to apply and interpret than more advanced methods. However, even the properties of univariate regression can lead to debate over the appropriateness of various models as witnessed by the recent discussion about climate reconstruction methods. Before linear regression is applied important choices have to be made regarding the origins of the noise terms and regarding which of the two variables under consideration that should be treated as the independent variable. These decisions are often not easy to make but they may have a considerable impact on the results. We seek to give a unified probabilistic - Bayesian with flat priors - treatment of univariate linear regression and prediction by taking, as starting point, the general errors-in-variables model (Christiansen, J. Clim., 27, 2014-2031, 2014). Other versions of linear regression can be obtained as limits of this model. We derive the likelihood of the model parameters and predictands of the general errors-in-variables model by marginalizing over the nuisance parameters. The resulting likelihood is relatively simple and easy to analyze and calculate. The well known unidentifiability of the errors-in-variables model is manifested as the absence of a well-defined maximum in the likelihood. However, this does not mean that probabilistic inference can not be made; the marginal likelihoods of model parameters and the predictands have, in general, well-defined maxima. We also include a probabilistic version of classical calibration and show how it is related to the errors-in-variables model. The results are illustrated by an example from the coupling between the lower stratosphere and the troposphere in the Northern Hemisphere winter.
Sun, Jin; Rutkoski, Jessica E; Poland, Jesse A; Crossa, José; Jannink, Jean-Luc; Sorrells, Mark E
2017-07-01
High-throughput phenotyping (HTP) platforms can be used to measure traits that are genetically correlated with wheat ( L.) grain yield across time. Incorporating such secondary traits in the multivariate pedigree and genomic prediction models would be desirable to improve indirect selection for grain yield. In this study, we evaluated three statistical models, simple repeatability (SR), multitrait (MT), and random regression (RR), for the longitudinal data of secondary traits and compared the impact of the proposed models for secondary traits on their predictive abilities for grain yield. Grain yield and secondary traits, canopy temperature (CT) and normalized difference vegetation index (NDVI), were collected in five diverse environments for 557 wheat lines with available pedigree and genomic information. A two-stage analysis was applied for pedigree and genomic selection (GS). First, secondary traits were fitted by SR, MT, or RR models, separately, within each environment. Then, best linear unbiased predictions (BLUPs) of secondary traits from the above models were used in the multivariate prediction models to compare predictive abilities for grain yield. Predictive ability was substantially improved by 70%, on average, from multivariate pedigree and genomic models when including secondary traits in both training and test populations. Additionally, (i) predictive abilities slightly varied for MT, RR, or SR models in this data set, (ii) results indicated that including BLUPs of secondary traits from the MT model was the best in severe drought, and (iii) the RR model was slightly better than SR and MT models under drought environment. Copyright © 2017 Crop Science Society of America.
Energy Technology Data Exchange (ETDEWEB)
Schlattmann, Peter [University Hospital of Friedrich-Schiller University Jena, Department of Medical Statistics, Informatics and Documentation, Jena (Germany); Schuetz, Georg M. [Freie Universitaet Berlin, Charite, Medical School, Department of Radiology, Humboldt-Universitaet zu Berlin, Berlin (Germany); Dewey, Marc [Freie Universitaet Berlin, Charite, Medical School, Department of Radiology, Humboldt-Universitaet zu Berlin, Berlin (Germany); Charite, Institut fuer Radiologie, Berlin (Germany)
2011-09-15
To evaluate the impact of coronary artery disease (CAD) prevalence on the predictive values of coronary CT angiography. We performed a meta-regression based on a generalised linear mixed model using the binomial distribution and a logit link to analyse the influence of the prevalence of CAD in published studies on the per-patient negative and positive predictive values of CT in comparison to conventional coronary angiography as the reference standard. A prevalence range in which the negative predictive value was higher than 90%, while at the same time the positive predictive value was higher than 70% was considered appropriate. The summary negative and positive predictive values of coronary CT angiography were 93.7% (95% confidence interval [CI] 92.8-94.5%) and 87.5% (95% CI, 86.5-88.5%), respectively. With 95% confidence, negative and positive predictive values higher than 90% and 70% were available with CT for a CAD prevalence of 18-63%. CT systems with >16 detector rows met these requirements for the positive (P < 0.01) and negative (P < 0.05) predictive values in a significantly broader range than systems with {<=}16 detector rows. It is reasonable to perform coronary CT angiography as a rule-out test in patients with a low-to-intermediate likelihood of disease. (orig.)
Liu, Hongjie; Li, Tianhao; Zhan, Sha; Pan, Meilan; Ma, Zhiguo; Li, Chenghua
2016-01-01
Aims. To establish a logistic regression (LR) prediction model for hepatotoxicity of Chinese herbal medicines (HMs) based on traditional Chinese medicine (TCM) theory and to provide a statistical basis for predicting hepatotoxicity of HMs. Methods. The correlations of hepatotoxic and nonhepatotoxic Chinese HMs with four properties, five flavors, and channel tropism were analyzed with chi-square test for two-way unordered categorical data. LR prediction model was established and the accuracy of the prediction by this model was evaluated. Results. The hepatotoxic and nonhepatotoxic Chinese HMs were related with four properties (p 0.05). There were totally 12 variables from four properties and five flavors for the LR. Four variables, warm and neutral of the four properties and pungent and salty of five flavors, were selected to establish the LR prediction model, with the cutoff value being 0.204. Conclusions. Warm and neutral of the four properties and pungent and salty of five flavors were the variables to affect the hepatotoxicity. Based on such results, the established LR prediction model had some predictive power for hepatotoxicity of Chinese HMs. PMID:27656240
International Nuclear Information System (INIS)
Schlattmann, Peter; Schuetz, Georg M.; Dewey, Marc
2011-01-01
To evaluate the impact of coronary artery disease (CAD) prevalence on the predictive values of coronary CT angiography. We performed a meta-regression based on a generalised linear mixed model using the binomial distribution and a logit link to analyse the influence of the prevalence of CAD in published studies on the per-patient negative and positive predictive values of CT in comparison to conventional coronary angiography as the reference standard. A prevalence range in which the negative predictive value was higher than 90%, while at the same time the positive predictive value was higher than 70% was considered appropriate. The summary negative and positive predictive values of coronary CT angiography were 93.7% (95% confidence interval [CI] 92.8-94.5%) and 87.5% (95% CI, 86.5-88.5%), respectively. With 95% confidence, negative and positive predictive values higher than 90% and 70% were available with CT for a CAD prevalence of 18-63%. CT systems with >16 detector rows met these requirements for the positive (P < 0.01) and negative (P < 0.05) predictive values in a significantly broader range than systems with ≤16 detector rows. It is reasonable to perform coronary CT angiography as a rule-out test in patients with a low-to-intermediate likelihood of disease. (orig.)
Creep-fatigue life prediction method using Diercks equation for Cr-Mo steel
International Nuclear Information System (INIS)
Sonoya, Keiji; Nonaka, Isamu; Kitagawa, Masaki
1990-01-01
For dealing with the situation that creep-fatigue life properties of materials do not exist, a development of the simple method to predict creep-fatigue life properties is necessary. A method to predict the creep-fatigue life properties of Cr-Mo steels is proposed on the basis of D. Diercks equation which correlates the creep-fatigue lifes of SUS 304 steels under various temperatures, strain ranges, strain rates and hold times. The accuracy of the proposed method was compared with that of the existing methods. The following results were obtained. (1) Fatigue strength and creep rupture strength of Cr-Mo steel are different from those of SUS 304 steel. Therefore in order to apply Diercks equation to creep-fatigue prediction for Cr-Mo steel, the difference of fatigue strength was found to be corrected by fatigue life ratio of both steels and the difference of creep rupture strength was found to be corrected by the equivalent temperature corresponding to equal strength of both steels. (2) Creep-fatigue life can be predicted by the modified Diercks equation within a factor of 2 which is nearly as precise as the accuracy of strain range partitioning method. Required test and analysis procedure of this method are not so complicated as strain range partitioning method. (author)
Saro, Lee; Woo, Jeon Seong; Kwan-Young, Oh; Moung-Jin, Lee
2016-02-01
The aim of this study is to predict landslide susceptibility caused using the spatial analysis by the application of a statistical methodology based on the GIS. Logistic regression models along with artificial neutral network were applied and validated to analyze landslide susceptibility in Inje, Korea. Landslide occurrence area in the study were identified based on interpretations of optical remote sensing data (Aerial photographs) followed by field surveys. A spatial database considering forest, geophysical, soil and topographic data, was built on the study area using the Geographical Information System (GIS). These factors were analysed using artificial neural network (ANN) and logistic regression models to generate a landslide susceptibility map. The study validates the landslide susceptibility map by comparing them with landslide occurrence areas. The locations of landslide occurrence were divided randomly into a training set (50%) and a test set (50%). A training set analyse the landslide susceptibility map using the artificial network along with logistic regression models, and a test set was retained to validate the prediction map. The validation results revealed that the artificial neural network model (with an accuracy of 80.10%) was better at predicting landslides than the logistic regression model (with an accuracy of 77.05%). Of the weights used in the artificial neural network model, `slope' yielded the highest weight value (1.330), and `aspect' yielded the lowest value (1.000). This research applied two statistical analysis methods in a GIS and compared their results. Based on the findings, we were able to derive a more effective method for analyzing landslide susceptibility.
Ren, Yilong; Wang, Yunpeng; Wu, Xinkai; Yu, Guizhen; Ding, Chuan
2016-10-01
Red light running (RLR) has become a major safety concern at signalized intersection. To prevent RLR related crashes, it is critical to identify the factors that significantly impact the drivers' behaviors of RLR, and to predict potential RLR in real time. In this research, 9-month's RLR events extracted from high-resolution traffic data collected by loop detectors from three signalized intersections were applied to identify the factors that significantly affect RLR behaviors. The data analysis indicated that occupancy time, time gap, used yellow time, time left to yellow start, whether the preceding vehicle runs through the intersection during yellow, and whether there is a vehicle passing through the intersection on the adjacent lane were significantly factors for RLR behaviors. Furthermore, due to the rare events nature of RLR, a modified rare events logistic regression model was developed for RLR prediction. The rare events logistic regression method has been applied in many fields for rare events studies and shows impressive performance, but so far none of previous research has applied this method to study RLR. The results showed that the rare events logistic regression model performed significantly better than the standard logistic regression model. More importantly, the proposed RLR prediction method is purely based on loop detector data collected from a single advance loop detector located 400 feet away from stop-bar. This brings great potential for future field applications of the proposed method since loops have been widely implemented in many intersections and can collect data in real time. This research is expected to contribute to the improvement of intersection safety significantly. Copyright © 2016 Elsevier Ltd. All rights reserved.
Yu, Wenbao; Park, Taesung
2014-01-01
It is common to get an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach, for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus, apply the penalization regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and a prudent choice of the smoothing parameter. We apply the proposed AucPR for gene selection and classification using four real microarray and synthetic data. Through numerical studies, AucPR is shown to perform better than the penalized logistic regression and the nonparametric AUC-based method, in the sense of AUC and sensitivity for a given specificity, particularly when there are many correlated genes. We propose a powerful parametric and easily-implementable linear classifier AucPR, for gene selection and disease prediction for high-dimensional data. AucPR is recommended for its good prediction performance. Beside gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.
Directory of Open Access Journals (Sweden)
Saro Lee
2016-02-01
Full Text Available The aim of this study is to predict landslide susceptibility caused using the spatial analysis by the application of a statistical methodology based on the GIS. Logistic regression models along with artificial neutral network were applied and validated to analyze landslide susceptibility in Inje, Korea. Landslide occurrence area in the study were identified based on interpretations of optical remote sensing data (Aerial photographs followed by field surveys. A spatial database considering forest, geophysical, soil and topographic data, was built on the study area using the Geographical Information System (GIS. These factors were analysed using artificial neural network (ANN and logistic regression models to generate a landslide susceptibility map. The study validates the landslide susceptibility map by comparing them with landslide occurrence areas. The locations of landslide occurrence were divided randomly into a training set (50% and a test set (50%. A training set analyse the landslide susceptibility map using the artificial network along with logistic regression models, and a test set was retained to validate the prediction map. The validation results revealed that the artificial neural network model (with an accuracy of 80.10% was better at predicting landslides than the logistic regression model (with an accuracy of 77.05%. Of the weights used in the artificial neural network model, ‘slope’ yielded the highest weight value (1.330, and ‘aspect’ yielded the lowest value (1.000. This research applied two statistical analysis methods in a GIS and compared their results. Based on the findings, we were able to derive a more effective method for analyzing landslide susceptibility.
PREDICTION OF CORPORATE BANKRUPTCY IN ROMANIA THROUGH THE USE OF LOGISTIC REGRESSION
Directory of Open Access Journals (Sweden)
Brindescu-Olariu Daniel
2013-07-01
As theoretical contributions, the research proves that the companies that filed for bankruptcy during the crisis period showed signs of weaknesses before the beginning of the crisis. Financial ratios that show relevance in the prediction of corporate bankruptcy at local level have been identified and their correlation with the bankruptcy probability has been evaluated. The model is expected to maintain its accuracy with minimal or no additional calibration for companies from the entire Romanian economy that fit the profile of the target population.
Bita Najafian; Aminsaburi Aminsaburi; Seyyed Hassan Fakhraei; Abolfazl afjeh; Fatemeh Eghbal; Reza Noroozian
2015-01-01
Background:Respiratory Distress syndrome is the most common respiratory disease in premature neonate and the most important cause of death among them. We aimed to investigate factors to predict successful or failure of INSURE method as a therapeutic method of RDS. Methods:In a cohort study,45 neonates with diagnosed RDS and birth weight lower than 1500g were included and they underwent INSURE followed by NCPAP(Nasal Continuous Positive Airway Pressure). The patients were divided into failu...
ENHANCED PREDICTION OF STUDENT DROPOUTS USING FUZZY INFERENCE SYSTEM AND LOGISTIC REGRESSION
A. Saranya; J. Rajeswari
2016-01-01
Predicting college and school dropouts is a major problem in educational system and has complicated challenge due to data imbalance and multi dimensionality, which can affect the low performance of students. In this paper, we have collected different database from various colleges, among these 500 best real attributes are identified in order to identify the factor that affecting dropout students using neural based classification algorithm and different mining technique are implemented for dat...
Modeling and Predicting AD Progression by Regression Analysis of Sequential Clinical Data
Xie, Qing
2016-02-23
Alzheimer\\'s Disease (AD) is currently attracting much attention in elders\\' care. As the increasing availability of massive clinical diagnosis data, especially the medical images of brain scan, it is highly significant to precisely identify and predict the potential AD\\'s progression based on the knowledge in the diagnosis data. In this paper, we follow a novel sequential learning framework to model the disease progression for AD patients\\' care. Different from the conventional approaches using only initial or static diagnosis data to model the disease progression for different durations, we design a score-involved approach and make use of the sequential diagnosis information in different disease stages to jointly simulate the disease progression. The actual clinical scores are utilized in progress to make the prediction more pertinent and reliable. We examined our approach by extensive experiments on the clinical data provided by the Alzheimer\\'s Disease Neuroimaging Initiative (ADNI). The results indicate that the proposed approach is more effective to simulate and predict the disease progression compared with the existing methods.
Modeling and Predicting AD Progression by Regression Analysis of Sequential Clinical Data
Xie, Qing; Wang, Su; Zhu, Jia; Zhang, Xiangliang
2016-01-01
Alzheimer's Disease (AD) is currently attracting much attention in elders' care. As the increasing availability of massive clinical diagnosis data, especially the medical images of brain scan, it is highly significant to precisely identify and predict the potential AD's progression based on the knowledge in the diagnosis data. In this paper, we follow a novel sequential learning framework to model the disease progression for AD patients' care. Different from the conventional approaches using only initial or static diagnosis data to model the disease progression for different durations, we design a score-involved approach and make use of the sequential diagnosis information in different disease stages to jointly simulate the disease progression. The actual clinical scores are utilized in progress to make the prediction more pertinent and reliable. We examined our approach by extensive experiments on the clinical data provided by the Alzheimer's Disease Neuroimaging Initiative (ADNI). The results indicate that the proposed approach is more effective to simulate and predict the disease progression compared with the existing methods.
International Nuclear Information System (INIS)
Pinder, John E.; Rowan, David J.; Rasmussen, Joseph B.; Smith, Jim T.; Hinton, Thomas G.; Whicker, F.W.
2014-01-01
Data from published studies and World Wide Web sources were combined to produce and test a regression model to predict Cs concentration ratios for freshwater fish species. The accuracies of predicted concentration ratios, which were computed using 1) species trophic levels obtained from random resampling of known food items and 2) K concentrations in the water for 207 fish from 44 species and 43 locations, were tested against independent observations of ratios for 57 fish from 17 species from 25 locations. Accuracy was assessed as the percent of observed to predicted ratios within factors of 2 or 3. Conservatism, expressed as the lack of under prediction, was assessed as the percent of observed to predicted ratios that were less than 2 or less than 3. The model's median observed to predicted ratio was 1.26, which was not significantly different from 1, and 50% of the ratios were between 0.73 and 1.85. The percentages of ratios within factors of 2 or 3 were 67 and 82%, respectively. The percentages of ratios that were <2 or <3 were 79 and 88%, respectively. An example for Perca fluviatilis demonstrated that increased prediction accuracy could be obtained when more detailed knowledge of diet was available to estimate trophic level. - Highlights: • We developed a model to predict Cs concentration ratios for freshwater fish species. • The model uses only two variables to predict a species CR for any location. • One variable is the K concentration in the freshwater. • The other is a species mean trophic level measure easily obtained from (fishbase.org). • The median observed to predicted ratio for 57 independent test cases was 1.26
Validation of predictive equations for glomerular filtration rate in the Saudi population
Directory of Open Access Journals (Sweden)
Al Wakeel Jamal
2009-01-01
Full Text Available Predictive equations provide a rapid method of assessing glomerular filtration rate (GFR. To compare the various predictive equations for the measurement of this parameter in the Saudi population, we measured GFR by the Modification of Diet in Renal Disease (MDRD and Cockcroft-Gault formulas, cystatin C, reciprocal of cystatin C, creatinine clearance, reciprocal of creatinine, and inulin clearance in 32 Saudi subjects with different stages of renal disease. We com-pared GFR measured by inulin clearance and the estimated GFR by the equations. The study included 19 males (59.4% and 13 (40.6% females with a mean age of 42.3 ± 15.2 years and weight of 68.6 ± 17.7 kg. The mean serum creatinine was 199 ± 161 μmol/L. The GFR measured by inulin clearance was 50.9 ± 33.5 mL/min, and the estimated by Cockcroft-Gault and by MDRD equations was 56.3 ± 33.3 and 52.8 ± 32.0 mL/min, respectively. The GFR estimated by MDRD revealed the strongest correlation with the measured inulin clearance (r= 0.976, P= 0.0000 followed by the GFR estimated by Cockcroft-Gault, serum cystatin C, and serum creatinine (r= 0.953, P= 0.0000 (r= 0.787, P= 0.0001 (r= -0.678, P= 0.001, respectively. The reciprocal of cystatin C and serum creatinine revealed a correlation coefficient of 0.826 and 0.93, respectively. Cockroft-Gault for-mula overestimated the GFR by 5.40 ± 10.3 mL/min in comparison to the MDRD formula, which exhibited the best correlation with inulin clearance in different genders, age groups, body mass index, renal transplant recipients, chronic kidney disease stages when compared to other GFR predictive equations.
Li, Hongjian; Leung, Kwong-Sak; Wong, Man-Hon; Ballester, Pedro J
2014-08-27
State-of-the-art protein-ligand docking methods are generally limited by the traditionally low accuracy of their scoring functions, which are used to predict binding affinity and thus vital for discriminating between active and inactive compounds. Despite intensive research over the years, classical scoring functions have reached a plateau in their predictive performance. These assume a predetermined additive functional form for some sophisticated numerical features, and use standard multivariate linear regression (MLR) on experimental data to derive the coefficients. In this study we show that such a simple functional form is detrimental for the prediction performance of a scoring function, and replacing linear regression by machine learning techniques like random forest (RF) can improve prediction performance. We investigate the conditions of applying RF under various contexts and find that given sufficient training samples RF manages to comprehensively capture the non-linearity between structural features and measured binding affinities. Incorporating more structural features and training with more samples can both boost RF performance. In addition, we analyze the importance of structural features to binding affinity prediction using the RF variable importance tool. Lastly, we use Cyscore, a top performing empirical scoring function, as a baseline for comparison study. Machine-learning scoring functions are fundamentally different from classical scoring functions because the former circumvents the fixed functional form relating structural features with binding affinities. RF, but not MLR, can effectively exploit more structural features and more training samples, leading to higher prediction performance. The future availability of more X-ray crystal structures will further widen the performance gap between RF-based and MLR-based scoring functions. This further stresses the importance of substituting RF for MLR in scoring function development.
Directory of Open Access Journals (Sweden)
Ceile Cristina Ferreira Nunes
2004-04-01
ítico calculada usando-se a expressão que leva em consideração a covariância entre e apresenta resultados mais satisfatórios e que não segue uma distribuição normal, pois apresenta uma distribuição de freqüência com assimetria positiva e formato leptocúrtico.The aim of this paper is determine variances for the analysis of the critical point of a second-degree regression equation in experimental situations with different variances through Monte Carlo simulation. In many theoretical or applied studies, one finds situations involving ratios of random variables and more frequently normal variables. Examples are provided by variables, which appear in economic dose research of nutrients in fertilization experiments, as well as in other problems in which there are interests in the random variable, estimator of the critic point in the regression . Data of five hundred thirty six trials in cotton yield were utilized to study the distribution of the critical point of a quadratic regression equation by adjusting a quadratic model. The parameters were evaluated using a least square method. From the estimations a MATLAB routine was implemented to simulate two sets with five thousands random errors with normal distribution and zero mean, relative to each of the theoretical variances: or = 0.1; 0.5; 1; 5; 10; 15; 20 and 50. The estimation of the variance of the critical point was obtained by three methods: (a usual formula for the variance; (b formula obtained by differentiation of the critical point estimator and (c formula for the computation of the variance of a quotient by taking into consideration the covariance between and . The results obtained for the statistic average for the regression between e , as well as its respective variances in terms of the several theoretical residual variances ( adopted show that those theoretical values are close to real ones. Moreover, there is a trend of increasing and with increase of the theoretical variance. It may
Alexandrowicz, Rainer W; Jahn, Rebecca; Friedrich, Fabian; Unger, Anne
2016-06-01
Various studies have shown that caregiving relatives of schizophrenic patients are at risk of suffering from depression. These studies differ with respect to the applied statistical methods, which could influence the findings. Therefore, the present study analyzes to which extent different methods may cause differing results. The present study contrasts by means of one data set the results of three different modelling approaches, Rasch Modelling (RM), Structural Equation Modelling (SEM), and Linear Regression Modelling (LRM). The results of the three models varied considerably, reflecting the different assumptions of the respective models. Latent trait models (i. e., RM and SEM) generally provide more convincing results by correcting for measurement error and the RM specifically proves superior for it treats ordered categorical data most adequately.
Prediction of cannabis and cocaine use in adolescence using decision trees and logistic regression
Directory of Open Access Journals (Sweden)
Alfonso L. Palmer
2010-01-01
Full Text Available Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls whose mean age was 15.59 years. Logistic regression and decision trees were carried out in order to model the consumption of cannabis and cocaine. The results show the use of legal substances and committing fraudulence or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraudulence or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraudulence or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.
Arqub, Omar Abu; El-Ajou, Ahmad; Momani, Shaher
2015-07-01
Building fractional mathematical models for specific phenomena and developing numerical or analytical solutions for these fractional mathematical models are crucial issues in mathematics, physics, and engineering. In this work, a new analytical technique for constructing and predicting solitary pattern solutions of time-fractional dispersive partial differential equations is proposed based on the generalized Taylor series formula and residual error function. The new approach provides solutions in the form of a rapidly convergent series with easily computable components using symbolic computation software. For method evaluation and validation, the proposed technique was applied to three different models and compared with some of the well-known methods. The resultant simulations clearly demonstrate the superiority and potentiality of the proposed technique in terms of the quality performance and accuracy of substructure preservation in the construct, as well as the prediction of solitary pattern solutions for time-fractional dispersive partial differential equations.
International Nuclear Information System (INIS)
Choudhary, M.R.; Mustafa, U.S.
2009-01-01
Field tests were conducted to calibrate the existing SCS design equation in determining field border length using field data of different field lengths during 2nd and 3rd irrigations under local conditions. A single ring infiltrometer was used to estimate the water movement into and through the irrigated soil profile and in estimating the coefficients of Kostiakov infiltration function. Measurements of the unit discharge and time of advance were carried out during different irrigations on wheat irrigated fields having clay loam soil. The collected field data were used to calibrate the existing SCS design equation developed by USDA for testing its validity under local field conditions. SCS equation was modified further to improve its applicability. Results from the study revealed that the Kostiakov model over predicted the coefficients, which in turn overestimated the water advance length for boarder in the selected field using existing SCS design equation. However, the calibrated SCS design equation after parametric modification produced more satisfactory results encouraging the scientists to make its use at larger scale. (author)
Hong, S-M; Bukhari, W
2014-07-07
The motion of thoracic and abdominal tumours induced by respiratory motion often exceeds 20 mm, and can significantly compromise dose conformality. Motion-adaptive radiotherapy aims to deliver a conformal dose distribution to the tumour with minimal normal tissue exposure by compensating for the tumour motion. This adaptive radiotherapy, however, requires the prediction of the tumour movement that can occur over the system latency period. In general, motion prediction approaches can be classified into two groups: model-based and model-free. Model-based approaches utilize a motion model in predicting respiratory motion. These approaches are computationally efficient and responsive to irregular changes in respiratory motion. Model-free approaches do not assume an explicit model of motion dynamics, and predict future positions by learning from previous observations. Artificial neural networks (ANNs) and support vector regression (SVR) are examples of model-free approaches. In this article, we present a prediction algorithm that combines a model-based and a model-free approach in a cascade structure. The algorithm, which we call EKF-SVR, first employs a model-based algorithm (named LCM-EKF) to predict the respiratory motion, and then uses a model-free SVR algorithm to estimate and correct the error of the LCM-EKF prediction. Extensive numerical experiments based on a large database of 304 respiratory motion traces are performed. The experimental results demonstrate that the EKF-SVR algorithm successfully reduces the prediction error of the LCM-EKF, and outperforms the model-free ANN and SVR algorithms in terms of prediction accuracy across lookahead lengths of 192, 384, and 576 ms.
International Nuclear Information System (INIS)
Hong, S-M; Bukhari, W
2014-01-01
The motion of thoracic and abdominal tumours induced by respiratory motion often exceeds 20 mm, and can significantly compromise dose conformality. Motion-adaptive radiotherapy aims to deliver a conformal dose distribution to the tumour with minimal normal tissue exposure by compensating for the tumour motion. This adaptive radiotherapy, however, requires the prediction of the tumour movement that can occur over the system latency period. In general, motion prediction approaches can be classified into two groups: model-based and model-free. Model-based approaches utilize a motion model in predicting respiratory motion. These approaches are computationally efficient and responsive to irregular changes in respiratory motion. Model-free approaches do not assume an explicit model of motion dynamics, and predict future positions by learning from previous observations. Artificial neural networks (ANNs) and support vector regression (SVR) are examples of model-free approaches. In this article, we present a prediction algorithm that combines a model-based and a model-free approach in a cascade structure. The algorithm, which we call EKF–SVR, first employs a model-based algorithm (named LCM–EKF) to predict the respiratory motion, and then uses a model-free SVR algorithm to estimate and correct the error of the LCM–EKF prediction. Extensive numerical experiments based on a large database of 304 respiratory motion traces are performed. The experimental results demonstrate that the EKF–SVR algorithm successfully reduces the prediction error of the LCM–EKF, and outperforms the model-free ANN and SVR algorithms in terms of prediction accuracy across lookahead lengths of 192, 384, and 576 ms. (paper)
A Regression Model for Predicting Shape Deformation after Breast Conserving Surgery
Directory of Open Access Journals (Sweden)
Hooshiar Zolfagharnasab
2018-01-01
Full Text Available Breast cancer treatments can have a negative impact on breast aesthetics, in case when surgery is intended to intersect tumor. For many years mastectomy was the only surgical option, but more recently breast conserving surgery (BCS has been promoted as a liable alternative to treat cancer while preserving most part of the breast. However, there is still a significant number of BCS intervened patients who are unpleasant with the result of the treatment, which leads to self-image issues and emotional overloads. Surgeons recognize the value of a tool to predict the breast shape after BCS to facilitate surgeon/patient communication and allow more educated decisions; however, no such tool is available that is suited for clinical usage. These tools could serve as a way of visually sensing the aesthetic consequences of the treatment. In this research, it is intended to propose a methodology for predict the deformation after BCS by using machine learning techniques. Nonetheless, there is no appropriate dataset containing breast data before and after surgery in order to train a learning model. Therefore, an in-house semi-synthetic dataset is proposed to fulfill the requirement of this research. Using the proposed dataset, several learning methodologies were investigated, and promising outcomes are obtained.
A Regression Model for Predicting Shape Deformation after Breast Conserving Surgery
Zolfagharnasab, Hooshiar; Bessa, Sílvia; Oliveira, Sara P.; Faria, Pedro; Teixeira, João F.; Cardoso, Jaime S.
2018-01-01
Breast cancer treatments can have a negative impact on breast aesthetics, in case when surgery is intended to intersect tumor. For many years mastectomy was the only surgical option, but more recently breast conserving surgery (BCS) has been promoted as a liable alternative to treat cancer while preserving most part of the breast. However, there is still a significant number of BCS intervened patients who are unpleasant with the result of the treatment, which leads to self-image issues and emotional overloads. Surgeons recognize the value of a tool to predict the breast shape after BCS to facilitate surgeon/patient communication and allow more educated decisions; however, no such tool is available that is suited for clinical usage. These tools could serve as a way of visually sensing the aesthetic consequences of the treatment. In this research, it is intended to propose a methodology for predict the deformation after BCS by using machine learning techniques. Nonetheless, there is no appropriate dataset containing breast data before and after surgery in order to train a learning model. Therefore, an in-house semi-synthetic dataset is proposed to fulfill the requirement of this research. Using the proposed dataset, several learning methodologies were investigated, and promising outcomes are obtained. PMID:29315279
A predictive group-contribution simplified PC-SAFT equation of state: Application to polymer systems
DEFF Research Database (Denmark)
Tihic, Amra; Kontogeorgis, Georgios; von Solms, Nicolas
2008-01-01
A group-contribution (GC) method is coupled with the molecular-based perturbed-chain statistical associating fluid theory (PC-SAFT) equation of state (EoS) to predict its characteristic pure compound parameters. The estimation of group contributions for the parameters is based on a parameter...... are the molecular structure of the polymer of interest in terms of functional groups and a single binary interaction parameter for accurate mixture calculations....
Olive, David J
2017-01-01
This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
Directory of Open Access Journals (Sweden)
Paulino Pérez
2010-09-01
Full Text Available The availability of dense molecular markers has made possible the use of genomic selection in plant and animal breeding. However, models for genomic selection pose several computational and statistical challenges and require specialized computer programs, not always available to the end user and not implemented in standard statistical software yet. The R-package BLR (Bayesian Linear Regression implements several statistical procedures (e.g., Bayesian Ridge Regression, Bayesian LASSO in a unified framework that allows including marker genotypes and pedigree data jointly. This article describes the classes of models implemented in the BLR package and illustrates their use through examples. Some challenges faced when applying genomic-enabled selection, such as model choice, evaluation of predictive ability through cross-validation, and choice of hyper-parameters, are also addressed.
Lees, Mackenzie C.; Merani, Shaheed; Tauh, Keerit; Khadaroo, Rachel G.
2015-01-01
Background Older adults (≥ 65 yr) are the fastest growing population and are presenting in increasing numbers for acute surgical care. Emergency surgery is frequently life threatening for older patients. Our objective was to identify predictors of mortality and poor outcome among elderly patients undergoing emergency general surgery. Methods We conducted a retrospective cohort study of patients aged 65–80 years undergoing emergency general surgery between 2009 and 2010 at a tertiary care centre. Demographics, comorbidities, in-hospital complications, mortality and disposition characteristics of patients were collected. Logistic regression analysis was used to identify covariate-adjusted predictors of in-hospital mortality and discharge of patients home. Results Our analysis included 257 patients with a mean age of 72 years; 52% were men. In-hospital mortality was 12%. Mortality was associated with patients who had higher American Society of Anesthesiologists (ASA) class (odds ratio [OR] 3.85, 95% confidence interval [CI] 1.43–10.33, p = 0.008) and in-hospital complications (OR 1.93, 95% CI 1.32–2.83, p = 0.001). Nearly two-thirds of patients discharged home were younger (OR 0.92, 95% CI 0.85–0.99, p = 0.036), had lower ASA class (OR 0.45, 95% CI 0.27–0.74, p = 0.002) and fewer in-hospital complications (OR 0.69, 95% CI 0.53–0.90, p = 0.007). Conclusion American Society of Anesthesiologists class and in-hospital complications are perioperative predictors of mortality and disposition in the older surgical population. Understanding the predictors of poor outcome and the importance of preventing in-hospital complications in older patients will have important clinical utility in terms of preoperative counselling, improving health care and discharging patients home. PMID:26204143
Body composition in elderly people: effect of criterion estimates on predictive equations
International Nuclear Information System (INIS)
Baumgartner, R.N.; Heymsfield, S.B.; Lichtman, S.; Wang, J.; Pierson, R.N. Jr.
1991-01-01
The purposes of this study were to determine whether there are significant differences between two- and four-compartment model estimates of body composition, whether these differences are associated with aqueous and mineral fractions of the fat-free mass (FFM); and whether the differences are retained in equations for predicting body composition from anthropometry and bioelectric resistance. Body composition was estimated in 98 men and women aged 65-94 y by using a four-compartment model based on hydrodensitometry, 3 H 2 O dilution, and dual-photon absorptiometry. These estimates were significantly different from those obtained by using Siri's two-compartment model. The differences were associated significantly (P less than 0.0001) with variation in the aqueous fraction of FFM. Equations for predicting body composition from anthropometry and resistance, when calibrated against two-compartment model estimates, retained these systematic errors. Equations predicting body composition in elderly people should be calibrated against estimates from multicompartment models that consider variability in FFM composition
Yan, Jun; Huang, Jian-Hua; He, Min; Lu, Hong-Bing; Yang, Rui; Kong, Bo; Xu, Qing-Song; Liang, Yi-Zeng
2013-08-01
Retention indices for frequently reported compounds of plant essential oils on three different stationary phases were investigated. Multivariate linear regression, partial least squares, and support vector machine combined with a new variable selection approach called random-frog recently proposed by our group, were employed to model quantitative structure-retention relationships. Internal and external validations were performed to ensure the stability and predictive ability. All the three methods could obtain an acceptable model, and the optimal results by support vector machine based on a small number of informative descriptors with the square of correlation coefficient for cross validation, values of 0.9726, 0.9759, and 0.9331 on the dimethylsilicone stationary phase, the dimethylsilicone phase with 5% phenyl groups, and the PEG stationary phase, respectively. The performances of two variable selection approaches, random-frog and genetic algorithm, are compared. The importance of the variables was found to be consistent when estimated from correlation coefficients in multivariate linear regression equations and selection probability in model spaces. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
2017-10-01
ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID PROPELLANT GRAIN GEOMETRIES Brian...distribution is unlimited. AD U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT AND ENGINEERING CENTER Munitions Engineering Technology Center Picatinny...U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT AND ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID
Directory of Open Access Journals (Sweden)
Paulino José García Nieto
2016-05-01
Full Text Available Remaining useful life (RUL estimation is considered as one of the most central points in the prognostics and health management (PHM. The present paper describes a nonlinear hybrid ABC–MARS-based model for the prediction of the remaining useful life of aircraft engines. Indeed, it is well-known that an accurate RUL estimation allows failure prevention in a more controllable way so that the effective maintenance can be carried out in appropriate time to correct impending faults. The proposed hybrid model combines multivariate adaptive regression splines (MARS, which have been successfully adopted for regression problems, with the artificial bee colony (ABC technique. This optimization technique involves parameter setting in the MARS training procedure, which significantly influences the regression accuracy. However, its use in reliability applications has not yet been widely explored. Bearing this in mind, remaining useful life values have been predicted here by using the hybrid ABC–MARS-based model from the remaining measured parameters (input variables for aircraft engines with success. A correlation coefficient equal to 0.92 was obtained when this hybrid ABC–MARS-based model was applied to experimental data. The agreement of this model with experimental data confirmed its good performance. The main advantage of this predictive model is that it does not require information about the previous operation states of the aircraft engine.
Georga, Eleni I; Protopappas, Vasilios C; Ardigò, Diego; Polyzos, Demosthenes; Fotiadis, Dimitrios I
2013-08-01
The prevention of hypoglycemic events is of paramount importance in the daily management of insulin-treated diabetes. The use of short-term prediction algorithms of the subcutaneous (s.c.) glucose concentration may contribute significantly toward this direction. The literature suggests that, although the recent glucose profile is a prominent predictor of hypoglycemia, the overall patient's context greatly impacts its accurate estimation. The objective of this study is to evaluate the performance of a support vector for regression (SVR) s.c. glucose method on hypoglycemia prediction. We extend our SVR model to predict separately the nocturnal events during sleep and the non-nocturnal (i.e., diurnal) ones over 30-min and 60-min horizons using information on recent glucose profile, meals, insulin intake, and physical activities for a hypoglycemic threshold of 70 mg/dL. We also introduce herein additional variables accounting for recurrent nocturnal hypoglycemia due to antecedent hypoglycemia, exercise, and sleep. SVR predictions are compared with those from two other machine learning techniques. The method is assessed on a dataset of 15 patients with type 1 diabetes under free-living conditions. Nocturnal hypoglycemic events are predicted with 94% sensitivity for both horizons and with time lags of 5.43 min and 4.57 min, respectively. As concerns the diurnal events, when physical activities are not considered, the sensitivity is 92% and 96% for a 30-min and 60-min horizon, respectively, with both time lags being less than 5 min. However, when such information is introduced, the diurnal sensitivity decreases by 8% and 3%, respectively. Both nocturnal and diurnal predictions show a high (>90%) precision. Results suggest that hypoglycemia prediction using SVR can be accurate and performs better in most diurnal and nocturnal cases compared with other techniques. It is advised that the problem of hypoglycemia prediction should be handled differently for nocturnal
International Nuclear Information System (INIS)
Kim, J. Y.; Shin, C. H.; Kim, J. K.; Lee, J. K.; Park, Y. J.
2003-01-01
The variation transitions of the inventories for the liquid radwaste system and the radioactive gas have being released in containment, and their predictive values according to the operation histories of Yonggwang(YGN) 3 and 4 were analyzed by linear regression analysis methodology. The results show that the variation transitions of the inventories for those systems are linearly increasing according to the operation histories but the inventories released to the environment are considerably lower than the recommended values based on the FSAR suggestions. It is considered that some conservation were presented in the estimation methodology in preparing stage of FSAR
Directory of Open Access Journals (Sweden)
Anne-Laure Boulesteix
2017-01-01
Full Text Available As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper, such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and codes are available on the companion website to ensure reproducibility.
Directory of Open Access Journals (Sweden)
Pascal Schopp
2017-11-01
Full Text Available A major application of genomic prediction (GP in plant breeding is the identification of superior inbred lines within families derived from biparental crosses. When models for various traits were trained within related or unrelated biparental families (BPFs, experimental studies found substantial variation in prediction accuracy (PA, but little is known about the underlying factors. We used SNP marker genotypes of inbred lines from either elite germplasm or landraces of maize (Zea mays L. as parents to generate in silico 300 BPFs of doubled-haploid lines. We analyzed PA within each BPF for 50 simulated polygenic traits, using genomic best linear unbiased prediction (GBLUP models trained with individuals from either full-sib (FSF, half-sib (HSF, or unrelated families (URF for various sizes (Ntrain of the training set and different heritabilities (h2 . In addition, we modified two deterministic equations for forecasting PA to account for inbreeding and genetic variance unexplained by the training set. Averaged across traits, PA was high within FSF (0.41–0.97 with large variation only for Ntrain < 50 and h2 < 0.6. For HSF and URF, PA was on average ∼40–60% lower and varied substantially among different combinations of BPFs used for model training and prediction as well as different traits. As exemplified by HSF results, PA of across-family GP can be very low if causal variants not segregating in the training set account for a sizeable proportion of the genetic variance among predicted individuals. Deterministic equations accurately forecast the PA expected over many traits, yet cannot capture trait-specific deviations. We conclude that model training within BPFs generally yields stable PA, whereas a high level of uncertainty is encountered in across-family GP. Our study shows the extent of variation in PA that must be at least reckoned with in practice and offers a starting point for the design of training sets composed of multiple BPFs.
De Vore, Karl W; Fatahi, Nadia M; Sass, John E
2016-08-01
Arrhenius modeling of analyte recovery at increased temperatures to predict long-term colder storage stability of biological raw materials, reagents, calibrators, and controls is standard practice in the diagnostics industry. Predicting subzero temperature stability using the same practice is frequently criticized but nevertheless heavily relied upon. We compared the ability to predict analyte recovery during frozen storage using 3 separate strategies: traditional accelerated studies with Arrhenius modeling, and extrapolation of recovery at 20% of shelf life using either ordinary least squares or a radical equation y = B1x(0.5) + B0. Computer simulations were performed to establish equivalence of statistical power to discern the expected changes during frozen storage or accelerated stress. This was followed by actual predictive and follow-up confirmatory testing of 12 chemistry and immunoassay analytes. Linear extrapolations tended to be the most conservative in the predicted percent recovery, reducing customer and patient risk. However, the majority of analytes followed a rate of change that slowed over time, which was fit best to a radical equation of the form y = B1x(0.5) + B0. Other evidence strongly suggested that the slowing of the rate was not due to higher-order kinetics, but to changes in the matrix during storage. Predicting shelf life of frozen products through extrapolation of early initial real-time storage analyte recovery should be considered the most accurate method. Although in this study the time required for a prediction was longer than a typical accelerated testing protocol, there are less potential sources of error, reduced costs, and a lower expenditure of resources. © 2016 American Association for Clinical Chemistry.
International Nuclear Information System (INIS)
Pinder, John E.; Rowan, David J.; Smith, Jim T.
2016-01-01
Data from published studies and World Wide Web sources were combined to develop a regression model to predict "1"3"7Cs concentration ratios for saltwater fish. Predictions were developed from 1) numeric trophic levels computed primarily from random resampling of known food items and 2) K concentrations in the saltwater for 65 samplings from 41 different species from both the Atlantic and Pacific Oceans. A number of different models were initially developed and evaluated for accuracy which was assessed as the ratios of independently measured concentration ratios to those predicted by the model. In contrast to freshwater systems, were K concentrations are highly variable and are an important factor in affecting fish concentration ratios, the less variable K concentrations in saltwater were relatively unimportant in affecting concentration ratios. As a result, the simplest model, which used only trophic level as a predictor, had comparable accuracies to more complex models that also included K concentrations. A test of model accuracy involving comparisons of 56 published concentration ratios from 51 species of marine fish to those predicted by the model indicated that 52 of the predicted concentration ratios were within a factor of 2 of the observed concentration ratios. - Highlights: • We developed a model to predict concentration ratios (C_r) for saltwater fish. • The model requires only a single input variable to predict C_r. • That variable is a mean numeric trophic level available at (fishbase.org). • The K concentrations in seawater were not an important predictor variable. • The median-to observed ratio for 56 independently measured C_r was 0.83.
International Nuclear Information System (INIS)
Bukhari, W; Hong, S-M
2015-01-01
Motion-adaptive radiotherapy aims to deliver a conformal dose to the target tumour with minimal normal tissue exposure by compensating for tumour motion in real time. The prediction as well as the gating of respiratory motion have received much attention over the last two decades for reducing the targeting error of the treatment beam due to respiratory motion. In this article, we present a real-time algorithm for predicting and gating respiratory motion that utilizes a model-based and a model-free Bayesian framework by combining them in a cascade structure. The algorithm, named EKF-GPR + , implements a gating function without pre-specifying a particular region of the patient’s breathing cycle. The algorithm first employs an extended Kalman filter (LCM-EKF) to predict the respiratory motion and then uses a model-free Gaussian process regression (GPR) to correct the error of the LCM-EKF prediction. The GPR is a non-parametric Bayesian algorithm that yields predictive variance under Gaussian assumptions. The EKF-GPR + algorithm utilizes the predictive variance from the GPR component to capture the uncertainty in the LCM-EKF prediction error and systematically identify breathing points with a higher probability of large prediction error in advance. This identification allows us to pause the treatment beam over such instances. EKF-GPR + implements the gating function by using simple calculations based on the predictive variance with no additional detection mechanism. A sparse approximation of the GPR algorithm is employed to realize EKF-GPR + in real time. Extensive numerical experiments are performed based on a large database of 304 respiratory motion traces to evaluate EKF-GPR + . The experimental results show that the EKF-GPR + algorithm effectively reduces the prediction error in a root-mean-square (RMS) sense by employing the gating function, albeit at the cost of a reduced duty cycle. As an example, EKF-GPR + reduces the patient-wise RMS error to 37%, 39% and 42
Bukhari, W.; Hong, S.-M.
2015-01-01
Motion-adaptive radiotherapy aims to deliver a conformal dose to the target tumour with minimal normal tissue exposure by compensating for tumour motion in real time. The prediction as well as the gating of respiratory motion have received much attention over the last two decades for reducing the targeting error of the treatment beam due to respiratory motion. In this article, we present a real-time algorithm for predicting and gating respiratory motion that utilizes a model-based and a model-free Bayesian framework by combining them in a cascade structure. The algorithm, named EKF-GPR+, implements a gating function without pre-specifying a particular region of the patient’s breathing cycle. The algorithm first employs an extended Kalman filter (LCM-EKF) to predict the respiratory motion and then uses a model-free Gaussian process regression (GPR) to correct the error of the LCM-EKF prediction. The GPR is a non-parametric Bayesian algorithm that yields predictive variance under Gaussian assumptions. The EKF-GPR+ algorithm utilizes the predictive variance from the GPR component to capture the uncertainty in the LCM-EKF prediction error and systematically identify breathing points with a higher probability of large prediction error in advance. This identification allows us to pause the treatment beam over such instances. EKF-GPR+ implements the gating function by using simple calculations based on the predictive variance with no additional detection mechanism. A sparse approximation of the GPR algorithm is employed to realize EKF-GPR+ in real time. Extensive numerical experiments are performed based on a large database of 304 respiratory motion traces to evaluate EKF-GPR+. The experimental results show that the EKF-GPR+ algorithm effectively reduces the prediction error in a root-mean-square (RMS) sense by employing the gating function, albeit at the cost of a reduced duty cycle. As an example, EKF-GPR+ reduces the patient-wise RMS error to 37%, 39% and 42% in
Bukhari, W; Hong, S-M
2015-01-07
Motion-adaptive radiotherapy aims to deliver a conformal dose to the target tumour with minimal normal tissue exposure by compensating for tumour motion in real time. The prediction as well as the gating of respiratory motion have received much attention over the last two decades for reducing the targeting error of the treatment beam due to respiratory motion. In this article, we present a real-time algorithm for predicting and gating respiratory motion that utilizes a model-based and a model-free Bayesian framework by combining them in a cascade structure. The algorithm, named EKF-GPR(+), implements a gating function without pre-specifying a particular region of the patient's breathing cycle. The algorithm first employs an extended Kalman filter (LCM-EKF) to predict the respiratory motion and then uses a model-free Gaussian process regression (GPR) to correct the error of the LCM-EKF prediction. The GPR is a non-parametric Bayesian algorithm that yields predictive variance under Gaussian assumptions. The EKF-GPR(+) algorithm utilizes the predictive variance from the GPR component to capture the uncertainty in the LCM-EKF prediction error and systematically identify breathing points with a higher probability of large prediction error in advance. This identification allows us to pause the treatment beam over such instances. EKF-GPR(+) implements the gating function by using simple calculations based on the predictive variance with no additional detection mechanism. A sparse approximation of the GPR algorithm is employed to realize EKF-GPR(+) in real time. Extensive numerical experiments are performed based on a large database of 304 respiratory motion traces to evaluate EKF-GPR(+). The experimental results show that the EKF-GPR(+) algorithm effectively reduces the prediction error in a root-mean-square (RMS) sense by employing the gating function, albeit at the cost of a reduced duty cycle. As an example, EKF-GPR(+) reduces the patient-wise RMS error to 37%, 39% and
Paul, Suman; Ali, Muhammad; Chatterjee, Rima
2018-01-01
Velocity of compressional wave ( V P) of coal and non-coal lithology is predicted from five wells from the Bokaro coalfield (CF), India. Shear sonic travel time logs are not recorded for all wells under the study area. Shear wave velocity ( Vs) is available only for two wells: one from east and other from west Bokaro CF. The major lithologies of this CF are dominated by coal, shaly coal of Barakar formation. This paper focuses on the (a) relationship between Vp and Vs, (b) prediction of Vp using regression and neural network modeling and (c) estimation of maximum horizontal stress from image log. Coal characterizes with low acoustic impedance (AI) as compared to the overlying and underlying strata. The cross-plot between AI and Vp/ Vs is able to identify coal, shaly coal, shale and sandstone from wells in Bokaro CF. The relationship between Vp and Vs is obtained with excellent goodness of fit ( R 2) ranging from 0.90 to 0.93. Linear multiple regression and multi-layered feed-forward neural network (MLFN) models are developed for prediction Vp from two wells using four input log parameters: gamma ray, resistivity, bulk density and neutron porosity. Regression model predicted Vp shows poor fit (from R 2 = 0.28) to good fit ( R 2 = 0.79) with the observed velocity. MLFN model predicted Vp indicates satisfactory to good R2 values varying from 0.62 to 0.92 with the observed velocity. Maximum horizontal stress orientation from a well at west Bokaro CF is studied from Formation Micro-Imager (FMI) log. Breakouts and drilling-induced fractures (DIFs) are identified from the FMI log. Breakout length of 4.5 m is oriented towards N60°W whereas the orientation of DIFs for a cumulative length of 26.5 m is varying from N15°E to N35°E. The mean maximum horizontal stress in this CF is towards N28°E.
Alishiri, Gholam Hossein; Bayat, Noushin; Fathi Ashtiani, Ali; Tavallaii, Seyed Abbas; Assari, Shervin; Moharamzad, Yashar
2008-01-01
The aim of this work was to develop two logistic regression models capable of predicting physical and mental health related quality of life (HRQOL) among rheumatoid arthritis (RA) patients. In this cross-sectional study which was conducted during 2006 in the outpatient rheumatology clinic of our university hospital, Short Form 36 (SF-36) was used for HRQOL measurements in 411 RA patients. A cutoff point to define poor versus good HRQOL was calculated using the first quartiles of SF-36 physical and mental component scores (33.4 and 36.8, respectively). Two distinct logistic regression models were used to derive predictive variables including demographic, clinical, and psychological factors. The sensitivity, specificity, and accuracy of each model were calculated. Poor physical HRQOL was positively associated with pain score, disease duration, monthly family income below 300 US$, comorbidity, patient global assessment of disease activity or PGA, and depression (odds ratios: 1.1; 1.004; 15.5; 1.1; 1.02; 2.08, respectively). The variables that entered into the poor mental HRQOL prediction model were monthly family income below 300 US$, comorbidity, PGA, and bodily pain (odds ratios: 6.7; 1.1; 1.01; 1.01, respectively). Optimal sensitivity and specificity were achieved at a cutoff point of 0.39 for the estimated probability of poor physical HRQOL and 0.18 for mental HRQOL. Sensitivity, specificity, and accuracy of the physical and mental models were 73.8, 87, 83.7% and 90.38, 70.36, 75.43%, respectively. The results show that the suggested models can be used to predict poor physical and mental HRQOL separately among RA patients using simple variables with acceptable accuracy. These models can be of use in the clinical decision-making of RA patients and to recognize patients with poor physical or mental HRQOL in advance, for better management.
A temperature rise equation for predicting environmental impact and performance of cooling ponds
Energy Technology Data Exchange (ETDEWEB)
Serag-Eldin, M.A. [American Univ. in Cairo, Cairo (Egypt). Dept. of Mechanical Engineering
2009-07-01
Cooling ponds are used to cool the condenser water used in large central air-conditioning systems. However, larger cooling loads can often increase pond surface evaporation rates. A temperature-rise energy equation was developed to predict temperature rises in cooling ponds subjected to heating loads. The equation was designed to reduce the need for detailed meteorological data as well as to determine the required surface area and depth of the pond for any given design criteria. Energy equations in the presence and absence of cooling loads were subtracted from each other to determine increases in pond temperature resulting from the cooling load. The energy equations include solar radiation, radiation exchange with sky and surroundings, heat convection from the surface, evaporative cooling, heat conducted to the walls, and rate of change of water temperature. Results of the study suggested that the environmental impact and performance of the cooling pond is a function of temperature only. It was concluded that with the aid of the calculated flow field and temperature distribution, the method can be used to position sprays in order to produce near-uniform pond temperatures. 10 refs., 12 figs.
Are the general equations to predict BMR applicable to patients with anorexia nervosa?
Marra, M; Polito, A; De Filippo, E; Cuzzolaro, M; Ciarapica, D; Contaldo, F; Scalfi, L
2002-03-01
To determine whether the general equations to predict basal metabolic rate (BMR) can be reliably applied to female anorectics. Two hundred and thirty-seven female patients with anorexia nervosa (AN) were divided into an adolescent group [n=43, 13-17 yrs, 39.3+/-5.0 kg, body mass index (BMI) (weight/height) 15.5+/-1.8 kg/m2] and a young-adult group (n=194, 18-40 yrs, 40.5+/-6.1 kg, BMI 15.6+/-1.9 kg/m2). BMR values determined by indirect calorimetry were compared with those predicted according to either the WHO/FAO/UNU or the Harris-Benedict general equations, or using the Schebendach correction formula (proposed for adjusting the Harris-Benedict estimates in anorectics). Measured BMR was 3,658+/-665 kJ/day in the adolescent and 3,907+/-760 kJ/day in the young-adult patients. In the adolescent group, the differences between predicted and measured values were (mean+/-SD) 1,466 529 kJ/day (+44+/-21%) for WHO/FAO/UNU, 1,587+/-552 kJ/day (+47+/-23%) for the Harris-Benedict and -20+/-510 kJ/day for the Schebendach (+1+/-13%), while in the young-adult group the corresponding values were 696+/-570 kJ/day (+24+/-24%), 1,252+/-644 kJ/day (+37+/-27%) and -430+/-640 kJ/day (-9+/-16%). The bias was negatively associated with weight and BMI in both groups when using the WHO/FAO/UNU and Harris-Benedict equations, and with age in the young-adult group for the Harris-Benedict and Schebendach equations. The WHO/FAO/UNU and Harris-Benedict equations greatly overestimate BMR in AN. Accurate estimation is to some extent dependent on individual characteristics such as age, weight or BMI. The Schebendach correction formula accurately predicts BMR in female adolescents, but not in young adult women with AN.
Energy Technology Data Exchange (ETDEWEB)
Biyanto, Totok R. [Department of Engineering Physics, Institute Technology of Sepuluh Nopember Surabaya, Surabaya, Indonesia 60111 (Indonesia)
2016-06-03
Fouling in a heat exchanger in Crude Preheat Train (CPT) refinery is an unsolved problem that reduces the plant efficiency, increases fuel consumption and CO{sub 2} emission. The fouling resistance behavior is very complex. It is difficult to develop a model using first principle equation to predict the fouling resistance due to different operating conditions and different crude blends. In this paper, Artificial Neural Networks (ANN) MultiLayer Perceptron (MLP) with input structure using Nonlinear Auto-Regressive with eXogenous (NARX) is utilized to build the fouling resistance model in shell and tube heat exchanger (STHX). The input data of the model are flow rates and temperatures of the streams of the heat exchanger, physical properties of product and crude blend data. This model serves as a predicting tool to optimize operating conditions and preventive maintenance of STHX. The results show that the model can capture the complexity of fouling characteristics in heat exchanger due to thermodynamic conditions and variations in crude oil properties (blends). It was found that the Root Mean Square Error (RMSE) are suitable to capture the nonlinearity and complexity of the STHX fouling resistance during phases of training and validation.
Energy Technology Data Exchange (ETDEWEB)
Jorjani, E.; Poorali, H.A.; Sam, A.; Chelgani, S.C.; Mesroghli, S.; Shayestehfar, M.R. [Islam Azad University, Tehran (Iran). Dept. of Mining Engineering
2009-10-15
In this paper, the combustible value (i.e. 100-Ash) and combustible recovery of coal flotation concentrate were predicted by regression and artificial neural network based on proximate and group macerals analysis. The regression method shows that the relationships between (a) in (ash), volatile matter and moisture (b) in (ash), in (liptinite), fusinite and vitrinite with combustible value can achieve the correlation coefficients (R{sup 2}) of 0.8 and 0.79, respectively. In addition, the input sets of (c) ash, volatile matter and moisture (d) ash, liptinite and fusinite can predict the combustible recovery with the correlation coefficients of 0.84 and 0.63, respectively. Feed-forward artificial neural network with 6-8-12-11-2-1 arrangement for moisture, ash and volatile matter input set was capable to estimate both combustible value and combustible recovery with correlation of 0.95. It was shown that the proposed neural network model could accurately reproduce all the effects of proximate and group macerals analysis on coal flotation system.
International Nuclear Information System (INIS)
Comesanna Garcia, Yumirka; Dago Morales, Angel; Talavera Bustamante, Isneri
2010-01-01
The recently introduction of the least squares support vector machines method for regression purposes in the field of Chemometrics has provided several advantages to linear and nonlinear multivariate calibration methods. The objective of the paper was to propose the use of the least squares support vector machine as an alternative multivariate calibration method for the prediction of the percentage of crystallinity of fluidized catalytic cracking catalysts, by means of Fourier transform mid-infrared spectroscopy. A linear kernel was used in the calculations of the regression model. The optimization of its gamma parameter was carried out using the leave-one-out cross-validation procedure. The root mean square error of prediction was used to measure the performance of the model. The accuracy of the results obtained with the application of the method is in accordance with the uncertainty of the X-ray powder diffraction reference method. To compare the generalization capability of the developed method, a comparison study was carried out, taking into account the results achieved with the new model and those reached through the application of linear calibration methods. The developed method can be easily implemented in refinery laboratories
Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu
2018-02-01
A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model is a combination of the RSS method which is known as an efficient ensemble technique and the CART which is a state of the art classifier. The Luc Yen district of Yen Bai province, a prominent landslide prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of model, ten important landslide affecting factors related with geomorphology, geology and geo-environment were considered namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) is the best compared with other popular landslide models namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that performance of the RSSCART is a promising method for spatial landslide prediction.
Directory of Open Access Journals (Sweden)
Shokri Saeid
2015-01-01
Full Text Available An accurate prediction of sulfur content is very important for the proper operation and product quality control in hydrodesulfurization (HDS process. For this purpose, a reliable data- driven soft sensors utilizing Support Vector Regression (SVR was developed and the effects of integrating Vector Quantization (VQ with Principle Component Analysis (PCA were studied on the assessment of this soft sensor. First, in pre-processing step the PCA and VQ techniques were used to reduce dimensions of the original input datasets. Then, the compressed datasets were used as input variables for the SVR model. Experimental data from the HDS setup were employed to validate the proposed integrated model. The integration of VQ/PCA techniques with SVR model was able to increase the prediction accuracy of SVR. The obtained results show that integrated technique (VQ-SVR was better than (PCA-SVR in prediction accuracy. Also, VQ decreased the sum of the training and test time of SVR model in comparison with PCA. For further evaluation, the performance of VQ-SVR model was also compared to that of SVR. The obtained results indicated that VQ-SVR model delivered the best satisfactory predicting performance (AARE= 0.0668 and R2= 0.995 in comparison with investigated models.
International Nuclear Information System (INIS)
Jahandideh, Sepideh; Jahandideh, Samad; Asadabadi, Ebrahim Barzegari; Askarian, Mehrdad; Movahedi, Mohammad Mehdi; Hosseini, Somayyeh; Jahandideh, Mina
2009-01-01
Prediction of the amount of hospital waste production will be helpful in the storage, transportation and disposal of hospital waste management. Based on this fact, two predictor models including artificial neural networks (ANNs) and multiple linear regression (MLR) were applied to predict the rate of medical waste generation totally and in different types of sharp, infectious and general. In this study, a 5-fold cross-validation procedure on a database containing total of 50 hospitals of Fars province (Iran) were used to verify the performance of the models. Three performance measures including MAR, RMSE and R 2 were used to evaluate performance of models. The MLR as a conventional model obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as more significant parameters. On the other hand, ANNs as a more powerful model, which has not been introduced in predicting rate of medical waste generation, showed high performance measure values, especially 0.99 value of R 2 confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving which provides the opportunity for relating independent variables to dependent ones non-linearly. In conclusion, the obtained results showed that our ANN-based model approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in future.
Directory of Open Access Journals (Sweden)
Yuehjen E. Shao
2013-01-01
Full Text Available Because the volume of currency issued by a country always affects its interest rate, price index, income levels, and many other important macroeconomic variables, the prediction of currency volume issued has attracted considerable attention in recent years. In contrast to the typical single-stage forecast model, this study proposes a hybrid forecasting approach to predict the volume of currency issued in Taiwan. The proposed hybrid models consist of artificial neural network (ANN and multiple regression (MR components. The MR component of the hybrid models is established for a selection of fewer explanatory variables, wherein the selected variables are of higher importance. The ANN component is then designed to generate forecasts based on those important explanatory variables. Subsequently, the model is used to analyze a real dataset of Taiwan's currency from 1996 to 2011 and twenty associated explanatory variables. The prediction results reveal that the proposed hybrid scheme exhibits superior forecasting performance for predicting the volume of currency issued in Taiwan.
Verachtert, E.; Den Eeckhaut, M. Van; Poesen, J.; Govers, G.; Deckers, J.
2011-07-01
Soil piping (tunnel erosion) has been recognised as an important erosion process in collapsible loess-derived soils of temperate humid climates, which can cause collapse of the topsoil and formation of discontinuous gullies. Information about the spatial patterns of collapsed pipes and regional models describing these patterns is still limited. Therefore, this study aims at better understanding the factors controlling the spatial distribution and predicting pipe collapse. A dataset with parcels suffering from collapsed pipes (n = 560) and parcels without collapsed pipes was obtained through a regional survey in a 236 km² study area in the Flemish Ardennes (Belgium). Logistic regression was applied to find the best model describing the relationship between the presence/absence of a collapsed pipe and a set of independent explanatory variables (i.e. slope gradient, drainage area, distance-to-thalweg, curvature, aspect, soil type and lithology). Special attention was paid to the selection procedure of the grid cells without collapsed pipes. Apart from the first piping susceptibility map created by logistic regression modelling, a second map was made based on topographical thresholds of slope gradient and upslope drainage area. The logistic regression model allowed identification of the most important factors controlling pipe collapse. Pipes are much more likely to occur when a topographical threshold depending on both slope gradient and upslope area is exceeded in zones with a sufficient water supply (due to topographical convergence and/or the presence of a clay-rich lithology). On the other hand, the use of slope-area thresholds only results in reasonable predictions of piping susceptibility, with minimum information.
Sjølie, A K; Klein, R; Porta, M; Orchard, T; Fuller, J; Parving, H H; Bilous, R; Aldington, S; Chaturvedi, N
2011-03-01
To study the association between baseline retinal microaneurysm score and progression and regression of diabetic retinopathy, and response to treatment with candesartan in people with diabetes. This was a multicenter randomized clinical trial. The progression analysis included 893 patients with Type 1 diabetes and 526 patients with Type 2 diabetes with retinal microaneurysms only at baseline. For regression, 438 with Type 1 and 216 with Type 2 diabetes qualified. Microaneurysms were scored from yearly retinal photographs according to the Early Treatment Diabetic Retinopathy Study (ETDRS) protocol. Retinopathy progression and regression was defined as two or more step change on the ETDRS scale from baseline. Patients were normoalbuminuric, and normotensive with Type 1 and Type 2 diabetes or treated hypertensive with Type 2 diabetes. They were randomized to treatment with candesartan 32 mg daily or placebo and followed for 4.6 years. A higher microaneurysm score at baseline predicted an increased risk of retinopathy progression (HR per microaneurysm score 1.08, P diabetes; HR 1.07, P = 0.0174 in Type 2 diabetes) and reduced the likelihood of regression (HR 0.79, P diabetes; HR 0.85, P = 0.0009 in Type 2 diabetes), all adjusted for baseline variables and treatment. Candesartan reduced the risk of microaneurysm score progression. Microaneurysm counts are important prognostic indicators for worsening of retinopathy, thus microaneurysms are not benign. Treatment with renin-angiotensin system inhibitors is effective in the early stages and may improve mild diabetic retinopathy. Microaneurysm scores may be useful surrogate endpoints in clinical trials. © 2011 The Authors. Diabetic Medicine © 2011 Diabetes UK.
Zhang, Xinyan; Li, Bingzong; Han, Huiying; Song, Sha; Xu, Hongxia; Hong, Yating; Yi, Nengjun; Zhuang, Wenzhuo
2018-05-10
Multiple myeloma (MM), like other cancers, is caused by the accumulation of genetic abnormalities. Heterogeneity exists in the patients' response to treatments, for example, bortezomib. This urges efforts to identify biomarkers from numerous molecular features and build predictive models for identifying patients that can benefit from a certain treatment scheme. However, previous studies treated the multi-level ordinal drug response as a binary response where only responsive and non-responsive groups are considered. It is desirable to directly analyze the multi-level drug response, rather than combining the response to two groups. In this study, we present a novel method to identify significantly associated biomarkers and then develop ordinal genomic classifier using the hierarchical ordinal logistic model. The proposed hierarchical ordinal logistic model employs the heavy-tailed Cauchy prior on the coefficients and is fitted by an efficient quasi-Newton algorithm. We apply our hierarchical ordinal regression approach to analyze two publicly available datasets for MM with five-level drug response and numerous gene expression measures. Our results show that our method is able to identify genes associated with the multi-level drug response and to generate powerful predictive models for predicting the multi-level response. The proposed method allows us to jointly fit numerous correlated predictors and thus build efficient models for predicting the multi-level drug response. The predictive model for the multi-level drug response can be more informative than the previous approaches. Thus, the proposed approach provides a powerful tool for predicting multi-level drug response and has important impact on cancer studies.
Fathallah, F A; Marras, W S; Parnianpour, M
1999-09-01
Most biomechanical assessments of spinal loading during industrial work have focused on estimating peak spinal compressive forces under static and sagittally symmetric conditions. The main objective of this study was to explore the potential of feasibly predicting three-dimensional (3D) spinal loading in industry from various combinations of trunk kinematics, kinetics, and subject-load characteristics. The study used spinal loading, predicted by a validated electromyography-assisted model, from 11 male participants who performed a series of symmetric and asymmetric lifts. Three classes of models were developed: (a) models using workplace, subject, and trunk motion parameters as independent variables (kinematic models); (b) models using workplace, subject, and measured moments variables (kinetic models); and (c) models incorporating workplace, subject, trunk motion, and measured moments variables (combined models). The results showed that peak 3D spinal loading during symmetric and asymmetric lifting were predicted equally well using all three types of regression models. Continuous 3D loading was predicted best using the combined models. When the use of such models is infeasible, the kinematic models can provide adequate predictions. Finally, lateral shear forces (peak and continuous) were consistently underestimated using all three types of models. The study demonstrated the feasibility of predicting 3D loads on the spine under specific symmetric and asymmetric lifting tasks without the need for collecting EMG information. However, further validation and development of the models should be conducted to assess and extend their applicability to lifting conditions other than those presented in this study. Actual or potential applications of this research include exposure assessment in epidemiological studies, ergonomic intervention, and laboratory task assessment.
Kaklamanos, James; Baise, Laurie G.; Boore, David M.
2011-01-01
The ground-motion prediction equations (GMPEs) developed as part of the Next Generation Attenuation of Ground Motions (NGA-West) project in 2008 are becoming widely used in seismic hazard analyses. However, these new models are considerably more complicated than previous GMPEs, and they require several more input parameters. When employing the NGA models, users routinely face situations in which some of the required input parameters are unknown. In this paper, we present a framework for estimating the unknown source, path, and site parameters when implementing the NGA models in engineering practice, and we derive geometrically-based equations relating the three distance measures found in the NGA models. Our intent is for the content of this paper not only to make the NGA models more accessible, but also to help with the implementation of other present or future GMPEs.
Dolecheck, K A; Heersche, G; Bewley, J M
2016-12-01
Assessing the economic implications of investing in automated estrus detection (AED) technologies can be overwhelming for dairy producers. The objectives of this study were to develop new regression equations for estimating the cost per day open (DO) and to apply the results to create a user-friendly, partial budget, decision support tool for investment analysis of AED technologies. In the resulting decision support tool, the end user can adjust herd-specific inputs regarding general management, current reproductive management strategies, and the proposed AED system. Outputs include expected DO, reproductive cull rate, net present value, and payback period for the proposed AED system. Utility of the decision support tool was demonstrated with an example dairy herd created using data from DairyMetrics (Dairy Records Management Systems, Raleigh, NC), Food and Agricultural Policy Research Institute (Columbia, MO), and published literature. Resulting herd size, rolling herd average milk production, milk price, and feed cost were 323 cows, 10,758kg, $0.41/kg, and $0.20/kg of dry matter, respectively. Automated estrus detection technologies with 2 levels of initial system cost (low: $5,000 vs. high: $10,000), tag price (low: $50 vs. high: $100), and estrus detection rate (low: 60% vs. high: 80%) were compared over a 7-yr investment period. Four scenarios were considered in a demonstration of the investment analysis tool: (1) a herd using 100% visual observation for estrus detection before adopting 100% AED, (2) a herd using 100% visual observation before adopting 75% AED and 25% visual observation, (3) a herd using 100% timed artificial insemination (TAI) before adopting 100% AED, and (4) a herd using 100% TAI before adopting 75% AED and 25% TAI. Net present value in scenarios 1 and 2 was always positive, indicating a positive investment situation. Net present value in scenarios 3 and 4 was always positive in combinations using a $50 tag price, and in scenario 4, the $5
Performance prediction of gas turbines by solving a system of non-linear equations
Energy Technology Data Exchange (ETDEWEB)
Kaikko, J
1998-09-01
This study presents a novel method for implementing the performance prediction of gas turbines from the component models. It is based on solving the non-linear set of equations that corresponds to the process equations, and the mass and energy balances for the engine. General models have been presented for determining the steady state operation of single components. Single and multiple shad arrangements have been examined with consideration also being given to heat regeneration and intercooling. Emphasis has been placed upon axial gas turbines of an industrial scale. Applying the models requires no information of the structural dimensions of the gas turbines. On comparison with the commonly applied component matching procedures, this method incorporates several advantages. The application of the models for providing results is facilitated as less attention needs to be paid to calculation sequences and routines. Solving the set of equations is based on zeroing co-ordinate functions that are directly derived from the modelling equations. Therefore, controlling the accuracy of the results is easy. This method gives more freedom for the selection of the modelling parameters since, unlike for the matching procedures, exchanging these criteria does not itself affect the algorithms. Implicit relationships between the variables are of no significance, thus increasing the freedom for the modelling equations as well. The mathematical models developed in this thesis will provide facilities to optimise the operation of any major gas turbine configuration with respect to the desired process parameters. The computational methods used in this study may also be adapted to any other modelling problems arising in industry. (orig.) 36 refs.
Baydaroğlu, Özlem; Koçak, Kasım; Duran, Kemal
2018-06-01
Prediction of water amount that will enter the reservoirs in the following month is of vital importance especially for semi-arid countries like Turkey. Climate projections emphasize that water scarcity will be one of the serious problems in the future. This study presents a methodology for predicting river flow for the subsequent month based on the time series of observed monthly river flow with hybrid models of support vector regression (SVR). Monthly river flow over the period 1940-2012 observed for the Kızılırmak River in Turkey has been used for training the method, which then has been applied for predictions over a period of 3 years. SVR is a specific implementation of support vector machines (SVMs), which transforms the observed input data time series into a high-dimensional feature space (input matrix) by way of a kernel function and performs a linear regression in this space. SVR requires a special input matrix. The input matrix was produced by wavelet transforms (WT), singular spectrum analysis (SSA), and a chaotic approach (CA) applied to the input time series. WT convolutes the original time series into a series of wavelets, and SSA decomposes the time series into a trend, an oscillatory and a noise component by singular value decomposition. CA uses a phase space formed by trajectories, which represent the dynamics producing the time series. These three methods for producing the input matrix for the SVR proved successful, while the SVR-WT combination resulted in the highest coefficient of determination and the lowest mean absolute error.
Nowell, Lisa H.; Crawford, Charles G.; Gilliom, Robert J.; Nakagaki, Naomi; Stone, Wesley W.; Thelin, Gail; Wolock, David M.
2009-01-01
Empirical regression models were developed for estimating concentrations of dieldrin, total chlordane, and total DDT in whole fish from U.S. streams. Models were based on pesticide concentrations measured in whole fish at 648 stream sites nationwide (1992-2001) as part of the U.S. Geological Survey's National Water Quality Assessment Program. Explanatory variables included fish lipid content, estimates (or surrogates) representing historical agricultural and urban sources, watershed characteristics, and geographic location. Models were developed using Tobit regression methods appropriate for data with censoring. Typically, the models explain approximately 50 to 70% of the variability in pesticide concentrations measured in whole fish. The models were used to predict pesticide concentrations in whole fish for streams nationwide using the U.S. Environmental Protection Agency's River Reach File 1 and to estimate the probability that whole-fish concentrations exceed benchmarks for protection of fish-eating wildlife. Predicted concentrations were highest for dieldrin in the Corn Belt, Texas, and scattered urban areas; for total chlordane in the Corn Belt, Texas, the Southeast, and urbanized Northeast; and for total DDT in the Southeast, Texas, California, and urban areas nationwide. The probability of exceeding wildlife benchmarks for dieldrin and chlordane was predicted to be low for most U.S. streams. The probability of exceeding wildlife benchmarks for total DDT is higher but varies depending on the fish taxon and on the benchmark used. Because the models in the present study are based on fish data collected during the 1990s and organochlorine pesticide residues in the environment continue to decline decades after their uses were discontinued, these models may overestimate present-day pesticide concentrations in fish. ?? 2009 SETAC.
Bejaei, M; Wiseman, K; Cheng, K M
2015-01-01
Consumers' interest in specialty eggs appears to be growing in Europe and North America. The objective of this research was to develop logistic regression models that utilise purchaser attributes and demographics to predict the probability of a consumer purchasing a specific type of table egg including regular (white and brown), non-caged (free-run, free-range and organic) or nutrient-enhanced eggs. These purchase prediction models, together with the purchasers' attributes, can be used to assess market opportunities of different egg types specifically in British Columbia (BC). An online survey was used to gather data for the models. A total of 702 completed questionnaires were submitted by BC residents. Selected independent variables included in the logistic regression to develop models for different egg types to predict the probability of a consumer purchasing a specific type of table egg. The variables used in the model accounted for 54% and 49% of variances in the purchase of regular and non-caged eggs, respectively. Research results indicate that consumers of different egg types exhibit a set of unique and statistically significant characteristics and/or demographics. For example, consumers of regular eggs were less educated, older, price sensitive, major chain store buyers, and store flyer users, and had lower awareness about different types of eggs and less concern regarding animal welfare issues. However, most of the non-caged egg consumers were less concerned about price, had higher awareness about different types of table eggs, purchased their eggs from local/organic grocery stores, farm gates or farmers markets, and they were more concerned about care and feeding of hens compared to consumers of other eggs types.
Fast integration-based prediction bands for ordinary differential equation models.
Hass, Helge; Kreutz, Clemens; Timmer, Jens; Kaschek, Daniel
2016-04-15
To gain a deeper understanding of biological processes and their relevance in disease, mathematical models are built upon experimental data. Uncertainty in the data leads to uncertainties of the model's parameters and in turn to uncertainties of predictions. Mechanistic dynamic models of biochemical networks are frequently based on nonlinear differential equation systems and feature a large number of parameters, sparse observations of the model components and lack of information in the available data. Due to the curse of dimensionality, classical and sampling approaches propagating parameter uncertainties to predictions are hardly feasible and insufficient. However, for experimental design and to discriminate between competing models, prediction and confidence bands are essential. To circumvent the hurdles of the former methods, an approach to calculate a profile likelihood on arbitrary observations for a specific time point has been introduced, which provides accurate confidence and prediction intervals for nonlinear models and is computationally feasible for high-dimensional models. In this article, reliable and smooth point-wise prediction and confidence bands to assess the model's uncertainty on the whole time-course are achieved via explicit integration with elaborate correction mechanisms. The corresponding system of ordinary differential equations is derived and tested on three established models for cellular signalling. An efficiency analysis is performed to illustrate the computational benefit compared with repeated profile likelihood calculations at multiple time points. The integration framework and the examples used in this article are provided with the software package Data2Dynamics, which is based on MATLAB and freely available at http://www.data2dynamics.org helge.hass@fdm.uni-freiburg.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e
McLaren, Christine E.; Chen, Wen-Pin; Nie, Ke; Su, Min-Ying
2009-01-01
predictive ability when a small number of variables were chosen. The robust ANN methodology utilizes a sophisticated non-linear model, while logistic regression analysis provides insightful information to enhance interpretation of the model features. PMID:19409817
Hippisley-Cox, Julia; Coupland, Carol
2017-01-01
Objective: To develop and externally validate risk prediction equations to estimate absolute and conditional survival in patients with colorectal cancer. \\ud \\ud Design: Cohort study.\\ud \\ud Setting: General practices in England providing data for the QResearch database linked to the national cancer registry.\\ud \\ud Participants: 44 145 patients aged 15-99 with colorectal cancer from 947 practices to derive the equations. The equations were validated in 15 214 patients with colorectal cancer ...
Directory of Open Access Journals (Sweden)
Sadegh Maleki
2013-11-01
Full Text Available The goal of this study was to present regression models for predicting resistance of joints made with screw and plywood members. Joint members were out of hardwood plywood that were 19 mm in thickness. Two types of screws including coarse and fine thread drywall screw with 3.5, 4 and 5mm in diameter and sheet metal screw with 4 and 5mm were used. Results have shown that withdrawal resistance of screw was increased by increasing of screws, diameter and penetrating depth. Joints fabricated with coarse thread drywall screws were higher than those of fine thread drywall screws. Finally, average joint withdrawal resistance of screwed could be predicted by means of the expressions Wc=2.127×D1.072×P0.520 for coarse thread drywall screws and Wf=1.377×D1.156×P0.581 for fine thread drywall screws by taking account the diameter and penetrating depth. The difference of the observed and predicted data showed that developed models have a good correlation with actual experimental measurements.
Deconinck, E; Zhang, M H; Petitet, F; Dubus, E; Ijjaali, I; Coomans, D; Vander Heyden, Y
2008-02-18
The use of some unconventional non-linear modeling techniques, i.e. classification and regression trees and multivariate adaptive regression splines-based methods, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structural and pharmacological diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It could be concluded that the use of combinations of a linear with a non-linear modeling technique results in some improved properties compared to the individual linear and non-linear models and that, when the use of such a combination is appropriate, combinations using MARS as non-linear technique should be preferred over those with BRT, due to some serious drawbacks of the BRT approaches.
Directory of Open Access Journals (Sweden)
Isabela M. B. S. Pessoa
2014-10-01
Full Text Available Background: The maximum static respiratory pressures, namely the maximum inspiratory pressure (MIP and maximum expiratory pressure (MEP, reflect the strength of the respiratory muscles. These measures are simple, non-invasive, and have established diagnostic and prognostic value. This study is the first to examine the maximum respiratory pressures within the Brazilian population according to the recommendations proposed by the American Thoracic Society and European Respiratory Society (ATS/ERS and the Brazilian Thoracic Association (SBPT. Objective: To establish reference equations, mean values, and lower limits of normality for MIP and MEP for each age group and sex, as recommended by the ATS/ERS and SBPT. Method: We recruited 134 Brazilians living in Belo Horizonte, MG, Brazil, aged 20-89 years, with a normal pulmonary function test and a body mass index within the normal range. We used a digital manometer that operationalized the variable maximum average pressure (MIP/MEP. At least five tests were performed for both MIP and MEP to take into account a possible learning effect. Results: We evaluated 74 women and 60 men. The equations were as follows: MIP=63.27-0.55 (age+17.96 (gender+0.58 (weight, r2 of 34% and MEP= - 61.41+2.29 (age - 0.03(age2+33.72 (gender+1.40 (waist, r2 of 49%. Conclusion: In clinical practice, these equations could be used to calculate the predicted values of MIP and MEP for the Brazilian population.
Energy Technology Data Exchange (ETDEWEB)
Brear, D.J. [Power Reactor and Nuclear Fuel Development Corp., Oarai, Ibaraki (Japan). Oarai Engineering Center
1998-01-01
When liquid fuel makes contact with steel structure the liquid can freeze as a crust and the structure can melt at the surface. The melting and freezing processes that occur can influence the mode of fuel freezing and hence fuel relocation. Furthermore the temperature gradients established in the fuel and steel phases determine the rate at which heat is transferred from fuel to steel. In this memo the 1-D transient heat conduction equations are applied to the case of initially liquid UO{sub 2} brought into contact with solid steel using up-to-date materials properties. The solutions predict criteria for fuel crust formation and steel melting and provide a simple algorithm to determine the interface temperature when one or both of the materials is undergoing phase change. The predicted steel melting criterion is compared with available experimental results. (author)
International Nuclear Information System (INIS)
Brear, D.J.
1998-01-01
When liquid fuel makes contact with steel structure the liquid can freeze as a crust and the structure can melt at the surface. The melting and freezing processes that occur can influence the mode of fuel freezing and hence fuel relocation. Furthermore the temperature gradients established in the fuel and steel phases determine the rate at which heat is transferred from fuel to steel. In this memo the 1-D transient heat conduction equations are applied to the case of initially liquid UO 2 brought into contact with solid steel using up-to-date materials properties. The solutions predict criteria for fuel crust formation and steel melting and provide a simple algorithm to determine the interface temperature when one or both of the materials is undergoing phase change. The predicted steel melting criterion is compared with available experimental results. (author)
Directory of Open Access Journals (Sweden)
Glauco H.S. Mendes
2013-09-01
Full Text Available Critical success factors in new product development (NPD in the Brazilian small and medium enterprises (SMEs are identified and analyzed. Critical success factors are best practices that can be used to improve NPD management and performance in a company. However, the traditional method for identifying these factors is survey methods. Subsequently, the collected data are reduced through traditional multivariate analysis. The objective of this work is to develop a logistic regression model for predicting the success or failure of the new product development. This model allows for an evaluation and prioritization of resource commitments. The results will be helpful for guiding management actions, as one way to improve NPD performance in those industries.
Directory of Open Access Journals (Sweden)
Adel T. Abbas
2017-01-01
Full Text Available The Grade-H high strength steel is used in the manufacturing of many civilian and military products. The procedures of manufacturing these parts have several turning operations. The key factors for the manufacturing of these parts are the accuracy, surface roughness (Ra, and material removal rate (MRR. The production line of these parts contains many CNC turning machines to get good accuracy and repeatability. The manufacturing engineer should fulfill the required surface roughness value according to the design drawing from first trail (otherwise these parts will be rejected as well as keeping his eye on maximum metal removal rate. The rejection of these parts at any processing stage will represent huge problems to any factory because the processing and raw material of these parts are very expensive. In this paper the artificial neural network was used for predicting the surface roughness for different cutting parameters in CNC turning operations. These parameters were investigated to get the minimum surface roughness. In addition, a mathematical model for surface roughness was obtained from the experimental data using a regression analysis method. The experimental data are then compared with both the regression analysis results and ANFIS (Adaptive Network-based Fuzzy Inference System estimations.
Effect of creatinine assay calibration on glomerular filtration rate prediction by MDRD equation
Directory of Open Access Journals (Sweden)
Débora Spessatto
2009-01-01
Full Text Available Background: The evaluation of renal function should be performed with glomerular filtration rate (GFR estimation employing the Modification of Diet in Renal Disease (MDRD study equation, which includes age, gender, ethnicity and serum creatinine. However, creatinine methods require traceability with standardized methods. Objective: To analyse the impact of creatinine calibration on MDRD calculated GFR. Methods: 140 samples of plasma with creatinine values <2,0 mg/dl were analysed by Jaffé’s reaction with Creatinina Modular P (Roche ®; method A; reference and Creatinina Advia 1650 (Bayer ®; method B; non-standardized. The results with the different methods were compared and aligned with standardized method through a conversion formula. MDRD GFR was estimated. Results: Values were higher for method B (1.03 ± 0.29 vs. 0.86 ± 0.32 mg/dl, P<0.001. This difference declined when methods were aligned with the equation y=1.07x -0.249, and the aligned values were 0,9 ± 0,31 mg/dl. Non-traceable creatinine methods misclassificaed chronic kidney disease in 10% more (false positive. This disagreement disappeared after the regression alignment. Conclusion: Creatinine method calibration has a large impact over the final results of serum creatinine and GFR. The alignment of the non-standardized results through conversion formulas is a reasonable alternative to harmonize serum creatinine results while waiting for the full implementation of international standardization programs.
Energy Technology Data Exchange (ETDEWEB)
Zhang, Yan-Feng; Dai, Shu-Gui [College of Environmental Science and Engineering, Nankai University, Key Laboratory for Pollution Process and Environmental Criteria of Ministry of Education, Tianjin (China); Ma, Yi [College of Chemistry, Nankai University, Institute of Elemento-Organic Chemistry, Tianjin (China); Gao, Zhi-Xian [Institute of Hygiene and Environmental Medicine, Tianjin (China)
2010-07-15
Immunoassays have been regarded as a possible alternative or supplement for measuring polycyclic aromatic hydrocarbons (PAHs) in the environment. Since there are too many potential cross-reactants for PAH immunoassays, it is difficult to determine all the cross-reactivities (CRs) by experimental tests. The relationship between CR and the physical-chemical properties of PAHs and related compounds was investigated using the CR data from a commercial enzyme-linked immunosorbent assay (ELISA) kit test. Two quantitative structure-activity relationship (QSAR) techniques, regression analysis and comparative molecular field analysis (CoMFA), were applied for predicting the CR of PAHs in this ELISA kit. Parabolic regression indicates that the CRs are significantly correlated with the logarithm of the partition coefficient for the octanol-water system (log K{sub ow}) (r{sup 2}=0.643, n=23, P<0.0001), suggesting that hydrophobic interactions play an important role in the antigen-antibody binding and the cross-reactions in this ELISA test. The CoMFA model obtained shows that the CRs of the PAHs are correlated with the 3D structure of the molecules (r{sub cv}{sup 2}=0.663, r{sup 2}=0.873, F{sub 4,32}=55.086). The contributions of the steric and electrostatic fields to CR were 40.4 and 59.6%, respectively. Both of the QSAR models satisfactorily predict the CR in this PAH immunoassay kit, and help in understanding the mechanisms of antigen-antibody interaction. (orig.)
International Nuclear Information System (INIS)
Atsumi, Kazushige; Shioyama, Yoshiyuki; Nakamura, Katsumasa
2010-01-01
The purpose of this retrospective study was to clarify the predictive factors correlated with esophageal stenosis within three months after radiation therapy for locally advanced esophageal cancer. We enrolled 47 patients with advanced esophageal cancer with T2-4 and stage II-III who were treated with definitive radiation therapy and achieving complete response of primary lesion at Kyushu University Hospital between January 1998 and December 2005. Esophagography was performed for all patients before treatment and within three months after completion of the radiation therapy, the esophageal stenotic ratio was evaluated. The stenotic ratio was used to define four levels of stenosis: stenosis level 1, stenotic ratio of 0-25%; 2, 25-50%; 3, 50-75%; 4, 75-100%. We then estimated the correlation between the esophageal stenosis level after radiation therapy and each of numerous factors. The numbers and total percentages of patients at each stenosis level were as follows: level 1: n=14 (30%); level 2: 8 (17%); level 3: 14 (30%); and level 4: 11 (23%). Esophageal stenosis in the case of full circumference involvement tended to be more severe and more frequent. Increases in wall thickness tended to be associated with increases in esophageal stenosis severity and frequency. The extent of involved circumference and wall thickness of tumor region were significantly correlated with esophageal stenosis associated with tumor regression in radiation therapy (p=0.0006, p=0.005). For predicting the possibility of esophageal stenosis with tumor regression within three months in radiation therapy, the extent of involved circumference and esophageal wall thickness of the tumor region may be useful. (author)
Fei, Yang; Hu, Jian; Gao, Kun; Tu, Jianfeng; Li, Wei-Qin; Wang, Wei
2017-06-01
To construct a radical basis function (RBF) artificial neural networks (ANNs) model to predict the incidence of acute pancreatitis (AP)-induced portal vein thrombosis. The analysis included 353 patients with AP who had admitted between January 2011 and December 2015. RBF ANNs model and logistic regression model were constructed based on eleven factors relevant to AP respectively. Statistical indexes were used to evaluate the value of the prediction in two models. The predict sensitivity, specificity, positive predictive value, negative predictive value and accuracy by RBF ANNs model for PVT were 73.3%, 91.4%, 68.8%, 93.0% and 87.7%, respectively. There were significant differences between the RBF ANNs and logistic regression models in these parameters (Plogistic regression model. D-dimer, AMY, Hct and PT were important prediction factors of approval for AP-induced PVT. Copyright © 2017 Elsevier Inc. All rights reserved.
Silva, A L; DeVries, T J; Tedeschi, L O; Marcondes, M I
2018-04-16
There is a lack of studies that provide models or equations capable of predicting starter feed intake (SFI) for milk-fed dairy calves. Therefore, a multi-study analysis was conducted to identify variables that influence SFI, and to develop equations to predict SFI in milk-fed dairy calves up to 64 days of age. The database was composed of individual data of 176 calves from eight experiments, totaling 6426 daily observations of intake. The information collected from the studies were: birth BW (kg), SFI (kg/day), fluid milk or milk replacer intake (MI; l/day), sex (male or female), breed (Holstein or Holstein×Gyr crossbred) and age (days). Correlations between SFI and the quantitative variables MI, birth BW, metabolic birth BW, fat intake, CP intake, metabolizable energy intake, and age were calculated. Subsequently, data were graphed, and based on a visual appraisal of the pattern of the data, an exponential function was chosen. Data were evaluated using a meta-analysis approach to estimate fixed and random effects of the experiments using nonlinear mixed coefficient statistical models. A negative correlation between SFI and MI was observed (r=-0.39), but age was positively correlated with SFI (r=0.66). No effect of liquid feed source (milk or milk replacer) was observed in developing the equation. Two equations, significantly different for all parameters, were fit to predict SFI for calves that consume less than 5 (SFI5) l/day of milk or milk replacer: ${\\rm SFI}_{{\\,\\lt\\,5}} {\\equals}0.1839_{{\\,\\pm\\,0.0581}} {\\times}{\\rm MI}{\\times}{\\rm exp}^{{\\left( {\\left( {0.0333_{{\\,\\pm\\,0.0021 }} {\\minus}0.0040_{{\\,\\pm\\,0.0011}} {\\times}{\\rm MI}} \\right){\\times}\\left( {{\\rm A}{\\minus}{\\rm }\\left( {0.8302_{{\\,\\pm\\,0.5092}} {\\plus}6.0332_{{\\,\\pm\\,0.3583}} {\\times}{\\rm MI}} \\right)} \\right)} \\right)}} {\\minus}\\left( {0.12{\\times}{\\rm MI}} \\right)$ ; ${\\rm SFI}_{{\\,\\gt\\,5}} {\\equals}0.1225_{{\\,\\pm\\,0.0005 }} {\\times
International Nuclear Information System (INIS)
Boutilier, J; Chan, T; Lee, T; Craig, T; Sharpe, M
2014-01-01
Purpose: To develop a statistical model that predicts optimization objective function weights from patient geometry for intensity-modulation radiotherapy (IMRT) of prostate cancer. Methods: A previously developed inverse optimization method (IOM) is applied retrospectively to determine optimal weights for 51 treated patients. We use an overlap volume ratio (OVR) of bladder and rectum for different PTV expansions in order to quantify patient geometry in explanatory variables. Using the optimal weights as ground truth, we develop and train a logistic regression (LR) model to predict the rectum weight and thus the bladder weight. Post hoc, we fix the weights of the left femoral head, right femoral head, and an artificial structure that encourages conformity to the population average while normalizing the bladder and rectum weights accordingly. The population average of objective function weights is used for comparison. Results: The OVR at 0.7cm was found to be the most predictive of the rectum weights. The LR model performance is statistically significant when compared to the population average over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and mean voxel dose to the bladder, rectum, CTV, and PTV. On average, the LR model predicted bladder and rectum weights that are both 63% closer to the optimal weights compared to the population average. The treatment plans resulting from the LR weights have, on average, a rectum V70Gy that is 35% closer to the clinical plan and a bladder V70Gy that is 43% closer. Similar results are seen for bladder V54Gy and rectum V54Gy. Conclusion: Statistical modelling from patient anatomy can be used to determine objective function weights in IMRT for prostate cancer. Our method allows the treatment planners to begin the personalization process from an informed starting point, which may lead to more consistent clinical plans and reduce overall planning time
Energy Technology Data Exchange (ETDEWEB)
Boutilier, J; Chan, T; Lee, T [University of Toronto, Toronto, Ontario (Canada); Craig, T; Sharpe, M [University of Toronto, Toronto, Ontario (Canada); The Princess Margaret Cancer Centre - UHN, Toronto, ON (Canada)
2014-06-15
Purpose: To develop a statistical model that predicts optimization objective function weights from patient geometry for intensity-modulation radiotherapy (IMRT) of prostate cancer. Methods: A previously developed inverse optimization method (IOM) is applied retrospectively to determine optimal weights for 51 treated patients. We use an overlap volume ratio (OVR) of bladder and rectum for different PTV expansions in order to quantify patient geometry in explanatory variables. Using the optimal weights as ground truth, we develop and train a logistic regression (LR) model to predict the rectum weight and thus the bladder weight. Post hoc, we fix the weights of the left femoral head, right femoral head, and an artificial structure that encourages conformity to the population average while normalizing the bladder and rectum weights accordingly. The population average of objective function weights is used for comparison. Results: The OVR at 0.7cm was found to be the most predictive of the rectum weights. The LR model performance is statistically significant when compared to the population average over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and mean voxel dose to the bladder, rectum, CTV, and PTV. On average, the LR model predicted bladder and rectum weights that are both 63% closer to the optimal weights compared to the population average. The treatment plans resulting from the LR weights have, on average, a rectum V70Gy that is 35% closer to the clinical plan and a bladder V70Gy that is 43% closer. Similar results are seen for bladder V54Gy and rectum V54Gy. Conclusion: Statistical modelling from patient anatomy can be used to determine objective function weights in IMRT for prostate cancer. Our method allows the treatment planners to begin the personalization process from an informed starting point, which may lead to more consistent clinical plans and reduce overall planning time.
Directory of Open Access Journals (Sweden)
Hon-Yi Shi
Full Text Available BACKGROUND: Since most published articles comparing the performance of artificial neural network (ANN models and logistic regression (LR models for predicting hepatocellular carcinoma (HCC outcomes used only a single dataset, the essential issue of internal validity (reproducibility of the models has not been addressed. The study purposes to validate the use of ANN model for predicting in-hospital mortality in HCC surgery patients in Taiwan and to compare the predictive accuracy of ANN with that of LR model. METHODOLOGY/PRINCIPAL FINDINGS: Patients who underwent a HCC surgery during the period from 1998 to 2009 were included in the study. This study retrospectively compared 1,000 pairs of LR and ANN models based on initial clinical data for 22,926 HCC surgery patients. For each pair of ANN and LR models, the area under the receiver operating characteristic (AUROC curves, Hosmer-Lemeshow (H-L statistics and accuracy rate were calculated and compared using paired T-tests. A global sensitivity analysis was also performed to assess the relative significance of input parameters in the system model and the relative importance of variables. Compared to the LR models, the ANN models had a better accuracy rate in 97.28% of cases, a better H-L statistic in 41.18% of cases, and a better AUROC curve in 84.67% of cases. Surgeon volume was the most influential (sensitive parameter affecting in-hospital mortality followed by age and lengths of stay. CONCLUSIONS/SIGNIFICANCE: In comparison with the conventional LR model, the ANN model in the study was more accurate in predicting in-hospital mortality and had higher overall performance indices. Further studies of this model may consider the effect of a more detailed database that includes complications and clinical examination findings as well as more detailed outcome data.
Garno, Joshua; Ouellet, Frederick; Koneru, Rahul; Balachandar, Sivaramakrishnan; Rollin, Bertrand
2017-11-01
An analytic model to describe the hydrodynamic forces on an explosively driven particle is not currently available. The Maxey-Riley-Gatignol (MRG) particle force equation generalized for compressible flows is well-studied in shock-tube applications, and captures the evolution of particle force extracted from controlled shock-tube experiments. In these experiments only the shock-particle interaction was examined, and the effects of the contact line were not investigated. In the present work, the predictive capability of this model is considered for the case where a particle is explosively ejected from a rigid barrel into ambient air. Particle trajectory information extracted from simulations is compared with experimental data. This configuration ensures that both the shock and contact produced by the detonation will influence the motion of the particle. The simulations are carried out using a finite volume, Euler-Lagrange code using the JWL equation of state to handle the explosive products. This work was supported by the U.S. Department of Energy, National Nuclear Security Administration, Advanced Simulation and Computing Program, as a Cooperative Agreement under the Predictive Science Academic Alliance Program,under Contract No. DE-NA0002378.
Morrow, Ellis A; Marcus, Andrea; Byham-Gray, Laura
2017-11-01
Practical methods for determining resting energy expenditure (REE) among individuals on maintenance hemodialysis (MHD) are needed because of the limitations of indirect calorimetry. Two disease-specific predictive energy equations (PEEs) have been developed for this metabolically complex population. The aim of this study was to compare estimated REE (eREE) by PEEs to measured REE (mREE) with a handheld indirect calorimetry device (HICD). A prospective pilot study of adults on MHD (N = 40) was conducted at 2 dialysis clinics in Houston and Texas City, Texas. mREE by an HICD was compared with eREE determined by 6 PEEs using Bland-Altman analysis with a band of acceptable agreement of ±10% of the group mean mREE. Paired t-test and the intraclass correlation coefficient were also used to compare the alternate methods of measuring REE. A priori alpha was set at P Maintenance Hemodialysis Equation-Creatinine version (MHCD-CR) was the most accurate PEE with 52.5% of values within the band of acceptable agreement, followed by the Mifflin-St. Jeor Equation and the Vilar et al. Equation at 45.0% and 42.5%, respectively. When compared with mREE by the HICD, the MHDE-CR was more accurate and precise than other PEEs evaluated; however, this must be interpreted with caution as mREE was consistently lower than eREE from all PEEs. Further research is needed to validate the MHDE-CR and other practical methods for determining REE among individuals on MHD. Copyright © 2017 National Kidney Foundation, Inc. Published by Elsevier Inc. All rights reserved.
Thompson, E. David; Bowling, Bethany V.; Markle, Ross E.
2018-02-01
Studies over the last 30 years have considered various factors related to student success in introductory biology courses. While much of the available literature suggests that the best predictors of success in a college course are prior college grade point average (GPA) and class attendance, faculty often require a valuable predictor of success in those courses wherein the majority of students are in the first semester and have no previous record of college GPA or attendance. In this study, we evaluated the efficacy of the ACT Mathematics subject exam and Lawson's Classroom Test of Scientific Reasoning in predicting success in a major's introductory biology course. A logistic regression was utilized to determine the effectiveness of a combination of scientific reasoning (SR) scores and ACT math (ACT-M) scores to predict student success. In summary, we found that the model—with both SR and ACT-M as significant predictors—could be an effective predictor of student success and thus could potentially be useful in practical decision making for the course, such as directing students to support services at an early point in the semester.
Alexeeff, Stacey E; Carroll, Raymond J; Coull, Brent
2016-04-01
Spatial modeling of air pollution exposures is widespread in air pollution epidemiology research as a way to improve exposure assessment. However, there are key sources of exposure model uncertainty when air pollution is modeled, including estimation error and model misspecification. We examine the use of predicted air pollution levels in linear health effect models under a measurement error framework. For the prediction of air pollution exposures, we consider a universal Kriging framework, which may include land-use regression terms in the mean function and a spatial covariance structure for the residuals. We derive the bias induced by estimation error and by model misspecification in the exposure model, and we find that a misspecified exposure model can induce asymptotic bias in the effect estimate of air pollution on health. We propose a new spatial simulation extrapolation (SIMEX) procedure, and we demonstrate that the procedure has good performance in correcting this asymptotic bias. We illustrate spatial SIMEX in a study of air pollution and birthweight in Massachusetts. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Tang, J. L.; Cai, C. Z.; Xiao, T. T.; Huang, S. J.
2012-07-01
The electrical conductivity of solid oxide fuel cell (SOFC) cathode is one of the most important indices affecting the efficiency of SOFC. In order to improve the performance of fuel cell system, it is advantageous to have accurate model with which one can predict the electrical conductivity. In this paper, a model utilizing support vector regression (SVR) approach combined with particle swarm optimization (PSO) algorithm for its parameter optimization was established to modeling and predicting the electrical conductivity of Ba0.5Sr0.5Co0.8Fe0.2 O3-δ-xSm0.5Sr0.5CoO3-δ (BSCF-xSSC) composite cathode under two influence factors, including operating temperature (T) and SSC content (x) in BSCF-xSSC composite cathode. The leave-one-out cross validation (LOOCV) test result by SVR strongly supports that the generalization ability of SVR model is high enough. The absolute percentage error (APE) of 27 samples does not exceed 0.05%. The mean absolute percentage error (MAPE) of all 30 samples is only 0.09% and the correlation coefficient (R2) as high as 0.999. This investigation suggests that the hybrid PSO-SVR approach may be not only a promising and practical methodology to simulate the properties of fuel cell system, but also a powerful tool to be used for optimal designing or controlling the operating process of a SOFC system.
Directory of Open Access Journals (Sweden)
Shahram Mollaiy-Berneti
2018-02-01
Full Text Available Successful design of a carbon dioxide (CO2 flooding in enhanced oil recovery projects mostly depends on accurate determination of CO2-crude oil minimum miscibility pressure (MMP. Due to the high expensive and time-consuming of experimental determination of MMP, developing a fast and robust method to predict MMP is necessary. In this study, a new method based on ε-insensitive smooth support vector regression (ε-SSVR is introduced to predict MMP for both pure and impure CO2 gas injection cases. The proposed ε-SSVR is developed using dataset of reservoir temperature, crude oil composition and composition of injected CO2. To serve better understanding of the proposed, feed-forward neural network and radial basis function network applied to denoted dataset. The results show that the suggested ε-SSVR has acceptable reliability and robustness in comparison with two other models. Thus, the proposed method can be considered as an alternative way to monitor the MMP in miscible flooding process.
Mortensen, Eric; Wu, Shu; Notaro, Michael; Vavrus, Stephen; Montgomery, Rob; De Piérola, José; Sánchez, Carlos; Block, Paul
2018-01-01
Located at a complex topographic, climatic, and hydrologic crossroads, southern Peru is a semiarid region that exhibits high spatiotemporal variability in precipitation. The economic viability of the region hinges on this water, yet southern Peru is prone to water scarcity caused by seasonal meteorological drought. Meteorological droughts in this region are often triggered during El Niño episodes; however, other large-scale climate mechanisms also play a noteworthy role in controlling the region's hydrologic cycle. An extensive season-ahead precipitation prediction model is developed to help bolster the existing capacity of stakeholders to plan for and mitigate deleterious impacts of drought. In addition to existing climate indices, large-scale climatic variables, such as sea surface temperature, are investigated to identify potential drought predictors. A principal component regression framework is applied to 11 potential predictors to produce an ensemble forecast of regional January-March precipitation totals. Model hindcasts of 51 years, compared to climatology and another model conditioned solely on an El Niño-Southern Oscillation index, achieve notable skill and perform better for several metrics, including ranked probability skill score and a hit-miss statistic. The information provided by the developed model and ancillary modeling efforts, such as extending the lead time of and spatially disaggregating precipitation predictions to the local level as well as forecasting the number of wet-dry days per rainy season, may further assist regional stakeholders and policymakers in preparing for drought.
Directory of Open Access Journals (Sweden)
Jonatan R Ruiz
Full Text Available BACKGROUND: We investigated the validity of REE predictive equations before and after 12-week energy-restricted diet intervention in Spanish obese (30 kg/m(2>BMI<40 kg/m(2 women. METHODS: We measured REE (indirect calorimetry, body weight, height, and fat mass (FM and fat free mass (FFM, dual X-ray absorptiometry in 86 obese Caucasian premenopausal women aged 36.7±7.2 y, before and after (n = 78 women the intervention. We investigated the accuracy of ten REE predictive equations using weight, height, age, FFM and FM. RESULTS: At baseline, the most accurate equation was the Mifflin et al. (Am J Clin Nutr 1990; 51: 241-247 when using weight (bias:-0.2%, P = 0.982, 74% of accurate predictions. This level of accuracy was not reached after the diet intervention (24% accurate prediction. After the intervention, the lowest bias was found with the Owen et al. (Am J Clin Nutr 1986; 44: 1-19 equation when using weight (bias:-1.7%, P = 0.044, 81% accurate prediction, yet it provided 53% accurate predictions at baseline. CONCLUSIONS: There is a wide variation in the accuracy of REE predictive equations before and after weight loss in non-morbid obese women. The results acquire especial relevance in the context of the challenging weight regain phenomenon for the overweight/obese population.
Bhamidipati, Ravi Kanth; Syed, Muzeeb; Mullangi, Ramesh; Srinivas, Nuggehally
2018-02-01
1. Dalbavancin, a lipoglycopeptide, is approved for treating gram-positive bacterial infections. Area under plasma concentration versus time curve (AUC inf ) of dalbavancin is a key parameter and AUC inf /MIC ratio is a critical pharmacodynamic marker. 2. Using end of intravenous infusion concentration (i.e. C max ) C max versus AUC inf relationship for dalbavancin was established by regression analyses (i.e. linear, log-log, log-linear and power models) using 21 pairs of subject data. 3. The predictions of the AUC inf were performed using published C max data by application of regression equations. The quotient of observed/predicted values rendered fold difference. The mean absolute error (MAE)/root mean square error (RMSE) and correlation coefficient (r) were used in the assessment. 4. MAE and RMSE values for the various models were comparable. The C max versus AUC inf exhibited excellent correlation (r > 0.9488). The internal data evaluation showed narrow confinement (0.84-1.14-fold difference) with a RMSE models predicted AUC inf with a RMSE of 3.02-27.46% with fold difference largely contained within 0.64-1.48. 5. Regardless of the regression models, a single time point strategy of using C max (i.e. end of 30-min infusion) is amenable as a prospective tool for predicting AUC inf of dalbavancin in patients.
Willis, Michael; Asseburg, Christian; Nilsson, Andreas; Johnsson, Kristina; Kartman, Bernt
2017-03-01
Type 2 diabetes mellitus (T2DM) is chronic and progressive and the cost-effectiveness of new treatment interventions must be established over long time horizons. Given the limited durability of drugs, assumptions regarding downstream rescue medication can drive results. Especially for insulin, for which treatment effects and adverse events are known to depend on patient characteristics, this can be problematic for health economic evaluation involving modeling. To estimate parsimonious multivariate equations of treatment effects and hypoglycemic event risks for use in parameterizing insulin rescue therapy in model-based cost-effectiveness analysis. Clinical evidence for insulin use in T2DM was identified in PubMed and from published reviews and meta-anal