regression equations predicting: Topics by WorldWideScience.org

Sample records for regression equations predicting

Unbalanced Regressions and the Predictive Equation

DEFF Research Database (Denmark)

Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo

Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoretical predictive equation by suggesting a data generating process, where returns are generated as linear functions of a lagged latent I(0) risk process. The observed predictor is a function of this latent I(0) process, but it is corrupted by a fractionally integrated noise. Such a process may arise due to aggregation or unexpected level shifts. In this setup, the practitioner estimates a misspecified, unbalanced, and endogenous predictive regression. We show that the OLS estimate of this regression is inconsistent, but standard inference is possible. To obtain a consistent slope estimate, we then suggest...
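A small simulation makes the inconsistency concrete. The sketch below is a simplified illustration, not the authors' data generating process: the latent I(0) risk process is assumed to be an AR(1), the fractionally integrated noise is generated by truncating the MA(∞) representation of (1 − L)^(−d), and all parameter values (`beta`, `phi`, `d`, sample size) are hypothetical. Because the noise in the observed predictor carries no information about returns, the OLS slope is pulled toward zero.

```python
import numpy as np

def fractional_noise(n, d, rng, burn=500):
    """I(d) noise from a truncated MA(infinity) filter of (1 - L)^(-d)."""
    total = n + burn
    psi = np.empty(total)
    psi[0] = 1.0
    for j in range(1, total):
        # Recursive MA weights: psi_j = psi_{j-1} * (j - 1 + d) / j
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    eps = rng.standard_normal(total)
    # Causal filter output, then drop the burn-in period
    return np.convolve(eps, psi)[:total][burn:]

def predictive_slope(n=3000, beta=1.0, phi=0.5, d=0.4, seed=0):
    rng = np.random.default_rng(seed)
    # Latent I(0) risk process: a stationary AR(1) (hypothetical choice)
    latent = np.zeros(n)
    shocks = rng.standard_normal(n)
    for t in range(1, n):
        latent[t] = phi * latent[t - 1] + shocks[t]
    # Observed predictor = latent process corrupted by I(d) noise
    observed = latent + fractional_noise(n, d, rng)
    # r_{t+1} depends on the lagged *latent* process only
    returns = beta * latent[:-1] + rng.standard_normal(n - 1)
    # Practitioner regresses r_{t+1} on the observed x_t (with intercept)
    X = np.column_stack([np.ones(n - 1), observed[:-1]])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
    return coef[1]

slope_hat = predictive_slope()
print(f"true slope 1.0, OLS slope {slope_hat:.3f}")
```

With d < 0.5 the noise is stationary, so in this toy setup the slope tends to beta · Var(latent) / (Var(latent) + Var(noise)) rather than beta; a larger sample does not remove the gap.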
Regression equations to predict 6-minute walk distance in Chinese adults aged 55–85 years

OpenAIRE

Shirley P.C. Ngai, PhD; Alice Y.M. Jones, PhD; Sue C. Jenkins, PhD

2014-01-01

The 6-minute walk distance (6MWD) is used as a measure of functional exercise capacity in clinical populations and research. Reference equations to predict 6MWD in different populations have been established; however, equations for the Chinese population are scarce. This study aimed to develop regression equations to predict the 6MWD for a Hong Kong Chinese population. Fifty-three healthy individuals (25 men, 28 women; mean age = 69.3 ± 6.5 years) participated in this cross-sectional st...
Who Will Win?: Predicting the Presidential Election Using Linear Regression

Science.gov (United States)

Lamb, John H.

2007-01-01

This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus graphing calculators, Microsoft Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…
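The paper-and-pencil route mentioned above amounts to solving the two normal equations for a straight line, n·a + b·Σx = Σy and a·Σx + b·Σx² = Σxy. A minimal sketch in plain Python, using hypothetical (x, y) data rather than the election data from the article:

```python
def least_squares_line(xs, ys):
    """Fit y = a + b*x by solving the two normal equations directly."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:  n*a + sx*b = sy   and   sx*a + sxx*b = sxy
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b

# Hypothetical data: x could be a time index, y a vote share in percent
xs = [1, 2, 3, 4, 5]
ys = [48.0, 48.5, 49.5, 50.0, 51.0]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

This prints `y = 47.15 + 0.75x`; a calculator or spreadsheet regression on the same points should give the same coefficients.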
Prediction Equations for Spirometry for Children from Northern India.

Science.gov (United States)

Chhabra, Sunil K; Kumar, Rajeev; Mittal, Vikas

2016-09-08

To develop prediction equations for spirometry for children from northern India using current international guidelines for standardization. Re-analysis of cross-sectional data from a single school. 670 normal children (age 6-17 y; 365 boys) of northern Indian parentage. After screening for normal health, we carried out spirometry with recommended quality assurance according to current guidelines. We developed linear and nonlinear prediction equations using multiple regression analysis. We selected the final models on the basis of the highest coefficient of multiple determination (R²) and statistical validity. Spirometry parameters: FVC, FEV1, PEFR, FEF50, FEF75 and FEF25-75. The equations for the main parameters were as follows: Boys, ln FVC = −1.687 + 0.016 × height + 0.022 × age; ln FEV1 = −1.748 + 0.015 × height + 0.031 × age. Girls, ln FVC = −9.989 + 2.018 × ln(height) + 0.324 × ln(age); ln FEV1 = −10.055 + 1.990 × ln(height) + 0.358 × ln(age). Nonlinear regression yielded substantially greater R² values than linear models, except for FEF50 in girls. Height and age were the significant explanatory variables for all parameters in multiple regression, with weight making no significant contribution. We developed prediction equations for spirometry for children from northern India. Nonlinear equations were superior to linear equations.
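Because the reported equations predict the natural logarithm of each parameter, the predicted value is obtained by exponentiating. The sketch below encodes the four equations from the abstract. The abstract does not state units, so height in cm, age in years and output in litres are assumptions, and the example child is hypothetical.

```python
import math

# Coefficients from the abstract: (intercept, height term, age term).
# Boys use height and age directly; girls use ln(height) and ln(age).
EQUATIONS = {
    ("boy", "FVC"): (-1.687, 0.016, 0.022),
    ("boy", "FEV1"): (-1.748, 0.015, 0.031),
    ("girl", "FVC"): (-9.989, 2.018, 0.324),
    ("girl", "FEV1"): (-10.055, 1.990, 0.358),
}

def predict_spirometry(parameter, sex, height_cm, age_years):
    """Predicted FVC or FEV1 (assumed litres) for a northern Indian child."""
    intercept, b_height, b_age = EQUATIONS[(sex, parameter)]
    if sex == "boy":
        h, a = height_cm, age_years
    else:
        h, a = math.log(height_cm), math.log(age_years)
    # The models predict ln(parameter), so exponentiate the linear predictor
    return math.exp(intercept + b_height * h + b_age * a)

boy_fvc = predict_spirometry("FVC", "boy", 150, 12)
girl_fvc = predict_spirometry("FVC", "girl", 150, 12)
print(f"boy FVC {boy_fvc:.2f}, girl FVC {girl_fvc:.2f}")
```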
Predictive Temperature Equations for Three Sites at the Grand Canyon

Science.gov (United States)

McLaughlin, Katrina Marie Neitzel

Climate data collected at a number of automated weather stations from December 2009 to May 2010 were used to create a series of predictive equations in order to better predict temperatures along hiking trails within the Grand Canyon. The central focus of this project is how atmospheric variables interact and can be combined to predict the weather in the Grand Canyon at the Indian Gardens, Phantom Ranch, and Bright Angel sites. Predictive equations were determined using statistical analysis software and data regression. The predictive equations are simple or multivariable best fits that reflect the curvilinear nature of the data. Using data analysis software, the curves resulting from the predictive equations were plotted along with the observed data. The reduced χ² of each equation was determined to aid visual examination of the equations' ability to reproduce the observed data. From this information, the best equation or pair of equations was determined. Although a best predictive equation was determined for each month and season at each site, future work may refine the equations to make them more accurate.
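The reduced chi-square used above divides the sum of squared, uncertainty-weighted residuals by the degrees of freedom, χ²_ν = (1/ν) Σ((O_i − E_i)/σ)² with ν = N − p. A minimal sketch, assuming a single constant measurement uncertainty σ; the temperatures, fits and σ = 0.5 °C are hypothetical.

```python
def reduced_chi_square(observed, predicted, sigma, n_params):
    """Sum of squared residuals scaled by sigma, divided by N - n_params."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    chi2 = sum(((o - p) / sigma) ** 2 for o, p in zip(observed, predicted))
    return chi2 / dof

# Hypothetical temperatures (deg C) and two candidate fitted curves
obs = [20.1, 22.3, 25.0, 27.8, 29.9, 31.2]
fit_linear = [20.5, 22.7, 24.9, 27.1, 29.3, 31.5]      # 2 parameters
fit_quadratic = [20.0, 22.5, 25.1, 27.6, 29.8, 31.3]   # 3 parameters
r_lin = reduced_chi_square(obs, fit_linear, sigma=0.5, n_params=2)
r_quad = reduced_chi_square(obs, fit_quadratic, sigma=0.5, n_params=3)
```

Values near 1 indicate residuals consistent with the assumed uncertainty; values well below 1 can signal overfitting or an overestimated σ, so the statistic complements rather than replaces visual inspection.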
Hierarchical regression analysis in structural Equation Modeling

NARCIS (Netherlands)

de Jong, P.F.

1999-01-01

In a hierarchical or fixed-order regression analysis, the independent variables are entered into the regression equation in a prespecified order. Such an analysis is often performed when the extra amount of variance accounted for in a dependent variable by a specific independent variable is the main
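The extra variance accounted for by a block of predictors is the increase in R² when that block is entered after the earlier ones. A minimal sketch using NumPy least squares on simulated data; the variables `x1`, `x2` and their effects are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def hierarchical_r2(blocks, y):
    """Enter predictor blocks in a prespecified order and return the
    R^2 increment contributed by each block."""
    increments, previous, columns = [], 0.0, []
    for block in blocks:
        columns.append(block)
        r2 = r_squared(np.column_stack(columns), y)
        increments.append(r2 - previous)
        previous = r2
    return increments

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=200)
delta = hierarchical_r2([x1, x2], y)
print([round(d, 3) for d in delta])
```

The increments always sum to the R² of the full model; what changes with the entry order is how that total is attributed to each block.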
Regression Equations for Birth Weight Estimation using ...

African Journals Online (AJOL)

In this study, birth weight has been estimated from anthropometric measurements of the hand and foot. A linear regression equation was formed for each measured variable. These simple equations can be used to estimate the birth weight of newborn babies, in order to identify those with low birth weight and referred to ...
Evaluation of peak power prediction equations in male basketball players.

Science.gov (United States)

Duncan, Michael J; Lyons, Mark; Nevill, Alan M

2008-07-01

This study compared peak power estimated using 4 commonly used regression equations with actual peak power derived from force platform data in a group of adolescent basketball players. Twenty-five elite junior male basketball players (age, 16.5 ± 0.5 years; mass, 74.2 ± 11.8 kg; height, 181.8 ± 8.1 cm) volunteered to participate in the study. Actual peak power was determined using a countermovement vertical jump on a force platform. Estimated peak power was determined using countermovement jump height and body mass. All 4 prediction equations were significantly related to actual peak power (all p < …) … jump prediction equations, 12% for the Canavan and Vescovi equation, and 6% for the Sayers countermovement jump equation. In all cases, peak power was underestimated.
Prediction equations of forced oscillation technique: the insidious role of collinearity.

Science.gov (United States)

Narchi, Hassib; AlBlooshi, Afaf

2018-03-27

Many studies have reported reference data for forced oscillation technique (FOT) in healthy children. The prediction equations of FOT parameters were derived from a multivariable regression model examining the effect of age, gender, weight and height on each parameter. As many of these variables are likely to be correlated, collinearity might have affected the accuracy of the model, potentially resulting in misleading, erroneous or difficult-to-interpret conclusions. The aims of this work were to review all FOT publications in children since 2005 to analyse whether collinearity was considered in the construction of the published prediction equations, to compare these prediction equations with our own study, and to analyse, in our study, how collinearity between the explanatory variables might affect the predicted equations if it was not considered in the model. The results showed that none of the ten reviewed studies had stated whether collinearity was checked for. Half of the reports had also included in their equations variables which are physiologically correlated, such as age, weight and height. The predicted resistance varied by up to 28% amongst these studies. In our study, multicollinearity was identified between the explanatory variables initially considered for the regression model (age, weight and height). Ignoring it would have resulted in inaccuracies in the coefficients of the equation, their signs (positive or negative), their 95% confidence intervals, their significance level and the model goodness of fit. In conclusion, with inaccurately constructed and improperly reported models, understanding the results and reproducing the models for future research might be compromised.
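One common way to check for the collinearity discussed above is the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the others. The abstract does not say which diagnostic the authors used, so this is a generic sketch; the simulated age, height and weight values are hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X, regressing it on the other columns
    (with an intercept) and applying VIF = 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return vifs

# Hypothetical children: height tracks age, weight tracks height
rng = np.random.default_rng(42)
age = rng.uniform(6, 12, size=150)
height = 80 + 6 * age + rng.normal(0, 4, size=150)
weight = -15 + 0.3 * height + rng.normal(0, 3, size=150)
vifs = variance_inflation_factors(np.column_stack([age, height, weight]))
print([round(v, 1) for v in vifs])
```

Thresholds of 5 or 10 are often quoted as warning levels, but they are rules of thumb rather than formal tests.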
Construction of risk prediction model of type 2 diabetes mellitus based on logistic regression

Directory of Open Access Journals (Sweden)

Li Jian

2017-01-01

Objective: to construct a multifactor prediction model for the individual risk of T2DM, and to explore new ideas for early warning, prevention and personalized health services for T2DM. Methods: logistic regression techniques were used to screen the risk factors for T2DM and to construct the risk prediction model of T2DM. Results: the male risk prediction model logistic regression equation was logit(P) = 0.735 × BMI − 0.671 × vegetables + 0.838 × age + 0.296 × diastolic pressure − 2.287 × physical activity − 0.009 × sleep + 0.214 × smoking; the female risk prediction model logistic regression equation was logit(P) = 1.979 × BMI − 0.292 × vegetables + 1.355 × age + 0.522 × diastolic pressure − 2.287 × physical activity − 0.010 × sleep. The area under the ROC curve was 0.83 for males (sensitivity 0.72, specificity 0.86) and 0.84 for females (sensitivity 0.75, specificity 0.90). Conclusion: the model data come from a nested case-control study; the risk prediction model was established using well-established logistic regression techniques and has high predictive sensitivity, specificity and stability.
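A logistic equation like the ones above gives a probability via P = 1 / (1 + exp(−logit)). The abstract reports no intercept and does not say how the predictors were coded, so in the sketch below the `intercept` argument (default 0) and the predictor values are hypothetical placeholders; only the male coefficients come from the abstract.

```python
import math

# Male coefficients as reported; predictor coding is not given in the
# abstract, so the inputs used below are hypothetical.
MALE_COEFFICIENTS = {
    "bmi": 0.735, "vegetables": -0.671, "age": 0.838,
    "diastolic_pressure": 0.296, "physical_activity": -2.287,
    "sleep": -0.009, "smoking": 0.214,
}

def t2dm_risk(predictors, coefficients, intercept=0.0):
    """Return P from logit(P) = intercept + sum(beta_i * x_i).
    Predictors missing from the dict are treated as 0."""
    logit = intercept + sum(coefficients[name] * value
                            for name, value in predictors.items())
    return 1.0 / (1.0 + math.exp(-logit))

baseline = t2dm_risk({name: 0 for name in MALE_COEFFICIENTS}, MALE_COEFFICIENTS)
active = t2dm_risk({"physical_activity": 1}, MALE_COEFFICIENTS)
print(f"baseline {baseline:.3f}, physically active {active:.3f}")
```

Without the true intercept the absolute probabilities are not meaningful; the sketch only shows how each coefficient shifts the log-odds.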
A regression approach for Zircaloy-2 in-reactor creep constitutive equations

International Nuclear Information System (INIS)

Yung Liu, Y.; Bement, A.L.

1977-01-01

In this paper, the methodology of multiple regression as applied to Zircaloy-2 in-reactor creep data analysis and the construction of a constitutive equation is illustrated. While the resulting constitutive equation can be used in creep analysis of in-reactor Zircaloy structural components, the methodology itself is entirely general and can be applied to any creep data analysis. The promising aspects of multiple regression creep data analysis are briefly outlined as follows: (1) When more than one variable is involved, there is no need to assume that each variable affects the response independently. No separate normalizations are required either, and the parameters are estimated by solving many simultaneous equations. The number of simultaneous equations is equal to the number of data sets. (2) Regression statistics such as the R² and F-statistics provide measures of the significance of the regression creep equation in correlating the overall data. The relative weights of each variable on the response can also be obtained. (3) Special regression techniques such as stepwise, ridge, and robust regression, together with residual plots, provide diagnostic tools for model selection. Multiple regression analysis performed on a set of carefully selected Zircaloy-2 in-reactor creep data leads to a model which provides excellent correlations for the data. (Auth.)
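The overall F-statistic of a multiple regression can be computed from R², the number of observations n and the number of predictors k: F = (R²/k) / ((1 − R²)/(n − k − 1)) with (k, n − k − 1) degrees of freedom. A minimal sketch with hypothetical values, not figures from the creep study:

```python
def overall_f_statistic(r2, n_obs, n_predictors):
    """Overall F-test of a multiple regression from its R^2.
    Returns (F, numerator df, denominator df)."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    if df_resid <= 0:
        raise ValueError("need more observations than predictors + 1")
    f = (r2 / df_model) / ((1.0 - r2) / df_resid)
    return f, df_model, df_resid

# Hypothetical fit: R^2 = 0.90 with 3 predictors and 50 data points
f_stat, df1, df2 = overall_f_statistic(0.90, 50, 3)
print(f"F({df1}, {df2}) = {f_stat:.1f}")
```

The resulting F is compared against the F distribution with (df1, df2) degrees of freedom to judge whether the predictors jointly explain the response.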
Development of 1RM Prediction Equations for Bench Press in Moderately Trained Men.

Science.gov (United States)

Macht, Jordan W; Abel, Mark G; Mullineaux, David R; Yates, James W

2016-10-01

Macht, JW, Abel, MG, Mullineaux, DR, and Yates, JW. Development of 1RM prediction equations for bench press in moderately trained men. J Strength Cond Res 30(10): 2901-2906, 2016. There are a variety of established 1 repetition maximum (1RM) prediction equations; however, very few use anthropometric characteristics, exclusively or in part, to estimate 1RM strength. Therefore, the purpose of this study was to develop an original 1RM prediction equation for bench press using anthropometric and performance characteristics in moderately trained male subjects. Sixty male subjects (21.2 ± 2.4 years) completed a 1RM bench press and were randomly assigned a load to complete as many repetitions as possible. In addition, body composition, upper-body anthropometric characteristics, and handgrip strength were assessed. Regression analysis was used to develop a performance-based 1RM prediction equation: 1RM = 1.20 × repetition weight + 2.19 × repetitions to fatigue − 0.56 × biacromial width (cm) + 9.6 (R = 0.99, standard error of estimate [SEE] = 3.5 kg). Regression analysis to develop a nonperformance-based 1RM prediction equation yielded: 1RM (kg) = 0.997 × arm cross-sectional area (CSA) (cm²) + 0.401 × chest circumference (cm) − 0.385 × %fat − 0.185 × arm length (cm) + 36.7 (R = 0.81, SEE = 13.0 kg). The performance prediction equations developed in this study had high validity coefficients, minimal mean bias, and small limits of agreement. The anthropometric equations had moderately high validity coefficients but larger limits of agreement. The practical applications of this study indicate that including anthropometric characteristics and performance variables produces a valid prediction equation for 1RM strength. In addition, arm CSA offers a simple nonperformance method of estimating the lifter's 1RM. This information may be used to predict the starting load for a lifter performing a 1RM prediction protocol or a 1RM testing protocol.
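The two equations above can be evaluated directly. In this sketch the load unit is assumed to be kg (the abstract reports SEE in kg), arm CSA is assumed to be in cm², and the lifter's measurements are hypothetical.

```python
def one_rm_performance(rep_weight_kg, reps_to_fatigue, biacromial_width_cm):
    """Performance-based bench-press 1RM equation from the abstract."""
    return (1.20 * rep_weight_kg + 2.19 * reps_to_fatigue
            - 0.56 * biacromial_width_cm + 9.6)

def one_rm_anthropometric(arm_csa_cm2, chest_circ_cm, percent_fat, arm_length_cm):
    """Nonperformance (anthropometric) 1RM equation from the abstract."""
    return (0.997 * arm_csa_cm2 + 0.401 * chest_circ_cm
            - 0.385 * percent_fat - 0.185 * arm_length_cm + 36.7)

# Hypothetical lifter: 8 repetitions at 80 kg, 40 cm biacromial width
perf = one_rm_performance(80, 8, 40)
anth = one_rm_anthropometric(60, 100, 15, 75)
print(f"performance-based {perf:.1f} kg, anthropometric {anth:.1f} kg")
```

Given the reported SEE values (3.5 kg versus 13.0 kg), the performance-based estimate is the tighter of the two when a submaximal set is available.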
Using Regression Equations Built from Summary Data in the Psychological Assessment of the Individual Case: Extension to Multiple Regression

Science.gov (United States)

Crawford, John R.; Garthwaite, Paul H.; Denham, Annie K.; Chelune, Gordon J.

2012-01-01

Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because…
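For a single predictor, a regression equation can be rebuilt from published summary statistics (means, SDs, correlation r and sample size n), and an individual's obtained score can then be compared with the predicted score. The sketch below uses the standard error for predicting a new case; it illustrates the simple-regression idea only and is not the paper's multiple-regression extension. All input values are hypothetical.

```python
import math

def discrepancy_from_summary(mean_x, sd_x, mean_y, sd_y, r, n, x_case, y_case):
    """Build y = a + b*x from summary data and return
    (predicted score, t for the obtained-minus-predicted discrepancy, df)."""
    b = r * sd_y / sd_x                       # slope from r and the two SDs
    a = mean_y - b * mean_x                   # line passes through the means
    # Standard error of estimate (residual SD) with n - 2 df
    s_yx = sd_y * math.sqrt((1 - r * r) * (n - 1) / (n - 2))
    # Standard error for a *new* individual, not for the regression line
    se_new = s_yx * math.sqrt(1 + 1 / n + (x_case - mean_x) ** 2 / ((n - 1) * sd_x ** 2))
    predicted = a + b * x_case
    t = (y_case - predicted) / se_new
    return predicted, t, n - 2

predicted, t, df = discrepancy_from_summary(
    mean_x=50, sd_x=10, mean_y=100, sd_y=15, r=0.6, n=100, x_case=60, y_case=90)
print(f"predicted {predicted:.1f}, t({df}) = {t:.2f}")
```

The t value is referred to a t distribution with n − 2 degrees of freedom to judge whether the case's score is unusually low or high given the predictor.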
Predictive equations underestimate resting energy expenditure in female adolescents with phenylketonuria

Science.gov (United States)

Quirk, Meghan E.; Schmotzer, Brian J.; Singh, Rani H.

2010-01-01

Resting energy expenditure (REE) is often used to estimate total energy needs. The Schofield equation based on weight and height has been reported to underestimate REE in female children with phenylketonuria (PKU). The objective of this observational, cross-sectional study was to evaluate the agreement of measured REE with predicted REE for female adolescents with PKU. A total of 36 females (aged 11.5-18.7 years) with PKU attending Emory University’s Metabolic Camp (June 2002 – June 2008) underwent indirect calorimetry. Measured REE was compared to six predictive equations using paired Student’s t-tests, regression-based analysis, and assessment of clinical accuracy. The differences between measured and predicted REE were modeled against clinical parameters to determine whether a relationship existed. All six selected equations significantly underpredicted measured REE (P < 0.005). The Schofield equation based on weight had the greatest level of agreement, with the lowest mean prediction bias (144 kcal) and highest concordance correlation coefficient (0.626). However, the Schofield equation based on weight lacked clinical accuracy, predicting measured REE within ±10% in only 14 of 36 participants. Clinical parameters were not associated with bias for any of the equations. Predictive equations underestimated measured REE in this group of female adolescents with PKU. Currently, there is no accurate and precise alternative to indirect calorimetry in this population. PMID:20497783
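The two agreement statistics named above, mean prediction bias and the concordance correlation coefficient, can be sketched as follows. Lin's CCC, 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²), penalizes systematic bias as well as scatter, unlike Pearson's r. This sketch uses 1/n moments as in Lin's original definition; the REE values are hypothetical.

```python
def mean_bias(measured, predicted):
    """Mean of (measured - predicted); positive means underprediction."""
    return sum(m - p for m, p in zip(measured, predicted)) / len(measured)

def lins_ccc(measured, predicted):
    """Lin's concordance correlation coefficient (1/n moments)."""
    n = len(measured)
    mx, my = sum(measured) / n, sum(predicted) / n
    vx = sum((x - mx) ** 2 for x in measured) / n
    vy = sum((y - my) ** 2 for y in predicted) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(measured, predicted)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)

measured_ree = [1500, 1600, 1700, 1800]     # hypothetical kcal/day
predicted_ree = [1360, 1500, 1650, 1580]
bias = mean_bias(measured_ree, predicted_ree)
ccc = lins_ccc(measured_ree, predicted_ree)
print(f"bias {bias:.1f} kcal/day, CCC {ccc:.3f}")
```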
A regression approach for zircaloy-2 in-reactor creep constitutive equations

International Nuclear Information System (INIS)

Yung Liu, Y.; Bement, A.L.

1977-01-01

In this paper, the methodology of multiple regression as applied to Zircaloy-2 in-reactor creep data analysis and the construction of a constitutive equation is illustrated. While the resulting constitutive equation can be used in creep analysis of in-reactor Zircaloy structural components, the methodology itself is entirely general and can be applied to any creep data analysis. From the points of view of data analysis and model development, both the assumption of independence and a prior commitment to specific model forms are unacceptable. One would desire means that can not only estimate the required parameters directly from data but also provide a basis for model selection, i.e., for weighing one model against others. A basic understanding of the physics of deformation is important in choosing the forms of the starting physical model equations, but their justification must rely on their ability to correlate the overall data. The promising aspects of multiple regression creep data analysis are briefly outlined as follows: (1) when more than one variable is involved, there is no need to assume that each variable affects the response independently. No separate normalizations are required either, and the parameters are estimated by solving many simultaneous equations. The number of simultaneous equations is equal to the number of data sets; (2) regression statistics such as the R² and F-statistics provide measures of the significance of the regression creep equation in correlating the overall data. The relative weights of each variable on the response can also be obtained; (3) special regression techniques such as stepwise, ridge, and robust regression, together with residual plots, provide diagnostic tools for model selection
A local equation for differential diagnosis of β-thalassemia trait and iron deficiency anemia by logistic regression analysis in Southeast Iran.

Science.gov (United States)

Sargolzaie, Narjes; Miri-Moghaddam, Ebrahim

2014-01-01

The most common differential diagnosis of β-thalassemia (β-thal) trait is iron deficiency anemia. Several red blood cell equations have been introduced in different studies for the differential diagnosis between β-thal trait and iron deficiency anemia. Because of genetic variation between regions, these equations are not useful in all populations. The aim of this study was to determine a native equation with high accuracy for the differential diagnosis of β-thal trait and iron deficiency anemia in the Sistan and Baluchestan population by logistic regression analysis. We selected 77 iron deficiency anemia and 100 β-thal trait cases. We used binary logistic regression analysis and determined the best equation for probability prediction of β-thal trait against iron deficiency anemia in our population. We compared the diagnostic values and receiver operating characteristic (ROC) curves of this equation and 10 other published equations in discriminating β-thal trait and iron deficiency anemia. The binary logistic regression analysis determined the best equation for probability prediction of β-thal trait against iron deficiency anemia, with an area under the curve (AUC) of 0.998. Based on ROC curves and AUC, the Green & King, England & Fraser, and Sirdah indices, in that order, had the highest accuracy after our equation. We suggest that to obtain the best equation and cut-off in each region, one needs to evaluate region-specific information, particularly in areas where populations are homogeneous, to provide a specific formula for differentiating between β-thal trait and iron deficiency anemia.
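The AUC used to rank the equations has a direct interpretation: it is the probability that a randomly chosen positive case (here, β-thal trait) receives a higher score than a randomly chosen negative case (iron deficiency anemia), with ties counted as one half. A minimal pairwise sketch; the predicted probabilities are hypothetical.

```python
def auc_by_pairs(scores_positive, scores_negative):
    """AUC as P(score_pos > score_neg), ties counted as 0.5.
    Equals the Mann-Whitney U statistic divided by n_pos * n_neg."""
    wins = 0.0
    for p in scores_positive:
        for q in scores_negative:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

# Hypothetical predicted probabilities of beta-thal trait
thal_trait = [0.9, 0.8, 0.7, 0.4]
iron_deficiency = [0.1, 0.3, 0.5, 0.6]
auc = auc_by_pairs(thal_trait, iron_deficiency)
print(f"AUC = {auc:.3f}")
```

The double loop is O(n_pos × n_neg), which is fine for samples of this size; a rank-based computation scales better for large data.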
Sintering equation: determination of its coefficients by experiments - using multiple regression

International Nuclear Information System (INIS)

Windelberg, D.

1999-01-01

Sintering is a method for volume compression (or volume contraction) of powdered or granular material by applying high temperature (below the melting point of the material). Mäkipirtti tried to find an equation that describes the process of sintering in terms of its main parameters: sintering time, sintering temperature and volume contraction. Such an equation is called a sintering equation. It also contains some coefficients which characterise the behaviour of the material during sintering. These coefficients have to be determined by experiments. Here we show that some linear regressions will produce wrong coefficients, but multiple regression results in a useful sintering equation. (orig.)
Development of a Watershed-Scale Long-Term Hydrologic Impact Assessment Model with the Asymptotic Curve Number Regression Equation

Directory of Open Access Journals (Sweden)

Jichul Ryu

2016-04-01

In this study, 52 asymptotic Curve Number (CN) regression equations were developed for combinations of representative land covers and hydrologic soil groups. In addition, to overcome the limitations of the original Long-term Hydrologic Impact Assessment (L-THIA) model when it is applied to larger watersheds, a watershed-scale L-THIA Asymptotic CN (ACN) regression equation model (watershed-scale L-THIA ACN model) was developed by integrating the asymptotic CN regressions and various modules for direct runoff, baseflow and channel routing. The watershed-scale L-THIA ACN model was applied to four watersheds in South Korea to evaluate the accuracy of its streamflow prediction. The coefficient of determination (R²) and Nash–Sutcliffe Efficiency (NSE) values for observed versus simulated streamflows over intervals of eight days were greater than 0.6 for all four watersheds. The watershed-scale L-THIA ACN model, including the asymptotic CN regression equation method, can simulate long-term streamflow sufficiently well with the ten parameters that have been added for the characterization of streamflow.
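The Nash–Sutcliffe Efficiency used above is NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))². A value of 1 is a perfect match and 0 means the simulation is no better than the observed mean. A minimal sketch with hypothetical streamflow values:

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe Efficiency of simulated against observed values."""
    if len(observed) != len(simulated):
        raise ValueError("series must have the same length")
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

# Hypothetical 8-day streamflow totals (arbitrary units)
obs = [10, 12, 30, 25, 8]
sim = [11, 13, 26, 27, 9]
nse = nash_sutcliffe(obs, sim)
print(f"NSE = {nse:.3f}")
```

Unlike R², NSE is sensitive to systematic over- or underestimation, which is why the two are often reported together.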
Validity of one-repetition maximum predictive equations in men with spinal cord injury.

Science.gov (United States)

Ribeiro Neto, F; Guanais, P; Dornelas, E; Coutinho, A C B; Costa, R R G

2017-10-01

Cross-sectional study. The study aimed (a) to test the cross-validation of current one-repetition maximum (1RM) predictive equations in men with spinal cord injury (SCI); and (b) to compare the current 1RM predictive equations with a newly developed equation based on the 4- to 12-repetition maximum test (4-12RM). SARAH Rehabilitation Hospital Network, Brasilia, Brazil. Forty-five men aged 28.0 years with SCI between C6 and L2 causing complete motor impairment were enrolled in the study. Volunteers were tested in random order with the 1RM test and the 4-12RM test, 2-3 days apart. Multiple regression analysis was used to generate an equation for predicting 1RM. There were no significant differences between the 1RM test and the current predictive equations. ICC values were significant and were classified as excellent for all current predictive equations. The predictive equation of Lombardi presented the best Bland-Altman results (0.5 kg and 12.8 kg for the mean difference and the interval range around the differences, respectively). The two created equation models for 1RM demonstrated the same high adjusted R² (0.971, P < …). … predictive equations are accurate to assess individuals with SCI at the bench press exercise. However, the predictive equation of Lombardi presented the best associated cross-validity results. A specific 1RM prediction equation was also developed for individuals with SCI. The created equation should be tested to verify whether it is more accurate than the current ones.
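Lombardi's equation is usually written as 1RM = w × r^0.10, where w is the lifted load and r the number of repetitions. Epley, 1RM = w(1 + r/30), and Brzycki, 1RM = w × 36/(37 − r), are two other widely cited forms; the abstract does not list which equations were tested, so their inclusion here is only for comparison. The load and repetition count are hypothetical.

```python
def lombardi(weight, reps):
    """Lombardi: 1RM = w * r^0.10."""
    return weight * reps ** 0.10

def epley(weight, reps):
    """Epley: 1RM = w * (1 + r / 30)."""
    return weight * (1 + reps / 30)

def brzycki(weight, reps):
    """Brzycki: 1RM = w * 36 / (37 - r); undefined for r >= 37."""
    if reps >= 37:
        raise ValueError("Brzycki equation requires fewer than 37 repetitions")
    return weight * 36 / (37 - reps)

# Hypothetical set: 10 repetitions at 100 kg
for name, equation in [("Lombardi", lombardi), ("Epley", epley), ("Brzycki", brzycki)]:
    print(f"{name}: {equation(100, 10):.1f} kg")
```

At 10 repetitions Lombardi gives a noticeably lower estimate than the other two, which illustrates why cross-validation in a specific population such as men with SCI matters.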

Estimation of Ordinary Differential Equation Parameters Using Constrained Local Polynomial Regression.

Science.gov (United States)

Ding, A Adam; Wu, Hulin

2014-10-01

We propose a new method that uses constrained local polynomial regression to estimate the unknown parameters in ordinary differential equation models, with the goal of improving the smoothing-based two-stage pseudo-least squares estimate. The equation constraints are derived from the differential equation model and are incorporated into the local polynomial regression in order to estimate the unknown parameters in the differential equation model. We also derive the asymptotic bias and variance of the proposed estimator. Our simulation studies show that our new estimator is clearly better than the pseudo-least squares estimator in estimation accuracy, at a small computational cost. An application example on immune cell kinetics and trafficking for influenza infection further illustrates the benefits of the proposed new method.
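The smoothing-based two-stage idea that the new method improves on can be sketched as follows. Stage 1 smooths the noisy trajectory with local polynomial regression to estimate x(t) and x'(t). Stage 2 fits the ODE parameter by least squares on those estimates. This is the unconstrained baseline only, not the authors' constrained estimator. The decay model x' = −θx, the Gaussian kernel, the bandwidth and the simulated data are all hypothetical choices.

```python
import numpy as np

def local_quadratic(t, y, bandwidth):
    """Gaussian-kernel local quadratic regression. At each t0 it fits
    y ~ b0 + b1*(t - t0) + b2*(t - t0)^2, so b0 estimates x(t0) and
    b1 estimates the derivative x'(t0)."""
    level = np.empty(len(t))
    slope = np.empty(len(t))
    for i, t0 in enumerate(t):
        d = t - t0
        sw = np.sqrt(np.exp(-0.5 * (d / bandwidth) ** 2))  # sqrt of kernel weights
        design = np.vander(d, 3, increasing=True)          # columns: 1, d, d^2
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        level[i], slope[i] = coef[0], coef[1]
    return level, slope

def two_stage_decay_rate(t, y, bandwidth):
    """Stage 2: least squares for theta in x'(t) = -theta * x(t),
    using the smoothed level and derivative from stage 1."""
    level, slope = local_quadratic(t, y, bandwidth)
    return -np.dot(slope, level) / np.dot(level, level)

t = np.linspace(0.0, 5.0, 101)
rng = np.random.default_rng(0)
y = np.exp(-0.5 * t) + rng.normal(scale=0.01, size=t.size)  # true theta = 0.5
theta_hat = two_stage_decay_rate(t, y, bandwidth=0.3)
print(f"theta_hat = {theta_hat:.3f}")
```

The bandwidth trades smoothing bias in the derivative against noise; the abstract's approach instead builds the ODE constraint into the local polynomial fit.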
Are predictive equations for estimating resting energy expenditure accurate in Asian Indian male weightlifters?

Directory of Open Access Journals (Sweden)

Mini Joseph

2017-01-01

Background: The accuracy of existing predictive equations for determining the resting energy expenditure (REE) of professional weightlifters has scarcely been studied. Our study aimed to assess the REE of male Asian Indian weightlifters with indirect calorimetry and to compare the measured REE (mREE) with published equations. A new equation using potential anthropometric variables to predict REE was also evaluated. Materials and Methods: REE was measured in 30 male professional weightlifters aged between 17 and 28 years using indirect calorimetry and compared with the predictions of eight formulas: Harris–Benedict, Mifflin–St. Jeor, FAO/WHO/UNU, ICMR, Cunningham, Owen, Katch–McArdle, and Nelson. Pearson correlation coefficient, intraclass correlation coefficient, and multiple linear regression analysis were used to study the agreement between the different methods and the association with anthropometric variables, and to formulate a new prediction equation for this population. Results: Pearson correlation coefficients between mREE and the anthropometric variables were significantly positive for suprailiac skinfold thickness, lean body mass (LBM), waist circumference, hip circumference, bone mineral mass, and body mass. All eight predictive equations underestimated the REE of the weightlifters when compared with the mREE. The highest mean difference was 636 kcal/day (Owen, 1986) and the lowest was 375 kcal/day (Cunningham, 1980). Stepwise multiple linear regression showed that LBM was the only significant determinant of REE in this group of sportspersons. A new equation using LBM as the independent variable for calculating REE was computed: REE for weightlifters = −164.065 + 0.039 × LBM (confidence interval [−1122.984, 794.854]). This new equation reduced the mean difference from mREE to 2.36 ± 369.15 kcal/day (standard error = 67.40). Conclusion: The significant finding of this study was that all the prediction equations
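Three of the published equations compared above have widely cited forms: Mifflin–St. Jeor for men, 10 × weight (kg) + 6.25 × height (cm) − 5 × age (years) + 5; Cunningham, 500 + 22 × LBM (kg); and Katch–McArdle, 370 + 21.6 × LBM (kg). The sketch below evaluates them for a hypothetical athlete. The study's own LBM equation is not reproduced because the abstract does not state its LBM unit.

```python
def mifflin_st_jeor_male(weight_kg, height_cm, age_years):
    """Mifflin-St Jeor REE (kcal/day) for men."""
    return 10 * weight_kg + 6.25 * height_cm - 5 * age_years + 5

def cunningham(lbm_kg):
    """Cunningham REE (kcal/day) from lean body mass."""
    return 500 + 22 * lbm_kg

def katch_mcardle(lbm_kg):
    """Katch-McArdle REE (kcal/day) from lean body mass."""
    return 370 + 21.6 * lbm_kg

# Hypothetical weightlifter: 80 kg, 175 cm, 22 years, 70 kg LBM
predictions = {
    "Mifflin-St Jeor": mifflin_st_jeor_male(80, 175, 22),
    "Cunningham": cunningham(70),
    "Katch-McArdle": katch_mcardle(70),
}
for name, ree in predictions.items():
    print(f"{name}: {ree:.0f} kcal/day")
```

The two LBM-based equations give higher values than the weight-based one for this lean athlete, consistent with LBM being the key determinant reported in the abstract.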
Equations for predicting biomass of six introduced tree species, island of Hawaii

Science.gov (United States)

Thomas H. Schubert; Robert F. Strand; Thomas G. Cole; Katharine E. McDuffie

1988-01-01

Regression equations to predict total and stem-only above-ground dry biomass for six species (Acacia melanoxylon, Albizia falcataria, Eucalyptus globulus, E. grandis, E. robusta, and E. urophylla) were developed by felling and measuring 2- to 6-year-old...
Poisson Mixture Regression Models for Heart Disease Prediction.

Science.gov (United States)

Mufudza, Chipo; Erol, Hamza

2016-01-01

Early heart disease control can be achieved through efficient disease prediction and diagnosis. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Poisson mixture regression models are analysed and applied under two classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary generalized linear Poisson regression model, owing to its lower Bayesian Information Criterion (BIC) value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model overall for heart disease prediction, as it both clusters individuals into high- or low-risk categories and predicts the heart disease rate within each component, given the clusters. It is concluded that heart disease prediction can be done effectively by identifying the major risks componentwise using a Poisson mixture regression model. PMID:27999611
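Model comparison by BIC, as in the abstract, can be illustrated with an intercept-only two-component Poisson mixture fitted by expectation-maximization (EM) and compared with a single Poisson model. This is a deliberately simplified sketch: the paper's models include covariates and concomitant variables, which are omitted here, and the simulated counts are hypothetical. Lower BIC = k·ln(n) − 2·ln(L) is preferred.

```python
import numpy as np
from math import lgamma

def _log_joint(y, logfact, weights, rates):
    # log(pi_k) + log Poisson(y | lambda_k) for every observation and component
    return np.log(weights) + y[:, None] * np.log(rates) - rates - logfact[:, None]

def _log_likelihood(logp):
    m = logp.max(axis=1, keepdims=True)      # log-sum-exp for stability
    return float(np.sum(m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))))

def fit_two_poisson_mixture(counts, n_iter=200):
    """EM for an intercept-only 2-component Poisson mixture.
    Returns mixing weights, component rates and BIC."""
    y = np.asarray(counts, dtype=float)
    logfact = np.array([lgamma(v + 1.0) for v in y])
    weights = np.array([0.5, 0.5])
    rates = np.percentile(y, [25, 75]) + 0.5   # crude, distinct starting rates
    for _ in range(n_iter):
        logp = _log_joint(y, logfact, weights, rates)
        resp = np.exp(logp - logp.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)        # E-step
        nk = resp.sum(axis=0)
        weights = nk / len(y)                          # M-step: weights
        rates = (resp * y[:, None]).sum(axis=0) / nk   # M-step: rates
    loglik = _log_likelihood(_log_joint(y, logfact, weights, rates))
    bic = 3 * np.log(len(y)) - 2 * loglik              # 2 rates + 1 free weight
    return weights, rates, bic

def single_poisson_bic(counts):
    y = np.asarray(counts, dtype=float)
    lam = y.mean()
    loglik = np.sum(y * np.log(lam) - lam - np.array([lgamma(v + 1.0) for v in y]))
    return np.log(len(y)) - 2 * loglik                 # 1 parameter

rng = np.random.default_rng(0)
counts = np.concatenate([rng.poisson(2.0, 1000), rng.poisson(10.0, 1000)])
weights, rates, bic_mix = fit_two_poisson_mixture(counts)
bic_single = single_poisson_bic(counts)
print(f"rates {np.sort(rates)}, BIC mixture {bic_mix:.0f} vs single {bic_single:.0f}")
```

On data that really come from two subpopulations, the mixture's lower BIC outweighs its extra parameters; on homogeneous counts the single Poisson model would usually win.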
Adjustment of equations to predict the metabolizable energy of corn for meat type quails

Directory of Open Access Journals (Sweden)

Tiago Junior Pasquetti

2015-08-01

The metabolizable energy (ME) determination for foods used in quail diets, through metabolism assays, takes time, infrastructure and financial resources, which makes the development of prediction equations based on the proximate composition of foods to estimate ME values of particular interest. The objective of this study was to adjust prediction equations for the metabolizable energy (ME) of corn for quail. The chemical compositions of 12 maize varieties were determined, and a metabolism assay was carried out to determine the apparent metabolizable energy (AME) and nitrogen-corrected apparent metabolizable energy (AMEn) of these corn varieties. The values of chemical composition, AME and AMEn, converted to a dry-matter basis, were used to adjust the prediction equations. The initial simple and multiple linear regression adjustment of AME and AMEn was performed using crude protein (CP), ether extract (EE), neutral (NDF) and acid (ADF) detergent fiber, mineral matter (MM), calcium (Ca) and phosphorus (P) as regressors (full model). To adjust the prediction equations, simple and multiple linear regression with backward elimination was used. Ten prediction equations were adjusted, five for AME and five for AMEn, with R² values ranging from 0.20 to 0.75 and from 0.21 to 0.78, respectively. For all adjusted equations, negative correlations for MM were observed, which may be related to its dilutive effect on the gross energy contained in corn. In conclusion, the equations that showed the best fit were AME = 5605.46 − 385.074CP + 111.648EE + 48.1133NDF + 303.924ADF − 929.931MM (R² = 0.75) and AMEn = 5878.16 − 403.937CP + 81.9618EE + 41.8954NDF + 303.506ADF − 901.621MM (R² = 0.78).
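The two best-fitting equations can be applied directly to a corn sample's proximate composition. In this sketch the composition inputs are assumed to be percentages on a dry-matter basis and the output is assumed to be kcal/kg, since the abstract states neither unit; the sample composition is hypothetical.

```python
def corn_ame_quail(cp, ee, ndf, adf, mm):
    """AME of corn for quail (dry-matter basis) from the abstract."""
    return (5605.46 - 385.074 * cp + 111.648 * ee + 48.1133 * ndf
            + 303.924 * adf - 929.931 * mm)

def corn_amen_quail(cp, ee, ndf, adf, mm):
    """Nitrogen-corrected AME (AMEn) of corn for quail from the abstract."""
    return (5878.16 - 403.937 * cp + 81.9618 * ee + 41.8954 * ndf
            + 303.506 * adf - 901.621 * mm)

# Hypothetical corn sample (% of dry matter)
sample = dict(cp=8.0, ee=4.0, ndf=12.0, adf=3.0, mm=1.3)
ame = corn_ame_quail(**sample)
amen = corn_amen_quail(**sample)
print(f"AME {ame:.0f}, AMEn {amen:.0f}")
```

Because the equations were fitted to only 12 maize varieties, compositions far outside that range should be treated with caution.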
Modeling a Predictive Energy Equation Specific for Maintenance Hemodialysis.

Science.gov (United States)

Byham-Gray, Laura D; Parrott, J Scott; Peters, Emily N; Fogerite, Susan Gould; Hand, Rosa K; Ahrens, Sean; Marcus, Andrea Fleisch; Fiutem, Justin J

2017-03-01

Hypermetabolism is theorized in patients diagnosed with chronic kidney disease who are receiving maintenance hemodialysis (MHD). We aimed to distinguish key disease-specific determinants of resting energy expenditure to create a predictive energy equation that more precisely establishes energy needs with the intent of preventing protein-energy wasting. For this 3-year multisite cross-sectional study (N = 116), eligible participants were diagnosed with chronic kidney disease and were receiving MHD for at least 3 months. Predictors for the model included weight, sex, age, C-reactive protein (CRP), glycosylated hemoglobin, and serum creatinine. The outcome variable was measured resting energy expenditure (mREE). Regression modeling was used to generate predictive formulas, and Bland-Altman analyses were used to evaluate accuracy. The majority were male (60.3%), black (81.0%), and non-Hispanic (76.7%), and 23% were ≥65 years old. After screening for multicollinearity, the best predictive model of mREE (R² = 0.67) included weight, age, sex, and CRP. Two alternative models with acceptable predictability (R² = 0.66) were derived with glycosylated hemoglobin or serum creatinine. Based on Bland-Altman analyses, the maintenance hemodialysis equation that included CRP had the best precision, with the highest proportion of participants' predicted energy expenditure classified as accurate (61.2%) and with the lowest number of individuals with underestimation or overestimation. This study confirms disease-specific factors as key determinants of mREE in patients on MHD and provides a preliminary predictive energy equation. Further prospective research is necessary to test the reliability and validity of this equation across diverse populations of patients who are receiving MHD.
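The Bland-Altman evaluation and the "accurate / under / over" classification can be sketched as follows. The ±10% accuracy band is an assumption, since the abstract does not state the cut-off used, and the REE values are hypothetical.

```python
import statistics

def bland_altman(measured, predicted):
    """Mean bias and 95% limits of agreement (bias +/- 1.96 SD of the differences)."""
    diffs = [p - m for m, p in zip(measured, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def accuracy_shares(measured, predicted, tol=0.10):
    """Fractions under-predicted, accurate (within +/- tol of measured), and over-predicted."""
    n = len(measured)
    under = sum(1 for m, p in zip(measured, predicted) if p < (1 - tol) * m)
    over = sum(1 for m, p in zip(measured, predicted) if p > (1 + tol) * m)
    return under / n, (n - under - over) / n, over / n

# Hypothetical measured and predicted REE values in kcal/day
mree = [1450, 1620, 1710, 1380, 1900]
pred = [1500, 1580, 1820, 1200, 1950]
print(bland_altman(mree, pred))
print(accuracy_shares(mree, pred))
```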
Prediction equations for spirometry in four- to six-year-old children.

Science.gov (United States)

França, Danielle Corrêa; Camargos, Paulo Augusto Moreira; Jones, Marcus Herbert; Martins, Jocimar Avelar; Vieira, Bruna da Silva Pinto Pinheiro; Colosimo, Enrico Antônio; de Mendonça, Karla Morganna Pereira Pinto; Borja, Raíssa de Oliveira; Britto, Raquel Rodrigues; Parreira, Verônica Franco

2016-01-01

To generate prediction equations for spirometry in 4- to 6-year-old children. Forced vital capacity (FVC), forced expiratory volume in 0.5 s (FEV0.5), forced expiratory volume in one second (FEV1), peak expiratory flow (PEF), and forced expiratory flow at 25-75% of the forced vital capacity (FEF25-75) were assessed in 195 healthy children residing in the town of Sete Lagoas, state of Minas Gerais, Southeastern Brazil. The least squares method was used to derive the prediction equations. The level of significance was set at p<0.05. Overall, 85% of the children succeeded in performing the spirometric maneuvers. Height was the single predictor of the spirometric variables in the prediction equations, as follows: FVC = exp(-2.255 + 0.022 × height), FEV0.5 = exp(-2.288 + 0.019 × height), FEV1 = exp(-2.767 + 0.026 × height), PEF = exp(-2.908 + 0.019 × height), and FEF25-75 = exp(-1.404 + 0.016 × height). Neither age nor weight influenced the regression equations. No significant differences in predicted values were observed between boys and girls. The predicted values obtained in the present study are comparable to those reported for preschoolers from both Brazil and other countries. Copyright © 2016 Sociedade Brasileira de Pediatria. Published by Elsevier Editora Ltda. All rights reserved.
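The five height-only equations share the form exp(a + b × height), so a single table-driven function can evaluate them. The following is a sketch under the assumption that height is in cm. The abstract does not state the units of the inputs or outputs.

```python
import math

# (a, b) pairs for value = exp(a + b * height); height assumed to be in cm
SPIRO_COEFFS = {
    "FVC": (-2.255, 0.022),
    "FEV0.5": (-2.288, 0.019),
    "FEV1": (-2.767, 0.026),
    "PEF": (-2.908, 0.019),
    "FEF25-75": (-1.404, 0.016),
}

def predict_spirometry(height_cm):
    """Predicted spirometric values for a 4- to 6-year-old child from height alone."""
    return {name: math.exp(a + b * height_cm) for name, (a, b) in SPIRO_COEFFS.items()}

for name, value in predict_spirometry(110.0).items():
    print(f"{name}: {value:.2f}")
```

Because the model is exponential in height, each additional unit of height multiplies the predicted FVC by exp(0.022), which is about 2.2% per unit.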
Prediction equations for spirometry in adults from northern India.

Science.gov (United States)

Chhabra, S K; Kumar, R; Gupta, U; Rahman, M; Dash, D J

2014-01-01

Most of the Indian studies on prediction equations for spirometry in adults are several decades old and may have lost their utility as these were carried out with equipment and standardisation protocols that have since changed. Their validity is further questionable as the lung health of the population is likely to have changed over time. To develop prediction equations for spirometry in adults of north Indian origin using the 2005 American Thoracic Society/European Respiratory Society (ATS/ERS) recommendations on standardisation. Normal healthy non-smoker subjects, both males and females, aged 18 years and above underwent spirometry using a non-heated Fleisch Pneumotach spirometer calibrated daily. The dataset was randomly divided into training (70%) and test (30%) sets and the former was used to develop the equations. These were validated on the test data set. Prediction equations were developed separately for males and females for forced vital capacity (FVC), forced expiratory volume in first second (FEV1), FEV1/FVC ratio, and instantaneous expiratory flow rates using multiple linear regression procedure with different transformations of dependent and/or independent variables to achieve the best-fitting models for the data. The equations were compared with the previous ones developed in the same population in the 1960s. In all, 685 (489 males, 196 females) subjects performed spirometry that was technically acceptable and repeatable. All the spirometry parameters were significantly higher among males except the FEV1/FVC ratio that was significantly higher in females. Overall, age had a negative relationship with the spirometry parameters while height was positively correlated with each, except for the FEV1/FVC ratio that was related only to age. Weight was included in the models for FVC, forced expiratory flow (FEF75) and FEV1/FVC ratio in males, but its contribution was very small. Standard errors of estimate were provided to enable calculation of the lower
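The 70/30 development and validation workflow can be sketched with ordinary least squares. The data below are synthetic: the FVC relationship to height and age is invented for illustration, and the study's variable transformations are not reproduced. NumPy is assumed to be available.

```python
import numpy as np

def fit_and_validate(X, y, train_frac=0.7, seed=0):
    """Random train/test split, OLS with intercept on the training rows, RMSE on the test rows."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(round(train_frac * len(y)))
    train, test = idx[:n_train], idx[n_train:]
    A = np.column_stack([np.ones(len(y)), X])  # intercept column + predictors
    coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    resid = y[test] - A[test] @ coef
    return coef, float(np.sqrt(np.mean(resid ** 2)))

# Hypothetical adults: FVC (L) generated from height (cm) and age (years)
rng = np.random.default_rng(1)
height = rng.uniform(150.0, 185.0, 200)
age = rng.uniform(18.0, 70.0, 200)
X = np.column_stack([height, age])
fvc = -4.0 + 0.05 * height - 0.02 * age + rng.normal(0.0, 0.3, 200)
coef, rmse = fit_and_validate(X, fvc)
print(np.round(coef, 3), round(rmse, 3))
```

Fitting separate models for males and females, as the study did, would simply mean calling the function once per subset.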
Family differences in equations for predicting biomass and leaf area in Douglas-fir (Pseudotsuga menziesii var. menziesii).

Science.gov (United States)

J.B. St. Clair

1993-01-01

Logarithmic regression equations were developed to predict component biomass and leaf area for an 18-yr-old genetic test of Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco var. menziesii) based on stem diameter or cross-sectional sapwood area. Equations did not differ among open-pollinated families in slope, but intercepts...
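The abstract states that the slopes did not differ among families. One common way to encode a shared slope with a separate intercept per family in a log-log (allometric) regression is to use dummy variables, as in the following sketch. This is not necessarily the author's procedure, and the data are synthetic.

```python
import numpy as np

def fit_common_slope(diameter, biomass, family):
    """OLS fit of ln(biomass) = a_family + b * ln(diameter): one shared slope, one intercept per family."""
    fams = sorted(set(family))
    dummies = np.array([[1.0 if f == g else 0.0 for g in fams] for f in family])
    A = np.column_stack([np.log(diameter), dummies])
    coef, *_ = np.linalg.lstsq(A, np.log(biomass), rcond=None)
    return coef[0], dict(zip(fams, coef[1:]))

# Synthetic trees from two hypothetical families sharing slope 2.3
d = np.array([5.0, 8.0, 12.0, 15.0, 6.0, 9.0, 13.0, 16.0])
fam = ["A"] * 4 + ["B"] * 4
true_int = {"A": 0.5, "B": 0.8}
mass = np.array([np.exp(true_int[f]) * di ** 2.3 for f, di in zip(fam, d)])
slope, intercepts = fit_common_slope(d, mass, fam)
print(round(float(slope), 3), {k: round(float(v), 3) for k, v in intercepts.items()})
```

Back-transforming exp(a) · d^b from the log scale estimates a median rather than a mean. For this reason, a log-bias correction factor is often applied in biomass work.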
Prediction of hearing outcomes by multiple regression analysis in patients with idiopathic sudden sensorineural hearing loss.

Science.gov (United States)

Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki

2014-12-01

This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Values of HLpost predicted by the multiple regression equation were reliable with 70% probability within a 40-dB-wide prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.
Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis.

Science.gov (United States)

Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon

2015-01-01

Unwanted pregnancy, that is, pregnancy not intended by at least one of the parents, has undesirable consequences for the family and for society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers attending health centers in Khorramabad, Iran, in 2012 were selected by stratified and cluster sampling. Relevant variables were measured, and logistic regression, discriminant analysis, and probit regression models were fitted in SPSS version 21 to predict unwanted pregnancy. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, the logistic regression model is recommended if the researcher is interested in the interpretability of the results.
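The ROC-based comparison can be illustrated with the rank (Mann-Whitney) form of the AUC and the two link functions. This sketch uses hypothetical linear-predictor values and is not the study's SPSS output. When the linear predictor is the same, the logistic and probit links rank subjects identically, so their AUCs coincide. Differences such as 0.735 versus 0.733 therefore come from the differently fitted coefficients.

```python
import math

def logistic(eta):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit(eta):
    """Inverse probit link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def auc(scores, labels):
    """ROC AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical linear predictors and outcomes (1 = unwanted pregnancy)
eta = [-1.2, -0.4, 0.3, 0.9, -0.8, 1.5]
y = [0, 1, 0, 1, 0, 1]
print(auc([logistic(e) for e in eta], y), auc([probit(e) for e in eta], y))
```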
Updated logistic regression equations for the calculation of post-fire debris-flow likelihood in the western United States

Science.gov (United States)

Staley, Dennis M.; Negri, Jacquelyn A.; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.

2016-06-30

Wildfire can significantly alter the hydrologic response of a watershed to the extent that even modest rainstorms can generate dangerous flash floods and debris flows. To reduce public exposure to hazard, the U.S. Geological Survey produces post-fire debris-flow hazard assessments for select fires in the western United States. We use publicly available geospatial data describing basin morphology, burn severity, soil properties, and rainfall characteristics to estimate the statistical likelihood that debris flows will occur in response to a storm of a given rainfall intensity. Using an empirical database and refined geospatial analysis methods, we defined new equations for the prediction of debris-flow likelihood using logistic regression methods. We showed that the new logistic regression model outperformed previous models used to predict debris-flow likelihood.
Multi-fidelity Gaussian process regression for prediction of random fields

International Nuclear Information System (INIS)

Parussini, L.; Venturi, D.; Perdikaris, P.; Karniadakis, G.E.

2017-01-01

We propose a new multi-fidelity Gaussian process regression (GPR) approach for prediction of random fields based on observations of surrogate models or hierarchies of surrogate models. Our method builds upon recent work on recursive Bayesian techniques, in particular recursive co-kriging, and extends it to vector-valued fields and various types of covariances, including separable and non-separable ones. The framework we propose is general and can be used to perform uncertainty propagation and quantification in model-based simulations, multi-fidelity data fusion, and surrogate-based optimization. We demonstrate the effectiveness of the proposed recursive GPR techniques through various examples. Specifically, we study the stochastic Burgers equation and the stochastic Oberbeck–Boussinesq equations describing natural convection within a square enclosure. In both cases we find that the standard deviation of the Gaussian predictors as well as the absolute errors relative to benchmark stochastic solutions are very small, suggesting that the proposed multi-fidelity GPR approaches can yield highly accurate results.
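The idea of correcting a cheap model with a few expensive samples can be sketched in one dimension. This is a simplified two-level scheme in the spirit of recursive co-kriging, with a fixed RBF kernel, a least-squares scale factor rho and a GP on the discrepancy. It is not the authors' method and it omits hyperparameter estimation and predictive variances. The functions and data are hypothetical.

```python
import numpy as np

def rbf(a, b, length=0.2, var=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_mean(x_train, y_train, x_test, noise=1e-6, length=0.2):
    """Posterior mean of a zero-mean GP with an RBF kernel and small observation noise."""
    K = rbf(x_train, x_train, length) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf(x_test, x_train, length) @ alpha

def two_level_predict(x_lo, y_lo, x_hi, y_hi, x_test):
    """High-fidelity prediction as rho * (low-fidelity GP) + (GP fitted to the discrepancy)."""
    lo_at_hi = gp_mean(x_lo, y_lo, x_hi)
    rho = float(lo_at_hi @ y_hi / (lo_at_hi @ lo_at_hi))  # least-squares scale factor
    delta = y_hi - rho * lo_at_hi
    return rho * gp_mean(x_lo, y_lo, x_test) + gp_mean(x_hi, delta, x_test)

# Hypothetical 1-D example: many cheap samples, a few expensive ones
x_lo = np.linspace(0.0, 1.0, 8)
y_lo = np.sin(2 * np.pi * x_lo)
x_hi = x_lo[::2]
y_hi = 2.0 * np.sin(2 * np.pi * x_hi) + 0.3 * x_hi
x_test = np.linspace(0.0, 1.0, 5)
print(two_level_predict(x_lo, y_lo, x_hi, y_hi, x_test))
```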
Multi-fidelity Gaussian process regression for prediction of random fields

Energy Technology Data Exchange (ETDEWEB)

Parussini, L. [Department of Engineering and Architecture, University of Trieste (Italy); Venturi, D., E-mail: venturi@ucsc.edu [Department of Applied Mathematics and Statistics, University of California Santa Cruz (United States); Perdikaris, P. [Department of Mechanical Engineering, Massachusetts Institute of Technology (United States); Karniadakis, G.E. [Division of Applied Mathematics, Brown University (United States)

2017-05-01

We propose a new multi-fidelity Gaussian process regression (GPR) approach for prediction of random fields based on observations of surrogate models or hierarchies of surrogate models. Our method builds upon recent work on recursive Bayesian techniques, in particular recursive co-kriging, and extends it to vector-valued fields and various types of covariances, including separable and non-separable ones. The framework we propose is general and can be used to perform uncertainty propagation and quantification in model-based simulations, multi-fidelity data fusion, and surrogate-based optimization. We demonstrate the effectiveness of the proposed recursive GPR techniques through various examples. Specifically, we study the stochastic Burgers equation and the stochastic Oberbeck–Boussinesq equations describing natural convection within a square enclosure. In both cases we find that the standard deviation of the Gaussian predictors as well as the absolute errors relative to benchmark stochastic solutions are very small, suggesting that the proposed multi-fidelity GPR approaches can yield highly accurate results.
Multiple regression and beyond an introduction to multiple regression and structural equation modeling

CERN Document Server

Keith, Timothy Z

2014-01-01

Multiple Regression and Beyond offers a conceptually oriented introduction to multiple regression (MR) analysis and structural equation modeling (SEM), along with analyses that flow naturally from those methods. By focusing on the concepts and purposes of MR and related methods, rather than the derivation and calculation of formulae, this book introduces material to students more clearly, and in a less threatening way. In addition to illuminating content necessary for coursework, the accessibility of this approach means students are more likely to be able to conduct research using MR or SEM, and more likely to use the methods wisely. The book covers both MR and SEM, while explaining their relevance to one another. It also includes path analysis, confirmatory factor analysis, and latent growth modeling. Figures and tables throughout provide examples and illustrate key concepts and techniques. For additional resources, please visit: http://tzkeith.com/.
Real estate value prediction using multivariate regression models

Science.gov (United States)

Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

2017-11-01

The real estate market is one of the most competitive in terms of pricing, and prices tend to vary significantly with many factors. This makes it one of the prime fields for applying machine learning concepts to optimize and predict prices with high accuracy. In this paper, we therefore present several important features to use when predicting housing prices with good accuracy. We describe regression models that use various features to achieve a lower residual sum of squares (RSS). Some feature engineering is required for better prediction when features are used in a regression model. A set of features (multiple regression) or polynomial regression (applying various powers to the features) is often used to obtain a better model fit. Because these models are expected to be susceptible to overfitting, ridge regression is used to reduce it. This paper thus points to the best application of regression models, in addition to other techniques, to optimize the result.
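Polynomial features combined with ridge regression can be sketched in closed form. The feature and prices are synthetic and assumed to be already standardised, and the λ values are arbitrary. As λ grows, the coefficient norm shrinks while the training RSS rises. This is the trade-off ridge regression makes to curb overfitting.

```python
import numpy as np

def poly_features(x, degree):
    """Columns x, x^2, ..., x^degree (no constant; the intercept is handled separately)."""
    return np.column_stack([x ** p for p in range(1, degree + 1)])

def ridge_fit(X, y, lam):
    """Closed-form ridge regression with an unpenalised intercept (data are centred first)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def rss(X, y, w, b):
    """Residual sum of squares of the fitted model."""
    r = y - (X @ w + b)
    return float(r @ r)

# Hypothetical standardised feature (e.g. scaled floor area) and noisy prices
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0.0, 0.1, x.size)
X = poly_features(x, 5)
for lam in (1e-4, 1e-2, 1.0):
    w, b = ridge_fit(X, y, lam)
    print(lam, round(rss(X, y, w, b), 4), round(float(np.linalg.norm(w)), 3))
```

In practice, λ would be chosen by cross-validation on held-out data rather than by training RSS.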
Development and validation of a predictive equation for lean body mass in children and adolescents.

Science.gov (United States)

Foster, Bethany J; Platt, Robert W; Zemel, Babette S

2012-05-01

Lean body mass (LBM) is not easy to measure directly in the field or clinical setting. Equations to predict LBM from simple anthropometric measures, which account for the differing contributions of fat and lean to body weight at different ages and levels of adiposity, would be useful to both human biologists and clinicians. To develop and validate equations to predict LBM in children and adolescents across the entire range of the adiposity spectrum. Dual energy X-ray absorptiometry was used to measure LBM in 836 healthy children (437 females) and linear regression was used to develop sex-specific equations to estimate LBM from height, weight, age, body mass index (BMI) for age z-score and population ancestry. Equations were validated using bootstrapping methods and in a local independent sample of 332 children and in national data collected by NHANES. The mean difference between measured and predicted LBM was -0.12% (95% limits of agreement -11.3% to 8.5%) for males and -0.14% (-11.9% to 10.9%) for females. Equations performed equally well across the entire adiposity spectrum, as estimated by BMI z-score. Validation indicated no over-fitting. LBM was predicted within 5% of measured LBM in the validation sample. The equations estimate LBM accurately from simple anthropometric measures.
Gaussian Process Regression for WDM System Performance Prediction

DEFF Research Database (Denmark)

Wass, Jesper; Thrane, Jakob; Piels, Molly

2017-01-01

Gaussian process regression is numerically and experimentally investigated to predict the bit error rate of a 24 x 28 GBd QPSK WDM system. The proposed method produces accurate predictions from multi-dimensional and sparse measurement data.
A comparison of random forest regression and multiple linear regression for prediction in neuroscience.

Science.gov (United States)

Smith, Paul F; Ganesh, Siva; Liu, Ping

2013-10-30

Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR) in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the L-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, L-arginine, L-ornithine, L-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R² values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R² values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.

Dynamic prediction of cumulative incidence functions by direct binomial regression.

Science.gov (United States)

Grand, Mia K; de Witte, Theo J M; Putter, Hein

2018-03-25

In recent years there has been a series of advances in the field of dynamic prediction. Among these is the development of methods for dynamic prediction of the cumulative incidence function in a competing risks setting. These models enable predictions to be updated as time progresses and more information becomes available. For example, when a patient comes back for a follow-up visit after completing a year of treatment, the risks of death and adverse events may have changed since treatment initiation. One approach to modeling the cumulative incidence function in competing risks is direct binomial regression, where right censoring of the event times is handled by inverse probability of censoring weights. We extend this approach by combining it with landmarking to enable dynamic prediction of the cumulative incidence function. The proposed models are very flexible, as they allow the covariates to have complex time-varying effects, and we illustrate how to investigate possible time-varying structures using Wald tests. The models are fitted using generalized estimating equations. The method is applied to bone marrow transplant data, and its performance is investigated in a simulation study. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
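The inverse probability of censoring weighting step behind direct binomial regression can be sketched without covariates. This sketch omits landmarking and the GEE fit, and it uses a Kaplan-Meier estimate of the censoring distribution G. Subjects censored before the horizon τ get weight 0. Subjects with an observed event get weight 1/G(T−). Subjects still at risk get weight 1/G(τ). The toy data are hypothetical, and ties between events and censorings are handled simply.

```python
def km_censoring(times, status):
    """Kaplan-Meier steps (time, G) for the censoring distribution; status 0 = censored."""
    steps, g = [], 1.0
    for c in sorted({t for t, s in zip(times, status) if s == 0}):
        at_risk = sum(1 for t in times if t >= c)
        n_cens = sum(1 for t, s in zip(times, status) if t == c and s == 0)
        g *= 1.0 - n_cens / at_risk
        steps.append((c, g))
    return steps

def g_value(steps, t, inclusive):
    """G(t) if inclusive, else the left limit G(t-)."""
    val = 1.0
    for c, gc in steps:
        if c < t or (inclusive and c == t):
            val = gc
    return val

def ipcw_outcomes(times, status, tau, cause=1):
    """Pairs (I(T <= tau, cause), weight) for a binomial model at horizon tau."""
    steps = km_censoring(times, status)
    out = []
    for t, s in zip(times, status):
        if t > tau:
            out.append((0, 1.0 / g_value(steps, tau, inclusive=True)))
        elif s != 0:
            out.append((int(s == cause), 1.0 / g_value(steps, t, inclusive=False)))
        else:
            out.append((0, 0.0))  # censored before tau: status at tau unknown
    return out

def cif_estimate(times, status, tau, cause=1):
    """Intercept-only IPCW estimate of the cumulative incidence F_cause(tau)."""
    rows = ipcw_outcomes(times, status, tau, cause)
    return sum(y * w for y, w in rows) / len(rows)

# Hypothetical data: status 0 = censored, 1 = cause of interest, 2 = competing cause
print(cif_estimate([1, 2, 3, 4], [1, 0, 1, 2], tau=3.5))
```

In the regression version, these weighted binary outcomes would be modeled on covariates (for example, with a logit link) at each landmark time.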
Reference equation for prediction of a total distance during six-minute walk test using Indonesian anthropometrics.

Science.gov (United States)

Nusdwinuringtyas, Nury; Widjajalaksmi; Yunus, Faisal; Alwi, Idrus

2014-04-01

To develop a reference equation for predicting the total six-minute walk distance from the anthropometrics of sedentary healthy Indonesian subjects, and to compare the resulting predictions with those calculated by the Caucasian-based Enright prediction equation. This cross-sectional study was conducted among 123 healthy Indonesian adults with a sedentary lifestyle (58 male and 65 female subjects aged 18 to 50 years). Heart rate was recorded with a Polar heart rate monitor, with the target in the submaximal zone (120-170 beats per minute). The subjects performed two six-minute walk tests: the first on a 15-meter track according to the protocol developed by the investigator, and the second on a Biodex® gait trainer as the gold standard. The average total distance was 547 ± 54.24 m, not significantly different from the gold standard of 544.72 ± 54.11 m (p>0.05). Multiple regression analysis was performed to develop the new equation. The reference equation for predicting the total distance from Indonesian anthropometrics is more applicable in Indonesia.
A Hybrid Ground-Motion Prediction Equation for Earthquakes in Western Alberta

Science.gov (United States)

Spriggs, N.; Yenier, E.; Law, A.; Moores, A. O.

2015-12-01

Estimation of ground-motion amplitudes that may be produced by future earthquakes constitutes the foundation of seismic hazard assessment and earthquake-resistant structural design. This is typically done by using a prediction equation that quantifies amplitudes as a function of key seismological variables such as magnitude, distance and site condition. In this study, we develop a hybrid empirical prediction equation for earthquakes in western Alberta, where evaluation of seismic hazard associated with induced seismicity is of particular interest. We use peak ground motions and response spectra from recorded seismic events to model the regional source and attenuation attributes. The available empirical data are limited in the magnitude range of engineering interest (M>4). Therefore, we combine empirical data with a simulation-based model in order to obtain seismologically informed predictions for moderate-to-large magnitude events. The methodology has two stages. First, we investigate the shape of geometrical spreading in Alberta. We supplement the seismic data with ground motions obtained from mining/quarry blasts, in order to gain insights into the regional attenuation over a wide distance range. A comparison of ground-motion amplitudes for earthquakes and mining/quarry blasts shows that both event types decay at similar rates with distance and demonstrate a significant Moho-bounce effect. In the second stage, we calibrate the source and attenuation parameters of a simulation-based prediction equation to match the available amplitude data from seismic events. We model the geometrical spreading using a trilinear function with attenuation rates obtained from the first stage, and calculate coefficients of anelastic attenuation and site amplification via regression analysis. This provides a hybrid ground-motion prediction equation that is calibrated for observed motions in western Alberta and is applicable to moderate-to-large magnitude events.
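A continuous trilinear geometric-spreading term plus an anelastic decay term can be sketched as follows. The hinge distances, slopes and γ are placeholders, not the study's calibrated values. A shallower (here slightly positive) middle segment is one common way to mimic a Moho-bounce plateau.

```python
import math

def trilinear_spreading(r_km, hinges=(70.0, 140.0), rates=(-1.3, 0.2, -0.5)):
    """log10 geometric-spreading term, continuous at both hinges (placeholder values)."""
    r1, r2 = hinges
    b1, b2, b3 = rates
    if r_km <= r1:
        return b1 * math.log10(r_km)
    if r_km <= r2:
        return b1 * math.log10(r1) + b2 * math.log10(r_km / r1)
    return b1 * math.log10(r1) + b2 * math.log10(r2 / r1) + b3 * math.log10(r_km / r2)

def path_term(r_km, gamma=0.001, **spreading_kw):
    """Geometric spreading plus anelastic decay -gamma * R (log10 units; gamma hypothetical)."""
    return trilinear_spreading(r_km, **spreading_kw) - gamma * r_km

for r in (10.0, 70.0, 100.0, 140.0, 300.0):
    print(r, round(path_term(r), 3))
```

In a regression calibration like the one described, the spreading shape would be fixed and γ and the site terms estimated from the residuals.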
Prediction equation for estimating total daily energy requirements of special operations personnel.

Science.gov (United States)

Barringer, N D; Pasiakos, S M; McClung, H L; Crombie, A P; Margolis, L M

2018-01-01

Special Operations Forces (SOF) engage in a variety of military tasks, many of which produce high energy expenditures, leading to undesired energy deficits and loss of body mass. Therefore, the ability to accurately estimate daily energy requirements would be useful for accurate logistical planning. Generate a predictive equation estimating energy requirements of SOF. Retrospective analysis of data collected from SOF personnel engaged in 12 different SOF training scenarios. Energy expenditure and total body water were determined using the doubly-labeled water technique. Physical activity level was determined as daily energy expenditure divided by resting metabolic rate. Physical activity level was broken into quartiles (0 = mission prep, 1 = common warrior tasks, 2 = battle drills, 3 = specialized intense activity) to generate a physical activity factor (PAF). Regression analysis was used to construct two predictive equations (Model A: body mass and PAF; Model B: fat-free mass and PAF) estimating daily energy expenditures. Average measured energy expenditure during SOF training was 4468 (range: 3700 to 6300) kcal·d⁻¹. Regression analysis revealed that physical activity level (r = 0.91; P plan appropriate feeding regimens to meet SOF nutritional requirements across their mission profile.
Expanded prediction equations of human sweat loss and water needs.

Science.gov (United States)

Gonzalez, R R; Cheuvront, S N; Montain, S J; Goodman, D A; Blanchard, L A; Berglund, L G; Sawka, M N

2009-08-01

The Institute of Medicine expressed a need for improved sweating rate ($m_{sw}$) prediction models that calculate hourly and daily water needs based on metabolic rate, clothing, and environment. More than 25 years ago, the original Shapiro prediction equation (OSE) was formulated as $m_{sw}\ (\mathrm{g\,m^{-2}\,h^{-1}}) = 27.9\,E_{req}\,(E_{max})^{-0.455}$, where $E_{req}$ is the required evaporative heat loss and $E_{max}$ is the maximum evaporative power of the environment; OSE was developed for a limited set of environments, exposure times, and clothing systems. Recent evidence shows that OSE often overpredicts fluid needs. Our study developed a corrected OSE and a new $m_{sw}$ prediction equation by using independent data sets from a wide range of environmental conditions, metabolic rates (rest to losses were carefully measured in 101 volunteers (80 males and 21 females; >500 observations) by using a variety of metabolic rates over a range of environmental conditions (ambient temperature, 15-46 °C; water vapor pressure, 0.27-4.45 kPa; wind speed, 0.4-2.5 m/s), clothing, and equipment combinations and durations (2-8 h). Data are expressed as grams per square meter per hour and were analyzed using fuzzy piecewise regression. OSE overpredicted sweating rates (Pdata (21 males and 9 females; >200 observations). OSEC and PW were more accurate predictors of sweating rate (58 and 65% more accurate, Perror (standard error of estimate <100 g·m⁻²·h⁻¹) for conditions both within and outside the original OSE domain of validity. The new equations provide more accurate sweat predictions over a broader range of conditions, with applications to public health, military, occupational, and sports medicine settings.
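The original Shapiro equation can be evaluated directly and scaled to a whole-body hourly loss. This sketch assumes that $E_{req}$ and $E_{max}$ are in W·m⁻² and uses a hypothetical body surface area. The corrected OSE (OSEC) and the piecewise (PW) equations are not given in the abstract, so they are not shown here. Keep in mind that the abstract reports OSE tends to overpredict.

```python
def shapiro_sweat_rate(e_req, e_max):
    """Original Shapiro equation: sweat rate in g·m^-2·h^-1 (E_req, E_max assumed in W·m^-2)."""
    if e_max <= 0:
        raise ValueError("E_max must be positive")
    return 27.9 * e_req * e_max ** -0.455

def hourly_water_loss(e_req, e_max, body_surface_m2):
    """Whole-body sweat loss in g/h: the per-area rate times body surface area."""
    return shapiro_sweat_rate(e_req, e_max) * body_surface_m2

# Hypothetical conditions: E_req = 300, E_max = 400, body surface area 1.9 m^2
print(round(shapiro_sweat_rate(300.0, 400.0), 1), round(hourly_water_loss(300.0, 400.0, 1.9), 1))
```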
Developing A New Predictive Dispersion Equation Based on Tidal Average (TA) Condition in Alluvial Estuaries

Science.gov (United States)

Anak Gisen, Jacqueline Isabella; Nijzink, Remko C.; Savenije, Hubert H. G.

2014-05-01

Dispersion is a mathematical representation of tidal mixing between sea water and fresh water in estuaries. The definition of dispersion remains somewhat unclear, as it is not directly measurable. The role of dispersion is only meaningful if it is related to the appropriate temporal and spatial scales of mixing, which are identified as the tidal period, tidal excursion (longitudinal), width of the estuary (lateral) and mixing depth (vertical). Moreover, the mixing pattern determines the salt intrusion length in an estuary. If a physically based description of dispersion is defined, it would allow an analytical solution of the salt intrusion problem. The objective of this study is to develop a predictive equation for estimating the dispersion coefficient at tidal average (TA) condition, which can be applied in the salt intrusion model to predict the salinity profile for any estuary during different events. Using available data from 72 measurements in 27 estuaries (including 6 recently studied estuaries in Malaysia), regression analysis has been performed with various combinations of dimensionless parameters. The predictive dispersion equations have been developed for two different locations: at the mouth (D0TA) and at the inflection point (D1TA), where the convergence length changes. Regressions have been carried out with two separate datasets: 1) more reliable data for calibration; and 2) less reliable data for validation. The combination of dimensionless ratios that gives the best performance is selected as the final outcome, which indicates that the dispersion coefficient depends on the tidal excursion, tidal range, tidal velocity amplitude, friction and the Richardson number. A limitation of the newly developed equation is that the friction is generally unknown. To compensate for this problem, further analysis has been performed adopting the hydraulic model of Cai et al. (2012) to estimate the friction and depth. Keywords: dispersion, alluvial estuaries, mixing, salt
Developing prediction equations and a mobile phone application to identify infants at risk of obesity.

Science.gov (United States)

Santorelli, Gillian; Petherick, Emily S; Wright, John; Wilson, Brad; Samiei, Haider; Cameron, Noël; Johnson, William

2013-01-01

Advancements in knowledge of obesity aetiology and mobile phone technology have created the opportunity to develop an electronic tool to predict an infant's risk of childhood obesity. The study aims were to develop and validate equations for the prediction of childhood obesity and integrate them into a mobile phone application (App). Anthropometry and childhood obesity risk data were obtained for 1868 UK-born White or South Asian infants in the Born in Bradford cohort. Logistic regression was used to develop prediction equations (at 6 ± 1.5, 9 ± 1.5 and 12 ± 1.5 months) for risk of childhood obesity (BMI at 2 years >91st centile and weight gain from 0-2 years >1 centile band) incorporating sex, birth weight, and weight gain as predictors. The discrimination accuracy of the equations was assessed by the area under the curve (AUC); internal validity by comparing area under the curve to those obtained in bootstrapped samples; and external validity by applying the equations to an external sample. An App was built to incorporate six final equations (two at each age, one of which included maternal BMI). The equations had good discrimination (AUCs 86-91%), with the addition of maternal BMI marginally improving prediction. The AUCs in the bootstrapped and external validation samples were similar to those obtained in the development sample. The App is user-friendly, requires a minimum amount of information, and provides a risk assessment of low, medium, or high accompanied by advice and website links to government recommendations. Prediction equations for risk of childhood obesity have been developed and incorporated into a novel App, thereby providing proof of concept that childhood obesity prediction research can be integrated with advancements in technology.
Multiple linear regression to develop strength scaled equations for knee and elbow joints based on age, gender and segment mass

DEFF Research Database (Denmark)

D'Souza, Sonia; Rasmussen, John; Schwirtz, Ansgar

2012-01-01

and valuable ergonomic tool. Objective: To investigate age and gender effects on the torque-producing ability in the knee and elbow in older adults. To create strength scaled equations based on age, gender, upper/lower limb lengths and masses using multiple linear regression. To reduce the number of dependent...... flexors. Results: Males were significantly stronger than females across all age groups. Elbow peak torque (EPT) was better preserved from the 60s to the 70s, whereas knee peak torque (KPT) was reduced significantly (P...). Gender, thigh mass and age best predicted KPT (R² = 0.60). Gender, forearm mass and age best predicted EPT (R² = 0.75). Good cross-validation was established for both elbow and knee models. Conclusion: This cross-sectional study of muscle strength created and validated strength scaled equations of EPT and KPT using only gender, segment mass...
Prediction of diffuse solar irradiance using machine learning and multivariable regression

International Nuclear Information System (INIS)

Lou, Siwei; Li, Danny H.W.; Lam, Joseph C.; Chan, Wilco W.H.

2016-01-01

Highlights: • 54.9% of the annual global irradiance in HK is made up of its diffuse component. • Hourly diffuse irradiance was predicted from readily accessible variables. • The importance of each variable for prediction was assessed by machine learning. • Simple prediction equations were developed using the knowledge of variable importance. - Abstract: The paper studies the horizontal global, direct-beam and sky-diffuse solar irradiance data measured in Hong Kong from 2008 to 2013. A machine learning algorithm was employed to predict the horizontal sky-diffuse irradiance and to conduct a sensitivity analysis of the meteorological variables. Apart from the clearness index (horizontal global/extra-atmospheric solar irradiance), we found that predictors including solar altitude, air temperature, cloud cover and visibility are also important in predicting the diffuse component. The mean absolute error (MAE) of the logistic regression using the aforementioned predictors was less than 21.5 W/m² and 30 W/m² for Hong Kong and Denver, USA, respectively. Since these five variables have been systematically recorded for more than 35 years, the proposed model would be appropriate for estimating long-term diffuse solar radiation, studying climate change and developing typical meteorological years for Hong Kong and places with similar climates.
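Two quantities in this abstract are simple to compute. The first is the clearness index, defined there as horizontal global over extra-atmospheric irradiance. The second is the mean absolute error used to score the model. The following is a minimal Python sketch. The function names and sample values are hypothetical, and computing the extraterrestrial irradiance itself (from solar geometry) is left out.

```python
def clearness_index(global_horizontal, extraterrestrial_horizontal):
    """Clearness index: horizontal global irradiance divided by the
    extra-atmospheric (extraterrestrial) irradiance on a horizontal plane."""
    if extraterrestrial_horizontal <= 0:
        raise ValueError("extraterrestrial irradiance must be positive (sun above horizon)")
    return global_horizontal / extraterrestrial_horizontal


def mean_absolute_error(predicted, measured):
    """Mean absolute error, in the same units as the inputs (e.g. W/m^2)."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


# Hypothetical hourly values in W/m^2, for illustration only.
print(clearness_index(500.0, 1000.0))                 # 0.5
print(mean_absolute_error([100.0, 120.0], [110.0, 100.0]))  # 15.0
```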
Proposition of Regression Equations to Determine Outdoor Thermal Comfort in Tropical and Humid Environment

Directory of Open Access Journals (Sweden)

Sangkertadi Sangkertadi

2012-05-01

This study reports field experiments conducted to construct regression equations for the perception of thermal comfort during outdoor activities in a hot and humid environment. Relationships between thermal-comfort perceptions, microclimate variables (temperature and humidity) and body parameters (activity, clothing, body measurements) were observed and analyzed. A total of 180 adult men and women participated as respondents. The study is limited to conditions in which the wind velocity reaching the respondents' bodies is about 1 m/s. From questionnaires and field measurements, three regression equations were developed, one each for normal walking, brisk walking, and sitting.
Development of a predictive energy equation for maintenance hemodialysis patients: a pilot study.

Science.gov (United States)

Byham-Gray, Laura; Parrott, J Scott; Ho, Wai Yin; Sundell, Mary B; Ikizler, T Alp

2014-01-01

The study objectives were to explore the predictors of measured resting energy expenditure (mREE) among a sample of maintenance hemodialysis (MHD) patients, to generate a predictive energy equation (MHDE), and to compare such models with another commonly used predictive energy equation in nutritional care, the Mifflin-St. Jeor equation (MSJE). The study was a retrospective, cross-sectional cohort design conducted at the Vanderbilt University Medical Center. Study subjects were adult MHD patients (N = 67). Data collected from several clinical trials were analyzed using Pearson's correlation and multivariate linear regression procedures. Demographic, anthropometric, clinical, and laboratory data were examined as potential predictors of mREE. Limits of agreement between the MHDE and the MSJE were evaluated using Bland-Altman plots. The a priori α was set at P ... lean body mass [LBM]) of mREE included (R² = 0.489) FFM, ALB, age, and CRP. Two additional models (MHDE-CRP and MHDE-CR) with acceptable predictability (R² = 0.460 and R² = 0.451) were derived to improve the clinical utility of the developed energy equation (MHDE-LBM). On the Bland-Altman plots, the MHDE over- and underpredicted mREE less often than the MSJE. Predictive models (MHDE) including selected demographic, clinical, and anthropometric data explained less than 50% of the variance of mREE but had better precision in determining energy requirements for MHD patients when compared with the MSJE. Further research is necessary to improve predictive models of mREE in the MHD population and to test their validity and clinical application. Copyright © 2014 National Kidney Foundation, Inc. Published by Elsevier Inc. All rights reserved.
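The abstract does not give the MHDE coefficients, so only the comparator can be shown. The following Python sketch implements the standard Mifflin–St Jeor equation (kcal/day). The patient values in the example are hypothetical.

```python
def mifflin_st_jeor(weight_kg, height_cm, age_y, male):
    """Mifflin-St Jeor resting energy expenditure in kcal/day:
    10*W + 6.25*H - 5*A + 5 (men) or - 161 (women)."""
    base = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_y
    return base + (5.0 if male else -161.0)


# Hypothetical patient: 70 kg, 175 cm, 30 years.
print(mifflin_st_jeor(70, 175, 30, male=True))   # 1648.75
print(mifflin_st_jeor(70, 175, 30, male=False))  # 1482.75
```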
Ground Motion Prediction Equations Empowered by Stress Drop Measurement

Science.gov (United States)

Miyake, H.; Oth, A.

2015-12-01

Significant variation of stress drop is a crucial issue for ground motion prediction equations and probabilistic seismic hazard assessment, since only a few ground motion prediction equations take stress drop into account. In addition to studies of the average and sigma of stress drop and ground motion prediction equations (e.g., Cotton et al., 2013; Baltay and Hanks, 2014), we explore the one-to-one relationship, for each earthquake, between stress drop and the between-event residual of a ground motion prediction equation. We used the stress drop dataset of Oth (2013) for Japanese crustal earthquakes ranging from 0.1 to 100 MPa, and the K-NET/KiK-net ground motion dataset, against several ground motion prediction equations with volcanic front treatment. Between-event residuals for ground accelerations and velocities are generally coincident with stress drop, as investigated using seismic intensity measures by Oth et al. (2015). Moreover, we found faster attenuation of ground accelerations and velocities for large stress drop events over similar fault distance ranges and focal depths. This may suggest an alternative parameterization of stress drop to control the distance attenuation rate in ground motion prediction equations. We also investigate the one-to-one relationship and sigma for regional/national-scale stress drop variation and current national-scale ground motion prediction equations.
Prediction, Regression and Critical Realism

DEFF Research Database (Denmark)

Næss, Petter

2004-01-01

This paper considers the possibility of prediction in land use planning, and the use of statistical research methods in analyses of relationships between urban form and travel behaviour. Influential writers within the tradition of critical realism reject the possibility of predicting social...... phenomena. This position is fundamentally problematic to public planning. Without at least some ability to predict the likely consequences of different proposals, the justification for public sector intervention into market mechanisms will be frail. Statistical methods like regression analyses are commonly...... seen as necessary in order to identify aggregate level effects of policy measures, but are questioned by many advocates of critical realist ontology. Using research into the relationship between urban structure and travel as an example, the paper discusses relevant research methods and the kinds...
Evaluation of abutment scour prediction equations with field data

Science.gov (United States)

Benedict, S.T.; Deshpande, N.; Aziz, N.M.

2007-01-01

The U.S. Geological Survey, in cooperation with FHWA, compared predicted abutment scour depths, computed with selected predictive equations, with field observations collected at 144 bridges in South Carolina and at eight bridges from the National Bridge Scour Database. Predictive equations published in the 4th edition of Evaluating Scour at Bridges (Hydraulic Engineering Circular 18) were used in this comparison, including the original Froehlich, the modified Froehlich, the Sturm, the Maryland, and the HIRE equations. The comparisons showed that most equations tended to provide conservative estimates of scour that at times were excessive (as large as 158 ft). Equations also produced underpredictions of scour, but with less frequency. Although the equations provide an important resource for evaluating abutment scour at bridges, the results of this investigation show the importance of using engineering judgment in conjunction with these equations.
Statistical experiments using the multiple regression research for prediction of proper hardness in areas of phosphorus cast-iron brake shoes manufacturing

Science.gov (United States)

Kiss, I.; Cioată, V. G.; Ratiu, S. A.; Rackov, M.; Penčić, M.

2018-01-01

Multivariate research is important in cast-iron brake shoe manufacturing, because many variables interact with each other simultaneously. This article focuses on expressing the multiple linear regression model relating hardness to the chemical composition of the phosphorus cast irons used for brake shoes, bearing in mind that the regression coefficients illustrate the unrelated contributions of each independent variable towards predicting the dependent variable. To establish the multiple correlations between the hardness of the cast-iron brake shoes and their chemical composition, several regression equations have been proposed. A mathematical solution is sought that can determine the optimum chemical composition for the desired hardness values. Starting from these considerations, two new statistical experiments were carried out on the contents of Phosphorus [P], Manganese [Mn] and Silicon [Si]. The regression equations describing the mathematical dependency between these elements and the hardness are thereby determined, and several correlation charts are presented.
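The following Python sketch shows the model described above, with hardness regressed on P, Mn and Si, fitted by ordinary least squares. The compositions and the "true" coefficients used to generate the hardness values are hypothetical, not data from the article. Because no noise is added, the fit should recover those coefficients. The solver is a small normal-equations routine meant for illustration, not a replacement for a statistics package.

```python
def fit_ols(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan
    elimination with partial pivoting."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n, k = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))]
         for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear or too few observations")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]


# Hypothetical compositions in wt% (P, Mn, Si).
compositions = [
    (0.7, 0.6, 1.5), (0.9, 0.8, 1.8), (1.1, 0.5, 2.0),
    (1.3, 0.9, 1.6), (0.8, 1.0, 2.2), (1.2, 0.7, 1.9),
]
# Synthetic hardness from made-up coefficients (no noise added).
hardness = [150 + 40 * p + 10 * mn - 5 * si for p, mn, si in compositions]

coefficients = fit_ols(compositions, hardness)
print([round(c, 6) for c in coefficients])  # intercept, P, Mn, Si; approx. [150, 40, 10, -5]
```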
Best-fitting prediction equations for basal metabolic rate: informing obesity interventions in diverse populations.

Science.gov (United States)

Sabounchi, N S; Rahmandad, H; Ammerman, A

2013-10-01

Basal metabolic rate (BMR) represents the largest component of total energy expenditure and is a major contributor to energy balance. Therefore, accurately estimating BMR is critical for developing rigorous obesity prevention and control strategies. Over the past several decades, numerous BMR formulas have been developed targeted to different population groups. A comprehensive literature search revealed 248 BMR estimation equations developed using diverse ranges of age, gender, race, fat-free mass, fat mass, height, waist-to-hip ratio, body mass index and weight. A subset of 47 studies included enough detail to allow for development of meta-regression equations. Utilizing these studies, meta-equations were developed targeted to 20 specific population groups. This review provides a comprehensive summary of available BMR equations and an estimate of their accuracy. An accompanying online BMR prediction tool (available at http://www.sdl.ise.vt.edu/tutorials.html) was developed to automatically estimate BMR based on the most appropriate equation after user-entry of individual age, race, gender and weight.
A validated disease specific prediction equation for resting metabolic rate in underweight patients with COPD

Directory of Open Access Journals (Sweden)

Anita Nordenson

2010-09-01

Anita Nordenson, Anne Marie Grönberg, Lena Hulthén, Sven Larsson, Frode Slinde (Department of Clinical Nutrition, Sahlgrenska Academy at University of Gothenburg, Göteborg, Sweden; Department of Internal Medicine/Respiratory Medicine and Allergology, Sahlgrenska Academy at University of Gothenburg, Sweden). Abstract: Malnutrition is a serious condition in chronic obstructive pulmonary disease (COPD). Successful dietary intervention calls for calculations of resting metabolic rate (RMR). One disease-specific prediction equation for RMR exists, based mainly on male patients. To construct a disease-specific equation for RMR based on measurements in underweight or weight-losing women and men with COPD, RMR was measured by indirect calorimetry in 30 women and 11 men with a diagnosis of COPD and body mass index <21 kg/m². The following variables, possibly influencing RMR, were measured: length, weight, mid-upper arm circumference, triceps skinfold, body composition by dual-energy x-ray absorptiometry and bioelectrical impedance, lung function, and markers of inflammation. Relations between RMR and the measured variables were studied using univariate Pearson analysis. Gender and variables that were associated with RMR with a P value <0.15 were included in a forward multiple regression analysis. The best-fit multiple regression equation included only fat-free mass (FFM): RMR (kJ/day) = 1856 + 76.0 × FFM (kg). In conclusion, FFM is the dominant factor influencing RMR. The developed equation can be used to predict RMR in underweight COPD patients. Keywords: pulmonary disease, chronic obstructive, basal metabolic rate, malnutrition, body composition
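The following Python sketch evaluates the reported equation, RMR (kJ/day) = 1856 + 76.0 × FFM (kg), and adds a conversion to kcal. The FFM value in the example is hypothetical. The equation was derived in underweight COPD patients (30 women, 11 men), so applying it outside that population is an assumption.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie


def rmr_copd_kj_per_day(ffm_kg):
    """RMR (kJ/day) = 1856 + 76.0 * FFM (kg), as reported for
    underweight COPD patients (BMI < 21 kg/m^2)."""
    if ffm_kg <= 0:
        raise ValueError("fat-free mass must be positive")
    return 1856 + 76.0 * ffm_kg


rmr = rmr_copd_kj_per_day(40.0)   # hypothetical FFM of 40 kg
print(rmr)                        # 4896.0 kJ/day
print(round(rmr / KJ_PER_KCAL))   # about 1170 kcal/day
```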
Regression formulae for predicting hematologic and liver functions ...

African Journals Online (AJOL)

African Journal of Biomedical Research ... On the other hand platelet and white blood cell (WBC) counts in these workers correlated positively with years of service [r = 0.342 (P <0.001) and r = 0.130 (P<0.0001) ... The regression equation defining this relationship is: ALP concentration = 33.68 – 0.075 x years of service.
ANTHROPOMETRIC PREDICTIVE EQUATIONS FOR ...

African Journals Online (AJOL)

Keywords: Anthropometry, Predictive Equations, Percentage Body Fat, Nigerian Women, Bioelectric Impedance ... such as Asians and Indians (Pranav et al., 2009), ... size (n) of at least 30 is adjudged as sufficient for the ... of people, gender and age (Vogel et al., 1984). ...
Revised predictive equations for salt intrusion modelling in estuaries

NARCIS (Netherlands)

Gisen, J.I.A.; Savenije, H.H.G.; Nijzink, R.C.

2015-01-01

For one-dimensional salt intrusion models to be predictive, we need predictive equations to link model parameters to observable hydraulic and geometric variables. The one-dimensional model of Savenije (1993b) made use of predictive equations for the Van der Burgh coefficient $K$ and the dispersion

Testing the transferability of regression equations derived from small sub-catchments to a large area in central Sweden

Directory of Open Access Journals (Sweden)

C. Xu

2003-01-01

There is an ever-increasing need to apply hydrological models to catchments where streamflow data are unavailable or to large geographical regions where calibration is not feasible. Estimation of model parameters from spatial physical data is the key issue in the development and application of hydrological models at various scales. To investigate whether regression equations relating model parameters to physical characteristics, developed from small sub-catchments, can be transferred to a large region for estimating model parameters, a conceptual snow and water balance model was optimised on all the sub-catchments in the region. A multiple regression analysis related model parameters to physical data for the catchments, and the regression equations derived from the small sub-catchments were used to calculate regional parameter values for the large basin using spatially aggregated physical data. For the model tested, the results support the suitability of transferring the regression equations to the larger region. Keywords: water balance modelling, large scale, multiple regression, regionalisation
Predicting Word Reading Ability: A Quantile Regression Study

Science.gov (United States)

McIlraith, Autumn L.

2018-01-01

Predictors of early word reading are well established. However, it is unclear if these predictors hold for readers across a range of word reading abilities. This study used quantile regression to investigate predictive relationships at different points in the distribution of word reading. Quantile regression analyses used preschool and…
Resting energy expenditure prediction in recreational athletes of 18-35 years: confirmation of Cunningham equation and an improved weight-based alternative.

Science.gov (United States)

ten Haaf, Twan; Weijs, Peter J M

2014-01-01

Resting energy expenditure (REE) is expected to be higher in athletes because of their relatively high fat-free mass (FFM). Therefore, an REE predictive equation for recreational athletes may be required. The aim of this study was to validate existing REE predictive equations and to develop a new equation specific to recreational athletes. Ninety adult athletes (53 M, 37 F), exercising on average 9.1 ± 5.0 hours a week and 5.0 ± 1.8 times a week, were included. REE was measured using indirect calorimetry (Vmax Encore n29); FFM and fat mass (FM) were measured using air displacement plethysmography. Multiple linear regression analysis was used to develop a new FFM-based and a new weight-based REE predictive equation. The percentage of accurate predictions (within 10% of measured REE), percentage bias, root mean square error and limits of agreement were calculated. Results: The Cunningham equation, the new weight-based equation REE (kJ/d) = 49.940 × weight (kg) + 2459.053 × height (m) − 34.014 × age (y) + 799.257 × sex (M = 1, F = 0) + 122.502, and the new FFM-based equation REE (kJ/d) = 95.272 × FFM (kg) + 2026.161 performed equally well. De Lorenzo's equation predicted REE less accurately, but better than the other commonly used REE predictive equations. Harris-Benedict, WHO, Schofield, Mifflin and Owen all showed less than 50% accuracy. For a population of (Dutch) recreational athletes, REE can be accurately predicted with the existing Cunningham equation. Since body composition measurement is not always possible, and other commonly used equations fail, the new weight-based equation is advised for use in sports nutrition.
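The following Python sketch implements the two new equations as printed in the abstract (REE in kJ/d). It also computes the "percentage accurate predictions" criterion, meaning predictions within 10% of measured REE. The athlete values and the predicted/measured pairs are hypothetical.

```python
def ree_weight_based(weight_kg, height_m, age_y, male):
    """New weight-based equation from the abstract; REE in kJ/d."""
    return (49.940 * weight_kg + 2459.053 * height_m - 34.014 * age_y
            + 799.257 * (1 if male else 0) + 122.502)


def ree_ffm_based(ffm_kg):
    """New FFM-based equation from the abstract; REE in kJ/d."""
    return 95.272 * ffm_kg + 2026.161


def percent_accurate(predicted, measured, tolerance=0.10):
    """Share (%) of predictions within +/- tolerance of measured REE."""
    if not measured or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, m in zip(predicted, measured) if abs(p - m) <= tolerance * m)
    return 100.0 * hits / len(measured)


# Hypothetical male athlete: 70 kg, 1.80 m, 25 y (FFM 60 kg).
print(round(ree_weight_based(70, 1.80, 25, male=True), 1))  # about 7993.5
print(round(ree_ffm_based(60), 1))                           # about 7742.5
print(percent_accurate([100, 120, 95], [100, 100, 100]))     # 2 of 3 within 10%
```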
Comparison of predictive equations for resting energy expenditure among patients with schizophrenia in Japan

Directory of Open Access Journals (Sweden)

Sugawara N

2014-02-01

Norio Sugawara, Norio Yasui-Furukori, Tetsu Tomita, Hanako Furukori, Kazutoshi Kubo, Taku Nakagami, Sunao Kaneko (Department of Neuropsychiatry, Hirosaki University School of Medicine, Hirosaki; Department of Psychiatry, Hirosaki-Aiseikai Hospital, Hirosaki; Department of Psychiatry, Kuroishi-Akebono Hospital, Kuroishi; Department of Psychiatry, Odate Municipal General Hospital, Odate, Japan). Background: Recently, a relationship between obesity and schizophrenia has been reported. The prediction of resting energy expenditure (REE) is important to determine the energy expenditure of patients with schizophrenia. However, there is a lack of research concerning the most accurate REE predictive equations among Asian patients with schizophrenia. The purpose of the study reported here was to compare the validity of four REE equations for patients with schizophrenia taking antipsychotics. Methods: For this cross-sectional study, we recruited patients (n=110) who had a Diagnostic and Statistical Manual of Mental Disorders, fourth edition, diagnosis of schizophrenia and were admitted to four psychiatric hospitals. The mean (± standard deviation) age of these patients was 45.9 ± 13.2 years. Anthropometric measurements (height, weight, body mass index) were taken at the beginning of the study. REE was measured using indirect calorimetry. Comparisons between the measured REE and the REEs estimated from the four equations (Harris–Benedict, Mifflin–St Jeor, Food and Agriculture Organization/World Health Organization/United Nations University, and Schofield) were performed using simple linear regression analysis and Bland–Altman analysis. Results: Significant trends were found between the measured and predicted REEs for all four equations (P<0.001), with the Harris–Benedict equation demonstrating the strongest correlation in both men and women (r=0.617, P<0.001). In all participants, Bland–Altman analysis revealed that the Harris–Benedict and
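Bland–Altman analysis summarizes agreement between predicted and measured REE. It reports the mean difference (bias) and 95% limits of agreement, computed as bias ± 1.96 × the standard deviation of the differences. The following is a minimal Python sketch. It assumes differences are taken as predicted minus measured, and the values are hypothetical.

```python
import statistics


def bland_altman(predicted, measured, z=1.96):
    """Return (bias, lower_limit, upper_limit) for the differences
    predicted - measured, using the sample standard deviation."""
    if len(predicted) != len(measured) or len(measured) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [p - m for p, m in zip(predicted, measured)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical REE values in kcal/day.
measured = [1500, 1600, 1700, 1800]
predicted = [1550, 1580, 1760, 1790]
bias, lower, upper = bland_altman(predicted, measured)
print(bias, round(lower, 1), round(upper, 1))  # 20 -60.0 100.0
```

A positive bias means the equation overpredicts on average. The width of the limits shows how far an individual prediction may fall from the measured value.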
Validity of predictive equations for basal metabolic rate in Japanese adults.

Science.gov (United States)

Miyake, Rieko; Tanaka, Shigeho; Ohkawara, Kazunori; Ishikawa-Takata, Kazuko; Hikihara, Yuki; Taguri, Emiko; Kayashita, Jun; Tabata, Izumi

2011-01-01

Many predictive equations for basal metabolic rate (BMR) based on anthropometric measurements, age, and sex have been developed, mainly for healthy Caucasians. However, it has been reported that many of these widely used equations overestimate BMR not only for Asians but also for Caucasians. The present study examined the accuracy of several predictive equations for BMR in Japanese subjects. In 365 healthy Japanese male and female subjects, aged 18 to 79 y, BMR was measured in the post-absorptive state using a mask and Douglas bag. Six predictive equations were examined. Total error was used as an index of the accuracy of each equation's prediction. Predicted BMR values by Dietary Reference Intakes for Japanese (Japan-DRI), Adjusted Dietary Reference Intakes for Japanese (Adjusted-DRI), and Ganpule equations were not significantly different from the measured BMR in either sex. On the other hand, Harris-Benedict, Schofield, and Food and Agriculture Organization of the United Nations/World Health Organization/United Nations University equations were significantly higher than the measured BMR in both sexes. The prediction error by Japan-DRI, Adjusted-DRI, and Harris-Benedict equations was significantly correlated with body weight in both sexes. Total error using the Ganpule equation was low in both males and females (125 and 99 kcal/d, respectively). In addition, total error using the Adjusted-DRI equation was low in females (95 kcal/d). Thus, the Ganpule equation was the most accurate in predicting BMR in our healthy Japanese subjects, because the difference between the predicted and measured BMR was relatively small, and body weight had no effect on the prediction error.
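The abstract uses "total error" as its accuracy index. The following Python sketch assumes the common definition: the square root of the mean squared difference between predicted and measured BMR. The paper's exact formula is not given here. The sketch sets total error beside the mean bias to show that total error also captures scatter when the bias is zero. All values are hypothetical.

```python
import math


def total_error(predicted, measured):
    """Assumed definition: sqrt(mean((predicted - measured)^2)),
    in the units of the inputs (kcal/d here)."""
    if not measured or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured))


def mean_bias(predicted, measured):
    """Average of predicted - measured; positive means overestimation."""
    return sum(p - m for p, m in zip(predicted, measured)) / len(measured)


# Hypothetical BMR values (kcal/d): errors cancel in the bias but not in total error.
pred, meas = [1550, 1450], [1500, 1500]
print(mean_bias(pred, meas), total_error(pred, meas))  # 0.0 50.0
```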
Linear regression crash prediction models: issues and proposed solutions.

Science.gov (United States)

2010-05-01

The paper develops a linear regression modeling approach that can be applied to crash data to predict vehicle crashes. The proposed approach involves novel data aggregation to satisfy linear regression assumptions, namely error structure normality ...
pKa prediction for acidic phosphorus-containing compounds using multiple linear regression with computational descriptors.

Science.gov (United States)

Yu, Donghai; Du, Ruobing; Xiao, Ji-Chang

2016-07-05

Ninety-six acidic phosphorus-containing molecules with pKa values from 1.88 to 6.26 were collected and divided into training and test sets by random sampling. Structural parameters were obtained by density functional theory calculations on the molecules. The relationship between the experimental pKa values and the structural parameters was obtained by multiple linear regression fitting on the training set and tested on the test set; the R² values were 0.974 and 0.966 for the training and test sets, respectively. This regression equation quantitatively describes the influence of structural parameters on pKa, can be used to predict the pKa values of similar structures, and is significant for the design of new acidic phosphorus-containing extractants. © 2016 Wiley Periodicals, Inc.
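The workflow above has three steps: randomly split the compounds, fit on the training set, and report R² on both sets. The following Python sketch illustrates it. For brevity it uses one hypothetical descriptor and synthetic pKa values rather than the DFT-derived structural parameters, and a single-predictor fit instead of the paper's multiple regression. The 75/25 split ratio is also an assumption.

```python
import random


def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(y_obs, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


rng = random.Random(42)
# Synthetic data: one hypothetical descriptor, pKa with small noise.
descriptor = [rng.uniform(0.0, 3.0) for _ in range(96)]
pka = [2.0 + 1.5 * d + rng.gauss(0.0, 0.1) for d in descriptor]

# Random split: 72 training compounds, 24 test compounds.
indices = list(range(96))
rng.shuffle(indices)
train, test = indices[:72], indices[72:]

b0, b1 = fit_line([descriptor[i] for i in train], [pka[i] for i in train])


def predict(d):
    return b0 + b1 * d


r2_train = r_squared([pka[i] for i in train], [predict(descriptor[i]) for i in train])
r2_test = r_squared([pka[i] for i in test], [predict(descriptor[i]) for i in test])
print(round(r2_train, 3), round(r2_test, 3))
```

The test-set R² is the more honest estimate of predictive skill, because the test compounds played no part in fitting the coefficients.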
Enhancement of Visual Field Predictions with Pointwise Exponential Regression (PER) and Pointwise Linear Regression (PLR).

Science.gov (United States)

Morales, Esteban; de Leon, John Mark S; Abdollahi, Niloufar; Yu, Fei; Nouri-Mahdavi, Kouros; Caprioli, Joseph

2016-03-01

The study was conducted to evaluate threshold smoothing algorithms to enhance prediction of the rates of visual field (VF) worsening in glaucoma. We studied 798 patients with primary open-angle glaucoma and 6 or more years of follow-up who underwent 8 or more VF examinations. Thresholds at each VF location for the first 4 years or first half of the follow-up time (whichever was greater) were smoothed with clusters defined by the nearest neighbor (NN), Garway-Heath, Glaucoma Hemifield Test (GHT), and weighting by the correlation of rates at all other VF locations. Thresholds were regressed with a pointwise exponential regression (PER) model and a pointwise linear regression (PLR) model. Smaller root mean square error (RMSE) values of the differences between the observed and the predicted thresholds at the last two follow-ups indicated better model predictions. The mean (SD) follow-up times for the smoothing and prediction phases were 5.3 (1.5) and 10.5 (3.9) years. The mean RMSE values for the PER and PLR models were: unsmoothed data, 6.09 and 6.55; NN, 3.40 and 3.42; Garway-Heath, 3.47 and 3.48; GHT, 3.57 and 3.74; and correlation of rates, 3.59 and 3.64. Smoothed VF data predicted better than unsmoothed data. Nearest neighbor provided the best predictions; PER also predicted consistently more accurately than PLR. Smoothing algorithms should be used when forecasting VF results with PER or PLR. The application of smoothing algorithms to VF data can improve forecasting at VF points to assist in treatment decisions.
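The following Python sketch shows the evaluation idea for the PLR model at a single VF location. It fits a straight line of threshold against time to the early visits, forecasts the remaining visits, and scores the forecast by RMSE. The thresholds are hypothetical, and the smoothing clusters (NN, Garway-Heath, GHT) and the PER model are not reproduced.

```python
import math


def fit_line(t, y):
    """Least-squares line through (t, y); returns (intercept, slope)."""
    n = len(t)
    if n < 2:
        raise ValueError("need at least two visits to fit a trend")
    mt, my = sum(t) / n, sum(y) / n
    stt = sum((ti - mt) ** 2 for ti in t)
    if stt == 0:
        raise ValueError("visit times must not all be equal")
    slope = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / stt
    return my - slope * mt, slope


def rmse(predicted, observed):
    """Root mean square error of predicted vs observed thresholds (dB)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def plr_forecast_rmse(times, thresholds, n_fit):
    """Fit pointwise linear regression to the first n_fit visits of one
    VF location and score forecasts of the remaining visits."""
    intercept, slope = fit_line(times[:n_fit], thresholds[:n_fit])
    forecasts = [intercept + slope * t for t in times[n_fit:]]
    return rmse(forecasts, thresholds[n_fit:])


# Hypothetical location: years since baseline and thresholds in dB.
print(plr_forecast_rmse([0, 1, 2, 3, 4], [30, 29, 28, 26, 26], n_fit=3))  # about 0.707
```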
Validation of the mortality prediction equation for damage control ...

African Journals Online (AJOL)

, preoperative lowest pH and lowest core body temperature to derive an equation for the purpose of predicting mortality in damage control surgery. It was shown to reliably predict death despite damage control surgery. The equation derivation ...
Comparison of equations for predicting energy expenditure from accelerometer counts in children

DEFF Research Database (Denmark)

Nilsson, A; Brage, S; Riddoch, C

2008-01-01

calorimeter-based (CAL) equation (mixture of activities). Predicted physical activity energy expenditure (PAEE) was the main outcome variable. In comparison with DLW-predicted PAEE, both laboratory-derived equations significantly (P...) PAEE by 17% and 83%, respectively, when based on a 24-h...... prediction, while the TM equation significantly (P...) PAEE by 46% when based on awake time only. In contrast, the CAL equation agreed better with the DLW equation under the awake-time assumption. Predicted PAEE differs substantially between equations, depending on time-frame assumptions......, and interpretations of average levels of PAEE in children from available equations should be made with caution. Further development of equations applicable to free-living scenarios is needed....
Equations of prediction for abdominal fat in brown egg-laying hens fed different diets.

Science.gov (United States)

Souza, C; Jaimes, J J B; Gewehr, C E

2017-06-01

The objective was to formulate equations for predicting, in a noninvasive manner, the abdominal fat weight of laying hens fed different diets, using external body measurements of the birds as regressors. We used 288 Hy-Line Brown laying hens, distributed in a completely randomized design in a factorial arrangement and submitted for 16 wk to 2 metabolizable energy levels (2,550 and 2,800 kcal/kg) and 3 levels of crude protein in the diet (150, 160, and 170 g/kg), totaling 6 treatments with 48 hens each. Sixteen hens per treatment, at 92 wk of age, were used to evaluate body weight, bird length, tarsus and sternum, greater and lesser diameter of the tarsus, and abdominal fat weight after slaughter. The equations were obtained by simple and multiple linear regression on the evaluated measures, using the stepwise method of indirect (backward) elimination, with P ... abdominal fat as predicted by the equations and observed values for each bird were subjected to Pearson's correlation analysis. The equations generated by energy level showed coefficients of determination of 0.50 and 0.74 for 2,800 and 2,550 kcal/kg of metabolizable energy, respectively, with correlation coefficients of 0.71 and 0.84, indicating a highly significant correlation between the calculated and observed values of abdominal fat. For protein levels of 150, 160, and 170 g/kg in the diet, the coefficients of determination were 0.75, 0.57, and 0.61, with correlation coefficients of 0.86, 0.75, and 0.78, respectively. For the general equation predicting abdominal fat weight, the coefficient of determination was 0.62 and the correlation coefficient was 0.79. The equations for predicting abdominal fat weight in laying hens, based on external measurements of the birds, showed positive coefficients of determination and correlation coefficients, thus allowing researchers to determine abdominal fat weight in vivo.
Reservoir rock permeability prediction using support vector regression in an Iranian oil field

International Nuclear Information System (INIS)

Saffarzadeh, Sadegh; Shadizadeh, Seyed Reza

2012-01-01

Reservoir permeability is a critical parameter for the evaluation of hydrocarbon reservoirs. It is often measured in the laboratory from reservoir core samples or evaluated from well test data. The prediction of reservoir rock permeability utilizing well log data is important because the core analysis and well test data are usually only available from a few wells in a field and have high coring and laboratory analysis costs. Since most wells are logged, the common practice is to estimate permeability from logs using correlation equations developed from limited core data; however, these correlation formulae are not universally applicable. Recently, support vector machines (SVMs) have been proposed as a new intelligence technique for both regression and classification tasks. The theory has a strong mathematical foundation for dependence estimation and predictive learning from finite data sets. The ultimate test for any technique that bears the claim of permeability prediction from well log data is the accurate and verifiable prediction of permeability for wells where only the well log data are available. The main goal of this paper is to develop the SVM method to obtain reservoir rock permeability based on well log data. (paper)
Modeling and Prediction Using Stochastic Differential Equations

DEFF Research Database (Denmark)

Juhl, Rune; Møller, Jan Kloppenborg; Jørgensen, John Bagterp

2016-01-01

Pharmacokinetic/pharmacodynamic (PK/PD) modeling for a single subject is most often performed using nonlinear models based on deterministic ordinary differential equations (ODEs), and the variation between subjects in a population of subjects is described using a population (mixed effects) setup...... deterministic and can predict the future perfectly. A more realistic approach would be to allow for randomness in the model due to, e.g., the model being too simple or errors in the input. We describe a modeling and prediction setup which better reflects reality and suggests stochastic differential equations (SDEs...
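The following Python sketch illustrates the SDE idea. It simulates an assumed one-compartment elimination model with additive system noise, dX = −kX dt + σ dW, using the Euler–Maruyama scheme. The model, parameter values and function name are illustrative assumptions, not the authors' setup. With σ = 0 the scheme reduces to explicit Euler for the ODE, which provides a simple check against the exact solution x0·exp(−k·t).

```python
import math
import random


def euler_maruyama_elimination(x0, k, sigma, t_end, n_steps, seed=None):
    """Simulate dX = -k X dt + sigma dW with the Euler-Maruyama scheme.

    Assumed model: first-order elimination plus additive noise; an
    illustration only. Returns the path including the initial value."""
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    rng = random.Random(seed)
    dt = t_end / n_steps
    x = x0
    path = [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x - k * x * dt + sigma * dw
        path.append(x)
    return path


# sigma = 0: endpoint should be close to 100 * exp(-0.5 * 2).
deterministic = euler_maruyama_elimination(100.0, 0.5, 0.0, 2.0, 2000)
print(round(deterministic[-1], 2), round(100.0 * math.exp(-1.0), 2))

# sigma > 0: one random realization; repeated runs give a spread of paths.
noisy = euler_maruyama_elimination(100.0, 0.5, 2.0, 2.0, 2000, seed=1)
```

In an SDE-based estimation setup, the spread of such paths is what allows the model to separate system noise from measurement error. A filter such as the extended Kalman filter would normally handle that estimation, which is beyond this sketch.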
Estimation of monthly solar exposure on horizontal surface by Angstrom-type regression equation

International Nuclear Information System (INIS)

Ravanshid, S.H.

1981-01-01

To obtain solar flux intensity, solar radiation measuring instruments are the best option. In the absence of instrumental data, other meteorological measurements related to solar energy can be used, together with empirical relationships, to estimate solar flux intensity. One such empirical relationship for estimating monthly averages of total solar radiation on a horizontal surface is the modified Ångström-type regression equation, which has been employed in this report to estimate the solar flux intensity on a horizontal surface for Tehran. By comparing the results of this equation with four years of values measured by Tehran's meteorological weather station, the values of the meteorological constants (a, b) in the equation were obtained for Tehran. (author)
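The following Python sketch assumes the common Ångström–Prescott form of the regression, H/H0 = a + b·(n/N). Here H is the monthly mean global radiation on a horizontal surface, H0 is its extraterrestrial counterpart, and n/N is the relative sunshine duration. The constants a and b are estimated by least squares. The monthly values below are synthetic, generated from a = 0.25 and b = 0.50. These are not the Tehran constants, which the abstract does not report.

```python
def fit_angstrom(sunshine_fraction, clearness):
    """Least-squares (a, b) for H/H0 = a + b * (n/N)."""
    m = len(sunshine_fraction)
    if m < 2 or m != len(clearness):
        raise ValueError("need at least two months of paired data")
    ms = sum(sunshine_fraction) / m
    mc = sum(clearness) / m
    sss = sum((s - ms) ** 2 for s in sunshine_fraction)
    ssc = sum((s - ms) * (c - mc) for s, c in zip(sunshine_fraction, clearness))
    b = ssc / sss
    return mc - b * ms, b


def estimate_radiation(a, b, h0, sunshine_fraction):
    """Monthly mean global radiation H = H0 * (a + b * n/N), units of H0."""
    return h0 * (a + b * sunshine_fraction)


# Synthetic monthly relative sunshine n/N and H/H0 (hypothetical values).
s = [0.55, 0.58, 0.62, 0.66, 0.72, 0.80, 0.82, 0.84, 0.81, 0.74, 0.63, 0.56]
kt = [0.25 + 0.50 * v for v in s]
a, b = fit_angstrom(s, kt)
print(round(a, 3), round(b, 3))                    # 0.25 0.5
print(round(estimate_radiation(a, b, 30.0, 0.7), 2))  # H in MJ/m^2/day if H0 = 30
```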
Testing Mediation Using Multiple Regression and Structural Equation Modeling Analyses in Secondary Data

Science.gov (United States)

Li, Spencer D.

2011-01-01

Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections

Science.gov (United States)

Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.

2014-01-01

A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
Tax Evasion, Information Reporting, and the Regressive Bias Prediction

DEFF Research Database (Denmark)

Boserup, Simon Halphen; Pinje, Jori Veng

2013-01-01

Models of rational tax evasion and optimal enforcement invariably predict a regressive bias in the effective tax system, which reduces redistribution in the economy. Using Danish administrative data, we show that a calibrated structural model of this type replicates moments and correlations of tax evasion and audit probabilities once we account for information reporting in the tax compliance game. When conditioning on information reporting, we find that both reduced-form evidence and simulations exhibit the predicted regressive bias. However, in the overall economy, this bias is negated by the tax...
Exploring a physico-chemical multi-array explanatory model with a new multiple covariance-based technique: structural equation exploratory regression.

Science.gov (United States)

Bry, X; Verron, T; Cazes, P

2009-05-29

In this work, we consider chemical and physical variable groups describing a common set of observations (cigarettes). One of the groups, minor smoke compounds (minSC), is assumed to depend on the others (minSC predictors). PLS regression (PLSR) of minSC on the set of all predictors does not appear to lead to a satisfactory analytic model, because it does not take the expert's knowledge into account. PLS path modeling (PLSPM) does not use the multidimensional structure of predictor groups. Indeed, the expert needs to separate the influence of several pre-designed predictor groups on minSC, in order to see what dimensions this influence involves. To meet these needs, we consider a multi-group component-regression model, and propose a method to extract from each group several strong uncorrelated components that fit the model. Estimation is based on a global multiple covariance criterion, used in combination with an appropriate nesting approach. Compared to PLSR and PLSPM, the structural equation exploratory regression (SEER) we propose fully uses predictor group complementarity, both conceptually and statistically, to predict the dependent group.
An equation for the prediction of human skin permeability of neutral molecules, ions and ionic species.

Science.gov (United States)

Zhang, Keda; Abraham, Michael H; Liu, Xiangli

2017-04-15

Experimental values of permeability coefficients, as log Kp, of chemical compounds across human skin were collected by carefully screening the literature, and adjusted to 37°C for the effect of temperature. The values of log Kp for partially ionized acids and bases were separated into those for their neutral and ionic species, forming a total data set of 247 compounds and species (including 35 ionic species). The obtained log Kp values have been regressed against Abraham solute descriptors to yield a correlation equation with R² = 0.866 and SD = 0.432 log units. The equation can provide valid predictions for log Kp of neutral molecules, ions and ionic species, with predictive R² = 0.858 and predictive SD = 0.445 log units calculated by the leave-one-out statistics. The predicted log Kp values for Na⁺ and Et₄N⁺ are in good agreement with the observed values. We calculated the values of log Kp of ketoprofen as a function of the pH of the donor solution, and found that log Kp markedly varies only when ketoprofen is largely ionized. This explains why models that neglect ionization of permeants still yield reasonable statistical results. The effect of skin thickness on log Kp was investigated by inclusion of two indicator variables, one for intermediate thickness skin and one for full thickness skin, into the above equation. The newly obtained equations were found to be statistically very close to the above equation. Therefore, the thickness of human skin used makes little difference to the experimental values of log Kp. Copyright © 2017 Elsevier B.V. All rights reserved.
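The abstract quotes both a fitted R² and a leave-one-out predictive R². This Python sketch, assuming an ordinary least-squares equation with an intercept and arbitrary descriptor columns (in Abraham-type equations these are typically solute descriptors such as E, S, A, B and V), shows how a leave-one-out q² = 1 - PRESS/SS_tot and a root-mean-square leave-one-out error could be computed. It does not reproduce the paper's descriptor set, coefficients or exact definition of predictive SD; all names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """OLS with intercept via the normal equations; returns [c, coef_1, ..., coef_p]."""
    z = [[1.0] + list(r) for r in rows]
    p = len(z[0])
    xtx = [[sum(zi[i] * zi[j] for zi in z) for j in range(p)] for i in range(p)]
    xty = [sum(zi[i] * yi for zi, yi in zip(z, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


def loo_statistics(rows, y):
    """Return (q2, loo_rmse): refit without each point, predict it, accumulate PRESS."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, rows[i])) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_tot, (press / len(y)) ** 0.5
```

A q² close to the fitted R², as reported above (0.858 versus 0.866), suggests the equation is not strongly overfitted to individual compounds.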
Approximating prediction uncertainty for random forest regression models

Science.gov (United States)

John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne

2016-01-01

Machine learning approaches such as random forest have been increasingly used for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches there is no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...

Regression and regression analysis time series prediction modeling on climate data of Quetta, Pakistan

International Nuclear Information System (INIS)

Jafri, Y.Z.; Kamal, L.

2007-01-01

Various statistical techniques were applied to five years of data (1998-2002) on average humidity, rainfall, and maximum and minimum temperatures. Regression analysis time series (RATS) relationships were developed to determine the overall trend of these climate parameters, on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of the goodness of fit of our polynomial regression analysis time series (PRATS). Multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) correlations were also developed to decipher the interdependence of weather parameters. Spearman's rank correlation and the Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our polynomial regression (PR) fit. The Breusch-Pagan test was applied to MLR and MLRATS, and indicated homoscedasticity in both. We also employed Bartlett's test for homogeneity of variances on the five-year rainfall and humidity data, which showed that the variances were not homogeneous for rainfall but were homogeneous for humidity. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)
Height - Diameter predictive equations for Rubber (Hevea ...

African Journals Online (AJOL)

BUKOLA

They proffer logistic data for modeling and futuristic prediction for sustainable forest management. Diameter is one of the most ... in various quantitative estimation following the intricacy of time, availability of modern equipment .... growth functions. This trend is shown in Figure 1 where the prediction equations are plotted.
Predicting Social Trust with Binary Logistic Regression

Science.gov (United States)

Adwere-Boamah, Joseph; Hufstedler, Shirley

2015-01-01

This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…
The prediction of intelligence in preschool children using alternative models to regression.

Science.gov (United States)

Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E

2011-12-01

Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children are used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than did the more traditional regression approach. Implications of these results are discussed.
Whole-genome regression and prediction methods applied to plant and animal breeding

NARCIS (Netherlands)

de los Campos, G.; Hickey, J.M.; Pong-Wong, R.; Daetwyler, H.D.; Calus, M.P.L.

2013-01-01

Genomic-enabled prediction is becoming increasingly important in animal and plant breeding, and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of
The Current and Future Use of Ridge Regression for Prediction in Quantitative Genetics

Directory of Open Access Journals (Sweden)

Ronald de Vlaming

2015-01-01

In recent years, there has been a considerable amount of research on the use of regularization methods for inference and prediction in quantitative genetics. Such research mostly focuses on selection of markers and shrinkage of their effects. In this review paper, the use of ridge regression for prediction in quantitative genetics using single-nucleotide polymorphism data is discussed. In particular, we consider (i) the theoretical foundations of ridge regression, (ii) its link to commonly used methods in animal breeding, (iii) the computational feasibility, and (iv) the scope for constructing prediction models with nonlinear effects (e.g., dominance and epistasis). Based on a simulation study, we gauge the current and future potential of ridge regression for prediction of human traits using genome-wide SNP data. We conclude that, for outcomes with a relatively simple genetic architecture, given current sample sizes in most cohorts (i.e., N < 10,000), the predictive accuracy of ridge regression is slightly higher than that of the classical genome-wide association study approach of repeated simple regression (i.e., one regression per SNP). However, both capture only a small proportion of the heritability. Nevertheless, we find evidence that for large-scale initiatives, such as biobanks, sample sizes can be achieved at which ridge regression improves predictive accuracy substantially compared to the classical approach.
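As a hedged illustration of the two approaches compared in the review, the Python sketch below fits ridge regression in closed form, β = (XᵀX + λI)⁻¹Xᵀy on column-centred markers, and, for contrast, one simple regression per SNP. The 0/1/2 genotype coding, the unpenalised intercept and all names are assumptions for illustration; real genome-wide data would call for a linear-algebra library rather than this pure-Python solver.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(genotypes, phenotype, lam):
    """Ridge regression on centred markers; returns (intercept, [beta_1..beta_p])."""
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    y_mean = sum(phenotype) / n
    xc = [[row[j] - means[j] for j in range(p)] for row in genotypes]
    yc = [v - y_mean for v in phenotype]
    # X'X + lam*I is positive definite for lam > 0, even when p > n.
    a = [[sum(r[i] * r[j] for r in xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * v for r, v in zip(xc, yc)) for i in range(p)]
    beta = solve_linear(a, b)
    intercept = y_mean - sum(m * bj for m, bj in zip(means, beta))
    return intercept, beta


def marginal_slopes(genotypes, phenotype):
    """Classical approach: a separate simple regression for each SNP."""
    n = len(phenotype)
    y_mean = sum(phenotype) / n
    slopes = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        m = sum(col) / n
        sxx = sum((x - m) ** 2 for x in col)
        sxy = sum((x - m) * (y - y_mean) for x, y in zip(col, phenotype))
        slopes.append(sxy / sxx if sxx > 0 else 0.0)
    return slopes
```

Because λI is added to the cross-product matrix, the system stays solvable when there are more SNPs than individuals, the usual genomic setting; the marginal slopes, by contrast, ignore correlation between markers, so they can differ from the joint effects.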
Is adult gait less susceptible than paediatric gait to hip joint centre regression equation error?

Science.gov (United States)

Kiernan, D; Hosking, J; O'Brien, T

2016-03-01

Hip joint centre (HJC) regression equation error during paediatric gait has recently been shown to have clinical significance. In relation to adult gait, it has been inferred that absolute HJC position errors comparable to those seen in children may in fact result in less significant kinematic and kinetic errors. This study investigated the clinical agreement of three commonly used regression equation sets (Bell et al., Davis et al. and Orthotrak) for adult subjects against the equations of Harrington et al. The relationship between HJC position error and subject size was also investigated for the Davis et al. set. Full 3-dimensional gait analysis was performed on 12 healthy adult subjects, with data for each set compared to Harrington et al. The Gait Profile Score, Gait Variable Score and GDI-kinetic were used to assess clinical significance, while differences in HJC position between the Davis and Harrington sets were compared to leg length and subject height using regression analysis. A number of statistically significant differences were present in absolute HJC position. However, all sets fell below the clinically significant thresholds (GPS <1.6°, GDI-Kinetic <3.6 points). Linear regression revealed a statistically significant relationship for both increasing leg length and increasing subject height with decreasing error in anterior/posterior and superior/inferior directions. Results confirm a negligible clinical error for adult subjects, suggesting that any of the examined sets could be used interchangeably. Decreasing error with both increasing leg length and increasing subject height suggests that the Davis set should be used cautiously on smaller subjects. Copyright © 2016 Elsevier B.V. All rights reserved.
Regression Trees Identify Relevant Interactions: Can This Improve the Predictive Performance of Risk Adjustment?

Science.gov (United States)

Buchner, Florian; Wasem, Jürgen; Schillo, Sonja

2017-01-01

Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R² from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R² improvement detected is only marginal. According to the sample-level performance measures used, omitting a considerable number of morbidity interactions causes no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd.
Wind speed prediction using statistical regression and neural network

Indian Academy of Sciences (India)

Prediction of wind speed in the atmospheric boundary layer is important for wind energy assessment, satellite launching, aviation, etc. There are a few techniques available for wind speed prediction which require a minimum number of input parameters. Four different statistical techniques, viz., curve fitting, Auto Regressive ...
A dynamic particle filter-support vector regression method for reliability prediction

International Nuclear Information System (INIS)

Wei, Zhao; Tao, Tao; ZhuoShu, Ding; Zio, Enrico

2013-01-01

Support vector regression (SVR) has been applied to time series prediction, and some works have demonstrated the feasibility of its use to forecast system reliability. For accurate reliability forecasting, the selection of SVR's parameters is important. Existing research on SVR parameter selection divides the example dataset into training and test subsets and tunes the parameters on the training data. However, these fixed parameters can lead to poor prediction capabilities if the data of the test subset differ significantly from those of training. By contrast, the novel method proposed in this paper uses particle filtering to estimate the SVR model parameters according to the whole measurement sequence up to the last observation instance. By treating the SVR training model as the observation equation of a particle filter, our method allows the SVR model parameters to be updated dynamically when a new observation comes. Because the parameters adapt to dynamic data patterns, the new PF–SVR method has superior prediction performance over that of standard SVR. Results from four applications show that PF–SVR is more robust than SVR to a decrease in the number of training data and to changes in the initial SVR parameter values. Also, even if there are trends in the test data different from those in the training data, the method can capture the changes, correct the SVR parameters and obtain good predictions. Highlights: • A dynamic PF–SVR method is proposed to predict system reliability. • The method can adjust the SVR parameters according to changes in the data. • The method is robust to the size of training data and initial parameter values. • Some cases based on both artificial and real data are studied. • PF–SVR shows superior prediction performance over standard SVR.
Sunspot Cycle Prediction Using Multivariate Regression and Binary ...

Indian Academy of Sciences (India)

A multivariate regression model has been derived based on the available cycles 1 .... The flare index correlates well with various parameters of the solar activity. ...
SPE dose prediction using locally weighted regression

International Nuclear Information System (INIS)

Hines, J. W.; Townsend, L. W.; Nichols, T. F.

2005-01-01

When astronauts are outside earth's protective magnetosphere, they are subject to large radiation doses resulting from solar particle events (SPEs). The total dose received from a major SPE in deep space could cause severe radiation poisoning. The dose is usually received over a 20-40 h time interval but the event's effects may be mitigated with an early warning system. This paper presents a method to predict the total dose early in the event. It uses a locally weighted regression model, which is easier to train and provides predictions as accurate as neural network models previously used. (authors)
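Locally weighted regression fits a separate weighted least-squares line for each query point, which is one reason it needs no iterative training. The Python sketch below is a generic local linear smoother with Gaussian weights; the kernel, the bandwidth and the idea of feeding it an early-event measurement as the input x are assumptions for illustration, not the authors' SPE dose model.

```python
import math


def lwr_predict(xs, ys, x0, bandwidth):
    """Local linear regression estimate at query point x0.

    Each query solves its own weighted least-squares line; Gaussian weights
    decay with distance from x0, so nearby training points dominate.
    """
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw          # weighted mean of x
    my = sum(wi * y for wi, y in zip(w, ys)) / sw          # weighted mean of y
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0                  # flat fit if x has no spread
    return my + slope * (x0 - mx)
```

"Training" amounts to storing past events; a prediction for a new event is computed on demand from the stored (x, y) pairs, with the bandwidth controlling how local the fit is.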
Choosing of mode and calculation of multiple regression equation parameters in X-ray radiometric analysis

International Nuclear Information System (INIS)

Mamikonyan, S.V.; Berezkin, V.V.; Lyubimova, S.V.; Svetajlo, Yu.N.; Shchekin, K.I.

1978-01-01

A method to derive multiple regression equations for X-ray radiometric analysis is described. The method is implemented as the REGRA program in an algorithmic language, and the subprograms included in the program are described. Using the analysis of cement for Mg, Al, Si, Ca and Fe contents as an example, it is shown that obtaining the working equations in the course of calculations by the program simplifies the implementation of computing devices in instruments for X-ray radiometric analysis
Fat-free mass prediction equations for bioelectric impedance analysis compared to dual energy X-ray absorptiometry in obese adolescents: a validation study.

Science.gov (United States)

Hofsteenge, Geesje H; Chinapaw, Mai J M; Weijs, Peter J M

2015-10-15

In clinical practice, patient-friendly methods to assess body composition in obese adolescents are needed. Therefore, the bioelectrical impedance analysis (BIA) related fat-free mass (FFM) prediction equations (FFM-BIA) were evaluated in obese adolescents (age 11-18 years) compared to FFM measured by dual-energy x-ray absorptiometry (FFM-DXA), and a new population-specific FFM-BIA equation was developed. After an overnight fast, the subjects attended the outpatient clinic. After height and weight were measured, a full body scan by dual-energy x-ray absorptiometry (DXA) and a BIA measurement were performed. Thirteen predictive FFM-BIA equations based on weight, height, age, resistance, reactance and/or impedance were systematically selected and compared to FFM-DXA. Accuracy of the FFM-BIA equations was evaluated by the percentage of adolescents predicted within 5% of measured FFM-DXA, the mean percentage difference between predicted and measured values (bias) and the root mean squared prediction error (RMSE). Multiple linear regression was conducted to develop a new BIA equation. Validation was based on 103 adolescents (60% girls), age 14.5 (SD 1.7) years, weight 94.1 (SD 15.6) kg and FFM-DXA of 56.1 (SD 9.8) kg. The percentage of accurate estimations varied between equations from 0 to 68%; bias ranged from -29.3 to +36.3% and RMSE ranged from 2.8 to 12.4 kg. An alternative prediction equation was developed: FFM = 0.527 × H(cm)²/Imp + 0.306 × weight - 1.862 (R² = 0.92, SEE = 2.85 kg). Percentage accurate prediction was 76%. Compared to DXA, the Gray equation underestimated the FFM by 0.4 kg (55.7 ± 8.3), had an RMSE of 3.2 kg and 63% accurate predictions, and had the smallest bias (-0.1%). When split by sex, the Gray equation had the narrowest range in accurate predictions, bias, and RMSE. For the assessment of FFM with BIA, the Gray-FFM equation appears to be the most accurate, but 63% is still not an acceptable accuracy level for obese adolescents. The new equation appears to
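The new equation reported above can be evaluated directly, and the three accuracy criteria can be computed from paired predictions and DXA values. In this Python sketch, height is in cm and weight and FFM are in kg as in the abstract, while impedance is assumed to be in ohms; the 5% window, mean percentage bias and RMSE follow the definitions given in the abstract, and the function names are hypothetical.

```python
import math


def ffm_bia_new(height_cm, impedance_ohm, weight_kg):
    """FFM (kg) = 0.527 * H^2 / Imp + 0.306 * weight - 1.862 (coefficients from the abstract)."""
    return 0.527 * height_cm ** 2 / impedance_ohm + 0.306 * weight_kg - 1.862


def accuracy_summary(predicted, measured):
    """Return (% within 5% of reference, mean % bias, RMSE) for paired values."""
    n = len(measured)
    within_5 = 100.0 * sum(abs(p - m) <= 0.05 * m
                           for p, m in zip(predicted, measured)) / n
    bias_pct = 100.0 * sum((p - m) / m for p, m in zip(predicted, measured)) / n
    rmse = math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)
    return within_5, bias_pct, rmse
```

For example, a hypothetical adolescent of 170 cm, 94 kg and 500 Ω gives 0.527 × 28900/500 + 0.306 × 94 - 1.862 ≈ 57.36 kg.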
Prediction Equations Overestimate the Energy Requirements More for Obesity-Susceptible Individuals.

Science.gov (United States)

McLay-Cooke, Rebecca T; Gray, Andrew R; Jones, Lynnette M; Taylor, Rachael W; Skidmore, Paula M L; Brown, Rachel C

2017-09-13

Predictive equations to estimate resting metabolic rate (RMR) are often used in dietary counseling and by online apps to set energy intake goals for weight loss. It is critical to know whether such equations are appropriate for those susceptible to obesity. We measured RMR by indirect calorimetry after an overnight fast in 26 obesity-susceptible (OSI) and 30 obesity-resistant (ORI) individuals, identified using a simple 6-item screening tool. Predicted RMR was calculated using the FAO/WHO/UNU (Food and Agricultural Organisation/World Health Organisation/United Nations University), Oxford and Mifflin-St Jeor equations. Absolute measured RMR did not differ significantly between OSI and ORI (6339 vs. 5893 kJ·d⁻¹, p = 0.313). All three prediction equations overestimated RMR for both OSI and ORI when measured RMR was ≤5000 kJ·d⁻¹. For measured RMR ≤7000 kJ·d⁻¹, there was statistically significant evidence that the equations overestimate RMR to a greater extent for those classified as obesity susceptible, with biases ranging from around 10% to nearly 30% depending on the equation. The use of prediction equations may overestimate RMR and energy requirements, particularly in those who self-identify as being susceptible to obesity, which has implications for effective weight management.
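For reference, the Mifflin-St Jeor equation is commonly published as RMR (kcal/day) = 10 × weight (kg) + 6.25 × height (cm) - 5 × age (years) + s, with s = +5 for men and -161 for women. This Python sketch converts the result to kJ/day (1 kcal = 4.184 kJ) so it can be compared with the measured values quoted above, and computes the percentage bias used to describe overestimation; the example subject and function names are hypothetical, and the FAO/WHO/UNU and Oxford equations are not reproduced.

```python
KJ_PER_KCAL = 4.184


def mifflin_st_jeor_kj(weight_kg, height_cm, age_years, sex):
    """Predicted RMR in kJ/day from the Mifflin-St Jeor equation."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    constant = 5.0 if sex == "male" else -161.0
    kcal = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_years + constant
    return kcal * KJ_PER_KCAL


def percent_bias(predicted, measured):
    """Positive values mean the equation overestimates measured RMR."""
    return 100.0 * (predicted - measured) / measured
```

For a hypothetical 30-year-old woman of 70 kg and 165 cm, the equation gives 1420.25 kcal/day, about 5942 kJ/day; if her measured RMR were 5000 kJ/day, the bias would be roughly +19%.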
BANK FAILURE PREDICTION WITH LOGISTIC REGRESSION

Directory of Open Access Journals (Sweden)

Taha Zaghdoudi

2013-04-01

In recent years, the economic and financial world has been shaken by a wave of financial crises that caused banks very large losses. Several authors have focused on the study of these crises in order to develop an early warning model, and our work takes its inspiration from the same path. We have tried to develop a predictive model of Tunisian bank failures using the binary logistic regression method. The specificity of our prediction model is that it takes into account microeconomic indicators of bank failures. The results obtained using our prediction model show that a bank's ability to repay its debt, the coefficient of banking operations, bank profitability per employee and the financial leverage ratio have a negative impact on the probability of failure.
The physics behind Van der Burgh's empirical equation, providing a new predictive equation for salinity intrusion in estuaries

Science.gov (United States)

Zhang, Zhilin; Savenije, Hubert H. G.

2017-07-01

The practical value of the surprisingly simple Van der Burgh equation in predicting saline water intrusion in alluvial estuaries is well documented, but the physical foundation of the equation is still weak. In this paper we provide a connection between the empirical equation and the theoretical literature, leading to a theoretical range of Van der Burgh's coefficient of 1/2 residual circulation. This type of mixing is relevant in the wider part of alluvial estuaries where preferential ebb and flood channels appear. Subsequently, this dispersion equation is combined with the salt balance equation to obtain a new predictive analytical equation for the longitudinal salinity distribution. Finally, the new equation was tested and applied to a large database of observations in alluvial estuaries, whereby the calibrated K values appeared to correspond well to the theoretical range.
Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches.

Science.gov (United States)

Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W

2015-08-01

Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
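The evaluation criteria listed in the abstract can be sketched in a few lines of Python. This is a generic illustration, not the study's evaluation code: it assumes outcomes coded 1 = died and 0 = survived, a fixed classification threshold already applied to produce y_pred, and computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half).

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels coded 1/0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def threshold_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and Youden's index J = sens + spec - 1.

    Assumes both classes occur and at least one case is predicted positive.
    """
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "youden": sensitivity + specificity - 1.0,
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Sensitivity, specificity, PPV and Youden's index depend on the chosen threshold, whereas AUC summarises discrimination across all thresholds, which is why studies comparing models usually report both kinds of measure.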
Logistic regression modelling: procedures and pitfalls in developing and interpreting prediction models

Directory of Open Access Journals (Sweden)

Nataša Šarlija

2017-01-01

This study sheds light on the most common issues related to applying logistic regression in prediction models for company growth. The purpose of the paper is (1) to provide a detailed demonstration of the steps in developing a growth prediction model based on logistic regression analysis, (2) to discuss common pitfalls and methodological errors in developing a model, and (3) to provide solutions and possible ways of overcoming these issues. Special attention is devoted to satisfying logistic regression assumptions, selecting and defining dependent and independent variables, using classification tables and ROC curves to report model strength, interpreting odds ratios as effect measures, and evaluating the performance of the prediction model. Development of a logistic regression model in this paper focuses on a prediction model of company growth. The analysis is based on predominantly financial data from a sample of 1471 small and medium-sized Croatian companies active between 2009 and 2014. The financial data are presented in the form of financial ratios divided into nine main groups depicting the following areas of business: liquidity, leverage, activity, profitability, research and development, investing and export. The growth prediction model indicates aspects of a business critical for achieving high growth. In that respect, the contribution of this paper is twofold: first, methodological, in pointing out pitfalls and potential solutions in logistic regression modelling, and second, theoretical, in identifying factors responsible for the high growth of small and medium-sized companies.
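Since the paper stresses interpreting odds ratios as effect measures, here is a minimal Python sketch of maximum-likelihood logistic regression fitted by batch gradient ascent, with odds ratios obtained as exp(β). Gradient ascent, the learning rate and the iteration count are illustrative choices (statistical packages usually use Newton-type iterations), and the data in the usage example are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, y, learning_rate=0.5, iterations=5000):
    """Logistic regression by batch gradient ascent on the mean log-likelihood.

    Returns [intercept, b_1, ..., b_p]. Assumes the classes are not perfectly
    separable; otherwise the MLE does not exist and coefficients keep growing.
    """
    z = [[1.0] + list(r) for r in rows]  # prepend intercept column
    p, n = len(z[0]), len(y)
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        for zi, yi in zip(z, y):
            err = yi - sigmoid(sum(b * v for b, v in zip(beta, zi)))
            for j in range(p):
                grad[j] += err * zi[j]
        beta = [b + learning_rate * g / n for b, g in zip(beta, grad)]
    return beta


def odds_ratios(beta):
    """exp(b_j): multiplicative change in the odds per one-unit increase in x_j."""
    return [math.exp(b) for b in beta[1:]]
```

With a single binary predictor, the fitted odds ratio should match the odds ratio of the underlying 2 × 2 table: with 2 events in 6 cases when x = 0 (odds 0.5) and 4 events in 6 cases when x = 1 (odds 2), it converges to 4.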

Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

Science.gov (United States)

Madarang, Krish J; Kang, Joo-Hyon

2014-06-01

Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However, whether ADD should be considered an important variable in predicting stormwater discharge characteristics has been controversial across studies. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate the United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R² and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.
Bootstrap Prediction Intervals in Non-Parametric Regression with Applications to Anomaly Detection

Science.gov (United States)

Kumar, Sricharan; Srivastava, Ashok N.

2012-01-01

Prediction intervals provide a measure of the probable interval in which the outputs of a regression model can be expected to occur. These prediction intervals can then be used to determine whether an observed output is anomalous, conditioned on the input. In this paper, a procedure for determining prediction intervals for outputs of nonparametric regression models using bootstrap methods is proposed. Bootstrap methods allow a nonparametric approach to computing prediction intervals, with no specific assumptions about the sampling distribution of the noise or the data. The asymptotic fidelity of the proposed prediction intervals is proved theoretically. The validity of the bootstrap-based prediction intervals is then illustrated via simulations. Finally, the bootstrap prediction intervals are applied to the problem of anomaly detection on aviation data.
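The general idea of a bootstrap prediction interval used for anomaly detection can be sketched as follows. This is not the authors' procedure: it assumes a k-nearest-neighbour regressor as the nonparametric model and a simple residual bootstrap, and the noisy sine data are hypothetical. Because each training point is its own nearest neighbour, the in-sample residuals somewhat understate the noise, a simplification accepted here for brevity.

```python
import math
import random

def knn_predict(xs, ys, x0, k=5):
    """Nonparametric regressor: mean response of the k nearest x values."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def bootstrap_interval(xs, ys, x0, alpha=0.05, n_boot=500, k=5, seed=0):
    """Residual-bootstrap prediction interval for a new output at x0."""
    rng = random.Random(seed)
    fitted = [knn_predict(xs, ys, xi, k) for xi in xs]
    resid = [yi - fi for yi, fi in zip(ys, fitted)]
    draws = []
    for _ in range(n_boot):
        # Resample residuals to build a bootstrap data set and refit at x0 ...
        y_boot = [fi + rng.choice(resid) for fi in fitted]
        refit = knn_predict(xs, y_boot, x0, k)
        # ... then add one more resampled residual to mimic the noise of a new observation
        draws.append(refit + rng.choice(resid))
    draws.sort()
    lower = draws[int(n_boot * alpha / 2)]
    upper = draws[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

def is_anomalous(y_obs, interval):
    """Flag an observed output that falls outside its prediction interval."""
    lower, upper = interval
    return not (lower <= y_obs <= upper)

# Hypothetical data: noisy sine curve
gen = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [math.sin(x) + gen.gauss(0, 0.1) for x in xs]
interval = bootstrap_interval(xs, ys, 5.0)
print(interval, is_anomalous(math.sin(5.0) + 2.0, interval))
```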
Retro-regression--another important multivariate regression improvement.

Science.gov (United States)

Randić, M

2001-01-01

We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This arises when, in a stepwise regression, a descriptor is added to or removed from the regression: the coefficients of the descriptors that remain in the regression equation then change unpredictably. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. At different steps of the stepwise regression, this process typically replaces several previously used descriptors with new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes, which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of a greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined, showing how it resolves the ambiguities associated with both the "nightmares" of the first and the second kind of MRA.
Development and validation of equations utilizing lamb vision system output to predict lamb carcass fabrication yields.

Science.gov (United States)

Cunha, B C N; Belk, K E; Scanga, J A; LeValley, S B; Tatum, J D; Smith, G C

2004-07-01

This study was performed to validate previous equations and to develop and evaluate new regression equations for predicting lamb carcass fabrication yields using outputs from a lamb vision system-hot carcass component (LVS-HCC) and the lamb vision system-chilled carcass LM imaging component (LVS-CCC). Lamb carcasses (n = 149) were selected after slaughter, imaged hot using the LVS-HCC, and chilled for 24 to 48 h at -3 to 1 °C. Yield grades (YG) of the chilled carcasses were assigned on-line by USDA graders and by expert USDA grading supervisors with unlimited time and access to the carcasses. Before fabrication, carcasses were ribbed between the 12th and 13th ribs and imaged using the LVS-CCC. Carcasses were fabricated into bone-in subprimal/primal cuts. Yields calculated included 1) saleable meat yield (SMY); 2) subprimal yield (SPY); and 3) fat yield (FY). On-line (whole-number) USDA YG accounted for 59, 58, and 64%; expert (whole-number) USDA YG explained 59, 59, and 65%; and expert (nearest-tenth) USDA YG accounted for 60, 60, and 67% of the observed variation in SMY, SPY, and FY, respectively. The best prediction equation developed in this trial using LVS-HCC output and hot carcass weight as independent variables explained 68, 62, and 74% of the variation in SMY, SPY, and FY, respectively. Addition of output from LVS-CCC improved the predictive accuracy of the equations; the combined output equations explained 72 and 66% of the variability in SMY and SPY, respectively. The accuracy and repeatability of LM area measurements made with the LVS-CCC were also assessed, and results suggested that the LVS-CCC provided reasonably accurate (R2 = 0.59) and highly repeatable (repeatability = 0.98) measurements of LM area. Compared with USDA YG, use of the dual-component lamb vision system to predict cut yields of lamb carcasses improved accuracy and precision, suggesting that this system could have an application as an objective means for pricing carcasses in a value
Empirical equations for the prediction of PGA and pseudo spectral accelerations using Iranian strong-motion data

Science.gov (United States)

Zafarani, H.; Luzi, Lucia; Lanzano, Giovanni; Soghrat, M. R.

2018-01-01

A recently compiled, comprehensive, and good-quality strong-motion database of Iranian earthquakes has been used to develop local empirical equations for the prediction of peak ground acceleration (PGA) and 5%-damped pseudo-spectral accelerations (PSA) up to 4.0 s. The equations account for style of faulting and four site classes and use the horizontal distance from the surface projection of the rupture plane as a distance measure. The model predicts the geometric mean of horizontal components and the vertical-to-horizontal ratio. A total of 1551 free-field acceleration time histories recorded at distances of up to 200 km from 200 shallow earthquakes (depth < 30 km) with moment magnitudes ranging from Mw 4.0 to 7.3 are used to perform regression analysis using the random effects algorithm of Abrahamson and Youngs (Bull Seism Soc Am 82:505-510, 1992), which considers between-event as well as within-event errors. Because of the limited data used in the development of previous Iranian ground motion prediction equations (GMPEs) and strong trade-offs between different terms of GMPEs, the coefficients of previously determined models are likely less precise than those of the current study. The richer database of the current study allows improving on prior works by considering additional variables that could not previously be adequately constrained. Here, a functional form used by Boore and Atkinson (Earthquake Spect 24:99-138, 2008) and Bindi et al. (Bull Seism Soc Am 9:1899-1920, 2011) has been adopted that accounts for the saturation of ground motions at close distances. A regression has also been performed for the V/H ratio in order to retrieve vertical components by scaling horizontal spectra. To take epistemic uncertainty into account, the new model can be used along with other appropriate GMPEs through a logic tree framework for seismic hazard assessment in Iran and the Middle East region.
Combining multiple regression and principal component analysis for accurate predictions for column ozone in Peninsular Malaysia

Science.gov (United States)

Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.

2013-06-01

This study addresses columnar ozone modelling in Peninsular Malaysia. A data set of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)], retrieved from NASA's Atmospheric Infrared Sounder (AIRS) for the entire period 2003-2008, was employed to develop models to predict the value of columnar ozone (O3) in the study area. A combined method, coupling multiple regression with principal component analysis (PCA), was used to predict columnar ozone, with the aim of improving prediction accuracy. Separate analyses were carried out for the north east monsoon (NEM) and south west monsoon (SWM) seasons. O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM season periods. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameters as predictors. A variable selection method based on high loadings of varimax-rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model. It was found that an increase in columnar O3 is associated with an increase in AST, SSKT, AT, and CO and with a drop in CH4, H2Ovapour, RH, and MSP. Fitting the best models for columnar O3 using eight of the independent variables gave about the same values of R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.
Spatial Regression and Prediction of Water Quality in a Watershed with Complex Pollution Sources.

Science.gov (United States)

Yang, Xiaoying; Liu, Qun; Luo, Xingzhang; Zheng, Zheng

2017-08-16

Fast economic development, burgeoning population growth, and rapid urbanization have led to complex pollution sources simultaneously contributing to water quality deterioration in many developing countries, including China. This paper explored the use of spatial regression to evaluate the impacts of watershed characteristics on ambient total nitrogen (TN) concentration in a heavily polluted watershed and to make predictions across the region. Regression results confirmed the substantial impact of a variety of point and non-point pollution sources on TN concentration. In addition, spatial regression yielded better performance than ordinary regression in predicting TN concentrations. Because it performed best in cross-validation, the river-distance-based spatial regression model was used to predict TN concentrations across the watershed. The prediction results revealed a distinct pattern in the spatial distribution of TN concentrations and identified three critical sub-regions as priorities for reducing TN loads. Our results indicate that spatial regression could potentially serve as an effective tool to facilitate water pollution control in watersheds under diverse physical and socio-economic conditions.
Predicting and Modelling of Survival Data when Cox's Regression Model does not hold

DEFF Research Database (Denmark)

Scheike, Thomas H.; Zhang, Mei-Jie

2002-01-01

Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects
Dose-effect regression equations for some growth indicators of rice plantules from Co-60 irradiated seeds

International Nuclear Information System (INIS)

Lopez, R.C.; Gonzalez, L.M.; Garcia, D.

1993-01-01

In the present work, dose-effect regression equations were obtained for germination energy and germination percentage, plantule size, root length and dry mass; from these, values of the median lethal dose (DL-50) were calculated, and likely or unlikely equivalences among them were established.
DNBR Prediction Using a Support Vector Regression

International Nuclear Information System (INIS)

Yang, Heon Young; Na, Man Gyun

2008-01-01

PWRs (Pressurized Water Reactors) generally operate in the nucleate boiling state. However, the transition from nucleate boiling to film boiling, with markedly reduced heat transfer, induces a boiling crisis that may eventually cause the fuel cladding to melt. This type of boiling crisis is called the Departure from Nucleate Boiling (DNB) phenomenon. Because predicting the minimum DNBR (DNB ratio) in a reactor core is very important for preventing boiling crises such as clad melting, much research has been conducted to predict DNBR values. The objective of this research is to predict the minimum DNBR by applying support vector regression (SVR) to the measured signals of a reactor coolant system (RCS). SVR has been applied extensively and successfully to nonlinear function approximation problems such as this one, in which DNBR is a function of various input variables such as reactor power, reactor pressure, core mass flowrate, control rod positions and so on. The minimum DNBR in a reactor core is predicted using these operating condition data as the inputs to the SVR. Comparison of the minimum DNBR values predicted by the SVR with COLSS values confirms the correctness of the SVR predictions.
Spontaneous regression of retinopathy of prematurity: incidence and predictive factors

Directory of Open Access Journals (Sweden)

Rui-Hong Ju

2013-08-01

AIM: To evaluate the incidence of spontaneous regression of changes in the retina and vitreous in the active stage of retinopathy of prematurity (ROP) and to identify possible related factors during the regression. METHODS: This was a retrospective, hospital-based study. The study consisted of 39 premature infants with mild ROP that showed spontaneous regression (Group A) and 17 with severe ROP who were treated before natural involution (Group B), from August 2008 through May 2011. Data on gender, single or multiple pregnancy, gestational age (GA), birth weight, weight gain from birth to the sixth week of life, use of oxygen in mechanical ventilation, total duration of oxygen inhalation, whether surfactant was given, need for and number of blood transfusions, 1-, 5- and 10-min Apgar scores, presence of bacterial, fungal or combined infection, hyaline membrane disease (HMD), patent ductus arteriosus (PDA), duration of stay in the neonatal intensive care unit (NICU) and duration of ROP were recorded. RESULTS: The incidence of spontaneous regression of ROP was 86.7% for stage 1, 57.1% for stage 2 and 5.9% for stage 3. Regression was detected in 100% of changes in zone III, 46.2% in zone II and 0% in zone I. The mean duration of ROP in the spontaneous regression group was 5.65±3.14 weeks, shorter than that of the treated ROP group (7.34±4.33 weeks), but this difference was not statistically significant (P=0.201). GA, 1-min Apgar score, 5-min Apgar score, duration of NICU stay, postnatal age at initial screening and oxygen therapy longer than 10 days were significant predictive factors for the spontaneous regression of ROP (P<0.05). Retinal hemorrhage was the only independent predictive factor for the spontaneous regression of ROP (OR 0.030, 95%CI 0.001-0.775, P=0.035). CONCLUSION: This study showed that most stage 1 and 2 ROP and changes in zone III can eventually regress spontaneously. Retinal hemorrhage is weakly inversely associated with spontaneous regression.
An observation on the variance of a predicted response in ...

African Journals Online (AJOL)

... these properties and computational simplicity. It is shown that adding a variable to a regression equation does not reduce the variance of a predicted response, which argues for avoiding overfitting, along with the obvious advantage of having a simpler equation. Key words: Linear regression; Partitioned matrix; Predicted response ...
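The claim about the variance of a predicted response can be checked in the simplest case. For a linear model with error variance sigma^2, the variance of the predicted mean response at x0 is sigma^2 * x0'(X'X)^{-1} x0. The sketch below compares the intercept-only model, where the leverage term is 1/n, with the model that adds one regressor, where it becomes 1/n + (x0 - xbar)^2 / Sxx, which can never be smaller. The error variance is assumed known and fixed here; in practice its estimate also changes when a variable is added.

```python
def leverage_mean_only(xs, x0):
    """x0'(X'X)^{-1}x0 for the intercept-only model y = b0 + e (does not depend on x0)."""
    return 1.0 / len(xs)

def leverage_with_slope(xs, x0):
    """x0'(X'X)^{-1}x0 after adding the regressor x: y = b0 + b1*x + e."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return 1.0 / n + (x0 - xbar) ** 2 / sxx

sigma2 = 4.0  # hypothetical, assumed-known error variance
xs = [1, 2, 3, 4, 5]
for x0 in (3, 6):
    v_small = sigma2 * leverage_mean_only(xs, x0)
    v_big = sigma2 * leverage_with_slope(xs, x0)
    print(x0, v_small, v_big)  # the larger model never gives the smaller variance
```

At x0 = xbar = 3 the two variances coincide; away from the mean the extra term (x0 - xbar)^2 / Sxx makes the larger model's prediction strictly more variable.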
Modeling and prediction of Turkey's electricity consumption using Support Vector Regression

International Nuclear Information System (INIS)

Kavaklioglu, Kadir

2011-01-01

Support Vector Regression (SVR) methodology is used to model and predict Turkey's electricity consumption. Among various SVR formalisms, the ε-SVR method was used since the training pattern set was relatively small. Electricity consumption is modeled as a function of socio-economic indicators such as population, Gross National Product, imports and exports. To facilitate future predictions of electricity consumption, a separate SVR model was created for each of the input variables using their current and past values, and these models were combined to yield consumption prediction values. A grid search over the model parameters was performed to find the best ε-SVR model for each variable based on root mean square error. Turkey's electricity consumption is predicted until 2026 using data from 1975 to 2006. The results show that electricity consumption can be modeled using Support Vector Regression and that the models can be used to predict future electricity consumption. (author)
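Two ingredients named above, the ε-insensitive loss that defines ε-SVR and a grid search ranked by root mean square error, can be sketched without an SVR library. The `train` callable and the parameter names `C` and `epsilon` are hypothetical placeholders for whatever SVR trainer is actually used; the toy trainer below only exists so the search can run.

```python
import itertools
import math

def eps_insensitive_loss(y_true, y_pred, eps):
    """epsilon-SVR loss: errors inside the +/- eps tube cost nothing, larger ones cost |e| - eps."""
    return max(0.0, abs(y_true - y_pred) - eps)

def rmse(y_true, y_pred):
    """Root mean square error, used here to rank candidate parameter settings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def grid_search(train, grid, x_val, y_val):
    """Try every parameter combination in `grid`; keep the one with the lowest validation RMSE.
    `train` is a hypothetical callable returning a fitted model (a function x -> y)."""
    best = None
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        setting = dict(zip(names, values))
        model = train(**setting)
        score = rmse(y_val, [model(x) for x in x_val])
        if best is None or score < best[0]:
            best = (score, setting)
    return best

def toy_train(C, epsilon):
    """Hypothetical stand-in for an SVR trainer: returns a model that shrinks its input."""
    return lambda x: x * C / (C + epsilon)

print(grid_search(toy_train, {"C": [1.0, 10.0, 100.0], "epsilon": [0.01, 0.1]}, [1, 2, 3], [1, 2, 3]))
```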
Real-time prediction of respiratory motion based on local regression methods

International Nuclear Information System (INIS)

Ruan, D; Fessler, J A; Balter, J M

2007-01-01

Recent developments in modulation techniques enable conformal delivery of radiation doses to small, localized target volumes. One of the challenges in using these techniques is tracking and predicting target motion in real time, which is necessary to accommodate system latencies. For image-guided radiotherapy systems, it is also desirable to minimize sampling rates to reduce imaging dose. This study focuses on predicting respiratory motion, which can significantly affect lung tumours. Predicting respiratory motion in real time is challenging because of the complexity of breathing patterns and the many sources of variability. We propose a prediction method based on local regression. There are three major ingredients of this approach: (1) forming an augmented state space to capture system dynamics, (2) local regression in the augmented space to train the predictor from previous observation data using the semi-periodicity of respiratory motion, and (3) local weighting adjustment to incorporate fading temporal correlations. To evaluate prediction accuracy, we computed the root mean square error between predicted tumor motion and its observed location for ten patients. For comparison, we also applied commonly used predictive methods, namely linear prediction, neural networks and Kalman filtering, to the same data. The proposed method reduced the prediction error for all imaging rates and latency lengths, particularly for long prediction lengths.
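The three ingredients listed above can be illustrated with a simplified predictor. This sketch is not the authors' implementation: it uses delay vectors of the last `d` samples as the augmented state, a kernel-weighted local average (a local-constant regression) of what followed the `k` most similar past states, and an exponential `fade` factor to down-weight older states. All parameter values and the sinusoidal trace are hypothetical.

```python
import math

def predict_ahead(signal, d=3, lag=2, k=5, bandwidth=0.5, fade=0.99):
    """Predict the sample `lag` steps after the end of `signal`.

    Augmented states are delay vectors of the last d samples; the prediction is a
    kernel-weighted average of what followed the k most similar past states, with
    an extra exponential down-weighting of older states."""
    n = len(signal)
    query = signal[n - d:]
    candidates = []
    for t in range(d, n - lag + 1):
        state = signal[t - d:t]          # samples t-d .. t-1
        target = signal[t - 1 + lag]     # value observed lag steps after the state ended
        candidates.append((math.dist(state, query), n - t, target))
    candidates.sort()
    nearest = candidates[:k]
    num = den = 0.0
    for dist, age, target in nearest:
        w = math.exp(-(dist / bandwidth) ** 2) * fade ** age
        num += w * target
        den += w
    if den == 0.0:  # all kernel weights underflowed: fall back to a plain average
        return sum(c[2] for c in nearest) / len(nearest)
    return num / den

# Hypothetical breathing-like trace with a period of 20 samples
trace = [math.sin(2 * math.pi * t / 20) for t in range(200)]
print(predict_ahead(trace), math.sin(2 * math.pi * 201 / 20))
```

On a perfectly periodic trace the nearest past states match the current state exactly, so the prediction reproduces the true future value; real breathing traces are only semi-periodic, which is where the kernel bandwidth and fading weights matter.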
Hierarchical Neural Regression Models for Customer Churn Prediction

Directory of Open Access Journals (Sweden)

Golshan Mohammadi

2013-01-01

As customers are the main assets of every industry, customer churn prediction is becoming a major task for companies that want to remain competitive. The literature reports better applicability and efficiency for hierarchical data mining techniques. This paper considers three hierarchical models for churn prediction, built by combining four different data mining techniques: backpropagation artificial neural networks (ANN), self-organizing maps (SOM), alpha-cut fuzzy c-means (α-FCM), and the Cox proportional hazards regression model. The hierarchical models are ANN + ANN + Cox, SOM + ANN + Cox, and α-FCM + ANN + Cox. The first component of each model clusters the data into two groups, churners and nonchurners, and also filters out unrepresentative data or outliers. The clustered data are then used by the second technique to assign customers to churner and nonchurner groups. Finally, the correctly classified data are used to build the Cox proportional hazards model. To evaluate the performance of the hierarchical models, an Iranian mobile dataset is considered. The experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Type I and II errors, RMSE, and MAD metrics. In addition, the α-FCM + ANN + Cox model performs significantly better than the other two hierarchical models.
A method for the selection of a functional form for a thermodynamic equation of state using weighted linear least squares stepwise regression

Science.gov (United States)

Jacobsen, R. T.; Stewart, R. B.; Crain, R. W., Jr.; Rose, G. L.; Myers, A. F.

1976-01-01

A method was developed for making a rational choice of the terms to be included in an equation of state with a large number of adjustable coefficients. The methods presented were developed for use in determining an equation of state for oxygen and nitrogen. However, the methods can be applied generally to the determination of an optimum polynomial equation for fitting a large number of data points. The data considered in the least squares problem are experimental thermodynamic pressure-density-temperature data. Attention is given to a description of stepwise multiple regression and the use of stepwise regression in determining an equation of state for oxygen and nitrogen.
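A greedy forward-selection variant of weighted stepwise regression for choosing polynomial terms can be sketched as follows. The stopping rule, an absolute minimum reduction in weighted SSE, is an assumption made for brevity; classical stepwise procedures usually add or remove terms with F-tests, and the report's exact criteria are not given here. Candidate term names such as `rho^3` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def wls_fit(cols, y, w):
    """Weighted least squares for y ~ sum_j beta_j * cols[j]; returns (weighted SSE, beta)."""
    k = len(cols)
    A = [[sum(wi * ci * cj for wi, ci, cj in zip(w, cols[r], cols[c])) for c in range(k)]
         for r in range(k)]
    rhs = [sum(wi * ci * yi for wi, ci, yi in zip(w, cols[r], y)) for r in range(k)]
    beta = solve(A, rhs)
    sse = 0.0
    for i, (wi, yi) in enumerate(zip(w, y)):
        fit = sum(bj * cols[j][i] for j, bj in enumerate(beta))
        sse += wi * (yi - fit) ** 2
    return sse, beta

def forward_stepwise(candidates, y, w, max_terms=5, min_gain=1e-9):
    """Greedily add the candidate term that most reduces the weighted SSE."""
    selected = []
    best_sse = sum(wi * yi * yi for wi, yi in zip(w, y))  # SSE of the empty model
    while len(selected) < max_terms:
        trials = []
        for name in candidates:
            if name not in selected:
                cols = [candidates[s] for s in selected + [name]]
                trials.append((wls_fit(cols, y, w)[0], name))
        if not trials:
            break
        sse, name = min(trials)
        if best_sse - sse < min_gain:
            break  # no remaining term improves the fit enough
        selected.append(name)
        best_sse = sse
    return selected, best_sse

# Hypothetical density values and candidate polynomial terms
rho = [0.1 * i for i in range(1, 21)]
terms = {"1": [1.0] * 20, "rho": rho, "rho^2": [r ** 2 for r in rho], "rho^3": [r ** 3 for r in rho]}
print(forward_stepwise(terms, [5 * r ** 3 for r in rho], [1.0] * 20))
```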
Fault trend prediction of device based on support vector regression

International Nuclear Information System (INIS)

Song Meicun; Cai Qi

2011-01-01

The state of research on fault trend prediction and the basic theory of support vector regression (SVR) are introduced. SVR was applied to fault trend prediction for a roller bearing and compared with other methods (BP neural network, gray model, and gray-AR model). The results show that the BP network tends to overfit and become trapped in local minima, so that its predictions are unstable. They also show that SVR predictions are stable and that SVR is superior to the BP neural network, gray model and gray-AR model in predictive precision. SVR is an effective method for fault trend prediction. (authors)
Prediction of radiation levels in residences: A methodological comparison of CART [Classification and Regression Tree Analysis] and conventional regression

International Nuclear Information System (INIS)

Janssen, I.; Stebbings, J.H.

1990-01-01

In environmental epidemiology, trace and toxic substance concentrations frequently have very highly skewed distributions ranging over one or more orders of magnitude, and prediction by conventional regression is often poor. Classification and Regression Tree Analysis (CART) is an alternative in such contexts. To compare the techniques, two Pennsylvania data sets and three dependent variables are used: house radon progeny (RnD) and gamma levels as predicted by construction characteristics in 1330 houses; and ∼200 house radon (Rn) measurements as predicted by topographic parameters. CART may identify structural variables of interest not identified by conventional regression, and vice versa, but in general the regression models are similar. CART has major advantages in dealing with other common characteristics of environmental data sets, such as missing values, continuous variables requiring transformations, and large sets of potential independent variables. CART is most useful in the identification and screening of independent variables, greatly reducing the need for cross-tabulations and nested breakdown analyses. There is no need to discard cases with missing values for the independent variables, because surrogate variables are intrinsic to CART. The tree-structured approach is also independent of the scale on which the independent variables are measured, so transformations are unnecessary. CART identifies important interactions as well as main effects. The major advantages of CART appear to be in exploring data. Once the important variables are identified, conventional regression seems to give similar results that are more interpretable to most audiences. 12 refs., 8 figs., 10 tabs
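The scale-independence point above follows from how a regression tree chooses a split: it only compares the ordering of a predictor's values. A minimal sketch of one CART-style split search, minimizing the total within-node sum of squared errors, is shown below with hypothetical data; a full CART implementation adds recursive splitting, pruning and surrogate splits.

```python
def sse(values):
    """Sum of squared deviations from the node mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (total child SSE, threshold) for the best binary split on x; threshold None = no split."""
    pairs = sorted(zip(x, y))
    best = (sse(y), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate equal x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[0]:
            best = (total, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best

# Hypothetical predictor (e.g. a construction characteristic) and skewed response
x = [1, 2, 3, 10, 11, 12]
y = [1, 1, 1, 5, 5, 5]
print(best_split(x, y))
print(best_split([v * v for v in x], y))  # same partition after a monotone transform
```

Squaring the predictor moves the threshold but leaves the partition, and hence the fit, unchanged, which is why no transformation of the independent variables is needed.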
Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients.

Science.gov (United States)

Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat

2015-01-01

Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). The pros and cons of several different types of decision tree-based regression methods are discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p), which are often used in physiologically based pharmacokinetics. This work therefore assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. The decision tree-based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data, and from flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
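The mean fold error quoted above (2.33 vs 2.29) is a ratio-based accuracy measure. The abstract does not state which definition was used, so the sketch below shows two common variants as assumptions: the arithmetic mean of per-compound fold errors max(pred/obs, obs/pred), and the geometric version 10^mean(|log10(pred/obs)|), often called the average fold error. The Vss values are hypothetical.

```python
import math

def fold_errors(predicted, observed):
    """Per-compound fold error: max(pred/obs, obs/pred), always >= 1."""
    return [max(p / o, o / p) for p, o in zip(predicted, observed)]

def mean_fold_error_arithmetic(predicted, observed):
    """Arithmetic mean of the per-compound fold errors."""
    fe = fold_errors(predicted, observed)
    return sum(fe) / len(fe)

def mean_fold_error_geometric(predicted, observed):
    """10 ** mean(|log10(pred/obs)|), i.e. the geometric mean of the fold errors."""
    logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))

# Hypothetical Vss values (L/kg): model predictions vs observations
pred_vss = [1.2, 0.4, 5.0, 2.0]
obs_vss = [1.0, 0.8, 2.5, 2.2]
print(mean_fold_error_arithmetic(pred_vss, obs_vss), mean_fold_error_geometric(pred_vss, obs_vss))
```

The geometric variant is less sensitive to a single badly predicted compound, which is one reason it is popular for quantities such as Vss that span orders of magnitude.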
On the calibration process of film dosimetry: OLS inverse regression versus WLS inverse prediction

International Nuclear Information System (INIS)

Crop, F; Thierens, H; Rompaye, B Van; Paelinck, L; Vakaet, L; Wagter, C De

2008-01-01

The purpose of this study was both to put forward a statistically correct model for film calibration and to optimize this process. A reliable calibration is needed to perform accurate reference dosimetry with radiographic (Gafchromic) film. Sometimes an ordinary least squares simple linear (in the parameters) regression is applied to the dose-optical-density (OD) curve, with the dose as a function of OD (inverse regression) or with OD as a function of dose (inverse prediction). Applying such an unweighted simple linear regression fit is invalid because the heteroscedasticity of the data is not taken into account. This could lead to erroneous results originating from the calibration process itself and thus to lower accuracy. In this work, we compare the ordinary least squares (OLS) inverse regression method with the correct weighted least squares (WLS) inverse prediction method for creating calibration curves. We found that the OLS inverse regression method could lead to a prediction bias of up to 7.3 cGy at 300 cGy and total prediction errors of 3% or more for Gafchromic EBT film. Application of the WLS inverse prediction method resulted in a maximum prediction bias of 1.4 cGy and total prediction errors below 2% in a 0-400 cGy range. We developed a Monte Carlo-based process to optimize calibrations, depending on the needs of the experiment. This type of thorough analysis can lead to higher accuracy for film dosimetry.
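The WLS inverse prediction idea can be sketched for the simplest case: fit OD = a + b * dose with weights 1/sd^2 so that noisier high-dose points count less, then invert the fitted curve to read a dose off a measured OD. A straight line is an assumption made here for brevity; real film response is nonlinear in dose, and a calibration model that is linear in the parameters may include higher-order terms. The calibration points and standard deviations are hypothetical.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x with weights w (e.g. 1/variance)."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = yw - b * xw
    return a, b

def inverse_predict(od, a, b):
    """Invert the calibration OD = a + b*dose to estimate dose from a measured OD."""
    return (od - a) / b

# Hypothetical calibration points: dose (cGy), optical density, and OD standard deviations
dose = [0, 50, 100, 200, 300, 400]
od = [0.05, 0.16, 0.24, 0.44, 0.66, 0.84]
sd = [0.005, 0.006, 0.008, 0.012, 0.016, 0.020]  # noise grows with dose (heteroscedastic)
weights = [1 / s ** 2 for s in sd]
a, b = wls_line(dose, od, weights)
print(f"OD = {a:.4f} + {b:.6f} * dose; dose at OD 0.50 = {inverse_predict(0.50, a, b):.1f} cGy")
```

Fitting OD as a function of dose and then inverting, rather than regressing dose on OD directly, keeps the error term on the measured quantity, which is the distinction the abstract draws between inverse prediction and inverse regression.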

Predicting Dropouts of University Freshmen: A Logit Regression Analysis.

Science.gov (United States)

Lam, Y. L. Jack

1984-01-01

Stepwise discriminant analysis coupled with logit regression analysis of freshmen data from Brandon University (Manitoba) indicated that six tested variables drawn from research on university dropouts were useful in predicting attrition: student status, residence, financial sources, distance from home town, goal fulfillment, and satisfaction with…
On the estimation and testing of predictive panel regressions

NARCIS (Netherlands)

Karabiyik, H.; Westerlund, Joakim; Narayan, Paresh

2016-01-01

Hjalmarsson (2010) considers an OLS-based estimator of predictive panel regressions that is argued to be mixed normal under very general conditions. In a recent paper, Westerlund et al. (2016) show that while consistent, the estimator is generally not mixed normal, which invalidates standard normal
Predictive equation of state method for heavy materials based on the Dirac equation and density functional theory

Science.gov (United States)

Wills, John M.; Mattsson, Ann E.

2012-02-01

Density functional theory (DFT) provides a formally predictive basis for equation of state properties. Available approximations to the exchange/correlation functional provide accurate predictions for many materials in the periodic table. For heavy materials, however, DFT calculations using available functionals fail to provide quantitative predictions and often fail to be even qualitatively correct. This deficiency is due both to the lack of the appropriate confinement physics in the exchange/correlation functional and to approximations used to evaluate the underlying equations. To assess and develop accurate functionals, it is essential to eliminate all other sources of error. In this talk we describe an efficient first-principles electronic structure method based on the Dirac equation and compare its results with those of other commonly used methods. Implications for the high-pressure equation of state of relativistic materials are demonstrated in applications to Ce and the light actinides. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

Directory of Open Access Journals (Sweden)

Minh Vu Trieu

2017-03-01

This paper presents statistical analyses of rock engineering properties and the measured penetration rate of a tunnel boring machine (TBM) based on data from an actual project. The aim of this study is to analyze the influence of rock engineering properties, including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness, on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of the TBM. Finally, a fuzzy logic model is developed as an alternative method and compared with the four statistical regression models. Results show that the fuzzy logic model provides better estimates and can be applied to predict TBM performance. The R-squared value (R2) of the fuzzy logic model is the highest, at 0.714, ahead of the runner-up value of 0.667 from the multiple-variable nonlinear regression model.
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

Science.gov (United States)

Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

2017-03-01

This paper presents statistical analyses of rock engineering properties and the measured penetration rate of a tunnel boring machine (TBM) based on data from an actual project. The aim of this study is to analyze the influence of rock engineering properties, including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness, on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of the TBM. Finally, a fuzzy logic model is developed as an alternative method and compared with the four statistical regression models. Results show that the fuzzy logic model provides better estimates and can be applied to predict TBM performance. The R-squared value (R2) of the fuzzy logic model is the highest, at 0.714, ahead of the runner-up value of 0.667 from the multiple-variable nonlinear regression model.
Multiple regression equations modelling of groundwater of Ajmer-Pushkar railway line region, Rajasthan (India).

Science.gov (United States)

Mathur, Praveen; Sharma, Sarita; Soni, Bhupendra

2010-01-01

In the present work, an attempt is made to formulate multiple regression equations, using the all-possible-regressions method, for groundwater quality assessment of the Ajmer-Pushkar railway line region in the pre- and post-monsoon seasons. Correlation studies revealed linear relationships (r > 0.7) of electrical conductivity (EC), total hardness (TH) and total dissolved solids (TDS) with other water quality parameters. The highest correlation was found between EC and TDS (r = 0.973). EC showed a highly significant positive correlation with Na, K, Cl, TDS and total solids (TS). TH showed the highest correlation with Ca and Mg. TDS showed a significant correlation with Na, K, SO4, PO4 and Cl. The study indicated that most of the contamination present was water soluble or ionic in nature. Mg was present as MgCl2, K mainly as KCl and K2SO4, and Na as salts of Cl, SO4 and PO4. On the other hand, F and NO3 showed no significant correlations. The r2 values and F values (at the 95% confidence level, alpha = 0.05) for the modelled equations indicated a high degree of linearity between the independent and dependent variables. The percentage error between calculated and experimental values was also within ±15%.
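The "all possible regressions" idea fits every subset of candidate predictors and keeps the best one. The following is a minimal pure-Python sketch under two assumptions: the subsets are ranked by adjusted R2 (the abstract does not state the criterion), and the sample values are hypothetical. It then applies the ±15% error check that the abstract describes.

```python
from itertools import combinations


def solve(a, b):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    return beta, fitted


def all_possible_regressions(data, names, y):
    """Fit every non-empty predictor subset; keep the one with the highest adjusted R2."""
    n = len(y)
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    best = None
    for k in range(1, len(names) + 1):
        for subset in combinations(range(len(names)), k):
            rows = [[record[i] for i in subset] for record in data]
            beta, fitted = fit_ols(rows, y)
            ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
            adj_r2 = 1 - (ss_res / ss_tot) * (n - 1) / (n - k - 1)
            if best is None or adj_r2 > best["adj_r2"]:
                best = {"adj_r2": adj_r2, "predictors": [names[i] for i in subset],
                        "beta": beta, "fitted": fitted}
    return best


# Hypothetical samples: (EC in uS/cm, Cl in mg/L) -> TDS in mg/L.
data = [(500, 40), (750, 90), (900, 30), (1200, 120), (1500, 60), (1800, 150), (2100, 80)]
tds = [330, 479.5, 591, 776, 982, 1164, 1368]

best = all_possible_regressions(data, ["EC", "Cl"], tds)
errors_pct = [100 * (fit - obs) / obs for fit, obs in zip(best["fitted"], tds)]
print(best["predictors"], round(best["adj_r2"], 4))
print("all errors within +/-15%:", all(abs(e) <= 15 for e in errors_pct))
```

The number of subsets grows as 2^p − 1. Exhaustive search is therefore practical for the handful of water quality parameters discussed here, but not for large predictor sets.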
Ridge regression for predicting elastic moduli and hardness of calcium aluminosilicate glasses

Science.gov (United States)

Deng, Yifan; Zeng, Huidan; Jiang, Yejia; Chen, Guorong; Chen, Jianding; Sun, Luyi

2018-03-01

Designing glasses with satisfactory mechanical properties predictively, through modeling, is of great significance. Among the various modeling methods, data-driven modeling is a reliable approach that can dramatically shorten research time, cut research costs and accelerate the development of glass materials. In this work, ridge regression (RR) analysis was used to construct regression models that predict the compositional dependence of the elastic moduli (shear, bulk and Young’s moduli) and hardness of CaO-Al2O3-SiO2 glasses, based on the ternary composition diagram. Properties were predicted over a large glass composition space from known experimental data for various compositions in the literature, and the simulated results agree well with the measured ones. This regression model can serve as a simple and effective tool for studying the relationship between composition and properties, enabling highly efficient design of glasses that meet specific elasticity and hardness requirements.
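Ridge regression is well suited to compositional data. The mole fractions of CaO, Al2O3 and SiO2 sum to 1, so the centred design matrix is rank-deficient and ordinary least squares has no unique solution. The following is a minimal pure-Python sketch of the closed-form ridge estimate, with the intercept left unpenalised. The compositions, moduli and λ values are hypothetical, and the paper's exact model setup is not reproduced here.

```python
def solve(a, b):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge regression on centred data: (Xc^T Xc + lam*I) beta = Xc^T yc."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    a = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    beta = solve(a, b)
    intercept = y_mean - sum(bj * mj for bj, mj in zip(beta, x_mean))
    return intercept, beta


def predict(intercept, beta, x):
    return intercept + sum(b * v for b, v in zip(beta, x))


# Hypothetical compositions (mole fractions CaO, Al2O3, SiO2; rows sum to 1)
# and Young's moduli in GPa.
X = [(0.30, 0.10, 0.60), (0.25, 0.15, 0.60), (0.35, 0.15, 0.50),
     (0.40, 0.20, 0.40), (0.20, 0.20, 0.60), (0.45, 0.10, 0.45)]
E = [88.0, 90.0, 95.0, 100.0, 89.0, 97.0]

for lam in (0.01, 1.0):
    b0, beta = ridge_fit(X, E, lam)
    print(lam, [round(v, 2) for v in beta], round(predict(b0, beta, (0.33, 0.12, 0.55)), 1))
```

Larger λ shrinks the coefficient vector toward zero. In practice λ would be chosen by cross-validation rather than fixed by hand.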
Predicting respiratory tumor motion with multi-dimensional adaptive filters and support vector regression

International Nuclear Information System (INIS)

Riaz, Nadeem; Wiersma, Rodney; Mao Weihua; Xing Lei; Shanker, Piyush; Gudmundsson, Olafur; Widrow, Bernard

2009-01-01

Intra-fraction tumor tracking methods can improve radiation delivery during radiotherapy sessions. Image acquisition for tumor tracking and the subsequent adjustment of the treatment beam with gating or beam tracking introduce time latency and necessitate predicting the future position of the tumor. This study evaluates the use of multi-dimensional linear adaptive filters and support vector regression to predict the motion of lung tumors tracked at 30 Hz. We expand on prior work by other groups on adaptive filters by using a general framework of a multiple-input single-output (MISO) adaptive system that uses multiple correlated signals to predict the motion of a tumor. We compare the performance of these two novel methods with conventional methods such as linear regression and single-input, single-output adaptive filters. At 400 ms latency, the average root-mean-square errors (RMSEs) for the 14 treatment sessions studied using no prediction, linear regression, single-output adaptive filter, MISO and support vector regression are 2.58, 1.60, 1.58, 1.71 and 1.26 mm, respectively. At 1 s, the RMSEs are 4.40, 2.61, 3.34, 2.66 and 1.93 mm, respectively. We find that, of the methods studied, support vector regression most accurately predicts the future tumor position and can provide an RMSE of less than 2 mm at 1 s latency. A multi-dimensional adaptive filter framework also provides improved performance over single-dimension adaptive filters. Work is underway to combine these two frameworks to improve performance.
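To make the latency problem concrete, the following sketch implements a single-input adaptive predictor using normalised LMS (NLMS). NLMS is one common adaptive-filter algorithm; the abstract does not state which update rule the authors used. The sketch forecasts 400 ms ahead at 30 Hz (12 samples) and compares the result with the "no prediction" baseline. The breathing trace is a hypothetical noise-free sinusoid, so the errors will be far smaller than those for real tumor motion.

```python
import math


def nlms_predict(signal, horizon, order=10, mu=0.5, eps=1e-6):
    """Causal normalised-LMS predictor.

    At each time t the filter first adapts on the newest sample it can score
    (signal[t], whose input window ended `horizon` samples earlier), then uses
    the latest `order` samples to forecast signal[t + horizon].
    Returns a dict mapping target index -> prediction.
    """
    w = [0.0] * order
    preds = {}
    for t in range(order - 1, len(signal)):
        start = t - horizon - order + 1
        if start >= 0:
            u = signal[start:t - horizon + 1]
            err = signal[t] - sum(wi * ui for wi, ui in zip(w, u))
            norm = eps + sum(ui * ui for ui in u)
            w = [wi + mu * err * ui / norm for wi, ui in zip(w, u)]
        window = signal[t - order + 1:t + 1]
        preds[t + horizon] = sum(wi * ui for wi, ui in zip(w, window))
    return preds


def rmse(pairs):
    """Root-mean-square error over (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


# Hypothetical breathing trace: 10 mm amplitude, 0.25 Hz, sampled at 30 Hz.
fs = 30.0
horizon = round(0.4 * fs)  # 400 ms latency -> 12 samples ahead
signal = [10.0 * math.sin(2 * math.pi * 0.25 * n / fs) for n in range(3000)]

preds = nlms_predict(signal, horizon)
evaluation = range(2000, 3000)  # score only after the filter has adapted
rmse_nlms = rmse([(signal[n], preds[n]) for n in evaluation])
rmse_none = rmse([(signal[n], signal[n - horizon]) for n in evaluation])
print(f"no prediction: {rmse_none:.2f} mm, NLMS: {rmse_nlms:.2f} mm")
```

A MISO variant would stack several correlated input signals into the window `u` instead of lags of a single signal.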
Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling

Directory of Open Access Journals (Sweden)

Eric R. Edelman

2017-06-01

For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools depends on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 to 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity-related benefits.
Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling.

Science.gov (United States)

Edelman, Eric R; van Kuijk, Sander M J; Hamaekers, Ankie E W; de Korte, Marcel J M; van Merode, Godefridus G; Buhre, Wolfgang F F A

2017-01-01

For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools depends on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 to 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity-related benefits.
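The fixed ratio model follows from simple arithmetic: if ACT is about 33% of SCT, then TPT = SCT + 0.33 × SCT = 1.33 × eSCT. The following sketch compares that model with a least-squares line fitted on eSCT alone. The case durations are hypothetical, and the paper's best model also included type of operation, ASA class and type of anesthesia, which this sketch omits.

```python
import math


def fixed_ratio_tpt(esct, ratio=1.33):
    """Fixed-ratio model: ACT ~ 33% of SCT, so TPT = 1.33 * eSCT."""
    return ratio * esct


def fit_simple_ols(x, y):
    """Least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


# Hypothetical cases (minutes): estimated surgeon-controlled time and observed TPT.
esct = [45, 60, 90, 120, 150, 200]
tpt = [75, 88, 120, 150, 185, 236]

a, b = fit_simple_ols(esct, tpt)
rmse_fixed = rmse(tpt, [fixed_ratio_tpt(x) for x in esct])
rmse_ols = rmse(tpt, [a + b * x for x in esct])
print(f"fixed ratio RMSE = {rmse_fixed:.1f} min, OLS RMSE = {rmse_ols:.1f} min")
```

On the training data, the fitted line can never do worse than the fixed ratio, because 1.33 × eSCT is a special case of a + b × eSCT. A fair comparison, as in the study, needs the models to be judged on held-out cases.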
Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

Science.gov (United States)

Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

2012-12-01

In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
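RKHS regression captures non-additive (epistatic) marker effects through a kernel. The following sketch uses kernel ridge regression with a Gaussian kernel, a non-Bayesian point-estimate form of RKHS regression, instead of the Bayesian RKHS fit used in the study. The two-marker genotypes and the trait are hypothetical. The trait is 1 only when both markers carry the same homozygote, an XOR-like pattern. A model that is additive and linear in the markers can only fit a constant 0.5 to this pattern, whereas the kernel model reproduces it.

```python
import math


def solve(a, b):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gaussian_kernel(u, v, bandwidth):
    """k(u, v) = exp(-||u - v||^2 / bandwidth)."""
    return math.exp(-sum((ui - vi) ** 2 for ui, vi in zip(u, v)) / bandwidth)


def kernel_ridge_fit(X, y, lam, bandwidth):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], bandwidth) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)


def kernel_ridge_predict(X_train, alpha, x, bandwidth):
    """f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * gaussian_kernel(xi, x, bandwidth) for a, xi in zip(alpha, X_train))


# Hypothetical genotypes at two markers (0/2 = the two homozygotes).
X = [(0, 0), (0, 2), (2, 0), (2, 2)]
y = [1.0, 0.0, 0.0, 1.0]

bandwidth, lam = 4.0, 1e-3
alpha = kernel_ridge_fit(X, y, lam, bandwidth)
fitted = [kernel_ridge_predict(X, alpha, x, bandwidth) for x in X]
print([round(f, 3) for f in fitted])
```

With thousands of markers, the same code applies unchanged. Only the kernel matrix grows, with size equal to the number of lines squared, not the number of markers.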
Predicting Performance on MOOC Assessments using Multi-Regression Models

OpenAIRE

Ren, Zhiyun; Rangwala, Huzefa; Johri, Aditya

2016-01-01

The past few years have seen rapid growth in data mining approaches for the analysis of data obtained from Massive Open Online Courses (MOOCs). The objectives of this study are to develop approaches to predict the score a student may achieve on a given grade-related assessment, based on information considered as prior performance or prior activity in the course. We develop a personalized linear multiple regression (PLMR) model to predict the grade for a student, prior to attempt...
Combining logistic regression with classification and regression tree to predict quality of care in a home health nursing data set.

Science.gov (United States)

Guo, Huey-Ming; Shyu, Yea-Ing Lotus; Chang, Her-Kun

2006-01-01

In this article, the authors provide an overview of a research method to predict quality of care in a home health nursing data set. The results of this study can be visualized through classification and regression tree (CART) graphs. Because the home health nursing data set was analyzed with a combination of logistic regression and CART, two techniques that complement each other, the analysis was more effective and the results were more informative, showing that more patient characteristics were related to quality of care in home care. The results help home health nurses predict patient outcomes in case management. Improved prediction is needed so that interventions can be appropriately targeted to improve patient outcomes and quality of care.
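CART grows its tree by repeatedly choosing the split that most reduces node impurity. The following sketch shows a single split search on one numeric predictor, using the Gini index as the impurity measure. It illustrates only the CART half of the combined method. The patient ages and binary quality-of-care outcomes are hypothetical.

```python
def gini(labels):
    """Gini impurity 2p(1 - p) of a node with binary 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(x, y):
    """Return (weighted Gini impurity, threshold) of the best binary split on x."""
    pairs = sorted(zip(x, y))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates tied values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best[0]:
            best = (weighted, thr)
    return best


# Hypothetical: patient age vs. a binary quality-of-care indicator (1 = good).
age = [62, 70, 75, 81, 85, 90]
good_care = [1, 1, 1, 0, 0, 0]
print(best_split(age, good_care))
```

A full tree applies this search to every predictor at every node and recurses on the resulting subsets. The logistic model would then be fitted on the same predictors to estimate their adjusted effects.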
7 CFR 610.12 - Equations for predicting soil loss due to water erosion.

Science.gov (United States)

2010-01-01

.... (a) The equation for predicting soil loss due to erosion for both the USLE and the RUSLE is A = R × K... 22161.) (b) The factors in the USLE equation are: (1) A is the estimation of average annual soil loss in...
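The regulation text above is cut off after "A = R × K". The standard USLE is the product A = R × K × L × S × C × P, where the slope length and steepness factors are often combined as LS. The following sketch evaluates that product. The factor values are hypothetical, and the units of A depend on the units chosen for R and K.

```python
def usle_soil_loss(R, K, LS, C, P):
    """USLE average annual soil loss A = R * K * LS * C * P.

    R:  rainfall-runoff erosivity factor
    K:  soil erodibility factor
    LS: combined slope length and steepness factor
    C:  cover-management factor
    P:  support practice factor
    """
    return R * K * LS * C * P


# Hypothetical field: same soil and slope, two different cover conditions.
bare = usle_soil_loss(R=100, K=0.3, LS=1.2, C=1.0, P=1.0)
cropped = usle_soil_loss(R=100, K=0.3, LS=1.2, C=0.2, P=0.5)
print(f"bare: {bare:.1f}, cropped with support practice: {cropped:.1f}")
```

Because the model is a pure product, halving any single factor halves the predicted loss.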
Trends of Abutment-Scour Prediction Equations Applied to 144 Field Sites in South Carolina

Science.gov (United States)

Benedict, Stephen T.; Deshpande, Nikhil; Aziz, Nadim M.; Conrads, Paul

2006-01-01

The U.S. Geological Survey conducted a study in cooperation with the Federal Highway Administration in which predicted abutment-scour depths computed with selected predictive equations were compared with field measurements of abutment-scour depth made at 144 bridges in South Carolina. The assessment used five equations published in the Fourth Edition of 'Evaluating Scour at Bridges' (Hydraulic Engineering Circular 18): the original Froehlich, the modified Froehlich, the Sturm, the Maryland, and the HIRE equations. An additional unpublished equation was also assessed. Comparisons between predicted and observed scour depths are intended to illustrate general trends and order-of-magnitude differences for the prediction equations. Field measurements were taken during non-flood conditions, when the hydraulic conditions that caused the scour generally are unknown. The predicted scour depths are based on hydraulic conditions associated with the 100-year flow at all sites and with the flood of record at 35 sites. Comparisons showed that the equations frequently overpredicted observed scour depths, at times excessively. Underprediction also occurred, but less frequently. The performance of these equations indicates that they are poor predictors of abutment-scour depth in South Carolina, and it is probable that poor performance will occur when the equations are applied in other geographic regions. Extensive data and graphs used to compare predicted and observed scour depths in this study were compiled into spreadsheets and are included in digital format with this report. In addition to the equation-comparison data, Water-Surface Profile Model tube-velocity data, soil-boring data, and selected abutment-scour data are included in digital format with this report. The digital database was developed as a resource for future researchers and is especially valuable for evaluating the reasonableness of future equations that may be developed.
Limited Sampling Strategy for Accurate Prediction of Pharmacokinetics of Saroglitazar: A 3-point Linear Regression Model Development and Successful Prediction of Human Exposure.

Science.gov (United States)

Joshi, Shuchi N; Srinivas, Nuggehally R; Parmar, Deven V

2018-03-01

Our aim was to develop a regression model using a limited sampling strategy for accurate estimation of the area under the plasma concentration versus time curve (AUC) for saroglitazar, and to validate its extrapolative performance. Pharmacokinetic data from healthy subjects in a well-powered food-effect study (fasted vs fed treatments; n = 50) were used in this work. The serial plasma concentration data up to 72 hours and the corresponding AUC0-t (ie, 72 hours) of the first 25 subjects in the fasting group comprised the training dataset used to develop the limited sampling model. The internal datasets for prediction included the remaining 25 subjects from the fasting group and all 50 subjects from the fed condition of the same study. The external datasets included pharmacokinetic data for saroglitazar from previous single-dose clinical studies. Limited sampling models were built from the correlation of 1, 2 and 3 concentration-time points with the AUC0-t of saroglitazar. Only models with coefficients of determination (R2) > 0.90 were screened for further evaluation. The model with the best R2 was validated based on mean prediction error, mean absolute prediction error and root mean square error. Both the correlation between predicted and observed AUC0-t of saroglitazar and the precision and bias (Bland-Altman plot) were evaluated. None of the 1- and 2-concentration-time-point models achieved R2 > 0.90. Among the 3-concentration-time-point models, only 4 equations passed the predefined criterion of R2 > 0.90. Limited sampling models with time points of 0.5, 2 and 8 hours (R2 = 0.9323) and 0.75, 2 and 8 hours (R2 = 0.9375) were validated. Mean prediction error, mean absolute prediction error and root mean square error were … prediction of saroglitazar. The same models, when applied to the AUC0-t prediction of saroglitazar sulfoxide, showed mean prediction error, mean absolute prediction error and root mean square error … model predicts the exposure of …
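The validation workflow has three parts: a reference AUC0-t from the full profile (linear trapezoidal rule), a 3-point regression predicting AUC from concentrations at 0.5, 2 and 8 hours, and percentage error metrics comparing the two. The following sketch covers all three. The concentration profile and the regression coefficients are hypothetical placeholders, not the published model. The error definitions are common ones in the limited-sampling literature, and the paper's exact formulas are assumed rather than known.

```python
import math


def auc_trapezoid(times, conc):
    """AUC(0-t) by the linear trapezoidal rule over the sampled profile."""
    return sum((t1 - t0) * (c0 + c1) / 2
               for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:]))


def limited_sampling_auc(c_05h, c_2h, c_8h, coef=(5.0, 0.5, 3.0, 16.0)):
    """3-point model AUC ~ b0 + b1*C(0.5 h) + b2*C(2 h) + b3*C(8 h).

    The coefficients are HYPOTHETICAL placeholders, not the published ones.
    """
    b0, b1, b2, b3 = coef
    return b0 + b1 * c_05h + b2 * c_2h + b3 * c_8h


def prediction_errors(observed, predicted):
    """Percentage mean prediction error, mean absolute prediction error and RMSE."""
    pe = [100.0 * (p - o) / o for o, p in zip(observed, predicted)]
    n = len(pe)
    mpe = sum(pe) / n
    mape = sum(abs(e) for e in pe) / n
    rmse = math.sqrt(sum(e * e for e in pe) / n)
    return mpe, mape, rmse


# Hypothetical single-subject profile (hours, ng/mL).
times = [0, 0.5, 1, 2, 4, 8, 24]
conc = [0.0, 40.0, 55.0, 48.0, 30.0, 12.0, 2.0]
observed_auc = auc_trapezoid(times, conc)
predicted_auc = limited_sampling_auc(conc[1], conc[3], conc[5])
print(observed_auc, predicted_auc, prediction_errors([observed_auc], [predicted_auc]))
```

In a real validation, `prediction_errors` would be applied to all subjects in each internal or external dataset at once.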
Biomass estimates of freshwater zooplankton from length-carbon regression equations

Directory of Open Access Journals (Sweden)

Patrizia COMOLI

2000-02-01

We present length/carbon regression equations for zooplankton species collected from Lake Maggiore (N. Italy) during 1992. The results are discussed in terms of the environmental factors, e.g. food availability and predation, that control biomass production of particle-feeders and predators in the pelagic system of lakes. The marked seasonality in the length-standardized carbon content of Daphnia, and its time-specific trend, suggest that from spring onward food availability for the Daphnia population may be regarded as a simple decay function. Seasonality does not affect the carbon content per unit length of the two predatory Cladocera Leptodora kindtii and Bythotrephes longimanus. Predation is probably the most important factor regulating the seasonal dynamics of their carbon biomass. A constant factor for converting the diameter of Conochilus colonies into carbon seems reasonable for an organism whose population appears quickly and just as quickly disappears.
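Length-mass relationships for zooplankton are commonly fitted as a power law, C = a × L^b, by linear regression on the log-transformed values. The abstract does not state the exact functional form the authors used, so this form is an assumption. The following sketch recovers a and b from hypothetical data generated with a = 2 and b = 2.5.

```python
import math


def fit_power_law(lengths, carbon):
    """Fit C = a * L**b by least squares on ln C = ln a + b * ln L."""
    xs = [math.log(v) for v in lengths]
    ys = [math.log(v) for v in carbon]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b


# Hypothetical body lengths (mm) and carbon contents (ug) following C = 2 * L**2.5.
lengths_mm = [0.5, 1.0, 1.5, 2.0]
carbon_ug = [2.0 * L ** 2.5 for L in lengths_mm]
a, b = fit_power_law(lengths_mm, carbon_ug)
print(f"C = {a:.3f} * L^{b:.3f}")
```

With real, noisy data, back-transforming from the log scale estimates the median rather than the mean carbon content at a given length. This matters when individual estimates are summed into population biomass.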
Shield Optimization and Formulation of Regression Equations for Split-Ring Resonator

Directory of Open Access Journals (Sweden)

Tahir Ejaz

2016-01-01

Microwave resonators are widely used in numerous applications, including communication, biomedical and chemical applications, material testing, and food grading. Split-ring resonators, in both planar and nonplanar forms, are simple structures that have been in use for several decades. This type of resonator is characterized by low cost, ease of fabrication, moderate quality factor, low external noise interference, high stability, and so forth. Owing to these attractive features and its ease of handling, the nonplanar form has been used for material characterization in the 1–5 GHz range. Resonant frequency and quality factor are two important parameters for determining material properties using perturbation theory. A shield made of conducting material, enclosing the split-ring resonator, enhances its quality factor. This work presents a novel technique for developing a shield around a predesigned nonplanar split-ring resonator to yield an optimized quality factor. Based on this technique and statistical analysis, regression equations have also been formulated for resonant frequency and quality factor; these equations are a major outcome of this work. They quantify the dependence of the output parameters on various factors of shields made of different materials. Such analysis is instrumental in developing devices and designs where improved or optimum results are required.
The current and future use of ridge regression for prediction in quantitative genetics

OpenAIRE

Vlaming, Ronald; Groenen, Patrick

2015-01-01

In recent years, there has been a considerable amount of research on the use of regularization methods for inference and prediction in quantitative genetics. Such research mostly focuses on selection of markers and shrinkage of their effects. In this review paper, the use of ridge regression for prediction in quantitative genetics using single-nucleotide polymorphism data is discussed. In particular, we consider (i) the theoretical foundations of ridge regression, (ii) its link to...
A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.

Science.gov (United States)

Bersabé, Rosa; Rivas, Teresa

2010-05-01

The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.
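The cut-off condition described here can be written out directly. In an MLR model, each category j has a logit α_j + β_j x relative to the reference category, and P(j)/P(j+1) = exp((α_j + β_j x) − (α_{j+1} + β_{j+1} x)). This ratio equals 1 at x = (α_{j+1} − α_j) / (β_j − β_{j+1}). The following sketch applies that derivation to hypothetical coefficients for the three EAT-26 categories. It reproduces the stated criterion but is not necessarily the authors' exact general equation.

```python
import math


def mlr_probabilities(x, intercepts, slopes):
    """Category probabilities from a multinomial logit (reference logit = 0)."""
    logits = [a + b * x for a, b in zip(intercepts, slopes)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adjacent_cutoffs(intercepts, slopes):
    """Scores at which categories j and j+1 are equally likely:
    a_j + b_j*x = a_{j+1} + b_{j+1}*x  =>  x = (a_{j+1} - a_j) / (b_j - b_{j+1})."""
    return [(intercepts[j + 1] - intercepts[j]) / (slopes[j] - slopes[j + 1])
            for j in range(len(intercepts) - 1)]


# Hypothetical coefficients; category 0 (asymptomatic) is the reference.
intercepts = [0.0, -4.0, -10.0]   # asymptomatic, symptomatic, eating disorder
slopes = [0.0, 0.25, 0.5]
cutoffs = adjacent_cutoffs(intercepts, slopes)
print(cutoffs)
for c in cutoffs:
    print(c, [round(p, 3) for p in mlr_probabilities(c, intercepts, slopes)])
```

The cut-offs are only meaningful as ordered thresholds if they increase with j. Checking this is a quick sanity test on a fitted model.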

Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding

Science.gov (United States)

de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.

2013-01-01

Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228
A computational approach to compare regression modelling strategies in prediction research.

Science.gov (United States)

Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H

2016-08-25

It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9% to 94.9%, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
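The Brier score used to assess the models is the mean squared difference between predicted probabilities and observed 0/1 outcomes. Lower is better, and a constant prediction of 0.5 scores 0.25. The following sketch uses hypothetical probabilities.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical predicted risks from a logistic model and observed outcomes.
print(brier_score([0.9, 0.2, 0.6], [1, 0, 0]))
print(brier_score([0.5] * 4, [1, 0, 1, 0]))  # uninformative reference
```

The Brier score mixes discrimination and calibration. This is why the study pairs it with calibration plots rather than relying on it alone.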
Wind Power Ramp Events Prediction with Hybrid Machine Learning Regression Techniques and Reanalysis Data

Directory of Open Access Journals (Sweden)

Laura Cornejo-Bueno

2017-11-01

Wind Power Ramp Events (WPREs) are large fluctuations of wind power in a short time interval, which lead to strong, undesirable variations in the electric power produced by a wind farm. Their accurate prediction is important for efficiently integrating wind energy into the electric system without considerably affecting its stability, robustness and resilience. In this paper, we tackle the problem of predicting WPREs by applying Machine Learning (ML) regression techniques. Our approach uses variables from atmospheric reanalysis data as predictive inputs for the learning machine, which opens the possibility of hybridizing numerical-physical weather models with ML techniques for WPRE prediction in real systems. Specifically, we have explored the feasibility of a number of state-of-the-art ML regression techniques, such as support vector regression, artificial neural networks (multi-layer perceptrons and extreme learning machines) and Gaussian processes, to solve the problem. The ERA-Interim reanalysis from the European Center for Medium-Range Weather Forecasts is used in this paper because of its accuracy and high resolution (in both spatial and temporal domains). To validate the feasibility of our prediction approach, we have carried out extensive experimental work using real data from three wind farms in Spain, discussing the performance of the different ML regression techniques tested on this wind power ramp event prediction problem.
Prediction equation of resting energy expenditure in an adult Spanish population of obese adult population.

Science.gov (United States)

de Luis, D A; Aller, R; Izaola, O; Romero, E

2006-01-01

The aim of our study was to evaluate the accuracy of equations for estimating resting energy expenditure (REE) in obese patients and to develop a new equation for our obese population. A population of 200 obese outpatients was analyzed prospectively. The following variables were recorded: age, weight, body mass index (BMI), waist circumference, and waist-to-hip ratio. Basal glucose, insulin, and TSH (thyroid-stimulating hormone) were measured. Indirect calorimetry and tetrapolar electrical bioimpedance were performed. REE measured by indirect calorimetry was compared with REE obtained from prediction equations for obese or nonobese patients. The mean age was 44.8 ± 16.81 years and the mean BMI 34.4 ± 5.3. Indirect calorimetry showed that, compared with women, men had higher REE (1,998.1 ± 432 vs. 1,663.9 ± 349 kcal/day; p … consumption (284.6 ± 67.7 vs. 238.6 ± 54.3 ml/min; p … predicted by prediction equations showed the following: Berstein's equation (r = 0.65; p … prediction equation was REE = 58.6 + (6.1 × weight (kg)) + (1,023.7 × height (m)) - (9.5 × age). The female model was REE = 1,272.5 + (9.8 × weight (kg)) - (61.6 × height (m)) - (8.2 × age). Our prediction equations showed a nonsignificant difference from measured REE (-3.7 kcal/day), with a significant correlation coefficient (r = 0.67; p … prediction equations overestimated and underestimated measured REE. The WHO equation, developed in normal-weight individuals, provided the closest values. The two new equations (male and female) developed in our study had good accuracy. Copyright 2006 S. Karger AG, Basel.
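The two published equations can be applied directly. The abstract's gap-damaged text implies that the first equation is the male model, since the second is explicitly labelled female, and the sketch assumes this. The patient values are hypothetical.

```python
def ree_male(weight_kg, height_m, age_years):
    """REE (kcal/day) = 58.6 + 6.1*weight + 1023.7*height - 9.5*age (assumed male model)."""
    return 58.6 + 6.1 * weight_kg + 1023.7 * height_m - 9.5 * age_years


def ree_female(weight_kg, height_m, age_years):
    """REE (kcal/day) = 1272.5 + 9.8*weight - 61.6*height - 8.2*age (female model)."""
    return 1272.5 + 9.8 * weight_kg - 61.6 * height_m - 8.2 * age_years


# Hypothetical obese outpatients.
print(f"male, 100 kg, 1.75 m, 40 y: {ree_male(100, 1.75, 40):.1f} kcal/day")
print(f"female, 90 kg, 1.62 m, 40 y: {ree_female(90, 1.62, 40):.1f} kcal/day")
```

The equations were derived in a Spanish obese outpatient population, so applying them outside that setting would need separate validation.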
Logistic regression analysis to predict Medical Licensing Examination of Thailand (MLET) Step1 success or failure.

Science.gov (United States)

Wanvarie, Samkaew; Sathapatayavongs, Boonmee

2007-09-01

The aim of this paper was to assess factors that predict students' performance in the Medical Licensing Examination of Thailand (MLET) Step1 examination. The hypothesis was that demographic factors and academic records would predict the students' performance in the Step1 Licensing Examination. A logistic regression analysis relating demographic factors (age, sex and residence) and academic records [high school grade point average (GPA), National University Entrance Examination score and GPAs of the preclinical years] to the MLET Step1 outcome was performed using data from 117 third-year Ramathibodi medical students. Twenty-three (19.7%) students failed the MLET Step1 examination. Stepwise logistic regression analysis showed that the significant predictors of MLET Step1 success or failure were residence background and the GPAs of the second and third preclinical years. For each 1-point increase in second-year and third-year GPA, the odds of passing the MLET Step1 examination increased by factors of 16.3 and 12.8, respectively. The minimum GPAs needed to pass the examination, estimated from the equation, were 2.35 for students from urban backgrounds and 2.65 for those from rural backgrounds (on a 4.00 scale). Students from rural backgrounds and/or with low grade point averages in their second and third preclinical years of medical school are at risk of failing the MLET Step1 examination; they should be given intensive tutorials during the second and third preclinical years.
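An odds ratio of 16.3 per GPA point corresponds to a logistic coefficient β = ln(16.3). Raising the predictor by 1 multiplies the odds of passing by e^β. The following single-predictor sketch makes this explicit. The intercept is hypothetical, because the abstract does not report it. The sketch also assumes that "minimum GPA" means the GPA at which the predicted probability of passing is 0.5, which the abstract does not confirm. The real model also includes residence and the third-year GPA.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    return p / (1.0 - p)


beta_gpa = math.log(16.3)  # slope implied by the reported odds ratio
intercept = -8.0           # HYPOTHETICAL; not reported in the abstract


def p_pass(gpa):
    """Predicted probability of passing from a one-predictor logistic model."""
    return logistic(intercept + beta_gpa * gpa)


ratio = odds(p_pass(3.0)) / odds(p_pass(2.0))  # recovers the odds ratio
gpa_at_half = -intercept / beta_gpa            # GPA where P(pass) = 0.5
print(f"odds ratio per GPA point: {ratio:.1f}; GPA at P(pass)=0.5: {gpa_at_half:.2f}")
```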
Prediction of Concrete Mix Cost Using Modified Regression Theory ...

African Journals Online (AJOL)

The cost of concrete production, which largely depends on the cost of the constituent materials, affects the overall cost of construction. In this paper, a model based on modified regression theory is formulated to optimise concrete mix cost (in Naira). Using the model, one can predict the cost per cubic meter of concrete if the ...
Predicting logging residues: an interim equation for Appalachian oak sawtimber

Science.gov (United States)

A. Jeff Martin

1975-01-01

An equation, using dbh, dbh², bole length, and sawlog height to predict the cubic-foot volume of logging residue per tree, was developed from data collected on 36 mixed oaks in southwestern Virginia. The equation produced reliable results for small sawtimber trees, but additional research is needed for other species, sites, and utilization practices.
Genome-wide prediction of discrete traits using bayesian regressions and machine learning

Directory of Open Access Journals (Sweden)

Forni Selma

2011-02-01

Background: Genomic selection has gained much attention; its main goal is to increase predictive accuracy and genetic gain in livestock using dense marker information. Most methods dealing with the large p (number of covariates), small n (number of observations) problem have dealt only with continuous traits, but many important livestock traits are recorded in a discrete fashion (e.g. pregnancy outcome, disease resistance). It is therefore necessary to evaluate alternatives for analyzing discrete traits in a genome-wide prediction context. Methods: This study uses two threshold versions of Bayesian regressions (Bayes A and Bayesian LASSO) and two machine learning algorithms (boosting and random forest) to analyze discrete traits in a genome-wide prediction context. These methods were evaluated using simulated and field data to predict yet-to-be-observed records. Performances were compared based on the models' predictive ability. Results: The simulation showed that machine learning had some advantages over Bayesian regressions when a small number of QTL regulated the trait under pure additivity. However, differences were small and disappeared with a large number of QTL. Bayesian threshold LASSO and boosting achieved the highest accuracies, whereas Random Forest presented the highest classification performance. Random Forest was the most consistent method in detecting resistant and susceptible animals, with a phi correlation up to 81% greater than that of the Bayesian regressions. Random Forest outperformed the other methods in correctly classifying resistant and susceptible animals in the two pure swine lines evaluated. Boosting and Bayes A were more accurate with crossbred data. Conclusions: The results of this study suggest that the best method for genome-wide prediction may depend on the genetic basis of the population analyzed. All methods were less accurate at correctly classifying intermediate animals than extreme animals. Among the different
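The phi correlation used to compare classification of resistant and susceptible animals is the Pearson correlation between two binary variables. It can be computed from the 2×2 table of true and predicted classes. The following sketch uses hypothetical class labels.

```python
import math


def phi_coefficient(actual, predicted):
    """Phi correlation between two binary (0/1) sequences via the 2x2 table."""
    a = sum(1 for x, y in zip(actual, predicted) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(actual, predicted) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(actual, predicted) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(actual, predicted) if x == 0 and y == 0)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


# Hypothetical labels: 1 = resistant, 0 = susceptible.
true_class = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
print(round(phi_coefficient(true_class, predicted), 3))
```

Phi ranges from -1 to 1 and, unlike raw accuracy, is not inflated when one class dominates.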
Prediction of the chemical composition and in vitro dry matter ...

African Journals Online (AJOL)

ammoniated maize residue to replace maize meal in fattening diets for ... Optimal feeding is essential for economical animal production. ... linear regression. The wavelengths were incorporated into a prediction equation for each forage quality. The equations were validated by simple linear regression of the laboratory.
Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

Science.gov (United States)

Drzewiecki, Wojciech

2016-12-01

In this work, nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas, both for the accuracy of imperviousness coverage evaluation at individual points in time and for the accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with that of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results showed that, in the case of sub-pixel evaluation, the most accurate prediction of change is not necessarily based on the most accurate individual assessments. Among single methods, the results suggest that the Cubist algorithm may be advised for Landsat-based mapping of imperviousness at single dates. However, Random Forest may be preferred when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change because the errors of its individual predictions were more strongly correlated. For assessments at individual time points, heterogeneous model ensembles performed at least as well as the best individual models. For imperviousness change assessment, the ensembles always outperformed single-model approaches. This means that the accuracy of sub-pixel imperviousness change assessment can be improved by using ensembles of heterogeneous non-linear regression models.
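Why correlated errors help change detection follows from the variance of a difference: for per-date errors e1 and e2 with standard deviations σ1, σ2 and correlation ρ, Var(e2 − e1) = σ1² + σ2² − 2ρσ1σ2. A small Python illustration with hypothetical error levels (not values from the study):

```python
import math


def change_error_sd(sd_t1: float, sd_t2: float, rho: float) -> float:
    """SD of the error of a change estimate (date 2 minus date 1).

    Var(e2 - e1) = sd_t1**2 + sd_t2**2 - 2 * rho * sd_t1 * sd_t2
    """
    variance = sd_t1**2 + sd_t2**2 - 2.0 * rho * sd_t1 * sd_t2
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding


# Hypothetical per-date error SDs, in percentage points of imperviousness.
print(change_error_sd(8.0, 8.0, 0.1))    # more accurate per date, nearly independent errors
print(change_error_sd(10.0, 10.0, 0.8))  # less accurate per date, strongly correlated errors
```

The second hypothetical model has larger per-date errors but a smaller change error (about 6.3 versus 10.7), which is the pattern the abstract reports for Random Forest.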
Validity of Predictive Equations for Resting Energy Expenditure Developed for Obese Patients: Impact of Body Composition Method

Science.gov (United States)

Achamrah, Najate; Jésus, Pierre; Grigioni, Sébastien; Rimbert, Agnès; Petit, André; Déchelotte, Pierre; Folope, Vanessa; Coëffier, Moïse

2018-01-01

Predictive equations have been specifically developed for obese patients to estimate resting energy expenditure (REE). Body composition (BC) assessment is needed for some of these equations. We assessed the impact of BC methods on the accuracy of specific predictive equations developed in obese patients. REE was measured (mREE) by indirect calorimetry, and BC was assessed by bioelectrical impedance analysis (BIA) and dual-energy X-ray absorptiometry (DXA). Predicted REE values were compared with mREE, as were the percentages of accurate predictions (within ±10% of mREE). Predictive equations were studied in 2588 obese patients. Mean mREE was 1788 ± 6.3 kcal/24 h. Only the Müller (BIA) and Harris & Benedict (HB) equations provided REE with no difference from mREE. The Huang, Müller, Horie-Waitzberg, and HB formulas provided accurate predictions in more than 60% of cases. The use of BIA provided better predictions of REE than DXA for the Huang and Müller equations. Conversely, the Horie-Waitzberg and Lazzer formulas were more accurate using DXA. Accuracy decreased when the equations were applied to patients with BMI ≥ 40, except for the Horie-Waitzberg and Lazzer (DXA) formulas. The Müller equations based on BIA markedly improved REE prediction accuracy compared with equations not based on BC. The value of BC for improving the accuracy of REE predictive equations in obese patients should be confirmed. PMID:29320432
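The ±10% accuracy criterion can be sketched together with the Harris & Benedict equation, which needs no body-composition input. The coefficients below are the original 1919 Harris-Benedict values as commonly cited; the Müller, Huang, Horie-Waitzberg and Lazzer equations are not reproduced because their coefficients are not given in the abstract, and the patient values are hypothetical:

```python
def harris_benedict_ree(weight_kg, height_cm, age_years, male):
    """Original (1919) Harris-Benedict estimate of REE in kcal/24 h."""
    if male:
        return 66.473 + 13.7516 * weight_kg + 5.0033 * height_cm - 6.755 * age_years
    return 655.0955 + 9.5634 * weight_kg + 1.8496 * height_cm - 4.6756 * age_years


def percent_accurate(predicted, measured, tolerance=0.10):
    """Percentage of predictions within +/- tolerance (a fraction) of measured REE."""
    hits = sum(1 for p, m in zip(predicted, measured) if abs(p - m) <= tolerance * m)
    return 100.0 * hits / len(measured)


# Hypothetical patient: 100 kg, 175 cm, 40-year-old man, measured REE 1900 kcal/24 h.
pred = harris_benedict_ree(100, 175, 40, male=True)
print(round(pred), percent_accurate([pred], [1900.0]))
```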
Modeling and prediction of flotation performance using support vector regression

Directory of Open Access Journals (Sweden)

Despotović Vladimir

2017-01-01

Full Text Available Continuous efforts have been made in recent years to improve the process of paper recycling, as it is of critical importance for saving wood, water and energy resources. Flotation deinking is considered one of the key methods for separating ink particles from cellulose fibres. Attempts to model the flotation deinking process have often resulted in complex models that are difficult to implement and use. In this paper, a model for predicting flotation performance based on Support Vector Regression (SVR) is presented. Representative data samples were created in the laboratory under a variety of practical control variables for the flotation deinking process, including different reagents, pH values and flotation residence times. A predictive model was trained on these data samples, and the assessment of flotation performance showed that Support Vector Regression is a promising method even when the dataset used for training the model is limited.
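The abstract does not give the SVR's kernel, hyperparameters or solver. To show what ε-insensitive support vector regression actually minimizes, here is a minimal linear, single-feature sketch of the primal objective ½w² + C·Σ max(0, |y − (wx + b)| − ε), trained by subgradient descent; the data are hypothetical stand-ins (for example, residence time versus ink removal), and practical work would use a kernel SVR from a dedicated solver:

```python
def fit_linear_svr(xs, ys, epsilon=0.1, c=10.0, lr=0.001, epochs=4000):
    """Linear epsilon-SVR for one feature by full-batch subgradient descent.

    Minimises 0.5 * w**2 + c * sum(max(0, |y - (w*x + b)| - epsilon)) and
    returns the average of the iterates over the second half of training,
    which damps the oscillation typical of constant-step subgradient methods.
    """
    w, b = 0.0, 0.0
    sum_w = sum_b = 0.0
    n_avg = 0
    for epoch in range(epochs):
        grad_w, grad_b = w, 0.0  # gradient of the 0.5 * w**2 regulariser
        for x, y in zip(xs, ys):
            residual = y - (w * x + b)
            if residual > epsilon:        # point lies above the epsilon-tube
                grad_w -= c * x
                grad_b -= c
            elif residual < -epsilon:     # point lies below the epsilon-tube
                grad_w += c * x
                grad_b += c
            # points inside the tube contribute neither loss nor gradient
        w -= lr * grad_w
        b -= lr * grad_b
        if epoch >= epochs // 2:
            sum_w += w
            sum_b += b
            n_avg += 1
    return sum_w / n_avg, sum_b / n_avg


# Hypothetical, noise-free training points on y = 2x + 1.
w, b = fit_linear_svr([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0])
print(round(w, 2), round(b, 2))
```

The fitted slope comes out slightly below 2: the regulariser prefers the flattest line that still keeps every point within ±ε, which is the property that makes SVR tolerant of small training sets.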
Saturated properties prediction in critical region by a quartic equation of state

Directory of Open Access Journals (Sweden)

Yong Wang

2011-08-01

Full Text Available A diverse substance library containing extensive PVT data for 77 pure components was used to critically evaluate the performance of a quartic equation of state and four other well-known cubic equations of state in the critical region. The quartic EOS studied in this work was found to be significantly superior to the others in both vapor pressure prediction and saturated volume prediction in the vicinity of the critical point.
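The abstract names neither the quartic EOS nor the four cubic EOS it was compared with. To show what evaluating a cubic EOS involves (solving for the compressibility factor Z at given T and P), here is a sketch using Peng-Robinson as a representative cubic EOS; its inclusion among the four compared equations is an assumption, and the CO₂ critical constants are approximate literature values:

```python
import math

import numpy as np

R = 8.314462618  # molar gas constant, J/(mol K)


def peng_robinson_z_roots(T, P, Tc, Pc, omega):
    """Physically meaningful real roots Z of the Peng-Robinson cubic EOS.

    T, Tc in K; P, Pc in Pa; omega is the acentric factor.
    The largest root is vapour-like, the smallest liquid-like.
    """
    kappa = 0.37464 + 1.54226 * omega - 0.26992 * omega**2
    alpha = (1.0 + kappa * (1.0 - math.sqrt(T / Tc))) ** 2
    a = 0.45724 * R**2 * Tc**2 / Pc * alpha
    b = 0.07780 * R * Tc / Pc
    A = a * P / (R * T) ** 2
    B = b * P / (R * T)
    # Z^3 - (1 - B) Z^2 + (A - 3B^2 - 2B) Z - (AB - B^2 - B^3) = 0
    coeffs = [1.0, -(1.0 - B), A - 3.0 * B**2 - 2.0 * B, -(A * B - B**2 - B**3)]
    roots = np.roots(coeffs)
    return sorted(r.real for r in roots if abs(r.imag) < 1e-9 and r.real > B)


# CO2 at 300 K and 1 bar (approximate Tc, Pc and acentric factor).
print(peng_robinson_z_roots(300.0, 1.0e5, 304.13, 7.3773e6, 0.2239))
```

Near the critical point the liquid-like and vapour-like roots approach each other, which is where the abstract reports the quartic EOS doing better.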
Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction.

Science.gov (United States)

He, Dan; Kuhn, David; Parida, Laxmi

2016-06-15

Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually formulated as a linear regression problem. In many cases, multiple traits are observed for the same set of samples and markers. Some of these traits might be correlated with each other, so modeling all the traits together may improve prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to reduce the least-squares error of the predictions from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. The programs we used are either public or obtained directly from the referenced authors, such as the MALSAR (http://www.public.asu.edu/~jye02/Software/MALSAR/) package. The Avocado data set has not been published yet and is available upon request. dhe@us.ibm.com. © The Author 2016. Published by Oxford University Press.
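For the case in which all traits share one genotype matrix, the simplest joint linear baseline is multi-output ridge regression, which solves for every trait's marker effects in a single linear system. A minimal sketch with hypothetical genotypes and traits; note that this baseline does not by itself share information between traits, which is what the multitask regularizers in packages such as MALSAR add:

```python
import numpy as np


def multi_output_ridge(X: np.ndarray, Y: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Ridge estimates for several traits sharing one genotype matrix.

    X: (n_samples, n_markers) genotypes coded 0/1/2
    Y: (n_samples, n_traits) trait values
    Returns B of shape (n_markers, n_traits) solving (X'X + lam*I) B = X'Y.
    """
    n_markers = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_markers), X.T @ Y)


rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(50, 5)).astype(float)  # hypothetical SNP genotypes
B_true = rng.normal(size=(5, 2))                     # two hypothetical traits
Y = X @ B_true + 0.1 * rng.normal(size=(50, 2))
B_hat = multi_output_ridge(X, Y, lam=0.5)
print(B_hat.shape)
```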
Empirical models based on the universal soil loss equation fail to predict sediment discharges from Chesapeake Bay catchments.

Science.gov (United States)

Boomer, Kathleen B; Weller, Donald E; Jordan, Thomas E

2008-01-01

The Universal Soil Loss Equation (USLE) and its derivatives are widely used for identifying watersheds with a high potential for degrading stream water quality. We compared sediment yields estimated from regional application of the USLE, the automated revised RUSLE2, and five sediment delivery ratio algorithms to measured annual average sediment delivery in 78 catchments of the Chesapeake Bay watershed. We did the same comparisons for another 23 catchments monitored by the USGS. Predictions exceeded observed sediment yields by more than 100% and were highly correlated with USLE erosion predictions (Pearson r range, 0.73-0.92; p USLE estimates (r = 0.87; p USLE model did not change the results. In ranked comparisons between observed and predicted sediment yields, the models failed to identify catchments with higher yields (r range, −0.28 to 0.00; p > 0.14). In a multiple regression analysis, soil erodibility, log(stream flow), basin shape (topographic relief ratio), the square-root transformed proportion of forest, and occurrence in the Appalachian Plateau province explained 55% of the observed variance in measured suspended sediment loads, but the model performed poorly (r² = 0.06) at predicting loads in the 23 USGS watersheds not used in fitting the model. The use of USLE or multiple regression models to predict sediment yields is not advisable despite their present widespread application. Integrated watershed models based on the USLE may also be unsuitable for making management decisions.
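The USLE-based predictions evaluated above combine a gross erosion estimate with a sediment delivery ratio (SDR). The sketch shows only that structure, A = R·K·LS·C·P and sediment yield = SDR × A × area; the factor values are hypothetical, the five SDR algorithms tested in the study are not reproduced, and the study found that predictions of this kind overestimated observed yields:

```python
def usle_soil_loss(r, k, ls, c, p):
    """Average annual soil loss per unit area, A = R * K * LS * C * P.

    Units follow those chosen for R and K (e.g. t/ha/yr with SI-metric R and K).
    """
    return r * k * ls * c * p


def sediment_yield(soil_loss_per_area, area, sdr):
    """Sediment delivered to the outlet = SDR * gross erosion over the catchment."""
    return sdr * soil_loss_per_area * area


# Hypothetical factors: R, K, LS, C, P; catchment of 50 ha with SDR = 0.2.
a = usle_soil_loss(100.0, 0.3, 1.5, 0.2, 1.0)
print(a, sediment_yield(a, 50.0, 0.2))
```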
Stochastic Ocean Predictions with Dynamically-Orthogonal Primitive Equations

Science.gov (United States)

Subramani, D. N.; Haley, P., Jr.; Lermusiaux, P. F. J.

2017-12-01

The coastal ocean is a prime example of multiscale nonlinear fluid dynamics. Ocean fields in such regions are complex and intermittent, with nonstationary heterogeneous statistics. Due to limited measurements, there are multiple sources of uncertainty, including the initial conditions, boundary conditions, forcing, parameters, and even the model parameterizations and equations themselves. For efficient and rigorous quantification and prediction of these uncertainties, the stochastic Dynamically Orthogonal (DO) PDEs for a primitive-equation ocean modeling system with a nonlinear free surface are derived, and numerical schemes for their space-time integration are obtained. Detailed numerical studies with idealized-to-realistic regional ocean dynamics are completed. These include consistency checks for the numerical schemes and comparisons with ensemble realizations. As an illustrative example, we simulate the 4-D multiscale uncertainty in the Middle Atlantic/New York Bight region from January to March 2017. To provide initial conditions for the uncertainty subspace, uncertainties in the region were objectively analyzed using historical data. The DO primitive equations were subsequently integrated in space and time. The probability distribution function (pdf) of the ocean fields is compared to in situ, remote sensing, and opportunity data collected during the coincident POSYDON experiment. Results show that our probabilistic predictions had skill and are three to four orders of magnitude faster than classic ensemble schemes.
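The DO approach represents a stochastic field as a mean plus a small set of orthonormal modes weighted by stochastic coefficients, u ≈ ū + Σᵢ Yᵢ uᵢ. As a sketch of that representation at a single time only, the snippet below builds mean, modes and coefficients from an ensemble via the SVD, which is one way an initial uncertainty subspace might be formed; it does not integrate the DO equations, and the paper's own initialization (objective analysis of historical data) is not reproduced:

```python
import numpy as np


def mean_modes_coefficients(ensemble, n_modes):
    """Represent an ensemble as mean + sum_i Y_i * mode_i.

    ensemble: (n_members, n_points) realizations of one field at one time.
    Returns the mean field, n_modes orthonormal spatial modes (rows), and the
    (n_members, n_modes) stochastic coefficients Y of each member.
    """
    mean = ensemble.mean(axis=0)
    anomalies = ensemble - mean
    # Rows of vt are orthonormal directions ordered by explained variance.
    _, _, vt = np.linalg.svd(anomalies, full_matrices=False)
    modes = vt[:n_modes]
    coeffs = anomalies @ modes.T
    return mean, modes, coeffs


rng = np.random.default_rng(1)
ensemble = rng.normal(size=(20, 50))  # hypothetical: 20 realizations, 50 grid points
mean, modes, coeffs = mean_modes_coefficients(ensemble, n_modes=5)
print(modes.shape, coeffs.shape)
```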
Predicting Antitumor Activity of Peptides by Consensus of Regression Models Trained on a Small Data Sample

Directory of Open Access Journals (Sweden)

Ivanka Jerić

2011-11-01

Full Text Available Predicting the antitumor activity of compounds using regression models trained on a small number of compounds with measured biological activity is an ill-posed inverse problem. Yet it occurs very often within the academic community. To counteract, to some extent, the overfitting caused by a small training set, we propose using a consensus of six regression models to predict the biological activity of a virtual library of compounds. The QSAR descriptors of 22 compounds related to the opioid growth factor (OGF, Tyr-Gly-Gly-Phe-Met) with known antitumor activity were used to train the regression models: the feed-forward artificial neural network, k-nearest neighbors, sparseness-constrained linear regression, and the linear and nonlinear (with polynomial and Gaussian kernels) support vector machines. The regression models were applied to a virtual library of 429 compounds, resulting in six lists of candidate compounds ranked by predicted antitumor activity. The highly ranked candidate compounds were synthesized, characterized and tested for antiproliferative activity. Some of the prepared peptides showed more pronounced activity than the native OGF; however, they were less active than highly ranked compounds selected previously by the radial basis function support vector machine (RBF SVM) regression model. The ill-posedness of the related inverse problem causes unstable behavior of trained regression models on test data. These results point to the high complexity of prediction based on regression models trained on a small data sample.
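The abstract states that the six models produced six ranked lists but does not say how they were merged into a consensus. A minimal sketch assuming mean rank as the consensus rule; the model names and predicted activities are hypothetical:

```python
def consensus_rank(predictions):
    """Average rank of each compound across models (rank 1 = highest predicted activity).

    predictions: dict mapping model name -> list of predicted activities,
    one per compound, in the same compound order for every model.
    Returns (compound indices sorted by average rank, list of average ranks).
    """
    n = len(next(iter(predictions.values())))
    total_rank = [0.0] * n
    for scores in predictions.values():
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for rank, idx in enumerate(order, start=1):
            total_rank[idx] += rank
    avg = [t / len(predictions) for t in total_rank]
    return sorted(range(n), key=lambda i: avg[i]), avg


# Hypothetical predicted activities of four virtual compounds from three models.
preds = {
    "ann": [0.8, 0.3, 0.6, 0.1],
    "knn": [0.7, 0.4, 0.5, 0.2],
    "svm_rbf": [0.4, 0.9, 0.6, 0.1],
}
order, avg = consensus_rank(preds)
print(order, avg)
```

Averaging ranks rather than raw predictions keeps a single model with an unusual output scale from dominating the consensus.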
Establishing a Mathematical Equations and Improving the Production of L-tert-Leucine by Uniform Design and Regression Analysis.

Science.gov (United States)

Jiang, Wei; Xu, Chao-Zhen; Jiang, Si-Zhi; Zhang, Tang-Duo; Wang, Shi-Zhen; Fang, Bai-Shan

2017-04-01

L-tert-Leucine (L-Tle) and its derivatives are extensively used as crucial building blocks for chiral auxiliaries, pharmaceutically active ingredients, and ligands. Combined with formate dehydrogenase (FDH) for regenerating the expensive coenzyme NADH, leucine dehydrogenase (LeuDH) is continually used for synthesizing L-Tle from α-keto acid. A multilevel factorial experimental design was used to study this system. In this work, an efficient optimization method for improving the productivity of L-Tle was developed. The mathematical model relating different fermentation conditions to L-Tle yield was also determined in the form of an equation by using uniform design and regression analysis. The multivariate regression equation was conveniently implemented in water, with a space-time yield of 505.9 g L⁻¹ day⁻¹ and an enantiomeric excess value of >99 %. These results demonstrated that this method might become an ideal protocol for industrial production of chiral compounds and unnatural amino acids such as chiral drug intermediates.
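The abstract does not give the form of the multivariate regression equation. A full second-order polynomial is a common choice for fitting uniform-design data and is assumed here; the two factors and their response are hypothetical:

```python
import numpy as np


def fit_quadratic_response(X, y):
    """Least-squares fit of a full second-order model.

    y = b0 + sum_i b_i x_i + sum_i b_ii x_i^2 + sum_{i<j} b_ij x_i x_j
    Returns the coefficients in that order.
    """
    n, k = X.shape
    columns = [np.ones(n)]
    columns += [X[:, i] for i in range(k)]
    columns += [X[:, i] ** 2 for i in range(k)]
    columns += [X[:, i] * X[:, j] for i in range(k) for j in range(i + 1, k)]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical design: 12 runs over two factors scaled to 0-10.
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(12, 2))
y = 1 + 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] ** 2 + 0.3 * X[:, 0] * X[:, 1]
print(np.round(fit_quadratic_response(X, y), 3))
```

With k factors this model has 1 + 2k + k(k−1)/2 coefficients, so the number of design runs must be at least that large for the fit to be determined.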
Blood glucose level prediction based on support vector regression using mobile platforms.

Science.gov (United States)

Reymann, Maximilian P; Dorschky, Eva; Groh, Benjamin H; Martindale, Christine; Blank, Peter; Eskofier, Bjoern M

2016-08-01

The correct treatment of diabetes is vital to a patient's health: staying within defined blood glucose levels prevents dangerous short- and long-term effects on the body. Mobile devices informing patients about their future blood glucose levels could enable them to take countermeasures to prevent hypoglycemic or hyperglycemic periods. Previous work addressed this challenge by predicting blood glucose levels using regression models. However, these approaches either required a physiological model, representing the human body's response to insulin and glucose intake, or were not directly applicable to mobile platforms (smartphones, tablets). In this paper, we propose an algorithm for mobile platforms to predict blood glucose levels without the need for a physiological model. Using an online software simulator program, we trained a Support Vector Regression (SVR) model and exported the parameter settings to our mobile platform. The prediction accuracy of our mobile platform was evaluated with pre-recorded data of a type 1 diabetes patient. The blood glucose level was predicted with an error of 19 % compared to the true value. Considering that commercially used devices are permitted an error of 15 %, our algorithm is a basis for further development of mobile prediction algorithms.
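Exporting a trained SVR's parameters to a phone works because prediction needs only the stored support vectors, their dual coefficients, the intercept and the kernel parameter. A sketch assuming an RBF kernel (the abstract does not name the kernel or the input features). The 19 % error measure is also unspecified, so the mean absolute relative difference (MARD) is shown as an assumed example:

```python
import math


def rbf_svr_predict(x, support_vectors, dual_coefs, intercept, gamma):
    """Evaluate an exported RBF-kernel SVR.

    f(x) = sum_i a_i * exp(-gamma * ||x - s_i||^2) + b
    Only these stored quantities are needed on the device; no training code is required.
    """
    total = intercept
    for sv, coef in zip(support_vectors, dual_coefs):
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, sv))
        total += coef * math.exp(-gamma * sq_dist)
    return total


def mard_percent(predicted, reference):
    """Mean absolute relative difference (%) between predicted and reference glucose."""
    return 100.0 * sum(abs(p - r) / r for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical exported model with two support vectors over two input features.
svs = [[120.0, 5.0], [150.0, 0.0]]
print(rbf_svr_predict([130.0, 2.0], svs, [0.8, -0.3], 110.0, 1e-3))
print(mard_percent([110.0, 90.0], [100.0, 100.0]))
```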
Phase Space Prediction of Chaotic Time Series with Nu-Support Vector Machine Regression

International Nuclear Information System (INIS)

Ye Meiying; Wang Xiaodong

2005-01-01

A new class of support vector machine, the nu-support vector machine, which can handle both classification and regression, is discussed. We focus on nu-support vector machine regression and use it for phase space prediction of chaotic time series. The effectiveness of the method is demonstrated by applying it to the Henon map. This study also compares the nu-support vector machine with back propagation (BP) networks in order to better evaluate the performance of the proposed method. The experimental results show that nu-support vector machine regression obtains a lower root mean squared error than the BP networks and provides accurate chaotic time series prediction. These results can be attributed to the fact that the nu-support vector machine implements the structural risk minimization principle, which leads to better generalization than the BP networks.
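Phase space prediction starts by turning a scalar series into (delay vector, next value) training pairs. A sketch using the Henon map with its standard chaotic parameters a = 1.4, b = 0.3; the abstract gives neither the map parameters nor the embedding dimension and delay, so these are assumptions:

```python
def henon_series(n, a=1.4, b=0.3, x0=0.0, y0=0.0):
    """x-component of the Henon map x' = 1 - a*x**2 + y, y' = b*x."""
    xs = []
    x, y = x0, y0
    for _ in range(n):
        x, y = 1.0 - a * x * x + y, b * x
        xs.append(x)
    return xs


def delay_embed(series, dim, tau=1):
    """Phase-space reconstruction: map [s(t-(dim-1)*tau), ..., s(t)] to s(t+1)."""
    inputs, targets = [], []
    for t in range((dim - 1) * tau, len(series) - 1):
        inputs.append([series[t - k * tau] for k in range(dim - 1, -1, -1)])
        targets.append(series[t + 1])
    return inputs, targets


series = henon_series(1000)[100:]  # drop an initial transient
X, y = delay_embed(series, dim=2)
print(len(X), X[0], y[0])
```

For the Henon map two delays suffice, because x(n+1) = 1 − a·x(n)² + b·x(n−1); the resulting pairs are what a nu-SVR (or a BP network) would be trained on.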

Regression methodology in groundwater composition estimation with composition predictions for Romuvaara borehole KR10

Energy Technology Data Exchange (ETDEWEB)

Luukkonen, A.; Korkealaakso, J.; Pitkaenen, P. [VTT Communities and Infrastructure, Espoo (Finland)]

1997-11-01

Teollisuuden Voima Oy selected five investigation areas for preliminary site studies (1987-1992). The more detailed site investigation project, launched at the beginning of 1993 and presently supervised by Posiva Oy, concentrates on three investigation areas. Romuvaara at Kuhmo is one of the present target areas, and the geochemical, structural and hydrological data used in this study are extracted from there. The aim of the study is to develop suitable methods for estimating groundwater composition from a group of known hydrogeological variables. The input variables used are related to the host type of groundwater, hydrological conditions around the host location, mixing potentials between different types of groundwater, and minerals equilibrated with the groundwater. The output variables are electrical conductivity; Ca, Mg, Mn, Na, K, Fe, Cl, S, HS, SO₄, alkalinity, ³H, ¹⁴C, ¹³C, Al, Sr, F, Br and I concentrations; and pH of the groundwater. The methodology is to associate the known hydrogeological conditions (i.e. input variables) with the known water compositions (output variables), and to evaluate mathematical relations between these groups. Output estimations are done with two separate procedures: partial least squares regressions on the principal components of input variables, and training neural networks with input-output pairs. Coefficients of linear equations and trained networks are alternative means for making actual predictions. The quality of output predictions is monitored with confidence limit estimations, evaluated from input variable covariances and output variances, and with charge balance calculations. Groundwater compositions in Romuvaara borehole KR10 are predicted at 10 metre intervals with both prediction methods. 46 refs.
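The charge balance check mentioned above compares total cation and anion equivalents. A minimal sketch of the common formula, assuming concentrations in mg/L, a small set of major ions, and alkalinity expressed as HCO₃⁻; the report's exact ion list and formula are not given in the abstract, and the sample analysis is hypothetical:

```python
# Molar mass (g/mol) and absolute charge for a small set of major ions.
IONS = {
    "Ca": (40.078, 2), "Mg": (24.305, 2), "Na": (22.990, 1), "K": (39.098, 1),
    "Cl": (35.453, 1), "SO4": (96.06, 2), "HCO3": (61.016, 1),
}
CATIONS = {"Ca", "Mg", "Na", "K"}


def charge_balance_error(conc_mg_per_l):
    """Charge balance error in percent from concentrations in mg/L.

    CBE = (sum cations - sum anions) / (sum cations + sum anions) * 100,
    with each ion converted to meq/L as mg/L / molar mass * |charge|.
    """
    cations = anions = 0.0
    for ion, mg_per_l in conc_mg_per_l.items():
        molar_mass, charge = IONS[ion]
        meq_per_l = mg_per_l / molar_mass * charge
        if ion in CATIONS:
            cations += meq_per_l
        else:
            anions += meq_per_l
    return 100.0 * (cations - anions) / (cations + anions)


# Hypothetical groundwater analysis (mg/L).
sample = {"Ca": 20.0, "Mg": 5.0, "Na": 30.0, "K": 2.0, "Cl": 40.0, "SO4": 15.0, "HCO3": 90.0}
print(round(charge_balance_error(sample), 2))
```

A predicted composition whose charge balance error is far from zero signals that the estimated concentrations are not mutually consistent.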
Regression Levels of Selected Affective Factors on Science Achievement: A Structural Equation Model with TIMSS 2011 Data

Science.gov (United States)

Akilli, Mustafa

2015-01-01

The aim of this study is to demonstrate the regression levels of selected affective features on the science achievement of 8th-grade students using a structural equation model. The study was conducted by analyzing students' questionnaires and science achievement in the TIMSS 2011 data using SEM. Initially, the factors that are thought to have an effect on science…
A prediction model for spontaneous regression of cervical intraepithelial neoplasia grade 2, based on simple clinical parameters.

Science.gov (United States)

Koeneman, Margot M; van Lint, Freyja H M; van Kuijk, Sander M J; Smits, Luc J M; Kooreman, Loes F S; Kruitwagen, Roy F P M; Kruse, Arnold J

2017-01-01

This study aims to develop a prediction model for spontaneous regression of cervical intraepithelial neoplasia grade 2 (CIN 2) lesions based on simple clinicopathological parameters. The study was conducted at Maastricht University Medical Center, the Netherlands. The prediction model was developed in a retrospective cohort of 129 women with a histologic diagnosis of CIN 2 who were managed by watchful waiting for 6 to 24 months. Five potential predictors for spontaneous regression were selected based on the literature and expert opinion and were analyzed in a multivariable logistic regression model, followed by backward stepwise deletion based on the Wald test. The prediction model was internally validated by the bootstrapping method. Discriminative capacity and accuracy were tested by assessing the area under the receiver operating characteristic curve (AUC) and a calibration plot. Disease regression within 24 months was seen in 91 (71%) of 129 patients. A prediction model was developed including the following variables: smoking, Papanicolaou test outcome before the CIN 2 diagnosis, concomitant CIN 1 diagnosis in the same biopsy, and more than 1 biopsy containing CIN 2. Not smoking, Papanicolaou class predictive of disease regression. The AUC was 69.2% (95% confidence interval, 58.5%-79.9%), indicating a moderate discriminative ability of the model. The calibration plot indicated good calibration of the predicted probabilities. This prediction model for spontaneous regression of CIN 2 may aid physicians in the personalized management of these lesions. Copyright © 2016 Elsevier Inc. All rights reserved.
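The AUC reported above equals the probability that a randomly chosen patient whose lesion regressed receives a higher predicted probability than a randomly chosen patient whose lesion did not (ties count one half). A minimal sketch on hypothetical predicted probabilities; the model's coefficients are not given in the abstract, and the bootstrap internal validation is not shown:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5).

    scores: predicted probabilities of regression; labels: 1 = regressed, 0 = not.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for six patients.
print(auc([0.85, 0.70, 0.60, 0.55, 0.40, 0.30], [1, 1, 0, 1, 0, 0]))
```

An AUC of 0.5 means no discrimination and 1.0 perfect discrimination, so the reported 69.2% sits in the moderate range the authors describe.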
Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

Science.gov (United States)

Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V

2012-01-01

In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
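Restricted cubic splines, which let the conventional logistic models above capture non-linear effects, expand a continuous covariate into a few basis columns that are cubic between knots and linear beyond the outer knots. A sketch of the unscaled basis in Harrell's parameterization; the knot locations are hypothetical, since the abstract does not say which covariates were splined or where the knots were placed:

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's parameterization, unscaled).

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k knots; any linear combination
    of these columns is linear below the first knot and beyond the last knot.
    """
    def pos3(v):
        return max(v, 0.0) ** 3

    t_km1, t_k = knots[-2], knots[-1]
    basis = [x]
    for t_j in knots[:-2]:
        basis.append(
            pos3(x - t_j)
            - pos3(x - t_km1) * (t_k - t_j) / (t_k - t_km1)
            + pos3(x - t_k) * (t_km1 - t_j) / (t_k - t_km1)
        )
    return basis


# Hypothetical knots for a covariate such as age (years).
knots = [45.0, 60.0, 72.0, 85.0]
print([round(v, 1) for v in rcs_basis(70.0, knots)])
```

The basis columns would then be entered into an ordinary logistic regression in place of the raw covariate.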
A Method of Calculating Functional Independence Measure at Discharge from Functional Independence Measure Effectiveness Predicted by Multiple Regression Analysis Has a High Degree of Predictive Accuracy.

Science.gov (United States)

Tokunaga, Makoto; Watanabe, Susumu; Sonoda, Shigeru

2017-09-01

Multiple linear regression analysis is often used to predict the outcome of stroke rehabilitation. However, its predictive accuracy may not be satisfactory. The objective of this study was to elucidate the predictive accuracy of a method of calculating motor Functional Independence Measure (mFIM) at discharge from mFIM effectiveness predicted by multiple regression analysis. The subjects were 505 patients with stroke who were hospitalized in a convalescent rehabilitation hospital. The formula "mFIM at discharge = mFIM effectiveness × (91 points - mFIM at admission) + mFIM at admission" was used. By entering the mFIM effectiveness predicted through multiple regression analysis into this formula, we obtained the predicted mFIM at discharge (A). We also used multiple regression analysis to predict mFIM at discharge directly (B). The correlation between the predicted and measured values of mFIM at discharge was compared between A and B. The correlation coefficients were .916 for A and .878 for B. Calculating mFIM at discharge from mFIM effectiveness predicted by multiple regression analysis gave higher predictive accuracy for mFIM at discharge than predicting it directly. Copyright © 2017 National Stroke Association. Published by Elsevier Inc. All rights reserved.
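The formula quoted in the abstract, and its rearrangement that defines mFIM effectiveness as the fraction of the possible gain actually achieved, can be written directly. The regression model that predicts effectiveness is not reproduced, and the example values are hypothetical:

```python
MAX_MOTOR_FIM = 91  # ceiling of the motor FIM score used in the formula


def mfim_effectiveness(admission, discharge):
    """Fraction of the possible motor FIM gain achieved during the stay."""
    return (discharge - admission) / (MAX_MOTOR_FIM - admission)


def mfim_at_discharge(predicted_effectiveness, admission):
    """mFIM at discharge = effectiveness * (91 - mFIM at admission) + mFIM at admission."""
    return predicted_effectiveness * (MAX_MOTOR_FIM - admission) + admission


# Hypothetical patient admitted with mFIM 40 and a predicted effectiveness of 0.5.
print(mfim_at_discharge(0.5, 40))
```

Because the formula can never exceed 91 when effectiveness lies between 0 and 1, this route respects the ceiling of the scale, which a direct linear prediction of the discharge score does not guarantee.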
An adaptive two-stage analog/regression model for probabilistic prediction of small-scale precipitation in France

Science.gov (United States)

Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine

2018-01-01

Statistical downscaling models (SDMs) are often used to produce local weather scenarios from large-scale atmospheric information. SDMs include transfer functions based on a statistical link, identified from observations, between local weather and a set of large-scale predictors. As the physical processes driving surface weather vary in time, the most relevant predictors and the regression link are likely to vary in time too. This is well known for precipitation, for instance, and the link is therefore often estimated after some seasonal stratification of the data. In this study, we present a two-stage analog/regression model in which the regression link is estimated from atmospheric analogs of the current prediction day. Atmospheric analogs are identified from fields of geopotential heights at 1000 and 500 hPa. For the regression stage, two generalized linear models are used to model the probability of precipitation occurrence and the distribution of non-zero precipitation amounts, respectively. The two-stage model is evaluated for the probabilistic prediction of small-scale precipitation over France. It noticeably improves the skill of the prediction for both precipitation occurrence and amount. As the analog days vary from one prediction day to another, the atmospheric predictors selected in the regression stage and the values of the corresponding regression coefficients can also vary from day to day. The model thus allows for day-to-day adaptive and tailored downscaling. It can also reveal specific predictors for peculiar and infrequent weather configurations.
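The first stage, picking atmospheric analogs of the prediction day, can be sketched as a nearest-neighbour search over archived predictor fields. The distance measure (RMSE here), the number of analogs and the field preprocessing are assumptions, since the abstract does not state them; the occurrence and amount GLMs would then be fitted only on the returned analog days (not shown):

```python
import numpy as np


def find_analogs(archive, target, n_analogs):
    """Indices of the archive days whose predictor fields are closest to the target day.

    archive: (n_days, n_features) array, e.g. flattened geopotential-height
    fields at 1000 and 500 hPa; target: (n_features,) array for the prediction day.
    """
    distances = np.sqrt(((archive - target) ** 2).mean(axis=1))  # RMSE distance
    return np.argsort(distances)[:n_analogs]


# Hypothetical archive of 365 days with 200 grid values per day.
rng = np.random.default_rng(4)
archive = rng.normal(size=(365, 200))
today = archive[10] + 0.05 * rng.normal(size=200)
print(find_analogs(archive, today, 5))
```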
REGRES: A FORTRAN-77 program to calculate nonparametric and ``structural'' parametric solutions to bivariate regression equations

Science.gov (United States)

Rock, N. M. S.; Duffy, T. R.

REGRES allows a range of regression equations to be calculated for paired sets of data values in which both variables are subject to error (i.e. neither is the "independent" variable). Nonparametric regressions, based on medians of all possible pairwise slopes and intercepts, are treated in detail. Estimated slopes and intercepts are output, along with confidence limits and Spearman and Kendall rank correlation coefficients. Outliers can be rejected with user-determined stringency. Parametric regressions can be calculated for any value of λ (the ratio of the variances of the random errors for y and x), including: (1) major axis (λ = 1); (2) reduced major axis (λ = variance of y/variance of x); (3) Y on X (λ = infinity); or (4) X on Y (λ = 0) solutions. Pearson linear correlation coefficients are also output. REGRES provides an alternative to conventional isochron assessment techniques where bivariate normal errors cannot be assumed or where weighting methods are inappropriate.
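Both families of estimators described above can be sketched briefly. This is not REGRES itself (which is FORTRAN-77): it shows the pairwise-median line and the standard closed-form errors-in-variables slope parameterised by λ, which reproduces the four special cases listed. Confidence limits and outlier rejection are omitted, and the data are hypothetical:

```python
import math
from itertools import combinations
from statistics import median


def pairwise_median_line(xs, ys):
    """Nonparametric fit: medians of all pairwise slopes and pairwise intercepts."""
    slopes, intercepts = [], []
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical pair has no finite slope
        s = (y2 - y1) / (x2 - x1)
        slopes.append(s)
        intercepts.append(y1 - s * x1)
    return median(slopes), median(intercepts)


def structural_slope(xs, ys, lam):
    """Slope when both variables carry error; lam = var(y error) / var(x error).

    lam = 1 gives the major axis, lam = var(y)/var(x) the reduced major axis,
    lam = inf ordinary Y on X, and lam = 0 X on Y (expressed as dy/dx).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if math.isinf(lam):
        return sxy / sxx
    d = syy - lam * sxx
    return (d + math.sqrt(d * d + 4.0 * lam * sxy * sxy)) / (2.0 * sxy)


# Hypothetical data: a straight line with one gross outlier.
print(pairwise_median_line([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 100.0]))
print(structural_slope([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 9.0], 1.0))
```

The pairwise-median line ignores the outlier entirely, while the λ family spans the range from the Y-on-X slope to the X-on-Y slope as λ goes from infinity to zero.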
Degradation kinetics and assessment of the prediction equation of indigestible fraction of neutral detergent fiber from agroindustrial byproducts

Directory of Open Access Journals (Sweden)

José Gilson Louzada Regadas Filho

2011-09-01

Full Text Available This study aimed at estimating the kinetic parameters of ruminal degradation of neutral detergent fiber from agroindustrial byproducts of cashew (pulp and cashew nut), passion fruit, melon, pineapple, West Indian cherry, grape, annatto and coconut through the gravimetric nylon bag technique, and at evaluating the prediction equation for the indigestible fraction of neutral detergent fiber suggested by the Cornell Net Carbohydrate and Protein System (CNCPS). Samples of feed crushed to 2 mm were placed in 7 × 14 cm nylon bags with a porosity of 50 µm at a ratio of 20 g DM/cm² and incubated in duplicate in the rumen of a heifer at 0, 3, 6, 9, 12, 16, 24, 36, 48, 72, 96 and 144 hours. The incubation residues were analyzed for NDF content and evaluated with a non-linear logistic model. The prediction of the indigestible fraction of NDF was evaluated by fitting linear regression models between predicted and observed values. There was a wide variation in the degradation parameters of NDF among byproducts. The degradation rate of NDF ranged from 0.0267 h⁻¹ to 0.0971 h⁻¹ for grape and West Indian cherry, respectively. The potentially digestible fraction of NDF ranged from 4.17 to 90.67% for melon and coconut byproducts, respectively. The CNCPS equation was sensitive in predicting the indigestible fraction of neutral detergent fiber of the byproducts. However, due to the high mean squared error of prediction, such estimates are highly variable; hence estimation by biological methods would be the most suitable.
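The two evaluation tools named above, the mean squared error of prediction and a linear regression between predicted and observed values, can be sketched as follows. Which variable was regressed on which is not stated in the abstract; regressing observed on predicted is assumed, and the values are hypothetical:

```python
def msep(observed, predicted):
    """Mean squared error of prediction."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def regress_observed_on_predicted(observed, predicted):
    """OLS of observed on predicted; intercept 0 and slope 1 indicate an unbiased equation.

    Returns (intercept, slope).
    """
    n = len(observed)
    mp, mo = sum(predicted) / n, sum(observed) / n
    spp = sum((p - mp) ** 2 for p in predicted)
    spo = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    slope = spo / spp
    return mo - slope * mp, slope


# Hypothetical indigestible NDF values (% of NDF): observed in situ vs predicted by an equation.
obs = [12.0, 30.5, 45.0, 60.2, 22.1]
pred = [15.3, 25.0, 50.1, 52.0, 28.4]
print(round(msep(obs, pred), 2), regress_observed_on_predicted(obs, pred))
```

A slope close to 1 can coexist with a large MSEP, which is the situation the abstract describes: the equation tracks the byproducts' ranking but individual estimates scatter widely.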
Regression Model to Predict Global Solar Irradiance in Malaysia

Directory of Open Access Journals (Sweden)

Hairuniza Ahmed Kutty

2015-01-01

Full Text Available A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed from different available meteorological parameters, including temperature, cloud cover, rain precipitation, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared with other models from the literature in terms of the root mean square error (RMSE), mean bias error (MBE), and coefficient of determination (R²). Seven models based on single parameters (PM1 to PM7) and five multiple-parameter models (PM7 to PM12) are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R² ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included in multiple-parameter models, although it performs fairly well in single-parameter prediction models.
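The three goodness-of-fit measures named above can be sketched as follows. R² is computed here as 1 − SS_res/SS_tot, one of several definitions in use; the abstract reports RMSE and MBE in percent, presumably relative to a mean measured value, and that normalization is not implemented here. The irradiance values are hypothetical:

```python
import math


def rmse(measured, estimated):
    """Root mean square error."""
    return math.sqrt(sum((e - m) ** 2 for m, e in zip(measured, estimated)) / len(measured))


def mbe(measured, estimated):
    """Mean bias error; positive values mean the model overestimates on average."""
    return sum(e - m for m, e in zip(measured, estimated)) / len(measured)


def r_squared(measured, estimated):
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical monthly global solar irradiance (kWh/m2/day): measured vs model.
meas = [4.8, 5.1, 5.4, 5.0, 4.6, 4.9]
est = [4.9, 5.0, 5.3, 5.1, 4.7, 4.8]
print(round(rmse(meas, est), 3), round(mbe(meas, est), 3), round(r_squared(meas, est), 3))
```

RMSE and MBE answer different questions: a model can have a near-zero MBE (errors cancel on average) while still having a sizeable RMSE.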
Stochastic Optimal Prediction with Application to Averaged Euler Equations

Energy Technology Data Exchange (ETDEWEB)

Bell, John [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Chorin, Alexandre J. [Univ. of California, Berkeley, CA (United States); Crutchfield, William [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

2017-04-24

Optimal prediction (OP) methods compensate for a lack of resolution in the numerical solution of complex problems through the use of an invariant measure as a prior measure in the Bayesian sense. In first-order OP, unresolved information is approximated by its conditional expectation with respect to the invariant measure. In higher-order OP, unresolved information is approximated by a stochastic estimator, leading to a system of random or stochastic differential equations. We explain the ideas through a simple example, and then apply them to the solution of Averaged Euler equations in two space dimensions.
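In first-order OP the unresolved variables are replaced by their conditional expectation given the resolved ones. In the special case of a zero-mean Gaussian invariant measure with known covariance (an assumption made here for illustration; the measures used for the Averaged Euler equations in the paper may differ), that expectation is a linear regression of unresolved on resolved variables, E[x_u | x_r] = C_ur C_rr⁻¹ x_r:

```python
import numpy as np


def first_order_op(cov, resolved, unresolved, x_resolved):
    """Conditional mean of unresolved variables given resolved ones.

    Assumes a zero-mean Gaussian invariant measure with covariance `cov`,
    for which E[x_u | x_r] = C_ur C_rr^{-1} x_r.
    """
    c_ur = cov[np.ix_(unresolved, resolved)]
    c_rr = cov[np.ix_(resolved, resolved)]
    return c_ur @ np.linalg.solve(c_rr, x_resolved)


# Hypothetical 3-variable covariance; variables 0 and 1 resolved, variable 2 unresolved.
cov = np.array([[1.0, 0.2, 0.5],
                [0.2, 1.0, 0.3],
                [0.5, 0.3, 1.0]])
print(first_order_op(cov, [0, 1], [2], np.array([1.0, -0.5])))
```

Higher-order OP, as described above, replaces this deterministic estimate with a stochastic estimator, which is where the random or stochastic differential equations enter.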
Improved Bond Equations for Fiber-Reinforced Polymer Bars in Concrete.

Science.gov (United States)

Pour, Sadaf Moallemi; Alam, M Shahria; Milani, Abbas S

2016-08-30

This paper explores a set of new equations to predict the bond strength between fiber-reinforced polymer (FRP) rebar and concrete. The proposed equations are based on a comprehensive statistical analysis and existing experimental results in the literature. Namely, the parameters with the greatest effect on the bond behavior of FRP in concrete were first identified by applying a factorial analysis to part of the available database. Then the database, which contains 250 pullout tests, was divided into four groups based on the concrete compressive strength and the rebar surface. Afterward, nonlinear regression analysis was performed for each group to determine the bond equations. The results show that the proposed equations can predict bond strengths more accurately than other previously reported models.
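The abstract does not give the functional form of the proposed bond equations. Purely to illustrate fitting a nonlinear bond-strength equation within one group, the sketch assumes a hypothetical power law τ = a·f_c^b in concrete compressive strength f_c and fits it by least squares on the log scale. This is a simplification of full nonlinear regression (it minimizes relative rather than absolute error), and the data are hypothetical:

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on log y = log a + b log x.

    Returns (a, b).
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b


# Hypothetical pullout results: compressive strength (MPa) vs bond strength (MPa).
fc = [25.0, 32.0, 40.0, 48.0, 55.0]
tau = [9.8, 11.5, 12.4, 14.1, 14.6]
print(fit_power_law(fc, tau))
```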
Modified Regression Correlation Coefficient for Poisson Regression Model

Science.gov (United States)

Kaengthong, Nattacha; Domthong, Uthumporn

2017-09-01

This study focuses on indicators of the predictive power of the generalized linear model (GLM), which are widely used but often have some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)], where the dependent variable is Poisson distributed. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient outperforms the traditional one in terms of bias and root mean square error (RMSE).
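The traditional coefficient described above, corr(Y, E(Y|X)), can be sketched by fitting a one-predictor Poisson regression with Newton-Raphson and correlating Y with the fitted means. The paper's modified coefficient is not described in the abstract and is not reproduced here; the data are invented.

```python
import math


def fit_poisson(x, y, iters=50):
    """Fit log E(Y|x) = b0 + b1*x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector and Fisher information for the two coefficients.
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1


def regression_correlation(y, mu):
    """Traditional regression correlation coefficient: corr(Y, E(Y|X))."""
    n = len(y)
    my, mm = sum(y) / n, sum(mu) / n
    cov = sum((a - my) * (b - mm) for a, b in zip(y, mu))
    vy = sum((a - my) ** 2 for a in y)
    vm = sum((b - mm) ** 2 for b in mu)
    return cov / math.sqrt(vy * vm)


x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 2, 4, 6, 9]
b0, b1 = fit_poisson(x, y)
mu = [math.exp(b0 + b1 * xi) for xi in x]
print(regression_correlation(y, mu))
```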
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

Science.gov (United States)

Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

2016-03-01

In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
Body composition in elderly people: effect of criterion estimates on predictive equations

International Nuclear Information System (INIS)

Baumgartner, R.N.; Heymsfield, S.B.; Lichtman, S.; Wang, J.; Pierson, R.N. Jr.

1991-01-01

The purposes of this study were to determine whether there are significant differences between two- and four-compartment model estimates of body composition; whether these differences are associated with the aqueous and mineral fractions of the fat-free mass (FFM); and whether the differences are retained in equations for predicting body composition from anthropometry and bioelectric resistance. Body composition was estimated in 98 men and women aged 65-94 y by using a four-compartment model based on hydrodensitometry, ³H₂O dilution, and dual-photon absorptiometry. These estimates were significantly different from those obtained by using Siri's two-compartment model. The differences were significantly associated (P < 0.0001) with variation in the aqueous fraction of FFM. Equations for predicting body composition from anthropometry and resistance, when calibrated against two-compartment model estimates, retained these systematic errors. Equations predicting body composition in elderly people should be calibrated against estimates from multicompartment models that consider variability in FFM composition.
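For reference, Siri's two-compartment model converts whole-body density from hydrodensitometry into percent fat using fixed densities for fat (0.900 g/cm³) and fat-free mass (1.100 g/cm³). That fixed FFM density is the assumption the four-compartment comparison calls into question when the aqueous fraction of FFM varies. A minimal sketch with a hypothetical density value:

```python
def siri_percent_fat(body_density):
    """Siri two-compartment equation: %fat = 495 / D - 450, with D in g/cm^3.

    Assumes a fat density of 0.900 g/cm^3 and a fat-free mass density of
    1.100 g/cm^3, i.e. a fixed FFM composition.
    """
    return 495.0 / body_density - 450.0


print(round(siri_percent_fat(1.05), 2))  # 21.43 % fat
```

If an elderly subject's FFM is more hydrated than assumed, its true density is lower than 1.100 g/cm³, and this equation attributes the lower body density to fat instead.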
Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

Science.gov (United States)

McKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

2005-01-01

Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources: EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of the following regression variables: 30-minute forward-averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, with EDR and the natural log of EDR, respectively, as the response, yield the best performance and avoid model discontinuity across day/night data boundaries.
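The model form described above can be written compactly as follows. The symbols are ours, not the report's: x_i are the m regression variables listed above, p_i and q_i the fixed optimal powers, and a_i, b_i the fitted coefficients.

```latex
\begin{aligned}
\text{day:}\quad   \mathrm{EDR}     &= a_0 + \sum_{i=1}^{m} a_i\, x_i^{\,p_i},\\
\text{night:}\quad \ln \mathrm{EDR} &= b_0 + \sum_{i=1}^{m} b_i\, x_i^{\,q_i}.
\end{aligned}
```

Because the powers are fixed, each model remains linear in its coefficients and can be fitted by ordinary least squares on the transformed variables.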
An adaptive two-stage analog/regression model for probabilistic prediction of small-scale precipitation in France

Directory of Open Access Journals (Sweden)

J. Chardon

2018-01-01

Statistical downscaling models (SDMs) are often used to produce local weather scenarios from large-scale atmospheric information. SDMs include transfer functions based on a statistical link, identified from observations, between local weather and a set of large-scale predictors. As the physical processes driving surface weather vary in time, the most relevant predictors and the regression link are likely to vary in time too. This is well known for precipitation, for instance, and the link is therefore often estimated after a seasonal stratification of the data. In this study, we present a two-stage analog/regression model in which the regression link is estimated from atmospheric analogs of the current prediction day. Atmospheric analogs are identified from fields of geopotential heights at 1000 and 500 hPa. For the regression stage, two generalized linear models are used to model the probability of precipitation occurrence and the distribution of non-zero precipitation amounts, respectively. The two-stage model is evaluated for the probabilistic prediction of small-scale precipitation over France. It noticeably improves the skill of the prediction for both precipitation occurrence and amount. As the analog days vary from one prediction day to another, the atmospheric predictors selected in the regression stage and the values of the corresponding regression coefficients can also vary from one prediction day to another. The model thus allows day-to-day adaptive and tailored downscaling. It can also reveal specific predictors for unusual and infrequent weather configurations.
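The analog stage can be sketched as a nearest-neighbour search over past days. In this minimal Python illustration, each day's predictor is the flattened vector of 1000 hPa and 500 hPa geopotential heights, and similarity is measured with a plain root-mean-square distance. The study may use a different analog criterion, and the field values below are invented. The two GLMs of the regression stage would then be fitted only on the returned days.

```python
import math


def rms_distance(field_a, field_b):
    """Root-mean-square difference between two flattened geopotential fields."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(field_a, field_b)) / len(field_a))


def find_analogs(target_field, archive_fields, k):
    """Indices of the k archive days whose fields are closest to the target day."""
    ranked = sorted(range(len(archive_fields)),
                    key=lambda i: rms_distance(target_field, archive_fields[i]))
    return ranked[:k]


# Hypothetical fields: [Z1000 grid values..., Z500 grid values...] in metres.
archive = [
    [110.0, 120.0, 5600.0, 5650.0],
    [90.0, 100.0, 5500.0, 5520.0],
    [112.0, 118.0, 5608.0, 5644.0],
    [60.0, 70.0, 5400.0, 5410.0],
]
today = [111.0, 119.0, 5605.0, 5645.0]
print(find_analogs(today, archive, k=2))  # [2, 0]
```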
Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling.

Science.gov (United States)

Kawashima, Issaku; Kumano, Hiroaki

2017-01-01

Mind-wandering (MW), or task-unrelated thought, has been examined in an increasing number of articles that use numerous physiological variables to build models predicting whether subjects are mind-wandering. However, these models are not applicable in general situations, and they output only a binary classification. The current study suggests that combining electroencephalogram (EEG) variables with non-linear regression modeling can provide a good indicator of MW intensity. We recorded the EEGs of 50 subjects performing a Sustained Attention to Response Task that included a thought-sampling probe asking about the focus of attention. We calculated power and coherence values, prepared 35 patterns of variable combinations, and applied support vector machine regression (SVR) to them. Finally, we chose four SVR models, two non-linear and two linear; two of the four use a limited number of electrodes to keep the models practical. Examination on held-out data indicated that all models had robust predictive precision and provided significantly better estimates than a linear regression model using single-electrode EEG variables. Furthermore, in the limited-electrode condition, the non-linear SVR model showed significantly better precision than the linear SVR model. The proposed method should help investigations of MW in various little-examined situations. Further, measuring MW with high-temporal-resolution EEG is expected to reveal unclear aspects of MW, such as its variation over time. Finally, our finding that a few electrodes can suffice to predict MW contributes to the development of neurofeedback studies.
Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling

Directory of Open Access Journals (Sweden)

Issaku Kawashima

2017-07-01

Equations for predicting biomass in 2- to 6-year-old Eucalyptus saligna in Hawaii

Science.gov (United States)

Craig D. Whitesell; Susan C. Miyasaka; Robert F. Strand; Thomas H. Schubert; Katharine E. McDuffie

1988-01-01

Eucalyptus saligna trees grown in short-rotation plantations on the island of Hawaii were measured, harvested, and weighed to provide data for developing regression equations using non-destructive stand measurements. Regression analysis of the data from 190 trees in the 2.0- to 3.5-year range and 96 trees in the 4- to 6-year range related stem-only...
Stature estimation equations for South Asian skeletons based on DXA scans of contemporary adults.

Science.gov (United States)

Pomeroy, Emma; Mushrif-Tripathy, Veena; Wells, Jonathan C K; Kulkarni, Bharati; Kinra, Sanjay; Stock, Jay T

2018-05-03

Stature estimation from the skeleton is a classic anthropological problem, and recent years have seen the proliferation of population-specific regression equations. Many rely on the anatomical reconstruction of stature from archaeological skeletons to derive regression equations based on long bone lengths, but this requires a very well-preserved collection. In some regions, such as South Asia, typical environmental conditions preclude adequate preservation of skeletal remains. Large-scale epidemiological studies that include medical imaging of the skeleton by techniques such as dual-energy X-ray absorptiometry (DXA) offer new potential datasets for developing such equations. We derived estimation equations based on known height and bone lengths measured from DXA scans from the Andhra Pradesh Children and Parents Study (Hyderabad, India). Given debates on the most appropriate regression model to use, multiple methods were compared, and the performance of the equations was tested on a published skeletal dataset of individuals with known stature. The equations have standard errors of estimate and prediction errors similar to those derived using anatomical reconstruction or from cadaveric datasets. As measured by the number of significant differences between true and estimated stature and by the prediction errors, the new equations perform as well as, and generally better than, published equations commonly used on South Asian skeletons or based on Indian cadaveric datasets. This study demonstrates the utility of DXA scans as a data source for developing stature estimation equations and offers a new set of equations for use with South Asian datasets. © 2018 Wiley Periodicals, Inc.
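The abstract does not list which regression methods were compared. Two candidates often debated for stature estimation are ordinary least squares (OLS), which treats bone length as error-free, and reduced major axis (RMA), whose slope is the ratio of the standard deviations. The sketch below uses invented femur lengths and statures to show how their slopes relate. It is an illustration, not the study's equations.

```python
import math


def ols_and_rma(x, y):
    """Return (slope, intercept) for OLS of y on x and for reduced major axis."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    ols_slope = sxy / sxx
    rma_slope = math.copysign(math.sqrt(syy / sxx), sxy)  # sign follows the correlation
    return (ols_slope, my - ols_slope * mx), (rma_slope, my - rma_slope * mx)


# Hypothetical femur lengths (cm) and statures (cm).
femur = [42.0, 44.0, 45.5, 47.0, 49.0]
stature = [158.0, 163.0, 164.0, 171.0, 174.0]
(ols_b, ols_a), (rma_b, rma_a) = ols_and_rma(femur, stature)
print(ols_b, rma_b)
```

The RMA slope equals the OLS slope divided by |r|, so it is always at least as steep, and the two converge as the correlation approaches 1. This is why the choice matters most for weakly correlated bones or for individuals far from the sample mean.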

Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression.

Science.gov (United States)

Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris

2016-09-01

Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, the high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification challenge the development of accurate predictive models. Predictive models also require a certain level of interpretability to be applicable in real settings and to create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population by integrating a data-driven model (sparse logistic regression) with domain knowledge based on the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and to inspect the stability of alternative solutions. The analysis was conducted on more than 66,000 pediatric hospital discharge records from the California State Inpatient Databases of the Healthcare Cost and Utilization Project, covering 2009 to 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy into a data-driven, Tree-Lasso regularized logistic regression model, providing a framework for model interpretation. Compared with traditional Lasso logistic regression, this approach produced models that are easier to interpret, using fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the Tree-Lasso model was as competitive in accuracy (measured by the area under the receiver operating characteristic curve, AUC) as traditional Lasso logistic regression, while integration with the ICD-9-CM hierarchy of diseases yielded more interpretable models in terms of high-level diagnoses. Additionally, the interpretations of the models accord with existing medical understanding of pediatric readmission. Best performing models have
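The tree-structured penalty can be illustrated with a small hypothetical hierarchy. In a tree-lasso (tree-structured group lasso), every node of the hierarchy, here an ICD-9-CM chapter, category, or single code, defines a group of coefficients, and the penalty sums the weighted Euclidean norms of those groups. Driving a node's norm to zero removes its whole subtree, which favors models expressed through fewer high-level diagnoses. The unit node weights and the four-code tree below are assumptions, not the paper's settings.

```python
import math


def tree_lasso_penalty(beta, groups, lam=1.0):
    """lam * sum over tree nodes of weight * ||beta[node indices]||_2.

    groups: list of (weight, indices); each tree node covers the indices of
    all codes beneath it, so groups are nested rather than disjoint.
    """
    total = 0.0
    for weight, idx in groups:
        total += weight * math.sqrt(sum(beta[i] ** 2 for i in idx))
    return lam * total


# Hypothetical hierarchy: 4 codes -> 2 categories -> 1 chapter.
groups = [
    (1.0, [0]), (1.0, [1]), (1.0, [2]), (1.0, [3]),  # leaves (single codes)
    (1.0, [0, 1]), (1.0, [2, 3]),                    # categories
    (1.0, [0, 1, 2, 3]),                             # chapter (root)
]
beta = [0.3, -0.4, 0.0, 0.0]
print(tree_lasso_penalty(beta, groups))
```

During training this penalty would be added to the logistic negative log-likelihood; because it is non-smooth, such models are typically fitted with proximal-gradient-type solvers.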
AIRLINE ACTIVITY FORECASTING BY REGRESSION MODELS

Directory of Open Access Journals (Sweden)

Н. Білак

2012-04-01

Linear and nonlinear regression models are proposed that combine a trend equation with seasonality indices to analyze and reconstruct passenger traffic volume over past periods and to forecast it for future years. An algorithm for building these models from multi-year statistical analysis is also presented. The resulting model is a first step toward synthesizing more complex models that will enable forecasting of airline passenger volume (and income level) with the highest accuracy and timeliness.
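A trend-times-seasonal-index forecast of the kind described can be sketched as follows. This minimal version assumes a linear trend fitted by least squares and multiplicative seasonal indices computed as mean ratios of actual to trend values for each season. The paper's exact trend equation and index definition may differ, and the example series is invented.

```python
def fit_linear_trend(y):
    """Least-squares line y_t = a + b*t over t = 0..n-1."""
    n = len(y)
    mt, my = (n - 1) / 2.0, sum(y) / n
    b = sum((t - mt) * (yt - my) for t, yt in enumerate(y)) / sum((t - mt) ** 2 for t in range(n))
    return my - b * mt, b


def seasonal_indices(y, period, a, b):
    """Mean ratio of actual to trend value for each position in the season."""
    ratios = [[] for _ in range(period)]
    for t, yt in enumerate(y):
        ratios[t % period].append(yt / (a + b * t))
    return [sum(r) / len(r) for r in ratios]


def forecast(y, period, horizon):
    """Extrapolate the trend and scale it by the matching seasonal index."""
    a, b = fit_linear_trend(y)
    idx = seasonal_indices(y, period, a, b)
    n = len(y)
    return [(a + b * t) * idx[t % period] for t in range(n, n + horizon)]


# Hypothetical quarterly passenger counts (thousands) over three years.
passengers = [80, 120, 150, 90, 88, 130, 162, 97, 95, 141, 175, 105]
print(forecast(passengers, period=4, horizon=4))
```

The series must cover at least one full season so that every index has data; a nonlinear variant would replace only the trend function.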
Comparative study of biodegradability prediction of chemicals using decision trees, functional trees, and logistic regression.

Science.gov (United States)

Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M

2014-12-01

Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods for assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out with two independent test sets of 777 and 27 chemicals. The functional inner regression tree exhibited the best predictability, with predictive accuracies of 81.5% and 81.0% on the training set (825 chemicals) and test set I (777 chemicals), respectively. The performance of the developed models on the two test sets was then compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, and this comparison also favored the functional inner regression tree model. The model built in the present study exhibits reasonable predictability compared with existing models while possessing a transparent algorithm. The mechanisms of biodegradation were also interpreted on the basis of the developed models. © 2014 SETAC.
A Matlab program for stepwise regression

Directory of Open Access Journals (Sweden)

Yanhong Qi

2016-03-01

Stepwise linear regression is a multivariable regression method for identifying statistically significant variables in a linear regression equation. In the present study, we present a Matlab program for stepwise regression.
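The Matlab program itself is not reproduced in the abstract. As an illustration, the Python sketch below implements forward stepwise selection with a partial F-to-enter threshold (4.0 here, a common but arbitrary choice). Full stepwise procedures also re-test already-entered variables for removal, which is omitted for brevity. The data are synthetic.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    n = len(y)
    cols = [[1.0] * n] + columns
    p = len(cols)
    xtx = [[sum(cols[i][k] * cols[j][k] for k in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(cols[i][k] * y[k] for k in range(n)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * cols[i][k] for i in range(p)) for k in range(n)]
    return sum((y[k] - fitted[k]) ** 2 for k in range(n))


def forward_stepwise(candidates, y, f_to_enter=4.0):
    """Repeatedly add the candidate with the largest partial F above f_to_enter."""
    n = len(y)
    selected = []
    current_rss = rss([], y)
    while True:
        best_j, best_f, best_rss = None, f_to_enter, None
        for j in range(len(candidates)):
            if j in selected:
                continue
            df_resid = n - len(selected) - 2  # n minus intercept minus entered terms
            if df_resid <= 0:
                continue
            new_rss = rss([candidates[i] for i in selected + [j]], y)
            f = float("inf") if new_rss == 0 else (current_rss - new_rss) / (new_rss / df_resid)
            if f > best_f:
                best_j, best_f, best_rss = j, f, new_rss
        if best_j is None:
            return selected
        selected.append(best_j)
        current_rss = best_rss


x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.2, 0.15, 0.05, -0.1, 0.2, -0.05, -0.15]
y = [2.0 + 3.0 * a + e for a, e in zip(x0, noise)]
print(forward_stepwise([x0, x1], y))  # [0]: only x0 enters
```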
Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

Science.gov (United States)

Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

2015-01-01

This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is therefore of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current risk prediction models based on logistic regression have limited predictive power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. The objective was to explore the use of conditional logistic regression to increase prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset from California, which includes patient demographics, clinical data, and care utilization data. We extracted the records of Medicare beneficiaries with heart failure who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. We first applied standard logistic regression and a decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression to each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross-validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of
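Random under-sampling, one common way to correct class imbalance, can be sketched as below. The abstract does not state the sampling ratio or scheme, so the 1:1 balance, the record layout, and the data are assumptions. In practice, under-sampling is usually applied only to training folds so that evaluation still reflects the real class mix.

```python
import random


def undersample(records, is_positive, seed=0):
    """Randomly drop majority-class records until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [r for r in records if is_positive(r)]
    neg = [r for r in records if not is_positive(r)]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Hypothetical discharge records: 10 readmitted, 90 not readmitted.
records = [{"id": i, "readmitted": i < 10} for i in range(100)]
balanced = undersample(records, lambda r: r["readmitted"])
print(len(balanced), sum(r["readmitted"] for r in balanced))  # 20 10
```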
Novel equations to predict body fat percentage of Brazilian professional soccer players: A case study

Directory of Open Access Journals (Sweden)

Luiz Fernando Novack

2014-12-01

This study analyzed classical mathematical models and developed novel ones to predict body fat percentage (%BF) in professional soccer players from the South Brazilian region using skinfold thickness measurements. The skinfolds of thirty-one male professional soccer players (age 21.48 ± 3.38 years, body mass 79.05 ± 9.48 kg, height 181.97 ± 8.11 cm) were entered into eight mathematical models from the literature to predict %BF; these results were then compared with dual-energy X-ray absorptiometry (DXA). The classical equations accounted for 65% to 79% of the variation in %BF measured by DXA. Statistically significant differences were found between most of the classical equations (seven of the eight) and DXA, making them unsuitable for widespread use in this population. We developed three new equations for predicting %BF from the axilla, abdomen, thigh, and calf skinfolds. These equations accounted for 86.5% of the variation in %BF obtained with DXA.
Predicting recovery of cognitive function soon after stroke: differential modeling of logarithmic and linear regression.

Science.gov (United States)

Suzuki, Makoto; Sugimura, Yuko; Yamada, Sumio; Omori, Yoshitsugu; Miyamoto, Masaaki; Yamamoto, Jun-ichi

2013-01-01

Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse long-term outcome. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. The study included two rounds of data collection: 57 stroke patients were enrolled in the first round to identify the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round to confirm that the correlation found in the early-phase group data applied to predicting each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm of time and on linear time. In the second round, MMSE scores from the first two assessments after admission were used to tailor logarithmic and linear regression formulae to each individual's degree of functional recovery. The time course of early-phase cognitive recovery resembled both logarithmic and linear functions. However, logarithmic regression modeling based on MMSE scores sampled at two baseline points predicted cognitive recovery more accurately than linear regression modeling (logarithmic modeling, R² = 0.676, P …). Logarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.
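Tailoring a logarithmic formula to two baseline MMSE scores amounts to solving MMSE(t) = a + b·ln(t) exactly through two points. The sketch below assumes time is counted in days from a reference point before the first assessment (ln t requires t > 0). The patient values are hypothetical, and the cap at the MMSE maximum of 30 is our addition.

```python
import math

MMSE_MAX = 30  # the MMSE is scored out of 30


def fit_log_model(t1, score1, t2, score2):
    """Solve MMSE(t) = a + b*ln(t) exactly through two baseline measurements."""
    b = (score2 - score1) / (math.log(t2) - math.log(t1))
    a = score1 - b * math.log(t1)
    return a, b


def predict_mmse(a, b, t):
    """Predicted MMSE at time t, capped at the scale maximum."""
    return min(MMSE_MAX, a + b * math.log(t))


# Hypothetical patient: MMSE 20 on day 2 and 24 on day 8.
a, b = fit_log_model(2, 20, 8, 24)
print(round(predict_mmse(a, b, 32), 6))  # 28.0
```

A straight line through the same two points would predict a gain of 4 points every 6 days indefinitely, whereas the logarithmic curve flattens, matching the decelerating recovery the study describes.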
Predicting lower body power from vertical jump prediction equations for loaded jump squats at different intensities in men and women.

Science.gov (United States)

Wright, Glenn A; Pustina, Andrew A; Mikat, Richard P; Kernozek, Thomas W

2012-03-01

The purpose of this study was to determine the efficacy of estimating peak lower body power from a maximal jump squat using 3 different vertical jump prediction equations. Sixty physically active college students (30 men, 30 women) performed jump squats with a weighted bar across the shoulders providing an applied load of 20, 40, and 60% of body mass. Each jump squat was simultaneously monitored using a force plate and a contact mat. Peak power (PP) was calculated using vertical ground reaction force from the force plate data. Commonly used equations that estimate PP from body mass and vertical jump height were applied with the system mass (body mass + applied load) substituted for body mass. Jump height was determined from flight time as measured with a contact mat during a maximal jump squat. Estimates of PP (PP(est)) for each load and each prediction equation were compared with criterion PP values from the force plate (PP(FP)). The PP(est) values had high test-retest reliability and were strongly correlated with PP(FP) in both men and women at all relative loads. However, only the Harman equation accurately predicted PP(FP) at all relative loads. It can therefore be concluded that the Harman equation may be used to estimate the PP of a loaded jump squat from the system mass and peak jump height when more precise (and expensive) measurement equipment is unavailable. Further, the high reliability and correlation with criterion values suggest that any of the prediction equations used in this study could be used for serial assessment of relative change in power production across training periods.
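The estimation chain described above (flight time to jump height, then height and system mass to peak power) can be sketched as follows. The Harman coefficients are given as the equation is commonly cited (Harman et al., 1991). The flight-time formula h = g·t²/8 assumes identical take-off and landing postures. The subject values are hypothetical.

```python
G = 9.81  # gravitational acceleration, m/s^2


def jump_height_cm(flight_time_s):
    """Jump height from flight time, h = g*t^2/8, returned in centimetres."""
    return 100.0 * G * flight_time_s ** 2 / 8.0


def harman_peak_power(height_cm, mass_kg):
    """Harman et al. (1991) estimate: PP (W) = 61.9*h(cm) + 36.0*m(kg) + 1822."""
    return 61.9 * height_cm + 36.0 * mass_kg + 1822.0


# Hypothetical 80 kg subject jumping with a 20 % body-mass load (16 kg).
body_mass, load = 80.0, 16.0
h = jump_height_cm(0.5)  # 0.5 s flight time from the contact mat
print(round(h, 3), round(harman_peak_power(h, body_mass + load), 1))
```

As in the study, the system mass (body plus load) replaces body mass in the equation.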
Creep-fatigue life prediction method using Diercks equation for Cr-Mo steel

International Nuclear Information System (INIS)

Sonoya, Keiji; Nonaka, Isamu; Kitagawa, Masaki

1990-01-01

Because creep-fatigue life data are not available for some materials, a simple method for predicting creep-fatigue life properties is needed. A method to predict the creep-fatigue life properties of Cr-Mo steels is proposed on the basis of D. Diercks' equation, which correlates the creep-fatigue lives of SUS 304 steel under various temperatures, strain ranges, strain rates, and hold times. The accuracy of the proposed method was compared with that of existing methods, with the following results. (1) The fatigue strength and creep rupture strength of Cr-Mo steel differ from those of SUS 304 steel. Therefore, to apply Diercks' equation to creep-fatigue prediction for Cr-Mo steel, the difference in fatigue strength was found to be correctable by the fatigue life ratio of the two steels, and the difference in creep rupture strength by the equivalent temperature at which the two steels have equal strength. (2) Creep-fatigue life can be predicted by the modified Diercks equation within a factor of 2, which is nearly as accurate as the strain range partitioning method, and the required test and analysis procedures are less complicated than those of strain range partitioning. (author)
Intelligent Quality Prediction Using Weighted Least Square Support Vector Regression

Science.gov (United States)

Yu, Yaojun

A novel quality prediction method with a moving time window, based on weighted least squares support vector regression (LS-SVR), is proposed for small-batch production processes. The design steps and learning algorithm are also addressed. In the method, weighted LS-SVR serves as the intelligent kernel, which handles small-batch learning well; in the historical data, more recent samples are assigned larger weights and older samples smaller weights. A typical machining process, cutting of bearing outer races, is studied, and real measured data are used in a comparative experiment. The experimental results show that the prediction error of the weighted LS-SVR based model is only 20%-30% of that of the standard LS-SVR based model under the same conditions. The method thus provides a better candidate for quality prediction in small-batch production processes.
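A weighted LS-SVR can be sketched with the standard weighted LS-SVM dual, in which each sample's regularization is scaled by its weight v_i. The exponential time-decay weights (newest sample weight 1), the RBF kernel, and the values of gamma and sigma are assumptions for illustration, not the paper's settings. A moving time window would simply refit this model on the latest samples.

```python
import math


def rbf(a, b, sigma):
    """Gaussian (RBF) kernel for scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_weighted_lssvr(x, y, gamma=10.0, sigma=1.0, decay=0.9):
    """Solve [[0, 1^T], [1, K + diag(1/(gamma*v))]] [b; alpha] = [0; y]."""
    n = len(x)
    v = [decay ** (n - 1 - i) for i in range(n)]  # assumed: newest (last) sample weight 1
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        a[0][i + 1] = a[i + 1][0] = 1.0
        for j in range(n):
            a[i + 1][j + 1] = rbf(x[i], x[j], sigma)
        a[i + 1][i + 1] += 1.0 / (gamma * v[i])
    sol = solve(a, [0.0] + list(y))
    return {"b": sol[0], "alpha": sol[1:], "x": list(x), "v": v, "gamma": gamma, "sigma": sigma}


def predict(model, xq):
    """f(xq) = sum_i alpha_i K(xq, x_i) + b."""
    return model["b"] + sum(ai * rbf(xq, xi, model["sigma"])
                            for ai, xi in zip(model["alpha"], model["x"]))


# Hypothetical measurements ordered from oldest to newest.
xs = [0.5 * i for i in range(10)]
ys = [math.sin(t) for t in xs]
model = fit_weighted_lssvr(xs, ys)
print(round(predict(model, 2.25), 3))
```

Since the training residual of sample i equals alpha_i / (gamma * v_i), older samples with small v_i are allowed larger residuals, which is how the weighting favors recent data.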
Application of support vector regression (SVR) for stream flow prediction on the Amazon basin

CSIR Research Space (South Africa)

Du Toit, Melise

2016-10-01

... regression technique is used in this study to analyse historical stream flow occurrences and predict stream flow values for the Amazon basin. Predictions of up to twelve months ahead are made, and the coefficient of determination and root-mean-square error are used...
Multiclass Prediction with Partial Least Square Regression for Gene Expression Data: Applications in Breast Cancer Intrinsic Taxonomy

Directory of Open Access Journals (Sweden)

Chi-Cheng Huang

2013-01-01

Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiling. Despite recent advances in machine learning and bioinformatics, most classification tools are limited to binary responses. Our aim was to apply partial least squares (PLS) regression to the breast cancer intrinsic taxonomy, in which five distinct molecular subtypes have been identified. The PAM50 signature genes were used as predictive variables in PLS analysis, and the latent gene component scores were used in binary logistic regression for each molecular subtype. The 139 prototypical arrays used for PAM50 development served as the training dataset, and three independent microarray studies of Han Chinese origin were used for independent validation (n = 535). Agreement between PAM50 centroid-based single sample prediction (SSP) and PLS regression was excellent within the training samples (weighted kappa: 0.988) but deteriorated substantially in the independent samples, which could be attributed to the much larger number of samples left unclassified by PLS regression. When these unclassified samples were removed, agreement between PAM50 SSP and PLS regression improved markedly (weighted kappa: 0.829, as opposed to 0.541 when unclassified samples were included). Our study established the feasibility of PLS regression for multiclass prediction, and distinct clinical presentations and prognostic differences were observed across breast cancer molecular subtypes.
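Agreement between two subtype calls, as reported above, can be measured with Cohen's weighted kappa. The abstract does not say which weighting was used, so this sketch assumes linear disagreement weights over the listed category order. For nominal subtypes that order is arbitrary, so treat the example as an illustration; the subtype calls are invented.

```python
def weighted_kappa(labels_a, labels_b, categories):
    """Cohen's weighted kappa with linear disagreement weights |i - j| / (k - 1)."""
    k = len(categories)
    pos = {c: i for i, c in enumerate(categories)}
    n = len(labels_a)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(labels_a, labels_b):
        observed[pos[a]][pos[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weight = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    d_obs = sum(weight[i][j] * observed[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp


subtypes = ["LumA", "LumB", "Her2", "Basal", "Normal"]
ssp = ["LumA", "LumA", "LumB", "Basal", "Her2", "Normal"]
pls = ["LumA", "LumB", "LumB", "Basal", "Her2", "Normal"]
print(round(weighted_kappa(ssp, pls, subtypes), 3))
```

Samples that one method leaves unclassified have no category here; excluding them, as the study did in its second comparison, is one way to handle them.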
Regression Models for Predicting Force Coefficients of Aerofoils

Directory of Open Access Journals (Sweden)

Mohammed ABDUL AKBAR

2015-09-01

Renewable sources of energy are attractive and advantageous in many ways. Among renewable energy sources, wind energy is the fastest growing. Among wind energy converters, vertical axis wind turbines (VAWTs) have received renewed interest in the past decade because of some advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key step in predicting the output of VAWTs through analytical studies is obtaining the lift and drag coefficients, which are functions of the aerofoil shape, the angle of attack of the wind, and the Reynolds number of the flow. Sandia National Laboratories have carried out extensive experiments on aerofoils at Reynolds numbers in the range experienced by VAWTs, and the resulting volume of experimental data is huge. This paper discusses three regression models from which lift and drag coefficients can be obtained with simple formulas, without having to handle the bulk of the data. The regression models estimated lift and drag coefficients successfully, with R² values as high as 0.98.
Application of General Regression Neural Network to the Prediction of LOD Change

Science.gov (United States)

Zhang, Xiao-Hong; Wang, Qi-Jie; Zhu, Jian-Jun; Zhang, Hao

2012-01-01

Traditional methods for predicting the change in length of day (LOD change) are mainly based on linear models, such as least-squares and autoregressive models. However, LOD change involves complicated non-linear factors, and the predictions of linear models are often unsatisfactory. A non-linear neural network, the general regression neural network (GRNN), is therefore applied to predict LOD change, and the results are compared with those of the back-propagation (BP) neural network and other models. The comparison shows that applying the GRNN to LOD change prediction is highly effective and feasible.
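A GRNN prediction is a Gaussian-kernel weighted average of training targets, so for a time series it can be sketched by turning past values into lagged input vectors. The lag count, the smoothing parameter sigma, and the series below are hypothetical; the paper's input design and parameter choice are not given in the abstract.

```python
import math


def make_lagged(series, lags):
    """Turn a series into (input vector of `lags` past values, next value) pairs."""
    inputs, targets = [], []
    for t in range(lags, len(series)):
        inputs.append(series[t - lags:t])
        targets.append(series[t])
    return inputs, targets


def grnn_predict(query, inputs, targets, sigma):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    weights = [math.exp(-sum((q - u) ** 2 for q, u in zip(query, row)) / (2.0 * sigma ** 2))
               for row in inputs]
    return sum(w * t for w, t in zip(weights, targets)) / sum(weights)


# Hypothetical LOD-change series (ms); predict the value after the last three.
lod = [1.2, 1.1, 1.3, 1.5, 1.4, 1.2, 1.1, 1.3, 1.6, 1.5]
X, y = make_lagged(lod, lags=3)
print(round(grnn_predict(lod[-3:], X, y, sigma=0.1), 3))
```

Sigma is the only parameter to tune: small values approach nearest-neighbour prediction, large values approach the overall mean. If sigma is very small, all weights can underflow to zero, so it should be chosen relative to the spread of the inputs.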
Prediction of tissue-specific cis-regulatory modules using Bayesian networks and regression trees

Directory of Open Access Journals (Sweden)

Chen Xiaoyu

2007-12-01

Background: In vertebrates, a large part of gene transcriptional regulation is operated by cis-regulatory modules. These modules are believed to regulate much of the tissue specificity of gene expression. Results: We develop a Bayesian network approach for identifying cis-regulatory modules likely to regulate tissue-specific expression. The network integrates predicted transcription factor binding site information, transcription factor expression data, and target gene expression data. At its core is a regression tree modeling the effect of combinations of transcription factors bound to a module. A new unsupervised EM-like algorithm is developed to learn the parameters of the network, including the regression tree structure. Conclusion: Our approach is shown to accurately identify known human liver and erythroid-specific modules. When applied to the prediction of tissue-specific modules in 10 different tissues, the network predicts a number of important transcription factor combinations whose concerted binding is associated with specific expression.
Soil loss prediction using universal soil loss equation (USLE ...

African Journals Online (AJOL)

Soil loss prediction using the universal soil loss equation (USLE) simulation model in a mountainous area in Ağlasun district, Turkey. ... The need for sufficient knowledge and data for decision makers is obvious; hence the present study was carried out. The study area, Ağlasun district, is in the middle west of Turkey and is ...
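The USLE itself is a product of five factors, A = R × K × LS × C × P, where A is mean annual soil loss, R rainfall-runoff erosivity, K soil erodibility, LS the slope length-steepness factor, C the cover-management factor and P the support-practice factor. A minimal sketch follows; the function name and the factor values in the example are hypothetical and do not come from the Ağlasun study, whose abstract is truncated here.

```python
def usle_soil_loss(R, K, LS, C, P):
    """Universal Soil Loss Equation: mean annual soil loss A = R * K * LS * C * P.

    R  rainfall-runoff erosivity
    K  soil erodibility
    LS slope length-steepness factor (dimensionless)
    C  cover-management factor (dimensionless)
    P  support-practice factor (dimensionless)
    The unit of A follows from the units chosen for R and K.
    """
    factors = {"R": R, "K": K, "LS": LS, "C": C, "P": P}
    for name, value in factors.items():
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
    return R * K * LS * C * P

# Hypothetical factor values for a single map cell (not from the study)
print(usle_soil_loss(R=500.0, K=0.03, LS=2.0, C=0.2, P=1.0))
```

In a GIS setting the same product is evaluated cell by cell over raster layers of the five factors.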
Fuzzy Regression Prediction and Application Based on Multi-Dimensional Factors of Freight Volume

Science.gov (United States)

Xiao, Mengting; Li, Cheng

2018-01-01

Based on the actual development of air cargo, the multi-dimensional fuzzy regression method is used to identify the influencing factors, and the three most important are found to be GDP, total fixed-asset investment, and regular flight route mileage. Using a systems viewpoint and analogy methods, fuzzy numbers and multiple regression are then applied to predict civil aviation cargo volume. Comparison with the 13th Five-Year Plan for China's Civil Aviation Development (2016-2020) shows that this method can effectively improve forecasting accuracy and reduce forecasting risk. The results show that the model is a feasible way to predict civil aviation freight volume and is of high practical significance and operability.
Prediction of Rowing Ergometer Performance from Functional Anaerobic Power, Strength and Anthropometric Components

Directory of Open Access Journals (Sweden)

Akça Firat

2014-07-01

The aim of this research was to develop different regression models to predict 2000 m rowing ergometer performance from anthropometric, anaerobic, and strength variables, and to determine how precisely prediction models built from different groups of variables predict performance, whether the variables are combined in the same equation or used individually. Thirty-eight male collegiate rowers (20.17 ± 1.22 years) participated in this study. Anthropometric, strength, 2000 m maximal rowing ergometer, and rowing anaerobic power tests were applied. Multiple linear regression procedures were employed in SPSS 16 to construct five different regression formulas, each using a different group of variables. The reliability of the regression models was expressed by R² and the standard error of estimate (SEE). Relationships of all parameters with performance were investigated through Pearson correlation coefficients. The prediction model combining anaerobic, strength, and anthropometric variables was the most reliable equation for predicting 2000 m rowing ergometer performance (R² = 0.92, SEE = 3.11 s). The equation using rowing anaerobic and strength test results also provided a reliable prediction (R² = 0.85, SEE = 4.27 s). In conclusion, it seems clear that physiological determinants affected by anaerobic energy pathways should also be included in the processes and models used for performance prediction and talent identification in rowing.
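To make the two reported fit statistics concrete, the sketch below fits an ordinary least-squares multiple regression with NumPy and returns R² together with the standard error of estimate, SEE = √(SSE / (n − k − 1)) for n rowers and k predictors. The function name, the predictors and the exactly linear synthetic data are hypothetical; the paper's five SPSS equations and their coefficients are not given in the abstract.

```python
import numpy as np

def fit_mlr(X, y):
    """OLS multiple linear regression with intercept; returns (beta, R^2, SEE).

    beta[0] is the intercept; SEE = sqrt(SSE / (n - k - 1)) for n cases, k predictors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - sse / sst
    see = (sse / (n - k - 1)) ** 0.5              # standard error of estimate
    return beta, r2, see

# Hypothetical example: 2000 m time (s) from two made-up predictors
X = np.column_stack([[1, 2, 3, 4, 5, 6], [3, 1, 4, 1, 5, 9]])
y = 400 - 2 * X[:, 0] + 0.5 * X[:, 1]
beta, r2, see = fit_mlr(X, y)
```

With real, noisy performance data the SEE is in the units of the outcome (seconds here), which is why the abstract can report it as 3.11 s and 4.27 s.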
Model-free prediction and regression: a transformation-based approach to inference

CERN Document Server

Politis, Dimitris N

2015-01-01

The Model-Free Prediction Principle expounded upon in this monograph is based on the simple notion of transforming a complex dataset to one that is easier to work with, e.g., i.i.d. or Gaussian. As such, it restores the emphasis on observable quantities, i.e., current and future data, as opposed to unobservable model parameters and estimates thereof, and yields optimal predictors in diverse settings such as regression and time series. Furthermore, the Model-Free Bootstrap takes us beyond point prediction in order to construct frequentist prediction intervals without resort to unrealistic assumptions such as normality. Prediction has been traditionally approached via a model-based paradigm, i.e., (a) fit a model to the data at hand, and (b) use the fitted model to extrapolate/predict future data. Due to both mathematical and computational constraints, 20th century statistical practice focused mostly on parametric models. Fortunately, with the advent of widely accessible powerful computing in the late 1970s, co...
Efficient Prediction of Low-Visibility Events at Airports Using Machine-Learning Regression

Science.gov (United States)

Cornejo-Bueno, L.; Casanova-Mateo, C.; Sanz-Justo, J.; Cerro-Prada, E.; Salcedo-Sanz, S.

2017-11-01

We address the prediction of low-visibility events at airports using machine-learning regression. The proposed model successfully forecasts low-visibility events in terms of the runway visual range (RVR) at the airport, using support-vector regression, neural networks (multi-layer perceptrons and extreme-learning machines), and Gaussian-process algorithms. We assess the performance of these algorithms on real data collected at Valladolid airport, Spain. We also study the atmospheric variables, measured at a nearby tower, that are related to low-visibility conditions, since they serve as the inputs of the different regressors. A pre-processing procedure for these input variables based on wavelet transforms is also described. The results show that the proposed machine-learning algorithms predict low-visibility events well. The Gaussian process is the best of the algorithms analyzed, achieving a correct classification rate above 98% for low-visibility events when the RVR is > 1000 m, and about 80% below this threshold. The performance of all the machine-learning algorithms tested is clearly affected in extreme low-visibility conditions (… algorithm performance in daytime and nighttime conditions, and for different prediction time horizons.
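For readers unfamiliar with the best-performing method, the sketch below shows Gaussian-process regression in its basic form: a zero-mean prior with a squared-exponential kernel, Gaussian observation noise, and the posterior mean and variance computed through a Cholesky factorization. The kernel choice, the fixed hyperparameters (no marginal-likelihood tuning), and the function names are assumptions; the paper's actual configuration, inputs and wavelet pre-processing are not reproduced.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_test, length_scale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP with Gaussian observation noise.

    Inputs are 2-D arrays (one row per sample). Hyperparameters are fixed
    (hypothetical values), not optimized.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    K = rbf_kernel(X_train, X_train, length_scale, variance)
    K += noise * np.eye(len(X_train))                          # add observation noise
    K_s = rbf_kernel(X_train, X_test, length_scale, variance)  # shape (n_train, n_test)
    L = np.linalg.cholesky(K)                                  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # alpha = K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)                      # diagonal of posterior covariance
    return mean, var
```

In a setup like the one described, each row of `X_train` would hold the (possibly wavelet-processed) atmospheric inputs for one time step and `y_train` the corresponding RVR; thresholding the predicted RVR then yields a low-visibility classification.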

Non-destructive equations to estimate the leaf area of Styrax pohlii and Styrax ferrugineus

Directory of Open Access Journals (Sweden)

MC Souza

We developed linear equations to predict the leaf area (LA) of the species Styrax pohlii and Styrax ferrugineus from leaf width (W) and length (L). For both species, the linear regression (Y = α + bX) using LA as the dependent variable and W × L as the independent variable was more efficient than linear regressions using L, W, L² or W² as independent variables. Therefore, the LA of S. pohlii can be estimated with the equation LA = 0.582 + 0.683WL, while the LA of S. ferrugineus follows the equation LA = −0.666 + 0.704WL.
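The two fitted equations translate directly into code. The abstract does not state units, so the sketch leaves them to the caller and assumes W and L are given in the same units used to fit the equations; the function name and the species keys are hypothetical.

```python
# Coefficients (a, b) of LA = a + b * (W * L), as reported in the abstract.
STYRAX_COEFFS = {
    "pohlii": (0.582, 0.683),
    "ferrugineus": (-0.666, 0.704),
}

def styrax_leaf_area(width, length, species):
    """Non-destructive leaf-area estimate from leaf width and length.

    Units are not stated in the abstract; use the units the equations were fitted in.
    """
    a, b = STYRAX_COEFFS[species]
    return a + b * width * length

print(styrax_leaf_area(2.0, 5.0, "pohlii"), styrax_leaf_area(2.0, 5.0, "ferrugineus"))
```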
Regional regression equations for the estimation of selected monthly low-flow duration and frequency statistics at ungaged sites on streams in New Jersey

Science.gov (United States)

Watson, Kara M.; McHugh, Amy R.

2014-01-01

Regional regression equations were developed for estimating monthly flow-duration and monthly low-flow frequency statistics for ungaged streams in Coastal Plain and non-coastal regions of New Jersey for baseline and current land- and water-use conditions. The equations were developed to estimate 87 different streamflow statistics, which include the monthly 99-, 90-, 85-, 75-, 50-, and 25-percentile flow-durations of the minimum 1-day daily flow; the August–September 99-, 90-, and 75-percentile minimum 1-day daily flow; and the monthly 7-day, 10-year (M7D10Y) low-flow frequency. These 87 streamflow statistics were computed for 41 continuous-record streamflow-gaging stations (streamgages) with 20 or more years of record and 167 low-flow partial-record stations in New Jersey with 10 or more streamflow measurements. The regression analyses used to develop equations for the selected streamflow statistics tested the relations between the flow-duration and low-flow frequency statistics and 32 basin characteristics (physical characteristics, land use, surficial geology, and climate) at the 41 streamgages and 167 low-flow partial-record stations. The regression analyses determined that drainage area, soil permeability, average April precipitation, average June precipitation, and percent storage (water bodies and wetlands) were the significant explanatory variables for estimating the selected flow-duration and low-flow frequency statistics. For each selected station, streamflow estimates were computed with data collected through water year 2008 for two land- and water-use conditions in New Jersey: land and water use during the baseline period of record (defined as the years a streamgage had little to no change in development and water use), and current land- and water-use conditions (1989–2008). The baseline period of record is representative of a period when the basin was unaffected by change in development. The current period is
Validation of resting metabolic rate prediction equations for teenagers

Directory of Open Access Journals (Sweden)

Paulo Henrique Santos da Fonseca

2007-09-01

The resting metabolic rate (RMR) can be defined as the minimum rate of energy expenditure and represents the main component of total energy expenditure. The purpose of this study was to validate equations for predicting the resting metabolic rate in teenagers (103 individuals, 51 girls and 52 boys, aged 10 to 17 years) from Florianópolis, SC, Brazil. Body weight, body height, and skinfolds were measured, and lean and fat mass were obtained through bioimpedance. The nonprotein RMR was determined with Weir's equation (1949), using an AeroSport TEEM-100 gas analyzer. The equations studied were: Harris and Benedict (1919), Schofield (1985), WHO/FAO/UNU (1985), Henry and Rees (1991), Molnár et al. (1998), Tverskaya et al. (1998) and Müller et al. (2004). To cross-validate the RMR prediction equations against the standard measure (Weir 1949), the following statistical procedures were used: Pearson's correlation (r ≥ 0.70), the t test with a significance level of p … 0.05 in relation to the standard measure, with the exception of the equations suggested by Tverskaya et al. (1998) and the two models of Müller et al. (2004). Even though there was no significant difference, only the models proposed by Henry and Rees (1991) and Molnár et al. (1995) had a constant error variation under 5%. In girls, none of the equations analyzed reached the criterion correlation of 0.70 with indirect calorimetry. In boys, all prediction equations had moderate correlation coefficients with indirect calorimetry, but below 0.70. Only the equation developed by Tverskaya et al. (1998) presented differences (p …
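The reference measurement above relies on Weir's equation. In its commonly cited nonprotein (abbreviated) form, energy expenditure in kcal/min is 3.941·VO₂ + 1.106·VCO₂ with gas exchange in L/min, which the sketch scales to kcal/day. The function and constant names are hypothetical, and the gas analyzer's own processing is not modelled.

```python
KCAL_TO_KJ = 4.184
MIN_PER_DAY = 1440

def weir_nonprotein_ee(vo2_l_min, vco2_l_min):
    """Nonprotein (abbreviated) Weir equation: energy expenditure in kcal/day.

    EE (kcal/min) = 3.941 * VO2 + 1.106 * VCO2, with gas volumes in L/min;
    the urinary-nitrogen term of the full equation is omitted.
    """
    kcal_per_min = 3.941 * vo2_l_min + 1.106 * vco2_l_min
    return kcal_per_min * MIN_PER_DAY

# Hypothetical resting gas exchange: VO2 = 0.25 L/min, VCO2 = 0.20 L/min
ee_kcal = weir_nonprotein_ee(0.25, 0.20)
print(ee_kcal, ee_kcal * KCAL_TO_KJ)
```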
OR25: Validity of predictive equations for resting energy expenditure for overweight older adults with and without diabetes

NARCIS (Netherlands)

Verreijen, A. M.; Garrido, V.; Engberink, M.F.; Memelink, R. G.; Visser, M.; Weijs, P. J.

2017-01-01

Rationale: Predictive equations for resting energy expenditure (REE) are used in the treatment of overweight and obesity, but the validity of these equations in overweight older adults is unknown. This study evaluates which predictive REE equation is the best alternative to indirect calorimetry in
Validity of a population-specific BMR predictive equation for adults from an urban tropical setting.

Science.gov (United States)

Wahrlich, Vivian; Teixeira, Tatiana Miliante; Anjos, Luiz Antonio Dos

2018-02-01

Basal metabolic rate (BMR) is an important physiologic measure in nutrition research. In many instances it is not measured but estimated by predictive equations. The purpose of this study was to compare measured BMR (BMRm) with estimated BMR (BMRe) obtained by different equations. A convenience sample of 148 subjects (89 women) aged 20-60 years from the metropolitan area of Rio de Janeiro, Brazil, participated in the study. BMRm values were measured by an indirect calorimeter, and BMRe values were predicted by different equations (Schofield, Henry and Rees, Mifflin-St. Jeor, and Anjos). Body composition and anthropometric variables were also measured in all subjects. Accuracy of the estimates was established by the percentage of BMRe falling within ±10% of BMRm, and bias was declared when the 95% CI of the difference between BMRe and BMRm means did not include zero. Mean BMRm values were 4833.5 (SD 583.3) and 6278.8 (SD 724.0) kJ·day⁻¹ for women and men, respectively. BMRe values were both biased and inaccurate except for values predicted by the Anjos equation. BMR overestimation was approximately 20% for the Schofield equation, higher than for the Henry and Rees (14.5% and 9.6% for women and men, respectively) and Mifflin-St. Jeor (approximately 14.0%) equations. BMR estimated by the Anjos equation was unbiased (95% CI = −78.1 to 96.3 kJ·day⁻¹ for women and −282.6 to 30.7 kJ·day⁻¹ for men). Population-specific BMR predictive equations yield unbiased and accurate BMR values in adults from an urban tropical setting. Copyright © 2016 Elsevier Ltd and European Society for Clinical Nutrition and Metabolism. All rights reserved.
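Two pieces of the protocol are easy to sketch: one of the compared equations (Mifflin-St Jeor, which gives kcal/day from weight in kg, height in cm and age in years) and the accuracy criterion of BMRe falling within ±10% of BMRm. The function names are hypothetical, and the Anjos, Schofield and Henry and Rees coefficients are deliberately not reproduced. Multiply kcal by 4.184 to compare with the kJ·day⁻¹ values above.

```python
def mifflin_st_jeor(weight_kg, height_cm, age_y, sex):
    """Mifflin-St Jeor resting energy estimate in kcal/day."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    offset = 5.0 if sex == "male" else -161.0
    return 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_y + offset

def share_within(predicted, measured, tol=0.10):
    """Fraction of predictions within +/- tol (relative) of the measured values."""
    hits = sum(1 for p, m in zip(predicted, measured) if abs(p - m) <= tol * m)
    return hits / len(measured)

# Hypothetical subjects, not from the study
print(mifflin_st_jeor(70, 175, 30, "male"), mifflin_st_jeor(60, 165, 40, "female"))
```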
Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions.

Science.gov (United States)

Rativa, Diego; Fernandes, Bruno J T; Roque, Alexandre

2018-01-01

Height and weight are measurements used to track nutritional diseases, energy expenditure, and clinical conditions, and to determine drug dosages and infusion rates. Many patients are not ambulant or may be unable to communicate, and a combination of these factors may not allow accurate estimation or measurement; in those cases, height and weight can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations whose coefficients are obtained by single or multiple linear regression. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than those obtained with conventional linear regressions. In all cases, the predictions are insensitive to ethnicity, and to gender if more than two anthropometric parameters are analyzed. The learning-model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care.
A review of a priori regression models for warfarin maintenance dose prediction.

Directory of Open Access Journals (Sweden)

Ben Francis

A number of a priori warfarin dosing algorithms, derived using linear regression methods, have been proposed. Although these dosing algorithms may have been validated using patients from the same centre, they have rarely been validated using a patient cohort recruited from another centre. To undertake external validation, two cohorts were used: one formed by patients from a prospective trial, and the second formed by patients in the control arm of the EU-PACT trial. Of these, 641 patients were identified as having attained stable dosing and formed the validation dataset. Predicted maintenance doses from six regression models that fulfilled the selection criteria were then compared with each patient's stable warfarin dose. Predictive ability was assessed with several statistics, including R-squared and mean absolute error. The six regression models explained different amounts of variability in the stable maintenance warfarin dose requirements of the patients in the two validation cohorts; adjusted R-squared values ranged from 24.2% to 68.6%. An overview of the summary statistics showed that no single dosing algorithm could be considered optimal. The larger validation cohort from the prospective trial produced more consistent statistics across the six dosing algorithms. All the regression models performed worse in the validation cohort than in the derivation cohort. Further, there was little difference between regression models that contained pharmacogenetic coefficients and algorithms containing only non-pharmacogenetic coefficients. The inconsistency of results between the validation cohorts suggests that unaccounted-for population-specific factors cause variability in dosing algorithm performance. Better dosing methods that take into account inter- and intra-individual variability, in the initiation and maintenance phases of warfarin treatment, are needed.
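The two headline validation statistics can be computed as follows. The sketch uses R² = 1 − SSE/SST on the validation cohort, which can be negative when a model predicts worse than the cohort mean; the review may instead report the squared correlation or the adjusted R², so treat this definition as an assumption. The function name and the dose values in the example are made up.

```python
def validation_metrics(observed, predicted):
    """Return (MAE, R^2) of predictions on a validation cohort.

    R^2 = 1 - SSE/SST, which is negative when the model is worse than the cohort mean.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return mae, 1.0 - sse / sst

# Hypothetical stable weekly doses (mg) and algorithm predictions
mae, r2 = validation_metrics([2, 4, 6, 8], [3, 4, 5, 8])
```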
A review of a priori regression models for warfarin maintenance dose prediction.

Science.gov (United States)

Francis, Ben; Lane, Steven; Pirmohamed, Munir; Jorgensen, Andrea

2014-01-01

Multivariate research in areas of phosphorus cast-iron brake shoes manufacturing using the statistical analysis and the multiple regression equations

Science.gov (United States)

Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.

2017-05-01

The braking system is one of the most important and complex subsystems of railway vehicles, especially with regard to safety. Installing efficient, safe brakes on modern railway vehicles is therefore essential. Attention is now devoted to problems connected with the use of high-performance brake materials and their impact on the thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, because the interaction between the brake block and the wheel produces complex thermo-mechanical phenomena. In this work, the subjects investigated are cast-iron brake shoes, which are still widely used on freight wagons. Cast-iron brake shoes with lamellar graphite and a high phosphorus content (0.8-1.1%) therefore need special investigation. To establish the optimal conditions for the cast-iron brake shoes, we propose a mathematical modelling study using statistical analysis and multiple regression equations. Multivariate research is important in cast-iron brake shoe manufacturing because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulty comprehending many dimensions at once. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. To establish the multiple correlation between the hardness of the cast-iron brake shoes and the chemical composition elements, several types of regression equations were proposed. Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representations of the regression equations to explain the interaction of the variables and locate the optimal level of each variable for
Generalized multidimensional propagation velocity equations for pool-boiling superconducting windings

International Nuclear Information System (INIS)

Christensen, E.H.; O'Loughlin, J.M.

1984-09-01

Several detailed finite difference and finite element analyses of propagation velocities in up to three dimensions in pool-boiling windings have been conducted for different electromagnetic and cryogenic environments. Likewise, a few full-scale simulated winding and magnet tests have measured propagation velocities. These velocity data have been correlated in terms of winding thermophysical parameters. This analysis expresses longitudinal and transverse propagation velocities in the form of power function regression equations for a wide variety of windings and electromagnetic and thermohydraulic environments. The generalized velocity equations are considered applicable to well-ventilated, monolithic conductor windings. These design equations are used piecewise, in a gross finite difference mode, as functions of field to predict the rate of normal zone growth during quench conditions. A further check of the validity of these predictions is available through total predicted quench durations correlated with actual quench durations of large magnets.
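A power-function regression of the kind described, v = a·x₁^b₁·x₂^b₂·…, is usually fitted as a linear regression on logarithms. The sketch below does that with NumPy; the function name is hypothetical, the choice of regressors is left to the caller because the abstract does not list the winding parameters used, and the synthetic check data are made up. Fitting in log space minimizes relative rather than absolute error, which may or may not match the original correlation procedure.

```python
import numpy as np

def fit_power_law(X, v):
    """Fit v = a * x1**b1 * x2**b2 * ... by least squares on logarithms.

    Returns (a, exponents). All data must be strictly positive.
    """
    X = np.asarray(X, dtype=float)
    v = np.asarray(v, dtype=float)
    if X.ndim == 1:                               # single regressor given as a 1-D array
        X = X[:, None]
    if np.any(X <= 0) or np.any(v <= 0):
        raise ValueError("power-law fit needs strictly positive data")
    A = np.column_stack([np.ones(len(v)), np.log(X)])   # ln v = ln a + sum b_i ln x_i
    coef, *_ = np.linalg.lstsq(A, np.log(v), rcond=None)
    return float(np.exp(coef[0])), coef[1:]

# Synthetic check: v = 3 * x1^0.5 * x2^-1.2 (hypothetical regressors)
x1 = np.array([1.0, 2.0, 4.0, 8.0, 3.0])
x2 = np.array([2.0, 1.0, 3.0, 5.0, 7.0])
a, b = fit_power_law(np.column_stack([x1, x2]), 3.0 * x1**0.5 * x2**-1.2)
```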
Are the general equations to predict BMR applicable to patients with anorexia nervosa?

Science.gov (United States)

Marra, M; Polito, A; De Filippo, E; Cuzzolaro, M; Ciarapica, D; Contaldo, F; Scalfi, L

2002-03-01

To determine whether the general equations for predicting basal metabolic rate (BMR) can be reliably applied to female anorectics. Two hundred and thirty-seven female patients with anorexia nervosa (AN) were divided into an adolescent group [n = 43, 13-17 yrs, 39.3 ± 5.0 kg, body mass index (BMI, weight/height²) 15.5 ± 1.8 kg/m²] and a young-adult group (n = 194, 18-40 yrs, 40.5 ± 6.1 kg, BMI 15.6 ± 1.9 kg/m²). BMR values determined by indirect calorimetry were compared with those predicted by either the WHO/FAO/UNU or the Harris-Benedict general equations, or by the Schebendach correction formula (proposed for adjusting the Harris-Benedict estimates in anorectics). Measured BMR was 3,658 ± 665 kJ/day in the adolescent and 3,907 ± 760 kJ/day in the young-adult patients. In the adolescent group, the differences between predicted and measured values (mean ± SD) were 1,466 ± 529 kJ/day (+44 ± 21%) for WHO/FAO/UNU, 1,587 ± 552 kJ/day (+47 ± 23%) for Harris-Benedict, and −20 ± 510 kJ/day (+1 ± 13%) for Schebendach, while in the young-adult group the corresponding values were 696 ± 570 kJ/day (+24 ± 24%), 1,252 ± 644 kJ/day (+37 ± 27%), and −430 ± 640 kJ/day (−9 ± 16%). The bias was negatively associated with weight and BMI in both groups when using the WHO/FAO/UNU and Harris-Benedict equations, and with age in the young-adult group for the Harris-Benedict and Schebendach equations. The WHO/FAO/UNU and Harris-Benedict equations greatly overestimate BMR in AN. Accurate estimation depends to some extent on individual characteristics such as age, weight, or BMI. The Schebendach correction formula accurately predicts BMR in female adolescents, but not in young adult women with AN.
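For reference, the original Harris-Benedict equation for women gives BMR in kcal/day as 655.0955 + 9.5634·W + 1.8496·H − 4.6756·A (W in kg, H in cm, A in years). The sketch below pairs it with the percent-bias measure used in the abstract, (predicted − measured)/measured × 100. Function names are hypothetical, and the WHO/FAO/UNU and Schebendach formulas are not reproduced.

```python
KCAL_TO_KJ = 4.184

def harris_benedict_female(weight_kg, height_cm, age_y):
    """Original (1919) Harris-Benedict BMR for women, in kcal/day."""
    return 655.0955 + 9.5634 * weight_kg + 1.8496 * height_cm - 4.6756 * age_y

def percent_bias(predicted, measured):
    """(predicted - measured) / measured, expressed in percent."""
    return 100.0 * (predicted - measured) / measured

# Hypothetical patient, not from the study: 40 kg, 160 cm, 20 years,
# compared against the young-adult mean measured BMR quoted above (3,907 kJ/day)
pred_kj = harris_benedict_female(40, 160, 20) * KCAL_TO_KJ
print(pred_kj, percent_bias(pred_kj, 3907.0))
```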
A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction.

Science.gov (United States)

Qiu, Shibin; Lane, Terran

2009-01-01

The cell defense mechanism of RNA interference has applications in gene function analysis and promising potential in human disease therapy. To effectively silence a target gene, it is desirable to select initiator siRNA molecules with satisfactory silencing capability. Computational prediction of siRNA silencing efficacy can assist this screening process before the molecules are used in biological experiments. String kernel functions, which operate directly on the strings representing siRNAs and target mRNAs, have been applied to support vector regression for this prediction and have improved accuracy over numerical kernels in multidimensional vector spaces constructed from descriptors of siRNA design rules. To fully use the information provided by string and numerical data, we propose to unify the two in a kernel feature space by devising a multiple kernel regression framework in which a linear combination of the kernels is used. We formulate multiple kernel learning as a quadratically constrained quadratic programming (QCQP) problem, which, although it yields a globally optimal solution, is computationally demanding and requires a commercial solver package. We further propose three heuristics based on the principle of kernel-target alignment and predictive accuracy. Empirical results demonstrate that multiple kernel regression can improve accuracy, decrease model complexity by reducing the number of support vectors, and speed up computation dramatically. In addition, multiple kernel regression evaluates the importance of the constituent kernels, which, for the siRNA efficacy prediction problem, indicates the relative significance of the design rules. Finally, we give insights into the multiple kernel regression mechanism and point out possible extensions.
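One simple way to turn kernel-target alignment into kernel weights is sketched below: compute the alignment of each precomputed Gram matrix with the target kernel yyᵀ, clip negative values, and normalize. This is an illustrative heuristic under that assumption, not necessarily any of the paper's three heuristics; it uses the uncentred alignment, and centred variants are often preferred. Function names are hypothetical.

```python
import numpy as np

def alignment(K1, K2):
    """Empirical (uncentred) alignment <K1, K2>_F / sqrt(<K1, K1>_F * <K2, K2>_F)."""
    num = np.sum(K1 * K2)
    return float(num / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2)))

def alignment_weights(kernels, y):
    """Weights proportional to each kernel's (clipped) alignment with the target kernel y y^T."""
    y = np.asarray(y, dtype=float)
    target = np.outer(y, y)
    a = np.array([max(alignment(K, target), 0.0) for K in kernels])
    if a.sum() == 0.0:                        # no kernel aligned: fall back to equal weights
        return np.full(len(kernels), 1.0 / len(kernels))
    return a / a.sum()

def combine_kernels(kernels, weights):
    """Linear combination sum_m w_m K_m of precomputed Gram matrices."""
    return sum(w * K for w, K in zip(weights, kernels))
```

The combined matrix could then be passed to a support vector regressor that accepts precomputed kernels, such as scikit-learn's `SVR(kernel="precomputed")`.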
A review of logistic regression models used to predict post-fire tree mortality of western North American conifers

Science.gov (United States)

Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen Fitzgerald

2012-01-01

Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed burns, and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
Predicting smear negative pulmonary tuberculosis with classification trees and logistic regression: a cross-sectional study

Directory of Open Access Journals (Sweden)

Kritski Afrânio

2006-02-01

Background: Smear negative pulmonary tuberculosis (SNPT) accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods: The study enrolled 551 patients with clinical-radiological suspicion of SNPT in Rio de Janeiro, Brazil. The original data were divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs, and chest X-rays were used to construct logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operating characteristic curve, sensitivity, and specificity were used to evaluate the models' performance in both the generation and validation samples. Results: It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion: The results suggest that these models might be useful as screening tools for estimating the risk of SNPT, optimizing the use of more expensive tests, and avoiding the costs of unnecessary anti-tuberculosis treatment. These models might be cost-effective tools in a health care network with a hierarchical distribution of scarce resources.
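Sensitivity, specificity and the area under the ROC curve used to evaluate the models can be computed as sketched below; the AUC uses the Mann-Whitney formulation (the probability that a random positive case scores higher than a random negative one, ties counting one half). The function names and the example data are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Binary labels (1 = SNPT, 0 = not SNPT); returns (sensitivity, specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores_pos, scores_neg):
    """ROC AUC as P(score of a positive > score of a negative), ties counting 1/2."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

sens, spec = sensitivity_specificity([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1, 0])
```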
Validation of predictive equations for glomerular filtration rate in the Saudi population

Directory of Open Access Journals (Sweden)

Al Wakeel Jamal

2009-01-01

Predictive equations provide a rapid method of assessing glomerular filtration rate (GFR). To compare the various predictive equations for this parameter in the Saudi population, we assessed GFR using the Modification of Diet in Renal Disease (MDRD) and Cockcroft-Gault formulas, cystatin C, the reciprocal of cystatin C, creatinine clearance, the reciprocal of creatinine, and inulin clearance in 32 Saudi subjects with different stages of renal disease. We compared GFR measured by inulin clearance with the GFR estimated by the equations. The study included 19 males (59.4%) and 13 females (40.6%) with a mean age of 42.3 ± 15.2 years and a mean weight of 68.6 ± 17.7 kg. The mean serum creatinine was 199 ± 161 μmol/L. The GFR measured by inulin clearance was 50.9 ± 33.5 mL/min, and the GFR estimated by the Cockcroft-Gault and MDRD equations was 56.3 ± 33.3 and 52.8 ± 32.0 mL/min, respectively. The GFR estimated by MDRD showed the strongest correlation with the measured inulin clearance (r = 0.976, P < 0.0001), followed by the GFR estimated by Cockcroft-Gault (r = 0.953, P < 0.0001), serum cystatin C (r = 0.787, P = 0.0001), and serum creatinine (r = −0.678, P = 0.001). The reciprocals of cystatin C and serum creatinine showed correlation coefficients of 0.826 and 0.93, respectively. The Cockcroft-Gault formula overestimated the GFR by 5.40 ± 10.3 mL/min in comparison with the MDRD formula, which exhibited the best correlation with inulin clearance across genders, age groups, body mass index categories, renal transplant recipients, and chronic kidney disease stages when compared with the other GFR predictive equations.
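Of the equations compared, Cockcroft-Gault is short enough to show in full: creatinine clearance (mL/min) = (140 − age) × weight (kg) / (72 × serum creatinine in mg/dL), multiplied by 0.85 for women. Because the abstract reports serum creatinine in μmol/L, the sketch converts with 1 mg/dL = 88.4 μmol/L. The function name is hypothetical, and MDRD is omitted because its coefficients depend on the creatinine assay calibration.

```python
UMOL_PER_MG_DL = 88.4   # serum creatinine: 1 mg/dL = 88.4 umol/L

def cockcroft_gault(age_y, weight_kg, scr_umol_l, female):
    """Cockcroft-Gault creatinine clearance estimate in mL/min (not BSA-normalized)."""
    scr_mg_dl = scr_umol_l / UMOL_PER_MG_DL
    crcl = (140 - age_y) * weight_kg / (72 * scr_mg_dl)
    return crcl * 0.85 if female else crcl

# Hypothetical subject, not from the study: 40 years, 72 kg, creatinine 88.4 umol/L
print(cockcroft_gault(40, 72, 88.4, female=False), cockcroft_gault(40, 72, 88.4, female=True))
```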
Optimal level of continuous positive airway pressure: auto-adjusting titration versus titration with a predictive equation.

Science.gov (United States)

Choi, Ji Ho; Jun, Young Joon; Oh, Jeong In; Jung, Jong Yoon; Hwang, Gyu Ho; Kwon, Soon Young; Lee, Heung Man; Kim, Tae Hoon; Lee, Sang Hag; Lee, Seung Hoon

2013-05-01

The aims of the present study were twofold. We sought to compare two methods of titrating the level of continuous positive airway pressure (CPAP), auto-adjusting titration and titration using a predictive equation, with full-night manual titration used as the benchmark. We also investigated the reliability of the two methods in patients with obstructive sleep apnea syndrome (OSAS). Twenty consecutive adult patients with OSAS who had successful full-night manual and auto-adjusting CPAP titration participated in this study. The titration pressure level was calculated with a previously developed predictive equation based on body mass index and apnea-hypopnea index. The mean titration pressure levels obtained with the manual, auto-adjusting, and predictive equation methods were 9.0 ± 3.6, 9.4 ± 3.0, and 8.1 ± 1.6 cm H₂O, respectively. Concordance with full-night manual titration within ±2 cm H₂O differed significantly between auto-adjusting titration and titration using the predictive equation (p = 0.019). However, there was no significant difference in concordance within ±1 cm H₂O (p > 0.999). When full-night manual titration is taken as the standard method, auto-adjusting titration appears to be more reliable than a predictive equation for determining the optimal CPAP level in patients with OSAS.
Recursive Algorithm For Linear Regression

Science.gov (United States)

Varanasi, S. V.

1988-01-01

The order of the model is determined easily. The linear-regression algorithm includes recursive equations for the coefficients of a model of increased order. The algorithm eliminates duplicative calculations and facilitates the search for the minimum order of a linear-regression model that fits a set of data satisfactorily.
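One standard way to grow a least-squares model one regressor at a time without redoing earlier work is incremental (modified) Gram-Schmidt orthogonalization. The sketch below uses that technique as a stand-in; it is not claimed to be Varanasi's recursion. It assumes the columns of `X` are ordered by increasing model order and are linearly independent, and the function name and tolerance are hypothetical.

```python
import numpy as np


def min_order_fit(X, y, rss_tol):
    """Fit y ~ X[:, :k] for k = 1, 2, ... and stop at the smallest k whose
    residual sum of squares is <= rss_tol. Each new column is orthogonalized
    against the earlier ones (modified Gram-Schmidt), so lower-order work is
    reused rather than recomputed. Assumes full column rank."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Q = np.zeros((n, p))
    R = np.zeros((p, p))
    resid = y.copy()
    for k in range(p):
        v = X[:, k].copy()
        for j in range(k):
            R[j, k] = Q[:, j] @ v
            v -= R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(v)
        Q[:, k] = v / R[k, k]
        # Remove y's component along the new orthonormal direction;
        # the RSS drops by the square of that component.
        resid -= (Q[:, k] @ resid) * Q[:, k]
        rss = resid @ resid
        if rss <= rss_tol:
            break
    order = k + 1
    # Back-substitute R b = Q^T y for the coefficients of the chosen order.
    coef = np.linalg.solve(R[:order, :order], Q[:, :order].T @ y)
    return order, coef, rss
```

Because each new direction is orthogonal to the previous ones, raising the order only costs one extra column update, which mirrors the "no duplicative calculations" idea described above.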
Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression

KAUST Repository

Abdul Jameel, Abdul Gani; Naser, Nimal; Emwas, Abdul-Hamid M.; Dooley, Stephen; Sarathy, Mani

2016-01-01

An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN
Lean body mass-based standardized uptake value, derived from a predictive equation, might be misleading in PET studies

International Nuclear Information System (INIS)

Erselcan, Taner; Turgut, Bulent; Dogan, Derya; Ozdemir, Semra

2002-01-01

The standardized uptake value (SUV) has gained recognition in recent years as a semiquantitative evaluation parameter in positron emission tomography (PET) studies. However, there is as yet no consensus on the way in which this index should be determined. One of the confusing factors is the normalisation procedure. Among the proposed anthropometric parameters for normalisation is lean body mass (LBM); LBM has been determined by using a predictive equation in most if not all of the studies. In the present study, we assessed the degree of agreement of various LBM predictive equations with a reference method. Secondly, we evaluated the impact of predicted LBM values on a hypothetical value of 2.5 SUV, normalised to LBM (SUV_LBM), by using various equations. The study population consisted of 153 women, aged 32.3±11.8 years (mean±SD), with a height of 1.61±0.06 m, a weight of 71.1±17.5 kg, a body surface area of 1.77±0.22 m² and a body mass index of 27.6±6.9 kg/m². LBM (44.2±6.6 kg) was measured by a dual-energy X-ray absorptiometry (DEXA) method. A total of nine equations from the literature were evaluated, four of them from recent PET studies. Although there was significant correlation between predicted and measured LBM values, 95% limits of agreement determined by the Bland and Altman method showed a wide range of variation in predicted LBM values as compared with DEXA, no matter which predictive equation was used. Moreover, only one predictive equation was not statistically different in the comparison of means (DEXA and predicted LBM values). It was also shown that the predictive equations used in this study yield a wide range of SUV_LBM values from 1.78 to 5.16 (29% less or 107% more) for an SUV of 2.5. In conclusion, this study suggests that estimation of LBM by use of a predictive equation may cause substantial error for an individual, and that if LBM is chosen for the SUV normalisation procedure, it should be measured, not predicted. (orig.)
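The two calculations behind this study can be sketched briefly: Bland-Altman 95% limits of agreement (mean difference ± 1.96 SD of the differences), and the rescaling of SUV_LBM when a predicted LBM replaces the measured one. Since SUV_LBM is proportional to the LBM used for normalisation, the value scales by the predicted/measured ratio. This is a minimal sketch; the function names and example numbers are hypothetical.

```python
import statistics


def bland_altman(reference, predicted):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    diffs = [p - r for r, p in zip(reference, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def suv_lbm_with_predicted_lbm(suv_lbm_true, lbm_measured, lbm_predicted):
    """SUV normalised to LBM is proportional to the LBM used, so substituting
    a predicted LBM rescales the value by predicted / measured."""
    return suv_lbm_true * lbm_predicted / lbm_measured


# Hypothetical LBM values (kg): DEXA vs one predictive equation.
print(bland_altman([40, 45, 50], [42, 44, 53]))
# A predicted LBM 20% below the measured value turns SUV_LBM 2.5 into about 2.0.
print(suv_lbm_with_predicted_lbm(2.5, 44.2, 44.2 * 0.8))
```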
Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research

Science.gov (United States)

He, Lingjun; Levine, Richard A.; Fan, Juanjuan; Beemer, Joshua; Stronach, Jeanne

2018-01-01

In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of…

Forecasting municipal solid waste generation using prognostic tools and regression analysis.

Science.gov (United States)

Ghinea, Cristina; Drăgoi, Elena Niculina; Comăniţă, Elena-Diana; Gavrilescu, Marius; Câmpean, Teofil; Curteanu, Silvia; Gavrilescu, Maria

2016-11-01

For adequate planning of waste management systems, an accurate forecast of waste generation is an essential step, since various factors can affect waste trends. Predictive and prognostic models are useful tools that provide reliable support for decision-making processes. In this paper, indicators such as the number of residents, population age, urban life expectancy, and total municipal solid waste were used as input variables in prognostic models to predict the amounts of solid waste fractions. We applied the Waste Prognostic Tool, regression analysis, and time series analysis to forecast municipal solid waste generation and composition, using Iasi, Romania, as a case study. Regression equations were determined for six solid waste fractions (paper, plastic, metal, glass, biodegradable, and other waste). Accuracy measures were calculated, and the results showed that the S-curve trend model is the most suitable for municipal solid waste (MSW) prediction. Copyright © 2016 Elsevier Ltd. All rights reserved.
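The abstract does not list its accuracy measures; a common trio for comparing trend models is MAPE, MAD, and MSD. The sketch below assumes those three definitions and is not taken from the paper; the function name and example values are hypothetical.

```python
def accuracy_measures(actual, forecast):
    """MAPE (%), MAD, and MSD of a forecast against observed values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mad = sum(abs(e) for e in errors) / n
    msd = sum(e * e for e in errors) / n
    return {"MAPE": mape, "MAD": mad, "MSD": msd}


# Hypothetical yearly MSW amounts (t) and trend-model forecasts.
print(accuracy_measures([100, 200], [110, 190]))
```

Lower values on all three indicate a better-fitting trend model; MSD penalizes large misses more heavily than MAD.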
Generic global regression models for growth prediction of Salmonella in ground pork and pork cuts

DEFF Research Database (Denmark)

Buschhardt, Tasja; Hansen, Tina Beck; Bahl, Martin Iain

2017-01-01

Introduction and objectives: Models for the prediction of bacterial growth in fresh pork are primarily developed using two-step regression (i.e. primary models followed by secondary models). These models are also generally based on experiments in liquids or ground meat and neglect surface growth....... It has been shown that one-step global regressions can result in more accurate models and that bacterial growth on intact surfaces can differ substantially from growth in liquid culture. Material and methods: We used a global-regression approach to develop predictive models for the growth of Salmonella....... One part of the obtained log-transformed cell counts was used for model development and another part for model validation. The Ratkowsky square root model and the relative lag time (RLT) model were integrated into the logistic model with delay. Fitted parameter estimates were compared to investigate the effect...
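A one-step global model nests a secondary model inside a primary one. The sketch below shows that nesting with the Ratkowsky square root model, sqrt(mu_max) = b (T - Tmin), feeding a logistic growth curve with a delay (lag). The logistic form used here is one common parameterization, assumed rather than taken from the paper. The RLT lag model is not reproduced, so the lag is passed in directly. All parameter values are hypothetical.

```python
import math


def ratkowsky_mu_max(temp_c, b, t_min_c):
    """Ratkowsky square-root model: sqrt(mu_max) = b * (T - Tmin) for T > Tmin."""
    if temp_c <= t_min_c:
        return 0.0
    return (b * (temp_c - t_min_c)) ** 2


def logistic_with_delay(t_h, log10_n0, log10_nmax, mu_max, lag_h):
    """log10 cell count: flat until the lag, then logistic growth toward
    log10_nmax at specific rate mu_max (natural-log units per hour)."""
    if t_h <= lag_h:
        return log10_n0
    ratio = 10 ** (log10_nmax - log10_n0) - 1.0
    return log10_nmax - math.log10(1.0 + ratio * math.exp(-mu_max * (t_h - lag_h)))


# Hypothetical storage at 20 C with b = 0.03, Tmin = 5 C, 10 h lag.
mu = ratkowsky_mu_max(20.0, 0.03, 5.0)
print([round(logistic_with_delay(t, 3.0, 9.0, mu, 10.0), 2) for t in (0, 10, 30, 100)])
```

In a global regression, `b`, `t_min_c`, the lag parameters, and `log10_nmax` would be estimated jointly from all growth curves instead of curve by curve.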
Prediction of kindergarteners' behavior on Metropolitan Readiness Tests from preschool perceptual and perceptual-motor performances: a validation study.

Science.gov (United States)

Belka, D E

1981-06-01

Multiple regression equations were generated to predict cognitive achievement for 40 children (ages 57 to 68 mo.) 1 yr. after administration of a battery of 6 perceptual and perceptual-motor tests, to determine whether previous results from Toledo could be replicated. Regression equations generated with maximum R² improvement techniques indicated that prekindergarten performance is useful for predicting cognitive performance 1 yr. later, at the end of kindergarten (total score, and total score without the copying subtest, on the Metropolitan Readiness Tests). The optimal battery included scores on auditory perception, fine perceptual-motor, and gross perceptual-motor tasks. The moderate predictive power of the equations obtained was compared with the high predictive power obtained in the Toledo study.
Predicting company growth using logistic regression and neural networks

Directory of Open Access Journals (Sweden)

Marijana Zekić-Sušac

2016-12-01

The paper aims to establish an efficient model for predicting company growth by leveraging the strengths of logistic regression and neural networks. A real dataset of Croatian companies was used, describing the relevant industry sector, financial ratios, income, and assets in the input space, with a binary dependent variable indicating whether a company was high-growth, defined as annualized asset growth of more than 20% a year over a three-year period. Due to the large number of input variables, factor analysis was performed in the pre-processing stage to extract the most important input components. Building an efficient model with a high classification rate and explanatory ability required the application of two data mining methods: logistic regression as a parametric method and neural networks as a non-parametric method. The methods were tested on models with and without variable reduction. The classification accuracy of the models was compared using statistical tests and ROC curves. The results showed that neural networks produce a significantly higher classification accuracy in the model incorporating all available variables. The paper further discusses the advantages and disadvantages of both approaches, i.e. logistic regression and neural networks, in modelling company growth. The suggested model is potentially of benefit to investors and economic policy makers, as it provides support for recognizing companies with growth potential, especially during times of economic downturn.
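The binary target variable can be sketched as follows. The sketch assumes "annualized growth" means the compound (geometric) annual growth rate of assets over the three years; the paper may instead average yearly growth rates. Function names are hypothetical.

```python
def annualized_growth(assets_start, assets_end, years=3):
    """Compound annual growth rate of assets over the period."""
    return (assets_end / assets_start) ** (1.0 / years) - 1.0


def is_high_growth(assets_start, assets_end, years=3, threshold=0.20):
    """Binary label: True if annualized asset growth exceeds the threshold."""
    return annualized_growth(assets_start, assets_end, years) > threshold


# Assets must grow by a factor above 1.2**3 = 1.728 in three years to qualify.
print(is_high_growth(100.0, 180.0), is_high_growth(100.0, 170.0))
```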
Weighted linear regression using D²H and D² as the independent variables

Science.gov (United States)

Hans T. Schreuder; Michael S. Williams

1998-01-01

Several error structures for weighted regression equations used for predicting volume were examined for 2 large data sets of felled and standing loblolly pine trees (Pinus taeda L.). The generally accepted model with variance of error proportional to the value of the covariate squared (D²H = diameter squared times height, or D...
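The error structure named above (error variance proportional to the covariate squared) corresponds to weighted least squares with weights 1/(D²H)². A minimal sketch, assuming the simple volume model V = b0 + b1·D²H; the function name and data are hypothetical, and rescaling each row by the square root of its weight turns the problem into ordinary least squares.

```python
import numpy as np


def wls_volume_fit(d2h, volume):
    """Fit V = b0 + b1 * D2H by weighted least squares, assuming
    Var(error) is proportional to D2H**2, i.e. weights w = 1 / D2H**2."""
    x = np.asarray(d2h, dtype=float)
    v = np.asarray(volume, dtype=float)
    w = 1.0 / x ** 2
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) reduces WLS to an ordinary least-squares problem.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], v * sw, rcond=None)
    return coef  # array([b0, b1])
```

Down-weighting large trees this way keeps their larger absolute scatter from dominating the fit.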
Predictions of biochar production and torrefaction performance from sugarcane bagasse using interpolation and regression analysis.

Science.gov (United States)

Chen, Wei-Hsin; Hsu, Hung-Jen; Kumar, Gopalakrishnan; Budzianowski, Wojciech M; Ong, Hwai Chyuan

2017-12-01

This study focuses on the biochar formation and torrefaction performance of sugarcane bagasse, which are predicted using bilinear interpolation (BLI), inverse distance weighting (IDW) interpolation, and regression analysis. Biomass torrefied at 275 °C for 60 min, or at 300 °C for 30 min or longer, is found to be appropriate for producing biochar as an alternative fuel to coal with a low carbon footprint, but the energy yield from torrefaction at 300 °C is too low. Based on the biochar yield, the enhancement factor of HHV, and the energy yield, the results suggest that all three methods are feasible for predicting the performance, especially the enhancement factor. A power parameter of unity in the IDW method provides the best predictions, with errors below 5%. Second-order regression gives a more reasonable approach than first-order regression and is recommended for the predictions. Copyright © 2017 Elsevier Ltd. All rights reserved.
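The two interpolation schemes named above can be sketched over a (temperature, duration) grid as follows. This is a minimal sketch with hypothetical function names. It uses raw coordinates; because °C and minutes have different scales, one may prefer to normalize both axes before computing IDW distances.

```python
import math


def idw(points, values, query, power=1.0):
    """Inverse distance weighting; power=1.0 is the 'power parameter of unity'."""
    num = den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0.0:
            return v  # the query coincides with a sample point
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def bilinear(x, y, x0, x1, y0, y1, f00, f10, f01, f11):
    """Bilinear interpolation on [x0, x1] x [y0, y1]; fij is the value at (x_i, y_j)."""
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    return (f00 * (1 - tx) * (1 - ty) + f10 * tx * (1 - ty)
            + f01 * (1 - tx) * ty + f11 * tx * ty)
```

IDW uses every sample with distance-based weights, while BLI uses only the four grid corners surrounding the query, so BLI requires the experiments to lie on a regular temperature-time grid.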
Derivation of governing equation for predicting thermal conductivity of composites with spherical inclusions and its applications

International Nuclear Information System (INIS)

Lee, Jae-Kon; Kim, Jin-Gon

2011-01-01

A governing differential equation for predicting the effective thermal conductivity of composites with spherical inclusions is shown to be simply derived by using the result of the generalized self-consistent model. By applying the equation to composites including spherical inclusions such as graded spherical inclusions, microballoons, multiply-coated spheres, and spherical inclusions with an interphase, their effective thermal conductivities are easily predicted. The results are consistent with those reported in the literature. These investigations show that the effective thermal conductivity of composites with spherical inclusions can be estimated as long as the inclusion conductivities are expressed as a function of radius. -- Highlights: → We derive an equation for predicting the effective thermal conductivity of composites. → The equation is derived using the results of the generalized self-consistent model. → The inclusions are graded spheres, microballoons, and multiply-coated spheres.
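For homogeneous and layered spheres, a closely related classical route is the Maxwell (Hashin-Shtrikman) formula. A coated sphere is replaced, layer by layer from the core outward, by an equivalent homogeneous sphere, and the result is then embedded in the matrix. This is a swapped-in technique, not the authors' governing differential equation, and the function names and values are hypothetical.

```python
def maxwell_sphere(k_matrix, k_incl, phi):
    """Effective conductivity of a matrix holding spheres at volume
    fraction phi (Maxwell / Hashin-Shtrikman form)."""
    d = k_incl - k_matrix
    return k_matrix * (k_incl + 2 * k_matrix + 2 * phi * d) / (k_incl + 2 * k_matrix - phi * d)


def coated_sphere_equivalent(radii, conductivities):
    """Collapse a multiply-coated sphere into one equivalent homogeneous
    sphere, core first. radii[i] is the outer radius of layer i (increasing)."""
    k_eq = conductivities[0]
    for i in range(1, len(radii)):
        phi = (radii[i - 1] / radii[i]) ** 3  # inner sphere's share of layer i's sphere
        k_eq = maxwell_sphere(conductivities[i], k_eq, phi)
    return k_eq


# Hypothetical microballoon: air core (0.025) inside a glass shell (1.0), 20% by volume in a 0.2 matrix.
k_balloon = coated_sphere_equivalent([0.9, 1.0], [0.025, 1.0])
print(maxwell_sphere(0.2, k_balloon, 0.2))
```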
Daily Suspended Sediment Discharge Prediction Using Multiple Linear Regression and Artificial Neural Network

Science.gov (United States)

Uca; Toriman, Ekhwan; Jaafar, Othman; Maru, Rosmini; Arfan, Amal; Saleh Ahmar, Ansari

2018-01-01

Prediction of suspended sediment discharge in a catchment area is very important because it can be used to evaluate erosion hazard, manage water resources and water quality, manage hydrology projects (dams, reservoirs, and irrigation), and determine the extent of damage occurring in the catchment. Multiple linear regression analysis and artificial neural networks can be used to predict the amount of daily suspended sediment discharge. The regression analysis used the least squares method, whereas the artificial neural networks used a Radial Basis Function (RBF) network and a feedforward multilayer perceptron with three learning algorithms, namely Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG), and Broyden-Fletcher-Goldfarb-Shanno quasi-Newton (BFGS). The number of hidden-layer neurons ranged from three to sixteen, while the output layer had only one neuron because there is only one output target. Among the multiple linear regression (MLRg) models, Model 2 (six independent input variables) had the lowest mean absolute error (MAE) and root mean square error (RMSE) (0.0000002 and 13.6039) and the highest coefficient of determination (R²) and coefficient of efficiency (CE) (0.9971 and 0.9971). Compared with LM, SCG, and RBF, the BFGS model with structure 3-7-1 is better and more accurate for predicting suspended sediment discharge in the Jenderam catchment. In the testing process, it had the smallest MAE and RMSE (13.5769 and 17.9011) and the highest R² and CE (0.9999 and 0.9998) compared with the other BFGS quasi-Newton models (6-3-1, 9-10-1, and 12-12-1). Based on the performance statistics, MLRg, LM, SCG, BFGS, and RBF are suitable and accurate for prediction, modeling the non-linear complex behavior of suspended sediment responses to rainfall, water depth, and discharge. Comparing the artificial neural network (ANN) with MLRg, MLRg Model 2 accurately predicts suspended sediment discharge (kg
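The four performance statistics quoted above can be sketched as follows. The sketch assumes CE is the Nash-Sutcliffe coefficient of efficiency and R² is the squared Pearson correlation between observed and predicted values; definitions of R² vary between studies. The function name is hypothetical.

```python
import math


def error_metrics(observed, predicted):
    """MAE, RMSE, R^2 (squared Pearson r), and CE (Nash-Sutcliffe efficiency)."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_obs = sum((o - mean_o) ** 2 for o in observed)
    ce = 1.0 - sum(e * e for e in errors) / ss_obs
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov * cov / (ss_obs * ss_pred)
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "CE": ce}
```

Unlike R², CE also penalizes systematic bias: a prediction that is perfectly correlated with the observations but offset by a constant still scores R² = 1 and CE < 1.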
Area under the curve predictions of dalbavancin, a new lipoglycopeptide agent, using the end of intravenous infusion concentration data point by regression analyses such as linear, log-linear and power models.

Science.gov (United States)

Bhamidipati, Ravi Kanth; Syed, Muzeeb; Mullangi, Ramesh; Srinivas, Nuggehally

2018-02-01

1. Dalbavancin, a lipoglycopeptide, is approved for treating Gram-positive bacterial infections. The area under the plasma concentration versus time curve (AUC_inf) of dalbavancin is a key parameter, and the AUC_inf/MIC ratio is a critical pharmacodynamic marker. 2. Using the end-of-intravenous-infusion concentration (i.e. C_max), the C_max versus AUC_inf relationship for dalbavancin was established by regression analyses (i.e. linear, log-log, log-linear, and power models) using 21 pairs of subject data. 3. AUC_inf was predicted from published C_max data by applying the regression equations. The quotient of observed/predicted values gave the fold difference. The mean absolute error (MAE), root mean square error (RMSE), and correlation coefficient (r) were used in the assessment. 4. MAE and RMSE values for the various models were comparable. C_max versus AUC_inf exhibited excellent correlation (r > 0.9488). The internal data evaluation showed narrow confinement (0.84-1.14-fold difference) with a RMSE models predicted AUC_inf with a RMSE of 3.02-27.46% with fold difference largely contained within 0.64-1.48. 5. Regardless of the regression model, a single-time-point strategy using C_max (i.e. the end of a 30-min infusion) is amenable as a prospective tool for predicting the AUC_inf of dalbavancin in patients.
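The power-model variant and the fold-difference check can be sketched as follows. The sketch assumes AUC_inf = a·C_max^b fitted by ordinary least squares on log-transformed data, which gives the same functional form as the log-log model. The paper may have fitted the power model differently, for example by nonlinear least squares on the original scale. Function names and data are hypothetical.

```python
import math


def fit_power_model(cmax, auc):
    """Fit AUC = a * Cmax**b by least squares on log(AUC) vs log(Cmax)."""
    xs = [math.log(c) for c in cmax]
    ys = [math.log(v) for v in auc]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - b * mx)
    return a, b


def fold_difference(observed, predicted):
    """Observed / predicted ratio; values near 1 indicate good prediction."""
    return observed / predicted


a, b = fit_power_model([100, 200, 300, 400], [10 * c ** 1.2 for c in [100, 200, 300, 400]])
print(round(a, 3), round(b, 3), round(fold_difference(12000, a * 250 ** b), 2))
```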
A Seemingly Unrelated Poisson Regression Model

OpenAIRE

King, Gary

1989-01-01

This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.
Support vector regression to predict porosity and permeability: Effect of sample size

Science.gov (United States)

Al-Anazi, A. F.; Gates, I. D.

2012-02-01

Porosity and permeability are key petrophysical parameters obtained from laboratory core analysis. Cores, obtained from drilled wells, are often few in number for most oil and gas fields. Porosity and permeability correlations based on conventional techniques such as linear regression or neural networks trained with core and geophysical logs suffer poor generalization to wells with only geophysical logs. The generalization problem of correlation models often becomes pronounced when the training sample size is small. This is attributed to the underlying assumption that conventional techniques employing the empirical risk minimization (ERM) inductive principle converge asymptotically to the true risk values as the number of samples increases. In small sample size estimation problems, the available training samples must span the complexity of the parameter space so that the model is able both to match the available training samples reasonably well and to generalize to new data. This is achieved using the structural risk minimization (SRM) inductive principle by matching the capability of the model to the available training data. One method that uses SRM is the support vector regression (SVR) network. In this research, the capability of SVR to predict porosity and permeability in a heterogeneous sandstone reservoir under the effect of small sample size is evaluated. Particularly, the impact of Vapnik's ɛ-insensitivity loss function and least-modulus loss function on generalization performance was empirically investigated. The results are compared to the multilayer perceptron (MLP) neural network, a widely used regression method, which operates under the ERM principle. The mean square error and correlation coefficients were used to measure the quality of predictions. The results demonstrate that SVR yields consistently better predictions of the porosity and permeability with small sample size than the MLP method. Also, the performance of SVR depends on both kernel function
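The two loss functions compared in this study can be written down directly. Vapnik's ε-insensitive loss ignores residuals inside a tube of half-width ε, and the least-modulus (absolute) loss is its ε = 0 special case. A minimal sketch with hypothetical function names:

```python
def eps_insensitive_loss(residual, eps):
    """Vapnik's epsilon-insensitive loss: zero inside the tube |r| <= eps."""
    return max(0.0, abs(residual) - eps)


def least_modulus_loss(residual):
    """Least-modulus (absolute) loss, the eps = 0 special case."""
    return abs(residual)


def empirical_risk(residuals, loss):
    """Mean loss over the training residuals."""
    return sum(loss(r) for r in residuals) / len(residuals)


print(empirical_risk([0.05, -0.3, 0.2], lambda r: eps_insensitive_loss(r, 0.1)))
```

Residuals smaller than ε contribute nothing, which is what lets SVR keep only the support vectors in its solution.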
Evaluation of heat transfer mathematical models and multiple linear regression to predict the inside variables in semi-solar greenhouse

Directory of Open Access Journals (Sweden)

M Taki

2017-05-01

To measure the temperature and relative humidity of the air, soil, and roof inside and outside the greenhouse, SHT 11 sensors were used. The accuracy of the temperature measurement was ±0.4% at 20 °C, and the precision of the moisture measurement was ±3% for a clear sky. These sensors were placed in the soil, on the roof (inside the greenhouse), and in the air inside and outside the greenhouse to measure temperature and relative humidity. At a height of 1 m above the ground outside the greenhouse, a TES 1333 pyranometer was used. Its sensitivity was proportional to the cosine of the incidence angle of the radiation. It measures global solar radiation in the 400–1110 nm spectral band, with a measurement accuracy of approximately ±5%. Some heat transfer models used to predict the inside and roof temperatures are given by equations (1) and (5). Results and Discussion: Results showed that solar radiation on the roof of the semi-solar greenhouse was higher after noon, so this shape can receive large amounts of solar energy during a day. From a statistical point of view, both desired and predicted test data were analyzed to determine whether there are statistically significant differences between them. The null hypothesis assumes that the statistical parameters of both series are equal. A p value was used to check each hypothesis, with a threshold of 0.05. If the p value is greater than the threshold, the null hypothesis is not rejected. To check the differences between the data series, different tests were performed and a p value was calculated for each case. The t-test was used to compare the means of both series, assuming that the variances of both samples could be considered equal. The variance was analyzed using the F-test, assuming normally distributed samples. The results showed that the p values for the heat model in both statistical factors (comparison of means and variance) are lower than those of the regression model, and so the heat model did not
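The two tests described above (an equal-variance t-test on the means and an F-test on the variances, each judged against a 0.05 threshold) can be sketched with SciPy. This is a minimal sketch; the two-sided F-test p value is built from the F distribution's CDF and survival function, and the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def compare_series(measured, predicted, alpha=0.05):
    """Equal-variance t-test on means and two-sided F-test on variances."""
    a = np.asarray(measured, dtype=float)
    b = np.asarray(predicted, dtype=float)
    # Two-sample t-test assuming equal variances (comparison of means).
    _, p_mean = stats.ttest_ind(a, b, equal_var=True)
    # F-test for equality of variances (assumes normally distributed samples).
    f_stat = a.var(ddof=1) / b.var(ddof=1)
    dfn, dfd = a.size - 1, b.size - 1
    p_var = 2.0 * min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))
    return {"p_mean": float(p_mean), "p_var": float(p_var),
            "means_equal": bool(p_mean > alpha), "variances_equal": bool(p_var > alpha)}
```

A p value above the threshold means the test found no evidence of a difference between measured and predicted series. It does not prove that the two series are equal.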
Cell membrane temperature rate sensitivity predicted from the Nernst equation.

Science.gov (United States)

Barnes, F S

1984-01-01

A hyperpolarizing current is predicted from the Nernst equation for conditions of positive temperature derivatives with respect to time. This ion current, coupled with changes in membrane channel conductivities, is expected to contribute to a transient potential shift across the cell membrane for silent cells and to a change in firing rate for pacemaker cells.
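The temperature sensitivity follows from the Nernst equation E = (RT/zF) ln(c_out/c_in). At fixed concentrations, dE/dT = (R/zF) ln(c_out/c_in), and a heating rate dT/dt gives dE/dt = (dE/dT)(dT/dt). The sketch below assumes constant ion concentrations, and the K+ values are illustrative textbook-style numbers, not taken from the abstract.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_potential(c_out, c_in, z, temp_k):
    """Equilibrium potential (V) from the Nernst equation."""
    return (R * temp_k) / (z * F) * math.log(c_out / c_in)


def nernst_temperature_slope(c_out, c_in, z):
    """dE/dT (V/K) at fixed concentrations: R ln(c_out / c_in) / (z F)."""
    return R / (z * F) * math.log(c_out / c_in)


# Illustrative K+ concentrations (mM) at 310 K: E is roughly -89 mV and
# dE/dT roughly -0.29 mV/K, so warming makes E_K more negative.
e_k = nernst_potential(5.0, 140.0, 1, 310.0)
slope = nernst_temperature_slope(5.0, 140.0, 1)
heating_rate = 0.1  # K/s, hypothetical
print(e_k, slope, slope * heating_rate)  # last value: dE/dt in V/s
```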
Common y-intercept and single compound regressions of gas-particle partitioning data vs 1/T

Science.gov (United States)

Pankow, James F.

Confidence intervals are placed around the log Kp vs 1/T correlation equations obtained using simple linear regressions (SLR) with the gas-particle partitioning data set of Yamasaki et al. [(1982) Environ. Sci. Technol. 16, 189-194]. The compounds and groups of compounds studied include the polycyclic aromatic hydrocarbons phenanthrene + anthracene, me-phenanthrene + me-anthracene, fluoranthene, pyrene, benzo[a]fluorene + benzo[b]fluorene, chrysene + benz[a]anthracene + triphenylene, benzo[b]fluoranthene + benzo[k]fluoranthene, and benzo[a]pyrene + benzo[e]pyrene (note: me = methyl). For any given compound at equilibrium, the partition coefficient Kp equals (F/TSP)/A, where F is the particulate-matter-associated concentration (ng m⁻³), A is the gas-phase concentration (ng m⁻³), and TSP is the concentration of particulate matter (μg m⁻³). At temperatures more than 10 °C from the mean sampling temperature of 17 °C, the confidence intervals are quite wide. Since theory predicts that similar compounds sorbing on the same particulate matter should possess very similar y-intercepts, the data set was also fitted using a special common y-intercept regression (CYIR). For most of the compounds, the CYIR equations fell inside the SLR 95% confidence intervals. The CYIR y-intercept value is -18.48, which is reasonably close to the type of value that can be predicted for PAH compounds. The set of CYIR regression equations is probably more reliable than the set of SLR equations. For example, the CYIR-derived desorption enthalpies are much more highly correlated with vaporization enthalpies than are the SLR-derived desorption enthalpies. It is recommended that the CYIR approach be considered whenever analysing temperature-dependent gas-particle partitioning data.
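A common y-intercept regression can be posed as one least-squares problem. Each compound gets its own 1/T slope column, and a single shared intercept column is appended. The sketch below assumes log Kp = m_i/T + b with base-10 logs. It also includes the Kp definition given above, in m³ μg⁻¹. Function names and synthetic data are hypothetical.

```python
import numpy as np


def partition_coefficient(f_ng_m3, tsp_ug_m3, a_ng_m3):
    """Kp = (F / TSP) / A, in m^3 per ug."""
    return (f_ng_m3 / tsp_ug_m3) / a_ng_m3


def common_intercept_fit(inv_t_by_compound, log_kp_by_compound):
    """Fit log Kp = m_i / T + b for every compound i with one shared
    intercept b. Returns (slopes, b)."""
    n_cmp = len(inv_t_by_compound)
    rows, targets = [], []
    for i, (inv_t, log_kp) in enumerate(zip(inv_t_by_compound, log_kp_by_compound)):
        for x, y in zip(inv_t, log_kp):
            row = np.zeros(n_cmp + 1)
            row[i] = x      # compound-specific slope column
            row[-1] = 1.0   # shared intercept column
            rows.append(row)
            targets.append(y)
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return coef[:-1], coef[-1]
```

Fitting separate SLRs would instead give each compound its own intercept. Sharing one intercept pools information across compounds and stabilizes the fitted slopes (and hence the derived desorption enthalpies).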
Validity of resting energy expenditure predictive equations before and after an energy-restricted diet intervention in obese women.

Directory of Open Access Journals (Sweden)

Jonatan R Ruiz

BACKGROUND: We investigated the validity of REE predictive equations before and after a 12-week energy-restricted diet intervention in Spanish obese women (30 kg/m² < BMI < 40 kg/m²). METHODS: We measured REE (indirect calorimetry), body weight, height, fat mass (FM), and fat-free mass (FFM) (dual X-ray absorptiometry) in 86 obese Caucasian premenopausal women aged 36.7±7.2 y, before and after (n = 78 women) the intervention. We investigated the accuracy of ten REE predictive equations using weight, height, age, FFM, and FM. RESULTS: At baseline, the most accurate equation was that of Mifflin et al. (Am J Clin Nutr 1990; 51: 241-247) when using weight (bias: -0.2%, P = 0.982, 74% accurate predictions). This level of accuracy was not reached after the diet intervention (24% accurate predictions). After the intervention, the lowest bias was found with the Owen et al. (Am J Clin Nutr 1986; 44: 1-19) equation when using weight (bias: -1.7%, P = 0.044, 81% accurate predictions), yet it provided 53% accurate predictions at baseline. CONCLUSIONS: There is wide variation in the accuracy of REE predictive equations before and after weight loss in non-morbidly obese women. The results are especially relevant in the context of the challenging weight-regain phenomenon in the overweight/obese population.
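The weight-based Mifflin equation and an accuracy check can be sketched as follows. The sketch assumes the widely published Mifflin-St Jeor form (kcal/d). It also assumes the common convention that a prediction is "accurate" when it falls within ±10% of measured REE; the abstract does not state its threshold. Function names and example values are hypothetical.

```python
def mifflin_st_jeor(weight_kg, height_cm, age_years, female):
    """Mifflin-St Jeor REE (kcal/d), widely published weight-based form."""
    base = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_years
    return base - 161.0 if female else base + 5.0


def percent_bias(predicted, measured):
    return 100.0 * (predicted - measured) / measured


def is_accurate(predicted, measured, band=10.0):
    """Assumed convention: accurate if within +/- band % of measured REE."""
    return abs(percent_bias(predicted, measured)) <= band


# Hypothetical woman: 90 kg, 165 cm, 37 y, measured REE 1500 kcal/d.
pred = mifflin_st_jeor(90, 165, 37, female=True)
print(pred, round(percent_bias(pred, 1500), 1), is_accurate(pred, 1500))
```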
Predicting the aquatic toxicity mode of action using logistic regression and linear discriminant analysis.

Science.gov (United States)

Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X

2016-09-01

The paper highlights the use of the logistic regression (LR) method in the construction of acceptable, statistically significant, robust, and predictive models for classifying chemicals according to their aquatic toxic modes of action. All essentials for a reliable model were considered carefully. The model predictors were selected by stepwise forward discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems more attractive for predicting the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds, and more reactive versus less reactive compounds. In addition, model fit and regression diagnostics were assessed through the influence plot, which reflects the hat values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed.
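Logistic regression and the hat values used in its influence plot can be sketched with iteratively reweighted least squares (Newton's method). The leverages are the diagonal of W^{1/2} X (XᵀWX)⁻¹ Xᵀ W^{1/2}. This is a minimal NumPy sketch, not the authors' software. It assumes `X` already contains an intercept column and that the classes are not perfectly separable, since otherwise the coefficients diverge.

```python
import numpy as np


def logistic_irls(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by IRLS (Newton's method).
    Returns (coef, hat_values)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: (X^T W X) delta = X^T (y - p)
        delta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta += delta
        if np.max(np.abs(delta)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    xw = X * np.sqrt(p * (1.0 - p))[:, None]
    # Leverages: diagonal of W^1/2 X (X^T W X)^-1 X^T W^1/2.
    hat = np.einsum("ij,ij->i", xw @ np.linalg.inv(xw.T @ xw), xw)
    return beta, hat
```

The hat values sum to the number of model parameters, so samples with leverage well above that average are the ones the leverage-based applicability domain check would flag.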
Estimation of Resting Energy Expenditure: Validation of Previous and New Predictive Equations in Obese Children and Adolescents.

Science.gov (United States)

Acar-Tek, Nilüfer; Ağagündüz, Duygu; Çelik, Bülent; Bozbulut, Rukiye

2017-08-01

Accurate estimation of resting energy expenditure (REE) in children and adolescents is important to establish estimated energy requirements. The aim of the present study was to measure REE in obese children and adolescents by indirect calorimetry, compare these values with REE values estimated by equations, and develop the most appropriate equation for this group. One hundred and three obese children and adolescents (57 males, 46 females) aged 7 to 17 years (10.6 ± 2.19 years) were recruited for the study. REE was measured by indirect calorimetry (COSMED, FitMatePro, Rome, Italy), and body composition was analyzed. In females, the percentage of accurate predictions varied from 32.6 (World Health Organization [WHO]) to 43.5 (Molnar and Lazzer). The bias for the equations was -0.2% (Kim), 3.7% (Molnar), and 22.6% (Derumeaux-Burel). Kim's, Schmelzle's, and Henry's equations had the lowest root mean square error (RMSE; 266, 267, and 268 kcal/d, respectively). Among female subjects, the Derumeaux-Burel equation had the highest RMSE (394 kcal/d). In males, the Institute of Medicine (IOM) equation had the lowest percentage of accurate predictions (12.3%), while the highest values were found using Schmelzle's (42.1%), Henry's (43.9%), and Müller's (fat-free mass, FFM; 45.6%) equations. Kim's and Müller's equations had the smallest bias (-0.6% and 9.9%, respectively), and Schmelzle's equation had the smallest RMSE (331 kcal/d). The new specific equation based on FFM was generated as follows: REE = 451.722 + (23.202 * FFM). Bland-Altman plots showed that the differences for the new equations are distributed randomly in both males and females. Previously developed predictive equations mostly provided inaccurate and biased estimates of REE. However, the new predictive equations allow clinicians to estimate REE in obese children and adolescents with sufficient and acceptable accuracy.
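The new FFM-based equation reported above, together with the bias and RMSE statistics used to rank equations, can be sketched as follows. The equation coefficients come from the abstract. Bias is assumed here to be the mean percentage difference from measured REE, and FFM is assumed to be in kg. Function names are hypothetical.

```python
def ree_ffm_equation(ffm_kg):
    """New FFM-based equation from the abstract: REE (kcal/d) = 451.722 + 23.202 * FFM."""
    return 451.722 + 23.202 * ffm_kg


def bias_and_rmse(predicted, measured):
    """Mean percentage bias (assumed definition) and RMSE in kcal/d."""
    n = len(measured)
    bias_pct = 100.0 * sum((p - m) / m for p, m in zip(predicted, measured)) / n
    rmse = (sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n) ** 0.5
    return bias_pct, rmse


# A child with 30 kg FFM is predicted at about 1148 kcal/d.
print(round(ree_ffm_equation(30.0), 3))
```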
Dose-response regressions for algal growth and similar continuous endpoints: Calculation of effective concentrations

DEFF Research Database (Denmark)

Christensen, Erik R.; Kusk, Kresten Ole; Nyholm, Niels

2009-01-01

We derive equations for the effective concentration giving 10% inhibition (EC10) with 95% confidence limits for probit (log-normal), Weibull, and logistic dose-response models on the basis of experimentally derived median effective concentrations (EC50s) and the curve slope at the central point (50......% inhibition). For illustration, data from closed, freshwater algal assays are analyzed using the green alga Pseudokirchneriella subcapitata with growth rate as the response parameter. Dose-response regressions for four test chemicals (tetraethylammonium bromide, musculamine, benzonitrile, and 4...... regression program with variance weighting and proper inverse estimation. The Weibull model provides the best fit to the data for all four chemicals. Predicted EC10s (95% confidence limits) from our derived equations are quite accurate; for example, with 4-4-(trifluoromethyl)phenoxy-phenol and the probit...
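Point estimates of ECx follow by inverting each model when it is parameterized by its EC50 and a slope parameter b. The parameterizations below are assumptions chosen so that each model gives 50% inhibition at EC50. In particular, b is the model's own slope parameter, not the slope at the central point used by the authors, and confidence limits are not reproduced. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def ecx_logistic(ec50, b, x):
    """Log-logistic: p(c) = 1 / (1 + (ec50 / c)**b)."""
    return ec50 * (x / (1.0 - x)) ** (1.0 / b)


def ecx_probit(ec50, b, x):
    """Probit (log-normal): p(c) = Phi(b * log10(c / ec50))."""
    return ec50 * 10 ** (NormalDist().inv_cdf(x) / b)


def ecx_weibull(ec50, b, x):
    """Weibull: p(c) = 1 - exp(-ln2 * (c / ec50)**b)."""
    return ec50 * (-math.log(1.0 - x) / math.log(2.0)) ** (1.0 / b)


# EC10 for a hypothetical EC50 of 1.0 mg/L and slope parameter b = 2.
print([round(f(1.0, 2.0, 0.10), 4) for f in (ecx_logistic, ecx_probit, ecx_weibull)])
```

For the same EC50, the three models give different EC10s because their lower tails differ, which is why the choice of model matters for low-effect concentrations.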
Estimation of evapotranspiration across the conterminous United States using a regression with climate and land-cover data

Science.gov (United States)

Sanford, Ward E.; Selnick, David L.

2013-01-01

Evapotranspiration (ET) is an important quantity for water resource managers to know because it often represents the largest sink for precipitation (P) arriving at the land surface. In order to estimate actual ET across the conterminous United States (U.S.) in this study, a water-balance method was combined with a climate and land-cover regression equation. Precipitation and streamflow records were compiled for 838 watersheds for 1971-2000 across the U.S. to obtain long-term estimates of actual ET. A regression equation was developed that related the ratio ET/P to climate and land-cover variables within those watersheds. Precipitation and temperatures were used from the PRISM climate dataset, and land-cover data were used from the USGS National Land Cover Dataset. Results indicate that ET can be predicted relatively well at a watershed or county scale with readily available climate variables alone, and that land-cover data can also improve those predictions. Using the climate and land-cover data at an 800-m scale and then averaging to the county scale, maps were produced showing estimates of ET and ET/P for the entire conterminous U.S. Using the regression equation, such maps could also be made for more detailed state coverages, or for other areas of the world where climate and land-cover data are plentiful.
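The water-balance step that produced the long-term ET targets for the regression can be sketched as ET = P − Q, where Q is streamflow. This simple form assumes negligible long-term storage change and no groundwater flow across the watershed boundary, and it is not necessarily the exact accounting used in the study. The function name and values are hypothetical.

```python
def water_balance_et(precip_mm, streamflow_mm):
    """Long-term actual ET (mm/yr) as P - Q and the ratio ET/P, assuming
    negligible storage change and no inter-basin groundwater flow."""
    et = precip_mm - streamflow_mm
    return et, et / precip_mm


# Hypothetical watershed: 1000 mm/yr precipitation, 300 mm/yr streamflow.
print(water_balance_et(1000.0, 300.0))
```

The ratio ET/P bounded between 0 and 1 is a convenient regression target because it normalizes out the strong spatial variation in precipitation.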
A Gaussian mixture copula model based localized Gaussian process regression approach for long-term wind speed prediction

International Nuclear Information System (INIS)

Yu, Jie; Chen, Kuilin; Mori, Junichi; Rashid, Mudassir M.

2013-01-01

Optimizing wind power generation and controlling the operation of wind turbines to efficiently harness the renewable wind energy is a challenging task due to the intermittency and unpredictable nature of wind speed, which has significant influence on wind power production. A new approach for long-term wind speed forecasting is developed in this study by integrating GMCM (Gaussian mixture copula model) and localized GPR (Gaussian process regression). The time series of wind speed is first classified into multiple non-Gaussian components through the Gaussian mixture copula model, and then a Bayesian inference strategy is employed to incorporate the various non-Gaussian components using the posterior probabilities. Further, the localized Gaussian process regression models corresponding to different non-Gaussian components are built to characterize the stochastic uncertainty and non-stationary seasonality of the wind speed data. The various localized GPR models are integrated through the posterior probabilities as the weightings so that a global predictive model is developed for the prediction of wind speed. The proposed GMCM–GPR approach is demonstrated using wind speed data from various wind farm locations and compared against the GMCM-based ARIMA (auto-regressive integrated moving average) and SVR (support vector regression) methods. In contrast to the GMCM–ARIMA and GMCM–SVR methods, the proposed GMCM–GPR model characterizes the multi-seasonality and uncertainty of wind speed series well, enabling accurate long-term prediction. - Highlights: • A novel predictive modeling method is proposed for long-term wind speed forecasting. • Gaussian mixture copula model is estimated to characterize the multi-seasonality. • Localized Gaussian process regression models can deal with the random uncertainty. • Multiple GPR models are integrated through Bayesian inference strategy. • The proposed approach shows higher prediction accuracy and reliability
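The combination step can be sketched with a plain zero-mean GPR (RBF kernel, Cholesky solve) per component, whose predictions are averaged using posterior component probabilities as weights. The GMCM itself is not implemented here: the posterior probabilities are assumed to be supplied, and the kernel choice and function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)


def gpr_predict(x_train, y_train, x_test, noise=1e-6, **kern):
    """Posterior mean and variance of a zero-mean GP with an RBF kernel."""
    K = rbf_kernel(x_train, x_train, **kern) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_test, x_train, **kern)
    Kss = rbf_kernel(x_test, x_test, **kern)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - np.sum(v ** 2, axis=0)
    return mean, var


def mixture_predict(local_means, posterior_probs):
    """Weight each local model's prediction (shape K x n_test) by the
    posterior probability of its component (same shape, columns sum to 1)."""
    return np.sum(np.asarray(posterior_probs) * np.asarray(local_means), axis=0)
```

Soft weighting lets a test point near a component boundary draw on several local models instead of switching abruptly between them.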

Groundwater level prediction of landslide based on classification and regression tree

Directory of Open Access Journals (Sweden)

Yannan Zhao

2016-09-01

According to the groundwater level monitoring data of the Shuping landslide in the Three Gorges Reservoir area, the factors influencing groundwater level were selected based on the response of groundwater level changes to influential factors such as rainfall and reservoir level. A classification and regression tree (CART) model was then constructed from this subset of factors and used to predict the groundwater level. In verification, the predictions for the test sample were consistent with the measured values, with a mean absolute error of 0.28 m and a relative error of 1.15%. For comparison, a support vector machine (SVM) model constructed using the same set of factors gave a mean absolute error of 1.53 m and a relative error of 6.11%. These results indicate that the CART model not only has better fitting and generalization ability but also offers strong advantages in analyzing the groundwater dynamics of landslides and in screening important variables. It is an effective method for predicting groundwater levels in landslides.
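CART grows a regression tree by repeatedly choosing the split that most reduces the squared error within the child nodes. The sketch below shows only that split criterion for a single feature. The rainfall and groundwater values are hypothetical, and a full CART model would apply this search recursively over all selected factors and then prune the tree.

```python
def sse(values):
    """Sum of squared deviations from the node mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, cost) of the single split on one feature that
    minimizes the summed squared error of the two child nodes."""
    pairs = sorted(zip(xs, ys))
    best_cost, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_cost, best_threshold = cost, threshold
    return best_threshold, best_cost


# Hypothetical data: daily rainfall (mm) vs. groundwater level rise (m).
rain = [0, 5, 10, 40, 50, 60]
rise = [0.1, 0.2, 0.1, 1.0, 1.2, 1.1]
threshold, cost = best_split(rain, rise)
print(threshold)  # 25.0
```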
Predictive equations for lung volumes from computed tomography for size matching in pulmonary transplantation.

Science.gov (United States)

Konheim, Jeremy A; Kon, Zachary N; Pasrija, Chetan; Luo, Qingyang; Sanchez, Pablo G; Garcia, Jose P; Griffith, Bartley P; Jeudy, Jean

2016-04-01

Size matching for lung transplantation is widely accomplished using height comparisons between donors and recipients. This gross approximation allows for wide variation in lung size and, potentially, size mismatch. Three-dimensional computed tomography (3D-CT) volumetry comparisons could offer more accurate size matching. Although recipient CT scans are universally available, donor CT scans are rarely performed. Therefore, predicted donor lung volumes could be used for comparison to measured recipient lung volumes, but no such predictive equations exist. We aimed to use 3D-CT volumetry measurements from a normal patient population to generate equations for predicted total lung volume (pTLV), predicted right lung volume (pRLV), and predicted left lung volume (pLLV), for size-matching purposes. Chest CT scans of 400 normal patients were retrospectively evaluated. 3D-CT volumetry was performed to measure total lung volume, right lung volume, and left lung volume of each patient, and predictive equations were generated. The fitted model was tested in a separate group of 100 patients. The model was externally validated by comparison of total lung volume with total lung capacity from pulmonary function tests in a subset of those patients. Age, gender, height, and race were independent predictors of lung volume. In the test group, there were strong linear correlations between predicted and actual lung volumes measured by 3D-CT volumetry for pTLV (r = 0.72), pRLV (r = 0.72), and pLLV (r = 0.69). A strong linear correlation was also observed when comparing pTLV and total lung capacity (r = 0.82). We successfully created a predictive model for pTLV, pRLV, and pLLV. These may serve as reference standards and predict donor lung volume for size matching in lung transplantation. Copyright © 2016 The American Association for Thoracic Surgery. Published by Elsevier Inc. All rights reserved.
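The predictive equations are multiple linear regressions fitted on one group and checked on another by correlating predicted with measured volumes. A minimal sketch of that fit-then-validate workflow follows, assuming ordinary least squares on two predictors (height and age) only; the study also used gender and race, and all data below are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    Xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


# Hypothetical training data: (height in cm, age in years) -> lung volume in L.
train_rows = [(160, 30), (170, 40), (180, 25), (175, 60), (165, 50)]
train_vol = [4.9, 5.5, 6.35, 5.65, 5.05]
coef = fit_ols(train_rows, train_vol)

# Validate on a separate hypothetical test group.
test_rows = [(172, 35), (162, 55), (178, 45)]
test_vol = [5.8, 4.6, 6.1]
predicted = [predict(coef, r) for r in test_rows]
print([round(c, 4) for c in coef], round(pearson_r(predicted, test_vol), 3))
```

Solving the normal equations directly is adequate for a handful of predictors; with many correlated predictors a QR- or SVD-based solver is numerically safer.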
Study (Prediction of Main Pipes Break Rates in Water Distribution Systems Using Intelligent and Regression Methods)

Directory of Open Access Journals (Sweden)

Massoud Tabesh

2011-07-01

Optimum operation of water distribution networks is one of the priorities of sustainable water resources development, given the need to increase efficiency and reduce water losses. Key subjects in the optimum operational management of water distribution systems include preparing rehabilitation and replacement schemes, predicting pipe break rates, and evaluating reliability. Several approaches for predicting pipe failure rates have been presented in recent years, each of which requires specific data sets. Deterministic age-based models, deterministic multivariable models, and stochastic group modeling are examples of solutions that relate pipe break rates to parameters such as age, material, and diameter. In this paper, besides these parameters, further factors such as pipe depth and hydraulic pressure are also considered. Pipe burst rates are then predicted using a multivariable regression method, intelligent approaches (artificial neural network and neuro-fuzzy models), and the Evolutionary Polynomial Regression (EPR) method. To evaluate the results of the different approaches, a case study was carried out in part of the Mashhad water distribution network. The results show the capability and advantages of the ANN and EPR methods for predicting pipe break rates compared with the neuro-fuzzy and multivariable regression methods.
ENHANCED PREDICTION OF STUDENT DROPOUTS USING FUZZY INFERENCE SYSTEM AND LOGISTIC REGRESSION

Directory of Open Access Journals (Sweden)

A. Saranya

2016-01-01

Predicting college and school dropouts is a major problem in the educational system and is a complicated challenge because of data imbalance and multidimensionality, which can lead to low performance. In this paper, we collected different databases from various colleges, from which the 500 best real attributes were identified, using a neural-network-based classification algorithm, in order to determine the factors affecting student dropout; different mining techniques were implemented for data processing. We also propose a Dropout Prediction Algorithm (DPA) using a fuzzy logic and Logistic Regression based inference system, because the weighted average improves the performance of the whole system. We compared our proposed work with other classification systems and documented the best outcomes. The aggregated data are given to decision trees for better dropout prediction. The accuracy of the overall system is 98.6%, which shows that the proposed work achieves efficient prediction.
Fast integration-based prediction bands for ordinary differential equation models.

Science.gov (United States)

Hass, Helge; Kreutz, Clemens; Timmer, Jens; Kaschek, Daniel

2016-04-15

To gain a deeper understanding of biological processes and their relevance in disease, mathematical models are built upon experimental data. Uncertainty in the data leads to uncertainty in the model's parameters and, in turn, in its predictions. Mechanistic dynamic models of biochemical networks are frequently based on nonlinear differential equation systems and feature a large number of parameters, sparse observations of the model components, and a lack of information in the available data. Because of the curse of dimensionality, classical and sampling approaches that propagate parameter uncertainties to predictions are barely feasible and insufficient. However, prediction and confidence bands are essential for experimental design and for discriminating between competing models. To circumvent the hurdles of the former methods, an approach that calculates a profile likelihood on arbitrary observations for a specific time point has been introduced; it provides accurate confidence and prediction intervals for nonlinear models and is computationally feasible for high-dimensional models. In this article, reliable and smooth point-wise prediction and confidence bands that assess the model's uncertainty over the whole time course are achieved via explicit integration with elaborate correction mechanisms. The corresponding system of ordinary differential equations is derived and tested on three established models of cellular signalling. An efficiency analysis is performed to illustrate the computational benefit compared with repeated profile likelihood calculations at multiple time points. The integration framework and the examples used in this article are provided with the software package Data2Dynamics, which is based on MATLAB and freely available at http://www.data2dynamics.org. Contact: helge.hass@fdm.uni-freiburg.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
Prediction of hourly PM2.5 using a space-time support vector regression model

Science.gov (United States)

Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang

2018-05-01

Real-time air quality prediction has been an active field of research in atmospheric environmental science. Existing machine learning methods are widely used to predict pollutant concentrations because of their ability to handle complex nonlinear relationships. However, because pollutant concentration data, as typical geospatial data, exhibit spatial heterogeneity and spatial dependence, they may violate the assumption of independent and identically distributed random variables made by most machine learning methods. Therefore, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is performed to divide the study area into several homogeneous or quasi-homogeneous subareas. Next, to handle spatial dependence, a Gauss vector weight function is developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the proposed model outperforms other methods.
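One way to turn spatial dependence into an input feature is a distance-weighted average of the concentrations at neighbouring stations. The abstract does not give the exact form of its Gauss vector weight function, so this sketch assumes the common Gaussian kernel w = exp(-d^2 / (2h^2)) with a hypothetical bandwidth h; station coordinates and values are invented.

```python
import math


def gaussian_weight(distance_km, bandwidth_km):
    """Gaussian distance-decay weight; equals 1 at distance 0."""
    return math.exp(-0.5 * (distance_km / bandwidth_km) ** 2)


def spatial_lag(target_xy, neighbours, bandwidth_km):
    """Gaussian-weighted mean of neighbouring stations' concentrations.

    neighbours: list of ((x_km, y_km), concentration) for the *other* stations.
    """
    num = den = 0.0
    for (x, y), value in neighbours:
        d = math.hypot(x - target_xy[0], y - target_xy[1])
        w = gaussian_weight(d, bandwidth_km)
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("all neighbours are too far away for this bandwidth")
    return num / den


# Hypothetical stations 5 km and 10 km away from the target station.
lag = spatial_lag((0.0, 0.0), [((3.0, 4.0), 80.0), ((6.0, 8.0), 40.0)], bandwidth_km=5.0)
print(round(lag, 1))
```

The resulting value would be appended to each station's feature vector before fitting the local regression model for its subarea.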
Predicting Jakarta composite index using hybrid of fuzzy time series and support vector regression models

Science.gov (United States)

Febrian Umbara, Rian; Tarwidi, Dede; Budi Setiawan, Erwin

2018-03-01

The paper discusses the prediction of the Jakarta Composite Index (JCI) on the Indonesia Stock Exchange. The study uses 1286 days of historical JCI data to predict the value of the JCI one day ahead. The paper proposes a two-stage prediction: the first stage uses Fuzzy Time Series (FTS) to predict the values of ten technical indicators, and the second stage uses Support Vector Regression (SVR) to predict the value of the JCI one day ahead, resulting in a hybrid FTS-SVR prediction model. The performance of this combined model is compared with that of a single-stage model using SVR only. Ten technical indicators are used as input for each model.
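The abstract does not list the ten technical indicators. To illustrate how indicator inputs and a one-day-ahead target can be aligned, the sketch below assumes just two common indicators (a simple moving average and n-day momentum) and omits both the FTS and SVR stages; the closing values are hypothetical.

```python
def make_dataset(closes, n):
    """Build (features, target) pairs for one-day-ahead prediction.

    Features at day t: n-day simple moving average and n-day momentum.
    Target: the closing value on day t + 1.
    """
    X, y = [], []
    for t in range(n, len(closes) - 1):
        window = closes[t - n + 1:t + 1]
        sma = sum(window) / n                 # simple moving average
        momentum = closes[t] - closes[t - n]  # change over n days
        X.append([sma, momentum])
        y.append(closes[t + 1])               # next day's close
    return X, y


# Hypothetical daily closing values of the index.
closes = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]
X, y = make_dataset(closes, n=3)
print(len(X), y)  # 2 [107.0, 106.0]
```

Only past and current values enter each feature row, so the target is never leaked into its own inputs.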
Biostatistics Series Module 6: Correlation and Linear Regression.

Science.gov (United States)

Hazra, Avijit; Gogtay, Nithya

2016-01-01

Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If the normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ), may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r² denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of a linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
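The quantities described above can be computed directly from their definitions. Here is a minimal stdlib-only sketch (function names are illustrative); Spearman's rho is computed as Pearson's r on the ranks, with tied values given their average rank, and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))


def least_squares_line(x, y):
    """Return (a, b) for the regression line y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sum((p - mx) ** 2 for p in x)
    return my - b * mx, b


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print(round(r, 4), round(r ** 2, 4), spearman_rho(x, y), least_squares_line(x, y))
```

For a monotone but curved relation such as y = x³, Spearman's rho is exactly 1 while Pearson's r is below 1, which is one reason to inspect a scatter plot first.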
Machine Learning Algorithms Outperform Conventional Regression Models in Predicting Development of Hepatocellular Carcinoma

Science.gov (United States)

Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K

2015-01-01

Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95%CI 0.56–0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95%CI 0.60–0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (pmachine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratifying patients with cirrhosis and can be used to accurately identify patients at high risk of developing HCC. PMID:24169273
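The c-statistic reported above is the area under the ROC curve: the probability that a randomly chosen patient who developed HCC received a higher risk score than one who did not. A minimal sketch on hypothetical scores follows. It treats outcomes as binary and ignores censoring, whereas a time-to-event cohort like this one may call for a censoring-aware variant such as Harrell's C.

```python
def c_statistic(scores, outcomes):
    """Fraction of (case, non-case) pairs in which the case has the higher
    risk score; tied scores count one half. Equals the ROC AUC."""
    cases = [s for s, o in zip(scores, outcomes) if o == 1]
    controls = [s for s, o in zip(scores, outcomes) if o == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks and observed outcomes (1 = developed HCC).
risk = [0.9, 0.8, 0.3, 0.6, 0.2]
hcc = [1, 0, 1, 0, 0]
print(round(c_statistic(risk, hcc), 3))  # 0.667
```

A value of 0.5 means the scores rank patients no better than chance, which puts the reported 0.61 and 0.64 in context.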
ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.

Science.gov (United States)

Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won

2016-07-01

In our previous study, the input data set consisted of 78 rats, with blood loss in percent as the dependent variable and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and a new index (lactate concentration/perfusion)). In that study, machine learning methods for multicategory classification were applied to a rat model of acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying the ATLS hypovolemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The classification accuracies obtained by predicting blood loss in percent with the support vector regression and MLR models using relative values were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% achieved by direct multicategory classification using the one-versus-one support vector machine model in our previous study on the same validation data set. Moreover, the simple MLR models with both absolute and relative values could form the basis of a future clinical decision support system for ATLS classification. The perfusion index and new index were more appropriate as relative changes than as absolute values.
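The approach replaces direct four-class classification with a regression on blood loss followed by a threshold lookup. Below is a sketch of that lookup, assuming the commonly cited ATLS blood-loss ranges (Class I under 15%, II 15 to 30%, III 30 to 40%, IV over 40%). The boundary handling and the sample values are assumptions.

```python
def atls_class(blood_loss_percent):
    """Map an estimated blood loss (%) to an ATLS shock class 1 to 4.

    Which class an exact boundary value falls into is an assumption here.
    """
    if blood_loss_percent < 0:
        raise ValueError("blood loss cannot be negative")
    if blood_loss_percent < 15:
        return 1
    if blood_loss_percent < 30:
        return 2
    if blood_loss_percent <= 40:
        return 3
    return 4


def class_accuracy(predicted_loss, actual_loss):
    """Fraction of cases whose predicted class matches the class of the true loss."""
    hits = sum(atls_class(p) == atls_class(a) for p, a in zip(predicted_loss, actual_loss))
    return hits / len(actual_loss)


# Hypothetical regression outputs vs. measured blood loss (%).
predicted = [12.0, 27.5, 33.0, 44.0]
actual = [10.0, 31.0, 36.0, 42.0]
print(class_accuracy(predicted, actual))  # 0.75
```

Regression errors only cost accuracy when they push an estimate across a class boundary, as the 27.5% vs. 31% case does here.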
Application of Soft Computing Techniques and Multiple Regression Models for CBR prediction of Soils

Directory of Open Access Journals (Sweden)

Fatimah Khaleel Ibrahim

2017-08-01

Soft computing techniques such as artificial neural networks (ANN) have improved predictive capability and have found application in geotechnical engineering. The aim of this research is to use soft computing techniques and multiple regression models (MLR) to forecast the California bearing ratio (CBR) of soil from its index properties. The CBR of a soil can be predicted from various soil characterization parameters with the aid of MLR and ANN methods. The database was collected in the laboratory by conducting tests on 86 soil samples gathered from different projects in the Basrah districts. Data obtained from the experimental results were used in the regression models and in the soft computing techniques based on artificial neural networks. The liquid limit, plasticity index, modified compaction test, and CBR test were determined. In this work, different ANN and MLR models were formulated with different combinations of inputs in order to recognize their significance in the prediction of CBR. The strengths of the developed models were examined in terms of the regression coefficient (R²), relative error (RE%), and mean square error (MSE). The results show that all the proposed ANN models perform better than the MLR model. In particular, the ANN model with all input parameters gives better results than the other ANN models.
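The three goodness-of-fit measures can be computed as follows. This sketch assumes R² = 1 - SS_res/SS_tot and RE% as the mean absolute relative error in percent; the paper may define them differently, and the CBR values are hypothetical.

```python
def evaluate(observed, predicted):
    """Return (R^2, MSE, RE%) for paired observed/predicted values.

    RE% is taken here as the mean absolute relative error in percent,
    so observed values must be non-zero.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    mse = ss_res / n
    re_pct = 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n
    return r2, mse, re_pct


# Hypothetical CBR values (%): laboratory vs. model prediction.
r2, mse, re_pct = evaluate([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
print(round(r2, 3), round(mse, 3), round(re_pct, 1))
```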
Geoelectrical parameter-based multivariate regression borehole yield model for predicting aquifer yield in managing groundwater resource sustainability

Directory of Open Access Journals (Sweden)

Kehinde Anthony Mogaji

2016-07-01

This study developed a GIS-based multivariate regression (MVR) yield rate prediction model for groundwater resource sustainability in the hard-rock terrain of southwestern Nigeria. The model offers an economical way to predict aquifer yield rate potential, which is often overlooked in groundwater resources development. The proposed model relates the borehole yield rate inventory of the area to geoelectrically derived parameters. Three borehole yield rate conditioning geoelectrically derived parameters, namely aquifer unit resistivity (ρ), aquifer unit thickness (D), and coefficient of anisotropy (λ), were determined from the acquired and interpreted geophysical data. The extracted borehole yield rate values and the geoelectrically derived parameter values were regressed to develop the MVR relationship model by applying linear regression and GIS techniques. The sensitivity analysis of the MVR model, evaluated at P ⩽ 0.05 for the predictors ρ, D, and λ, gave values of 2.68 × 10⁻⁵, 2 × 10⁻², and 2.09 × 10⁻⁶, respectively. The accuracy and predictive power tests conducted on the MVR model using the Theil inequality coefficient approach, together with the sensitivity analysis results, confirmed the model's yield rate estimation and prediction capability. The MVR borehole yield prediction model estimates were processed in a GIS environment to produce an aquifer yield potential prediction map of the area. The information on the prediction map can serve as a scientific basis for predicting aquifer yield potential rates relevant to groundwater resources sustainability management. The developed MVR borehole yield rate prediction model provides a good alternative to other methods used for this purpose.
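The Theil inequality coefficient used to test the MVR model compares the prediction error with the magnitudes of the observed and predicted series. This sketch assumes the common U1 form, U = RMSE / (sqrt(mean(y²)) + sqrt(mean(ŷ²))), which ranges from 0 (perfect prediction) to 1; the yield values are hypothetical.

```python
import math


def theil_u(observed, predicted):
    """Theil inequality coefficient (U1 form): 0 = perfect fit, 1 = worst."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    scale = (math.sqrt(sum(o * o for o in observed) / n)
             + math.sqrt(sum(p * p for p in predicted) / n))
    return rmse / scale


# Hypothetical borehole yields (m^3/h): observed vs. MVR-predicted.
print(round(theil_u([2.0, 4.0], [1.0, 5.0]), 4))  # 0.1478
```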
Tooth width predictions in a sample of Black South Africans.

Science.gov (United States)

Khan, M I; Seedat, A K; Hlongwa, P

2007-07-01

Space analysis during the mixed dentition requires prediction of the mesiodistal widths of the unerupted permanent canines and premolars, and prediction tables and equations may be used for this purpose. The Tanaka and Johnston prediction equations, which were derived from a North American White sample, are one widely used example. These equations may be inapplicable to other race groups because of racial variability in tooth size. Therefore, the purpose of this study was to derive prediction equations applicable to Black South African subjects. One hundred and ten pre-treatment study casts of Black South African subjects were analysed from the records of the Department of Orthodontics at the University of Limpopo. The sample was equally divided by gender, with all subjects having a Class I molar relationship and relatively well aligned teeth. The mesiodistal widths of the maxillary and mandibular canines and premolars were measured with a digital vernier calliper and compared with the widths predicted by the Tanaka and Johnston equations. The relationship between the measured and predicted values was analysed by correlation and regression analyses. The results indicated that the Tanaka and Johnston prediction equations were not fully applicable to the Black South African sample. The equations tended to underpredict tooth widths in the male sample, while slight overprediction was observed in the female sample. Therefore, new equations were formulated and proposed that would be accurate for Black subjects.
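The Tanaka and Johnston equations predict the combined width of the unerupted canine and two premolars in one quadrant as half the summed width of the four mandibular incisors plus 10.5 mm (mandibular) or 11.0 mm (maxillary). Below is a sketch of those equations and of a mean-error check for under- or overprediction. The patient measurements are hypothetical, and the new coefficients for the Black South African sample are not given in the abstract.

```python
def tanaka_johnston(lower_incisors_mm):
    """Predicted canine + premolar segment width (mm) for one quadrant,
    from the summed mesiodistal widths of the four mandibular incisors."""
    half = lower_incisors_mm / 2
    return {"mandibular": half + 10.5, "maxillary": half + 11.0}


def mean_error(measured, predicted):
    """Mean of (measured - predicted); positive means the equation underpredicts."""
    return sum(m - p for m, p in zip(measured, predicted)) / len(measured)


print(tanaka_johnston(23.0))  # {'mandibular': 22.0, 'maxillary': 22.5}

# Hypothetical mandibular segment widths for three patients.
incisors = [22.0, 23.0, 24.0]
measured = [22.4, 22.6, 23.5]
predicted = [tanaka_johnston(s)["mandibular"] for s in incisors]
print(round(mean_error(measured, predicted), 2))
```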
Using support vector regression to predict PM10 and PM2.5

International Nuclear Information System (INIS)

Weizhen, Hou; Zhengqiang, Li; Yuhuan, Zhang; Hua, Xu; Ying, Zhang; Kaitao, Li; Donghui, Li; Peng, Wei; Yan, Ma

2014-01-01

Support vector machine (SVM), a novel and powerful machine learning tool, can be used to predict PM10 and PM2.5 (particulate matter with a diameter less than or equal to 10 and 2.5 μm, respectively) in the atmosphere. This paper describes the development of a successive over-relaxation support vector regression (SOR-SVR) model for PM10 and PM2.5 prediction based on the daily average aerosol optical depth (AOD) and meteorological parameters (atmospheric pressure, relative humidity, air temperature, and wind speed), all measured in Beijing during 2010–2012. The Gaussian kernel function, together with k-fold cross-validation and a grid search, is used in the SVR model to obtain optimal parameters and thus better generalization. The results show that the values predicted by the SOR-SVR model agree well with the actual data and that the model generalizes well in predicting PM10 and PM2.5. In addition, AOD plays an important role in predicting particulate matter with the SVR model and should be included in the prediction model. If only the meteorological parameters are considered and AOD is eliminated from the SVR model, the particulate matter predictions are not satisfactory.
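The parameter tuning described above combines a Gaussian kernel, k-fold cross-validation, and a grid search. The sketch below shows that tuning loop with a deliberately simple stand-in model, a Gaussian-kernel Nadaraya-Watson regressor, rather than an actual SVR (which would also tune C and epsilon). The interleaved fold assignment and the synthetic data are assumptions.

```python
import math


def k_fold_indices(n, k):
    """Interleaved k-fold split: fold i holds indices i, i + k, i + 2k, ..."""
    return [list(range(i, n, k)) for i in range(k)]


def kernel_predict(train_x, train_y, x, gamma):
    """Nadaraya-Watson estimate with Gaussian kernel exp(-gamma * ||x - x_i||^2).
    A stand-in model, not SVR; it only shares the Gaussian kernel."""
    weights = [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, x)))
               for xi in train_x]
    total = sum(weights)
    if total == 0.0:  # every training point too far away: fall back to the mean
        return sum(train_y) / len(train_y)
    return sum(w * t for w, t in zip(weights, train_y)) / total


def cv_mse(xs, ys, gamma, k):
    """Mean squared error over all held-out points in k-fold cross-validation."""
    errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        for i in fold:
            errors.append((kernel_predict(tx, ty, xs[i], gamma) - ys[i]) ** 2)
    return sum(errors) / len(errors)


def grid_search(xs, ys, gammas, k=5):
    """Return the gamma with the lowest cross-validated MSE, plus all scores."""
    scores = {g: cv_mse(xs, ys, g, k) for g in gammas}
    return min(scores, key=scores.get), scores


# Synthetic 1-D data (a stand-in for AOD and meteorological inputs).
xs = [[i / 10] for i in range(30)]
ys = [math.sin(3 * x[0]) for x in xs]
best_gamma, scores = grid_search(xs, ys, [0.1, 1.0, 10.0, 100.0], k=5)
print(best_gamma)
```

In practice the inputs would be standardized first, since a single gamma applies the same length scale to every feature.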
Prediction of retention indices for frequently reported compounds of plant essential oils using multiple linear regression, partial least squares, and support vector machine.

Science.gov (United States)

Yan, Jun; Huang, Jian-Hua; He, Min; Lu, Hong-Bing; Yang, Rui; Kong, Bo; Xu, Qing-Song; Liang, Yi-Zeng

2013-08-01

Retention indices on three different stationary phases were investigated for compounds frequently reported in plant essential oils. Multivariate linear regression, partial least squares, and support vector machine, combined with a new variable selection approach called random-frog recently proposed by our group, were employed to model quantitative structure-retention relationships. Internal and external validation were performed to ensure stability and predictive ability. All three methods yielded acceptable models; the optimal results were obtained by the support vector machine based on a small number of informative descriptors, with squared cross-validation correlation coefficients of 0.9726, 0.9759, and 0.9331 on the dimethylsilicone stationary phase, the dimethylsilicone phase with 5% phenyl groups, and the PEG stationary phase, respectively. The performances of two variable selection approaches, random-frog and genetic algorithm, were compared. The importance of the variables was found to be consistent when estimated from correlation coefficients in multivariate linear regression equations and from selection probability in model spaces. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
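The abstract does not define its cross-validated squared correlation coefficient. One common QSRR definition is the leave-one-out Q² = 1 - PRESS / SS_tot. The sketch below computes it for a one-descriptor linear model (not the paper's SVM) on hypothetical retention indices.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sum((p - mx) ** 2 for p in x)
    return my - b * mx, b


def q2_loo(x, y):
    """Leave-one-out Q^2 = 1 - PRESS / total sum of squares."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        # Refit without point i, then predict the held-out point.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    ss_tot = sum((v - my) ** 2 for v in y)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and retention indices.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
retention = [1010.0, 1105.0, 1198.0, 1310.0, 1395.0, 1502.0]
print(round(q2_loo(descriptor, retention), 4))
```

Because every point is predicted by a model that never saw it, Q² is never larger than the ordinary R² of the full fit and is a more honest guide to predictive ability.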
Estimating the Accuracy of the Chedoke-McMaster Stroke Assessment Predictive Equations for Stroke Rehabilitation.

Science.gov (United States)

Dang, Mia; Ramsaran, Kalinda D; Street, Melissa E; Syed, S Noreen; Barclay-Goddard, Ruth; Stratford, Paul W; Miller, Patricia A

2011-01-01

To estimate the predictive accuracy and clinical usefulness of the Chedoke-McMaster Stroke Assessment (CMSA) predictive equations. A longitudinal prognostic study using historical data obtained from 104 patients admitted post cerebrovascular accident was undertaken. Data were abstracted for all patients undergoing rehabilitation post stroke who also had documented admission and discharge CMSA scores. Published predictive equations were used to determine predicted outcomes. To determine the accuracy and clinical usefulness of the predictive model, shrinkage coefficients and predictions with 95% confidence bands were calculated. Complete data were available for 74 patients with a mean age of 65.3±12.4 years. The shrinkage values for the six Impairment Inventory (II) dimensions varied from -0.05 to 0.09; the shrinkage value for the Activity Inventory (AI) was 0.21. The error associated with predictive values was greater than ±1.5 stages for the II dimensions and greater than ±24 points for the AI. This study shows that the large error associated with the predictions (as defined by the confidence band) for the CMSA II and AI limits their clinical usefulness as a predictive measure. Further research to establish predictive models using alternative statistical procedures is warranted.
Gaussian Process Regression (GPR) Representation in Predictive Model Markup Language (PMML).

Science.gov (United States)

Park, J; Lechevalier, D; Ak, R; Ferguson, M; Law, K H; Lee, Y-T T; Rachuri, S

2017-01-01

This paper describes Gaussian process regression (GPR) models presented in Predictive Model Markup Language (PMML). PMML is an Extensible Markup Language (XML)-based standard language used to represent data-mining and predictive analytic models, as well as pre- and post-processed data. The previous PMML version, PMML 4.2, did not provide capabilities for representing probabilistic (stochastic) machine-learning algorithms, which are widely used for constructing predictive models that take the associated uncertainties into consideration. The newly released PMML version 4.3, which includes the GPR model, provides new features: confidence bounds and distributions for the predictive estimations. Both features are needed to establish the foundation for uncertainty quantification analysis. Among various probabilistic machine-learning algorithms, GPR has been widely used for approximating a target function because it can represent complex input-output relationships without a predefined set of basis functions and can predict a target output with uncertainty quantification. GPR is being employed in various manufacturing data-analytics applications, which necessitates representing this model in a standardized form for easy and rapid deployment. In this paper, we present a GPR model and its representation in PMML. Furthermore, we demonstrate a prototype using a real data set from the manufacturing domain.
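The confidence bounds that PMML 4.3 can carry for GPR come from the GP posterior mean and variance. Below is a minimal sketch of a 1-D GP prediction with a squared-exponential kernel and fixed hyperparameters. The kernel choice, hyperparameter values, and data are assumptions, and neither hyperparameter fitting nor PMML serialization is shown.

```python
import math


def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    """Lower-triangular L with K = L L^T (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    return z


def solve_upper(L, z):
    """Back substitution: solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[m][i] * x[m] for m in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(xs, ys, x_star, noise=1e-2, length=1.0, variance=1.0):
    """Posterior mean, variance, and ~95% band of the latent function at x_star."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], length, variance) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, ys))   # alpha = K^{-1} y
    k_star = [rbf(x, x_star, length, variance) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)                   # v = L^{-1} k_*
    var = rbf(x_star, x_star, length, variance) - sum(t * t for t in v)
    half = 1.96 * math.sqrt(max(var, 0.0))
    return mean, var, (mean - half, mean + half)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.8, 0.9, 0.1]
mean, var, (low, high) = gp_predict(xs, ys, 1.5)
print(f"mean={mean:.3f}, 95% band=({low:.3f}, {high:.3f})")
```

The band here covers the latent function; adding the noise variance to `var` would give a band for new observations. Far from the data the mean reverts to zero and the variance to the prior variance, which is the behaviour the confidence bounds are meant to expose.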
Evaluation of Equations for Predicting 24-Hour Urinary Sodium Excretion from Casual Urine Samples in Asian Adults.

Science.gov (United States)

Whitton, Clare; Gay, Gibson Ming Wei; Lim, Raymond Boon Tar; Tan, Linda Wei Lin; Lim, Wei-Yen; van Dam, Rob M

2016-08-01

The collection of 24-h urine samples for the estimation of sodium intake is burdensome, and the utility of spot urine samples in Southeast Asian populations is unclear. We aimed to assess the validity of prediction equations with the use of spot urine concentrations. A sample of 144 Singapore residents of Chinese, Malay, and Indian ethnicity aged 18-79 y was recruited from the Singapore Health 2 Study conducted in 2014. Participants collected urine for 24 h in multiple small bottles on a single day. To determine the optimal collection time for a spot urine sample, a 1-mL sample was taken from a random bottle collected in the morning, afternoon, and evening. Published equations and a newly derived equation were used to predict 24-h sodium excretion from spot urine samples. The mean ± SD 24-h urinary sodium excretion was 125 ± 53.4 mmol/d, which is equivalent to 7.2 ± 3.1 g salt. Bland-Altman plots showed good agreement at the group level between estimated and actual 24-h sodium excretion, with biases for the morning period of -3.5 mmol (95% CI: -14.8, 7.8 mmol; new equation) and 1.46 mmol (95% CI: -10.0, 13.0 mmol; Intersalt equation). A larger bias of 25.7 mmol (95% CI: 12.2, 39.3 mmol) was observed for the Tanaka equation in the morning period. The prediction accuracy did not differ significantly for spot urine samples collected at different times of the day or at a random time of day (P = 0.11-0.76). This study suggests that applying both our newly derived equation and the Intersalt equation to spot urine concentrations may be useful in predicting group means for 24-h sodium excretion in urban Asian populations. © 2016 American Society for Nutrition.
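Bland-Altman agreement is summarized by the mean difference (bias) between estimated and measured values, its confidence interval, and the limits of agreement. Here is a sketch with hypothetical 24-h sodium values in mmol; it uses the normal 1.96 multiplier for the CI of the bias, whereas a t quantile is more exact for small samples.

```python
import math
import statistics


def bland_altman(estimated, measured):
    """Bias, approximate 95% CI of the bias, and 95% limits of agreement."""
    diffs = [e - m for e, m in zip(estimated, measured)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)              # sample SD of the differences
    se = sd / math.sqrt(len(diffs))
    ci = (bias - 1.96 * se, bias + 1.96 * se)  # normal approximation
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return bias, ci, limits


# Hypothetical spot-urine estimates vs. measured 24-h excretion (mmol).
est = [120.0, 130.0, 110.0, 150.0]
meas = [118.0, 135.0, 112.0, 140.0]
bias, ci, limits = bland_altman(est, meas)
print(bias)  # 1.25
```

A CI for the bias that spans zero, as for the new and Intersalt equations above, indicates no detectable systematic error at the group level; the much wider limits of agreement are why such equations suit group means rather than individuals.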
Statistical Downscaling Output GCM Modeling with Continuum Regression and Pre-Processing PCA Approach

Directory of Open Access Journals (Sweden)

Sutikno Sutikno

2010-08-01

One of the climate models used to predict climatic conditions is the Global Circulation Model (GCM). A GCM is a computer-based model consisting of different equations; it uses numerical, deterministic equations that follow physical laws. The GCM is a main tool for predicting climate and weather, and it is also used as a primary information source for reviewing the effects of climate change. The Statistical Downscaling (SD) technique is used to bridge the large-scale GCM and a small scale (the study area). GCM data are spatial and temporal, so spatial correlation between data at different grid points within a single domain is likely. This multicollinearity requires pre-processing of the predictor variables X. Continuum Regression (CR) with Principal Component Analysis (PCA) pre-processing is an alternative approach to SD modelling. CR, developed by Stone and Brooks (1990), is a generalization of the Ordinary Least Squares (OLS), Principal Component Regression (PCR), and Partial Least Squares (PLS) methods and is used to overcome multicollinearity problems. Data processing for the stations in Ambon, Pontianak, Losarang, Indramayu, and Yuntinyuat shows that, in the 8x8 and 12x12 domains, the CR method produces better RMSEP and predictive R² values than PCR and PLS.
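PCA pre-processing replaces collinear grid-cell predictors with uncorrelated principal-component scores. The sketch below extracts only the leading component by power iteration on the sample covariance matrix. It does not implement continuum regression itself, and the two-cell data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the sample covariance matrix (power iteration)
    and the corresponding principal-component score of each row."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # start vector, assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    return v, scores


# Hypothetical precipitation at two neighbouring GCM grid cells (strongly collinear).
grid = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]]
component, scores = first_principal_component(grid)
print([round(x, 3) for x in component])
```

A PCR-style downscaling model would then regress the station variable on the leading scores instead of on the raw, collinear grid values.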
Prediction of the neutrons subcritical multiplication using the diffusion hybrid equation with external neutron sources

Energy Technology Data Exchange (ETDEWEB)

Costa da Silva, Adilson; Carvalho da Silva, Fernando [COPPE/UFRJ, Programa de Engenharia Nuclear, Caixa Postal 68509, 21941-914, Rio de Janeiro (Brazil); Senra Martinez, Aquilino, E-mail: aquilino@lmp.ufrj.br [COPPE/UFRJ, Programa de Engenharia Nuclear, Caixa Postal 68509, 21941-914, Rio de Janeiro (Brazil)

2011-07-15

Highlights: > We proposed a new neutron diffusion hybrid equation with external neutron source. > A coarse mesh finite difference method for the adjoint flux and reactivity calculation was developed. > A 1/M curve is used to predict the criticality condition. - Abstract: We used the neutron diffusion hybrid equation, in Cartesian geometry with external neutron sources, to predict the subcritical multiplication of neutrons in a pressurized water reactor, using a 1/M curve to predict the criticality condition. A coarse mesh finite difference method was developed for the adjoint flux calculation and to obtain the reactivity values of the reactor. The results obtained were compared with benchmark values in order to validate the methodology presented in this paper.
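The 1/M method named in the highlights extrapolates the inverse multiplication toward zero to locate criticality. The sketch below covers only that extrapolation step. It assumes count rates measured (or computed) at successive values of a control parameter such as fuel loaded or rod withdrawal. The function names and numbers are hypothetical, and the paper's diffusion and adjoint-flux calculations are not reproduced.

```python
def inverse_multiplication(reference_count_rate, count_rates):
    """1/M = C0 / C for each count rate C, relative to the reference rate C0."""
    return [reference_count_rate / c for c in count_rates]

def extrapolate_critical(x, inv_m):
    """Fit a least-squares line to (x, 1/M) and return the x where 1/M reaches 0."""
    n = len(x)
    if n < 2 or n != len(inv_m):
        raise ValueError("need at least two matching points")
    mx, my = sum(x) / n, sum(inv_m) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, inv_m))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("1/M is not decreasing; criticality cannot be extrapolated")
    intercept = my - slope * mx
    return -intercept / slope

# Hypothetical count rates at four control steps (reference = first count)
counts = [100.0, 125.0, 166.7, 250.0]
critical_step = extrapolate_critical([0, 1, 2, 3], inverse_multiplication(counts[0], counts))
```

In practice the extrapolation is repeated after each step, and the predicted critical point is approached cautiously because 1/M curves are rarely perfectly linear.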

Prediction of the neutrons subcritical multiplication using the diffusion hybrid equation with external neutron sources

International Nuclear Information System (INIS)

Costa da Silva, Adilson; Carvalho da Silva, Fernando; Senra Martinez, Aquilino

2011-01-01

Highlights: → We proposed a new neutron diffusion hybrid equation with external neutron source. → A coarse mesh finite difference method for the adjoint flux and reactivity calculation was developed. → A 1/M curve is used to predict the criticality condition. - Abstract: We used the neutron diffusion hybrid equation, in Cartesian geometry with external neutron sources, to predict the subcritical multiplication of neutrons in a pressurized water reactor, using a 1/M curve to predict the criticality condition. A coarse mesh finite difference method was developed for the adjoint flux calculation and to obtain the reactivity values of the reactor. The results obtained were compared with benchmark values in order to validate the methodology presented in this paper.
Prediction of Pure Component Adsorption Equilibria Using an Adsorption Isotherm Equation Based on Vacancy Solution Theory

DEFF Research Database (Denmark)

Marcussen, Lis; Aasberg-Petersen, K.; Krøll, Annette Elisabeth

2000-01-01

An adsorption isotherm equation for nonideal pure component adsorption based on vacancy solution theory and the Non-Random-Two-Liquid (NRTL) equation is found to be useful for predicting pure component adsorption equilibria at a variety of conditions. The isotherm equation is evaluated successfully...... adsorption systems, spreading pressure and isosteric heat of adsorption are also calculated....
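In vacancy solution theory, the adsorbed phase is treated as a binary solution of adsorbate and vacancies, and nonideality enters through activity coefficients, here from the NRTL model. The sketch below implements the standard binary NRTL activity coefficients. It does not reproduce the paper's full isotherm. Mapping component 1 to the adsorbate and component 2 to the vacancy is an interpretive assumption, and the default non-randomness parameter `alpha=0.3` and the function name are hypothetical.

```python
import math

def nrtl_binary(x1, tau12, tau21, alpha=0.3):
    """Activity coefficients (gamma1, gamma2) from the binary NRTL equation.

    In a vacancy-solution setting, component 1 would be the adsorbate and
    component 2 the vacancy 'solvent' (an assumption, not taken from the paper).
    """
    x2 = 1.0 - x1
    g12 = math.exp(-alpha * tau12)
    g21 = math.exp(-alpha * tau21)
    ln_g1 = x2 ** 2 * (tau21 * (g21 / (x1 + x2 * g21)) ** 2
                       + tau12 * g12 / (x2 + x1 * g12) ** 2)
    ln_g2 = x1 ** 2 * (tau12 * (g12 / (x2 + x1 * g12)) ** 2
                       + tau21 * g21 / (x1 + x2 * g21) ** 2)
    return math.exp(ln_g1), math.exp(ln_g2)
```

Both coefficients tend to 1 for the pure component, and setting both interaction parameters to zero recovers ideal (Langmuir-like) behaviour.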
Computed statistics at streamgages, and methods for estimating low-flow frequency statistics and development of regional regression equations for estimating low-flow frequency statistics at ungaged locations in Missouri

Science.gov (United States)

Southard, Rodney E.

2013-01-01

estimates on one of these streams can be calculated at an ungaged location that has a drainage area between 40 percent of the drainage area of the farthest upstream streamgage and 150 percent of the drainage area of the farthest downstream streamgage along the stream of interest. The second method may be used on any stream with a streamgage that has operated for 10 years or longer and for which anthropogenic effects have not changed the low-flow characteristics at the ungaged location since collection of the streamflow data. The ratio of the drainage area of the stream at the ungaged location to the drainage area of the stream at the streamgage was computed to estimate the statistic at the ungaged location. The range of applicability is between 40 and 150 percent of the drainage area of the streamgage, and the ungaged location must be on the same stream as the streamgage. The third method uses regional regression equations to estimate selected low-flow frequency statistics for unregulated streams in Missouri. This report presents regression equations to estimate frequency statistics for the 10-year recurrence interval and for N-day durations of 1, 2, 3, 7, 10, 30, and 60 days. Basin and climatic characteristics were computed using geographic information system software and digital geospatial data. A total of 35 characteristics were computed for use in preliminary statewide and regional regression analyses based on existing digital geospatial data and previous studies. Spatial analyses for geographical bias in the predictive accuracy of the regional regression equations defined three low-flow regions within the State, representing the three major physiographic provinces in Missouri: Region 1 includes the Central Lowlands, Region 2 the Ozark Plateaus, and Region 3 the Mississippi Alluvial Plain. A total of 207 streamgages were used in the regression analyses for the regional equations. Of the 207 U.S. Geological Survey streamgages, 77 were
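The second method transfers a streamgage statistic to an ungaged site by the drainage-area ratio. The sketch below assumes the simplest form, in which the statistic scales linearly with the ratio (exponent 1); the report's exact formulation is not given in this excerpt. It enforces the 40 to 150 percent applicability range, and the function name and values are hypothetical.

```python
def drainage_area_ratio_estimate(q_gaged, area_gaged, area_ungaged, exponent=1.0):
    """Transfer a low-flow statistic from a streamgage to an ungaged site on the same stream.

    Assumes Q_ungaged = Q_gaged * (A_ungaged / A_gaged) ** exponent, with exponent 1
    by default (an assumption). Raises ValueError outside the 40-150 percent range.
    """
    ratio = area_ungaged / area_gaged
    if not 0.40 <= ratio <= 1.50:
        raise ValueError(f"drainage-area ratio {ratio:.2f} is outside 0.40-1.50")
    return q_gaged * ratio ** exponent

# Hypothetical 7-day, 10-year low flow of 12.0 ft3/s at a gage draining 250 mi2,
# transferred to an ungaged site draining 200 mi2 on the same stream
q_site = drainage_area_ratio_estimate(12.0, 250.0, 200.0)
```

The function does not check the other conditions in the report, namely at least 10 years of record and no anthropogenic change since data collection. Those remain the user's responsibility.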
Fire spread in chaparral – a comparison of laboratory data and model predictions in burning live fuels

Science.gov (United States)

David R. Weise; Eunmo Koo; Xiangyang Zhou; Shankar Mahalingam; Frédéric Morandini; Jacques-Henri Balbi

2016-01-01

Fire behaviour data from 240 laboratory fires in high-density live chaparral fuel beds were compared with model predictions. Logistic regression was used to develop a model to predict fire spread success in the fuel beds and linear regression was used to predict rate of spread. Predictions from the Rothermel equation and three proposed changes as well as two physically...
Development of equations to predict the influence of floor space on average daily gain, average daily feed intake and gain : feed ratio of finishing pigs.

Science.gov (United States)

Flohr, J R; Dritz, S S; Tokach, M D; Woodworth, J C; DeRouchey, J M; Goodband, R D

2018-05-01

Floor space allowance for pigs has substantial effects on pig growth and welfare. Data from 30 papers examining the influence of floor space allowance on the growth of finishing pigs were used in a meta-analysis to develop alternative prediction equations for average daily gain (ADG), average daily feed intake (ADFI) and gain : feed ratio (G : F). Treatment means were compiled in a database that contained 30 papers for ADG and 28 papers for ADFI and G : F. The predictor variables evaluated were floor space (m²/pig), k (floor space/final BW^0.67), initial BW, final BW, feed space (pigs per feeder hole), water space (pigs per waterer), group size (pigs per pen), gender, floor type and study length (d). Multivariable general linear mixed model regression equations were used. Floor space treatments within each experiment were the observational and experimental unit. The optimum equations to predict ADG, ADFI and G : F were: ADG (g) = 337.57 + (16 468 × k) - (237 350 × k²) - (3.1209 × initial BW (kg)) + (2.569 × final BW (kg)) + (71.6918 × k × initial BW (kg)); ADFI (g) = 833.41 + (24 785 × k) - (388 998 × k²) - (3.0027 × initial BW (kg)) + (11.246 × final BW (kg)) + (187.61 × k × initial BW (kg)); G : F = predicted ADG/predicted ADFI. Overall, the meta-analysis indicates that BW is an important predictor of ADG and ADFI even after computing the coefficient k, which uses final BW in its calculation. This suggests that including initial and final BW improves the prediction over using k alone as a predictor. In addition, the analysis indicated that G : F of finishing pigs is influenced by floor space allowance, whereas individual studies have reported inconsistent results.
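The published ADG and ADFI equations can be evaluated directly. The sketch below transcribes them, computing k as floor space divided by final BW raised to the 0.67 power, with units as stated (m²/pig, kg, g/d). The function names and the example pen are hypothetical, and the equations should only be applied within the range of the meta-analysis data.

```python
def space_allowance_k(floor_space_m2, final_bw_kg):
    """Allometric space coefficient k = floor space / final BW^0.67."""
    return floor_space_m2 / final_bw_kg ** 0.67

def predict_finishing_performance(floor_space_m2, initial_bw_kg, final_bw_kg):
    """Return (ADG g/d, ADFI g/d, G:F) from the meta-analysis equations."""
    k = space_allowance_k(floor_space_m2, final_bw_kg)
    adg = (337.57 + 16468 * k - 237350 * k ** 2
           - 3.1209 * initial_bw_kg + 2.569 * final_bw_kg
           + 71.6918 * k * initial_bw_kg)
    adfi = (833.41 + 24785 * k - 388998 * k ** 2
            - 3.0027 * initial_bw_kg + 11.246 * final_bw_kg
            + 187.61 * k * initial_bw_kg)
    return adg, adfi, adg / adfi  # G:F is predicted ADG / predicted ADFI

# Hypothetical pen: 0.70 m2/pig, pigs growing from 25 kg to 120 kg
adg, adfi, gf = predict_finishing_performance(0.70, 25.0, 120.0)
```

For this example k is roughly 0.028, which gives an ADG near 895 g/d and an ADFI near 2630 g/d.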
Support vector regression model based predictive control of water level of U-tube steam generators

Energy Technology Data Exchange (ETDEWEB)

Kavaklioglu, Kadir, E-mail: kadir.kavaklioglu@pau.edu.tr

2014-10-15

Highlights: • Water level of U-tube steam generators was controlled in a model predictive fashion. • Models for steam generator water level were built using support vector regression. • Cost function minimization for future optimal controls was performed using the steepest descent method. • The results indicated the feasibility of the proposed method. - Abstract: A predictive control algorithm using support vector regression based models was proposed for controlling the water level of U-tube steam generators of pressurized water reactors. Steam generator data were obtained using a transfer function model of U-tube steam generators. Support vector regression based models were built using a time series type model structure for five different operating powers. Feedwater flow controls were calculated by minimizing a cost function that includes the level error, the feedwater change and the mismatch between feedwater and steam flow rates. The proposed algorithm was applied to a scenario consisting of a level setpoint change and a steam flow disturbance. The results showed that the steam generator level can be controlled effectively at all powers by the proposed method.
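The controller chooses the feedwater flow that minimizes a cost combining level error, feedwater change, and feedwater/steam mismatch, and it does so by steepest descent. The sketch below shows that optimization for a one-step horizon. As a simplification, a hypothetical linear level model (level change = gain × (feedwater - steam flow)) stands in for the paper's support vector regression model. All weights, names, and values are hypothetical.

```python
def cost_gradient(u, level, level_ref, u_prev, steam, gain, w_du, w_mis):
    """dJ/du for J(u) = (ref - level_next)^2 + w_du (u - u_prev)^2 + w_mis (u - steam)^2."""
    level_next = level + gain * (u - steam)   # hypothetical one-step level model
    return (-2.0 * gain * (level_ref - level_next)
            + 2.0 * w_du * (u - u_prev)
            + 2.0 * w_mis * (u - steam))

def steepest_descent_control(level, level_ref, u_prev, steam, gain=0.5,
                             w_du=0.1, w_mis=0.1, step=0.5, max_iter=1000, tol=1e-10):
    """Minimize the one-step cost over feedwater flow u by fixed-step steepest descent."""
    u = u_prev  # warm start from the previous control
    for _ in range(max_iter):
        g = cost_gradient(u, level, level_ref, u_prev, steam, gain, w_du, w_mis)
        if abs(g) < tol:
            break
        u -= step * g
    return u
```

Because this toy cost is quadratic in u, the result can be checked against the closed-form minimizer. With an SVR model, the gradient would instead come from the kernel expansion or from finite differences, and the step size would need tuning.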
Prediction of Spirometric Forced Expiratory Volume (FEV1) Data Using Support Vector Regression

Science.gov (United States)

Kavitha, A.; Sujatha, C. M.; Ramakrishnan, S.

2010-01-01

In this work, prediction of forced expiratory volume in 1 second (FEV1) in pulmonary function tests is carried out using the spirometer and support vector regression analysis. Pulmonary function data are measured with a flow volume spirometer from volunteers (N=175) using a standard data acquisition protocol. The acquired data are then used to predict FEV1. Support vector machines with polynomial kernel functions of four different orders were employed to predict the values of FEV1. The performance is evaluated by computing the average prediction accuracy for normal and abnormal cases. Results show that support vector machines are capable of predicting FEV1 in both normal and abnormal cases and that the average prediction accuracy for normal subjects was higher than that for abnormal subjects. Prediction accuracy was found to be high for a regularization constant of C=10. Since FEV1 is the most significant parameter in the analysis of spirometric data, this method of assessment appears useful in diagnosing pulmonary abnormalities with incomplete data and data with poor recording.
Genomic prediction based on data from three layer lines using non-linear regression models

NARCIS (Netherlands)

Huang, H.; Windig, J.J.; Vereijken, A.; Calus, M.P.L.

2014-01-01

Background - Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. Methods - In an attempt to alleviate
Gaussian process regression for tool wear prediction

Science.gov (United States)

Kong, Dongdong; Chen, Yongjie; Li, Ning

2018-05-01

To realize and accelerate the pace of intelligent manufacturing, this paper presents a novel tool wear assessment technique based on integrated radial basis function based kernel principal component analysis (KPCA_IRBF) and Gaussian process regression (GPR) for accurate, real-time monitoring of in-process tool wear parameters (flank wear width). KPCA_IRBF is a new nonlinear dimension-increment technique, first proposed here for feature fusion. The GPR model provides both the predicted tool wear value and the corresponding confidence interval. In addition, GPR outperforms artificial neural networks (ANN) and support vector machines (SVM) in prediction accuracy because Gaussian noise can be modeled quantitatively in the GPR model. However, noise seriously affects the stability of the confidence interval. In this work, the proposed KPCA_IRBF technique helps remove the noise and weaken its negative effects, greatly narrowing and smoothing the confidence interval, which is conducive to accurate tool wear monitoring. Moreover, the kernel parameter in KPCA_IRBF can be selected from a much larger region than in the conventional KPCA_RBF technique, which improves the efficiency of model construction. Ten sets of cutting tests are conducted to validate the effectiveness of the presented tool wear assessment technique. The experimental results show that the in-process flank wear width of tool inserts can be monitored accurately by the presented technique, which is robust under a variety of cutting conditions. This study lays the foundation for tool wear monitoring in real industrial settings.
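GPR yields both a predicted wear value and a confidence interval. The sketch below shows exact GPR prediction with a squared-exponential (RBF) kernel on a single scalar feature. It omits the KPCA_IRBF feature fusion and fixes the hyperparameters instead of optimizing them. The names and data are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return variance * np.exp(-0.5 * (a - b) ** 2 / length_scale ** 2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-4, length_scale=1.0, variance=1.0):
    """Posterior mean and 95% interval of the latent function (zero-mean GP on centered y)."""
    y = np.asarray(y_train, dtype=float)
    y_mean = y.mean()
    K = rbf_kernel(x_train, x_train, length_scale, variance) + noise_var * np.eye(len(y))
    K_s = rbf_kernel(x_train, x_test, length_scale, variance)
    L = np.linalg.cholesky(K)                         # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    mean = y_mean + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v ** 2, axis=0)           # diag of K(x_test, x_test) is `variance`
    sd = np.sqrt(np.maximum(var, 0.0))
    return mean, mean - 1.96 * sd, mean + 1.96 * sd
```

The interval widens away from the training inputs. This is the behaviour that lets a wear monitor flag predictions it should not trust.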
Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days

Energy Technology Data Exchange (ETDEWEB)

Bramer, L. M.; Rounds, J.; Burleyson, C. D.; Fortin, D.; Hathaway, J.; Rice, J.; Kraucunas, I.

2017-11-01

Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and its relationship with weather conditions are examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. Including other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and datasets were examined. A penalized logistic regression model fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at different time scales was examined. Maximum temperature and precipitation were identified as important across all zones, while the importance of other weather variables was zone specific. The methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
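The abstract does not say which penalty was used. As an assumption, the sketch below fits an L2 (ridge) penalized logistic regression by batch gradient descent in plain Python, using standardized weather predictors. The feature layout (maximum temperature, precipitation), names, and data are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(X, y, lam=0.1, lr=0.5, epochs=3000):
    """Minimize mean negative log-likelihood + (lam / 2) * ||w||^2 (intercept unpenalized)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
    return w, b

def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical standardized [max temperature, precipitation] and stress-day labels
X = [[-1.5, 0.5], [-1.0, 1.0], [-0.5, 0.0], [0.0, 0.5],
     [0.5, -0.5], [1.0, -1.0], [1.5, 0.0], [2.0, -0.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_ridge_logistic(X, y)
```

The penalty keeps the coefficients finite even when stress days are perfectly separable by temperature, which is common in small zone-level datasets. Lasso or elastic-net penalties would additionally perform variable selection.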
Regression tree analysis for predicting body weight of Nigerian Muscovy duck (Cairina moschata)

Directory of Open Access Journals (Sweden)

Oguntunji Abel Olusegun

2017-01-01

Morphometric parameters and their indices are central to understanding the type and function of livestock. The present study was conducted to predict the body weight (BWT) of adult Nigerian Muscovy ducks from nine morphometric parameters and seven body indices, and to identify the most important predictor of BWT among them using regression tree analysis (RTA). The experimental birds comprised 1,020 adult male and female Nigerian Muscovy ducks randomly sampled in the Rain Forest (203), Guinea Savanna (298) and Derived Savanna (519) agro-ecological zones. The RTA revealed that compactness, body girth and massiveness were the most important independent variables in predicting BWT and were used in constructing the regression tree (RT). The combined effect of the three predictors was very high and explained 91.00% of the observed variation in the target variable (BWT). The optimal regression tree suggested that Muscovy ducks with compactness >5.765 would be fleshy and have the highest BWT. The results of the present study could be exploited by animal breeders and breeding companies in the selection and improvement of BWT of Muscovy ducks.
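A regression tree is grown by repeatedly choosing the split that most reduces the residual sum of squares. The sketch below finds the best single split on one predictor (a "stump"). A full tree, such as the one that produced the compactness >5.765 split, applies this search recursively across all predictors with stopping rules. The names and data are hypothetical.

```python
def _sse(values):
    """Residual sum of squares around the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total_sse, left_mean, right_mean) for the best binary split on x."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = _sse(left) + _sse(right)
        if best is None or total < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, total, sum(left) / len(left), sum(right) / len(right))
    return best

# Hypothetical compactness index and body weight (kg)
compactness = [5.0, 5.2, 5.5, 5.9, 6.1, 6.3]
bwt = [1.6, 1.7, 1.65, 2.3, 2.4, 2.35]
threshold, sse, left_mean, right_mean = best_split(compactness, bwt)
```

Each leaf predicts the mean response of its birds, so the split above separates a lighter group from a heavier one.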
Predictive market segmentation model: An application of logistic regression model and CHAID procedure

Directory of Open Access Journals (Sweden)

Soldić-Aleksić Jasna

2009-01-01

Market segmentation is one of the key concepts of modern marketing. Its main goal is to create groups (segments) of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of a specific product or service. Companies can create a specific marketing plan for each of these segments and thereby gain a short- or long-term competitive advantage in the market. Depending on the specific marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of a logistic regression model and CHAID analysis. The logistic regression model was used for variable selection, identifying from an initial pool of eleven variables those that are statistically significant in explaining the dependent variable. The selected variables were then included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented for a concrete empirical example in the following form: summary model results, CHAID tree, Gain chart, Index chart, and risk and classification tables.
Predicting basal metabolic rates in Malaysian adult elite athletes.

Science.gov (United States)

Wong, Jyh Eiin; Poh, Bee Koon; Nik Shanita, Safii; Izham, Mohd Mohamad; Chan, Kai Quin; Tai, Meng De; Ng, Wei Wei; Ismail, Mohd Noor

2012-11-01

This study aimed to measure the basal metabolic rate (BMR) of elite athletes and develop a gender-specific predictive equation to estimate their energy requirements. Ninety-two men and 33 women (aged 18-31 years) from 15 sports, who had been training six hours daily for at least one year, were included in the study. Body composition was measured using the bioimpedance technique, and BMR by indirect calorimetry. The differences between measured and estimated BMR using various predictive equations were calculated. The novel equation derived from stepwise multiple regression was evaluated using Bland and Altman analysis. The predictive equations of Cunningham and the Food and Agriculture Organization/World Health Organization/United Nations University either over- or underestimated the measured BMR by up to ± 6%, while the equations of Ismail et al., developed from the local non-athletic population, underestimated the measured BMR by 14%. The novel predictive equation for the BMR of athletes was BMR (kcal/day) = 669 + 13 (weight in kg) + 192 (gender: 1 for men and 0 for women) (R² = 0.548; standard error of estimate 163 kcal). Predicted BMRs of elite athletes by this equation were within 1.2% ± 9.5% of the measured BMR values. The novel predictive equation presented in this study can be used to calculate BMR for adult Malaysian elite athletes. Further studies may be required to validate its predictive capabilities for other sports, nationalities and age groups.
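The published athlete equation is simple to apply. The sketch below uses a hypothetical function name. With a standard error of estimate of 163 kcal, individual predictions can be off by several hundred kcal/day, and the equation was derived only for adult Malaysian elite athletes.

```python
def athlete_bmr_kcal_per_day(weight_kg, is_male):
    """BMR (kcal/day) = 669 + 13 * weight (kg) + 192 * gender (1 = men, 0 = women)."""
    return 669 + 13 * weight_kg + 192 * (1 if is_male else 0)

print(athlete_bmr_kcal_per_day(70, True))    # 1771
print(athlete_bmr_kcal_per_day(55, False))   # 1384
```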
MULTIPLE LINEAR REGRESSION ANALYSIS FOR PREDICTION OF BOILER LOSSES AND BOILER EFFICIENCY

OpenAIRE

Chayalakshmi C.L

2018-01-01

Calculation of boiler efficiency is essential if its parameters need to be controlled for either maintaining or enhancing its efficiency. However, determining boiler efficiency by the conventional method is time consuming and very expensive; hence, it is not practical to determine boiler efficiency frequently. The work presented in this paper deals with establishing the statistical mo...
A prediction equation for enteric methane emission from dairy cows for use in NorFor

DEFF Research Database (Denmark)

Nielsen, N I; Volden, H; Åkerlind, M

2013-01-01

A data-set with 47 treatment means (N = 211) was compiled from research institutions in Denmark, Norway, and Sweden in order to develop a prediction equation for enteric methane (CH4) emissions from dairy cows. The aim was to implement the equation in the Nordic feed evaluation system NorFor. The...
Prediction-Oriented Marker Selection (PROMISE): With Application to High-Dimensional Regression.

Science.gov (United States)

Kim, Soyeon; Baladandayuthapani, Veerabhadran; Lee, J Jack

2017-06-01

In personalized medicine, biomarkers are used to select therapies with the highest likelihood of success based on an individual patient's biomarker/genomic profile. Two goals are to choose important biomarkers that accurately predict treatment outcomes and to cull unimportant biomarkers to reduce the cost of biological and clinical verifications. These goals are challenging due to the high dimensionality of genomic data. Variable selection methods based on penalized regression (e.g., the lasso and elastic net) have yielded promising results. However, selecting the right amount of penalization is critical to simultaneously achieving these two goals. Standard approaches based on cross-validation (CV) typically provide high prediction accuracy with high true positive rates but at the cost of too many false positives. Alternatively, stability selection (SS) controls the number of false positives, but at the cost of yielding too few true positives. To circumvent these issues, we propose prediction-oriented marker selection (PROMISE), which combines SS with CV to retain the advantages of both methods. Our application of PROMISE with the lasso and elastic net in data analysis shows that, compared to CV, PROMISE produces sparse solutions, few false positives, and small type I + type II error, and maintains good prediction accuracy, with a marginal decrease in the true positive rates. Compared to SS, PROMISE offers better prediction accuracy and true positive rates. In summary, PROMISE can be applied in many fields to select regularization parameters when the goals are to minimize false positives and maximize prediction accuracy.
Fine-Tuning Nonhomogeneous Regression for Probabilistic Precipitation Forecasts: Unanimous Predictions, Heavy Tails, and Link Functions

DEFF Research Database (Denmark)

Gebetsberger, Manuel; Messner, Jakob W.; Mayr, Georg J.

2017-01-01

......to obtain automatically corrected weather forecasts. This study applies the nonhomogeneous regression framework as a state-of-the-art ensemble postprocessing technique to predict a full forecast distribution and improves its forecast performance with three statistical refinements. First of all, a novel split...... functions for the optimization of regression coefficients for the scale parameter. These three refinements are tested for 10 stations in a small area of the European Alps for lead times from +24 to +144 h and accumulation periods of 24 and 6 h. Together, they improve probabilistic forecasts for precipitation amounts as well as the probability of precipitation events over the default postprocessing method. The improvements are largest for the shorter accumulation periods and shorter lead times, where the information of unanimous ensemble predictions is more important.
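Nonhomogeneous regression models the location and scale of the forecast distribution as functions of ensemble statistics, and such forecasts are commonly evaluated with the continuous ranked probability score (CRPS). The sketch below is a simplified Gaussian version with a log link for the scale. Precipitation models in practice use censored (and, per the abstract, possibly heavy-tailed) distributions, and the coefficient names and values here are hypothetical.

```python
import math

def ngr_predict(ens_mean, ens_sd, a, b, c, d):
    """Gaussian NGR: mu = a + b * ens_mean, log(sigma) = c + d * ens_sd.

    With the log link, sigma stays positive even when all members agree (ens_sd == 0).
    """
    mu = a + b * ens_mean
    sigma = math.exp(c + d * ens_sd)
    return mu, sigma

def crps_normal(y, mu, sigma):
    """Closed-form CRPS of a normal forecast N(mu, sigma^2) for observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))
```

A unanimous ensemble (`ens_sd == 0`) still yields a valid scale of `exp(c)`. Such cases make the choice of link function for the scale parameter matter.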
Estimating the Accuracy of the Chedoke–McMaster Stroke Assessment Predictive Equations for Stroke Rehabilitation

Science.gov (United States)

Dang, Mia; Ramsaran, Kalinda D.; Street, Melissa E.; Syed, S. Noreen; Barclay-Goddard, Ruth; Miller, Patricia A.

2011-01-01

ABSTRACT Purpose: To estimate the predictive accuracy and clinical usefulness of the Chedoke–McMaster Stroke Assessment (CMSA) predictive equations. Method: A longitudinal prognostic study using historical data obtained from 104 patients admitted post cerebrovascular accident was undertaken. Data were abstracted for all patients undergoing rehabilitation post stroke who also had documented admission and discharge CMSA scores. Published predictive equations were used to determine predicted outcomes. To determine the accuracy and clinical usefulness of the predictive model, shrinkage coefficients and predictions with 95% confidence bands were calculated. Results: Complete data were available for 74 patients with a mean age of 65.3±12.4 years. The shrinkage values for the six Impairment Inventory (II) dimensions varied from −0.05 to 0.09; the shrinkage value for the Activity Inventory (AI) was 0.21. The error associated with predictive values was greater than ±1.5 stages for the II dimensions and greater than ±24 points for the AI. Conclusions: This study shows that the large error associated with the predictions (as defined by the confidence band) for the CMSA II and AI limits their clinical usefulness as a predictive measure. Further research to establish predictive models using alternative statistical procedures is warranted. PMID:22654239
Structural equation modelling based data fusion for technology forecasting: A generic framework

CSIR Research Space (South Africa)

Staphorst, L

2013-07-01

......to explain the variations in dependent variables as functions (commonly referred to as regression functions) of variations in independent variables [13]. With this knowledge it is then possible to perform prediction and forecasting of the values that dependent....G.; “A General Method for Estimating a Linear Structural Equation System,” in Structural Equation Models in the Social Sciences, eds.: A.S. Goldberger and O. D. Duncan, New York: Seminar, 1973. [15] Steinberg, A.N. and Rogova, G.; "Situation...
Multivariate Prediction Equations for HbA1c Lowering, Weight Change, and Hypoglycemic Events Associated with Insulin Rescue Medication in Type 2 Diabetes Mellitus: Informing Economic Modeling.

Science.gov (United States)

Willis, Michael; Asseburg, Christian; Nilsson, Andreas; Johnsson, Kristina; Kartman, Bernt

2017-03-01

Type 2 diabetes mellitus (T2DM) is chronic and progressive, and the cost-effectiveness of new treatment interventions must be established over long time horizons. Given the limited durability of drugs, assumptions regarding downstream rescue medication can drive results. Especially for insulin, for which treatment effects and adverse events are known to depend on patient characteristics, this can be problematic for health economic evaluation involving modeling. We aimed to estimate parsimonious multivariate equations of treatment effects and hypoglycemic event risks for use in parameterizing insulin rescue therapy in model-based cost-effectiveness analysis. Clinical evidence for insulin use in T2DM was identified in PubMed and from published reviews and meta-analyses. Study and patient characteristics and treatment effects and adverse event rates were extracted and the data used to estimate parsimonious treatment effect and hypoglycemic event risk equations using multivariate regression analysis. Data from 91 studies featuring 171 usable study arms were identified, mostly for premix and basal insulin types. Multivariate prediction equations for glycated hemoglobin A1c lowering and weight change were estimated separately for insulin-naive and insulin-experienced patients. Goodness of fit (R²) for both outcomes was generally good, ranging from 0.44 to 0.84. Multivariate prediction equations for symptomatic, nocturnal, and severe hypoglycemic events were also estimated, though considerable heterogeneity in definitions limits their usefulness. Parsimonious and robust multivariate prediction equations were estimated for glycated hemoglobin A1c and weight change, separately for insulin-naive and insulin-experienced patients. Using these in economic simulation modeling in T2DM can improve realism and flexibility in modeling insulin rescue medication. Copyright © 2017 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All

Comparison of Two Creatinine-Based Equations for Predicting Decline in Renal Function in Type 2 Diabetic Patients with Nephropathy in a Korean Population

Directory of Open Access Journals (Sweden)

Eun Young Lee

2013-01-01

Aim. To compare two creatinine-based estimated glomerular filtration rate (eGFR) equations, the chronic kidney disease epidemiology collaboration (CKD-EPI) and the modification of diet in renal disease (MDRD) equations, for predicting the risk of CKD progression in type 2 diabetic patients with nephropathy. Methods. A total of 707 type 2 diabetic patients with 24 hr urinary albumin excretion of more than 30 mg/day were retrospectively recruited and traced until doubling of baseline serum creatinine (SCr) levels was noted. Results. During the follow-up period (median, 2.4 years), the CKD-EPI equation reclassified 10.9% of all MDRD-estimated subjects: 9.1% to an earlier stage of CKD and 1.8% to a later stage of CKD. Overall, the prevalence of CKD (eGFR < 60 mL/min/1.73 m²) was lowered from 54% to 51.6% by applying the CKD-EPI equation. On Cox regression analysis, both equations exhibited significant associations with an increased risk of doubling of SCr. However, only the CKD-EPI equation maintained a significant hazard ratio for doubling of SCr in earlier-stage CKD (eGFR ≥ 45 mL/min/1.73 m²) when compared to stage 1 CKD (eGFR ≥ 90 mL/min/1.73 m²). Conclusion. In regard to CKD progression, these results suggest that the CKD-EPI equation might stratify earlier-stage CKD among type 2 diabetic patients with nephropathy more accurately than the MDRD study equation.
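The two compared equations have well-known published forms. The sketch below implements the 2009 CKD-EPI creatinine equation, the 4-variable MDRD equation with the IDMS-traceable coefficient of 175, and KDIGO GFR categories. The abstract does not state which MDRD coefficient version was used or whether any ethnicity factor was applied in this Korean cohort, so those choices are assumptions, and the function names are hypothetical.

```python
def ckd_epi_2009(scr_mg_dl, age_years, female, black=False):
    """CKD-EPI 2009 creatinine equation; eGFR in mL/min/1.73 m2."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age_years
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr

def mdrd_4v(scr_mg_dl, age_years, female, black=False):
    """4-variable MDRD equation (assumed IDMS-traceable 175 version); mL/min/1.73 m2."""
    egfr = 175 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr

def ckd_gfr_category(egfr):
    """KDIGO GFR category (G1 >= 90, G2 60-89, G3a 45-59, G3b 30-44, G4 15-29, G5 < 15)."""
    for cutoff, label in ((90, "G1"), (60, "G2"), (45, "G3a"), (30, "G3b"), (15, "G4")):
        if egfr >= cutoff:
            return label
    return "G5"
```

Applying both functions to the same patients and comparing `ckd_gfr_category` outputs is the kind of reclassification the study reports. CKD-EPI generally yields higher eGFR than MDRD at near-normal creatinine values.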
Validity of impedance-based equations for the prediction of total body water as measured by deuterium dilution in African women

International Nuclear Information System (INIS)

Dioum, Aissatou S.; Cisse, Aita; Wade, Salimata; Gartner, Agnes; Delpeuch, Francis; Maire, Bernard; Schutz, Yves

2005-01-01

Background: Little information is available on the validity of simple and indirect body-composition methods in non-Western populations. Equations for predicting body composition are population-specific, and body composition differs between blacks and whites. Objective: We tested the hypothesis that the validity of equations for predicting total body water (TBW) from bioelectrical impedance analysis measurements is likely to depend on the racial background of the group from which the equations were derived. Design: The hypothesis was tested by comparing, in 36 African women, TBW values measured by deuterium dilution with those predicted by 23 equations developed in white, African American, or African subjects. These cross-validations in our African sample were also compared, whenever possible, with results from other studies in black subjects. Results: Errors in predicting TBW showed acceptable values (1.3-1.9 kg) in all cases, whereas a large range of bias (0.2-6.1 kg) was observed independently of the ethnic origin of the sample from which the equations were derived. Three equations (2 from whites and 1 from blacks) showed nonsignificant bias and could be used in Africans. In all other cases, we observed either an overestimation or underestimation of TBW with variable bias values, regardless of racial background, yielding no clear trend for validity as a function of ethnic origin. Conclusions: The findings of this cross-validation study emphasize the need for further fundamental research to explore the causes of the poor validity of TBW prediction equations across populations rather than the need to develop new prediction equations for use in Africa. (Authors)
Hand-held indirect calorimeter offers advantages compared with prediction equations, in a group of overweight women, to determine resting energy expenditures and estimated total energy expenditures during research screening.

Science.gov (United States)

Spears, Karen E; Kim, Hyunsook; Behall, Kay M; Conway, Joan M

2009-05-01

To compare standardized prediction equations to a hand-held indirect calorimeter in estimating resting energy and total energy requirements in overweight women. Resting energy expenditure (REE) was measured by hand-held indirect calorimeter and calculated by prediction equations Harris-Benedict, Mifflin-St Jeor, World Health Organization/Food and Agriculture Organization/United Nations University (WHO), and Dietary Reference Intakes (DRI). Physical activity level, assessed by questionnaire, was used to estimate total energy expenditure (TEE). Subjects (n=39) were female nonsmokers older than 25 years of age with body mass index more than 25. Repeated measures analysis of variance, Bland-Altman plot, and fitted regression line of difference. A difference within ±10% of two methods indicated agreement. Significant proportional bias was present between hand-held indirect calorimeter and prediction equations for REE and TEE (Pvalues and underestimated at higher values. Mean differences (±standard error) for REE and TEE between hand-held indirect calorimeter and Harris-Benedict were -5.98±46.7 kcal/day (P=0.90) and 21.40±75.7 kcal/day (P=0.78); between hand-held indirect calorimeter and Mifflin-St Jeor were 69.93±46.7 kcal/day (P=0.14) and 116.44±75.9 kcal/day (P=0.13); between hand-held indirect calorimeter and WHO were -22.03±48.4 kcal/day (P=0.65) and -15.8±77.9 kcal/day (P=0.84); and between hand-held indirect calorimeter and DRI were 39.65±47.4 kcal/day (P=0.41) and 56.36±85.5 kcal/day (P=0.51). Less than 50% of predictive equation values were within ±10% of hand-held indirect calorimeter values, indicating poor agreement. A significant discrepancy between predicted and measured energy expenditure was observed. Further evaluation of hand-held indirect calorimeter research screening is needed.
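Two of the compared prediction equations have widely published forms for women, and the agreement criterion is a simple proportion. The sketch below uses the original Harris-Benedict coefficients for women and the Mifflin-St Jeor equation for women. It assumes the ±10% band is taken relative to the calorimeter value. The abstract does not state which coefficient versions the study used, and the names and data are hypothetical.

```python
def mifflin_st_jeor_female(weight_kg, height_cm, age_years):
    """Mifflin-St Jeor resting energy expenditure for women (kcal/day)."""
    return 10 * weight_kg + 6.25 * height_cm - 5 * age_years - 161

def harris_benedict_female(weight_kg, height_cm, age_years):
    """Original Harris-Benedict resting energy expenditure for women (kcal/day)."""
    return 655.1 + 9.563 * weight_kg + 1.850 * height_cm - 4.676 * age_years

def fraction_within(predicted, measured, tolerance=0.10):
    """Share of predictions within +/- tolerance (relative to the measured value)."""
    hits = sum(1 for p, m in zip(predicted, measured) if abs(p - m) <= tolerance * m)
    return hits / len(measured)
```

A mean difference near zero can coexist with a low within-10% fraction. This is why the abstract reports both and still concludes that agreement is poor.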
Regression equations for calculation of z scores for echocardiographic measurements of right heart structures in healthy Han Chinese children.

Science.gov (United States)

Wang, Shan-Shan; Zhang, Yu-Qi; Chen, Shu-Bao; Huang, Guo-Ying; Zhang, Hong-Yan; Zhang, Zhi-Fang; Wu, Lan-Ping; Hong, Wen-Jing; Shen, Rong; Liu, Yi-Qing; Zhu, Jun-Xue

2017-06-01

Clinical decision making in children with congenital and acquired heart disease relies on measurements of cardiac structures using two-dimensional echocardiography. We aimed to establish z-score regression equations for right heart structures in healthy Han Chinese children. Two-dimensional and M-mode echocardiography were performed in 515 children. We measured the dimensions of the pulmonary valve annulus (PVA), main pulmonary artery (MPA), left pulmonary artery (LPA), right pulmonary artery (RPA), right ventricular outflow tract at end-diastole (RVOTd) and at end-systole (RVOTs), tricuspid valve annulus (TVA), right ventricular inflow tract at end-diastole (RVIDd) and at end-systole (RVIDs), and right atrium (RA). Regression analyses were conducted to relate the measurements of right heart structures to body surface area (BSA). Right ventricular outflow-tract fractional shortening (RVOTFS) was also calculated. Several models were tested, and the best model was chosen to build a z-score calculator. PVA, MPA, LPA, RPA, RVOTd, RVOTs, TVA, RVIDd, RVIDs, and RA (R² = 0.786, 0.705, 0.728, 0.701, 0.706, 0.824, 0.804, 0.663, 0.626, and 0.793, respectively) had a cubic polynomial relationship with BSA; specifically, $M = \beta_0 + \beta_1\,\mathrm{BSA} + \beta_2\,\mathrm{BSA}^2 + \beta_3\,\mathrm{BSA}^3$, where $M$ is the measurement. RVOTFS (0.28 ± 0.02) fell within a narrow range (0.12-0.51). Our results provide reference values for z scores and regression equations for right heart structures in Han Chinese children. These data may help in interpreting routine clinical measurements of right heart structures in children with congenital or acquired heart disease. © 2016 Wiley Periodicals, Inc. J Clin Ultrasound 45:293-303, 2017. © 2017 Wiley Periodicals, Inc.
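A z-score calculator built on the cubic model $M = \beta_0 + \beta_1\,\mathrm{BSA} + \beta_2\,\mathrm{BSA}^2 + \beta_3\,\mathrm{BSA}^3$ can be sketched as follows: fit the polynomial, estimate the residual standard deviation, and express an observed measurement as (observed - predicted) / SD. This is a simplified sketch that assumes a constant residual SD across BSA (the published calculator may model the spread differently), and the reference data are synthetic.

```python
import numpy as np

def fit_cubic_reference(bsa, measurement):
    """Fit M = b0 + b1*BSA + b2*BSA^2 + b3*BSA^3 by ordinary least squares."""
    bsa = np.asarray(bsa, dtype=float)
    measurement = np.asarray(measurement, dtype=float)
    coefs = np.polyfit(bsa, measurement, 3)          # returned highest power first: b3..b0
    residuals = measurement - np.polyval(coefs, bsa)
    resid_sd = residuals.std(ddof=4)                 # 4 fitted parameters
    return coefs, resid_sd

def z_score(value, bsa, coefs, resid_sd):
    """Standardized deviation of an observed measurement from the BSA-predicted mean."""
    return (value - np.polyval(coefs, bsa)) / resid_sd

# Synthetic reference data (BSA in m^2, measurement in mm), illustration only
bsa = np.linspace(0.2, 2.0, 50)
measured = 5 + 10 * bsa - 2 * bsa**2 + 0.5 * bsa**3 + 0.1 * np.sin(np.arange(50))
coefs, resid_sd = fit_cubic_reference(bsa, measured)
z = z_score(14.0, 1.0, coefs, resid_sd)   # hypothetical child with BSA 1.0 m^2
```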
Predictive based monitoring of nuclear plant component degradation using support vector regression

International Nuclear Information System (INIS)

Agarwal, Vivek; Alamaniotis, Miltiadis; Tsoukalas, Lefteri H.

2015-01-01

Nuclear power plants (NPPs) are large installations composed of many active and passive assets. Degradation monitoring of all these assets is an expensive (labor-intensive) and highly demanding task. In this paper, a framework based on Support Vector Regression (SVR) is proposed for online surveillance of critical parameter degradation of NPP components. Timely replacement or maintenance of components will prevent potential plant malfunctions and reduce the overall operational cost. In the current work, we apply SVR equipped with a Gaussian kernel function to monitor components. Monitoring includes one-step-ahead prediction of the component's respective operational quantity using the SVR model, which is trained on a set of previously recorded degradation histories of similar components. The predictive capability of the model is evaluated upon arrival of a sensor measurement, which is compared to the component failure threshold. A maintenance decision is based on a fuzzy inference system that uses three parameters: (i) prediction evaluation in the previous steps, (ii) the predicted value of the current step, and (iii) the difference between the current predicted value and the component's failure threshold. The proposed framework will be tested on turbine blade degradation data.
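The one-step-ahead monitoring step described here can be sketched with scikit-learn's `SVR` using an RBF (Gaussian) kernel: degradation histories of similar components are cut into lagged windows, the model predicts the next value, and the prediction is compared with a failure threshold. This is a hedged sketch, not the authors' framework; the window length, `C`, `epsilon`, threshold, and synthetic histories are all hypothetical, and the fuzzy inference stage is not shown.

```python
import numpy as np
from sklearn.svm import SVR

def make_windows(series, lag):
    """Turn one degradation history into (lagged inputs, next value) pairs."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

def train_svr(histories, lag=3):
    """Train a single Gaussian-kernel SVR on windows pooled from several histories."""
    pairs = [make_windows(h, lag) for h in histories]
    X = np.vstack([p[0] for p in pairs])
    y = np.concatenate([p[1] for p in pairs])
    model = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma="scale")  # hypothetical settings
    return model.fit(X, y)

def one_step_ahead(model, recent, threshold):
    """Predict the next value and return it with its margin to the failure threshold."""
    pred = float(model.predict(np.asarray(recent, dtype=float).reshape(1, -1))[0])
    return pred, threshold - pred

# Synthetic, roughly linear degradation histories (illustration only)
histories = [np.linspace(0.0, 1.0, 30), np.linspace(0.0, 1.2, 30)]
model = train_svr(histories, lag=3)
pred, margin = one_step_ahead(model, [0.50, 0.55, 0.60], threshold=1.5)
```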
Comparison of some biased estimation methods (including ordinary subset regression) in the linear model

Science.gov (United States)

Sidik, S. M.

1975-01-01

Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.
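The observation that, as the normal equations approach singularity, estimable functions of the parameters are better determined than the full parameter vector can be illustrated with a small NumPy sketch comparing ordinary least squares with the ridge estimator $(X^\top X + kI)^{-1}X^\top y$ on two nearly collinear predictors. The data are synthetic and the ridge constant $k = 1$ is an arbitrary assumption; this does not reproduce the report's comparisons.

```python
import numpy as np

def ols(X, y):
    """Ordinary least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y; k > 0 shrinks the coefficient vector."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Two nearly collinear predictors: X'X is close to singular
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 1e-3 * rng.normal(size=100)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=100)     # true coefficients (1, 1)

b_ols = ols(X, y)          # individual coefficients are very unstable
b_ridge = ridge(X, y, 1.0) # biased, but close to (1, 1) here
```

The individual OLS coefficients can be far from (1, 1), yet their sum, an estimable function along the well-determined direction, stays close to 2; the ridge estimate trades a small bias for much lower variance.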
Comparison of Regression Techniques to Predict Response of Oilseed Rape Yield to Variation in Climatic Conditions in Denmark

DEFF Research Database (Denmark)

Sharif, Behzad; Makowski, David; Plauborg, Finn

2017-01-01

Statistical regression models represent alternatives to process-based dynamic models for predicting the response of crop yields to variation in climatic conditions. Regression models can be used to quantify the effect of change in temperature and precipitation on yields. However, it is difficult ...
Linear regression

CERN Document Server

Olive, David J

2017-01-01

This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
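A simple way to build a prediction interval without assuming a parametric error distribution is to add empirical quantiles of the fitted residuals to the point prediction. The NumPy sketch below shows this generic idea for a linear model; it is not the specific interval developed in the text (which may include small-sample corrections), and the skewed synthetic errors are an assumption chosen to show that normality is not required.

```python
import numpy as np

def quantile_prediction_interval(X, y, x_new, alpha=0.05):
    """Approximate 100(1-alpha)% prediction interval from OLS plus residual quantiles.

    No parametric error distribution is assumed; no small-sample correction is applied.
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]
    resid = y - X1 @ beta
    lo, hi = np.quantile(resid, [alpha / 2, 1 - alpha / 2])
    y_hat = np.concatenate([[1.0], np.atleast_1d(x_new)]) @ beta
    return y_hat + lo, y_hat + hi

# Synthetic data with skewed (centered exponential) errors
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 200)
y = 2 + 3 * x + (rng.exponential(1.0, 200) - 1.0)
lower, upper = quantile_prediction_interval(x, y, 5.0)
```

Because the residual quantiles are asymmetric, the interval is asymmetric around the prediction, which a normal-theory interval would not capture.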
A Calibration to Predict the Concentrations of Impurities in Plutonium Oxide by Prompt Gamma Analysis Revision 2

Energy Technology Data Exchange (ETDEWEB)

Narlesky, Joshua Edward [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Kelly, Elizabeth J. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

2015-09-10

This report documents the new prompt gamma (PG) calibration regression equations. These calibration equations incorporate new data that have become available since revision 1 of “A Calibration to Predict the Concentrations of Impurities in Plutonium Oxide by Prompt Gamma Analysis” was issued [3]. The calibration equations are based on a weighted least squares (WLS) approach to the regression. The WLS method gives each data point its proper amount of influence over the parameter estimates. This gives two main advantages: more precise parameter estimates, and better, more defensible estimates of uncertainties. The WLS approach makes sense both statistically and experimentally because the variances increase with concentration, and there are physical reasons why the higher measurements are less reliable and should be less influential. The new magnesium calibration includes a correction for sodium and separate calibration equations for items with and without chlorine. These additional calibration equations allow for better predictions and smaller uncertainties for sodium in materials with and without chlorine. Chlorine and sodium have separate equations for RICH materials. Again, these equations give better predictions and smaller uncertainties for chlorine and sodium in RICH materials.
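The weighted least squares idea described here, giving less influence to points whose variance is larger, can be sketched in NumPy for a straight-line calibration. The variance model (variance proportional to concentration squared, so weights $w_i = 1/x_i^2$) and the data are hypothetical assumptions, not the report's actual calibration.

```python
import numpy as np

def wls_fit(x, y, weights):
    """Weighted least squares for y = b0 + b1*x; weights proportional to 1/variance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w                                  # equivalent to X.T @ diag(w)
    xtwx = XtW @ X
    beta = np.linalg.solve(xtwx, XtW @ y)
    resid = y - X @ beta
    # Scale factor from weighted residuals (weights known only up to a constant)
    sigma2 = np.sum(w * resid**2) / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(xtwx)             # parameter covariance estimate
    return beta, cov

# Hypothetical calibration data; variance assumed to grow with concentration squared
concentration = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
response = 0.5 + 2.0 * concentration
beta, cov = wls_fit(concentration, response, weights=1.0 / concentration**2)
```

With real, noisy data the weighting pulls the fit toward the low-concentration points, which is exactly the behavior the report argues is physically justified.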
Predicting fractional bed load transport rates: Application of the Wilcock‐Crowe equations to a regulated gravel bed river

Science.gov (United States)

Gaeuman, David; Andrews, E.D.; Krause, Andreas; Smith, Wes

2009-01-01

Bed load samples from four locations in the Trinity River of northern California are analyzed to evaluate the performance of the Wilcock‐Crowe bed load transport equations for predicting fractional bed load transport rates. Bed surface particles become smaller and the fraction of sand on the bed increases with distance downstream from Lewiston Dam. The dimensionless reference shear stress for the mean bed particle size (τ*rm) is largest near the dam, but varies relatively little between the more downstream locations. The relation between τ*rm and the reference shear stresses for other size fractions is constant across all locations. Total bed load transport rates predicted with the Wilcock‐Crowe equations are within a factor of 2 of sampled transport rates for 68% of all samples. The Wilcock‐Crowe equations nonetheless consistently under‐predict the transport of particles larger than 128 mm, frequently by more than an order of magnitude. Accurate prediction of the transport rates of the largest particles is important for models in which the evolution of the surface grain size distribution determines subsequent bed load transport rates. Values of τ*rm estimated from bed load samples are up to 50% larger than those predicted with the Wilcock‐Crowe equations, and sampled bed load transport approximates equal mobility across a wider range of grain sizes than is implied by the equations. Modifications to the Wilcock‐Crowe equation for determining τ*rm and the hiding function used to scale τ*rm to other grain size fractions are proposed to achieve the best fit to observed bed load transport in the Trinity River.
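For reference, the surface-based Wilcock-Crowe relations evaluated in this study can be written as a few small functions: the reference shear stress for the mean grain size as a function of surface sand fraction $F_s$, the hiding function that scales it to other size fractions, and the dimensionless transport function $W^*_i(\phi)$ with $\phi = \tau/\tau_{ri}$. The coefficients below are those commonly quoted for the original Wilcock-Crowe model and should be checked against the original paper; the Trinity River modifications proposed by the authors are not reproduced.

```python
import math

def tau_star_rm(sand_fraction):
    """Dimensionless reference shear stress for the mean surface grain size."""
    return 0.021 + 0.015 * math.exp(-20.0 * sand_fraction)

def hiding_exponent(d_i, d_m):
    """Exponent b in tau_ri / tau_rm = (D_i / D_m) ** b."""
    return 0.67 / (1.0 + math.exp(1.5 - d_i / d_m))

def reference_stress_ratio(d_i, d_m):
    """Hiding function: scales the mean-size reference stress to size fraction i."""
    return (d_i / d_m) ** hiding_exponent(d_i, d_m)

def transport_function(phi):
    """Dimensionless transport rate W*_i for phi = tau / tau_ri."""
    if phi < 1.35:
        return 0.002 * phi ** 7.5
    return 14.0 * (1.0 - 0.894 / math.sqrt(phi)) ** 4.5
```

The two branches of the transport function meet (to within rounding) at $\phi = 1.35$, and the hiding function equals 1 for the mean size, which are quick sanity checks on any implementation.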
Multiple regression analysis in modelling of carbon dioxide emissions by energy consumption use in Malaysia

Science.gov (United States)

Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat

2015-04-01

Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue is a major international concern and is primarily attributed to various fossil fuels. In this paper, a regression model is used to analyze the causal relationship between CO2 emissions and energy consumption in Malaysia using time series data for the period 1980-2010. The equations were developed with a regression model based on the eight major sources that contribute to CO2 emissions: non-energy use, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil, and motor petrol. Data from 1980-2000 were used to fit the regression model and data from 2001-2010 to validate it. Comparing the model's predictions with the measured data gave a high coefficient of determination (R² = 0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used for early warning to help the population comply with air quality standards.
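The train/validate scheme described here (fit on 1980-2000, check on 2001-2010) can be sketched with a multiple linear regression in NumPy. The two fuel-consumption series and the coefficients are synthetic placeholders, not Malaysian data, and the validation R² is computed as 1 - SS_res/SS_tot on the held-out years, which can differ from a squared correlation coefficient.

```python
import numpy as np

def fit_and_validate(years, X, y, split_year=2000):
    """Fit OLS on years <= split_year and return coefficients and held-out R^2."""
    years = np.asarray(years)
    y = np.asarray(y, dtype=float)
    train, test = years <= split_year, years > split_year
    A = np.column_stack([np.ones(len(y)), X])       # intercept + predictors
    beta = np.linalg.lstsq(A[train], y[train], rcond=None)[0]
    pred = A[test] @ beta
    ss_res = np.sum((y[test] - pred) ** 2)
    ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
    return beta, 1.0 - ss_res / ss_tot

# Synthetic annual data: two hypothetical fuel-use series and an emissions series
years = np.arange(1980, 2011)
t = years - 1980.0
X = np.column_stack([10 + 2 * t, 5 + 0.5 * t**1.5])
y = 1 + 0.3 * X[:, 0] + 0.8 * X[:, 1] + 0.1 * np.sin(t)
beta, r2_validation = fit_and_validate(years, X, y)
```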
Regression Phalanxes

OpenAIRE

Zhang, Hongyang; Welch, William J.; Zamar, Ruben H.

2017-01-01

Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" - a subset of features that work well together for prediction. We propose a novel algorithm which automatically chooses Regression Phalanxes from high-dimensi...
Predicting Eight Grade Students' Equation Solving Performances via Concepts of Variable and Equality

Science.gov (United States)

Ertekin, Erhan

2017-01-01

This study focused on how two algebraic concepts, equality and variable, predicted 8th grade students' equation solving performance. A predictive correlational research design was used. A random sample of 407 eighth-grade students from the central districts of a city in the central region of Turkey participated in…
Bulk Density Prediction for Histosols and Soil Horizons with High Organic Matter Content

Directory of Open Access Journals (Sweden)

Sidinei Julio Beutler

Bulk density (Bd) can easily be predicted from other data using pedotransfer functions (PTF). The present study developed two PTFs (PTF1 and PTF2) for Bd prediction in Brazilian organic soils and horizons and compared their performance with nine previously published equations. Samples of 280 organic soil horizons containing at least 80 g kg⁻¹ total carbon content (TOC) were obtained from different regions of Brazil and used to develop the PTFs. The multiple linear stepwise regression technique was applied, and all the equations were validated using an independent data set. Data were transformed using Box-Cox to meet the assumptions of the regression models. For validation of PTF1 and PTF2, the coefficient of determination (R²) was 0.47 and 0.37, the mean error -0.04 and 0.10, and the root mean square error 0.22 and 0.26, respectively. The best performance was obtained with the PTF1, PTF2, Hollis, and Honeysett equations. The PTF1 equation is recommended when clay content data are available; since such data are scarce for organic soils, the PTF2, Hollis, and Honeysett equations are the most suitable because they use TOC as a predictor variable. Considering the particular characteristics of organic soils and the environmental context in which they form, the equations developed showed good accuracy in predicting Bd compared with existing equations.
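The validation statistics reported for the PTFs (mean error, root mean square error, R²) can be computed on an independent data set with a few lines of NumPy. This sketch assumes mean error is defined as predicted minus observed and takes R² as the squared Pearson correlation between observed and predicted values; conventions vary between studies, and the numbers are hypothetical. The Box-Cox transformation step is not shown.

```python
import numpy as np

def validation_stats(observed, predicted):
    """Mean error (predicted - observed), RMSE, and squared Pearson correlation."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs                      # assumed sign convention
    me = err.mean()
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2
    return me, rmse, r2

# Hypothetical bulk densities (Mg m^-3) with a constant +0.1 prediction bias
observed = np.array([1.0, 0.5, 0.2, 0.8])
me, rmse, r2 = validation_stats(observed, observed + 0.1)
```

The example shows why several statistics are reported together: a constant bias leaves R² at 1 while mean error and RMSE both reveal it.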
Principal component regression analysis with SPSS.

Science.gov (United States)

Liu, R X; Kuang, J; Gong, Q; Hou, X L

2003-06-01

The paper introduces the indices used for multicollinearity diagnostics, the basic principle of principal component regression, and the method for determining the 'best' equation. Using an example, it describes how to perform principal component regression analysis with SPSS 10.0, including all calculation steps of principal component regression and the use of the linear regression, factor analysis, descriptives, compute variable, and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the effects of multicollinearity, and performing it with SPSS makes the analysis simpler, faster, and accurate.
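The principal component regression steps outlined here (standardize the predictors, extract principal components of the correlation matrix, regress the response on the leading components, then transform the coefficients back to the original variables) can be sketched in NumPy as an alternative to the SPSS procedure. The data below are synthetic, with two deliberately collinear predictors; the number of retained components is the analyst's choice.

```python
import numpy as np

def principal_component_regression(X, y, n_components):
    """Regress y on the leading principal components of standardized X and
    map the coefficients back to the original predictor scale."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd
    # Eigen-decomposition of the correlation matrix (eigh returns ascending order)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    V = eigvecs[:, order]
    scores = Z @ V                                   # principal component scores
    gamma = np.linalg.lstsq(scores, y - y.mean(), rcond=None)[0]
    beta_std = V @ gamma                             # coefficients for standardized X
    beta = beta_std / sd                             # back to original units
    intercept = y.mean() - mu @ beta
    return intercept, beta, eigvals[order]

# Synthetic data: x2 nearly duplicates x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
y = 1 + x1 + x2 + x3
b0_full, b_full, ev_full = principal_component_regression(X, y, 3)
b0_two, b_two, ev_two = principal_component_regression(X, y, 2)
```

Keeping all components reproduces ordinary least squares; dropping the component with the smallest eigenvalue (the direction along which x1 and x2 differ) is what stabilizes the estimates under multicollinearity.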
Predicting Student Success on the Texas Chemistry STAAR Test: A Logistic Regression Analysis

Science.gov (United States)

Johnson, William L.; Johnson, Annabel M.; Johnson, Jared

2012-01-01

Background: The context is the new Texas STAAR end-of-course testing program. Purpose: The authors developed a logistic regression model to predict who would pass or fail the new Texas chemistry STAAR end-of-course exam. Setting: Robert E. Lee High School (5A), with an enrollment of 2700 students, Tyler, Texas. Date of the study was the 2011-2012…
Improving model predictions for RNA interference activities that use support vector machine regression by combining and filtering features

Directory of Open Access Journals (Sweden)

Peek Andrew S

2007-06-01

Background: RNA interference (RNAi) is a naturally occurring phenomenon that results in the suppression of a target RNA sequence through a variety of possible methods and pathways. To dissect the factors that make siRNA sequences effective, a regression kernel Support Vector Machine (SVM) approach was used to quantitatively model RNA interference activities. Results: Eight feature mapping methods were compared in their ability to build SVM regression models that predict published siRNA activities. The primary factors in predictive SVM models are position-specific nucleotide compositions. The secondary factors are position-independent sequence motifs (N-grams) and guide strand to passenger strand sequence thermodynamics. Finally, the factors that contribute least but are still predictive of efficacy are measures of intramolecular guide strand secondary structure and target strand secondary structure. Of these, the site of the 5'-most base of the guide strand is the most informative. Conclusion: The capacity of specific feature mapping methods to build predictive models of RNAi activity suggests the relative biological importance of these features. Some feature mapping methods are more informative in building predictive models, and overall t-test filtering provides a method to remove some noisy features or make comparisons among datasets. Together, these features can yield SVM regression models with increased accuracy between predicted and observed activities, both within datasets by cross-validation and between independently collected RNAi activity datasets. Feature filtering should be approached carefully: it is possible to reduce feature set size without substantially degrading the predictive models, but the features retained in the candidate models become increasingly distinct. Software to perform feature prediction and SVM training and testing on nucleic acid
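Two of the feature mappings named in this abstract, position-specific nucleotide composition and position-independent N-gram motif counts, can be sketched in plain Python as functions that turn an siRNA sequence into a numeric vector suitable for an SVM regressor. The function names, the choice of dinucleotides (n = 2), and the example sequence are hypothetical; thermodynamic and secondary-structure features are not shown.

```python
from itertools import product

NUCLEOTIDES = "ACGU"

def position_one_hot(seq):
    """Position-specific composition: 4 binary indicators (A, C, G, U) per position."""
    return [1 if base == nt else 0 for base in seq.upper() for nt in NUCLEOTIDES]

def ngram_counts(seq, n=2):
    """Position-independent motif counts for every possible n-gram, in sorted order."""
    seq = seq.upper()
    counts = {"".join(k): 0 for k in product(NUCLEOTIDES, repeat=n)}
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if gram in counts:            # skips grams containing non-ACGU characters
            counts[gram] += 1
    return [counts[k] for k in sorted(counts)]

def feature_vector(seq, n=2):
    """Concatenate both mappings into one feature vector."""
    return position_one_hot(seq) + ngram_counts(seq, n)

# Hypothetical 19-nt guide strand
features = feature_vector("UUCGAAGUACUCAGCGUAA")
```

For a 19-nt sequence this yields 19 x 4 = 76 positional indicators plus 16 dinucleotide counts.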
Prediction of beef marbling using Hyperspectral Imaging (HSI) and Partial Least Squares Regression (PLSR)

Directory of Open Access Journals (Sweden)

Victor Aredo

2017-01-01

The aim of this study was to build a model to predict beef marbling using HSI and Partial Least Squares Regression (PLSR). In total, 58 samples of longissimus dorsi muscle were scanned with an HSI system (400-1000 nm) in reflectance mode; 44 samples were used to build the PLSR model and 14 samples to validate it. The Japanese Beef Marbling Standard (BMS) was used as the reference by 15 middle-trained judges to evaluate the samples. The scores were assigned as continuous values and ranged from 1.2 to 5.3 BMS. The PLSR model showed a high correlation coefficient in prediction (r = 0.95), a low Standard Error of Calibration (SEC) of 0.2 BMS score, and a low Standard Error of Prediction (SEP) of 0.3 BMS score.
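A PLSR calibration of the kind described here (spectra as predictors, a marbling score as response) can be sketched with the NIPALS algorithm for a single response (PLS1) in NumPy, together with an error statistic for calibration or prediction sets. This is a generic sketch, not the authors' pipeline; the data are synthetic, and the standard error is computed as a root mean squared error, whereas some authors use a bias-corrected standard deviation for SEP.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS; returns a function that predicts y for new X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)          # weight vector
        t = E @ w                       # scores
        tt = t @ t
        p = E.T @ t / tt                # X loadings
        q = (f @ t) / tt                # y loading
        E = E - np.outer(t, p)          # deflate X
        f = f - q * t                   # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)
    return lambda X_new: (np.asarray(X_new, dtype=float) - x_mean) @ coef + y_mean

def standard_error(y_true, y_pred):
    """RMSE-style standard error (definitions of SEC/SEP vary)."""
    d = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

# Synthetic stand-in for spectra: 40 samples x 5 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 2.0
predict_full = pls1_fit(X, y, 5)
predict_two = pls1_fit(X, y, 2)
```

With as many components as predictors, PLS1 reproduces least squares; in practice the number of latent variables is chosen by cross-validation, and SEP is computed on samples held out from calibration.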
Vapor-liquid equilibrium prediction with pseudo-cubic equation of state for binary mixtures containing hydrogen, helium, or neon

Energy Technology Data Exchange (ETDEWEB)

Kato, M.; Tanaka, H. (Nihon Univ., Fukushima (Japan). Faculty of Engineering)

1990-03-01

The authors previously proposed an original pseudo-cubic equation of state for vapor-liquid equilibrium, and this report continues that study. In the present study, new effective critical values (critical temperature, critical pressure, and critical volume) of hydrogen, helium, and neon were determined empirically from vapor-liquid equilibrium data in the literature. The vapor-liquid equilibrium relations of binary quantum gas mixtures were predicted by combining the conventional pseudo-cubic equation of state with the new effective critical values, without using a binary heteromolecular interaction parameter. The predicted values for the hydrogen-ethylene, helium-propane, and neon-oxygen systems were compared with literature values. The results indicate that the vapor-liquid equilibria of binary mixtures containing hydrogen, helium, and neon can be predicted with favorable accuracy by combining the effective critical values with the three-parameter pseudo-cubic equation of state. 37 refs., 3 figs., 4 tabs.
A Course Specific Perspective in the Prediction of Academic Success.

Science.gov (United States)

Beaulieu, R. P.

1990-01-01

Students (N=94) enrolled in a senior-level management course over six semesters were used to investigate the ability of four measures from two industrial tests to predict course performance. The resulting multiple regression equation with four predictors could accurately predict achievement of males, but not of females. (Author/TE)

[Prediction model of health workforce and beds in county hospitals of Hunan by multiple linear regression].

Science.gov (United States)

Ling, Ru; Liu, Jiawang

2011-12-01

To construct prediction models for the health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan, selected by stratified random sampling, using uniform questionnaires, and performed multiple linear regression analysis with 20 indicators selected by literature review. Independent variables in the multiple linear regression model for medical personnel in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model for county hospital beds included the population aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in the county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction models show good explanatory power and goodness of fit, and may be used for short- and mid-term forecasting.
Short-term wind speed prediction using an unscented Kalman filter based state-space support vector regression approach

International Nuclear Information System (INIS)

Chen, Kuilin; Yu, Jie

2014-01-01

Highlights: • A novel hybrid modeling method is proposed for short-term wind speed forecasting. • Support vector regression model is constructed to formulate nonlinear state-space framework. • Unscented Kalman filter is adopted to recursively update states under random uncertainty. • The new SVR–UKF approach is compared to several conventional methods for short-term wind speed prediction. • The proposed method demonstrates higher prediction accuracy and reliability. - Abstract: Accurate wind speed forecasting is becoming increasingly important to improve and optimize renewable wind power generation. Particularly, reliable short-term wind speed prediction can enable model predictive control of wind turbines and real-time optimization of wind farm operation. However, this task remains challenging due to the strong stochastic nature and dynamic uncertainty of wind speed. In this study, unscented Kalman filter (UKF) is integrated with support vector regression (SVR) based state-space model in order to precisely update the short-term estimation of wind speed sequence. In the proposed SVR–UKF approach, support vector regression is first employed to formulate a nonlinear state-space model and then unscented Kalman filter is adopted to perform dynamic state estimation recursively on wind sequence with stochastic uncertainty. The novel SVR–UKF method is compared with artificial neural networks (ANNs), SVR, autoregressive (AR) and autoregressive integrated with Kalman filter (AR-Kalman) approaches for predicting short-term wind speed sequences collected from three sites in Massachusetts, USA. The forecasting results indicate that the proposed method has much better performance in both one-step-ahead and multi-step-ahead wind speed predictions than the other approaches across all the locations
Accounting for measurement error in log regression models with applications to accelerated testing.

Science.gov (United States)

Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M

2018-01-01

In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.
Prediction of survival to discharge following cardiopulmonary resuscitation using classification and regression trees.

Science.gov (United States)

Ebell, Mark H; Afonso, Anna M; Geocadin, Romergryko G

2013-12-01

To predict the likelihood that an inpatient who experiences cardiopulmonary arrest and undergoes cardiopulmonary resuscitation survives to discharge with good neurologic function or with mild deficits (Cerebral Performance Category score = 1). Classification and Regression Trees were used to develop branching algorithms that optimize the ability of a series of tests to correctly classify patients into two or more groups. Data from 2007 to 2008 (n = 38,092) were used to develop candidate Classification and Regression Trees models to predict the outcome of inpatient cardiopulmonary resuscitation episodes, and data from 2009 (n = 14,435) were used to evaluate the accuracy of the models and judge the degree of overfitting. Both supervised and unsupervised approaches to model development were used. The setting was 366 hospitals participating in the Get With the Guidelines-Resuscitation registry, and the patients were adult inpatients experiencing an index episode of cardiopulmonary arrest and undergoing cardiopulmonary resuscitation in the hospital. The five candidate models had between 8 and 21 nodes and an area under the receiver operating characteristic curve of 0.718 to 0.766 in the derivation group and 0.683 to 0.746 in the validation group. One of the supervised models had 14 nodes and classified 27.9% of patients as very unlikely to survive neurologically intact or with mild deficits (… Tree models that predict survival to discharge with good neurologic function or with mild deficits following in-hospital cardiopulmonary arrest. Models like this can assist physicians and patients who are considering do-not-resuscitate orders.
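The area under the receiver operating characteristic curve used here to compare derivation and validation performance can be computed from any set of risk scores, for example the terminal-node survival probabilities a classification tree assigns. The NumPy sketch below uses the Mann-Whitney interpretation: the probability that a randomly chosen survivor scores higher than a randomly chosen non-survivor, with ties counted as one half. Tie handling matters for trees, which produce only as many distinct scores as they have leaves. Scores and labels are hypothetical.

```python
import numpy as np

def auc_mann_whitney(scores, labels):
    """ROC AUC = P(score of random positive > score of random negative), ties = 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()   # all positive/negative pairs
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical predicted survival probabilities and observed outcomes (1 = survived)
auc = auc_mann_whitney([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise comparison uses memory proportional to the number of survivor/non-survivor pairs; for registry-sized data (tens of thousands of episodes), a rank-based formula would be the practical choice.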
A prediction model of compressor with variable-geometry diffuser based on elliptic equation and partial least squares.

Science.gov (United States)

Li, Xu; Yang, Chuanlei; Wang, Yinyan; Wang, Hechun

2018-01-01

To achieve a much wider intake air flow range for the diesel engine, a variable-geometry compressor (VGC) is introduced into a turbocharged diesel engine. However, because the diffuser vane angle (DVA) is variable, predicting the performance of the VGC is more difficult than for a normal compressor. In the present study, a prediction model combining an elliptical equation and a partial least-squares (PLS) model was proposed to predict the performance of the VGC. The speed lines of the pressure ratio map and the efficiency map were fitted with the elliptical equation, and the coefficients of the elliptical equation were introduced into the PLS model to build a polynomial relationship between the coefficients and both the relative speed and the DVA. Further, the maximal order of the polynomial was investigated in detail to reduce the number of sub-coefficients while achieving acceptable fit accuracy. The prediction model was validated with sample data, and, to demonstrate its advantages for compressor performance prediction, its results were compared with those of a look-up table and back-propagation neural networks (BPNNs). The validation and comparison results show that the prediction accuracy of the newly developed model is acceptable, and that under the same conditions this model is much more suitable than the look-up table and BPNN methods for VGC performance prediction. Moreover, the newly developed model provides a novel and effective prediction solution for the VGC and can be used to improve the accuracy of thermodynamic models for turbocharged diesel engines in the future.
The Chaotic Prediction for Aero-Engine Performance Parameters Based on Nonlinear PLS Regression

Directory of Open Access Journals (Sweden)

Chunxiao Zhang

2012-01-01

The prediction of aero-engine performance parameters is very important for aero-engine condition monitoring and fault diagnosis. In this paper, the chaotic phase space of the engine exhaust temperature (EGT) time series, taken from actual airborne ACARS data, is reconstructed by selecting suitable nearby points. Partial least squares (PLS) based on a cubic spline function or kernel function transformation is adopted to obtain the chaotic predictive function of the EGT series. The experimental results indicate that the proposed PLS chaotic prediction algorithm based on the biweight kernel function transformation has a significant advantage in overcoming multicollinearity among the independent variables and improving the stability of the regression model. Our predictive NMSE is 16.5 percent less than that of the traditional linear least squares (OLS) method and 10.38 percent less than that of the linear PLS approach. At the same time, the forecast error is less than that of the nonlinear PLS algorithm through bootstrap test screening.
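Phase-space reconstruction of a scalar series such as EGT is commonly done by time-delay embedding, and prediction quality is often summarized by a normalized mean squared error. The NumPy sketch below shows both pieces; the embedding dimension and delay are hypothetical choices, NMSE is defined here as MSE divided by the variance of the observed series (other normalizations exist), and the paper's kernel-transformed PLS predictor is not reproduced.

```python
import numpy as np

def delay_embed(series, dim, tau):
    """Rows are [x_t, x_{t-tau}, ..., x_{t-(dim-1)tau}] for every admissible t."""
    x = np.asarray(series, dtype=float)
    start = (dim - 1) * tau
    return np.column_stack([x[start - k * tau: len(x) - k * tau] for k in range(dim)])

def nmse(y_true, y_pred):
    """Normalized MSE: mean squared error divided by the variance of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

# One-step-ahead setup: embedded state at time t is used to predict x_{t+1}
x = np.sin(0.3 * np.arange(200))
states = delay_embed(x, dim=3, tau=2)
inputs, targets = states[:-1], x[(3 - 1) * 2 + 1:]
```

Any regression model (linear least squares, linear PLS, or a kernel variant) can then be fitted from `inputs` to `targets`; an NMSE of 1 corresponds to always predicting the series mean.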
A modified parallel constitutive model for elevated temperature flow behavior of Ti-6Al-4V alloy based on multiple regression

Energy Technology Data Exchange (ETDEWEB)

Cai, Jun; Shi, Jiamin; Wang, Kuaishe; Wang, Wen; Wang, Qingjuan; Liu, Yingying [Xi' an Univ. of Architecture and Technology, Xi' an (China). School of Metallurgical Engineering; Li, Fuguo [Northwestern Polytechnical Univ., Xi' an (China). School of Materials Science and Engineering

2017-07-15

Constitutive analysis for hot working of Ti-6Al-4V alloy was carried out by using experimental stress-strain data from isothermal hot compression tests. A new kind of constitutive equation called a modified parallel constitutive model was proposed by considering the independent effects of strain, strain rate and temperature. The predicted flow stress data were compared with the experimental data. Statistical analysis was introduced to verify the validity of the developed constitutive equation. Subsequently, the accuracy of the proposed constitutive equations was evaluated by comparing with other constitutive models. The results showed that the developed modified parallel constitutive model based on multiple regression could predict flow stress of Ti-6Al-4V alloy with good correlation and generalization.
Using Logistic Regression To Predict the Probability of Debris Flows Occurring in Areas Recently Burned By Wildland Fires

Science.gov (United States)

Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.

2003-01-01

Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity
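The following minimal sketch makes the modeling step concrete. It fits a logistic regression model by batch gradient ascent on the mean log-likelihood. This is not the authors' statistics package or their all-combinations variable selection. The single predictor (fraction of a basin steeper than 30 percent) and the eight basins are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(debris flow | x) for weights w = [intercept, b1, ..., bp]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 1.0, epochs: int = 5000) -> List[float]:
    """Maximize the mean log-likelihood by batch gradient ascent."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            # Derivative of the log-likelihood with respect to the linear predictor.
            err = yi - predict_proba(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


# Hypothetical basins: fraction of area with slope > 30 percent; 1 = debris flow occurred.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
w = fit_logistic(X, y)
print(round(predict_proba(w, [0.9]), 3))
```

A statistics package would typically maximize the same likelihood with Newton-type iterations and also report standard errors. Plain gradient ascent is used here only to keep the example dependency-free, and it assumes predictors on a modest scale.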
Predicting longitudinal trajectories of health probabilities with random-effects multinomial logit regression.

Science.gov (United States)

Liu, Xian; Engel, Charles C

2012-12-20

Researchers often encounter longitudinal health data characterized by three or more ordinal or nominal categories. Random-effects multinomial logit models are generally applied to account for the potential lack of independence inherent in such clustered data. When parameter estimates are used to describe longitudinal processes, however, random effects, both between and within individuals, need to be retransformed to correctly predict outcome probabilities. This study attempts to go beyond existing work by developing a retransformation method that derives longitudinal growth trajectories of unbiased health probabilities. We estimated the variances of the predicted probabilities using the delta method. Additionally, we transformed the covariates' regression coefficients on the multinomial logit function, which are not substantively meaningful, into conditional effects on the predicted probabilities. The empirical illustration uses longitudinal data from the Asset and Health Dynamics among the Oldest Old. Our analysis compared three sets of predicted probabilities of three health states at six time points. These were obtained, respectively, from the retransformation method, the best linear unbiased prediction, and the fixed-effects approach. The results demonstrate that failing to retransform random errors in the random-effects multinomial logit model produces severely biased longitudinal trajectories of health probabilities, as well as overestimated effects of covariates on the probabilities. Copyright © 2012 John Wiley & Sons, Ltd.
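The following Monte Carlo sketch illustrates the general point and is not the authors' retransformation method. In a multinomial logit with random intercepts, setting the random effects to zero does not give the population-average category probabilities. Averaging the probabilities over the random-effects distribution does. The independent normal random intercepts, their standard deviations, and the linear predictors are hypothetical assumptions, and the delta-method variances are not shown.

```python
import math
import random
from typing import List, Sequence


def softmax(eta: Sequence[float]) -> List[float]:
    m = max(eta)  # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in eta]
    s = sum(exps)
    return [e / s for e in exps]


def naive_probs(fixed_eta: Sequence[float]) -> List[float]:
    """Plug-in probabilities with random effects set to zero (category 0 = reference)."""
    return softmax([0.0] + list(fixed_eta))


def marginal_probs(fixed_eta: Sequence[float], sd: Sequence[float],
                   draws: int = 20000, seed: int = 1) -> List[float]:
    """Average category probabilities over simulated normal random intercepts."""
    rng = random.Random(seed)
    totals = [0.0] * (len(fixed_eta) + 1)
    for _ in range(draws):
        eta = [0.0] + [f + rng.gauss(0.0, s) for f, s in zip(fixed_eta, sd)]
        for j, p in enumerate(softmax(eta)):
            totals[j] += p
    return [t / draws for t in totals]


print(naive_probs([2.0, -1.0]))
print(marginal_probs([2.0, -1.0], [1.5, 1.5]))
```

With these hypothetical values, the plug-in probability of the most likely state exceeds its population average. This is the kind of distortion the abstract attributes to neglecting retransformation.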
GIS-based spatial regression and prediction of water quality in river networks: A case study in Iowa

Science.gov (United States)

Yang, X.; Jin, W.

2010-01-01

Nonpoint source pollution is the leading cause of water quality problems in the U.S. One important component of nonpoint source pollution control is understanding which watershed-scale conditions influence ambient water quality, and how. This paper investigated the use of spatial regression to evaluate the impacts of watershed characteristics on stream NO3NO2-N concentration in the Cedar River Watershed, Iowa. An Arc Hydro geodatabase was constructed to organize the various datasets on the watershed. Spatial regression models were developed to evaluate the impacts of watershed characteristics on stream NO3NO2-N concentration and to predict NO3NO2-N concentration at unmonitored locations. Unlike the traditional ordinary least squares (OLS) method, the spatial regression method incorporates potential spatial correlation among the observations in its coefficient estimation. Study results show that NO3NO2-N observations in the Cedar River Watershed are spatially correlated. By ignoring this spatial correlation, the OLS method tends to overestimate the impacts of watershed characteristics on stream NO3NO2-N concentration. In conjunction with kriging, the spatial regression method makes better stream NO3NO2-N concentration predictions than the OLS method. It also provides estimates of prediction uncertainty, which are useful for optimizing the design of stream monitoring networks. It is a promising tool for better managing and controlling nonpoint source pollution. © 2010 Elsevier Ltd.
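A common way to check whether observations such as stream NO3NO2-N concentrations are spatially correlated is global Moran's I. The abstract does not say which diagnostic the authors used, so this sketch is only an illustration. The weight matrix below (four monitoring sites along a line, each adjacent pair weighted 1) is hypothetical.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I; weights[i][j] is the spatial weight between sites i and j (zero diagonal)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Hypothetical: four sites along a stream, neighbours weighted 1.
line = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
print(morans_i([1.0, 2.0, 3.0, 4.0], line))  # smooth trend: positive I
print(morans_i([1.0, 0.0, 1.0, 0.0], line))  # alternating values: negative I
```

Positive values indicate that nearby sites have similar concentrations. OLS ignores exactly this structure, while a spatial regression model accounts for it.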
Equations describing contamination of run of mine coal with dirt in the Upper Silesian Coalfield

Energy Technology Data Exchange (ETDEWEB)

Winiewski, J J

1977-12-01

Statistical analysis showed that the dirt contamination of run-of-mine coal from seams in the 200 to 600 series of the Upper Silesian Coalfield depends on the average ash content of the given raw coal. A regression equation is derived for the coarse and fine sizes of each coal. These equations can be used to predict the degree of contamination of run-of-mine coal with an accuracy sufficient for coal preparation purposes.
riskRegression

DEFF Research Database (Denmark)

Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas

2017-01-01

In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface...... for predicting the covariate specific absolute risks, their confidence intervals, and their confidence bands based on right censored time to event data. We provide explicit formulas for our implementation of the estimator of the (stratified) baseline hazard function in the presence of tied event times. As a by...... functionals. The software presented here is implemented in the riskRegression package....
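The predicted quantity can be illustrated in discrete time. The absolute risk of the event of interest by time t accumulates, at each time point, the probability of still being free of both events multiplied by the cause-specific hazard of the event. This minimal sketch assumes the per-interval cause-specific hazards are already known. In riskRegression, they would come from the fitted cause-specific Cox models. The sketch is not the package's implementation and computes no confidence intervals or bands.

```python
from typing import List, Sequence


def absolute_risk(h_event: Sequence[float], h_competing: Sequence[float]) -> List[float]:
    """Cumulative incidence of the event of interest from discrete cause-specific hazards.

    h_event[k] and h_competing[k] are the conditional probabilities of each cause at
    time k, given that neither event has occurred before k.
    """
    event_free = 1.0  # probability of being free of both events just before time k
    cif = 0.0
    out = []
    for h1, h2 in zip(h_event, h_competing):
        cif += event_free * h1
        event_free *= 1.0 - h1 - h2
        out.append(cif)
    return out


# Hypothetical hazards: a competing risk lowers the absolute risk of the event of interest.
print(absolute_risk([0.1, 0.1], [0.0, 0.0]))
print(absolute_risk([0.1, 0.1], [0.2, 0.2]))
```

This is why the competing-risk model must enter the prediction. With the same event hazard, a larger competing hazard removes people from the risk set and lowers the absolute risk.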
Body composition estimation from selected slices: equations computed from a new semi-automatic thresholding method developed on whole-body CT scans

Directory of Open Access Journals (Sweden)

Alizé Lacoste Jeanson

2017-05-01

Background: Estimating volumes and masses of total body components is important for the study and treatment monitoring of nutrition and nutrition-related disorders, cancer, joint replacement, energy expenditure and exercise physiology. While several equations have been offered for estimating total body components from MRI slices, no reliable and tested method exists for CT scans. For the first time, body composition data were derived from 41 high-resolution whole-body CT scans. From these data, we defined equations for estimating volumes and masses of total body AT and LT from corresponding tissue areas measured in selected CT scan slices. Methods: We present a new semi-automatic approach to defining the density cutoff between adipose tissue (AT) and lean tissue (LT) in such material. An intra-class correlation coefficient (ICC) was used to validate the method. The equations for estimating the whole-body composition volume and mass from areas measured in selected slices were modeled with ordinary least squares (OLS) linear regressions and support vector machine regression (SVMR). Results and Discussion: The best predictive equation for total body AT volume was based on the AT area of a single slice located between the 4th and 5th lumbar vertebrae (L4-L5) and produced lower prediction errors (|PE| = 1.86 liters, %PE = 8.77) than previous equations also based on CT scans. The LT area of the mid-thigh provided the lowest prediction errors (|PE| = 2.52 liters, %PE = 7.08) for estimating whole-body LT volume. We also present equations to predict total body AT and LT masses from a slice located at L4-L5 that resulted in reduced error compared with the previously published equations based on CT scans. The multislice SVMR predictor gave the theoretical upper limit for prediction precision of volumes and cross-validated the results.
Body composition estimation from selected slices: equations computed from a new semi-automatic thresholding method developed on whole-body CT scans.

Science.gov (United States)

Lacoste Jeanson, Alizé; Dupej, Ján; Villa, Chiara; Brůžek, Jaroslav

2017-01-01

Estimating volumes and masses of total body components is important for the study and treatment monitoring of nutrition and nutrition-related disorders, cancer, joint replacement, energy-expenditure and exercise physiology. While several equations have been offered for estimating total body components from MRI slices, no reliable and tested method exists for CT scans. For the first time, body composition data was derived from 41 high-resolution whole-body CT scans. From these data, we defined equations for estimating volumes and masses of total body AT and LT from corresponding tissue areas measured in selected CT scan slices. We present a new semi-automatic approach to defining the density cutoff between adipose tissue (AT) and lean tissue (LT) in such material. An intra-class correlation coefficient (ICC) was used to validate the method. The equations for estimating the whole-body composition volume and mass from areas measured in selected slices were modeled with ordinary least squares (OLS) linear regressions and support vector machine regression (SVMR). The best predictive equation for total body AT volume was based on the AT area of a single slice located between the 4th and 5th lumbar vertebrae (L4-L5) and produced lower prediction errors (|PE| = 1.86 liters, %PE = 8.77) than previous equations also based on CT scans. The LT area of the mid-thigh provided the lowest prediction errors (|PE| = 2.52 liters, %PE = 7.08) for estimating whole-body LT volume. We also present equations to predict total body AT and LT masses from a slice located at L4-L5 that resulted in reduced error compared with the previously published equations based on CT scans. The multislice SVMR predictor gave the theoretical upper limit for prediction precision of volumes and cross-validated the results.
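A single-slice predictive equation of this kind is an ordinary least squares line from tissue area to whole-body volume. The sketch below fits such a line and computes |PE| and %PE. It assumes these are the mean absolute prediction error and the mean absolute percentage error, since the abstract does not define them. No data from the study are used.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def prediction_errors(actual: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float]:
    """Assumed definitions: mean |actual - predicted| and its mean percentage of actual."""
    n = len(actual)
    abs_pe = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    pct_pe = 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / n
    return abs_pe, pct_pe


# Hypothetical slice areas (cm^2) and whole-body volumes (liters).
a, b = fit_line([150.0, 250.0, 350.0], [12.0, 20.0, 29.0])
print(prediction_errors([12.0, 20.0, 29.0], [a + b * s for s in (150.0, 250.0, 350.0)]))
```

In practice, these errors would be computed on held-out individuals rather than on the fitting sample, so that they reflect predictive rather than in-sample accuracy.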
Quantile Regression With Measurement Error

KAUST Repository

Wei, Ying

2009-08-27

Regression quantiles can be substantially biased when the covariates are measured with error. In this paper we propose a new method that produces consistent linear quantile estimation in the presence of covariate measurement error. The method corrects the measurement error induced bias by constructing joint estimating equations that simultaneously hold for all the quantile levels. An iterative EM-type estimation algorithm to obtain the solutions to such joint estimation equations is provided. The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a longitudinal study with an unusual measurement error structure. © 2009 American Statistical Association.
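The building block of linear quantile regression is the check (pinball) loss. Minimizing it over a constant recovers a sample quantile. The sketch below shows only that loss. It does not implement the paper's joint estimating equations or its EM-type correction for covariate measurement error.

```python
from typing import Sequence


def check_loss(residual: float, tau: float) -> float:
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant_quantile(y: Sequence[float], tau: float) -> float:
    """Data point c minimizing sum rho_tau(y_i - c); a minimizer always exists among the data."""
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


print(best_constant_quantile([1, 2, 3, 4, 100], 0.5))      # median, unaffected by the outlier
print(best_constant_quantile(list(range(1, 11)), 0.75))    # upper-quartile point
```

In linear quantile regression, the constant c is replaced by x'β, and the same loss is minimized over β. When x is observed with error, that minimization targets the wrong β, which is the bias the paper corrects.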
Comparison of Classical Linear Regression and Orthogonal Regression According to the Sum of Squares Perpendicular Distances

OpenAIRE

KELEŞ, Taliha; ALTUN, Murat

2016-01-01

Regression analysis is a statistical technique for investigating and modeling the relationship between variables. The purpose of this study was to give a simple presentation of the equation for orthogonal regression (OR) and to compare the classical linear regression (CLR) and OR techniques with respect to the sum of squared perpendicular distances. For that purpose, the analyses were illustrated with an example. It was found that the sum of squared perpendicular distances of OR is smaller. Thus, it wa...
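For a line y = a + b·x, classical least squares minimizes vertical distances, whereas orthogonal regression minimizes perpendicular distances. Assuming equal error variances in x and y, the orthogonal slope has a closed form in the sample moments: b = (Syy − Sxx + sqrt((Syy − Sxx)² + 4·Sxy²)) / (2·Sxy). The sketch below computes both lines and the sum of squared perpendicular distances (y_i − a − b·x_i)² / (1 + b²). The data are hypothetical, not the study's example.

```python
import math
from typing import Sequence, Tuple


def _moments(x: Sequence[float], y: Sequence[float]):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return mx, my, sxx, syy, sxy


def ols_line(x, y) -> Tuple[float, float]:
    """Classical linear regression (vertical distances)."""
    mx, my, sxx, _, sxy = _moments(x, y)
    b = sxy / sxx
    return my - b * mx, b


def orthogonal_line(x, y) -> Tuple[float, float]:
    """Orthogonal regression with equal error variances; assumes sxy != 0."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - b * mx, b


def sum_sq_perpendicular(x, y, a: float, b: float) -> float:
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (1 + b * b)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
for name, (a, b) in (("CLR", ols_line(x, y)), ("OR", orthogonal_line(x, y))):
    print(name, round(b, 4), round(sum_sq_perpendicular(x, y, a, b), 4))
```

The orthogonal slope is the direction of the leading eigenvector of the 2×2 scatter matrix. That is why its perpendicular sum can never exceed the one from the classical line, consistent with the finding reported above.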
A New Predictive Model Based on the ABC Optimized Multivariate Adaptive Regression Splines Approach for Predicting the Remaining Useful Life in Aircraft Engines

Directory of Open Access Journals (Sweden)

Paulino José García Nieto

2016-05-01

Remaining useful life (RUL) estimation is considered one of the central issues in prognostics and health management (PHM). The present paper describes a nonlinear hybrid ABC–MARS-based model for predicting the remaining useful life of aircraft engines. It is well known that accurate RUL estimation allows failures to be prevented in a more controllable way, so that effective maintenance can be carried out at the appropriate time to correct impending faults. The proposed hybrid model combines multivariate adaptive regression splines (MARS), which have been successfully adopted for regression problems, with the artificial bee colony (ABC) technique. This optimization technique handles parameter setting in the MARS training procedure, which significantly influences the regression accuracy. However, its use in reliability applications has not yet been widely explored. With this in mind, remaining useful life values have been successfully predicted here with the hybrid ABC–MARS-based model from the remaining measured parameters (input variables) of aircraft engines. A correlation coefficient of 0.92 was obtained when this hybrid ABC–MARS-based model was applied to experimental data. The agreement of this model with the experimental data confirmed its good performance. The main advantage of this predictive model is that it does not require information about the previous operating states of the aircraft engine.
B(E2)↑ (0₁⁺ → 2₁⁺) predictions for even–even nuclei in the differential equation model

International Nuclear Information System (INIS)

Nayak, R.C.; Pattnaik, S.

2015-01-01

We use the recently developed differential equation model (DEM) for the reduced electric quadrupole transition probability B(E2)↑, for the transition from the ground state to the first 2⁺ state, to predict its values for a wide range of even–even nuclides, almost throughout the nuclear landscape from neon to californium. This is possible because the principal equation of the model provides two different recursion relations. This equation is the differential equation connecting the B(E2)↑ value of a given even–even nucleus with its derivatives with respect to the neutron and proton numbers. Each recursion relation connects three different neighboring even–even nuclei, from lower to higher mass numbers and vice versa. These relations are primarily responsible for extrapolating from the known to the unknown terrain of the B(E2)↑ landscape, and thereby facilitate predictions throughout. As a result, we have succeeded in predicting the hitherto unknown values for the 251 adjacent isotopes lying on either side of the known B(E2)↑ database. (author)
Fouling resistance prediction using artificial neural network nonlinear auto-regressive with exogenous input model based on operating conditions and fluid properties correlations

Energy Technology Data Exchange (ETDEWEB)

Biyanto, Totok R. [Department of Engineering Physics, Institute Technology of Sepuluh Nopember Surabaya, Surabaya, Indonesia 60111 (Indonesia)

2016-06-03

Fouling in the heat exchangers of a refinery's crude preheat train (CPT) is an unsolved problem that reduces plant efficiency and increases fuel consumption and CO₂ emissions. Fouling resistance behavior is very complex. It is difficult to develop a model from first-principles equations that predicts fouling resistance under different operating conditions and different crude blends. In this paper, an Artificial Neural Network (ANN) MultiLayer Perceptron (MLP) with a Nonlinear Auto-Regressive with eXogenous input (NARX) input structure is used to build a fouling resistance model for a shell and tube heat exchanger (STHX). The input data of the model are the flow rates and temperatures of the heat exchanger streams, the physical properties of the product, and crude blend data. This model serves as a predictive tool for optimizing operating conditions and preventive maintenance of the STHX. The results show that the model can capture the complexity of fouling characteristics in the heat exchanger arising from thermodynamic conditions and variations in crude oil properties (blends). The Root Mean Square Error (RMSE) values obtained indicate that the model is suitable for capturing the nonlinearity and complexity of STHX fouling resistance during both the training and validation phases.
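The NARX input structure means that each training example pairs past outputs and past exogenous inputs with the current output. The sketch below builds that regressor matrix and computes the RMSE used for evaluation. The lag orders and the choice to exclude the current-time input u(t) are assumptions, since the abstract gives neither. The MLP that would be trained on these features is not shown.

```python
import math
from typing import List, Sequence, Tuple


def narx_dataset(y: Sequence[float], u: Sequence[Sequence[float]],
                 ny: int, nu: int) -> Tuple[List[List[float]], List[float]]:
    """Features [y(t-1..t-ny), u(t-1..t-nu)] with target y(t).

    y: fouling resistance series; u[t]: vector of exogenous inputs at time t
    (e.g. flow rates, temperatures, crude blend properties).
    """
    start = max(ny, nu)
    X, target = [], []
    for t in range(start, len(y)):
        row = [y[t - k] for k in range(1, ny + 1)]
        for k in range(1, nu + 1):
            row.extend(u[t - k])
        X.append(row)
        target.append(y[t])
    return X, target


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical series with one exogenous input, two output lags and one input lag.
X, target = narx_dataset([0.0, 1.0, 2.0, 3.0], [[10.0], [11.0], [12.0], [13.0]], ny=2, nu=1)
print(X, target)  # -> [[1.0, 0.0, 11.0], [2.0, 1.0, 12.0]] [2.0, 3.0]
```

During multi-step validation, a NARX model is often run in closed loop, feeding its own predictions back in place of measured past outputs. Whether the paper did so is not stated.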

Predicting Charging Time of Battery Electric Vehicles Based on Regression and Time-Series Methods: A Case Study of Beijing

Directory of Open Access Journals (Sweden)

Jun Bi

2018-04-01

Battery electric vehicles (BEVs) reduce energy consumption and air pollution compared with conventional vehicles. However, the limited driving range and potentially long charging time of BEVs create new problems. Accurate prediction of BEV charging time helps drivers plan trips and alleviates their range anxiety. This study proposed a combined model for charging time prediction based on regression and time-series methods, using actual data from BEVs operating in Beijing, China. After data analysis, a regression model was established that uses the charged amount to predict charging time. A time-series method was then adopted to calibrate the regression model, which significantly improved its fitting accuracy. The model parameters were determined from the actual data. Verification results confirmed the accuracy of the model and showed that the model errors were small. The proposed model can accurately depict the charging time characteristics of BEVs in Beijing.
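The abstract does not name the time-series method used for calibration. One common pattern is assumed here purely for illustration: fit a regression of charging time on charged amount, model its residuals as an AR(1) process, and add the forecast residual to the next regression prediction. All function names and values are hypothetical.

```python
from typing import Sequence, Tuple


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of charging time y = a + b * charged amount x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def fit_ar1(resid: Sequence[float]) -> float:
    """Least-squares phi for e_t = phi * e_{t-1} + noise (no intercept)."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    return num / den


def predict_next(a: float, b: float, phi: float, x_next: float, last_resid: float) -> float:
    """Regression prediction plus the AR(1) forecast of the next residual."""
    return a + b * x_next + phi * last_resid


# Hypothetical sessions: charged amount (kWh) and charging time (min).
x = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
y = [22.0, 38.0, 61.0, 78.0, 101.0, 118.0]
a, b = ols(x, y)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
phi = fit_ar1(resid)
print(round(predict_next(a, b, phi, 35.0, resid[-1]), 2))
```

The residual model only helps when consecutive errors are correlated, for example for the same vehicle or charger over time. If phi is near zero, the correction vanishes and the plain regression prediction remains.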
Comparison of logistic regression and artificial neural network in low back pain prediction: second national health survey.

Science.gov (United States)

Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H

2012-01-01

The purpose of this investigation was to empirically compare the predictive ability of an artificial neural network with that of logistic regression in predicting low back pain. Data from the second national health survey were used. These data include information on low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17,294 records and validated on a test set of 17,295 records. Hosmer and Lemeshow's recommendations for model selection were followed in fitting the logistic regression. A three-layer perceptron with 9 input, 3 hidden and 1 output neurons was employed. The two models were compared using receiver operating characteristic (ROC) analysis, root mean square and -2 log-likelihood criteria. The area under the ROC curve (SE), root mean square and -2 log-likelihood of the logistic regression were 0.752 (0.004), 0.3832 and 14769.2, respectively. The corresponding values for the artificial neural network were 0.754 (0.004), 0.3770 and 14757.6. Based on these three criteria, the artificial neural network performed better than logistic regression. Although the difference is statistically significant, it does not appear to be clinically significant.
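Two of the comparison criteria can be computed directly from predicted probabilities. The area under the ROC curve equals the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half. The -2 log-likelihood is the deviance of the predicted probabilities. The sketch below is a plain O(n_pos·n_neg) implementation on hypothetical values and does not compute the standard error reported in the abstract.

```python
import math
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney form of the area under the ROC curve (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def minus_two_log_likelihood(probs: Sequence[float], labels: Sequence[int]) -> float:
    """-2 * sum of log P(observed label) under the predicted probabilities."""
    return -2.0 * sum(math.log(p) if l == 1 else math.log(1.0 - p)
                      for p, l in zip(probs, labels))


probs = [0.1, 0.4, 0.35, 0.8]  # hypothetical predicted probabilities of low back pain
labels = [0, 0, 1, 1]
print(auc(probs, labels))  # -> 0.75
print(round(minus_two_log_likelihood(probs, labels), 4))
```

For a test set of about 17,000 records, a rank-based AUC computation (sorting once) would be preferable to the double loop. The result is the same.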
Prediction of protein binding sites using physical and chemical descriptors and the support vector machine regression method

International Nuclear Information System (INIS)

Sun Zhong-Hua; Jiang Fan

2010-01-01

In this paper, a new continuous variable called the core-ratio is defined to describe the probability of a residue being in a binding site. It replaces the previous binary (0 or 1) description of interface residues. This allows the support vector machine regression method to be used to fit the core-ratio value and predict protein binding sites. We also design a new group of physical and chemical descriptors, computed with an averaging procedure, to characterize the binding sites; these new descriptors are more effective. Our tests show that much better prediction results can be obtained with the support vector regression (SVR) method than with the support vector classification method. (rapid communication)
Simple equations to predict concentric lower-body muscle power in older adults using the 30-second chair-rise test: a pilot study

Directory of Open Access Journals (Sweden)

Wesley N Smith

2010-07-01

Wesley N Smith (1), Gianluca Del Rossi (1), Jessica B Adams (1), KZ Abderlarahman (2), Shihab A Asfour (2), Bernard A Roos (1,3,4,5), Joseph F Signorile (1,3)
(1) Department of Exercise and Sport Sciences, (2) Department of Industrial Engineering, University of Miami, Coral Gables, FL, USA; (3) Geriatric Research, Education, and Clinical Center, Bruce W Carter Department of Veterans Affairs Medical Center, Miami, FL, USA; (4) Departments of Medicine and Neurology, University of Miami Miller School of Medicine, Miami, FL, USA; (5) Stein Gerontological Institute, Miami Jewish Health Systems, Miami, FL, USA
Abstract: Although muscle power is an important factor affecting independence in older adults, there is no inexpensive or convenient test to quantify power in this population. Therefore, this pilot study examined whether regression equations for evaluating muscle power in older adults could be derived from a simple chair-rise test. We collected data from a 30-second chair-rise test performed by fourteen older adults (76 ± 7.19 years). Average (AP) and peak (PP) power values were computed using data from force-platform and high-speed motion analyses. Using each participant's body mass and the number of chair rises performed during the first 20 seconds of the 30-second trial, we developed multivariate linear regression equations to predict AP and PP. The values computed using these equations showed a significant linear correlation with the values derived from our force-platform and high-speed motion analyses (AP: R = 0.89; PP: R = 0.90; P < 0.01). Our results indicate that lower-body muscle power in fit older adults can be accurately evaluated using the data from the initial 20 seconds of a simple 30-second chair-rise test, which requires no special equipment, preparation, or setting.
Keywords: instrumental activity of daily living, clinical test, elderly, chair-stand test, leg power
An angstrom equation analysis of solar insolation data in Malaysia

International Nuclear Information System (INIS)

Lee Fai Tsen

2000-01-01

Solar energy systems rely extensively on the availability of global solar radiation for optimum performance. The standard measurement method uses sunshine recorders to record sunshine hours, and solarimeters and chart recorders to record diffuse and direct solar radiation. This method tends to be expensive and time-consuming, and as a result fewer stations may be set up to monitor solar insolation data. Linear regression using the Angstrom equation of the form G = G₀(a + b·n/N) has been used extensively to analyze global radiation at the site of a station. The equation gives the linear regression coefficients a and b, which are characteristic of the station. The equation may therefore be used to predict global radiation at and around the station. This holds if the surrounding area is geographically similar, or if it has not been characteristically changed by development over the years. We present here an analysis of the solar insolation data of several meteorological stations in West Malaysia to obtain the linear regression coefficients a and b based on yearly analysis. Interestingly, the values of a and b have changed over the years. This may be due to global warming, or to extensive land clearing for local development. Such clearing has resulted in haze and pollution that could affect the solar insolation received at the station. (Author)
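Fitting the Angstrom equation is a simple linear regression of the clearness ratio G/G₀ on the relative sunshine duration n/N. Here G is measured global radiation, G₀ extraterrestrial radiation, n measured sunshine hours and N the maximum possible sunshine hours. The sketch below fits a and b by ordinary least squares and then predicts G. The input values are hypothetical, chosen so that a = 0.25 and b = 0.5 are recovered exactly.

```python
from typing import Sequence, Tuple


def fit_angstrom(G: Sequence[float], G0: Sequence[float],
                 n: Sequence[float], N: Sequence[float]) -> Tuple[float, float]:
    """Least-squares a, b in G/G0 = a + b * (n/N)."""
    x = [ni / Ni for ni, Ni in zip(n, N)]
    y = [g / g0 for g, g0 in zip(G, G0)]
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def predict_global(G0: float, n: float, N: float, a: float, b: float) -> float:
    return G0 * (a + b * n / N)


# Hypothetical daily values (G and G0 in the same units, n and N in hours).
a, b = fit_angstrom(G=[11.25, 15.0, 18.75], G0=[30.0, 30.0, 30.0],
                    n=[3.0, 6.0, 9.0], N=[12.0, 12.0, 12.0])
print(round(a, 3), round(b, 3))                    # -> 0.25 0.5
print(round(predict_global(30.0, 6.0, 12.0, a, b), 3))  # -> 15.0
```

Refitting a and b year by year, as in the study, is simply this fit applied to each year's data. A drift in the coefficients then shows up directly.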
Fungible weights in logistic regression.

Science.gov (United States)

Jones, Jeff A; Waller, Niels G

2016-06-01

In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Predicting Taxi-Out Time at Congested Airports with Optimization-Based Support Vector Regression Methods

Directory of Open Access Journals (Sweden)

Guan Lian

2018-01-01

Accurate prediction of taxi-out time is an important precondition for improving the operation of the departure process at an airport. It also helps reduce long taxi-out times, congestion, and excessive greenhouse gas emissions. Unfortunately, several traditional methods of predicting taxi-out time perform unsatisfactorily at congested airports. This paper describes and tests three of these conventional methods: the Generalized Linear Model, the Softmax Regression Model, and the Artificial Neural Network method. It also tests two improved Support Vector Regression (SVR) approaches optimized by swarm intelligence algorithms, namely Particle Swarm Optimization (PSO) and the Firefly Algorithm. To improve the global search ability of the Firefly Algorithm, an adaptive step factor and Lévy flight are implemented simultaneously when updating the location function. Six factors are analysed, of which delay is identified as a significant factor at congested airports. Through a series of specific dynamic analyses, a case study of Beijing International Airport (PEK) is tested with historical data. The performance measures show that the two proposed SVR approaches, especially the Improved Firefly Algorithm (IFA) optimization-based SVR method, achieve the best modelling performance and accuracy among the representative forecast models. They also achieve better predictive performance when dealing with abnormal taxi-out time states.
Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression

KAUST Repository

Abdul Jameel, Abdul Gani

2016-09-14

An improved model for predicting the ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. The cetane number (CN) and derived cetane number (DCN) of 71 pure hydrocarbons and 54 hydrocarbon blends were used as a data set to study the relationship between ignition quality and molecular structure. CN and DCN are functional equivalents and are collectively referred to herein as D/CN. The study examined how D/CN is affected by molecular weight and by the weight percent of structural parameters. These parameters include paraffinic CH3 groups, paraffinic CH2 groups, paraffinic CH groups, olefinic CH–CH2 groups, naphthenic CH–CH2 groups, and aromatic C–CH groups. Particular emphasis was placed on the effect of branching (i.e., methyl substitution) on D/CN, and a new parameter, the branching index (BI), was introduced to quantify this effect. A new formula was developed to calculate the BI of hydrocarbon fuels using 1H NMR spectroscopy. MLR modeling was used to develop an empirical relationship between D/CN and the eight structural parameters, which was then used to predict the DCN of many hydrocarbon fuels. The developed model has a high correlation coefficient (R² = 0.97). It was validated against the experimentally measured DCN of twenty-two real fuel mixtures (e.g., gasolines and diesels) and fifty-nine blends of known composition, and the predicted values matched the experimental data well.
Predictability of extreme weather events for NE U.S.: improvement of the numerical prediction using a Bayesian regression approach

Science.gov (United States)

Yang, J.; Astitha, M.; Anagnostou, E. N.; Hartman, B.; Kallos, G. B.

2015-12-01

Weather prediction accuracy has become very important for the Northeast U.S. given the devastating effects of extreme weather events in recent years. Weather forecasting systems are used to build strategies to prevent catastrophic losses of human life and damage to the environment. Concurrently, weather forecast tools and techniques have evolved, with forecast skill improving as numerical prediction techniques are strengthened by increased supercomputing resources. In this study, we examine the combination of two state-of-the-science atmospheric models (WRF and RAMS/ICLAMS) using a Bayesian regression approach to improve the prediction of extreme weather events for the NE U.S. The basic concept behind the Bayesian regression approach is to take advantage of the strengths of two atmospheric modeling systems. Similar to the multi-model ensemble approach, it also limits their weaknesses, which are related to systematic and random errors in the numerical prediction of physical processes. The first part of this study focuses on retrospective simulations of seventeen storms that affected the region in the period 2004-2013. Optimal variances are estimated by minimizing the root mean square error and are applied to out-of-sample weather events. The applicability and usefulness of this approach are demonstrated by an error analysis based on in-situ observations. These comprise wind speed and wind direction from National Weather Service (NWS) meteorological stations, and NCEP Stage IV radar data, mosaicked from the regional multi-sensor, for precipitation. The preliminary results indicate a significant improvement in the statistical metrics of the modeled-observed pairs for meteorological variables using various combinations of the sixteen events as predictors of the seventeenth. This presentation will illustrate the implemented methodology and the obtained results for wind speed, wind direction and precipitation, as well as set the research steps that will be
Plateletpheresis efficiency and mathematical correction of software-derived platelet yield prediction: A linear regression and ROC modeling approach.

Science.gov (United States)

Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David

2017-10-01

Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed cell processor accuracy of predicted platelet (PLT) yields with the goal of a better prediction of DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software predicted yield and actual PLT yield were statistically evaluated. Software prediction was optimized by linear regression analysis and its optimal cut-off to obtain a DP assessed by receiver operating characteristic curve (ROC) modeling. Three hundred and two plateletpheresis procedures were performed; in 271 (89.7%) occasions, donors were men and in 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486. P Simple correction derived from linear regression analysis accurately corrected this underestimation and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.
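The abstract does not state which criterion defined the optimal ROC cut-off. Youden's index (sensitivity + specificity − 1) is a common choice and is assumed in this sketch. The scores could be regression-corrected software yield predictions, and the labels whether a double product was actually obtained. Both the values and the function name are hypothetical.

```python
from typing import Optional, Sequence


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """Threshold t maximizing sensitivity + specificity - 1 for the rule 'score >= t'.

    Assumes both classes (label 1 and label 0) are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < t and l == 0)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Hypothetical corrected predicted yields and whether a double product was collected.
print(youden_cutoff([5.0, 6.0, 7.0, 8.0], [0, 0, 1, 1]))  # -> 7.0
```

Other criteria, such as fixing a minimum sensitivity or weighting false positives by their operational cost, can select a different threshold from the same ROC curve.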
Predicting thyroxine requirements following total thyroidectomy.

Science.gov (United States)

Mistry, Dipan; Atkin, Stephen; Atkinson, Helen; Gunasekaran, Sinnappa; Sylvester, Deborah; Rigby, Alan S; England, R James

2011-03-01

Optimal thyroxine replacement following total thyroidectomy is critical to avoid symptoms of hypothyroidism. The aim of this study was to determine the best formula for calculating the initial replacement dose of levothyroxine immediately after total thyroidectomy. This was a prospective study. All patients were started on 100 μg levothyroxine, and the dose was titrated until TSH and free T4 were within the reference range. Correlations with height, weight, age, lean body mass (LBM), body surface area (BSA) and body mass index (BMI) were calculated. One hundred consecutive adult patients underwent total thyroidectomy for non-malignant disease. Three methods of levothyroxine dose prediction were compared, aiming for a predicted dose within 25 μg of the actual dose required. Correlations were seen between levothyroxine dose and patient age (r = -0.346, P …) … A regression equation was calculated (predicted levothyroxine dose = [0.943 × bodyweight] + [-1.165 × age] + 125.8) and pragmatically simplified to levothyroxine dose = bodyweight - age + 125. When all patients were started empirically on 100 μg post-operatively, 40% received a dose within 25 μg of their required dose. This increased to 59% with a weight-only dose calculation (1.6 μg/kg) and to 72% with the simplified regression equation. A simple regression equation gives a more accurate prediction of the initial levothyroxine dose after total thyroidectomy, reducing the need for outpatient attendance for dose titration. © 2011 Blackwell Publishing Ltd.
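The regression equation and its simplified form can be applied directly. The sketch below assumes bodyweight in kilograms and age in years. These units are implied by the 1.6 μg/kg weight-only rule but are not stated for the regression. The sketch also checks a prediction against the ±25 μg target used in the study. The function names are illustrative.

```python
def levothyroxine_full(weight_kg, age_years):
    """Regression equation reported in the study (dose in micrograms)."""
    return 0.943 * weight_kg - 1.165 * age_years + 125.8


def levothyroxine_simplified(weight_kg, age_years):
    """Pragmatic simplification: bodyweight - age + 125."""
    return weight_kg - age_years + 125


def levothyroxine_weight_only(weight_kg):
    """Weight-only rule of 1.6 micrograms per kilogram."""
    return 1.6 * weight_kg


def within_target(predicted_ug, required_ug, tolerance_ug=25.0):
    """True if the predicted dose is within the tolerance of the required dose."""
    return abs(predicted_ug - required_ug) <= tolerance_ug


if __name__ == "__main__":
    # Hypothetical patient: 80 kg, 50 years old.
    print(levothyroxine_full(80, 50))        # approximately 142.99
    print(levothyroxine_simplified(80, 50))  # 155
    print(levothyroxine_weight_only(80))     # approximately 128
```

For this hypothetical patient, the simplified equation gives about 12 μg more than the full equation. Both fall within a 25 μg window of each other.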
Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units

International Nuclear Information System (INIS)

Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios, Martha; Baechler, Sébastien

2015-01-01

Purpose: According to estimates, around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics, and to develop mapping and predictive tools to improve local radon prediction. Method: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pairwise Kolmogorov distances between the IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). Results: The automated classification groups lithological units well in terms of their IRC characteristics. In particular, the method reveals IRC differences within metamorphic rocks such as gneiss. The maps produced by random forests soundly represent the regional differences in IRC in Switzerland and improve the spatial detail compared with existing approaches. Random forests explained 33% of the variation in the IRC data. Additionally, the variable importance evaluated by random forests shows that building characteristics are less important predictors of IRC than spatial/geological influences. BART explained 29% of IRC variability and produced maps that indicate the prediction uncertainty. Conclusion: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRC. Automatic clustering of lithological units complements this method by facilitating the interpretation of the radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables
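The classification step can be sketched as follows. The Kolmogorov distance between two lithological units is taken as the two-sample Kolmogorov-Smirnov statistic, the largest gap between their empirical cumulative IRC distributions. Units are then grouped by k-medoids on the resulting distance matrix. This sketch uses a simple alternating (Voronoi-iteration) k-medoids with naive initialization rather than the full PAM swap search; the abstract does not state which variant the authors used. The function names are hypothetical.

```python
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample Kolmogorov distance: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed symmetric distance matrix.

    Returns the medoid indices and, for each item, the medoid it belongs to.
    """
    n = len(dist)
    medoids = list(range(k))  # naive initialization: the first k items
    for _ in range(max_iter):
        # Assignment step: each medoid keeps itself; other items join the nearest.
        clusters = {m: [m] for m in medoids}
        for i in range(n):
            if i not in clusters:
                nearest = min(medoids, key=lambda m: dist[i][m])
                clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    labels = [
        i if i in medoids else min(medoids, key=lambda m: dist[i][m])
        for i in range(n)
    ]
    return medoids, labels
```

Each entry of the distance matrix would be `ks_distance` applied to the IRC samples of two lithological units. Units sharing a label form one radon class.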
Generating linear regression model to predict motor functions by use of laser range finder during TUG.

Science.gov (United States)

Adachi, Daiki; Nishiguchi, Shu; Fukutani, Naoto; Hotta, Takayuki; Tashiro, Yuto; Morino, Saori; Shirooka, Hidehiko; Nozaki, Yuma; Hirata, Hinako; Yamaguchi, Moe; Yorozu, Ayanori; Takahashi, Masaki; Aoyama, Tomoki

2017-05-01

The purpose of this study was to investigate which spatial and temporal parameters of the Timed Up and Go (TUG) test are associated with motor function in elderly individuals. This study included 99 community-dwelling women aged 72.9 ± 6.3 years. The following were measured using our analysis system: step length, step width, single support time, the variability of these parameters, gait velocity, cadence, reaction time from the starting signal to the first step, and the minimum distance between the foot and a marker placed 3 m in front of the chair. The 10-m walk test, five times sit-to-stand (FTSTS) test, and one-leg standing (OLS) test were used to assess motor function. Stepwise multivariate linear regression analysis was used to determine which TUG test parameters were associated with each motor function test. Finally, we calculated a predictive model for each motor function test from the corresponding regression coefficients. In the stepwise linear regression analysis, step length and cadence were significantly associated with the 10-m walk test, FTSTS and OLS test. Reaction time was associated with the FTSTS test, and step width was associated with the OLS test. Each predictive model showed a strong correlation with the 10-m walk test and OLS test (P …) … motor function test. Moreover, TUG test time, regarded as a measure of lower-extremity function and mobility, has strong predictive ability for each motor function test. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.
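Stepwise selection can be sketched as forward selection. Starting from an intercept-only model, the sketch repeatedly adds the TUG parameter that most improves the fit and stops when no candidate helps. The abstract does not give the entry criterion, so this sketch assumes Akaike's information criterion (AIC) for a Gaussian linear model. The variable names are hypothetical.

```python
import numpy as np


def ols_aic(X, y):
    """Fit y = b0 + X b by least squares; return (AIC, coefficients)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    # Gaussian log-likelihood up to a constant; the parameter count includes the intercept.
    return n * np.log(rss / n) + 2 * A.shape[1], coef


def forward_stepwise(X, y, names):
    """Greedy forward selection of the columns of X by AIC."""
    selected = []
    best_aic, _ = ols_aic(X[:, selected], y)
    while len(selected) < X.shape[1]:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        aic, j = min((ols_aic(X[:, selected + [j]], y)[0], j) for j in candidates)
        if aic >= best_aic:
            break  # no remaining candidate improves the model
        best_aic = aic
        selected.append(j)
    _, coef = ols_aic(X[:, selected], y)
    return [names[j] for j in selected], coef
```

The returned coefficients (intercept first, then the selected parameters in order of entry) define the predictive model for one motor function test.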
Predicting Dyspnea Inducers by Molecular Topology

Directory of Open Access Journals (Sweden)

María Gálvez-Llompart

2013-01-01

QSAR based on molecular topology (MT) is an excellent methodology for predicting the physicochemical and biological properties of compounds. This approach is applied here to develop a mathematical model capable of recognizing drugs that show dyspnea as a side effect. Using linear discriminant analysis, a four-variable equation was obtained, with correct prediction rates of about 81% and 73% in the training and test sets of compounds, respectively. These results demonstrate that QSAR-MT is an efficient tool to predict the appearance of dyspnea associated with drug consumption.
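For two classes (drugs with and without dyspnea as a side effect), linear discriminant analysis can be sketched with Fisher's discriminant. The descriptors are projected onto w = S_W^{-1}(m_1 - m_0) and classified against the midpoint of the projected class means. This assumes equal priors and a shared covariance. The descriptor matrix in the test is synthetic, and its four columns merely stand in for the four topological indices.

```python
import numpy as np


def fit_fisher_lda(X, y):
    """Two-class Fisher discriminant; returns weights and a threshold."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    s_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(s_w, m1 - m0)
    threshold = w @ (m0 + m1) / 2.0  # midpoint rule (equal priors)
    return w, threshold


def predict(X, w, threshold):
    """1 = predicted to cause dyspnea, 0 = not."""
    return (X @ w > threshold).astype(int)


def classification_rate(y_true, y_pred):
    """Fraction of compounds classified correctly."""
    return float(np.mean(y_true == y_pred))
```

The training and test rates quoted in the abstract correspond to `classification_rate` evaluated on each set separately.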
Predicting risk for portal vein thrombosis in acute pancreatitis patients: A comparison of radical basis function artificial neural network and logistic regression models.

Science.gov (United States)

Fei, Yang; Hu, Jian; Gao, Kun; Tu, Jianfeng; Li, Wei-Qin; Wang, Wei

2017-06-01

To construct a radial basis function (RBF) artificial neural network (ANN) model to predict the incidence of acute pancreatitis (AP)-induced portal vein thrombosis (PVT). The analysis included 353 patients with AP who were admitted between January 2011 and December 2015. An RBF ANN model and a logistic regression model were each constructed from eleven factors relevant to AP. Statistical indexes were used to evaluate the predictive value of the two models. The sensitivity, specificity, positive predictive value, negative predictive value and accuracy of the RBF ANN model for PVT were 73.3%, 91.4%, 68.8%, 93.0% and 87.7%, respectively. There were significant differences between the RBF ANN and logistic regression models in these parameters (P …) … logistic regression model. D-dimer, AMY, Hct and PT were important predictors of AP-induced PVT. Copyright © 2017 Elsevier Inc. All rights reserved.
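The five indices reported for the RBF ANN model all come from the 2×2 table of predicted versus observed PVT. A minimal sketch of their definitions follows. The counts in the example are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard indices from a 2x2 confusion matrix (positive = PVT)."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for illustration only.
print(diagnostic_metrics(tp=8, fp=2, fn=2, tn=88))
```

Unlike sensitivity and specificity, the predictive values depend on how common PVT is in the cohort. They are therefore not directly transferable to populations with a different prevalence.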
Predictive densities for day-ahead electricity prices using time-adaptive quantile regression

DEFF Research Database (Denmark)

Jónsson, Tryggvi; Pinson, Pierre; Madsen, Henrik

2014-01-01

Many of the decision-making problems that power-system actors face daily require scenarios for day-ahead electricity market prices. These scenarios are most likely to be generated from marginal predictive densities for such prices, then enhanced with a temporal dependence structure. A semi-parametric methodology for generating such densities is presented. It includes (i) a time-adaptive quantile regression model for the 5%–95% quantiles and (ii) a description of the distribution tails with exponential distributions. The forecasting skill of the proposed model...
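Quantile regression estimates a given quantile by minimizing the pinball (check) loss. The sketch below is not the authors' time-adaptive algorithm, which the abstract does not detail. It swaps in a plain online subgradient update with a constant step size, which likewise lets older observations fade as new prices arrive. The class name and the single-regressor linear form are assumptions.

```python
import random


def pinball_loss(y, q_pred, tau):
    """Check loss for quantile level tau in (0, 1)."""
    diff = y - q_pred
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


class OnlineQuantileRegression:
    """Linear quantile model q = a + b * x updated one observation at a time."""

    def __init__(self, tau, step=0.01):
        self.tau, self.step = tau, step
        self.a = self.b = 0.0

    def predict(self, x):
        return self.a + self.b * x

    def update(self, x, y):
        # Negative subgradient of the pinball loss with respect to the prediction.
        g = self.tau if y > self.predict(x) else self.tau - 1.0
        self.a += self.step * g
        self.b += self.step * g * x


if __name__ == "__main__":
    rng = random.Random(0)
    model = OnlineQuantileRegression(tau=0.5)
    for _ in range(50_000):
        x = rng.random()
        model.update(x, 2.0 + x + rng.random())  # true median: 2.5 + x
    print(round(model.predict(0.5), 2))  # should be close to 3.0
```

One such model per quantile level (5%, 10%, ..., 95%) would give the central part of the predictive density. The exponential tails described in the abstract are not covered here.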
Prediction of the temperature of the atmosphere of the primary containment: comparison between neural networks and polynomial regression

International Nuclear Information System (INIS)

Alvarez Huerta, A.; Gonzalez Miguelez, R.; Garcia Metola, D.; Noriega Gonzalez, A.

2011-01-01

The modelling is carried out with two different techniques: a conventional polynomial regression and an approach based on artificial neural networks. The forecast quality of the different models is compared, those based on polynomial regression and a neural network with Bayesian regularization for generalization. The comparison uses two indicators, the root mean square error and the coefficient of determination. In view of the results, the neural network generates more accurate and reliable predictions than the polynomial regression.
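The two indicators used for the comparison can be computed for any fitted model. As a sketch assuming NumPy, the snippet fits the polynomial-regression baseline with `numpy.polyfit` and scores it with RMSE and the coefficient of determination. The neural-network side is omitted and the data are synthetic.

```python
import numpy as np


def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def polynomial_baseline(x, y, degree):
    """Least-squares polynomial fit; returns predictions at x."""
    coeffs = np.polyfit(x, y, degree)
    return np.polyval(coeffs, x)


if __name__ == "__main__":
    x = np.linspace(0.0, 10.0, 50)            # synthetic time axis
    y = 30.0 + 0.8 * x - 0.05 * x ** 2        # synthetic containment temperature
    y_hat = polynomial_baseline(x, y, degree=2)
    print(rmse(y, y_hat), r_squared(y, y_hat))
```

For a fair comparison with a neural network, both indicators should be computed on data not used for fitting. In-sample scores favour the more flexible model.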
Measurement and prediction of dabigatran etexilate mesylate Form II solubility in mono-solvents and mixed solvents

International Nuclear Information System (INIS)

Xiao, Yan; Wang, Jingkang; Wang, Ting; Ouyang, Jinbo; Huang, Xin; Hao, Hongxun; Bao, Ying; Fang, Wen; Yin, Qiuxiang

2016-01-01

Highlights: • The solubility of DEM Form II in mono-solvents and binary solvent mixtures was measured. • A regressed UNIFAC model was used to predict the solubility in solvent mixtures. • The experimental solubility data were correlated by different models. Abstract: A UV spectrometric method was used to measure the solubility of dabigatran etexilate mesylate (DEM) Form II over the temperature range 287.37 K to 323.39 K. Measurements covered five mono-solvents (methanol, ethanol, ethane-1,2-diol, DMF, DMAC) and binary solvent mixtures of methanol and ethanol. The experimental solubility data in mono-solvents were correlated with the modified Apelblat equation, the van't Hoff equation and the λh equation. The GSM model and the modified Jouyban-Acree model were employed to correlate the solubility data in mixed solvent systems. The regressed UNIFAC model was used to predict the solubility of DEM Form II in the binary solvent mixtures. The predicted data were consistent with the experimental data.
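The modified Apelblat equation, ln x = A + B/T + C ln T, is linear in its parameters, so it can be fitted by ordinary least squares on ln x. The sketch assumes mole-fraction solubility x and absolute temperature T in kelvin. The parameter values in the usage example are invented for illustration and are not the study's.

```python
import numpy as np


def fit_apelblat(T, x):
    """Least-squares fit of ln x = A + B/T + C ln T; returns (A, B, C)."""
    T = np.asarray(T, dtype=float)
    design = np.column_stack([np.ones_like(T), 1.0 / T, np.log(T)])
    coef, *_ = np.linalg.lstsq(design, np.log(np.asarray(x, dtype=float)), rcond=None)
    return coef


def apelblat(T, coef):
    """Mole-fraction solubility predicted by the modified Apelblat equation."""
    A, B, C = coef
    T = np.asarray(T, dtype=float)
    return np.exp(A + B / T + C * np.log(T))


if __name__ == "__main__":
    T = np.linspace(287.37, 323.39, 8)          # K, range from the study
    x_obs = apelblat(T, (-50.0, -1000.0, 8.0))  # invented "measurements"
    print(fit_apelblat(T, x_obs))
```

Over a narrow temperature range, 1/T and ln T are nearly collinear. The individual parameters can therefore be poorly determined even when the fitted solubilities match closely. The fitted curve, not the parameters, is what should be compared.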
Forecasting with Dynamic Regression Models

CERN Document Server

Pankratz, Alan

2012-01-01

Single-equation regression models, one of the most widely used tools in statistical forecasting, are examined here. This text is a companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases. It pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed-lag responses of output to input series, and the autocorrelation patterns of regression disturbances. It also includes six case studies.
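A dynamic regression with a distributed-lag response can be sketched as y_t = c + v_0 x_t + v_1 x_{t-1} + ... + v_K x_{t-K} + n_t. The model is estimated by least squares, after which the residual autocorrelation is inspected. This is a generic illustration assuming NumPy and a free-form (unconstrained) lag structure. It is not code from the book, and a full treatment would also model n_t as an ARMA process.

```python
import numpy as np


def lagged_design(x, max_lag):
    """Rows t = max_lag..n-1 with columns [1, x_t, x_{t-1}, ..., x_{t-max_lag}]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    cols = [np.ones(n - max_lag)]
    cols += [x[max_lag - k : n - k] for k in range(max_lag + 1)]
    return np.column_stack(cols)


def fit_distributed_lag(x, y, max_lag):
    """OLS estimates of (c, v_0, ..., v_K) and the residual series."""
    X = lagged_design(x, max_lag)
    y = np.asarray(y, dtype=float)[max_lag:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef


def lag1_autocorrelation(e):
    """Sample lag-1 autocorrelation of a residual series."""
    e = np.asarray(e, dtype=float) - np.mean(e)
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e ** 2))
```

A lag-1 residual autocorrelation far from zero signals that the disturbance has its own dynamics. That dynamics should then be modelled rather than ignored.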
FUZZY REGRESSION MODEL TO PREDICT THE BEAD GEOMETRY IN THE ROBOTIC WELDING PROCESS

Institute of Scientific and Technical Information of China (English)

B.S. Sung; I.S. Kim; Y. Xue; H.H. Kim; Y.H. Cha

2007-01-01

Recently, there has been rapid development in computer technology, which has in turn led to the development of fully robotic welding systems using artificial intelligence (AI) technology. However, a fully robotic welding system has not yet been achieved because of difficulties with the mathematical models and sensor technologies. This paper presents the possibilities of the fuzzy regression method for predicting bead geometry in the robotic GMA (gas metal arc) welding process. The geometry includes bead width, bead height, bead penetration and bead area. The approach, a well-known method for dealing with problems with a high degree of fuzziness, is used to build the relationship between four process variables and each of the four quality characteristics. Using these models, the process variables needed to obtain the optimal bead geometry can be properly predicted.

Creation of a predictive equation to estimate fat-free mass and the ratio of fat-free mass to skeletal size using morphometry in lean working farm dogs.

Science.gov (United States)

Leung, Y M; Cave, N J; Hodgson, B A S

2018-06-27

To develop an equation that accurately estimates fat-free mass (FFM) and the ratio of FFM to skeletal size or mass, using morphometric measurements in lean working farm dogs. A second aim was to examine the association between FFM derived from body condition score (BCS) and FFM measured using isotope dilution. Thirteen Huntaway and seven Heading working dogs from sheep and beef farms in the Waikato region of New Zealand were recruited based on BCS (BCS 4) using a nine-point scale. Bodyweight, BCS, and morphometric measurements were recorded for each dog. The measurements were head length and circumference, body length, thoracic girth, and fore and hind limb length. Body composition was measured using an isotope dilution technique. A new variable based on the morphometric measurements, termed skeletal size, was created using principal component analysis. Models for predicting FFM, leanST (FFM minus skeletal mass) and the ratios of FFM and leanST to skeletal size or mass were generated using multiple linear regression analysis. Mean FFM of the 20 dogs, measured by isotope dilution, was 22.1 (SD 4.4) kg, and FFM as a percentage of bodyweight was 87.0 (SD 5.0)%. Median BCS was 3.0 (min 1, max 6). Bodyweight, breed, age and skeletal size or mass were associated with measured FFM (p …) … predicted FFM and measured FFM (R² = 0.96), and between the predicted and measured ratios of FFM to skeletal size (R² = 0.99). Correlation coefficients were higher for the ratios of FFM and leanST to skeletal size than for ratios using skeletal mass. There was a positive correlation between BCS-derived fat mass as a percentage of bodyweight and fat mass percentage determined using isotope dilution (R² = 0.65). As expected, the predictive equation was accurate in estimating FFM when tested on the same group of dogs used to develop the equation. The significance of breed, independent of skeletal size, in predicting FFM indicates that breed-specific formulae may be required. Future studies that apply these equations on a greater population of
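The composite "skeletal size" variable can be sketched as the first principal component of the standardized morphometric measurements. This component then enters a multiple linear regression for FFM. The sketch assumes skeletal size was taken as the first component of standardized measurements; the abstract does not say how many components or what scaling were used. The data in the test are synthetic, and the function names are hypothetical.

```python
import numpy as np


def first_principal_component(measurements):
    """Scores on PC1 of column-standardized measurements (rows = dogs)."""
    M = np.asarray(measurements, dtype=float)
    Z = (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)
    # The rows of vt (right singular vectors of Z) are the principal axes.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[0]
    if loadings.sum() < 0:  # fix the sign so that larger dogs score higher
        loadings = -loadings
    return Z @ loadings


def fit_ffm_model(bodyweight, skeletal_size, ffm):
    """OLS of FFM on an intercept, bodyweight and skeletal size."""
    X = np.column_stack([np.ones(len(ffm)), bodyweight, skeletal_size])
    coef, *_ = np.linalg.lstsq(X, np.asarray(ffm, dtype=float), rcond=None)
    return coef
```

With only 20 dogs, such a model should be validated on independent animals. The abstract itself notes that it was tested only on the development sample.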
[Logistic regression model of noninvasive prediction for portal hypertensive gastropathy in patients with hepatitis B associated cirrhosis].

Science.gov (United States)

Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo

2015-05-12

To explore the risk factors for portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and to establish a noninvasive logistic regression model for its prediction. The clinical data of 234 patients hospitalized with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG, and the independent variables were screened by binary logistic analysis. Multivariate logistic regression was used for further analysis of the significant noninvasive independent variables. A logistic regression model was established and the odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of the model were evaluated with the receiver operating characteristic (ROC) curve. According to univariate logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT) count, white blood cell (WBC) count, portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major risk factors for PHG. The established regression model was logit(P) = -2.667 + 2.186X1 - 2.167X2 + 0.725X3 + 0.976X4. The model's accuracy for PHG was 79.1%, with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG. The noninvasive logistic regression model for predicting PHG is logit(P) = -2.667 + 2.186X1 - 2.167X2 + 0.725X3 + 0.976X4.
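Converting the reported logit into a predicted probability only requires the logistic function. The abstract does not give the coding of X1 to X4, for example whether TB and PLT enter as raw values or as categories. The inputs in the example below are therefore placeholders, not clinically meaningful values.

```python
import math

# Coefficients reported for logit(P) = b0 + b1*X1 + b2*X2 + b3*X3 + b4*X4.
COEFFICIENTS = (-2.667, 2.186, -2.167, 0.725, 0.976)


def phg_probability(x1, x2, x3, x4):
    """Predicted probability of PHG from the published logistic model."""
    b0, b1, b2, b3, b4 = COEFFICIENTS
    logit = b0 + b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4
    return 1.0 / (1.0 + math.exp(-logit))


def odds_ratios():
    """Odds ratio per one-unit increase in each predictor: exp(coefficient)."""
    return [math.exp(b) for b in COEFFICIENTS[1:]]


print(phg_probability(1, 0, 1, 1))  # placeholder inputs
```

A negative coefficient, as for X2, gives an odds ratio below 1. Its interpretation depends on how TB was coded in the original study.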
Development of equations for predicting Puerto Rican subtropical dry forest biomass and volume

Science.gov (United States)

Thomas J. Brandeis; Matthew Delaney; Bernard R. Parresol; Larry Royer

2006-01-01

Carbon accounting, forest health monitoring and sustainable management of the subtropical dry forests of Puerto Rico and other Caribbean Islands require an accurate assessment of forest aboveground biomass (AGB) and stem volume. One means of improving assessment accuracy is the development of predictive equations derived from locally collected data. Forest inventory...
An Application to the Prediction of LOD Change Based on General Regression Neural Network

Science.gov (United States)

Zhang, X. H.; Wang, Q. J.; Zhu, J. J.; Zhang, H.

2011-07-01

Traditionally, LOD (length of day) change has been predicted with linear models, such as least-squares and autoregressive models. Because LOD variation has complex nonlinear features, the performance of linear predictors is not fully satisfactory. This paper applies a nonlinear neural network, the general regression neural network (GRNN), to forecast LOD change. The results are analyzed and compared with those of a back-propagation neural network and other models. The comparison shows that the GRNN model is efficient and feasible for predicting LOD change.
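A GRNN is, in effect, Gaussian-kernel (Nadaraya-Watson) regression. The prediction is a weighted average of training targets, with weights that decay with distance from the query. The sketch below frames one-step-ahead LOD prediction as regression on the previous p values. The window length p and the smoothing parameter sigma are hypothetical tuning choices, not values from the paper.

```python
import math


def make_lagged(series, p):
    """Inputs are windows of p consecutive values; the target is the next value."""
    inputs = [series[i : i + p] for i in range(len(series) - p)]
    targets = [series[i + p] for i in range(len(series) - p)]
    return inputs, targets


def grnn_predict(train_x, train_y, query, sigma):
    """Gaussian-kernel weighted average of the training targets."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, query)) / (2.0 * sigma ** 2))
        for x in train_x
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total
```

A GRNN has no iterative training, only the single parameter sigma to choose, typically by validation error. This is one practical contrast with a back-propagation network.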
Effectiveness of prediction equations in estimating energy expenditure sample of Brazilian and Spanish women with excess body weight

OpenAIRE

Lopes Rosado, Eliane; Santiago de Brito, Roberta; Bressan, Josefina; Martínez Hernández, José Alfredo

2014-01-01

Objective: To assess the adequacy of predictive equations for estimating energy expenditure (EE), compared with EE measured by indirect calorimetry, in a sample of Brazilian and Spanish women with excess body weight. Methods: This is a cross-sectional study of 92 obese adult women aged 20-50 (26 Brazilian, G1; 66 Spanish, G2). Weight and height were measured during fasting to calculate body mass index and the predictive equations. EE was evaluated using the open-circuit indirect...
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

Science.gov (United States)

He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

2013-01-01

Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
Prediction of Agriculture Drought Using Support Vector Regression Incorporating with Climatology Indices

Science.gov (United States)

Tian, Y.; Xu, Y. P.

2017-12-01

In this paper, a Support Vector Regression (SVR) model incorporating climate indices and drought indices is developed to predict agricultural drought in the Xiangjiang River basin, Central China. Agricultural droughts are represented by the Standardized Precipitation Evapotranspiration Index (SPEI). The relationship between soil moisture and SPEI at different time scales was analyzed in terms of drought duration, severity and peak. SPEI at a six-month time scale (SPEI-6) was found to reflect soil moisture better than SPEI at three- and one-month time scales. Climate forcings such as the El Niño Southern Oscillation and the western Pacific subtropical high (WPSH) are represented by climate indices such as the MEI and a series of WPSH indices. The ridge point of the WPSH is found to be the key factor influencing agricultural drought, mainly through its control of temperature. Based on the climate index analysis, predictions of SPEI-6 are made using the SVR model. The results show that the SVR model incorporating climate indices, especially the ridge point of the WPSH, improves prediction accuracy compared with a model using the drought index only. The improvement was larger for predictions at a one-month lead time than at a three-month lead time. However, input parameters must be selected with caution, since adding uninformative inputs can degrade prediction accuracy.
Development and validation of risk prediction equations to estimate survival in patients with colorectal cancer: cohort study

OpenAIRE

Hippisley-Cox, Julia; Coupland, Carol

2017-01-01

Objective: To develop and externally validate risk prediction equations to estimate absolute and conditional survival in patients with colorectal cancer. Design: Cohort study. Setting: General practices in England providing data for the QResearch database linked to the national cancer registry. Participants: 44 145 patients aged 15-99 with colorectal cancer from 947 practices to derive the equations. The equations were validated in 15 214 patients with colorectal cancer ...
Prediction of cardiorespiratory fitness from self-reported data in elderly

Directory of Open Access Journals (Sweden)

Geraldo A Maranhao Neto

2017-12-01

Cardiorespiratory fitness (CRF) is associated with several health outcomes. Some non-exercise equations are available for estimating CRF. However, little is known about the validity of these equations among the elderly. The aim of this study was to examine the validity of non-exercise equations using self-reported information in the elderly. Participants (n = 93, aged 60 to 91 years) had CRF measured with a maximal cardiopulmonary exercise test. Five non-exercise equations were selected. The data included in the equations (age, sex, weight, height, body mass index, physical activity and smoking) were self-reported. The coefficients of determination (R²) of linear regressions with laboratory-measured VO2peak ranged from 0.04 to 0.64. The Bland-Altman plots showed higher agreement between measured and predicted CRF for the equations of Jackson and colleagues and of Wier and colleagues. The other equations showed lower agreement and overestimated CRF. Our findings provide evidence that two previously developed non-exercise equations could be used to predict CRF among the elderly.
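Agreement in a Bland-Altman analysis is summarized by the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 SD of the differences. The minimal sketch below takes differences as predicted minus measured, so a positive bias indicates overestimation. The VO2peak values in the example are invented.

```python
import statistics


def bland_altman(measured, predicted):
    """Return (bias, lower limit, upper limit) of agreement."""
    diffs = [p - m for m, p in zip(measured, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Invented VO2peak values (mL/kg/min) for illustration.
print(bland_altman([20.0, 25.0, 30.0], [21.0, 27.0, 33.0]))
```

A high R² alone does not imply agreement. An equation can correlate well with measured VO2peak yet still be systematically biased, which is what the bias term exposes.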
A novel simple QSAR model for the prediction of anti-HIV activity using multiple linear regression analysis.

Science.gov (United States)

Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga

2006-08-01

A quantitative structure-activity relationship (QSAR) was obtained by applying multiple linear regression analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. The Elimination Selection Stepwise Regression Method (ES-SWR) was used to select the best descriptors among 37 candidates. The resulting QSAR model (R²(CV) = 0.8160; S(PRESS) = 0.5680) proved to be very accurate in both the training and prediction stages.
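The cross-validated statistics quoted for the QSAR model come from leave-one-out prediction. Each compound is predicted by a model fitted without it, and PRESS is the sum of the squared prediction errors. Then R²(CV) = 1 - PRESS / SS_total, and, under one common definition, S(PRESS) = sqrt(PRESS / n). The sketch assumes NumPy and an already-selected descriptor matrix. The ES-SWR descriptor selection step is not reproduced.

```python
import numpy as np


def loo_press(X, y):
    """Leave-one-out PRESS for an OLS model with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += float((y[i] - X[i] @ coef) ** 2)
    return press


def qsar_statistics(X, y):
    """Return (R2 on the training data, R2_CV, S_PRESS)."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - float(np.sum((y - A @ coef) ** 2)) / ss_tot
    press = loo_press(X, y)
    return r2, 1.0 - press / ss_tot, float(np.sqrt(press / len(y)))
```

Because PRESS is never smaller than the training residual sum of squares, R²(CV) never exceeds the training R². A large gap between the two suggests overfitting.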
A Comparison of Logistic Regression, Neural Networks, and Classification Trees Predicting Success of Actuarial Students

Science.gov (United States)

Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard

2010-01-01

The authors extended previous research by two of the authors, who conducted a study designed to predict successful program completion by students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
Constructing and predicting solitary pattern solutions for nonlinear time-fractional dispersive partial differential equations

Science.gov (United States)

Arqub, Omar Abu; El-Ajou, Ahmad; Momani, Shaher

2015-07-01

Building fractional mathematical models for specific phenomena and developing numerical or analytical solutions for these fractional mathematical models are crucial issues in mathematics, physics, and engineering. In this work, a new analytical technique for constructing and predicting solitary pattern solutions of time-fractional dispersive partial differential equations is proposed, based on the generalized Taylor series formula and a residual error function. The new approach provides solutions in the form of a rapidly convergent series with easily computable components using symbolic computation software. For method evaluation and validation, the proposed technique was applied to three different models and compared with some well-known methods. The resulting simulations clearly demonstrate the superiority and potential of the proposed technique in terms of performance quality and accuracy of substructure preservation in the construction. They show the same for predicting solitary pattern solutions for time-fractional dispersive partial differential equations.
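Solutions produced by this kind of technique are fractional power series in t^α. The sketch illustrates that form on a simple problem and is not the authors' residual power series algorithm itself. The linear test problem D_t^α u = -u with u(0) = 1, where D_t^α is the Caputo derivative and 0 < α ≤ 1, has the series solution u(t) = Σ (-1)^n t^{nα} / Γ(nα + 1). This is the Mittag-Leffler function E_α(-t^α), which reduces to exp(-t) for α = 1. The truncation length below is an arbitrary choice.

```python
import math


def fractional_series(alpha, t, terms=40):
    """Truncated series sum_{n<terms} (-1)^n t^(n*alpha) / Gamma(n*alpha + 1)."""
    return sum(
        (-1) ** n * t ** (n * alpha) / math.gamma(n * alpha + 1)
        for n in range(terms)
    )


print(fractional_series(1.0, 0.5))  # close to exp(-0.5) = 0.6065...
print(fractional_series(0.5, 0.5))
```

For α = 1/2 the series has the closed form exp(t) erfc(√t). This gives an independent check of the truncated sum.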
Age prediction formulae from radiographic assessment of skeletal maturation at the knee in an Irish population.

LENUS (Irish Health Repository)

O'Connor, Jean E

2014-01-01

Age estimation in living subjects is primarily achieved through assessment of a hand-wrist radiograph and comparison with a standard reference atlas. Recently, maturation of other regions of the skeleton has also been assessed in an attempt to refine the age estimates. The current study presents a method to predict bone age directly from the knee in a modern Irish sample. Ten maturity indicators (A-J) at the knee were examined from radiographs of 221 subjects (137 males; 84 females). Each indicator was assigned a maturity score. Scores for indicators A-G, H-J and A-J, respectively, were totalled to provide a cumulative maturity score for change in morphology of the epiphyses (AG), epiphyseal union (HJ) and the combination of both (AJ). Linear regression equations to predict age from the maturity scores (AG, HJ, AJ) were constructed for males and females. For males, equation-AJ demonstrated the greatest predictive capability (R(2)=0.775) while for females equation-HJ had the strongest capacity for prediction (R(2)=0.815). When equation-AJ for males and equation-HJ for females were applied to the current sample, the predicted age of 90% of subjects was within ±1.5 years of actual age for male subjects and within +2.0 to -1.9 years of actual age for female subjects. The regression formulae and associated charts represent the most contemporary method of age prediction currently available for an Irish population, and provide a further technique which can contribute to a multifactorial approach to age estimation in non-adults.
Predicting Factors of INSURE Failure in Low Birth Weight Neonates with RDS; A Logistic Regression Model

Directory of Open Access Journals (Sweden)

Bita Najafian

2015-02-01

Background: Respiratory distress syndrome (RDS) is the most common respiratory disease in premature neonates and the most important cause of death among them. We aimed to investigate factors that predict the success or failure of the INSURE method as a treatment for RDS. Methods: In a cohort study, 45 neonates with diagnosed RDS and a birth weight below 1500 g underwent INSURE followed by NCPAP (nasal continuous positive airway pressure). The patients were divided into failure and success groups, and factors predicting the success of INSURE were investigated by logistic regression in SPSS version 16. Results: 29 and 16 neonates were in the success and failure groups, respectively. Birth weight was the only variable that differed significantly between the two groups (P = 0.002). Finally, logistic regression showed that birth weight was the only predictor of success (P: 0.001, EXP[β]: 0.009, CI [95%]: 1.003-0.014) and mortality (P: 0.029, EXP[β]: 0.993, CI [95%]: 0.987-0.999) of neonates treated with the INSURE method. Conclusion: Identifying factors that affect the success rate of INSURE can help in treating neonates with RDS and in reducing costs. In this study, birth weight was an effective factor in INSURE success.
Watershed regressions for pesticides (WARP) for predicting atrazine concentration in Corn Belt streams

Science.gov (United States)

Stone, Wesley W.; Gilliom, Robert J.

2011-01-01

Watershed Regressions for Pesticides (WARP) models, previously developed for atrazine at the national scale, can be improved for application to the U.S. Corn Belt region by developing region-specific models that include important watershed characteristics that are influential in predicting atrazine concentration statistics within the Corn Belt. WARP models for the Corn Belt (WARP-CB) were developed for predicting annual maximum moving-average (14-, 21-, 30-, 60-, and 90-day durations) and annual 95th-percentile atrazine concentrations in streams of the Corn Belt region. All streams used in development of WARP-CB models drain watersheds with atrazine use intensity greater than 17 kilograms per square kilometer (kg/km2). The WARP-CB models accounted for 53 to 62 percent of the variability in the various concentration statistics among the model-development sites.
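The WARP-CB response variables are summary statistics of a stream's concentration record. The sketch below shows how they could be computed from one year of daily atrazine concentrations, assuming a complete daily series. It uses Python's `statistics.quantiles` (default exclusive method) for the 95th percentile. The percentile convention and handling of missing days used in the study are not stated, so these choices are assumptions.

```python
import statistics


def max_moving_average(daily, window):
    """Largest mean over any run of `window` consecutive days."""
    if window > len(daily):
        raise ValueError("window longer than the record")
    running = sum(daily[:window])
    best = running
    for i in range(window, len(daily)):
        running += daily[i] - daily[i - window]  # slide the window by one day
        best = max(best, running)
    return best / window


def annual_statistics(daily, durations=(14, 21, 30, 60, 90)):
    """Annual maximum moving averages and 95th percentile for one year."""
    stats = {f"max_{d}day": max_moving_average(daily, d) for d in durations}
    stats["p95"] = statistics.quantiles(daily, n=100)[94]
    return stats
```

Each of these statistics would be one response variable, regressed on watershed characteristics such as atrazine use intensity.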
Retrieving relevant factors with exploratory SEM and principal-covariate regression: A comparison.

Science.gov (United States)

Vervloet, Marlies; Van den Noortgate, Wim; Ceulemans, Eva

2018-02-12

Behavioral researchers often linearly regress a criterion on multiple predictors, aiming to gain insight into the relations between the criterion and predictors. Obtaining this insight from the ordinary least squares (OLS) regression solution may be troublesome, because OLS regression weights show only the effect of a predictor on top of the effects of other predictors. Moreover, when the number of predictors grows larger, it becomes likely that the predictors will be highly collinear, which makes the regression weights' estimates unstable (i.e., the "bouncing beta" problem). Among other procedures, dimension-reduction-based methods have been proposed for dealing with these problems. These methods yield insight into the data by reducing the predictors to a smaller number of summarizing variables and regressing the criterion on these summarizing variables. Two promising methods are principal-covariate regression (PCovR) and exploratory structural equation modeling (ESEM). Both simultaneously optimize reduction and prediction, but they are based on different frameworks. The resulting solutions have not yet been compared, so it is unclear what the strengths and weaknesses of the two methods are. In this article, we focus on the extent to which PCovR and ESEM are able to extract the factors that truly underlie the predictor scores and can predict a single criterion. The results of two simulation studies showed that, for a typical behavioral dataset, ESEM (using the BIC for model selection) is more often successful in this regard than PCovR. Yet in 93% of the datasets PCovR performed equally well. In the case of 48 predictors, 100 observations, and large differences in the strengths of the factors, PCovR even outperformed ESEM.
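Principal-covariate regression finds components T = XW that trade off reconstructing the predictors against predicting the criterion, with a weight α between 0 and 1. Below is a minimal NumPy sketch of the closed-form solution. T holds the leading eigenvectors of G = α XXᵀ/‖X‖² + (1 - α) H y yᵀ H/‖y‖², where H is the projection onto the column space of X. The sketch assumes centered data and orthonormal components. ESEM is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def pcovr(X, y, n_components, alpha):
    """Return component scores T, weights W, loadings P_x and regression weights p_y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X - X.mean(axis=0)  # center predictors
    y = y - y.mean()        # center criterion
    X_pinv = np.linalg.pinv(X)
    hy = X @ (X_pinv @ y)   # H y: projection of y onto the column space of X
    G = (alpha * (X @ X.T) / np.sum(X ** 2)
         + (1.0 - alpha) * np.outer(hy, hy) / np.sum(y ** 2))
    _, eigvecs = np.linalg.eigh(G)                # eigenvalues in ascending order
    T = eigvecs[:, ::-1][:, :n_components]        # leading eigenvectors
    W = X_pinv @ T                                # so that X @ W reproduces T
    return T, W, T.T @ X, T.T @ y
```

With α = 1 the components reduce to principal components of X. With α = 0 and one component, the single component is the normalized OLS prediction of y.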
Maximum solid concentrations of coal water slurries predicted by neural network models

Energy Technology Data Exchange (ETDEWEB)

Cheng, Jun; Li, Yanchang; Zhou, Junhu; Liu, Jianzhong; Cen, Kefa

2010-12-15

Nonlinear back-propagation (BP) neural network models were developed to predict the maximum solid concentration of coal water slurry (CWS), a substitute for oil fuel, based on the physicochemical properties of 37 typical Chinese coals. The Levenberg-Marquardt algorithm was used to train five BP neural network models with different input factors. The data pretreatment method, learning rate, and number of hidden neurons were optimized by training the models. The Hardgrove grindability index (HGI), moisture, and coalification degree of the parent coal were found to be three indispensable factors for predicting the maximum solid concentration of CWS. Each BP neural network model gives a more accurate prediction than the traditional polynomial regression equation. The BP neural network model with the three input factors HGI, moisture, and oxygen/carbon ratio gives the smallest mean absolute error, 0.40%, which is much lower than the 1.15% given by the traditional polynomial regression equation. (author)
U.S. Army Armament Research, Development and Engineering Center Grain Evaluation Software to Numerically Predict Linear Burn Regression for Solid Propellant Grain Geometries

Science.gov (United States)

2017-10-01

Brian... Distribution is unlimited. U.S. Army Armament Research, Development and Engineering Center, Munitions Engineering Technology Center, Picatinny...
Do Dual-Route Models Accurately Predict Reading and Spelling Performance in Individuals with Acquired Alexia and Agraphia?

OpenAIRE

Rapcsak, Steven Z.; Henry, Maya L.; Teague, Sommer L.; Carnahan, Susan D.; Beeson, Pélagie M.

2007-01-01

Coltheart and colleagues (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001; Castles, Bates, & Coltheart, 2006) have demonstrated that an equation derived from dual-route theory accurately predicts reading performance in young normal readers and in children with reading impairment due to developmental dyslexia or stroke. In this paper we present evidence that the dual-route equation and a related multiple regression model also accurately predict both reading and spelling performance in adult...

Esophageal Stenosis Associated With Tumor Regression in Radiotherapy for Esophageal Cancer: Frequency and Prediction

Energy Technology Data Exchange (ETDEWEB)

Atsumi, Kazushige [Department of Clinical Radiology, Graduate School of Medical Sciences, Kyushu University, Fukuoka (Japan); Shioyama, Yoshiyuki, E-mail: shioyama@radiol.med.kyushu-u.ac.jp [Department of Clinical Radiology, Graduate School of Medical Sciences, Kyushu University, Fukuoka (Japan); Arimura, Hidetaka [Department of Health Sciences, Kyushu University, Fukuoka (Japan); Terashima, Kotaro [Department of Clinical Radiology, Graduate School of Medical Sciences, Kyushu University, Fukuoka (Japan); Matsuki, Takaomi [Department of Health Sciences, Kyushu University, Fukuoka (Japan); Ohga, Saiji; Yoshitake, Tadamasa; Nonoshita, Takeshi; Tsurumaru, Daisuke; Ohnishi, Kayoko; Asai, Kaori; Matsumoto, Keiji [Department of Clinical Radiology, Graduate School of Medical Sciences, Kyushu University, Fukuoka (Japan); Nakamura, Katsumasa [Department of Radiology, Kyushu University Hospital at Beppu, Oita (Japan); Honda, Hiroshi [Department of Clinical Radiology, Graduate School of Medical Sciences, Kyushu University, Fukuoka (Japan)

2012-04-01

Purpose: To determine clinical factors for predicting the frequency and severity of esophageal stenosis associated with tumor regression in radiotherapy for esophageal cancer. Methods and Materials: The study group consisted of 109 patients with esophageal cancer of T1-4 and Stage I-III who were treated with definitive radiotherapy and achieved a complete response of their primary lesion at Kyushu University Hospital between January 1998 and December 2007. Esophageal stenosis was evaluated using esophagographic images within 3 months after completion of radiotherapy. We investigated the correlation between esophageal stenosis after radiotherapy and each of the clinical factors with regard to tumors and therapy. For validation of the correlative factors for esophageal stenosis, an artificial neural network was used to predict the esophageal stenotic ratio. Results: Esophageal stenosis tended to be more severe and more frequent in T3-4 cases than in T1-2 cases. Esophageal stenosis in cases with full circumference involvement tended to be more severe and more frequent than in cases without full circumference involvement. Increases in wall thickness tended to be associated with increases in esophageal stenosis severity and frequency. In the multivariate analysis, T stage, extent of involved circumference, and wall thickness of the tumor region were significantly correlated with esophageal stenosis (p = 0.031, p < 0.0001, and p = 0.0011, respectively). The esophageal stenotic ratio predicted by the artificial neural network, which learned these three factors, was significantly correlated with the actual observed stenotic ratio, with a correlation coefficient of 0.864 (p < 0.001). Conclusion: Our study suggested that T stage, extent of involved circumference, and esophageal wall thickness of the tumor region are useful for predicting the frequency and severity of esophageal stenosis associated with tumor regression in radiotherapy for esophageal cancer.
10 km running performance predicted by a multiple linear regression model with allometrically adjusted variables.

Science.gov (United States)

Abad, Cesar C C; Barros, Ronaldo V; Bertuzzi, Romulo; Gagliardi, João F L; Lima-Silva, Adriano E; Lambert, Mike I; Pires, Flavio O

2016-06-01

The aim of this study was to assess the power of VO₂max, peak treadmill running velocity (PTV), and running economy (RE), unadjusted or allometrically adjusted, to predict 10 km running performance. Eighteen male endurance runners performed: 1) an incremental test to exhaustion to determine VO₂max and PTV; 2) a constant submaximal run at 12 km·h⁻¹ on an outdoor track for RE determination; and 3) a 10 km running race. Unadjusted (VO₂max, PTV, and RE) and adjusted variables (VO₂max^0.72, PTV^0.72, and RE^0.60) were investigated through independent multiple regression models to predict 10 km running race time. There were no significant correlations between 10 km running time and either the adjusted or unadjusted VO₂max. Significant correlations (p […] 0.84 and power > 0.88. The allometrically adjusted predictive model was composed of PTV^0.72 and RE^0.60 and explained 83% of the variance in 10 km running time, with a standard error of the estimate (SEE) of 1.5 min. The unadjusted model, composed of PTV alone, accounted for 72% of the variance in 10 km running time (SEE of 1.9 min). Both regression models provided powerful estimates of 10 km running time; however, the unadjusted PTV may provide an uncomplicated estimation.
Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges.

Science.gov (United States)

Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E

2017-06-14

Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors that operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for the development of risk prediction models. Although typically presented as black-box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis and are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider the problem of predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods, including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction to the diffuse field of machine learning for those working on risk modelling. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.
AucPR: an AUC-based approach using penalized regression for disease prediction with high-dimensional omics data.

Science.gov (United States)

Yu, Wenbao; Park, Taesung

2014-01-01

When multiple markers are available, it is common to seek an optimal combination of them for disease classification and prediction. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing AUC-based work in a high-dimensional context depends mainly on a nonparametric, smooth approximation of the AUC; no parametric AUC-based approach has been developed for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), a parametric method for obtaining a linear combination of markers that maximizes the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in low-dimensional settings, into a regression framework and can thus apply penalized regression directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach avoids some of the difficulties of conventional nonparametric AUC-based approaches, such as the lack of an appropriate concave objective function and the need for a prudent choice of the smoothing parameter. We apply AucPR to gene selection and classification using four real microarray datasets and synthetic data. In numerical studies, AucPR performs better than penalized logistic regression and the nonparametric AUC-based method in terms of AUC and of sensitivity at a given specificity, particularly when there are many correlated genes. AucPR is a powerful, easily implemented parametric linear classifier for gene selection and disease prediction with high-dimensional data, and it is recommended for its good prediction performance. Besides gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.
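The criterion that AucPR targets can be illustrated with the nonparametric (Mann-Whitney) estimate of the AUC for a linear combination of markers. This is a minimal sketch of the evaluation criterion only, not the authors' parametric penalized-regression estimator; the function names, marker values, and weights below are hypothetical.

```python
from itertools import product
from typing import Sequence


def linear_score(markers: Sequence[float], weights: Sequence[float]) -> float:
    """Combine several marker values into one score: w1*x1 + w2*x2 + ..."""
    return sum(w * x for w, x in zip(weights, markers))


def empirical_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Mann-Whitney AUC estimate: the fraction of (case, control) pairs in
    which the case scores higher; ties count as one half."""
    total = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical two-marker data and weights, for illustration only.
cases = [[2.0, 1.0], [1.5, 0.5], [0.5, 0.2]]
controls = [[0.4, 0.1], [0.8, 0.6]]
w = [1.0, 0.5]
auc = empirical_auc([linear_score(x, w) for x in cases],
                    [linear_score(x, w) for x in controls])
```

Here five of the six case/control pairs are ordered correctly, so `auc` is 5/6. A method such as AucPR searches over `w`; this sketch only scores a fixed `w`.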
Application of genetic algorithm - multiple linear regressions to predict the activity of RSK inhibitors

Directory of Open Access Journals (Sweden)

Avval Zhila Mohajeri

2015-01-01

This paper deals with developing a linear quantitative structure-activity relationship (QSAR) model for predicting the RSK inhibition activity of some new compounds. A dataset consisting of 62 pyrazino[1,2-α]indole, diazepino[1,2-α]indole, and imidazole derivatives with known inhibitory activities was used. The multiple linear regression (MLR) technique, combined with stepwise (SW) and genetic algorithm (GA) methods as variable selection tools, was employed. To further check the stability, robustness, and predictability of the proposed models, internal and external validation techniques were used. Comparison of the results indicates that the GA-MLR model is superior to the SW-MLR model and that it is applicable for designing novel RSK inhibitors.
RAWS II: A Multiple Regression Analysis Program

Science.gov (United States)

This memorandum gives instructions for the use and operation of a revised version of RAWS, a multiple regression analysis program. The program...of preprocessed data, the directed retention of variables, listing of the matrix of the normal equations and its inverse, and the bypassing of the regression analysis to provide the input variable statistics only. (Author)
General Nature of Multicollinearity in Multiple Regression Analysis.

Science.gov (United States)

Liu, Richard

1981-01-01

Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions of regression analysis is that the independent variables in the equation are not highly correlated. The problem of multicollinearity and some of its solutions are discussed. (Author)
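A standard diagnostic for the multicollinearity described above is the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the other predictors. The sketch below is a minimal NumPy illustration, not a method taken from the article; the data are hypothetical, and the often-quoted "VIF above about 10" warning level is a convention rather than something stated here.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    of X on an intercept plus all the other columns."""
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Hypothetical predictors: x1 and x2 are uncorrelated; x3 nearly duplicates x1.
x1 = np.array([-1.0, -1.0, 1.0, 1.0])
x2 = np.array([-1.0, 1.0, -1.0, 1.0])
x3 = np.array([-1.0, -0.9, 0.9, 1.1])
orthogonal = variance_inflation_factors(np.column_stack([x1, x2]))
collinear = variance_inflation_factors(np.column_stack([x1, x3]))
```

Uncorrelated predictors give VIFs of 1, while the near-duplicate pair (r² ≈ 0.9935) gives VIFs of about 153, meaning the sampling variance of each slope is inflated roughly 153-fold.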
Prediction of canine and premolar size using the widths of various permanent teeth combinations: A cross-sectional study

Directory of Open Access Journals (Sweden)

Kalasandhya Vanjari

2015-01-01

Aims: To identify the best predictor(s) for determining the mesio-distal widths (MDWs) of canines (C) and premolars (Ps), and to propose regression equation(s) for a hitherto unreported population. Methods: Impressions of the maxillary and mandibular arches were made for 201 children (100 boys and 101 girls; age range: 11–15 years) who met the inclusion criteria, and were poured in dental stone. The maximum MDWs of all the permanent teeth were measured using a digital vernier caliper. Thirty-three possible combinations (patterns) of permanent maxillary and mandibular first molars and central and lateral incisors were framed and correlated with the MDWs of C and Ps using the Pearson correlation test. Results: There were significant correlations between the considered patterns and the MDWs of C and Ps, with differences noted between girls (range of r: 0.34–0.66) and boys (range of r: 0.28–0.77). Simple linear and multiple regression equations for boys, girls, and the combined sample were determined to predict the MDWs of C and Ps in both arches. Conclusions: The accuracy of prediction improved considerably when as many teeth as possible were included in the regression equations. The newly proposed equations, based on erupted teeth, may be considered clinically useful for space analysis in the studied population.
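A simple linear regression equation of the kind described (predicting the combined canine and premolar width from the summed width of a group of erupted teeth) is obtained by ordinary least squares. This minimal sketch shows only the fitting step; the measurements are hypothetical and are not the study's data or equations.

```python
from typing import Sequence, Tuple


def fit_simple_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx        # slope
    a = my - b * mx      # intercept: the line passes through (mean x, mean y)
    return a, b


# Hypothetical measurements in mm: x = summed width of a group of erupted
# teeth, y = summed canine + premolar width in the same quadrant.
x = [20.0, 22.0, 24.0, 26.0]
y = [20.0, 21.0, 22.0, 23.0]
a, b = fit_simple_regression(x, y)
predicted = a + b * 23.0  # predicted width for a new child with x = 23.0 mm
```

These toy points lie exactly on y = 10 + 0.5x, so the fit returns a = 10.0 and b = 0.5, and the prediction for 23.0 mm is 21.5 mm.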
A Predictive Model for Microbial Counts on Beaches where Intertidal Sand is the Primary Source

Science.gov (United States)

Feng, Zhixuan; Reniers, Ad; Haus, Brian K.; Solo-Gabriele, Helena M.; Wang, John D.; Fleming, Lora E.

2015-01-01

Human health protection at recreational beaches requires accurate and timely information on microbiological conditions to issue advisories. The objective of this study was to develop a new numerical mass balance model for enterococci levels on nonpoint source beaches. The model's significant advantages are its easy implementation and its detailed description of the cross-shore distribution of enterococci, which is useful for beach management purposes. The performance of the balance model was evaluated by comparing its predicted exceedances of a beach advisory threshold value with field data and with a traditional regression model. Both the balance model and the regression equation predicted approximately 70% of the advisories correctly at knee depth and over 90% at waist depth. The balance model has an advantage over the regression equation in its ability to simulate spatiotemporal variations of microbial levels, and it is recommended for making more informed management decisions. PMID:25840869
A temperature rise equation for predicting environmental impact and performance of cooling ponds

Energy Technology Data Exchange (ETDEWEB)

Serag-Eldin, M.A. [American Univ. in Cairo, Cairo (Egypt). Dept. of Mechanical Engineering

2009-07-01

Cooling ponds are used to cool the condenser water used in large central air-conditioning systems. However, larger cooling loads can often increase pond surface evaporation rates. A temperature-rise energy equation was developed to predict temperature rises in cooling ponds subjected to heating loads. The equation was designed to reduce the need for detailed meteorological data and to determine the required surface area and depth of the pond for any given design criteria. Energy equations with and without cooling loads were subtracted from each other to determine the increase in pond temperature resulting from the cooling load. The energy equations include solar radiation, radiation exchange with the sky and surroundings, heat convection from the surface, evaporative cooling, heat conducted to the walls, and the rate of change of water temperature. Results of the study suggested that the environmental impact and performance of the cooling pond are a function of temperature only. It was concluded that, with the aid of the calculated flow field and temperature distribution, the method can be used to position sprays so as to produce near-uniform pond temperatures. 10 refs., 12 figs.
Prediction of width of un-erupted incisors, canines and premolars in a Ugandan population: A cross sectional study

Directory of Open Access Journals (Sweden)

Buwembo William

2012-07-01

Background: Accurate prediction of space is an important part of orthodontic assessment in the mixed dentition. However, the most commonly used methods of space analysis are based on data developed on Caucasian populations. To provide more accurate local data, we set out to develop a formula for predicting the widths of un-erupted canines and premolars in a Ugandan population and to compare the widths predicted by this formula with those obtained from Moyers' tables and Tanaka and Johnston's equations. Methods: Dental casts were prepared from mandibular and maxillary arch impressions of 220 children (85 boys/135 girls) aged 12–17 years, recruited from schools in Kampala, Uganda. The mesio-distal widths of the mandibular incisors and of the mandibular and maxillary canines and premolars were measured with digital calipers. Predictive equations were derived by regression analysis, and the findings were compared with those presented in Moyers' probability tables and Tanaka and Johnston's equations. Results: There were no statistically significant differences between the tooth widths predicted by our equations and those from Moyers' probability tables at the 65th and 75th percentile probabilities for girls and at the 75th percentile for boys in the mandibular arch. In the maxillary arch, no statistically significant differences were noted in girls at the 75th and 95th percentiles. There were statistically significant differences between tooth sizes predicted using the equations from the present study and those predicted from the Tanaka and Johnston regression equations. Conclusions: In this Ugandan population, Moyers' probability tables could be used to predict tooth widths at specific percentile probabilities, but the Tanaka and Johnston technique generally tends to overestimate tooth widths.
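For context, the Tanaka and Johnston technique used as a comparator is usually stated as follows: the combined width of the canine and two premolars in one quadrant is half the summed mesio-distal width of the four mandibular incisors plus 10.5 mm (mandibular arch) or plus 11.0 mm (maxillary arch). The sketch below implements only that published rule; the study's own Ugandan equations are not given in the abstract and are not reproduced, and the 23.0 mm input is hypothetical.

```python
def tanaka_johnston(mandibular_incisor_sum_mm: float) -> dict:
    """Predicted combined width (mm) of the un-erupted canine and two
    premolars in one quadrant, from the summed mesio-distal widths of the
    four mandibular permanent incisors."""
    half = mandibular_incisor_sum_mm / 2.0
    return {"mandibular": half + 10.5, "maxillary": half + 11.0}


pred = tanaka_johnston(23.0)  # hypothetical incisor sum in mm
```

For a 23.0 mm incisor sum this predicts 22.0 mm (mandibular) and 22.5 mm (maxillary). Because the additive constants were derived from other populations, a systematic overestimate in a different population, as reported above, would appear as a constant offset.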
Scaling model for prediction of radionuclide activity in cooling water using a regression triplet technique

International Nuclear Information System (INIS)

Silvia Dulanska; Lubomir Matel; Milan Meloun

2010-01-01

The decommissioning of the A1 nuclear power plant (NPP) at Jaslovske Bohunice (Slovakia) is a complicated set of problems that is highly demanding both technically and financially. The basic goal of the decommissioning process is the total elimination of radioactive materials from the nuclear power plant area and the treatment of radwaste into a form suitable for safe disposal. The initial conditions of decommissioning also include elimination of the operational events, preparation and transport of the fuel from the plant territory, and radiochemical and physical-chemical characterization of the radioactive wastes. One of the problems was, and still is, the processing of liquid radioactive wastes. One such medium is the cooling water of the long-term spent fuel storage. A suitable scaling model for predicting the activity of the hard-to-detect radionuclides ²³⁹,²⁴⁰Pu and ⁹⁰Sr, and of summary beta activity, in cooling water has been built using regression triplet analysis and regression diagnostics. (author)
The calculated reference value of the tubular extraction rate in infants and children. An attempt to use a new regression equation

International Nuclear Information System (INIS)

Watanabe, Nami; Sugai Yukio; Komatani, Akio; Yamaguchi, Koichi; Takahashi, Kazuei

1999-01-01

This study was designed to investigate the empirical tubular extraction rate (TER) of normal renal function in childhood and then to propose a new equation for obtaining TER theoretically. The empirical TER was calculated using Russell's method for determination of single-sample plasma clearance with ⁹⁹ᵐTc-MAG3 in 40 patients with renal disease younger than 10 years of age who were classified as having normal renal function using the diagnostic criteria defined by the Paediatric Task Group of EANM. First, we investigated the relationships of the empirical absolute TER to age, body weight, body surface area (BSA), and distribution volume. Next, we investigated the relationships of the empirical BSA-corrected TER to age, body weight, BSA, and distribution volume. A linear relationship was found between the absolute TER and each body-dimension factor; for BSA in particular, the correlation coefficient was 0.90 (p […]). The BSA-corrected TER showed a logarithmic relationship with BSA, but linear regression did not show any significant correlation. Therefore, we considered that the normal value of TER could be calculated theoretically from the body surface area, and we propose the following linear regression equation: Theoretical TER (ml/min/1.73 m²) = (-39.8 + 257.2 × BSA) / (BSA/1.73). The theoretical TER could serve as one of the reference values of renal function during the period of renal maturation. (author)
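The proposed equation combines a linear prediction of absolute TER from BSA with the usual normalization to 1.73 m² of body surface area. The sketch below assumes that the abstract's "/BSA/1.73" means division by (BSA/1.73), the reading that matches the stated units of ml/min/1.73 m². The function name is hypothetical, and the equation was derived only for children younger than 10 years.

```python
def theoretical_ter(bsa_m2: float) -> float:
    """Theoretical TER in ml/min/1.73 m^2, reading the abstract's formula as
    (-39.8 + 257.2 * BSA) / (BSA / 1.73).

    The numerator is the absolute TER (ml/min) predicted linearly from BSA;
    dividing by BSA/1.73 rescales it to a 1.73 m^2 body surface area."""
    if bsa_m2 <= 0:
        raise ValueError("BSA must be positive")
    absolute_ter = -39.8 + 257.2 * bsa_m2  # ml/min
    return absolute_ter / (bsa_m2 / 1.73)
```

For example, `theoretical_ter(1.0)` gives about 376.1 ml/min/1.73 m² and `theoretical_ter(1.73)` gives about 405.2. The negative intercept makes the normalized value rise with BSA, which is consistent with the abstract's observation that BSA-corrected TER still varies with BSA.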
Failure and reliability prediction by support vector machines regression of time series data

International Nuclear Information System (INIS)

Chagas Moura, Marcio das; Zio, Enrico; Lins, Isis Didier; Droguett, Enrique

2011-01-01

Support Vector Machines (SVMs) are kernel-based learning methods that have been successfully adopted for regression problems. However, their use in reliability applications has not been widely explored. In this paper, a comparative analysis is presented to evaluate the effectiveness of SVMs in forecasting the time-to-failure and reliability of engineered components based on time series data. The performance of SVM regression on case studies from the literature is measured against other advanced learning methods, such as the Radial Basis Function, the traditional MultiLayer Perceptron model, Box-Jenkins autoregressive integrated moving average, and Infinite Impulse Response Locally Recurrent Neural Networks. The comparison shows that, in the analyzed cases, SVM outperforms or is comparable to the other techniques. - Highlights: → Realistic modeling of reliability demands complex mathematical formulations. → SVM is suitable when the input/output relation is unknown or very costly to obtain. → Results indicate the potential of SVM for reliability time series prediction. → Reliability estimates support the establishment of adequate maintenance strategies.
Data-driven discovery of partial differential equations.

Science.gov (United States)

Rudy, Samuel H; Brunton, Steven L; Proctor, Joshua L; Kutz, J Nathan

2017-04-01

We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system from time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient and robust, and it is demonstrated to work on a variety of canonical problems spanning a number of scientific domains, including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg-de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems where first-principles derivations are intractable.
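The sparsity-promoting regression step can be illustrated with sequential thresholded ridge regression (STRidge). This is a simplified sketch under stated assumptions, not the authors' implementation: it omits column normalization and the search over the threshold `tol`, and the candidate library is random data standing in for numerically differentiated measurements. The term names in the comments are hypothetical.

```python
import numpy as np


def stridge(theta, ut, lam=1e-5, tol=0.1, max_iter=10):
    """Sequential thresholded ridge regression.

    Repeatedly solves a ridge problem on the currently active candidate
    terms, then drops terms whose coefficient magnitude falls below `tol`."""
    n_terms = theta.shape[1]
    active = np.ones(n_terms, dtype=bool)
    xi = np.zeros(n_terms)
    for _ in range(max_iter):
        xi = np.zeros(n_terms)
        if not active.any():
            break  # every term was pruned
        A = theta[:, active]
        # Ridge normal equations: (A^T A + lam*I) w = A^T u_t
        xi[active] = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ ut)
        still_active = np.abs(xi) >= tol
        if np.array_equal(still_active, active):
            break  # the selected support has stopped changing
        active = still_active
    xi[np.abs(xi) < tol] = 0.0  # final hard threshold
    return xi


# Hypothetical library columns: 1, u, u*u_x, u_x, u_xx (random stand-ins).
rng = np.random.default_rng(0)
theta = rng.normal(size=(200, 5))
true_xi = np.array([0.0, 0.0, -1.0, 0.0, 0.5])  # i.e. u_t = -u*u_x + 0.5*u_xx
ut = theta @ true_xi
xi_hat = stridge(theta, ut)
```

With noise-free data, the first ridge fit leaves the three spurious coefficients near zero, so they are pruned and the refit recovers the two true terms. With noisy, numerically differentiated data, the choice of `tol` matters far more, which is why the full method tunes it.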
EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.

Science.gov (United States)

Lian, Yao; Ge, Meng; Pan, Xian-Ming

2014-12-19

B-cell epitopes have been studied extensively because of their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task. In this work, a novel linear B-cell epitope prediction model based on the antigen's primary sequence information was developed using multiple linear regression (MLR). A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the performance of our model. To alleviate the problem caused by noise in the negative dataset, 300 experiments using 300 sub-datasets were performed. We achieved an overall sensitivity of 81.8%, a precision of 64.1%, and an area under the receiver operating characteristic curve (AUC) of 0.728. We have presented a reliable method for identifying linear B-cell epitopes using the antigen's primary sequence information. Moreover, a web server, EPMLR, has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ .
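The evaluation protocol described above (10-fold cross-validation, reported as sensitivity and precision) can be sketched generically. This is a minimal illustration under assumptions: the shuffling scheme, seed, and toy labels are hypothetical, and the EPMLR regression model itself is not reproduced.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def sensitivity_precision(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); precision = TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tp / (tp + fp)


folds = k_fold_indices(25, k=10)
# Each fold is held out once while a model is trained on the remaining folds;
# the predictions pooled over all held-out folds are then scored.
sens, prec = sensitivity_precision([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

In this toy example, two of three true epitopes are found and two of three positive calls are correct, so sensitivity and precision are both 2/3.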
Ground Motion Prediction Equations for Western Saudi Arabia from a Reference Model

Science.gov (United States)

Kiuchi, R.; Mooney, W. D.; Mori, J. J.; Zahran, H. M.; Al-Raddadi, W.; Youssef, S.

2017-12-01

Western Saudi Arabia is surrounded by several active seismic zones, such as the Red Sea and the Gulf of Aqaba, where a destructive magnitude 7.3 event occurred in 1995. Over the last decade, the Saudi Geological Survey (SGS) has deployed a dense seismic network that has made it possible to monitor seismic activity more accurately. For example, the network has detected multiple seismic swarms beneath the volcanic fields in western Saudi Arabia. The most recent damaging event was an M5.7 earthquake that occurred in 2009 at Harrat Lunayyir. In terms of seismic hazard assessment, Zahran et al. (2015; 2016) presented a Probabilistic Seismic Hazard Assessment (PSHA) for western Saudi Arabia that was developed using published Ground Motion Prediction Equations (GMPEs) from areas outside Saudi Arabia. In this study, we consider 41 earthquakes of M 3.0–5.4, recorded at 124 stations of the SGS network, to create a set of 442 peak ground acceleration (PGA) and peak ground velocity (PGV) records with epicentral distances ranging from 3 km to 400 km. We use the GMPE model BSSA14 (Boore et al., 2014) as a reference model and estimate our own best-fitting coefficients by regression analysis of the events that occurred in western Saudi Arabia. For epicentral distances less than 100 km, our best-fitting model has different source scaling compared with the BSSA14 GMPE adjusted for the California region. In addition, our model indicates that peak amplitudes attenuate less in western Saudi Arabia than in California.
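Estimating region-specific GMPE coefficients by regression can be illustrated with a deliberately simplified functional form, ln(PGA) = c0 + c1·M + c2·ln(√(R² + h²)), which is linear in the coefficients once the pseudo-depth h is fixed. This is not BSSA14, which has additional magnitude, anelastic-attenuation, and site terms. The value h = 6 km, the coefficient values, and the synthetic records are all assumptions for illustration.

```python
import numpy as np


def design_matrix(mag, r_epi_km, h_km=6.0):
    """Regressor columns for ln(Y) = c0 + c1*M + c2*ln(sqrt(R^2 + h^2))."""
    mag = np.asarray(mag, dtype=float)
    r_eff = np.sqrt(np.asarray(r_epi_km, dtype=float) ** 2 + h_km ** 2)
    return np.column_stack([np.ones_like(mag), mag, np.log(r_eff)])


def fit_gmpe(mag, r_epi_km, amplitude, h_km=6.0):
    """Ordinary least-squares estimates of (c0, c1, c2) for ln(amplitude)."""
    X = design_matrix(mag, r_epi_km, h_km)
    y = np.log(np.asarray(amplitude, dtype=float))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic, noise-free records generated from assumed coefficients.
rng = np.random.default_rng(1)
mags = rng.uniform(3.0, 5.4, size=300)
dists = rng.uniform(3.0, 400.0, size=300)
true_c = np.array([-2.0, 1.1, -1.3])
pga = np.exp(design_matrix(mags, dists) @ true_c)
c_hat = fit_gmpe(mags, dists, pga)
```

In this form, c2 controls geometric spreading: a less negative fitted c2 means amplitudes decay more slowly with distance, which is the kind of difference behind a finding of weaker attenuation than a reference region. Real records would add event and station residual terms.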
Genomic Prediction Within and Across Biparental Families: Means and Variances of Prediction Accuracy and Usefulness of Deterministic Equations

Directory of Open Access Journals (Sweden)

Pascal Schopp

2017-11-01

A major application of genomic prediction (GP) in plant breeding is the identification of superior inbred lines within families derived from biparental crosses. When models for various traits were trained within related or unrelated biparental families (BPFs), experimental studies found substantial variation in prediction accuracy (PA), but little is known about the underlying factors. We used SNP marker genotypes of inbred lines from either elite germplasm or landraces of maize (Zea mays L.) as parents to generate 300 BPFs of doubled-haploid lines in silico. We analyzed PA within each BPF for 50 simulated polygenic traits, using genomic best linear unbiased prediction (GBLUP) models trained with individuals from either full-sib (FSF), half-sib (HSF), or unrelated families (URF) for various training set sizes (N_train) and heritabilities (h²). In addition, we modified two deterministic equations for forecasting PA to account for inbreeding and for genetic variance unexplained by the training set. Averaged across traits, PA was high within FSF (0.41–0.97), with large variation only for N_train < 50 and h² < 0.6. For HSF and URF, PA was on average ∼40–60% lower and varied substantially among the combinations of BPFs used for model training and prediction, as well as among traits. As exemplified by the HSF results, PA of across-family GP can be very low if causal variants not segregating in the training set account for a sizeable proportion of the genetic variance among predicted individuals. Deterministic equations accurately forecast the PA expected over many traits, yet cannot capture trait-specific deviations. We conclude that model training within BPFs generally yields stable PA, whereas a high level of uncertainty is encountered in across-family GP. Our study shows the extent of variation in PA that must at least be reckoned with in practice and offers a starting point for the design of training sets composed of multiple BPFs.
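One widely cited deterministic equation for expected prediction accuracy is that of Daetwyler et al. (2008): r = √(N·h² / (N·h² + M_e)), where N is the training set size, h² the heritability, and M_e the effective number of independent chromosome segments. The sketch below implements only this unmodified baseline. The inbreeding and unexplained-variance adjustments described in the abstract are not included, and the input values are hypothetical.

```python
import math


def expected_accuracy(n_train: int, h2: float, m_e: float) -> float:
    """Daetwyler et al. (2008) deterministic accuracy:
    r = sqrt(N*h2 / (N*h2 + Me))."""
    if n_train <= 0 or not 0 < h2 <= 1 or m_e <= 0:
        raise ValueError("need N > 0, 0 < h2 <= 1, Me > 0")
    nh2 = n_train * h2
    return math.sqrt(nh2 / (nh2 + m_e))


r_small = expected_accuracy(100, 0.5, 50)  # hypothetical N, h2, Me
r_large = expected_accuracy(400, 0.5, 50)
```

With N = 100, h² = 0.5, and M_e = 50, the expected accuracy is √0.5 ≈ 0.71, and it rises as N or h² increases. This matches the qualitative pattern above, where large variation appeared only at small N_train and low h². Because the formula predicts an average over traits, it cannot capture trait-specific deviations.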
Human chorionic gonadotrophin regression rate as a predictive factor of postmolar gestational trophoblastic neoplasm in high-risk hydatidiform mole: a case-control study.

Science.gov (United States)

Kim, Bo Wook; Cho, Hanbyoul; Kim, Hyunki; Nam, Eun Ji; Kim, Sang Wun; Kim, Sunghoon; Kim, Young Tae; Kim, Jae-Hoon

2012-01-01

The aim of this study was the early prediction of postmolar gestational trophoblastic neoplasm (GTN) after evacuation of a high-risk mole by comparing human chorionic gonadotrophin (hCG) regression rates. Fifty patients who initially had a high-risk mole and regressed spontaneously after molar evacuation were selected from January 1, 1996 to May 31, 2010 (spontaneous regression group). Fifty patients who initially had a high-risk mole and progressed to postmolar GTN after molar evacuation were selected (postmolar GTN group). hCG regression rates, expressed as hCG/initial hCG, were compared between the two groups. The sensitivity and specificity of these rates for predicting postmolar GTN were assessed using receiver operating characteristic curves. Multivariate analyses of associations between risk factors and progression to postmolar GTN were performed. The mean hCG regression rates of the two groups were compared. hCG regression rates expressed as hCG/initial hCG (%) were 0.36% in the spontaneous regression group and 1.45% in the postmolar GTN group in the second week (p=0.003). Prediction of postmolar GTN by hCG regression rate yielded a sensitivity of 48.0% and a specificity of 89.5% at a cut-off value of 0.716%, with an area under the curve (AUC) of 0.759 in the 2nd week (p […] factor for postmolar GTN. Crown Copyright © 2011. Published by Elsevier Ireland Ltd. All rights reserved.
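The regression-rate criterion reduces to a ratio and a cut-off: compute hCG/initial hCG as a percentage, flag patients above the cut-off as predicted GTN, and compare the flags with outcomes. This is a minimal sketch; the strict ">" comparison at the 0.716% cut-off is an assumption, and the ratios and outcomes below are hypothetical rather than patient data.

```python
from typing import Sequence, Tuple


def hcg_ratio_percent(hcg_now: float, hcg_initial: float) -> float:
    """hCG regression rate expressed as hCG / initial hCG, in percent."""
    return 100.0 * hcg_now / hcg_initial


def sensitivity_specificity(ratios: Sequence[float], progressed: Sequence[bool],
                            cutoff: float = 0.716) -> Tuple[float, float]:
    """Flag 'predicted GTN' when the ratio exceeds `cutoff` (strict '>' is an
    assumption) and compare the flags with the observed outcomes."""
    tp = fn = tn = fp = 0
    for ratio, gtn in zip(ratios, progressed):
        flagged = ratio > cutoff
        if gtn:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical week-2 ratios (%) and outcomes, for illustration only.
ratios = [1.45, 0.90, 0.50, 0.36, 0.80, 0.20]
progressed = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(ratios, progressed)
```

Moving the cut-off trades sensitivity against specificity. The study's choice of 0.716% favors specificity (89.5%) over sensitivity (48.0%), and the AUC summarizes this trade-off over all possible cut-offs.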
Concordance between hypoxic challenge testing and predictive equations for hypoxic flight assessment in chronic obstructive pulmonary disease patients prior to air travel

Directory of Open Access Journals (Sweden)

Mohie Aldeen Abd Alzaher Khalifa

2016-10-01

Conclusions: The present study supports HCT as a reliable, non-invasive, and continuous method for determining the requirement for in-flight O2, with results that are relatively constant. Predictive equations considerably overestimate the need for in-flight O2 compared with the hypoxic inhalation test. Predictive equations are cheap, readily available methods of flight assessment, but this study shows poor agreement between their predictions and the individual hypoxic responses measured during HCT.

Cull sow knife-separable lean content evaluation at harvest and lean mass content prediction equation development.

Science.gov (United States)

Abell, Caitlyn E; Stalder, Kenneth J; Hendricks, Haven B; Fitzgerald, Robert F

2012-07-01

The objectives of this study were to develop a prediction equation for carcass knife-separable lean within and across USDA cull sow market weight classes (MWC) and to determine carcass and individual primal cut knife-separable lean content from cull sows. Percent lean and percent fat in the primal cuts differed significantly across USDA MWC. The two lighter USDA MWC had a greater percent carcass lean and a lower percent fat than the two heavier MWC. In general, hot carcass weight explained the majority of carcass lean variation. Additionally, backfat was a significant source of variation when predicting cull sow carcass lean. The findings support using a single lean prediction equation across MWC. Such an equation would assist processors in making cull sow purchasing decisions and in determining the mix of animals from various USDA MWC that will meet their needs when making pork products with a defined lean:fat content. Copyright © 2012 Elsevier Ltd. All rights reserved.
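A prediction equation of the kind described, with lean predicted from hot carcass weight and backfat, can be fit by ordinary least squares. The sketch below uses synthetic data generated from made-up coefficients. The units (kg, mm) and the two-predictor form are assumptions, and the published equation's terms and coefficients are not given in the abstract.

```python
import numpy as np


def fit_lean_equation(hcw_kg, backfat_mm, lean_kg):
    """Least-squares fit of lean = b0 + b1*HCW + b2*backfat; returns [b0, b1, b2]."""
    X = np.column_stack([np.ones(len(hcw_kg)), hcw_kg, backfat_mm])
    coef, *_ = np.linalg.lstsq(X, lean_kg, rcond=None)
    return coef


rng = np.random.default_rng(0)
hcw = rng.uniform(120, 260, size=200)      # hypothetical hot carcass weights (kg)
backfat = rng.uniform(10, 50, size=200)    # hypothetical backfat depths (mm)
# Synthetic "truth" with invented coefficients plus noise.
lean = 5.0 + 0.45 * hcw - 0.30 * backfat + rng.normal(0, 2.0, size=200)

b0, b1, b2 = fit_lean_equation(hcw, backfat, lean)
print(f"lean = {b0:.2f} + {b1:.3f}*HCW + {b2:.3f}*backfat")
```

Fitting one equation on pooled data across MWC, as the authors support, corresponds to fitting once on all rows rather than separately per class.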
Earthquake prediction in California using regression algorithms and cloud-based big data infrastructure

Science.gov (United States)

Asencio-Cortés, G.; Morales-Esteban, A.; Shang, X.; Martínez-Álvarez, F.

2018-06-01

Earthquake magnitude prediction is a challenging problem that has been widely studied during the last decades. Statistical, geophysical, and machine learning approaches can be found in the literature, with no particularly satisfactory results. In recent years, powerful computational techniques for analyzing big data have emerged, making the analysis of massive datasets possible. These new methods make use of physical resources such as cloud-based architectures. California is known as one of the regions with the highest seismic activity in the world, and abundant data are available. In this work, several regression algorithms combined with ensemble learning are explored in the context of big data (a 1 GB catalog is used) in order to predict earthquake magnitude within the next seven days. The Apache Spark framework, the H2O library in the R language, and Amazon cloud infrastructure were used, and very promising results are reported.
Exploring the predictive power of interaction terms in a sophisticated risk equalization model using regression trees.

Science.gov (United States)

van Veen, S H C M; van Kleef, R C; van de Ven, W P M M; van Vliet, R C J A

2018-02-01

This study explores the predictive power of interaction terms between the risk adjusters in the Dutch risk equalization (RE) model of 2014. Due to the sophistication of this RE model and the complexity of the associations in the dataset (N = ~16.7 million), there are theoretically more than a million interaction terms. We used regression tree modelling, which has rarely been applied within the field of RE, to identify interaction terms that statistically significantly explain variation in observed expenses not already explained by the risk adjusters in this RE model. The interaction terms identified were used as additional risk adjusters in the RE model. We found evidence that interaction terms can improve the prediction of expenses overall and for specific groups in the population. However, the prediction of expenses for some other selective groups may deteriorate. Thus, interactions can reduce financial incentives for risk selection for some groups but may increase them for others. Furthermore, because regression trees are not robust, additional criteria are needed to decide which interaction terms should be used in practice. These criteria could be the right incentive structure for risk selection and efficiency, or the opinion of medical experts. Copyright © 2017 John Wiley & Sons, Ltd.
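The core step of a regression tree applied to residuals can be sketched as a single split search. Given the residuals of an existing model, it finds the threshold on one variable that most reduces the residual sum of squares. A large reduction within one subgroup hints at an interaction the additive model misses. This is a simplified illustration, not the study's tree procedure. The ages and residuals are hypothetical.

```python
import numpy as np


def best_split(x, residual):
    """Return (threshold, SSE reduction) for the best single split of x on residuals."""
    order = np.argsort(x)
    xs, rs = x[order], residual[order]
    total_sse = np.sum((rs - rs.mean()) ** 2)
    best = (None, 0.0)
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot place a threshold between equal values
        left, right = rs[:i], rs[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        gain = total_sse - sse
        if gain > best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, gain)
    return best


# Hypothetical residuals for one risk-adjuster subgroup: expenses are
# under-predicted only above age 70, i.e. an age-by-subgroup interaction.
age = np.arange(20, 90, dtype=float)
residual = np.where(age > 70, 100.0, 0.0)
threshold, gain = best_split(age, residual)
print(threshold, gain)
```

A full tree repeats this search over all variables and recursively within each resulting node. The lack of robustness noted in the abstract comes from how sensitive these greedy choices are to small changes in the data.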
Equations relating compacted and uncompacted live crown ratio for common tree species in the South

Science.gov (United States)

KaDonna C. Randolph

2010-01-01

Species-specific equations to predict uncompacted crown ratio (UNCR) from compacted live crown ratio (CCR), tree length, and stem diameter were developed for 24 species and 12 genera in the southern United States. Using data from the US Forest Service Forest Inventory and Analysis program, nonlinear regression was used to model UNCR with a logistic function. Model...
Influential factors of red-light running at signalized intersection and prediction using a rare events logistic regression model.

Science.gov (United States)

Ren, Yilong; Wang, Yunpeng; Wu, Xinkai; Yu, Guizhen; Ding, Chuan

2016-10-01

Red-light running (RLR) has become a major safety concern at signalized intersections. To prevent RLR-related crashes, it is critical to identify the factors that significantly affect drivers' RLR behavior and to predict potential RLR in real time. In this research, nine months of RLR events, extracted from high-resolution traffic data collected by loop detectors at three signalized intersections, were used to identify the factors that significantly affect RLR behavior. The data analysis indicated that the following were significant factors for RLR behavior:

- occupancy time;
- time gap;
- used yellow time;
- time left to yellow start;
- whether the preceding vehicle runs through the intersection during yellow;
- whether a vehicle is passing through the intersection in the adjacent lane.

Furthermore, because RLR events are rare, a modified rare events logistic regression model was developed for RLR prediction. The rare events logistic regression method has been applied in many fields for rare events studies and shows impressive performance, but no previous research has applied it to RLR. The results showed that the rare events logistic regression model performed significantly better than the standard logistic regression model. More importantly, the proposed RLR prediction method is based purely on data from a single advance loop detector located 400 feet from the stop bar. This gives the method great potential for future field applications, since loops are already installed at many intersections and can collect data in real time. This research is expected to contribute significantly to improving intersection safety. Copyright © 2016 Elsevier Ltd. All rights reserved.
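The abstract does not describe the authors' modification. One standard ingredient of rare-events logistic regression (King and Zeng, 2001) is prior correction. When a logit is fit on a sample that over-represents events, the slopes stay consistent, but the intercept must be shifted by ln[((1 - τ)/τ)(ȳ/(1 - ȳ))]. Here τ is the population event rate and ȳ the sample event rate. The sketch shows only this correction, not King and Zeng's small-sample bias correction, and the rates used are made up.

```python
import math


def prior_corrected_intercept(b0_hat: float, tau: float, ybar: float) -> float:
    """Prior correction of a logit intercept fitted on event-enriched data.

    tau:  true population fraction of events (e.g. RLR cycles)
    ybar: fraction of events in the estimation sample
    """
    return b0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))


def event_probability(b0: float, slopes, x) -> float:
    """Logistic probability for intercept b0, slope vector and covariates x."""
    z = b0 + sum(b * xi for b, xi in zip(slopes, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical: a model fitted on a 50/50 event/non-event sample, while events
# occur in only 1% of cases in the field.
b0 = prior_corrected_intercept(b0_hat=0.0, tau=0.01, ybar=0.5)
print(b0, event_probability(b0, [], []))
```

Without the correction, the fitted model would report a 50% baseline event probability instead of the 1% that actually holds in the field.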
Principal Covariates Clusterwise Regression (PCCR): Accounting for Multicollinearity and Population Heterogeneity in Hierarchically Organized Data.

Science.gov (United States)

Wilderjans, Tom Frans; Vande Gaer, Eva; Kiers, Henk A L; Van Mechelen, Iven; Ceulemans, Eva

2017-03-01

In the behavioral sciences, many research questions pertain to a regression problem in that one wants to predict a criterion on the basis of a number of predictors. Although ordinary least squares regression will suffice in many cases, sometimes the prediction problem is more challenging, for three reasons. First, multiple highly collinear predictors can be available, making it difficult to grasp their mutual relations as well as their relations to the criterion. In that case, it may be very useful to reduce the predictors to a few summary variables, on which one regresses the criterion and which at the same time yield insight into the predictor structure. Second, the population under study may consist of a few unknown subgroups that are characterized by different regression models. Third, the obtained data are often hierarchically structured, with, for instance, observations nested within persons or participants nested within groups or countries. Although some methods have been developed that partially meet these challenges (i.e., principal covariates regression (PCovR), clusterwise regression (CR), and structural equation models), none of these methods adequately deals with all of them simultaneously. To fill this gap, we propose the principal covariates clusterwise regression (PCCR) method, which combines the key ideas behind PCovR (de Jong & Kiers in Chemom Intell Lab Syst 14(1-3):155-164, 1992) and CR (Späth in Computing 22(4):367-373, 1979). The PCCR method is validated by means of a simulation study and by applying it to cross-cultural data regarding satisfaction with life.
Regression Analysis and Calibration Recommendations for the Characterization of Balance Temperature Effects

Science.gov (United States)

Ulbrich, N.; Volden, T.

2018-01-01

Analysis and use of temperature-dependent wind tunnel strain-gage balance calibration data are discussed in the paper. First, three different methods are presented and compared that may be used to process temperature-dependent strain-gage balance data. The first method uses an extended set of independent variables in order to process the data and predict balance loads. The second method applies an extended load iteration equation during the analysis of balance calibration data. The third method uses temperature-dependent sensitivities for the data analysis. Physical interpretations of the most important temperature-dependent regression model terms are provided that relate temperature compensation imperfections and the temperature-dependent nature of the gage factor to sets of regression model terms. Finally, balance calibration recommendations are listed so that temperature-dependent calibration data can be obtained and successfully processed using the reviewed analysis methods.
TEMPERATURE PREDICTION IN 3013 CONTAINERS IN K AREA MATERIAL STORAGE (KAMS) FACILITY USING REGRESSION METHODS

International Nuclear Information System (INIS)

Gupta, N

2008-01-01

3013 containers are designed in accordance with DOE-STD-3013-2004. These containers are qualified to store plutonium (Pu) bearing materials such as PuO2 for 50 years. DOT shipping packages such as the 9975 are used to store the 3013 containers in the K-Area Material Storage (KAMS) facility at Savannah River Site (SRS). DOE-STD-3013-2004 requires a comprehensive surveillance program to ensure that the 3013 container design parameters are not violated during long-term storage. To ensure the structural integrity of the 3013 containers, thermal analyses using finite element models were performed to predict the contents and component temperatures for different but well-defined parameters such as storage ambient temperature, PuO2 density, fill heights, weights, and thermal loading. Interpolation is normally used to calculate temperatures if the actual parameter values differ from the analyzed values. A statistical analysis technique using regression methods is proposed to develop simple polynomial relations that predict temperatures for the actual parameter values found in the containers. The analysis shows that regression analysis is a powerful tool for developing simple relations to assess component temperatures.
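The idea of replacing interpolation with a fitted polynomial can be sketched in one variable. The sketch assumes a quadratic in ambient temperature only, whereas the real relations would involve several parameters (PuO2 density, fill height, thermal loading). The "finite element results" are invented values.

```python
import numpy as np

# Hypothetical analysed cases: ambient temperature (°C) vs. peak component
# temperature (°C) from finite element runs; values are invented.
ambient = np.array([20.0, 25.0, 30.0, 35.0, 40.0])
component = np.array([61.0, 65.5, 70.4, 75.7, 81.4])

coeffs = np.polyfit(ambient, component, deg=2)   # least-squares quadratic
predict = np.poly1d(coeffs)

# Temperature at an ambient value that was not analysed directly.
print(round(float(predict(32.5)), 1))
```

Compared with piecewise interpolation, a single fitted relation is easy to document and apply. It is only trustworthy inside the range of analysed parameter values.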
Comparison of stochastic and regression based methods for quantification of predictive uncertainty of model-simulated wellhead protection zones in heterogeneous aquifers

DEFF Research Database (Denmark)

Christensen, Steen; Moore, C.; Doherty, J.

2006-01-01

For a synthetic case, we computed three types of individual prediction intervals for the location of the aquifer entry point of a particle that moves through a heterogeneous aquifer and ends up in a pumping well. (a) The nonlinear regression-based interval (Cooley, 2004) was found to be nearly accurate and required a few hundred model calls to be computed. (b) The linearized regression-based interval (Cooley, 2004) required just over a hundred model calls and also appeared to be nearly correct. (c) The calibration-constrained Monte-Carlo interval (Doherty, 2003) was found to be narrower than the regression-based intervals but required about half a million model calls. It is unclear whether or not this type of prediction interval is accurate.
Support vector regression for porosity prediction in a heterogeneous reservoir: A comparative study

Science.gov (United States)

Al-Anazi, A. F.; Gates, I. D.

2010-12-01

In wells with limited log and core data, porosity, a fundamental property for characterizing reservoirs, is difficult to estimate by conventional statistical methods from offset well log and core data in heterogeneous formations. Beyond simple regression, neural networks have been used to develop more accurate porosity correlations. Unfortunately, neural network-based correlations have limited generalization ability, and global correlations for a field are usually less accurate than local correlations for a sub-region of the reservoir. In this paper, support vector machines are explored as an intelligent technique to correlate porosity with well log data. Support vector regression (SVR), based on statistical learning theory, has recently been proposed as a new intelligent technique for both prediction and classification tasks. The underlying formulation of support vector machines embodies the structural risk minimization (SRM) principle. This principle has been shown to be superior to the traditional empirical risk minimization (ERM) principle employed by conventional neural networks and classical statistical methods. The formulation uses margin-based loss functions to control model complexity independently of the dimensionality of the input space. It uses kernel functions to project the estimation problem into a higher-dimensional space, which enables the solution of more complex nonlinear problems, and optimization methods exist that find a globally optimal solution. SRM minimizes an upper bound on the expected risk using a margin-based loss function (the ε-insensitive loss function for regression), in contrast to ERM, which minimizes the error on the training data. Unlike classical learning methods, SRM, indexed by a margin-based loss function, can also control model complexity independently of dimensionality. The SRM inductive principle is designed for statistical estimation with finite data where the ERM inductive principle provides the optimal solution (the
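The ε-insensitive loss mentioned above charges nothing for residuals inside a tube of half-width ε around the prediction and grows linearly outside it. A minimal sketch follows. The porosity values and ε = 0.02 are hypothetical. In scikit-learn's `sklearn.svm.SVR`, the same tube width is set with the `epsilon` parameter.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean ε-insensitive loss: residuals within ±epsilon cost nothing."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Hypothetical core porosities vs. log-based predictions (fractions).
porosity_core = [0.20, 0.15, 0.25]
porosity_pred = [0.21, 0.15, 0.10]
print(epsilon_insensitive_loss(porosity_core, porosity_pred, epsilon=0.02))
```

Only the third sample, which misses by 0.15, contributes to the loss. Small misfits are ignored, which is part of how SVR avoids fitting noise.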
Equations based on anthropometry to predict body fat measured by absorptiometry in schoolchildren and adolescents.

Science.gov (United States)

Ortiz-Hernández, Luis; Vega López, A Valeria; Ramos-Ibáñez, Norma; Cázares Lara, L Joana; Medina Gómez, R Joab; Pérez-Salgado, Diana

To develop and validate equations to estimate the percentage of body fat of children and adolescents from Mexico using anthropometric measurements. A cross-sectional study was carried out with 601 children and adolescents from Mexico aged 5-19 years. The participants were randomly divided into the following two groups: the development sample (n=398) and the validation sample (n=203). The validity of previously published equations (e.g., Slaughter) was also assessed. The percentage of body fat was estimated by dual-energy X-ray absorptiometry. The anthropometric measurements included height, sitting height, weight, waist and arm circumferences, skinfolds (triceps, biceps, subscapular, supra-iliac, and calf), and elbow and bitrochanteric breadth. Linear regression models were estimated with the percentage of body fat as the dependent variable and the anthropometric measurements as the independent variables. Equations were created based on combinations of six to nine anthropometric variables and had coefficients of determination (r²) equal to or higher than 92.4% for boys and 85.8% for girls. In the validation sample, the developed equations had high r² values (≥85.6% in boys and ≥78.1% in girls) in all age groups, low standard errors (SE≤3.05% in boys and ≤3.52% in girls), and intercepts that were not different from the origin (p>0.050). With the previously published equations, the coefficients of determination were lower and/or the intercepts were different from the origin. The equations developed in this study can be used to assess the percentage of body fat of Mexican schoolchildren and adolescents, as they demonstrate greater validity and lower error than previously published equations. Copyright © 2017 Sociedade Brasileira de Pediatria. Published by Elsevier Editora Ltda. All rights reserved.
Predicting number of hospitalization days based on health insurance claims data using bagged regression trees.

Science.gov (United States)

Xie, Yang; Schreier, Günter; Chang, David C W; Neubauer, Sandra; Redmond, Stephen J; Lovell, Nigel H

2014-01-01

Healthcare administrators worldwide are striving to lower the cost of care while improving the quality of care given. Better clinical and administrative decision making is therefore needed to address these issues. Anticipating outcomes such as the number of hospitalization days could help address this problem. In this paper, a method was developed that uses large-scale health insurance claims data to predict the number of hospitalization days in a population. We used a regression decision tree algorithm, together with insurance claims data from 300,000 individuals over three years, to predict the number of days in hospital in the third year from medical admissions and claims data from the first two years. Our method performs well in the general population. For the population aged 65 years and over, the predictive model significantly improves predictions over a baseline method (predicting a constant number of days for each patient). It achieved a specificity of 70.20% and a sensitivity of 75.69% in classifying these subjects into two categories: 'no hospitalization' and 'at least one day in hospital'.
An Extrapolation of a Radical Equation More Accurately Predicts Shelf Life of Frozen Biological Matrices.

Science.gov (United States)

De Vore, Karl W; Fatahi, Nadia M; Sass, John E

2016-08-01

Arrhenius modeling of analyte recovery at increased temperatures, used to predict the long-term stability at colder storage of biological raw materials, reagents, calibrators, and controls, is standard practice in the diagnostics industry. Predicting subzero temperature stability using the same practice is frequently criticized but nevertheless heavily relied upon. We compared the ability to predict analyte recovery during frozen storage using 3 separate strategies: traditional accelerated studies with Arrhenius modeling, and extrapolation of recovery at 20% of shelf life using either ordinary least squares or a radical equation y = B1·x^0.5 + B0. Computer simulations were performed to establish equivalence of statistical power to discern the expected changes during frozen storage or accelerated stress. This was followed by actual predictive and follow-up confirmatory testing of 12 chemistry and immunoassay analytes. Linear extrapolations tended to be the most conservative in the predicted percent recovery, reducing customer and patient risk. However, the majority of analytes followed a rate of change that slowed over time, which was best fit by a radical equation of the form y = B1·x^0.5 + B0. Other evidence strongly suggested that the slowing of the rate was due not to higher-order kinetics but to changes in the matrix during storage. Predicting the shelf life of frozen products through extrapolation of early real-time storage analyte recovery should be considered the most accurate method. In this study the time required for a prediction was longer than for a typical accelerated testing protocol. However, real-time extrapolation has fewer potential sources of error, lower costs, and a lower expenditure of resources. © 2016 American Association for Clinical Chemistry.
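Because y = B1·x^0.5 + B0 is linear in √x, it can be fit by ordinary least squares on the transformed time axis. The sketch compares it with a straight-line extrapolation. The recovery data, the 5-month test window, and the 24-month shelf life are hypothetical, and the data are generated to follow a slowing rate.

```python
import numpy as np


def fit_radical(t, recovery):
    """Least-squares fit of y = B1*sqrt(t) + B0; returns (B1, B0)."""
    X = np.column_stack([np.sqrt(t), np.ones_like(t)])
    (b1, b0), *_ = np.linalg.lstsq(X, recovery, rcond=None)
    return b1, b0


def fit_linear(t, recovery):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(t, recovery, deg=1)
    return slope, intercept


# Hypothetical: 24-month claimed shelf life, real-time data up to ~20% (5 months).
months = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
recovery = 100.0 - 2.0 * np.sqrt(months)   # made-up % recovery with a slowing rate

b1, b0 = fit_radical(months, recovery)
a1, a0 = fit_linear(months, recovery)
radical_24 = b1 * np.sqrt(24.0) + b0
linear_24 = a1 * 24.0 + a0
print(f"radical at 24 months: {radical_24:.1f}%, linear: {linear_24:.1f}%")
```

When the true rate slows over time, the straight line extrapolates a lower recovery than the radical curve. This is the conservative behaviour of linear extrapolation that the abstract notes.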
Determining Balıkesir’s Energy Potential Using a Regression Analysis Computer Program

Directory of Open Access Journals (Sweden)

Bedri Yüksel

2014-01-01

Solar power and wind energy are used concurrently during specific periods, while at other times only the more efficient is used, and hybrid systems make this possible. When establishing a hybrid system, the extent to which these two energy sources support each other needs to be taken into account. This paper is a study of the effects of wind speed, insolation levels, and the meteorological parameters of temperature and humidity on the energy potential in Balıkesir, in the Marmara region of Turkey. The relationship between the parameters was studied using a multiple linear regression method. Using a designed-for-purpose computer program, two different regression equations were derived, with wind speed being the dependent variable in the first and insolation levels in the second. The regression equations yielded accurate results. The computer program allowed for the rapid calculation of different acceptance rates. The results of the statistical analysis proved the reliability of the equations. An estimate of identified meteorological parameters and unknown parameters could be produced with a specified precision by using the regression analysis method. The regression equations also worked for the evaluation of energy potential.
Basal Metabolic Rate of Adolescent Modern Pentathlon Athletes: Agreement between Indirect Calorimetry and Predictive Equations and the Correlation with Body Parameters.

Directory of Open Access Journals (Sweden)

Luiz Lannes Loureiro

Accurate estimation of energy needs is crucial for optimal physical performance among athletes. Basal metabolic rate (BMR) equations are often not well adjusted for adolescent athletes, which requires specific methods such as the gold-standard indirect calorimetry (IC). Therefore, we aimed to analyse the agreement between the BMR of adolescent pentathletes measured by IC and that estimated by commonly used predictive equations. Twenty-eight athletes (17 males and 11 females) were evaluated for BMR using IC and the predictive equations of Harris and Benedict (HB), Cunningham (CUN), Henry and Rees (HR), and FAO/WHO/UNU (FAO). Body composition was obtained using DXA, and sexual maturity data were retrieved through validated questionnaires. The correlations among anthropometric variables and IC were analysed by Student's t-test and ICC. The agreement between IC and the predictive equations was analysed according to Bland and Altman and by survival-agreement plotting. The whole-sample average BMR measured by IC was significantly different from that estimated by FAO (p<0.05). After adjusting data by gender, the FAO and HR equations were statistically different from IC among males (p<0.05), while among females only the HR equation differed (p<0.05). The FAO equation underestimated athletes' BMR when compared with IC (t-test). When compared with the gold-standard IC using Bland and Altman, ICC, and survival-agreement analyses, the equations underestimated the energy needs of adolescent pentathlon athletes by up to 300 kcal/day. Therefore, they should be used with caution when estimating individual energy requirements in such populations.
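The Bland and Altman agreement analysis mentioned above reduces to the mean difference (bias) between methods and its 95% limits of agreement, bias ± 1.96·SD. A minimal sketch follows, assuming differences are taken as predicted minus measured, so a negative bias means the equation underestimates. The BMR values are hypothetical.

```python
import statistics


def bland_altman(measured, predicted):
    """Return (bias, lower limit, upper limit) for differences predicted - measured."""
    diffs = [p - m for m, p in zip(measured, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical BMR values (kcal/day): IC measurements vs. one predictive equation.
ic = [1800.0, 1650.0, 1900.0, 1550.0]
equation = [1600.0, 1500.0, 1700.0, 1450.0]
bias, lower, upper = bland_altman(ic, equation)
print(f"bias {bias:.1f} kcal/day, limits of agreement {lower:.1f} to {upper:.1f}")
```

In the full method, each difference is also plotted against the mean of the two methods, to check whether the disagreement grows with BMR.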
Stress Regression Analysis of Asphalt Concrete Deck Pavement Based on Orthogonal Experimental Design and Interlayer Contact

Science.gov (United States)

Wang, Xuntao; Feng, Jianhu; Wang, Hu; Hong, Shidi; Zheng, Supei

2018-03-01

A three-dimensional finite element model of a box girder bridge and its asphalt concrete deck pavement was built in ANSYS, and the interlayer bonding of the asphalt concrete deck pavement was modeled as a contact bonding condition. An orthogonal experimental design was used to arrange the test plans for the material parameters. The effect of the different material parameters on the mechanical response of the asphalt concrete surface layer was then evaluated with a multiple linear regression model, using the results of the finite element analysis. The results indicated that the stress regression equations predict the stress in the asphalt concrete surface layer well. The elastic modulus of the waterproof layer has a significant influence on the stress values of the asphalt concrete surface layer.
Wheat flour dough Alveograph characteristics predicted by Mixolab regression models.

Science.gov (United States)

Codină, Georgiana Gabriela; Mironeasa, Silvia; Mironeasa, Costel; Popa, Ciprian N; Tamba-Berehoiu, Radiana

2012-02-01

In Romania, the Alveograph is the most widely used device for evaluating the rheological properties of wheat flour dough, but lately the Mixolab device has begun to play an important role in the breadmaking industry. These two instruments are based on different principles, but some correlations can be found between the parameters determined by the Mixolab and the rheological properties of wheat dough measured with the Alveograph. Statistical analysis of 80 wheat flour samples used the backward stepwise multiple regression method. It showed that Mixolab values obtained with the ‘Chopin S’ protocol (40 samples) and the ‘Chopin+’ protocol (40 samples) can be used to build predictive models for three rheological properties of wheat dough: baking strength (W), dough tenacity (P), and extensibility (L). The correlation analysis confirmed significant findings (P […] 0.70 for P, R²(adjusted) > 0.70 for W and R²(adjusted) > 0.38 for L, at the 95% confidence level. Copyright © 2011 Society of Chemical Industry.
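The models above are reported with adjusted R². This statistic penalizes R² for the number of predictors retained, which matters when backward stepwise selection decides how many Mixolab variables to keep. The sketch uses the standard formula 1 − (1 − R²)(n − 1)/(n − k − 1). The n = 40 matches one protocol's sample count, while k = 3 and R² = 0.75 are hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R² for n observations and k predictors (intercept not counted)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical model: 40 samples, 3 Mixolab predictors, plain R² of 0.75.
print(round(adjusted_r2(0.75, n=40, k=3), 4))
```

Dropping a predictor that adds little to R² can raise the adjusted value. This is why the adjusted statistic suits backward elimination.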
Method for calculating the variance and prediction intervals for biomass estimates obtained from allometric equations

CSIR Research Space (South Africa)

Kirton, A

2010-08-01

A. Kirton, B. Scholes, S. Archibald (CSIR Ecosystem Processes and Dynamics, Natural Resources and the Environment, P.O. Box 395, Pretoria, 0001, South Africa). […] intervals (confidence intervals for predicted values) for allometric estimates can be obtained, using an example of estimating tree biomass from stem diameter. It explains how to deal with relationships in power function form, a common form...
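The snippet does not show the report's exact procedure. A common approach for a power-function allometry B = a·D^b is to fit a straight line to ln B versus ln D. A prediction interval is then built on the log scale and back-transformed. The sketch below makes two assumptions: it uses a normal quantile (z = 1.96) as a large-sample stand-in for the t quantile with n − 2 degrees of freedom, and its diameters and biomasses are invented.

```python
import math
import statistics


def allometric_prediction_interval(diam, biomass, d0, z=1.96):
    """Fit ln(B) = a + b*ln(D); return (estimate, lower, upper) for biomass at D = d0.

    z is a normal quantile standing in for the t quantile (large-sample
    approximation); with few trees a t quantile gives wider bounds.
    """
    x = [math.log(d) for d in diam]
    y = [math.log(b) for b in biomass]
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    s2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    x0 = math.log(d0)
    # Variance of a new observation at x0 on the log scale.
    se_pred = math.sqrt(s2 * (1 + 1 / n + (x0 - xbar) ** 2 / sxx))
    mu = a + b * x0
    # Back-transforming the log-scale bounds gives an asymmetric interval.
    return math.exp(mu), math.exp(mu - z * se_pred), math.exp(mu + z * se_pred)


# Hypothetical stem diameters (cm) and biomasses (kg).
diam = [5.0, 10.0, 15.0, 20.0, 30.0, 40.0]
biomass = [5.1, 23.0, 70.2, 125.0, 372.0, 680.0]
est, lo, hi = allometric_prediction_interval(diam, biomass, d0=25.0)
print(f"{est:.1f} kg (95% PI {lo:.1f} to {hi:.1f})")
```

Simply exponentiating the fitted log value estimates the median rather than the mean biomass. A correction factor such as exp(s²/2) is often applied when mean totals are needed.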
Accounting for estimated IQ in neuropsychological test performance with regression-based techniques.

Science.gov (United States)

Testa, S Marc; Winicki, Jessica M; Pearlson, Godfrey D; Gordon, Barry; Schretlen, David J

2009-11-01

Regression-based normative techniques account for variability in test performance associated with multiple predictor variables and generate expected scores based on algebraic equations. Using this approach, we show that estimated IQ, based on oral word reading, accounts for 1-9% of the variability beyond that explained by individual differences in age, sex, race, and years of education for most cognitive measures. These results confirm that adding estimated "premorbid" IQ to demographic predictors in multiple regression models can incrementally improve the accuracy with which regression-based norms (RBNs) benchmark expected neuropsychological test performance in healthy adults. It remains to be seen whether the incremental variance in test performance explained by estimated "premorbid" IQ translates to improved diagnostic accuracy in patient samples. We describe these methods, and illustrate the step-by-step application of RBNs with two cases. We also discuss the rationale, assumptions, and caveats of this approach. More broadly, we note that adjusting test scores for age and other characteristics might actually decrease the accuracy with which test performance predicts absolute criteria, such as the ability to drive or live independently.
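Applying a regression-based norm amounts to computing an expected score from the normative equation and expressing the observed score as a z-score scaled by the equation's standard error of estimate (SEE). The intercept, coefficients, SEE, and predictors below (age, years of education, estimated IQ) are invented for illustration and are not the study's equations.

```python
def regression_based_z(observed, intercept, coefs, predictors, see):
    """z = (observed - expected) / SEE, with expected from a normative regression."""
    expected = intercept + sum(c * x for c, x in zip(coefs, predictors))
    return (observed - expected) / see


# Hypothetical normative equation: score = 20 - 0.25*age + 0.9*education + 0.2*estIQ
coefs = [-0.25, 0.9, 0.2]
person = [60, 12, 100]           # age, years of education, estimated IQ
z = regression_based_z(observed=30.0, intercept=20.0, coefs=coefs,
                       predictors=person, see=4.0)
print(round(z, 2))
```

Dropping estimated IQ from the predictor list would change the expected score and hence z. This is the incremental adjustment the abstract evaluates.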
Prediction of scour caused by 2D horizontal jets using soft computing techniques

Directory of Open Access Journals (Sweden)

Masoud Karbasi

2017-12-01

This paper presents the application of five soft-computing techniques to predict the maximum scour hole depth downstream of a sluice gate: artificial neural networks (ANN), support vector regression, gene expression programming (GEP), the group method of data handling (GMDH) neural network, and the adaptive-network-based fuzzy inference system. The input parameters affecting the scour depth are the sediment size and its gradation, apron length, sluice gate opening, jet Froude number, and tail water depth. Six non-dimensional parameters were derived to define a functional relationship between the input and output variables. Published data from experimental studies were used. The results of the soft-computing techniques were compared with those of empirical and regression-based equations, and the soft-computing results were superior. Comparison of the soft-computing techniques showed that the accuracy of the ANN model is higher than that of the other models (RMSE = 0.869). A new GEP-based equation was proposed.

Prediction of coal response to froth flotation based on coal analysis using regression and artificial neural network

Energy Technology Data Exchange (ETDEWEB)

Jorjani, E.; Poorali, H.A.; Sam, A.; Chelgani, S.C.; Mesroghli, S.; Shayestehfar, M.R. [Islam Azad University, Tehran (Iran). Dept. of Mining Engineering]

2009-10-15

In this paper, the combustible value (i.e. 100 - ash) and combustible recovery of coal flotation concentrate were predicted by regression and an artificial neural network based on proximate and group maceral analyses. The regression method shows that two input sets relate to combustible value with correlation coefficients (R²) of 0.8 and 0.79, respectively: (a) ln(ash), volatile matter, and moisture; and (b) ln(ash), ln(liptinite), fusinite, and vitrinite. In addition, the input sets of (c) ash, volatile matter, and moisture and (d) ash, liptinite, and fusinite can predict the combustible recovery with correlation coefficients of 0.84 and 0.63, respectively. A feed-forward artificial neural network with a 6-8-12-11-2-1 arrangement for the moisture, ash, and volatile matter input set was able to estimate both combustible value and combustible recovery with a correlation of 0.95. It was shown that the proposed neural network model could accurately reproduce all the effects of proximate and group maceral analyses on the coal flotation system.
Hourly predictive Levenberg-Marquardt ANN and multi linear regression models for predicting of dew point temperature

Science.gov (United States)

Zounemat-Kermani, Mohammad

2012-08-01

In this study, the ability of two models, multiple linear regression (MLR) and a Levenberg-Marquardt (LM) feed-forward neural network, to estimate the hourly dew point temperature was examined. Dew point temperature is the temperature at which water vapor in the air condenses into liquid. This temperature can be useful in estimating meteorological variables such as fog, rain, snow, dew, and evapotranspiration, and in investigating agronomic issues such as stomatal closure in plants. The modeling was motivated by the availability of hourly records of climatic data (air temperature, relative humidity, and pressure) that could be used to predict dew point temperature. Additionally, the wind vector (wind speed magnitude and direction) and a conceptual input for weather condition were employed as further input variables. Three standard quantitative statistical performance measures were employed to evaluate the developed models: the root mean squared error, the mean absolute error, and the absolute logarithmic Nash-Sutcliffe efficiency coefficient (|Log(NS)|). The results showed that using the wind vector and weather condition as inputs along with the meteorological variables could slightly increase the predictive accuracy of the ANN and MLR models. The results also revealed that LM-NN was superior to the MLR model. The best performance under all evaluation criteria was obtained when all potential input variables were considered.
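The three evaluation measures named above can be sketched directly. The Nash-Sutcliffe efficiency is NS = 1 − Σ(o − p)² / Σ(o − ō)². The abstract writes |Log(NS)| without stating the base, so base 10 is an assumption here. The observed and predicted dew points are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nash_sutcliffe(obs, pred):
    """NS = 1 - SSE / (sum of squared deviations of obs from their mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - sse / sst


def abs_log_ns(obs, pred):
    """|log10(NS)| (base assumed): 0 for a perfect model, undefined for NS <= 0."""
    ns = nash_sutcliffe(obs, pred)
    if ns <= 0:
        raise ValueError("|Log(NS)| is undefined for NS <= 0")
    return abs(math.log10(ns))


# Hypothetical hourly dew points (°C): observed vs. model output.
obs = [10.0, 12.0, 14.0, 16.0]
pred = [11.0, 12.0, 13.0, 16.0]
print(rmse(obs, pred), mae(obs, pred), nash_sutcliffe(obs, pred), abs_log_ns(obs, pred))
```

Lower values are better for all three measures as used in the study. For |Log(NS)| this holds because NS approaches 1 for a perfect model.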
Prediction of apparent metabolisable energy content of cereal grains and by-products for poultry from its chemical composition

Energy Technology Data Exchange (ETDEWEB)

Losada, B.; Blas, C. de; Garcia-Rebollar, P.; Cachaldora, P.; Mendez, J.; Ibañez, M.

2015-07-01

To predict the metabolisable energy content of ninety batches of cereal grains and cereal by-products for poultry, regression models derived from different sample aggregations and using chemical components as independent variables were compared. Several statistics were calculated to estimate the prediction error. The results indicate that the highest levels of significance and coefficients of determination were obtained for equations derived from the larger data sets. However, the lowest prediction errors were associated with equations calculated for data or groups of data closer to the ingredient studied. (Author)
Predicting hyperketonemia by logistic and linear regression using test-day milk and performance variables in early-lactation Holstein and Jersey cows.

Science.gov (United States)

Chandler, T L; Pralle, R S; Dórea, J R R; Poock, S E; Oetzel, G R; Fourdraine, R H; White, H M

2018-03-01

Although cowside testing strategies for diagnosing hyperketonemia (HYK) are available, many are labor intensive and costly, and some lack sufficient accuracy. Predicting milk ketone bodies by Fourier transform infrared spectrometry during routine milk sampling may offer a more practical monitoring strategy. The objectives of this study were to (1) develop linear and logistic regression models using all available test-day milk and performance variables for predicting HYK and (2) compare prediction methods (Fourier transform infrared milk ketone bodies, linear regression models, and logistic regression models) to determine which is the most predictive of HYK. Given the data available, a secondary objective was to evaluate differences in test-day milk and performance variables (continuous measurements) between Holsteins and Jerseys and between cows with or without HYK within breed. Blood samples were collected on the same day as milk sampling from 658 Holstein and 468 Jersey cows between 5 and 20 d in milk (DIM). HYK was defined as a serum β-hydroxybutyrate (BHB) concentration ≥1.2 mmol/L. Concentrations of milk BHB and acetone were predicted by Fourier transform infrared spectrometry (Foss Analytical, Hillerød, Denmark). Thresholds of milk BHB and acetone were tested for diagnostic accuracy, and logistic models were built from continuous variables to predict HYK in primiparous and multiparous cows within breed. Linear models were constructed from continuous variables for primiparous and multiparous cows within breed that were 5 to 11 DIM or 12 to 20 DIM. Milk ketone body thresholds diagnosed HYK with 64.0 to 92.9% accuracy in Holsteins and 59.1 to 86.6% accuracy in Jerseys. Logistic models predicted HYK with 82.6 to 97.3% accuracy. Internally cross-validated multiple linear regression models diagnosed HYK in Holstein cows with 97.8% accuracy for primiparous and 83.3% accuracy for multiparous cows. Accuracy of Jersey models was 81.3% in primiparous and 83
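Threshold testing of the kind described above reduces to a 2×2 table against the serum reference (HYK when serum BHB ≥ 1.2 mmol/L). The sketch below shows how sensitivity, specificity and accuracy of one milk BHB threshold could be computed; the function name, the milk threshold and the sample values are hypothetical and not taken from the study.

```python
def threshold_performance(serum_bhb, milk_bhb, milk_threshold, serum_cutoff=1.2):
    """Diagnostic performance of a milk BHB threshold against serum-defined HYK.

    Reference positive: serum BHB >= serum_cutoff (mmol/L).
    Test positive: milk BHB >= milk_threshold (mmol/L).
    """
    tp = fp = tn = fn = 0
    for serum, milk in zip(serum_bhb, milk_bhb):
        has_hyk = serum >= serum_cutoff
        test_positive = milk >= milk_threshold
        if has_hyk and test_positive:
            tp += 1
        elif has_hyk:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical cows: serum and milk BHB in mmol/L, with an illustrative milk threshold.
print(threshold_performance([0.8, 1.5, 2.0, 0.9, 1.3], [0.05, 0.20, 0.15, 0.12, 0.08], 0.10))
```

Sweeping `milk_threshold` over a range of values is one way to reproduce the kind of accuracy range the abstract reports for different thresholds.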
Prediction of Five Softwood Paper Properties from its Density using Support Vector Machine Regression Techniques

Directory of Open Access Journals (Sweden)

Esperanza García-Gonzalo

2016-01-01

Predicting paper properties from a limited number of measured variables can be an important tool for the industry. Mathematical models were developed to predict mechanical and optical properties from the corresponding paper density for some softwood papers using support vector machine regression with the radial basis function (RBF) kernel. A dataset of different properties of paper handsheets produced from pulps of pine (Pinus pinaster and P. sylvestris) and cypress species (Cupressus lusitanica, C. sempervirens, and C. arizonica) beaten at 1000, 4000, and 7000 revolutions was used. The results show that it is possible to obtain good models (with a high coefficient of determination) with two variables: the numerical variable density and the categorical variable species.
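Because the models combine a numeric variable (density) with a categorical one (species), a common approach, assumed here rather than confirmed by the abstract, is to one-hot encode the species and feed the combined vector to the RBF kernel k(x, z) = exp(−γ‖x − z‖²). The sketch shows only that encoding and kernel; `gamma` and the density units are illustrative. A full SVR would pass such features to a solver such as scikit-learn's `SVR(kernel="rbf")`.

```python
import math

# Species from the dataset described above; the one-hot order is arbitrary.
SPECIES = [
    "Pinus pinaster",
    "Pinus sylvestris",
    "Cupressus lusitanica",
    "Cupressus sempervirens",
    "Cupressus arizonica",
]


def encode(density, species):
    """Feature vector: numeric density followed by a one-hot code for species."""
    return [density] + [1.0 if s == species else 0.0 for s in SPECIES]


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Same (hypothetical) density, different species: the one-hot part alone adds 2 to ||x - z||^2.
x = encode(0.60, "Pinus pinaster")
z = encode(0.60, "Pinus sylvestris")
print(rbf_kernel(x, x), rbf_kernel(x, z))
```

In practice density would be standardized before use so that its scale is comparable to the 0/1 species indicators.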
Predicting Basal Metabolic Rate in Men with Motor Complete Spinal Cord Injury.

Science.gov (United States)

Nightingale, Tom E; Gorgey, Ashraf S

2018-01-08

To assess the accuracy of existing basal metabolic rate (BMR) prediction equations in men with chronic (>1 year) spinal cord injury (SCI). The primary aim was to develop new SCI population-specific BMR prediction models based on anthropometric, body composition and/or demographic variables that are strongly associated with BMR. Thirty men with chronic SCI (paraplegic, n = 21; tetraplegic, n = 9), aged 35 ± 11 years (mean ± SD), participated in this cross-sectional study. Criterion BMR values were measured by indirect calorimetry. Body composition (dual energy X-ray absorptiometry; DXA) and anthropometric measurements (circumferences and diameters) were also taken. Multiple linear regression analysis was performed to develop new SCI-specific BMR prediction models. Criterion BMR values were compared with values estimated from six existing and four newly developed prediction equations. Results: Existing equations that use information on stature, weight and/or age differed significantly (P …) from criterion BMR, by a mean of 14-17% (187-234 kcal/day). Equations that utilised fat-free mass (FFM) accurately predicted BMR. The development of new SCI-specific prediction models demonstrated that adding anthropometric variables (weight, height and calf circumference) to FFM (Model 3; r² = 0.77) explained 8% more of the variance in BMR than FFM alone (Model 1; r² = 0.69). Using anthropometric variables without FFM explained less of the variance in BMR (Model 4; r² = 0.57). However, all the developed prediction models demonstrated an acceptable mean absolute error of ≤ 6%. BMR can be estimated more accurately when DXA-derived FFM is incorporated into prediction equations. Utilising anthropometric measurements provides a promising alternative for improving the prediction of BMR beyond that achieved by existing equations in persons with SCI.
There is No Quantum Regression Theorem

International Nuclear Information System (INIS)

Ford, G.W.; O'Connell, R.F.

1996-01-01

The Onsager regression hypothesis states that the regression of fluctuations is governed by macroscopic equations describing the approach to equilibrium. It is here asserted that this hypothesis fails in the quantum case. This is shown first by explicit calculation for the example of quantum Brownian motion of an oscillator and then in general from the fluctuation-dissipation theorem. It is asserted that the correct generalization of the Onsager hypothesis is the fluctuation-dissipation theorem. copyright 1996 The American Physical Society
Prediction of Lunar Reconnaissance Orbiter Reaction Wheel Assembly Angular Momentum Using Regression Analysis

Science.gov (United States)

DeHart, Russell

2017-01-01

This study determines the feasibility of creating a tool that can accurately predict Lunar Reconnaissance Orbiter (LRO) reaction wheel assembly (RWA) angular momentum weeks or even months into the future. LRO is a three-axis stabilized spacecraft that was launched on June 18, 2009. While typically nadir-pointing, LRO conducts many types of slews to enable novel science collection. Momentum unloads have historically been performed approximately once every two weeks with the goal of keeping system total angular momentum below 70 Nms; however, flight experience shows that the models developed before launch are overly conservative, with many momentum unloads being performed before system angular momentum surpasses 50 Nms. A more accurate model of RWA angular momentum growth would improve momentum unload scheduling and decrease the frequency of these unloads. Because some LRO instruments must be deactivated during momentum unloads (and, in the case of one instrument, decontaminated for 24 hours thereafter), a decrease in the frequency of unloads increases science collection. This study develops a new model to predict LRO RWA angular momentum. Regression analysis of data from October 2014 to October 2015 was used to develop relationships between solar beta angle, slew specifications, and RWA angular momentum growth. The resulting model predicts RWA angular momentum using input solar beta angle and mission schedule data. This model was used to predict RWA angular momentum from October 2013 to October 2014. Predictions agree well with telemetry; for the 23 momentum unloads performed from October 2013 to October 2014, the mean and median magnitudes of the RWA total angular momentum prediction error at the time of the momentum unloads were 3.7 and 2.7 Nms, respectively. The magnitude of the largest RWA total angular momentum prediction error was 10.6 Nms. Development of a tool that uses the models presented herein is currently underway.
Estimating Loess Plateau Average Annual Precipitation with Multiple Linear Regression Kriging and Geographically Weighted Regression Kriging

Directory of Open Access Journals (Sweden)

Qiutong Jin

2016-06-01

Estimating the spatial distribution of precipitation is an important and challenging task in hydrology, climatology, ecology, and environmental science. To generate a highly accurate distribution map of average annual precipitation for the Loess Plateau in China, multiple linear regression Kriging (MLRK) and geographically weighted regression Kriging (GWRK) methods were employed using precipitation data for 1980–2010 from 435 meteorological stations. The predictors in regression Kriging were selected by stepwise regression analysis from many auxiliary environmental factors, such as elevation (DEM), normalized difference vegetation index (NDVI), solar radiation, slope, and aspect. All predictor distribution maps had a 500 m spatial resolution. Validation precipitation data from 130 hydrometeorological stations were used to assess the prediction accuracies of the MLRK and GWRK approaches. Results showed that both 500 m prediction maps interpolated by MLRK and GWRK had high accuracy and captured detailed spatial distribution patterns; however, MLRK produced a lower prediction error and a higher explained variance than GWRK, although the differences were small, in contrast to conclusions from similar studies.
Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments

Directory of Open Access Journals (Sweden)

Marjan Čeh

2018-05-01

The goal of this study is to analyse the predictive performance of the random forest machine learning technique in comparison with commonly used hedonic models based on multiple regression for the prediction of apartment prices. A data set of 7407 apartment transaction records from real estate sales in 2008–2013 in the city of Ljubljana, the capital of Slovenia, was used to test and compare the predictive performance of both models. Apparent challenges faced during modelling included (1) the non-linear nature of the prediction task; (2) input data based on transactions occurring over a period of great price changes in Ljubljana, with a 28% decline noted over six consecutive testing years; and (3) the complex urban form of the case study area. Available explanatory variables, organised as a Geographic Information Systems (GIS) ready dataset and including the structural and age characteristics of the apartments as well as environmental and neighbourhood information, were considered in the modelling procedure. All performance measures (R² values, sales ratios, mean absolute percentage error (MAPE), and coefficient of dispersion (COD)) revealed significantly better results for predictions obtained by the random forest method, which confirms the potential of this machine learning technique for apartment price prediction.
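Sales ratios, MAPE and COD, three of the measures listed above, can be sketched as follows. The definitions are the conventional ones used in mass appraisal and are assumed here, since the abstract does not spell them out: the sales ratio is predicted value divided by sale price, and COD is the mean absolute deviation of the ratios from their median, expressed as a percentage of that median. The prices are hypothetical.

```python
from statistics import median


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def sales_ratios(actual, predicted):
    """Ratio of the predicted (appraised) value to the observed sale price."""
    return [p / a for a, p in zip(actual, predicted)]


def coefficient_of_dispersion(ratios):
    """COD: average absolute deviation of the ratios from their median,
    as a percentage of the median (lower means more uniform predictions)."""
    med = median(ratios)
    return 100.0 * sum(abs(r - med) for r in ratios) / (len(ratios) * med)


# Hypothetical sale prices and model predictions (thousand EUR).
prices = [100.0, 200.0, 300.0, 400.0]
preds = [110.0, 190.0, 300.0, 440.0]
print(mape(prices, preds), coefficient_of_dispersion(sales_ratios(prices, preds)))
```

MAPE measures the average size of the errors, while COD measures how uniformly the model over- or under-values properties relative to its typical ratio.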
N-terminal pro-B-type natriuretic peptide measurement is useful in predicting left ventricular hypertrophy regression after aortic valve replacement in patients with severe aortic stenosis.

Science.gov (United States)

Lee, Mirae; Choi, Jin-Oh; Park, Sung-Ji; Kim, Eun Young; Park, PyoWon; Oh, Jae K; Jeon, Eun-Seok

2015-01-01

The predictive factors for early left ventricular hypertrophy (LVH) regression after aortic valve replacement (AVR) have not been fully elucidated. This study was conducted to investigate which preoperative parameters predict early LVH regression after AVR. Eighty-seven consecutive patients who underwent AVR for isolated severe aortic stenosis (AS) were analysed. Patients with ejection fraction … regression of LVH at the midterm follow-up was determined. In multivariate analysis including preoperative echocardiographic parameters, only the E/e' ratio was associated with midterm LVH regression (OR 1.11, 95% CI 1.01 to 1.22; p=0.035). When preoperative NT-proBNP was added to the analysis, logNT-proBNP was the single significant predictor of midterm LVH regression (OR 2.00, 95% CI 1.08 to 3.71; p=0.028). In receiver operating characteristic curve analysis, an NT-proBNP cut-off value of 440 pg/mL yielded a sensitivity of 72% and a specificity of 77% for predicting LVH regression after AVR. Preoperative NT-proBNP was an independent predictor of early LVH regression after AVR in patients with isolated severe AS.
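The abstract reports a 440 pg/mL cut-off from ROC analysis but does not say how it was chosen. One common rule, assumed here for illustration, is to pick the cut-off that maximizes Youden's J = sensitivity + specificity − 1. The function name and the sample NT-proBNP values are hypothetical.

```python
def best_cutoff(values, outcomes):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A subject is test-positive when value >= cutoff; outcomes are booleans
    (True = LVH regressed). Ties keep the lowest cutoff.
    """
    positives = sum(1 for o in outcomes if o)
    negatives = len(outcomes) - positives
    best = None
    for cut in sorted(set(values)):
        tp = sum(1 for v, o in zip(values, outcomes) if o and v >= cut)
        tn = sum(1 for v, o in zip(values, outcomes) if not o and v < cut)
        sens, spec = tp / positives, tn / negatives
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical preoperative NT-proBNP (pg/mL) and midterm LVH regression outcome.
nt_probnp = [100, 200, 300, 500, 600, 700]
regressed = [False, False, True, False, True, True]
print(best_cutoff(nt_probnp, regressed))
```

Only observed values are tried as cut-offs, which is enough because sensitivity and specificity change only at observed values.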
The use of copulas to practical estimation of multivariate stochastic differential equation mixed effects models

International Nuclear Information System (INIS)

Rupšys, P.

2015-01-01

A system of stochastic differential equations (SDE) with mixed-effects parameters and a multivariate normal copula density function was used to develop a tree height model for Scots pine trees in Lithuania. A two-step maximum likelihood parameter estimation method is used, and computational guidelines are given. After fitting the conditional probability density functions to outside-bark diameter at breast height and total tree height, a bivariate normal copula distribution model was constructed. Predictions from the mixed-effects parameters SDE tree height model calculated during this research were compared with the regression tree height equations. The results are implemented in the symbolic computational language MAPLE.
The use of copulas to practical estimation of multivariate stochastic differential equation mixed effects models

Energy Technology Data Exchange (ETDEWEB)

Rupšys, P. [Aleksandras Stulginskis University, Studenų g. 11, Akademija, Kaunas district, LT-53361, Lithuania]

2015-10-28

A system of stochastic differential equations (SDE) with mixed-effects parameters and a multivariate normal copula density function was used to develop a tree height model for Scots pine trees in Lithuania. A two-step maximum likelihood parameter estimation method is used, and computational guidelines are given. After fitting the conditional probability density functions to outside-bark diameter at breast height and total tree height, a bivariate normal copula distribution model was constructed. Predictions from the mixed-effects parameters SDE tree height model calculated during this research were compared with the regression tree height equations. The results are implemented in the symbolic computational language MAPLE.
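The bivariate normal (Gaussian) copula used to join the diameter and height distributions has a closed-form density. Writing x = Φ⁻¹(u) and y = Φ⁻¹(v), where u and v are the marginal CDF values, it is c(u, v; ρ) = exp(−(ρ²(x² + y²) − 2ρxy) / (2(1 − ρ²))) / √(1 − ρ²). The sketch below evaluates only this copula; the SDE-derived marginals from the study are not reproduced, and the ρ values are arbitrary.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def gaussian_copula_density(u, v, rho):
    """Bivariate normal copula density c(u, v; rho) for 0 < u, v < 1 and |rho| < 1."""
    x = _STD_NORMAL.inv_cdf(u)  # raises StatisticsError if u is 0 or 1
    y = _STD_NORMAL.inv_cdf(v)
    one_minus = 1.0 - rho * rho
    exponent = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus)
    return math.exp(exponent) / math.sqrt(one_minus)


# The joint density of (diameter, height) would then be
# c(F_D(d), F_H(h); rho) * f_D(d) * f_H(h), given fitted marginals F and f.
print(gaussian_copula_density(0.5, 0.5, 0.6))
```

With ρ = 0 the density is identically 1 (independence). At u = v = 0.5 it equals 1/√(1 − ρ²), which gives a quick sanity check.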
Fourier transform infrared spectroscopic imaging and multivariate regression for prediction of proteoglycan content of articular cartilage.

Directory of Open Access Journals (Sweden)

Lassi Rieppo

Fourier transform infrared (FT-IR) spectroscopic imaging has previously been applied to the spatial estimation of the collagen and proteoglycan (PG) contents of articular cartilage (AC). However, earlier studies have been limited to univariate analysis techniques, and current analysis methods lack the specificity needed for collagen and PGs. The aim of the present study was to evaluate the suitability of partial least squares regression (PLSR) and principal component regression (PCR) methods for the analysis of the PG content of AC. Multivariate regression models were compared with previously used univariate methods and tested on sample material consisting of healthy and enzymatically degraded steer AC. Chondroitinase ABC enzyme was used to increase the variation in PG content levels relative to intact AC. Digital densitometric measurements of Safranin O-stained sections provided the reference for PG content. The results showed that multivariate regression models predict the PG content of AC significantly better than the previously used univariate parameters, namely the absorbance spectrum (i.e. the area of the carbohydrate region, with or without amide I normalization) or the second derivative spectrum. Increased molecular specificity favours the use of multivariate regression models, but they require more knowledge of chemometric analysis and extended laboratory resources for gathering reference data to establish the models. When true molecular specificity is required, the multivariate models should be used.
Application of stepwise multiple regression techniques to inversion of Nimbus 'IRIS' observations.

Science.gov (United States)

Ohring, G.

1972-01-01

Exploratory studies with Nimbus-3 infrared interferometer-spectrometer (IRIS) data indicate that, in addition to temperature, such meteorological parameters as geopotential heights of pressure surfaces, tropopause pressure, and tropopause temperature can be inferred from the observed spectra with the use of simple regression equations. The technique of screening the IRIS spectral data by means of stepwise regression to obtain the best radiation predictors of meteorological parameters is validated. The simplicity of applying the technique and of the derived linear regression equations, which contain only a few terms, suggests that this approach is useful. Based upon the results obtained, suggestions are made for further development and exploitation of the stepwise regression analysis technique.
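Screening many candidate predictors (here, spectral channels) by stepwise regression can be illustrated with a simplified forward-selection variant. This is a sketch under stated assumptions, not the procedure used with the IRIS data. At each step it adds the candidate that most lowers the residual sum of squares and stops when the gain in R² falls below `min_r2_gain`. Classical stepwise regression uses F-tests and can also remove terms, and this sketch has no guard against perfectly collinear predictors. The channel names and data are hypothetical.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def forward_stepwise(predictors, y, max_terms=3, min_r2_gain=0.01):
    """Greedily add the predictor that most lowers RSS until the R^2 gain is too small."""
    mean_y = sum(y) / len(y)
    tss = sum((v - mean_y) ** 2 for v in y)
    selected, current = [], tss  # start from the intercept-only model
    while len(selected) < max_terms:
        candidates = [name for name in predictors if name not in selected]
        if not candidates:
            break
        best_rss, best_name = min(
            (_rss([predictors[s] for s in selected + [c]], y), c) for c in candidates
        )
        if (current - best_rss) / tss < min_r2_gain:
            break
        selected.append(best_name)
        current = best_rss
    return selected


# Hypothetical channel radiances; the target depends only on "ch_a".
channels = {"ch_a": [1, 2, 3, 4, 5, 6, 7, 8], "ch_b": [3, 1, 4, 1, 5, 9, 2, 6]}
target = [2 * v + 1 for v in channels["ch_a"]]
print(forward_stepwise(channels, target))
```

The small number of selected terms mirrors the abstract's point that the resulting regression equations stay short.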
Performance prediction of gas turbines by solving a system of non-linear equations

Energy Technology Data Exchange (ETDEWEB)

Kaikko, J

1998-09-01

This study presents a novel method for implementing the performance prediction of gas turbines from component models. It is based on solving the non-linear set of equations that corresponds to the process equations and the mass and energy balances for the engine. General models have been presented for determining the steady-state operation of single components. Single- and multiple-shaft arrangements have been examined, with consideration also being given to heat regeneration and intercooling. Emphasis has been placed upon axial gas turbines of an industrial scale. Applying the models requires no information on the structural dimensions of the gas turbines. Compared with the commonly applied component matching procedures, this method offers several advantages. The models are easier to apply, as less attention needs to be paid to calculation sequences and routines. Solving the set of equations is based on zeroing co-ordinate functions that are directly derived from the modelling equations; therefore, controlling the accuracy of the results is easy. This method gives more freedom in selecting the modelling parameters since, unlike in the matching procedures, exchanging these criteria does not itself affect the algorithms. Implicit relationships between the variables are of no significance, which also increases the freedom in formulating the modelling equations. The mathematical models developed in this thesis will provide facilities to optimise the operation of any major gas turbine configuration with respect to the desired process parameters. The computational methods used in this study may also be adapted to any other modelling problems arising in industry. (orig.) 36 refs.
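"Zeroing co-ordinate functions" can be illustrated with a generic Newton-Raphson solver for a system F(x) = 0, using a forward-difference Jacobian. This is a sketch of the general numerical idea, not the thesis's algorithm. The two-equation system in the usage line is a toy stand-in, chosen because its root x = y = √2 is known; in a real engine model each function would be a mass balance, energy balance or component-map residual.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def newton_system(funcs, x0, tol=1e-10, max_iter=50, h=1e-7):
    """Drive every residual function f_i(x) to zero with Newton-Raphson steps,
    approximating the Jacobian by forward differences with step h."""
    x = list(x0)
    n = len(x)
    for _ in range(max_iter):
        f = [fi(x) for fi in funcs]
        if max(abs(v) for v in f) < tol:
            return x
        jac = [[0.0] * n for _ in range(len(funcs))]
        for j in range(n):
            xp = list(x)
            xp[j] += h
            for i, fi in enumerate(funcs):
                jac[i][j] = (fi(xp) - f[i]) / h
        step = solve_linear(jac, [-v for v in f])
        x = [xi + si for xi, si in zip(x, step)]
    raise RuntimeError("Newton iteration did not converge")


# Toy system: x^2 + y^2 - 4 = 0 and x - y = 0, root at x = y = sqrt(2).
print(newton_system([lambda v: v[0] ** 2 + v[1] ** 2 - 4.0, lambda v: v[0] - v[1]], [1.0, 0.5]))
```

Accuracy is controlled directly by `tol` on the residuals, which matches the abstract's remark that zeroing the functions makes accuracy easy to control.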
A gentle introduction to quantile regression for ecologists

Science.gov (United States)

Cade, B.S.; Noon, B.R.

2003-01-01

Quantile regression is a way to estimate the conditional quantiles of a response variable distribution in the linear model, providing a more complete view of possible causal relationships between variables in ecological processes. Typically, not all the factors that affect ecological processes are measured and included in the statistical models used to investigate relationships between variables associated with those processes. As a consequence, there may be a weak or no predictive relationship between the mean of the response variable (y) distribution and the measured predictive factors (X). Yet there may be stronger, useful predictive relationships with other parts of the response variable distribution. This primer relates quantile regression estimates to prediction intervals in parametric error distribution regression models (e.g., least squares), and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of the estimates for homogeneous and heterogeneous regression models.
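Quantile regression replaces the squared error of least squares with the asymmetric check (pinball) loss ρ_τ(r) = r(τ − 1{r < 0}). The minimal sketch below shows that, for an intercept-only model, minimizing this loss recovers an empirical τ-quantile of y. A full linear model would minimize the same loss over slope coefficients, typically with linear programming. The data are made up.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def constant_quantile_fit(y, tau):
    """Intercept-only quantile regression: the constant minimizing total check loss.

    The objective is piecewise linear with kinks at the data, so a minimizer
    is always one of the observed values; we search those.
    """
    return min(sorted(y), key=lambda c: sum(pinball_loss(v - c, tau) for v in y))


# The 0.5 fit is the median and ignores the outlier; the 0.9 fit moves to the upper tail.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
print(constant_quantile_fit(sample, 0.5), constant_quantile_fit(sample, 0.9))
```

With τ = 0.5 the loss is half the absolute error, so the fit is the median. Larger τ penalizes under-prediction more, pushing the estimate toward the upper part of the response distribution that the primer emphasizes.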
Field calibration and modification of scs design equation for predicting length of border under local conditions

International Nuclear Information System (INIS)

Choudhary, M.R.; Mustafa, U.S.

2009-01-01

Field tests were conducted to calibrate the existing SCS design equation for determining field border length, using field data from different field lengths during the 2nd and 3rd irrigations under local conditions. A single-ring infiltrometer was used to estimate water movement into and through the irrigated soil profile and to estimate the coefficients of the Kostiakov infiltration function. Measurements of unit discharge and time of advance were carried out during different irrigations on irrigated wheat fields with clay loam soil. The collected field data were used to calibrate the existing SCS design equation developed by the USDA and to test its validity under local field conditions. The SCS equation was further modified to improve its applicability. Results from the study revealed that the Kostiakov model overpredicted the coefficients, which in turn caused the existing SCS design equation to overestimate the water advance length for the border in the selected field. However, the calibrated SCS design equation, after parametric modification, produced more satisfactory results, encouraging its use at a larger scale. (author)
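The Kostiakov infiltration function is commonly written I = k·tᵃ (cumulative infiltration I after time t). Its coefficients are typically estimated from infiltrometer readings by linear least squares on ln I = ln k + a·ln t. The sketch below assumes that form and fitting approach; the abstract does not give the study's own procedure. The readings are synthetic, generated with k = 0.8 and a = 0.6, so the fit should recover those values.

```python
import math


def fit_kostiakov(times, depths):
    """Fit cumulative infiltration I = k * t**a via least squares on ln I = ln k + a ln t.

    Returns (k, a). Note that the fit minimizes error in log space,
    not in the original depth units.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(d) for d in depths]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    k = math.exp(my - a * mx)
    return k, a


# Synthetic ring-infiltrometer readings: time in minutes, cumulative depth in cm.
t_min = [5, 10, 20, 40, 80]
depth_cm = [0.8 * t ** 0.6 for t in t_min]
print(fit_kostiakov(t_min, depth_cm))
```

Because the regression works on logarithms, a few large early readings carry less weight than they would in a direct nonlinear fit. This is one reason fitted coefficients can differ between methods.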
Development of equations, based on milk intake, to predict starter feed intake of preweaned dairy calves.

Science.gov (United States)

Silva, A L; DeVries, T J; Tedeschi, L O; Marcondes, M I

2018-04-16

There is a lack of studies that provide models or equations capable of predicting starter feed intake (SFI) for milk-fed dairy calves. Therefore, a multi-study analysis was conducted to identify variables that influence SFI and to develop equations to predict SFI in milk-fed dairy calves up to 64 days of age. The database was composed of individual data from 176 calves in eight experiments, totaling 6426 daily observations of intake. The information collected from the studies comprised birth BW (kg), SFI (kg/day), fluid milk or milk replacer intake (MI; l/day), sex (male or female), breed (Holstein or Holstein×Gyr crossbred) and age (days). Correlations between SFI and the quantitative variables MI, birth BW, metabolic birth BW, fat intake, CP intake, metabolizable energy intake, and age were calculated. Subsequently, data were graphed, and based on a visual appraisal of the pattern of the data, an exponential function was chosen. Data were evaluated using a meta-analysis approach to estimate fixed and random effects of the experiments using nonlinear mixed coefficient statistical models. A negative correlation between SFI and MI was observed (r=-0.39), but age was positively correlated with SFI (r=0.66). No effect of liquid feed source (milk or milk replacer) was observed in developing the equation. Two equations, significantly different for all parameters, were fit to predict SFI for calves that consume less than 5 (SFI<5) or more than 5 (SFI>5) l/day of milk or milk replacer: $\mathrm{SFI}_{<5} = 0.1839_{\pm 0.0581} \times \mathrm{MI} \times \exp\{(0.0333_{\pm 0.0021} - 0.0040_{\pm 0.0011} \times \mathrm{MI}) \times [\mathrm{A} - (0.8302_{\pm 0.5092} + 6.0332_{\pm 0.3583} \times \mathrm{MI})]\} - 0.12 \times \mathrm{MI}$; $\mathrm{SFI}_{>5} = 0.1225_{\pm 0.0005} \times \ldots$
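The SFI<5 equation can be evaluated directly from its point estimates (the ± values are standard errors and are ignored here). The abstract does not define A; this sketch assumes it is the calf's age in days, since age (days) is among the collected variables. The SFI>5 equation is truncated in the source and is not implemented. The function name is hypothetical.

```python
import math


def sfi_below_5(mi, age):
    """Starter feed intake (kg/day) from the SFI<5 equation, point estimates only.

    mi  -- milk or milk replacer intake (l/day); the equation is for mi < 5
    age -- assumed to be the calf's age in days (the 'A' term)
    """
    if not 0 <= mi < 5:
        raise ValueError("the SFI<5 equation applies to intakes below 5 l/day")
    rate = 0.0333 - 0.0040 * mi    # exponential growth rate term
    shift = 0.8302 + 6.0332 * mi   # age offset term
    return 0.1839 * mi * math.exp(rate * (age - shift)) - 0.12 * mi


# Hypothetical calf drinking 4 l/day at 30 and 40 days of age.
print(sfi_below_5(4.0, 30), sfi_below_5(4.0, 40))
```

The equation can return negative values for very young calves, well before the age offset term. Any clipping at zero would be a user choice, not part of the published model.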
Relative performances of artificial neural network and regression mapping tools in evaluation of spinal loads and muscle forces during static lifting.

Science.gov (United States)

Arjmand, N; Ekrami, O; Shirazi-Adl, A; Plamondon, A; Parnianpour, M

2013-05-31

Two artificial neural networks (ANNs) are constructed, trained, and tested to map the inputs of a complex trunk finite element (FE) model to its outputs for spinal loads and muscle forces. Five input variables (thorax flexion angle, load magnitude, the load's anterior and lateral positions, and load handling technique, i.e., one- or two-handed static lifting) are considered, together with four model outputs for spinal loads (L4-L5 and L5-S1 disc compression and anterior-posterior shear forces) and 76 model outputs for muscle forces (forces in individual trunk muscles). Moreover, full quadratic regression equations mapping the model's inputs to its outputs, developed here for muscle forces and previously for spinal loads, are used to compare the relative accuracy of the two mapping tools (ANN and regression equations). Results indicate that the ANNs map the input-output relationships of the FE model more accurately (RMSE = 20.7 N for spinal loads and RMSE = 4.7 N for muscle forces) than the regression equations (RMSE = 120.4 N for spinal loads and RMSE = 43.2 N for muscle forces). Quadratic regression equations capture variations of the outputs with the inputs only up to second order, whereas ANNs also capture higher-order variations. Despite the ANN's satisfactory estimation of overall muscle forces, some inadequacies are noted, including assigning force to antagonistic muscles that have no activity in the optimization algorithm of the FE model and predicting slightly different forces in bilateral muscle pairs during symmetric lifting. Using these user-friendly tools, spinal loads and trunk muscle forces during symmetric and asymmetric static lifts can be easily estimated. Copyright © 2013 Elsevier Ltd. All rights reserved.
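A "full quadratic" regression equation uses an intercept, every input, every squared input and every pairwise product, so five inputs give 1 + 5 + 15 = 21 terms. The sketch below builds that basis. The coefficients would then come from an ordinary least squares fit, which is not shown. The coding of the handling-technique input (0/1) is an assumption.

```python
from itertools import combinations_with_replacement


def quadratic_features(x):
    """Full quadratic basis: intercept, linear terms, squares and pairwise products."""
    feats = [1.0] + list(x)
    for i, j in combinations_with_replacement(range(len(x)), 2):
        feats.append(x[i] * x[j])  # i == j gives squares, i < j gives cross terms
    return feats


# Hypothetical input: flexion angle (deg), load (N), anterior and lateral
# positions (cm), and handling technique coded 0 = one-handed, 1 = two-handed.
print(len(quadratic_features([40.0, 180.0, 30.0, 10.0, 1.0])))
```

One design caveat: for a 0/1 input, the squared term equals the linear term. That column is redundant and would normally be dropped before fitting.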

Comparison of predictive equations and measured resting energy expenditure among obese youth attending a pediatric healthy weight clinic: one size does not fit all.

Science.gov (United States)

Henes, Sarah T; Cummings, Doyle M; Hickner, Robert C; Houmard, Joseph A; Kolasa, Kathryn M; Lazorick, Suzanne; Collier, David N

2013-10-01

The Academy of Nutrition and Dietetics recommends the use of indirect calorimetry for calculating caloric targets for weight loss in obese youth. Practitioners typically use predictive equations since indirect calorimetry is often not available. The objective of this study was to compare measured resting energy expenditure (MREE) with that estimated using published predictive equations in obese pediatric patients. Youth aged 7 to 18 years (n = 80) who were referred to a university-based healthy weight clinic and whose BMI was above the 95th percentile for age and gender participated. MREE was measured via a portable indirect calorimeter. Predicted energy expenditure (pEE) was estimated using published equations, including those commonly used in children. pEE was compared with the MREE for each subject. The absolute mean difference between MREE and pEE, mean percentage accuracy, and mean error were determined. Mean percentage accuracy of pEE compared with MREE varied widely, with the Harris-Benedict, Lazzer, and Molnar equations providing the greatest accuracy (65%, 61%, and 60%, respectively). Mean differences between MREE and equation-estimated caloric targets varied from 197.9 kcal/day to 307.7 kcal/day. The potential to either overestimate or underestimate calorie needs in the clinical setting is significant when comparing EE derived from predictive equations with that measured using portable indirect calorimetry. While our findings suggest that the Harris-Benedict equation has improved accuracy relative to other equations in severely obese youth, the potential for error remains sufficiently great to suggest that indirect calorimetry is preferred.
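The abstract does not define "percentage accuracy". In energy-expenditure studies it is often the share of subjects whose predicted value falls within ±10% of the measured value. The sketch assumes that definition and computes it alongside the mean bias and mean absolute difference. The REE values (kcal/day) are hypothetical.

```python
def percent_accurate(measured, predicted, tolerance=0.10):
    """Share (%) of subjects whose predicted REE lies within +/- tolerance of measured REE."""
    hits = sum(1 for m, p in zip(measured, predicted) if abs(p - m) <= tolerance * m)
    return 100.0 * hits / len(measured)


def mean_bias(measured, predicted):
    """Mean signed error (predicted - measured); positive means overestimation."""
    return sum(p - m for m, p in zip(measured, predicted)) / len(measured)


def mean_absolute_difference(measured, predicted):
    """Mean absolute difference between predicted and measured REE."""
    return sum(abs(p - m) for m, p in zip(measured, predicted)) / len(measured)


# Hypothetical measured (indirect calorimetry) and equation-predicted REE, kcal/day.
mree = [1500.0, 1800.0, 2000.0, 1600.0]
pee = [1600.0, 2100.0, 1900.0, 1400.0]
print(percent_accurate(mree, pee), mean_bias(mree, pee), mean_absolute_difference(mree, pee))
```

A small mean bias can hide large individual errors in both directions. Reporting the absolute difference and percentage accuracy alongside it addresses this, consistent with the abstract's conclusion.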
Using Logistic Regression to Predict the Probability of Debris Flows in Areas Burned by Wildfires, Southern California, 2003-2006

Science.gov (United States)

Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.; Michael, John A.; Helsel, Dennis R.

2008-01-01

Logistic regression was used to develop statistical models that can be used to predict the probability of debris flows in areas recently burned by wildfires by using data from 14 wildfires that burned in southern California during 2003-2006. Twenty-eight independent variables describing the basin morphology, burn severity, rainfall, and soil properties of 306 drainage basins located within those burned areas were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows soon after the 2003 to 2006 fires were delineated from data in the National Elevation Dataset using a geographic information system; (2) Data describing the basin morphology, burn severity, rainfall, and soil properties were compiled for each basin. These data were then input to a statistics software package for analysis using logistic regression; and (3) Relations between the occurrence or absence of debris flows and the basin morphology, burn severity, rainfall, and soil properties were evaluated, and five multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combinations produced the most effective models, and the multivariate models that best predicted the occurrence of debris flows were identified. Percentage of high burn severity and 3-hour peak rainfall intensity were significant variables in all models. Soil organic matter content and soil clay content were significant variables in all models except Model 5. Soil slope was a significant variable in all models except Model 4. The most suitable model can be selected from these five models on the basis of the availability of independent variables in the particular area of interest and field checking of probability maps. The multivariate logistic regression models can be entered into a geographic information system, and maps showing the probability of debris flows can be constructed in recently burned areas of
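A multivariate logistic regression model turns a linear combination of basin variables into a probability, P = eˣ / (1 + eˣ) with x = β₀ + Σ βᵢvᵢ. Evaluating this for every basin is what a GIS-based probability map does. The sketch below assumes that standard form; the variable names and coefficients are hypothetical placeholders, not the published model coefficients.

```python
import math


def logistic_probability(intercept, coefs, values):
    """P = exp(x) / (1 + exp(x)) with x = intercept + sum(coef_i * value_i).

    Written as 1 / (1 + exp(-x)), which is algebraically identical.
    """
    x = intercept + sum(coefs[name] * values[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical coefficients and one basin's inputs (not the published models).
coefficients = {"pct_high_burn": 0.04, "peak_rain_3h_mm": 0.1}
basin = {"pct_high_burn": 50.0, "peak_rain_3h_mm": 20.0}
print(logistic_probability(-3.0, coefficients, basin))
```

Keeping the coefficients in a dictionary keyed by variable name makes it easy to swap between models that use different subsets of the independent variables. This matters here because the most suitable model depends on which variables are available for an area.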
Prediction of Thermal Properties of Sweet Sorghum Bagasse as a Function of Moisture Content Using Artificial Neural Networks and Regression Models

Directory of Open Access Journals (Sweden)

Gosukonda Ramana

2017-06-01

Artificial neural network (ANN) and traditional regression models were developed to predict the thermal properties of sweet sorghum bagasse as a function of moisture content and room temperature. Predictions were made for three thermal properties: (1) thermal conductivity, (2) volumetric specific heat, and (3) thermal diffusivity. For each thermal property, the inputs were five levels of moisture content (8.52%, 12.93%, 18.94%, 24.63%, and 28.62%, w.b.) and room temperature. Data were sub-partitioned for training, testing, and validation of models. Backpropagation (BP) and Kalman filter (KF) learning algorithms were employed to develop nonparametric models between the input and output data sets. Statistical indices, including the correlation coefficient (R) between actual and predicted outputs, were produced to select suitable models. Prediction plots for the thermal properties indicated that the ANN models were more accurate on unseen patterns than the regression models. In general, the ANN models were able to generalize strongly and to interpolate unseen patterns within the domain of training.
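The three predicted properties are not independent. Thermal diffusivity equals thermal conductivity divided by volumetric specific heat (α = k / ρc). This gives a simple consistency check on three separately predicted values. The sketch below assumes SI units; the sample values are hypothetical.

```python
def thermal_diffusivity(conductivity, volumetric_heat_capacity):
    """alpha = k / (rho * c).

    conductivity in W/(m*K), volumetric heat capacity in J/(m^3*K); returns m^2/s.
    """
    return conductivity / volumetric_heat_capacity


def diffusivity_mismatch(conductivity, volumetric_heat_capacity, predicted_alpha):
    """Relative gap between a predicted diffusivity and the one implied by k / (rho * c)."""
    implied = thermal_diffusivity(conductivity, volumetric_heat_capacity)
    return abs(predicted_alpha - implied) / implied


# Hypothetical predictions for one moisture level: k = 0.06 W/(m*K), rho*c = 1.2 MJ/(m^3*K).
print(thermal_diffusivity(0.06, 1.2e6), diffusivity_mismatch(0.06, 1.2e6, 5.5e-8))
```

A large mismatch for a given moisture level would flag that at least one of the separately trained models is off, even without new measurements.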
The state equation of aggregation behaviours for Poly(oxyethylene)-Poly(oxypropylene)-Poly(oxyethylene) tri-block copolymers in aqueous solution

Science.gov (United States)

Gao, Xuechao; Ji, Guozhao; Peng, Tiefeng

2018-03-01

In this work, an aggregation equation is developed from experimental data to describe the aggregation number of copolymer molecules and the micellar diameters. Based on the regression parameters in the aggregation equation, it is concluded that the PO blocks favour larger micelles, whereas the EO blocks suppress micelle formation. By fitting the parameters to the numbers of EO and PO units, the aggregation equation was proposed to predict the aggregation behaviours of tri-block copolymers with between 26 and 212 EO units and between 30 and 70 PO units. Applied to aqueous solutions containing salt additives, the equation can be extended to evaluate the impact of the additives on micelle formation.
Comparison of Prediction Model for Cardiovascular Autonomic Dysfunction Using Artificial Neural Network and Logistic Regression Analysis

Science.gov (United States)

Zeng, Fangfang; Li, Zhongtao; Yu, Xiaoling; Zhou, Linuo

2013-01-01

Background This study aimed to develop artificial neural network (ANN) and multivariable logistic regression (LR) models for predicting cardiovascular autonomic (CA) dysfunction in the general population, and to compare the prediction models obtained with the two approaches. Methods and Materials We analyzed a previously collected dataset from a Chinese population sample of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN and LR analysis and were tested in a validation set. The performance of these prediction models was then compared. Results Univariate analysis indicated that 14 risk factors were significantly associated with the prevalence of CA dysfunction (P<0.05). The mean area under the receiver operating characteristic curve was 0.758 (95% CI 0.724–0.793) for LR and 0.762 (95% CI 0.732–0.793) for ANN analysis, and a noninferiority result was found (P<0.001). Similar results were found when the sensitivity, specificity, and predictive values of the LR and ANN prediction models were compared. Conclusion Prediction models for CA dysfunction were developed using ANN and LR; both are effective tools for developing prediction models based on our dataset. PMID:23940593
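To make the reported AUCs (0.758 for LR, 0.762 for ANN) concrete, here is a small Python sketch of how an AUC can be computed from any model's predicted risks and the observed outcomes, using the Mann-Whitney formulation. It is illustrative only; the study's noninferiority test and confidence intervals are not reproduced.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve: probability that a random positive outranks a random negative.

    Ties count as half a win (Mann-Whitney formulation). O(n_pos * n_neg), fine for small samples.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Because the AUC depends only on the ranking of predicted risks, an LR model and an ANN can reach nearly identical AUCs even when their raw predicted probabilities differ.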
The Bland-Altman Method Should Not Be Used in Regression Cross-Validation Studies

Science.gov (United States)

O'Connor, Daniel P.; Mahar, Matthew T.; Laughlin, Mitzi S.; Jackson, Andrew S.

2011-01-01

The purpose of this study was to demonstrate the bias in the Bland-Altman (BA) limits of agreement method when it is used to validate regression models. Data from 1,158 men were used to develop three regression equations to estimate maximum oxygen uptake (R² = 0.40, 0.61, and 0.82, respectively). The equations were evaluated in a…
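The abstract argues against using Bland-Altman limits of agreement to cross-validate regression equations; the Python sketch below only makes concrete what the method computes (the mean difference, or bias, and bias ± 1.96 SD of the paired differences). It does not reproduce the authors' demonstration of the bias.

```python
import statistics


def limits_of_agreement(measured, estimated, z=1.96):
    """Bland-Altman bias and approximate 95% limits of agreement (estimated minus measured)."""
    if len(measured) != len(estimated) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    diffs = [e - m for m, e in zip(measured, estimated)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd
```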
Comparison of ν-support vector regression and logistic equation for ...

African Journals Online (AJOL)

Due to the complexity and high nonlinearity of bioprocesses, most simple mathematical models fail to describe the exact behavior of biochemical systems. As a novel type of learning method, support vector regression (SVR) has a powerful capability to handle problems involving small samples, nonlinearity, high dimensionality ...
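The abstract is truncated before it defines the logistic equation that SVR is compared with. In bioprocess modelling the term usually refers to the Verhulst logistic growth model, X(t) = Xm / (1 + (Xm/X0 − 1)·e^(−μm·t)), with biomass X, initial value X0, maximum Xm, and maximum specific growth rate μm; this Python sketch assumes that form.

```python
import math


def logistic_growth(t, x0, x_max, mu_max):
    """Verhulst logistic growth: X(t) = Xm / (1 + (Xm/X0 - 1) * exp(-mu_max * t)).

    Assumed form; the abstract does not state which logistic equation was used.
    """
    if x0 <= 0 or x_max <= 0:
        raise ValueError("initial and maximum biomass must be positive")
    return x_max / (1.0 + (x_max / x0 - 1.0) * math.exp(-mu_max * t))
```

The curve starts at X0, reaches half of Xm at t = ln(Xm/X0 − 1)/μm, and saturates at Xm; this fixed sigmoid shape is what makes a three-parameter model too rigid for some bioprocess data.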
Predicting lake trophic state by relating Secchi-disk transparency measurements to Landsat-satellite imagery for Michigan inland lakes, 2003-05 and 2007-08

Science.gov (United States)

Fuller, L.M.; Jodoin, R.S.; Minnerick, R.J.

2011-01-01

Inland lakes are an important economic and environmental resource for Michigan. The U.S. Geological Survey and the Michigan Department of Natural Resources and Environment have been cooperatively monitoring the quality of selected lakes in Michigan through the Lake Water Quality Assessment program. Sampling for this program began in 2001; by 2010, 730 of Michigan’s 11,000 inland lakes are expected to have been sampled once. Volunteers coordinated by the Michigan Department of Natural Resources and Environment began sampling lakes in 1974 and continue to sample (in 2010) approximately 250 inland lakes each year through the Michigan Cooperative Lakes Monitoring Program. Despite these sampling efforts, it is still impossible to physically collect measurements for all Michigan inland lakes; however, Landsat-satellite imagery has been used successfully in Minnesota, Wisconsin, Michigan, and elsewhere to predict the trophic state of unsampled inland lakes greater than 20 acres by producing regression equations relating in-place Secchi-disk measurements to Landsat bands. This study tested three alternatives to methods previously used in Michigan to improve results for predicted statewide Trophic State Index (TSI) computed from Secchi-disk transparency (TSI (SDT)). The alternative methods were applied to 14 Landsat-satellite scenes to predict statewide TSI (SDT) for two time periods (2003–05 and 2007–08). Specifically, the methods were (1) satellite-data processing techniques to remove areas affected by clouds, cloud shadows, haze, shoreline, and dense vegetation for inland lakes greater than 20 acres in Michigan; (2) comparison of the previous method for producing a single open-water predicted TSI (SDT) value (which was based on an area of interest (AOI) and lake-average approach) to an alternative Gethist method for identifying open-water areas in inland lakes (which follows the initial satellite-data processing and targets the darkest pixels, representing the deepest water...
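The abstract does not spell out how TSI (SDT) is computed. Assuming it is Carlson's (1977) Secchi-disk index, TSI(SD) = 60 − 14.41·ln(SD) with SD in metres, a minimal Python sketch is:

```python
import math


def tsi_secchi(secchi_depth_m):
    """Carlson (1977) trophic state index from Secchi-disk transparency in metres (assumed index)."""
    if secchi_depth_m <= 0:
        raise ValueError("Secchi depth must be positive")
    return 60.0 - 14.41 * math.log(secchi_depth_m)
```

Clearer water (a deeper Secchi reading) gives a lower TSI, and each doubling of transparency lowers the index by about 10 units. In the satellite approach, the Secchi depth passed to such a function would itself come from the regression equation relating in-place measurements to Landsat band values, whose form the abstract does not give.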
The spatial prediction of landslide susceptibility applying artificial neural network and logistic regression models: A case study of Inje, Korea

Science.gov (United States)

Saro, Lee; Woo, Jeon Seong; Kwan-Young, Oh; Moung-Jin, Lee

2016-02-01

The aim of this study is to predict landslide susceptibility through spatial analysis, applying a GIS-based statistical methodology. Logistic regression and artificial neural network models were applied and validated to analyze landslide susceptibility in Inje, Korea. Landslide occurrence areas in the study area were identified by interpreting optical remote sensing data (aerial photographs), followed by field surveys. A spatial database of forest, geophysical, soil, and topographic data was built for the study area using a Geographical Information System (GIS). These factors were analysed using artificial neural network (ANN) and logistic regression models to generate landslide susceptibility maps. The landslide susceptibility maps were validated by comparing them with the landslide occurrence areas. The landslide locations were divided randomly into a training set (50%) and a test set (50%). The training set was used to produce the landslide susceptibility maps with the artificial neural network and logistic regression models, and the test set was retained to validate the prediction maps. The validation results revealed that the artificial neural network model (accuracy 80.10%) was better at predicting landslides than the logistic regression model (accuracy 77.05%). Of the weights used in the artificial neural network model, 'slope' yielded the highest weight value (1.330) and 'aspect' the lowest (1.000). This research applied two statistical analysis methods in a GIS and compared their results. Based on the findings, we were able to derive a more effective method for analyzing landslide susceptibility.
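The 50/50 random split of landslide locations can be sketched in Python as below. The abstract does not say how "accuracy" was defined (landslide studies often report the area under a prediction-rate curve), so plain classification accuracy is used here only as a simple stand-in; the function names are hypothetical.

```python
import random


def split_in_half(locations, seed=0):
    """Randomly split landslide locations into a 50% training set and a 50% test set."""
    shuffled = list(locations)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def accuracy(predicted, observed):
    """Fraction of cells whose predicted class (1 = landslide, 0 = none) matches the observation."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```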
The spatial prediction of landslide susceptibility applying artificial neural network and logistic regression models: A case study of Inje, Korea

Directory of Open Access Journals (Sweden)

Saro Lee

2016-02-01

The aim of this study is to predict landslide susceptibility through spatial analysis, applying a GIS-based statistical methodology. Logistic regression and artificial neural network models were applied and validated to analyze landslide susceptibility in Inje, Korea. Landslide occurrence areas in the study area were identified by interpreting optical remote sensing data (aerial photographs), followed by field surveys. A spatial database of forest, geophysical, soil, and topographic data was built for the study area using a Geographical Information System (GIS). These factors were analysed using artificial neural network (ANN) and logistic regression models to generate landslide susceptibility maps. The landslide susceptibility maps were validated by comparing them with the landslide occurrence areas. The landslide locations were divided randomly into a training set (50%) and a test set (50%). The training set was used to produce the landslide susceptibility maps with the artificial neural network and logistic regression models, and the test set was retained to validate the prediction maps. The validation results revealed that the artificial neural network model (accuracy 80.10%) was better at predicting landslides than the logistic regression model (accuracy 77.05%). Of the weights used in the artificial neural network model, ‘slope’ yielded the highest weight value (1.330) and ‘aspect’ the lowest (1.000). This research applied two statistical analysis methods in a GIS and compared their results. Based on the findings, we were able to derive a more effective method for analyzing landslide susceptibility.
Microbiome Data Accurately Predicts the Postmortem Interval Using Random Forest Regression Models

Directory of Open Access Journals (Sweden)

Aeriel Belk

2018-02-01

Death investigations often include an effort to establish the postmortem interval (PMI) in cases in which the time of death is uncertain. The PMI can lead to the identification of the deceased and the validation of witness statements and suspect alibis. Recent research has demonstrated that microbes provide an accurate clock that starts at death and relies on ecological change in the microbial communities that normally inhabit a body and its surrounding environment. Here, we explore how to build the most robust Random Forest regression models for predicting PMI by testing models built on different sample types (gravesoil, skin of the torso, skin of the head), gene markers (16S ribosomal RNA (rRNA), 18S rRNA, internal transcribed spacer regions (ITS)), and taxonomic levels (sequence variants, species, genus, etc.). We also tested whether particular suites of indicator microbes were informative across different datasets. Generally, the results indicate that the most accurate models for predicting PMI were built using gravesoil and skin data with the 16S rRNA genetic marker at the taxonomic level of phylum. Additionally, several phyla consistently contributed highly to model accuracy and may be candidate indicators of PMI.
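Building models "at the taxonomic level of phylum" implies collapsing sequence-variant counts to phylum-level features before fitting. A minimal Python sketch of that preprocessing step, assuming a simple variant-to-phylum lookup table (hypothetical names), is below; the Random Forest itself is not shown.

```python
from collections import defaultdict


def phylum_relative_abundance(variant_counts, variant_to_phylum):
    """Collapse sequence-variant read counts to phylum-level relative abundances.

    variant_to_phylum is a HYPOTHETICAL lookup table; unmapped variants go to "Unassigned".
    """
    totals = defaultdict(float)
    for variant, count in variant_counts.items():
        totals[variant_to_phylum.get(variant, "Unassigned")] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("sample has no reads")
    return {phylum: count / grand_total for phylum, count in totals.items()}
```

One feature vector of this kind per sample, paired with the known PMI, would form a training row for a regression model.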
Gaussian Process Regression Model in Spatial Logistic Regression

Science.gov (United States)

Sofro, A.; Oktaviarina, A.

2018-01-01

Spatial analysis has developed very quickly in the last decade. One popular approach is based on the neighbourhood of the region. Unfortunately, such approaches have limitations, such as difficulty in prediction. We therefore propose Gaussian process regression (GPR) to address this issue. In this paper, we focus on spatial modeling with GPR for binomial data with a logit link function and investigate the performance of the model. We discuss inference, namely how to estimate the parameters and hyper-parameters, as well as prediction. Simulation studies are presented in the last section.
riskRegression

DEFF Research Database (Denmark)

Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas

2017-01-01

In the presence of competing risks, a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory-optimized C++ functions with an R interface......-product we obtain fast access to the baseline hazards (compared to survival::basehaz()) and predictions of survival probabilities, their confidence intervals and confidence bands. Confidence intervals and confidence bands are based on point-wise asymptotic expansions of the corresponding statistical...
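With cause-specific hazards, the absolute risk of cause 1 by time t can be written as the sum over event times s ≤ t of S(s−)·ΔΛ1(s), where S(s−) = exp(−Λ1(s−) − Λ2(s−)) is the event-free survival just before s and ΔΛk are the cause-specific cumulative-hazard increments (for a Cox model, baseline increments multiplied by exp of the linear predictor). The pure-Python sketch below implements that sum under this exponential-survival discretisation. It is a hypothetical illustration, not the riskRegression interface or its C++ routines.

```python
import math


def cause_specific_absolute_risk(event_times, d_hazard_cause1, d_hazard_cause2, horizon):
    """Absolute risk of cause 1 by `horizon` from cause-specific cumulative-hazard jumps.

    event_times must be sorted ascending; d_hazard_* are the jumps of each cause-specific
    cumulative hazard at those times (for a Cox model: baseline jumps * exp(linear predictor)).
    """
    risk = 0.0
    cumulative_hazard = 0.0  # Lambda_1 + Lambda_2 just before the current time
    for t, dh1, dh2 in zip(event_times, d_hazard_cause1, d_hazard_cause2):
        if t > horizon:
            break
        event_free_before = math.exp(-cumulative_hazard)  # S(t-)
        risk += event_free_before * dh1
        cumulative_hazard += dh1 + dh2
    return risk
```

Because S(s−) depends on both hazards, a higher competing-risk hazard lowers the absolute risk of cause 1 even when the cause-1 hazard is unchanged.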
A Gaussian process regression based hybrid approach for short-term wind speed prediction

International Nuclear Information System (INIS)

Zhang, Chi; Wei, Haikun; Zhao, Xin; Liu, Tianhong; Zhang, Kanjian

2016-01-01

Highlights:
• A novel hybrid approach is proposed for short-term wind speed prediction.
• This method combines the parametric AR model with the non-parametric GPR model.
• The relative importance of different inputs is considered.
• Different types of covariance functions are considered and combined.
• It can provide both accurate point forecasts and satisfactory prediction intervals.
Abstract: This paper proposes a hybrid model based on an autoregressive (AR) model and Gaussian process regression (GPR) for probabilistic wind speed forecasting. In the proposed approach, the AR model is employed to capture the overall structure of the wind speed series, and the GPR is adopted to extract the local structure. Additionally, automatic relevance determination (ARD) is used to take into account the relative importance of different inputs, and different types of covariance functions are combined to capture the characteristics of the data. The proposed hybrid model is compared with the persistence model, an artificial neural network (ANN), and a support vector machine (SVM) for one-step-ahead forecasting, using wind speed data collected from three wind farms in China. The forecasting results indicate that the proposed method not only improves point forecasts compared with the other methods but also generates satisfactory prediction intervals.
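As a rough illustration of how GPR yields both a point forecast and a prediction interval, here is a minimal pure-Python sketch of a zero-mean GP with a single fixed squared-exponential (RBF) kernel. The paper's AR component, ARD, combined covariance functions, and hyper-parameter fitting are not reproduced; the residual series in the usage lines is hypothetical.

```python
import math


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gp_predict(train_x, train_y, test_x, noise_var=1e-2, length_scale=1.0, variance=1.0):
    """Posterior mean and latent variance of a zero-mean GP at each test input."""
    k_train = [[rbf_kernel(a, b, length_scale, variance) + (noise_var if i == j else 0.0)
                for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    alpha = solve_linear(k_train, train_y)  # (K + noise * I)^-1 y
    means, variances = [], []
    for x in test_x:
        k_star = [rbf_kernel(x, a, length_scale, variance) for a in train_x]
        means.append(sum(k * w for k, w in zip(k_star, alpha)))
        v = solve_linear(k_train, k_star)
        var = rbf_kernel(x, x, length_scale, variance) - sum(k * w for k, w in zip(k_star, v))
        variances.append(max(var, 0.0))  # guard against tiny negative round-off
    return means, variances


# Hypothetical residual series, e.g. what remains after removing an AR fit
times = [0.0, 1.0, 2.0, 3.0]
residuals = [0.2, -0.1, 0.3, 0.0]
mean, var = gp_predict(times, residuals, [4.0])
half_width = 1.96 * math.sqrt(var[0] + 1e-2)  # add noise variance for a predictive interval
interval = (mean[0] - half_width, mean[0] + half_width)
```

Far from the training inputs the posterior mean falls back to zero and the variance to the prior variance, which is why the interval widens as the forecast moves away from observed data.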
Examining Predictive Validity of Oral Reading Fluency Slope in Upper Elementary Grades Using Quantile Regression.

Science.gov (United States)

Cho, Eunsoo; Capin, Philip; Roberts, Greg; Vaughn, Sharon

2017-07-01

Within multitiered instructional delivery models, progress monitoring is a key mechanism for determining whether a child demonstrates an adequate response to instruction. One measure commonly used to monitor the reading progress of students is oral reading fluency (ORF). This study examined the extent to which ORF slope predicts reading comprehension outcomes for fifth-grade struggling readers (n = 102) participating in an intensive reading intervention. Quantile regression models showed that ORF slope significantly predicted performance on a sentence-level fluency and comprehension assessment, regardless of the students' reading skills, controlling for initial ORF performance. However, ORF slope was differentially predictive of a passage-level comprehension assessment based on students' reading skills when controlling for initial ORF status. Results showed that ORF explained unique variance for struggling readers whose posttest performance was at the upper quantiles at the end of the reading intervention, but slope was not a significant predictor of passage-level comprehension for students whose reading problems were the most difficult to remediate.
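Quantile regression differs from ordinary least squares in its loss: it minimizes the check (pinball) loss at a chosen quantile level τ, which is what lets the effect of ORF slope differ between the lower and upper quantiles of the outcome. A minimal Python sketch of the loss, plus a check that minimizing it over a constant recovers an empirical τ-quantile, is below; fitting the full model with covariates requires a linear-programming solver and is not shown.

```python
def pinball_loss(observed, predicted, tau):
    """Mean check (pinball) loss for quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(observed, predicted):
        residual = y - q
        # under-prediction is weighted by tau, over-prediction by (1 - tau)
        total += tau * residual if residual >= 0 else (tau - 1.0) * residual
    return total / len(observed)


def best_constant(values, tau):
    """Observed value minimizing the pinball loss, i.e. an empirical tau-quantile."""
    return min(values, key=lambda c: pinball_loss(values, [c] * len(values), tau))
```

For example, with the values 1 to 5, τ = 0.5 selects the median (3) and τ = 0.9 selects 5.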
Reliability of CKD-EPI predictive equation in estimating chronic kidney disease prevalence in the Croatian endemic nephropathy area.

Science.gov (United States)

Fuček, Mirjana; Dika, Živka; Karanović, Sandra; Vuković Brinar, Ivana; Premužić, Vedran; Kos, Jelena; Cvitković, Ante; Mišić, Maja; Samardžić, Josip; Rogić, Dunja; Jelaković, Bojan

2018-02-15

Chronic kidney disease (CKD) is a significant public health problem, and its progression to terminal renal failure cannot be precisely predicted. According to current guidelines, CKD stages are classified based on the estimated glomerular filtration rate (eGFR) and albuminuria. The aims of this study were to determine the reliability of a predictive equation in estimating CKD prevalence in Croatian areas with endemic nephropathy (EN), to compare the results with non-endemic areas, and to determine whether the prevalence of CKD stages 3-5 was increased in subjects with EN. A total of 1573 inhabitants of the Croatian Posavina rural area from 6 endemic and 3 non-endemic villages were enrolled. Participants were classified according to the modified World Health Organization criteria for EN. eGFR was calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation. The results showed a very high CKD prevalence in the Croatian rural area (19%). CKD prevalence was significantly higher in EN than in non-EN villages, with the lowest eGFR values in the diseased subgroup. eGFR correlated significantly with the diagnosis of EN. Kidney function assessment using the CKD-EPI predictive equation proved to be a good marker for differentiating the study subgroups and remains one of the diagnostic criteria for EN.
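The abstract does not say which version of the CKD-EPI equation was used; assuming the original 2009 creatinine equation, eGFR = 141 × min(Scr/κ, 1)^α × max(Scr/κ, 1)^(−1.209) × 0.993^age (× 1.018 if female, × 1.159 if Black), with κ = 0.7 (female) or 0.9 (male) and α = −0.329 or −0.411, a Python sketch is below. Note that a race-free refit published in 2021 uses different coefficients, and the stage check here ignores albuminuria.

```python
def ckd_epi_2009(scr_mg_dl, age_years, female, black=False):
    """eGFR (mL/min/1.73 m^2) from the 2009 CKD-EPI creatinine equation (assumed version)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age_years
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def ckd_stage_3_to_5(egfr):
    """Stages 3-5 correspond to eGFR below 60 mL/min/1.73 m^2 (albuminuria not considered)."""
    return egfr < 60.0
```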
Straight line fitting and predictions: On a marginal likelihood approach to linear regression and errors-in-variables models

Science.gov (United States)

Christiansen, Bo

2015-04-01

Linear regression methods are without doubt the most widely used approaches to describe and predict data in the physical sciences. They are often good first-order approximations, and they are generally easier to apply and interpret than more advanced methods. However, even the properties of univariate regression can lead to debate over the appropriateness of various models, as witnessed by the recent discussion about climate reconstruction methods. Before linear regression is applied, important choices have to be made regarding the origins of the noise terms and regarding which of the two variables under consideration should be treated as the independent variable. These decisions are often not easy to make, but they may have a considerable impact on the results. We seek to give a unified probabilistic (Bayesian with flat priors) treatment of univariate linear regression and prediction by taking, as a starting point, the general errors-in-variables model (Christiansen, J. Clim., 27, 2014-2031, 2014). Other versions of linear regression can be obtained as limits of this model. We derive the likelihood of the model parameters and predictands of the general errors-in-variables model by marginalizing over the nuisance parameters. The resulting likelihood is relatively simple and easy to analyze and calculate. The well-known unidentifiability of the errors-in-variables model is manifested as the absence of a well-defined maximum in the likelihood. However, this does not mean that probabilistic inference cannot be made; the marginal likelihoods of model parameters and the predictands have, in general, well-defined maxima. We also include a probabilistic version of classical calibration and show how it is related to the errors-in-variables model. The results are illustrated by an example from the coupling between the lower stratosphere and the troposphere in the Northern Hemisphere winter.
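To illustrate the kind of choice the abstract describes, the Python sketch below uses Deming regression, a classical errors-in-variables estimator. It is not the author's marginal-likelihood method. It needs an assumed ratio δ of the y-error variance to the x-error variance, and as δ grows the slope approaches ordinary least squares of y on x, one example of "other versions of linear regression" arising as limits.

```python
import math


def deming_regression(x, y, delta=1.0):
    """Deming (errors-in-variables) slope and intercept.

    delta is the ASSUMED ratio var(error in y) / var(error in x); delta=1 gives orthogonal regression.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("zero covariance: slope is undefined")
    slope = (syy - delta * sxx + math.sqrt((syy - delta * sxx) ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    return slope, my - slope * mx
```

For the points (1, 1), (2, 3), (3, 2), least squares of y on x gives a slope of 0.5, regressing x on y gives 2 (in y-versus-x terms), and Deming regression with δ = 1 gives 1, showing how much the answer depends on which variable is treated as noisy.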
Predictive Capability of the Compressible MRG Equation for an Explosively Driven Particle with Validation

Science.gov (United States)

Garno, Joshua; Ouellet, Frederick; Koneru, Rahul; Balachandar, Sivaramakrishnan; Rollin, Bertrand

2017-11-01

An analytic model to describe the hydrodynamic forces on an explosively driven particle is not currently available. The Maxey-Riley-Gatignol (MRG) particle force equation generalized for compressible flows is well studied in shock-tube applications and captures the evolution of particle force extracted from controlled shock-tube experiments. In these experiments, only the shock-particle interaction was examined, and the effects of the contact line were not investigated. In the present work, the predictive capability of this model is considered for the case where a particle is explosively ejected from a rigid barrel into ambient air. Particle trajectory information extracted from simulations is compared with experimental data. This configuration ensures that both the shock and the contact produced by the detonation will influence the motion of the particle. The simulations are carried out using a finite-volume Euler-Lagrange code with the JWL equation of state to handle the explosive products. This work was supported by the U.S. Department of Energy, National Nuclear Security Administration, Advanced Simulation and Computing Program, as a Cooperative Agreement under the Predictive Science Academic Alliance Program, under Contract No. DE-NA0002378.
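The simulations rely on the Jones-Wilkins-Lee (JWL) equation of state for the explosive products. Its standard form is p = A(1 − ω/(R1·V))e^(−R1·V) + B(1 − ω/(R2·V))e^(−R2·V) + ωE/V, with V the relative volume and E the internal energy per unit initial volume. The abstract names neither the explosive nor its constants, so this Python sketch takes all parameters as arguments and supplies no values.

```python
import math


def jwl_pressure(v, e, a, b, r1, r2, omega):
    """Jones-Wilkins-Lee (JWL) pressure of detonation products.

    v: relative volume V/V0; e: internal energy per unit initial volume.
    a, b, r1, r2, omega: explosive-specific constants (not given in the abstract).
    a, b, e, and the returned pressure must share consistent units (e.g. GPa).
    """
    if v <= 0:
        raise ValueError("relative volume must be positive")
    term1 = a * (1.0 - omega / (r1 * v)) * math.exp(-r1 * v)
    term2 = b * (1.0 - omega / (r2 * v)) * math.exp(-r2 * v)
    return term1 + term2 + omega * e / v
```

The two exponential terms dominate at high compression near the detonation, while at large expansion the pressure tends to the ideal-gas-like term ωE/V.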
Predictors of course in obsessive-compulsive disorder: logistic regression versus Cox regression for recurrent events.

Science.gov (United States)

Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M

2007-09-01

Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
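The remission definition can be written directly as a rule. The Python sketch below reads "a reduction of seven points" as at least seven points from baseline (an assumption), and also counts how often a patient enters remission over repeated measurements. That count reflects the information a recurrent-events model uses and a single-time-point logistic model ignores. Function names are hypothetical.

```python
def in_remission(baseline, score, cutoff=13, min_reduction=7):
    """Remission: Y-BOCS below the cutoff AND at least min_reduction points below baseline."""
    return score < cutoff and (baseline - score) >= min_reduction


def remission_onsets(baseline, scores):
    """Count transitions into remission across successive measurements (recurrent events)."""
    onsets = 0
    previously = False
    for score in scores:
        now = in_remission(baseline, score)
        if now and not previously:
            onsets += 1
        previously = now
    return onsets
```

For a patient with baseline 25 and later scores 20, 12, 15, 10, a logistic model evaluated at the third measurement sees "no remission", whereas the series contains two separate remission onsets.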
Sparse reduced-rank regression with covariance estimation

KAUST Repository

Chen, Lisha

2014-12-08

Improving the predictive performance of multiple-response regression relative to separate linear regressions is a challenging problem. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

Sparse reduced-rank regression with covariance estimation

KAUST Repository

Chen, Lisha; Huang, Jianhua Z.

2014-01-01

Improving the predictive performance of multiple-response regression relative to separate linear regressions is a challenging problem. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.
Prediction of fat-free body mass from bioelectrical impedance among 9- to 11-year-old Swedish children

DEFF Research Database (Denmark)

Nielsen, Birgit Marie; Dencker, M; Ward, L

2007-01-01

AIM: Predictive equations for estimating body composition from bioelectrical impedance analysis (BIA) among Scandinavian children are lacking. In the present study, equations for estimation of fat-free body mass (FFM) and lean tissue mass (LTM) were developed and cross-validated from BIA using dual...... linear regression and cross-validated against DXA measurements of body composition. RESULTS: FFM was predicted from BIA and anthropometric variables with an adjusted R² = 0.95 and root mean square error (RMSE) = 0.84 kg, and LTM was predicted with an adjusted R² = 0.95 and RMSE = 0.87 kg. Cross......-validation revealed a mean RMSE = 0.95 kg FFM and a mean RMSE = 0.96 kg LTM. Prediction of body composition from equations developed in previous literature was mixed when applied to the present cohort of children. CONCLUSIONS: FFM and LTM are predicted with sufficient accuracy at the population level. We recommend...
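The two fit statistics reported here can be sketched in Python as follows, with adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p predictors. RMSE is taken as the root of the mean squared prediction error (divisor n), which is an assumption: some regression reports instead give a residual standard error with divisor n − p − 1.

```python
import math


def rmse(observed, predicted):
    """Root mean square error with divisor n (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two equal-length, non-empty sequences")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (excluding the intercept)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

In cross-validation, the same RMSE function is applied to predictions for children held out of model fitting, which is why the cross-validated RMSE is typically somewhat larger than the in-sample value.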
Basal Metabolic Rate of Adolescent Modern Pentathlon Athletes: Agreement between Indirect Calorimetry and Predictive Equations and the Correlation with Body Parameters

Science.gov (United States)

Loureiro, Luiz Lannes; Fonseca, Sidnei; Castro, Natalia Gomes Casanova de Oliveira e; dos Passos, Renata Baratta; Porto, Cristiana Pedrosa Melo; Pierucci, Anna Paola Trindade Rocha

2015-01-01

Purpose Accurate estimation of energy needs is crucial for optimal physical performance among athletes, and basal metabolic rate (BMR) equations are often poorly adjusted for adolescent athletes, requiring specific methods such as the gold-standard indirect calorimetry (IC). We therefore aimed to analyse the agreement between the BMR of adolescent pentathletes measured by IC and that estimated by commonly used predictive equations. Methods Twenty-eight athletes (17 males and 11 females) were evaluated for BMR using IC and the predictive equations of Harris and Benedict (HB), Cunningham (CUN), Henry and Rees (HR), and FAO/WHO/UNU (FAO). Body composition was obtained using DXA, and sexual maturity data were retrieved through validated questionnaires. The correlations between anthropometric variables and IC were analysed by Student's t-test and ICC, while the agreement between IC and the predictive equations was analysed by the Bland and Altman method and by survival-agreement plotting. Results The whole-sample average BMR measured by IC was significantly different from that estimated by FAO (pBMR when compared with IC (T Test). When compared with the gold-standard IC using Bland and Altman, ICC, and survival-agreement analyses, the equations underestimated the energy needs of adolescent pentathlon athletes by up to 300 kcal/day. Therefore, they should be used with caution when estimating individual energy requirements in such populations. PMID:26569101
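Two of the equations compared against IC can be sketched in Python. The abstract does not state which versions were used, so this assumes the original 1919 Harris-Benedict coefficients (weight in kg, height in cm, age in years) and the commonly cited Cunningham form, 500 + 22 × fat-free mass (kg). A mean estimated-minus-measured difference shows the direction of any systematic error; negative values indicate underestimation.

```python
def harris_benedict_1919(weight_kg, height_cm, age_years, male):
    """BMR (kcal/day) from the original Harris-Benedict equations (assumed version)."""
    if male:
        return 66.4730 + 13.7516 * weight_kg + 5.0033 * height_cm - 6.7550 * age_years
    return 655.0955 + 9.5634 * weight_kg + 1.8496 * height_cm - 4.6756 * age_years


def cunningham(fat_free_mass_kg):
    """BMR (kcal/day) from fat-free mass, Cunningham form 500 + 22 * FFM."""
    return 500.0 + 22.0 * fat_free_mass_kg


def mean_bias(estimated, measured):
    """Mean of (estimated - measured); negative values mean the equation underestimates."""
    if len(estimated) != len(measured) or not measured:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum(e - m for e, m in zip(estimated, measured)) / len(measured)
```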
Prediction of the enthalpies of vaporization for room-temperature ionic liquids: Correlations and a substitution-based additive scheme

International Nuclear Information System (INIS)

Kabo, Gennady J.; Paulechka, Yauheni U.; Zaitsau, Dzmitry H.; Firaha, Alena S.

2015-01-01

Highlights:
• The available literature data on $\Delta_{\mathrm{l}}^{\mathrm{g}}H$ for ionic liquids were analyzed.
• Correlation equations for $\Delta_{\mathrm{l}}^{\mathrm{g}}H$ were derived using symbolic regression.
• A substitution-based incremental scheme for $\Delta_{\mathrm{l}}^{\mathrm{g}}H$ was developed.
• The proposed scheme has an advantage over the existing predictive procedures.
Abstract: The literature data on the enthalpies of vaporization for aprotic ionic liquids (ILs) published by the end of May 2014 were analyzed, and the most reliable $\Delta_{\mathrm{l}}^{\mathrm{g}}H_{\mathrm{m}}$ values were derived for 68 ILs. The selected enthalpies of vaporization were correlated with density and surface tension using symbolic regression, and a number of effective correlation equations were proposed. The substitution-based incremental scheme for prediction of the enthalpies of vaporization of imidazolium, pyridinium and pyrrolidinium ILs was developed. The standard error of the regression for the developed scheme is significantly lower than that for the atom-based group-contribution schemes proposed earlier.
Weight and height prediction of immobilized patients

OpenAIRE

Rabito, Estela Iraci; Vannucchi, Gabriela Bergamini; Suen, Vivian Marques Miguel; Castilho Neto, Laércio Lopes; Marchini, Júlio Sérgio

2006-01-01

OBJECTIVE: To confirm the adequacy of the formula suggested in the literature and/or to develop appropriate equations for the Brazilian population of immobilized patients based on simple anthropometric measurements. METHODS: Hospitalized patients underwent anthropometry, and methods to estimate the weight and height of bedridden patients were developed by multiple linear regression. RESULTS: Three hundred sixty-eight persons were evaluated at two hospital centers and five weight-predicting...
Five-equation and robust three-equation methods for solution verification of large eddy simulation

Science.gov (United States)

Dutta, Rabijit; Xing, Tao

2018-02-01

This study evaluates the recently developed general framework for solution verification methods for large eddy simulation (LES) using implicitly filtered LES of periodic channel flows at a friction Reynolds number of 395 on eight systematically refined grids. The seven-equation method shows that the coupling error based on Hypothesis I is much smaller than the numerical and modeling errors and can therefore be neglected. The authors recommend the five-equation method based on Hypothesis II, which shows monotonic convergence of the predicted numerical benchmark ($S_C$) and provides realistic error estimates without the need to fix the orders of accuracy of either the numerical or the modeling errors. Based on the results from the seven-equation and five-equation methods, less expensive three- and four-equation methods were derived for practical LES applications. The new three-equation method was found to be robust, as it can be applied to any convergence type and reasonably predicts the error trends. It was also observed that the numerical and modeling errors usually have opposite signs, which suggests that error cancellation plays an essential role in LES. When a Reynolds-averaged Navier-Stokes (RANS) based error estimation method is applied, it shows significant error in the prediction of $S_C$ on coarse meshes. However, it predicts a reasonable $S_C$ when the grids resolve at least 80% of the total turbulent kinetic energy.
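For orientation, the classical three-grid Richardson extrapolation behind most solution-verification methods is sketched in Python below. It lumps all error into a single term with one observed order of accuracy. It is not the authors' five- or three-equation method, which goes further by separating numerical and modeling errors. The extrapolated value plays a role analogous to the benchmark $S_C$.

```python
import math


def richardson_three_grid(s_fine, s_medium, s_coarse, r):
    """Observed order and extrapolated solution from three grids with constant refinement ratio r.

    Classical single-error Richardson extrapolation; assumes monotonic convergence.
    """
    e21 = s_medium - s_fine
    e32 = s_coarse - s_medium
    if e21 == 0 or e32 == 0:
        raise ValueError("solution changes between grids must be nonzero")
    convergence_ratio = e21 / e32
    if not 0.0 < convergence_ratio < 1.0:
        raise ValueError("three-grid estimate assumes monotonic convergence (0 < R < 1)")
    order = math.log(e32 / e21) / math.log(r)
    s_extrapolated = s_fine - e21 / (r ** order - 1.0)
    return order, s_extrapolated
```

With solutions 1.01, 1.04, and 1.16 on grids refined by a factor of 2, the observed order is 2 and the extrapolated value is 1.00. The monotonic-convergence check is exactly what fails for oscillatory or divergent grid studies, the cases a more robust method must handle.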
Classification and regression tree (CART) model to predict pulmonary tuberculosis in hospitalized patients.

Science.gov (United States)

Aguiar, Fabio S; Almeida, Luciana L; Ruffino-Netto, Antonio; Kritski, Afranio Lineu; Mello, Fernanda Cq; Werneck, Guilherme L

2012-08-07

Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a model for predicting pulmonary TB in hospitalized patients in a high-prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Cross-sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, and positive and negative predictive values were used to evaluate the performance of the model. The model was validated with a different sample of patients admitted to the same hospital from January to December 2005. We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with a positive sputum smear), and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value, and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for predicting a TB diagnosis was the chest radiograph result. Prospective validation is still necessary, but our model offers an alternative for deciding whether to isolate patients with clinical suspicion of TB in tertiary health facilities in...
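The four reported measures all come from the 2×2 table of model predictions against confirmed diagnoses. A minimal Python sketch is below; the counts in the example are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true TB cases flagged by the model
        "specificity": tn / (tn + fp),  # share of non-TB patients correctly not flagged
        "ppv": tp / (tp + fp),          # probability of TB given a positive prediction
        "npv": tn / (tn + fn),          # probability of no TB given a negative prediction
    }


# Hypothetical counts for illustration only
metrics = diagnostic_metrics(tp=8, fp=10, tn=30, fn=2)
```

Unlike sensitivity and specificity, PPV and NPV depend on disease prevalence. When most suspected patients do not have TB, a model can have a low PPV but a high NPV, which is the property that matters most when deciding whom it is safe not to isolate.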
Classification and regression tree (CART) model to predict pulmonary tuberculosis in hospitalized patients

Directory of Open Access Journals (Sweden)

Aguiar Fabio S

2012-08-01

Background: Tuberculosis (TB) remains a public health issue worldwide. Because TB lacks specific clinical symptoms, deciding whether to admit a patient to respiratory isolation is difficult for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in hospital patients can increase the quality of care and decrease costs without increasing the risk of hospital transmission. We present a model for predicting pulmonary TB in hospitalized patients in a high-prevalence area, intended to support a more rational use of isolation rooms without increasing the risk of transmission. Methods: Cross-sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, and positive and negative predictive values were used to evaluate the performance of the model. The model was validated on a different sample of patients admitted to the same hospital from January to December 2005. Results: We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear), and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. Conclusions: The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for predicting TB diagnosis was the chest radiograph result. Prospective validation is still necessary, but our model offers an alternative for deciding whether to isolate patients with
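The four threshold metrics reported for the CART model all come from one 2×2 table of model predictions against confirmed diagnoses. The sketch below computes them from hypothetical counts (not the study's data). With a prevalence well below 50%, it shows how a model can pair a modest positive predictive value with a high negative predictive value.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # share of TB patients the model flags
        "specificity": tn / (tn + fp),  # share of non-TB patients the model clears
        "ppv": tp / (tp + fp),          # share of flagged patients who have TB
        "npv": tn / (tn + fn),          # share of cleared patients without TB
    }


# Hypothetical validation counts: 50 patients with TB, 250 without
metrics = diagnostic_metrics(tp=30, fp=60, fn=20, tn=190)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```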
Application of the van der Waals equation of state to polymers. 4. Correlation and prediction of lower critical solution temperatures for polymer solutions

DEFF Research Database (Denmark)

Goncalves, Ana Saraiva; Kontogeorgis, Georgios; Harismiadis, Vassilis I.

1996-01-01

The van der Waals equation of state is used to correlate and predict the lower critical solution behavior of mixtures of a solvent and a polymer. The equation of state parameters for the polymer are estimated from experimental volumetric data at low pressures. The equation...
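For reference, the textbook van der Waals equation for a pure fluid is P = RT/(V_m − b) − a/V_m². Here a measures intermolecular attraction and b the excluded (co-)volume. The sketch evaluates it with literature-style constants for CO2, chosen only as an illustration. The truncated abstract does not say how the paper adapts a and b to polymers and mixtures, so that step is not shown.

```python
R = 8.314462618  # gas constant, J/(mol K)


def vdw_pressure(T: float, Vm: float, a: float, b: float) -> float:
    """van der Waals pressure (Pa) at temperature T (K) and molar volume Vm (m^3/mol)."""
    if Vm <= b:
        raise ValueError("molar volume must exceed the co-volume b")
    return R * T / (Vm - b) - a / Vm**2  # repulsive term minus attractive term


# Illustrative CO2 constants: a in Pa m^6/mol^2, b in m^3/mol
p = vdw_pressure(T=300.0, Vm=1.0e-3, a=0.3640, b=4.267e-5)
p_ideal = R * 300.0 / 1.0e-3
print(f"van der Waals: {p / 1e5:.2f} bar, ideal gas: {p_ideal / 1e5:.2f} bar")
```

Setting a = b = 0 recovers the ideal-gas law. At this state the attractive term outweighs the co-volume correction, so the van der Waals pressure falls below the ideal-gas value.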
Fault prediction for nonlinear stochastic system with incipient faults based on particle filter and nonlinear regression.

Science.gov (United States)

Ding, Bo; Fang, Huajing

2017-05-01

This paper is concerned with fault prediction for nonlinear stochastic systems with incipient faults. Based on the particle filter and a reasonable assumption about the incipient faults, a modified fault estimation algorithm is proposed, and the system state is estimated simultaneously. Based on the modified fault estimation, an intuitive fault detection strategy is introduced. Once an incipient fault is detected, its parameters are identified by a nonlinear regression method. Then, based on the estimated parameters, the future fault signal can be predicted. Finally, the effectiveness of the proposed method is verified by simulations of the three-tank system. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
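The abstract does not give the fault model. Purely for illustration, the sketch below assumes an incipient fault that grows as f(t) = a·exp(b·t), and it takes a sequence of fault estimates as given (the paper obtains these from its particle filter, which is not shown). It identifies a and b by Gauss-Newton nonlinear least squares, starting from a log-linear fit, and then extrapolates the fault signal.

```python
import math


def fit_exponential_fault(ts, fs, max_iter=50, tol=1e-12):
    """Fit f(t) = a*exp(b*t) to positive fault estimates by Gauss-Newton."""
    n = len(ts)
    # Starting values from ordinary least squares on log f = log a + b*t
    logs = [math.log(f) for f in fs]
    t_mean, l_mean = sum(ts) / n, sum(logs) / n
    b = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    a = math.exp(l_mean - b * t_mean)
    for _ in range(max_iter):
        # Accumulate J^T J and J^T r for the two-parameter Jacobian
        jaa = jab = jbb = ga = gb = 0.0
        for t, f in zip(ts, fs):
            e = math.exp(b * t)
            r = f - a * e            # residual
            da, db = e, a * t * e    # partial derivatives w.r.t. a and b
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        det = jaa * jbb - jab * jab
        step_a = (jbb * ga - jab * gb) / det  # solve the 2x2 normal equations
        step_b = (jaa * gb - jab * ga) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b


# Hypothetical fault estimates with a small alternating disturbance
ts = list(range(10))
fs = [0.05 * math.exp(0.3 * t) * (1 + 0.01 * (-1) ** t) for t in ts]
a_hat, b_hat = fit_exponential_fault(ts, fs)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}, predicted f(15) = {a_hat * math.exp(b_hat * 15):.3f}")
```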
Inference for multivariate regression model based on multiply imputed synthetic data generated via posterior predictive sampling

Science.gov (United States)

Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.

2017-06-01

The recent popularity of synthetic data as a Statistical Disclosure Control technique has led to several methods for generating and analyzing such data. However, these methods almost always rely on asymptotic distributions and are consequently inadequate for small-sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Because it is based on exact distributions, this procedure can be used even with small-sample datasets. Simulation studies compare the results of the proposed exact inferential procedure with those of an adaptation of Reiter's combination rule to multiply imputed synthetic datasets, and an application to the 2000 Current Population Survey is discussed.
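The comparison baseline is a combination rule. The sketch below shows the scalar form of Reiter's (2003) rule, assuming partially synthetic data: average the m point estimates, and add the mean within-dataset variance to the between-dataset variance divided by m. The paper's multivariate adaptation, and the rule for fully synthetic data, differ from this, and the estimates below are made up.

```python
from statistics import mean, variance


def combine_partially_synthetic(estimates, variances):
    """Scalar combining rule for m partially synthetic datasets.

    estimates: point estimates q_i from each synthetic dataset
    variances: their estimated sampling variances u_i
    Returns (q_bar, T_p).
    """
    m = len(estimates)
    q_bar = mean(estimates)    # combined point estimate
    b_m = variance(estimates)  # between-dataset variance (divisor m - 1)
    u_bar = mean(variances)    # average within-dataset variance
    t_p = u_bar + b_m / m      # total variance for partially synthetic data
    return q_bar, t_p


q_bar, t_p = combine_partially_synthetic([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(q_bar, t_p)
```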
[From clinical judgment to linear regression model].

Science.gov (United States)

Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

2013-01-01

When we think about mathematical models such as the linear regression model, we tend to assume that these terms are used only by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful for predicting or describing the relationship between two or more variables, provided the dependent variable is quantitative and normally distributed. Stated another way, regression is used to predict a measure from knowledge of at least one other variable. The first objective of linear regression is to determine the slope, or inclination, of the regression line Y = a + bX. Here "a" is the intercept, or regression constant, which equals the value of Y when X equals 0. "b" (the slope) indicates the change in Y when X increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the proportion of the variance in the outcome explained by the independent variables.
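The quantities just described (intercept a, slope b, and R²) can be computed directly by ordinary least squares. The minimal sketch below uses a small hypothetical data set. It estimates b as the covariance of X and Y divided by the variance of X, sets a = mean(Y) − b·mean(X), and computes R² as 1 − SS_residual/SS_total.

```python
def simple_linear_regression(xs, ys):
    """Least-squares fit of Y = a + bX; returns (a, b, r_squared)."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx               # slope: change in Y per one-unit change in X
    a = y_mean - b * x_mean     # intercept: value of Y when X = 0
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical paired measurements
a, b, r2 = simple_linear_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"Y = {a:.2f} + {b:.2f}X, R^2 = {r2:.3f}")
```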
Computing confidence and prediction intervals of industrial equipment degradation by bootstrapped support vector regression

International Nuclear Information System (INIS)

Lins, Isis Didier; Droguett, Enrique López; Moura, Márcio das Chagas; Zio, Enrico; Jacinto, Carlos Magno

2015-01-01

Data-driven learning methods for predicting the evolution of the degradation processes affecting equipment are becoming increasingly attractive in reliability and prognostics applications. Among these, we consider here Support Vector Regression (SVR), which has provided promising results in various applications. However, SVR provides only point estimates, whereas better-informed decisions also require an assessment of uncertainty. To this end, we apply the bootstrap to SVR to obtain confidence and prediction intervals. This requires no assumption about probability distributions and performs well even when only a small data set is available. The bootstrapped SVR is first verified in Monte Carlo experiments and then applied to a real case study concerning the prediction of degradation of a component from the offshore oil industry. The results indicate that the bootstrapped SVR is a promising tool for providing reliable point and interval estimates, which can inform maintenance-related decisions on degrading components.
Highlights:
• Bootstrap (pairs/residuals) and SVR are used as an uncertainty analysis framework.
• Numerical experiments are performed to assess accuracy and coverage properties.
• More bootstrap replications do not significantly improve performance.
• Degradation of equipment of offshore oil wells is estimated by bootstrapped SVR.
• Estimates of the scale growth rate can support maintenance-related decisions.
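To keep the example short and dependency-free, the sketch below replaces SVR with an ordinary least-squares line. It illustrates the pairs bootstrap named in the highlights: resample (x, y) pairs with replacement, refit, and take percentiles of the refitted predictions. The result is a percentile confidence interval for the fitted value only. A prediction interval would also need resampled residuals, which are not shown. The degradation data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope (stand-in for the SVR fit)."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    slope = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sum((x - xm) ** 2 for x in xs)
    return ym - slope * xm, slope


def pairs_bootstrap_interval(xs, ys, x_new, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval for the fitted value at x_new, obtained by
    resampling (x, y) pairs with replacement and refitting each resample."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    while len(preds) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # all resampled x equal: slope undefined, draw again
        by = [ys[i] for i in idx]
        a, b = fit_line(bx, by)
        preds.append(a + b * x_new)
    preds.sort()
    lower = preds[int(alpha / 2 * n_boot)]
    upper = preds[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical degradation measurements at inspection times 1..8
times = [1, 2, 3, 4, 5, 6, 7, 8]
wear = [0.10, 0.22, 0.29, 0.41, 0.52, 0.58, 0.71, 0.79]
a_hat, b_hat = fit_line(times, wear)
low, high = pairs_bootstrap_interval(times, wear, x_new=10)
print(f"point estimate {a_hat + b_hat * 10:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```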
mPLR-Loc: an adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction.

Science.gov (United States)

Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

2015-03-15

Proteins must be located in the appropriate cellular compartments to exert their biological functions. Prediction of protein subcellular localization by computational methods is required in the post-genomic era. Recent studies have focused on predicting not only single-location proteins but also multi-location proteins. However, most existing predictors are far from effective at tackling the challenges of multi-label proteins. This article proposes an efficient multi-label predictor, mPLR-Loc, for predicting both single- and multi-location proteins; it is based on penalized logistic regression and adaptive decisions. Specifically, for each query protein, mPLR-Loc exploits information from the Gene Ontology (GO) database by using its accession number (AC) or the ACs of its homologs obtained via BLAST. The frequencies of GO occurrences are used to construct feature vectors, which are then classified by an adaptive decision-based multi-label penalized logistic regression classifier. Experimental results on two recent stringent benchmark datasets (virus and plant) show that mPLR-Loc substantially outperforms existing state-of-the-art multi-label predictors. In addition to rapidly and accurately predicting the subcellular localization of single- and multi-label proteins, mPLR-Loc also provides probabilistic confidence scores for its prediction decisions. For readers' convenience, the mPLR-Loc server is available online (http://bioinfo.eie.polyu.edu.hk/mPLRLocServer). Copyright © 2014 Elsevier Inc. All rights reserved.
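The sketch below illustrates the general shape of such a classifier, under stated assumptions. One L2-penalized logistic regression is trained per location (one-vs-rest) on GO-term frequency vectors. A hypothetical adaptive decision rule then keeps every location whose probability is at least θ times the top probability. The abstract does not specify mPLR-Loc's actual penalty, training procedure or decision rule, and the GO vectors and labels below are toy values.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_l2_logistic(X, y, lam=0.1, lr=0.5, epochs=500):
    """Binary logistic regression with an L2 penalty, fitted by batch gradient descent.

    Minimizes mean log-loss + (lam / 2) * ||w||^2; the bias is not penalized.
    """
    n, d = len(X), len(X[0])
    w, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the penalty term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(bias + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        bias -= lr * grad_b
    return w, bias


def predict_locations(models, x, theta=0.8):
    """Hypothetical adaptive decision: keep every label whose probability is at
    least theta times the highest label probability (the top label always stays)."""
    probs = {label: sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
             for label, (w, b) in models.items()}
    p_max = max(probs.values())
    return sorted(label for label, p in probs.items() if p >= theta * p_max)


# Toy GO-term frequency vectors (3 hypothetical GO terms) and multi-label targets
X = [[1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1], [1, 1, 0]]
targets = {
    "nucleus":   [1, 1, 0, 0, 1],
    "cytoplasm": [0, 0, 1, 1, 1],
}
models = {label: train_l2_logistic(X, y) for label, y in targets.items()}
print(predict_locations(models, [1, 0, 0]))  # single-location query
print(predict_locations(models, [1, 1, 0]))  # multi-location query
```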
A predictive group-contribution simplified PC-SAFT equation of state: Application to polymer systems

DEFF Research Database (Denmark)

Tihic, Amra; Kontogeorgis, Georgios; von Solms, Nicolas

2008-01-01

A group-contribution (GC) method is coupled with the molecular-based perturbed-chain statistical associating fluid theory (PC-SAFT) equation of state (EoS) to predict its characteristic pure compound parameters. The estimation of group contributions for the parameters is based on a parameter...... are the molecular structure of the polymer of interest in terms of functional groups and a single binary interaction parameter for accurate mixture calculations....
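The abstract is truncated before the method details, so the following is only a generic sketch of the additive group-contribution idea. It assumes that the segment number m and the products mσ³ and mε/k are each sums of group values, which is an assumption here. The group values are made up and are not published parameters. For a polymer, the group counts scale with the number of repeat units.

```python
# Hypothetical group values for illustration only (not published parameters)
GROUP_VALUES = {
    "CH3": {"m": 0.6, "m_sigma3": 30.0, "m_eps_k": 120.0},
    "CH2": {"m": 0.4, "m_sigma3": 20.0, "m_eps_k": 100.0},
}


def gc_pure_parameters(group_counts):
    """Sum group contributions, then recover segment number m, segment
    diameter sigma (in the length unit implied by the group values) and eps/k."""
    m = sum(n * GROUP_VALUES[g]["m"] for g, n in group_counts.items())
    m_sigma3 = sum(n * GROUP_VALUES[g]["m_sigma3"] for g, n in group_counts.items())
    m_eps_k = sum(n * GROUP_VALUES[g]["m_eps_k"] for g, n in group_counts.items())
    return m, (m_sigma3 / m) ** (1.0 / 3.0), m_eps_k / m


# A molecule built from two CH3 and two CH2 groups
m, sigma, eps_k = gc_pure_parameters({"CH3": 2, "CH2": 2})
print(m, sigma, eps_k)
```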
GPS-VTEC near the magnetic equator during a high solar activity year: Observations and IRI predictions

International Nuclear Information System (INIS)

Ezquer, R.G.; Mosert, M.; Brunini, C.; Meza, A.; Cabrera, M.A.; Araoz, L.; Radicella, S.M.

2002-01-01

The validity of the International Reference Ionosphere (IRI) model for predicting the vertical total electron content (VTEC) over Arequipa (-16.5°, 289.0°; geomagnetic latitude -5.1°), a station near the magnetic equator, is checked. VTEC measurements obtained from GPS satellite signals during 2000 are considered; these data correspond to equinoxes and solstices. The results are similar to those obtained for the southern peak of the equatorial anomaly in previous work. In some cases, good VTEC predictions were observed for the hours of maximum ionisation. Overestimation was observed for nighttime, sunrise and sunset hours. The disagreements between predictions and measurements could arise because the peak characteristics, the shape of the N profile, or both, are not well predicted. More studies including ionosonde measurements would be useful. (author)
Prediction of Fecal Nitrogen and Fecal Phosphorus Content for Lactating Dairy Cows in Large-scale Dairy Farms

Directory of Open Access Journals (Sweden)

QU Qing-bo

2017-05-01

Efficient and sustainable manure management that reduces potential pollution requires precise prediction of fecal nutrient content. The aim of this study was to build models predicting the fecal nitrogen and phosphorus content of Chinese Holstein lactating dairy cows from dietary nutrient composition, days in milk, milk yield and body weight. Twenty dietary nutrient compositions and 60 fecal samples were collected from lactating dairy cows on 7 large-scale dairy farms in Tianjin City, and the fecal nitrogen and phosphorus contents were analyzed. The whole data set was divided into a training data set and a testing data set. The training data set, comprising 14 dietary nutrient compositions and 48 fecal samples, was used to develop the prediction models. The relationship between fecal nitrogen or phosphorus content and dietary nutrient composition was examined by correlation and regression analysis using SAS software. Fecal nitrogen (FN) content was highly positively correlated with organic matter intake (OMI) and crude fat intake (CFI), with correlation coefficients of 0.836 and 0.705, respectively. Fecal phosphorus (FP) content was negatively correlated with body weight (BW), with a correlation coefficient of -0.525. Among the approaches used to develop prediction models, multiple linear regression equations had higher coefficients of determination than simple linear regression equations. Specifically, fecal nitrogen content was well predicted by milk yield (MY), days in milk (DIM), organic matter intake (OMI) and nitrogen intake (NI), with the model y = 0.43 + 0.29×MY + 0.02×DIM + 0.92×OMI - 13.01×NI (R² = 0.96). For FP content, the highest coefficient of determination of a prediction equation was 0.62, obtained when body weight (BW), phosphorus intake (PI) and nitrogen intake (NI) were combined as predictors. The prediction
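The reported fecal nitrogen equation can be applied directly as a function of its four predictors. The sketch below does this with hypothetical input values. The abstract does not state the units of MY, DIM, OMI, NI or the predicted content, so the inputs must be expressed in whatever units the original study used.

```python
def predict_fecal_nitrogen(my: float, dim: float, omi: float, ni: float) -> float:
    """Fecal N content from the reported multiple linear regression (R^2 = 0.96).

    my: milk yield, dim: days in milk, omi: organic matter intake,
    ni: nitrogen intake (units as in the original study, not given here).
    """
    return 0.43 + 0.29 * my + 0.02 * dim + 0.92 * omi - 13.01 * ni


# Hypothetical cow: only illustrates the arithmetic of the equation
print(round(predict_fecal_nitrogen(my=30, dim=100, omi=20, ni=0.6), 3))
```

Note the large negative coefficient on nitrogen intake. Within a multiple regression, a coefficient reflects NI's effect with the other predictors held fixed, so its sign need not match the simple correlation between NI and FN.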
Experimental analysis and regression prediction of desiccant wheel behavior in high temperature heat pump and desiccant wheel air-conditioning system

DEFF Research Database (Denmark)

Sheng, Ying; Zhang, Yufeng; Sun, Yuexia

2014-01-01

The objectives of this study are to evaluate the performance of a desiccant wheel (DW) in the running system and to obtain useful data for practical application. The combined influences of multiple variables on the performance of the desiccant wheel are investigated based on evaluating the indexes...... of moisture removal capacity, dehumidification effectiveness, dehumidification coefficient of performance and sensible energy ratio. The results show that dehumidification is affected more by the regeneration temperature and the outdoor air humidity ratio than by the outdoor air temperature...... and the ratio between regeneration and process air flow rates. A simple method based on multiple linear regression theory is proposed for predicting the performance of the wheel. The predicted values agree well with the experimental data. Regression models are established...
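A multiple linear regression model of this kind can be fitted by solving the normal equations (XᵀX)β = Xᵀy. The minimal sketch below uses Gaussian elimination and regresses a performance index on regeneration temperature and outdoor air humidity ratio. The data are hypothetical, generated exactly from y = 1 + 0.05·T + 0.8·w so that the recovered coefficients can be checked. The study's actual regressors and coefficients are not given in the abstract.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_mlr(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical runs: (regeneration temperature in deg C, outdoor humidity ratio in g/kg)
runs = [(60, 10), (70, 12), (80, 14), (90, 11), (65, 15), (85, 9)]
response = [1 + 0.05 * t + 0.8 * w for t, w in runs]
coef = fit_mlr(runs, response)
print([round(c, 4) for c in coef])
```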
Predictive equations for lumbar spine loads in load-dependent asymmetric one- and two-handed lifting activities.

Science.gov (United States)

Arjmand, N; Plamondon, A; Shirazi-Adl, A; Parnianpour, M; Larivière, C

2012-07-01

Asymmetric lifting activities are associated with low back pain. A finite element biomechanical model is used to estimate spinal loads during one- and two-handed asymmetric static lifting activities. The model input variables are thorax flexion angle, load magnitude, and load sagittal and lateral positions. The response variables are L4-L5 and L5-S1 disc compression and shear forces. Several levels are considered for each input variable, and all possible combinations of these levels are introduced into the model. Robust yet user-friendly predictive equations are established that relate spinal loads (model responses) to task (input) variables, with adequate goodness-of-fit (R² ranging from ~94% to 99%, P≤0.001). Contour plots are used to identify combinations of task variable levels that yield spine loads beyond the recommended limits. The effect of uncertainties in the measurement of asymmetry-related inputs on spinal loads is studied. Several issues regarding the NIOSH asymmetry multiplier are discussed. The authors conclude that this multiplier should depend on trunk posture and be defined in terms of the load's vertical and horizontal positions. Because of an imprecise adjustment of the handled load magnitude, this multiplier inadequately controls the biomechanical loading of the spine. Ergonomists and bioengineers face a dilemma between complex but more accurate models and simple but less accurate ones; they now have easy-to-use predictive equations that quantify spinal loads under various occupational tasks. Copyright © 2011 Elsevier Ltd. All rights reserved.
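Introducing "all possible combinations" of input levels into the model is a full factorial design. The sketch below enumerates such a design with hypothetical levels for the four task variables. The abstract does not list the levels actually used. Each resulting run would be one simulation of the finite element model (not shown), and the predictive equations are then fitted to the responses across all runs.

```python
from itertools import product

# Hypothetical levels for each task variable (illustrative values only)
LEVELS = {
    "thorax_flexion_deg": [0, 30, 60],
    "load_N": [50, 100, 150],
    "load_sagittal_cm": [20, 40, 60],
    "load_lateral_cm": [0, 20, 40],
}

# Every combination of levels; each dict is one model run
runs = [dict(zip(LEVELS, combo)) for combo in product(*LEVELS.values())]
print(len(runs))  # 81 = 3 * 3 * 3 * 3
print(runs[0])
```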
Regional differences in prediction models of lung function in Germany

Directory of Open Access Journals (Sweden)

Schäper Christoph

2010-04-01

Background: Little is known about how specific characteristics influence lung function in different populations. The aim of this analysis was to determine whether lung function determinants differ between subpopulations within Germany, and whether prediction equations developed for one subpopulation are also adequate for another. Methods: Within three studies (KORA C, SHIP-I, ECRHS-I) in different areas of Germany, 4059 adults performed lung function tests. The available data consisted of forced expiratory volume in one second (FEV1), forced vital capacity (FVC) and peak expiratory flow rate (PEF). For each study, multivariate regression models were developed to predict lung function, and Bland-Altman plots were used to evaluate the agreement between predicted and measured values. Results: The final regression equations showed adjusted R² values between 0.65 and 0.75 for FEV1 and FVC, and between 0.46 and 0.61 for PEF. In all studies, gender, age, height and pack-years were significant determinants, each with a similar effect size. For other predictors there were some differences between the studies, although these were not statistically significant. Bland-Altman plots indicated that the regression models for each individual study adequately predict medium (i.e., normal, but not extremely high or low) lung function values in the whole study population. Conclusions: Simple models with gender, age and height explain a substantial part of lung function variance, whereas further determinants add less than 5% to the total explained R², at least for FEV1 and FVC. Thus, for different adult subpopulations of Germany, one simple model for each lung function measure is still sufficient.
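A Bland-Altman comparison plots the difference between measured and predicted values against their mean. It summarizes agreement by the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 × SD of the differences. The sketch below computes these quantities for hypothetical FEV1 values in litres.

```python
from statistics import mean, stdev


def bland_altman(measured, predicted):
    """Return (bias, lower limit, upper limit, pairwise means for the x-axis)."""
    diffs = [m - p for m, p in zip(measured, predicted)]
    averages = [(m + p) / 2 for m, p in zip(measured, predicted)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd, averages


# Hypothetical measured vs regression-predicted FEV1 (L)
bias, lower, upper, averages = bland_altman([3.0, 3.5, 4.0, 2.5], [2.9, 3.6, 3.8, 2.6])
print(f"bias {bias:.3f} L, limits of agreement [{lower:.3f}, {upper:.3f}] L")
```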

A Regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data

Directory of Open Access Journals (Sweden)

Ruzzo Walter L

2006-03-01

Background: As a variety of functional genomic and proteomic techniques become available, there is an increasing need for functional analysis methodologies that integrate heterogeneous data sources. Methods: In this paper, we address this issue by proposing a general framework for gene function prediction based on the k-nearest-neighbor (KNN) algorithm. KNN was chosen for its simplicity, its flexibility in incorporating different data types and its adaptability to irregular feature spaces. A weakness of traditional KNN methods, especially with heterogeneous data, is that performance depends on the often ad hoc choice of similarity metric. To address this weakness, we apply regression methods to infer a similarity metric as a weighted combination of a set of base similarity measures. This helps to locate the neighbors that are most likely to be in the same class as the target gene. We also suggest a novel voting scheme to generate confidence scores that estimate the accuracy of predictions. The method extends gracefully to multi-way classification problems. Results: We apply this technique to gene function prediction according to three well-known Escherichia coli classification schemes suggested by biologists, using information derived from microarray and genome sequencing data. We demonstrate that our algorithm dramatically outperforms naive KNN methods and is competitive with support vector machine (SVM) algorithms for integrating heterogeneous data. We also show that combining different data sources can significantly improve prediction accuracy. Conclusion: Our extension of KNN with automatic feature weighting, multi-class prediction, and probabilistic inference enhances prediction accuracy significantly while remaining efficient, intuitive and flexible. This general framework can also be applied to similar classification problems involving heterogeneous datasets.
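The core idea can be sketched as follows. Neighbors are ranked by a weighted sum of base similarity measures, and the k most similar genes then vote. The weights here are simply given; in the paper they are inferred by regression. A plain vote share stands in for the paper's novel confidence-scoring scheme, which the abstract does not describe. Gene names, similarities and labels are hypothetical.

```python
from collections import Counter


def knn_with_combined_similarity(neighbors, weights, k=3):
    """Rank training genes by a weighted sum of base similarities to the query,
    then take a majority vote over the k most similar genes.

    neighbors: list of (gene_id, base_similarities, label)
    weights:   one weight per base similarity measure (assumed already learned)
    Returns (predicted_label, vote_share).
    """
    scored = sorted(
        neighbors,
        key=lambda item: sum(w * s for w, s in zip(weights, item[1])),
        reverse=True,
    )
    votes = Counter(label for _, _, label in scored[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k


# Hypothetical base similarities to the query gene: (expression, sequence)
training = [
    ("geneA", (0.9, 0.2), "transport"),
    ("geneB", (0.8, 0.7), "transport"),
    ("geneC", (0.1, 0.9), "metabolism"),
    ("geneD", (0.3, 0.4), "metabolism"),
    ("geneE", (0.7, 0.6), "metabolism"),
]
result = knn_with_combined_similarity(training, weights=(0.6, 0.4), k=3)
print(result)
```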
Performance Comparison Between Support Vector Regression and Artificial Neural Network for Prediction of Oil Palm Production

Directory of Open Access Journals (Sweden)

Mustakim Mustakim

2016-02-01

Riau Province, the largest oil palm-producing region in Indonesia, relies on oil palm to improve the welfare of its society and its economy. Oil palm production in Riau has increased significantly in every period. To estimate production over the next few years, predictions were made from the last 8 years of time series data (2005-2013), comparing the performance of Support Vector Regression (SVR) and an Artificial Neural Network (ANN). In the experiments, SVR produced a better model than ANN. With the Radial Basis Function (RBF) kernel, SVR achieved a correlation coefficient of 95% and an MSE of 6%. ANN achieved only 74% for R² and 9% for MSE in the 8th experiment, with 20 hidden neurons and a learning rate of 0.1. The SVR model's predictions for the next 3 years are 3% to 6% higher than the actual data and the RBF model predictions.
Multi-omics facilitated variable selection in Cox-regression model for cancer prognosis prediction.

Science.gov (United States)

Liu, Cong; Wang, Xujun; Genchev, Georgi Z; Lu, Hui

2017-07-15

New developments in high-throughput genomic technologies have enabled the measurement of diverse types of omics biomarkers in a cost-efficient and clinically feasible manner. Developing computational methods and tools for analyzing such genomic data and translating it into clinically relevant information is an ongoing and active area of investigation. For example, several studies have used an unsupervised learning framework to cluster patients by integrating omics data. Despite such recent advances, predicting cancer prognosis using integrated omics biomarkers remains a challenge, and there is a shortage of computational tools for predicting cancer prognosis with supervised learning methods. The current standard approach is to fit a Cox regression model to a linear concatenation of the different types of omics data, optionally adding a penalty for feature selection. A more powerful approach, however, would be to incorporate the data by considering the relationships among omics data types. Here we developed two methods, SKI-Cox and wLASSO-Cox, that incorporate the association among different types of omics data. Both methods fit the Cox proportional hazards model and predict a risk score based on mRNA expression profiles. SKI-Cox uses the information from the additional omics data types to guide variable selection, while wLASSO-Cox incorporates this information as a penalty factor during model fitting. In simulation studies, the SKI-Cox and wLASSO-Cox models select more true variables than a LASSO-Cox model. We assess the performance of SKI-Cox and wLASSO-Cox using TCGA glioblastoma multiforme and lung adenocarcinoma data. In each case, mRNA expression, methylation, and copy number variation data are integrated to predict the overall survival time of cancer patients. Our methods achieve better performance in predicting patients' survival in glioblastoma and lung adenocarcinoma. Copyright © 2017. Published by Elsevier
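The idea of incorporating external information "as a penalty factor" can be illustrated with the objective a weighted LASSO Cox model minimizes. That objective is the negative log partial likelihood plus λ Σ_j w_j |β_j|, where a smaller w_j makes feature j cheaper to keep. The sketch below evaluates this objective with Breslow handling of tied times. It is an assumption-labeled illustration: the optimizer, the data and the penalty factors are hypothetical, and how wLASSO-Cox derives its factors from methylation and copy-number data is not given in the abstract.

```python
import math


def cox_neg_log_partial_likelihood(beta, X, times, events):
    """Negative log partial likelihood of a Cox model (Breslow handling of ties)."""
    eta = [sum(b * x for b, x in zip(beta, xi)) for xi in X]  # linear predictors
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:  # only observed events contribute a term
            risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
            nll -= eta[i] - math.log(sum(math.exp(eta[j]) for j in risk_set))
    return nll


def weighted_lasso_cox_objective(beta, X, times, events, lam, penalty_factors):
    """Cox loss plus a feature-specific L1 penalty lam * sum_j w_j * |beta_j|."""
    penalty = lam * sum(w * abs(b) for w, b in zip(penalty_factors, beta))
    return cox_neg_log_partial_likelihood(beta, X, times, events) + penalty


# Hypothetical data: two mRNA features, survival times, events (1 = death observed)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
factors = [0.2, 1.0]  # hypothetical: feature 0 is "supported" by other omics
obj = weighted_lasso_cox_objective([0.5, -0.2], X, times, events, lam=0.1, penalty_factors=factors)
print(obj)
```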
Prediction accuracy and stability of regression with optimal scaling transformations

NARCIS (Netherlands)

Kooij, van der Anita J.

2007-01-01

The central topic of this thesis is the CATREG approach to nonlinear regression. This approach finds optimal quantifications for categorical variables and/or nonlinear transformations for numerical variables in regression analysis. (CATREG is implemented in SPSS Categories by the author of the
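The notion of an "optimal quantification" can be illustrated in its simplest case: one nominal predictor in a regression on y. Here the category values that maximize R² are the category means of y, up to a linear rescaling. The sketch below computes them and standardizes the quantified variable. CATREG itself generalizes this to several predictors and other scaling levels by alternating least squares, which is not shown. The data are hypothetical.

```python
from statistics import mean, pstdev


def nominal_quantification(categories, y):
    """Quantify a nominal predictor for regression on y: each category gets its
    mean response, then the quantified variable is standardized (mean 0, sd 1)."""
    groups = {}
    for c, v in zip(categories, y):
        groups.setdefault(c, []).append(v)
    cat_means = {c: mean(vals) for c, vals in groups.items()}
    quantified = [cat_means[c] for c in categories]
    mu, sd = mean(quantified), pstdev(quantified)
    return {c: (m - mu) / sd for c, m in cat_means.items()}


categories = ["low", "mid", "high", "low", "high", "mid"]
y = [1.0, 5.0, 9.0, 2.0, 8.0, 4.0]
q = nominal_quantification(categories, y)
print(q)
```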
Predicting subject-driven actions and sensory experience in a virtual world with relevance vector machine regression of fMRI data.

Science.gov (United States)

Valente, Giancarlo; De Martino, Federico; Esposito, Fabrizio; Goebel, Rainer; Formisano, Elia

2011-05-15

In this work we illustrate the approach of the Maastricht Brain Imaging Center to the PBAIC 2007 competition, in which participants had to predict subject-driven actions and sensory experience in a virtual world from fMRI measurements of brain activity. After standard preprocessing (slice scan time correction, motion correction), we generated rating predictions using linear Relevance Vector Machine (RVM) learning from all brain voxels. Spatial and temporal filtering of the time series was optimized separately for each rating. For some ratings (e.g. Instructions, Hits, Faces, Velocity), linear RVM regression was accurate and very consistent within and between subjects. For other ratings (e.g. Arousal, Valence), results were less satisfactory. Our approach ranked second overall. To investigate the role of different brain regions in rating prediction, we generated predictive maps, i.e. maps of the weighted contribution of each voxel to the predicted rating. These maps generally included (but were not limited to) "specialized" regions consistent with results from conventional neuroimaging studies and known functional neuroanatomy. In conclusion, Sparse Bayesian Learning models such as RVM appear to be a valuable approach to the multivariate regression of fMRI time series. The Automatic Relevance Determination criterion is particularly suitable and provides good generalization despite the limited number of samples typically available in fMRI. Predictive maps reveal multi-voxel patterns of brain activity that predict perceptual and behavioral subjective experience. Copyright © 2010 Elsevier Inc. All rights reserved.
Model Compaction Equation

African Journals Online (AJOL)

The proposed model compaction equation was derived from Niger Delta data and relates porosity to depth for sandstones under hydrostatic pressure conditions. The equation is useful for predicting porosity and compaction trends in hydrostatic sands of the Niger Delta.
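The abstract does not give the form of the proposed equation. As a stand-in, the sketch below uses Athy's exponential porosity-depth relation, φ(z) = φ0·exp(−c·z), a common choice for normally compacted (hydrostatic) sediments. It shows how such a relation predicts porosity at depth and, inverted, the depth at which a given porosity is reached. The surface porosity φ0 and compaction coefficient c are hypothetical, not Niger Delta values.

```python
import math


def porosity_at_depth(z_m: float, phi0: float, c_per_m: float) -> float:
    """Athy-type porosity (fraction) at depth z_m metres."""
    return phi0 * math.exp(-c_per_m * z_m)


def depth_for_porosity(phi: float, phi0: float, c_per_m: float) -> float:
    """Invert the exponential trend: depth (m) at which porosity equals phi."""
    return math.log(phi0 / phi) / c_per_m


PHI0, C = 0.40, 4.0e-4  # hypothetical surface porosity and compaction coefficient (1/m)
for z in (0, 1000, 2000, 3000):
    print(z, round(porosity_at_depth(z, PHI0, C), 3))
```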
Development and implementation of a regression model for predicting recreational water quality in the Cuyahoga River, Cuyahoga Valley National Park, Ohio 2009-11

Science.gov (United States)

Brady, Amie M.G.; Plona, Meg B.

2012-01-01

The Cuyahoga River within Cuyahoga Valley National Park (CVNP) is at times impaired for recreational use due to elevated concentrations of Escherichia coli (E. coli), a fecal-indicator bacterium. During the recreational seasons of mid-May through September during 2009–11, samples were collected 4 days per week and analyzed for E. coli concentrations at two sites within CVNP. Other water-quality and environmental data, including turbidity, rainfall, and streamflow, were measured and (or) tabulated for analysis. Regression models developed to predict recreational water quality in the river were implemented during the recreational seasons of 2009–11 for one site within CVNP, Jaite. For the 2009 and 2010 seasons, the regression models were better at predicting exceedances of Ohio's single-sample standard for primary-contact recreation than the traditional method of using the previous day's E. coli concentration. During 2009, the regression model was based on data collected during 2005 through 2008, excluding available 2004 data. The resulting model for 2009 did not perform as well as expected (based on the calibration data set) and tended to overestimate concentrations (correct responses at 69 percent). During 2010, the regression model was based on data collected during 2004 through 2009, including all of the available data. The 2010 model performed well, correctly predicting 89 percent of the samples above or below the single-sample standard, even though the predictions tended to be lower than actual sample concentrations. During 2011, the regression model was based on data collected during 2004 through 2010 and tended to overestimate concentrations. The 2011 model did not perform as well as the traditional method or as expected, based on the calibration dataset (correct responses at 56 percent). At a second site, Lock 29, approximately 5 river miles upstream from Jaite, a regression model based on data collected at the site during the recreational
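The "correct responses" percentages count a prediction as correct when the predicted and observed concentrations fall on the same side of the single-sample standard. The sketch below computes that rate. The threshold value and the concentrations are placeholders, not Ohio's actual standard or study data.

```python
def correct_response_rate(predicted, observed, standard):
    """Share of samples where prediction and observation agree on exceeding the standard."""
    agree = sum((p > standard) == (o > standard) for p, o in zip(predicted, observed))
    return agree / len(observed)


STANDARD = 200.0  # placeholder threshold (CFU/100 mL), not the regulatory value
predicted = [120.0, 400.0, 90.0, 600.0, 300.0]  # hypothetical model output
observed = [150.0, 500.0, 260.0, 700.0, 100.0]  # hypothetical sample results
print(f"{correct_response_rate(predicted, observed, STANDARD):.0%} correct responses")
```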
A comparative approach to predicting effective dielectric, piezoelectric and elastic properties of PZT/PVDF composites

International Nuclear Information System (INIS)

Ahmad, Zeeshan; Prasad, Ashutosh; Prasad, K.

2009-01-01

The present study addresses the quantitative prediction of the effective relative permittivity, dielectric loss factor, piezoelectric charge coefficient, and Young's modulus of PZT/PVDF diphasic ceramic-polymer composites as a function of the volume fraction of PZT. Theoretical results for effective relative permittivity were derived from several dielectric mixture equations, such as those of Knott, Rother-Lichtenecker, Bruggeman, Maxwell-Wagner-Webmann-Skipetrov or Dias-Dasgupta, Furukawa, Lewin, Wiener, Jayasundere-Smith, Modified Cule-Torquato, Taylor, Poon-Shin and Rao et al. These results were fitted to the experimental data from previous work by Yamada et al. Similarly, the results for the effective piezoelectric coefficient and Young's modulus, derived from appropriate equations, were fitted to the corresponding experimental data from the literature. The study revealed that only a few equations fitted the corresponding experimental results well. For the dielectric and piezoelectric properties, these were the modified Rother-Lichtenecker, Dias-Dasgupta and Rao equations; for the elastic property (Young's modulus), they were the four new equations developed in the present study. Further, subjecting the acceptable data to various regression analyses showed that in most cases third-order polynomial regression provided the most acceptable fits.
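As an example of a dielectric mixture equation, the sketch below implements the classic Lichtenecker logarithmic rule, ln ε_eff = v·ln ε_filler + (1 − v)·ln ε_matrix, for volume fraction v of the ceramic filler. This is the unmodified rule. The modified Rother-Lichtenecker form that fitted best in the study is not given in the abstract, and the permittivities below are assumed, illustrative values.

```python
import math


def lichtenecker_permittivity(v_filler: float, eps_filler: float, eps_matrix: float) -> float:
    """Logarithmic mixing rule: ln(eps_eff) = v ln(eps_filler) + (1 - v) ln(eps_matrix)."""
    if not 0.0 <= v_filler <= 1.0:
        raise ValueError("volume fraction must lie in [0, 1]")
    return math.exp(v_filler * math.log(eps_filler) + (1.0 - v_filler) * math.log(eps_matrix))


# Assumed relative permittivities: PZT ~ 1800, PVDF ~ 10 (illustrative only)
for v in (0.0, 0.2, 0.4, 0.6):
    print(v, round(lichtenecker_permittivity(v, 1800.0, 10.0), 1))
```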
The Collinearity Free and Bias Reduced Regression Estimation Project: The Theory of Normalization Ridge Regression. Report No. 2.

Science.gov (United States)

Bulcock, J. W.; And Others

Multicollinearity refers to the presence of highly intercorrelated independent variables in structural equation models, that is, models estimated by using techniques such as least squares regression and maximum likelihood. There is a problem of multicollinearity in both the natural and social sciences where theory formulation and estimation is in…
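Ridge regression counters multicollinearity by adding a constant k to the diagonal of the predictors' correlation matrix before solving for the standardized coefficients: β = (R + kI)⁻¹r. The sketch below uses two highly correlated predictors with hypothetical correlations. At k = 0 (ordinary least squares) one coefficient takes a counter-intuitive negative sign; a small k stabilizes both. This is generic ridge regression. The report's specific "normalization ridge" variant is not described in the abstract.

```python
def ridge_two_predictors(r12: float, r1y: float, r2y: float, k: float):
    """Standardized ridge coefficients for two predictors: solve (R + kI) beta = r."""
    det = (1 + k) ** 2 - r12 ** 2
    b1 = ((1 + k) * r1y - r12 * r2y) / det
    b2 = ((1 + k) * r2y - r12 * r1y) / det
    return b1, b2


# Hypothetical correlations: predictors correlate at 0.95 with each other
for k in (0.0, 0.05, 0.1, 0.2):
    b1, b2 = ridge_two_predictors(r12=0.95, r1y=0.60, r2y=0.55, k=k)
    print(f"k = {k:.2f}: b1 = {b1:.3f}, b2 = {b2:.3f}")
```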
Calibration methods for the Hargreaves-Samani equation

Directory of Open Access Journals (Sweden)

Lucas Borges Ferreira

ABSTRACT Estimating reference evapotranspiration is important for hydrological studies and for the design and management of irrigation systems, among other applications. The Penman-Monteith equation estimates this variable with high precision and accuracy, but its use is limited by the large number of meteorological data it requires. In this context, the Hargreaves-Samani equation can be used as an alternative, although local calibration is required for better performance. Thus, the aim was to compare three ways of calibrating the Hargreaves-Samani equation: by linear regression, by adjusting the coefficients (A and B) and the exponent (C) of the equation, and by combinations of these two alternatives. Daily data from 6 weather stations located in the state of Minas Gerais, for the period 1997 to 2016, were used. The Hargreaves-Samani equation was calibrated in five ways: calibration by linear regression, adjustment of parameter “A”, adjustment of parameters “A” and “C”, adjustment of parameters “A”, “B” and “C”, and adjustment of parameters “A”, “B” and “C” followed by calibration by linear regression. The performance of the models was evaluated using the statistical indicators mean absolute error, mean bias error, Willmott’s index of agreement, correlation coefficient and performance index. All the studied methods improved the estimation of reference evapotranspiration. The simultaneous adjustment of the empirical parameters “A”, “B” and “C” was the best alternative for calibrating the Hargreaves-Samani equation.
Predictive equations for respiratory muscle strength according to international and Brazilian guidelines

Directory of Open Access Journals (Sweden)

Isabela M. B. S. Pessoa

2014-10-01

Background: The maximum static respiratory pressures, namely the maximum inspiratory pressure (MIP) and maximum expiratory pressure (MEP), reflect the strength of the respiratory muscles. These measures are simple and non-invasive, and have established diagnostic and prognostic value. This study is the first to examine maximum respiratory pressures in the Brazilian population according to the recommendations of the American Thoracic Society and European Respiratory Society (ATS/ERS) and the Brazilian Thoracic Association (SBPT). Objective: To establish reference equations, mean values, and lower limits of normality for MIP and MEP for each age group and sex, as recommended by the ATS/ERS and SBPT. Method: We recruited 134 Brazilians living in Belo Horizonte, MG, Brazil, aged 20-89 years, with a normal pulmonary function test and a body mass index within the normal range. We used a digital manometer that operationalized the variable maximum average pressure (MIP/MEP). At least five tests were performed for both MIP and MEP to account for a possible learning effect. Results: We evaluated 74 women and 60 men. The equations were: MIP = 63.27 - 0.55(age) + 17.96(gender) + 0.58(weight), with r² of 34%; and MEP = -61.41 + 2.29(age) - 0.03(age²) + 33.72(gender) + 1.40(waist), with r² of 49%. Conclusion: In clinical practice, these equations could be used to calculate the predicted values of MIP and MEP for the Brazilian population.
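The two reference equations can be evaluated directly. The coefficients come from the abstract, but the abstract does not give the gender coding or the units, so this sketch assumes gender = 1 for men and 0 for women, age in years, weight in kg, waist in cm, and pressures in cmH2O. Treat these as assumptions to check against the original paper.

```python
def predicted_mip(age, gender, weight):
    """Predicted MIP: 63.27 - 0.55*age + 17.96*gender + 0.58*weight (r^2 = 34%).

    Assumed coding: gender = 1 (male), 0 (female); weight in kg.
    """
    return 63.27 - 0.55 * age + 17.96 * gender + 0.58 * weight


def predicted_mep(age, gender, waist):
    """Predicted MEP: -61.41 + 2.29*age - 0.03*age^2 + 33.72*gender + 1.40*waist
    (r^2 = 49%).

    Assumed coding: gender = 1 (male), 0 (female); waist in cm.
    """
    return -61.41 + 2.29 * age - 0.03 * age ** 2 + 33.72 * gender + 1.40 * waist
```

For example, under these assumptions a 40-year-old man weighing 70 kg has a predicted MIP of 99.83.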
BFLCRM: A BAYESIAN FUNCTIONAL LINEAR COX REGRESSION MODEL FOR PREDICTING TIME TO CONVERSION TO ALZHEIMER'S DISEASE.

Science.gov (United States)

Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G

2015-12-01

The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer's disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer's Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM.
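BFLCRM itself is Bayesian, uses functional covariates, and is fitted by MCMC. Its Cox backbone, however, rests on the partial likelihood. The sketch below shows only that building block, the Cox partial log-likelihood for scalar covariates. It assumes there are no tied event times and omits the functional and Bayesian components entirely. The function name is hypothetical.

```python
import math


def cox_partial_loglik(beta, covariates, times, events):
    """Cox partial log-likelihood for scalar covariates (no tied times assumed).

    beta: coefficient list; covariates: one list of values per subject;
    times: follow-up times; events: 1 if the event (e.g. conversion) occurred,
    0 if censored.
    """
    # Linear predictor x_i . beta for each subject.
    eta = [sum(b * xi for b, xi in zip(beta, x)) for x in covariates]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += eta[i] - math.log(risk)
    return total
```

With beta = 0, each event contributes minus the log of its risk-set size, which makes the function easy to check by hand.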
Prediction of Currency Volume Issued in Taiwan Using a Hybrid Artificial Neural Network and Multiple Regression Approach

Directory of Open Access Journals (Sweden)

Yuehjen E. Shao

2013-01-01

Because the volume of currency issued by a country always affects its interest rate, price index, income levels, and many other important macroeconomic variables, the prediction of currency volume issued has attracted considerable attention in recent years. In contrast to the typical single-stage forecasting model, this study proposes a hybrid forecasting approach to predict the volume of currency issued in Taiwan. The proposed hybrid models consist of artificial neural network (ANN) and multiple regression (MR) components. The MR component selects a smaller set of explanatory variables of higher importance, and the ANN component then generates forecasts based on those variables. The model is then used to analyze a real dataset of Taiwan's currency from 1996 to 2011 with twenty associated explanatory variables. The prediction results show that the proposed hybrid scheme achieves superior forecasting performance for predicting the volume of currency issued in Taiwan.
Predicting in vivo glioma growth with the reaction diffusion equation constrained by quantitative magnetic resonance imaging data

International Nuclear Information System (INIS)

Hormuth II, David A; Weis, Jared A; Barnes, Stephanie L; Miga, Michael I; Yankeelov, Thomas E; Rericha, Erin C; Quaranta, Vito

2015-01-01

Reaction–diffusion models have been widely used to model glioma growth. However, it has not been shown how accurately this model can predict future tumor status using model parameters (i.e., tumor cell diffusion and proliferation) estimated from quantitative in vivo imaging data. To this end, we used in silico studies to develop the methods needed to accurately estimate tumor specific reaction–diffusion model parameters, and then tested the accuracy with which these parameters can predict future growth. The analogous study was then performed in a murine model of glioma growth. The parameter estimation approach was tested using an in silico tumor ‘grown’ for ten days as dictated by the reaction–diffusion equation. Parameters were estimated from early time points and used to predict subsequent growth. Prediction accuracy was assessed at global (total volume and Dice value) and local (concordance correlation coefficient, CCC) levels. Guided by the in silico study, rats (n = 9) with C6 gliomas, imaged with diffusion weighted magnetic resonance imaging, were used to evaluate the model’s accuracy for predicting in vivo tumor growth. The in silico study resulted in low global (tumor volume error 0.92) and local (CCC values >0.80) level errors for predictions up to six days into the future. The in vivo study showed higher global (tumor volume error >11.7%, Dice <0.81) and higher local (CCC <0.33) level errors over the same time period. The in silico study shows that model parameters can be accurately estimated and used to accurately predict future tumor growth at both the global and local scale. However, the poor predictive accuracy in the experimental study suggests the reaction–diffusion equation is an incomplete description of in vivo C6 glioma biology and may require further modeling of intra-tumor interactions including segmentation of (for example) proliferative and necrotic regions. (paper)
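The global and local accuracy measures named above can be computed as follows. The sketch assumes tumor regions are given as sets of voxel indices (for Dice) and voxel-wise values as equal-length lists (for the concordance correlation coefficient). It uses the population-variance form of Lin's CCC, and the function names are hypothetical.

```python
def dice(a, b):
    """Dice similarity of two voxel sets: 2|A & B| / (|A| + |B|).

    Two empty sets raise ZeroDivisionError.
    """
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population-variance form):
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((v - mx) ** 2 for v in x) / n
    vy = sum((v - my) ** 2 for v in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

Unlike Pearson's r, the CCC penalizes a systematic offset. For example, `lin_ccc([1, 2, 3], [2, 3, 4])` is 4/7 even though the two series are perfectly correlated.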
Abdominal girth and vertebral column length aid in predicting intrathecal hyperbaric bupivacaine dose for elective cesarean section.

Science.gov (United States)

Wei, Chang-Na; Zhou, Qing-He; Wang, Li-Zhong

2017-08-01

Currently, there is no consensus on how to determine the optimal dose of intrathecal bupivacaine for an individual undergoing an elective cesarean section. In this study, we developed a regression equation relating intrathecal 0.5% hyperbaric bupivacaine volume to abdominal girth and vertebral column length, in order to achieve a suitable block level (T5) in elective cesarean section patients. In phase I, we analyzed 374 parturients undergoing an elective cesarean section who received a suitable dose of intrathecal 0.5% hyperbaric bupivacaine after a combined spinal-epidural (CSE) was performed at the L3/4 interspace. Parturients with T5 blockade to pinprick were selected for establishing the regression equation between 0.5% hyperbaric bupivacaine volume and vertebral column length and abdominal girth. Six parturient and neonatal variables, intrathecal 0.5% hyperbaric bupivacaine volume, and spinal anesthesia spread were recorded. Bivariate linear correlation analyses, multiple linear regression analyses, and 2-tailed t tests or chi-square tests were performed, as appropriate. In phase II, another 200 parturients with CSE for elective cesarean section were enrolled to verify the accuracy of the regression equation. In phase I, a total of 143 parturients were selected to establish the following regression equation: YT5 = 0.074X1 - 0.022X2 - 0.017 (YT5 = 0.5% hyperbaric bupivacaine volume for T5 block level; X1 = vertebral column length; and X2 = abdominal girth). In phase II, a total of 189 participants were enrolled to verify the accuracy of the regression equation, and 155 parturients with T5 blockade were deemed eligible, accounting for 82.01% of all participants. This study evaluated parturients with T5 blockade to pinprick after a CSE for elective cesarean section to establish a regression equation between parturient vertebral column length and abdominal girth and 0.5% hyperbaric intrathecal bupivacaine volume. This equation can accurately
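The reported phase I equation can be evaluated as follows. The coefficients come from the abstract, but the units are not stated there, so this sketch assumes X1 and X2 in centimetres and YT5 in millilitres of 0.5% hyperbaric bupivacaine. The function name is hypothetical.

```python
def bupivacaine_volume_t5(vertebral_length, abdominal_girth):
    """Volume of 0.5% hyperbaric bupivacaine for a T5 block level:
    YT5 = 0.074*X1 - 0.022*X2 - 0.017.

    Assumed units: vertebral_length (X1) and abdominal_girth (X2) in cm,
    result in mL.
    """
    return 0.074 * vertebral_length - 0.022 * abdominal_girth - 0.017
```

Under these assumed units, a vertebral column length of 60 cm and an abdominal girth of 100 cm give 2.223 mL. Longer columns increase the predicted volume, while larger girths decrease it.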
PASSENGER FLOWS PREDICTION IN MAJOR TRANSPORTATION HUBS

Directory of Open Access Journals (Sweden)

O. O. Ozerova

2013-11-01

Purpose. Effective organization of passenger traffic depends on reliable prediction of traffic flows in passenger transport hubs. To determine the parameters of prospective passenger transport areas, it is essential to analyze the impact of various factors and identify the most influential ones. Methodology. The article presents the method of paired linear correlation for identifying the factors most influential on intercity and commuter passenger traffic, and their possible use in linear regression prediction equations. Passenger transport areas and branches of industry are interconnected, as reflected in the ratio of passengers to production. Findings. The correlation coefficient was found to depend in a complex way on the length of the retrospective analysis period. Evaluating the reliability of the correlation coefficients and of the coefficients of the predictive models led to the conclusion that population gives the most accurate prediction of passenger flows, taking into account the changes in Ukraine during the period of transformation. Originality. Equations relating passenger flows to macroeconomic indicators were obtained, and the reliability of the results was evaluated. Practical value. The results of the analysis and calculations enable short-term forecasting of traffic flows.
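The workflow described above (rank candidate factors by paired linear correlation, then forecast with a linear regression on the strongest one) can be sketched as follows. The factor names and series in the test are hypothetical illustrations, not data from the article.

```python
import math


def paired_correlation(x, y):
    """Pearson (paired linear) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_factors(factors, passengers):
    """Sort candidate factors by |r| with passenger counts, strongest first."""
    scores = {name: paired_correlation(s, passengers) for name, s in factors.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


def fit_line(x, y):
    """OLS fit y ~ intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope
```

Once the top-ranked factor is chosen, a short-term forecast is `intercept + slope * next_value` for a projected value of that factor.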
Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: Cyscore as a case study.

Science.gov (United States)

Li, Hongjian; Leung, Kwong-Sak; Wong, Man-Hon; Ballester, Pedro J

2014-08-27

State-of-the-art protein-ligand docking methods are generally limited by the traditionally low accuracy of their scoring functions, which are used to predict binding affinity and are thus vital for discriminating between active and inactive compounds. Despite intensive research over the years, classical scoring functions have reached a plateau in their predictive performance. These assume a predetermined additive functional form for some sophisticated numerical features and use standard multivariate linear regression (MLR) on experimental data to derive the coefficients. In this study we show that such a simple functional form is detrimental to the prediction performance of a scoring function, and that replacing linear regression with machine learning techniques such as random forest (RF) can improve prediction performance. We investigate the conditions for applying RF in various contexts and find that, given sufficient training samples, RF comprehensively captures the non-linearity between structural features and measured binding affinities. Incorporating more structural features and training with more samples can both boost RF performance. In addition, we analyze the importance of structural features to binding affinity prediction using the RF variable importance tool. Lastly, we use Cyscore, a top-performing empirical scoring function, as a baseline for comparison. Machine-learning scoring functions are fundamentally different from classical scoring functions because the former circumvent the fixed functional form relating structural features to binding affinities. RF, but not MLR, can effectively exploit more structural features and more training samples, leading to higher prediction performance. The future availability of more X-ray crystal structures will further widen the performance gap between RF-based and MLR-based scoring functions. This further stresses the importance of substituting RF for MLR in scoring function development.
Using heart rate to predict energy expenditure in large domestic dogs.

Science.gov (United States)

Gerth, N; Ruoß, C; Dobenecker, B; Reese, S; Starck, J M

2016-06-01

The aim of this study was to establish heart rate as a measure of energy expenditure in large active kennel dogs (28 ± 3 kg bw). Therefore, the heart rate (HR)-oxygen consumption (V˙O2) relationship was analysed in Foxhound-Boxer-Ingelheim-Labrador cross-breds (FBI dogs) at rest and graded levels of exercise on a treadmill up to 60-65% of maximal aerobic capacity. To test for effects of training, HR and V˙O2 were measured in female dogs, before and after a training period, and after an adjacent training pause to test for reversibility of potential effects. Least squares regression was applied to describe the relationship between HR and V˙O2. The applied training had no statistically significant effect on the HR-V˙O2 regression. A general regression line from all data collected was prepared to establish a general predictive equation for energy expenditure from HR in FBI dogs. The regression equation established in this study enables fast estimation of energy requirement for running activity. The equation is valid for large dogs weighing around 30 kg that run at ground level up to 15 km/h with a heart rate maximum of 190 bpm irrespective of the training level. Journal of Animal Physiology and Animal Nutrition © 2015 Blackwell Verlag GmbH.
A new predictive indicator for development of pressure ulcers in bedridden patients based on common laboratory tests results.

Science.gov (United States)

Hatanaka, N; Yamamoto, Y; Ichihara, K; Mastuo, S; Nakamura, Y; Watanabe, M; Iwatani, Y

2008-04-01

Various scales have been devised to predict the development of pressure ulcers on the basis of clinical and laboratory data, such as the Braden Scale (Braden score), which is used to monitor the activity and skin condition of bedridden patients. However, none of these scales allows clinically reliable prediction. The aim was to develop a predictive equation for the development of pressure ulcers based on clinical laboratory data. Subjects were 149 hospitalised patients with respiratory disorders who were monitored for the development of pressure ulcers over a 3-month period. The proportional hazards model (Cox regression) was used to analyse the results of 12 basic laboratory tests on the day of hospitalisation in comparison with the Braden score. Pressure ulcers developed in 38 patients within the study period. A Cox regression model consisting solely of Braden scale items showed that none of these items contributed significantly to predicting pressure ulcers. Rather, a combination of haemoglobin (Hb), C-reactive protein (CRP), albumin (Alb), age, and gender produced the best model for prediction. Using this set of explanatory variables, we created a new indicator based on a multiple logistic regression equation. The new indicator showed high sensitivity (0.73) and specificity (0.70), and its diagnostic power was higher than that of Alb, Hb, CRP, or the Braden score alone. The new indicator may become a more useful clinical tool for predicting pressure ulcers than the Braden score. The new indicator warrants verification studies to facilitate its clinical implementation in the future.
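The sensitivity and specificity of an indicator like this are computed by thresholding the predicted probability from a logistic regression equation. The abstract does not give the fitted coefficients or the cut-off, so this sketch takes the probabilities and the threshold as inputs. All names and values are hypothetical.

```python
import math


def logistic(z):
    """Map a logistic-regression linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_specificity(scores, outcomes, threshold):
    """Sensitivity and specificity of the rule 'score >= threshold => positive'.

    scores: predicted probabilities; outcomes: 1 if a pressure ulcer developed,
    else 0. Requires at least one positive and one negative outcome.
    """
    tp = sum(1 for s, y in zip(scores, outcomes) if s >= threshold and y)
    fn = sum(1 for s, y in zip(scores, outcomes) if s < threshold and y)
    tn = sum(1 for s, y in zip(scores, outcomes) if s < threshold and not y)
    fp = sum(1 for s, y in zip(scores, outcomes) if s >= threshold and not y)
    return tp / (tp + fn), tn / (tn + fp)
```

Moving the threshold trades sensitivity against specificity. Reported pairs such as 0.73/0.70 correspond to one particular cut-off.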
Odontometric Data and New Regression Equations for Predicting the Size of Unerupted Permanent Canine and Premolars for Chennai Population

Directory of Open Access Journals (Sweden)

S V Soumya

2013-01-01

Conclusion: The observations obtained from our study would not only pave the way in predicting the mesiodistal width of unerupted canine and premolar in Chennai population but also give normative odontometric data which can be used for anthropological use and for diagnosis and treatment planning

Prediction of pork quality parameters by applying fractals and data mining on MRI

DEFF Research Database (Denmark)

Caballero, Daniel; Pérez-Palacios, Trinidad; Caro, Andrés

2017-01-01

This work firstly investigates the use of MRI, fractal algorithms and data mining techniques to determine pork quality parameters non-destructively. The main objective was to evaluate the capability of fractal algorithms (Classical Fractal Algorithm, CFA; Fractal Texture Algorithm, FTA; and One Point Fractal Texture Algorithm, OPFTA) to analyse MRI in order to predict quality parameters of loin. In addition, the effects of the MRI acquisition sequence (Gradient Echo, GE; Spin Echo, SE; and Turbo 3D, T3D) and of the data mining prediction technique (Isotonic Regression, IR; and Multiple Linear Regression, MLR) were analysed. Both fractal algorithms FTA and OPFTA are appropriate for analysing MRI of loins. The acquisition sequence, the fractal algorithm and the data mining technique all seem to influence the prediction results. For most physico-chemical parameters, prediction equations with moderate...
Predicting hearing thresholds and occupational hearing loss with multiple-frequency auditory steady-state responses.

Science.gov (United States)

Hsu, Ruey-Fen; Ho, Chi-Kung; Lu, Sheng-Nan; Chen, Shun-Sheng

2010-10-01

An objective investigation is needed to verify the existence and severity of hearing impairment resulting from work-related, noise-induced hearing loss in the arbitration of medicolegal cases. We investigated the accuracy of multiple-frequency auditory steady-state responses (Mf-ASSRs) in subjects with sensorineural hearing loss (SNHL) with and without occupational noise exposure. Cross-sectional study. Tertiary referral medical centre. Pure-tone audiometry and Mf-ASSRs were recorded in 88 subjects (34 patients with occupational noise-induced hearing loss [NIHL], 36 patients with SNHL without noise exposure, and 18 normal volunteers as controls). Inter- and intragroup comparisons were made. A prediction equation was derived using multiple linear regression analysis. ASSRs and pure-tone thresholds (PTTs) showed a strong correlation for all subjects (r = .77-.94). The relationship is demonstrated by the equation. The differences between the ASSR and PTT were significantly higher for the NIHL group than for the subjects with non-noise-induced SNHL (p tool for objectively evaluating hearing thresholds. Predictive value may be lower in subjects with occupational hearing loss. Regardless of carrier frequency, the severity of hearing loss affects the steady-state response. Moreover, the ASSR may assist in detecting noise-induced injury of the auditory pathway. A multiple linear regression equation that takes all effect factors into account was shown to predict thresholds accurately.
Watershed regressions for pesticides (warp) models for predicting atrazine concentrations in Corn Belt streams

Science.gov (United States)

Stone, Wesley W.; Gilliom, Robert J.

2012-01-01

Watershed Regressions for Pesticides (WARP) models, previously developed for atrazine at the national scale, are improved for application to the United States (U.S.) Corn Belt region by developing region-specific models that include watershed characteristics that are influential in predicting atrazine concentration statistics within the Corn Belt. WARP models for the Corn Belt (WARP-CB) were developed for annual maximum moving-average (14-, 21-, 30-, 60-, and 90-day durations) and annual 95th-percentile atrazine concentrations in streams of the Corn Belt region. The WARP-CB models accounted for 53 to 62% of the variability in the various concentration statistics among the model-development sites. Model predictions were within a factor of 5 of the observed concentration statistic for over 90% of the model-development sites. The WARP-CB residuals and uncertainty are lower than those of the National WARP model for the same sites. Although atrazine-use intensity is the most important explanatory variable in the National WARP models, it is not a significant variable in the WARP-CB models. The WARP-CB models provide improved predictions for Corn Belt streams draining watersheds with atrazine-use intensities of 17 kg/km2 of watershed area or greater.
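The "within a factor of 5" accuracy criterion above can be computed as the share of sites whose predicted-to-observed ratio lies between 1/5 and 5. This is a sketch: it assumes strictly positive concentrations, and the function name is hypothetical.

```python
def fraction_within_factor(observed, predicted, factor=5.0):
    """Share of sites where predicted/observed lies in [1/factor, factor].

    Assumes all observed and predicted concentrations are positive.
    """
    within = sum(
        1 for o, p in zip(observed, predicted) if 1.0 / factor <= p / o <= factor
    )
    return within / len(observed)
```

The criterion is symmetric on a ratio (log) scale: over-predicting by 5x and under-predicting by 5x count equally, which suits concentration statistics that span orders of magnitude.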
The development of a practical and uncomplicated predictive equation to determine liver volume from simple linear ultrasound measurements of the liver

International Nuclear Information System (INIS)

Childs, Jessie T.; Thoirs, Kerry A.; Esterman, Adrian J.

2016-01-01

This study sought to develop a practical and uncomplicated predictive equation that could accurately calculate liver volumes, using multiple simple linear ultrasound measurements combined with measurements of body size. Penalized (lasso) regression was used to develop a new model and compare it to the ultrasonic linear measurements currently used clinically. A Bland–Altman analysis showed that the large limits of agreement of the new model render it too inaccurate to be of clinical use for estimating liver volume per se, but it holds value in tracking disease progress or response to treatment over time in individuals, and is certainly substantially better as an indicator of overall liver size than the ultrasonic linear measurements currently being used clinically. - Highlights: • A new model to calculate liver volumes from simple linear ultrasound measurements. • This model was compared to the linear measurements currently used clinically. • The new model holds value in tracking disease progress or response to treatment. • This model is better as an indicator of overall liver size.
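The Bland-Altman analysis used above summarizes agreement between two measurement methods as the mean difference (bias) plus 95% limits of agreement, bias ± 1.96 × SD of the differences. This minimal sketch uses the sample standard deviation. The function name is hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) of the differences a - b.

    Limits of agreement are bias +/- 1.96 * sample SD of the differences;
    at least two paired measurements are needed.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

Wide limits, as reported for the new model, mean that an individual estimate can differ substantially from the reference volume even when the average bias is small.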
Vector regression introduced

Directory of Open Access Journals (Sweden)

Mok Tik

2014-06-01

This study formulates regression of vector data to enable statistical analysis of various geodetic phenomena such as polar motion, ocean currents, typhoon/hurricane tracking, crustal deformation, and precursory earthquake signals. The observed vector variable of an event (the dependent vector variable) is expressed as a function of a number of hypothesized phenomena, realized as vector variables (independent vector variables) and/or scalar variables, that are likely to affect the dependent vector variable. The proposed representation has the unique property of solving for the coefficients of the independent vector variables (explanatory variables) as vectors as well; it therefore supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to represent vector information, and the method of least squares is used to estimate the vector model parameters after the complex vector regression model is transformed into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.
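The core idea of the method described above can be sketched in its simplest form: 2-D vectors are encoded as complex numbers, so a single complex coefficient acts as a rotation plus scaling. The sketch assumes one explanatory vector and no intercept. Minimizing Σ|w_k - b·z_k|² gives b = Σ conj(z_k)·w_k / Σ|z_k|². The paper instead solves via an isomorphic real model; for this one-coefficient case the two approaches give the same estimate. The function name is hypothetical.

```python
def complex_ls_coefficient(z, w):
    """Least-squares complex coefficient b for the vector model w ~ b * z.

    z: explanatory vectors as complex numbers (x + yj);
    w: dependent vectors as complex numbers.
    """
    num = sum(zk.conjugate() * wk for zk, wk in zip(z, w))
    den = sum(abs(zk) ** 2 for zk in z)
    return num / den
```

The estimated b can be read geometrically: `abs(b)` is the scale factor and the argument of b (e.g. `cmath.phase(b)`) is the rotation angle applied to the explanatory vectors.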
Modeling animal movements using stochastic differential equations

Science.gov (United States)

Haiganoush K. Preisler; Alan A. Ager; Bruce K. Johnson; John G. Kie

2004-01-01

We describe the use of bivariate stochastic differential equations (SDE) for modeling movements of 216 radiocollared female Rocky Mountain elk at the Starkey Experimental Forest and Range in northeastern Oregon. Spatially and temporally explicit vector fields were estimated using approximating difference equations and nonparametric regression techniques. Estimated...
Influence of coronary artery disease prevalence on predictive values of coronary CT angiography: a meta-regression analysis

Energy Technology Data Exchange (ETDEWEB)

Schlattmann, Peter [University Hospital of Friedrich-Schiller University Jena, Department of Medical Statistics, Informatics and Documentation, Jena (Germany); Schuetz, Georg M. [Freie Universitaet Berlin, Charite, Medical School, Department of Radiology, Humboldt-Universitaet zu Berlin, Berlin (Germany); Dewey, Marc [Freie Universitaet Berlin, Charite, Medical School, Department of Radiology, Humboldt-Universitaet zu Berlin, Berlin (Germany); Charite, Institut fuer Radiologie, Berlin (Germany)

2011-09-15

To evaluate the impact of coronary artery disease (CAD) prevalence on the predictive values of coronary CT angiography. We performed a meta-regression based on a generalised linear mixed model using the binomial distribution and a logit link to analyse the influence of the prevalence of CAD in published studies on the per-patient negative and positive predictive values of CT in comparison to conventional coronary angiography as the reference standard. A prevalence range in which the negative predictive value was higher than 90%, while at the same time the positive predictive value was higher than 70% was considered appropriate. The summary negative and positive predictive values of coronary CT angiography were 93.7% (95% confidence interval [CI] 92.8-94.5%) and 87.5% (95% CI, 86.5-88.5%), respectively. With 95% confidence, negative and positive predictive values higher than 90% and 70% were available with CT for a CAD prevalence of 18-63%. CT systems with >16 detector rows met these requirements for the positive (P < 0.01) and negative (P < 0.05) predictive values in a significantly broader range than systems with ≤16 detector rows. It is reasonable to perform coronary CT angiography as a rule-out test in patients with a low-to-intermediate likelihood of disease. (orig.)
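The study models predictive values with a GLMM meta-regression. The underlying reason they shift with prevalence is Bayes' theorem: with sensitivity and specificity held fixed, PPV rises and NPV falls as prevalence increases. The sketch below shows only that relationship. The sensitivity and specificity values in the test are hypothetical, not the meta-analysis estimates, and the function name is also hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' theorem for a given disease prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)
```

Scanning `prevalence` over a grid and keeping the values where NPV > 0.90 and PPV > 0.70 mimics, in simplified form, the "appropriate prevalence range" criterion used in the study.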
Influence of coronary artery disease prevalence on predictive values of coronary CT angiography: a meta-regression analysis

International Nuclear Information System (INIS)

Schlattmann, Peter; Schuetz, Georg M.; Dewey, Marc

2011-01-01

Acidity in DMSO from the embedded cluster integral equation quantum solvation model.

Science.gov (United States)

Heil, Jochen; Tomazic, Daniel; Egbers, Simon; Kast, Stefan M

2014-04-01

The embedded cluster reference interaction site model (EC-RISM) is applied to the prediction of acidity constants of organic molecules in dimethyl sulfoxide (DMSO) solution. EC-RISM is based on a self-consistent treatment of the solute's electronic structure and the solvent's structure by coupling quantum-chemical calculations with three-dimensional (3D) RISM integral equation theory. We compare available DMSO force fields with reference calculations obtained using the polarizable continuum model (PCM). The results are evaluated statistically using two different approaches to eliminating the proton contribution: a linear regression model and an analysis of pKa shifts for compound pairs. Suitable levels of theory for the integral equation methodology are benchmarked. The results are further analyzed and illustrated by visualizing solvent site distribution functions and comparing them with an aqueous environment.
Consequences of kriging and land use regression for PM2.5 predictions in epidemiologic analyses: insights into spatial variability using high-resolution satellite data.

Science.gov (United States)

Alexeeff, Stacey E; Schwartz, Joel; Kloog, Itai; Chudnovsky, Alexandra; Koutrakis, Petros; Coull, Brent A

2015-01-01

Many epidemiological studies use predicted air pollution exposures as surrogates for true air pollution levels. These predicted exposures contain exposure measurement error, yet simulation studies have typically found negligible bias in resulting health effect estimates. However, previous studies typically assumed a statistical spatial model for air pollution exposure, which may be oversimplified. We address this shortcoming by assuming a realistic, complex exposure surface derived from fine-scale (1 km × 1 km) remote-sensing satellite data. Using simulation, we evaluate the accuracy of epidemiological health effect estimates in linear and logistic regression when using spatial air pollution predictions from kriging and land use regression models. We examined chronic (long-term) and acute (short-term) exposure to air pollution. Results varied substantially across different scenarios. Exposure models with low out-of-sample R² yielded severe biases in the health effect estimates of some models, ranging from 60% upward bias to 70% downward bias. One land use regression exposure model with >0.9 out-of-sample R² yielded upward biases up to 13% for acute health effect estimates. Almost all models drastically underestimated the SEs. Land use regression models performed better in chronic effect simulations. These results can help researchers when interpreting health effect estimates in these types of studies.
Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis

Directory of Open Access Journals (Sweden)

BUDIMAN

2012-01-01

Budiman, Arisoesilaningsih E. 2012. Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis. Biodiversitas 13: 18-22. The aim of this research was to determine multiple regression models of the vegetative and corm growth of Amorphophallus muelleri Blume across plant ages and habitat conditions in agroforestry systems of East Java. A descriptive exploratory method was used, with systematic random sampling at five agroforestries on four plantations in East Java: Saradan, Bojonegoro, Nganjuk and Blitar. In each agroforestry, we observed A. muelleri vegetative and corm growth at four growing ages (1, 2, 3 and 4 years old, respectively), as well as environmental variables such as altitude, vegetation, climate and soil conditions. Data were analyzed using descriptive statistics to compare A. muelleri habitats in the five agroforestries. The influence and contribution of each environmental variable to the vegetative and corm growth of A. muelleri were determined by multiple regression analysis in SPSS 17.0. The multiple regression models of A. muelleri vegetative and corm growth, generated from agroforestry characteristics and plant age, showed high validity, with R² = 88-99%. The models showed that age, monthly temperature, percentage of radiation and soil calcium (Ca) content, either simultaneously or partially, determined the vegetative and corm growth of A. muelleri. Based on these models, A. muelleri corms reach optimal growth after four years of cultivation and are then ready to be harvested. Additionally, soil Ca content should reach 25.3 me·hg⁻¹, as in the Sugihwaras agroforestry, with a maximum radiation of 60%.
Variances of the critical point of a quadratic regression equation

Directory of Open Access Journals (Sweden)

Ceile Cristina Ferreira Nunes

2004-04-01

…critical point, calculated with the expression that takes into account the covariance between the linear and quadratic coefficient estimates, gives more satisfactory results, and that the critical point does not follow a normal distribution, since its frequency distribution is positively skewed and leptokurtic. The aim of this paper is to determine variances for the analysis of the critical point of a second-degree regression equation in experimental situations with different variances, using Monte Carlo simulation. Many theoretical and applied studies involve ratios of random variables, most often of normal variables. Examples are provided by variables that appear in research on economic doses of nutrients in fertilization experiments, as well as in other problems where the random variable of interest is the estimator of the critical point of the regression. Data from 536 cotton-yield trials were used to study the distribution of the critical point of a quadratic regression equation by fitting a quadratic model. The parameters were estimated by least squares. From these estimates, a MATLAB routine was implemented to simulate two sets of 5,000 normally distributed random errors with zero mean for each of the theoretical variances 0.1, 0.5, 1, 5, 10, 15, 20, and 50. The variance of the critical point was estimated by three methods: (a) the usual variance formula; (b) a formula obtained by differentiating the critical-point estimator; and (c) the formula for the variance of a quotient that takes into account the covariance between the linear and quadratic coefficient estimates. The mean values obtained for the regression statistics, together with their variances under the several theoretical residual variances adopted, show that the theoretical values are close to the real ones. Moreover, both statistics tend to increase as the theoretical variance increases. It may
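For the quadratic model y = β₀ + β₁x + β₂x², the critical point is x₀ = −β₁/(2β₂), a ratio of two correlated estimates, which is why the covariance term matters. The sketch below compares a first-order (delta-method) variance that includes the covariance between β̂₁ and β̂₂ with a Monte Carlo variance from repeatedly simulated datasets. It is an illustration under stated assumptions, not the paper's MATLAB routine: the design points, true coefficients, error standard deviation, and function names are all hypothetical and unrelated to the cotton data.

```python
import numpy as np


def critical_point(beta):
    """Vertex x0 = -b1 / (2 * b2) of y = b0 + b1*x + b2*x**2."""
    return -beta[1] / (2.0 * beta[2])


def delta_variance(beta, cov, include_cov=True):
    """First-order variance of x0 from the gradient of -b1 / (2 * b2)."""
    b1, b2 = beta[1], beta[2]
    g1 = -1.0 / (2.0 * b2)       # d x0 / d b1
    g2 = b1 / (2.0 * b2 ** 2)    # d x0 / d b2
    var = g1 ** 2 * cov[1, 1] + g2 ** 2 * cov[2, 2]
    if include_cov:
        var += 2.0 * g1 * g2 * cov[1, 2]  # covariance between b1 and b2
    return var


def monte_carlo_variance(x, beta, sigma, reps=2000, seed=0):
    """Refit the quadratic on simulated data and return the variance of x0-hat."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    mean_y = X @ beta
    estimates = np.empty(reps)
    for r in range(reps):
        y = mean_y + rng.normal(0.0, sigma, size=x.size)
        b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates[r] = critical_point(b_hat)
    return estimates.var(ddof=1), estimates


# Hypothetical design: 20 doses on [0, 10], vertex at x0 = 6.67.
x = np.linspace(0.0, 10.0, 20)
beta = np.array([1.0, 4.0, -0.3])
sigma = 0.5
X = np.column_stack([np.ones_like(x), x, x ** 2])
cov = sigma ** 2 * np.linalg.inv(X.T @ X)  # exact covariance of the OLS estimates

v_delta = delta_variance(beta, cov)
v_mc, draws = monte_carlo_variance(x, beta, sigma)
print(f"delta method: {v_delta:.5f}, Monte Carlo: {v_mc:.5f}")
```

With a well-determined curvature, as here, the two variances agree closely; as the residual variance grows, the estimated β̂₂ can approach zero and the distribution of x̂₀ becomes skewed and heavy-tailed, consistent with the non-normality described above.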
Logistic regression models for predicting physical and mental health-related quality of life in rheumatoid arthritis patients.

Science.gov (United States)

Alishiri, Gholam Hossein; Bayat, Noushin; Fathi Ashtiani, Ali; Tavallaii, Seyed Abbas; Assari, Shervin; Moharamzad, Yashar

2008-01-01

The aim of this work was to develop two logistic regression models capable of predicting physical and mental health-related quality of life (HRQOL) among rheumatoid arthritis (RA) patients. In this cross-sectional study, conducted in 2006 in the outpatient rheumatology clinic of our university hospital, the Short Form 36 (SF-36) was used to measure HRQOL in 411 RA patients. Cutoff points defining poor versus good HRQOL were set at the first quartiles of the SF-36 physical and mental component scores (33.4 and 36.8, respectively). Two separate logistic regression models were used to identify predictive variables, including demographic, clinical, and psychological factors. The sensitivity, specificity, and accuracy of each model were calculated. Poor physical HRQOL was positively associated with pain score, disease duration, monthly family income below US$300, comorbidity, patient global assessment of disease activity (PGA), and depression (odds ratios: 1.1, 1.004, 15.5, 1.1, 1.02, and 2.08, respectively). The variables that entered the poor mental HRQOL prediction model were monthly family income below US$300, comorbidity, PGA, and bodily pain (odds ratios: 6.7, 1.1, 1.01, and 1.01, respectively). Optimal sensitivity and specificity were achieved at a cutoff of 0.39 for the estimated probability of poor physical HRQOL and 0.18 for poor mental HRQOL. Sensitivity, specificity, and accuracy were 73.8%, 87%, and 83.7% for the physical model and 90.38%, 70.36%, and 75.43% for the mental model. The results show that the suggested models can be used to predict poor physical and mental HRQOL separately among RA patients using simple variables with acceptable accuracy. These models may support clinical decision-making in RA and help identify patients with poor physical or mental HRQOL in advance, for better management.
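The sensitivity, specificity, and accuracy reported above come from dichotomizing each model's predicted probability at a cutoff (0.39 and 0.18 here). The sketch below computes these three quantities and picks a cutoff by maximizing Youden's J (sensitivity + specificity − 1); Youden's J, the `>=` convention, the function names, and the toy data are assumptions, since the abstract does not state how the optimal cutoff was chosen.

```python
def classification_metrics(probs, labels, cutoff):
    """Sensitivity, specificity and accuracy when predicting 'poor HRQOL' (1)
    for every estimated probability >= cutoff (the >= convention is assumed)."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(labels)
    return sensitivity, specificity, accuracy


def youden_cutoff(probs, labels):
    """Observed probability that maximises sensitivity + specificity - 1."""
    best_cut, best_j = None, float("-inf")
    for c in sorted(set(probs)):
        sens, spec, _ = classification_metrics(probs, labels, c)
        j = sens + spec - 1.0
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut


# Hypothetical predicted probabilities and observed outcomes (1 = poor HRQOL).
probs = [0.9, 0.6, 0.4, 0.2, 0.1, 0.35]
labels = [1, 1, 1, 0, 0, 0]
print(classification_metrics(probs, labels, 0.5))
print(youden_cutoff(probs, labels))
```

On this toy data a cutoff of 0.5 misses one positive case, while `youden_cutoff` returns 0.4, which separates the two classes perfectly.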
Prediction of Compressional Wave Velocity Using Regression and Neural Network Modeling and Estimation of Stress Orientation in Bokaro Coalfield, India

Science.gov (United States)

Paul, Suman; Ali, Muhammad; Chatterjee, Rima

2018-01-01

The compressional-wave velocity (Vp) of coal and non-coal lithologies is predicted for five wells in the Bokaro coalfield (CF), India. Shear sonic travel-time logs were not recorded for all wells in the study area; shear-wave velocity (Vs) is available for only two wells, one from east and the other from west Bokaro CF. The major lithologies of this CF are dominated by the coal and shaly coal of the Barakar Formation. This paper focuses on (a) the relationship between Vp and Vs, (b) the prediction of Vp using regression and neural network modeling, and (c) the estimation of the maximum horizontal stress from an image log. Coal is characterized by low acoustic impedance (AI) compared with the overlying and underlying strata. The cross-plot of AI against Vp/Vs can distinguish coal, shaly coal, shale, and sandstone in wells of the Bokaro CF. The relationship between Vp and Vs is obtained with an excellent goodness of fit (R² from 0.90 to 0.93). Linear multiple regression and multi-layered feed-forward neural network (MLFN) models are developed to predict Vp in two wells from four input logs: gamma ray, resistivity, bulk density, and neutron porosity. Vp predicted by the regression model ranges from a poor fit (R² = 0.28) to a good fit (R² = 0.79) with the observed velocity, whereas Vp predicted by the MLFN model shows satisfactory to good fits, with R² from 0.62 to 0.92. The maximum horizontal stress orientation in a well in west Bokaro CF is studied from a Formation Micro-Imager (FMI) log, in which breakouts and drilling-induced fractures (DIFs) are identified. Breakouts with a length of 4.5 m are oriented towards N60°W, whereas DIFs with a cumulative length of 26.5 m are oriented between N15°E and N35°E. The mean maximum horizontal stress in this CF is oriented towards N28°E.
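The stress interpretation rests on two standard rules: borehole breakouts form parallel to the minimum horizontal stress, so SHmax is perpendicular to them, while drilling-induced fractures form parallel to SHmax. Because orientations are axial (N60°W and S60°E describe the same line), averaging them calls for the doubled-angle method. The helpers below are a hypothetical sketch, not the authors' procedure: they convert a breakout azimuth to the implied SHmax axis and compute an (optionally length-weighted) mean of axial orientations.

```python
import math


def to_axis(azimuth_deg):
    """Reduce an azimuth (degrees clockwise from north) to an axis in [0, 180)."""
    return azimuth_deg % 180.0


def shmax_from_breakout(breakout_azimuth_deg):
    """Breakouts form along the minimum horizontal stress, so SHmax is at 90 degrees."""
    return to_axis(breakout_azimuth_deg + 90.0)


def axial_mean(axes_deg, weights=None):
    """Weighted mean of axial orientations via the doubled-angle method."""
    if weights is None:
        weights = [1.0] * len(axes_deg)
    c = sum(w * math.cos(math.radians(2.0 * a)) for a, w in zip(axes_deg, weights))
    s = sum(w * math.sin(math.radians(2.0 * a)) for a, w in zip(axes_deg, weights))
    return (math.degrees(math.atan2(s, c)) / 2.0) % 180.0


# N60W is azimuth 300 degrees; the implied SHmax axis is N30E (30 degrees).
print(shmax_from_breakout(300.0))
# Mean of the two ends of the reported DIF range, N15E and N35E.
print(axial_mean([15.0, 35.0]))
```

`shmax_from_breakout(300.0)` gives 30.0, i.e., N30°E, which is consistent with the DIF range. Reproducing a figure like the reported N28°E would require the individual feature azimuths and lengths as weights, which the abstract does not give.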
Logic regression and its extensions.

Science.gov (United States)

Schwender, Holger; Ruczinski, Ingo

2010-01-01

Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors, when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus, reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, and thus, logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.
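The core idea of logic regression, scoring Boolean combinations of binary predictors against the outcome, can be illustrated by brute force over every two-literal AND/OR expression (each predictor optionally negated). This is a simplified stand-in: the actual method searches much larger logic trees with stochastic search (e.g., simulated annealing) and embeds them in a generalized linear model, whereas here the score is just the misclassification count for a binary outcome. The function names and the SNP example are hypothetical.

```python
from itertools import combinations, product


def literal(X, j, negate):
    """Values of binary predictor j (0/1) for every row, optionally negated."""
    return [1 - row[j] if negate else row[j] for row in X]


def best_pair_expression(X, y):
    """Exhaustively score every 'L1 op L2' expression over two distinct predictors.

    L1 and L2 are literals (a predictor or its negation) and op is AND or OR.
    Returns (errors, i, negate_i, op, j, negate_j) for the expression with the
    fewest misclassified rows; ties keep the first expression found.
    """
    ops = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b}
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        for neg_i, neg_j in product([False, True], repeat=2):
            a, b = literal(X, i, neg_i), literal(X, j, neg_j)
            for name, op in ops.items():
                pred = [op(u, v) for u, v in zip(a, b)]
                errors = sum(p != t for p, t in zip(pred, y))
                if best is None or errors < best[0]:
                    best = (errors, i, neg_i, name, j, neg_j)
    return best


# Hypothetical example: full factorial of four binary SNP indicators,
# with a case status that is 1 only when SNP 0 AND SNP 2 are present.
X_snps = [list(bits) for bits in product([0, 1], repeat=4)]
y_case = [row[0] & row[2] for row in X_snps]
print(best_pair_expression(X_snps, y_case))
```

Here the search returns `(0, 0, False, 'AND', 2, False)`, i.e., the interaction X0 AND X2 with no misclassifications, which no single-predictor main effect could capture.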
Bayesian binary regression model: an application to in-hospital death after AMI prediction

Directory of Open Access Journals (Sweden)

Aparecida D. P. Souza

2004-08-01

A Bayesian binary regression model is developed to predict in-hospital death of patients after acute myocardial infarction (AMI). Markov chain Monte Carlo (MCMC) methods are used for inference and to evaluate Bayesian binary regression models. A model-building strategy based on the Bayes factor is proposed, and aspects of model validation are discussed extensively, including the posterior distribution of the c-index and the analysis of residuals. Risk assessment based on variables easily available within minutes of the patient's arrival at the hospital is very important for deciding the course of treatment. The identified model proved highly reliable and accurate, with a correct-classification rate of 88% and a concordance index of 83%.
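For a binary outcome, the c-index is the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The minimal sketch below (hypothetical function names and toy data) also shows how a posterior distribution of the c-index can be obtained by evaluating it on each MCMC draw of predicted risks; the draws are assumed to come from an already fitted model.

```python
def c_index(risk_scores, died):
    """Concordance index for a binary outcome (equivalent to the ROC AUC).

    Over all (death, survivor) pairs, counts 1 when the patient who died has the
    higher predicted risk and 0.5 for ties.
    """
    cases = [s for s, d in zip(risk_scores, died) if d == 1]
    controls = [s for s, d in zip(risk_scores, died) if d == 0]
    score = 0.0
    for sc in cases:
        for sn in controls:
            if sc > sn:
                score += 1.0
            elif sc == sn:
                score += 0.5
    return score / (len(cases) * len(controls))


def posterior_c_index(risk_draws, died):
    """One c-index per MCMC draw of predicted risks, giving its posterior sample."""
    return [c_index(draw, died) for draw in risk_draws]


# Hypothetical outcomes and two posterior draws of predicted risk for 4 patients.
died = [1, 0, 1, 0]
draws = [[0.9, 0.8, 0.3, 0.2], [0.1, 0.8, 0.3, 0.2]]
print(posterior_c_index(draws, died))
```

The double loop is O(cases × survivors), which is adequate for hospital-sized cohorts; summaries such as the posterior mean or a credible interval can then be taken over the returned list.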
Advancing individual tree biomass prediction: assessment and alternatives to the component ratio method

Science.gov (United States)

Aaron Weiskittel; Jereme Frank; David Walker; Phil Radtke; David Macfarlane; James Westfall

2015-01-01

Predicting forest biomass and carbon is becoming an important issue in the United States. However, estimating forest biomass and carbon is difficult and relies on empirically derived regression equations. Based on recent findings from a national gap analysis and comprehensive assessment of the USDA Forest Service Forest Inventory and Analysis (USFS-FIA) component...
An integrated unscented kalman filter and relevance vector regression approach for lithium-ion battery remaining useful life and short-term capacity prediction

International Nuclear Information System (INIS)

Zheng, Xiujuan; Fang, Huajing

2015-01-01

The gradually decreasing capacity of lithium-ion batteries can serve as a health indicator for tracking their degradation. Predicting the capacity of a lithium-ion battery over future cycles is important for assessing its health condition and remaining useful life (RUL). In this paper, a novel method combining an unscented Kalman filter (UKF) with relevance vector regression (RVR) is developed and applied to RUL and short-term capacity prediction of batteries. An RVR model serves as a nonlinear time-series prediction model to predict the future UKF residuals, which would otherwise remain zero during the prediction period. Taking the prediction step into account, the RVR-predicted value and the latest real residual are combined through a time-varying weighting scheme to form the future evolution of the residuals. The UKF then uses these future residuals to recursively estimate the battery parameters for predicting RUL and short-term capacity. Finally, the performance of the proposed method is validated against experimental data and compared with other predictors. According to the experimental and analysis results, the proposed approach has high reliability and prediction accuracy; it can be applied to battery monitoring and prognostics and generalized to other prognostic applications.
Highlights:
• An integrated method is proposed for RUL prediction as well as short-term capacity prediction.
• A relevance vector regression model is employed as a nonlinear time-series prediction model.
• An unscented Kalman filter is used to recursively update the states for battery model parameters during the prediction.
• A time-varying weighting scheme is used to improve the accuracy of the RUL prediction.
• The proposed method demonstrates high reliability and prediction accuracy.
Prediction Equations of Energy Expenditure in Chinese Youth Based on Step Frequency during Walking and Running

Science.gov (United States)

Sun, Bo; Liu, Yu; Li, Jing Xian; Li, Haipeng; Chen, Peijie

2013-01-01

Purpose: This study set out to examine the relationship between step frequency and velocity to develop a step frequency-based equation to predict Chinese youth's energy expenditure (EE) during walking and running. Method: A total of 173 boys and girls aged 11 to 18 years old participated in this study. The participants walked and ran on a…
Genomic-Enabled Prediction Based on Molecular Markers and Pedigree Using the Bayesian Linear Regression Package in R

Directory of Open Access Journals (Sweden)

Paulino Pérez

2010-09-01

The availability of dense molecular markers has made possible the use of genomic selection in plant and animal breeding. However, models for genomic selection pose several computational and statistical challenges and require specialized computer programs that are not always available to the end user and are not yet implemented in standard statistical software. The R package BLR (Bayesian Linear Regression) implements several statistical procedures (e.g., Bayesian ridge regression, Bayesian LASSO) in a unified framework that allows marker genotypes and pedigree data to be included jointly. This article describes the classes of models implemented in the BLR package and illustrates their use through examples. Some challenges faced when applying genomic-enabled selection, such as model choice, evaluation of predictive ability through cross-validation, and choice of hyper-parameters, are also addressed.
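Bayesian ridge regression places a Gaussian prior on marker effects. With the variance components held fixed, the posterior mean of the effects has a closed form equal to ridge regression with λ = σ²ₑ/σ²_β. BLR itself samples the posterior (including the variances) by MCMC in R, so the Python sketch below, with hypothetical function names and simulated 0/1/2 genotypes, only illustrates the model and the cross-validated predictive ability mentioned in the abstract.

```python
import numpy as np


def ridge_posterior_mean(X, y, sigma2_e, sigma2_b):
    """Posterior mean of marker effects b in y = X b + e.

    Assumes e ~ N(0, sigma2_e * I) and prior b ~ N(0, sigma2_b * I) with both
    variances fixed; the result equals ridge regression with
    lambda = sigma2_e / sigma2_b.
    """
    lam = sigma2_e / sigma2_b
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def cv_predictive_correlation(X, y, sigma2_e, sigma2_b, k=5, seed=0):
    """k-fold cross-validated correlation between predicted and observed y."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    preds = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        b = ridge_posterior_mean(X[train_idx], y[train_idx], sigma2_e, sigma2_b)
        preds[test_idx] = X[test_idx] @ b
    return float(np.corrcoef(preds, y)[0, 1])


# Hypothetical data: 300 lines genotyped at 50 markers coded 0/1/2.
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(300, 50)).astype(float)
X = genotypes - genotypes.mean(axis=0)  # centred on the full sample for simplicity
true_b = rng.normal(0.0, 1.0, 50)
y = X @ true_b + rng.normal(0.0, 1.0, 300)
y = y - y.mean()

r_cv = cv_predictive_correlation(X, y, sigma2_e=1.0, sigma2_b=1.0)
print(f"5-fold CV correlation: {r_cv:.3f}")
```

A smaller prior variance σ²_β means stronger shrinkage of marker effects towards zero; in practice, with many more markers than lines, that shrinkage is what keeps the problem well posed.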
