WorldWideScience

Sample records for regression models multiple

  1. Entrepreneurial intention modeling using hierarchical multiple regression

    Directory of Open Access Journals (Sweden)

    Marina Jeger

    2014-12-01

    Full Text Available The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.

  2. A test for the parameters of multiple linear regression models ...

    African Journals Online (AJOL)

    A test for the parameters of multiple linear regression models is developed for conducting tests simultaneously on all the parameters of multiple linear regression models. The test is robust relative to the assumptions of homogeneity of variances and absence of serial correlation of the classical F-test. Under certain null and ...

  3. Multiple regression and beyond an introduction to multiple regression and structural equation modeling

    CERN Document Server

    Keith, Timothy Z

    2014-01-01

    Multiple Regression and Beyond offers a conceptually oriented introduction to multiple regression (MR) analysis and structural equation modeling (SEM), along with analyses that flow naturally from those methods. By focusing on the concepts and purposes of MR and related methods, rather than the derivation and calculation of formulae, this book introduces material to students more clearly, and in a less threatening way. In addition to illuminating content necessary for coursework, the accessibility of this approach means students are more likely to be able to conduct research using MR or SEM--and more likely to use the methods wisely. Covers both MR and SEM, while explaining their relevance to one another Also includes path analysis, confirmatory factor analysis, and latent growth modeling Figures and tables throughout provide examples and illustrate key concepts and techniques For additional resources, please visit: http://tzkeith.com/.

  4. Multiple Response Regression for Gaussian Mixture Models with Known Labels.

    Science.gov (United States)

    Lee, Wonyul; Du, Ying; Sun, Wei; Hayes, D Neil; Liu, Yufeng

    2012-12-01

    Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes.

  5. Direction of Effects in Multiple Linear Regression Models.

    Science.gov (United States)

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.

  6. An Additive-Multiplicative Cox-Aalen Regression Model

    DEFF Research Database (Denmark)

    Scheike, Thomas H.; Zhang, Mei-Jie

    2002-01-01

    Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects......Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects...

  7. 231 Using Multiple Regression Analysis in Modelling the Role of ...

    African Journals Online (AJOL)

    User

    of Internal Revenue, Tourism Bureau and hotel records. The multiple regression .... additional guest facilities such as restaurant, a swimming pool or child care and social function ... and provide good quality service to the public. Conclusion.

  8. Modeling Pan Evaporation for Kuwait by Multiple Linear Regression

    Science.gov (United States)

    Almedeij, Jaber

    2012-01-01

    Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values. PMID:23226984

  9. Multiple Linear Regression Model for Estimating the Price of a ...

    African Journals Online (AJOL)

    Ghana Mining Journal ... In the modeling, the Ordinary Least Squares (OLS) normality assumption which could introduce errors in the statistical analyses was dealt with by log transformation of the data, ensuring the data is normally ... The resultant MLRM is: Ŷi MLRM = (X'X)-1X'Y(xi') where X is the sample data matrix.

  10. Multiple linear regression and regression with time series error models in forecasting PM10 concentrations in Peninsular Malaysia.

    Science.gov (United States)

    Ng, Kar Yong; Awang, Norhashidah

    2018-01-06

    Frequent haze occurrences in Malaysia have made the management of PM 10 (particulate matter with aerodynamic less than 10 μm) pollution a critical task. This requires knowledge on factors associating with PM 10 variation and good forecast of PM 10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM 10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built. They were multiple linear regression (MLR) model with lagged predictor variables (MLR1), MLR model with lagged predictor variables and PM 10 concentrations (MLR2) and regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM 10 variation in Peninsular Malaysia. Comparison among the three models showed that MLR2 model was on a same level with RTSE model in terms of forecasting accuracy, while MLR1 model was the worst.

  11. SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

    Science.gov (United States)

    As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...

  12. A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

    Science.gov (United States)

    Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

    2018-04-01

    In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.

  13. ANALYSIS OF THE FINANCIAL PERFORMANCES OF THE FIRM, BY USING THE MULTIPLE REGRESSION MODEL

    Directory of Open Access Journals (Sweden)

    Constantin Anghelache

    2011-11-01

    Full Text Available The information achieved through the use of simple linear regression are not always enough to characterize the evolution of an economic phenomenon and, furthermore, to identify its possible future evolution. To remedy these drawbacks, the special literature includes multiple regression models, in which the evolution of the dependant variable is defined depending on two or more factorial variables.

  14. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

    Science.gov (United States)

    Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.

    2006-01-01

    Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…

  15. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

    Science.gov (United States)

    Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...

  16. Use of multiple linear regression and logistic regression models to investigate changes in birthweight for term singleton infants in Scotland.

    Science.gov (United States)

    Bonellie, Sandra R

    2012-10-01

    To illustrate the use of regression and logistic regression models to investigate changes over time in size of babies particularly in relation to social deprivation, age of the mother and smoking. Mean birthweight has been found to be increasing in many countries in recent years, but there are still a group of babies who are born with low birthweights. Population-based retrospective cohort study. Multiple linear regression and logistic regression models are used to analyse data on term 'singleton births' from Scottish hospitals between 1994-2003. Mothers who smoke are shown to give birth to lighter babies on average, a difference of approximately 0.57 Standard deviations lower (95% confidence interval. 0.55-0.58) when adjusted for sex and parity. These mothers are also more likely to have babies that are low birthweight (odds ratio 3.46, 95% confidence interval 3.30-3.63) compared with non-smokers. Low birthweight is 30% more likely where the mother lives in the most deprived areas compared with the least deprived, (odds ratio 1.30, 95% confidence interval 1.21-1.40). Smoking during pregnancy is shown to have a detrimental effect on the size of infants at birth. This effect explains some, though not all, of the observed socioeconomic birthweight. It also explains much of the observed birthweight differences by the age of the mother.   Identifying mothers at greater risk of having a low birthweight baby as important implications for the care and advice this group receives. © 2012 Blackwell Publishing Ltd.

  17. [Prediction model of health workforce and beds in county hospitals of Hunan by multiple linear regression].

    Science.gov (United States)

    Ling, Ru; Liu, Jiawang

    2011-12-01

    To construct prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires,and multiple linear regression analysis with 20 quotas selected by literature view was done. Independent variables in the multiple linear regression model on medical personnels in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the the population of aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory and fitting, and may be used for short- and mid-term forecasting.

  18. Clinical trials: odds ratios and multiple regression models--why and how to assess them

    NARCIS (Netherlands)

    Sobh, Mohamad; Cleophas, Ton J.; Hadj-Chaib, Amel; Zwinderman, Aeilko H.

    2008-01-01

    Odds ratios (ORs), unlike chi2 tests, provide direct insight into the strength of the relationship between treatment modalities and treatment effects. Multiple regression models can reduce the data spread due to certain patient characteristics and thus improve the precision of the treatment

  19. Testing Mediation Using Multiple Regression and Structural Equation Modeling Analyses in Secondary Data

    Science.gov (United States)

    Li, Spencer D.

    2011-01-01

    Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…

  20. Multiple regression models for energy use in air-conditioned office buildings in different climates

    International Nuclear Information System (INIS)

    Lam, Joseph C.; Wan, Kevin K.W.; Liu Dalong; Tsang, C.L.

    2010-01-01

    An attempt was made to develop multiple regression models for office buildings in the five major climates in China - severe cold, cold, hot summer and cold winter, mild, and hot summer and warm winter. A total of 12 key building design variables were identified through parametric and sensitivity analysis, and considered as inputs in the regression models. The coefficient of determination R 2 varies from 0.89 in Harbin to 0.97 in Kunming, indicating that 89-97% of the variations in annual building energy use can be explained by the changes in the 12 parameters. A pseudo-random number generator based on three simple multiplicative congruential generators was employed to generate random designs for evaluation of the regression models. The difference between regression-predicted and DOE-simulated annual building energy use are largely within 10%. It is envisaged that the regression models developed can be used to estimate the likely energy savings/penalty during the initial design stage when different building schemes and design concepts are being considered.

  1. Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis

    Directory of Open Access Journals (Sweden)

    BUDIMAN

    2012-01-01

    Full Text Available Budiman, Arisoesilaningsih E. 2012. Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis. Biodiversitas 13: 18-22. The aims of this research was to determine the multiple regression models of vegetative and corm growth of Amorphophallus muelleri Blume in some age variations and habitat conditions of agroforestry in East Java. Descriptive exploratory research method was conducted by systematic random sampling at five agroforestries on four plantations in East Java: Saradan, Bojonegoro, Nganjuk and Blitar. In each agroforestry, we observed A. muelleri vegetative and corm growth on four growing age (1, 2, 3 and 4 years old respectively as well as environmental variables such as altitude, vegetation, climate and soil conditions. Data were analyzed using descriptive statistics to compare A. muelleri habitat in five agroforestries. Meanwhile, the influence and contribution of each environmental variable to the growth of A. muelleri vegetative and corm were determined using multiple regression analysis of SPSS 17.0. The multiple regression models of A. muelleri vegetative and corm growth were generated based on some characteristics of agroforestries and age showed high validity with R2 = 88-99%. Regression model showed that age, monthly temperatures, percentage of radiation and soil calcium (Ca content either simultaneously or partially determined the growth of A. muelleri vegetative and corm. Based on these models, the A. muelleri corm reached the optimal growth after four years of cultivation and they will be ready to be harvested. Additionally, the soil Ca content should reach 25.3 me.hg-1 as Sugihwaras agroforestry, with the maximal radiation of 60%.

  2. Multivariate Multiple Regression Models for a Big Data-Empowered SON Framework in Mobile Wireless Networks

    Directory of Open Access Journals (Sweden)

    Yoonsu Shin

    2016-01-01

    Full Text Available In the 5G era, the operational cost of mobile wireless networks will significantly increase. Further, massive network capacity and zero latency will be needed because everything will be connected to mobile networks. Thus, self-organizing networks (SON are needed, which expedite automatic operation of mobile wireless networks, but have challenges to satisfy the 5G requirements. Therefore, researchers have proposed a framework to empower SON using big data. The recent framework of a big data-empowered SON analyzes the relationship between key performance indicators (KPIs and related network parameters (NPs using machine-learning tools, and it develops regression models using a Gaussian process with those parameters. The problem, however, is that the methods of finding the NPs related to the KPIs differ individually. Moreover, the Gaussian process regression model cannot determine the relationship between a KPI and its various related NPs. In this paper, to solve these problems, we proposed multivariate multiple regression models to determine the relationship between various KPIs and NPs. If we assume one KPI and multiple NPs as one set, the proposed models help us process multiple sets at one time. Also, we can find out whether some KPIs are conflicting or not. We implement the proposed models using MapReduce.

  3. Determination of osteoporosis risk factors using a multiple logistic regression model in postmenopausal Turkish women.

    Science.gov (United States)

    Akkus, Zeki; Camdeviren, Handan; Celik, Fatma; Gur, Ali; Nas, Kemal

    2005-09-01

    To determine the risk factors of osteoporosis using a multiple binary logistic regression method and to assess the risk variables for osteoporosis, which is a major and growing health problem in many countries. We presented a case-control study, consisting of 126 postmenopausal healthy women as control group and 225 postmenopausal osteoporotic women as the case group. The study was carried out in the Department of Physical Medicine and Rehabilitation, Dicle University, Diyarbakir, Turkey between 1999-2002. The data from the 351 participants were collected using a standard questionnaire that contains 43 variables. A multiple logistic regression model was then used to evaluate the data and to find the best regression model. We classified 80.1% (281/351) of the participants using the regression model. Furthermore, the specificity value of the model was 67% (84/126) of the control group while the sensitivity value was 88% (197/225) of the case group. We found the distribution of residual values standardized for final model to be exponential using the Kolmogorow-Smirnow test (p=0.193). The receiver operating characteristic curve was found successful to predict patients with risk for osteoporosis. This study suggests that low levels of dietary calcium intake, physical activity, education, and longer duration of menopause are independent predictors of the risk of low bone density in our population. Adequate dietary calcium intake in combination with maintaining a daily physical activity, increasing educational level, decreasing birth rate, and duration of breast-feeding may contribute to healthy bones and play a role in practical prevention of osteoporosis in Southeast Anatolia. In addition, the findings of the present study indicate that the use of multivariate statistical method as a multiple logistic regression in osteoporosis, which maybe influenced by many variables, is better than univariate statistical evaluation.

  4. Application of Soft Computing Techniques and Multiple Regression Models for CBR prediction of Soils

    Directory of Open Access Journals (Sweden)

    Fatimah Khaleel Ibrahim

    2017-08-01

    Full Text Available The techniques of soft computing technique such as Artificial Neutral Network (ANN have improved the predicting capability and have actually discovered application in Geotechnical engineering. The aim of this research is to utilize the soft computing technique and Multiple Regression Models (MLR for forecasting the California bearing ratio CBR( of soil from its index properties. The indicator of CBR for soil could be predicted from various soils characterizing parameters with the assist of MLR and ANN methods. The data base that collected from the laboratory by conducting tests on 86 soil samples that gathered from different projects in Basrah districts. Data gained from the experimental result were used in the regression models and soft computing techniques by using artificial neural network. The liquid limit, plastic index , modified compaction test and the CBR test have been determined. In this work, different ANN and MLR models were formulated with the different collection of inputs to be able to recognize their significance in the prediction of CBR. The strengths of the models that were developed been examined in terms of regression coefficient (R2, relative error (RE% and mean square error (MSE values. From the results of this paper, it absolutely was noticed that all the proposed ANN models perform better than that of MLR model. In a specific ANN model with all input parameters reveals better outcomes than other ANN models.

  5. Multiple linear regression analysis

    Science.gov (United States)

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.

  6. Estimating leaf photosynthetic pigments information by stepwise multiple linear regression analysis and a leaf optical model

    Science.gov (United States)

    Liu, Pudong; Shi, Runhe; Wang, Hong; Bai, Kaixu; Gao, Wei

    2014-10-01

    Leaf pigments are key elements for plant photosynthesis and growth. Traditional manual sampling of these pigments is labor-intensive and costly, which also has the difficulty in capturing their temporal and spatial characteristics. The aim of this work is to estimate photosynthetic pigments at large scale by remote sensing. For this purpose, inverse model were proposed with the aid of stepwise multiple linear regression (SMLR) analysis. Furthermore, a leaf radiative transfer model (i.e. PROSPECT model) was employed to simulate the leaf reflectance where wavelength varies from 400 to 780 nm at 1 nm interval, and then these values were treated as the data from remote sensing observations. Meanwhile, simulated chlorophyll concentration (Cab), carotenoid concentration (Car) and their ratio (Cab/Car) were taken as target to build the regression model respectively. In this study, a total of 4000 samples were simulated via PROSPECT with different Cab, Car and leaf mesophyll structures as 70% of these samples were applied for training while the last 30% for model validation. Reflectance (r) and its mathematic transformations (1/r and log (1/r)) were all employed to build regression model respectively. Results showed fair agreements between pigments and simulated reflectance with all adjusted coefficients of determination (R2) larger than 0.8 as 6 wavebands were selected to build the SMLR model. The largest value of R2 for Cab, Car and Cab/Car are 0.8845, 0.876 and 0.8765, respectively. Meanwhile, mathematic transformations of reflectance showed little influence on regression accuracy. We concluded that it was feasible to estimate the chlorophyll and carotenoids and their ratio based on statistical model with leaf reflectance data.

  7. Reduction of interferences in graphite furnace atomic absorption spectrometry by multiple linear regression modelling

    Science.gov (United States)

    Grotti, Marco; Abelmoschi, Maria Luisa; Soggia, Francesco; Tiberiade, Christian; Frache, Roberto

    2000-12-01

    The multivariate effects of Na, K, Mg and Ca as nitrates on the electrothermal atomisation of manganese, cadmium and iron were studied by multiple linear regression modelling. Since the models proved to efficiently predict the effects of the considered matrix elements in a wide range of concentrations, they were applied to correct the interferences occurring in the determination of trace elements in seawater after pre-concentration of the analytes. In order to obtain a statistically significant number of samples, a large volume of the certified seawater reference materials CASS-3 and NASS-3 was treated with Chelex-100 resin; then, the chelating resin was separated from the solution, divided into several sub-samples, each of them was eluted with nitric acid and analysed by electrothermal atomic absorption spectrometry (for trace element determinations) and inductively coupled plasma optical emission spectrometry (for matrix element determinations). To minimise any other systematic error besides that due to matrix effects, accuracy of the pre-concentration step and contamination levels of the procedure were checked by inductively coupled plasma mass spectrometric measurements. Analytical results obtained by applying the multiple linear regression models were compared with those obtained with other calibration methods, such as external calibration using acid-based standards, external calibration using matrix-matched standards and the analyte addition technique. Empirical models proved to efficiently reduce interferences occurring in the analysis of real samples, allowing an improvement of accuracy better than for other calibration methods.

  8. Waste generated in high-rise buildings construction: a quantification model based on statistical multiple regression.

    Science.gov (United States)

    Parisi Kern, Andrea; Ferreira Dias, Michele; Piva Kulakowski, Marlova; Paulo Gomes, Luciana

    2015-05-01

    Reducing construction waste is becoming a key environmental issue in the construction industry. The quantification of waste generation rates in the construction sector is an invaluable management tool in supporting mitigation actions. However, the quantification of waste can be a difficult process because of the specific characteristics and the wide range of materials used in different construction projects. Large variations are observed in the methods used to predict the amount of waste generated because of the range of variables involved in construction processes and the different contexts in which these methods are employed. This paper proposes a statistical model to determine the amount of waste generated in the construction of high-rise buildings by assessing the influence of design process and production system, often mentioned as the major culprits behind the generation of waste in construction. Multiple regression was used to conduct a case study based on multiple sources of data of eighteen residential buildings. The resulting statistical model produced dependent (i.e. amount of waste generated) and independent variables associated with the design and the production system used. The best regression model obtained from the sample data resulted in an adjusted R(2) value of 0.694, which means that it predicts approximately 69% of the factors involved in the generation of waste in similar constructions. Most independent variables showed a low determination coefficient when assessed in isolation, which emphasizes the importance of assessing their joint influence on the response (dependent) variable. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Model selection with multiple regression on distance matrices leads to incorrect inferences.

    Directory of Open Access Journals (Sweden)

    Ryan P Franckowiak

    Full Text Available In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC, its small-sample correction (AICc, and the Bayesian information criterion (BIC to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems effectively increased with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and different sum-of-squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.

  10. 10 km running performance predicted by a multiple linear regression model with allometrically adjusted variables.

    Science.gov (United States)

    Abad, Cesar C C; Barros, Ronaldo V; Bertuzzi, Romulo; Gagliardi, João F L; Lima-Silva, Adriano E; Lambert, Mike I; Pires, Flavio O

    2016-06-01

    The aim of this study was to verify the power of VO 2max , peak treadmill running velocity (PTV), and running economy (RE), unadjusted or allometrically adjusted, in predicting 10 km running performance. Eighteen male endurance runners performed: 1) an incremental test to exhaustion to determine VO 2max and PTV; 2) a constant submaximal run at 12 km·h -1 on an outdoor track for RE determination; and 3) a 10 km running race. Unadjusted (VO 2max , PTV and RE) and adjusted variables (VO 2max 0.72 , PTV 0.72 and RE 0.60 ) were investigated through independent multiple regression models to predict 10 km running race time. There were no significant correlations between 10 km running time and either the adjusted or unadjusted VO 2max . Significant correlations (p 0.84 and power > 0.88. The allometrically adjusted predictive model was composed of PTV 0.72 and RE 0.60 and explained 83% of the variance in 10 km running time with a standard error of the estimate (SEE) of 1.5 min. The unadjusted model composed of a single PVT accounted for 72% of the variance in 10 km running time (SEE of 1.9 min). Both regression models provided powerful estimates of 10 km running time; however, the unadjusted PTV may provide an uncomplicated estimation.

  11. Mechanical property changes during neonatal development and healing using a multiple regression model.

    Science.gov (United States)

    Ansorge, Heather L; Adams, Sheila; Jawad, Abbas F; Birk, David E; Soslowsky, Louis J

    2012-04-30

    During neonatal development, tendons undergo a well orchestrated process whereby extensive structural and compositional changes occur in synchrony to produce a normal tissue. Conversely, during the repair response to injury, structural and compositional changes occur, but a mechanically inferior tendon is produced. As a result, developmental processes have been postulated as a potential paradigm for elucidation of mechanistic insight required to develop treatment modalities to improve adult tissue healing. The objective of this study was to compare and contrast normal development with injury during early and late developmental healing. Using backwards multiple linear regressions, quantitative and objective information was obtained into the structure-function relationships in tendon. Specifically, proteoglycans were shown to be significant predictors of modulus during early developmental healing but not during late developmental healing or normal development. Multiple independent parameters predicted percent relaxation during normal development, however, only biglycan and fibril diameter parameters predicted percent relaxation during early developmental healing. Lastly, multiple differential predictors were observed between early development and early developmental healing; however, no differential predictors were observed between late development and late developmental healing. This study presents a model through which objective analysis of how compositional and structural parameters that affect the development of mechanical parameters can be quantitatively measured. In addition, information from this study can be used to develop new treatment and therapies through which improved adult tendon healing can be obtained. Copyright © 2012 Elsevier Ltd. All rights reserved.

  12. Thermodynamic Analysis of Simple Gas Turbine Cycle with Multiple Regression Modelling and Optimization

    Directory of Open Access Journals (Sweden)

    Abdul Ghafoor Memon

    2014-03-01

    Full Text Available In this study, thermodynamic and statistical analyses were performed on a gas turbine system, to assess the impact of some important operating parameters like CIT (Compressor Inlet Temperature, PR (Pressure Ratio and TIT (Turbine Inlet Temperature on its performance characteristics such as net power output, energy efficiency, exergy efficiency and fuel consumption. Each performance characteristic was enunciated as a function of operating parameters, followed by a parametric study and optimization. The results showed that the performance characteristics increase with an increase in the TIT and a decrease in the CIT, except fuel consumption which behaves oppositely. The net power output and efficiencies increase with the PR up to certain initial values and then start to decrease, whereas the fuel consumption always decreases with an increase in the PR. The results of exergy analysis showed the combustion chamber as a major contributor to the exergy destruction, followed by stack gas. Subsequently, multiple regression models were developed to correlate each of the response variables (performance characteristic with the predictor variables (operating parameters. The regression model equations showed a significant statistical relationship between the predictor and response variables.

  13. Parameter estimation of multivariate multiple regression model using bayesian with non-informative Jeffreys’ prior distribution

    Science.gov (United States)

    Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

    2018-05-01

    Bayesian method is a method that can be used to estimate the parameters of multivariate multiple regression model. Bayesian method has two distributions, there are prior and posterior distributions. Posterior distribution is influenced by the selection of prior distribution. Jeffreys’ prior distribution is a kind of Non-informative prior distribution. This prior is used when the information about parameter not available. Non-informative Jeffreys’ prior distribution is combined with the sample information resulting the posterior distribution. Posterior distribution is used to estimate the parameter. The purposes of this research is to estimate the parameters of multivariate regression model using Bayesian method with Non-informative Jeffreys’ prior distribution. Based on the results and discussion, parameter estimation of β and Σ which were obtained from expected value of random variable of marginal posterior distribution function. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart. However, in calculation of the expected value involving integral of a function which difficult to determine the value. Therefore, approach is needed by generating of random samples according to the posterior distribution characteristics of each parameter using Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.

  14. Stepwise multiple regression method of greenhouse gas emission modeling in the energy sector in Poland.

    Science.gov (United States)

    Kolasa-Wiecek, Alicja

    2015-04-01

    The energy sector in Poland is the source of 81% of greenhouse gas (GHG) emissions. Poland, among other European Union countries, occupies a leading position with regard to coal consumption. Polish energy sector actively participates in efforts to reduce GHG emissions to the atmosphere, through a gradual decrease of the share of coal in the fuel mix and development of renewable energy sources. All evidence which completes the knowledge about issues related to GHG emissions is a valuable source of information. The article presents the results of modeling of GHG emissions which are generated by the energy sector in Poland. For a better understanding of the quantitative relationship between total consumption of primary energy and greenhouse gas emission, multiple stepwise regression model was applied. The modeling results of CO2 emissions demonstrate a high relationship (0.97) with the hard coal consumption variable. Adjustment coefficient of the model to actual data is high and equal to 95%. The backward step regression model, in the case of CH4 emission, indicated the presence of hard coal (0.66), peat and fuel wood (0.34), solid waste fuels, as well as other sources (-0.64) as the most important variables. The adjusted coefficient is suitable and equals R2=0.90. For N2O emission modeling the obtained coefficient of determination is low and equal to 43%. A significant variable influencing the amount of N2O emission is the peat and wood fuel consumption. Copyright © 2015. Published by Elsevier B.V.

  15. An Application of Robust Method in Multiple Linear Regression Model toward Credit Card Debt

    Science.gov (United States)

    Amira Azmi, Nur; Saifullah Rusiman, Mohd; Khalid, Kamil; Roslan, Rozaini; Sufahani, Suliadi; Mohamad, Mahathir; Salleh, Rohayu Mohd; Hamzah, Nur Shamsidah Amir

    2018-04-01

    Credit card is a convenient alternative replaced cash or cheque, and it is essential component for electronic and internet commerce. In this study, the researchers attempt to determine the relationship and significance variables between credit card debt and demographic variables such as age, household income, education level, years with current employer, years at current address, debt to income ratio and other debt. The provided data covers 850 customers information. There are three methods that applied to the credit card debt data which are multiple linear regression (MLR) models, MLR models with least quartile difference (LQD) method and MLR models with mean absolute deviation method. After comparing among three methods, it is found that MLR model with LQD method became the best model with the lowest value of mean square error (MSE). According to the final model, it shows that the years with current employer, years at current address, household income in thousands and debt to income ratio are positively associated with the amount of credit debt. Meanwhile variables for age, level of education and other debt are negatively associated with amount of credit debt. This study may serve as a reference for the bank company by using robust methods, so that they could better understand their options and choice that is best aligned with their goals for inference regarding to the credit card debt.

  16. Step-up multiple regression model to compute Chlorophyll a in the coastal waters off Cochin, southwest coast of India

    Digital Repository Service at National Institute of Oceanography (India)

    Balachandran, K.K.; Jayalakshmy, K.V.; Laluraj, C.M.; Nair, M.; Joseph, T.; Sheeba, P.

    The interaction effects of abiotic processes in the production of phytoplankton in a coastal marine region off Cochin are evaluated using multiple regression models. The study shows that chlorophyll production is not limited by nutrients...

  17. Performance Prediction Modelling for Flexible Pavement on Low Volume Roads Using Multiple Linear Regression Analysis

    Directory of Open Access Journals (Sweden)

    C. Makendran

    2015-01-01

    Full Text Available Prediction models for low volume village roads in India are developed to evaluate the progression of different types of distress such as roughness, cracking, and potholes. Even though the Government of India is investing huge quantum of money on road construction every year, poor control over the quality of road construction and its subsequent maintenance is leading to the faster road deterioration. In this regard, it is essential that scientific maintenance procedures are to be evolved on the basis of performance of low volume flexible pavements. Considering the above, an attempt has been made in this research endeavor to develop prediction models to understand the progression of roughness, cracking, and potholes in flexible pavements exposed to least or nil routine maintenance. Distress data were collected from the low volume rural roads covering about 173 stretches spread across Tamil Nadu state in India. Based on the above collected data, distress prediction models have been developed using multiple linear regression analysis. Further, the models have been validated using independent field data. It can be concluded that the models developed in this study can serve as useful tools for the practicing engineers maintaining flexible pavements on low volume roads.

  18. Multiple regression equations modelling of groundwater of Ajmer-Pushkar railway line region, Rajasthan (India).

    Science.gov (United States)

    Mathur, Praveen; Sharma, Sarita; Soni, Bhupendra

    2010-01-01

    In the present work, an attempt is made to formulate multiple regression equations using all possible regressions method for groundwater quality assessment of Ajmer-Pushkar railway line region in pre- and post-monsoon seasons. Correlation studies revealed the existence of linear relationships (r 0.7) for electrical conductivity (EC), total hardness (TH) and total dissolved solids (TDS) with other water quality parameters. The highest correlation was found between EC and TDS (r = 0.973). EC showed highly significant positive correlation with Na, K, Cl, TDS and total solids (TS). TH showed highest correlation with Ca and Mg. TDS showed significant correlation with Na, K, SO4, PO4 and Cl. The study indicated that most of the contamination present was water soluble or ionic in nature. Mg was present as MgCl2; K mainly as KCl and K2SO4, and Na was present as the salts of Cl, SO4 and PO4. On the other hand, F and NO3 showed no significant correlations. The r2 values and F values (at 95% confidence limit, alpha = 0.05) for the modelled equations indicated high degree of linearity among independent and dependent variables. Also the error % between calculated and experimental values was contained within +/- 15% limit.

  19. Advanced statistics: linear regression, part II: multiple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.

  20. Modeling of Soil Aggregate Stability using Support Vector Machines and Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Ali Asghar Besalatpour

    2016-02-01

    by 20-m digital elevation model (DEM. The data set was divided into two subsets of training and testing. The training subset was randomly chosen from 70% of the total set of the data and the remaining samples (30% of the data were used as the testing set. The correlation coefficient (r, mean square error (MSE, and error percentage (ERROR% between the measured and the predicted GMD values were used to evaluate the performance of the models. Results and Discussion: The description statistics showed that there was little variability in the sample distributions of the variables used in this study to develop the GMD prediction models, indicating that their values were all normally distributed. The constructed SVM model had better performance in predicting GMD compared to the traditional multiple linear regression model. The obtained MSE and r values for the developed SVM model for soil aggregate stability prediction were 0.005 and 0.86, respectively. The obtained ERROR% value for soil aggregate stability prediction using the SVM model was 10.7% while it was 15.7% for the regression model. The scatter plot figures also showed that the SVM model was more accurate in GMD estimation than the MLR model, since the predicted GMD values were closer in agreement with the measured values for most of the samples. The worse performance of the MLR model might be due to the larger amount of data that is required for developing a sustainable regression model compared to intelligent systems. Furthermore, only the linear effects of the predictors on the dependent variable can be extracted by linear models while in many cases the effects may not be linear in nature. Meanwhile, the SVM model is suitable for modelling nonlinear relationships and its major advantage is that the method can be developed without knowing the exact form of the analytical function on which the model should be built. All these indicate that the SVM approach would be a better choice for predicting soil aggregate

  1. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis

    NARCIS (Netherlands)

    Eekhout, I.; Wiel, M.A. van de; Heymans, M.W.

    2017-01-01

    Background. Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels

  2. Multiple regression analysis in modelling of carbon dioxide emissions by energy consumption use in Malaysia

    Science.gov (United States)

    Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat

    2015-04-01

    Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue considered as a great and international concern that primary attributed from different fossil fuels. In this paper, regression model is used for analyzing the causal relationship among CO2 emissions based on the energy consumption in Malaysia using time series data for the period of 1980-2010. The equations were developed using regression model based on the eight major sources that contribute to the CO2 emissions such as non energy, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil and motor petrol. The related data partly used for predict the regression model (1980-2000) and partly used for validate the regression model (2001-2010). The results of the prediction model with the measured data showed a high correlation coefficient (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning of the population to comply with air quality standards.

  3. SPECIFICS OF THE APPLICATIONS OF MULTIPLE REGRESSION MODEL IN THE ANALYSES OF THE EFFECTS OF GLOBAL FINANCIAL CRISES

    Directory of Open Access Journals (Sweden)

    Željko V. Račić

    2010-12-01

    Full Text Available This paper aims to present the specifics of the application of multiple linear regression model. The economic (financial crisis is analyzed in terms of gross domestic product which is in a function of the foreign trade balance (on one hand and the credit cards, i.e. indebtedness of the population on this basis (on the other hand, in the USA (from 1999. to 2008. We used the extended application model which shows how the analyst should run the whole development process of regression model. This process began with simple statistical features and the application of regression procedures, and ended with residual analysis, intended for the study of compatibility of data and model settings. This paper also analyzes the values of some standard statistics used in the selection of appropriate regression model. Testing of the model is carried out with the use of the Statistics PASW 17 program.

  4. The Development and Demonstration of Multiple Regression Models for Operant Conditioning Questions.

    Science.gov (United States)

    Fanning, Fred; Newman, Isadore

    Based on the assumption that inferential statistics can make the operant conditioner more sensitive to possible significant relationships, regressions models were developed to test the statistical significance between slopes and Y intercepts of the experimental and control group subjects. These results were then compared to the traditional operant…

  5. A novel simple QSAR model for the prediction of anti-HIV activity using multiple linear regression analysis.

    Science.gov (United States)

    Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga

    2006-08-01

    A quantitative-structure activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R (2) (CV) = 0.8160; S (PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.

  6. A methodology for the design of experiments in computational intelligence with multiple regression models.

    Science.gov (United States)

    Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

  7. A methodology for the design of experiments in computational intelligence with multiple regression models

    Directory of Open Access Journals (Sweden)

    Carlos Fernandez-Lozano

    2016-12-01

    Full Text Available The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

  8. Soft Sensor Modeling Based on Multiple Gaussian Process Regression and Fuzzy C-mean Clustering

    Directory of Open Access Journals (Sweden)

    Xianglin ZHU

    2014-06-01

    Full Text Available In order to overcome the difficulties of online measurement of some crucial biochemical variables in fermentation processes, a new soft sensor modeling method is presented based on the Gaussian process regression and fuzzy C-mean clustering. With the consideration that the typical fermentation process can be distributed into 4 phases including lag phase, exponential growth phase, stable phase and dead phase, the training samples are classified into 4 subcategories by using fuzzy C- mean clustering algorithm. For each sub-category, the samples are trained using the Gaussian process regression and the corresponding soft-sensing sub-model is established respectively. For a new sample, the membership between this sample and sub-models are computed based on the Euclidean distance, and then the prediction output of soft sensor is obtained using the weighting sum. Taking the Lysine fermentation as example, the simulation and experiment are carried out and the corresponding results show that the presented method achieves better fitting and generalization ability than radial basis function neutral network and single Gaussian process regression model.

  9. Latent Variable Regression 4-Level Hierarchical Model Using Multisite Multiple-Cohorts Longitudinal Data. CRESST Report 801

    Science.gov (United States)

    Choi, Kilchan

    2011-01-01

    This report explores a new latent variable regression 4-level hierarchical model for monitoring school performance over time using multisite multiple-cohorts longitudinal data. This kind of data set has a 4-level hierarchical structure: time-series observation nested within students who are nested within different cohorts of students. These…

  10. Multiple Linear Regression Model Based on Neural Network and Its Application in the MBR Simulation

    Directory of Open Access Journals (Sweden)

    Chunqing Li

    2012-01-01

    Full Text Available The computer simulation of the membrane bioreactor MBR has become the research focus of the MBR simulation. In order to compensate for the defects, for example, long test period, high cost, invisible equipment seal, and so forth, on the basis of conducting in-depth study of the mathematical model of the MBR, combining with neural network theory, this paper proposed a three-dimensional simulation system for MBR wastewater treatment, with fast speed, high efficiency, and good visualization. The system is researched and developed with the hybrid programming of VC++ programming language and OpenGL, with a multifactor linear regression model of affecting MBR membrane fluxes based on neural network, applying modeling method of integer instead of float and quad tree recursion. The experiments show that the three-dimensional simulation system, using the above models and methods, has the inspiration and reference for the future research and application of the MBR simulation technology.

  11. Modeling daily soil temperature over diverse climate conditions in Iran—a comparison of multiple linear regression and support vector regression techniques

    Science.gov (United States)

    Delbari, Masoomeh; Sharifazari, Salman; Mohammadi, Ehsan

    2018-02-01

    The knowledge of soil temperature at different depths is important for agricultural industry and for understanding climate change. The aim of this study is to evaluate the performance of a support vector regression (SVR)-based model in estimating daily soil temperature at 10, 30 and 100 cm depth at different climate conditions over Iran. The obtained results were compared to those obtained from a more classical multiple linear regression (MLR) model. The correlation sensitivity for the input combinations and periodicity effect were also investigated. Climatic data used as inputs to the models were minimum and maximum air temperature, solar radiation, relative humidity, dew point, and the atmospheric pressure (reduced to see level), collected from five synoptic stations Kerman, Ahvaz, Tabriz, Saghez, and Rasht located respectively in the hyper-arid, arid, semi-arid, Mediterranean, and hyper-humid climate conditions. According to the results, the performance of both MLR and SVR models was quite well at surface layer, i.e., 10-cm depth. However, SVR performed better than MLR in estimating soil temperature at deeper layers especially 100 cm depth. Moreover, both models performed better in humid climate condition than arid and hyper-arid areas. Further, adding a periodicity component into the modeling process considerably improved the models' performance especially in the case of SVR.

  12. Assessment of the expected construction company’s net profit using neural network and multiple regression models

    Directory of Open Access Journals (Sweden)

    H.H. Mohamad

    2013-09-01

    This research aims to develop a mathematical model for assessing the expected net profit of any construction company. To achieve the research objective, four steps were performed. First, the main factors affecting firms’ net profit were identified. Second, pertinent data regarding the net profit factors were collected. Third, two different net profit models were developed using the Multiple Regression (MR and the Neural Network (NN techniques. The validity of the proposed models was also investigated. Finally, the results of both MR and NN models were compared to investigate the predictive capabilities of the two models.

  13. Multiple logistic regression model of signalling practices of drivers on urban highways

    Science.gov (United States)

    Puan, Othman Che; Ibrahim, Muttaka Na'iya; Zakaria, Rozana

    2015-05-01

    Giving signal is a way of informing other road users, especially to the conflicting drivers, the intention of a driver to change his/her movement course. Other users are exposed to hazard situation and risks of accident if the driver who changes his/her course failed to give signal as required. This paper describes the application of logistic regression model for the analysis of driver's signalling practices on multilane highways based on possible factors affecting driver's decision such as driver's gender, vehicle's type, vehicle's speed and traffic flow intensity. Data pertaining to the analysis of such factors were collected manually. More than 2000 drivers who have performed a lane changing manoeuvre while driving on two sections of multilane highways were observed. Finding from the study shows that relatively a large proportion of drivers failed to give any signals when changing lane. The result of the analysis indicates that although the proportion of the drivers who failed to provide signal prior to lane changing manoeuvre is high, the degree of compliances of the female drivers is better than the male drivers. A binary logistic model was developed to represent the probability of a driver to provide signal indication prior to lane changing manoeuvre. The model indicates that driver's gender, type of vehicle's driven, speed of vehicle and traffic volume influence the driver's decision to provide a signal indication prior to a lane changing manoeuvre on a multilane urban highway. In terms of types of vehicles driven, about 97% of motorcyclists failed to comply with the signal indication requirement. The proportion of non-compliance drivers under stable traffic flow conditions is much higher than when the flow is relatively heavy. This is consistent with the data which indicates a high degree of non-compliances when the average speed of the traffic stream is relatively high.

  14. QSAR models for prediction study of HIV protease inhibitors using support vector machines, neural networks and multiple linear regression

    Directory of Open Access Journals (Sweden)

    Rachid Darnag

    2017-02-01

    Full Text Available Support vector machines (SVM represent one of the most promising Machine Learning (ML tools that can be applied to develop a predictive quantitative structure–activity relationship (QSAR models using molecular descriptors. Multiple linear regression (MLR and artificial neural networks (ANNs were also utilized to construct quantitative linear and non linear models to compare with the results obtained by SVM. The prediction results are in good agreement with the experimental value of HIV activity; also, the results reveal the superiority of the SVM over MLR and ANN model. The contribution of each descriptor to the structure–activity relationships was evaluated.

  15. Using synthetic data to evaluate multiple regression and principal component analyses for statistical modeling of daily building energy consumption

    Energy Technology Data Exchange (ETDEWEB)

    Reddy, T.A. (Energy Systems Lab., Texas A and M Univ., College Station, TX (United States)); Claridge, D.E. (Energy Systems Lab., Texas A and M Univ., College Station, TX (United States))

    1994-01-01

    Multiple regression modeling of monitored building energy use data is often faulted as a reliable means of predicting energy use on the grounds that multicollinearity between the regressor variables can lead both to improper interpretation of the relative importance of the various physical regressor parameters and to a model with unstable regressor coefficients. Principal component analysis (PCA) has the potential to overcome such drawbacks. While a few case studies have already attempted to apply this technique to building energy data, the objectives of this study were to make a broader evaluation of PCA and multiple regression analysis (MRA) and to establish guidelines under which one approach is preferable to the other. Four geographic locations in the US with different climatic conditions were selected and synthetic data sequence representative of daily energy use in large institutional buildings were generated in each location using a linear model with outdoor temperature, outdoor specific humidity and solar radiation as the three regression variables. MRA and PCA approaches were then applied to these data sets and their relative performances were compared. Conditions under which PCA seems to perform better than MRA were identified and preliminary recommendations on the use of either modeling approach formulated. (orig.)

  16. Identification of Determinants of Sports Skill Level in Badminton Players Using the Multiple Regression Model

    Directory of Open Access Journals (Sweden)

    Jaworski Janusz

    2016-03-01

    Full Text Available Purpose. The aim of the study was to evaluate somatic and functional determinants of sports skill level in badminton players at three consecutive stages of training. Methods. The study examined 96 badminton players aged 11 to 19 years. The scope of the study included somatic characteristics, physical abilities and neurosensory abilities. Thirty nine variables were analysed in each athlete. Coefficients of multiple determination were used to evaluate the effect of structural and functional parameters on sports skill level in badminton players. Results. In the group of younger cadets, quality and effectiveness of playing were mostly determined by the level of physical abilities. In the group of cadets, the most important determinants were physical abilities, followed by somatic characteristics. In this group, coordination abilities were also important. In juniors, the most pronounced was a set of the variables that reflect physical abilities. Conclusions. Models of determination of sports skill level are most noticeable in the group of cadets. In all three groups of badminton players, the dominant effect on the quality of playing is due to a set of the variables that determine physical abilities.

  17. Construction of multiple linear regression models using blood biomarkers for selecting against abdominal fat traits in broilers.

    Science.gov (United States)

    Dong, J Q; Zhang, X Y; Wang, S Z; Jiang, X F; Zhang, K; Ma, G W; Wu, M Q; Li, H; Zhang, H

    2018-01-01

    Plasma very low-density lipoprotein (VLDL) can be used to select for low body fat or abdominal fat (AF) in broilers, but its correlation with AF is limited. We investigated whether any other biochemical indicator can be used in combination with VLDL for a better selective effect. Nineteen plasma biochemical indicators were measured in male chickens from the Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF) in the fed state at 46 and 48 d of age. The average concentration of every parameter for the 2 d was used for statistical analysis. Levels of these 19 plasma biochemical parameters were compared between the lean and fat lines. The phenotypic correlations between these plasma biochemical indicators and AF traits were analyzed. Then, multiple linear regression models were constructed to select the best model used for selecting against AF content. and the heritabilities of plasma indicators contained in the best models were estimated. The results showed that 11 plasma biochemical indicators (triglycerides, total bile acid, total protein, globulin, albumin/globulin, aspartate transaminase, alanine transaminase, gamma-glutamyl transpeptidase, uric acid, creatinine, and VLDL) differed significantly between the lean and fat lines (P linear regression models based on albumin/globulin, VLDL, triglycerides, globulin, total bile acid, and uric acid, had higher R2 (0.73) than the model based only on VLDL (0.21). The plasma parameters included in the best models had moderate heritability estimates (0.21 ≤ h2 ≤ 0.43). These results indicate that these multiple linear regression models can be used to select for lean broiler chickens. © 2017 Poultry Science Association Inc.

  18. QSAR Modeling of COX -2 Inhibitory Activity of Some Dihydropyridine and Hydroquinoline Derivatives Using Multiple Linear Regression (MLR) Method.

    Science.gov (United States)

    Akbari, Somaye; Zebardast, Tannaz; Zarghi, Afshin; Hajimahdi, Zahra

    2017-01-01

    COX-2 inhibitory activities of some 1,4-dihydropyridine and 5-oxo-1,4,5,6,7,8-hexahydroquinoline derivatives were modeled by quantitative structure-activity relationship (QSAR) using stepwise-multiple linear regression (SW-MLR) method. The built model was robust and predictive with correlation coefficient (R 2 ) of 0.972 and 0.531 for training and test groups, respectively. The quality of the model was evaluated by leave-one-out (LOO) cross validation (LOO correlation coefficient (Q 2 ) of 0.943) and Y-randomization. We also employed a leverage approach for the defining of applicability domain of model. Based on QSAR models results, COX-2 inhibitory activity of selected data set had correlation with BEHm6 (highest eigenvalue n. 6 of Burden matrix/weighted by atomic masses), Mor03u (signal 03/unweighted) and IVDE (Mean information content on the vertex degree equality) descriptors which derived from their structures.

  19. Evaluation of a multiple linear regression model and SARIMA model in forecasting heat demand for district heating system

    International Nuclear Information System (INIS)

    Fang, Tingting; Lahdelma, Risto

    2016-01-01

    Highlights: • Social factor is considered for the linear regression models besides weather file. • Simultaneously optimize all the coefficients for linear regression models. • SARIMA combined with linear regression is used to forecast the heat demand. • The accuracy for both linear regression and time series models are evaluated. - Abstract: Forecasting heat demand is necessary for production and operation planning of district heating (DH) systems. In this study we first propose a simple regression model where the hourly outdoor temperature and wind speed forecast the heat demand. Weekly rhythm of heat consumption as a social component is added to the model to significantly improve the accuracy. The other type of model is the seasonal autoregressive integrated moving average (SARIMA) model with exogenous variables as a combination to take weather factors, and the historical heat consumption data as depending variables. One outstanding advantage of the model is that it peruses the high accuracy for both long-term and short-term forecast by considering both exogenous factors and time series. The forecasting performance of both linear regression models and time series model are evaluated based on real-life heat demand data for the city of Espoo in Finland by out-of-sample tests for the last 20 full weeks of the year. The results indicate that the proposed linear regression model (T168h) using 168-h demand pattern with midweek holidays classified as Saturdays or Sundays gives the highest accuracy and strong robustness among all the tested models based on the tested forecasting horizon and corresponding data. Considering the parsimony of the input, the ease of use and the high accuracy, the proposed T168h model is the best in practice. The heat demand forecasting model can also be developed for individual buildings if automated meter reading customer measurements are available. This would allow forecasting the heat demand based on more accurate heat consumption

  20. Ranking contributing areas of salt and selenium in the Lower Gunnison River Basin, Colorado, using multiple linear regression models

    Science.gov (United States)

    Linard, Joshua I.

    2013-01-01

    Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.

  1. Modeling the energy content of combustible ship-scrapping waste at Alang-Sosiya, India, using multiple regression analysis.

    Science.gov (United States)

    Reddy, M Srinivasa; Basha, Shaik; Joshi, H V; Sravan Kumar, V G; Jha, B; Ghosh, P K

    2005-01-01

    Alang-Sosiya is the largest ship-scrapping yard in the world, established in 1982. Every year an average of 171 ships having a mean weight of 2.10 x 10(6)(+/-7.82 x 10(5)) of light dead weight tonnage (LDT) being scrapped. Apart from scrapped metals, this yard generates a massive amount of combustible solid waste in the form of waste wood, plastic, insulation material, paper, glass wool, thermocol pieces (polyurethane foam material), sponge, oiled rope, cotton waste, rubber, etc. In this study multiple regression analysis was used to develop predictive models for energy content of combustible ship-scrapping solid wastes. The scope of work comprised qualitative and quantitative estimation of solid waste samples and performing a sequential selection procedure for isolating variables. Three regression models were developed to correlate the energy content (net calorific values (LHV)) with variables derived from material composition, proximate and ultimate analyses. The performance of these models for this particular waste complies well with the equations developed by other researchers (Dulong, Steuer, Scheurer-Kestner and Bento's) for estimating energy content of municipal solid waste.

  2. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    Science.gov (United States)

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series. PMID:24895666

  3. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    Directory of Open Access Journals (Sweden)

    Ani Shabri

    2014-01-01

    Full Text Available Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI, has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series.

  4. Crude oil price forecasting based on hybridizing wavelet multiple linear regression model, particle swarm optimization techniques, and principal component analysis.

    Science.gov (United States)

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series.

  5. Predictive modelling of chromium removal using multiple linear and nonlinear regression with special emphasis on operating parameters of bioelectrochemical reactor.

    Science.gov (United States)

    More, Anand Govind; Gupta, Sunil Kumar

    2018-03-24

    Bioelectrochemical system (BES) is a novel, self-sustaining metal removal technology functioning on the utilization of chemical energy of organic matter with the help of microorganisms. Experimental trials of two chambered BES reactor were conducted with varying substrate concentration using sodium acetate (500 mg/L to 2000 mg/L COD) and different initial chromium concentration (Cr i ) (10-100 mg/L) at different cathode pH (pH 1-7). In the current study mathematical models based on multiple linear regression (MLR) and non-linear regression (NLR) approach were developed using laboratory experimental data for determining chromium removal efficiency (CRE) in the cathode chamber of BES. Substrate concentration, rate of substrate consumption, Cr i , pH, temperature and hydraulic retention time (HRT) were the operating process parameters of the reactor considered for development of the proposed models. MLR showed a better correlation coefficient (0.972) as compared to NLR (0.952). Validation of the models using t-test analysis revealed unbiasedness of both the models, with t critical value (2.04) greater than t-calculated values for MLR (-0.708) and NLR (-0.86). The root-mean-square error (RMSE) for MLR and NLR were 5.06 % and 7.45 %, respectively. Comparison between both models suggested MLR to be best suited model for predicting the chromium removal behavior using the BES technology to specify a set of operating conditions for BES. Modelling the behavior of CRE will be helpful for scale up of BES technology at industrial level. Copyright © 2018 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

  6. Comparison of Multiple Linear Regressions and Neural Networks based QSAR models for the design of new antitubercular compounds.

    Science.gov (United States)

    Ventura, Cristina; Latino, Diogo A R S; Martins, Filomena

    2013-01-01

    The performance of two QSAR methodologies, namely Multiple Linear Regressions (MLR) and Neural Networks (NN), towards the modeling and prediction of antitubercular activity was evaluated and compared. A data set of 173 potentially active compounds belonging to the hydrazide family and represented by 96 descriptors was analyzed. Models were built with Multiple Linear Regressions (MLR), single Feed-Forward Neural Networks (FFNNs), ensembles of FFNNs and Associative Neural Networks (AsNNs) using four different data sets and different types of descriptors. The predictive ability of the different techniques used were assessed and discussed on the basis of different validation criteria and results show in general a better performance of AsNNs in terms of learning ability and prediction of antitubercular behaviors when compared with all other methods. MLR have, however, the advantage of pinpointing the most relevant molecular characteristics responsible for the behavior of these compounds against Mycobacterium tuberculosis. The best results for the larger data set (94 compounds in training set and 18 in test set) were obtained with AsNNs using seven descriptors (R(2) of 0.874 and RMSE of 0.437 against R(2) of 0.845 and RMSE of 0.472 in MLRs, for test set). Counter-Propagation Neural Networks (CPNNs) were trained with the same data sets and descriptors. From the scrutiny of the weight levels in each CPNN and the information retrieved from MLRs, a rational design of potentially active compounds was attempted. Two new compounds were synthesized and tested against M. tuberculosis showing an activity close to that predicted by the majority of the models. Copyright © 2013 Elsevier Masson SAS. All rights reserved.

  7. Assessment of triglyceride and cholesterol in overweight people based on multiple linear regression and artificial intelligence model.

    Science.gov (United States)

    Ma, Jing; Yu, Jiong; Hao, Guangshu; Wang, Dan; Sun, Yanni; Lu, Jianxin; Cao, Hongcui; Lin, Feiyan

    2017-02-20

    The prevalence of high hyperlipemia is increasing around the world. Our aims are to analyze the relationship of triglyceride (TG) and cholesterol (TC) with indexes of liver function and kidney function, and to develop a prediction model of TG, TC in overweight people. A total of 302 adult healthy subjects and 273 overweight subjects were enrolled in this study. The levels of fasting indexes of TG (fs-TG), TC (fs-TC), blood glucose, liver function, and kidney function were measured and analyzed by correlation analysis and multiple linear regression (MRL). The back propagation artificial neural network (BP-ANN) was applied to develop prediction models of fs-TG and fs-TC. The results showed there was significant difference in biochemical indexes between healthy people and overweight people. The correlation analysis showed fs-TG was related to weight, height, blood glucose, and indexes of liver and kidney function; while fs-TC was correlated with age, indexes of liver function (P < 0.01). The MRL analysis indicated regression equations of fs-TG and fs-TC both had statistic significant (P < 0.01) when included independent indexes. The BP-ANN model of fs-TG reached training goal at 59 epoch, while fs-TC model achieved high prediction accuracy after training 1000 epoch. In conclusions, there was high relationship of fs-TG and fs-TC with weight, height, age, blood glucose, indexes of liver function and kidney function. Based on related variables, the indexes of fs-TG and fs-TC can be predicted by BP-ANN models in overweight people.

  8. Interpretation of commonly used statistical regression models.

    Science.gov (United States)

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  9. Multiple linear regression and artificial neural networks for delta-endotoxin and protease yields modelling of Bacillus thuringiensis.

    Science.gov (United States)

    Ennouri, Karim; Ben Ayed, Rayda; Triki, Mohamed Ali; Ottaviani, Ennio; Mazzarello, Maura; Hertelli, Fathi; Zouari, Nabil

    2017-07-01

    The aim of the present work was to develop a model that supplies accurate predictions of the yields of delta-endotoxins and proteases produced by B. thuringiensis var. kurstaki HD-1. Using available medium ingredients as variables, a mathematical method, based on Plackett-Burman design (PB), was employed to analyze and compare data generated by the Bootstrap method and processed by multiple linear regressions (MLR) and artificial neural networks (ANN) including multilayer perceptron (MLP) and radial basis function (RBF) models. The predictive ability of these models was evaluated by comparison of output data through the determination of coefficient (R 2 ) and mean square error (MSE) values. The results demonstrate that the prediction of the yields of delta-endotoxin and protease was more accurate by ANN technique (87 and 89% for delta-endotoxin and protease determination coefficients, respectively) when compared with MLR method (73.1 and 77.2% for delta-endotoxin and protease determination coefficients, respectively), suggesting that the proposed ANNs, especially MLP, is a suitable new approach for determining yields of bacterial products that allow us to make more appropriate predictions in a shorter time and with less engineering effort.

  10. Multiple linear regression models for predicting chronic aluminum toxicity to freshwater aquatic organisms and developing water quality guidelines.

    Science.gov (United States)

    DeForest, David K; Brix, Kevin V; Tear, Lucinda M; Adams, William J

    2018-01-01

    The bioavailability of aluminum (Al) to freshwater aquatic organisms varies as a function of several water chemistry parameters, including pH, dissolved organic carbon (DOC), and water hardness. We evaluated the ability of multiple linear regression (MLR) models to predict chronic Al toxicity to a green alga (Pseudokirchneriella subcapitata), a cladoceran (Ceriodaphnia dubia), and a fish (Pimephales promelas) as a function of varying DOC, pH, and hardness conditions. The MLR models predicted toxicity values that were within a factor of 2 of observed values in 100% of the cases for P. subcapitata (10 and 20% effective concentrations [EC10s and EC20s]), 91% of the cases for C. dubia (EC10s and EC20s), and 95% (EC10s) and 91% (EC20s) of the cases for P. promelas. The MLR models were then applied to all species with Al toxicity data to derive species and genus sensitivity distributions that could be adjusted as a function of varying DOC, pH, and hardness conditions (the P. subcapitata model was applied to algae and macrophytes, the C. dubia model was applied to invertebrates, and the P. promelas model was applied to fish). Hazardous concentrations to 5% of the species or genera were then derived in 2 ways: 1) fitting a log-normal distribution to species-mean EC10s for all species (following the European Union methodology), and 2) fitting a triangular distribution to genus-mean EC20s for animals only (following the US Environmental Protection Agency methodology). Overall, MLR-based models provide a viable approach for deriving Al water quality guidelines that vary as a function of DOC, pH, and hardness conditions and are a significant improvement over bioavailability corrections based on single parameters. Environ Toxicol Chem 2018;37:80-90. © 2017 SETAC. © 2017 SETAC.

  11. RAWS II: A MULTIPLE REGRESSION ANALYSIS PROGRAM,

    Science.gov (United States)

    This memorandum gives instructions for the use and operation of a revised version of RAWS, a multiple regression analysis program. The program...of preprocessed data, the directed retention of variable, listing of the matrix of the normal equations and its inverse, and the bypassing of the regression analysis to provide the input variable statistics only. (Author)

  12. The mechanical properties of high speed GTAW weld and factors of nonlinear multiple regression model under external transverse magnetic field

    Science.gov (United States)

    Lu, Lin; Chang, Yunlong; Li, Yingmin; He, Youyou

    2013-05-01

    A transverse magnetic field was introduced to the arc plasma in the process of welding stainless steel tubes by high-speed Tungsten Inert Gas Arc Welding (TIG for short) without filler wire. The influence of external magnetic field on welding quality was investigated. 9 sets of parameters were designed by the means of orthogonal experiment. The welding joint tensile strength and form factor of weld were regarded as the main standards of welding quality. A binary quadratic nonlinear regression equation was established with the conditions of magnetic induction and flow rate of Ar gas. The residual standard deviation was calculated to adjust the accuracy of regression model. The results showed that, the regression model was correct and effective in calculating the tensile strength and aspect ratio of weld. Two 3D regression models were designed respectively, and then the impact law of magnetic induction on welding quality was researched.

  13. Land use regression modeling of intra-urban residential variability in multiple traffic-related air pollutants

    Directory of Open Access Journals (Sweden)

    Baxter Lisa K

    2008-05-01

    Full Text Available Abstract Background There is a growing body of literature linking GIS-based measures of traffic density to asthma and other respiratory outcomes. However, no consensus exists on which traffic indicators best capture variability in different pollutants or within different settings. As part of a study on childhood asthma etiology, we examined variability in outdoor concentrations of multiple traffic-related air pollutants within urban communities, using a range of GIS-based predictors and land use regression techniques. Methods We measured fine particulate matter (PM2.5, nitrogen dioxide (NO2, and elemental carbon (EC outside 44 homes representing a range of traffic densities and neighborhoods across Boston, Massachusetts and nearby communities. Multiple three to four-day average samples were collected at each home during winters and summers from 2003 to 2005. Traffic indicators were derived using Massachusetts Highway Department data and direct traffic counts. Multivariate regression analyses were performed separately for each pollutant, using traffic indicators, land use, meteorology, site characteristics, and central site concentrations. Results PM2.5 was strongly associated with the central site monitor (R2 = 0.68. Additional variability was explained by total roadway length within 100 m of the home, smoking or grilling near the monitor, and block-group population density (R2 = 0.76. EC showed greater spatial variability, especially during winter months, and was predicted by roadway length within 200 m of the home. The influence of traffic was greater under low wind speed conditions, and concentrations were lower during summer (R2 = 0.52. NO2 showed significant spatial variability, predicted by population density and roadway length within 50 m of the home, modified by site characteristics (obstruction, and with higher concentrations during summer (R2 = 0.56. Conclusion Each pollutant examined displayed somewhat different spatial patterns

  14. Association between resting-state brain network topological organization and creative ability: Evidence from a multiple linear regression model.

    Science.gov (United States)

    Jiao, Bingqing; Zhang, Delong; Liang, Aiying; Liang, Bishan; Wang, Zengjian; Li, Junchao; Cai, Yuxuan; Gao, Mengxia; Gao, Zhenni; Chang, Song; Huang, Ruiwang; Liu, Ming

    2017-10-01

    Previous studies have indicated a tight linkage between resting-state functional connectivity of the human brain and creative ability. This study aimed to further investigate the association between the topological organization of resting-state brain networks and creativity. Therefore, we acquired resting-state fMRI data from 22 high-creativity participants and 22 low-creativity participants (as determined by their Torrance Tests of Creative Thinking scores). We then constructed functional brain networks for each participant and assessed group differences in network topological properties before exploring the relationships between respective network topological properties and creative ability. We identified an optimized organization of intrinsic brain networks in both groups. However, compared with low-creativity participants, high-creativity participants exhibited increased global efficiency and substantially decreased path length, suggesting increased efficiency of information transmission across brain networks in creative individuals. Using a multiple linear regression model, we further demonstrated that regional functional integration properties (i.e., the betweenness centrality and global efficiency) of brain networks, particularly the default mode network (DMN) and sensorimotor network (SMN), significantly predicted the individual differences in creative ability. Furthermore, the associations between network regional properties and creative performance were creativity-level dependent, where the difference in the resource control component may be important in explaining individual difference in creative performance. These findings provide novel insights into the neural substrate of creativity and may facilitate objective identification of creative ability. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Development of a Multiple Linear Regression Model to Forecast Facility Electrical Consumption at an Air Force Base.

    Science.gov (United States)

    1981-09-01

    corresponds to the same square footage that consumed the electrical energy. 3. The basic assumptions of multiple linear regres- sion, as enumerated in...7. Data related to the sample of bases is assumed to be representative of bases in the population. Limitations Basic limitations on this research were... Ratemaking --Overview. Rand Report R-5894, Santa Monica CA, May 1977. Chatterjee, Samprit, and Bertram Price. Regression Analysis by Example. New York: John

  16. Computing multiple-output regression quantile regions

    Czech Academy of Sciences Publication Activity Database

    Paindaveine, D.; Šiman, Miroslav

    2012-01-01

    Roč. 56, č. 4 (2012), s. 840-853 ISSN 0167-9473 R&D Projects: GA MŠk(CZ) 1M06047 Institutional research plan: CEZ:AV0Z10750506 Keywords : halfspace depth * multiple-output regression * parametric linear programming * quantile regression Subject RIV: BA - General Mathematics Impact factor: 1.304, year: 2012 http://library.utia.cas.cz/separaty/2012/SI/siman-0376413.pdf

  17. Bayesian quantile regression-based partially linear mixed-effects joint models for longitudinal data with multiple features.

    Science.gov (United States)

    Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara

    2017-01-01

    In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most of common models to analyze such complex longitudinal data are based on mean-regression, which fails to provide efficient estimates due to outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish a Bayesian joint models that accounts for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.

  18. Suppression Situations in Multiple Linear Regression

    Science.gov (United States)

    Shieh, Gwowen

    2006-01-01

    This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…

  19. The M Word: Multicollinearity in Multiple Regression.

    Science.gov (United States)

    Morrow-Howell, Nancy

    1994-01-01

    Notes that existence of substantial correlation between two or more independent variables creates problems of multicollinearity in multiple regression. Discusses multicollinearity problem in social work research in which independent variables are usually intercorrelated. Clarifies problems created by multicollinearity, explains detection of…

  20. Multiple Linear Regression: A Realistic Reflector.

    Science.gov (United States)

    Nutt, A. T.; Batsell, R. R.

    Examples of the use of Multiple Linear Regression (MLR) techniques are presented. This is done to show how MLR aids data processing and decision-making by providing the decision-maker with freedom in phrasing questions and by accurately reflecting the data on hand. A brief overview of the rationale underlying MLR is given, some basic definitions…

  1. A multiple regression method for genomewide association studies ...

    Indian Academy of Sciences (India)

    Bujun Mei

    2018-06-07

    Jun 7, 2018 ... Similar to the typical genomewide association tests using LD ... new approach performed validly when the multiple regression based on linkage method was employed. .... the model, two groups of scenarios were simulated.

  2. On directional multiple-output quantile regression

    Czech Academy of Sciences Publication Activity Database

    Paindaveine, D.; Šiman, Miroslav

    2011-01-01

    Roč. 102, č. 2 (2011), s. 193-212 ISSN 0047-259X R&D Projects: GA MŠk(CZ) 1M06047 Grant - others:Commision EC(BE) Fonds National de la Recherche Scientifique Institutional research plan: CEZ:AV0Z10750506 Keywords : multivariate quantile * quantile regression * multiple-output regression * halfspace depth * portfolio optimization * value-at risk Subject RIV: BA - General Mathematics Impact factor: 0.879, year: 2011 http://library.utia.cas.cz/separaty/2011/SI/siman-0364128.pdf

  3. Evaluation of heat transfer mathematical models and multiple linear regression to predict the inside variables in semi-solar greenhouse

    Directory of Open Access Journals (Sweden)

    M Taki

    2017-05-01

    Full Text Available Introduction Controlling greenhouse microclimate not only influences the growth of plants, but also is critical in the spread of diseases inside the greenhouse. The microclimate parameters were inside air, greenhouse roof and soil temperature, relative humidity and solar radiation intensity. Predicting the microclimate conditions inside a greenhouse and enabling the use of automatic control systems are the two main objectives of greenhouse climate model. The microclimate inside a greenhouse can be predicted by conducting experiments or by using simulation. Static and dynamic models are used for this purpose as a function of the metrological conditions and the parameters of the greenhouse components. Some works were done in past to 2015 year to simulation and predict the inside variables in different greenhouse structures. Usually simulation has a lot of problems to predict the inside climate of greenhouse and the error of simulation is higher in literature. The main objective of this paper is comparison between heat transfer and regression models to evaluate them to predict inside air and roof temperature in a semi-solar greenhouse in Tabriz University. Materials and Methods In this study, a semi-solar greenhouse was designed and constructed at the North-West of Iran in Azerbaijan Province (geographical location of 38°10′ N and 46°18′ E with elevation of 1364 m above the sea level. In this research, shape and orientation of the greenhouse, selected between some greenhouses common shapes and according to receive maximum solar radiation whole the year. Also internal thermal screen and cement north wall was used to store and prevent of heat lost during the cold period of year. So we called this structure, ‘semi-solar’ greenhouse. It was covered with glass (4 mm thickness. It occupies a surface of approximately 15.36 m2 and 26.4 m3. The orientation of this greenhouse was East–West and perpendicular to the direction of the wind prevailing

  4. Logistic regression models

    CERN Document Server

    Hilbe, Joseph M

    2009-01-01

    This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect-great clarity.The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … .-Annette J. Dobson, Biometric...

  5. Development of a predictive model for distribution coefficient (Kd) of 13'7Cs and 60Co in marine sediments using multiple linear regression analysis

    International Nuclear Information System (INIS)

    Kumar, Ajay; Ravi, P.M.; Guneshwar, S.L.; Rout, Sabyasachi; Mishra, Manish K.; Pulhani, Vandana; Tripathi, R.M.

    2018-01-01

    Numerous common methods (batch laboratory, the column laboratory, field-batch method, field modeling and K 0c method) are used frequently for determination of K d values. Recently, multiple regression models are considered as new best estimates for predicting the K d of radionuclides in the environment. It is also well known fact that the K d value is highly influenced by physico-chemical properties of sediment. Due to the significant variability in influencing parameters, the measured K d values can range over several orders of magnitude under different environmental conditions. The aim of this study is to develop a predictive model for K d values of 137 Cs and 60 Co based on the sediment properties using multiple linear regression analysis

  6. (Non) linear regression modelling

    NARCIS (Netherlands)

    Cizek, P.; Gentle, J.E.; Hardle, W.K.; Mori, Y.

    2012-01-01

    We will study causal relationships of a known form between random variables. Given a model, we distinguish one or more dependent (endogenous) variables Y = (Y1,…,Yl), l ∈ N, which are explained by a model, and independent (exogenous, explanatory) variables X = (X1,…,Xp),p ∈ N, which explain or

  7. Research and analyze of physical health using multiple regression analysis

    Directory of Open Access Journals (Sweden)

    T. S. Kyi

    2014-01-01

    Full Text Available This paper represents the research which is trying to create a mathematical model of the "healthy people" using the method of regression analysis. The factors are the physical parameters of the person (such as heart rate, lung capacity, blood pressure, breath holding, weight height coefficient, flexibility of the spine, muscles of the shoulder belt, abdominal muscles, squatting, etc.., and the response variable is an indicator of physical working capacity. After performing multiple regression analysis, obtained useful multiple regression models that can predict the physical performance of boys the aged of fourteen to seventeen years. This paper represents the development of regression model for the sixteen year old boys and analyzed results.

  8. Relative accuracy of spatial predictive models for lynx Lynx canadensis derived using logistic regression-AIC, multiple criteria evaluation and Bayesian approaches

    Directory of Open Access Journals (Sweden)

    Shelley M. ALEXANDER

    2009-02-01

    Full Text Available We compared probability surfaces derived using one set of environmental variables in three Geographic Information Systems (GIS-based approaches: logistic regression and Akaike’s Information Criterion (AIC, Multiple Criteria Evaluation (MCE, and Bayesian Analysis (specifically Dempster-Shafer theory. We used lynx Lynx canadensis as our focal species, and developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during winters from 1997 to 2000. The accuracy of the three spatial models were compared using a contingency table method. We determined the percentage of cases in which both presence and absence points were correctly classified (overall accuracy, the failure to predict a species where it occurred (omission error and the prediction of presence where there was absence (commission error. Our overall accuracy showed the logistic regression approach was the most accurate (74.51%. The multiple criteria evaluation was intermediate (39.22%, while the Dempster-Shafer (D-S theory model was the poorest (29.90%. However, omission and commission error tell us a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence in their model. We suggest that for our study area at least, the logistic regression model is optimal. However, where sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g. Dempster-Shafer that would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans[Current Zoology 55(1: 28 – 40, 2009].

  9. Regression Models for Market-Shares

    DEFF Research Database (Denmark)

    Birch, Kristina; Olsen, Jørgen Kai; Tjur, Tue

    2005-01-01

    On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put on the interpretat......On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put...... on the interpretation of the parameters in relation to models for the total sales based on discrete choice models.Key words and phrases. MCI model, discrete choice model, market-shares, price elasitcity, regression model....

  10. Panel Smooth Transition Regression Models

    DEFF Research Database (Denmark)

    González, Andrés; Terasvirta, Timo; Dijk, Dick van

    We introduce the panel smooth transition regression model. This new model is intended for characterizing heterogeneous panels, allowing the regression coefficients to vary both across individuals and over time. Specifically, heterogeneity is allowed for by assuming that these coefficients are bou...

  11. Application of single-step genomic best linear unbiased prediction with a multiple-lactation random regression test-day model for Japanese Holsteins.

    Science.gov (United States)

    Baba, Toshimi; Gotoh, Yusaku; Yamaguchi, Satoshi; Nakagawa, Satoshi; Abe, Hayato; Masuda, Yutaka; Kawahara, Takayoshi

    2017-08-01

    This study aimed to evaluate a validation reliability of single-step genomic best linear unbiased prediction (ssGBLUP) with a multiple-lactation random regression test-day model and investigate an effect of adding genotyped cows on the reliability. Two data sets for test-day records from the first three lactations were used: full data from February 1975 to December 2015 (60 850 534 records from 2 853 810 cows) and reduced data cut off in 2011 (53 091 066 records from 2 502 307 cows). We used marker genotypes of 4480 bulls and 608 cows. Genomic enhanced breeding values (GEBV) of 305-day milk yield in all the lactations were estimated for at least 535 young bulls using two marker data sets: bull genotypes only and both bulls and cows genotypes. The realized reliability (R 2 ) from linear regression analysis was used as an indicator of validation reliability. Using only genotyped bulls, R 2 was ranged from 0.41 to 0.46 and it was always higher than parent averages. The very similar R 2 were observed when genotyped cows were added. An application of ssGBLUP to a multiple-lactation random regression model is feasible and adding a limited number of genotyped cows has no significant effect on reliability of GEBV for genotyped bulls. © 2016 Japanese Society of Animal Science.

  12. Using multiple linear regression techniques to quantify carbon ...

    African Journals Online (AJOL)

    Fallow ecosystems provide a significant carbon stock that can be quantified for inclusion in the accounts of global carbon budgets. Process and statistical models of productivity, though useful, are often technically rigid as the conditions for their application are not easy to satisfy. Multiple regression techniques have been ...

  13. Multiple Linear Regression Modeling To Predict the Stability of Polymer-Drug Solid Dispersions: Comparison of the Effects of Polymers and Manufacturing Methods on Solid Dispersion Stability.

    Science.gov (United States)

    Fridgeirsdottir, Gudrun A; Harris, Robert J; Dryden, Ian L; Fischer, Peter M; Roberts, Clive J

    2018-03-29

    Solid dispersions can be a successful way to enhance the bioavailability of poorly soluble drugs. Here 60 solid dispersion formulations were produced using ten chemically diverse, neutral, poorly soluble drugs, three commonly used polymers, and two manufacturing techniques, spray-drying and melt extrusion. Each formulation underwent a six-month stability study at accelerated conditions, 40 °C and 75% relative humidity (RH). Significant differences in times to crystallization (onset of crystallization) were observed between both the different polymers and the two processing methods. Stability from zero days to over one year was observed. The extensive experimental data set obtained from this stability study was used to build multiple linear regression models to correlate physicochemical properties of the active pharmaceutical ingredients (API) with the stability data. The purpose of these models is to indicate which combination of processing method and polymer carrier is most likely to give a stable solid dispersion. Six quantitative mathematical multiple linear regression-based models were produced based on selection of the most influential independent physical and chemical parameters from a set of 33 possible factors, one model for each combination of polymer and processing method, with good predictability of stability. Three general rules are proposed from these models for the formulation development of suitably stable solid dispersions. Namely, increased stability is correlated with increased glass transition temperature ( T g ) of solid dispersions, as well as decreased number of H-bond donors and increased molecular flexibility (such as rotatable bonds and ring count) of the drug molecule.

  14. Exploring a physico-chemical multi-array explanatory model with a new multiple covariance-based technique: structural equation exploratory regression.

    Science.gov (United States)

    Bry, X; Verron, T; Cazes, P

    2009-05-29

    In this work, we consider chemical and physical variable groups describing a common set of observations (cigarettes). One of the groups, minor smoke compounds (minSC), is assumed to depend on the others (minSC predictors). PLS regression (PLSR) of m inSC on the set of all predictors appears not to lead to a satisfactory analytic model, because it does not take into account the expert's knowledge. PLS path modeling (PLSPM) does not use the multidimensional structure of predictor groups. Indeed, the expert needs to separate the influence of several pre-designed predictor groups on minSC, in order to see what dimensions this influence involves. To meet these needs, we consider a multi-group component-regression model, and propose a method to extract from each group several strong uncorrelated components that fit the model. Estimation is based on a global multiple covariance criterion, used in combination with an appropriate nesting approach. Compared to PLSR and PLSPM, the structural equation exploratory regression (SEER) we propose fully uses predictor group complementarity, both conceptually and statistically, to predict the dependent group.

  15. Forecasting with Dynamic Regression Models

    CERN Document Server

    Pankratz, Alan

    2012-01-01

    One of the most widely used tools in statistical forecasting, single equation regression models is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the auto correlation patterns of regression disturbance. It also includes six case studies.

  16. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    Directory of Open Access Journals (Sweden)

    T. Soares dos Santos

    2016-01-01

    model output and observed monthly precipitation. We used general circulation model (GCM experiments for the 20th century (RCP historical; 1970–1999 and two scenarios (RCP 2.6 and 8.5; 2070–2100. The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.

  17. Modified Regression Correlation Coefficient for Poisson Regression Model

    Science.gov (United States)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).

  18. Regression modeling of ground-water flow

    Science.gov (United States)

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  19. Development of multiple linear regression models as predictive tools for fecal indicator concentrations in a stretch of the lower Lahn River, Germany.

    Science.gov (United States)

    Herrig, Ilona M; Böer, Simone I; Brennholt, Nicole; Manz, Werner

    2015-11-15

    Since rivers are typically subject to rapid changes in microbiological water quality, tools are needed to allow timely water quality assessment. A promising approach is the application of predictive models. In our study, we developed multiple linear regression (MLR) models in order to predict the abundance of the fecal indicator organisms Escherichia coli (EC), intestinal enterococci (IE) and somatic coliphages (SC) in the Lahn River, Germany. The models were developed on the basis of an extensive set of environmental parameters collected during a 12-months monitoring period. Two models were developed for each type of indicator: 1) an extended model including the maximum number of variables significantly explaining variations in indicator abundance and 2) a simplified model reduced to the three most influential explanatory variables, thus obtaining a model which is less resource-intensive with regard to required data. Both approaches have the ability to model multiple sites within one river stretch. The three most important predictive variables in the optimized models for the bacterial indicators were NH4-N, turbidity and global solar irradiance, whereas chlorophyll a content, discharge and NH4-N were reliable model variables for somatic coliphages. Depending on indicator type, the extended mode models also included the additional variables rainfall, O2 content, pH and chlorophyll a. The extended mode models could explain 69% (EC), 74% (IE) and 72% (SC) of the observed variance in fecal indicator concentrations. The optimized models explained the observed variance in fecal indicator concentrations to 65% (EC), 70% (IE) and 68% (SC). Site-specific efficiencies ranged up to 82% (EC) and 81% (IE, SC). Our results suggest that MLR models are a promising tool for a timely water quality assessment in the Lahn area. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. A modified parallel constitutive model for elevated temperature flow behavior of Ti-6Al-4V alloy based on multiple regression

    Energy Technology Data Exchange (ETDEWEB)

    Cai, Jun; Shi, Jiamin; Wang, Kuaishe; Wang, Wen; Wang, Qingjuan; Liu, Yingying [Xi' an Univ. of Architecture and Technology, Xi' an (China). School of Metallurgical Engineering; Li, Fuguo [Northwestern Polytechnical Univ., Xi' an (China). School of Materials Science and Engineering

    2017-07-15

    Constitutive analysis for hot working of Ti-6Al-4V alloy was carried out by using experimental stress-strain data from isothermal hot compression tests. A new kind of constitutive equation called a modified parallel constitutive model was proposed by considering the independent effects of strain, strain rate and temperature. The predicted flow stress data were compared with the experimental data. Statistical analysis was introduced to verify the validity of the developed constitutive equation. Subsequently, the accuracy of the proposed constitutive equations was evaluated by comparing with other constitutive models. The results showed that the developed modified parallel constitutive model based on multiple regression could predict flow stress of Ti-6Al-4V alloy with good correlation and generalization.

  1. Fuzzy multiple linear regression: A computational approach

    Science.gov (United States)

    Juang, C. H.; Huang, X. H.; Fleming, J. W.

    1992-01-01

    This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.

  2. Application of least squares support vector regression and linear multiple regression for modeling removal of methyl orange onto tin oxide nanoparticles loaded on activated carbon and activated carbon prepared from Pistacia atlantica wood.

    Science.gov (United States)

    Ghaedi, M; Rahimi, Mahmoud Reza; Ghaedi, A M; Tyagi, Inderjeet; Agarwal, Shilpi; Gupta, Vinod Kumar

    2016-01-01

    Two novel and eco friendly adsorbents namely tin oxide nanoparticles loaded on activated carbon (SnO2-NP-AC) and activated carbon prepared from wood tree Pistacia atlantica (AC-PAW) were used for the rapid removal and fast adsorption of methyl orange (MO) from the aqueous phase. The dependency of MO removal with various adsorption influential parameters was well modeled and optimized using multiple linear regressions (MLR) and least squares support vector regression (LSSVR). The optimal parameters for the LSSVR model were found based on γ value of 0.76 and σ(2) of 0.15. For testing the data set, the mean square error (MSE) values of 0.0010 and the coefficient of determination (R(2)) values of 0.976 were obtained for LSSVR model, and the MSE value of 0.0037 and the R(2) value of 0.897 were obtained for the MLR model. The adsorption equilibrium and kinetic data was found to be well fitted and in good agreement with Langmuir isotherm model and second-order equation and intra-particle diffusion models respectively. The small amount of the proposed SnO2-NP-AC and AC-PAW (0.015 g and 0.08 g) is applicable for successful rapid removal of methyl orange (>95%). The maximum adsorption capacity for SnO2-NP-AC and AC-PAW was 250 mg g(-1) and 125 mg g(-1) respectively. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. Nonparametric Mixture of Regression Models.

    Science.gov (United States)

    Huang, Mian; Li, Runze; Wang, Shaoli

    2013-07-01

    Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We study the identifiability issue of the proposed models, and develop an estimation procedure by employing kernel regression. We further systematically study the sampling properties of the proposed estimators, and establish their asymptotic normality. A modified EM algorithm is proposed to carry out the estimation procedure. We show that our algorithm preserves the ascent property of the EM algorithm in an asymptotic sense. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedure. An empirical analysis of the US house price index data is illustrated for the proposed methodology.

  4. Use of Multiple Linear Regression Models for Setting Water Quality Criteria for Copper: A Complementary Approach to the Biotic Ligand Model.

    Science.gov (United States)

    Brix, Kevin V; DeForest, David K; Tear, Lucinda; Grosell, Martin; Adams, William J

    2017-05-02

    Biotic Ligand Models (BLMs) for metals are widely applied in ecological risk assessments and in the development of regulatory water quality guidelines in Europe, and in 2007 the United States Environmental Protection Agency (USEPA) recommended BLM-based water quality criteria (WQC) for Cu in freshwater. However, to-date, few states have adopted BLM-based Cu criteria into their water quality standards on a state-wide basis, which appears to be due to the perception that the BLM is too complicated or requires too many input variables. Using the mechanistic BLM framework to first identify key water chemistry parameters that influence Cu bioavailability, namely dissolved organic carbon (DOC), pH, and hardness, we developed Cu criteria using the same basic methodology used by the USEPA to derive hardness-based criteria but with the addition of DOC and pH. As an initial proof of concept, we developed stepwise multiple linear regression (MLR) models for species that have been tested over wide ranges of DOC, pH, and hardness conditions. These models predicted acute Cu toxicity values that were within a factor of ±2 in 77% to 97% of tests (5 species had adequate data) and chronic Cu toxicity values that were within a factor of ±2 in 92% of tests (1 species had adequate data). This level of accuracy is comparable to the BLM. Following USEPA guidelines for WQC development, the species data were then combined to develop a linear model with pooled slopes for each independent parameter (i.e., DOC, pH, and hardness) and species-specific intercepts using Analysis of Covariance. The pooled MLR and BLM models predicted species-specific toxicity with similar precision; adjusted R 2 and R 2 values ranged from 0.56 to 0.86 and 0.66-0.85, respectively. Graphical exploration of relationships between predicted and observed toxicity, residuals and observed toxicity, and residuals and concentrations of key input parameters revealed many similarities and a few key distinctions between the

  5. A flexible mixed-effect negative binomial regression model for detecting unusual increases in MRI lesion counts in individual multiple sclerosis patients.

    Science.gov (United States)

    Kondo, Yumi; Zhao, Yinshan; Petkau, John

    2015-06-15

    We develop a new modeling approach to enhance a recently proposed method to detect increases of contrast-enhancing lesions (CELs) on repeated magnetic resonance imaging, which have been used as an indicator for potential adverse events in multiple sclerosis clinical trials. The method signals patients with unusual increases in CEL activity by estimating the probability of observing CEL counts as large as those observed on a patient's recent scans conditional on the patient's CEL counts on previous scans. This conditional probability index (CPI), computed based on a mixed-effect negative binomial regression model, can vary substantially depending on the choice of distribution for the patient-specific random effects. Therefore, we relax this parametric assumption to model the random effects with an infinite mixture of beta distributions, using the Dirichlet process, which effectively allows any form of distribution. To our knowledge, no previous literature considers a mixed-effect regression for longitudinal count variables where the random effect is modeled with a Dirichlet process mixture. As our inference is in the Bayesian framework, we adopt a meta-analytic approach to develop an informative prior based on previous clinical trials. This is particularly helpful at the early stages of trials when less data are available. Our enhanced method is illustrated with CEL data from 10 previous multiple sclerosis clinical trials. Our simulation study shows that our procedure estimates the CPI more accurately than parametric alternatives when the patient-specific random effect distribution is misspecified and that an informative prior improves the accuracy of the CPI estimates. Copyright © 2015 John Wiley & Sons, Ltd.

  6. Assessing the impact of local meteorological variables on surface ozone in Hong Kong during 2000-2015 using quantile and multiple line regression models

    Science.gov (United States)

    Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo

    2016-11-01

    The quantile regression (QR) method has been increasingly introduced to atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in 2000-2015 in Hong Kong. The dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. Results showed that the MLR models worked better at suburban and rural sites than at urban sites, and worked better in winter than in summer. QR models performed better in summer for 99th and 90th percentiles and performed better in autumn and winter for 10th percentile. And QR models also performed better in suburban and rural areas for 10th percentile. The top 3 dominant variables associated with MDA8 ozone concentrations, changing with seasons and regions, were frequently associated with the six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain the peak of monthly average ozone concentrations in October in Hong Kong. And we found the effect of solar radiation would be enhanced during extremely ozone pollution episodes (i.e., the 99th percentile). Finally, meteorological effects on MDA8 ozone had no significant changes before and after the 2010 Asian Games.

  7. Estimating solid waste generation by hospitality industry during major festivals: A quantification model based on multiple regression.

    Science.gov (United States)

    Abdulredha, Muhammad; Al Khaddar, Rafid; Jordan, David; Kot, Patryk; Abdulridha, Ali; Hashim, Khalid

    2018-04-26

    Major-religious festivals hosted in the city of Kerbala, Iraq, annually generate large quantities of Municipal Solid Waste (MSW) which negatively impacts the environment and human health when poorly managed. The hospitality sector, specifically hotels, is one of the major sources of MSW generated during these festivals. Because it is essential to establish a proper waste management system for such festivals, accurate information regarding MSW generation is required. This study therefore investigated the rate of production of MSW from hotels in Kerbala during major festivals. A field questionnaire survey was conducted with 150 hotels during the Arba'een festival, one of the largest festivals in the world, attended by about 18 million participants, to identify how much MSW is produced and what features of hotels impact on this. Hotel managers responded to questions regarding features of the hotel such as size (Hs), expenditure (Hex), area (Ha) and number of staff (Hst). An on-site audit was also carried out with all participating hotels to estimate the mass of MSW generated from these hotels. The results indicate that MSW produced by hotels varies widely. In general, it was found that each hotel guest produces an estimated 0.89 kg of MSW per day. However, this figure varies according to the hotels' rating. Average rates of MSW production from one and four star hotels were 0.83 and 1.22 kg per guest per day, respectively. Statistically, it was found that the relationship between MSW production and hotel features can be modelled with an R 2 of 0.799, where the influence of hotel feature on MSW production followed the order Hs > Hex > Hst. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Applying Least Absolute Shrinkage Selection Operator and Akaike Information Criterion Analysis to Find the Best Multiple Linear Regression Models between Climate Indices and Components of Cow's Milk.

    Science.gov (United States)

    Marami Milani, Mohammad Reza; Hense, Andreas; Rahmani, Elham; Ploeger, Angelika

    2016-07-23

    This study focuses on multiple linear regression models relating six climate indices (temperature humidity THI, environmental stress ESI, equivalent temperature index ETI, heat load HLI, modified HLI (HLI new ), and respiratory rate predictor RRP) with three main components of cow's milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage selection operator (LASSO) and the Akaike information criterion (AIC) techniques are applied to select the best model for milk predictands with the smallest number of climate predictors. Uncertainty estimation is employed by applying bootstrapping through resampling. Cross validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months from April to September, 2002 to 2010 are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors with p -value < 0.001 and R ² (0.50, 0.49) respectively. In summer, milk yield with independent variables of THI, ETI, and ESI show the highest relation ( p -value < 0.001) with R ² (0.69). For fat and protein the results are only marginal. This method is suggested for the impact studies of climate variability/change on agriculture and food science fields when short-time series or data with large uncertainty are available.

  9. A note on the relationships between multiple imputation, maximum likelihood and fully Bayesian methods for missing responses in linear regression models.

    Science.gov (United States)

    Chen, Qingxia; Ibrahim, Joseph G

    2014-07-01

    Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR.

  10. Regression Models for Repairable Systems

    Czech Academy of Sciences Publication Activity Database

    Novák, Petr

    2015-01-01

    Roč. 17, č. 4 (2015), s. 963-972 ISSN 1387-5841 Institutional support: RVO:67985556 Keywords : Reliability analysis * Repair models * Regression Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.782, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/novak-0450902.pdf

  11. Simple and multiple linear regression: sample size considerations.

    Science.gov (United States)

    Hanley, James A

    2016-11-01

    The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.

  12. Analysis and Modeling for China’s Electricity Demand Forecasting Using a Hybrid Method Based on Multiple Regression and Extreme Learning Machine: A View from Carbon Emission

    Directory of Open Access Journals (Sweden)

    Yi Liang

    2016-11-01

    Full Text Available The power industry is the main battlefield of CO2 emission reduction, which plays an important role in the implementation and development of the low carbon economy. The forecasting of electricity demand can provide a scientific basis for the country to formulate a power industry development strategy and further promote the sustained, healthy and rapid development of the national economy. Under the goal of low-carbon economy, medium and long term electricity demand forecasting will have very important practical significance. In this paper, a new hybrid electricity demand model framework is characterized as follows: firstly, integration of grey relation degree (GRD with induced ordered weighted harmonic averaging operator (IOWHA to propose a new weight determination method of hybrid forecasting model on basis of forecasting accuracy as induced variables is presented; secondly, utilization of the proposed weight determination method to construct the optimal hybrid forecasting model based on extreme learning machine (ELM forecasting model and multiple regression (MR model; thirdly, three scenarios in line with the level of realization of various carbon emission targets and dynamic simulation of effect of low-carbon economy on future electricity demand are discussed. The resulting findings show that, the proposed model outperformed and concentrated some monomial forecasting models, especially in boosting the overall instability dramatically. In addition, the development of a low-carbon economy will increase the demand for electricity, and have an impact on the adjustment of the electricity demand structure.

  13. QSRR modeling for the chromatographic retention behavior of some β-lactam antibiotics using forward and firefly variable selection algorithms coupled with multiple linear regression.

    Science.gov (United States)

    Fouad, Marwa A; Tolba, Enas H; El-Shal, Manal A; El Kerdawy, Ahmed M

    2018-05-11

    The justified continuous emerging of new β-lactam antibiotics provokes the need for developing suitable analytical methods that accelerate and facilitate their analysis. A face central composite experimental design was adopted using different levels of phosphate buffer pH, acetonitrile percentage at zero time and after 15 min in a gradient program to obtain the optimum chromatographic conditions for the elution of 31 β-lactam antibiotics. Retention factors were used as the target property to build two QSRR models utilizing the conventional forward selection and the advanced nature-inspired firefly algorithm for descriptor selection, coupled with multiple linear regression. The obtained models showed high performance in both internal and external validation indicating their robustness and predictive ability. Williams-Hotelling test and student's t-test showed that there is no statistical significant difference between the models' results. Y-randomization validation showed that the obtained models are due to significant correlation between the selected molecular descriptors and the analytes' chromatographic retention. These results indicate that the generated FS-MLR and FFA-MLR models are showing comparable quality on both the training and validation levels. They also gave comparable information about the molecular features that influence the retention behavior of β-lactams under the current chromatographic conditions. We can conclude that in some cases simple conventional feature selection algorithm can be used to generate robust and predictive models comparable to that are generated using advanced ones. Copyright © 2018 Elsevier B.V. All rights reserved.

  14. Determining the Relationship between U.S. County-Level Adult Obesity Rate and Multiple Risk Factors by PLS Regression and SVM Modeling Approaches

    Directory of Open Access Journals (Sweden)

    Chau-Kuang Chen

    2015-02-01

    Full Text Available Data from the Center for Disease Control (CDC has shown that the obesity rate doubled among adults within the past two decades. This upsurge was the result of changes in human behavior and environment. Partial least squares (PLS regression and support vector machine (SVM models were conducted to determine the relationship between U.S. county-level adult obesity rate and multiple risk factors. The outcome variable was the adult obesity rate. The 23 risk factors were categorized into four domains of the social ecological model including biological/behavioral factor, socioeconomic status, food environment, and physical environment. Of the 23 risk factors related to adult obesity, the top eight significant risk factors with high normalized importance were identified including physical inactivity, natural amenity, percent of households receiving SNAP benefits, and percent of all restaurants being fast food. The study results were consistent with those in the literature. The study showed that adult obesity rate was influenced by biological/behavioral factor, socioeconomic status, food environment, and physical environment embedded in the social ecological theory. By analyzing multiple risk factors of obesity in the communities, may lead to the proposal of more comprehensive and integrated policies and intervention programs to solve the population-based problem.

  15. Prevendo a demanda de ligações em um call center por meio de um modelo de Regressão Múltipla Forecasting a call center demand using a Multiple Regression model

    Directory of Open Access Journals (Sweden)

    Marco Aurélio Carino Bouzada

    2009-09-01

    Full Text Available Este trabalho descreve - por meio do estudo de um caso - o problema da previsão de demanda de chamadas para um determinado produto no call center de uma grande empresa brasileira do setor - a Contax - e como ele foi abordado com o uso de Regressão Múltipla com variáveis dummy. Depois de destacar e justificar a importância do tema, o estudo apresenta uma breve revisão de literatura acerca de métodos de previsão de demanda e de sua aplicação em call centers. O caso é descrito, contextualizando, inicialmente, a empresa estudada e descrevendo, a seguir, a forma como ela lida com o problema de previsão de demanda de chamadas para o produto 103 - serviços relacionados à telefonia fixa. Um modelo de Regressão Múltipla com variáveis dummy é, então, desenvolvido para servir como base do processo de previsão de demanda proposto. Este modelo utiliza informações disponíveis capazes de influenciar a demanda, tais como o dia da semana, a ocorrência ou não de feriado e a proximidade da data com eventos críticos, como a chegada da conta à residência do cliente e seu vencimento; e apresentou ganhos de acurácia da ordem de 3 pontos percentuais para o período estudado, quando comparado com a ferramenta anteriormente em uso.This work describes - with the aid of a case study -a demand forecast problem for a specific product reported to the call center of a large Brazilian company in an industry called Contax, and the way it was approached with the use of Multiple Regression using dummy variables. After highlighting and justifying the studied matter relevance, the article presents a small literature review regarding demand forecast methods and their use in the call center industry. The case is described presenting the studied company and the way it deals with the Forecasting Demand for a telephone all center regarding telephone services products. Therefore, a Multiple Regression with dummy variables model was developed to work as the

  16. Multiple linear regression model for bromate formation based on the survey data of source waters from geographically different regions across China.

    Science.gov (United States)

    Yu, Jianwei; Liu, Juan; An, Wei; Wang, Yongjing; Zhang, Junzhi; Wei, Wei; Su, Ming; Yang, Min

    2015-01-01

    A total of 86 source water samples from 38 cities across major watersheds of China were collected for a bromide (Br(-)) survey, and the bromate (BrO3 (-)) formation potentials (BFPs) of 41 samples with Br(-) concentration >20 μg L(-1) were evaluated using a batch ozonation reactor. Statistical analyses indicated that higher alkalinity, hardness, and pH of water samples could lead to higher BFPs, with alkalinity as the most important factor. Based on the survey data, a multiple linear regression (MLR) model including three parameters (alkalinity, ozone dose, and total organic carbon (TOC)) was established with a relatively good prediction performance (model selection criterion = 2.01, R (2) = 0.724), using logarithmic transformation of the variables. Furthermore, a contour plot was used to interpret the influence of alkalinity and TOC on BrO3 (-) formation with prediction accuracy as high as 71 %, suggesting that these two parameters, apart from ozone dosage, were the most important ones affecting the BFPs of source waters with Br(-) concentration >20 μg L(-1). The model could be a useful tool for the prediction of the BFPs of source water.

  17. A Solution to Separation and Multicollinearity in Multiple Logistic Regression.

    Science.gov (United States)

    Shen, Jianzhao; Gao, Sujuan

    2008-10-01

    In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study.

  18. High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models.

    Science.gov (United States)

    Forkuor, Gerald; Hounkpatin, Ozias K L; Welp, Gerhard; Thiel, Michael

    2017-01-01

    Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties-sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen-in a 580 km2 agricultural watershed in south-western Burkina Faso. Four statistical prediction models-multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM), stochastic gradient boosting (SGB)-were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat8 as well as soil specific indices of redness

  19. High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models.

    Directory of Open Access Journals (Sweden)

    Gerald Forkuor

    Full Text Available Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat, terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties-sand, silt, clay, cation exchange capacity (CEC, soil organic carbon (SOC and nitrogen-in a 580 km2 agricultural watershed in south-western Burkina Faso. Four statistical prediction models-multiple linear regression (MLR, random forest regression (RFR, support vector machine (SVM, stochastic gradient boosting (SGB-were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat8 as well as soil specific indices

  20. Metabolic activity of tree saps of different origin towards cultured human cells in the light of grade correspondence analysis and multiple regression modeling

    Directory of Open Access Journals (Sweden)

    Artur Wnorowski

    2017-06-01

    Full Text Available Tree saps are nourishing biological media commonly used for beverage and syrup production. Although the nutritional aspect of tree saps is widely acknowledged, the exact relationship between the sap composition, origin, and effect on the metabolic rate of human cells is still elusive. Thus, we collected saps from seven different tree species and conducted composition-activity analysis. Saps from trees of Betulaceae, but not from Salicaceae, Sapindaceae, nor Juglandaceae families, were increasing the metabolic rate of HepG2 cells, as measured using tetrazolium-based assay. Content of glucose, fructose, sucrose, chlorides, nitrates, sulphates, fumarates, malates, and succinates in sap samples varied across different tree species. Grade correspondence analysis clustered trees based on the saps’ chemical footprint indicating its usability in chemotaxonomy. Multiple regression modeling showed that glucose and fumarate present in saps from silver birch (Betula pendula Roth., black alder (Alnus glutinosa Gaertn., and European hornbeam (Carpinus betulus L. are positively affecting the metabolic activity of HepG2 cells.

  1. A review of the most relevant multiple regression models for sales forecasting in gas stations; Uma revisao dos principais modelos de regressao multipla para previsao de vendas de postos de combustiveis

    Energy Technology Data Exchange (ETDEWEB)

    Wanke, Peter [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil). Instituto de Pesquisa e Pos-Graduacao em Administracao de Empresas (COPPEAD). Centro de Estudos em Logistica

    2004-07-01

    In this paper, the most relevant multiple regression models for sales forecasting of gas stations, developed over the past ten years, are reviewed. The most significant variables related to gas station sales, the types of the multiple regression models (linear or non-linear), the most common uses in supporting decision making and its limits are presented. The predictive power of each model and its impact on decision-making, such as sensitivity analysis and confidence intervals for independent variables, are also commented. Four models are presented, based on studies conducted in South Africa, Portugal and Brazil. In conclusion, suggestions for future developments are presented based on past developments. (author)

  2. Development of a predictive model for lead, cadmium and fluorine soil-water partition coefficients using sparse multiple linear regression analysis.

    Science.gov (United States)

    Nakamura, Kengo; Yasutaka, Tetsuo; Kuwatani, Tatsu; Komai, Takeshi

    2017-11-01

    In this study, we applied sparse multiple linear regression (SMLR) analysis to clarify the relationships between soil properties and adsorption characteristics for a range of soils across Japan and identify easily-obtained physical and chemical soil properties that could be used to predict K and n values of cadmium, lead and fluorine. A model was first constructed that can easily predict the K and n values from nine soil parameters (pH, cation exchange capacity, specific surface area, total carbon, soil organic matter from loss on ignition and water holding capacity, the ratio of sand, silt and clay). The K and n values of cadmium, lead and fluorine of 17 soil samples were used to verify the SMLR models by the root mean square error values obtained from 512 combinations of soil parameters. The SMLR analysis indicated that fluorine adsorption to soil may be associated with organic matter, whereas cadmium or lead adsorption to soil is more likely to be influenced by soil pH, IL. We found that an accurate K value can be predicted from more than three soil parameters for most soils. Approximately 65% of the predicted values were between 33 and 300% of their measured values for the K value; 76% of the predicted values were within ±30% of their measured values for the n value. Our findings suggest that adsorption properties of lead, cadmium and fluorine to soil can be predicted from the soil physical and chemical properties using the presented models. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction.

    Science.gov (United States)

    He, Dan; Kuhn, David; Parida, Laxmi

    2016-06-15

    Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least square error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. The programs we used are either public or directly from the referred authors, such as MALSAR (http://www.public.asu.edu/~jye02/Software/MALSAR/) package. The Avocado data set has not been published yet and is available upon request. dhe@us.ibm.com. © The Author 2016. Published by Oxford University Press.

  4. General Nature of Multicollinearity in Multiple Regression Analysis.

    Science.gov (United States)

    Liu, Richard

    1981-01-01

    Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)

  5. Particulate matter and carbon monoxide multiple regression models using environmental characteristics in a high diesel-use area of Baguio City, Philippines

    Energy Technology Data Exchange (ETDEWEB)

    Cassidy, Brandon E.; Naeher, Luke P. [The University of Georgia (UGA), College of Public Health, Department of Environmental Health Science, Athens, Georgia, GA 30602-2102 (United States); Alabanza-Akers, Mary Anne [UGA, College of Environment and Design, Athens, Georgia (United States); Akers, Timothy A. [Kennesaw State University, WellStar College of Health and Human Services, Kennesaw, Georgia (United States); Hall, Daniel B. [UGA, Franklin College of Arts and Sciences, Department of Statistics, Athens, Georgia (United States); Ryan, P. Barry [Emory University, Rollins School of Public Health, Atlanta, Georgia (United States); Bayer, Charlene W. [Georgia Tech Research Institute, Atlanta, Georgia (United States)

    2007-08-01

    In Baguio City, Philippines, a mountainous city of 252,386 people where 61% of motor vehicles use diesel fuel, ambient particulate matter < 2.5 {mu}m (PM{sub 2.5}) and < 10 {mu}m (PM{sub 10}) in aerodynamic diameter and carbon monoxide (CO) were measured at 30 street-level locations for 15 min apiece during the early morning (4:50-6:30 am), morning rush hour (6:30-9:10 am) and afternoon rush hour (3:40-5:40 pm) in December 2004. Environmental observations (e.g. traffic-related variables, building/roadway designs, wind speed and direction, etc.) at each location were noted during each monitoring event. Multiple regression models were formulated to determine which pollution sources and environmental factors significantly affect ground-level PM{sub 2.5}, PM{sub 10} and CO concentrations. The models showed statistically significant relationships between traffic and early morning particulate air pollution [(PM{sub 2.5}p = 0.021) and PM{sub 10} (p = 0.048)], traffic and morning rush hour CO (p = 0.048), traffic and afternoon rush hour CO (p = 0.034) and wind and early morning CO (p 0.044). The mean early morning, street-level PM{sub 2.5} (110 {+-} 8 {mu}g/m{sup 3}; mean {+-} 1 standard error) was not significantly different (p-value > 0.05) from either rush hour PM{sub 2.5} concentration (morning = 98 {+-} 7 {mu}g/m{sup 3}; afternoon = 107 {+-} 5 {mu}g/m{sup 3}) due to nocturnal inversions in spite of a 100% increase in automotive density during rush hours. Early morning street-level CO (3.0 {+-} 1.7 ppm) differed from morning rush hour (4.1 {+-} 2.3 ppm) (p 0.039) and afternoon rush hour (4.5 {+-}2.2 ppm) (p = 0.007). Additionally, PM{sub 2.5}, PM{sub 10}, CO, nitrogen dioxide (NO{sub 2}) and select volatile organic compounds were continuously measured at a downtown, third-story monitoring station along a busy roadway for 11 days. Twenty-four-hour average ambient concentrations were: PM{sub 2.5} = 72.9 {+-} 21 {mu}g/m{sup 3}; CO = 2.61 {+-} 0.6 ppm; NO{sub 2} = 27

  6. Particulate matter and carbon monoxide multiple regression models using environmental characteristics in a high diesel-use area of Baguio City, Philippines

    International Nuclear Information System (INIS)

    Cassidy, Brandon E.; Naeher, Luke P.; Alabanza-Akers, Mary Anne; Akers, Timothy A.; Hall, Daniel B.; Ryan, P. Barry; Bayer, Charlene W.

    2007-01-01

    In Baguio City, Philippines, a mountainous city of 252,386 people where 61% of motor vehicles use diesel fuel, ambient particulate matter 2.5 ) and 10 ) in aerodynamic diameter and carbon monoxide (CO) were measured at 30 street-level locations for 15 min apiece during the early morning (4:50-6:30 am), morning rush hour (6:30-9:10 am) and afternoon rush hour (3:40-5:40 pm) in December 2004. Environmental observations (e.g. traffic-related variables, building/roadway designs, wind speed and direction, etc.) at each location were noted during each monitoring event. Multiple regression models were formulated to determine which pollution sources and environmental factors significantly affect ground-level PM 2.5 , PM 10 and CO concentrations. The models showed statistically significant relationships between traffic and early morning particulate air pollution [(PM 2.5 p = 0.021) and PM 10 (p = 0.048)], traffic and morning rush hour CO (p = 0.048), traffic and afternoon rush hour CO (p = 0.034) and wind and early morning CO (p 0.044). The mean early morning, street-level PM 2.5 (110 ± 8 μg/m 3 ; mean ± 1 standard error) was not significantly different (p-value > 0.05) from either rush hour PM 2.5 concentration (morning = 98 ± 7 μg/m 3 ; afternoon = 107 ± 5 μg/m 3 ) due to nocturnal inversions in spite of a 100% increase in automotive density during rush hours. Early morning street-level CO (3.0 ± 1.7 ppm) differed from morning rush hour (4.1 ± 2.3 ppm) (p 0.039) and afternoon rush hour (4.5 ±2.2 ppm) (p = 0.007). Additionally, PM 2.5 , PM 10 , CO, nitrogen dioxide (NO 2 ) and select volatile organic compounds were continuously measured at a downtown, third-story monitoring station along a busy roadway for 11 days. Twenty-four-hour average ambient concentrations were: PM 2.5 = 72.9 ± 21 μg/m 3 ; CO = 2.61 ± 0.6 ppm; NO 2 = 27.7 ± 1.6 ppb; benzene = 8.4 ± 1.4 μg/m 3 ; ethylbenzene = 4.6 ± 2.0 μg/m 3 ; p-xylene = 4.4 ± 1.9 μg/m 3 ; m-xylene = 10.2 ± 4

  7. Introduction to the use of regression models in epidemiology.

    Science.gov (United States)

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.

  8. Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression

    KAUST Repository

    Abdul Jameel, Abdul Gani; Naser, Nimal; Emwas, Abdul-Hamid M.; Dooley, Stephen; Sarathy, Mani

    2016-01-01

    An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN

  9. A Seemingly Unrelated Poisson Regression Model

    OpenAIRE

    King, Gary

    1989-01-01

    This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.

  10. Elliptical multiple-output quantile regression and convex optimization

    Czech Academy of Sciences Publication Activity Database

    Hallin, M.; Šiman, Miroslav

    2016-01-01

    Roč. 109, č. 1 (2016), s. 232-237 ISSN 0167-7152 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords : quantile regression * elliptical quantile * multivariate quantile * multiple-output regression Subject RIV: BA - General Mathematics Impact factor: 0.540, year: 2016 http://library.utia.cas.cz/separaty/2016/SI/siman-0458243.pdf

  11. Use of Multiple Linear Regression Method for Modelling Seasonal Changes in Stable Isotopes of 18O and 2H in 30 Pouns in Gilan Province

    Directory of Open Access Journals (Sweden)

    M.A. Mousavi Shalmani

    2014-08-01

    Full Text Available In order to assessment of water quality and characterize seasonal variation in 18O and 2H in relation with different chemical and physiographical parameters and modelling of effective parameters, an study was conducted during 2010 to 2011 in 30 different ponds in the north of Iran. Samples were collected at three different seasons and analysed for chemical and isotopic components. Data shows that highest amounts of δ18O and δ2H were recorded in the summer (-1.15‰ and -12.11‰ and the lowest amounts were seen in the winter (-7.50‰ and -47.32‰ respectively. Data also reveals that there is significant increase in d-excess during spring and summer in ponds 20, 21, 22, 24, 25 and 26. We can conclude that residual surface runoff (from upper lands is an important source of water to transfer soluble salts in to these ponds. In this respect, high retention time may be the main reason for movements of light isotopes in to the ponds. This has led d-excess of pond 12 even greater in summer than winter. This could be an acceptable reason for ponds 25 and 26 (Siyahkal county with highest amount of d-excess and lowest amounts of δ18O and δ2H. It seems light water pumped from groundwater wells with minor source of salt (originated from sea deep percolation in to the ponds, could may be another reason for significant decrease in the heavy isotopes of water (18O and 2H for ponds 2, 12, 14 and 25 from spring to summer. Overall conclusion of multiple linear regression test indicate that firstly from 30 variables (under investigation only a few cases can be used for identifying of changes in 18O and 2H by applications. Secondly, among the variables (studied, phytoplankton content was a common factor for interpretation of 18O and 2H during spring and summer, and also total period (during a year. Thirdly, the use of water in the spring was recommended for sampling, for 18O and 2H interpretation compared with other seasons. This is because of function can be

  12. Bayesian Inference of a Multivariate Regression Model

    Directory of Open Access Journals (Sweden)

    Marick S. Sinay

    2014-01-01

    Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.

  13. Gaussian Process Regression Model in Spatial Logistic Regression

    Science.gov (United States)

    Sofro, A.; Oktaviarina, A.

    2018-01-01

    Spatial analysis has developed very quickly in the last decade. One of the favorite approaches is based on the neighbourhood of the region. Unfortunately, there are some limitations such as difficulty in prediction. Therefore, we offer Gaussian process regression (GPR) to accommodate the issue. In this paper, we will focus on spatial modeling with GPR for binomial data with logit link function. The performance of the model will be investigated. We will discuss the inference of how to estimate the parameters and hyper-parameters and to predict as well. Furthermore, simulation studies will be explained in the last section.

  14. [Use of multiple regression models in observational studies (1970-2013) and requirements of the STROBE guidelines in Spanish scientific journals].

    Science.gov (United States)

    Real, J; Cleries, R; Forné, C; Roso-Llorach, A; Martínez-Sánchez, J M

    In medicine and biomedical research, statistical techniques like logistic, linear, Cox and Poisson regression are widely known. The main objective is to describe the evolution of multivariate techniques used in observational studies indexed in PubMed (1970-2013), and to check the requirements of the STROBE guidelines in the author guidelines in Spanish journals indexed in PubMed. A targeted PubMed search was performed to identify papers that used logistic linear Cox and Poisson models. Furthermore, a review was also made of the author guidelines of journals published in Spain and indexed in PubMed and Web of Science. Only 6.1% of the indexed manuscripts included a term related to multivariate analysis, increasing from 0.14% in 1980 to 12.3% in 2013. In 2013, 6.7, 2.5, 3.5, and 0.31% of the manuscripts contained terms related to logistic, linear, Cox and Poisson regression, respectively. On the other hand, 12.8% of journals author guidelines explicitly recommend to follow the STROBE guidelines, and 35.9% recommend the CONSORT guideline. A low percentage of Spanish scientific journals indexed in PubMed include the STROBE statement requirement in the author guidelines. Multivariate regression models in published observational studies such as logistic regression, linear, Cox and Poisson are increasingly used both at international level, as well as in journals published in Spanish. Copyright © 2015 Sociedad Española de Médicos de Atención Primaria (SEMERGEN). Publicado por Elsevier España, S.L.U. All rights reserved.

  15. Overcoming multicollinearity in multiple regression using correlation coefficient

    Science.gov (United States)

    Zainodin, H. J.; Yap, S. J.

    2013-09-01

    Multicollinearity happens when there are high correlations among independent variables. In this case, it would be difficult to distinguish between the contributions of these independent variables to that of the dependent variable as they may compete to explain much of the similar variance. Besides, the problem of multicollinearity also violates the assumption of multiple regression: that there is no collinearity among the possible independent variables. Thus, an alternative approach is introduced in overcoming the multicollinearity problem in achieving a well represented model eventually. This approach is accomplished by removing the multicollinearity source variables on the basis of the correlation coefficient values based on full correlation matrix. Using the full correlation matrix can facilitate the implementation of Excel function in removing the multicollinearity source variables. It is found that this procedure is easier and time-saving especially when dealing with greater number of independent variables in a model and a large number of all possible models. Hence, in this paper detailed insight of the procedure is shown, compared and implemented.

  16. Research on Influence and Prediction Model of Urban Traffic Link Tunnel curvature on Fire Temperature Based on Pyrosim--SPSS Multiple Regression Analysis

    Science.gov (United States)

    Li, Xiao Ju; Yao, Kun; Dai, Jun Yu; Song, Yun Long

    2018-05-01

    The underground space, also known as the “fourth dimension” of the city, reflects the efficient use of urban development intensive. Urban traffic link tunnel is a typical underground limited-length space. Due to the geographical location, the special structure of space and the curvature of the tunnel, high-temperature smoke can easily form the phenomenon of “smoke turning” and the fire risk is extremely high. This paper takes an urban traffic link tunnel as an example to focus on the relationship between curvature and the temperature near the fire source, and use the pyrosim built different curvature fire model to analyze the influence of curvature on the temperature of the fire, then using SPSS Multivariate regression analysis simulate curvature of the tunnel and fire temperature data. Finally, a prediction model of urban traffic link tunnel curvature on fire temperature was proposed. The regression model analysis and test show that the curvature is negatively correlated with the tunnel temperature. This model is feasible and can provide a theoretical reference for the urban traffic link tunnel fire protection design and the preparation of the evacuation plan. And also, it provides some reference for other related curved tunnel curvature design and smoke control measures.

  17. MULGRES: a computer program for stepwise multiple regression analysis

    Science.gov (United States)

    A. Jeff Martin

    1971-01-01

    MULGRES is a computer program source deck that is designed for multiple regression analysis employing the technique of stepwise deletion in the search for most significant variables. The features of the program, along with inputs and outputs, are briefly described, with a note on machine compatibility.

  18. Interpreting Multiple Linear Regression: A Guidebook of Variable Importance

    Science.gov (United States)

    Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim

    2012-01-01

    Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…

  19. Multiple regression for physiological data analysis: the problem of multicollinearity.

    Science.gov (United States)

    Slinker, B K; Glantz, S A

    1985-07-01

    Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.

  20. A comparative study of multiple regression analysis and back ...

    Indian Academy of Sciences (India)

    Abhijit Sarkar

    artificial neural network (ANN) models to predict weld bead geometry and HAZ width in submerged arc welding ... Keywords. Submerged arc welding (SAW); multi-regression analysis (MRA); artificial neural network ..... Degree of freedom.

  1. Multiple regression approach to predict turbine-generator output for Chinshan nuclear power plant

    International Nuclear Information System (INIS)

    Chan, Yea-Kuang; Tsai, Yu-Ching

    2017-01-01

    The objective of this study is to develop a turbine cycle model using the multiple regression approach to estimate the turbine-generator output for the Chinshan Nuclear Power Plant (NPP). The plant operating data was verified using a linear regression model with a corresponding 95% confidence interval for the operating data. In this study, the key parameters were selected as inputs for the multiple regression based turbine cycle model. The proposed model was used to estimate the turbine-generator output. The effectiveness of the proposed turbine cycle model was demonstrated by using plant operating data obtained from the Chinshan NPP Unit 2. The results show that this multiple regression based turbine cycle model can be used to accurately estimate the turbine-generator output. In addition, this study also provides an alternative approach with simple and easy features to evaluate the thermal performance for nuclear power plants.

  2. Multiple regression approach to predict turbine-generator output for Chinshan nuclear power plant

    Energy Technology Data Exchange (ETDEWEB)

    Chan, Yea-Kuang; Tsai, Yu-Ching [Institute of Nuclear Energy Research, Taoyuan City, Taiwan (China). Nuclear Engineering Division

    2017-03-15

    The objective of this study is to develop a turbine cycle model using the multiple regression approach to estimate the turbine-generator output for the Chinshan Nuclear Power Plant (NPP). The plant operating data was verified using a linear regression model with a corresponding 95% confidence interval for the operating data. In this study, the key parameters were selected as inputs for the multiple regression based turbine cycle model. The proposed model was used to estimate the turbine-generator output. The effectiveness of the proposed turbine cycle model was demonstrated by using plant operating data obtained from the Chinshan NPP Unit 2. The results show that this multiple regression based turbine cycle model can be used to accurately estimate the turbine-generator output. In addition, this study also provides an alternative approach with simple and easy features to evaluate the thermal performance for nuclear power plants.

  3. Regression models of reactor diagnostic signals

    International Nuclear Information System (INIS)

    Vavrin, J.

    1989-01-01

    The application is described of an autoregression model as the simplest regression model of diagnostic signals in experimental analysis of diagnostic systems, in in-service monitoring of normal and anomalous conditions and their diagnostics. The method of diagnostics is described using a regression type diagnostic data base and regression spectral diagnostics. The diagnostics is described of neutron noise signals from anomalous modes in the experimental fuel assembly of a reactor. (author)

  4. Variable importance in latent variable regression models

    NARCIS (Netherlands)

    Kvalheim, O.M.; Arneberg, R.; Bleie, O.; Rajalahti, T.; Smilde, A.K.; Westerhuis, J.A.

    2014-01-01

    The quality and practical usefulness of a regression model are a function of both interpretability and prediction performance. This work presents some new graphical tools for improved interpretation of latent variable regression models that can also assist in improved algorithms for variable

  5. A comparison of random forest regression and multiple linear regression for prediction in neuroscience.

    Science.gov (United States)

    Smith, Paul F; Ganesh, Siva; Liu, Ping

    2013-10-30

    Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR), in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R(2) values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R(2) values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.

  6. Estimating Loess Plateau Average Annual Precipitation with Multiple Linear Regression Kriging and Geographically Weighted Regression Kriging

    Directory of Open Access Journals (Sweden)

    Qiutong Jin

    2016-06-01

    Full Text Available Estimating the spatial distribution of precipitation is an important and challenging task in hydrology, climatology, ecology, and environmental science. In order to generate a highly accurate distribution map of average annual precipitation for the Loess Plateau in China, multiple linear regression Kriging (MLRK and geographically weighted regression Kriging (GWRK methods were employed using precipitation data from the period 1980–2010 from 435 meteorological stations. The predictors in regression Kriging were selected by stepwise regression analysis from many auxiliary environmental factors, such as elevation (DEM, normalized difference vegetation index (NDVI, solar radiation, slope, and aspect. All predictor distribution maps had a 500 m spatial resolution. Validation precipitation data from 130 hydrometeorological stations were used to assess the prediction accuracies of the MLRK and GWRK approaches. Results showed that both prediction maps with a 500 m spatial resolution interpolated by MLRK and GWRK had a high accuracy and captured detailed spatial distribution data; however, MLRK produced a lower prediction error and a higher variance explanation than GWRK, although the differences were small, in contrast to conclusions from similar studies.

  7. Multicollinearity is a red herring in the search for moderator variables: A guide to interpreting moderated multiple regression models and a critique of Iacobucci, Schneider, Popovich, and Bakamitsos (2016).

    Science.gov (United States)

    McClelland, Gary H; Irwin, Julie R; Disatnik, David; Sivan, Liron

    2017-02-01

    Multicollinearity is irrelevant to the search for moderator variables, contrary to the implications of Iacobucci, Schneider, Popovich, and Bakamitsos (Behavior Research Methods, 2016, this issue). Multicollinearity is like the red herring in a mystery novel that distracts the statistical detective from the pursuit of a true moderator relationship. We show multicollinearity is completely irrelevant for tests of moderator variables. Furthermore, readers of Iacobucci et al. might be confused by a number of their errors. We note those errors, but more positively, we describe a variety of methods researchers might use to test and interpret their moderated multiple regression models, including two-stage testing, mean-centering, spotlighting, orthogonalizing, and floodlighting without regard to putative issues of multicollinearity. We cite a number of recent studies in the psychological literature in which the researchers used these methods appropriately to test, to interpret, and to report their moderated multiple regression models. We conclude with a set of recommendations for the analysis and reporting of moderated multiple regression that should help researchers better understand their models and facilitate generalizations across studies.

  8. Regression and regression analysis time series prediction modeling on climate data of quetta, pakistan

    International Nuclear Information System (INIS)

    Jafri, Y.Z.; Kamal, L.

    2007-01-01

    Various statistical techniques was used on five-year data from 1998-2002 of average humidity, rainfall, maximum and minimum temperatures, respectively. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit, to our polynomial regression analysis time series (PRATS). The correlation to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rand correlation and Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, respectively which yielded homoscedasticity. We also employed Bartlett's test for homogeneity of variances on a five-year data of rainfall and humidity, respectively which showed that the variances in rainfall data were not homogenous while in case of humidity, were homogenous. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)

  9. Wavelet regression model in forecasting crude oil price

    Science.gov (United States)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. WMLR model was developed by integrating the discrete wavelet transform (DWT) and multiple linear regression (MLR) model. The original time series was decomposed to sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of WMLR model were also compared with regular multiple linear regression (MLR), Autoregressive Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting technique tested in this study.

  10. [From clinical judgment to linear regression model.

    Science.gov (United States)

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

    When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R 2 ) indicates the importance of independent variables in the outcome.

  11. Moderation analysis using a two-level regression model.

    Science.gov (United States)

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.

  12. Weibull and lognormal Taguchi analysis using multiple linear regression

    International Nuclear Information System (INIS)

    Piña-Monarrez, Manuel R.; Ortiz-Yañez, Jesús F.

    2015-01-01

    The paper provides to reliability practitioners with a method (1) to estimate the robust Weibull family when the Taguchi method (TM) is applied, (2) to estimate the normal operational Weibull family in an accelerated life testing (ALT) analysis to give confidence to the extrapolation and (3) to perform the ANOVA analysis to both the robust and the normal operational Weibull family. On the other hand, because the Weibull distribution neither has the normal additive property nor has a direct relationship with the normal parameters (µ, σ), in this paper, the issues of estimating a Weibull family by using a design of experiment (DOE) are first addressed by using an L_9 (3"4) orthogonal array (OA) in both the TM and in the Weibull proportional hazard model approach (WPHM). Then, by using the Weibull/Gumbel and the lognormal/normal relationships and multiple linear regression, the direct relationships between the Weibull and the lifetime parameters are derived and used to formulate the proposed method. Moreover, since the derived direct relationships always hold, the method is generalized to the lognormal and ALT analysis. Finally, the method’s efficiency is shown through its application to the used OA and to a set of ALT data. - Highlights: • It gives the statistical relations and steps to use the Taguchi Method (TM) to analyze Weibull data. • It gives the steps to determine the unknown Weibull family to both the robust TM setting and the normal ALT level. • It gives a method to determine the expected lifetimes and to perform its ANOVA analysis in TM and ALT analysis. • It gives a method to give confidence to the extrapolation in an ALT analysis by using the Weibull family of the normal level.

  13. Analysis of γ spectra in airborne radioactivity measurements using multiple linear regressions

    International Nuclear Information System (INIS)

    Bao Min; Shi Quanlin; Zhang Jiamei

    2004-01-01

    This paper describes the net peak counts calculating of nuclide 137 Cs at 662 keV of γ spectra in airborne radioactivity measurements using multiple linear regressions. Mathematic model is founded by analyzing every factor that has contribution to Cs peak counts in spectra, and multiple linear regression function is established. Calculating process adopts stepwise regression, and the indistinctive factors are eliminated by F check. The regression results and its uncertainty are calculated using Least Square Estimation, then the Cs peak net counts and its uncertainty can be gotten. The analysis results for experimental spectrum are displayed. The influence of energy shift and energy resolution on the analyzing result is discussed. In comparison with the stripping spectra method, multiple linear regression method needn't stripping radios, and the calculating result has relation with the counts in Cs peak only, and the calculating uncertainty is reduced. (authors)

  14. Two SPSS programs for interpreting multiple regression results.

    Science.gov (United States)

    Lorenzo-Seva, Urbano; Ferrando, Pere J; Chico, Eliseo

    2010-02-01

    When multiple regression is used in explanation-oriented designs, it is very important to determine both the usefulness of the predictor variables and their relative importance. Standardized regression coefficients are routinely provided by commercial programs. However, they generally function rather poorly as indicators of relative importance, especially in the presence of substantially correlated predictors. We provide two user-friendly SPSS programs that implement currently recommended techniques and recent developments for assessing the relevance of the predictors. The programs also allow the user to take into account the effects of measurement error. The first program, MIMR-Corr.sps, uses a correlation matrix as input, whereas the second program, MIMR-Raw.sps, uses the raw data and computes bootstrap confidence intervals of different statistics. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from http://brm.psychonomic-journals.org/content/supplemental.

  15. Categorical regression dose-response modeling

    Science.gov (United States)

    The goal of this training is to provide participants with training on the use of the U.S. EPA’s Categorical Regression soft¬ware (CatReg) and its application to risk assessment. Categorical regression fits mathematical models to toxicity data that have been assigned ord...

  16. Prediction of hearing outcomes by multiple regression analysis in patients with idiopathic sudden sensorineural hearing loss.

    Science.gov (United States)

    Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki

    2014-12-01

    This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.

  17. Interpret with caution: multicollinearity in multiple regression of cognitive data.

    Science.gov (United States)

    Morrison, Catriona M

    2003-08-01

    Shibihara and Kondo in 2002 reported a reanalysis of the 1997 Kanji picture-naming data of Yamazaki, Ellis, Morrison, and Lambon-Ralph in which independent variables were highly correlated. Their addition of the variable visual familiarity altered the previously reported pattern of results, indicating that visual familiarity, but not age of acquisition, was important in predicting Kanji naming speed. The present paper argues that caution should be taken when drawing conclusions from multiple regression analyses in which the independent variables are so highly correlated, as such multicollinearity can lead to unreliable output.

  18. A new multiple regression model to identify multi-family houses with a high prevalence of sick building symptoms "SBS", within the healthy sustainable house study in Stockholm (3H).

    Science.gov (United States)

    Engvall, Karin; Hult, M; Corner, R; Lampa, E; Norbäck, D; Emenius, G

    2010-01-01

    The aim was to develop a new model to identify residential buildings with higher frequencies of "SBS" than expected, "risk buildings". In 2005, 481 multi-family buildings with 10,506 dwellings in Stockholm were studied by a new stratified random sampling. A standardised self-administered questionnaire was used to assess "SBS", atopy and personal factors. The response rate was 73%. Statistical analysis was performed by multiple logistic regressions. Dwellers owning their building reported less "SBS" than those renting. There was a strong relationship between socio-economic factors and ownership. The regression model, ended up with high explanatory values for age, gender, atopy and ownership. Applying our model, 9% of all residential buildings in Stockholm were classified as "risk buildings" with the highest proportion in houses built 1961-1975 (26%) and lowest in houses built 1985-1990 (4%). To identify "risk buildings", it is necessary to adjust for ownership and population characteristics.

  19. General Dimensional Multiple-Output Support Vector Regressions and Their Multiple Kernel Learning.

    Science.gov (United States)

    Chung, Wooyong; Kim, Jisu; Lee, Heejin; Kim, Euntai

    2015-11-01

    Support vector regression has been considered as one of the most important regression or function approximation methodologies in a variety of fields. In this paper, two new general dimensional multiple output support vector regressions (MSVRs) named SOCPL1 and SOCPL2 are proposed. The proposed methods are formulated in the dual space and their relationship with the previous works is clearly investigated. Further, the proposed MSVRs are extended into the multiple kernel learning and their training is implemented by the off-the-shelf convex optimization tools. The proposed MSVRs are applied to benchmark problems and their performances are compared with those of the previous methods in the experimental section.

  20. Multiplication factor versus regression analysis in stature estimation from hand and foot dimensions.

    Science.gov (United States)

    Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha

    2012-05-01

    Estimation of stature is an important parameter in identification of human remains in forensic examinations. The present study is aimed to compare the reliability and accuracy of stature estimation and to demonstrate the variability in estimated stature and actual stature using multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements; hand length, hand breadth, foot length and foot breadth taken on the left side in each subject were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for estimation of stature from hand and foot dimensions. Derived multiplication factors and regression formula were applied to the hand and foot measurements in the study sample. The estimated stature from the multiplication factors and regression analysis was compared with the actual stature to find the error in estimated stature. The results indicate that the range of error in estimation of stature from regression analysis method is less than that of multiplication factor method thus, confirming that the regression analysis method is better than multiplication factor analysis in stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  1. Mixed Frequency Data Sampling Regression Models: The R Package midasr

    Directory of Open Access Journals (Sweden)

    Eric Ghysels

    2016-08-01

    Full Text Available When modeling economic relationships it is increasingly common to encounter data sampled at different frequencies. We introduce the R package midasr which enables estimating regression models with variables sampled at different frequencies within a MIDAS regression framework put forward in work by Ghysels, Santa-Clara, and Valkanov (2002. In this article we define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface and estimated using various optimization methods chosen by the researcher. We discuss how to check the validity of the estimated model both in terms of numerical convergence and statistical adequacy of a chosen regression specification, how to perform model selection based on a information criterion, how to assess forecasting accuracy of the MIDAS regression model and how to obtain a forecast aggregation of different MIDAS regression models. We illustrate the capabilities of the package with a simulated MIDAS regression model and give two empirical examples of application of MIDAS regression.

  2. A Powerful Test for Comparing Multiple Regression Functions.

    Science.gov (United States)

    Maity, Arnab

    2012-09-01

    In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models: Y(ij) = θ(j)(Z(ij)) + σ(j)(Z(ij))∊(ij), based on empirical distributions of the errors in each population j = 1, … , J. In this paper, we propose a test for equality of the θ(j)(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test for other nonparametric regression setups, e.g, nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).

  3. Applied Regression Modeling A Business Approach

    CERN Document Server

    Pardoe, Iain

    2012-01-01

    An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculusRegression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a

  4. The comparison between several robust ridge regression estimators in the presence of multicollinearity and multiple outliers

    Science.gov (United States)

    Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said

    2014-09-01

    In the presence of multicollinearity and multiple outliers, statistical inference of linear regression model using ordinary least squares (OLS) estimators would be severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods which were reported to be less sensitive to the presence of outliers. In addition, ridge regression technique was employed to tackle multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods was discussed in this study. The superiority of this approach was examined when simultaneous presence of multicollinearity and multiple outliers occurred in multiple linear regression. This study aimed to look at the performance of several well-known robust estimators; M, MM, RIDGE and robust ridge regression estimators, namely Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both x and y-direction), the RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observation, level of collinearity and percentage of outliers used. However, when outliers occurred in only single direction (y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, by producing the least variance. In conclusion, the robust ridge regression is the best alternative as compared to robust and conventional least squares estimators when dealing with simultaneous presence of multicollinearity and outliers.

  5. Regression Models For Multivariate Count Data.

    Science.gov (United States)

    Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei

    2017-01-01

    Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data.

  6. Mean centering, multicollinearity, and moderators in multiple regression: The reconciliation redux.

    Science.gov (United States)

    Iacobucci, Dawn; Schneider, Matthew J; Popovich, Deidre L; Bakamitsos, Georgios A

    2017-02-01

    In this article, we attempt to clarify our statements regarding the effects of mean centering. In a multiple regression with predictors A, B, and A × B (where A × B serves as an interaction term), mean centering A and B prior to computing the product term can clarify the regression coefficients (which is good) and the overall model fit R 2 will remain undisturbed (which is also good).

  7. Testing homogeneity in Weibull-regression models.

    Science.gov (United States)

    Bolfarine, Heleno; Valença, Dione M

    2005-10-01

    In survival studies with families or geographical units it may be of interest testing whether such groups are homogeneous for given explanatory variables. In this paper we consider score type tests for group homogeneity based on a mixing model in which the group effect is modelled as a random variable. As opposed to hazard-based frailty models, this model presents survival times that conditioned on the random effect, has an accelerated failure time representation. The test statistics requires only estimation of the conventional regression model without the random effect and does not require specifying the distribution of the random effect. The tests are derived for a Weibull regression model and in the uncensored situation, a closed form is obtained for the test statistic. A simulation study is used for comparing the power of the tests. The proposed tests are applied to real data sets with censored data.

  8. Interpreting Multiple Logistic Regression Coefficients in Prospective Observational Studies

    Science.gov (United States)

    1982-11-01

    prompted close examination of the issue at a workshop on hypertriglyceridemia where some of the cautions and perspectives given in this paper were...characteristics. If this is not the interest, then to isolate and-understand the effect of a characteris- tic on CHD when it could be one of several interacting...also easily extended to the case when several independent variables are modeled in a multiple logistic equation. In this instance, if xlx 2,..., x are

  9. Multiattribute shopping models and ridge regression analysis

    NARCIS (Netherlands)

    Timmermans, H.J.P.

    1981-01-01

    Policy decisions regarding retailing facilities essentially involve multiple attributes of shopping centres. If mathematical shopping models are to contribute to these decision processes, their structure should reflect the multiattribute character of retailing planning. Examination of existing

  10. Mixed-effects regression models in linguistics

    CERN Document Server

    Heylen, Kris; Geeraerts, Dirk

    2018-01-01

    When data consist of grouped observations or clusters, and there is a risk that measurements within the same group are not independent, group-specific random effects can be added to a regression model in order to account for such within-group associations. Regression models that contain such group-specific random effects are called mixed-effects regression models, or simply mixed models. Mixed models are a versatile tool that can handle both balanced and unbalanced datasets and that can also be applied when several layers of grouping are present in the data; these layers can either be nested or crossed.  In linguistics, as in many other fields, the use of mixed models has gained ground rapidly over the last decade. This methodological evolution enables us to build more sophisticated and arguably more realistic models, but, due to its technical complexity, also introduces new challenges. This volume brings together a number of promising new evolutions in the use of mixed models in linguistics, but also addres...

  11. Regression modeling methods, theory, and computation with SAS

    CERN Document Server

    Panik, Michael

    2009-01-01

    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,

  12. Tools to support interpreting multiple regression in the face of multicollinearity.

    Science.gov (United States)

    Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K

    2012-01-01

    While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.

  13. Detection of epistatic effects with logic regression and a classical linear regression model.

    Science.gov (United States)

    Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata

    2014-02-01

    To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, usually methods as the Cockerham's model are applied. Within this framework, interactions are understood as the part of the joined effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (as disease) is caused by Boolean combinations of genotypes of several QTLs, this Cockerham's approach is often not capable to identify them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though with the logic regression approach a larger number of models has to be considered (requiring more stringent multiple testing correction) the efficient representation of higher order logic interactions in logic regression models leads to a significant increase of power to detect such interactions as compared to a Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with simulation study and real data analysis.

  14. Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

    Science.gov (United States)

    Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

    2016-01-01

    Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.

  15. Forecasting Ebola with a regression transmission model

    OpenAIRE

    Asher, Jason

    2017-01-01

    We describe a relatively simple stochastic model of Ebola transmission that was used to produce forecasts with the lowest mean absolute error among Ebola Forecasting Challenge participants. The model enabled prediction of peak incidence, the timing of this peak, and final size of the outbreak. The underlying discrete-time compartmental model used a time-varying reproductive rate modeled as a multiplicative random walk driven by the number of infectious individuals. This structure generalizes ...

  16. Time-localized wavelet multiple regression and correlation

    Science.gov (United States)

    Fernández-Macho, Javier

    2018-02-01

    This paper extends wavelet methodology to handle comovement dynamics of multivariate time series via moving weighted regression on wavelet coefficients. The concept of wavelet local multiple correlation is used to produce one single set of multiscale correlations along time, in contrast with the large number of wavelet correlation maps that need to be compared when using standard pairwise wavelet correlations with rolling windows. Also, the spectral properties of weight functions are investigated and it is argued that some common time windows, such as the usual rectangular rolling window, are not satisfactory on these grounds. The method is illustrated with a multiscale analysis of the comovements of Eurozone stock markets during this century. It is shown how the evolution of the correlation structure in these markets has been far from homogeneous both along time and across timescales featuring an acute divide across timescales at about the quarterly scale. At longer scales, evidence from the long-term correlation structure can be interpreted as stable perfect integration among Euro stock markets. On the other hand, at intramonth and intraweek scales, the short-term correlation structure has been clearly evolving along time, experiencing a sharp increase during financial crises which may be interpreted as evidence of financial 'contagion'.

  17. Influence diagnostics in meta-regression model.

    Science.gov (United States)

    Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua

    2017-09-01

    This paper studies the influence diagnostics in meta-regression model including case deletion diagnostic and local influence analysis. We derive the subset deletion formulae for the estimation of regression coefficient and heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are considered, respectively, to derive the results. Internal and external residual and leverage measure are defined. The local influence analysis based on case-weights perturbation scheme, responses perturbation scheme, covariate perturbation scheme, and within-variance perturbation scheme are explored. We introduce a method by simultaneous perturbing responses, covariate, and within-variance to obtain the local influence measure, which has an advantage of capable to compare the influence magnitude of influential studies from different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.

  18. AIRLINE ACTIVITY FORECASTING BY REGRESSION MODELS

    Directory of Open Access Journals (Sweden)

    Н. Білак

    2012-04-01

    Full Text Available Proposed linear and nonlinear regression models, which take into account the equation of trend and seasonality indices for the analysis and restore the volume of passenger traffic over the past period of time and its prediction for future years, as well as the algorithm of formation of these models based on statistical analysis over the years. The desired model is the first step for the synthesis of more complex models, which will enable forecasting of passenger (income level airline with the highest accuracy and time urgency.

  19. Modeling oil production based on symbolic regression

    International Nuclear Information System (INIS)

    Yang, Guangfei; Li, Xianneng; Wang, Jianliang; Lian, Lian; Ma, Tieju

    2015-01-01

    Numerous models have been proposed to forecast the future trends of oil production and almost all of them are based on some predefined assumptions with various uncertainties. In this study, we propose a novel data-driven approach that uses symbolic regression to model oil production. We validate our approach on both synthetic and real data, and the results prove that symbolic regression could effectively identify the true models beneath the oil production data and also make reliable predictions. Symbolic regression indicates that world oil production will peak in 2021, which broadly agrees with other techniques used by researchers. Our results also show that the rate of decline after the peak is almost half the rate of increase before the peak, and it takes nearly 12 years to drop 4% from the peak. These predictions are more optimistic than those in several other reports, and the smoother decline will provide the world, especially the developing countries, with more time to orchestrate mitigation plans. -- Highlights: •A data-driven approach has been shown to be effective at modeling the oil production. •The Hubbert model could be discovered automatically from data. •The peak of world oil production is predicted to appear in 2021. •The decline rate after peak is half of the increase rate before peak. •Oil production projected to decline 4% post-peak

  20. Forecasting Ebola with a regression transmission model

    Directory of Open Access Journals (Sweden)

    Jason Asher

    2018-03-01

    Full Text Available We describe a relatively simple stochastic model of Ebola transmission that was used to produce forecasts with the lowest mean absolute error among Ebola Forecasting Challenge participants. The model enabled prediction of peak incidence, the timing of this peak, and final size of the outbreak. The underlying discrete-time compartmental model used a time-varying reproductive rate modeled as a multiplicative random walk driven by the number of infectious individuals. This structure generalizes traditional Susceptible-Infected-Recovered (SIR disease modeling approaches and allows for the flexible consideration of outbreaks with complex trajectories of disease dynamics. Keywords: Ebola, Forecasting, Mathematical modeling, Bayesian inference

  1. Geographically weighted regression model on poverty indicator

    Science.gov (United States)

    Slamet, I.; Nugroho, N. F. T. A.; Muslich

    2017-12-01

    In this research, we applied geographically weighted regression (GWR) for analyzing the poverty in Central Java. We consider Gaussian Kernel as weighted function. The GWR uses the diagonal matrix resulted from calculating kernel Gaussian function as a weighted function in the regression model. The kernel weights is used to handle spatial effects on the data so that a model can be obtained for each location. The purpose of this paper is to model of poverty percentage data in Central Java province using GWR with Gaussian kernel weighted function and to determine the influencing factors in each regency/city in Central Java province. Based on the research, we obtained geographically weighted regression model with Gaussian kernel weighted function on poverty percentage data in Central Java province. We found that percentage of population working as farmers, population growth rate, percentage of households with regular sanitation, and BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. In this research, we found the determination coefficient R2 are 68.64%. There are two categories of district which are influenced by different of significance factors.

  2. Local bilinear multiple-output quantile/depth regression

    Czech Academy of Sciences Publication Activity Database

    Hallin, M.; Lu, Z.; Paindaveine, D.; Šiman, Miroslav

    2015-01-01

    Roč. 21, č. 3 (2015), s. 1435-1466 ISSN 1350-7265 R&D Projects: GA MŠk(CZ) 1M06047 Institutional support: RVO:67985556 Keywords : conditional depth * growth chart * halfspace depth * local bilinear regression * multivariate quantile * quantile regression * regression depth Subject RIV: BA - General Mathematics Impact factor: 1.372, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/siman-0446857.pdf

  3. Real estate value prediction using multivariate regression models

    Science.gov (United States)

    Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

    2017-11-01

    The real estate market is one of the most competitive in terms of pricing and the same tends to vary significantly based on a lot of factors, hence it becomes one of the prime fields to apply the concepts of machine learning to optimize and predict the prices with high accuracy. Therefore in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to have lower Residual Sum of Squares error. While using features in a regression model some feature engineering is required for better prediction. Often a set of features (multiple regressions) or polynomial regression (applying a various set of powers in the features) is used for making better model fit. For these models are expected to be susceptible towards over fitting ridge regression is used to reduce it. This paper thus directs to the best application of regression models in addition to other techniques to optimize the result.

  4. Adaptive regression for modeling nonlinear relationships

    CERN Document Server

    Knafl, George J

    2016-01-01

    This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...

  5. General regression and representation model for classification.

    Directory of Open Access Journals (Sweden)

    Jianjun Qian

    Full Text Available Recently, the regularized coding-based classification methods (e.g. SRC and CRC show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR for classification. GRR not only has advantages of CRC, but also takes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients and the specific information (weight matrix of image pixels to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel weights of the test sample. With the proposed model as a platform, we design two classifiers: basic general regression and representation classifier (B-GRR and robust general regression and representation classifier (R-GRR. The experimental results demonstrate the performance advantages of proposed methods over state-of-the-art algorithms.

  6. Confidence bands for inverse regression models

    International Nuclear Information System (INIS)

    Birke, Melanie; Bissantz, Nicolai; Holzmann, Hajo

    2010-01-01

    We construct uniform confidence bands for the regression function in inverse, homoscedastic regression models with convolution-type operators. Here, the convolution is between two non-periodic functions on the whole real line rather than between two periodic functions on a compact interval, since the former situation arguably arises more often in applications. First, following Bickel and Rosenblatt (1973 Ann. Stat. 1 1071–95) we construct asymptotic confidence bands which are based on strong approximations and on a limit theorem for the supremum of a stationary Gaussian process. Further, we propose bootstrap confidence bands based on the residual bootstrap and prove consistency of the bootstrap procedure. A simulation study shows that the bootstrap confidence bands perform reasonably well for moderate sample sizes. Finally, we apply our method to data from a gel electrophoresis experiment with genetically engineered neuronal receptor subunits incubated with rat brain extract

  7. COLOR IMAGE RETRIEVAL BASED ON FEATURE FUSION THROUGH MULTIPLE LINEAR REGRESSION ANALYSIS

    Directory of Open Access Journals (Sweden)

    K. Seetharaman

    2015-08-01

    Full Text Available This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, and the least-square estimation method is employed to estimate the parameters. The given input query image is segmented into various regions according to the structure of the image. The color and texture features are extracted on each region of the query image, and the features are fused together using the multiple linear regression model. The estimated parameters of the model, which is modeled based on the features, are formed as a vector called a feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images. The F-measure is applied to evaluate the performance of the proposed technique. The obtained results expose that the proposed technique is comparable to the other existing techniques.

  8. Multitask Quantile Regression under the Transnormal Model.

    Science.gov (United States)

    Fan, Jianqing; Xue, Lingzhou; Zou, Hui

    2016-01-01

    We consider estimating multi-task quantile regression under the transnormal model, with focus on high-dimensional setting. We derive a surprisingly simple closed-form solution through rank-based covariance regularization. In particular, we propose the rank-based ℓ 1 penalization with positive definite constraints for estimating sparse covariance matrices, and the rank-based banded Cholesky decomposition regularization for estimating banded precision matrices. By taking advantage of alternating direction method of multipliers, nearest correlation matrix projection is introduced that inherits sampling properties of the unprojected one. Our work combines strengths of quantile regression and rank-based covariance regularization to simultaneously deal with nonlinearity and nonnormality for high-dimensional regression. Furthermore, the proposed method strikes a good balance between robustness and efficiency, achieves the "oracle"-like convergence rate, and provides the provable prediction interval under the high-dimensional setting. The finite-sample performance of the proposed method is also examined. The performance of our proposed rank-based method is demonstrated in a real application to analyze the protein mass spectroscopy data.

  9. Multivariate sparse group lasso for the multivariate multiple linear regression with an arbitrary group structure.

    Science.gov (United States)

    Li, Yanming; Nan, Bin; Zhu, Ji

    2015-06-01

    We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functional groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study. © 2015, The International Biometric Society.

  10. Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients

    Science.gov (United States)

    Gorgees, HazimMansoor; Mahdi, FatimahAssim

    2018-05-01

    This article concerns with comparing the performance of different types of ordinary ridge regression estimators that have been already proposed to estimate the regression parameters when the near exact linear relationships among the explanatory variables is presented. For this situations we employ the data obtained from tagi gas filling company during the period (2008-2010). The main result we reached is that the method based on the condition number performs better than other methods since it has smaller mean square error (MSE) than the other stated methods.

  11. Crime Modeling using Spatial Regression Approach

    Science.gov (United States)

    Saleh Ahmar, Ansari; Adiatma; Kasim Aidid, M.

    2018-01-01

    Act of criminality in Indonesia increased both variety and quantity every year. As murder, rape, assault, vandalism, theft, fraud, fencing, and other cases that make people feel unsafe. Risk of society exposed to crime is the number of reported cases in the police institution. The higher of the number of reporter to the police institution then the number of crime in the region is increasing. In this research, modeling criminality in South Sulawesi, Indonesia with the dependent variable used is the society exposed to the risk of crime. Modelling done by area approach is the using Spatial Autoregressive (SAR) and Spatial Error Model (SEM) methods. The independent variable used is the population density, the number of poor population, GDP per capita, unemployment and the human development index (HDI). Based on the analysis using spatial regression can be shown that there are no dependencies spatial both lag or errors in South Sulawesi.

  12. High-throughput quantitative biochemical characterization of algal biomass by NIR spectroscopy; multiple linear regression and multivariate linear regression analysis.

    Science.gov (United States)

    Laurens, L M L; Wolfrum, E J

    2013-12-18

    One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.

  13. AN APPLICATION OF FUNCTIONAL MULTIVARIATE REGRESSION MODEL TO MULTICLASS CLASSIFICATION

    OpenAIRE

    Krzyśko, Mirosław; Smaga, Łukasz

    2017-01-01

    In this paper, the scale response functional multivariate regression model is considered. By using the basis functions representation of functional predictors and regression coefficients, this model is rewritten as a multivariate regression model. This representation of the functional multivariate regression model is used for multiclass classification for multivariate functional data. Computational experiments performed on real labelled data sets demonstrate the effectiveness of the proposed ...

  14. Early cost estimating for road construction projects using multiple regression techniques

    Directory of Open Access Journals (Sweden)

    Ibrahim Mahamid

    2011-12-01

    Full Text Available The objective of this study is to develop early cost estimating models for road construction projects using multiple regression techniques, based on 131 sets of data collected in the West Bank in Palestine. As the cost estimates are required at early stages of a project, considerations were given to the fact that the input data for the required regression model could be easily extracted from sketches or scope definition of the project. 11 regression models are developed to estimate the total cost of road construction project in US dollar; 5 of them include bid quantities as input variables and 6 include road length and road width. The coefficient of determination r2 for the developed models is ranging from 0.92 to 0.98 which indicate that the predicted values from a forecast models fit with the real-life data. The values of the mean absolute percentage error (MAPE of the developed regression models are ranging from 13% to 31%, the results compare favorably with past researches which have shown that the estimate accuracy in the early stages of a project is between ±25% and ±50%.

  15. Estimating Engineering and Manufacturing Development Cost Risk Using Logistic and Multiple Regression

    National Research Council Canada - National Science Library

    Bielecki, John

    2003-01-01

    .... Previous research has demonstrated the use of a two-step logistic and multiple regression methodology to predicting cost growth produces desirable results versus traditional single-step regression...

  16. Multiple regression analysis of Jominy hardenability data for boron treated steels

    International Nuclear Information System (INIS)

    Komenda, J.; Sandstroem, R.; Tukiainen, M.

    1997-01-01

    The relations between chemical composition and their hardenability of boron treated steels have been investigated using a multiple regression analysis method. A linear model of regression was chosen. The free boron content that is effective for the hardenability was calculated using a model proposed by Jansson. The regression analysis for 1261 steel heats provided equations that were statistically significant at the 95% level. All heats met the specification according to the nordic countries producers classification. The variation in chemical composition explained typically 80 to 90% of the variation in the hardenability. In the regression analysis elements which did not significantly contribute to the calculated hardness according to the F test were eliminated. Carbon, silicon, manganese, phosphorus and chromium were of importance at all Jominy distances, nickel, vanadium, boron and nitrogen at distances above 6 mm. After the regression analysis it was demonstrated that very few outliers were present in the data set, i.e. data points outside four times the standard deviation. The model has successfully been used in industrial practice replacing some of the necessary Jominy tests. (orig.)

  17. Regression Model to Predict Global Solar Irradiance in Malaysia

    Directory of Open Access Journals (Sweden)

    Hairuniza Ahmed Kutty

    2015-01-01

    Full Text Available A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitate, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE, mean bias error (MBE, and the coefficient of determination (R2 with other models available from literature studies. Seven models based on single parameters (PM1 to PM7 and five multiple-parameter models (PM7 to PM12 are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R2 ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included into multiple-parameter models although it performs fairly well in single-parameter prediction models.

  18. Mass estimation of loose parts in nuclear power plant based on multiple regression

    International Nuclear Information System (INIS)

    He, Yuanfeng; Cao, Yanlong; Yang, Jiangxin; Gan, Chunbiao

    2012-01-01

    According to the application of the Hilbert–Huang transform to the non-stationary signal and the relation between the mass of loose parts in nuclear power plant and corresponding frequency content, a new method for loose part mass estimation based on the marginal Hilbert–Huang spectrum (MHS) and multiple regression is proposed in this paper. The frequency spectrum of a loose part in a nuclear power plant can be expressed by the MHS. The multiple regression model that is constructed by the MHS feature of the impact signals for mass estimation is used to predict the unknown masses of a loose part. A simulated experiment verified that the method is feasible and the errors of the results are acceptable. (paper)

  19. Dynamic Optimization for IPS2 Resource Allocation Based on Improved Fuzzy Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Maokuan Zheng

    2017-01-01

    Full Text Available The study mainly focuses on resource allocation optimization for industrial product-service systems (IPS2. The development of IPS2 leads to sustainable economy by introducing cooperative mechanisms apart from commodity transaction. The randomness and fluctuation of service requests from customers lead to the volatility of IPS2 resource utilization ratio. Three basic rules for resource allocation optimization are put forward to improve system operation efficiency and cut unnecessary costs. An approach based on fuzzy multiple linear regression (FMLR is developed, which integrates the strength and concision of multiple linear regression in data fitting and factor analysis and the merit of fuzzy theory in dealing with uncertain or vague problems, which helps reduce those costs caused by unnecessary resource transfer. The iteration mechanism is introduced in the FMLR algorithm to improve forecasting accuracy. A case study of human resource allocation optimization in construction machinery industry is implemented to test and verify the proposed model.

  20. On the Relationship Between Confidence Sets and Exchangeable Weights in Multiple Linear Regression.

    Science.gov (United States)

    Pek, Jolynn; Chalmers, R Philip; Monette, Georges

    2016-01-01

    When statistical models are employed to provide a parsimonious description of empirical relationships, the extent to which strong conclusions can be drawn rests on quantifying the uncertainty in parameter estimates. In multiple linear regression (MLR), regression weights carry two kinds of uncertainty represented by confidence sets (CSs) and exchangeable weights (EWs). Confidence sets quantify uncertainty in estimation whereas the set of EWs quantify uncertainty in the substantive interpretation of regression weights. As CSs and EWs share certain commonalities, we clarify the relationship between these two kinds of uncertainty about regression weights. We introduce a general framework describing how CSs and the set of EWs for regression weights are estimated from the likelihood-based and Wald-type approach, and establish the analytical relationship between CSs and sets of EWs. With empirical examples on posttraumatic growth of caregivers (Cadell et al., 2014; Schneider, Steele, Cadell & Hemsworth, 2011) and on graduate grade point average (Kuncel, Hezlett & Ones, 2001), we illustrate the usefulness of CSs and EWs for drawing strong scientific conclusions. We discuss the importance of considering both CSs and EWs as part of the scientific process, and provide an Online Appendix with R code for estimating Wald-type CSs and EWs for k regression weights.

  1. Variable selection and model choice in geoadditive regression models.

    Science.gov (United States)

    Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard

    2009-06-01

    Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.

  2. Building a new predictor for multiple linear regression technique-based corrective maintenance turnaround time.

    Science.gov (United States)

    Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa

    2008-01-01

    This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Building the TAT indicator multiple linear regression predictor and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to such model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.

  3. Hierarchical regression analysis in structural Equation Modeling

    NARCIS (Netherlands)

    de Jong, P.F.

    1999-01-01

    In a hierarchical or fixed-order regression analysis, the independent variables are entered into the regression equation in a prespecified order. Such an analysis is often performed when the extra amount of variance accounted for in a dependent variable by a specific independent variable is the main

  4. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    Directory of Open Access Journals (Sweden)

    Minh Vu Trieu

    2017-03-01

    Full Text Available This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS, Brazilian tensile strength (BTS, rock brittleness index (BI, the distance between planes of weakness (DPW, and the alpha angle (Alpha between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP. Four (4 statistical regression models (two linear and two nonlinear are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2 of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.

  5. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    Science.gov (United States)

    Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

    2017-03-01

    This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2) of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.

  6. Variable selection in multiple linear regression: The influence of ...

    African Journals Online (AJOL)

    provide an indication of whether the fit of the selected model improves or ... and calculate M(−i); quantify the influence of case i in terms of a function, f(•), of M and ..... [21] Venter JH & Snyman JLJ, 1997, Linear model selection based on risk ...

  7. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

    Science.gov (United States)

    Kim, Yoonsang; Emery, Sherry

    2013-01-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415

  8. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

    Science.gov (United States)

    Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

    2013-08-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.

  9. Poisson Mixture Regression Models for Heart Disease Prediction.

    Science.gov (United States)

    Mufudza, Chipo; Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.

  10. hMuLab: A Biomedical Hybrid MUlti-LABel Classifier Based on Multiple Linear Regression.

    Science.gov (United States)

    Wang, Pu; Ge, Ruiquan; Xiao, Xuan; Zhou, Manli; Zhou, Fengfeng

    2017-01-01

    Many biomedical classification problems are multi-label by nature, e.g., a gene involved in a variety of functions and a patient with multiple diseases. The majority of existing classification algorithms assumes each sample with only one class label, and the multi-label classification problem remains to be a challenge for biomedical researchers. This study proposes a novel multi-label learning algorithm, hMuLab, by integrating both feature-based and neighbor-based similarity scores. The multiple linear regression modeling techniques make hMuLab capable of producing multiple label assignments for a query sample. The comparison results over six commonly-used multi-label performance measurements suggest that hMuLab performs accurately and stably for the biomedical datasets, and may serve as a complement to the existing literature.

  11. Modeling maximum daily temperature using a varying coefficient regression model

    Science.gov (United States)

    Han Li; Xinwei Deng; Dong-Yum Kim; Eric P. Smith

    2014-01-01

    Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature...

  12. Using Multiple Linear Regression Techniques to Quantify Carbon ...

    African Journals Online (AJOL)

    komla

    Process and statistical models of productivity, though useful, are often ... The carbon balance of terrestrial ecosystems is uncertain, in part due to discrepancies and errors in .... The ecological data were collected through field work to include both .... Computer-aided Multivariate Analysis, Life Learning Publications, Belmont,.

  13. Model performance analysis and model validation in logistic regression

    Directory of Open Access Journals (Sweden)

    Rosa Arboretti Giancristofaro

    2007-10-01

    Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.

  14. Risk factors for low birth weight according to the multiple logistic regression model. A retrospective cohort study in José María Morelos municipality, Quintana Roo, Mexico.

    Science.gov (United States)

    Franco Monsreal, José; Tun Cobos, Miriam Del Ruby; Hernández Gómez, José Ricardo; Serralta Peraza, Lidia Esther Del Socorro

    2018-01-17

    Low birth weight has been an enigma for science over time. There have been many researches on its causes and its effects. Low birth weight is an indicator that predicts the probability of a child surviving. In fact, there is an exponential relationship between weight deficit, gestational age, and perinatal mortality. Multiple logistic regression is one of the most expressive and versatile statistical instruments available for the analysis of data in both clinical and epidemiology settings, as well as in public health. To assess in a multivariate fashion the importance of 17 independent variables in low birth weight (dependent variable) of children born in the Mayan municipality of José María Morelos, Quintana Roo, Mexico. Analytical observational epidemiological cohort study with retrospective temporality. Births that met the inclusion criteria occurred in the "Hospital Integral Jose Maria Morelos" of the Ministry of Health corresponding to the Maya municipality of Jose Maria Morelos during the period from August 1, 2014 to July 31, 2015. The total number of newborns recorded was 1,147; 84 of which (7.32%) had low birth weight. To estimate the independent association between the explanatory variables (potential risk factors) and the response variable, a multiple logistic regression analysis was performed using the IBM SPSS Statistics 22 software. In ascending numerical order values of odds ratio > 1 indicated the positive contribution of explanatory variables or possible risk factors: "unmarried" marital status (1.076, 95% confidence interval: 0.550 to 2.104); age at menarche ≤ 12 years (1.08, 95% confidence interval: 0.64 to 1.84); history of abortion(s) (1.14, 95% confidence interval: 0.44 to 2.93); maternal weight < 50 kg (1.51, 95% confidence interval: 0.83 to 2.76); number of prenatal consultations ≤ 5 (1.86, 95% confidence interval: 0.94 to 3.66); maternal age ≥ 36 years (3.5, 95% confidence interval: 0.40 to 30.47); maternal age ≤ 19 years (3

  15. Logistic Regression Modeling of Diminishing Manufacturing Sources for Integrated Circuits

    National Research Council Canada - National Science Library

    Gravier, Michael

    1999-01-01

    .... The research identified logistic regression as a powerful tool for analysis of DMSMS and further developed twenty models attempting to identify the "best" way to model and predict DMSMS using logistic regression...

  16. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    Science.gov (United States)

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.

  17. Application of genetic algorithm - multiple linear regressions to predict the activity of RSK inhibitors

    Directory of Open Access Journals (Sweden)

    Avval Zhila Mohajeri

    2015-01-01

    Full Text Available This paper deals with developing a linear quantitative structure-activity relationship (QSAR model for predicting the RSK inhibition activity of some new compounds. A dataset consisting of 62 pyrazino [1,2-α] indole, diazepino [1,2-α] indole, and imidazole derivatives with known inhibitory activities was used. Multiple linear regressions (MLR technique combined with the stepwise (SW and the genetic algorithm (GA methods as variable selection tools was employed. For more checking stability, robustness and predictability of the proposed models, internal and external validation techniques were used. Comparison of the results obtained, indicate that the GA-MLR model is superior to the SW-MLR model and that it isapplicable for designing novel RSK inhibitors.

  18. The MIDAS Touch: Mixed Data Sampling Regression Models

    OpenAIRE

    Ghysels, Eric; Santa-Clara, Pedro; Valkanov, Rossen

    2004-01-01

    We introduce Mixed Data Sampling (henceforth MIDAS) regression models. The regressions involve time series data sampled at different frequencies. Technically speaking MIDAS models specify conditional expectations as a distributed lag of regressors recorded at some higher sampling frequencies. We examine the asymptotic properties of MIDAS regression estimation and compare it with traditional distributed lag models. MIDAS regressions have wide applicability in macroeconomics and �nance.

  19. Model selection in kernel ridge regression

    DEFF Research Database (Denmark)

    Exterkate, Peter

    2013-01-01

    Kernel ridge regression is a technique to perform ridge regression with a potentially infinite number of nonlinear transformations of the independent variables as regressors. This method is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts....... The influence of the choice of kernel and the setting of tuning parameters on forecast accuracy is investigated. Several popular kernels are reviewed, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. The latter two kernels are interpreted in terms of their smoothing properties......, and the tuning parameters associated to all these kernels are related to smoothness measures of the prediction function and to the signal-to-noise ratio. Based on these interpretations, guidelines are provided for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study...

  20. Multiple linear combination (MLC) regression tests for common variants adapted to linkage disequilibrium structure.

    Science.gov (United States)

    Yoo, Yun Joo; Sun, Lei; Poirier, Julia G; Paterson, Andrew D; Bull, Shelley B

    2017-02-01

    By jointly analyzing multiple variants within a gene, instead of one at a time, gene-based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive. It combines variant effects within the same cluster linearly, and aggregates cluster-specific effects in a quadratic sum of squares and cross-products, producing a test statistic with reduced degrees of freedom (df) equal to the number of clusters. By simulation studies of 1000 genes from across the genome, we demonstrate that MLC is a well-powered and robust choice among existing methods across a broad range of gene structures. Compared to minimum P-value, variance-component, and principal-component methods, the mean power of MLC is never much lower than that of other methods, and can be higher, particularly with multiple causal variants. Moreover, the variation in gene-specific MLC test size and power across 1000 genes is less than that of other methods, suggesting it is a complementary approach for discovery in genome-wide analysis. The cluster construction of the MLC test statistics helps reveal within-gene LD structure, allowing interpretation of clustered variants as haplotypic effects, while multiple regression helps to distinguish direct and indirect associations. © 2016 The Authors Genetic Epidemiology Published by Wiley Periodicals, Inc.

  1. A generalized additive regression model for survival times

    DEFF Research Database (Denmark)

    Scheike, Thomas H.

    2001-01-01

    Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models......Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models...

  2. A note on the use of multiple linear regression in molecular ecology.

    Science.gov (United States)

    Frasier, Timothy R

    2016-03-01

    Multiple linear regression analyses (also often referred to as generalized linear models--GLMs, or generalized linear mixed models--GLMMs) are widely used in the analysis of data in molecular ecology, often to assess the relative effects of genetic characteristics on individual fitness or traits, or how environmental characteristics influence patterns of genetic differentiation. However, the coefficients resulting from multiple regression analyses are sometimes misinterpreted, which can lead to incorrect interpretations and conclusions within individual studies, and can propagate to wider-spread errors in the general understanding of a topic. The primary issue revolves around the interpretation of coefficients for independent variables when interaction terms are also included in the analyses. In this scenario, the coefficients associated with each independent variable are often interpreted as the independent effect of each predictor variable on the predicted variable. However, this interpretation is incorrect. The correct interpretation is that these coefficients represent the effect of each predictor variable on the predicted variable when all other predictor variables are zero. This difference may sound subtle, but the ramifications cannot be overstated. Here, my goals are to raise awareness of this issue, to demonstrate and emphasize the problems that can result and to provide alternative approaches for obtaining the desired information. © 2015 John Wiley & Sons Ltd.

  3. Multiple regression technique for Pth degree polynominals with and without linear cross products

    Science.gov (United States)

    Davis, J. W.

    1973-01-01

    A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.

  4. Boosted regression trees, multivariate adaptive regression splines and their two-step combinations with multiple linear regression or partial least squares to predict blood-brain barrier passage: a case study.

    Science.gov (United States)

    Deconinck, E; Zhang, M H; Petitet, F; Dubus, E; Ijjaali, I; Coomans, D; Vander Heyden, Y

    2008-02-18

    The use of some unconventional non-linear modeling techniques, i.e. classification and regression trees and multivariate adaptive regression splines-based methods, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structural and pharmacological diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It could be concluded that the use of combinations of a linear with a non-linear modeling technique results in some improved properties compared to the individual linear and non-linear models and that, when the use of such a combination is appropriate, combinations using MARS as non-linear technique should be preferred over those with BRT, due to some serious drawbacks of the BRT approaches.

  5. Mixture of Regression Models with Single-Index

    OpenAIRE

    Xiang, Sijia; Yao, Weixin

    2016-01-01

    In this article, we propose a class of semiparametric mixture regression models with single-index. We argue that many recently proposed semiparametric/nonparametric mixture regression models can be considered special cases of the proposed model. However, unlike existing semiparametric mixture regression models, the new pro- posed model can easily incorporate multivariate predictors into the nonparametric components. Backfitting estimates and the corresponding algorithms have been proposed for...

  6. Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway

    Science.gov (United States)

    Wang, Fengfeng; Wong, S. C. Cesar; Chan, Lawrence W. C.; Cho, William C. S.; Yip, S. P.; Yung, Benjamin Y. M.

    2014-01-01

    Background. MicroRNA (miRNA) is a short and endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor for tumorigenesis of colorectal cancer (CRC), and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopted to identify the significantly associated miRNAs targeting a set of candidate genes frequently involved in colorectal cancer MSI and CIN pathways. Multiple linear regression analysis was used to construct the model and find the significant mRNA-miRNA associations. We identified three significantly associated mRNA-miRNA pairs: BCL2 was positively associated with miR-16 and SMAD4 was positively associated with miR-567 in the CRC tissue, while MSH6 was positively associated with miR-142-5p in the normal tissue. As for the whole model, BCL2 and SMAD4 models were not significant, and MSH6 model was significant. The significant associations were different in the normal and the CRC tissues. Conclusion. Our results have laid down a solid foundation in exploration of novel CRC mechanisms, and identification of miRNA roles as oncomirs or tumor suppressor mirs in CRC. PMID:24895601

  7. Linear regression crash prediction models : issues and proposed solutions.

    Science.gov (United States)

    2010-05-01

    The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...

  8. Model-based Quantile Regression for Discrete Data

    KAUST Repository

    Padellini, Tullia; Rue, Haavard

    2018-01-01

    Quantile regression is a class of methods voted to the modelling of conditional quantiles. In a Bayesian framework quantile regression has typically been carried out exploiting the Asymmetric Laplace Distribution as a working likelihood. Despite

  9. Methods of Detecting Outliers in A Regression Analysis Model ...

    African Journals Online (AJOL)

    PROF. O. E. OSUAGWU

    2013-06-01

    Jun 1, 2013 ... especially true in observational studies .... Simple linear regression and multiple ... The simple linear ..... Grubbs,F.E (1950): Sample Criteria for Testing Outlying observations: Annals of ... In experimental design, the Relative.

  10. Using the multiple regression analysis with respect to ANOVA and 3D mapping to model the actual performance of PEM (proton exchange membrane) fuel cell at various operating conditions

    International Nuclear Information System (INIS)

    Al-Hadeethi, Farqad; Al-Nimr, Moh'd; Al-Safadi, Mohammad

    2015-01-01

    The performance of PEM (proton exchange membrane) fuel cell was experimentally investigated at three temperatures (30, 50 and 70 °C), four flow rates (5, 10, 15 and 20 ml/min) and two flow patterns (co-current and counter current) in order to generate two correlations using multiple regression analysis with respect to ANOVA. Results revealed that increasing the temperature for co-current and counter current flow patterns will increase both hydrogen and oxygen diffusivities, water management and membrane conductivity. The derived mathematical correlations and three dimensional mapping (i.e. surface response) for the co-current and countercurrent flow patterns showed that there is a clear interaction among the various variables (temperatures and flow rates). - Highlights: • Generating mathematical correlations using multiple regression analysis with respect to ANOVA for the performance of the PEM fuel cell. • Using the 3D mapping to diagnose the optimum performance of the PEM fuel cell at the given operating conditions. • Results revealed that increasing the flow rate had direct influence on the consumption of oxygen. • Results assured that increasing the temperature in co-current and counter current flow patterns increases the performance of PEM fuel cell.

  11. Combining multiple regression and principal component analysis for accurate predictions for column ozone in Peninsular Malaysia

    Science.gov (United States)

    Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.

    2013-06-01

    This study encompasses columnar ozone modelling in the peninsular Malaysia. Data of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)] data set, retrieved from NASA's Atmospheric Infrared Sounder (AIRS), for the entire period (2003-2008) was employed to develop models to predict the value of columnar ozone (O3) in study area. The combined method, which is based on using both multiple regressions combined with principal component analysis (PCA) modelling, was used to predict columnar ozone. This combined approach was utilized to improve the prediction accuracy of columnar ozone. Separate analysis was carried out for north east monsoon (NEM) and south west monsoon (SWM) seasons. The O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM season periods. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameter's variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to acquire subsets of the predictor variables to be comprised in the linear regression model of the atmospheric parameter's variables. It was found that the increase in columnar O3 value is associated with an increase in the values of AST, SSKT, AT, and CO and with a drop in the levels of CH4, H2Ovapour, RH, and MSP. The result of fitting the best models for the columnar O3 value using eight of the independent variables gave about the same values of the R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.

  12. Model Selection in Kernel Ridge Regression

    DEFF Research Database (Denmark)

    Exterkate, Peter

    Kernel ridge regression is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. This paper investigates the influence of the choice of kernel and the setting of tuning parameters on forecast accuracy. We review several popular kernels......, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. We interpret the latter two kernels in terms of their smoothing properties, and we relate the tuning parameters associated to all these kernels to smoothness measures of the prediction function and to the signal-to-noise ratio. Based...... on these interpretations, we provide guidelines for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study confirms the practical usefulness of these rules of thumb. Finally, the flexible and smooth functional forms provided by the Gaussian and Sinc kernels makes them widely...

  13. Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression

    Science.gov (United States)

    Beckstead, Jason W.

    2012-01-01

    The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…

  14. Tightness of M-estimators for multiple linear regression in time series

    DEFF Research Database (Denmark)

    Johansen, Søren; Nielsen, Bent

    We show tightness of a general M-estimator for multiple linear regression in time series. The positive criterion function for the M-estimator is assumed lower semi-continuous and sufficiently large for large argument: Particular cases are the Huber-skip and quantile regression. Tightness requires...

  15. Regression modeling strategies with applications to linear models, logistic and ordinal regression, and survival analysis

    CERN Document Server

    Harrell , Jr , Frank E

    2015-01-01

    This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap.  The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes.  This text realistically...

  16. Corporate prediction models, ratios or regression analysis?

    NARCIS (Netherlands)

    Bijnen, E.J.; Wijn, M.F.C.M.

    1994-01-01

    The models developed in the literature with respect to the prediction of a company s failure are based on ratios. It has been shown before that these models should be rejected on theoretical grounds. Our study of industrial companies in the Netherlands shows that the ratios which are used in

  17. STREAMFLOW AND WATER QUALITY REGRESSION MODELING ...

    African Journals Online (AJOL)

    ... downstream Obigbo station show: consistent time-trends in degree of contamination; linear and non-linear relationships for water quality models against total dissolved solids (TDS), total suspended sediment (TSS), chloride, pH and sulphate; and non-linear relationship for streamflow and water quality transport models.

  18. Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression

    KAUST Repository

    Abdul Jameel, Abdul Gani

    2016-09-14

    An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN) of 71 pure hydrocarbons and 54 hydrocarbon blends were utilized as a data set to study the relationship between ignition quality and molecular structure. CN and DCN are functional equivalents and collectively referred to as D/CN, herein. The effect of molecular weight and weight percent of structural parameters such as paraffinic CH3 groups, paraffinic CH2 groups, paraffinic CH groups, olefinic CH–CH2 groups, naphthenic CH–CH2 groups, and aromatic C–CH groups on D/CN was studied. A particular emphasis on the effect of branching (i.e., methyl substitution) on the D/CN was studied, and a new parameter denoted as the branching index (BI) was introduced to quantify this effect. A new formula was developed to calculate the BI of hydrocarbon fuels using 1H NMR spectroscopy. Multiple linear regression (MLR) modeling was used to develop an empirical relationship between D/CN and the eight structural parameters. This was then used to predict the DCN of many hydrocarbon fuels. The developed model has a high correlation coefficient (R2 = 0.97) and was validated with experimentally measured DCN of twenty-two real fuel mixtures (e.g., gasolines and diesels) and fifty-nine blends of known composition, and the predicted values matched well with the experimental data.

  19. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    Science.gov (United States)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model is the representation of relationship between independent variable and dependent variable. The dependent variable has categories used in the logistic regression model to calculate odds on. The logistic regression model for dependent variable has levels in the logistics regression model is ordinal. GWOLR model is an ordinal logistic regression model influenced the geographical location of the observation site. Parameters estimation in the model needed to determine the value of a population based on sample. The purpose of this research is to parameters estimation of GWOLR model using R software. Parameter estimation uses the data amount of dengue fever patients in Semarang City. Observation units used are 144 villages in Semarang City. The results of research get GWOLR model locally for each village and to know probability of number dengue fever patient categories.

  20. Linear Regression Models for Estimating True Subsurface ...

    Indian Academy of Sciences (India)

    47

    The objective is to minimize the processing time and computer memory required. 10 to carry out inversion .... to the mainland by two long bridges. .... term. In this approach, the model converges when the squared sum of the differences. 143.

  1. Alternative regression models to assess increase in childhood BMI

    OpenAIRE

    Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M

    2008-01-01

    Abstract Background Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 childre...

  2. Semiparametric Allelic Tests for Mapping Multiple Phenotypes: Binomial Regression and Mahalanobis Distance.

    Science.gov (United States)

    Majumdar, Arunabha; Witte, John S; Ghosh, Saurabh

    2015-12-01

    Binary phenotypes commonly arise due to multiple underlying quantitative precursors and genetic variants may impact multiple traits in a pleiotropic manner. Hence, simultaneously analyzing such correlated traits may be more powerful than analyzing individual traits. Various genotype-level methods, e.g., MultiPhen (O'Reilly et al. []), have been developed to identify genetic factors underlying a multivariate phenotype. For univariate phenotypes, the usefulness and applicability of allele-level tests have been investigated. The test of allele frequency difference among cases and controls is commonly used for mapping case-control association. However, allelic methods for multivariate association mapping have not been studied much. In this article, we explore two allelic tests of multivariate association: one using a Binomial regression model based on inverted regression of genotype on phenotype (Binomial regression-based Association of Multivariate Phenotypes [BAMP]), and the other employing the Mahalanobis distance between two sample means of the multivariate phenotype vector for two alleles at a single-nucleotide polymorphism (Distance-based Association of Multivariate Phenotypes [DAMP]). These methods can incorporate both discrete and continuous phenotypes. Some theoretical properties for BAMP are studied. Using simulations, the power of the methods for detecting multivariate association is compared with the genotype-level test MultiPhen's. The allelic tests yield marginally higher power than MultiPhen for multivariate phenotypes. For one/two binary traits under recessive mode of inheritance, allelic tests are found to be substantially more powerful. All three tests are applied to two different real data and the results offer some support for the simulation study. We propose a hybrid approach for testing multivariate association that implements MultiPhen when Hardy-Weinberg Equilibrium (HWE) is violated and BAMP otherwise, because the allelic approaches assume HWE

  3. Support Vector Regression Model Based on Empirical Mode Decomposition and Auto Regression for Electric Load Forecasting

    Directory of Open Access Journals (Sweden)

    Hong-Juan Li

    2013-04-01

    Full Text Available Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by strong non-linear learning capability of support vector regression (SVR, this paper presents a SVR model hybridized with the empirical mode decomposition (EMD method and auto regression (AR for electric load forecasting. The electric load data of the New South Wales (Australia market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.

  4. Genomic prediction based on data from three layer lines using non-linear regression models

    NARCIS (Netherlands)

    Huang, H.; Windig, J.J.; Vereijken, A.; Calus, M.P.L.

    2014-01-01

    Background - Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. Methods - In an attempt to alleviate

  5. Noninvasive spectral imaging of skin chromophores based on multiple regression analysis aided by Monte Carlo simulation

    Science.gov (United States)

    Nishidate, Izumi; Wiswadarma, Aditya; Hase, Yota; Tanaka, Noriyuki; Maeda, Takaaki; Niizeki, Kyuichi; Aizu, Yoshihisa

    2011-08-01

    In order to visualize melanin and blood concentrations and oxygen saturation in human skin tissue, a simple imaging technique based on multispectral diffuse reflectance images acquired at six wavelengths (500, 520, 540, 560, 580 and 600nm) was developed. The technique utilizes multiple regression analysis aided by Monte Carlo simulation for diffuse reflectance spectra. Using the absorbance spectrum as a response variable and the extinction coefficients of melanin, oxygenated hemoglobin, and deoxygenated hemoglobin as predictor variables, multiple regression analysis provides regression coefficients. Concentrations of melanin and total blood are then determined from the regression coefficients using conversion vectors that are deduced numerically in advance, while oxygen saturation is obtained directly from the regression coefficients. Experiments with a tissue-like agar gel phantom validated the method. In vivo experiments with human skin of the human hand during upper limb occlusion and of the inner forearm exposed to UV irradiation demonstrated the ability of the method to evaluate physiological reactions of human skin tissue.

  6. An improved multiple linear regression and data analysis computer program package

    Science.gov (United States)

    Sidik, S. M.

    1972-01-01

    NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

  7. Daily Suspended Sediment Discharge Prediction Using Multiple Linear Regression and Artificial Neural Network

    Science.gov (United States)

    Uca; Toriman, Ekhwan; Jaafar, Othman; Maru, Rosmini; Arfan, Amal; Saleh Ahmar, Ansari

    2018-01-01

    Prediction of suspended sediment discharge in a catchments area is very important because it can be used to evaluation the erosion hazard, management of its water resources, water quality, hydrology project management (dams, reservoirs, and irrigation) and to determine the extent of the damage that occurred in the catchments. Multiple Linear Regression analysis and artificial neural network can be used to predict the amount of daily suspended sediment discharge. Regression analysis using the least square method, whereas artificial neural networks using Radial Basis Function (RBF) and feedforward multilayer perceptron with three learning algorithms namely Levenberg-Marquardt (LM), Scaled Conjugate Descent (SCD) and Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton (BFGS). The number neuron of hidden layer is three to sixteen, while in output layer only one neuron because only one output target. The mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2 ) and coefficient of efficiency (CE) of the multiple linear regression (MLRg) value Model 2 (6 input variable independent) has the lowest the value of MAE and RMSE (0.0000002 and 13.6039) and highest R2 and CE (0.9971 and 0.9971). When compared between LM, SCG and RBF, the BFGS model structure 3-7-1 is the better and more accurate to prediction suspended sediment discharge in Jenderam catchment. The performance value in testing process, MAE and RMSE (13.5769 and 17.9011) is smallest, meanwhile R2 and CE (0.9999 and 0.9998) is the highest if it compared with the another BFGS Quasi-Newton model (6-3-1, 9-10-1 and 12-12-1). Based on the performance statistics value, MLRg, LM, SCG, BFGS and RBF suitable and accurately for prediction by modeling the non-linear complex behavior of suspended sediment responses to rainfall, water depth and discharge. The comparison between artificial neural network (ANN) and MLRg, the MLRg Model 2 accurately for to prediction suspended sediment discharge (kg

  8. Poisson Mixture Regression Models for Heart Disease Prediction

    Science.gov (United States)

    Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611

  9. Convergence diagnostics for Eigenvalue problems with linear regression model

    International Nuclear Information System (INIS)

    Shi, Bo; Petrovic, Bojan

    2011-01-01

    Although the Monte Carlo method has been extensively used for criticality/Eigenvalue problems, a reliable, robust, and efficient convergence diagnostics method is still desired. Most methods are based on integral parameters (multiplication factor, entropy) and either condense the local distribution information into a single value (e.g., entropy) or even disregard it. We propose to employ the detailed cycle-by-cycle local flux evolution obtained by using mesh tally mechanism to assess the source and flux convergence. By applying a linear regression model to each individual mesh in a mesh tally for convergence diagnostics, a global convergence criterion can be obtained. We exemplify this method on two problems and obtain promising diagnostics results. (author)

  10. QSAR Study of Insecticides of Phthalamide Derivatives Using Multiple Linear Regression and Artificial Neural Network Methods

    Directory of Open Access Journals (Sweden)

    Adi Syahputra

    2014-03-01

    Full Text Available Quantitative structure activity relationship (QSAR for 21 insecticides of phthalamides containing hydrazone (PCH was studied using multiple linear regression (MLR, principle component regression (PCR and artificial neural network (ANN. Five descriptors were included in the model for MLR and ANN analysis, and five latent variables obtained from principle component analysis (PCA were used in PCR analysis. Calculation of descriptors was performed using semi-empirical PM6 method. ANN analysis was found to be superior statistical technique compared to the other methods and gave a good correlation between descriptors and activity (r2 = 0.84. Based on the obtained model, we have successfully designed some new insecticides with higher predicted activity than those of previously synthesized compounds, e.g.2-(decalinecarbamoyl-5-chloro-N’-((5-methylthiophen-2-ylmethylene benzohydrazide, 2-(decalinecarbamoyl-5-chloro-N’-((thiophen-2-yl-methylene benzohydrazide and 2-(decaline carbamoyl-N’-(4-fluorobenzylidene-5-chlorobenzohydrazide with predicted log LC50 of 1.640, 1.672, and 1.769 respectively.

  11. Toward Customer-Centric Organizational Science: A Common Language Effect Size Indicator for Multiple Linear Regressions and Regressions With Higher-Order Terms.

    Science.gov (United States)

    Krasikova, Dina V; Le, Huy; Bachura, Eric

    2018-01-22

    To address a long-standing concern regarding a gap between organizational science and practice, scholars called for more intuitive and meaningful ways of communicating research results to users of academic research. In this article, we develop a common language effect size index (CLβ) that can help translate research results to practice. We demonstrate how CLβ can be computed and used to interpret the effects of continuous and categorical predictors in multiple linear regression models. We also elaborate on how the proposed CLβ index is computed and used to interpret interactions and nonlinear effects in regression models. In addition, we test the robustness of the proposed index to violations of normality and provide means for computing standard errors and constructing confidence intervals around its estimates. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  12. Impact of multicollinearity on small sample hydrologic regression models

    Science.gov (United States)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, is it recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.

  13. Estimate the contribution of incubation parameters influence egg hatchability using multiple linear regression analysis.

    Science.gov (United States)

    Khalil, Mohamed H; Shebl, Mostafa K; Kosba, Mohamed A; El-Sabrout, Karim; Zaki, Nesma

    2016-08-01

    This research was conducted to determine the most affecting parameters on hatchability of indigenous and improved local chickens' eggs. Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) on four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine the most influencing one on hatchability. The results showed significant differences in commercial and scientific hatchability among strains. Alexandria strain has the highest significant commercial hatchability (80.70%). Regarding the studied strains, highly significant differences in hatching chick weight among strains were observed. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. A prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens.

  14. A multiple regression analysis for accurate background subtraction in 99Tcm-DTPA renography

    International Nuclear Information System (INIS)

    Middleton, G.W.; Thomson, W.H.; Davies, I.H.; Morgan, A.

    1989-01-01

    A technique for accurate background subtraction in 99 Tc m -DTPA renography is described. The technique is based on a multiple regression analysis of the renal curves and separate heart and soft tissue curves which together represent background activity. It is compared, in over 100 renograms, with a previously described linear regression technique. Results show that the method provides accurate background subtraction, even in very poorly functioning kidneys, thus enabling relative renal filtration and excretion to be accurately estimated. (author)

  15. Characteristics and Properties of a Simple Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Kowal Robert

    2016-12-01

    Full Text Available A simple linear regression model is one of the pillars of classic econometrics. Despite the passage of time, it continues to raise interest both from the theoretical side as well as from the application side. One of the many fundamental questions in the model concerns determining derivative characteristics and studying the properties existing in their scope, referring to the first of these aspects. The literature of the subject provides several classic solutions in that regard. In the paper, a completely new design is proposed, based on the direct application of variance and its properties, resulting from the non-correlation of certain estimators with the mean, within the scope of which some fundamental dependencies of the model characteristics are obtained in a much more compact manner. The apparatus allows for a simple and uniform demonstration of multiple dependencies and fundamental properties in the model, and it does it in an intuitive manner. The results were obtained in a classic, traditional area, where everything, as it might seem, has already been thoroughly studied and discovered.

  16. Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments

    Directory of Open Access Journals (Sweden)

    Marjan Čeh

    2018-05-01

    Full Text Available The goal of this study is to analyse the predictive performance of the random forest machine learning technique in comparison to commonly used hedonic models based on multiple regression for the prediction of apartment prices. A data set that includes 7407 records of apartment transactions referring to real estate sales from 2008–2013 in the city of Ljubljana, the capital of Slovenia, was used in order to test and compare the predictive performances of both models. Apparent challenges faced during modelling included (1 the non-linear nature of the prediction assignment task; (2 input data being based on transactions occurring over a period of great price changes in Ljubljana whereby a 28% decline was noted in six consecutive testing years; and (3 the complex urban form of the case study area. Available explanatory variables, organised as a Geographic Information Systems (GIS ready dataset, including the structural and age characteristics of the apartments as well as environmental and neighbourhood information were considered in the modelling procedure. All performance measures (R2 values, sales ratios, mean average percentage error (MAPE, coefficient of dispersion (COD revealed significantly better results for predictions obtained by the random forest method, which confirms the prospective of this machine learning technique on apartment price prediction.

  17. A generalized multivariate regression model for modelling ocean wave heights

    Science.gov (United States)

    Wang, X. L.; Feng, Y.; Swail, V. R.

    2012-04-01

    In this study, a generalized multivariate linear regression model is developed to represent the relationship between 6-hourly ocean significant wave heights (Hs) and the corresponding 6-hourly mean sea level pressure (MSLP) fields. The model is calibrated using the ERA-Interim reanalysis of Hs and MSLP fields for 1981-2000, and is validated using the ERA-Interim reanalysis for 2001-2010 and ERA40 reanalysis of Hs and MSLP for 1958-2001. The performance of the fitted model is evaluated in terms of Pierce skill score, frequency bias index, and correlation skill score. Being not normally distributed, wave heights are subjected to a data adaptive Box-Cox transformation before being used in the model fitting. Also, since 6-hourly data are being modelled, lag-1 autocorrelation must be and is accounted for. The models with and without Box-Cox transformation, and with and without accounting for autocorrelation, are inter-compared in terms of their prediction skills. The fitted MSLP-Hs relationship is then used to reconstruct historical wave height climate from the 6-hourly MSLP fields taken from the Twentieth Century Reanalysis (20CR, Compo et al. 2011), and to project possible future wave height climates using CMIP5 model simulations of MSLP fields. The reconstructed and projected wave heights, both seasonal means and maxima, are subject to a trend analysis that allows for non-linear (polynomial) trends.

  18. Bayesian Regression of Thermodynamic Models of Redox Active Materials

    Energy Technology Data Exchange (ETDEWEB)

    Johnston, Katherine [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-09-01

    Finding a suitable functional redox material is a critical challenge to achieving scalable, economically viable technologies for storing concentrated solar energy in the form of a defected oxide. Demonstrating e ectiveness for thermal storage or solar fuel is largely accomplished by using a thermodynamic model derived from experimental data. The purpose of this project is to test the accuracy of our regression model on representative data sets. Determining the accuracy of the model includes parameter tting the model to the data, comparing the model using di erent numbers of param- eters, and analyzing the entropy and enthalpy calculated from the model. Three data sets were considered in this project: two demonstrating materials for solar fuels by wa- ter splitting and the other of a material for thermal storage. Using Bayesian Inference and Markov Chain Monte Carlo (MCMC), parameter estimation was preformed on the three data sets. Good results were achieved, except some there was some deviations on the edges of the data input ranges. The evidence values were then calculated in a variety of ways and used to compare models with di erent number of parameters. It was believed that at least one of the parameters was unnecessary and comparing evidence values demonstrated that the parameter was need on one data set and not signi cantly helpful on another. The entropy was calculated by taking the derivative in one variable and integrating over another. and its uncertainty was also calculated by evaluating the entropy over multiple MCMC samples. Afterwards, all the parts were written up as a tutorial for the Uncertainty Quanti cation Toolkit (UQTk).

  19. Identification of Influential Points in a Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Jan Grosz

    2011-03-01

    Full Text Available The article deals with the detection and identification of influential points in the linear regression model. Three methods of detection of outliers and leverage points are described. These procedures can also be used for one-sample (independentdatasets. This paper briefly describes theoretical aspects of several robust methods as well. Robust statistics is a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. A simulation model of the simple linear regression is presented.

  20. A consensus successive projections algorithm--multiple linear regression method for analyzing near infrared spectra.

    Science.gov (United States)

    Liu, Ke; Chen, Xiaojing; Li, Limin; Chen, Huiling; Ruan, Xiukai; Liu, Wenbin

    2015-02-09

    The successive projections algorithm (SPA) is widely used to select variables for multiple linear regression (MLR) modeling. However, SPA used only once may not obtain all the useful information of the full spectra, because the number of selected variables cannot exceed the number of calibration samples in the SPA algorithm. Therefore, the SPA-MLR method risks the loss of useful information. To make a full use of the useful information in the spectra, a new method named "consensus SPA-MLR" (C-SPA-MLR) is proposed herein. This method is the combination of consensus strategy and SPA-MLR method. In the C-SPA-MLR method, SPA-MLR is used to construct member models with different subsets of variables, which are selected from the remaining variables iteratively. A consensus prediction is obtained by combining the predictions of the member models. The proposed method is evaluated by analyzing the near infrared (NIR) spectra of corn and diesel. The results of C-SPA-MLR method showed a better prediction performance compared with the SPA-MLR and full-spectra PLS methods. Moreover, these results could serve as a reference for combination the consensus strategy and other variable selection methods when analyzing NIR spectra and other spectroscopic techniques. Copyright © 2014 Elsevier B.V. All rights reserved.

  1. Logistic regression and multiple classification analyses to explore risk factors of under-5 mortality in bangladesh

    International Nuclear Information System (INIS)

    Bhowmik, K.R.; Islam, S.

    2016-01-01

    Logistic regression (LR) analysis is the most common statistical methodology to find out the determinants of childhood mortality. However, the significant predictors cannot be ranked according to their influence on the response variable. Multiple classification (MC) analysis can be applied to identify the significant predictors with a priority index which helps to rank the predictors. The main objective of the study is to find the socio-demographic determinants of childhood mortality at neonatal, post-neonatal, and post-infant period by fitting LR model as well as to rank those through MC analysis. The study is conducted using the data of Bangladesh Demographic and Health Survey 2007 where birth and death information of children were collected from their mothers. Three dichotomous response variables are constructed from children age at death to fit the LR and MC models. Socio-economic and demographic variables significantly associated with the response variables separately are considered in LR and MC analyses. Both the LR and MC models identified the same significant predictors for specific childhood mortality. For both the neonatal and child mortality, biological factors of children, regional settings, and parents socio-economic status are found as 1st, 2nd, and 3rd significant groups of predictors respectively. Mother education and household environment are detected as major significant predictors of post-neonatal mortality. This study shows that MC analysis with or without LR analysis can be applied to detect determinants with rank which help the policy makers taking initiatives on a priority basis. (author)

  2. Predictors of postoperative outcomes of cubital tunnel syndrome treatments using multiple logistic regression analysis.

    Science.gov (United States)

    Suzuki, Taku; Iwamoto, Takuji; Shizu, Kanae; Suzuki, Katsuji; Yamada, Harumoto; Sato, Kazuki

    2017-05-01

    This retrospective study was designed to investigate prognostic factors for postoperative outcomes for cubital tunnel syndrome (CubTS) using multiple logistic regression analysis with a large number of patients. Eighty-three patients with CubTS who underwent surgeries were enrolled. The following potential prognostic factors for disease severity were selected according to previous reports: sex, age, type of surgery, disease duration, body mass index, cervical lesion, presence of diabetes mellitus, Workers' Compensation status, preoperative severity, and preoperative electrodiagnostic testing. Postoperative severity of disease was assessed 2 years after surgery by Messina's criteria which is an outcome measure specifically for CubTS. Bivariate analysis was performed to select candidate prognostic factors for multiple linear regression analyses. Multiple logistic regression analysis was conducted to identify the association between postoperative severity and selected prognostic factors. Both bivariate and multiple linear regression analysis revealed only preoperative severity as an independent risk factor for poor prognosis, while other factors did not show any significant association. Although conflicting results exist regarding prognosis of CubTS, this study supports evidence from previous studies and concludes early surgical intervention portends the most favorable prognosis. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.

  3. A Spreadsheet Tool for Learning the Multiple Regression F-Test, T-Tests, and Multicollinearity

    Science.gov (United States)

    Martin, David

    2008-01-01

    This note presents a spreadsheet tool that allows teachers the opportunity to guide students towards answering on their own questions related to the multiple regression F-test, the t-tests, and multicollinearity. The note demonstrates approaches for using the spreadsheet that might be appropriate for three different levels of statistics classes,…

  4. A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants

    Science.gov (United States)

    Cooper, Paul D.

    2010-01-01

    A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…

  5. Application of range-test in multiple linear regression analysis in ...

    African Journals Online (AJOL)

    Application of range-test in multiple linear regression analysis in the presence of outliers is studied in this paper. First, the plot of the explanatory variables (i.e. Administration, Social/Commercial, Economic services and Transfer) on the dependent variable (i.e. GDP) was done to identify the statistical trend over the years.

  6. Calculation of U, Ra, Th and K contents in uranium ore by multiple linear regression method

    International Nuclear Information System (INIS)

    Lin Chao; Chen Yingqiang; Zhang Qingwen; Tan Fuwen; Peng Guanghui

    1991-01-01

    A multiple linear regression method was used to compute γ spectra of uranium ore samples and to calculate contents of U, Ra, Th, and K. In comparison with the inverse matrix method, its advantage is that no standard samples of pure U, Ra, Th and K are needed for obtaining response coefficients

  7. Random regression models for detection of gene by environment interaction

    Directory of Open Access Journals (Sweden)

    Meuwissen Theo HE

    2007-02-01

    Full Text Available Abstract Two random regression models, where the effect of a putative QTL was regressed on an environmental gradient, are described. The first model estimates the correlation between intercept and slope of the random regression, while the other model restricts this correlation to 1 or -1, which is expected under a bi-allelic QTL model. The random regression models were compared to a model assuming no gene by environment interactions. The comparison was done with regards to the models ability to detect QTL, to position them accurately and to detect possible QTL by environment interactions. A simulation study based on a granddaughter design was conducted, and QTL were assumed, either by assigning an effect independent of the environment or as a linear function of a simulated environmental gradient. It was concluded that the random regression models were suitable for detection of QTL effects, in the presence and absence of interactions with environmental gradients. Fixing the correlation between intercept and slope of the random regression had a positive effect on power when the QTL effects re-ranked between environments.

  8. Tutorial on Using Regression Models with Count Outcomes Using R

    Directory of Open Access Journals (Sweden)

    A. Alexander Beaujean

    2016-02-01

    Full Text Available Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares either with or without transforming the count variables. In either case, using typical regression for count data can produce parameter estimates that are biased, thus diminishing any inferences made from such data. As count-variable regression models are seldom taught in training programs, we present a tutorial to help educational researchers use such methods in their own research. We demonstrate analyzing and interpreting count data using Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regression models. The count regression methods are introduced through an example using the number of times students skipped class. The data for this example are freely available and the R syntax used run the example analyses are included in the Appendix.

  9. Multiple regression analysis of anthropometric measurements influencing the cephalic index of male Japanese university students.

    Science.gov (United States)

    Hossain, Md Golam; Saw, Aik; Alam, Rashidul; Ohtsuki, Fumio; Kamarul, Tunku

    2013-09-01

    Cephalic index (CI), the ratio of head breadth to head length, is widely used to categorise human populations. The aim of this study was to access the impact of anthropometric measurements on the CI of male Japanese university students. This study included 1,215 male university students from Tokyo and Kyoto, selected using convenient sampling. Multiple regression analysis was used to determine the effect of anthropometric measurements on CI. The variance inflation factor (VIF) showed no evidence of a multicollinearity problem among independent variables. The coefficients of the regression line demonstrated a significant positive relationship between CI and minimum frontal breadth (p regression analysis showed a greater likelihood for minimum frontal breadth (p regression analysis revealed bizygomatic breadth, head circumference, minimum frontal breadth, head height and morphological facial height to be the best predictor craniofacial measurements with respect to CI. The results suggest that most of the variables considered in this study appear to influence the CI of adult male Japanese students.

  10. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu; Pourahmadi, Mohsen; Maadooliat, Mehdi

    2014-01-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both

  11. Correlation-regression model for physico-chemical quality of ...

    African Journals Online (AJOL)

    abusaad

    areas, suggesting that groundwater quality in urban areas is closely related with land use ... the ground water, with correlation and regression model is also presented. ...... WHO (World Health Organization) (1985). Health hazards from nitrates.

  12. FIRE: an SPSS program for variable selection in multiple linear regression analysis via the relative importance of predictors.

    Science.gov (United States)

    Lorenzo-Seva, Urbano; Ferrando, Pere J

    2011-03-01

    We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the equation regression, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.

  13. EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.

    Science.gov (United States)

    Lian, Yao; Ge, Meng; Pan, Xian-Ming

    2014-12-19

    B-cell epitopes have been studied extensively due to their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task. In this work, based on the antigen's primary sequence information, a novel linear B-cell epitope prediction model was developed using the multiple linear regression (MLR). A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the performance of our model. To alleviate the problem caused by the noise of negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728. We have presented a reliable method for the identification of linear B cell epitope using antigen's primary sequence information. Moreover, a web server EPMLR has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ .

  14. The Application of Classical and Neural Regression Models for the Valuation of Residential Real Estate

    Directory of Open Access Journals (Sweden)

    Mach Łukasz

    2017-06-01

    Full Text Available The research process aimed at building regression models, which helps to valuate residential real estate, is presented in the following article. Two widely used computational tools i.e. the classical multiple regression and regression models of artificial neural networks were used in order to build models. An attempt to define the utilitarian usefulness of the above-mentioned tools and comparative analysis of them is the aim of the conducted research. Data used for conducting analyses refers to the secondary transactional residential real estate market.

  15. A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction.

    Science.gov (United States)

    Qiu, Shibin; Lane, Terran

    2009-01-01

    The cell defense mechanism of RNA interference has applications in gene function analysis and promising potentials in human disease therapy. To effectively silence a target gene, it is desirable to select appropriate initiator siRNA molecules having satisfactory silencing capabilities. Computational prediction for silencing efficacy of siRNAs can assist this screening process before using them in biological experiments. String kernel functions, which operate directly on the string objects representing siRNAs and target mRNAs, have been applied to support vector regression for the prediction and improved accuracy over numerical kernels in multidimensional vector spaces constructed from descriptors of siRNA design rules. To fully utilize information provided by string and numerical data, we propose to unify the two in a kernel feature space by devising a multiple kernel regression framework where a linear combination of the kernels is used. We formulate the multiple kernel learning into a quadratically constrained quadratic programming (QCQP) problem, which although yields global optimal solution, is computationally demanding and requires a commercial solver package. We further propose three heuristics based on the principle of kernel-target alignment and predictive accuracy. Empirical results demonstrate that multiple kernel regression can improve accuracy, decrease model complexity by reducing the number of support vectors, and speed up computational performance dramatically. In addition, multiple kernel regression evaluates the importance of constituent kernels, which for the siRNA efficacy prediction problem, compares the relative significance of the design rules. Finally, we give insights into the multiple kernel regression mechanism and point out possible extensions.

  16. Accounting for measurement error in log regression models with applications to accelerated testing.

    Directory of Open Access Journals (Sweden)

    Robert Richardson

    Full Text Available In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.

  17. Accounting for measurement error in log regression models with applications to accelerated testing.

    Science.gov (United States)

    Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M

    2018-01-01

    In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.

  18. Application of random regression models to the genetic evaluation ...

    African Journals Online (AJOL)

    The model included fixed regression on AM (range from 30 to 138 mo) and the effect of herd-measurement date concatenation. Random parts of the model were RRM coefficients for additive and permanent environmental effects, while residual effects were modelled to account for heterogeneity of variance by AY. Estimates ...

  19. The APT model as reduced-rank regression

    NARCIS (Netherlands)

    Bekker, P.A.; Dobbelstein, P.; Wansbeek, T.J.

    Integrating the two steps of an arbitrage pricing theory (APT) model leads to a reduced-rank regression (RRR) model. So the results on RRR can be used to estimate APT models, making estimation very simple. We give a succinct derivation of estimation of RRR, derive the asymptotic variance of RRR

  20. Multiple Linear Regression and Artificial Neural Network to Predict Blood Glucose in Overweight Patients.

    Science.gov (United States)

    Wang, J; Wang, F; Liu, Y; Xu, J; Lin, H; Jia, B; Zuo, W; Jiang, Y; Hu, L; Lin, F

    2016-01-01

    Overweight individuals are at higher risk for developing type II diabetes than the general population. We conducted this study to analyze the correlation between blood glucose and biochemical parameters, and developed a blood glucose prediction model tailored to overweight patients. A total of 346 overweight Chinese people patients ages 18-81 years were involved in this study. Their levels of fasting glucose (fs-GLU), blood lipids, and hepatic and renal functions were measured and analyzed by multiple linear regression (MLR). Based the MLR results, we developed a back propagation artificial neural network (BP-ANN) model by selecting tansig as the transfer function of the hidden layers nodes, and purelin for the output layer nodes, with training goal of 0.5×10(-5). There was significant correlation between fs-GLU with age, BMI, and blood biochemical indexes (P<0.05). The results of MLR analysis indicated that age, fasting alanine transaminase (fs-ALT), blood urea nitrogen (fs-BUN), total protein (fs-TP), uric acid (fs-BUN), and BMI are 6 independent variables related to fs-GLU. Based on these parameters, the BP-ANN model was performed well and reached high prediction accuracy when training 1 000 epoch (R=0.9987). The level of fs-GLU was predictable using the proposed BP-ANN model based on 6 related parameters (age, fs-ALT, fs-BUN, fs-TP, fs-UA and BMI) in overweight patients. © Georg Thieme Verlag KG Stuttgart · New York.

  1. Sintering equation: determination of its coefficients by experiments - using multiple regression

    International Nuclear Information System (INIS)

    Windelberg, D.

    1999-01-01

    Sintering is a method for volume-compression (or volume-contraction) of powdered or grained material applying high temperature (less than the melting point of the material). Maekipirtti tried to find an equation which describes the process of sintering by its main parameters sintering time, sintering temperature and volume contracting. Such equation is called a sintering equation. It also contains some coefficients which characterise the behaviour of the material during the process of sintering. These coefficients have to be determined by experiments. Here we show that some linear regressions will produce wrong coefficients, but multiple regression results in an useful sintering equation. (orig.)

  2. Using Regression Equations Built from Summary Data in the Psychological Assessment of the Individual Case: Extension to Multiple Regression

    Science.gov (United States)

    Crawford, John R.; Garthwaite, Paul H.; Denham, Annie K.; Chelune, Gordon J.

    2012-01-01

    Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because…

  3. Alternative regression models to assess increase in childhood BMI

    Directory of Open Access Journals (Sweden)

    Mansmann Ulrich

    2008-09-01

    Full Text Available Abstract Background Body mass index (BMI data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs, quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS. We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.

  4. Alternative regression models to assess increase in childhood BMI.

    Science.gov (United States)

    Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M

    2008-09-08

    Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.

  5. Wheat flour dough Alveograph characteristics predicted by Mixolab regression models.

    Science.gov (United States)

    Codină, Georgiana Gabriela; Mironeasa, Silvia; Mironeasa, Costel; Popa, Ciprian N; Tamba-Berehoiu, Radiana

    2012-02-01

    In Romania, the Alveograph is the most used device to evaluate the rheological properties of wheat flour dough, but lately the Mixolab device has begun to play an important role in the breadmaking industry. These two instruments are based on different principles but there are some correlations that can be found between the parameters determined by the Mixolab and the rheological properties of wheat dough measured with the Alveograph. Statistical analysis on 80 wheat flour samples using the backward stepwise multiple regression method showed that Mixolab values using the ‘Chopin S’ protocol (40 samples) and ‘Chopin + ’ protocol (40 samples) can be used to elaborate predictive models for estimating the value of the rheological properties of wheat dough: baking strength (W), dough tenacity (P) and extensibility (L). The correlation analysis confirmed significant findings (P 0.70 for P, R²(adjusted) > 0.70 for W and R²(adjusted) > 0.38 for L, at a 95% confidence interval. Copyright © 2011 Society of Chemical Industry.

  6. Robust mislabel logistic regression without modeling mislabel probabilities.

    Science.gov (United States)

    Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun

    2018-03-01

    Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes into consideration of mislabeled responses. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.

  7. Sagittal and Vertical Craniofacial Growth Pattern and Timing of Circumpubertal Skeletal Maturation: A Multiple Regression Study

    Directory of Open Access Journals (Sweden)

    Giuseppe Perinetti

    2016-01-01

    Full Text Available The knowledge of the associations between the timing of skeletal maturation and craniofacial growth is of primary importance when planning a functional treatment for most of the skeletal malocclusions. This cross-sectional study was thus aimed at evaluating whether sagittal and vertical craniofacial growth has an association with the timing of circumpubertal skeletal maturation. A total of 320 subjects (160 females and 160 males were included in the study (mean age, 12.3±1.7 years; range, 7.6–16.7 years. These subjects were equally distributed in the circumpubertal cervical vertebral maturation (CVM stages 2 to 5. Each CVM stage group also had equal number of females and males. Multiple regression models were run for each CVM stage group to assess the significance of the association of cephalometric parameters (ANB, SN/MP, and NSBa angles with age of attainment of the corresponding CVM stage (in months. Significant associations were seen only for stage 3, where the SN/MP angle was negatively associated with age (β coefficient, −0.7. These results show that hyperdivergent and hypodivergent subjects may have an anticipated and delayed attainment of the pubertal CVM stage 3, respectively. However, such association remains of little entity and it would become clinically relevant only in extreme cases.

  8. A simplified calculation procedure for mass isotopomer distribution analysis (MIDA) based on multiple linear regression.

    Science.gov (United States)

    Fernández-Fernández, Mario; Rodríguez-González, Pablo; García Alonso, J Ignacio

    2016-10-01

    We have developed a novel, rapid and easy calculation procedure for Mass Isotopomer Distribution Analysis based on multiple linear regression which allows the simultaneous calculation of the precursor pool enrichment and the fraction of newly synthesized labelled proteins (fractional synthesis) using linear algebra. To test this approach, we used the peptide RGGGLK as a model tryptic peptide containing three subunits of glycine. We selected glycine labelled in two 13 C atoms ( 13 C 2 -glycine) as labelled amino acid to demonstrate that spectral overlap is not a problem in the proposed methodology. The developed methodology was tested first in vitro by changing the precursor pool enrichment from 10 to 40% of 13 C 2 -glycine. Secondly, a simulated in vivo synthesis of proteins was designed by combining the natural abundance RGGGLK peptide and 10 or 20% 13 C 2 -glycine at 1 : 1, 1 : 3 and 3 : 1 ratios. Precursor pool enrichments and fractional synthesis values were calculated with satisfactory precision and accuracy using a simple spreadsheet. This novel approach can provide a relatively rapid and easy means to measure protein turnover based on stable isotope tracers. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  9. Sagittal and Vertical Craniofacial Growth Pattern and Timing of Circumpubertal Skeletal Maturation: A Multiple Regression Study

    Science.gov (United States)

    Rosso, Luigi; Riatti, Riccardo

    2016-01-01

    The knowledge of the associations between the timing of skeletal maturation and craniofacial growth is of primary importance when planning a functional treatment for most of the skeletal malocclusions. This cross-sectional study was thus aimed at evaluating whether sagittal and vertical craniofacial growth has an association with the timing of circumpubertal skeletal maturation. A total of 320 subjects (160 females and 160 males) were included in the study (mean age, 12.3 ± 1.7 years; range, 7.6–16.7 years). These subjects were equally distributed in the circumpubertal cervical vertebral maturation (CVM) stages 2 to 5. Each CVM stage group also had equal number of females and males. Multiple regression models were run for each CVM stage group to assess the significance of the association of cephalometric parameters (ANB, SN/MP, and NSBa angles) with age of attainment of the corresponding CVM stage (in months). Significant associations were seen only for stage 3, where the SN/MP angle was negatively associated with age (β coefficient, −0.7). These results show that hyperdivergent and hypodivergent subjects may have an anticipated and delayed attainment of the pubertal CVM stage 3, respectively. However, such association remains of little entity and it would become clinically relevant only in extreme cases. PMID:27995136

  10. Single image super-resolution using locally adaptive multiple linear regression.

    Science.gov (United States)

    Yu, Soohwan; Kang, Wonseok; Ko, Seungyong; Paik, Joonki

    2015-12-01

    This paper presents a regularized superresolution (SR) reconstruction method using locally adaptive multiple linear regression to overcome the limitation of spatial resolution of digital images. In order to make the SR problem better-posed, the proposed method incorporates the locally adaptive multiple linear regression into the regularization process as a local prior. The local regularization prior assumes that the target high-resolution (HR) pixel is generated by a linear combination of similar pixels in differently scaled patches and optimum weight parameters. In addition, we adapt a modified version of the nonlocal means filter as a smoothness prior to utilize the patch redundancy. Experimental results show that the proposed algorithm better restores HR images than existing state-of-the-art methods in the sense of the most objective measures in the literature.

  11. User's Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)

    Science.gov (United States)

    Eng, Ken; Chen, Yin-Yu; Kiang, Julie.E.

    2009-01-01

    Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce and familiarize the user with the weighted multiple-linear regression (WREG) program, and to also provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages with short records. The regional estimation equation results from a multiple-linear regression that relates the observable basin characteristics, such as drainage area, to streamflow characteristics.

  12. MULTIPLE LINEAR REGRESSION ANALYSIS FOR PREDICTION OF BOILER LOSSES AND BOILER EFFICIENCY

    OpenAIRE

    Chayalakshmi C.L

    2018-01-01

    MULTIPLE LINEAR REGRESSION ANALYSIS FOR PREDICTION OF BOILER LOSSES AND BOILER EFFICIENCY ABSTRACT Calculation of boiler efficiency is essential if its parameters need to be controlled for either maintaining or enhancing its efficiency. But determination of boiler efficiency using conventional method is time consuming and very expensive. Hence, it is not recommended to find boiler efficiency frequently. The work presented in this paper deals with establishing the statistical mo...

  13. Choosing of mode and calculation of multiple regression equation parameters in X-ray radiometric analysis

    International Nuclear Information System (INIS)

    Mamikonyan, S.V.; Berezkin, V.V.; Lyubimova, S.V.; Svetajlo, Yu.N.; Shchekin, K.I.

    1978-01-01

    A method to derive multiple regression equations for X-ray radiometric analysis is described. Te method is realized in the form of the REGRA program in an algorithmic language. The subprograms included in the program are describe. In analyzing cement for Mg, Al, Si, Ca and Fe contents as an example, the obtainment of working equations in the course of calculations by the program is shown to simpliy the realization of computing devices in instruments for X-ray radiometric analysis

  14. [Multiple linear regression analysis of X-ray measurement and WOMAC scores of knee osteoarthritis].

    Science.gov (United States)

    Ma, Yu-Feng; Wang, Qing-Fu; Chen, Zhao-Jun; Du, Chun-Lin; Li, Jun-Hai; Huang, Hu; Shi, Zong-Ting; Yin, Yue-Shan; Zhang, Lei; A-Di, Li-Jiang; Dong, Shi-Yu; Wu, Ji

    2012-05-01

    To perform Multiple Linear Regression analysis of X-ray measurement and WOMAC scores of knee osteoarthritis, and to analyze their relationship with clinical and biomechanical concepts. From March 2011 to July 2011, 140 patients (250 knees) were reviewed, including 132 knees in the left and 118 knees in the right; ranging in age from 40 to 71 years, with an average of 54.68 years. The MB-RULER measurement software was applied to measure femoral angle, tibial angle, femorotibial angle, joint gap angle from antero-posterir and lateral position of X-rays. The WOMAC scores were also collected. Then multiple regression equations was applied for the linear regression analysis of correlation between the X-ray measurement and WOMAC scores. There was statistical significance in the regression equation of AP X-rays value and WOMAC scores (Pregression equation of lateral X-ray value and WOMAC scores (P>0.05). 1) X-ray measurement of knee joint can reflect the WOMAC scores to a certain extent. 2) It is necessary to measure the X-ray mechanical axis of knee, which is important for diagnosis and treatment of osteoarthritis. 3) The correlation between tibial angle,joint gap angle on antero-posterior X-ray and WOMAC scores is significant, which can be used to assess the functional recovery of patients before and after treatment.

  15. Linear regression models for quantitative assessment of left ...

    African Journals Online (AJOL)

    Changes in left ventricular structures and function have been reported in cardiomyopathies. No prediction models have been established in this environment. This study established regression models for prediction of left ventricular structures in normal subjects. A sample of normal subjects was drawn from a large urban ...

  16. Geographically Weighted Logistic Regression Applied to Credit Scoring Models

    Directory of Open Access Journals (Sweden)

    Pedro Henrique Melo Albuquerque

    Full Text Available Abstract This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC, granted to clients residing in the Distrito Federal (DF, to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters, with it being concluded that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.

  17. Physics constrained nonlinear regression models for time series

    International Nuclear Information System (INIS)

    Majda, Andrew J; Harlim, John

    2013-01-01

    A central issue in contemporary science is the development of data driven statistical nonlinear dynamical models for time series of partial observations of nature or a complex physical model. It has been established recently that ad hoc quadratic multi-level regression (MLR) models can have finite-time blow up of statistical solutions and/or pathological behaviour of their invariant measure. Here a new class of physics constrained multi-level quadratic regression models are introduced, analysed and applied to build reduced stochastic models from data of nonlinear systems. These models have the advantages of incorporating memory effects in time as well as the nonlinear noise from energy conserving nonlinear interactions. The mathematical guidelines for the performance and behaviour of these physics constrained MLR models as well as filtering algorithms for their implementation are developed here. Data driven applications of these new multi-level nonlinear regression models are developed for test models involving a nonlinear oscillator with memory effects and the difficult test case of the truncated Burgers–Hopf model. These new physics constrained quadratic MLR models are proposed here as process models for Bayesian estimation through Markov chain Monte Carlo algorithms of low frequency behaviour in complex physical data. (paper)

  18. A brief introduction to regression designs and mixed-effects modelling by a recent convert

    OpenAIRE

    Balling, Laura Winther

    2008-01-01

    This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable sele...

  19. Relationships between the structure of wheat gluten and ACE inhibitory activity of hydrolysate: stepwise multiple linear regression analysis.

    Science.gov (United States)

    Zhang, Yanyan; Ma, Haile; Wang, Bei; Qu, Wenjuan; Wali, Asif; Zhou, Cunshan

    2016-08-01

    Ultrasound pretreatment of wheat gluten (WG) before enzymolysis can improve the angiotensin converting enzyme (ACE) inhibitory activity of the hydrolysates by alerting the structure of substrate proteins. Establishment of a relationship between the structure of WG and ACE inhibitory activity of the hydrolysates to judge the end point of the ultrasonic pretreatment is vital. The results of stepwise multiple linear regression (MLR) showed that the contents of free sulfhydryl, α-helix, disulfide bond, surface hydrophobicity and random coil were significantly correlated to ACE Inhibitory activity of the hydrolysate, with the standard partial regression coefficients were 3.729, -0.676, -0.252, 0.022 and 0.156, respectively. The R(2) of this model was 0.970. External validation showed that the stepwise MLR model could well predict the ACE inhibitory activity of hydrolysate based on the content of free sulfhydryl, α-helix, disulfide bond, surface hydrophobicity and random coil of WG before hydrolysis. A stepwise multiple linear regression model describing the quantitative relationships between the structure of WG and the ACE Inhibitory activity of the hydrolysates was established. This model can be used to predict the endpoint of the ultrasonic pretreatment. © 2015 Society of Chemical Industry. © 2015 Society of Chemical Industry.

  20. Model-based Quantile Regression for Discrete Data

    KAUST Repository

    Padellini, Tullia

    2018-04-10

    Quantile regression is a class of methods voted to the modelling of conditional quantiles. In a Bayesian framework quantile regression has typically been carried out exploiting the Asymmetric Laplace Distribution as a working likelihood. Despite the fact that this leads to a proper posterior for the regression coefficients, the resulting posterior variance is however affected by an unidentifiable parameter, hence any inferential procedure beside point estimation is unreliable. We propose a model-based approach for quantile regression that considers quantiles of the generating distribution directly, and thus allows for a proper uncertainty quantification. We then create a link between quantile regression and generalised linear models by mapping the quantiles to the parameter of the response variable, and we exploit it to fit the model with R-INLA. We extend it also in the case of discrete responses, where there is no 1-to-1 relationship between quantiles and distribution\\'s parameter, by introducing continuous generalisations of the most common discrete variables (Poisson, Binomial and Negative Binomial) to be exploited in the fitting.

  1. Seasonal Variability of Aragonite Saturation State in the North Pacific Ocean Predicted by Multiple Linear Regression

    Science.gov (United States)

    Kim, T. W.; Park, G. H.

    2014-12-01

    Seasonal variation of aragonite saturation state (Ωarag) in the North Pacific Ocean (NPO) was investigated, using multiple linear regression (MLR) models produced from the PACIFICA (Pacific Ocean interior carbon) dataset. Data within depth ranges of 50-1200m were used to derive MLR models, and three parameters (potential temperature, nitrate, and apparent oxygen utilization (AOU)) were chosen as predictor variables because these parameters are associated with vertical mixing, DIC (dissolved inorganic carbon) removal and release which all affect Ωarag in water column directly or indirectly. The PACIFICA dataset was divided into 5° × 5° grids, and a MLR model was produced in each grid, giving total 145 independent MLR models over the NPO. Mean RMSE (root mean square error) and r2 (coefficient of determination) of all derived MLR models were approximately 0.09 and 0.96, respectively. Then the obtained MLR coefficients for each of predictor variables and an intercept were interpolated over the study area, thereby making possible to allocate MLR coefficients to data-sparse ocean regions. Predictability from the interpolated coefficients was evaluated using Hawaiian time-series data, and as a result mean residual between measured and predicted Ωarag values was approximately 0.08, which is less than the mean RMSE of our MLR models. The interpolated MLR coefficients were combined with seasonal climatology of World Ocean Atlas 2013 (1° × 1°) to produce seasonal Ωarag distributions over various depths. Large seasonal variability in Ωarag was manifested in the mid-latitude Western NPO (24-40°N, 130-180°E) and low-latitude Eastern NPO (0-12°N, 115-150°W). In the Western NPO, seasonal fluctuations of water column stratification appeared to be responsible for the seasonal variation in Ωarag (~ 0.5 at 50 m) because it closely followed temperature variations in a layer of 0-75 m. In contrast, remineralization of organic matter was the main cause for the seasonal

  2. Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.

    Science.gov (United States)

    Chatzis, Sotirios P; Andreou, Andreas S

    2015-11-01

    Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows of better handling uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.

  3. Forecasting daily meteorological time series using ARIMA and regression models

    Science.gov (United States)

    Murat, Małgorzata; Malinowska, Iwona; Gos, Magdalena; Krzyszczak, Jaromir

    2018-04-01

    The daily air temperature and precipitation time series recorded between January 1, 1980 and December 31, 2010 in four European sites (Jokioinen, Dikopshof, Lleida and Lublin) from different climatic zones were modeled and forecasted. In our forecasting we used the methods of the Box-Jenkins and Holt- Winters seasonal auto regressive integrated moving-average, the autoregressive integrated moving-average with external regressors in the form of Fourier terms and the time series regression, including trend and seasonality components methodology with R software. It was demonstrated that obtained models are able to capture the dynamics of the time series data and to produce sensible forecasts.

  4. Thermal Efficiency Degradation Diagnosis Method Using Regression Model

    International Nuclear Information System (INIS)

    Jee, Chang Hyun; Heo, Gyun Young; Jang, Seok Won; Lee, In Cheol

    2011-01-01

    This paper proposes an idea for thermal efficiency degradation diagnosis in turbine cycles, which is based on turbine cycle simulation under abnormal conditions and a linear regression model. The correlation between the inputs for representing degradation conditions (normally unmeasured but intrinsic states) and the simulation outputs (normally measured but superficial states) was analyzed with the linear regression model. The regression models can inversely response an associated intrinsic state for a superficial state observed from a power plant. The diagnosis method proposed herein is classified into three processes, 1) simulations for degradation conditions to get measured states (referred as what-if method), 2) development of the linear model correlating intrinsic and superficial states, and 3) determination of an intrinsic state using the superficial states of current plant and the linear regression model (referred as inverse what-if method). The what-if method is to generate the outputs for the inputs including various root causes and/or boundary conditions whereas the inverse what-if method is the process of calculating the inverse matrix with the given superficial states, that is, component degradation modes. The method suggested in this paper was validated using the turbine cycle model for an operating power plant

  5. Flexible competing risks regression modeling and goodness-of-fit

    DEFF Research Database (Denmark)

    Scheike, Thomas; Zhang, Mei-Jie

    2008-01-01

    In this paper we consider different approaches for estimation and assessment of covariate effects for the cumulative incidence curve in the competing risks model. The classic approach is to model all cause-specific hazards and then estimate the cumulative incidence curve based on these cause...... models that is easy to fit and contains the Fine-Gray model as a special case. One advantage of this approach is that our regression modeling allows for non-proportional hazards. This leads to a new simple goodness-of-fit procedure for the proportional subdistribution hazards assumption that is very easy...... of the flexible regression models to analyze competing risks data when non-proportionality is present in the data....

  6. The art of regression modeling in road safety

    CERN Document Server

    Hauer, Ezra

    2015-01-01

    This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...

  7. Model building strategy for logistic regression: purposeful selection.

    Science.gov (United States)

    Zhang, Zhongheng

    2016-03-01

    Logistic regression is one of the most commonly used models to account for confounders in medical literature. The article introduces how to perform purposeful selection model building strategy with R. I stress on the use of likelihood ratio test to see whether deleting a variable will have significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of remaining covariates. Interaction should be checked to disentangle complex relationship between covariates and their synergistic effect on response variable. Model should be checked for the goodness-of-fit (GOF). In other words, how the fitted model reflects the real data. Hosmer-Lemeshow GOF test is the most widely used for logistic regression model.

  8. Estimation of lung tumor position from multiple anatomical features on 4D-CT using multiple regression analysis.

    Science.gov (United States)

    Ono, Tomohiro; Nakamura, Mitsuhiro; Hirose, Yoshinori; Kitsuda, Kenji; Ono, Yuka; Ishigaki, Takashi; Hiraoka, Masahiro

    2017-09-01

    To estimate the lung tumor position from multiple anatomical features on four-dimensional computed tomography (4D-CT) data sets using single regression analysis (SRA) and multiple regression analysis (MRA) approach and evaluate an impact of the approach on internal target volume (ITV) for stereotactic body radiotherapy (SBRT) of the lung. Eleven consecutive lung cancer patients (12 cases) underwent 4D-CT scanning. The three-dimensional (3D) lung tumor motion exceeded 5 mm. The 3D tumor position and anatomical features, including lung volume, diaphragm, abdominal wall, and chest wall positions, were measured on 4D-CT images. The tumor position was estimated by SRA using each anatomical feature and MRA using all anatomical features. The difference between the actual and estimated tumor positions was defined as the root-mean-square error (RMSE). A standard partial regression coefficient for the MRA was evaluated. The 3D lung tumor position showed a high correlation with the lung volume (R = 0.92 ± 0.10). Additionally, ITVs derived from SRA and MRA approaches were compared with ITV derived from contouring gross tumor volumes on all 10 phases of the 4D-CT (conventional ITV). The RMSE of the SRA was within 3.7 mm in all directions. Also, the RMSE of the MRA was within 1.6 mm in all directions. The standard partial regression coefficient for the lung volume was the largest and had the most influence on the estimated tumor position. Compared with conventional ITV, average percentage decrease of ITV were 31.9% and 38.3% using SRA and MRA approaches, respectively. The estimation accuracy of lung tumor position was improved by the MRA approach, which provided smaller ITV than conventional ITV. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.

  9. Regression analysis of a chemical reaction fouling model

    International Nuclear Information System (INIS)

    Vasak, F.; Epstein, N.

    1996-01-01

    A previously reported mathematical model for the initial chemical reaction fouling of a heated tube is critically examined in the light of the experimental data for which it was developed. A regression analysis of the model with respect to that data shows that the reference point upon which the two adjustable parameters of the model were originally based was well chosen, albeit fortuitously. (author). 3 refs., 2 tabs., 2 figs

  10. Inverse estimation of multiple muscle activations based on linear logistic regression.

    Science.gov (United States)

    Sekiya, Masashi; Tsuji, Toshiaki

    2017-07-01

    This study deals with a technology to estimate the muscle activity from the movement data using a statistical model. A linear regression (LR) model and artificial neural networks (ANN) have been known as statistical models for such use. Although ANN has a high estimation capability, it is often in the clinical application that the lack of data amount leads to performance deterioration. On the other hand, the LR model has a limitation in generalization performance. We therefore propose a muscle activity estimation method to improve the generalization performance through the use of linear logistic regression model. The proposed method was compared with the LR model and ANN in the verification experiment with 7 participants. As a result, the proposed method showed better generalization performance than the conventional methods in various tasks.

  11. Spatial stochastic regression modelling of urban land use

    International Nuclear Information System (INIS)

    Arshad, S H M; Jaafar, J; Abiden, M Z Z; Latif, Z A; Rasam, A R A

    2014-01-01

    Urbanization is very closely linked to industrialization, commercialization or overall economic growth and development. This results in innumerable benefits of the quantity and quality of the urban environment and lifestyle but on the other hand contributes to unbounded development, urban sprawl, overcrowding and decreasing standard of living. Regulation and observation of urban development activities is crucial. The understanding of urban systems that promotes urban growth are also essential for the purpose of policy making, formulating development strategies as well as development plan preparation. This study aims to compare two different stochastic regression modeling techniques for spatial structure models of urban growth in the same specific study area. Both techniques will utilize the same datasets and their results will be analyzed. The work starts by producing an urban growth model by using stochastic regression modeling techniques namely the Ordinary Least Square (OLS) and Geographically Weighted Regression (GWR). The two techniques are compared to and it is found that, GWR seems to be a more significant stochastic regression model compared to OLS, it gives a smaller AICc (Akaike's Information Corrected Criterion) value and its output is more spatially explainable

  12. Logistic regression for risk factor modelling in stuttering research.

    Science.gov (United States)

    Reed, Phil; Wu, Yaqionq

    2013-06-01

    To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed are demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.

  13. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    Science.gov (United States)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multiple stepwise regressions is proposed for General Expression of Nonlinear Autoregressive model which converts the model order problem into the variable selection of multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study the improvements of both the new introduced and originally existed variables for the model characteristics, which are adopted to determine the model variables to retain or eliminate. So the optimal model is obtained through data fitting effect measurement or significance test. The simulation and classic time-series data experiment results show that the method proposed is simple, reliable and can be applied to practical engineering.

  14. Multiple Additive Regression Trees a Methodology for Predictive Data Mining for Fraud Detection

    National Research Council Canada - National Science Library

    da

    2002-01-01

    ...) is using new and innovative techniques for fraud detection. Their primary techniques for fraud detection are the data mining tools of classification trees and neural networks as well as methods for pooling the results of multiple model fits...

  15. Single-electron multiplication statistics as a combination of Poissonian pulse height distributions using constraint regression methods

    International Nuclear Information System (INIS)

    Ballini, J.-P.; Cazes, P.; Turpin, P.-Y.

    1976-01-01

    Analysing the histogram of anode pulse amplitudes allows a discussion of the hypothesis that has been proposed to account for the statistical processes of secondary multiplication in a photomultiplier. In an earlier work, good agreement was obtained between experimental and reconstructed spectra, assuming a first dynode distribution including two Poisson distributions of distinct mean values. This first approximation led to a search for a method which could give the weights of several Poisson distributions of distinct mean values. Three methods have been briefly exposed: classical linear regression, constraint regression (d'Esopo's method), and regression on variables subject to error. The use of these methods gives an approach of the frequency function which represents the dispersion of the punctual mean gain around the whole first dynode mean gain value. Comparison between this function and the one employed in Polya distribution allows the statement that the latter is inadequate to describe the statistical process of secondary multiplication. Numerous spectra obtained with two kinds of photomultiplier working under different physical conditions have been analysed. Then two points are discussed: - Does the frequency function represent the dynode structure and the interdynode collection process. - Is the model (the multiplication process of all dynodes but the first one, is Poissonian) valid whatever the photomultiplier and the utilization conditions. (Auth.)

  16. Modeling and prediction of flotation performance using support vector regression

    Directory of Open Access Journals (Sweden)

    Despotović Vladimir

    2017-01-01

    Full Text Available Continuous efforts have been made in recent year to improve the process of paper recycling, as it is of critical importance for saving the wood, water and energy resources. Flotation deinking is considered to be one of the key methods for separation of ink particles from the cellulose fibres. Attempts to model the flotation deinking process have often resulted in complex models that are difficult to implement and use. In this paper a model for prediction of flotation performance based on Support Vector Regression (SVR, is presented. Representative data samples were created in laboratory, under a variety of practical control variables for the flotation deinking process, including different reagents, pH values and flotation residence time. Predictive model was created that was trained on these data samples, and the flotation performance was assessed showing that Support Vector Regression is a promising method even when dataset used for training the model is limited.

  17. Bayesian approach to errors-in-variables in regression models

    Science.gov (United States)

    Rozliman, Nur Aainaa; Ibrahim, Adriana Irawati Nur; Yunus, Rossita Mohammad

    2017-05-01

    In many applications and experiments, data sets are often contaminated with error or mismeasured covariates. When at least one of the covariates in a model is measured with error, Errors-in-Variables (EIV) model can be used. Measurement error, when not corrected, would cause misleading statistical inferences and analysis. Therefore, our goal is to examine the relationship of the outcome variable and the unobserved exposure variable given the observed mismeasured surrogate by applying the Bayesian formulation to the EIV model. We shall extend the flexible parametric method proposed by Hossain and Gustafson (2009) to another nonlinear regression model which is the Poisson regression model. We shall then illustrate the application of this approach via a simulation study using Markov chain Monte Carlo sampling methods.

  18. Research on the multiple linear regression in non-invasive blood glucose measurement.

    Science.gov (United States)

    Zhu, Jianming; Chen, Zhencheng

    2015-01-01

    A non-invasive blood glucose measurement sensor and the data process algorithm based on the metabolic energy conservation (MEC) method are presented in this paper. The physiological parameters of human fingertip can be measured by various sensing modalities, and blood glucose value can be evaluated with the physiological parameters by the multiple linear regression analysis. Five methods such as enter, remove, forward, backward and stepwise in multiple linear regression were compared, and the backward method had the best performance. The best correlation coefficient was 0.876 with the standard error of the estimate 0.534, and the significance was 0.012 (sig. regression equation was valid. The Clarke error grid analysis was performed to compare the MEC method with the hexokinase method, using 200 data points. The correlation coefficient R was 0.867 and all of the points were located in Zone A and Zone B, which shows the MEC method provides a feasible and valid way for non-invasive blood glucose measurement.

  19. APPLICATION OF MULTIPLE LOGISTIC REGRESSION, BAYESIAN LOGISTIC AND CLASSIFICATION TREE TO IDENTIFY THE SIGNIFICANT FACTORS INFLUENCING CRASH SEVERITY

    Directory of Open Access Journals (Sweden)

    MILAD TAZIK

    2017-11-01

    Full Text Available Identifying cases in which road crashes result in fatality or injury of drivers may help improve their safety. In this study, datasets of crashes happened in TehranQom freeway, Iran, were examined by three models (multiple logistic regression, Bayesian logistic and classification tree to analyse the contribution of several variables to fatal accidents. For multiple logistic regression and Bayesian logistic models, the odds ratio was calculated for each variable. The model which best suited the identification of accident severity was determined based on AIC and DIC criteria. Based on the results of these two models, rollover crashes (OR = 14.58, %95 CI: 6.8-28.6, not using of seat belt (OR = 5.79, %95 CI: 3.1-9.9, exceeding speed limits (OR = 4.02, %95 CI: 1.8-7.9 and being female (OR = 2.91, %95 CI: 1.1-6.1 were the most important factors in fatalities of drivers. In addition, the results of the classification tree model have verified the findings of the other models.

  20. Time series regression model for infectious disease and weather.

    Science.gov (United States)

    Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro

    2015-10-01

    Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  1. Variable Selection for Regression Models of Percentile Flows

    Science.gov (United States)

    Fouad, G.

    2017-12-01

    Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. Variables suffered from a high

  2. Linearity and Misspecification Tests for Vector Smooth Transition Regression Models

    DEFF Research Database (Denmark)

    Teräsvirta, Timo; Yang, Yukai

    The purpose of the paper is to derive Lagrange multiplier and Lagrange multiplier type specification and misspecification tests for vector smooth transition regression models. We report results from simulation studies in which the size and power properties of the proposed asymptotic tests in small...

  3. Application of multilinear regression analysis in modeling of soil ...

    African Journals Online (AJOL)

    The application of Multi-Linear Regression Analysis (MLRA) model for predicting soil properties in Calabar South offers a technical guide and solution in foundation designs problems in the area. Forty-five soil samples were collected from fifteen different boreholes at a different depth and 270 tests were carried out for CBR, ...

  4. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2009-01-01

    In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By

  5. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2010-01-01

    In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By

  6. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2011-01-01

    In this paper, two non-parametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a more viable alternative to existing kernel-based approaches. The second estimator

  7. A binary logistic regression model with complex sampling design of ...

    African Journals Online (AJOL)

    2017-09-03

    Sep 3, 2017 ... Bi-variable and multi-variable binary logistic regression model with complex sampling design was fitted. .... Data was entered into STATA-12 and analyzed using. SPSS-21. .... lack of access/too far or costs too much. 35. 1.2.

  8. Transpiration of glasshouse rose crops: evaluation of regression models

    NARCIS (Netherlands)

    Baas, R.; Rijssel, van E.

    2006-01-01

    Regression models of transpiration (T) based on global radiation inside the greenhouse (G), with or without energy input from heating pipes (Eh) and/or vapor pressure deficit (VPD) were parameterized. Therefore, data on T, G, temperatures from air, canopy and heating pipes, and VPD from both a

  9. Temporal Synchronization Analysis for Improving Regression Modeling of Fecal Indicator Bacteria Levels

    Science.gov (United States)

    Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...

  10. Multiple Logistic Regression Analysis of Cigarette Use among High School Students

    Science.gov (United States)

    Adwere-Boamah, Joseph

    2011-01-01

    A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…

  11. Approximating prediction uncertainty for random forest regression models

    Science.gov (United States)

    John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne

    2016-01-01

    Machine learning approaches such as random forest have increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches there is no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...

  12. CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model

    DEFF Research Database (Denmark)

    Dyrholm, Mads; Hansen, Lars Kai

    2004-01-01

    We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least square...... estimation. We demonstrate the method on synthetic data and finally separate speech and music in a real room recording....

  13. On concurvity in nonlinear and nonparametric regression models

    Directory of Open Access Journals (Sweden)

    Sonia Amodio

    2014-12-01

    Full Text Available When data are affected by multicollinearity in the linear regression framework, then concurvity will be present in fitting a generalized additive model (GAM. The term concurvity describes nonlinear dependencies among the predictor variables. As collinearity results in inflated variance of the estimated regression coefficients in the linear regression model, the result of the presence of concurvity leads to instability of the estimated coefficients in GAMs. Even if the backfitting algorithm will always converge to a solution, in case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper will provide a general criterion to detect concurvity in nonlinear and non parametric regression models.

  14. Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research

    Directory of Open Access Journals (Sweden)

    Hardt Jochen

    2012-12-01

    Full Text Available Abstract Background Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. Methods A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100 and 200 using complete cases or multiple imputation with 0, 10, 20, 40 and 80 auxiliary variables. Mechanisms of missingness were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had low (r=.10 vs. moderate correlations (r=.50 with X’s and Y. Results The inclusion of auxiliary variables can improve a multiple imputation model. However, inclusion of too many variables leads to downward bias of regression coefficients and decreases precision. When the correlations are low, inclusion of auxiliary variables is not useful. Conclusion More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1 : 3.

  15. Detection of Outliers in Regression Model for Medical Data

    Directory of Open Access Journals (Sweden)

    Stephen Raj S

    2017-07-01

    Full Text Available In regression analysis, an outlier is an observation for which the residual is large in magnitude compared to other observations in the data set. The detection of outliers and influential points is an important step of the regression analysis. Outlier detection methods have been used to detect and remove anomalous values from data. In this paper, we detect the presence of outliers in simple linear regression models for medical data set. Chatterjee and Hadi mentioned that the ordinary residuals are not appropriate for diagnostic purposes; a transformed version of them is preferable. First, we investigate the presence of outliers based on existing procedures of residuals and standardized residuals. Next, we have used the new approach of standardized scores for detecting outliers without the use of predicted values. The performance of the new approach was verified with the real-life data.

  16. Error analysis of dimensionless scaling experiments with multiple points using linear regression

    International Nuclear Information System (INIS)

    Guercan, Oe.D.; Vermare, L.; Hennequin, P.; Bourdelle, C.

    2010-01-01

    A general method of error estimation in the case of multiple point dimensionless scaling experiments, using linear regression and standard error propagation, is proposed. The method reduces to the previous result of Cordey (2009 Nucl. Fusion 49 052001) in the case of a two-point scan. On the other hand, if the points follow a linear trend, it explains how the estimated error decreases as more points are added to the scan. Based on the analytical expression that is derived, it is argued that for a low number of points, adding points to the ends of the scanned range, rather than the middle, results in a smaller error estimate. (letter)

  17. Multiple Regression Analysis of Unconfined Compression Strength of Mine Tailings Matrices

    Directory of Open Access Journals (Sweden)

    Mahmood Ali A.

    2017-01-01

    Full Text Available As part of a novel approach of sustainable development of mine tailings, experimental and numerical analysis is carried out on newly formulated tailings matrices. Several physical characteristic tests are carried out including the unconfined compression strength test to ascertain the integrity of these matrices when subjected to loading. The current paper attempts a multiple regression analysis of the unconfined compressive strength test results of these matrices to investigate the most pertinent factors affecting their strength. Results of this analysis showed that the suggested equation is reasonably applicable to the range of binder combinations used.

  18. Additive Intensity Regression Models in Corporate Default Analysis

    DEFF Research Database (Denmark)

    Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo

    2013-01-01

    We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of mo...

  19. Hierarchical Neural Regression Models for Customer Churn Prediction

    Directory of Open Access Journals (Sweden)

    Golshan Mohammadi

    2013-01-01

    Full Text Available As customers are the main assets of each industry, customer churn prediction is becoming a major task for companies to remain in competition with competitors. In the literature, the better applicability and efficiency of hierarchical data mining techniques has been reported. This paper considers three hierarchical models by combining four different data mining techniques for churn prediction, which are backpropagation artificial neural networks (ANN, self-organizing maps (SOM, alpha-cut fuzzy c-means (α-FCM, and Cox proportional hazards regression model. The hierarchical models are ANN + ANN + Cox, SOM + ANN + Cox, and α-FCM + ANN + Cox. In particular, the first component of the models aims to cluster data in two churner and nonchurner groups and also filter out unrepresentative data or outliers. Then, the clustered data as the outputs are used to assign customers to churner and nonchurner groups by the second technique. Finally, the correctly classified data are used to create Cox proportional hazards model. To evaluate the performance of the hierarchical models, an Iranian mobile dataset is considered. The experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Types I and II errors, RMSE, and MAD metrics. In addition, the α-FCM + ANN + Cox model significantly performs better than the two other hierarchical models.

  20. Electricity consumption forecasting in Italy using linear regression models

    Energy Technology Data Exchange (ETDEWEB)

    Bianco, Vincenzo; Manca, Oronzio; Nardini, Sergio [DIAM, Seconda Universita degli Studi di Napoli, Via Roma 29, 81031 Aversa (CE) (Italy)

    2009-09-15

    The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of {+-}1% for the best case and {+-}11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account. (author)

  1. Electricity consumption forecasting in Italy using linear regression models

    International Nuclear Information System (INIS)

    Bianco, Vincenzo; Manca, Oronzio; Nardini, Sergio

    2009-01-01

    The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of ±1% for the best case and ±11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account. (author)

  2. Logistic regression models for polymorphic and antagonistic pleiotropic gene action on human aging and longevity

    DEFF Research Database (Denmark)

    Tan, Qihua; Bathum, L; Christiansen, L

    2003-01-01

    In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...... situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing minimized. A binomial logistic regression model with fractional polynomials is used to capture...... the age-dependent or antagonistic pleiotropic effects. The models are applied to HFE genotype data to assess the effects on human longevity by different alleles and to detect if an age-dependent effect exists. Application has shown that these methods can serve as useful tools in searching for important...

  3. Interactions between cadmium and decabrominated diphenyl ether on blood cells count in rats-Multiple factorial regression analysis.

    Science.gov (United States)

    Curcic, Marijana; Buha, Aleksandra; Stankovic, Sanja; Milovanovic, Vesna; Bulat, Zorica; Đukić-Ćosić, Danijela; Antonijević, Evica; Vučinić, Slavica; Matović, Vesna; Antonijevic, Biljana

    2017-02-01

    The objective of this study was to assess toxicity of Cd and BDE-209 mixture on haematological parameters in subacutely exposed rats and to determine the presence and type of interactions between these two chemicals using multiple factorial regression analysis. Furthermore, for the assessment of interaction type, an isobologram based methodology was applied and compared with multiple factorial regression analysis. Chemicals were given by oral gavage to the male Wistar rats weighing 200-240g for 28days. Animals were divided in 16 groups (8/group): control vehiculum group, three groups of rats were treated with 2.5, 7.5 or 15mg Cd/kg/day. These doses were chosen on the bases of literature data and reflect relatively high Cd environmental exposure, three groups of rats were treated with 1000, 2000 or 4000mg BDE-209/kg/bw/day, doses proved to induce toxic effects in rats. Furthermore, nine groups of animals were treated with different mixtures of Cd and BDE-209 containing doses of Cd and BDE-209 stated above. Blood samples were taken at the end of experiment and red blood cells, white blood cells and platelets counts were determined. For interaction assessment multiple factorial regression analysis and fitted isobologram approach were used. In this study, we focused on multiple factorial regression analysis as a method for interaction assessment. We also investigated the interactions between Cd and BDE-209 by the derived model for the description of the obtained fitted isobologram curves. Current study indicated that co-exposure to Cd and BDE-209 can result in significant decrease in RBC count, increase in WBC count and decrease in PLT count, when compared with controls. Multiple factorial regression analysis used for the assessment of interactions type between Cd and BDE-209 indicated synergism for the effect on RBC count and no interactions i.e. additivity for the effects on WBC and PLT counts. On the other hand, isobologram based approach showed slight antagonism

  4. Interactions between cadmium and decabrominated diphenyl ether on blood cells count in rats—Multiple factorial regression analysis

    International Nuclear Information System (INIS)

    Curcic, Marijana; Buha, Aleksandra; Stankovic, Sanja; Milovanovic, Vesna; Bulat, Zorica; Đukić-Ćosić, Danijela; Antonijević, Evica; Vučinić, Slavica; Matović, Vesna; Antonijevic, Biljana

    2017-01-01

    The objective of this study was to assess toxicity of Cd and BDE-209 mixture on haematological parameters in subacutely exposed rats and to determine the presence and type of interactions between these two chemicals using multiple factorial regression analysis. Furthermore, for the assessment of interaction type, an isobologram based methodology was applied and compared with multiple factorial regression analysis. Chemicals were given by oral gavage to the male Wistar rats weighing 200–240 g for 28 days. Animals were divided in 16 groups (8/group): control vehiculum group, three groups of rats were treated with 2.5, 7.5 or 15 mg Cd/kg/day. These doses were chosen on the bases of literature data and reflect relatively high Cd environmental exposure, three groups of rats were treated with 1000, 2000 or 4000 mg BDE-209/kg/bw/day, doses proved to induce toxic effects in rats. Furthermore, nine groups of animals were treated with different mixtures of Cd and BDE-209 containing doses of Cd and BDE-209 stated above. Blood samples were taken at the end of experiment and red blood cells, white blood cells and platelets counts were determined. For interaction assessment multiple factorial regression analysis and fitted isobologram approach were used. In this study, we focused on multiple factorial regression analysis as a method for interaction assessment. We also investigated the interactions between Cd and BDE-209 by the derived model for the description of the obtained fitted isobologram curves. Current study indicated that co-exposure to Cd and BDE-209 can result in significant decrease in RBC count, increase in WBC count and decrease in PLT count, when compared with controls. Multiple factorial regression analysis used for the assessment of interactions type between Cd and BDE-209 indicated synergism for the effect on RBC count and no interactions i.e. additivity for the effects on WBC and PLT counts. On the other hand, isobologram based approach showed slight

  5. Two-step variable selection in quantile regression models

    Directory of Open Access Journals (Sweden)

    FAN Yali

    2015-06-01

    Full Text Available We propose a two-step variable selection procedure for high dimensional quantile regressions, in which the dimension of the covariates, pn is much larger than the sample size n. In the first step, we perform ℓ1 penalty, and we demonstrate that the first step penalized estimator with the LASSO penalty can reduce the model from an ultra-high dimensional to a model whose size has the same order as that of the true model, and the selected model can cover the true model. The second step excludes the remained irrelevant covariates by applying the adaptive LASSO penalty to the reduced model obtained from the first step. Under some regularity conditions, we show that our procedure enjoys the model selection consistency. We conduct a simulation study and a real data analysis to evaluate the finite sample performance of the proposed approach.

  6. Screening for ketosis using multiple logistic regression based on milk yield and composition.

    Science.gov (United States)

    Kayano, Mitsunori; Kataoka, Tomoko

    2015-11-01

    Multiple logistic regression was applied to milk yield and composition data for 632 records of healthy cows and 61 records of ketotic cows in Hokkaido, Japan. The purpose was to diagnose ketosis based on milk yield and composition, simultaneously. The cows were divided into two groups: (1) multiparous, including 314 healthy cows and 45 ketotic cows and (2) primiparous, including 318 healthy cows and 16 ketotic cows, since nutritional status, milk yield and composition are affected by parity. Multiple logistic regression was applied to these groups separately. For multiparous cows, milk yield (kg/day/cow) and protein-to-fat (P/F) ratio in milk were significant factors (Pketosis. For primiparous cows, lactose content (%), solid not fat (SNF) content (%) and milk urea nitrogen (MUN) content (mg/dl) were significantly associated with ketosis (Pketosis, provided the sensitivity, specificity and AUC values of (1) 0.711, 0.726 and 0.781; and (2) 0.678, 0.767 and 0.738, respectively.

  7. Modelling fourier regression for time series data- a case study: modelling inflation in foods sector in Indonesia

    Science.gov (United States)

    Prahutama, Alan; Suparti; Wahyu Utami, Tiani

    2018-03-01

    Regression analysis is an analysis to model the relationship between response variables and predictor variables. The parametric approach to the regression model is very strict with the assumption, but nonparametric regression model isn’t need assumption of model. Time series data is the data of a variable that is observed based on a certain time, so if the time series data wanted to be modeled by regression, then we should determined the response and predictor variables first. Determination of the response variable in time series is variable in t-th (yt), while the predictor variable is a significant lag. In nonparametric regression modeling, one developing approach is to use the Fourier series approach. One of the advantages of nonparametric regression approach using Fourier series is able to overcome data having trigonometric distribution. In modeling using Fourier series needs parameter of K. To determine the number of K can be used Generalized Cross Validation method. In inflation modeling for the transportation sector, communication and financial services using Fourier series yields an optimal K of 120 parameters with R-square 99%. Whereas if it was modeled by multiple linear regression yield R-square 90%.

  8. New robust statistical procedures for the polytomous logistic regression models.

    Science.gov (United States)

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  9. The prediction of intelligence in preschool children using alternative models to regression.

    Science.gov (United States)

    Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E

    2011-12-01

    Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children is used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than does the more traditional regression approach. Implications of these results are discussed.

  10. THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE

    OpenAIRE

    Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan

    2015-01-01

    Background: The purpose of this study was to drawing a regression model of organizational climate of central libraries of Iran?s universities. Methods: This study is an applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran?s public universities selected among the 117 universities affiliated to the Ministry of Health by Stratified Sampling method (510 people). Climate Qual localized questionnaire was used as research tools. For pr...

  11. Online Statistical Modeling (Regression Analysis) for Independent Responses

    Science.gov (United States)

    Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

    2017-06-01

    Regression analysis (statistical analmodelling) are among statistical methods which are frequently needed in analyzing quantitative data, especially to model relationship between response and explanatory variables. Nowadays, statistical models have been developed into various directions to model various type and complex relationship of data. Rich varieties of advanced and recent statistical modelling are mostly available on open source software (one of them is R). However, these advanced statistical modelling, are not very friendly to novice R users, since they are based on programming script or command line interface. Our research aims to developed web interface (based on R and shiny), so that most recent and advanced statistical modelling are readily available, accessible and applicable on web. We have previously made interface in the form of e-tutorial for several modern and advanced statistical modelling on R especially for independent responses (including linear models/LM, generalized linier models/GLM, generalized additive model/GAM and generalized additive model for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including model using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/ MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web (interface) make the statistical modeling becomes easier to apply and easier to compare them in order to find the most appropriate model for the data.

  12. A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.

    Science.gov (United States)

    Bersabé, Rosa; Rivas, Teresa

    2010-05-01

    The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.

  13. Multiple Imputation of Predictor Variables Using Generalized Additive Models

    NARCIS (Netherlands)

    de Jong, Roel; van Buuren, Stef; Spiess, Martin

    2016-01-01

    The sensitivity of multiple imputation methods to deviations from their distributional assumptions is investigated using simulations, where the parameters of scientific interest are the coefficients of a linear regression model, and values in predictor variables are missing at random. The

  14. Reconstruction of missing daily streamflow data using dynamic regression models

    Science.gov (United States)

    Tencaliec, Patricia; Favre, Anne-Catherine; Prieur, Clémentine; Mathevet, Thibault

    2015-12-01

    River discharge is one of the most important quantities in hydrology. It provides fundamental records for water resources management and climate change monitoring. Even very short data-gaps in this information can cause extremely different analysis outputs. Therefore, reconstructing missing data of incomplete data sets is an important step regarding the performance of the environmental models, engineering, and research applications, thus it presents a great challenge. The objective of this paper is to introduce an effective technique for reconstructing missing daily discharge data when one has access to only daily streamflow data. The proposed procedure uses a combination of regression and autoregressive integrated moving average models (ARIMA) called dynamic regression model. This model uses the linear relationship between neighbor and correlated stations and then adjusts the residual term by fitting an ARIMA structure. Application of the model to eight daily streamflow data for the Durance river watershed showed that the model yields reliable estimates for the missing data in the time series. Simulation studies were also conducted to evaluate the performance of the procedure.

  15. Geometrical model of multiple production

    International Nuclear Information System (INIS)

    Chikovani, Z.E.; Jenkovszky, L.L.; Kvaratshelia, T.M.; Struminskij, B.V.

    1988-01-01

    The relation between geometrical and KNO-scaling and their violation is studied in a geometrical model of multiple production of hadrons. Predictions concerning the behaviour of correlation coefficients at future accelerators are given

  16. Predicting and Modelling of Survival Data when Cox's Regression Model does not hold

    DEFF Research Database (Denmark)

    Scheike, Thomas H.; Zhang, Mei-Jie

    2002-01-01

    Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects......Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects...

  17. Deep ensemble learning of sparse regression models for brain disease diagnosis.

    Science.gov (United States)

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2017-04-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Extended cox regression model: The choice of timefunction

    Science.gov (United States)

    Isik, Hatice; Tutkun, Nihal Ata; Karasoy, Durdu

    2017-07-01

    Cox regression model (CRM), which takes into account the effect of censored observations, is one the most applicative and usedmodels in survival analysis to evaluate the effects of covariates. Proportional hazard (PH), requires a constant hazard ratio over time, is the assumptionofCRM. Using extended CRM provides the test of including a time dependent covariate to assess the PH assumption or an alternative model in case of nonproportional hazards. In this study, the different types of real data sets are used to choose the time function and the differences between time functions are analyzed and discussed.

  19. A test of inflated zeros for Poisson regression models.

    Science.gov (United States)

    He, Hua; Zhang, Hui; Ye, Peng; Tang, Wan

    2017-01-01

    Excessive zeros are common in practice and may cause overdispersion and invalidate inference when fitting Poisson regression models. There is a large body of literature on zero-inflated Poisson models. However, methods for testing whether there are excessive zeros are less well developed. The Vuong test comparing a Poisson and a zero-inflated Poisson model is commonly applied in practice. However, the type I error of the test often deviates seriously from the nominal level, rendering serious doubts on the validity of the test in such applications. In this paper, we develop a new approach for testing inflated zeros under the Poisson model. Unlike the Vuong test for inflated zeros, our method does not require a zero-inflated Poisson model to perform the test. Simulation studies show that when compared with the Vuong test our approach not only better at controlling type I error rate, but also yield more power.

  20. Comparing the index-flood and multiple-regression methods using L-moments

    Science.gov (United States)

    Malekinezhad, H.; Nachtnebel, H. P.; Klik, A.

    In arid and semi-arid regions, the length of records is usually too short to ensure reliable quantile estimates. Comparing index-flood and multiple-regression analyses based on L-moments was the main objective of this study. Factor analysis was applied to determine main influencing variables on flood magnitude. Ward’s cluster and L-moments approaches were applied to several sites in the Namak-Lake basin in central Iran to delineate homogeneous regions based on site characteristics. Homogeneity test was done using L-moments-based measures. Several distributions were fitted to the regional flood data and index-flood and multiple-regression methods as two regional flood frequency methods were compared. The results of factor analysis showed that length of main waterway, compactness coefficient, mean annual precipitation, and mean annual temperature were the main variables affecting flood magnitude. The study area was divided into three regions based on the Ward’s method of clustering approach. The homogeneity test based on L-moments showed that all three regions were acceptably homogeneous. Five distributions were fitted to the annual peak flood data of three homogeneous regions. Using the L-moment ratios and the Z-statistic criteria, GEV distribution was identified as the most robust distribution among five candidate distributions for all the proposed sub-regions of the study area, and in general, it was concluded that the generalised extreme value distribution was the best-fit distribution for every three regions. The relative root mean square error (RRMSE) measure was applied for evaluating the performance of the index-flood and multiple-regression methods in comparison with the curve fitting (plotting position) method. In general, index-flood method gives more reliable estimations for various flood magnitudes of different recurrence intervals. Therefore, this method should be adopted as regional flood frequency method for the study area and the Namak-Lake basin

  1. Spontaneous regression of multiple pulmonary metastatic nodules of hepatocarcinoma: a case report

    Energy Technology Data Exchange (ETDEWEB)

    Bahk, Yong Whee; Park, Seog Hee; Kim, Sun Moo [St. Mary' s Hospital, Catholic Medical College, Seoul (Korea, Republic of)

    1981-09-15

    Although are spontaneous regression of either primary or metastatic malignant tumor in the absence of or inadequate therapy has been well documented. Since the earliest day of this century various malignant tumors have been reported to spontaneously disappear or to be arrested of their growth, but the cases of hepatocarcinoma has been very rare. From the literature, we were able to find out 5 previously reported cases of hepatocarcinoma which showed spontaneous regression at the primary site. Recently we have seen a case of multiple pulmonary metastatic nodules of hepatocarcinoma which completely regressed spontaneously and this forms the basis of the present case report. The patient was 55-year-old male admitted to St. Mary's Hospital, Catholic Medical College because of a hard palpable mass in the epigastrium on April 26, 1978. The admission PA chest roentgenogram revealed multiple small nodular densities scattered throughout both lung field especially in lower zones and toward the peripheral portion. A hepatoscintigram revealed a large cold area involving the left lobe and inermediate zone of the liver. Alfa-fetoprotein and hepatitis B serum antigen test were positive whereas many other standard liver function tests turned out to be negative. A needle biopsy of the tumor revealed well differentiated hepatocellular carcinoma. The patient was put under chemotherapy which consisted of 5 FU 500 mg intravenously for 6 days from April 28 to May 3, 1978. The patient was discharged after this single course of 5 FU treatment and was on a herb medicine, the nature and quantity of which obscure. No other specific treatment was given. The second admission took place on Dec. 3, 1980 because of irregularity in bowel habits and dyspepsia. A follow up PA chest roentgenogram obtained on the second admission revealed complete disappearance of previously noted multiple pulmonary nodular lesions (Fig. 3). Follow up liver scan revealed persistence of the cold area in the left lobe

  2. Spontaneous regression of multiple pulmonary metastatic nodules of hepatocarcinoma: a case report

    International Nuclear Information System (INIS)

    Bahk, Yong Whee; Park, Seog Hee; Kim, Sun Moo

    1981-01-01

    Although are spontaneous regression of either primary or metastatic malignant tumor in the absence of or inadequate therapy has been well documented. Since the earliest day of this century various malignant tumors have been reported to spontaneously disappear or to be arrested of their growth, but the cases of hepatocarcinoma has been very rare. From the literature, we were able to find out 5 previously reported cases of hepatocarcinoma which showed spontaneous regression at the primary site. Recently we have seen a case of multiple pulmonary metastatic nodules of hepatocarcinoma which completely regressed spontaneously and this forms the basis of the present case report. The patient was 55-year-old male admitted to St. Mary's Hospital, Catholic Medical College because of a hard palpable mass in the epigastrium on April 26, 1978. The admission PA chest roentgenogram revealed multiple small nodular densities scattered throughout both lung field especially in lower zones and toward the peripheral portion. A hepatoscintigram revealed a large cold area involving the left lobe and inermediate zone of the liver. Alfa-fetoprotein and hepatitis B serum antigen test were positive whereas many other standard liver function tests turned out to be negative. A needle biopsy of the tumor revealed well differentiated hepatocellular carcinoma. The patient was put under chemotherapy which consisted of 5 FU 500 mg intravenously for 6 days from April 28 to May 3, 1978. The patient was discharged after this single course of 5 FU treatment and was on a herb medicine, the nature and quantity of which obscure. No other specific treatment was given. The second admission took place on Dec. 3, 1980 because of irregularity in bowel habits and dyspepsia. A follow up PA chest roentgenogram obtained on the second admission revealed complete disappearance of previously noted multiple pulmonary nodular lesions (Fig. 3). Follow up liver scan revealed persistence of the cold area in the left lobe

  3. Multivariate Frequency-Severity Regression Models in Insurance

    Directory of Open Access Journals (Sweden)

    Edward W. Frees

    2016-02-01

    Full Text Available In insurance and related industries including healthcare, it is common to have several outcome measures that the analyst wishes to understand using explanatory variables. For example, in automobile insurance, an accident may result in payments for damage to one’s own vehicle, damage to another party’s vehicle, or personal injury. It is also common to be interested in the frequency of accidents in addition to the severity of the claim amounts. This paper synthesizes and extends the literature on multivariate frequency-severity regression modeling with a focus on insurance industry applications. Regression models for understanding the distribution of each outcome continue to be developed yet there now exists a solid body of literature for the marginal outcomes. This paper contributes to this body of literature by focusing on the use of a copula for modeling the dependence among these outcomes; a major advantage of this tool is that it preserves the body of work established for marginal models. We illustrate this approach using data from the Wisconsin Local Government Property Insurance Fund. This fund offers insurance protection for (i property; (ii motor vehicle; and (iii contractors’ equipment claims. In addition to several claim types and frequency-severity components, outcomes can be further categorized by time and space, requiring complex dependency modeling. We find significant dependencies for these data; specifically, we find that dependencies among lines are stronger than the dependencies between the frequency and average severity within each line.

  4. Augmented Beta rectangular regression models: A Bayesian perspective.

    Science.gov (United States)

    Wang, Jue; Luo, Sheng

    2016-01-01

    Mixed effects Beta regression models based on Beta distributions have been widely used to analyze longitudinal percentage or proportional data ranging between zero and one. However, Beta distributions are not flexible to extreme outliers or excessive events around tail areas, and they do not account for the presence of the boundary values zeros and ones because these values are not in the support of the Beta distributions. To address these issues, we propose a mixed effects model using Beta rectangular distribution and augment it with the probabilities of zero and one. We conduct extensive simulation studies to assess the performance of mixed effects models based on both the Beta and Beta rectangular distributions under various scenarios. The simulation studies suggest that the regression models based on Beta rectangular distributions improve the accuracy of parameter estimates in the presence of outliers and heavy tails. The proposed models are applied to the motivating Neuroprotection Exploratory Trials in Parkinson's Disease (PD) Long-term Study-1 (LS-1 study, n = 1741), developed by The National Institute of Neurological Disorders and Stroke Exploratory Trials in Parkinson's Disease (NINDS NET-PD) network. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Bayesian semiparametric regression models to characterize molecular evolution

    Directory of Open Access Journals (Sweden)

    Datta Saheli

    2012-10-01

    Full Text Available Abstract Background Statistical models and methods that associate changes in the physicochemical properties of amino acids with natural selection at the molecular level typically do not take into account the correlations between such properties. We propose a Bayesian hierarchical regression model with a generalization of the Dirichlet process prior on the distribution of the regression coefficients that describes the relationship between the changes in amino acid distances and natural selection in protein-coding DNA sequence alignments. Results The Bayesian semiparametric approach is illustrated with simulated data and the abalone lysin sperm data. Our method identifies groups of properties which, for this particular dataset, have a similar effect on evolution. The model also provides nonparametric site-specific estimates for the strength of conservation of these properties. Conclusions The model described here is distinguished by its ability to handle a large number of amino acid properties simultaneously, while taking into account that such data can be correlated. The multi-level clustering ability of the model allows for appealing interpretations of the results in terms of properties that are roughly equivalent from the standpoint of molecular evolution.

  6. Normalization Ridge Regression in Practice II: The Estimation of Multiple Feedback Linkages.

    Science.gov (United States)

    Bulcock, J. W.

    The use of the two-stage least squares (2 SLS) procedure for estimating nonrecursive social science models is often impractical when multiple feedback linkages are required. This is because 2 SLS is extremely sensitive to multicollinearity. The standard statistical solution to the multicollinearity problem is a biased, variance reduced procedure…

  7. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu

    2014-06-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.

  8. Modeling the number of car theft using Poisson regression

    Science.gov (United States)

    Zulkifli, Malina; Ling, Agnes Beh Yen; Kasim, Maznah Mat; Ismail, Noriszura

    2016-10-01

    Regression analysis is the most popular statistical methods used to express the relationship between the variables of response with the covariates. The aim of this paper is to evaluate the factors that influence the number of car theft using Poisson regression model. This paper will focus on the number of car thefts that occurred in districts in Peninsular Malaysia. There are two groups of factor that have been considered, namely district descriptive factors and socio and demographic factors. The result of the study showed that Bumiputera composition, Chinese composition, Other ethnic composition, foreign migration, number of residence with the age between 25 to 64, number of employed person and number of unemployed person are the most influence factors that affect the car theft cases. These information are very useful for the law enforcement department, insurance company and car owners in order to reduce and limiting the car theft cases in Peninsular Malaysia.

  9. Multivariate linear regression of high-dimensional fMRI data with multiple target variables.

    Science.gov (United States)

    Valente, Giancarlo; Castellanos, Agustin Lage; Vanacore, Gianluca; Formisano, Elia

    2014-05-01

    Multivariate regression is increasingly used to study the relation between fMRI spatial activation patterns and experimental stimuli or behavioral ratings. With linear models, informative brain locations are identified by mapping the model coefficients. This is a central aspect in neuroimaging, as it provides the sought-after link between the activity of neuronal populations and subject's perception, cognition or behavior. Here, we show that mapping of informative brain locations using multivariate linear regression (MLR) may lead to incorrect conclusions and interpretations. MLR algorithms for high dimensional data are designed to deal with targets (stimuli or behavioral ratings, in fMRI) separately, and the predictive map of a model integrates information deriving from both neural activity patterns and experimental design. Not accounting explicitly for the presence of other targets whose associated activity spatially overlaps with the one of interest may lead to predictive maps of troublesome interpretation. We propose a new model that can correctly identify the spatial patterns associated with a target while achieving good generalization. For each target, the training is based on an augmented dataset, which includes all remaining targets. The estimation on such datasets produces both maps and interaction coefficients, which are then used to generalize. The proposed formulation is independent of the regression algorithm employed. We validate this model on simulated fMRI data and on a publicly available dataset. Results indicate that our method achieves high spatial sensitivity and good generalization and that it helps disentangle specific neural effects from interaction with predictive maps associated with other targets. Copyright © 2013 Wiley Periodicals, Inc.

  10. Dynamic Regression Intervention Modeling for the Malaysian Daily Load

    Directory of Open Access Journals (Sweden)

    Fadhilah Abdrazak

    2014-05-01

    Full Text Available Malaysia is a unique country due to having both fixed and moving holidays.  These moving holidays may overlap with other fixed holidays and therefore, increase the complexity of the load forecasting activities. The errors due to holidays’ effects in the load forecasting are known to be higher than other factors.  If these effects can be estimated and removed, the behavior of the series could be better viewed.  Thus, the aim of this paper is to improve the forecasting errors by using a dynamic regression model with intervention analysis.   Based on the linear transfer function method, a daily load model consists of either peak or average is developed.  The developed model outperformed the seasonal ARIMA model in estimating the fixed and moving holidays’ effects and achieved a smaller Mean Absolute Percentage Error (MAPE in load forecast.

  11. Learning Supervised Topic Models for Classification and Regression from Crowds.

    Science.gov (United States)

    Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete; Pereira, Francisco C

    2017-12-01

    The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages of the proposed model over state-of-the-art approaches.

  12. Continuous validation of ASTEC containment models and regression testing

    International Nuclear Information System (INIS)

    Nowack, Holger; Reinke, Nils; Sonnenkalb, Martin

    2014-01-01

    The focus of the ASTEC (Accident Source Term Evaluation Code) development at GRS is primarily on the containment module CPA (Containment Part of ASTEC), whose modelling is to a large extent based on the GRS containment code COCOSYS (COntainment COde SYStem). Validation is usually understood as the approval of the modelling capabilities by calculations of appropriate experiments done by external users different from the code developers. During the development process of ASTEC CPA, bugs and unintended side effects may occur, which leads to changes in the results of the initially conducted validation. Due to the involvement of a considerable number of developers in the coding of ASTEC modules, validation of the code alone, even if executed repeatedly, is not sufficient. Therefore, a regression testing procedure has been implemented in order to ensure that the initially obtained validation results are still valid with succeeding code versions. Within the regression testing procedure, calculations of experiments and plant sequences are performed with the same input deck but applying two different code versions. For every test-case the up-to-date code version is compared to the preceding one on the basis of physical parameters deemed to be characteristic for the test-case under consideration. In the case of post-calculations of experiments also a comparison to experimental data is carried out. Three validation cases from the regression testing procedure are presented within this paper. The very good post-calculation of the HDR E11.1 experiment shows the high quality modelling of thermal-hydraulics in ASTEC CPA. Aerosol behaviour is validated on the BMC VANAM M3 experiment, and the results show also a very good agreement with experimental data. Finally, iodine behaviour is checked in the validation test-case of the THAI IOD-11 experiment. Within this test-case, the comparison of the ASTEC versions V2.0r1 and V2.0r2 shows how an error was detected by the regression testing

  13. Comparison of multiple linear regression and artificial neural network in developing the objective functions of the orthopaedic screws.

    Science.gov (United States)

    Hsu, Ching-Chi; Lin, Jinn; Chao, Ching-Kong

    2011-12-01

    Optimizing the orthopaedic screws can greatly improve their biomechanical performances. However, a methodical design optimization approach requires a long time to search the best design. Thus, the surrogate objective functions of the orthopaedic screws should be accurately developed. To our knowledge, there is no study to evaluate the strengths and limitations of the surrogate methods in developing the objective functions of the orthopaedic screws. Three-dimensional finite element models for both the tibial locking screws and the spinal pedicle screws were constructed and analyzed. Then, the learning data were prepared according to the arrangement of the Taguchi orthogonal array, and the verification data were selected with use of a randomized selection. Finally, the surrogate objective functions were developed by using either the multiple linear regression or the artificial neural network. The applicability and accuracy of those surrogate methods were evaluated and discussed. The multiple linear regression method could successfully construct the objective function of the tibial locking screws, but it failed to develop the objective function of the spinal pedicle screws. The artificial neural network method showed a greater capacity of prediction in developing the objective functions for the tibial locking screws and the spinal pedicle screws than the multiple linear regression method. The artificial neural network method may be a useful option for developing the objective functions of the orthopaedic screws with a greater structural complexity. The surrogate objective functions of the orthopaedic screws could effectively decrease the time and effort required for the design optimization process. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  14. Modeling of the Monthly Rainfall-Runoff Process Through Regressions

    Directory of Open Access Journals (Sweden)

    Campos-Aranda Daniel Francisco

    2014-10-01

    Full Text Available To solve the problems associated with the assessment of water resources of a river, the modeling of the rainfall-runoff process (RRP allows the deduction of runoff missing data and to extend its record, since generally the information available on precipitation is larger. It also enables the estimation of inputs to reservoirs, when their building led to the suppression of the gauging station. The simplest mathematical model that can be set for the RRP is the linear regression or curve on a monthly basis. Such a model is described in detail and is calibrated with the simultaneous record of monthly rainfall and runoff in Ballesmi hydrometric station, which covers 35 years. Since the runoff of this station has an important contribution from the spring discharge, the record is corrected first by removing that contribution. In order to do this a procedure was developed based either on the monthly average regional runoff coefficients or on nearby and similar watershed; in this case the Tancuilín gauging station was used. Both stations belong to the Partial Hydrologic Region No. 26 (Lower Rio Panuco and are located within the state of San Luis Potosi, México. The study performed indicates that the monthly regression model, due to its conceptual approach, faithfully reproduces monthly average runoff volumes and achieves an excellent approximation in relation to the dispersion, proved by calculation of the means and standard deviations.

  15. Genetic evaluation of European quails by random regression models

    Directory of Open Access Journals (Sweden)

    Flaviana Miranda Gonçalves

    2012-09-01

    Full Text Available The objective of this study was to compare different random regression models, defined from different classes of heterogeneity of variance combined with different Legendre polynomial orders for the estimate of (covariance of quails. The data came from 28,076 observations of 4,507 female meat quails of the LF1 lineage. Quail body weights were determined at birth and 1, 14, 21, 28, 35 and 42 days of age. Six different classes of residual variance were fitted to Legendre polynomial functions (orders ranging from 2 to 6 to determine which model had the best fit to describe the (covariance structures as a function of time. According to the evaluated criteria (AIC, BIC and LRT, the model with six classes of residual variances and of sixth-order Legendre polynomial was the best fit. The estimated additive genetic variance increased from birth to 28 days of age, and dropped slightly from 35 to 42 days. The heritability estimates decreased along the growth curve and changed from 0.51 (1 day to 0.16 (42 days. Animal genetic and permanent environmental correlation estimates between weights and age classes were always high and positive, except for birth weight. The sixth order Legendre polynomial, along with the residual variance divided into six classes was the best fit for the growth rate curve of meat quails; therefore, they should be considered for breeding evaluation processes by random regression models.

  16. Confidence intervals for distinguishing ordinal and disordinal interactions in multiple regression.

    Science.gov (United States)

    Lee, Sunbok; Lei, Man-Kit; Brody, Gene H

    2015-06-01

    Distinguishing between ordinal and disordinal interaction in multiple regression is useful in testing many interesting theoretical hypotheses. Because the distinction is made based on the location of a crossover point of 2 simple regression lines, confidence intervals of the crossover point can be used to distinguish ordinal and disordinal interactions. This study examined 2 factors that need to be considered in constructing confidence intervals of the crossover point: (a) the assumption about the sampling distribution of the crossover point, and (b) the possibility of abnormally wide confidence intervals for the crossover point. A Monte Carlo simulation study was conducted to compare 6 different methods for constructing confidence intervals of the crossover point in terms of the coverage rate, the proportion of true values that fall to the left or right of the confidence intervals, and the average width of the confidence intervals. The methods include the reparameterization, delta, Fieller, basic bootstrap, percentile bootstrap, and bias-corrected accelerated bootstrap methods. The results of our Monte Carlo simulation study suggest that statistical inference using confidence intervals to distinguish ordinal and disordinal interaction requires sample sizes more than 500 to be able to provide sufficiently narrow confidence intervals to identify the location of the crossover point. (c) 2015 APA, all rights reserved).

  17. Interpreting parameters in the logistic regression model with random effects

    DEFF Research Database (Denmark)

    Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben

    2000-01-01

    interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects......interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects...

  18. The role of multiple regression and exploratory data analysis in the development of leukemia incidence risk models for comparison of radionuclide air stack emissions from nuclear and coal power industries

    International Nuclear Information System (INIS)

    Prybutok, V.R.

    1995-01-01

    Risk associated with power generation must be identified to make intelligent choices between alternate power technologies. Radionuclide air stack emissions for a single coal plant and a single nuclear plant are used to compute the single plant leukemia incidence risk and total industry leukemia incidence risk. Leukemia incidence is the response variable as a function of radionuclide bone dose for the six proposed dose response curves considered. During normal operation a coal plant has higher radionuclide emissions than a nuclear plant and the coal industry has a higher leukaemia incidence risk than the nuclear industry, unless a nuclear accident occurs. Variation of nuclear accident size allows quantification of the impact of accidents on the total industry leukemia incidence risk comparison. The leukemia incidence risk is quantified as the number of accidents of a given size for the nuclear industry leukemia incidence risk to equal the coal industry leukemia incidence risk. The general linear model is used to develop equations that relate the accident frequency required for equal industry risks to the magnitude of the nuclear emission. Exploratory data analysis revealed that the relationship between the natural log of accident number versus the natural log of accident size is linear. (Author)

  19. Multiple linear regression analysis of bacterial deposition to polyurethane coatings after conditioning film formation in the marine environment

    NARCIS (Netherlands)

    Bakker, D.P.; Busscher, H.J.; Zanten, J. van; Vries, J. de; Klijnstra, J.W.; Mei, H.C. van der

    2004-01-01

    Many studies have shown relationships of substratum hydrophobicity, charge or roughness with bacterial adhesion, although bacterial adhesion is governed by interplay of different physico-chemical properties and multiple regression analysis would be more suitable to reveal mechanisms of bacterial

  20. Using the Coefficient of Determination "R"[superscript 2] to Test the Significance of Multiple Linear Regression

    Science.gov (United States)

    Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

    2013-01-01

    This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)

  1. Multiple linear regression analysis of bacterial deposition to polyurethane coating after conditioning film formation in the marine environment

    NARCIS (Netherlands)

    Bakker, Dewi P; Busscher, Henk J; van Zanten, Joyce; de Vries, Jacob; Klijnstra, Job W; van der Mei, Henny C

    Many studies have shown relationships of substratum hydrophobicity, charge or roughness with bacterial adhesion, although bacterial adhesion is governed by interplay of different physico-chemical properties and multiple regression analysis would be more suitable to reveal mechanisms of bacterial

  2. Learning Supervised Topic Models for Classification and Regression from Crowds

    DEFF Research Database (Denmark)

    Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete

    2017-01-01

    problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages...... annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression...

  3. Preference learning with evolutionary Multivariate Adaptive Regression Spline model

    DEFF Research Database (Denmark)

    Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll

    2015-01-01

    This paper introduces a novel approach for pairwise preference learning through combining an evolutionary method with Multivariate Adaptive Regression Spline (MARS). Collecting users' feedback through pairwise preferences is recommended over other ranking approaches as this method is more appealing...... for function approximation as well as being relatively easy to interpret. MARS models are evolved based on their efficiency in learning pairwise data. The method is tested on two datasets that collectively provide pairwise preference data of five cognitive states expressed by users. The method is analysed...

  4. Predicting Performance on MOOC Assessments using Multi-Regression Models

    OpenAIRE

    Ren, Zhiyun; Rangwala, Huzefa; Johri, Aditya

    2016-01-01

    The past few years has seen the rapid growth of data min- ing approaches for the analysis of data obtained from Mas- sive Open Online Courses (MOOCs). The objectives of this study are to develop approaches to predict the scores a stu- dent may achieve on a given grade-related assessment based on information, considered as prior performance or prior ac- tivity in the course. We develop a personalized linear mul- tiple regression (PLMR) model to predict the grade for a student, prior to attempt...

  5. Analytical and regression models of glass rod drawing process

    Science.gov (United States)

    Alekseeva, L. B.

    2018-03-01

    The process of drawing glass rods (light guides) is being studied. The parameters of the process affecting the quality of the light guide have been determined. To solve the problem, mathematical models based on general equations of continuum mechanics are used. The conditions for the stable flow of the drawing process have been found, which are determined by the stability of the motion of the glass mass in the formation zone to small uncontrolled perturbations. The sensitivity of the formation zone to perturbations of the drawing speed and viscosity is estimated. Experimental models of the drawing process, based on the regression analysis methods, have been obtained. These models make it possible to customize a specific production process to obtain light guides of the required quality. They allow one to find the optimum combination of process parameters in the chosen area and to determine the required accuracy of maintaining them at a specified level.

  6. A brief introduction to regression designs and mixed-effects modelling by a recent convert

    DEFF Research Database (Denmark)

    Balling, Laura Winther

    2008-01-01

    This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic...... tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable selection are discussed. The advantages of these techniques are exemplified in an analysis of a word...

  7. Improved model of the retardance in citric acid coated ferrofluids using stepwise regression

    Science.gov (United States)

    Lin, J. F.; Qiu, X. R.

    2017-06-01

    Citric acid (CA) coated Fe3O4 ferrofluids (FFs) have been conducted for biomedical application. The magneto-optical retardance of CA coated FFs was measured by a Stokes polarimeter. Optimization and multiple regression of retardance in FFs were executed by Taguchi method and Microsoft Excel previously, and the F value of regression model was large enough. However, the model executed by Excel was not systematic. Instead we adopted the stepwise regression to model the retardance of CA coated FFs. From the results of stepwise regression by MATLAB, the developed model had highly predictable ability owing to F of 2.55897e+7 and correlation coefficient of one. The average absolute error of predicted retardances to measured retardances was just 0.0044%. Using the genetic algorithm (GA) in MATLAB, the optimized parametric combination was determined as [4.709 0.12 39.998 70.006] corresponding to the pH of suspension, molar ratio of CA to Fe3O4, CA volume, and coating temperature. The maximum retardance was found as 31.712°, close to that obtained by evolutionary solver in Excel and a relative error of -0.013%. Above all, the stepwise regression method was successfully used to model the retardance of CA coated FFs, and the maximum global retardance was determined by the use of GA.

  8. Parametric optimization of multiple quality characteristics in laser cutting of Inconel-718 by using hybrid approach of multiple regression analysis and genetic algorithm

    Science.gov (United States)

    Shrivastava, Prashant Kumar; Pandey, Arun Kumar

    2018-06-01

    Inconel-718 has found high demand in different industries due to their superior mechanical properties. The traditional cutting methods are facing difficulties for cutting these alloys due to their low thermal potential, lower elasticity and high chemical compatibility at inflated temperature. The challenges of machining and/or finishing of unusual shapes and/or sizes in these materials have also faced by traditional machining. Laser beam cutting may be applied for the miniaturization and ultra-precision cutting and/or finishing by appropriate control of different process parameter. This paper present multi-objective optimization the kerf deviation, kerf width and kerf taper in the laser cutting of Incone-718 sheet. The second order regression models have been developed for different quality characteristics by using the experimental data obtained through experimentation. The regression models have been used as objective function for multi-objective optimization based on the hybrid approach of multiple regression analysis and genetic algorithm. The comparison of optimization results to experimental results shows an improvement of 88%, 10.63% and 42.15% in kerf deviation, kerf width and kerf taper, respectively. Finally, the effects of different process parameters on quality characteristics have also been discussed.

  9. Multiple linear stepwise regression of liver lipid levels: proton MR spectroscopy study in vivo at 3.0 T

    International Nuclear Information System (INIS)

    Xu Li; Liang Changhong; Xiao Yuanqiu; Zhang Zhonglin

    2010-01-01

    Objective: To analyze the correlations between liver lipid level determined by liver 3.0 T 1 H-MRS in vivo and influencing factors using multiple linear stepwise regression. Methods: The prospective study of liver 1 H-MRS was performed with 3.0 T system and eight-channel torso phased-array coils using PRESS sequence. Forty-four volunteers were enrolled in this study. Liver spectra were collected with a TR of 1500 ms, TE of 30 ms, volume of interest of 2 cm×2 cm×2 cm, NSA of 64 times. The acquired raw proton MRS data were processed by using a software program SAGE. For each MRS measurement, using water as the internal reference, the amplitude of the lipid signal was normalized to the sum of the signal from lipid and water to obtain percentage lipid within the liver. The statistical description of height, weight, age and BMI, Line width and water suppression were recorded, and Pearson analysis was applied to test their relationships. Multiple linear stepwise regression was used to set the statistical model for the prediction of Liver lipid content. Results: Age (39.1±12.6) years, body weight (64.4±10.4) kg, BMI (23.3±3.1) kg/m 2 , linewidth (18.9±4.4) and the water suppression (90.7±6.5)% had significant correlation with liver lipid content (0.00 to 0.96%, median 0.02%), r were 0.11, 0.44, 0.40, 0.52, -0.73 respectively (P<0.05). But only age, BMI, line width, and the water suppression entered into the multiple linear regression equation. Liver lipid content prediction equation was as follows: Y= 1.395 - (0.021×water suppression) + (0.022×BMI) + (0.014×line width) - (0.004×age), and the coefficient of determination was 0. 613, corrected coefficient of determination was 0.59. Conclusion: The regression model fitted well, since the variables of age, BMI, width, and water suppression can explain about 60% of liver lipid content changes. (authors)

  10. A Technique for Estimating Intensity of Emotional Expressions and Speaking Styles in Speech Based on Multiple-Regression HSMM

    Science.gov (United States)

    Nose, Takashi; Kobayashi, Takao

    In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values have correlation with human perception.

  11. Regression Models for Predicting Force Coefficients of Aerofoils

    Directory of Open Access Journals (Sweden)

    Mohammed ABDUL AKBAR

    2015-09-01

    Full Text Available Renewable sources of energy are attractive and advantageous in a lot of different ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, Vertical axis wind turbines (VAWTs have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of lift and drag coefficients which is a function of shape of the aerofoil, ‘angle of attack’ of wind and Reynolds’s number of flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for the Reynolds number in the range of those experienced by VAWTs. The volume of experimental data thus obtained is huge. The current paper discusses three Regression analysis models developed wherein lift and drag coefficients can be found out using simple formula without having to deal with the bulk of the data. Drag coefficients and Lift coefficients were being successfully estimated by regression models with R2 values as high as 0.98.

  12. Multiple Linear Regression Model for Estimating the Price of a ...

    African Journals Online (AJOL)

    Michael

    2, December, 2017 ... by log transformation of the data, ensuring the data is normally distributed and there is no correlation ... (Chaphalkar and Dhatunde, 2015) but the possible ...... Mathematical Sciences of the University of ... Management.

  13. Investigations upon the indefinite rolls quality assurance in multiple regression analysis

    Directory of Open Access Journals (Sweden)

    Kiss, I.

    2012-04-01

    Full Text Available The rolling rolls quality has been enhanced mainly due to the improvements of the chemical compositions of rolls materials. The realization of an optimal chemical composition can constitute a technical efficient mode to assure the exploitation properties, the material from which the rolling mills rolls are manufactured having a higher importance in this sense. This paper continues to present the scientifically results of our experimental research in the area of the rolling rolls. The basic research contains concrete elements of immediate practical utilities in the metallurgical enterprises, for the quality improvements of rolls, having in last as the aim the durability growth and the safety in exploitation. This paper presents an analysis of the chemical composition, the influences upon the mechanical properties of the indefinite cast iron rolls. We present some mathematical correlations and graphical interpretations between the hardness (on the working surface and on necks and the chemical composition. Using the double and triple correlations which is really helpful in the foundry practice, as it allows us to determine variation boundaries for the chemical composition, in view the obtaining the optimal values of the hardness. We suggest a mathematical interpretation of the influence of the chemical composition over the hardness of these indefinite rolling rolls. In this sense we use the multiple regression analysis which can be an important statistical tool for the investigation of relationships between variables. The enunciation of some mathematically modeling results can be described through a number of multi-component equations determined for the spaces with 3 and 4 dimensions. Also, the regression surfaces, curves of levels and volumes of variations can be represented and interpreted by technologists considering these as correlation diagrams between the analyzed variables. In this sense, these researches results can be used in the engineers

  14. Least square regression based integrated multi-parameteric demand modeling for short term load forecasting

    International Nuclear Information System (INIS)

    Halepoto, I.A.; Uqaili, M.A.

    2014-01-01

    Nowadays, due to power crisis, electricity demand forecasting is deemed an important area for socioeconomic development and proper anticipation of the load forecasting is considered essential step towards efficient power system operation, scheduling and planning. In this paper, we present STLF (Short Term Load Forecasting) using multiple regression techniques (i.e. linear, multiple linear, quadratic and exponential) by considering hour by hour load model based on specific targeted day approach with temperature variant parameter. The proposed work forecasts the future load demand correlation with linear and non-linear parameters (i.e. considering temperature in our case) through different regression approaches. The overall load forecasting error is 2.98% which is very much acceptable. From proposed regression techniques, Quadratic Regression technique performs better compared to than other techniques because it can optimally fit broad range of functions and data sets. The work proposed in this paper, will pave a path to effectively forecast the specific day load with multiple variance factors in a way that optimal accuracy can be maintained. (author)

  15. Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks

    Science.gov (United States)

    Kanevski, Mikhail

    2015-04-01

    The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools both for spatial and temporal data and are based on nonparametric kernel methods closely related to classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can be also applied for features selection tasks when working with high dimensional data [1,3]. In the present research Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into 13 dimensional space constructed by geographical coordinates and geo-features calculated from digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in case of wind fields N=(2^13 -1)=8191] and rank them according to the cross-validation error. In both cases training were carried out applying leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN with their ability to select features and efficient modelling of complex high dimensional data can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press

  16. Estimation of error components in a multi-error linear regression model, with an application to track fitting

    International Nuclear Information System (INIS)

    Fruehwirth, R.

    1993-01-01

    We present an estimation procedure of the error components in a linear regression model with multiple independent stochastic error contributions. After solving the general problem we apply the results to the estimation of the actual trajectory in track fitting with multiple scattering. (orig.)

  17. Conditional Monte Carlo randomization tests for regression models.

    Science.gov (United States)

    Parhat, Parwen; Rosenberger, William F; Diao, Guoqing

    2014-08-15

    We discuss the computation of randomization tests for clinical trials of two treatments when the primary outcome is based on a regression model. We begin by revisiting the seminal paper of Gail, Tan, and Piantadosi (1988), and then describe a method based on Monte Carlo generation of randomization sequences. The tests based on this Monte Carlo procedure are design based, in that they incorporate the particular randomization procedure used. We discuss permuted block designs, complete randomization, and biased coin designs. We also use a new technique by Plamadeala and Rosenberger (2012) for simple computation of conditional randomization tests. Like Gail, Tan, and Piantadosi, we focus on residuals from generalized linear models and martingale residuals from survival models. Such techniques do not apply to longitudinal data analysis, and we introduce a method for computation of randomization tests based on the predicted rate of change from a generalized linear mixed model when outcomes are longitudinal. We show, by simulation, that these randomization tests preserve the size and power well under model misspecification. Copyright © 2014 John Wiley & Sons, Ltd.

  18. Genomic breeding value estimation using nonparametric additive regression models

    Directory of Open Access Journals (Sweden)

    Solberg Trygve

    2009-01-01

    Full Text Available Abstract Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped was predicted using data from the next last generation (genotyped and phenotyped. The results show moderate to high accuracies of the predicted breeding values. A determination of predictor specific degree of smoothing increased the accuracy.

  19. Global Land Use Regression Model for Nitrogen Dioxide Air Pollution.

    Science.gov (United States)

    Larkin, Andrew; Geddes, Jeffrey A; Martin, Randall V; Xiao, Qingyang; Liu, Yang; Marshall, Julian D; Brauer, Michael; Hystad, Perry

    2017-06-20

    Nitrogen dioxide is a common air pollutant with growing evidence of health impacts independent of other common pollutants such as ozone and particulate matter. However, the worldwide distribution of NO 2 exposure and associated impacts on health is still largely uncertain. To advance global exposure estimates we created a global nitrogen dioxide (NO 2 ) land use regression model for 2011 using annual measurements from 5,220 air monitors in 58 countries. The model captured 54% of global NO 2 variation, with a mean absolute error of 3.7 ppb. Regional performance varied from R 2 = 0.42 (Africa) to 0.67 (South America). Repeated 10% cross-validation using bootstrap sampling (n = 10,000) demonstrated a robust performance with respect to air monitor sampling in North America, Europe, and Asia (adjusted R 2 within 2%) but not for Africa and Oceania (adjusted R 2 within 11%) where NO 2 monitoring data are sparse. The final model included 10 variables that captured both between and within-city spatial gradients in NO 2 concentrations. Variable contributions differed between continental regions, but major roads within 100 m and satellite-derived NO 2 were consistently the strongest predictors. The resulting model can be used for global risk assessments and health studies, particularly in countries without existing NO 2 monitoring data or models.

  20. Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

    Science.gov (United States)

    Madarang, Krish J; Kang, Joo-Hyon

    2014-06-01

    Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.

  1. Drought Patterns Forecasting using an Auto-Regressive Logistic Model

    Science.gov (United States)

    del Jesus, M.; Sheffield, J.; Méndez Incera, F. J.; Losada, I. J.; Espejo, A.

    2014-12-01

    Drought is characterized by a water deficit that may manifest across a large range of spatial and temporal scales. Drought may create important socio-economic consequences, many times of catastrophic dimensions. A quantifiable definition of drought is elusive because depending on its impacts, consequences and generation mechanism, different water deficit periods may be identified as a drought by virtue of some definitions but not by others. Droughts are linked to the water cycle and, although a climate change signal may not have emerged yet, they are also intimately linked to climate.In this work we develop an auto-regressive logistic model for drought prediction at different temporal scales that makes use of a spatially explicit framework. Our model allows to include covariates, continuous or categorical, to improve the performance of the auto-regressive component.Our approach makes use of dimensionality reduction (principal component analysis) and classification techniques (K-Means and maximum dissimilarity) to simplify the representation of complex climatic patterns, such as sea surface temperature (SST) and sea level pressure (SLP), while including information on their spatial structure, i.e. considering their spatial patterns. This procedure allows us to include in the analysis multivariate representation of complex climatic phenomena, as the El Niño-Southern Oscillation. We also explore the impact of other climate-related variables such as sun spots. The model allows to quantify the uncertainty of the forecasts and can be easily adapted to make predictions under future climatic scenarios. The framework herein presented may be extended to other applications such as flash flood analysis, or risk assessment of natural hazards.

  2. A Gompertz regression model for fern spores germination

    Directory of Open Access Journals (Sweden)

    Gabriel y Galán, Jose María

    2015-06-01

    Full Text Available Germination is one of the most important biological processes for both seed and spore plants, also for fungi. At present, mathematical models of germination have been developed in fungi, bryophytes and several plant species. However, ferns are the only group whose germination has never been modelled. In this work we develop a regression model of the germination of fern spores. We have found that for Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei and Polypodium feuillei species the Gompertz growth model describe satisfactorily cumulative germination. An important result is that regression parameters are independent of fern species and the model is not affected by intraspecific variation. Our results show that the Gompertz curve represents a general germination model for all the non-green spore leptosporangiate ferns, including in the paper a discussion about the physiological and ecological meaning of the model.La germinación es uno de los procesos biológicos más relevantes tanto para las plantas con esporas, como para las plantas con semillas y los hongos. Hasta el momento, se han desarrollado modelos de germinación para hongos, briofitos y diversas especies de espermatófitos. Los helechos son el único grupo de plantas cuya germinación nunca ha sido modelizada. En este trabajo se desarrolla un modelo de regresión para explicar la germinación de las esporas de helechos. Observamos que para las especies Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei y Polypodium feuillei el modelo de crecimiento de Gompertz describe satisfactoriamente la germinación acumulativa. Un importante resultado es que los parámetros de la regresión son independientes de la especie y que el modelo no está afectado por variación intraespecífica. Por lo tanto, los resultados del trabajo muestran que la curva de Gompertz puede representar un modelo general para todos los helechos leptosporangiados

  3. Collision prediction models using multivariate Poisson-lognormal regression.

    Science.gov (United States)

    El-Basyouny, Karim; Sayed, Tarek

    2009-07-01

    This paper advocates the use of multivariate Poisson-lognormal (MVPLN) regression to develop models for collision count data. The MVPLN approach presents an opportunity to incorporate the correlations across collision severity levels and their influence on safety analyses. The paper introduces a new multivariate hazardous location identification technique, which generalizes the univariate posterior probability of excess that has been commonly proposed and applied in the literature. In addition, the paper presents an alternative approach for quantifying the effect of the multivariate structure on the precision of expected collision frequency. The MVPLN approach is compared with the independent (separate) univariate Poisson-lognormal (PLN) models with respect to model inference, goodness-of-fit, identification of hot spots and precision of expected collision frequency. The MVPLN is modeled using the WinBUGS platform which facilitates computation of posterior distributions as well as providing a goodness-of-fit measure for model comparisons. The results indicate that the estimates of the extra Poisson variation parameters were considerably smaller under MVPLN leading to higher precision. The improvement in precision is due mainly to the fact that MVPLN accounts for the correlation between the latent variables representing property damage only (PDO) and injuries plus fatalities (I+F). This correlation was estimated at 0.758, which is highly significant, suggesting that higher PDO rates are associated with higher I+F rates, as the collision likelihood for both types is likely to rise due to similar deficiencies in roadway design and/or other unobserved factors. In terms of goodness-of-fit, the MVPLN model provided a superior fit than the independent univariate models. The multivariate hazardous location identification results demonstrated that some hazardous locations could be overlooked if the analysis was restricted to the univariate models.

  4. THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE.

    Science.gov (United States)

    Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan

    2015-10-01

    The purpose of this study was to drawing a regression model of organizational climate of central libraries of Iran's universities. This study is an applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran's public universities selected among the 117 universities affiliated to the Ministry of Health by Stratified Sampling method (510 people). Climate Qual localized questionnaire was used as research tools. For predicting the organizational climate pattern of the libraries is used from the multivariate linear regression and track diagram. of the 9 variables affecting organizational climate, 5 variables of innovation, teamwork, customer service, psychological safety and deep diversity play a major role in prediction of the organizational climate of Iran's libraries. The results also indicate that each of these variables with different coefficient have the power to predict organizational climate but the climate score of psychological safety (0.94) plays a very crucial role in predicting the organizational climate. Track diagram showed that five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly effects on the organizational climate variable that contribution of the team work from this influence is more than any other variables. Of the indicator of the organizational climate of climateQual, the contribution of the team work from this influence is more than any other variables that reinforcement of teamwork in academic libraries can be more effective in improving the organizational climate of this type libraries.

  5. Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing

    NARCIS (Netherlands)

    Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.

    2006-01-01

    The subject of this paper is a new approach to Symbolic Regression.Other publications on Symbolic Regression use Genetic Programming.This paper describes an alternative method based on Pareto Simulated Annealing.Our method is based on linear regression for the estimation of constants.Interval

  6. Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.

    Science.gov (United States)

    Ferrari, Alberto

    2017-01-01

    Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, distributional properties of information entropy as a random variable have seldom been the object of study, leading to researchers mainly using linear models or simulation-based analytical approach to assess differences in information content, when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results coming from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set coming from a real experiment on animal communication.

  7. Variable selection in Logistic regression model with genetic algorithm.

    Science.gov (United States)

    Zhang, Zhongheng; Trevino, Victor; Hoseini, Sayed Shahabuddin; Belciug, Smaranda; Boopathi, Arumugam Manivanna; Zhang, Ping; Gorunescu, Florin; Subha, Velappan; Dai, Songshi

    2018-02-01

    Variable or feature selection is one of the most important steps in model specification. Especially in the case of medical-decision making, the direct use of a medical database, without a previous analysis and preprocessing step, is often counterproductive. In this way, the variable selection represents the method of choosing the most relevant attributes from the database in order to build a robust learning models and, thus, to improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables, while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none of them is without limitations. For example, the stepwise approach, which is highly used, adds the best variable in each cycle generally producing an acceptable set of variables. Nevertheless, it is limited by the fact that it commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large with tens to hundreds of variables, which is the case in nowadays clinical data. Genetic algorithms (GA) are heuristic optimization approaches and can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.

  8. Electricity prices forecasting by automatic dynamic harmonic regression models

    International Nuclear Information System (INIS)

    Pedregal, Diego J.; Trapero, Juan R.

    2007-01-01

    The changes experienced by electricity markets in recent years have created the necessity for more accurate forecast tools of electricity prices, both for producers and consumers. Many methodologies have been applied to this aim, but in the view of the authors, state space models are not yet fully exploited. The present paper proposes a univariate dynamic harmonic regression model set up in a state space framework for forecasting prices in these markets. The advantages of the approach are threefold. Firstly, a fast automatic identification and estimation procedure is proposed based on the frequency domain. Secondly, the recursive algorithms applied offer adaptive predictions that compare favourably with respect to other techniques. Finally, since the method is based on unobserved components models, explicit information about trend, seasonal and irregular behaviours of the series can be extracted. This information is of great value to the electricity companies' managers in order to improve their strategies, i.e. it provides management innovations. The good forecast performance and the rapid adaptability of the model to changes in the data are illustrated with actual prices taken from the PJM interconnection in the US and for the Spanish market for the year 2002. (author)

  9. [Multiple linear regression and ROC curve analysis of the factors of lumbar spine bone mineral density].

    Science.gov (United States)

    Zhang, Xiaodong; Zhao, Yinxia; Hu, Shaoyong; Hao, Shuai; Yan, Jiewen; Zhang, Lingyan; Zhao, Jing; Li, Shaolin

    2015-09-01

    To investigate the correlation between the lumbar vertebra bone mineral density (BMD) and age, gender, height, weight, body mass index, waistline, hipline, bone marrow and abdomen fat, and to explore the key factor affecting the BMD. A total of 72 cases were randomly recruited. All the subjects underwent a spectroscopic examination of the third lumber vertebra with single-voxel method in 1.5T MR. Lipid fractions (FF%) were measured. Quantitative CT were also performed to get the BMD of L3 and the corresponding abdomen subcutaneous adipose tissue (SAT) and visceral adipose tissue (VAT). The statistical analysis were performed by SPSS 19.0. Multiple linear regression showed except the age and FF% showed significant difference (P0.05). The correlation of age and FF% with BMD was statistically negatively significant (r=-0.830, -0.521, P<0.05). The ROC curve analysis showed that the sensitivety and specificity of predicting osteoporosis were 81.8% and 86.9%, with a threshold of 58.5 years old. And it showed that the sensitivety and specificity of predicting osteoporosis were 90.9% and 55.7%, with a threshold of 52.8% for FF%. The lumbar vertebra BMD was significantly and negatively correlated with age and bone marrow FF%, but it was not significantly correlated with gender, height, weight, BMI, waistline, hipline, SAT and VAT. And age was the critical factor.

  10. Multiple regression as a preventive tool for determining the risk of Legionella spp.

    Directory of Open Access Journals (Sweden)

    Enrique Gea-Izquierdo

    2012-04-01

    Full Text Available To determine the interrelationship between health & hygiene conditions for prevention of legionellosis, the compositionof materials used in water distribution systems, the water origin and Legionella pneumophila risk. Material and methods. Include adescriptive study and multiple regression analysis on a sample of golf course sprinkler irrigation systems (n=31 pertaining to hotelslocated on the Costa del Sol (Malaga, Spain. The study was carried out in 2009. Results. Presented a significant lineal relation, withall the independent variables contributing significantly (p<0.05 to the model’s fit. The relationship between water type and the risk ofLegionella, as well as the material composition and the latter, is lineal and positive. In contrast, the relationship between health-hygieneconditions and Legionella risk is lineal and negative. Conclusion. The characterization of Legionella pneumophila concentration, asdefined by the risk in water and through use of the predictive method, can contribute to the consideration of new influence variables inthe development of the agent, resulting in improved control and prevention of the disease.

  11. Influence of plant root morphology and tissue composition on phenanthrene uptake: Stepwise multiple linear regression analysis

    International Nuclear Information System (INIS)

    Zhan, Xinhua; Liang, Xiao; Xu, Guohua; Zhou, Lixiang

    2013-01-01

    Polycyclic aromatic hydrocarbons (PAHs) are contaminants that reside mainly in surface soils. Dietary intake of plant-based foods can make a major contribution to total PAH exposure. Little information is available on the relationship between root morphology and plant uptake of PAHs. An understanding of plant root morphologic and compositional factors that affect root uptake of contaminants is important and can inform both agricultural (chemical contamination of crops) and engineering (phytoremediation) applications. Five crop plant species are grown hydroponically in solutions containing the PAH phenanthrene. Measurements are taken for 1) phenanthrene uptake, 2) root morphology – specific surface area, volume, surface area, tip number and total root length and 3) root tissue composition – water, lipid, protein and carbohydrate content. These factors are compared through Pearson's correlation and multiple linear regression analysis. The major factors which promote phenanthrene uptake are specific surface area and lipid content. -- Highlights: •There is no correlation between phenanthrene uptake and total root length, and water. •Specific surface area and lipid are the most crucial factors for phenanthrene uptake. •The contribution of specific surface area is greater than that of lipid. -- The contribution of specific surface area is greater than that of lipid in the two most important root morphological and compositional factors affecting phenanthrene uptake

  12. Forecasting on the total volumes of Malaysia's imports and exports by multiple linear regression

    Science.gov (United States)

    Beh, W. L.; Yong, M. K. Au

    2017-04-01

    This study is to give an insight on the doubt of the important of macroeconomic variables that affecting the total volumes of Malaysia's imports and exports by using multiple linear regression (MLR) analysis. The time frame for this study will be determined by using quarterly data of the total volumes of Malaysia's imports and exports covering the period between 2000-2015. The macroeconomic variables will be limited to eleven variables which are the exchange rate of US Dollar with Malaysia Ringgit (USD-MYR), exchange rate of China Yuan with Malaysia Ringgit (RMB-MYR), exchange rate of European Euro with Malaysia Ringgit (EUR-MYR), exchange rate of Singapore Dollar with Malaysia Ringgit (SGD-MYR), crude oil prices, gold prices, producer price index (PPI), interest rate, consumer price index (CPI), industrial production index (IPI) and gross domestic product (GDP). This study has applied the Johansen Co-integration test to investigate the relationship among the total volumes to Malaysia's imports and exports. The result shows that crude oil prices, RMB-MYR, EUR-MYR and IPI play important roles in the total volumes of Malaysia's imports. Meanwhile crude oil price, USD-MYR and GDP play important roles in the total volumes of Malaysia's exports.

  13. The R Package threg to Implement Threshold Regression Models

    Directory of Open Access Journals (Sweden)

    Tao Xiao

    2015-08-01

    This new package includes four functions: threg, and the methods hr, predict and plot for threg objects returned by threg. The threg function is the model-fitting function which is used to calculate regression coefficient estimates, asymptotic standard errors and p values. The hr method for threg objects is the hazard-ratio calculation function which provides the estimates of hazard ratios at selected time points for specified scenarios (based on given categories or value settings of covariates. The predict method for threg objects is used for prediction. And the plot method for threg objects provides plots for curves of estimated hazard functions, survival functions and probability density functions of the first-hitting-time; function curves corresponding to different scenarios can be overlaid in the same plot for comparison to give additional research insights.

  14. Exploratory regression analysis: a tool for selecting models and determining predictor importance.

    Science.gov (United States)

    Braun, Michael T; Oswald, Frederick L

    2011-06-01

    Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1) . The program investigates all 2(p) - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits.

  15. Multiple linear regression to develop strength scaled equations for knee and elbow joints based on age, gender and segment mass

    DEFF Research Database (Denmark)

    D'Souza, Sonia; Rasmussen, John; Schwirtz, Ansgar

    2012-01-01

    and valuable ergonomic tool. Objective: To investigate age and gender effects on the torque-producing ability in the knee and elbow in older adults. To create strength scaled equations based on age, gender, upper/lower limb lengths and masses using multiple linear regression. To reduce the number of dependent...... flexors. Results: Males were signifantly stronger than females across all age groups. Elbow peak torque (EPT) was better preserved from 60s to 70s whereas knee peak torque (KPT) reduced significantly (PGender, thigh mass and age best...... predicted KPT (R2=0.60). Gender, forearm mass and age best predicted EPT (R2=0.75). Good crossvalidation was established for both elbow and knee models. Conclusion: This cross-sectional study of muscle strength created and validated strength scaled equations of EPT and KPT using only gender, segment mass...

  16. Bivariate least squares linear regression: Towards a unified analytic formalism. I. Functional models

    Science.gov (United States)

    Caimmi, R.

    2011-08-01

    Concerning bivariate least squares linear regression, the classical approach pursued for functional models in earlier attempts ( York, 1966, 1969) is reviewed using a new formalism in terms of deviation (matrix) traces which, for unweighted data, reduce to usual quantities leaving aside an unessential (but dimensional) multiplicative factor. Within the framework of classical error models, the dependent variable relates to the independent variable according to the usual additive model. The classes of linear models considered are regression lines in the general case of correlated errors in X and in Y for weighted data, and in the opposite limiting situations of (i) uncorrelated errors in X and in Y, and (ii) completely correlated errors in X and in Y. The special case of (C) generalized orthogonal regression is considered in detail together with well known subcases, namely: (Y) errors in X negligible (ideally null) with respect to errors in Y; (X) errors in Y negligible (ideally null) with respect to errors in X; (O) genuine orthogonal regression; (R) reduced major-axis regression. In the limit of unweighted data, the results determined for functional models are compared with their counterparts related to extreme structural models i.e. the instrumental scatter is negligible (ideally null) with respect to the intrinsic scatter ( Isobe et al., 1990; Feigelson and Babu, 1992). While regression line slope and intercept estimators for functional and structural models necessarily coincide, the contrary holds for related variance estimators even if the residuals obey a Gaussian distribution, with the exception of Y models. An example of astronomical application is considered, concerning the [O/H]-[Fe/H] empirical relations deduced from five samples related to different stars and/or different methods of oxygen abundance determination. For selected samples and assigned methods, different regression models yield consistent results within the errors (∓ σ) for both

  17. The use of artificial neural networks and multiple linear regression to predict rate of medical waste generation

    International Nuclear Information System (INIS)

    Jahandideh, Sepideh; Jahandideh, Samad; Asadabadi, Ebrahim Barzegari; Askarian, Mehrdad; Movahedi, Mohammad Mehdi; Hosseini, Somayyeh; Jahandideh, Mina

    2009-01-01

    Prediction of the amount of hospital waste production will be helpful in the storage, transportation and disposal of hospital waste management. Based on this fact, two predictor models including artificial neural networks (ANNs) and multiple linear regression (MLR) were applied to predict the rate of medical waste generation totally and in different types of sharp, infectious and general. In this study, a 5-fold cross-validation procedure on a database containing total of 50 hospitals of Fars province (Iran) were used to verify the performance of the models. Three performance measures including MAR, RMSE and R 2 were used to evaluate performance of models. The MLR as a conventional model obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as more significant parameters. On the other hand, ANNs as a more powerful model, which has not been introduced in predicting rate of medical waste generation, showed high performance measure values, especially 0.99 value of R 2 confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving which provides the opportunity for relating independent variables to dependent ones non-linearly. In conclusion, the obtained results showed that our ANN-based model approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in future.

  18. Ultracentrifuge separative power modeling with multivariate regression using covariance matrix

    International Nuclear Information System (INIS)

    Migliavacca, Elder

    2004-01-01

    In this work, the least-squares methodology with covariance matrix is applied to determine a data curve fitting to obtain a performance function for the separative power δU of a ultracentrifuge as a function of variables that are experimentally controlled. The experimental data refer to 460 experiments on the ultracentrifugation process for uranium isotope separation. The experimental uncertainties related with these independent variables are considered in the calculation of the experimental separative power values, determining an experimental data input covariance matrix. The process variables, which significantly influence the δU values are chosen in order to give information on the ultracentrifuge behaviour when submitted to several levels of feed flow rate F, cut θ and product line pressure P p . After the model goodness-of-fit validation, a residual analysis is carried out to verify the assumed basis concerning its randomness and independence and mainly the existence of residual heteroscedasticity with any explained regression model variable. The surface curves are made relating the separative power with the control variables F, θ and P p to compare the fitted model with the experimental data and finally to calculate their optimized values. (author)

  19. Harmonic regression of Landsat time series for modeling attributes from national forest inventory data

    Science.gov (United States)

    Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.

    2018-03-01

    Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study, using Minnesota, USA during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.

  20. Single Image Super-Resolution Using Global Regression Based on Multiple Local Linear Mappings.

    Science.gov (United States)

    Choi, Jae-Seok; Kim, Munchurl

    2017-03-01

    Super-resolution (SR) has become more vital, because of its capability to generate high-quality ultra-high definition (UHD) high-resolution (HR) images from low-resolution (LR) input images. Conventional SR methods entail high computational complexity, which makes them difficult to be implemented for up-scaling of full-high-definition input images into UHD-resolution images. Nevertheless, our previous super-interpolation (SI) method showed a good compromise between Peak-Signal-to-Noise Ratio (PSNR) performances and computational complexity. However, since SI only utilizes simple linear mappings, it may fail to precisely reconstruct HR patches with complex texture. In this paper, we present a novel SR method, which inherits the large-to-small patch conversion scheme from SI but uses global regression based on local linear mappings (GLM). Thus, our new SR method is called GLM-SI. In GLM-SI, each LR input patch is divided into 25 overlapped subpatches. Next, based on the local properties of these subpatches, 25 different local linear mappings are applied to the current LR input patch to generate 25 HR patch candidates, which are then regressed into one final HR patch using a global regressor. The local linear mappings are learned cluster-wise in our off-line training phase. The main contribution of this paper is as follows: Previously, linear-mapping-based conventional SR methods, including SI only used one simple yet coarse linear mapping to each patch to reconstruct its HR version. On the contrary, for each LR input patch, our GLM-SI is the first to apply a combination of multiple local linear mappings, where each local linear mapping is found according to local properties of the current LR patch. Therefore, it can better approximate nonlinear LR-to-HR mappings for HR patches with complex texture. Experiment results show that the proposed GLM-SI method outperforms most of the state-of-the-art methods, and shows comparable PSNR performance with much lower

  1. Short-term electricity prices forecasting based on support vector regression and Auto-regressive integrated moving average modeling

    International Nuclear Information System (INIS)

    Che Jinxing; Wang Jianzhou

    2010-01-01

    In this paper, we present the use of different mathematical models to forecast electricity price under deregulated power. A successful prediction tool of electricity price can help both power producers and consumers plan their bidding strategies. Inspired by that the support vector regression (SVR) model, with the ε-insensitive loss function, admits of the residual within the boundary values of ε-tube, we propose a hybrid model that combines both SVR and Auto-regressive integrated moving average (ARIMA) models to take advantage of the unique strength of SVR and ARIMA models in nonlinear and linear modeling, which is called SVRARIMA. A nonlinear analysis of the time-series indicates the convenience of nonlinear modeling, the SVR is applied to capture the nonlinear patterns. ARIMA models have been successfully applied in solving the residuals regression estimation problems. The experimental results demonstrate that the model proposed outperforms the existing neural-network approaches, the traditional ARIMA models and other hybrid models based on the root mean square error and mean absolute percentage error.

  2. 2D Quantitative Structure-Property Relationship Study of Mycotoxins by Multiple Linear Regression and Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Fereshteh Shiri

    2010-08-01

    Full Text Available In the present work, support vector machines (SVMs and multiple linear regression (MLR techniques were used for quantitative structure–property relationship (QSPR studies of retention time (tR in standardized liquid chromatography–UV–mass spectrometry of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins based on molecular descriptors calculated from the optimized 3D structures. By applying missing value, zero and multicollinearity tests with a cutoff value of 0.95, and genetic algorithm method of variable selection, the most relevant descriptors were selected to build QSPR models. MLRand SVMs methods were employed to build QSPR models. The robustness of the QSPR models was characterized by the statistical validation and applicability domain (AD. The prediction results from the MLR and SVM models are in good agreement with the experimental values. The correlation and predictability measure by r2 and q2 are 0.931 and 0.932, repectively, for SVM and 0.923 and 0.915, respectively, for MLR. The applicability domain of the model was investigated using William’s plot. The effects of different descriptors on the retention times are described.

  3. Statistical Analysis of Reactor Pressure Vessel Fluence Calculation Benchmark Data Using Multiple Regression Techniques

    International Nuclear Information System (INIS)

    Carew, John F.; Finch, Stephen J.; Lois, Lambros

    2003-01-01

    The calculated >1-MeV pressure vessel fluence is used to determine the fracture toughness and integrity of the reactor pressure vessel. It is therefore of the utmost importance to ensure that the fluence prediction is accurate and unbiased. In practice, this assurance is provided by comparing the predictions of the calculational methodology with an extensive set of accurate benchmarks. A benchmarking database is used to provide an estimate of the overall average measurement-to-calculation (M/C) bias in the calculations ( ). This average is used as an ad-hoc multiplicative adjustment to the calculations to correct for the observed calculational bias. However, this average only provides a well-defined and valid adjustment of the fluence if the M/C data are homogeneous; i.e., the data are statistically independent and there is no correlation between subsets of M/C data.Typically, the identification of correlations between the errors in the database M/C values is difficult because the correlation is of the same magnitude as the random errors in the M/C data and varies substantially over the database. In this paper, an evaluation of a reactor dosimetry benchmark database is performed to determine the statistical validity of the adjustment to the calculated pressure vessel fluence. Physical mechanisms that could potentially introduce a correlation between the subsets of M/C ratios are identified and included in a multiple regression analysis of the M/C data. Rigorous statistical criteria are used to evaluate the homogeneity of the M/C data and determine the validity of the adjustment.For the database evaluated, the M/C data are found to be strongly correlated with dosimeter response threshold energy and dosimeter location (e.g., cavity versus in-vessel). It is shown that because of the inhomogeneity in the M/C data, for this database, the benchmark data do not provide a valid basis for adjusting the pressure vessel fluence.The statistical criteria and methods employed in

  4. An Ordered Regression Model to Predict Transit Passengers’ Behavioural Intentions

    Energy Technology Data Exchange (ETDEWEB)

    Oña, J. de; Oña, R. de; Eboli, L.; Forciniti, C.; Mazzulla, G.

    2016-07-01

    Passengers’ behavioural intentions after experiencing transit services can be viewed as signals that show if a customer continues to utilise a company’s service. Users’ behavioural intentions can depend on a series of aspects that are difficult to measure directly. More recently, transit passengers’ behavioural intentions have been just considered together with the concepts of service quality and customer satisfaction. Due to the characteristics of the ways for evaluating passengers’ behavioural intentions, service quality and customer satisfaction, we retain that this kind of issue could be analysed also by applying ordered regression models. This work aims to propose just an ordered probit model for analysing service quality factors that can influence passengers’ behavioural intentions towards the use of transit services. The case study is the LRT of Seville (Spain), where a survey was conducted in order to collect the opinions of the passengers about the existing transit service, and to have a measure of the aspects that can influence the intentions of the users to continue using the transit service in the future. (Author)

  5. Heterogeneous Breast Phantom Development for Microwave Imaging Using Regression Models

    Directory of Open Access Journals (Sweden)

    Camerin Hahn

    2012-01-01

    Full Text Available As new algorithms for microwave imaging emerge, it is important to have standard accurate benchmarking tests. Currently, most researchers use homogeneous phantoms for testing new algorithms. These simple structures lack the heterogeneity of the dielectric properties of human tissue and are inadequate for testing these algorithms for medical imaging. To adequately test breast microwave imaging algorithms, the phantom has to resemble different breast tissues physically and in terms of dielectric properties. We propose a systematic approach in designing phantoms that not only have dielectric properties close to breast tissues but also can be easily shaped to realistic physical models. The approach is based on regression model to match phantom's dielectric properties with the breast tissue dielectric properties found in Lazebnik et al. (2007. However, the methodology proposed here can be used to create phantoms for any tissue type as long as ex vivo, in vitro, or in vivo tissue dielectric properties are measured and available. Therefore, using this method, accurate benchmarking phantoms for testing emerging microwave imaging algorithms can be developed.

  6. Multiple Linear Regression for Reconstruction of Gene Regulatory Networks in Solving Cascade Error Problems

    Directory of Open Access Journals (Sweden)

    Faridah Hani Mohamed Salleh

    2017-01-01

    Full Text Available Gene regulatory network (GRN reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods had been inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite the active research on various GRN prediction methods, the discussion on specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted by the past studies were not specifically geared towards proving the ability of GRN prediction methods in avoiding the occurrences of cascade errors. Hence, this research aims to propose Multiple Linear Regression (MLR to infer GRN from gene expression data and to avoid wrongly inferring of an indirect interaction (A → B → C as a direct interaction (A → C. Since the number of observations of the real experiment datasets was far less than the number of predictors, some predictors were eliminated by extracting the random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade error by using a novel experimental procedure that had been proposed in this work. The experiment revealed that the number of cascade errors had been very minimal. Apart from that, the Belsley collinearity test proved that multicollinearity did affect the datasets used in this experiment greatly. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5.

  7. Multiple Linear Regression for Reconstruction of Gene Regulatory Networks in Solving Cascade Error Problems.

    Science.gov (United States)

    Salleh, Faridah Hani Mohamed; Zainudin, Suhaila; Arif, Shereena M

    2017-01-01

    Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods had been inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite the active research on various GRN prediction methods, the discussion on specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted by the past studies were not specifically geared towards proving the ability of GRN prediction methods in avoiding the occurrences of cascade errors. Hence, this research aims to propose Multiple Linear Regression (MLR) to infer GRN from gene expression data and to avoid wrongly inferring of an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations of the real experiment datasets was far less than the number of predictors, some predictors were eliminated by extracting the random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade error by using a novel experimental procedure that had been proposed in this work. The experiment revealed that the number of cascade errors had been very minimal. Apart from that, the Belsley collinearity test proved that multicollinearity did affect the datasets used in this experiment greatly. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5.

  8. Linear regression

    CERN Document Server

    Olive, David J

    2017-01-01

    This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...

  9. Soil organic carbon distribution in Mediterranean areas under a climate change scenario via multiple linear regression analysis.

    Science.gov (United States)

    Olaya-Abril, Alfonso; Parras-Alcántara, Luis; Lozano-García, Beatriz; Obregón-Romero, Rafael

    2017-08-15

    Over time, the interest on soil studies has increased due to its role in carbon sequestration in terrestrial ecosystems, which could contribute to decreasing atmospheric CO 2 rates. In many studies, independent variables were related to soil organic carbon (SOC) alone, however, the contribution degree of each variable with the experimentally determined SOC content were not considered. In this study, samples from 612 soil profiles were obtained in a natural protected (Red Natura 2000) of Sierra Morena (Mediterranean area, South Spain), considering only the topsoil 0-25cm, for better comparison between results. 24 independent variables were used to define it relationship with SOC content. Subsequently, using a multiple linear regression analysis, the effects of these variables on the SOC correlation was considered. Finally, the best parameters determined with the regression analysis were used in a climatic change scenario. The model indicated that SOC in a future scenario of climate change depends on average temperature of coldest quarter (41.9%), average temperature of warmest quarter (34.5%), annual precipitation (22.2%) and annual average temperature (1.3%). When the current and future situations were compared, the SOC content in the study area was reduced a 35.4%, and a trend towards migration to higher latitude and altitude was observed. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. application of multilinear regression analysis in modeling of soil

    African Journals Online (AJOL)

    Windows User

    Accordingly [1, 3] in their work, they applied linear regression ... (MLRA) is a statistical technique that uses several explanatory ... order to check this, they adopted bivariate correlation analysis .... groups, namely A-1 through A-7, based on their relative expected ..... Multivariate Regression in Gorgan Province North of Iran” ...

  11. Prediction of Currency Volume Issued in Taiwan Using a Hybrid Artificial Neural Network and Multiple Regression Approach

    Directory of Open Access Journals (Sweden)

    Yuehjen E. Shao

    2013-01-01

    Full Text Available Because the volume of currency issued by a country always affects its interest rate, price index, income levels, and many other important macroeconomic variables, the prediction of currency volume issued has attracted considerable attention in recent years. In contrast to the typical single-stage forecast model, this study proposes a hybrid forecasting approach to predict the volume of currency issued in Taiwan. The proposed hybrid models consist of artificial neural network (ANN and multiple regression (MR components. The MR component of the hybrid models is established for a selection of fewer explanatory variables, wherein the selected variables are of higher importance. The ANN component is then designed to generate forecasts based on those important explanatory variables. Subsequently, the model is used to analyze a real dataset of Taiwan's currency from 1996 to 2011 and twenty associated explanatory variables. The prediction results reveal that the proposed hybrid scheme exhibits superior forecasting performance for predicting the volume of currency issued in Taiwan.

  12. Comparison of multiple linear regression, partial least squares and artificial neural networks for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids.

    Science.gov (United States)

    Fragkaki, A G; Farmaki, E; Thomaidis, N; Tsantili-Kakoulidou, A; Angelis, Y S; Koupparis, M; Georgakopoulos, C

    2012-09-21

    The comparison among different modelling techniques, such as multiple linear regression, partial least squares and artificial neural networks, has been performed in order to construct and evaluate models for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids. The performance of the quantitative structure-retention relationship study, using the multiple linear regression and partial least squares techniques, has been previously conducted. In the present study, artificial neural networks models were constructed and used for the prediction of relative retention times of anabolic androgenic steroids, while their efficiency is compared with that of the models derived from the multiple linear regression and partial least squares techniques. For overall ranking of the models, a novel procedure [Trends Anal. Chem. 29 (2010) 101-109] based on sum of ranking differences was applied, which permits the best model to be selected. The suggested models are considered useful for the estimation of relative retention times of designer steroids for which no analytical data are available. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. Steganalysis of LSB Image Steganography using Multiple Regression and Auto Regressive (AR) Model

    OpenAIRE

    Souvik Bhattacharyya; Gautam Sanyal

    2011-01-01

    The staggering growth in communication technologyand usage of public domain channels (i.e. Internet) has greatly facilitated transfer of data. However, such open communication channelshave greater vulnerability to security threats causing unauthorizedin- formation access. Traditionally, encryption is used to realizethen communication security. However, important information is notprotected once decoded. Steganography is the art and science of communicating in a way which hides the existence o...

  14. Application of regression model on stream water quality parameters

    International Nuclear Information System (INIS)

    Suleman, M.; Maqbool, F.; Malik, A.H.; Bhatti, Z.A.

    2012-01-01

    Statistical analysis was conducted to evaluate the effect of solid waste leachate from the open solid waste dumping site of Salhad on the stream water quality. Five sites were selected along the stream. Two sites were selected prior to mixing of leachate with the surface water. One was of leachate and other two sites were affected with leachate. Samples were analyzed for pH, water temperature, electrical conductivity (EC), total dissolved solids (TDS), Biological oxygen demand (BOD), chemical oxygen demand (COD), dissolved oxygen (DO) and total bacterial load (TBL). In this study correlation coefficient r among different water quality parameters of various sites were calculated by using Pearson model and then average of each correlation between two parameters were also calculated, which shows TDS and EC and pH and BOD have significantly increasing r value, while temperature and TDS, temp and EC, DO and BL, DO and COD have decreasing r value. Single factor ANOVA at 5% level of significance was used which shows EC, TDS, TCL and COD were significantly differ among various sites. By the application of these two statistical approaches TDS and EC shows strongly positive correlation because the ions from the dissolved solids in water influence the ability of that water to conduct an electrical current. These two parameters significantly vary among 5 sites which are further confirmed by using linear regression. (author)

  15. The microcomputer scientific software series 2: general linear model--regression.

    Science.gov (United States)

    Harold M. Rauscher

    1983-01-01

    The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...

  16. Regression modeling and mapping of coniferous forest basal area and tree density from discrete-return lidar and multispectral data

    Science.gov (United States)

    Andrew T. Hudak; Nicholas L. Crookston; Jeffrey S. Evans; Michael K. Falkowski; Alistair M. S. Smith; Paul E. Gessler; Penelope Morgan

    2006-01-01

    We compared the utility of discrete-return light detection and ranging (lidar) data and multispectral satellite imagery, and their integration, for modeling and mapping basal area and tree density across two diverse coniferous forest landscapes in north-central Idaho. We applied multiple linear regression models subset from a suite of 26 predictor variables derived...

  17. Multiple Regression and Mediator Variables can be used to Avoid Double Counting when Economic Values are Derived using Stochastic Herd Simulation

    DEFF Research Database (Denmark)

    Østergaard, Søren; Ettema, Jehan Frans; Hjortø, Line

    Multiple regression and model building with mediator variables was addressed to avoid double counting when economic values are estimated from data simulated with herd simulation modeling (using the SimHerd model). The simulated incidence of metritis was analyzed statistically as the independent v...... in multiparous cows. The merit of using this approach was demonstrated since the economic value of metritis was estimated to be 81% higher when no mediator variables were included in the multiple regression analysis......Multiple regression and model building with mediator variables was addressed to avoid double counting when economic values are estimated from data simulated with herd simulation modeling (using the SimHerd model). The simulated incidence of metritis was analyzed statistically as the independent...... variable, while using the traits representing the direct effects of metritis on yield, fertility and occurrence of other diseases as mediator variables. The economic value of metritis was estimated to be €78 per 100 cow-years for each 1% increase of metritis in the period of 1-100 days in milk...

  18. A method for fitting regression splines with varying polynomial order in the linear mixed model.

    Science.gov (United States)

    Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W

    2006-02-15

    The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.

  19. A simulation study on Bayesian Ridge regression models for several collinearity levels

    Science.gov (United States)

    Efendi, Achmad; Effrihan

    2017-12-01

    When analyzing data with multiple regression model if there are collinearities, then one or several predictor variables are usually omitted from the model. However, there sometimes some reasons, for instance medical or economic reasons, the predictors are all important and should be included in the model. Ridge regression model is not uncommon in some researches to use to cope with collinearity. Through this modeling, weights for predictor variables are used for estimating parameters. The next estimation process could follow the concept of likelihood. Furthermore, for the estimation nowadays the Bayesian version could be an alternative. This estimation method does not match likelihood one in terms of popularity due to some difficulties; computation and so forth. Nevertheless, with the growing improvement of computational methodology recently, this caveat should not at the moment become a problem. This paper discusses about simulation process for evaluating the characteristic of Bayesian Ridge regression parameter estimates. There are several simulation settings based on variety of collinearity levels and sample sizes. The results show that Bayesian method gives better performance for relatively small sample sizes, and for other settings the method does perform relatively similar to the likelihood method.

  20. Multiple linear regression to estimate time-frequency electrophysiological responses in single trials.

    Science.gov (United States)

    Hu, L; Zhang, Z G; Mouraux, A; Iannetti, G D

    2015-05-01

    Transient sensory, motor or cognitive event elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist in stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability that can reflect physiologically-important information is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical

  1. Integrative analysis of multiple diverse omics datasets by sparse group multitask regression

    Directory of Open Access Journals (Sweden)

    Dongdong eLin

    2014-10-01

    Full Text Available A variety of high throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have remarkable impact on identifying susceptible biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms has the promise to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: 1 treat the biomarker identification in each single study as a task and then combine them by multitask learning; 2 group variables from all studies for identifying significant genes; 3 enforce sparse constraint on groups of variables to overcome the ‘small sample, but large variables’ problem. We introduce two sparse group penalties: sparse group lasso and sparse group ridge in our multitask model, and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with conventional meta-analysis method. The results show that our sparse group multitask method outperforms meta-analysis method significantly. In an application to our osteoporosis studies, 7 genes are identified as significant genes by our method and are found to have significant effects in other three independent studies for validation. The most significant gene SOD2 has been identified in our previous osteoporosis study involving the same expression dataset. Several other genes such as TREML2, HTR1E and GLO1 are shown to be novel susceptible genes for osteoporosis, as confirmed

  2. Application of Multiple Evaluation Models in Brazil

    Directory of Open Access Journals (Sweden)

    Rafael Victal Saliba

    2008-07-01

    Full Text Available Based on two different samples, this article tests the performance of a number of Value Drivers commonly used for evaluating companies by finance practitioners, through simple regression models of cross-section type which estimate the parameters associated to each Value Driver, denominated Market Multiples. We are able to diagnose the behavior of several multiples in the period 1994-2004, with an outlook also on the particularities of the economic activities performed by the sample companies (and their impacts on the performance through a subsequent analysis with segregation of companies in the sample by sectors. Extrapolating simple multiples evaluation standards from analysts of the main financial institutions in Brazil, we find that adjusting the ratio formulation to allow for an intercept does not provide satisfactory results in terms of pricing errors reduction. Results found, in spite of evidencing certain relative and absolute superiority among the multiples, may not be generically representative, given samples limitation.

  3. MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION

    Science.gov (United States)

    Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...

  4. QSAR study of HCV NS5B polymerase inhibitors using the genetic algorithm-multiple linear regression (GA-MLR).

    Science.gov (United States)

    Rafiei, Hamid; Khanzadeh, Marziyeh; Mozaffari, Shahla; Bostanifar, Mohammad Hassan; Avval, Zhila Mohajeri; Aalizadeh, Reza; Pourbasheer, Eslam

    2016-01-01

    Quantitative structure-activity relationship (QSAR) study has been employed for predicting the inhibitory activities of the Hepatitis C virus (HCV) NS5B polymerase inhibitors . A data set consisted of 72 compounds was selected, and then different types of molecular descriptors were calculated. The whole data set was split into a training set (80 % of the dataset) and a test set (20 % of the dataset) using principle component analysis. The stepwise (SW) and the genetic algorithm (GA) techniques were used as variable selection tools. Multiple linear regression method was then used to linearly correlate the selected descriptors with inhibitory activities. Several validation technique including leave-one-out and leave-group-out cross-validation, Y-randomization method were used to evaluate the internal capability of the derived models. The external prediction ability of the derived models was further analyzed using modified r(2), concordance correlation coefficient values and Golbraikh and Tropsha acceptable model criteria's. Based on the derived results (GA-MLR), some new insights toward molecular structural requirements for obtaining better inhibitory activity were obtained.

  5. Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression.

    Science.gov (United States)

    Song, Chao; Kwan, Mei-Po; Zhu, Jiping

    2017-04-08

    An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.

  6. Direct modeling of regression effects for transition probabilities in the progressive illness-death model

    DEFF Research Database (Denmark)

    Azarang, Leyla; Scheike, Thomas; de Uña-Álvarez, Jacobo

    2017-01-01

    In this work, we present direct regression analysis for the transition probabilities in the possibly non-Markov progressive illness–death model. The method is based on binomial regression, where the response is the indicator of the occupancy for the given state along time. Randomly weighted score...

  7. Application of stepwise multiple regression techniques to inversion of Nimbus 'IRIS' observations.

    Science.gov (United States)

    Ohring, G.

    1972-01-01

    Exploratory studies with Nimbus-3 infrared interferometer-spectrometer (IRIS) data indicate that, in addition to temperature, such meteorological parameters as geopotential heights of pressure surfaces, tropopause pressure, and tropopause temperature can be inferred from the observed spectra with the use of simple regression equations. The technique of screening the IRIS spectral data by means of stepwise regression to obtain the best radiation predictors of meteorological parameters is validated. The simplicity of application of the technique and the simplicity of the derived linear regression equations - which contain only a few terms - suggest usefulness for this approach. Based upon the results obtained, suggestions are made for further development and exploitation of the stepwise regression analysis technique.

  8. A logistic regression model for Ghana National Health Insurance claims

    Directory of Open Access Journals (Sweden)

    Samuel Antwi

    2013-07-01

    Full Text Available In August 2003, the Ghanaian Government made history by implementing the first National Health Insurance System (NHIS in Sub-Saharan Africa. Within three years, over half of the country’s population had voluntarily enrolled into the National Health Insurance Scheme. This study had three objectives: 1 To estimate the risk factors that influences the Ghana national health insurance claims. 2 To estimate the magnitude of each of the risk factors in relation to the Ghana national health insurance claims. In this work, data was collected from the policyholders of the Ghana National Health Insurance Scheme with the help of the National Health Insurance database and the patients’ attendance register of the Koforidua Regional Hospital, from 1st January to 31st December 2011. Quantitative analysis was done using the generalized linear regression (GLR models. The results indicate that risk factors such as sex, age, marital status, distance and length of stay at the hospital were important predictors of health insurance claims. However, it was found that the risk factors; health status, billed charges and income level are not good predictors of national health insurance claim. The outcome of the study shows that sex, age, marital status, distance and length of stay at the hospital are statistically significant in the determination of the Ghana National health insurance premiums since they considerably influence claims. We recommended, among other things that, the National Health Insurance Authority should facilitate the institutionalization of the collection of appropriate data on a continuous basis to help in the determination of future premiums.

  9. Multiple injections of electroporated autologous T cells expressing a chimeric antigen receptor mediate regression of human disseminated tumor.

    Science.gov (United States)

    Zhao, Yangbing; Moon, Edmund; Carpenito, Carmine; Paulos, Chrystal M; Liu, Xiaojun; Brennan, Andrea L; Chew, Anne; Carroll, Richard G; Scholler, John; Levine, Bruce L; Albelda, Steven M; June, Carl H

    2010-11-15

    Redirecting T lymphocyte antigen specificity by gene transfer can provide large numbers of tumor-reactive T lymphocytes for adoptive immunotherapy. However, safety concerns associated with viral vector production have limited clinical application of T cells expressing chimeric antigen receptors (CAR). T lymphocytes can be gene modified by RNA electroporation without integration-associated safety concerns. To establish a safe platform for adoptive immunotherapy, we first optimized the vector backbone for RNA in vitro transcription to achieve high-level transgene expression. CAR expression and function of RNA-electroporated T cells could be detected up to a week after electroporation. Multiple injections of RNA CAR-electroporated T cells mediated regression of large vascularized flank mesothelioma tumors in NOD/scid/γc(-/-) mice. Dramatic tumor reduction also occurred when the preexisting intraperitoneal human-derived tumors, which had been growing in vivo for >50 days, were treated by multiple injections of autologous human T cells electroporated with anti-mesothelin CAR mRNA. This is the first report using matched patient tumor and lymphocytes showing that autologous T cells from cancer patients can be engineered to provide an effective therapy for a disseminated tumor in a robust preclinical model. Multiple injections of RNA-engineered T cells are a novel approach for adoptive cell transfer, providing flexible platform for the treatment of cancer that may complement the use of retroviral and lentiviral engineered T cells. This approach may increase the therapeutic index of T cells engineered to express powerful activation domains without the associated safety concerns of integrating viral vectors. Copyright © 2010 AACR.

  10. The Overall Odds Ratio as an Intuitive Effect Size Index for Multiple Logistic Regression: Examination of Further Refinements

    Science.gov (United States)

    Le, Huy; Marcus, Justin

    2012-01-01

    This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…

  11. Sensitivity of Microstructural Factors Influencing the Impact Toughness of Hypoeutectoid Steels with Ferrite-Pearlite Structure using Multiple Regression Analysis

    International Nuclear Information System (INIS)

    Lee, Seung-Yong; Lee, Sang-In; Hwang, Byoung-chul

    2016-01-01

    In this study, the effect of microstructural factors on the impact toughness of hypoeutectoid steels with ferrite-pearlite structure was quantitatively investigated using multiple regression analysis. Microstructural analysis results showed that the pearlite fraction increased with increasing austenitizing temperature and decreasing transformation temperature which substantially decreased the pearlite interlamellar spacing and cementite thickness depending on carbon content. The impact toughness of hypoeutectoid steels usually increased as interlamellar spacing or cementite thickness decreased, although the impact toughness was largely associated with pearlite fraction. Based on these results, multiple regression analysis was performed to understand the individual effect of pearlite fraction, interlamellar spacing, and cementite thickness on the impact toughness. The regression analysis results revealed that pearlite fraction significantly affected impact toughness at room temperature, while cementite thickness did at low temperature.

  12. Predicting multi-level drug response with gene expression profile in multiple myeloma using hierarchical ordinal regression.

    Science.gov (United States)

    Zhang, Xinyan; Li, Bingzong; Han, Huiying; Song, Sha; Xu, Hongxia; Hong, Yating; Yi, Nengjun; Zhuang, Wenzhuo

    2018-05-10

    Multiple myeloma (MM), like other cancers, is caused by the accumulation of genetic abnormalities. Heterogeneity exists in the patients' response to treatments, for example, bortezomib. This urges efforts to identify biomarkers from numerous molecular features and build predictive models for identifying patients that can benefit from a certain treatment scheme. However, previous studies treated the multi-level ordinal drug response as a binary response where only responsive and non-responsive groups are considered. It is desirable to directly analyze the multi-level drug response, rather than combining the response to two groups. In this study, we present a novel method to identify significantly associated biomarkers and then develop ordinal genomic classifier using the hierarchical ordinal logistic model. The proposed hierarchical ordinal logistic model employs the heavy-tailed Cauchy prior on the coefficients and is fitted by an efficient quasi-Newton algorithm. We apply our hierarchical ordinal regression approach to analyze two publicly available datasets for MM with five-level drug response and numerous gene expression measures. Our results show that our method is able to identify genes associated with the multi-level drug response and to generate powerful predictive models for predicting the multi-level response. The proposed method allows us to jointly fit numerous correlated predictors and thus build efficient models for predicting the multi-level drug response. The predictive model for the multi-level drug response can be more informative than the previous approaches. Thus, the proposed approach provides a powerful tool for predicting multi-level drug response and has important impact on cancer studies.

  13. Extending the linear model with R generalized linear, mixed effects and nonparametric regression models

    CERN Document Server

    Faraway, Julian J

    2005-01-01

    Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway''s critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author''s treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...

  14. Oil and gas pipeline construction cost analysis and developing regression models for cost estimation

    Science.gov (United States)

    Thaduri, Ravi Kiran

    In this study, cost data for 180 pipelines and 136 compressor stations have been analyzed. On the basis of the distribution analysis, regression models have been developed. Material, Labor, ROW and miscellaneous costs make up the total cost of a pipeline construction. The pipelines are analyzed based on different pipeline lengths, diameter, location, pipeline volume and year of completion. In a pipeline construction, labor costs dominate the total costs with a share of about 40%. Multiple non-linear regression models are developed to estimate the component costs of pipelines for various cross-sectional areas, lengths and locations. The Compressor stations are analyzed based on the capacity, year of completion and location. Unlike the pipeline costs, material costs dominate the total costs in the construction of compressor station, with an average share of about 50.6%. Land costs have very little influence on the total costs. Similar regression models are developed to estimate the component costs of compressor station for various capacities and locations.

  15. A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs

    Science.gov (United States)

    Karabatsos, George; Walker, Stephen G.

    2013-01-01

    The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…

  16. Parametric vs. Nonparametric Regression Modelling within Clinical Decision Support

    Czech Academy of Sciences Publication Activity Database

    Kalina, Jan; Zvárová, Jana

    2017-01-01

    Roč. 5, č. 1 (2017), s. 21-27 ISSN 1805-8698 R&D Projects: GA ČR GA17-01251S Institutional support: RVO:67985807 Keywords : decision support systems * decision rules * statistical analysis * nonparametric regression Subject RIV: IN - Informatics, Computer Science OBOR OECD: Statistics and probability

  17. Logistic regression modelling: procedures and pitfalls in developing and interpreting prediction models

    Directory of Open Access Journals (Sweden)

    Nataša Šarlija

    2017-01-01

    Full Text Available This study sheds light on the most common issues related to applying logistic regression in prediction models for company growth. The purpose of the paper is 1 to provide a detailed demonstration of the steps in developing a growth prediction model based on logistic regression analysis, 2 to discuss common pitfalls and methodological errors in developing a model, and 3 to provide solutions and possible ways of overcoming these issues. Special attention is devoted to the question of satisfying logistic regression assumptions, selecting and defining dependent and independent variables, using classification tables and ROC curves, for reporting model strength, interpreting odds ratios as effect measures and evaluating performance of the prediction model. Development of a logistic regression model in this paper focuses on a prediction model of company growth. The analysis is based on predominantly financial data from a sample of 1471 small and medium-sized Croatian companies active between 2009 and 2014. The financial data is presented in the form of financial ratios divided into nine main groups depicting following areas of business: liquidity, leverage, activity, profitability, research and development, investing and export. The growth prediction model indicates aspects of a business critical for achieving high growth. In that respect, the contribution of this paper is twofold. First, methodological, in terms of pointing out pitfalls and potential solutions in logistic regression modelling, and secondly, theoretical, in terms of identifying factors responsible for high growth of small and medium-sized companies.

  18. Credit Scoring Problem Based on Regression Analysis

    OpenAIRE

    Khassawneh, Bashar Suhil Jad Allah

    2014-01-01

    ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....

  19. A Generalized Logistic Regression Procedure to Detect Differential Item Functioning among Multiple Groups

    Science.gov (United States)

    Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul

    2011-01-01

    We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…

  20. Modelling subject-specific childhood growth using linear mixed-effect models with cubic regression splines.

    Science.gov (United States)

    Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William

    2016-01-01

    Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19

  1. Evaluation of Logistic Regression and Multivariate Adaptive Regression Spline Models for Groundwater Potential Mapping Using R and GIS

    Directory of Open Access Journals (Sweden)

    Soyoung Park

    2017-07-01

    Full Text Available This study mapped and analyzed groundwater potential using two different models, logistic regression (LR and multivariate adaptive regression splines (MARS, and compared the results. A spatial database was constructed for groundwater well data and groundwater influence factors. Groundwater well data with a high potential yield of ≥70 m3/d were extracted, and 859 locations (70% were used for model training, whereas the other 365 locations (30% were used for model validation. We analyzed 16 groundwater influence factors including altitude, slope degree, slope aspect, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport index, distance from drainage, drainage density, lithology, distance from fault, fault density, distance from lineament, lineament density, and land cover. Groundwater potential maps (GPMs were constructed using LR and MARS models and tested using a receiver operating characteristics curve. Based on this analysis, the area under the curve (AUC for the success rate curve of GPMs created using the MARS and LR models was 0.867 and 0.838, and the AUC for the prediction rate curve was 0.836 and 0.801, respectively. This implies that the MARS model is useful and effective for groundwater potential analysis in the study area.

  2. Semiparametric Mixtures of Regressions with Single-index for Model Based Clustering

    OpenAIRE

    Xiang, Sijia; Yao, Weixin

    2017-01-01

    In this article, we propose two classes of semiparametric mixture regression models with single-index for model based clustering. Unlike many semiparametric/nonparametric mixture regression models that can only be applied to low dimensional predictors, the new semiparametric models can easily incorporate high dimensional predictors into the nonparametric components. The proposed models are very general, and many of the recently proposed semiparametric/nonparametric mixture regression models a...

  3. Semiparametric nonlinear quantile regression model for financial returns

    Czech Academy of Sciences Publication Activity Database

    Avdulaj, Krenar; Baruník, Jozef

    2017-01-01

    Roč. 21, č. 1 (2017), s. 81-97 ISSN 1081-1826 R&D Projects: GA ČR(CZ) GBP402/12/G097 Institutional support: RVO:67985556 Keywords : copula quantile regression * realized volatility * value-at-risk Subject RIV: AH - Economic s OBOR OECD: Applied Economic s, Econometrics Impact factor: 0.649, year: 2016 http://library.utia.cas.cz/separaty/2017/E/avdulaj-0472346.pdf

  4. Examination of Parameters Affecting the House Prices by Multiple Regression Analysis and its Contributions to Earthquake-Based Urban Transformation

    Science.gov (United States)

    Denli, H. H.; Durmus, B.

    2016-12-01

    The purpose of this study is to examine the factors which may affect the apartment prices with multiple linear regression analysis models and visualize the results by value maps. The study is focused on a county of Istanbul - Turkey. Totally 390 apartments around the county Umraniye are evaluated due to their physical and locational conditions. The identification of factors affecting the price of apartments in the county with a population of approximately 600k is expected to provide a significant contribution to the apartment market.Physical factors are selected as the age, number of rooms, size, floor numbers of the building and the floor that the apartment is positioned in. Positional factors are selected as the distances to the nearest hospital, school, park and police station. Totally ten physical and locational parameters are examined by regression analysis.After the regression analysis has been performed, value maps are composed from the parameters age, price and price per square meters. The most significant of the composed maps is the price per square meters map. Results show that the location of the apartment has the most influence to the square meter price information of the apartment. A different practice is developed from the composed maps by searching the ability of using price per square meters map in urban transformation practices. By marking the buildings older than 15 years in the price per square meters map, a different and new interpretation has been made to determine the buildings, to which should be given priority during an urban transformation in the county.This county is very close to the North Anatolian Fault zone and is under the threat of earthquakes. By marking the apartments older than 15 years on the price per square meters map, both older and expensive square meters apartments list can be gathered. By the help of this list, the priority could be given to the selected higher valued old apartments to support the economy of the country

  5. Gingival crevicular fluid alkaline phosphatase activity in relation to pubertal growth spurt and dental maturation: A multiple regression study

    Directory of Open Access Journals (Sweden)

    Perinetti, G.

    2016-04-01

    Full Text Available Introduction: The identification of the onset of the pubertal growth spurt has major clinical implications when dealing with orthodontic treatment in growing subjects. Aim: Through multivariate methods, this study evaluated possible relationships between the gingival crevicular fluid (GCF alkaline phosphatase (ALP activity and pubertal growth spurt and dentition phase. Materials and methods: One hundred healthy growing subjects (62 females, 38 males; mean age, 11.5±2.4 years were enrolled into this doubleblind, prospective, cross-sectional-design study. Phases of skeletal maturation (pre - pubertal, pubertal, post - pubertal was assessed using the cervical vertebral maturation method. Samples of GCF for the ALP activity determination were collected at the mesial and distal sites of the mandibular central incisors. The phases of the dentition were recorded as intermediate mixed, late mixed, or permanent. A multinomial multiple logistic regression model was used to assess relationships of the enzymatic activity to growth phases and dentition phases. Results: The GCF ALP activity was greater in the pubertal growth phase as compared to the pre - pubertal and post - pubertal growth phases. Significant adjusted odds ratios for the GCF ALP activity for the pre - pubertal and post - pubertal subjects, in relation to the pubertal group, were 0.76 and 0.84, respectively. No significant correlations were seen for the dentition phase. Conclusions: The GCF ALP activity is a valid candidate as a non - invasive biomarker for the identification of the pubertal growth spurt irrespective of the dentition phase.

  6. Beta Regression Finite Mixture Models of Polarization and Priming

    Science.gov (United States)

    Smithson, Michael; Merkle, Edgar C.; Verkuilen, Jay

    2011-01-01

    This paper describes the application of finite-mixture general linear models based on the beta distribution to modeling response styles, polarization, anchoring, and priming effects in probability judgments. These models, in turn, enhance our capacity for explicitly testing models and theories regarding the aforementioned phenomena. The mixture…

  7. An Investigation of the Fit of Linear Regression Models to Data from an SAT[R] Validity Study. Research Report 2011-3

    Science.gov (United States)

    Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael

    2011-01-01

    This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…

  8. Comparing cluster-level dynamic treatment regimens using sequential, multiple assignment, randomized trials: Regression estimation and sample size considerations.

    Science.gov (United States)

    NeCamp, Timothy; Kilbourne, Amy; Almirall, Daniel

    2017-08-01

    Cluster-level dynamic treatment regimens can be used to guide sequential treatment decision-making at the cluster level in order to improve outcomes at the individual or patient-level. In a cluster-level dynamic treatment regimen, the treatment is potentially adapted and re-adapted over time based on changes in the cluster that could be impacted by prior intervention, including aggregate measures of the individuals or patients that compose it. Cluster-randomized sequential multiple assignment randomized trials can be used to answer multiple open questions preventing scientists from developing high-quality cluster-level dynamic treatment regimens. In a cluster-randomized sequential multiple assignment randomized trial, sequential randomizations occur at the cluster level and outcomes are observed at the individual level. This manuscript makes two contributions to the design and analysis of cluster-randomized sequential multiple assignment randomized trials. First, a weighted least squares regression approach is proposed for comparing the mean of a patient-level outcome between the cluster-level dynamic treatment regimens embedded in a sequential multiple assignment randomized trial. The regression approach facilitates the use of baseline covariates which is often critical in the analysis of cluster-level trials. Second, sample size calculators are derived for two common cluster-randomized sequential multiple assignment randomized trial designs for use when the primary aim is a between-dynamic treatment regimen comparison of the mean of a continuous patient-level outcome. The methods are motivated by the Adaptive Implementation of Effective Programs Trial which is, to our knowledge, the first-ever cluster-randomized sequential multiple assignment randomized trial in psychiatry.

  9. Total-Factor Energy Efficiency (TFEE Evaluation on Thermal Power Industry with DEA, Malmquist and Multiple Regression Techniques

    Directory of Open Access Journals (Sweden)

    Jin-Peng Liu

    2017-07-01

    Full Text Available Under the background of a new round of power market reform, realizing the goals of energy saving and emission reduction, reducing the coal consumption and ensuring the sustainable development are the key issues for thermal power industry. With the biggest economy and energy consumption scales in the world, China should promote the energy efficiency of thermal power industry to solve these problems. Therefore, from multiple perspectives, the factors influential to the energy efficiency of thermal power industry were identified. Based on the economic, social and environmental factors, a combination model with Data Envelopment Analysis (DEA and Malmquist index was constructed to evaluate the total-factor energy efficiency (TFEE in thermal power industry. With the empirical studies from national and provincial levels, the TFEE index can be factorized into the technical efficiency index (TECH, the technical progress index (TPCH, the pure efficiency index (PECH and the scale efficiency index (SECH. The analysis showed that the TFEE was mainly determined by TECH and PECH. Meanwhile, by panel data regression model, unit coal consumption, talents and government supervision were selected as important indexes to have positive effects on TFEE in thermal power industry. In addition, the negative indexes, such as energy price and installed capacity, were also analyzed to control their undesired effects. Finally, considering the analysis results, measures for improving energy efficiency of thermal power industry were discussed widely, such as strengthening technology research and design (R&D, enforcing pollutant and emission reduction, distributing capital and labor rationally and improving the government supervision. Relative study results and suggestions can provide references for Chinese government and enterprises to enhance the energy efficiency level.

  10. A generalized exponential time series regression model for electricity prices

    DEFF Research Database (Denmark)

    Haldrup, Niels; Knapik, Oskar; Proietti, Tomasso

    on the estimated model, the best linear predictor is constructed. Our modeling approach provides good fit within sample and outperforms competing benchmark predictors in terms of forecasting accuracy. We also find that building separate models for each hour of the day and averaging the forecasts is a better...

  11. The use of artificial neural network analysis and multiple regression for trap quality evaluation: a case study of the Northern Kuqa Depression of Tarim Basin in western China

    Energy Technology Data Exchange (ETDEWEB)

    Guangren Shi; Xingxi Zhou; Guangya Zhang; Xiaofeng Shi; Honghui Li [Research Institute of Petroleum Exploration and Development, Beijing (China)

    2004-03-01

    Artificial neural network analysis is found to be far superior to multiple regression when applied to the evaluation of trap quality in the Northern Kuqa Depression, a gas-rich depression of Tarim Basin in western China. This is because this technique can correlate the complex and non-linear relationship between trap quality and related geological factors, whereas multiple regression can only describe a linear relationship. However, multiple regression can work as an auxiliary tool, as it is suited to high-speed calculations and can indicate the degree of dependence between the trap quality and its related geological factors which artificial neural network analysis cannot. For illustration, we have investigated 30 traps in the Northern Kuqa Depression. For each of the traps, the values of 14 selected geological factors were all known. While geologists were also able to assign individual trap quality values to 27 traps, they were less certain about the values for the other three traps. Multiple regression and artificial neural network analysis were, therefore, respectively used to ascertain these values. Data for the 27 traps were used as known sample data, while the three traps were used as prediction candidates. Predictions from artificial neural network analysis are found to agree with exploration results: where simulation predicted high trap quality, commercial quality flows were afterwards found, and where low trap quality is indicated, no such discoveries have yet been made. On the other hand, multiple regression results indicate the order of dependence of the trap quality on geological factors, which reconciles with what geologists have commonly recognized. We can conclude, therefore, that the application of artificial neural network analysis with the aid of multiple regression to trap evaluation in the Northern Kuqa Depression has been quite successful. To ensure the precision of the above mentioned geological factors and their related parameters for each

  12. Forecast Model of Urban Stagnant Water Based on Logistic Regression

    Directory of Open Access Journals (Sweden)

    Liu Pan

    2017-01-01

    Full Text Available With the development of information technology, the construction of water resource system has been gradually carried out. In the background of big data, the work of water information needs to carry out the process of quantitative to qualitative change. Analyzing the correlation of data and exploring the deep value of data which are the key of water information’s research. On the basis of the research on the water big data and the traditional data warehouse architecture, we try to find out the connection of different data source. According to the temporal and spatial correlation of stagnant water and rainfall, we use spatial interpolation to integrate data of stagnant water and rainfall which are from different data source and different sensors, then use logistic regression to find out the relationship between them.

  13. Parental Vaccine Acceptance: A Logistic Regression Model Using Previsit Decisions.

    Science.gov (United States)

    Lee, Sara; Riley-Behringer, Maureen; Rose, Jeanmarie C; Meropol, Sharon B; Lazebnik, Rina

    2017-07-01

    This study explores how parents' intentions regarding vaccination prior to their children's visit were associated with actual vaccine acceptance. A convenience sample of parents accompanying 6-week-old to 17-year-old children completed a written survey at 2 pediatric practices. Using hierarchical logistic regression, for hospital-based participants (n = 216), vaccine refusal history ( P < .01) and vaccine decision made before the visit ( P < .05) explained 87% of vaccine refusals. In community-based participants (n = 100), vaccine refusal history ( P < .01) explained 81% of refusals. Over 1 in 5 parents changed their minds about vaccination during the visit. Thirty parents who were previous vaccine refusers accepted current vaccines, and 37 who had intended not to vaccinate choose vaccination. Twenty-nine parents without a refusal history declined vaccines, and 32 who did not intend to refuse before the visit declined vaccination. Future research should identify key factors to nudge parent decision making in favor of vaccination.

  14. Genomic prediction based on data from three layer lines using non-linear regression models.

    Science.gov (United States)

    Huang, Heyun; Windig, Jack J; Vereijken, Addie; Calus, Mario P L

    2014-11-06

    Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values. When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction. Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional

  15. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    Directory of Open Access Journals (Sweden)

    Drzewiecki Wojciech

    2016-12-01

    Full Text Available In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques.

  16. Decreasing Multicollinearity: A Method for Models with Multiplicative Functions.

    Science.gov (United States)

    Smith, Kent W.; Sasaki, M. S.

    1979-01-01

    A method is proposed for overcoming the problem of multicollinearity in multiple regression equations where multiplicative independent terms are entered. The method is not a ridge regression solution. (JKS)

  17. Comparison of Adaline and Multiple Linear Regression Methods for Rainfall Forecasting

    Science.gov (United States)

    Sutawinaya, IP; Astawa, INGA; Hariyanti, NKD

    2018-01-01

    Heavy rainfall can cause disaster, therefore need a forecast to predict rainfall intensity. Main factor that cause flooding is there is a high rainfall intensity and it makes the river become overcapacity. This will cause flooding around the area. Rainfall factor is a dynamic factor, so rainfall is very interesting to be studied. In order to support the rainfall forecasting, there are methods that can be used from Artificial Intelligence (AI) to statistic. In this research, we used Adaline for AI method and Regression for statistic method. The more accurate forecast result shows the method that used is good for forecasting the rainfall. Through those methods, we expected which is the best method for rainfall forecasting here.

  18. Exergy Analysis of a Subcritical Reheat Steam Power Plant with Regression Modeling and Optimization

    Directory of Open Access Journals (Sweden)

    MUHIB ALI RAJPER

    2016-07-01

    Full Text Available In this paper, exergy analysis of a 210 MW SPP (Steam Power Plant is performed. Firstly, the plant is modeled and validated, followed by a parametric study to show the effects of various operating parameters on the performance parameters. The net power output, energy efficiency, and exergy efficiency are taken as the performance parameters, while the condenser pressure, main steam pressure, bled steam pressures, main steam temperature, and reheat steam temperature isnominated as the operating parameters. Moreover, multiple polynomial regression models are developed to correlate each performance parameter with the operating parameters. The performance is then optimizedby using Direct-searchmethod. According to the results, the net power output, energy efficiency, and exergy efficiency are calculated as 186.5 MW, 31.37 and 30.41%, respectively under normal operating conditions as a base case. The condenser is a major contributor towards the energy loss, followed by the boiler, whereas the highest irreversibilities occur in the boiler and turbine. According to the parametric study, variation in the operating parameters greatly influences the performance parameters. The regression models have appeared to be a good estimator of the performance parameters. The optimum net power output, energy efficiency and exergy efficiency are obtained as 227.6 MW, 37.4 and 36.4, respectively, which have been calculated along with optimal values of selected operating parameters.

  19. Statistical experiments using the multiple regression research for prediction of proper hardness in areas of phosphorus cast-iron brake shoes manufacturing

    Science.gov (United States)

    Kiss, I.; Cioată, V. G.; Ratiu, S. A.; Rackov, M.; Penčić, M.

    2018-01-01

    Multivariate research is important in areas of cast-iron brake shoes manufacturing, because many variables interact with each other simultaneously. This article focuses on expressing the multiple linear regression model related to the hardness assurance by the chemical composition of the phosphorous cast irons destined to the brake shoes, having in view that the regression coefficients will illustrate the unrelated contributions of each independent variable towards predicting the dependent variable. In order to settle the multiple correlations between the hardness of the cast-iron brake shoes, and their chemical compositions several regression equations has been proposed. Is searched a mathematical solution which can determine the optimum chemical composition for the hardness desirable values. Starting from the above-mentioned affirmations two new statistical experiments are effectuated related to the values of Phosphorus [P], Manganese [Mn] and Silicon [Si]. Therefore, the regression equations, which describe the mathematical dependency between the above-mentioned elements and the hardness, are determined. As result, several correlation charts will be revealed.

  20. Misspecified poisson regression models for large-scale registry data

    DEFF Research Database (Denmark)

    Grøn, Randi; Gerds, Thomas A.; Andersen, Per K.

    2016-01-01

    working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods...

  1. Logistic regression model for detecting radon prone areas in Ireland.

    Science.gov (United States)

    Elío, J; Crowley, Q; Scanlon, R; Hodgson, J; Long, S

    2017-12-01

    A new high spatial resolution radon risk map of Ireland has been developed, based on a combination of indoor radon measurements (n=31,910) and relevant geological information (i.e. Bedrock Geology, Quaternary Geology, soil permeability and aquifer type). Logistic regression was used to predict the probability of having an indoor radon concentration above the national reference level of 200Bqm -3 in Ireland. The four geological datasets evaluated were found to be statistically significant, and, based on combinations of these four variables, the predicted probabilities ranged from 0.57% to 75.5%. Results show that the Republic of Ireland may be divided in three main radon risk categories: High (HR), Medium (MR) and Low (LR). The probability of having an indoor radon concentration above 200Bqm -3 in each area was found to be 19%, 8% and 3%; respectively. In the Republic of Ireland, the population affected by radon concentrations above 200Bqm -3 is estimated at ca. 460k (about 10% of the total population). Of these, 57% (265k), 35% (160k) and 8% (35k) are in High, Medium and Low Risk Areas, respectively. Our results provide a high spatial resolution utility which permit customised radon-awareness information to be targeted at specific geographic areas. Copyright © 2017 Elsevier B.V. All rights reserved.

  2. Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

    Science.gov (United States)

    Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

    2016-03-01

    In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Augmented chaos-multiple linear regression approach for prediction of wave parameters

    Directory of Open Access Journals (Sweden)

    M.A. Ghorbani

    2017-06-01

    The inter-comparisons demonstrated that the Chaos-MLR and pure MLR models yield almost the same accuracy in predicting the significant wave heights and the zero-up-crossing wave periods. Whereas, the augmented Chaos-MLR model is performed better results in term of the prediction accuracy vis-a-vis the previous prediction applications of the same case study.

  4. Logistic Regression Modeling of Diminishing Manufacturing Sources for Integrated Circuits

    National Research Council Canada - National Science Library

    Gravier, Michael

    1999-01-01

    .... This thesis draws on available data from the electronics integrated circuit industry to attempt to assess whether statistical modeling offers a viable method for predicting the presence of DMSMS...

  5. Combined genetic algorithm and multiple linear regression (GA-MLR) optimizer: Application to multi-exponential fluorescence decay surface.

    Science.gov (United States)

    Fisz, Jacek J

    2006-12-07

    The optimization approach based on the genetic algorithm (GA) combined with multiple linear regression (MLR) method, is discussed. The GA-MLR optimizer is designed for the nonlinear least-squares problems in which the model functions are linear combinations of nonlinear functions. GA optimizes the nonlinear parameters, and the linear parameters are calculated from MLR. GA-MLR is an intuitive optimization approach and it exploits all advantages of the genetic algorithm technique. This optimization method results from an appropriate combination of two well-known optimization methods. The MLR method is embedded in the GA optimizer and linear and nonlinear model parameters are optimized in parallel. The MLR method is the only one strictly mathematical "tool" involved in GA-MLR. The GA-MLR approach simplifies and accelerates considerably the optimization process because the linear parameters are not the fitted ones. Its properties are exemplified by the analysis of the kinetic biexponential fluorescence decay surface corresponding to a two-excited-state interconversion process. A short discussion of the variable projection (VP) algorithm, designed for the same class of the optimization problems, is presented. VP is a very advanced mathematical formalism that involves the methods of nonlinear functionals, algebra of linear projectors, and the formalism of Fréchet derivatives and pseudo-inverses. Additional explanatory comments are added on the application of recently introduced the GA-NR optimizer to simultaneous recovery of linear and weakly nonlinear parameters occurring in the same optimization problem together with nonlinear parameters. The GA-NR optimizer combines the GA method with the NR method, in which the minimum-value condition for the quadratic approximation to chi(2), obtained from the Taylor series expansion of chi(2), is recovered by means of the Newton-Raphson algorithm. The application of the GA-NR optimizer to model functions which are multi

  6. Downscaling of surface moisture flux and precipitation in the Ebro Valley (Spain using analogues and analogues followed by random forests and multiple linear regression

    Directory of Open Access Journals (Sweden)

    G. Ibarra-Berastegi

    2011-06-01

    Full Text Available In this paper, reanalysis fields from the ECMWF have been statistically downscaled to predict from large-scale atmospheric fields, surface moisture flux and daily precipitation at two observatories (Zaragoza and Tortosa, Ebro Valley, Spain during the 1961–2001 period. Three types of downscaling models have been built: (i analogues, (ii analogues followed by random forests and (iii analogues followed by multiple linear regression. The inputs consist of data (predictor fields taken from the ERA-40 reanalysis. The predicted fields are precipitation and surface moisture flux as measured at the two observatories. With the aim to reduce the dimensionality of the problem, the ERA-40 fields have been decomposed using empirical orthogonal functions. Available daily data has been divided into two parts: a training period used to find a group of about 300 analogues to build the downscaling model (1961–1996 and a test period (1997–2001, where models' performance has been assessed using independent data. In the case of surface moisture flux, the models based on analogues followed by random forests do not clearly outperform those built on analogues plus multiple linear regression, while simple averages calculated from the nearest analogues found in the training period, yielded only slightly worse results. In the case of precipitation, the three types of model performed equally. These results suggest that most of the models' downscaling capabilities can be attributed to the analogues-calculation stage.

  7. Evaluation of the comprehensive palatability of Japanese sake paired with dishes by multiple regression analysis based on subdomains.

    Science.gov (United States)

    Nakamura, Ryo; Nakano, Kumiko; Tamura, Hiroyasu; Mizunuma, Masaki; Fushiki, Tohru; Hirata, Dai

    2017-08-01

    Many factors contribute to palatability. In order to evaluate the palatability of Japanese alcohol sake paired with certain dishes by integrating multiple factors, here we applied an evaluation method previously reported for palatability of cheese by multiple regression analysis based on 3 subdomain factors (rewarding, cultural, and informational). We asked 94 Japanese participants/subjects to evaluate the palatability of sake (1st evaluation/E1 for the first cup, 2nd/E2 and 3rd/E3 for the palatability with aftertaste/afterglow of certain dishes) and to respond to a questionnaire related to 3 subdomains. In E1, 3 factors were extracted by a factor analysis, and the subsequent multiple regression analyses indicated that the palatability of sake was interpreted by mainly the rewarding. Further, the results of attribution-dissections in E1 indicated that 2 factors (rewarding and informational) contributed to the palatability. Finally, our results indicated that the palatability of sake was influenced by the dish eaten just before drinking.

  8. Early Parallel Activation of Semantics and Phonology in Picture Naming: Evidence from a Multiple Linear Regression MEG Study.

    Science.gov (United States)

    Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf

    2015-10-01

    The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200-400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset. © The Author 2014. Published by Oxford University Press.

  9. Association between response rates and survival outcomes in patients with newly diagnosed multiple myeloma. A systematic review and meta-regression analysis.

    Science.gov (United States)

    Mainou, Maria; Madenidou, Anastasia-Vasiliki; Liakos, Aris; Paschos, Paschalis; Karagiannis, Thomas; Bekiari, Eleni; Vlachaki, Efthymia; Wang, Zhen; Murad, Mohammad Hassan; Kumar, Shaji; Tsapas, Apostolos

    2017-06-01

    We performed a systematic review and meta-regression analysis of randomized control trials to investigate the association between response to initial treatment and survival outcomes in patients with newly diagnosed multiple myeloma (MM). Response outcomes included complete response (CR) and the combined outcome of CR or very good partial response (VGPR), while survival outcomes were overall survival (OS) and progression-free survival (PFS). We used random-effect meta-regression models and conducted sensitivity analyses based on definition of CR and study quality. Seventy-two trials were included in the systematic review, 63 of which contributed data in meta-regression analyses. There was no association between OS and CR in patients without autologous stem cell transplant (ASCT) (regression coefficient: .02, 95% confidence interval [CI] -0.06, 0.10), in patients undergoing ASCT (-.11, 95% CI -0.44, 0.22) and in trials comparing ASCT with non-ASCT patients (.04, 95% CI -0.29, 0.38). Similarly, OS did not correlate with the combined metric of CR or VGPR, and no association was evident between response outcomes and PFS. Sensitivity analyses yielded similar results. This meta-regression analysis suggests that there is no association between conventional response outcomes and survival in patients with newly diagnosed MM. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  10. Data to support "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations & Biological Condition"

    Data.gov (United States)

    U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...

  11. Martingale Regressions for a Continuous Time Model of Exchange Rates

    OpenAIRE

    Guo, Zi-Yi

    2017-01-01

    One of the daunting problems in international finance is the weak explanatory power of existing theories of the nominal exchange rates, the so-called “foreign exchange rate determination puzzle”. We propose a continuous-time model to study the impact of order flow on foreign exchange rates. The model is estimated by a newly developed econometric tool based on a time-change sampling from calendar to volatility time. The estimation results indicate that the effect of order flow on exchange rate...

  12. Focused information criterion and model averaging based on weighted composite quantile regression

    KAUST Repository

    Xu, Ganggang; Wang, Suojin; Huang, Jianhua Z.

    2013-01-01

    We study the focused information criterion and frequentist model averaging and their application to post-model-selection inference for weighted composite quantile regression (WCQR) in the context of the additive partial linear models. With the non

  13. pKa prediction for acidic phosphorus-containing compounds using multiple linear regression with computational descriptors.

    Science.gov (United States)

    Yu, Donghai; Du, Ruobing; Xiao, Ji-Chang

    2016-07-05

    Ninety-six acidic phosphorus-containing molecules with pKa 1.88 to 6.26 were collected and divided into training and test sets by random sampling. Structural parameters were obtained by density functional theory calculation of the molecules. The relationship between the experimental pKa values and structural parameters was obtained by multiple linear regression fitting for the training set, and tested with the test set; the R(2) values were 0.974 and 0.966 for the training and test sets, respectively. This regression equation, which quantitatively describes the influence of structural parameters on pKa , and can be used to predict pKa values of similar structures, is significant for the design of new acidic phosphorus-containing extractants. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  14. Estimation of nutrients and organic matter in Korean swine slurry using multiple regression analysis of physical and chemical properties.

    Science.gov (United States)

    Suresh, Arumuganainar; Choi, Hong Lim

    2011-10-01

    Swine waste land application has increased due to organic fertilization, but excess application in an arable system can cause environmental risk. Therefore, in situ characterizations of such resources are important prior to application. To explore this, 41 swine slurry samples were collected from Korea, and wide differences were observed in the physico-biochemical properties. However, significant (Phydrometer, EC meter, drying oven and pH meter were found useful to estimate Mn, Fe, Ca, K, Al, Na, N and 5-day biochemical oxygen demands (BOD₅) at improved R² values of 0.83, 0.82, 0.77, 0.75, 0.67, 0.47, 0.88 and 0.70, respectively. The results from this study suggest that multiple property regressions can facilitate the prediction of micronutrients and organic matter much better than a single property regression for livestock waste. Copyright © 2011 Elsevier Ltd. All rights reserved.

  15. Cox's regression model for dynamics of grouped unemployment data

    Czech Academy of Sciences Publication Activity Database

    Volf, Petr

    2003-01-01

    Roč. 10, č. 19 (2003), s. 151-162 ISSN 1212-074X R&D Projects: GA ČR GA402/01/0539 Institutional research plan: CEZ:AV0Z1075907 Keywords : mathematical statistics * survival analysis * Cox's model Subject RIV: BB - Applied Statistics, Operational Research

  16. Inflation, Forecast Intervals and Long Memory Regression Models

    NARCIS (Netherlands)

    C.S. Bos (Charles); Ph.H.B.F. Franses (Philip Hans); M. Ooms (Marius)

    2001-01-01

    textabstractWe examine recursive out-of-sample forecasting of monthly postwar U.S. core inflation and log price levels. We use the autoregressive fractionally integrated moving average model with explanatory variables (ARFIMAX). Our analysis suggests a significant explanatory power of leading

  17. Inflation, Forecast Intervals and Long Memory Regression Models

    NARCIS (Netherlands)

    Ooms, M.; Bos, C.S.; Franses, P.H.

    2003-01-01

    We examine recursive out-of-sample forecasting of monthly postwar US core inflation and log price levels. We use the autoregressive fractionally integrated moving average model with explanatory variables (ARFIMAX). Our analysis suggests a significant explanatory power of leading indicators

  18. Data-driven modelling of LTI systems using symbolic regression

    NARCIS (Netherlands)

    Khandelwal, D.; Toth, R.; Van den Hof, P.M.J.

    2017-01-01

    The aim of this project is to automate the task of data-driven identification of dynamical systems. The underlying goal is to develop an identification tool that models a physical system without distinguishing between classes of systems such as linear, nonlinear or possibly even hybrid systems. Such

  19. A comparison of multiple regression and neural network techniques for mapping in situ pCO2 data

    International Nuclear Information System (INIS)

    Lefevre, Nathalie; Watson, Andrew J.; Watson, Adam R.

    2005-01-01

    Using about 138,000 measurements of surface pCO 2 in the Atlantic subpolar gyre (50-70 deg N, 60-10 deg W) during 1995-1997, we compare two methods of interpolation in space and time: a monthly distribution of surface pCO 2 constructed using multiple linear regressions on position and temperature, and a self-organizing neural network approach. Both methods confirm characteristics of the region found in previous work, i.e. the subpolar gyre is a sink for atmospheric CO 2 throughout the year, and exhibits a strong seasonal variability with the highest undersaturations occurring in spring and summer due to biological activity. As an annual average the surface pCO 2 is higher than estimates based on available syntheses of surface pCO 2 . This supports earlier suggestions that the sink of CO 2 in the Atlantic subpolar gyre has decreased over the last decade instead of increasing as previously assumed. The neural network is able to capture a more complex distribution than can be well represented by linear regressions, but both techniques agree relatively well on the average values of pCO 2 and derived fluxes. However, when both techniques are used with a subset of the data, the neural network predicts the remaining data to a much better accuracy than the regressions, with a residual standard deviation ranging from 3 to 11 μatm. The subpolar gyre is a net sink of CO 2 of 0.13 Gt-C/yr using the multiple linear regressions and 0.15 Gt-C/yr using the neural network, on average between 1995 and 1997. Both calculations were made with the NCEP monthly wind speeds converted to 10 m height and averaged between 1995 and 1997, and using the gas exchange coefficient of Wanninkhof

  20. Multiple Indicator Stationary Time Series Models.

    Science.gov (United States)

    Sivo, Stephen A.

    2001-01-01

    Discusses the propriety and practical advantages of specifying multivariate time series models in the context of structural equation modeling for time series and longitudinal panel data. For time series data, the multiple indicator model specification improves on classical time series analysis. For panel data, the multiple indicator model…