Sample records for multiple regression analysis

  1. Multiple linear regression analysis

    Edwards, T. R.


    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.

  2. Remaining Phosphorus Estimate Through Multiple Regression Analysis



    The remaining phosphorus (Prem), the P concentration that remains in solution after shaking soil with 0.01 mol L-1 CaCl2 containing 60 μg mL-1 P, is a very useful index for studies related to the chemistry of variable charge soils. Although the Prem determination is a simple procedure, the possibility of estimating accurate values of this index from easily and/or routinely determined soil properties can be very useful for practical purposes. The present research evaluated Prem estimation through multiple regression analysis in which routinely determined soil chemical data, soil clay content and soil pH measured in 1 mol L-1 NaF (pHNaF) served as Prem predictor variables. Prem can be estimated with acceptable accuracy using this approach, and pHNaF not only substitutes for clay content as a predictor variable but also confers more accuracy to the Prem estimates.


    Erika KULCSÁR


    This paper analyzes the relationship between GDP in the hotels and restaurants sector, as the dependent variable, and the following independent variables: overnight stays in tourist reception establishments, arrivals in tourist reception establishments, and investments in the hotels and restaurants sector over the period 1995-2007. Using multiple regression analysis, I found that investments and tourist arrivals are significant predictors of the GDP dependent variable. Based on...

  4. Multiple regression analysis of cancer incidence around nuclear plant

    The results of a multiple regression analysis of cancer incidence in the vicinity of a nuclear plant are presented. No dependence on radiation factors (natural background, radioactive releases, total dose of all types of medical examinations) is established. At the same time, a relationship is revealed between releases of dangerous chemical substances and general cancer incidence, as well as the incidence of tumors of the lungs, trachea and bronchi and of carcinoma of hematopoietic tissue.


    Erika KULCSÁR


    This paper analyzes the relationship between GDP in the hotels and restaurants sector, as the dependent variable, and the following independent variables: overnight stays in tourist reception establishments, arrivals in tourist reception establishments, and investments in the hotels and restaurants sector over the period 1995-2007. Using multiple regression analysis, I found that investments and tourist arrivals are significant predictors of the GDP dependent variable. Based on these results, I identified those components of the marketing mix which, in my opinion, require investment and which could contribute to the positive development of tourist arrivals in tourist reception establishments.

  6. The analysis of the correlation between GDP, private and public consumption through multiple regression

    Constantin ANGHELACHE; Alexandru MANOLE; Madalina Gabriela ANGHEL


    The analysis of the correlation between indicators, through multiple regression, completes the information and conclusions drawn through the application of some simple regression models. Supplementary elements achieved by using multiple regression form an additional informational support for decision makers and analysts. This paper describes a correlation between the GDP, private and public consumption, through a multiple regression model. The model explains the influence of the two types of ...

  7. Multiple linear regression.

    Eberly, Lynn E


    This chapter describes multiple linear regression, a statistical approach used to describe the simultaneous associations of several variables with one continuous outcome. Important steps in using this approach include estimation and inference, variable selection in model building, and assessing model fit. The special cases of regression with interactions among the variables, polynomial regression, regressions with categorical (grouping) variables, and separate slopes models are also covered. Examples in microbiology are used throughout. PMID:18450050
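    A minimal sketch of the estimation step in multiple linear regression, assuming ordinary least squares solved through the normal equations (X'X)b = X'y. The function name fit_ols and the toy data are illustrative assumptions and do not come from the chapter.

```python
def fit_ols(X, y):
    """Return [intercept, b1, ..., bk] minimizing the sum of squared errors.

    X is a list of rows of predictor values, y the list of outcomes.
    Illustrative helper (not from the source); solves the normal equations
    by Gauss-Jordan elimination with partial pivoting.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Toy data generated exactly from y = 1 + 2*x1 + 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefficients = fit_ols(X, y)
```

    Solving the normal equations directly is adequate for small, well-conditioned problems; statistical packages generally prefer QR or SVD factorizations for numerical stability.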

  8. Business applications of multiple regression

    Richardson, Ronny


    This second edition of Business Applications of Multiple Regression describes the use of the statistical procedure called multiple regression in business situations, including forecasting and understanding the relationships between variables. The book assumes a basic understanding of statistics but reviews correlation analysis and simple regression to prepare the reader to understand and use multiple regression. The techniques described in the book are illustrated using both Microsoft Excel and a professional statistical program. Along the way, several real-world data sets are analyzed in deta

  9. Multiple regression analysis of the net income and consumption expenditure of Chinese rural households during 2007

    Da, Wa; Xiao, Hong; Zhuo, Ma


    We use the regression analysis method of multivariate statistical analysis to establish a multiple linear regression model about the net income and consumption expenditure of Chinese rural households during the year 2007. This paper analyzes the internal relation between the net income and consumption expenditure of Chinese rural households according to the regression result. Some reasonable suggestions are put forward for raising the income of rural households and stimulating consumption.

  10. An improved multiple linear regression and data analysis computer program package

    Sidik, S. M.


    NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

  11. Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis



    Budiman, Arisoesilaningsih E. 2012. Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis. Biodiversitas 13: 18-22. The aim of this research was to determine multiple regression models of the vegetative and corm growth of Amorphophallus muelleri Blume under several age variations and habitat conditions of agroforestry in East Java. A descriptive exploratory research method was applied, with systematic random sampling at five agroforestrie...

  12. Quantitative electron microscope autoradiography: application of multiple linear regression analysis

    A new method for the analysis of high resolution EM autoradiographs is described. It identifies labelled cell organelle profiles in sections on a strictly statistical basis and provides accurate estimates for their radioactivity without the need to make any assumptions about their size, shape and spatial arrangement. (author)

  13. Analysis of γ spectra in airborne radioactivity measurements using multiple linear regressions

    This paper describes the calculation of the net peak counts of the nuclide 137Cs at 662 keV in γ spectra from airborne radioactivity measurements using multiple linear regression. A mathematical model is built by analyzing every factor that contributes to the Cs peak counts in the spectra, and a multiple linear regression function is established. The calculation uses stepwise regression, and insignificant factors are eliminated by an F test. The regression results and their uncertainty are calculated by least-squares estimation, from which the net Cs peak counts and their uncertainty are obtained. Analysis results for an experimental spectrum are presented, and the influence of energy shift and energy resolution on the result is discussed. Compared with the spectrum stripping method, the multiple linear regression method does not need stripping ratios, the calculated result depends only on the counts in the Cs peak, and the calculation uncertainty is reduced. (authors)

  14. Noninvasive spectral imaging of skin chromophores based on multiple regression analysis aided by Monte Carlo simulation

    Nishidate, Izumi; Wiswadarma, Aditya; Hase, Yota; Tanaka, Noriyuki; Maeda, Takaaki; Niizeki, Kyuichi; Aizu, Yoshihisa


    In order to visualize melanin and blood concentrations and oxygen saturation in human skin tissue, a simple imaging technique based on multispectral diffuse reflectance images acquired at six wavelengths (500, 520, 540, 560, 580 and 600nm) was developed. The technique utilizes multiple regression analysis aided by Monte Carlo simulation for diffuse reflectance spectra. Using the absorbance spectrum as a response variable and the extinction coefficients of melanin, oxygenated hemoglobin, and deoxygenated hemoglobin as predictor variables, multiple regression analysis provides regression coefficients. Concentrations of melanin and total blood are then determined from the regression coefficients using conversion vectors that are deduced numerically in advance, while oxygen saturation is obtained directly from the regression coefficients. Experiments with a tissue-like agar gel phantom validated the method. In vivo experiments with human skin of the human hand during upper limb occlusion and of the inner forearm exposed to UV irradiation demonstrated the ability of the method to evaluate physiological reactions of human skin tissue.

  15. A multiple regression analysis for accurate background subtraction in 99Tcm-DTPA renography

    A technique for accurate background subtraction in 99Tcm-DTPA renography is described. The technique is based on a multiple regression analysis of the renal curves and separate heart and soft tissue curves which together represent background activity. It is compared, in over 100 renograms, with a previously described linear regression technique. Results show that the method provides accurate background subtraction, even in very poorly functioning kidneys, thus enabling relative renal filtration and excretion to be accurately estimated. (author)


    Godlevsky, L.; Lobasyuk, B.; Kobolev, E.; Luijtelaar, E.L.J.M. van; COENEN A.R.M.L.; Stepanenko, K.; Haghoel, Raz; Prybalovetz, T.


    Relationships between amplitude of penicillin-induced generalized epileptiform signals in different zones of the brain cortex (occipito-frontal bilateral leads, as well as occipital and frontal bipolar leads) were investigated in Wistar rats using multiple linear regression method of analysis. Results were expressed in the form of policycle multigrafs (multidimensional presentation) with the identification of significant (p

  17. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

    Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.


    Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…

  18. Regression Models for the Analysis of Longitudinal Gaussian Data from Multiple Sources



    We present a regression model for the joint analysis of longitudinal multiple source Gaussian data. Longitudinal multiple source data arise when repeated measurements are taken from two or more sources, and each source provides a measure of the same underlying variable and on the same scale. This type of data generally produces a relatively large number of observations per subject; thus estimation of an unstructured covariance matrix often may not be possible. We consider two methods by which...

  19. Multiple regression approach to mapping of quantitative trait loci (QTL) based on sib-pair data: a theoretical analysis

    Xiong, Momiao; Guo, Sunwei


    The interval mapping method has been shown to be a powerful tool for mapping QTL. However, it is still a challenge to perform a simultaneous analysis of several linked QTLs, and to isolate multiple linked QTLs. To circumvent these problems, multiple regression analysis has been suggested for experimental species. In this paper, the multiple regression approach is extended to human sib-pair data through multiple regression of the squared difference in trait values between two...

  20. A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis


    The growing volume of data creates a need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz. factor analysis and multiple linear regression, for different sample sizes on three uniqu...

  1. Fungible Weights in Multiple Regression

    Waller, Niels G.


    Every set of alternate weights (i.e., nonleast squares weights) in a multiple regression analysis with three or more predictors is associated with an infinite class of weights. All members of a given class can be deemed "fungible" because they yield identical SSE (sum of squared errors) and R² values. Equations for generating…

  2. Multiple Regression Analysis of Aroma Components and Sensory Evaluation of Miso

    Sugawara, Etsuko; SAIGA, Suguru; Kobayashi, Akio


    Among the several sensory characteristics used to evaluate the quality of miso (fermented bean paste), aroma is the most difficult one. If the results of chemical analysis of miso aroma could be transformed into numerical terms, the evaluation of miso might become easier. Therefore we investigated the relationship between aroma components and sensory scores of rice-miso by multiple regression analysis. Thirty-four rice-miso samples exhibited at the National Miso Competition were used. Each peak area of t...

  3. Multiple regression analysis of Jominy hardenability data for boron treated steels

    The relations between the chemical composition of boron treated steels and their hardenability have been investigated using a multiple regression analysis method. A linear model of regression was chosen. The free boron content that is effective for the hardenability was calculated using a model proposed by Jansson. The regression analysis for 1261 steel heats provided equations that were statistically significant at the 95% level. All heats met the specification according to the Nordic countries producers' classification. The variation in chemical composition explained typically 80 to 90% of the variation in the hardenability. In the regression analysis, elements which did not significantly contribute to the calculated hardness according to the F test were eliminated. Carbon, silicon, manganese, phosphorus and chromium were of importance at all Jominy distances; nickel, vanadium, boron and nitrogen at distances above 6 mm. After the regression analysis it was demonstrated that very few outliers, i.e. data points outside four times the standard deviation, were present in the data set. The model has successfully been used in industrial practice, replacing some of the necessary Jominy tests. (orig.)

  4. The Use of Rank Transformation and Multiple Regression Analysis in Estimating Residential Property Values With A Small Sample

    Timothy P. Cronan; Donald R. Epley; Larry G. Perry


    Conventional multiple regression analysis which has been used in estimating residential property values typically relies upon cardinal data. This paper argues that appraisal theory requires the appraiser to rank the comparables from best to worst and use a regression technique which can be applied to ordinal data. The rank regression procedure illustrated here was successfully used on small sample sizes, and did not violate the critical assumptions underlying conventional multiple regression....
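    A minimal sketch of the rank transformation step that a rank regression relies on: each variable is replaced by its ranks before an ordinary regression is fitted. The abstract does not say how ties are handled, so average ranks are an assumption here, and the function name rank_transform is illustrative.

```python
def rank_transform(values):
    """Replace each value by its 1-based rank; ties get the average rank.

    Illustrative helper (not from the paper). Average ranks for ties are
    an assumption, since the source does not specify tie handling.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


# Hypothetical sale prices of three comparables, two of them tied
price_ranks = rank_transform([250000, 250000, 180000])
```

    The ranked predictors and ranked response can then be passed to any least-squares routine in place of the raw values.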


    Nop Sopipan


    The aim of this study was to forecast the returns for the Stock Exchange of Thailand (SET) Index by adding some explanatory variables and a stationary autoregressive term of order p (AR(p)) in the mean equation of returns. In addition, we used Principal Component Analysis (PCA) to remove possible complications caused by multicollinearity. Results showed that the multiple regression based on PCA has the best performance.

  6. Inferring Preferences in Multiple Criteria Decision Analysis Using a Logistic Regression Model

    Theodor J Stewart


    A method is proposed for the analysis of multiple criteria decision making problems in an interactive environment, when decision-maker preferences are inconsistent with a simple utility model and/or are self-inconsistent (e.g., showing intransitivities). A maximum likelihood estimation procedure is invoked which is based on a logistic regression model relating the probability of selecting one decision option over another to a linear function of attribute values. The method is illustrated by a...

  7. Respiratory infections and their influence on lung function in children: a multiple regression analysis.

    Yarnell, J W; St Leger, A S


    The relationship between a history of respiratory infections (and associated variables) in children and lung function in later life was examined in a study among 228 children aged 7 to 11 years. In a multiple regression analysis only a few variables showed marked and consistent effects on lung function. Respiratory tract infections showed increasing impairment of lung function with repeated infections, but the impairment was smaller than that caused by current asthma.

  8. On Testing the Significance of the Coefficients in the Multiple Regression Analysis

    Kończak, Grzegorz


    Multiple regression analysis is a statistical tool for investigating relationships between the dependent and independent variables. There are some procedures for selecting a subset of given predictors. These procedures are widely available in statistical computer packages. The most often used are forward selection, backward selection and stepwise selection. These procedures rely on testing the significance of parameters. If some assumptions such as normality errors a...


    Abdelrafe Elzamly; Burairah Hussin


    Risk is not always avoidable, but it is controllable. The aim of this study is to identify whether those techniques are effective in reducing software failure. This motivates the authors to continue the effort to enrich software project risk management by considering mining and quantitative approaches with a large data set. In this study, two new techniques, namely stepwise multiple regression analysis and fuzzy multiple regression, are introduced to manage software risks. Two evaluation proc...


    K. Seetharaman


    This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, with the least-squares estimation method employed to estimate the parameters. The given input query image is segmented into various regions according to the structure of the image. The color and texture features are extracted from each region of the query image, and the features are fused together using the multiple linear regression model. The estimated parameters of the model, which is built from the features, form a vector called a feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images. The F-measure is applied to evaluate the performance of the proposed technique. The results show that the proposed technique is comparable to other existing techniques.
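    A minimal sketch of the Canberra distance used to compare two feature vectors, assuming the standard definition (the sum over components of |u_i - v_i| / (|u_i| + |v_i|)). Skipping components where both entries are zero is a common convention assumed here; the function name and the sample vectors are illustrative and not taken from the paper.

```python
def canberra_distance(u, v):
    """Canberra distance between two equal-length numeric vectors.

    Illustrative helper (not from the paper). Components where both
    entries are zero contribute nothing, by assumed convention.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(u, v):
        denominator = abs(a) + abs(b)
        if denominator:  # skip 0/0 terms
            total += abs(a - b) / denominator
    return total


# Hypothetical regression-parameter feature vectors of a query and a target image
query_features = [1.0, 2.0, 0.0]
target_features = [3.0, 2.0, 0.0]
distance = canberra_distance(query_features, target_features)
```

    Because each term is normalized by the magnitudes of its own components, the measure is sensitive to small differences near zero, which is worth keeping in mind when features have very different scales.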

  11. Assessing Credit Default using Logistic Regression and Multiple Discriminant Analysis: Empirical Evidence from Bosnia and Herzegovina

    Deni Memić


    This article aims to assess credit default prediction on the banking market in Bosnia and Herzegovina nationwide as well as in its constitutional entities (Federation of Bosnia and Herzegovina and Republika Srpska). The ability to classify companies into different predefined groups, or to find an appropriate tool which would replace human assessment in classifying companies into good and bad buckets, has been one of the main interests of risk management researchers for a long time. We investigated the possibility and accuracy of default prediction using the traditional statistical methods logistic regression (logit) and multiple discriminant analysis (MDA) and compared their predictive abilities. The results show that the created models have high predictive ability. For logit models, some variables are more influential on the default prediction than others. Return on assets (ROA) is statistically significant in all four periods prior to default, having very high regression coefficients, or high impact on the model's ability to predict default. Similar results are obtained for MDA models. It is also found that predictive ability differs between logistic regression and multiple discriminant analysis.

  12. QSPR study of molar diamagnetic susceptibility of diverse organic compounds using multiple linear regression analysis

    S. Saaidpour; S. A. Zarei; F. Nasri


    The multiple linear regression (MLR) was used to build the linear quantitative structure-property relationship (QSPR) model for the prediction of the molar diamagnetic susceptibility (χm) for 140 diverse organic compounds using the three significant descriptors calculated from the molecular structures alone and selected by stepwise regression method. Stepwise regression was employed to develop a regression equation based on 100 training compounds, and predictive ability was tested on 40 compo...

  13. Estimate of Compressive Strength for Concrete using Ultrasonics by Multiple Regression Analysis Method

    Various types of ultrasonic techniques have been used for the estimation of compressive strength of concrete structures. However, the conventional ultrasonic velocity method, which uses only the longitudinal wave, cannot determine the compressive strength of concrete structures accurately. In this paper, multiple regression analysis was applied to the determination of compressive strength of concrete structures by introducing multiple parameters, e.g. velocity of shear wave, velocity of longitudinal wave, attenuation coefficient of shear wave, attenuation coefficient of longitudinal wave, combination condition, age and preservation method. The experimental results show that the velocity of the shear wave estimates the compressive strength of concrete more accurately than the velocity of the longitudinal wave, and that the estimation error for the compressive strength of concrete structures can be kept within approximately ± 10%.

  14. Regression Analysis

    Freund, Rudolf J; Sa, Ping


    The book provides complete coverage of the classical methods of statistical analysis. It is designed to give students an understanding of the purpose of statistical analyses, to allow the student to determine, at least to some degree, the correct type of statistical analysis to be performed in a given situation, and to give some appreciation of what constitutes good experimental design.

  15. Aspects Regarding the Multiple Regression Used in Macro-economic Analysis

    Constantin ANGHELACHE; Alexandru MANOLE; Ligia PRODAN; Andreea Gabriela BALTAC; Zoica DINCA (NICOLA)


    The regression function serves as a basis for carrying out numerous analyses of micro- or macroeconomic indicators. Information obtained by use of the simple linear regression model is not always sufficient to characterize changes in an economic phenomenon and, in particular, to identify its possible future evolution. To remedy these shortcomings, the literature has introduced multiple regression models in which the dependent variable is defined on the basis of two or mor...


    Constantin Anghelache; Ioan Partachi


    The information obtained through the use of simple linear regression is not always enough to characterize the evolution of an economic phenomenon and, furthermore, to identify its possible future evolution. To remedy these drawbacks, the specialized literature includes multiple regression models, in which the evolution of the dependent variable is defined depending on two or more factorial variables.

  17. Combining multiple regression and principal component analysis for accurate predictions for column ozone in Peninsular Malaysia

    Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.


    This study encompasses columnar ozone modelling in Peninsular Malaysia. A data set of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)], retrieved from NASA's Atmospheric Infrared Sounder (AIRS) for the entire period (2003-2008), was employed to develop models to predict the value of columnar ozone (O3) in the study area. A combined method, based on multiple regression together with principal component analysis (PCA) modelling, was used to predict columnar ozone and to improve the prediction accuracy. Separate analyses were carried out for the north east monsoon (NEM) and south west monsoon (SWM) seasons. O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM seasons. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameters as predictors. A variable selection method based on high loadings of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model. It was found that an increase in the columnar O3 value is associated with an increase in the values of AST, SSKT, AT, and CO and with a drop in the levels of CH4, H2Ovapour, RH, and MSP. Fitting the best models for the columnar O3 value using eight of the independent variables gave about the same values of R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.

  18. A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis

    Taneja, Abhishek


    The growing volume of data creates a need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz. factor analysis and multiple linear regression, for different sample sizes on three unique sets of data. The performance of the two techniques is compared on parameters such as mean square error (MSE), R-square, adjusted R-square, condition number, root mean square error (RMSE), number of variables included in the prediction model, modified coefficient of efficiency, F-value, and test of normality. These parameters have been computed using various data mining tools such as SPSS, XLstat, Stata, and MS-Excel. It is seen that for all the given datasets, factor analysis outperforms multiple linear re...

  19. On-line contextual influences during reading normal text: a multiple-regression analysis.

    Pynte, Joel; New, Boris; Kennedy, Alan


    On-line contextual influences during reading were examined in a series of multiple-regression analyses conducted on a large-scale corpus of eye-movement data, using Latent Semantic Analysis (LSA) to assess the degree of contextual constraints exerted on a given target word by the immediately prior word and by the prior sentence fragment. A decrease in inspection time was observed as contextual constraints increased. Word-level constraints exerted their influence both forward (on both single-fixation and gaze durations) and backward (on gaze duration only). An independent sentence-level effect was only visible in the forward direction, and only for gaze duration. Gaze duration was also sensitive to the depth of embedding of the target word in the syntactic structure. We conclude that both low-level and high-level contextual constraints can translate in the eye-movement record. PMID:18701125

  20. Empirical predictive models of daily relativistic electron flux at geostationary orbit: Multiple regression analysis

    Simms, Laura E.; Engebretson, Mark J.; Pilipenko, Viacheslav; Reeves, Geoffrey D.; Clilverd, Mark


    The daily maximum relativistic electron flux at geostationary orbit can be predicted well with a set of daily averaged predictor variables including previous day's flux, seed electron flux, solar wind velocity and number density, AE index, IMF Bz, Dst, and ULF and VLF wave power. As predictor variables are intercorrelated, we used multiple regression analyses to determine which are the most predictive of flux when other variables are controlled. Empirical models produced from regressions of flux on measured predictors from 1 day previous were reasonably effective at predicting novel observations. Adding previous flux to the parameter set improves the prediction of the peak of the increases but delays its anticipation of an event. Previous day's solar wind number density and velocity, AE index, and ULF wave activity are the most significant explanatory variables; however, the AE index, measuring substorm processes, shows a negative correlation with flux when other parameters are controlled. This may be due to the triggering of electromagnetic ion cyclotron waves by substorms that cause electron precipitation. VLF waves show lower, but significant, influence. The combined effect of ULF and VLF waves shows a synergistic interaction, where each increases the influence of the other on flux enhancement. Correlations between observations and predictions for this 1 day lag model ranged from 0.71 to 0.89 (average: 0.78). A path analysis of correlations between predictors suggests that solar wind and IMF parameters affect flux through intermediate processes such as ring current (Dst), AE, and wave activity.

  1. Thermodynamic analysis of simple gas turbine cycle with multiple regression modelling and optimization

    In this study, thermodynamic and statistical analyses were performed on a gas turbine system, to assess the impact of some important operating parameters like CIT (Compressor Inlet Temperature), PR (Pressure Ratio) and TIT (Turbine Inlet Temperature) on its performance characteristics such as net power output, energy efficiency, exergy efficiency and fuel consumption. Each performance characteristic was enunciated as a function of operating parameters, followed by a parametric study and optimization. The results showed that the performance characteristics increase with an increase in the TIT and a decrease in the CIT, except fuel consumption which behaves oppositely. The net power output and efficiencies increase with the PR up to certain initial values and then start to decrease, whereas the fuel consumption always decreases with an increase in the PR. The results of exergy analysis showed the combustion chamber as a major contributor to the exergy destruction, followed by stack gas. Subsequently, multiple regression models were developed to correlate each of the response variables (performance characteristic) with the predictor variables (operating parameters). The regression model equations showed a significant statistical relationship between the predictor and response variables. (author)

  2. Thermodynamic Analysis of Simple Gas Turbine Cycle with Multiple Regression Modelling and Optimization

    Abdul Ghafoor Memon


    In this study, thermodynamic and statistical analyses were performed on a gas turbine system, to assess the impact of some important operating parameters like CIT (Compressor Inlet Temperature), PR (Pressure Ratio) and TIT (Turbine Inlet Temperature) on its performance characteristics such as net power output, energy efficiency, exergy efficiency and fuel consumption. Each performance characteristic was enunciated as a function of operating parameters, followed by a parametric study and optimization. The results showed that the performance characteristics increase with an increase in the TIT and a decrease in the CIT, except fuel consumption which behaves oppositely. The net power output and efficiencies increase with the PR up to certain initial values and then start to decrease, whereas the fuel consumption always decreases with an increase in the PR. The results of exergy analysis showed the combustion chamber as a major contributor to the exergy destruction, followed by stack gas. Subsequently, multiple regression models were developed to correlate each of the response variables (performance characteristic) with the predictor variables (operating parameters). The regression model equations showed a significant statistical relationship between the predictor and response variables.


    Abdelrafe Elzamly


    Risk is not always avoidable, but it is controllable. The aim of this study is to identify whether risk management techniques are effective in reducing software failure. This motivates the authors to continue the effort to enrich the management of software project risks by considering mining and quantitative approaches with a large data set. In this study, two new techniques are introduced, namely stepwise multiple regression analysis and fuzzy multiple regression, to manage software risks. Two evaluation measures, MMRE and Pred(25), are used to compare the accuracy of the techniques. The model's accuracy is slightly better with stepwise multiple regression than with fuzzy multiple regression. This study will guide software managers in applying software risk management practices in real-world software development organizations and in verifying the effectiveness of the new techniques and approaches on a software project. The study was conducted on a group of software projects using a survey questionnaire. It is hoped that this will enable software managers to improve their decisions and increase the probability of software project success.
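The two accuracy measures named in the abstract above can be sketched in a few lines. This is a minimal illustration assuming the usual definitions (MMRE as the mean magnitude of relative error, Pred(25) as the share of estimates within 25% of the actual value); the function names and the sample numbers are hypothetical and are not taken from the study.

```python
def mre(actual, predicted):
    """Magnitude of relative error for a single observation."""
    return abs(actual - predicted) / abs(actual)


def mmre(actuals, predictions):
    """Mean magnitude of relative error over all observations."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals, predictions, level=25):
    """Fraction of predictions whose relative error is at most level percent."""
    threshold = level / 100.0
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= threshold)
    return hits / len(actuals)


# Hypothetical actual and estimated values, for illustration only.
actual = [100.0, 200.0, 50.0, 80.0]
estimated = [110.0, 150.0, 55.0, 40.0]

print("MMRE:", mmre(actual, estimated))
print("Pred(25):", pred(actual, estimated, 25))
```

A lower MMRE and a higher Pred(25) both indicate a more accurate model, which is presumably how the two regression techniques are compared.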

  4. PUMA: a unified framework for penalized multiple regression analysis of GWAS data.

    Gabriel E Hoffman

    Penalized Multiple Regression (PMR) can be used to discover novel disease associations in GWAS datasets. In practice, proposed PMR methods have not been able to identify well-supported associations in GWAS that are undetectable by standard association tests and thus these methods are not widely applied. Here, we present a combined algorithmic and heuristic framework for PUMA (Penalized Unified Multiple-locus Association) analysis that solves the problems of previously proposed methods including computational speed, poor performance on genome-scale simulated data, and identification of too many associations for real data to be biologically plausible. The framework includes a new minorize-maximization (MM) algorithm for generalized linear models (GLM) combined with heuristic model selection and testing methods for identification of robust associations. The PUMA framework implements the penalized maximum likelihood penalties previously proposed for GWAS analysis (i.e. Lasso, Adaptive Lasso, NEG, MCP), as well as a penalty that has not been previously applied to GWAS (i.e. LOG). Using simulations that closely mirror real GWAS data, we show that our framework has high performance and reliably increases power to detect weak associations, while existing PMR methods can perform worse than single marker testing in overall performance. To demonstrate the empirical value of PUMA, we analyzed GWAS data for type 1 diabetes, Crohn's disease, and rheumatoid arthritis, three autoimmune diseases from the original Wellcome Trust Case Control Consortium. Our analysis replicates known associations for these diseases and we discover novel etiologically relevant susceptibility loci that are invisible to standard single marker tests, including six novel associations implicating genes involved in pancreatic function, insulin pathways and immune-cell function in type 1 diabetes; three novel associations implicating genes in pro- and anti-inflammatory pathways in Crohn

  5. Multiple Regressive Model Adaptive Control

    Garipov, Emil; Stoilkov, Teodor; Kalaykov, Ivan


    The essence of the ideas applied in this text is the development of a strategy for controlling a continuous plant of arbitrary complexity by means of a set of discrete time-invariant linear controllers. Their number and tuned parameters correspond to the number and parameters of the linear time-invariant regressive models in the model bank, which approximate the complex plant dynamics at different operating points. The described strategy is known as Multiple Regressive Model Adaptive C...

  6. Using Robust Variance Estimation to Combine Multiple Regression Estimates with Meta-Analysis

    Williams, Ryan


    The purpose of this study was to explore the use of robust variance estimation for combining commonly specified multiple regression models and for combining sample-dependent focal slope estimates from diversely specified models. The proposed estimator obviates traditionally required information about the covariance structure of the dependent…

  7. Influence of plant root morphology and tissue composition on phenanthrene uptake: Stepwise multiple linear regression analysis

    Polycyclic aromatic hydrocarbons (PAHs) are contaminants that reside mainly in surface soils. Dietary intake of plant-based foods can make a major contribution to total PAH exposure. Little information is available on the relationship between root morphology and plant uptake of PAHs. An understanding of plant root morphologic and compositional factors that affect root uptake of contaminants is important and can inform both agricultural (chemical contamination of crops) and engineering (phytoremediation) applications. Five crop plant species are grown hydroponically in solutions containing the PAH phenanthrene. Measurements are taken for 1) phenanthrene uptake, 2) root morphology – specific surface area, volume, surface area, tip number and total root length and 3) root tissue composition – water, lipid, protein and carbohydrate content. These factors are compared through Pearson's correlation and multiple linear regression analysis. The major factors which promote phenanthrene uptake are specific surface area and lipid content. -- Highlights: •There is no correlation between phenanthrene uptake and total root length, and water. •Specific surface area and lipid are the most crucial factors for phenanthrene uptake. •The contribution of specific surface area is greater than that of lipid. -- The contribution of specific surface area is greater than that of lipid in the two most important root morphological and compositional factors affecting phenanthrene uptake

  8. Anomalous particle pinch and scaling of vin/D based on transport analysis and multiple regression

    Becker, G.; Kardaun, O.


    Predictions of density profiles in current tokamaks and ITER require a validated scaling relation for vin/D where vin is the anomalous inward drift velocity and D is the anomalous diffusion coefficient. Transport analysis is necessary for determining the anomalous particle pinch from measured density profiles and for separating the impact of particle sources. A set of discharges in ASDEX Upgrade, DIII-D, JET and ASDEX is analysed using a special version of the 1.5-D BALDUR transport code. Profiles of ρsvin/D with ρs the effective separatrix radius, five other dimensionless parameters and many further quantities in the confinement zone are compiled, resulting in the dataset VIND1.dat, which covers a wide parameter range. Weighted multiple regression is applied to the ASDEX Upgrade subset which leads to a two-term scaling \rho_s v_{in}(x')/D(x') = 0.0432 [ (L_{T_e}(\bar{x}')/\rho_s)^{-2.58} + 7.13 \, U_L^{1.55} ...

  9. Multiple correlation and regression analysis of relation between amplitude and spectral characteristics of proton events and microwave burst parameters

    The results of a study of the relationship between the parameters of solar cosmic ray (SCR) events and microwave (μ) bursts, obtained by methods of multiple statistical analysis, are presented. Multiple correlation and regression analysis shows that the main features of the connection between μ-bursts and SCR events can be understood by accounting for the differences in the dynamics of electrons and protons in flare arcs of different sizes, assuming no SCR particle acceleration in the second flare phase.

  10. Investigations upon the indefinite rolls quality assurance in multiple regression analysis

    The quality of rolling-mill rolls has been enhanced mainly through improvements in the chemical composition of the roll materials. Achieving an optimal chemical composition is a technically efficient way to assure the service properties, since the material from which the rolls are manufactured is of major importance in this respect. This paper continues the presentation of the scientific results of our experimental research on rolling rolls. The basic research contains concrete elements of immediate practical utility for metallurgical enterprises, for improving the quality of rolls, with the ultimate aim of increasing durability and safety in service. This paper presents an analysis of the chemical composition and its influence on the mechanical properties of indefinite cast iron rolls. We present some mathematical correlations and graphical interpretations relating the hardness (on the working surface and on the necks) to the chemical composition. The double and triple correlations are helpful in foundry practice, as they allow us to determine variation boundaries for the chemical composition with a view to obtaining optimal hardness values. We suggest a mathematical interpretation of the influence of the chemical composition on the hardness of these indefinite rolling rolls. To this end we use multiple regression analysis, which can be an important statistical tool for investigating relationships between variables. The results of the mathematical modelling can be described by a number of multi-component equations determined for spaces with 3 and 4 dimensions. The regression surfaces, level curves and variation volumes can also be represented and interpreted by technologists, who can treat them as correlation diagrams between the analyzed variables. In this sense, these research results can be used in the engineers collectives of the

  11. Linear regression analysis

    Kılıç, Selim


    Linear regression is an approach to modeling the association between a numeric dependent variable y and one or more independent variables denoted X. The case of one explanatory variable in the regression model is called simple linear regression. With more than one explanatory variable, the model is called multiple linear regression. The dependent variable should be a numeric variable in linear regression. At least 10 times as many cases as the number of independent variables are recommended...
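The distinction between simple and multiple linear regression described above can be made concrete with a small ordinary least squares fit. The sketch below solves the normal equations (X'X)b = X'y in pure Python; the helper names `solve` and `fit_ols` and the noise-free data are illustrative assumptions, not part of the article.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least squares fit; returns [intercept, b1, b2, ...] for the predictors in rows."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]

coef = fit_ols(X, y)
print("intercept, b1, b2:", coef)
```

Passing rows with a single predictor gives simple linear regression; rows with several predictors give multiple linear regression with the same code.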

  12. Physical and Cognitive-Affective Factors Associated with Fatigue in Individuals with Fibromyalgia: A Multiple Regression Analysis

    Muller, Veronica; Brooks, Jessica; Tu, Wei-Mo; Moser, Erin; Lo, Chu-Ling; Chan, Fong


    Purpose: The main objective of this study was to determine the extent to which physical and cognitive-affective factors are associated with fibromyalgia (FM) fatigue. Method: A quantitative descriptive design using correlation techniques and multiple regression analysis. The participants consisted of 302 members of the National Fibromyalgia &…

  13. Multiple linear regression analysis of bacterial deposition to polyurethane coatings after conditioning film formation in the marine environment

    Bakker, D.P.; Busscher, H.J.; Zanten, J. van; Vries, J. de; Klijnstra, J.W.; Mei, H.C. van der


    Many studies have shown relationships of substratum hydrophobicity, charge or roughness with bacterial adhesion, although bacterial adhesion is governed by interplay of different physico-chemical properties and multiple regression analysis would be more suitable to reveal mechanisms of bacterial adh

  14. Multiple linear regression analysis of bacterial deposition to polyurethane coating after conditioning film formation in the marine environment

    Bakker, Dewi P; Busscher, Henk J; van Zanten, Joyce; de Vries, Jacob; Klijnstra, Job W; van der Mei, Henny C


    Many studies have shown relationships of substratum hydrophobicity, charge or roughness with bacterial adhesion, although bacterial adhesion is governed by interplay of different physico-chemical properties and multiple regression analysis would be more suitable to reveal mechanisms of bacterial adh

  15. Error analysis of dimensionless scaling experiments with multiple points using linear regression

    A general method of error estimation in the case of multiple point dimensionless scaling experiments, using linear regression and standard error propagation, is proposed. The method reduces to the previous result of Cordey (2009 Nucl. Fusion 49 052001) in the case of a two-point scan. On the other hand, if the points follow a linear trend, it explains how the estimated error decreases as more points are added to the scan. Based on the analytical expression that is derived, it is argued that for a low number of points, adding points to the ends of the scanned range, rather than the middle, results in a smaller error estimate. (letter)
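The claim about where to add scan points can be illustrated with the textbook standard error of a least squares slope, sigma / sqrt(Sxx), where Sxx is the sum of squared deviations of the scan coordinate from its mean. This is only a sketch under the assumption of equal, independent errors sigma on the measured quantity; it is not the analytical expression derived in the letter, and the scan values are hypothetical.

```python
import math


def slope_std_error(xs, sigma):
    """Standard error of the fitted slope for equal, independent errors sigma on y."""
    mean_x = sum(xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sigma / math.sqrt(sxx)


sigma = 0.1  # hypothetical measurement error

two_point = slope_std_error([0.0, 1.0], sigma)        # two-point scan
add_middle = slope_std_error([0.0, 0.5, 1.0], sigma)  # third point in the middle
add_end = slope_std_error([0.0, 1.0, 1.0], sigma)     # third point repeated at an end

print("two points:      ", two_point)
print("extra mid point: ", add_middle)
print("extra end point: ", add_end)
```

Under these assumptions a point at the centre of the range leaves Sxx unchanged, so it does not reduce the slope error, while a point at an end of the range does.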

  16. Regression analysis by example

    Chatterjee, Samprit


    Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association. Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded

  17. Understanding logistic regression analysis

    Sperandei, Sandro


    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...

  18. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    Ani Shabri; Ruhaidah Samsudin


    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used in processing...

  19. A calibration method of Argo floats based on multiple regression analysis


    Argo floats are free-moving floats that report vertical profiles of salinity, temperature and pressure at regular time intervals. These floats give good measurements of temperature and pressure, but salinity measurements may show significant sensor drift with time. It is found that sensor drift with time is not purely linear as presupposed by Wong (2003). A new method is developed to calibrate conductivity data measured by Argo floats. In this method, Wong's objective analysis method was adopted to estimate the background climatological salinity field on potential temperature surfaces from nearby historical data in WOD01. Furthermore, temperature and time factors are taken into account, and stepwise regression was used for a time-varying or temperature-varying slope in potential conductivity space to correct the drift in these profiling float salinity data. The results show that salinity errors using this method are smaller than those of Wong's method, and that quantitative and qualitative analysis of the conductivity sensor can be carried out with our method.

  20. Multiple regression analysis in modelling of carbon dioxide emissions by energy consumption use in Malaysia

    Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat


    Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue is considered a major international concern and is attributed primarily to the use of different fossil fuels. In this paper, a regression model is used to analyze the causal relationship between CO2 emissions and energy consumption in Malaysia using time series data for the period 1980-2010. The equations were developed using a regression model based on the eight major sources that contribute to CO2 emissions: non-energy use, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil and motor petrol. Part of the data (1980-2000) was used to fit the regression model and the rest (2001-2010) was used to validate it. Comparison of the model predictions with the measured data showed a high correlation (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning of the population to comply with air quality standards.

  1. Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway


    Background. MicroRNA (miRNA) is a short and endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor for tumorigenesis of colorectal cancer (CRC), and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopt...

  2. Quantifying components of the hydrologic cycle in Virginia using chemical hydrograph separation and multiple regression analysis

    Sanford, Ward E.; Nelms, David L.; Pope, Jason P.; Selnick, David L.


    This study by the U.S. Geological Survey, prepared in cooperation with the Virginia Department of Environmental Quality, quantifies the components of the hydrologic cycle across the Commonwealth of Virginia. Long-term, mean fluxes were calculated for precipitation, surface runoff, infiltration, total evapotranspiration (ET), riparian ET, recharge, base flow (or groundwater discharge) and net total outflow. Fluxes of these components were first estimated on a number of real-time-gaged watersheds across Virginia. Specific conductance was used to distinguish and separate surface runoff from base flow. Specific-conductance data were collected every 15 minutes at 75 real-time gages for approximately 18 months between March 2007 and August 2008. Precipitation was estimated for 1971–2000 using PRISM climate data. Precipitation and temperature from the PRISM data were used to develop a regression-based relation to estimate total ET. The proportion of watershed precipitation that becomes surface runoff was related to physiographic province and rock type in a runoff regression equation. Component flux estimates from the watersheds were transferred to flux estimates for counties and independent cities using the ET and runoff regression equations. Only 48 of the 75 watersheds yielded sufficient data, and data from these 48 were used in the final runoff regression equation. The base-flow proportion for the 48 watersheds averaged 72 percent using specific conductance, a value that was substantially higher than the 61 percent average calculated using a graphical-separation technique (the USGS program PART). Final results for the study are presented as component flux estimates for all counties and independent cities in Virginia.

  3. Data of multiple regressions analysis between selected biomarkers related to glutamate excitotoxicity and oxidative stress in Saudi autistic patients.

    El-Ansary, Afaf


    This work presents data from multiple regression analysis between nine biomarkers related to glutamate excitotoxicity and impaired detoxification, two mechanisms recently recorded as autism phenotypes. The presented data were obtained by measuring a panel of markers in 20 autistic patients aged 3-15 years and 20 age- and gender-matched healthy controls. Levels of GSH, glutathione status (GSH/GSSG), glutathione reductase (GR), glutathione-s-transferase (GST), thioredoxin (Trx), thioredoxin reductase (TrxR) and peroxidoxins (Prxs I and III), glutamate, glutamine, glutamate/glutamine ratio and glutamate dehydrogenase (GDH) in plasma, and mercury (Hg) in red blood cells, were determined in both groups. In the multiple regression analysis, R² values, which describe the proportion or percentage of variance in the dependent variable attributed to the variance in the independent variables together, were calculated. Moreover, β coefficient values, which show the direction (positive or negative) and the contribution of each independent variable relative to the other independent variables in explaining the variation of the dependent variable, were determined. A panel of inter-related markers was recorded. This paper contains data related to and supporting the published research articles entitled "Mechanism of nitrogen metabolism-related parameters and enzyme activities in the pathophysiology of autism" [1], "Novel metabolic biomarkers related to sulfur-dependent detoxification pathways in autistic patients of Saudi Arabia" [2], and "A key role for an impaired detoxification mechanism in the etiology and severity of autism spectrum disorders" [3]. PMID:26933667

  4. Violence against Chinese female sex workers from their stable partners: a hierarchical multiple regression analysis.

    Zhang, Chen; Li, Xiaoming; Su, Shaobing; Hong, Yan; Zhou, Yuejiao; Tang, Zhenzhu; Shen, Zhiyong


    Limited data are available regarding risk factors that are related to intimate partner violence (IPV) against female sex workers (FSWs) in the context of stable partnerships. Out of the 1,022 FSWs, 743 reported ever having a stable partnership and 430 (more than half) of those reported experiencing IPV. Hierarchical multivariate regression revealed that some characteristics of stable partners (e.g., low education, alcohol use) and relationship stressors (e.g., frequent friction, concurrent partnerships) were independently predictive of IPV against FSWs. Public health professionals who design future violence prevention interventions targeting FSWs need to consider the influence of their stable partners. PMID:24730642

  5. Correlation Weights in Multiple Regression

    Waller, Niels G.; Jones, Jeff A.


    A general theory on the use of correlation weights in linear prediction has yet to be proposed. In this paper we take initial steps in developing such a theory by describing the conditions under which correlation weights perform well in population regression models. Using OLS weights as a comparison, we define cases in which the two weighting…

  6. Interpretation of Regressions with Multiple Proxies

    Darren Lubotsky; Martin Wittenberg


    We consider the situation in which there are multiple proxies for one unobserved explanatory variable in a linear regression and provide a procedure by which the coefficient of interest can be extracted "post hoc" from a multiple regression in which all the proxies are used simultaneously. This post hoc estimator is strictly superior in large samples to coefficients derived using any index or linear combination of the proxies that is created prior to the regression. To use an index created fr...

  7. Bayesian logistic regression analysis

    Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.


    In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuisance parameters, the Jacobian transformation is an

  8. Comparison of a neural network with multiple linear regression for quantitative analysis in ICP-atomic emission spectroscopy

    A two-layer perceptron with backpropagation of error is used for quantitative analysis in ICP-AES. The network was trained on emission spectra of two interfering lines of Cd and As, and the concentrations of both elements were subsequently estimated from mixture spectra. The spectra of the Cd and As lines were also used to perform multiple linear regression (MLR) via the calculation of the pseudoinverse S+ of the sensitivity matrix S. In the present paper it is shown that there exist close relations between the operation of the perceptron and the MLR procedure. These are most clearly apparent in the correlation between the weights of the backpropagation network and the elements of the pseudoinverse. Using MLR, the confidence intervals over the predictions are exploited to correct for the wavelength shift of the optical device. (orig.)
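The MLR step via the pseudoinverse can be sketched as follows. The example assumes a linear mixture model, mixture = S c, with S holding one column of sensitivities per element (one row per wavelength channel), so that c is recovered as S+ applied to the mixture spectrum, with S+ = (S'S)^-1 S'. The orientation of S, the function names and all numbers are assumptions made for illustration; the abstract does not specify them.

```python
def pseudoinverse_two_columns(s):
    """Pseudoinverse (S'S)^-1 S' of an n x 2 matrix with full column rank."""
    a = sum(r[0] * r[0] for r in s)
    b = sum(r[0] * r[1] for r in s)
    d = sum(r[1] * r[1] for r in s)
    det = a * d - b * b
    # Inverse of the 2 x 2 matrix S'S = [[a, b], [b, d]].
    inv = [[d / det, -b / det], [-b / det, a / det]]
    # Multiply by S' to obtain the 2 x n pseudoinverse.
    return [[inv[k][0] * r[0] + inv[k][1] * r[1] for r in s] for k in range(2)]


def estimate_concentrations(s, mixture):
    """Least squares concentration estimates c = S+ m."""
    s_plus = pseudoinverse_two_columns(s)
    return [sum(w * m for w, m in zip(row, mixture)) for row in s_plus]


# Hypothetical sensitivities of two overlapping lines at four wavelength channels
# (first column: element 1, second column: element 2).
S = [[1.0, 0.2], [0.8, 0.6], [0.3, 1.0], [0.1, 0.7]]
true_c = [2.0, 5.0]
mixture = [r[0] * true_c[0] + r[1] * true_c[1] for r in S]

est = estimate_concentrations(S, mixture)
print("estimated concentrations:", est)
```

Each row of S+ acts as a fixed set of weights applied to the mixture spectrum, which gives some intuition for why the abstract reports a correlation between the trained network weights and the elements of the pseudoinverse.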

  9. A multiple linear regression analysis of hot corrosion attack on a series of nickel base turbine alloys

    Barrett, C. A.


    Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni-base cast turbine alloys. The U transform (i.e., 1/sin (% A/100) to the 1/2) was shown to give the best estimate of the dependent variable, y. A complete second degree equation is described for the "centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition, linear terms for the minor elements C, B, and Zr were added, giving a basic 47-term equation. The best reduced equation was determined by the stepwise selection method, with essentially 13 terms. The Cr term was found to be the most important, accounting for 60 percent of the explained variability in hot corrosion attack.

  10. Comparing Effects of Biologic Agents in Treating Patients with Rheumatoid Arthritis: A Multiple Treatment Comparison Regression Analysis.

    Ingunn Fride Tvete

    Rheumatoid arthritis patients have been treated with disease modifying anti-rheumatic drugs (DMARDs) and the newer biologic drugs. We sought to compare and rank the biologics with respect to efficacy. We performed a literature search identifying 54 publications encompassing 9 biologics. We conducted a multiple treatment comparison regression analysis letting the number experiencing a 50% improvement on the ACR score be dependent upon dose level and disease duration for assessing the comparable relative effect between biologics and placebo or DMARD. The analysis embraced all treatment and comparator arms over all publications. Hence, all measured effects of any biologic agent contributed to the comparison of all biologic agents relative to each other either given alone or combined with DMARD. We found the drug effect to be dependent on dose level, but not on disease duration, and the impact of a high versus low dose level was the same for all drugs (higher doses indicated a higher frequency of ACR50 scores). The ranking of the drugs when given without DMARD was certolizumab (ranked highest), etanercept, tocilizumab/abatacept and adalimumab. The ranking of the drugs when given with DMARD was certolizumab (ranked highest), tocilizumab, anakinra/rituximab, golimumab/infliximab/abatacept, adalimumab/etanercept [corrected]. Still, all drugs were effective. All biologic agents were effective compared to placebo, with certolizumab the most effective and adalimumab (without DMARD treatment) and adalimumab/etanercept (combined with DMARD treatment) the least effective. The drugs were in general more effective, except for etanercept, when given together with DMARDs.

  11. Multiple Linear Regression Models in Outlier Detection

    S.M.A.Khaleelur Rahman


    Identifying anomalous values in a real-world database is important both for improving the quality of the original data and for reducing the impact of anomalous values on the process of knowledge discovery in databases. Such anomalous values give useful information to the data analyst in discovering useful patterns. Through isolation, these data may be separated and analyzed. The analysis of outliers and influential points is an important step in regression diagnostics. In this paper, our aim is to detect the points which are very different from the other points. They do not seem to belong to a particular population and behave differently. Removing these influential points leads to a different model. The distinction between these points is not always obvious and clear. Hence several indicators are used for identifying and analyzing outliers. Existing methods of outlier detection are based on manual inspection of graphically represented data. In this paper, we present a new approach to automating the process of detecting and isolating outliers. The impact of anomalous values on the dataset has been established by using two indicators, DFFITS and Cook's D. The process is based on modeling the human perception of exceptional values by using multiple linear regression analysis.
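The two indicators named above, DFFITS and Cook's D, can be computed directly from the residuals and leverages of a fitted regression. The sketch below uses the standard textbook formulas and, to stay short, a single predictor rather than a full multiple regression; the data, the function name and the 4/n cut-off (a common rule of thumb) are illustrative assumptions and not the procedure of the paper.

```python
import math


def influence_measures(xs, ys):
    """Return (Cook's D, DFFITS) lists for a simple linear regression of ys on xs."""
    n, p = len(xs), 2  # p counts the fitted parameters: intercept and slope
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in residuals)
    mse = sse / (n - p)
    cooks, dffits = [], []
    for x, e in zip(xs, residuals):
        h = 1.0 / n + (x - mean_x) ** 2 / sxx  # leverage of this point
        cooks.append(e * e / (p * mse) * h / (1.0 - h) ** 2)
        # Residual variance with this point deleted, then the studentized residual.
        mse_deleted = (sse - e * e / (1.0 - h)) / (n - p - 1)
        t = e / math.sqrt(mse_deleted * (1.0 - h))
        dffits.append(t * math.sqrt(h / (1.0 - h)))
    return cooks, dffits


# Hypothetical data: roughly y = 2x, with the last observation far off the trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 25.0]

cooks, dffits = influence_measures(xs, ys)
# Flag points whose Cook's D exceeds the rule-of-thumb cut-off 4/n.
flagged = [i for i, d in enumerate(cooks) if d > 4.0 / len(xs)]
print("flagged indices:", flagged)
```

Automating the flagging step in this way replaces the manual inspection of plots that the abstract describes as the existing practice.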

  12. Multiple Imputations for Linear Regression Models

    Brownstone, David


    Rubin (1987) has proposed multiple imputations as a general method for estimation in the presence of missing data. Rubin’s results only strictly apply to Bayesian models, but Schenker and Welsh (1988) directly prove the consistency of multiple imputations inference when there are missing values of the dependent variable in linear regression models. This paper extends and modifies Schenker and Welsh’s theorems to give conditions where multiple imputations yield consistent inferences for bo...

  13. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    Ani Shabri


    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to select the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market has been used as the case study. The time series prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.

  14. Relationships between each part of the spinal curves and upright posture using Multiple stepwise linear regression analysis.

    Boulet, Sebastien; Boudot, Elsa; Houel, Nicolas


    Back pain is a common reason for consultation in primary healthcare clinical practice, and has effects on daily activities and posture. Relationships between the whole spine and upright posture, however, remain unknown. The aim of this study was to identify the relationship between each spinal curve and centre of pressure position as well as velocity for healthy subjects. Twenty-one male subjects performed quiet stance in natural position. Each upright posture was then recorded using an optoelectronics system (Vicon Nexus) synchronized with two force plates. At each moment, polynomial interpolations of markers attached on the spine segment were used to compute cervical lordosis, thoracic kyphosis and lumbar lordosis angle curves. Mean of centre of pressure position and velocity was then computed. Multiple stepwise linear regression analysis showed that the position and velocity of centre of pressure associated with each part of the spinal curves were defined as best predictors of the lumbar lordosis angle (R^2 = 0.45; p = 1.65×10^-10) and the thoracic kyphosis angle (R^2 = 0.54; p = 4.89×10^-13) of healthy subjects in quiet stance. This study showed the relationships between each of cervical, thoracic, lumbar curvatures, and centre of pressure's fluctuation during free quiet standing using non-invasive full spinal curve exploration. PMID:26970888

  15. Computing multiple-output regression quantile regions

    Paindaveine, D.; Šiman, Miroslav


    Roč. 56, č. 4 (2012), s. 840-853. ISSN 0167-9473 R&D Projects: GA MŠk(CZ) 1M06047 Institutional research plan: CEZ:AV0Z10750506 Keywords : halfspace depth * multiple-output regression * parametric linear programming * quantile regression Subject RIV: BA - General Mathematics Impact factor: 1.304, year: 2012

  16. The Geometry of Enhancement in Multiple Regression

    Waller, Niels G.


    In linear multiple regression, "enhancement" is said to occur when R^2 = b'r is greater than r'r, where b is a p x 1 vector of standardized regression coefficients and r is a p x 1 vector of correlations between a criterion y and a set of standardized regressors, x. When p = 1, b ≅ r and enhancement cannot…
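
    The enhancement condition can be checked numerically from a correlation matrix alone. The sketch below is illustrative only: the function name `enhancement` and the two-regressor correlations (a classical suppressor pattern) are assumptions chosen here, not values from the article. It uses the standard identity b = Rxx^-1 r for standardized regressors.

```python
import numpy as np

def enhancement(Rxx, r):
    """Compare R^2 = b'r with r'r for standardized regressors.

    Rxx : (p, p) correlation matrix of the regressors
    r   : (p,) correlations between each regressor and the criterion y
    """
    b = np.linalg.solve(Rxx, r)   # standardized regression coefficients
    r2 = float(b @ r)             # squared multiple correlation
    sum_sq = float(r @ r)         # sum of squared zero-order correlations
    return b, r2, sum_sq, r2 > sum_sq

# Hypothetical example: x2 is uncorrelated with y but correlated with x1.
Rxx = np.array([[1.0, 0.5],
                [0.5, 1.0]])
r = np.array([0.5, 0.0])

b, r2, sum_sq, enhanced = enhancement(Rxx, r)
print("b =", b, " R^2 =", r2, " r'r =", sum_sq, " enhancement:", enhanced)

# With a single regressor, b equals r, so R^2 and r'r coincide.
b1, r2_1, sum_sq_1, _ = enhancement(np.array([[1.0]]), np.array([0.4]))
```

    In this example R^2 = 1/3 while r'r = 1/4: the second regressor carries no direct correlation with y, yet it raises R^2 by removing criterion-irrelevant variance from the first.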

  17. Health Expenditures in Greece: A Multiple Least Squares Regression and Cointegration Analysis Using Bootstrap Simulation in EVIEWS

    Giovanis, Eleftherios


    This paper examines the factors that best and most efficiently explain health expenditures in Greece. Two methods are applied: multiple regressions and vector error correction models are estimated, and unit root tests are applied to determine the order at which the variables are stationary. Because the available data are yearly and cover only the short period 1985-2006, the sample is small, so a bootstrap simulation is applied to improve the estimates.

  18. Ca analysis: An Excel based program for the analysis of intracellular calcium transients including multiple, simultaneous regression analysis

    Greensmith, David J.


    Here I present an Excel based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of...

  19. Multiple Linear Regression Analysis of Factors Affecting Real Property Price Index From Case Study Research In Istanbul/Turkey

    Denli, H. H.; Koc, Z.


    Estimating real property values on the basis of standards is difficult to apply across time and location. Regression analysis constructs mathematical models which describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. Applied to real estate valuation, with properties presented in the real market with their current characteristics and quantifiers, the method helps to find the effective factors or variables in the formation of the value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with housing characteristics to find a price index, based on information received from a real estate web page. The variables used for the analysis are age, size in m2, number of floors in the building, floor number of the property and number of rooms. The price of the property is the dependent variable, whereas the rest are independent variables. Prices of 60 properties have been used for the analysis. Locations with the same price value have been found and plotted on the map, and equivalence curves have been drawn identifying the same-valued zones as lines.

  20. Spectroscopic determination of leaf biochemistry using band-depth analysis of absorption features and stepwise multiple linear regression

    Kokaly, R.F.; Clark, R.N.


    We develop a new method for estimating the biochemistry of plant material using spectroscopy. Normalized band depths calculated from the continuum-removed reflectance spectra of dried and ground leaves were used to estimate their concentrations of nitrogen, lignin, and cellulose. Stepwise multiple linear regression was used to select wavelengths in the broad absorption features centered at 1.73 μm, 2.10 μm, and 2.30 μm that were highly correlated with the chemistry of samples from eastern U.S. forests. Band depths of absorption features at these wavelengths were found to also be highly correlated with the chemistry of four other sites. A subset of data from the eastern U.S. forest sites was used to derive linear equations that were applied to the remaining data to successfully estimate their nitrogen, lignin, and cellulose concentrations. Correlations were highest for nitrogen (R2 from 0.75 to 0.94). The consistent results indicate the possibility of establishing a single equation capable of estimating the chemical concentrations in a wide variety of species from the reflectance spectra of dried leaves. The extension of this method to remote sensing was investigated. The effects of leaf water content, sensor signal-to-noise and bandpass, atmospheric effects, and background soil exposure were examined. Leaf water was found to be the greatest challenge to extending this empirical method to the analysis of fresh whole leaves and complete vegetation canopies. The influence of leaf water on reflectance spectra must be removed to within 10%. Other effects were reduced by continuum removal and normalization of band depths. If the effects of leaf water can be compensated for, it might be possible to extend this method to remote sensing data acquired by imaging spectrometers to give estimates of nitrogen, lignin, and cellulose concentrations over large areas for use in ecosystem studies.

  1. Gene-based multiple regression association testing for combined examination of common and low frequency variants in quantitative trait analysis

    Yoo, Yun Joo; Sun, Lei; Bull, Shelley B


    Multi-marker methods for genetic association analysis can be performed for common and low frequency SNPs to improve power. Regression models are an intuitive way to formulate multi-marker tests. In previous studies we evaluated regression-based multi-marker tests for common SNPs, and through identification of bins consisting of correlated SNPs, developed a multi-bin linear combination (MLC) test that is a compromise between a 1 df linear combination test and a multi-df global test. Bins of SN...

  2. Prediction of the processing factor for pesticides in apple juice by principal component analysis and multiple linear regression.

    Martin, L; Mezcua, M; Ferrer, C; Gil Garcia, M D; Malato, O; Fernandez-Alba, A R


    The main objective of this work was to establish a mathematical function that correlates pesticide residue levels in apple juice with the levels of the pesticides applied on the raw fruit, taking into account some of their physicochemical properties such as water solubility, the octanol/water partition coefficient, the organic carbon partition coefficient, vapour pressure and density. A mixture of 12 pesticides was applied to an apple tree; apples were collected 10 days after application. After harvest, apples were treated with a mixture of three post-harvest pesticides and the fruits were then processed in order to obtain apple juice following a routine industrial process. The pesticide residue levels in the apple samples were analysed using two multi-residue methods based on LC-MS/MS and GC-MS/MS. The concentration of pesticides was determined in samples derived from the different steps of processing. The processing factors (the ratio between the residue level in the processed commodity and the residue level in the commodity to be processed) obtained for the full juicing process were found to vary among the different pesticides studied. In order to investigate the relationships between the levels of pesticide residue found in apple juice samples and their physicochemical properties, principal component analysis (PCA) was performed using two sets of samples (one of them using experimental data obtained in this work and the other including data taken from the literature). In both cases a correlation was found between the processing factors of pesticides in the apple juice and the negative logarithms (base 10) of the water solubility, octanol/water partition coefficient and organic carbon partition coefficient. The linear correlation between these physicochemical properties and the processing factor was established using a multiple linear regression technique. PMID:23281800

  3. Regression analysis in quantum language

    ISHIKAWA, Shiro


    Although regression analysis has a great history, we consider that it has always continued being confused. For example, the fundamental terms in regression analysis (e.g., "regression", "least-squares method", "explanatory variable", "response variable", etc.) seem to be historically conventional, that is, these words do not express the essence of regression analysis. Recently, we proposed quantum language (or, classical and quantum measurement theory), which is characterized as the linguisti...

  4. On directional multiple-output quantile regression

    Paindaveine, D.; Šiman, Miroslav


    Roč. 102, č. 2 (2011), s. 193-212. ISSN 0047-259X R&D Projects: GA MŠk(CZ) 1M06047 Grant ostatní: Commission EC(BE) Fonds National de la Recherche Scientifique Institutional research plan: CEZ:AV0Z10750506 Keywords : multivariate quantile * quantile regression * multiple-output regression * halfspace depth * portfolio optimization * value-at-risk Subject RIV: BA - General Mathematics Impact factor: 0.879, year: 2011

  5. Regression Analysis: A Constructive Critique

    Berk, Richard A


    Regression Analysis: A Constructive Critique identifies a wide variety of problems with regression analysis as it is commonly used and then provides a number of ways in which practice could be improved. Regression is most useful for data reduction, leading to relatively simple but rich and precise descriptions of patterns in a data set. The emphasis on description provides readers with an insightful rethinking from the ground up of what regression analysis can do, so that readers can better match regression analysis with useful empirical questions and improved policy-related research. "An

  6. Hierarchical regression for analyses of multiple outcomes.

    Richardson, David B; Hamra, Ghassan B; MacLehose, Richard F; Cole, Stephen R; Chu, Haitao


    In cohort mortality studies, there often is interest in associations between an exposure of primary interest and mortality due to a range of different causes. A standard approach to such analyses involves fitting a separate regression model for each type of outcome. However, the statistical precision of some estimated associations may be poor because of sparse data. In this paper, we describe a hierarchical regression model for estimation of parameters describing outcome-specific relative rate functions and associated credible intervals. The proposed model uses background stratification to provide flexible control for the outcome-specific associations of potential confounders, and it employs a hierarchical "shrinkage" approach to stabilize estimates of an exposure's associations with mortality due to different causes of death. The approach is illustrated in analyses of cancer mortality in 2 cohorts: a cohort of dioxin-exposed US chemical workers and a cohort of radiation-exposed Japanese atomic bomb survivors. Compared with standard regression estimates of associations, hierarchical regression yielded estimates with improved precision that tended to have less extreme values. The hierarchical regression approach also allowed the fitting of models with effect-measure modification. The proposed hierarchical approach can yield estimates of association that are more precise than conventional estimates when one wishes to estimate associations with multiple outcomes. PMID:26232395

  7. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    Lee, L.; Helsel, D.


    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.

  8. Multiple Regression Analyses in Clinical Child and Adolescent Psychology

    Jaccard, James; Guilamo-Ramos, Vincent; Johansson, Margaret; Bouris, Alida


    A major form of data analysis in clinical child and adolescent psychology is multiple regression. This article reviews issues in the application of such methods in light of the research designs typical of this field. Issues addressed include controlling covariates, evaluation of predictor relevance, comparing predictors, analysis of moderation,…

  9. A Study of Personality and Family- and School Environment and Possible Interactional Effects in 244 Swedish Children—A Multiple Regression Analysis

    Persson, Bertil


    The aim of the study was to examine relationships between psychosocial family- and school environment and personality as assessed by the Junior Eysenck Personality Questionnaire (EPQ-J) and possible personality interactional effects. The study was based on 244 Swedish girls and boys, 10-19 years old, who filled in the Family- and School Psychosocial Environment (FSPE) questionnaire and the EPQ-J. A multiple regression analysis showed that the FSPE-factor Family conflicts and school discipline...

  10. Multiple linear regression for isotopic measurements

    Garcia Alonso, J. I.


    There are two typical applications of isotopic measurements: the detection of natural variations in isotopic systems and the detection of man-made variations using enriched isotopes as indicators. For both types of measurement, accurate and precise isotope ratio measurements are required. For the so-called non-traditional stable isotopes, multicollector ICP-MS instruments are usually applied. In many cases, chemical separation procedures are required before accurate isotope measurements can be performed. The off-line separation of Rb and Sr or Nd and Sm is the classical procedure employed to eliminate isobaric interferences before multicollector ICP-MS measurement of Sr and Nd isotope ratios. This procedure also allows matrix separation so that precise and accurate Sr and Nd isotope ratios can be obtained. In our laboratory we have evaluated the separation of Rb-Sr and Nd-Sm isobars by liquid chromatography and on-line multicollector ICP-MS detection. The combination of this chromatographic procedure with multiple linear regression of the raw chromatographic data resulted in Sr and Nd isotope ratios with precisions and accuracies typical of off-line sample preparation procedures. On the other hand, methods for the labelling of individual organisms (such as a given plant, fish or animal) are required for population studies. We have developed a dual isotope labelling procedure which can be unique for a given individual, can be inherited in living organisms and is stable. The detection of the isotopic signature is also based on multiple linear regression. The labelling of fish and its detection in otoliths by Laser Ablation ICP-MS will be discussed using trout and salmon as examples. In conclusion, isotope measurement procedures based on multiple linear regression can be a viable alternative in multicollector ICP-MS measurements.

  11. Multiple Retrieval Models and Regression Models for Prior Art Search

    Lopez, Patrice; Romary, Laurent


    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression m...

  12. Improved spatial regression analysis of diffusion tensor imaging for lesion detection during longitudinal progression of multiple sclerosis in individual subjects

    Liu, Bilan; Qiu, Xing; Zhu, Tong; Tian, Wei; Hu, Rui; Ekholm, Sven; Schifitto, Giovanni; Zhong, Jianhui


    Subject-specific longitudinal DTI study is vital for investigation of pathological changes of lesions and disease evolution. Spatial Regression Analysis of Diffusion tensor imaging (SPREAD) is a non-parametric permutation-based statistical framework that combines spatial regression and resampling techniques to achieve effective detection of localized longitudinal diffusion changes within the whole brain at the individual level without a priori hypotheses. However, boundary blurring and dislocation limit its sensitivity, especially towards detecting lesions of irregular shapes. In the present study, we propose an improved SPREAD (dubbed improved SPREAD, or iSPREAD) method by incorporating a three-dimensional (3D) nonlinear anisotropic diffusion filtering method, which provides edge-preserving image smoothing through a nonlinear scale space approach. The statistical inference based on iSPREAD was evaluated and compared with the original SPREAD method using both simulated and in vivo human brain data. Results demonstrated that the sensitivity and accuracy of the SPREAD method have been improved substantially by adapting nonlinear anisotropic filtering. iSPREAD identifies subject-specific longitudinal changes in the brain with improved sensitivity, accuracy, and enhanced statistical power, especially when the spatial correlation is heterogeneous among neighboring image pixels in DTI.

  13. A comparative study of multiple regression analysis and back propagation neural network approaches on plain carbon steel in submerged-arc welding



    Weld bead plays an important role in determining the quality of welding, particularly in high heat input processes. This research paper presents the development of multiple regression analysis (MRA) and artificial neural network (ANN) models to predict weld bead geometry and HAZ width in the submerged arc welding process. Design of experiments is based on Taguchi’s L16 orthogonal array by varying wire feed rate, transverse speed and stick out to develop a multiple regression model, which has been checked for adequacy and significance. Also, an ANN model was built with the back propagation approach in the MATLAB program to predict bead geometry and HAZ width. Finally, the results of the two prediction models were compared and analyzed. It is found that the error related to the prediction of bead geometry and HAZ width is smaller in ANN than in MRA.

  14. Improving Regional Dynamic Downscaling with Multiple Linear Regression Model Using Components Principal Analysis: Precipitation over Amazon and Northeast Brazil

    Aline Gomes da Silva


    In the current context of climate change discussions, predictions of future scenarios of weather and climate are crucial for the generation of information of interest to the global community. Due to the atmosphere being a chaotic system, errors in predictions of future scenarios are systematically observed. Therefore, numerous techniques have been tested in order to generate more reliable predictions, and two techniques have excelled in science: dynamic downscaling, through regional models, and ensemble prediction, combining different outputs of climate models through the arithmetic average, in other words, a postprocessing of the output data species. Thus, this paper proposes a method of postprocessing outputs of regional climate models. This method consists in using the statistical tool multiple linear regression by principal components for combining different simulations obtained by dynamic downscaling with the regional climate model (RegCM4). Tests for the Amazon and Northeast region of Brazil (South America) showed that the method provided a more realistic prediction in terms of average daily rainfall for the analyzed period, compared with the prediction made by the ensemble through the arithmetic average of the simulations. The method captured the extreme events (outliers) that the prediction by averaging missed. Data from the Tropical Rainfall Measuring Mission (TRMM) were used to evaluate the method.

  15. A comparison on parameter-estimation methods in multiple regression analysis with existence of multicollinearity among independent variables

    Hukharnsusatrue, A.


    The objective of this research is to compare methods for estimating multiple regression coefficients when multicollinearity exists among the independent variables. The estimation methods are the Ordinary Least Squares method (OLS), Restricted Least Squares method (RLS), Restricted Ridge Regression method (RRR) and Restricted Liu method (RL), when restrictions are true and when restrictions are not true. The study used the Monte Carlo Simulation method. The experiment was repeated 1,000 times under each situation. The results are as follows. CASE 1: The restrictions are true. In all cases, RRR and RL methods have a smaller Average Mean Square Error (AMSE) than OLS and RLS methods, respectively. RRR method provides the smallest AMSE when the level of correlations is high and also provides the smallest AMSE for all levels of correlations and all sample sizes when standard deviation is equal to 5. However, RL method provides the smallest AMSE when the level of correlations is low and medium, except in the case of standard deviation equal to 3 and small sample sizes, where RRR method provides the smallest AMSE. The AMSE varies with, most to least, respectively, level of correlations, standard deviation and number of independent variables, but inversely with sample size. CASE 2: The restrictions are not true. In all cases, RRR method provides the smallest AMSE, except in the case of standard deviation equal to 1 and error of restrictions equal to 5%, where OLS method provides the smallest AMSE when the level of correlations is low or medium and there is a large sample size, but for small sample sizes RL method provides the smallest AMSE. In addition, when error of restrictions is increased, OLS method provides the smallest AMSE for all levels of correlations and all sample sizes, except when the level of correlations is high and sample sizes small. Moreover, the case OLS method provides the smallest AMSE, the most RLS method has a smaller AMSE than
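
    The kind of Monte Carlo comparison described in this abstract can be sketched for the simplest pair of estimators. The code below is a hedged illustration, not the study's design: it compares OLS with plain (unrestricted) ridge regression only, omits the restricted estimators (RLS, RRR, RL), fits no intercept, and every setting (`rho`, `n`, `p`, `sigma`, the ridge constant `k`, the true coefficients) is an assumption chosen for the example.

```python
import numpy as np

def simulate_amse(rho=0.99, n=30, p=3, sigma=5.0, k=1.0, reps=1000, seed=0):
    """Average squared estimation error of OLS and ridge over `reps` datasets.

    Regressors share a common factor so that each pair has correlation rho.
    """
    rng = np.random.default_rng(seed)
    beta = np.ones(p)                      # hypothetical true coefficients
    se_ols = se_ridge = 0.0
    for _ in range(reps):
        z = rng.normal(size=(n, 1))        # common factor inducing collinearity
        X = np.sqrt(rho) * z + np.sqrt(1.0 - rho) * rng.normal(size=(n, p))
        y = X @ beta + rng.normal(scale=sigma, size=n)
        XtX, Xty = X.T @ X, X.T @ y
        b_ols = np.linalg.solve(XtX, Xty)
        b_ridge = np.linalg.solve(XtX + k * np.eye(p), Xty)
        se_ols += np.sum((b_ols - beta) ** 2)
        se_ridge += np.sum((b_ridge - beta) ** 2)
    return se_ols / reps, se_ridge / reps

amse_ols, amse_ridge = simulate_amse()
print(f"AMSE OLS:   {amse_ols:.2f}")
print(f"AMSE ridge: {amse_ridge:.2f}")
```

    Under strong collinearity the smallest eigenvalues of X'X are close to zero, which inflates the OLS variance; adding k to the diagonal trades a small bias for a large variance reduction, so the ridge AMSE comes out well below the OLS AMSE in this setting.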

  16. Interpretation of Standardized Regression Coefficients in Multiple Regression.

    Thayer, Jerome D.

    The extent to which standardized regression coefficients (beta values) can be used to determine the importance of a variable in an equation was explored. The beta value and the part correlation coefficient--also called the semi-partial correlation coefficient and reported in squared form as the incremental "r squared"--were compared for variables…

  17. Regression Computer Programs for Setwise Regression and Three Related Analysis of Variance Techniques.

    Williams, John D.; Lindem, Alfred C.

    Four computer programs using the general purpose multiple linear regression program have been developed. Setwise regression analysis is a stepwise procedure for sets of variables; there will be as many steps as there are sets. Covarmlt allows a solution to the analysis of covariance design with multiple covariates. A third program has three…

  18. Estimating the input function non-invasively for FDG-PET quantification with multiple linear regression analysis: simulation and verification with in vivo data

    A novel statistical method, namely Regression-Estimated Input Function (REIF), is proposed in this study for the purpose of non-invasive estimation of the input function for fluorine-18 2-fluoro-2-deoxy-d-glucose positron emission tomography (FDG-PET) quantitative analysis. We collected 44 patients who had undergone a blood sampling procedure during their FDG-PET scans. First, we generated tissue time-activity curves of the grey matter and the whole brain with a segmentation technique for every subject. Summations of different intervals of these two curves were used as a feature vector, which also included the net injection dose. Multiple linear regression analysis was then applied to find the correlation between the input function and the feature vector. After a simulation study with in vivo data, the data of 29 patients were applied to calculate the regression coefficients, which were then used to estimate the input functions of the other 15 subjects. Comparing the estimated input functions with the corresponding real input functions, the averaged error percentages of the area under the curve and the cerebral metabolic rate of glucose (CMRGlc) were 12.13±8.85 and 16.60±9.61, respectively. Regression analysis of the CMRGlc values derived from the real and estimated input functions revealed a high correlation (r=0.91). No significant difference was found between the real CMRGlc and that derived from our regression-estimated input function (Student's t test, P>0.05). The proposed REIF method demonstrated good abilities for input function and CMRGlc estimation, and represents a reliable replacement for the blood sampling procedures in FDG-PET quantification. (orig.)

  19. Regression, Discriminant Analysis, and Canonical Correlation Analysis with Homals

    Jan de Leeuw


    It is shown that the homals package in R can be used for multiple regression, multi-group discriminant analysis, and canonical correlation analysis. The homals solutions are only different from the more conventional ones in the way the dimensions are scaled by the eigenvalues.

  20. Entrepreneurial intention modeling using hierarchical multiple regression

    Marina Jeger


    The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
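
    The dependence on order of entry mentioned in this abstract can be made concrete with a small simulation. This is a sketch under stated assumptions, not the study's model: the data are synthetic, and `block_a` (standing in for established predictors) and `block_b` (standing in for a newly added, correlated predictor) are hypothetical names.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X, with an intercept added."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
n = 200
block_a = rng.normal(size=(n, 2))                          # hypothetical established predictors
block_b = 0.6 * block_a[:, :1] + rng.normal(size=(n, 1))   # hypothetical new predictor, correlated with block_a
y = block_a @ np.array([1.0, 0.5]) + 0.4 * block_b[:, 0] + rng.normal(size=n)

r2_a = r_squared(block_a, y)                               # step 1: block A only
r2_ab = r_squared(np.column_stack([block_a, block_b]), y)  # step 2: A then B
r2_b = r_squared(block_b, y)                               # B entered first instead

delta_b_last = r2_ab - r2_a    # variance B explains over and above A
print(f"R^2 (A)       = {r2_a:.3f}")
print(f"R^2 (A + B)   = {r2_ab:.3f}")
print(f"Delta R^2 for B entered last  = {delta_b_last:.3f}")
print(f"R^2 for B entered first       = {r2_b:.3f}")
```

    Because the blocks are correlated, the variance credited to block B is much larger when it is entered first than when it is entered after block A, which is the reason the order of entry has to be justified on theoretical grounds.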

  1. Heteroscedastic regression analysis method for mixed data

    FU Hui-min; YUE Xiao-rui


    The heteroscedastic regression model was established and the heteroscedastic regression analysis method was presented for mixed data composed of complete data, type-I censored data and type-II censored data from the location-scale distribution. The best unbiased estimations of regression coefficients, as well as the confidence limits of the location parameter and scale parameter, were given. Furthermore, the point estimations and confidence limits of percentiles were obtained. Thus, the traditional multiple regression analysis method, which is only suitable for complete data from a normal distribution, can be extended to the cases of heteroscedastic mixed data and the location-scale distribution. The presented method therefore has a broad range of promising applications.

  2. Pricing Single Malt Whisky : A Regression Analysis

    Bjartmar Hylta, Sanna; Lundquist, Emma


    This thesis examines the factors that affect the price of whisky. Multiple regression analysis is used to model the relationship between the identified covariates that are believed to impact the price of whisky. The optimal marketing strategy for whisky producers in the regions Islay and Campbeltown is discussed. This analysis is based on the Marketing Mix. Furthermore, a Porter’s five forces analysis, focusing on the regions Campbeltown and Islay, is examined. Finally the findings are summar...

  3. Multiple Retrieval Models and Regression Models for Prior Art Search

    Lopez, Patrice


    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend.

  4. Dimension Reduction of the Explanatory Variables in Multiple Linear Regression

    Filzmoser, P.; Croux, Christophe


    Abstract: In classical multiple linear regression analysis problems will occur if the regressors are either multicollinear or if the number of regressors is larger than the number of observations. In this note a new method is introduced which constructs orthogonal predictor variables in a way to have a maximal correlation with the dependent variable. The predictor variables are linear combinations of the original regressors. This method allows a major reduction of the number of predictors ...

  5. Shrinkage Estimation and Selection for Multiple Functional Regression



    Functional linear regression is a useful extension of simple linear regression and has been investigated by many researchers. However, the functional variable selection problem when multiple functional observations exist, which is the counterpart in the functional context of multiple linear regression, is seldom studied. Here we propose a method using the group smoothly clipped absolute deviation penalty (gSCAD), which can perform regression estimation and variable selection simultaneously. We show t...

  6. Multiple Linear Regression Model Used in Economic Analyses

    Constantin ANGHELACHE; Madalina Gabriela ANGHEL; Ligia PRODAN; Cristina SACALA; Marius POPOVICI


    Multiple regression is a tool that makes it possible to analyze the correlations between more than two variables, a situation that accounts for most cases in macro-economic studies. The best-known estimation method for multiple regression is the method of least squares. As in two-variable regression, we choose the sample regression function and minimize the sum of squared residuals. Another method that allows us to take into account the number of variables factor when d...
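
    The least-squares step described in this abstract can be sketched as follows. This is only an illustrative sketch, not the authors' procedure: the helper names `fit_ols` and `sum_squared_residuals` and the small noise-free data set are assumptions made up for the example.

```python
import numpy as np

def fit_ols(X, y):
    """Return least-squares coefficients [intercept, b1, ..., bk] for y ~ X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(y)), X])
    # lstsq minimizes the sum of squared residuals ||A @ beta - y||^2.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def sum_squared_residuals(X, y, beta):
    """Sum of squared differences between observed and fitted values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = beta[0] + X @ beta[1:]
    return float(np.sum((y - fitted) ** 2))

# Hypothetical, noise-free data generated from y = 1 + 2*x1 - 3*x2.
X = [[0, 1], [1, 0], [2, 2], [3, 1], [4, 3]]
y = [-2, 3, -1, 4, 0]

beta = fit_ols(X, y)
print("coefficients:", beta)
print("sum of squared residuals:", sum_squared_residuals(X, y, beta))
```

    Because the example data contain no noise, the fitted coefficients match the generating values up to floating-point error; with real data the residual sum is positive and the coefficients are estimates.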

  7. Study relationship between inorganic and organic coal analysis with gross calorific value by multiple regression and ANFIS

    Chelgani, S.C.; Hart, B.; Grady, W.C.; Hower, J.C.


    The relationship between maceral content plus mineral matter and gross calorific value (GCV) for a wide range of West Virginia coal samples (from 6518 to 15330 BTU/lb; 15.16 to 35.66 MJ/kg) has been investigated by multivariable regression and adaptive neuro-fuzzy inference system (ANFIS). The stepwise least square mathematical method comparison between liptinite, vitrinite, plus mineral matter as input data sets with measured GCV reported a nonlinear correlation coefficient (R2) of 0.83. Using the same data set the correlation between the predicted GCV from the ANFIS model and the actual GCV reported an R2 value of 0.96. It was determined that the GCV-based prediction methods, as used in this article, can provide a reasonable estimation of GCV. Copyright © Taylor & Francis Group, LLC.

  8. Fuzzy multiple linear regression: A computational approach

    Juang, C. H.; Huang, X. H.; Fleming, J. W.


    This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.

  9. Credit Scoring Problem Based on Regression Analysis

    Khassawneh, Bashar Suhil Jad Allah


    ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....

  10. Quantifying TiO2 Abundance of Lunar Soils: Partial Least Squares and Stepwise Multiple Regression Analysis for Determining Causal Effect

    Lin Li


    Partial least squares (PLS) regression was applied to the Lunar Soil Characterization Consortium (LSCC) dataset for spectral estimation of TiO2. The LSCC dataset was split into a number of subsets, including the low-Ti, high-Ti, total mare, total highland, Apollo 16, and Apollo 14 soils, to investigate the effects of interfering minerals and nonlinearity on the PLS performance. The PLS weight loading vectors were analyzed through stepwise multiple regression analysis (SMRA) to identify mineral species driving and interfering with the PLS performance. PLS exhibits high performance for estimating TiO2 for the LSCC low-Ti and high-Ti mare samples and both groups analyzed together. The results suggest that while the dominant TiO2-bearing minerals are few, additional PLS factors are required to compensate for the effects on the important PLS factors of minerals that are not highly correlated to TiO2, to accommodate nonlinear relationships between reflectance and TiO2, and to correct inconsistent mineral-TiO2 correlations between the high-Ti and low-Ti mare samples. Analysis of the LSCC highland soil samples indicates that the Apollo 16 soils are responsible for the large errors of TiO2 estimates when the soils are modeled with other subgroups. For the LSCC Apollo 16 samples, the dominant spectral effects of plagioclase over other dark minerals are primarily responsible for large errors of estimated TiO2. For the Apollo 14 soils, more accurate estimation of TiO2 is attributed to the positive correlation between a major TiO2-bearing component and TiO2, explaining why the Apollo 14 soils follow the regression trend when analyzed with other soil groups.

  11. A combined multiple regression-time series approach to process capability analysis when data are auto correlated

    The problem of performing process capability analysis when autocorrelations are present is discussed. It is shown that when the systematic nonrandom phenomenon induced by autocorrelation is ignored, the variance estimate obtained from the original data is no longer an appropriate estimate for use in the process capability analyses. A remedial measure based on an autoregressive integrated moving average model is proposed. It is also shown that the process variance estimated from the residual analysis yields appropriate results for the process capability indices.

  12. Exploring the equity of GP practice prescribing rates for selected coronary heart disease drugs: a multiple regression analysis with proxies of healthcare need

    St Leger Antony S


    Background: There is a small but growing body of literature highlighting inequities in GP practice prescribing rates for many drug therapies. The aim of this paper is to further explore the equity of prescribing for five major CHD drug groups and to quantify the amount of variation in GP practice prescribing rates that can be explained by a range of healthcare needs indicators (HCNIs). Methods: The study involved a cross-sectional secondary analysis in four primary care trusts (PCTs 1–4) in the North West of England, including 132 GP practices. Prescribing rates (average daily quantities per registered patient aged over 35 years) and HCNIs were developed for all GP practices. Analysis was undertaken using multiple linear regression. Results: Between 22–25% of the variation in prescribing rates for statins, beta-blockers and bendrofluazide was explained in the multiple regression models. Slightly more variation was explained for ACE inhibitors (31.6%) and considerably more for aspirin (51.2%). Prescribing rates were positively associated with CHD hospital diagnoses and procedures for all drug groups other than ACE inhibitors. The proportion of patients aged 55–74 years was positively related to all prescribing rates other than aspirin, where they were positively related to the proportion of patients aged >75 years. However, prescribing rates for statins and ACE inhibitors were negatively associated with the proportion of patients aged >75 years in addition to the proportion of patients from minority ethnic groups. Prescribing rates for aspirin, bendrofluazide and all CHD drugs combined were negatively associated with deprivation. Conclusion: Although around 25–50% of the variation in prescribing rates was explained by HCNIs, this varied markedly between PCTs and drug groups. Prescribing rates were generally characterised by both positive and negative associations with HCNIs, suggesting possible inequities in prescribing rates on the basis

  13. Elevated-temperature, strain-controlled fatigue data on Type 304 stainless steel. A compilation, multiple linear regression model, and statistical analysis

    Diercks, D R; Raske, D T


    The available elevated-temperature, strain-controlled, uniaxial fatigue data on Type 304 stainless steel (474 data points) are tabulated, and variables that influence cyclic life are divided into first- and second-order categories. The first-order variables, which include strain range, strain rate, temperature, and hold time, were used in a multiple linear regression analysis to describe the observed variation in fatigue life for zero and tension hold-time data. Goodness of fit, with respect to these variables, as well as the appropriateness of the transformations used are discussed. Prediction intervals are estimated, and comparisons between the regression equation curves and the data from which they were obtained are made. The second-order variables include the laboratories at which the data were generated, the different heats from which the test specimens were fabricated, and the heat treatments that preceded testing. These variables were statistically analyzed to determine their effect on fatigue life. The results are discussed, and the heats and heat treatments that are most resistant to fatigue damage under these loading and environmental conditions are identified.

  14. An Additive-Multiplicative Cox-Aalen Regression Model

    Scheike, Thomas H.; Zhang, Mei-Jie


    Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects...

  15. Prognostic factors in patients with cervix cancer treated by radiation therapy: results of a multiple regression analysis

    A retrospective analysis of 965 patients with invasive cervix cancer treated by radiation therapy between 1976 and 1981 was performed in order to evaluate prognostic factors for disease-free survival (DFS) and pelvic control. FIGO stage was the most powerful prognostic factor followed by radiation dose and treatment duration (P values = 0.0001). If the analysis was limited to patients treated with radical doses of 75 Gy or more, dose was no longer significant. Young age at diagnosis, non-squamous histology and transfusion during treatment were also adverse prognostic factors for survival and control. Para-aortic nodal involvement on lymphogram was associated with a reduction in DFS (P = 0.0027), whereas pelvic lymph node involvement alone was not. In patients with Stage I and IIA disease, tumour size was the most powerful prognostic factor for survival (P = 0.0001) and the extent of pelvic sidewall involvement was significant in patients with Stage III tumours (P = 0.007). Histological grade appeared to be a predictive factor but was only recorded in 712 patients. These features should be considered in the staging of patients and in the design of clinical trials

  16. Retail sales forecasting with application the multiple regression

    Kuzhda, Tetyana


    The article begins with a formulation for predictive learning called the multiple regression model. A theoretical approach to the construction of regression models is described. The key information in the article is the mathematical formulation of the linear forecast equation that estimates the multiple regression model. The calculation of the quantitative forecast value of the dependent variable under the influence of the independent variables is explained. This paper presents retail sales forecasting with multiple regression model estimation. Some of the most important decisions a retailer makes can draw on information obtained from multiple regression. Recently, the retail environment has been changing as a result of expected consumer income and advertising costs. Checks of the model's goodness of fit and statistical significance are explored in the article. Finally, the quantitative value of the retail sales forecast based on the multiple regression model is calculated.

  17. The use of multiple linear regression in property valuation

    Marko Pejić


    Property appraisal is of great importance for a country and its economy. Nowadays, a successful land management system cannot be imagined without a subsystem related to the market economy. Having information about land and its value offers broad possibilities for the market economy and strongly influences the development of the real estate market. Special attention should be paid to mass appraisal methods and their use in developing the tax system and a framework for an appropriate property appraisal system. Multiple regression analysis is just one of the methods used for this purpose, and this article focuses on its characteristics and advantages in mass appraisal system development.

  18. Non-destructive evaluation of chlorophyll content in quinoa and amaranth leaves by simple and multiple regression analysis of RGB image components.

    Riccardi, M; Mele, G; Pulvento, C; Lavini, A; d'Andria, R; Jacobsen, S-E


    Leaf chlorophyll content provides valuable information about physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is expensive, laborious, and time consuming. Over the years alternative methods, rapid and non-destructive, have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimation of chlorophyll content in quinoa and amaranth leaves based on RGB components analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll provided by standard laboratory procedure. Single and multiple regression models using RGB color components as independent variables have been tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method. Sensitivity of the best regression models for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quicker, more effective, and lower in cost than SPAD. The proposed RGB models provided better correlation (highest R2) and prediction (lowest RMSEP) of the true value of foliar chlorophyll content and had a lower amount of noise in the whole range of chlorophyll studied compared with SPAD and other leaf image processing based models when applied to quinoa and amaranth. PMID:24442792

  19. Regression analysis with categorized regression calibrated exposure: some interesting findings

    Hjartåker Anette


    Background: Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods: We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC). Results: In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis. Conclusion: Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a

  20. Fuzzy Multiple Regression Model for Estimating Software Development Time

    Venus Marza


    As software becomes more complex and its scope dramatically increases, research on methods for estimating software development time has become steadily more important, and accurate estimation is the main goal of software managers seeking to reduce project risk. The purpose of this article is to introduce a new Fuzzy Multiple Regression approach, which has higher accuracy than other estimation methods. Furthermore, we compare the Fuzzy Multiple Regression model with the Fuzzy Logic model and the Multiple Regression model on the basis of their accuracy.

  1. A multiple covariance approach to PLS regression with several predictor groups: Structural Equation Exploratory Regression

    Bry, Xavier; Verron, Thomas; Cazes, Pierre


    A variable group Y is assumed to depend upon R thematic variable groups X_1, ..., X_R. We assume that components in Y depend linearly upon components in the X_r's. In this work, we propose a multiple covariance criterion which extends that of PLS regression to this multiple predictor groups situation. On this criterion, we build a PLS-type exploratory method - Structural Equation Exploratory Regression (SEER) - that allows to simultaneously perform dimension reduction in groups and investiga...

  2. Synthesis analysis of regression models with a continuous outcome

    Zhou, Xiao-Hua; Hu, Nan; Hu, Guizhou; Root, Martin


    To estimate the multivariate regression model from multiple individual studies, it would be challenging to obtain results if the input from individual studies only provide univariate or incomplete multivariate regression information. Samsa et al. (J. Biomed. Biotechnol. 2005; 2:113–123) proposed a simple method to combine coefficients from univariate linear regression models into a multivariate linear regression model, a method known as synthesis analysis. However, the validity of this method...


    Barbu Bogdan POPESCU; Lavinia Stefania TOTAN


    Econometric models developed for the analysis of banking exclusion in the economic crisis are presented. Access to public goods and services is a "sine qua non" condition for an open and efficient society. In our opinion, making banking and payment services available to the entire population without discrimination should be the primary objective of public service policy.

  5. ERP correlates of word production predictors in picture naming: a trial by trial multiple regression analysis from stimulus onset to response

    Valente, Andrea; Bürki, Audrey; Laganaro, Marina


    A major effort in cognitive neuroscience of language is to define the temporal and spatial characteristics of the core cognitive processes involved in word production. One approach consists in studying the effects of linguistic and pre-linguistic variables in picture naming tasks. So far, studies have analyzed event-related potentials (ERPs) during word production by examining one or two variables with factorial designs. Here we extended this approach by investigating simultaneously the effects of multiple theoretical relevant predictors in a picture naming task. High density EEG was recorded on 31 participants during overt naming of 100 pictures. ERPs were extracted on a trial by trial basis from picture onset to 100 ms before the onset of articulation. Mixed-effects regression models were conducted to examine which variables affected production latencies and the duration of periods of stable electrophysiological patterns (topographic maps). Results revealed an effect of a pre-linguistic variable, visual complexity, on an early period of stable electric field at scalp, from 140 to 180 ms after picture presentation, a result consistent with the proposal that this time period is associated with visual object recognition processes. Three other variables, word Age of Acquisition, Name Agreement, and Image Agreement influenced response latencies and modulated ERPs from ~380 ms to the end of the analyzed period. These results demonstrate that a topographic analysis fitted into the single trial ERPs and covering the entire processing period allows one to associate the cost generated by psycholinguistic variables to the duration of specific stable electrophysiological processes and to pinpoint the precise time-course of multiple word production predictors at once. PMID:25538546

  6. Multiple regression analyses in the prediction of aerospace instrument costs

    Tran, Linh

    The aerospace industry has been investing for decades in ways to improve its efficiency in estimating the project life cycle cost (LCC). One of the major focuses in the LCC is the cost/prediction of aerospace instruments done during the early conceptual design phase of the project. The accuracy of early cost predictions affects the project scheduling and funding, and it is often the major cause for project cost overruns. The prediction of instruments' cost is based on the statistical analysis of these independent variables: Mass (kg), Power (watts), Instrument Type, Technology Readiness Level (TRL), Destination: earth orbiting or planetary, Data rates (kbps), Number of bands, Number of channels, Design life (months), and Development duration (months). This author is proposing a cost prediction approach of aerospace instruments based on these statistical analyses: Clustering Analysis, Principal Components Analysis (PCA), Bootstrap, and multiple regressions (both linear and non-linear). In the proposed approach, the Cost Estimating Relationship (CER) will be developed for the dependent variable Instrument Cost by using a combination of multiple independent variables. "The Full Model" will be developed and executed to estimate the full set of nine variables. The SAS program, Excel, Automatic Cost Estimating Integrate Tool (ACEIT) and Minitab are the tools to aid the analysis. Through the analysis, the cost drivers will be identified which will help develop an ultimate cost estimating software tool for the Instrument Cost prediction and optimization of future missions.

  7. Gaussian process regression analysis for functional data

    Shi, Jian Qing


    Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables. Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime

  8. Multiple kernel support vector regression for pricing nifty option

    Neetu Verma


    The goal of the present experiments is to investigate the use of multiple kernel learning as a tool for pricing options in the context of the Indian stock market for Nifty index options. In this paper, the fair price of an option is predicted by Multiple Kernel Support Vector Regression (MKLSVR), using linear combinations of kernels, and by Single Kernel Support Vector Regression (SKSVR). Option prices depend strongly on different money market conditions: deep-in-the-money, in-the-money, at-the-money, out-of-money and deep-out-of-money. The experimental study attempts to identify the forecasting errors with the help of the mean square error, root mean square error, and normalized root mean square error between the market option prices and the option prices calculated by the model for all market conditions. The results show that multiple kernel support vector regression performed fairly well in comparison to support vector regression with a single kernel.
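
    The three error measures named in this abstract can be computed as in the sketch below. The function name `error_metrics` and the sample prices are hypothetical, and the abstract does not say how the root mean square error is normalized; dividing by the range of the market prices is an assumption chosen for the example (dividing by the mean is another common convention).

```python
import numpy as np

def error_metrics(market_prices, model_prices):
    """Return (MSE, RMSE, NRMSE) between market and model option prices."""
    market = np.asarray(market_prices, dtype=float)
    model = np.asarray(model_prices, dtype=float)
    mse = float(np.mean((model - market) ** 2))
    rmse = float(np.sqrt(mse))
    # Assumption: normalize by the range of the observed market prices.
    nrmse = rmse / float(market.max() - market.min())
    return mse, rmse, nrmse

# Hypothetical prices, made up for illustration only.
market = [10.0, 12.0, 14.0, 16.0]
model = [11.0, 11.0, 15.0, 15.0]

mse, rmse, nrmse = error_metrics(market, model)
print(mse, rmse, nrmse)
```

    Each model price here is off by exactly one unit, so the mean square error and its root are both 1, and the normalized value is 1 divided by the price range of 6.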

  9. Sample Sizes when Using Multiple Linear Regression for Prediction

    Knofczynski, Gregory T.; Mundfrom, Daniel


    When using multiple regression for prediction purposes, the issue of minimum required sample size often needs to be addressed. Using a Monte Carlo simulation, models with varying numbers of independent variables were examined and minimum sample sizes were determined for multiple scenarios at each number of independent variables. The scenarios…

  10. Vehicle Travel Time Predication based on Multiple Kernel Regression

    Wenjing Xu


    With the rapid development of the transportation and logistics economy, vehicle travel time prediction and planning have become an important topic in logistics. Travel time prediction, which is indispensable for traffic guidance, has become a key issue for researchers in this field. At present, travel time prediction is mainly short-term prediction, and the prediction methods include artificial neural networks, the Kalman filter and the support vector regression (SVR) method. However, these algorithms still have some shortcomings, such as high computational complexity and a slow convergence rate. This paper exploits the learning ability of multiple kernel learning regression (MKLR) in nonlinear prediction to carry out logistics planning based on MKLR for vehicle travel time prediction. The method for vehicle travel time prediction includes the following steps: (1) preprocessing historical data; (2) selecting an appropriate kernel function, training on the historical data and performing analysis; (3) predicting the vehicle travel time based on the trained model. The experimental results, obtained by comparing different prediction methods, show that the vehicle travel time prediction method proposed in this paper achieves higher accuracy than other methods. They also illustrate the feasibility and effectiveness of the proposed prediction method.

  11. On relationship between regression models and interpretation of multiple regression coefficients

    Varaksin, A. N.; Panov, V. G.


    In this paper, we consider the problem of interpreting the coefficients of a linear regression equation in the case of correlated predictors. It is shown that in general there are no natural ways of interpreting these coefficients similar to the case of a single predictor. Nevertheless, we suggest linear transformations of the predictors that reduce multiple regression to a simple one while retaining the coefficient of the variable of interest. The new variable can be treated as the part of the old variable that has no ...
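
    One standard result in this spirit is the Frisch-Waugh-Lovell partialling-out identity: regressing the response on the residual of the predictor of interest, after that predictor has been regressed on the other predictors, reproduces its multiple-regression coefficient. Whether this is the transformation the authors propose is an assumption, since the abstract is truncated; the sketch below only illustrates the general identity on simulated data, and all variable names are hypothetical.

```python
import numpy as np

def ols(A, y):
    """Least-squares coefficients for the design matrix A and response y."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# Simulated data with two correlated predictors (illustration only).
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x1 = 0.7 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)
ones = np.ones(n)

# Multiple regression of y on an intercept, x1 and x2.
beta_full = ols(np.column_stack([ones, x1, x2]), y)

# Partial out the intercept and x2 from x1, keeping the residual.
Z = np.column_stack([ones, x2])
x1_resid = x1 - Z @ ols(Z, x1)

# Simple regression slope of y on the residualized predictor.
slope_simple = (x1_resid @ y) / (x1_resid @ x1_resid)

print(beta_full[1], slope_simple)
```

    The two printed values agree up to floating-point error, so the residualized predictor carries exactly the variation in x1 that the multiple-regression coefficient is based on.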

  12. Steganalysis of LSB Image Steganography using Multiple Regression and Auto Regressive (AR) Model

    Souvik Bhattacharyya


    The staggering growth in communication technology and usage of public domain channels (i.e. the Internet) has greatly facilitated transfer of data. However, such open communication channels have greater vulnerability to security threats causing unauthorized information access. Traditionally, encryption is used to realize communication security. However, important information is not protected once decoded. Steganography is the art and science of communicating in a way which hides the existence of the communication. Important information is first hidden in host data, such as a digital image, text, video or audio, and then transmitted secretly to the receiver. Steganalysis is another important topic in information hiding, which is the art of detecting the presence of steganography. In this paper a novel technique for the steganalysis of images has been presented. The proposed technique uses an auto-regressive model to detect the presence of hidden messages, as well as to estimate the relative length of the embedded messages. Various auto-regressive parameters are used to classify cover images as well as stego images with the help of an SVM classifier. Multiple regression analysis of the cover carrier along with the stego carrier has been carried out in order to detect the existence of even a negligible amount of the secret message. Experimental results demonstrate the effectiveness and accuracy of the proposed technique.

  13. Teasing out the effect of tutorials via multiple regression

    Chasteen, Stephanie V.


    We transformed an upper-division physics course using a variety of elements, including homework help sessions, tutorials, clicker questions with peer instruction, and explicit learning goals. Overall, the course transformations improved student learning, as measured by our conceptual assessment. Since these transformations were multi-faceted, we would like to understand the impact of individual course elements. Attendance at tutorials and homework help sessions was optional, and occurred outside the class environment. In order to identify the impact of these optional out-of-class sessions, given self-selection effects in student attendance, we performed a multiple regression analysis. Even when background variables are taken into account, tutorial attendance is positively correlated with student conceptual understanding of the material - though not with performance on course exams. Other elements that increase student time-on-task, such as homework help sessions and lectures, do not achieve the same impacts.

  14. Regression and regression analysis time series prediction modeling on climate data of Quetta, Pakistan

    Various statistical techniques were used on five-year data (1998-2002) of average humidity, rainfall, and maximum and minimum temperatures. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters, on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit for our polynomial regression analysis time series (PRATS). The correlation to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rank correlation and the Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS and indicated homoscedasticity in both. We also employed Bartlett's test for homogeneity of variances on the five-year rainfall and humidity data, which showed that the variances in the rainfall data were not homogeneous, while those in the humidity data were. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)

  15. Simultaneous confidence bands in linear regression analysis

    Ah-Kine, Pascal Soon Shien


    A simultaneous confidence band provides useful information on the plausible range of an unknown regression model. For a simple linear regression model, the most frequently quoted bands in the statistical literature include the two-segment band, the three-segment band and the hyperbolic band, and for a multiple linear regression model, the most common bands in the statistical literature include the hyperbolic band and the constant width band. The optimality criteria for confid...

  16. Computing multiple-output regression quantile regions from projection quantiles

    Paindaveine, D.; Šiman, Miroslav


    Roč. 27, č. 1 (2012), s. 29-49. ISSN 0943-4062 R&D Projects: GA MŠk(CZ) 1M06047 Institutional research plan: CEZ:AV0Z10750506 Keywords : directional quantile * halfspace depth * multiple-output regression * parametric programming * quantile regression Subject RIV: BA - General Mathematics Impact factor: 0.482, year: 2012

  17. Multivariate quantiles and multiple-output regression quantiles

    Hallin, Marc; Paindaveine, Davy; Siman, Miroslav


    A new multivariate concept of quantile, based on a directional version of Koenker and Bassett's traditional regression quantiles, is introduced for multivariate location and multiple-output regression problems. In their empirical version, those quantiles can be computed efficiently via linear programming techniques. Consistency, Bahadur representation and asymptotic normality results are established. Most importantly, the contours generated by those quantiles are shown to coincide with the cl...

  18. Local bilinear multiple-output quantile/depth regression

    Hallin, Marc; Lu, Zudi; Paindaveine, Davy; Šiman, Miroslav


    A new quantile regression concept, based on a directional version of Koenker and Bassett’s traditional single-output one, has been introduced in [Ann. Statist. (2010) 38 635–669] for multiple-output location/linear regression problems. The polyhedral contours provided by the empirical counterpart of that concept, however, cannot adapt to unknown nonlinear and/or heteroskedastic dependencies. This paper therefore introduces local constant and local linear (actually, bilinear) versions o...

  19. Local Constant and Local Bilinear Multiple-Output Quantile Regression

    Hallin, Marc; Lu, Zudi; Paindaveine, Davy; Siman, Miroslav


    A new quantile regression concept, based on a directional version of Koenker and Bassett’s traditional single-output one, has been introduced in [Hallin, Paindaveine and Šiman, Annals of Statistics 2010, 635-703] for multiple-output regression problems. The polyhedral contours provided by the empirical counterpart of that concept, however, cannot adapt to nonlinear and/or heteroskedastic dependencies. This paper therefore introduces local constant and local linear versio...

  20. Elliptical multiple-output quantile regression and convex optimization

    Hallin, M.; Šiman, Miroslav


    Roč. 109, č. 1 (2016), s. 232-237. ISSN 0167-7152 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords : quantile regression * elliptical quantile * multivariate quantile * multiple-output regression Subject RIV: BA - General Mathematics Impact factor: 0.595, year: 2014

  1. Analysis of Inflation in Turkey via Ridge Regression

    Duygu Tunali; Emel Şıiklar


    The aim of this study is to analyze inflation in Turkey between the years 2003-2014 and to compare inflation in that period with inflation in the years 1963-1983. When multiple linear regression modeling is used for inflation analysis, a multicollinearity problem occurs between the independent variables. To eliminate this problem, the study uses ridge regression, which is one of the biased estimation methods. The ridge regression method gives smaller mean...

  2. A field operational test on valve-regulated lead-acid absorbent-glass-mat batteries in micro-hybrid electric vehicles. Part II. Results based on multiple regression analysis and tear-down analysis

    Schaeck, S.; Karspeck, T.; Ott, C.; Weirather-Koestner, D.; Stoermer, A. O.


    In the first part of this work [1] a field operational test (FOT) on micro-HEVs (hybrid electric vehicles) and conventional vehicles was introduced. Valve-regulated lead-acid (VRLA) batteries in absorbent glass mat (AGM) technology and flooded batteries were applied. The FOT data were analyzed by kernel density estimation. In this publication multiple regression analysis is applied to the same data. Square regression models without interdependencies are used. Capacity loss serves as the dependent parameter, and several battery-related and vehicle-related parameters serve as independent variables. Battery temperature is found to be the most critical parameter. It is proven that flooded batteries operated in the conventional power system (CPS) degrade faster than VRLA-AGM batteries in the micro-hybrid power system (MHPS). A smaller number of FOT batteries were applied in a vehicle-assigned test design where the test battery is repeatedly mounted in a unique test vehicle. Thus, vehicle category and specific driving profiles can be taken into account in multiple regression. Both parameters have only secondary influence on battery degradation; instead, extended vehicle rest time linked to low mileage performance is more serious. A tear-down analysis was accomplished for selected VRLA-AGM batteries operated in the MHPS. Clear indications are found that pSoC-operation with periodic full charging of the battery (refresh charging) does not result in sulphation of the negative electrode. Instead, the batteries show corrosion of the positive grids and weak adhesion of the positive active mass.

  3. Functional linear regression via canonical analysis

    He, Guozhong; Wang, Jane-Ling; Yang, Wenjing; doi:10.3150/09-BEJ228


    We study regression models for the situation where both dependent and independent variables are square-integrable stochastic processes. Questions concerning the definition and existence of the corresponding functional linear regression models and some basic properties are explored for this situation. We derive a representation of the regression parameter function in terms of the canonical components of the processes involved. This representation establishes a connection between functional regression and functional canonical analysis and suggests alternative approaches for the implementation of functional linear regression analysis. A specific procedure for the estimation of the regression parameter function using canonical expansions is proposed and compared with an established functional principal component regression approach. As an example of an application, we present an analysis of mortality data for cohorts of medflies, obtained in experimental studies of aging and longevity.

  4. Theoretical Aspects Regarding the Use of the Multiple Linear Regression Model in Economic Analyses

    Constantin ANGHELACHE; Ioan PARTACHI; Adina Mihaela DINU; Ligia PRODAN; Georgeta BARDAŞU (LIXANDRU)


    In this paper we have studied the dependence between GDP, final consumption and net investments. To analyze this correlation, the article proposes a multiple regression model, an extremely useful tool in economic analysis. The regression model described in the article treats GDP as the outcome variable and final consumption and net investment as factorial variables.

  5. Confidence Intervals for an Effect Size Measure in Multiple Linear Regression

    Algina, James; Keselman, H. J.; Penfield, Randall D.


    The increase in the squared multiple correlation coefficient (ΔR²) associated with a variable in a regression equation is a commonly used measure of importance in regression analysis. The coverage probability that an asymptotic and percentile bootstrap confidence interval includes Δρ² was investigated. As expected,…
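
    The effect size named in this abstract can be illustrated with a small numerical sketch. The example below is not the authors' procedure: it assumes synthetic data, NumPy's least-squares solver, and hypothetical names (`r_squared`, `delta_r2`). It computes the increase in R² when a second predictor is added to a one-predictor model, then forms a plain percentile bootstrap interval for that increase.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Synthetic data (an assumption for illustration): two correlated predictors.
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 0.7 * x1 + 0.4 * x2 + rng.normal(size=n)


def r_squared(X, y):
    """Squared multiple correlation of y on the columns of X (intercept added here)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def delta_r2(x1, x2, y):
    """Increase in R^2 when x2 is added to a model that already contains x1."""
    reduced = r_squared(x1.reshape(-1, 1), y)
    full = r_squared(np.column_stack([x1, x2]), y)
    return full - reduced


r2_reduced = r_squared(x1.reshape(-1, 1), y)
r2_full = r_squared(np.column_stack([x1, x2]), y)
observed = delta_r2(x1, x2, y)

# Percentile bootstrap: resample cases with replacement and recompute the statistic.
boot = []
for _ in range(500):
    idx = rng.integers(0, n, size=n)
    boot.append(delta_r2(x1[idx], x2[idx], y[idx]))
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"delta R^2 = {observed:.4f}, 95% percentile interval = ({ci_low:.4f}, {ci_high:.4f})")
```

    Because the two models are nested, the increase cannot be negative in any sample, so the percentile interval never extends below zero.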

  6. Direction of Effects in Multiple Linear Regression Models.

    Wiedermann, Wolfgang; von Eye, Alexander


    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed. PMID:26609741

  7. Applied regression analysis: a research tool

    Pantula, Sastry; Dickey, David


    Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...

  8. Interpreting Multiple Linear Regression: A Guidebook of Variable Importance

    Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim


    Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…

  9. On relationship between regression models and interpretation of multiple regression coefficients

    Varaksin, A N


    In this paper, we consider the problem of treating linear regression equation coefficients in the case of correlated predictors. It is shown that in general there are no natural ways of interpreting these coefficients similar to the case of single predictor. Nevertheless we suggest linear transformations of predictors, reducing multiple regression to a simple one and retaining the coefficient at variable of interest. The new variable can be treated as the part of the old variable that has no linear statistical dependence on other presented variables.
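
    The idea of replacing a predictor by the part of it that has no linear dependence on the other predictors can be checked numerically. The sketch below is an illustration under stated assumptions (synthetic data, NumPy, hypothetical variable names), not the paper's own transformation: it residualizes one predictor on the other and shows that a simple regression on the residual reproduces the coefficient from the full multiple regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic correlated predictors and response (assumed for illustration).
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 1.5 * x1 - 2.0 * x2 + rng.normal(size=n)


def ols(X, y):
    """Least-squares coefficients; the caller supplies the intercept column."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


ones = np.ones(n)

# Full multiple regression: y ~ 1 + x1 + x2.
beta_full = ols(np.column_stack([ones, x1, x2]), y)

# Part of x1 with no linear dependence on x2: residual of x1 ~ 1 + x2.
Z = np.column_stack([ones, x2])
x1_resid = x1 - Z @ ols(Z, x1)

# Simple regression of y on the residualized predictor.
beta_simple = ols(np.column_stack([ones, x1_resid]), y)

print("coefficient of x1 in the multiple regression:", beta_full[1])
print("slope in the simple regression on the residual:", beta_simple[1])
```

    The two printed slopes agree up to floating-point error, which is the sense in which the new variable retains the coefficient of the variable of interest.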

  10. Regression Analysis and the Sociological Imagination

    De Maio, Fernando


    Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.

  11. A comparative analysis of the effects of instructional design factors on student success in e-learning: multiple-regression versus neural networks

    Halil Ibrahim Cebeci


    This study explores the relationship between student performance and instructional design. The research was conducted at the E-Learning School at a university in Turkey. A list of design factors that had potential influence on student success was created through a review of the literature and interviews with relevant experts. From this, the five most important design factors were chosen. The experts scored 25 university courses on the extent to which they demonstrated the chosen design factors. Multiple-regression and supervised artificial neural network (ANN) models were used to examine the relationship between student grade point averages and the scores on the five design factors. The results indicated that there is no statistical difference between the two models. Both models identified the use of examples and applications as the most influential factor. The ANN model provided more information and was used to predict the course-specific factor values required for a desired level of success.

  12. Fundamental Analysis of the Linear Multiple Regression Technique for Quantification of Water Quality Parameters from Remote Sensing Data. Ph.D. Thesis - Old Dominion Univ.

    Whitlock, C. H., III


    Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to ensure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least-squares fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.

  13. Application of Partial Least-Squares Regression Model on Temperature Analysis and Prediction of RCCD

    Yuqing Zhao; Zhenxian Xing


    This study, based on the temperature monitoring data of the Jiangya RCCD, uses the principle and method of partial least-squares regression to analyze and predict the temperature variation of the RCCD. By establishing a partial least-squares regression model, the multiple correlation among the independent variables is overcome, and an organic combination of multiple linear regression and canonical correlation analysis is achieved. Compared with general least-squares regression model result, it is more ...

  14. Bayes Linear Sufficiency in Non-exchangeable Multivariate Multiple Regressions

    Wooff, D. A.


    We consider sufficiency for Bayes linear revision for multivariate multiple regression problems, and in particular where we have a sequence of multivariate observations at different matrix design points, but with common parameter vector. Such sequences are not usually exchangeable. However, we show that there is a sequence of transformed observations which is exchangeable and we demonstrate that their mean is sufficient both for Bayes linear revision of the parameter vector and for prediction...

  15. A unified framework for model-based clustering, linear regression and multiple cluster structure detection

    Galimberti, Giuliano; Manisi, Annamaria; Soffritti, Gabriele


    A general framework for dealing with both linear regression and clustering problems is described. It includes Gaussian clusterwise linear regression analysis with random covariates and cluster analysis via Gaussian mixture models with variable selection. It also admits a novel approach for detecting multiple clusterings from possibly correlated sub-vectors of variables, based on a model defined as the product of conditionally independent Gaussian mixture models. A necessary condition for the ...

  16. Multiple regression analysis to assess the role of plankton on the distribution and speciation of mercury in water of a contaminated lagoon.

    Stoichev, T; Tessier, E; Amouroux, D; Almeida, C M; Basto, M C P; Vasconcelos, V M


    A study of the spatial and seasonal variation of mercury species aqueous concentrations and distributions was carried out during six sampling campaigns at four locations within Laranjo Bay, the most mercury-contaminated area of the Aveiro Lagoon (Portugal). Inorganic mercury (IHg(II)) and methylmercury (MeHg) were determined in filter-retained (IHgPART, MeHgPART) and filtered (<0.45μm) fractions (IHg(II)DISS, MeHgDISS). The concentrations of IHgPART depended on site and on dilution with downstream particles. Similar processes were evidenced for MeHgPART; however, its concentrations increased for particles rich in phaeophytin (Pha). The concentrations of MeHgDISS, and especially those of IHg(II)DISS, increased with Pha concentrations in the water. Multiple regression models are able to depict MeHgPART, IHg(II)DISS and MeHgDISS concentrations with salinity and Pha concentrations exhibiting additive statistical effects and allowing separation of possible addition and removal processes. A link between phytoplankton/algae and consumers' grazing pressure in the contaminated area can be involved to increase concentrations of IHg(II)DISS and MeHgPART. These processes could lead to suspended particles enriched with MeHg and to the enhancement of IHg(II) and MeHg availability in surface waters and higher transfer to the food web. PMID:27484944

  17. Prediction on adsorption ratio of carbon dioxide to methane on coals with multiple linear regression

    YU Hong-guan; MENG Xian-ming; FAN Wei-tang; YE Jian-ping


    The multiple linear regression equations for the adsorption ratio of CO2/CH4 and the coal quality indexes were built with SPSS software on the basis of existing coal quality data and the adsorption amounts of CO2 and CH4. The regression equations built were tested with data collected from some S, and the influences of coal quality indexes on the adsorption ratio of CO2/CH4 were studied by investigating the regression equations. The study results show that the regression equation for the adsorption ratio of CO2/CH4 and the volatile matter, ash and moisture in coal can be obtained with multiple linear regression analysis, and that the influence of the same coal quality index with the degree of metamorphosis, or the influence of coal quality indexes for the same coal rank, on the adsorption ratio is not consistent.

  18. Multiple regression analysis on fire and socioeconomic environment



    Using mathematical application software such as SPSS, Excel, and MATLAB, fire and the socioeconomic environment were analyzed by scatter plots, correlation analysis, principal component analysis, and regression analysis. Taking the national fire situation in 2009 as an example, the influence of socioeconomic environment indicators on fire was studied, which can provide a scientific basis and decision-making reference for better fire prevention and for the coordinated development of the socioeconomic environment.

  19. Analysis of Inflation in Turkey via Ridge Regression

    Duygu Tunalı


    The aim of this study is to analyze inflation in Turkey between the years 2003-2014 and to compare inflation in that period with inflation in the years 1963-1983. When multiple linear regression modeling is used for inflation analysis, a multicollinearity problem occurs between the independent variables. To eliminate this problem, the study uses ridge regression, which is one of the biased estimation methods. For the β parameter estimator, the ridge regression method gives a smaller mean square error than the least squares method, without removing variables from the model.
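
    The mechanism described here, a biased estimator that stabilizes coefficients under multicollinearity while keeping all variables, can be sketched briefly. The example below is an illustration only: it assumes synthetic, nearly collinear predictors rather than the inflation data, a hypothetical helper `ridge`, and an arbitrarily chosen ridge constant k = 1.0.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Two nearly collinear predictors built from one common signal (synthetic, assumed).
z = rng.normal(size=n)
X = np.column_stack([z + 0.01 * rng.normal(size=n),
                     z + 0.01 * rng.normal(size=n)])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

# Center predictors and response so that no intercept term is needed.
Xc = X - X.mean(axis=0)
yc = y - y.mean()


def ridge(X, y, k):
    """Ridge estimator (X'X + k I)^(-1) X'y; k = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


beta_ols = ridge(Xc, yc, 0.0)
beta_ridge = ridge(Xc, yc, 1.0)  # k = 1.0 is an arbitrary illustrative choice

print("OLS coefficients:  ", beta_ols)
print("ridge coefficients:", beta_ridge)
print("condition number of X'X:", np.linalg.cond(Xc.T @ Xc))
```

    Both predictors stay in the model; the ridge constant shrinks the coefficient vector, trading a small bias for lower variance. In practice k would be chosen from the data, for example from a ridge trace or by cross-validation.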

  20. Poisson regression analysis of ungrouped data

    Loomis, D; Richardson, D.; Elliott, L


    Background: Poisson regression is routinely used for analysis of epidemiological data from studies of large occupational cohorts. It is typically implemented as a grouped method of data analysis in which all exposure and covariate information is categorised and person-time and events are tabulated.

  1. A Regression Analysis Model Based on Wavelet Networks

    XIONG Zheng-feng


    In this paper, an approach is proposed to combine wavelet networks and techniques of regression analysis. The resulting wavelet regression estimator is well suited for regression estimation of moderately large dimension, in particular for regressions with localized irregularities.

  2. Standardized Regression Coefficients as Indices of Effect Sizes in Meta-Analysis

    Kim, Rae Seon


    When conducting a meta-analysis, it is common to find many collected studies that report regression analyses, because multiple regression analysis is widely used in many fields. Meta-analysis uses effect sizes drawn from individual studies as a means of synthesizing a collection of results. However, indices of effect size from regression analyses…

  3. Hot Resistance Estimation for Dry Type Transformer Using Multiple Variable Regression, Multiple Polynomial Regression and Soft Computing Techniques

    M. Srinivasan


    Problem statement: This study presents a novel method for the determination of the average winding temperature rise of transformers under predetermined field operating conditions. The rise in winding temperature was determined from the estimated values of winding resistance during the heat run test conducted as per the IEC standard. Approach: The estimation of hot resistance was modeled using Multiple Variable Regression (MVR), Multiple Polynomial Regression (MPR) and soft computing techniques such as Artificial Neural Network (ANN) and Adaptive Neuro Fuzzy Inference System (ANFIS). The modeled hot resistance will help to find the load losses at any load situation without using a complicated measurement setup in transformers. Results: These techniques were applied for the hot resistance estimation of a dry-type transformer using the input variables cold resistance, ambient temperature and temperature rise. The results are compared and they show a good agreement between measured and computed values. Conclusion: The proposed methods are verified using experimental results obtained from a temperature rise test performed on a 55 kVA dry-type transformer.

  4. Precipitation interpolation in mountainous regions using multiple linear regression

    Hay, L.; Viger, R.; McCabe, G.


    Multiple linear regression (MLR) was used to spatially interpolate precipitation for simulating runoff in the Animas River basin of southwestern Colorado. MLR equations were defined for each time step using measured precipitation as dependent variables. Explanatory variables used in each MLR were derived for the dependent variable locations from a digital elevation model (DEM) using a geographic information system. The same explanatory variables were defined for a 5 × 5 km grid of the DEM. For each time step, the best MLR equation was chosen and used to interpolate precipitation onto the 5 × 5 km grid. The gridded values of precipitation provide a physically-based estimate of the spatial distribution of precipitation and result in reliable simulations of daily runoff in the Animas River basin.
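
    The interpolation workflow in this abstract (fit an MLR at the station locations, then evaluate it on the grid) can be sketched for a single time step. Everything specific below is an assumption made for illustration: the station coordinates, the elevation surface, the choice of easting, northing and elevation as explanatory variables, and the 50 × 50 km domain. The sketch also fits one equation rather than selecting the best of several candidates.

```python
import numpy as np

rng = np.random.default_rng(2)
n_stations = 30

# Hypothetical station attributes as they might be derived from a DEM:
# easting (km), northing (km), elevation (km).
stations = np.column_stack([
    rng.uniform(0.0, 50.0, n_stations),
    rng.uniform(0.0, 50.0, n_stations),
    rng.uniform(1.5, 4.0, n_stations),
])

# Synthetic measured precipitation (mm) for one time step.
precip = (2.0 + 0.02 * stations[:, 0] + 3.0 * stations[:, 2]
          + rng.normal(scale=0.5, size=n_stations))


def design(attrs):
    """Design matrix with an intercept column followed by the explanatory variables."""
    return np.column_stack([np.ones(len(attrs)), attrs])


# Fit the MLR equation at the station locations.
coef = np.linalg.lstsq(design(stations), precip, rcond=None)[0]

# Cell centres of a 5 x 5 km grid covering the assumed 50 x 50 km domain.
centres = np.arange(2.5, 50.0, 5.0)
gx, gy = np.meshgrid(centres, centres)
gz = 1.5 + 0.05 * gx  # hypothetical elevation surface (km)

# Evaluate the fitted equation on the grid using the same explanatory variables.
grid_attrs = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
grid_precip = (design(grid_attrs) @ coef).reshape(gx.shape)

print("fitted coefficients:", coef)
print("gridded precipitation shape:", grid_precip.shape)
```

    Repeating the fit for every time step, and choosing among candidate equations at each step, would follow the same pattern with a loop over time steps and a model-selection criterion.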

  5. A multiple regression model for the Ft. Calhoun reactor coolant pump system

    Multiple regression analysis is one of the most widely used of all statistical tools. In this research paper, we introduce an application of fitting a multiple regression model on reactor coolant pump (RCP) data. The primary purpose of this research is to correlate the results obtained by Design of Experiments (DOE) and regression model fitting. The idea behind using a regression model is also to gain more detailed information from the RCP data than is provided by DOE. In engineering science, statistical quality control techniques have traditionally been applied to control manufacturing processes. An application to commercial nuclear power plant maintenance and control is presented that can greatly improve plant safety and reliability. The results obtained show that six out of ten parameters are within control specification limits and four parameters are not in a state of statistical control. The four parameters that are out of control adversely affect the regression model fitting, and the final prediction equation therefore does not predict the future response accurately. The analysis concludes that in order to fit the best regression model, one has to remove all out-of-control points from the data set, including dropping a variable from the model to obtain a better prediction of the response variable. (author)

  6. Regression analysis of post-CHF flow boiling data

    The successful application of statistical analysis in systematic investigations of heat transfer data for boiling water beyond the critical heat flux is described. Multiple linear regression analysis together with statistical tests of correlations and data were used in this study. Data from a number of experiments encompassing film and transition boiling in several geometries were correlated by boiling regime, by geometry, and in aggregate. Error estimates and uncertainty bounds were specified for all such correlations. (U.S.)

  7. Functional data analysis of generalized regression quantiles

    Guo, Mengmeng


    Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.

  8. Neighborhood social capital and crime victimization: comparison of spatial regression analysis and hierarchical regression analysis.

    Takagi, Daisuke; Ikeda, Ken'ichi; Kawachi, Ichiro


    Crime is an important determinant of public health outcomes, including quality of life, mental well-being, and health behavior. A body of research has documented the association between community social capital and crime victimization. The association between social capital and crime victimization has been examined at multiple levels of spatial aggregation, ranging from entire countries, to states, metropolitan areas, counties, and neighborhoods. In multilevel analysis, the spatial boundaries at level 2 are most often drawn from administrative boundaries (e.g., Census tracts in the U.S.). One problem with adopting administrative definitions of neighborhoods is that it ignores spatial spillover. We conducted a study of social capital and crime victimization in one ward of Tokyo city, using a spatial Durbin model with an inverse-distance weighting matrix that assigned each respondent a unique level of "exposure" to social capital based on all other residents' perceptions. The study is based on a postal questionnaire sent to residents of Arakawa Ward, Tokyo, aged 20-69 years. The response rate was 43.7%. We examined the contextual influence of generalized trust, perceptions of reciprocity, two types of social network variables, as well as two principal components of social capital (constructed from the above four variables). Our outcome measure was self-reported crime victimization in the last five years. In the spatial Durbin model, we found that neighborhood generalized trust, reciprocity, supportive networks and two principal components of social capital were each inversely associated with crime victimization. By contrast, a multilevel regression performed with the same data (using administrative neighborhood boundaries) found generally null associations between neighborhood social capital and crime. Spatial regression methods may be more appropriate for investigating the contextual influence of social capital in homogeneous cultural settings such as Japan. PMID

  9. Multiple regression models for energy use in air-conditioned office buildings in different climates

    An attempt was made to develop multiple regression models for office buildings in the five major climates in China - severe cold, cold, hot summer and cold winter, mild, and hot summer and warm winter. A total of 12 key building design variables were identified through parametric and sensitivity analysis, and considered as inputs in the regression models. The coefficient of determination R² varies from 0.89 in Harbin to 0.97 in Kunming, indicating that 89-97% of the variations in annual building energy use can be explained by the changes in the 12 parameters. A pseudo-random number generator based on three simple multiplicative congruential generators was employed to generate random designs for evaluation of the regression models. The differences between regression-predicted and DOE-simulated annual building energy use are largely within 10%. It is envisaged that the regression models developed can be used to estimate the likely energy savings/penalty during the initial design stage when different building schemes and design concepts are being considered.

  10. Landslide Susceptibility Mapping Using Multiple Regression and GIS Tools in Tajan Basin, North of Iran

    Somayeh Mashari; Karim Solaimani; Ebrahim Omidvar


    Landslides are a natural hazard that causes much damage to the environment. Depending on the landform, several factors can cause a landslide. This research addresses the methodology for landslide susceptibility mapping using multiple regression analysis and GIS tools. Based on the initial hypothesis, ten factors were recognized as influencing landslides: geology, slope, aspect, distance from roads, faults and drainage network, soil capability, land use and rainfall. Crossin...

  11. Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction

    He, Dan; Kuhn, David; Parida, Laxmi


    Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other...

  12. Non-destructive evaluation of chlorophyll content in quinoa and amaranth leaves by simple and multiple regression analysis of RGB image components

    Riccardi, M.; Mele, G.; Pulvento, C.;


    Leaf chlorophyll content provides valuable information about physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is expensive, laborious, and time consuming. Over the years alternative methods, rapid and non-destructive, have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimation of chlorophyll content in quinoa and amaranth leaves based on RGB components analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll provided by... foliar chlorophyll content and had a lower amount of noise in the whole range of chlorophyll studied compared with SPAD and other leaf image processing based models when applied to quinoa and amaranth. © 2014 Springer Science+Business Media Dordrecht.

  13. Assessing the binding affinity of a selected class of DPP4 inhibitors using chemical descriptor-based multiple linear regression

    Jose Isagani Janairo; Gerardo Janairo; Frumencio Co; Derrick Ethelbhert Yu


    The activity of a selected class of DPP4 inhibitors was preliminarily assessed using chemical descriptors derived from AM1-optimized geometries. Using a multiple linear regression model, it was found that ΔE0, LUMO energy, area, molecular weight and ΔH0 are the significant descriptors that can adequately assess the binding affinity of the compounds. The derived multiple linear regression (MLR) model was validated using rigorous statistical analysis. The preliminary model suggests t...

  14. Genetic Algorithm Based Outlier Detection Using Bayesian Information Criterion in Multiple Regression Models Having Multicollinearity Problems

    ALMA, Özlem GÜRÜNLÜ; KURT, Serdar; UĞUR, Aybars


    Multiple linear regression models are widely used applied statistical techniques and are most useful devices for extracting and understanding the essential features of datasets. However, in multiple linear regression models problems arise when a serious outlier observation or multicollinearity is present in the data. In regression, however, the situation is somewhat more complex in the sense that some outlying points will have more influence on the regression than others. An important proble...

  15. Forecasting Gold Prices Using Multiple Linear Regression Method

    Z. Ismail


    Problem statement: Forecasting is a function in management to assist decision making. It is also described as the process of estimation in unknown future situations. In a more general term it is commonly known as prediction, which refers to estimation of time series or longitudinal type data. Gold is a precious yellow commodity once used as money. It was made illegal in USA 41 years ago, but is now once again accepted as a potential currency. The demand for this commodity is on the rise. Approach: The objective of this study was to develop a forecasting model for predicting gold prices based on economic factors such as inflation, currency price movements and others. Following the melt-down of US dollars, investors are putting their money into gold because gold plays an important role as a stabilizing influence for investment portfolios. Due to the increase in demand for gold in Malaysia and other parts of the world, it is necessary to develop a model that reflects the structure and pattern of the gold market and forecasts movement of the gold price. The most appropriate approach to the understanding of gold prices is the Multiple Linear Regression (MLR) model. MLR is a study of the relationship between a single dependent variable and one or more independent variables, in this case with gold price as the single dependent variable. The fitted MLR model will be used to predict future gold prices. A naive model known as “forecast-1” was considered to be a benchmark model in order to evaluate the performance of the model. Results: Many factors determine the price of gold and, based on “a hunch of experts”, several economic factors had been identified to have influence on the gold prices. Variables such as Commodity Research Bureau future index (CRB); USD/Euro Foreign Exchange Rate (EUROUSD); Inflation rate (INF); Money Supply (M1); New York Stock Exchange (NYSE); Standard and Poor 500 (SPX); Treasury Bill (T-BILL) and US Dollar index (USDX) were considered to

  16. Forecasting Electrical Load using ANN Combined with Multiple Regression Method

    Saeed M. Badran; Ossama B. Abouelatta


    This paper combined artificial neural network and regression modeling methods to predict electrical load. We propose an approach for specific day, week and/or month load forecasting for electrical companies taking into account the historical load. Therefore, a modified technique, based on artificial neural network (ANN) combined with linear regression, is applied on the KSA electrical network dependent on its historical data to predict the electrical load demand forecasting up to year 2020. T...

  17. Multiple predictor smoothing methods for sensitivity analysis

    The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (1) locally weighted regression (LOESS), (2) additive models, (3) projection pursuit regression, and (4) recursive partitioning regression. The indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present

  18. Multiple predictor smoothing methods for sensitivity analysis.

    Helton, Jon Craig; Storlie, Curtis B.


    The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (1) locally weighted regression (LOESS), (2) additive models, (3) projection pursuit regression, and (4) recursive partitioning regression. The indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present.

  19. Multiple linear regression and path analysis on influencing factors of depressive neurosis

    江家靖; 郭文斌 (corresponding author)


    Objective: To explore the influencing factors and mechanism of depressive neurosis by multiple linear regression and path analysis. Methods: 55 inpatients with depressive neurosis in the open wards of the First Affiliated Hospital of Guangxi Medical University were investigated. CES-D, ATQ, CTQ, SAS, SCSQ and SSRS were used to explore the influencing factors of depressive neurosis. Multiple linear regression and path analysis were used to probe the influence of life events, coping style and social support on depression. The role model of the main factors was analyzed in this study. Results: Stress response, automatic thoughts, life events, subjective support, objective support, and active or negative coping were the main factors of depressive neurosis. The influences of the various factors on depression were different. Automatic thoughts and coping style could induce depression directly. Social support had no direct relation to depression; its path coefficients were not statistically significant. Life events could influence the occurrence of depression through indirect channels (mediated by coping style). Conclusions: The occurrence of depression is a combined effect of life events, coping style and social support. Multiple regression analysis and path analysis each have their own role in exploring the mechanism of depression. The results can be mutually complementary.

  20. Single and multiple index functional regression models with nonparametric link

    Chen, Dong; Hall, Peter; Müller, Hans-Georg


    Fully nonparametric methods for regression from functional data have poor accuracy from a statistical viewpoint, reflecting the fact that their convergence rates are slower than nonparametric rates for the estimation of high-dimensional functions. This difficulty has led to an emphasis on the so-called functional linear model, which is much more flexible than common linear models in finite dimension, but nevertheless imposes structural constraints on the relationship between predictors and re...

  1. The Linear and Non-displaced Estimator in Multiple Regression

    Constantin ANGHELACHE; Voineagu, Vergil; Alexandru MANOLE; Diana Valentina SOARE; Ligia PRODAN


    Under the hypotheses IA and IB, OLS estimators are both linear and stationary. For them to have the minimum variance among all linear and stationary estimators, and thus to be BLUE, the classical assumptions IIB and IIC must hold. As in the case of two-variable regression, this means that the residual factors have to be homoscedastic and non-autocorrelated.

  2. Repeated Results Analysis for Middleware Regression Benchmarking

    Bulej, Lubomír; Kalibera, T.; Tůma, P.


    Vol. 60 (2005), pp. 345-358. ISSN 0166-5316. R&D Projects: GA ČR GA102/03/0672. Institutional research plan: CEZ:AV0Z10300504. Keywords: middleware benchmarking; regression benchmarking; regression testing. Subject RIV: JD - Computer Applications, Robotics. Impact factor: 0.756 (2005).

  3. Sliced Inverse Regression for Time Series Analysis

    Chen, Li-Sue


    In this thesis, general nonlinear models for time series data are considered. A basic form is $x_t = f(\beta_1^T X_{t-1}, \beta_2^T X_{t-1}, \ldots, \beta_k^T X_{t-1}, \varepsilon_t)$, where $x_t$ is an observed time series, $X_t$ is the first $d$ time lag vector $(x_t, x_{t-1}, \ldots, x_{t-d-1})$, $f$ is an unknown function, the $\beta_i$ are unknown vectors, and the $\varepsilon_t$ are independently distributed. Special cases include AR and TAR models. We investigate the feasibility of applying SIR/PHD (Li 1990, 1991) (the sliced inverse regression and principal Hessian methods) in estimating the $\beta_i$. PCA (principal component analysis) is brought in to check one critical condition for SIR/PHD. Through simulation and a study on 3 well-known data sets of Canadian lynx, U.S. unemployment rate and sunspot numbers, we demonstrate how SIR/PHD can effectively retrieve the interesting low-dimension structures for time series data.

  4. Determinants of Serum PCBs in Adolescents and Adults: Regression Tree Analysis and Linear Regression Analysis

    Govarts, Eva; Den Hond, Elly; Schoeters, Greet; Bruckers, Liesbeth


    Regression tree analysis, a non-parametric method, was undertaken to identify predictors of the serum concentration of polychlorinated biphenyls (sum of marker PCB1 138, 153, and 180) in humans. This method was applied on biomonitoring data of the Flemish Environment and Health study (2002-2006) and included 1679 adolescents and 1583 adults. Potential predictor variables were collected via a self-administered questionnaire, assessing information on lifestyle, food intake, use of tobacco and a...


    Kürşad ÖZKAN


    The purpose of the paper is to determine a model of soil field water capacity as a function of soil texture. At first, multiple regression analysis was used to determine a model. However, a multicollinearity problem was found in the model because of strong relationships among the independent variables. Therefore, principal component regression analysis was applied and the problem was solved. It is known that sand, dust and clay contents play important roles in field water capacity. But...

  6. [Application of SAS macro to evaluate multiplicative and additive interaction in logistic and Cox regression in clinical practices].

    Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q


    Conditional logistic regression analysis and unconditional logistic regression analysis are commonly used in case-control studies, while the Cox proportional hazards model is often used in survival data analysis. Most of the literature refers only to the main effect model; however, the generalized linear model differs from the general linear model, and interaction comprises multiplicative interaction and additive interaction. The former has only statistical significance, whereas the latter has biological significance. In this paper, macros were written in SAS 9.4, and the contrast ratio, attributable proportion due to interaction and synergy index were calculated together with the interaction terms of logistic and Cox regression. Wald, delta and profile likelihood confidence intervals were used to evaluate additive interaction, for reference in big data analysis in clinical epidemiology and in the analysis of genetic multiplicative and additive interactions. PMID:27188374

  7. Regression Analysis with a Stochastic Design Variable

    Sazak, Hakan S.; Moti L Tiku; Qamarul Islam, M.


    In regression models, the design variable has primarily been treated as a nonstochastic variable. In numerous situations, however, the design variable is stochastic. The estimation and hypothesis testing problems in such situations are considered. Real life examples are given.

  8. Spatial regression analysis on 32 years total column ozone data

    J. S. Knibbe


    Multiple-regression analyses have been performed on 32 years of total ozone column data that was spatially gridded with a 1° × 1.5° resolution. The total ozone data consists of the MSR (Multi Sensor Reanalysis; 1979–2008) and two years of assimilated SCIAMACHY ozone data (2009–2010). The two-dimensionality in this data-set allows us to perform the regressions locally and investigate spatial patterns of regression coefficients and their explanatory power. Seasonal dependencies of ozone on regressors are included in the analysis. A new physically oriented model is developed to parameterize stratospheric ozone. Ozone variations on non-seasonal timescales are parameterized by explanatory variables describing the solar cycle, stratospheric aerosols, the quasi-biennial oscillation (QBO), El Niño (ENSO) and stratospheric alternative halogens (EESC). For several explanatory variables, seasonally adjusted versions of these explanatory variables are constructed to account for the difference in their effect on ozone throughout the year. To account for seasonal variation in ozone, explanatory variables describing the polar vortex, geopotential height, potential vorticity and average day length are included. Results of this regression model are compared to those of a similar analysis based on a more commonly applied statistically oriented model. The physically oriented model provides spatial patterns in the regression results for each explanatory variable. The EESC has a significant depleting effect on ozone at high and mid-latitudes, the solar cycle affects ozone positively mostly in the Southern Hemisphere, stratospheric aerosols affect ozone negatively at high Northern latitudes, the effect of QBO is positive in the tropics and negative at mid to high latitudes, and ENSO affects ozone negatively between 30° N and 30° S, particularly over the Pacific. The contribution of explanatory variables describing seasonal ozone variation is generally

  9. Tightness of M-estimators for multiple linear regression in time series

    Johansen, Søren; Nielsen, Bent

    We show tightness of a general M-estimator for multiple linear regression in time series. The positive criterion function for the M-estimator is assumed lower semi-continuous and sufficiently large for large argument: Particular cases are the Huber-skip and quantile regression. Tightness requires...

  10. Modeling Lateral and Longitudinal Control of Human Drivers with Multiple Linear Regression Models

    Lenk, Jan; M, Claus


    In this paper, we describe results to model lateral and longitudinal control behavior of drivers with simple linear multiple regression models. This approach fits into the Bayesian Programming (BP) approach (Bessi

  11. Multinomial Inverse Regression for Text Analysis

    Taddy, Matt


    Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. It is also very high dimensional and difficult to incorporate into statistical analyses. This article introduces a straightforward framework of sentiment-preserving dimension reduction for text data. Multinomial inverse regression is introduced as a general tool for simplifying predictor sets that can be represen...

  12. Egg hatchability prediction by multiple linear regression and artificial neural networks

    AC Bolzan; RAF Machado; JCZ Piaia


    An artificial neural network (ANN) was compared with a multiple linear regression statistical method to predict hatchability in an artificial incubation process. A feedforward neural network architecture was applied. Network trainings were made by the backpropagation algorithm based on data obtained from industrial incubations. The ANN model was chosen as it produced data that fit better the experimental data as compared to the multiple linear regression model, which used coefficients determi...

  13. Multiple regression technique for Pth degree polynomials with and without linear cross products

    Davis, J. W.


    A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.
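
    The two cases described above can be sketched as a least-squares fit over two different design matrices. The sketch assumes the "without cross products" model contains an intercept plus the powers 1..P of each variable, and that the "with linear cross products" model adds the pairwise products x_i * x_j. This reading, the function names, and the synthetic data are assumptions and are not taken from the report's programs.

```python
import itertools
import numpy as np


def design_matrix(X, degree, cross_products=False):
    """Columns: intercept, x_j**p for p = 1..degree, optional x_i * x_j terms."""
    X = np.asarray(X, dtype=float)
    n_obs, n_vars = X.shape
    cols = [np.ones(n_obs)]
    for j in range(n_vars):
        for p in range(1, degree + 1):
            cols.append(X[:, j] ** p)
    if cross_products:
        for i, j in itertools.combinations(range(n_vars), 2):
            cols.append(X[:, i] * X[:, j])
    return np.column_stack(cols)


def fit_polynomial(X, y, degree, cross_products=False):
    """Least-squares coefficients and fitted values for the chosen model."""
    A = design_matrix(X, degree, cross_products)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef


# Synthetic data (hypothetical): y depends on an x1 * x2 cross product.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 1]

coef_plain, fitted_plain = fit_polynomial(X, y, degree=2)
coef_cross, fitted_cross = fit_polynomial(X, y, degree=2, cross_products=True)
```

    With this data the model that includes the cross product reproduces the generating coefficients, while the model without it leaves a residual; comparing the two residual sums of squares is one simple way to judge whether the cross products are needed.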

  14. Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA

    Ohlmacher, G.C.; Davis, J.C.


    Landslides in the hilly terrain along the Kansas and Missouri rivers in northeastern Kansas have caused millions of dollars in property damage during the last decade. To address this problem, a statistical method called multiple logistic regression has been used to create a landslide-hazard map for Atchison, Kansas, and surrounding areas. Data included digitized geology, slopes, and landslides, manipulated using ArcView GIS. Logistic regression relates predictor variables to the occurrence or nonoccurrence of landslides within geographic cells and uses the relationship to produce a map showing the probability of future landslides, given local slopes and geologic units. Results indicated that slope is the most important variable for estimating landslide hazard in the study area. Geologic units consisting mostly of shale, siltstone, and sandstone were most susceptible to landslides. Soil type and aspect ratio were considered but excluded from the final analysis because these variables did not significantly add to the predictive power of the logistic regression. Soil types were highly correlated with the geologic units, and no significant relationships existed between landslides and slope aspect. © 2003 Elsevier Science B.V. All rights reserved.

  15. Managing Software Project Risks (Analysis Phase) with Proposed Fuzzy Regression Analysis Modelling Techniques with Fuzzy Concepts

    Elzamly, Abdelrafe; Hussin, Burairah


    The aim of this paper is to propose new mining techniques by which we can study the impact of different risk management techniques and different software risk factors on software analysis development projects. The new mining technique uses the fuzzy multiple regression analysis techniques with fuzzy concepts to manage the software risks in a software project and mitigating risk with software process improvement. Top ten software risk factors in analysis phase and thirty risk management techni...

  16. Structure Coefficients versus Scoring Coefficients as Bases for Interpreting Emergent Variables in Multiple Regression and Related Techniques.

    Harris, Richard J.

    Interpretation of emergent variables on the basis of structure coefficients (zero order correlations between original and emergent variables) is potentially very misleading and should be avoided in favor of interpretation on the basis of scoring coefficients. This is most apparent in multiple regression analysis and its special case, two-group…

  17. Prediction of groundwater table and salinity fluctuations with a time series multiple regression technique

    Seeboonruang, U.


    Time series techniques have been extensively applied to research works of many academic disciplines, particularly those concerned with economics and environment. This paper presents application of a time series multiple linear regression technique to a groundwater system to predict groundwater level and salinity fluctuations in a saline area in the northeastern part of Thailand. Surface and groundwater interaction is the major mechanism controlling the shallow subsurface system and salinity of the area. The basic technique is based on the lagged correlation between hydrologic, and hydrogeological and environmental parameters. As a result of a large irrigation project in the area, several regulating gates have been installed to control flooding to the downstream rivers and to provide the upstream areas with sufficient irrigating water. From the lagged correlation analysis, the shallow groundwater and groundwater salinity fluctuation in the irrigating area are shown to be dependent upon the surface water levels at the installed regulated gates and prior rainfall. A set of multiple linear regression equations with lagged time dependent function are then formulated. The dependent variables are groundwater level and groundwater salinity while the independent variables are rainfall rates and water levels measured at the regulating gates. After calibration and verification, the model, as an alternative to the conventional method which requires detailed and continuous variables and is costlier, can be used to forecast and manage future groundwater systems.
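
    The lagged-correlation step and the lagged regression described above can be sketched as follows. The sketch picks, for each predictor, the lag with the largest absolute correlation with the response and then fits an ordinary least-squares model on the lagged predictors. The series names, lag range, coefficients, and synthetic data are hypothetical; the paper's actual lags and calibration procedure are not given in the abstract.

```python
import numpy as np


def best_lag(x, y, max_lag):
    """Lag in 0..max_lag maximising |corr(x[t - lag], y[t])|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    corrs = [np.corrcoef(x[: len(x) - lag], y[lag:])[0, 1]
             for lag in range(max_lag + 1)]
    return int(np.argmax(np.abs(corrs)))


def lagged_design(predictors, lags):
    """Stack an intercept and each predictor shifted by its own lag."""
    n_obs = len(next(iter(predictors.values())))
    start = max(lags.values())  # first time index with all lags available
    cols = [np.ones(n_obs - start)]
    for name, lag in lags.items():
        x = np.asarray(predictors[name], dtype=float)
        cols.append(x[start - lag: n_obs - lag])
    return np.column_stack(cols), start


# Synthetic example (hypothetical): level responds to rainfall 3 steps
# earlier and to the gate water level 1 step earlier.
rng = np.random.default_rng(0)
n = 500
rain = rng.normal(size=n)
gate = rng.normal(size=n)
level = np.full(n, 5.0) + rng.normal(scale=0.01, size=n)
level[3:] += 0.8 * rain[:-3]
level[1:] += 0.3 * gate[:-1]

predictors = {"rain": rain, "gate": gate}
lags = {name: best_lag(x, level, max_lag=6) for name, x in predictors.items()}
A, start = lagged_design(predictors, lags)
coef, *_ = np.linalg.lstsq(A, level[start:], rcond=None)
```

    Here `coef` holds the intercept followed by one coefficient per lagged predictor, in the order of the `lags` dictionary.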

  18. Survival Analysis with Multivariate adaptive Regression Splines

    Kriner, Monika


    Multivariate adaptive regression splines (MARS) are a useful tool to identify linear and nonlinear effects and interactions between two covariates. In this dissertation a new proposal to model survival type data with MARS is introduced. Martingale and deviance residuals of a Cox PH model are used as response in a common MARS approach to model functional forms of covariate effects as well as possible interactions in a data-driven way. Simulation studies prove that the new method yields a bett...

  19. Multiple Logistic Regression Analysis of Social Networking Services on the College Students' Emotions, Depression and Self-esteem

    王晨羽; 徐骞; 陈紫薇; 林育芳


    Objective: To explore the impact of social networking services (SNS) and various factors on emotions, depression, and self-esteem. Methods: The Chinese Affect Scale, the Center for Epidemiologic Studies Depression Scale (CES-D) and the Self-Esteem Scale (SES) were used to collect the data, which were described statistically and analyzed by multiple logistic regression in SPSS for Windows 19.0. Results: ① The 512 college students were 20.50 ± 1.49 years old on average. On average they had used SNS for 7.16 ± 2.67 years, logged in to SNS 14.31 ± 15.96 times per day, and spent 2.81 ± 2.04 hours on SNS per day; the longest single use averaged 2.98 ± 2.76 hours. ② Logistic regression analysis on positive emotions showed that the OR of "engineering students" was 0.53 (P=0.079) compared to "arts students". ③ Logistic regression analysis on negative emotions showed that the ORs of "age", "years of using SNS" and "time spent on SNS per day" were 1.14 (P=0.063), 0.90 (P=0.008) and 1.09 (P=0.080). The OR of the students who could not stand to stop using SNS for a month was 2.41 (P=0.003) compared to the students who would feel more relaxed. ④ Logistic regression analysis on CES-D showed that the OR of "the years of using SNS" was 0.89 (P=0.007). The ORs of the junior students and senior students were 1.69 (P=0.086) and 2.74 (P=0.002) compared to the freshmen. The ORs of the students who could not stand to stop using SNS for a month and the students who did not care were 2.62 (P=0.002) and 1.87 (P=0.023) compared to the students who would feel more relaxed. ⑤ Logistic regression analysis on SES showed that the OR of "the times to log in to SNS per day" was 1.01 (P=0.056). The ORs of engineering students and science students were 0.56 (P=0.046) and 0.49 (P=0.028) compared to art students. The OR of the students coming from cities was 1.27 (P=0.032) compared to the students coming from towns and villages. Conclusion: ① Majors have an effect on the positive emotions. Age, the years of using SNS, the time

  20. Simulation Experiments in Practice: Statistical Design and Regression Analysis

    Kleijnen, J.P.C.


    In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. The goal of this article is to change these traditional, naïve methods of design and analysis, because statistical theory proves that more information is obtained when applying Design Of Experiments (DOE) and linear regression analysis. Unfortunately, classic DOE and regression analysis assume a single simulation response that is normally and independen...

  1. Modeling the Philippines' real gross domestic product: A normal estimation equation for multiple linear regression

    Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.


    The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered: Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4) as the Independent Variables that can actually influence the Real GDP in the Philippines (y). The researchers used a Normal Estimation Equation using Matrices to create the model for Real GDP and used α = 0.01. The researchers analyzed quarterly data from 1990 to 2013. The data were acquired from the National Statistical Coordination Board (NSCB), resulting in a total of 96 observations for each variable. The data underwent a logarithmic transformation, particularly the Dependent Variable (y), to satisfy all the assumptions of the Multiple Linear Regression Analysis. The mathematical model for Real GDP was formulated using Matrices through MATLAB. Based on the results, only three of the Independent Variables are significant to the Dependent Variable, namely Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), and hence can actually predict Real GDP (y). The coefficient of determination is 98.7%, meaning the Independent Variables explain 98.7% of the variation in the Dependent Variable. With 97.6% of the result in Paired T-Test, the Predicted Values obtained from the model showed no significant difference from the Actual Values of Real GDP. This research will be essential in appraising the forthcoming changes to aid the Government in implementing policies for the development of the economy.
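
    The matrix form of the normal equations referred to above can be written out as follows. This is the standard least-squares result, shown here for the four predictors and 96 quarterly observations mentioned in the abstract. It assumes that only the dependent variable is log-transformed, which the abstract suggests but does not state outright, and the symbols X, beta and epsilon are introduced here for illustration.

```latex
% Model for quarter t = 1, ..., 96 (log-transformed response; an assumption)
\ln y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + \varepsilon_t

% Stack the observations: X is 96 x 5 with rows (1, x_{1t}, x_{2t}, x_{3t}, x_{4t}),
% and \ln y is the 96 x 1 vector of transformed responses.
% Minimising the residual sum of squares gives the normal equations
X^{\top} X \, \hat{\beta} = X^{\top} \ln y

% and, when X^{\top} X is invertible, the estimator
\hat{\beta} = \left( X^{\top} X \right)^{-1} X^{\top} \ln y
```

    Predictions on the original scale would then be obtained by exponentiating the fitted values.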

  2. Multiattribute shopping models and ridge regression analysis

    Timmermans, HJP Harry


    Policy decisions regarding retailing facilities essentially involve multiple attributes of shopping centres. If mathematical shopping models are to contribute to these decision processes, their structure should reflect the multiattribute character of retailing planning. Examination of existing models shows that most operational shopping models include only two policy variables. A serious problem in the calibration of the existing multiattribute shopping models is that of multicollinearity ari...


    DUMIRESCU Luigi; Stanciu, Oana; Mihai TICHINDELEAN; Simona VINEREAN


    The purpose of the paper is to illustrate the applicability of the linear multiple regression model within a marketing research based on primary, quantitative data. The theoretical background of the developed regression model is the value-chain concept of relationship marketing. In this sense, the authors presume that the outcome variable of the model, the monetary value of one purchase, depends on the clients’ expectations regarding seven dimensions of the company’s offer. The paper is struc...

  4. Analysis of genome-wide association data by large-scale Bayesian logistic regression

    Wang Yuanjia; Sha Nanshi; Fang Yixin


    Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe multiple comparisons adjustment. Multivariate logistic regression is proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than the number of single-nucleotide polymorphisms (SNPs) or when correlation among SNPs is high, traditional multivariate logistic regression breaks down. To accommodate the scale of data fro...

  5. Nonparametric survival analysis using Bayesian Additive Regression Trees (BART).

    Sparapani, Rodney A; Logan, Brent R; McCulloch, Robert E; Laud, Purushottam W


    Bayesian additive regression trees (BART) provide a framework for flexible nonparametric modeling of relationships of covariates to outcomes. Recently, BART models have been shown to provide excellent predictive performance, for both continuous and binary outcomes, and exceeding that of its competitors. Software is also readily available for such outcomes. In this article, we introduce modeling that extends the usefulness of BART in medical applications by addressing needs arising in survival analysis. Simulation studies of one-sample and two-sample scenarios, in comparison with long-standing traditional methods, establish face validity of the new approach. We then demonstrate the model's ability to accommodate data from complex regression models with a simulation study of a nonproportional hazards scenario with crossing survival functions and survival function estimation in a scenario where hazards are multiplicatively modified by a highly nonlinear function of the covariates. Using data from a recently published study of patients undergoing hematopoietic stem cell transplantation, we illustrate the use and some advantages of the proposed method in medical investigations. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26854022

  6. The application of Stata multiple imputation command to analyze design of experiments with multiple regression

    Clara Novoa; Suleima Alkusari


    This talk exemplifies the application of the multiple imputation technique available in Stata to analyze a design of experiments with multiple responses and missing data. No-imputation and multiple-imputation methodologies are compared.


    Željko V. Račić


    This paper presents the specifics of applying the multiple linear regression model. The economic (financial) crisis is analyzed in terms of gross domestic product as a function of the foreign trade balance (on one hand) and credit cards, i.e. the indebtedness of the population on this basis (on the other hand), in the USA from 1999 to 2008. We used the extended application model, which shows how the analyst should run the whole development process of a regression model. This process began with simple statistical features and the application of regression procedures, and ended with residual analysis, intended for the study of compatibility of data and model settings. This paper also analyzes the values of some standard statistics used in the selection of an appropriate regression model. The model is tested using the PASW Statistics 17 program.

  8. Applying Least Absolute Shrinkage Selection Operator and Akaike Information Criterion Analysis to Find the Best Multiple Linear Regression Models between Climate Indices and Components of Cow’s Milk

    Mohammad Reza Marami Milani


    This study focuses on multiple linear regression models relating six climate indices (temperature humidity THI, environmental stress ESI, equivalent temperature index ETI, heat load HLI, modified HLI (HLI new), and respiratory rate predictor RRP) with three main components of cow’s milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage selection operator (LASSO) and the Akaike information criterion (AIC) techniques are applied to select the best model for milk predictands with the smallest number of climate predictors. Uncertainty is estimated by bootstrapping through resampling. Cross validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months from April to September, 2002 to 2010, are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors with p-value < 0.001 and R2 (0.50, 0.49), respectively. In summer, milk yield with independent variables of THI, ETI, and ESI shows the highest relation (p-value < 0.001) with R2 (0.69). For fat and protein the results are only marginal. This method is suggested for impact studies of climate variability/change on agriculture and food science fields when short time series or data with large uncertainty are available.
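
    The AIC step described above can be illustrated with a minimal sketch. It is not the study's procedure: an exhaustive search over predictor subsets stands in for LASSO, the AIC is the Gaussian form n·ln(RSS/n) + 2k (up to a constant), and the data and the names `thi`, `esi`, `best_subset_by_aic` are invented for illustration.

```python
import math
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols_rss(rows, y):
    """Fit y = b0 + sum(b_j * x_j) via the normal equations; return (coefficients, RSS)."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, xi))) ** 2 for xi, yi in zip(X, y))
    return beta, rss

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k counts fitted coefficients."""
    return n * math.log(rss / n) + 2 * k

def best_subset_by_aic(data, y, names):
    """Return (lowest AIC, predictor names) over all non-empty predictor subsets."""
    best = None
    for size in range(1, len(names) + 1):
        for subset in combinations(range(len(names)), size):
            rows = [[row[j] for j in subset] for row in data]
            _, rss = ols_rss(rows, y)
            score = aic(rss, len(y), size + 1)  # +1 for the intercept
            if best is None or score < best[0]:
                best = (score, [names[j] for j in subset])
    return best

# Synthetic illustration only (not the study's data): y depends on "THI" alone.
thi = [1, 2, 3, 4, 5, 6, 7, 8]
esi = [1, 1, -1, -1, 1, 1, -1, -1]
noise = [1, -1, -1, 1, 1, -1, -1, 1]
y = [2 + 3 * t + 0.5 * e for t, e in zip(thi, noise)]
data = [list(pair) for pair in zip(thi, esi)]
score, chosen = best_subset_by_aic(data, y, ["THI", "ESI"])
print(chosen, round(score, 2))
```

    Because the second predictor adds no explanatory power in this toy data, the AIC penalty of 2 per extra coefficient favours the one-predictor model, which mirrors the stated goal of the smallest number of climate predictors.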

  9. Stepwise Logistic Regression Analysis of Risk Factors of Multiple Organ Dysfunction Syndrome in Elderly

    谭清武; 李庆华


    Objective: To study the risk factors of multiple organ dysfunction syndrome in the elderly (MODSE). Methods: A retrospective study was conducted on data of 393 patients aged over 60 who were hospitalized due to lung infection or developed lung infection in hospital from 2001 to 2006. The patients were divided into a MODSE group (n=169) and a non-MODSE group (n=224). Risk factors of statistical significance were first screened out by single factor analysis, and then independent risk factors were identified by stepwise logistic regression analysis. Results: Single factor analysis showed that age, chronic obstructive pulmonary disease, chronic respiratory failure, pulmonary interstitial fibrosis, pulmonary heart disease, coronary heart disease, chronic cardiac insufficiency, cerebrovascular disease, cervical spondylosis, chronic hepatitis and cirrhosis, diabetes, hyperuricemia, chronic renal failure, malignant tumor, hemoglobin, albumin, urea nitrogen, creatinine and fasting blood glucose were risk factors of MODSE. Stepwise logistic regression analysis showed that chronic obstructive pulmonary disease, chronic respiratory failure, pulmonary fibrosis, chronic cardiac insufficiency, cerebrovascular disease, diabetes, chronic renal failure, low hemoglobin, low albumin, high urea nitrogen and high fasting blood glucose were independent risk factors of MODSE. Conclusion: Chronic obstructive pulmonary disease, chronic respiratory failure, pulmonary fibrosis, chronic cardiac insufficiency, cerebrovascular disease, diabetes, chronic renal failure, low hemoglobin, low albumin, high urea nitrogen and high fasting blood glucose were independent risk factors of MODSE.

  10. Tightness of M-estimators for multiple linear regression in time series

    Johansen, Søren; Nielsen, Bent


    We show tightness of a general M-estimator for multiple linear regression in time series. The positive criterion function for the M-estimator is assumed lower semi-continuous and sufficiently large for large argument: Particular cases are the Huber-skip and quantile regression. Tightness requires an assumption on the frequency of small regressors. We show that this is satisfied for a variety of deterministic and stochastic regressors, including stationary and random walk regressors. The resul...

  11. Multivariate quantiles and multiple-output regression quantiles: from L1 optimization to halfspace depth

    Hallin, Marc; Paindaveine, Davy; Siman, Miroslav


    A new multivariate concept of quantile, based on a directional version of Koenker and Bassett’s traditional regression quantiles, is introduced for multivariate location and multiple-output regression problems. In their empirical version, those quantiles can be computed efficiently via linear programming techniques. Consistency, Bahadur representation and asymptotic normality results are established. Most importantly, the contours generated by those quantiles are shown to coincide with the cl...

  12. On asymptotics of t-type regression estimation in multiple linear model

    CUI Hengjian


    We consider a robust estimator (t-type regression estimator) of the multiple linear regression model obtained by maximizing the marginal likelihood of a scaled t-type error t-distribution. The marginal likelihood can also be applied to the de-correlated response when the within-subject correlation can be consistently estimated from an initial estimate of the model based on the independent working assumption. This paper shows that such a t-type estimator is consistent.

  13. Neutron multiplicity analysis tool

    Stewart, Scott L [Los Alamos National Laboratory]


    I describe the capabilities of the EXCOM (EXcel based COincidence and Multiplicity) calculation tool which is used to analyze experimental data or simulated neutron multiplicity data. The input to the program is the count-rate data (including the multiplicity distribution) for a measurement, the isotopic composition of the sample and relevant dates. The program carries out deadtime correction and background subtraction and then performs a number of analyses. These are: passive calibration curve, known alpha and multiplicity analysis. The latter is done with both the point model and with the weighted point model. In the current application EXCOM carries out the rapid analysis of Monte Carlo calculated quantities and allows the user to determine the magnitude of sample perturbations that lead to systematic errors. Neutron multiplicity counting is an assay method used in the analysis of plutonium for safeguards applications. It is widely used in nuclear material accountancy by international (IAEA) and national inspectors. The method uses the measurement of the correlations in a pulse train to extract information on the spontaneous fission rate in the presence of neutrons from (α,n) reactions and induced fission. The measurement is relatively simple to perform and gives results very quickly (≤ 1 hour). By contrast, destructive analysis techniques are extremely costly and time consuming (several days). By improving the achievable accuracy of neutron multiplicity counting, a nondestructive analysis technique, it could be possible to reduce the use of destructive analysis measurements required in safeguards applications. The accuracy of a neutron multiplicity measurement can be affected by a number of variables such as density, isotopic composition, chemical composition and moisture in the material. In order to determine the magnitude of these effects on the measured plutonium mass a calculational tool, EXCOM, has been produced using VBA within Excel. This

  14. An automatic method for producing robust regression models from hyperspectral data using multiple simple genetic algorithms

    Sykas, Dimitris; Karathanassi, Vassilia


    This paper presents a new method for automatically determining the optimum regression model, which enables the estimation of a parameter. The concept relies on the combination of k spectral pre-processing algorithms (SPPAs) that enhance spectral features correlated to the desired parameter. Initially a pre-processing algorithm uses as input a single spectral signature and transforms it according to the SPPA function. A k-step combination of SPPAs uses k preprocessing algorithms serially. The result of each SPPA is used as input to the next SPPA, and so on until the k desired pre-processed signatures are reached. These signatures are then used as input to three different regression methods: the Normalized band Difference Regression (NDR), the Multiple Linear Regression (MLR) and the Partial Least Squares Regression (PLSR). Three Simple Genetic Algorithms (SGAs) are used, one for each regression method, for the selection of the optimum combination of k SPPAs. The performance of the SGAs is evaluated based on the RMS error of the regression models. The evaluation indicates not only the optimum SPPA combination but also the regression method that produces the optimum prediction model. The proposed method was applied to soil spectral measurements in order to predict Soil Organic Matter (SOM). In this study, the maximum value assigned to k was 3. PLSR yielded the highest accuracy while NDR's accuracy was satisfactory compared to its complexity. The MLR method showed severe drawbacks due to the presence of noise in terms of collinearity at the spectral bands. Most of the regression methods required a 3-step combination of SPPAs to achieve the highest performance. The selected preprocessing algorithms were different for each regression method since each regression method handles the explanatory variables in a different way.

  15. Research on the impact factors of domestic old people's tourism consumption through multiple stepwise regression analysis



    As population aging in our country becomes more and more evident, tourism by the elderly is rapidly becoming an important part of the tourism market. Experience and theory of tourism behavior have shown that travel frequency, residence time and amount of tourism consumption are the main indicators of the attractiveness of a tourism destination. This paper makes an empirical study, carried out over more than two months, through visits and questionnaires among elderly tourists at 12 main tourist attractions in Xi'an. Based on 800 questionnaires, it analyses the factors influencing the consumption behavior of domestic elderly tourists and employs multiple stepwise regression analysis to study the degree of influence of each factor. The results show that 13 main factors affect the travel behavior of older people: physical condition, income, attitude towards tourism, spouse, attitude of sons and daughters, related groups, tourism prices, distance, security, climatic conditions, food and accommodation, transport and tourism attraction. Among these factors, income and tourism attraction are the common factors affecting elderly tourists' travel frequency, residence time and amount of consumption per day, with income level being the most critical. Specifically, the old tourists' travel frequency is directly proportional to income, attitude towards tourism, attitude of sons and daughters, physical condition and tourism attraction, and is inversely proportional to distance. The old tourists' residence

  16. Multiple Linear Regression Analysis of Quality of Life in Children with Cerebral Palsy

    万瑞平; 刘振寰; 林青梅


    Objective: To analyze the factors influencing quality of life (QOL) in children with cerebral palsy (CP). Methods: Eighty children with CP (CP group) and 80 healthy children (healthy control group) were evaluated with the Pediatric Quality of Life Inventory Version 4 (PedsQL4.0) to assess their QOL, and the differences in QOL between the 2 groups were compared. Children with CP were also assessed using the Gesell Developmental Scale (GDS) and the Gross Motor Function Classification System (GMFCS) to determine their developmental quotient and severity, and the relationships among QOL, sex, family income, clinical type, GMFCS level and intelligence were analyzed by multiple regression analysis. Results: There were significant differences in physical function/aspect, emotional function, social function, psychological aspect and total QOL between the CP group and the healthy control group (all P < 0.01). Intelligence degree was positively correlated with the total QOL score. Severity degree and intelligence degree were positively correlated with the physical aspect, and age was negatively correlated with the physical aspect, while severity degree affected the physical aspect most. Intelligence degree was positively correlated with the psychological aspects. Conclusions: QOL of children with CP was impaired on all scales. Intelligence and physical function are important factors influencing QOL of children with CP.

  17. Calculation of U, Ra, Th and K contents in uranium ore by multiple linear regression method

    A multiple linear regression method was used to compute γ spectra of uranium ore samples and to calculate contents of U, Ra, Th, and K. In comparison with the inverse matrix method, its advantage is that no standard samples of pure U, Ra, Th and K are needed for obtaining response coefficients

  18. Tumor regression of multiple bone metastases from breast cancer after administration of strontium-89 chloride (Metastron)

    We report a case of tumor regression of multiple bone metastases from breast carcinoma after administration of strontium-89 chloride. This case suggests that strontium-89 chloride can not only relieve bone metastases pain not responsive to analgesics, but may also have a tumoricidal effect on bone metastases

  19. A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants

    Cooper, Paul D.


    A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…

  20. Regression calibration for classical exposure measurement error in environmental epidemiology studies using multiple local surrogate exposures.

    Bateson, Thomas F; Wright, J Michael


    Environmental epidemiologic studies are often hierarchical in nature if they estimate individuals' personal exposures using ambient metrics. Local samples are indirect surrogate measures of true local pollutant concentrations which estimate true personal exposures. These ambient metrics include classical-type nondifferential measurement error. The authors simulated subjects' true exposures and their corresponding surrogate exposures as the mean of local samples and assessed the amount of bias attributable to classical and Berkson measurement error on odds ratios, assuming that the logit of risk depends on true individual-level exposure. The authors calibrated surrogate exposures using scalar transformation functions based on observed within- and between-locality variances and compared regression-calibrated results with naive results using surrogate exposures. The authors further assessed the performance of regression calibration in the presence of Berkson-type error. Following calibration, bias due to classical-type measurement error, resulting in as much as 50% attenuation in naive regression estimates, was eliminated. Berkson-type error appeared to attenuate logistic regression results less than 1%. This regression calibration method reduces effects of classical measurement error that are typical of epidemiologic studies using multiple local surrogate exposures as indirect surrogate exposures for unobserved individual exposures. Berkson-type error did not alter the performance of regression calibration. This regression calibration method does not require a supplemental validation study to compute an attenuation factor. PMID:20573838
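
    The calibration idea above can be sketched with the textbook attenuation factor for classical measurement error. The abstract does not give the authors' exact scalar transformation, so the formula below is an assumed standard form: when the surrogate is the mean of n local samples, the naive slope is shrunk by roughly λ = σ²_between / (σ²_between + σ²_within / n). The function names are hypothetical, and for logistic models the correction is only approximate.

```python
def attenuation_factor(between_var, within_var, n_samples):
    """Assumed classical-error attenuation for a surrogate that averages n local samples."""
    return between_var / (between_var + within_var / n_samples)

def calibrate_slope(naive_slope, lam):
    """Undo the attenuation by dividing the naive estimate by lambda."""
    return naive_slope / lam

# Illustrative numbers only: error variance of the mean equals the between-locality
# variance, which gives the 50% attenuation mentioned in the abstract.
lam = attenuation_factor(between_var=1.0, within_var=4.0, n_samples=4)
corrected = calibrate_slope(naive_slope=0.3, lam=lam)
print(lam, corrected)
```

    Both variances can be estimated from the observed samples themselves, which is consistent with the abstract's remark that no supplemental validation study is needed.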

  1. Regional flood frequency analysis using spatial proximity and basin characteristics: Quantile regression vs. parameter regression technique

    Ahn, Kuk-Hyun; Palmer, Richard


    Despite wide use of regression-based regional flood frequency analysis (RFFA) methods, the majority are based on either ordinary least squares (OLS) or generalized least squares (GLS). This paper proposes 'spatial proximity' based RFFA methods using the spatial lagged model (SLM) and spatial error model (SEM). The proposed methods are represented by two frameworks: the quantile regression technique (QRT) and parameter regression technique (PRT). The QRT develops prediction equations for flooding quantiles in average recurrence intervals (ARIs) of 2, 5, 10, 20, and 100 years whereas the PRT provides prediction of three parameters for the selected distribution. The proposed methods are tested using data incorporating 30 basin characteristics from 237 basins in Northeastern United States. Results show that generalized extreme value (GEV) distribution properly represents flood frequencies in the study gages. Also, basin area, stream network, and precipitation seasonality are found to be the most effective explanatory variables in prediction modeling by the QRT and PRT. 'Spatial proximity' based RFFA methods provide reliable flood quantile estimates compared to simpler methods. Compared to the QRT, the PRT may be recommended due to its accuracy and computational simplicity. The results presented in this paper may serve as one possible guidepost for hydrologists interested in flood analysis at ungaged sites.
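
    For orientation, flood quantiles for the listed recurrence intervals follow from the GEV distribution once its three parameters are known. The sketch below is not the paper's code. It assumes an annual-maximum series (so an ARI of T years corresponds to non-exceedance probability 1 − 1/T) and the sign convention in which a positive shape ξ gives a heavy upper tail; hydrology texts often use κ = −ξ. The parameter values are invented.

```python
import math

def gev_quantile(mu, sigma, xi, ari_years):
    """Quantile of GEV(mu, sigma, xi) for an average recurrence interval in years."""
    p = 1.0 - 1.0 / ari_years          # annual non-exceedance probability
    if abs(xi) < 1e-12:                # Gumbel limit as xi -> 0
        return mu - sigma * math.log(-math.log(p))
    return mu + (sigma / xi) * ((-math.log(p)) ** (-xi) - 1.0)

# Hypothetical parameters for one basin (location, scale, shape).
quantiles = {T: gev_quantile(100.0, 30.0, 0.1, T) for T in (2, 5, 10, 20, 100)}
for T, q in quantiles.items():
    print(T, round(q, 1))
```

    In the PRT framing, the regression would predict `mu`, `sigma` and `xi` from basin characteristics and then apply a function like this one, whereas the QRT regresses each quantile directly.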

  2. Innovation and market value: a quantile regression analysis

    Alex Coad; Rekha Rao


    We construct a new database by matching firm-level Compustat data to NBER patent data, for four 2-digit complex technology sectors. Whilst conventional regression estimators show that the stock market does recognise efforts at innovation, quantile regression analysis adds a new dimension to the literature, suggesting that the influence of innovation on market value varies dramatically across the market value distribution. For firms with a low value of Tobin's q, the stock market will barely r...
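
    Quantile regression of the kind referred to above fits conditional quantiles by minimising the check (pinball) loss rather than squared error. The following is a minimal sketch of that loss with made-up numbers. It does not reproduce the authors' model or data, and `pinball_loss` is a hypothetical helper name.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean check loss for quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)

# A skewed toy sample: at tau = 0.5 the median beats the mean as a constant predictor.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
loss_at_median = pinball_loss(sample, [3.0] * 5, 0.5)
loss_at_mean = pinball_loss(sample, [22.0] * 5, 0.5)
print(loss_at_median, loss_at_mean)
```

    Varying `tau` moves the fit toward the lower or upper part of the response distribution, which is how effects can be estimated separately for firms with low and high market value.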

  3. Background stratified Poisson regression analysis of cohort data

    Richardson, David B.; Langholz, Bryan


    Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approa...

  4. Linear regression and sensitivity analysis in nuclear reactor design

    Highlights: • Presented a benchmark for the applicability of linear regression to complex systems. • Applied linear regression to a nuclear reactor power system. • Performed neutronics, thermal–hydraulics, and energy conversion using Brayton’s cycle for the design of a GCFBR. • Performed detailed sensitivity analysis to a set of parameters in a nuclear reactor power system. • Modeled and developed reactor design using MCNP, regression using R, and thermal–hydraulics in Java. - Abstract: The paper presents a general strategy applicable to sensitivity analysis (SA) and uncertainty quantification analysis (UA) of parameters related to a nuclear reactor design. This work also validates the use of linear regression (LR) for predictive analysis in a nuclear reactor design. The analysis helps to determine the parameters on which an LR model can be fit for predictive analysis. For those parameters, a regression surface is created based on trial data and predictions are made using this surface. A general SA strategy to identify the influential parameters that affect the operation of the reactor is described. Design parameters are identified, and the linearity assumption for applying LR to reactor design is validated with a set of tests. The testing methods used to determine the behavior of the parameters can be used as a general strategy for UA and SA of nuclear reactor models and thermal hydraulics calculations. A design of a gas cooled fast breeder reactor (GCFBR), with thermal–hydraulics and energy transfer, has been used for the demonstration of this method. MCNP6 is used to simulate the GCFBR design and perform the necessary criticality calculations. Java is used to build and run input samples and to extract data from the output files of MCNP6, and R is used to perform regression analysis, other multivariate variance analysis, and analysis of the collinearity of data

  5. Sintering equation: determination of its coefficients by experiments - using multiple regression

    Sintering is a method for volume-compression (or volume-contraction) of powdered or grained material by applying high temperature (below the melting point of the material). Maekipirtti tried to find an equation which describes the process of sintering by its main parameters: sintering time, sintering temperature and volume contraction. Such an equation is called a sintering equation. It also contains some coefficients which characterise the behaviour of the material during the process of sintering. These coefficients have to be determined by experiments. Here we show that some linear regressions will produce wrong coefficients, but multiple regression results in a useful sintering equation. (orig.)

  6. Multiple regression models for the prediction of the maximum obtainable thermal efficiency of organic Rankine cycles

    Larsen, Ulrik; Pierobon, Leonardo; Wronski, Jorrit;


    power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary...... conditions of the process. Hundreds of optimised cases with varied design parameters are used as observations in four multiple regression analyses. We analyse the model assumptions, prediction abilities and extrapolations, and compare the results with recent studies in the literature. The models are in...

  7. Multiple Stepwise Regression Analysis on Affecting Factors of Allergy in Sub-Health Scale

    崔利宏; 何裕民; 倪红梅


    Objective: To study the factors affecting allergy in sub-health people, and to prevent the occurrence of allergy. Methods: Possible factors affecting allergy in 6 975 cases of sub-health people were screened using the multiple stepwise regression method. Thirteen factors entered the multiple stepwise regression model, namely: education level, age, fatigue, digestion, sleep, autonomic nervous function, immunity, aging, constipation, depression, learning, memory, self-realization and sex life. Results: Statistical results showed that allergy was correlated with 19 aspects of the four areas of body performance, psychological performance, social adaptation and sex life, and with age and education level. Among these, allergy was positively correlated with the 19 aspects of the four areas and with education level, and negatively correlated with age, all with statistical significance (P < 0.01). Conclusion: The prevention of allergy should focus on whole-body adjustment and on strengthening people's physique. In clinical practice, the factors affecting allergy should be fully considered in order to avoid missed diagnosis, misdiagnosis, delayed treatment and reduced quality of life.

  8. Ratio Versus Regression Analysis: Some Empirical Evidence in Brazil

    Newton Carneiro Affonso da Costa Jr.


    This work compares the traditional methodology for ratio analysis, applied to a sample of Brazilian firms, with the alternative of regression analysis, for both cross-industry and intra-industry samples. The structural validity of the traditional methodology was tested through a model that represents its analogous regression format. The data are from 156 Brazilian public companies in nine industrial sectors for the year 1997. The results provide weak empirical support for the traditional ratio methodology, as the validity of this methodology was found to differ between ratios.

  9. Time series analysis using semiparametric regression on oil palm production

    Yundari, Pasaribu, U. S.; Mukhaiyar, U.


    This paper presents a semiparametric kernel regression method, which has shown its flexibility and ease of mathematical calculation, especially in estimating density and regression functions. The kernel function is continuous and produces a smooth estimate. The classical kernel density estimator is constructed by completely nonparametric analysis and works reasonably well for all forms of function. Here, we discuss parameter estimation in time series analysis. First, we assume that the parameters exist; then we use nonparametric estimation, an approach that is called semiparametric. The optimum bandwidth is selected by considering an approximation of the Mean Integrated Squared Error (MISE).
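
    As background for the kernel step, the standard Nadaraya-Watson estimator is sketched below. The abstract does not specify the estimator or kernel the authors use, so the Gaussian kernel, the fixed bandwidth and the toy data are all assumptions made for illustration.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, a common continuous kernel choice."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x_obs, y_obs, x0, bandwidth):
    """Kernel-weighted average of y_obs around x0; a smaller bandwidth gives a rougher fit."""
    weights = [gaussian_kernel((x0 - xi) / bandwidth) for xi in x_obs]
    return sum(w * yi for w, yi in zip(weights, y_obs)) / sum(weights)

# Toy series: time index against a made-up production value.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
production = [2.0, 2.5, 3.1, 3.4, 4.0]
smoothed = [nadaraya_watson(t, production, ti, bandwidth=1.0) for ti in t]
print([round(v, 3) for v in smoothed])
```

    The bandwidth plays the role discussed in the abstract: in practice it would be chosen by minimising an approximation of the MISE rather than fixed by hand, and a very small bandwidth can make all weights underflow to zero.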

  10. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    Ulbrich, N.; Bader, Jon B.


    Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.

  11. QSAR study of prolylcarboxypeptidase inhibitors by genetic algorithm: Multiple linear regressions

    Eslam Pourbasheer; Saadat Vahdani; Reza Aalizadeh; Alireza Banaei; Mohammad Reza Ganjali


    A predictive analysis based on quantitative structure activity relationships (QSAR) was performed on benzimidazolepyrrolidinyl amides as prolylcarboxypeptidase (PrCP) inhibitors. Molecules were represented by chemical descriptors that encode constitutional, topological, geometrical, and electronic structure features. The hierarchical clustering method was used to classify the dataset into training and test subsets. The important descriptors were selected with the aid of the genetic algorithm method. The QSAR model was constructed using multiple linear regression (MLR), and its robustness and predictability were verified by internal and external cross-validation methods. Furthermore, the calculation of the domain of applicability defines the area of reliable predictions. The root mean square errors (RMSE) of the training set and the test set for the GA-MLR model were 0.176 and 0.279, and the correlation coefficients (R2) were 0.839 and 0.923, respectively. The proposed model has good stability, robustness and predictability when verified by internal and external validation.
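
    The two validation statistics quoted above can be computed as in the following sketch. The abstract does not say whether R2 is the coefficient of determination or a squared correlation, so the first is assumed here, and the activity values are invented.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted activities for a small test set.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
print(round(rmse(observed, predicted), 4), round(r_squared(observed, predicted), 4))
```

    Reporting both statistics separately for the training and the test set, as the abstract does, helps distinguish a model that fits well from one that also predicts well on unseen compounds.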

  12. Multiple regression as a preventive tool for determining the risk of Legionella spp.

    Enrique Gea-Izquierdo


    Objective. To determine the interrelationship between health & hygiene conditions for prevention of legionellosis, the composition of materials used in water distribution systems, the water origin and Legionella pneumophila risk. Material and methods. A descriptive study and multiple regression analysis were carried out on a sample of golf course sprinkler irrigation systems (n=31) pertaining to hotels located on the Costa del Sol (Malaga, Spain). The study was carried out in 2009. Results. There was a significant linear relation, with all the independent variables contributing significantly (p<0.05) to the model’s fit. The relationship between water type and the risk of Legionella, as well as between the material composition and that risk, is linear and positive. In contrast, the relationship between health-hygiene conditions and Legionella risk is linear and negative. Conclusion. The characterization of Legionella pneumophila concentration, as defined by the risk in water and through use of the predictive method, can contribute to the consideration of new influence variables in the development of the agent, resulting in improved control and prevention of the disease.


    Carlos Monge Perry


    Structural equation modeling (SEM) has traditionally been deployed in areas of marketing, consumer satisfaction and preferences, human behavior, and recently in strategic planning. These areas are considered its niches; however, there is a remarkable tendency in empirical research studies that indicates a more diversified use of the technique. This paper shows the application of structural equation modeling using partial least squares (PLS-SEM) in areas of manufacturing, quality, continuous improvement, operational efficiency, and environmental responsibility in Mexico’s medium and large manufacturing plants, while using a small sample (n = 40). The results obtained from this PLS-SEM model application are highly positive, relevant, and statistically significant. Also shown in this paper, to confirm the validity, reliability, and statistical power of PLS-SEM, is a comparative analysis against multiple regression, which gives results very similar to those obtained by PLS-SEM. This fact validates the use of PLS-SEM in untraditional areas of scientific research, and invites the use of the technique in diversified fields of scientific research

  14. Multiple regression method to determine aerosol optical depth in atmospheric column in Penang, Malaysia

    Aerosol optical depth (AOD) from AERONET data has a very fine resolution, but the air pollution index (API), visibility and relative humidity from ground truth measurements are coarse. To obtain the local AOD in the atmosphere, the relationship between AOD and these three parameters was determined using multiple regression analysis. Data from the southwest monsoon period (August to September, 2012) taken in Penang, Malaysia, were used to establish a quantitative relationship in which the AOD is modeled as a function of API, relative humidity, and visibility. The model with the highest correlation was used to predict AOD values during the southwest monsoon period. When aerosol is not uniformly distributed in the atmosphere, the predicted AOD can deviate strongly from the measured values. These deviating data can be removed by comparing the predicted AOD values with the actual AERONET data, which helps to investigate whether the non-uniform source of the aerosol is at the ground surface or at a higher altitude. The model can accurately predict AOD only if the aerosol is uniformly distributed in the atmosphere. Further study is needed to determine whether this model is suitable for predicting AOD not only in Penang, but also in other states in Malaysia or even globally.
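
The kind of fit described in this abstract, with AOD modeled as a linear function of API, relative humidity and visibility, can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function names, the predictor values and the coefficients are all hypothetical, and the response is synthetic so that the fit can be checked against known values.

```python
# Minimal ordinary least-squares sketch in pure Python (no external libraries).
# All numbers are synthetic and only illustrate the mechanics of the fit.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations update both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, ..., bp] minimising the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)  # normal equations (X'X) b = X'y


# Hypothetical predictors per observation: (API, relative humidity %, visibility km).
observations = [
    (40.0, 70.0, 10.0),
    (55.0, 80.0, 6.0),
    (60.0, 65.0, 9.0),
    (35.0, 75.0, 12.0),
    (70.0, 85.0, 5.0),
    (50.0, 60.0, 7.0),
]
# Synthetic response built from known (assumed) coefficients.
true_coeffs = [0.10, 0.004, 0.002, -0.01]
aod = [true_coeffs[0] + sum(b * v for b, v in zip(true_coeffs[1:], obs))
       for obs in observations]

coeffs = fit_ols(observations, aod)
print([round(c, 6) for c in coeffs])
```

Because the response here is noise-free, the fitted coefficients reproduce the assumed ones. With real measurements, the residuals between predicted and observed AOD are what would flag the non-uniform cases the abstract mentions.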

  15. Multiple regression models for the prediction of the maximum obtainable thermal efficiency of organic Rankine cycles

    Much attention is focused on increasing the energy efficiency to decrease fuel costs and CO2 emissions throughout industrial sectors. The ORC (organic Rankine cycle) is a relatively simple but efficient process that can be used for this purpose by converting low and medium temperature waste heat to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary conditions of the process. Hundreds of optimised cases with varied design parameters are used as observations in four multiple regression analyses. We analyse the model assumptions, prediction abilities and extrapolations, and compare the results with recent studies in the literature. The models are in agreement with the literature, and they present an opportunity for accurate prediction of the potential of an ORC to convert heat sources with temperatures from 80 to 360 °C, without detailed knowledge or need for simulation of the process. - Highlights: • The maximum thermal efficiency of ORCs in hundreds of cases was analysed. • Multiple regression models were derived to predict the maximum obtainable efficiency of ORCs. • Using only key design parameters, the maximum obtainable efficiency can be evaluated. • The regression models decrease the resources needed to evaluate the maximum potential. • The models are statistically strong and in good agreement with the literature

  16. User's Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)

    Eng, Ken; Chen, Yin-Yu; Kiang, Julie E.


    Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce and familiarize the user with the weighted multiple-linear regression (WREG) program, and to also provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages with short records. The regional estimation equation results from a multiple-linear regression that relates the observable basin characteristics, such as drainage area, to streamflow characteristics.

  17. Analysis of stroke incidence impact factors based on stepwise multiple regression



    This paper analyzes the factors that influence stroke incidence. First, the large body of case information was analyzed statistically. A mathematical model was then built by regression fitting, establishing the relationship between stroke incidence and air temperature, barometric pressure, and humidity. Finally, some suggestions are made for high-risk groups. Together, these give a complete answer to Problem C of the 2012 Higher Education Press Cup National Mathematical Contest in Modeling.

  18. Sparse Regression by Projection and Sparse Discriminant Analysis

    Qi, Xin


    © 2015, © American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.

  19. The use of weighted multiple linear regression to estimate QTL-by-QTL epistatic effects

    Jan Bocianowski


    Knowledge of the nature and magnitude of gene effects, as well as their contribution to the control of metric traits, is important in formulating efficient breeding programs for the improvement of plant genetics. Information concerning a genetic parameter such as the additive-by-additive epistatic effect can be useful in traditional breeding. This report describes the results obtained by applying weighted multiple linear regression to estimate the parameter connected with an additive-by-addit...

  20. Multiple Linear Regression Application on the Inter-Network Settlement of Internet

    YANG Qing-feng; ZHANG Qi-xiang; LÜ Ting-jie


    This paper develops an analytical framework to explain Internet interconnection settlement issues. The paper shows that multiple linear regression can be used in assessing the network value of Internet Backbone Providers (IBPs). By using the exchange rate of each network, we can define a rate of network value, which reflects the contribution of each network to interconnection and the interconnected network resource usage by each network.

  1. Regression analysis for solving diagnosis problem of children's health

    Cherkashina, Yu A.; Gerget, O. M.


    This paper presents research on the application of statistical techniques, namely regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, blood test parameters, gestational age, vascular endothelial growth factor) measured at 3-5 days of life. The studied medical data are described in detail, and a binary logistic regression procedure is discussed. The basic results of the research are presented: a classification table of predicted and observed values is shown, and the overall percentage of correct recognition is determined. Regression coefficients are calculated, and the general regression equation is written from them. Based on the logistic regression results, ROC analysis was performed: the sensitivity and specificity of the model were calculated and ROC curves were constructed. These mathematical techniques allow the health of children to be diagnosed with a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and are of high practical importance in the professional activity of the author.
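
As a rough illustration of the classification table, sensitivity, specificity and ROC points mentioned in this abstract, the sketch below works from predicted probabilities and observed binary outcomes. The data and function names are hypothetical and do not come from the study. The probabilities stand in for the output of a fitted logistic regression.

```python
# Sketch: classification-table summaries and ROC points from predicted probabilities.
# The labels and probabilities below are made up for illustration.

def classify(probabilities, threshold):
    """Predict class 1 when the probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def rates(actual, predicted):
    """Sensitivity, specificity and overall accuracy from a 2x2 classification table."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(actual),
    }


def roc_points(actual, probabilities):
    """(1 - specificity, sensitivity) pairs, one per distinct threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(probabilities), reverse=True):
        r = rates(actual, classify(probabilities, t))
        points.append((1.0 - r["specificity"], r["sensitivity"]))
    return points


actual = [1, 1, 1, 1, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

summary = rates(actual, classify(probabilities, 0.5))
curve = roc_points(actual, probabilities)
print(summary)
print(curve)
```

The 0.5 cut-off is only a conventional default. Sweeping the threshold, as `roc_points` does, shows the trade-off between sensitivity and specificity that an ROC curve summarizes.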

  2. Driven Factors Analysis of China’s Irrigation Water Use Efficiency by Stepwise Regression and Principal Component Analysis

    Renfu Jia; Shibiao Fang; Wenrong Tu; Zhilin Sun


    This paper introduces an integrated approach to find out the major factors influencing efficiency of irrigation water use in China. It combines multiple stepwise regression (MSR) and principal component analysis (PCA) to obtain more realistic results. In real world case studies, classical linear regression model often involves too many explanatory variables and the linear correlation issue among variables cannot be eliminated. Linearly correlated variables will cause the invalidity of the fac...

  3. Regression Analysis: Instructional Resource for Cost/Managerial Accounting

    Stout, David E.


    This paper describes a classroom-tested instructional resource, grounded in principles of active learning and constructivism, that embraces two primary objectives: "demystify" for accounting students technical material from statistics regarding ordinary least-squares (OLS) regression analysis--material that students may find obscure or…

  4. Exploratory regression analysis: a tool for selecting models and determining predictor importance.

    Braun, Michael T; Oswald, Frederick L


    Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download). The program investigates all 2^p - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits. PMID:21298571
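
The count of submodels in this abstract follows from taking every non-empty subset of p predictors, which gives 2^p - 1 candidates. The sketch below only enumerates those subsets. It does not reproduce the Excel program or its importance indices, and the predictor names are placeholders.

```python
# Sketch: enumerate every non-empty subset of predictors (2**p - 1 submodels).
from itertools import combinations


def all_submodels(predictors):
    """Return each non-empty subset of the predictors as a tuple, smallest first."""
    return [
        subset
        for size in range(1, len(predictors) + 1)
        for subset in combinations(predictors, size)
    ]


predictors = ["x1", "x2", "x3", "x4"]  # hypothetical predictor names
submodels = all_submodels(predictors)
print(len(submodels))
for subset in submodels[:5]:
    print(subset)
```

The number of submodels doubles with each added predictor, so exhaustive enumeration of this kind is practical only for a moderate number of predictors.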

  5. Multiple predictor smoothing methods for sensitivity analysis: Description of techniques

    The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (i) locally weighted regression (LOESS), (ii) additive models, (iii) projection pursuit regression, and (iv) recursive partitioning regression. Then, in the second and concluding part of this presentation, the indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present

  6. Early Parallel Activation of Semantics and Phonology in Picture Naming: Evidence from a Multiple Linear Regression MEG Study.

    Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf


    The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200-400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset. PMID:25005037

  7. Specification and sensitivity analysis of cross-country growth regressions

    Thanasis Stengos; Theofanis P. Mamuneas; Pantelis Kalaitzidakis


    We compare the sensitivity analysis of cross-country growth regressions based on extreme bounds analysis to a more direct specification testing approach using non-nested hypotheses tests. The results suggest that those specifications that are adequate are also those that include two of the only few conditioning variables that are found to be robust, namely the standard deviation of inflation and the standard deviation of domestic credit.

  8. Predicting manual arm strength: A direct comparison between artificial neural network and multiple regression approaches.

    La Delfa, Nicholas J; Potvin, Jim R


    In ergonomics, strength prediction has typically been accomplished using linked-segment biomechanical models, and independent estimates of strength about each axis of the wrist, elbow and shoulder joints. It has recently been shown that multiple regression approaches, using the simple task-relevant inputs of hand location and force direction, may be a better method for predicting manual arm strength (MAS) capabilities. Artificial neural networks (ANNs) also serve as a powerful data fitting approach, but their application to occupational biomechanics and ergonomics is limited. Therefore, the purpose of this study was to perform a direct comparison between ANN and regression models, by evaluating their ability to predict MAS with identical sets of development and validation MAS data. Multi-directional MAS data were obtained from 95 healthy female participants at 36 hand locations within the reach envelope. ANN and regression models were developed using a random, but identical, sample of 85% of the MAS data (n=456). The remaining 15% of the data (n=80) were used to validate the two approaches. When compared to the development data, the ANN predictions had a much higher explained variance (90.2% vs. 66.5%) and much lower RMSD (9.3N vs. 17.2N) than the regression model. The ANN also performed better with the independent validation data (r²=78.6%, RMSD=15.1N) compared to the regression approach (r²=65.3%, RMSD=18.6N). These results suggest that ANNs provide a more accurate and robust alternative to regression approaches, and should be considered more often in biomechanics and ergonomics evaluations. PMID:26876987

  9. Asymptotic Properties of Criteria for Selection of Variables in Multiple Regression

    Nishii, Ryuei


    In normal linear regression analysis, many model selection rules proposed from various viewpoints are available. For the information criteria AIC, FPE, $C_p$, PSS and BIC, the asymptotic distribution of the selected model and the asymptotic quadratic risk based on each criterion are explicitly obtained.
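
To show how two of the criteria named in this abstract can select different models, the sketch below uses a common form of AIC and BIC for normal linear regression, written up to an additive constant: n ln(RSS/n) + 2k and n ln(RSS/n) + k ln n. This form is an assumption about convention, because definitions differ on constants and on whether the error variance counts toward k. The residual sums of squares are invented for illustration.

```python
# Sketch: AIC and BIC for Gaussian linear models, up to an additive constant.
# RSS values and parameter counts below are hypothetical.
import math


def aic(rss, n, k):
    """Akaike information criterion: fit term plus a penalty of 2 per parameter."""
    return n * math.log(rss / n) + 2 * k


def bic(rss, n, k):
    """Bayesian information criterion: the penalty per parameter grows as ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)


n = 50  # number of observations (assumed)
# model name -> (residual sum of squares, number of fitted coefficients)
candidates = {
    "x1": (120.0, 2),
    "x1+x2": (80.0, 3),
    "x1+x2+x3": (76.0, 4),
}

best_by_aic = min(candidates, key=lambda m: aic(candidates[m][0], n, candidates[m][1]))
best_by_bic = min(candidates, key=lambda m: bic(candidates[m][0], n, candidates[m][1]))
print(best_by_aic, best_by_bic)
```

In this made-up case the small drop in RSS from adding x3 is enough to offset the AIC penalty but not the heavier BIC penalty, so the two criteria choose different models.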

  10. Early cost estimating for road construction projects using multiple regression techniques

    Ibrahim Mahamid


    The objective of this study is to develop early cost estimating models for road construction projects using multiple regression techniques, based on 131 sets of data collected in the West Bank in Palestine. Because cost estimates are required at the early stages of a project, consideration was given to ensuring that the input data for the regression models could be easily extracted from sketches or the scope definition of the project. Eleven regression models are developed to estimate the total cost of a road construction project in US dollars; 5 of them include bid quantities as input variables and 6 include road length and road width. The coefficient of determination r² for the developed models ranges from 0.92 to 0.98, which indicates that the values predicted by the models fit the real-life data well. The mean absolute percentage error (MAPE) of the developed regression models ranges from 13% to 31%; these results compare favorably with past research, which has shown that estimate accuracy in the early stages of a project is between ±25% and ±50%.
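
The two accuracy measures quoted in this abstract can be computed as in the sketch below, which uses the usual definitions: MAPE as the mean of |actual - predicted| / |actual| in percent, and r² as 1 - SS_res / SS_tot. The cost figures are made up and are not taken from the study.

```python
# Sketch: mean absolute percentage error and coefficient of determination.
# The actual and predicted costs are hypothetical.

def mape(actual, predicted):
    """Mean absolute percentage error in percent; every actual value must be non-zero."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual_cost = [100.0, 200.0, 400.0]     # e.g. thousands of US dollars (assumed)
predicted_cost = [110.0, 180.0, 400.0]

error_pct = mape(actual_cost, predicted_cost)
fit = r_squared(actual_cost, predicted_cost)
print(round(error_pct, 2), round(fit, 4))
```

A model can have a high r² and still show a sizeable MAPE, because r² is dominated by the large projects while MAPE weights every project's relative error equally.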