WorldWideScience

Sample records for multiple regression analysis

  1. Multiple linear regression analysis

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.

  2. Remaining Phosphorus Estimate Through Multiple Regression Analysis

    M. E. ALVES; A. LAVORENTI

    2006-01-01

    The remaining phosphorus (Prem), the P concentration that remains in solution after shaking soil with 0.01 mol L-1 CaCl2 containing 60 μg mL-1 P, is a very useful index for studies related to the chemistry of variable charge soils. Although the Prem determination is a simple procedure, the possibility of estimating accurate values of this index from easily and/or routinely determined soil properties can be very useful for practical purposes. The present research evaluated the Prem estimation through multiple regression analysis in which routinely determined soil chemical data, soil clay content and soil pH measured in 1 mol L-1 NaF (pHNaF) figured as Prem predictor variables. The Prem can be estimated with acceptable accuracy using the above-mentioned approach, and pHNaF not only substitutes for clay content as a predictor variable but also confers more accuracy to the Prem estimates.

  3. MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM

    Erika KULCSÁR

    2009-01-01

    This paper analyses the relationship between the GDP dependent variable in the sector of hotels and restaurants and the following independent variables: overnight stays in the establishments of touristic reception, arrivals in the establishments of touristic reception and investments in the hotels and restaurants sector in the period of analysis 1995-2007. With the multiple regression analysis I found that investments and tourist arrivals are significant predictors for the GDP dependent variable. Based on...

  4. Multiple regression analysis of cancer incidence around nuclear plant

    The results of a multiple regression analysis of cancer incidence in the vicinity of a nuclear plant are presented. No dependence on radiation factors (natural background, radioactive releases, total dose of all types of medical examinations) is established. At the same time, a relationship between general cancer incidence, the incidence of tumors of the lungs, trachea and bronchi and of hematopoietic tissue carcinoma, and releases of dangerous chemical substances is revealed.

  5. MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM

    Erika KULCSÁR

    2009-12-01

    This paper analyses the relationship between the GDP dependent variable in the sector of hotels and restaurants and the following independent variables: overnight stays in the establishments of touristic reception, arrivals in the establishments of touristic reception and investments in the hotels and restaurants sector in the period of analysis 1995-2007. With the multiple regression analysis I found that investments and tourist arrivals are significant predictors for the GDP dependent variable. Based on these results, I identified those components of the marketing mix which, in my opinion, require investment and which could contribute to the positive development of tourist arrivals in the establishments of touristic reception.

  6. The analysis of the correlation between GDP, private and public consumption through multiple regression

    Constantin ANGHELACHE; Alexandru MANOLE; Madalina Gabriela ANGHEL

    2015-01-01

    The analysis of the correlation between indicators, through multiple regression, completes the information and conclusions drawn through the application of some simple regression models. Supplementary elements achieved by using multiple regression form an additional informational support for decision makers and analysts. This paper describes a correlation between the GDP, private and public consumption, through a multiple regression model. The model explains the influence of the two types of ...

  7. Multiple linear regression.

    Eberly, Lynn E

    2007-01-01

    This chapter describes multiple linear regression, a statistical approach used to describe the simultaneous associations of several variables with one continuous outcome. Important steps in using this approach include estimation and inference, variable selection in model building, and assessing model fit. The special cases of regression with interactions among the variables, polynomial regression, regressions with categorical (grouping) variables, and separate slopes models are also covered. Examples in microbiology are used throughout. PMID:18450050
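    As a minimal sketch of the estimation step this record mentions, the following pure-Python example fits a multiple linear regression by solving the normal equations and reports R-squared. The function names and the synthetic, noise-free data are assumptions made for illustration; they do not come from the chapter.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] minimising the sum of squared errors."""
    X = [[1.0] + list(row) for row in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, coef):
    """Share of the variance of y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    preds = [coef[0] + sum(b * v for b, v in zip(coef[1:], row)) for row in rows]
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


# Synthetic data generated from y = 1 + 2*x1 + 3*x2 (illustrative only).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
coef = fit_ols(rows, y)
fit_quality = r_squared(rows, y, coef)
print(coef, fit_quality)
```

    Solving the normal equations directly is adequate for a small, well-conditioned illustration; for real data a QR- or SVD-based least squares routine from a numerical library is usually the safer choice.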

  8. Business applications of multiple regression

    Richardson, Ronny

    2015-01-01

    This second edition of Business Applications of Multiple Regression describes the use of the statistical procedure called multiple regression in business situations, including forecasting and understanding the relationships between variables. The book assumes a basic understanding of statistics but reviews correlation analysis and simple regression to prepare the reader to understand and use multiple regression. The techniques described in the book are illustrated using both Microsoft Excel and a professional statistical program. Along the way, several real-world data sets are analyzed in deta

  9. Multiple regression analysis of the net income and consumption expenditure of Chinese rural households during 2007

    Da, Wa; Xiao, Hong; Zhuo, Ma

    2009-01-01

    We use the regression analysis method of multivariate statistical analysis to establish a multiple linear regression model about the net income and consumption expenditure of Chinese rural households during the year 2007. This paper analyzes the internal relation between the net income and consumption expenditure of Chinese rural households according to the regression result. Some reasonable suggestions are put forward for raising the income of rural households and stimulating consumption.

  10. An improved multiple linear regression and data analysis computer program package

    Sidik, S. M.

    1972-01-01

    NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

  11. Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis

    BUDIMAN; ENDANG ARISOESILANINGSIH

    2012-01-01

    Budiman, Arisoesilaningsih E. 2012. Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis. Biodiversitas 13: 18-22. The aim of this research was to determine the multiple regression models of vegetative and corm growth of Amorphophallus muelleri Blume in some age variations and habitat conditions of agroforestry in East Java. Descriptive exploratory research method was conducted by systematic random sampling at five agroforestrie...

  12. Quantitative electron microscope autoradiography: application of multiple linear regression analysis

    A new method for the analysis of high resolution EM autoradiographs is described. It identifies labelled cell organelle profiles in sections on a strictly statistical basis and provides accurate estimates for their radioactivity without the need to make any assumptions about their size, shape and spatial arrangement. (author)

  13. Analysis of γ spectra in airborne radioactivity measurements using multiple linear regressions

    This paper describes the calculation of the net peak counts of the nuclide 137Cs at 662 keV in γ spectra from airborne radioactivity measurements using multiple linear regression. A mathematical model is established by analyzing every factor that contributes to the Cs peak counts in the spectra, and a multiple linear regression function is set up. The calculation adopts stepwise regression, and insignificant factors are eliminated by an F test. The regression results and their uncertainty are calculated by least squares estimation, from which the Cs net peak counts and their uncertainty are obtained. The analysis results for an experimental spectrum are presented. The influence of energy shift and energy resolution on the analysis result is discussed. In comparison with the spectrum stripping method, the multiple linear regression method does not need stripping ratios, the calculated result depends only on the counts in the Cs peak, and the calculation uncertainty is reduced. (authors)

  14. Noninvasive spectral imaging of skin chromophores based on multiple regression analysis aided by Monte Carlo simulation

    Nishidate, Izumi; Wiswadarma, Aditya; Hase, Yota; Tanaka, Noriyuki; Maeda, Takaaki; Niizeki, Kyuichi; Aizu, Yoshihisa

    2011-08-01

    In order to visualize melanin and blood concentrations and oxygen saturation in human skin tissue, a simple imaging technique based on multispectral diffuse reflectance images acquired at six wavelengths (500, 520, 540, 560, 580 and 600 nm) was developed. The technique utilizes multiple regression analysis aided by Monte Carlo simulation for diffuse reflectance spectra. Using the absorbance spectrum as a response variable and the extinction coefficients of melanin, oxygenated hemoglobin, and deoxygenated hemoglobin as predictor variables, multiple regression analysis provides regression coefficients. Concentrations of melanin and total blood are then determined from the regression coefficients using conversion vectors that are deduced numerically in advance, while oxygen saturation is obtained directly from the regression coefficients. Experiments with a tissue-like agar gel phantom validated the method. In vivo experiments with skin of the human hand during upper limb occlusion and of the inner forearm exposed to UV irradiation demonstrated the ability of the method to evaluate physiological reactions of human skin tissue.

  15. A multiple regression analysis for accurate background subtraction in 99Tcm-DTPA renography

    A technique for accurate background subtraction in 99Tcm-DTPA renography is described. The technique is based on a multiple regression analysis of the renal curves and separate heart and soft tissue curves which together represent background activity. It is compared, in over 100 renograms, with a previously described linear regression technique. Results show that the method provides accurate background subtraction, even in very poorly functioning kidneys, thus enabling relative renal filtration and excretion to be accurately estimated. (author)

  16. MULTIPLE LINEAR REGRESSION ANALYSIS OF GRAND MAL AND PETIT MAL FORMS OF EXPERIMENTAL EPILEPSY

    Godlevsky, L.; Lobasyuk, B.; Kobolev, E.; Luijtelaar, E.L.J.M. van; COENEN A.R.M.L.; Stepanenko, K.; Haghoel, Raz; Prybalovetz, T.

    2005-01-01

    Relationships between amplitude of penicillin-induced generalized epileptiform signals in different zones of the brain cortex (occipito-frontal bilateral leads, as well as occipital and frontal bipolar leads) were investigated in Wistar rats using multiple linear regression method of analysis. Results were expressed in the form of policycle multigrafs (multidimensional presentation) with the identification of significant (p

  17. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

    Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.

    2006-01-01

    Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…

  18. Regression Models for the Analysis of Longitudinal Gaussian Data from Multiple Sources

    O’Brien, Liam M.; FITZMAURICE, GARRETT M.

    2005-01-01

    We present a regression model for the joint analysis of longitudinal multiple source Gaussian data. Longitudinal multiple source data arise when repeated measurements are taken from two or more sources, and each source provides a measure of the same underlying variable and on the same scale. This type of data generally produces a relatively large number of observations per subject; thus estimation of an unstructured covariance matrix often may not be possible. We consider two methods by which...

  19. Multiple regression approach to mapping of quantitative trait loci (QTL) based on sib-pair data: a theoretical analysis

    Xiong, Momiao; Guo, Sunwei

    2000-01-01

    The interval mapping method has been shown to be a powerful tool for mapping QTL. However, it is still a challenge to perform a simultaneous analysis of several linked QTLs, and to isolate multiple linked QTLs. To circumvent these problems, multiple regression analysis has been suggested for experimental species. In this paper, the multiple regression approach is extended to human sib-pair data through multiple regression of the squared difference in trait values between two...

  20. A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis

    2011-01-01

    The growing volume of data usually creates an interesting challenge for the need of data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz. factor analysis and multiple linear regression, for different sample sizes on three uniqu...

  1. Fungible Weights in Multiple Regression

    Waller, Niels G.

    2008-01-01

    Every set of alternate weights (i.e., non-least-squares weights) in a multiple regression analysis with three or more predictors is associated with an infinite class of weights. All members of a given class can be deemed "fungible" because they yield identical SSE (sum of squared errors) and R² values. Equations for generating…

  2. Multiple Regression Analysis of Aroma Components and Sensory Evaluation of Miso

    Sugawara, Etsuko; SAIGA, Suguru; Kobayashi, Akio

    1994-01-01

    Among several sensory characteristics to evaluate the quality of miso (fermented bean paste), aroma is the most difficult one. If results of chemical analysis of miso aroma could be transformed into numerical terms, the evaluation of miso may become easier. Therefore we investigated the relationship between aroma components and sensory scores of rice-miso by multiple regression analysis. Thirty-four rice-miso exhibited at the National Miso Competition were used as the samples. Each peak area of t...

  3. Multiple regression analysis of Jominy hardenability data for boron treated steels

    The relations between the chemical composition and the hardenability of boron treated steels have been investigated using a multiple regression analysis method. A linear model of regression was chosen. The free boron content that is effective for the hardenability was calculated using a model proposed by Jansson. The regression analysis for 1261 steel heats provided equations that were statistically significant at the 95% level. All heats met the specification according to the Nordic countries producers' classification. The variation in chemical composition explained typically 80 to 90% of the variation in the hardenability. In the regression analysis, elements which did not significantly contribute to the calculated hardness according to the F test were eliminated. Carbon, silicon, manganese, phosphorus and chromium were of importance at all Jominy distances; nickel, vanadium, boron and nitrogen at distances above 6 mm. After the regression analysis it was demonstrated that very few outliers were present in the data set, i.e. data points outside four times the standard deviation. The model has successfully been used in industrial practice, replacing some of the necessary Jominy tests. (orig.)

  4. The Use of Rank Transformation and Multiple Regression Analysis in Estimating Residential Property Values With A Small Sample

    Timothy P. Cronan; Donald R. Epley; Larry G. Perry

    1986-01-01

    Conventional multiple regression analysis which has been used in estimating residential property values typically relies upon cardinal data. This paper argues that appraisal theory requires the appraiser to rank the comparables from best to worst and use a regression technique which can be applied to ordinal data. The rank regression procedure illustrated here was successfully used on small sample sizes, and did not violate the critical assumptions underlying conventional multiple regression....
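    A minimal sketch of the rank-transformation idea, assuming the simplest variant: every variable is replaced by its rank (ties get the average rank) and ordinary least squares is then run on the ranks. The comparables data, variable names and helper functions are hypothetical, and the paper's actual procedure may differ.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] from the normal equations."""
    X = [[1.0] + list(row) for row in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def average_ranks(values):
    """1-based ranks from smallest to largest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


# Hypothetical comparables: floor area, building age, sale price.
area = [90, 120, 150, 200, 250]
age = [30, 5, 20, 10, 15]
price = [100, 130, 180, 400, 900]

rank_rows = list(zip(average_ranks(area), average_ranks(age)))
rank_coef = fit_ols(rank_rows, average_ranks(price))
print(rank_coef)
```

    In this made-up sample the price ranking matches the area ranking exactly, so the fit puts a coefficient of about 1 on the area rank and about 0 on the age rank even though the price itself is far from linear in area.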

  5. FORECASTING THE FINANCIAL RETURNS FOR USING MULTIPLE REGRESSION BASED ON PRINCIPAL COMPONENT ANALYSIS

    Nop Sopipan

    2013-01-01

    The aim of this study was to forecast the returns for the Stock Exchange of Thailand (SET) Index by adding some explanatory variables and a stationary autoregressive process of order p (AR(p)) in the mean equation of returns. In addition, we used Principal Component Analysis (PCA) to remove possible complications caused by multicollinearity. Results showed that the multiple regression based on PCA had the best performance.

  6. Inferring Preferences in Multiple Criteria Decision Analysis Using a Logistic Regression Model

    Theodor J Stewart

    1984-01-01

    A method is proposed for the analysis of multiple criteria decision making problems in an interactive environment, when decision-maker preferences are inconsistent with a simple utility model and/or are self-inconsistent (e.g., showing intransitivities). A maximum likelihood estimation procedure is invoked which is based on a logistic regression model relating the probability of selecting one decision option over another to a linear function of attribute values. The method is illustrated by a...
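    The abstract does not give the exact model form. One common parameterisation, assumed here purely for illustration, makes the probability of choosing option A over option B a logistic function of a weighted difference in attribute values. The weights, options and function names below are hypothetical.

```python
import math


def preference_probability(weights, option_a, option_b):
    """P(A chosen over B) under an assumed logistic model on attribute differences."""
    z = sum(w * (a - b) for w, a, b in zip(weights, option_a, option_b))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(weights, choices):
    """Log-likelihood of observed (chosen, rejected) pairs; maximised over weights."""
    return sum(
        math.log(preference_probability(weights, chosen, rejected))
        for chosen, rejected in choices
    )


# Hypothetical weights and two options scored on two attributes.
weights = [0.8, 0.3]
option_a = [5.0, 2.0]
option_b = [3.0, 4.0]

p_ab = preference_probability(weights, option_a, option_b)
p_ba = preference_probability(weights, option_b, option_a)
print(p_ab, p_ba)
```

    Because the model is probabilistic, an intransitive or inconsistent set of stated preferences simply lowers the likelihood rather than making estimation impossible, which is the property the abstract highlights.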

  7. Respiratory infections and their influence on lung function in children: a multiple regression analysis.

    Yarnell, J W; St Leger, A S

    1981-01-01

    The relationship between a history of respiratory infections (and associated variables) in children and lung function in later life was examined in a study among 228 children aged 7 to 11 years. In a multiple regression analysis only a few variables showed marked and consistent effects on lung function. Respiratory tract infections showed increasing impairment of lung function with repeated infections, but the impairment was smaller than that caused by current asthma.

  8. On Testing the Significance of the Coefficients in the Multiple Regression Analysis

    Kończak, Grzegorz

    2012-01-01

    The multiple regression analysis is a statistical tool for investigating relationships between the dependent and independent variables. There are some procedures for selecting a subset of given predictors. These procedures are widely available in statistical computer packages. The most often used are forward selection, backward selection and stepwise selection. In these procedures testing the significance of parameters is used. If some assumptions such as normality errors a...
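    Forward selection, one of the procedures named above, can be sketched as follows: starting from an intercept-only model, add the candidate that most reduces the residual sum of squares, and stop when its partial F statistic falls below a threshold. The threshold of 4.0, the function names and the synthetic data are assumptions for illustration, not taken from the paper.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def residual_ss(columns, y):
    """SSE of an OLS fit of y on an intercept plus the given predictor columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(X, y))


def forward_select(candidates, y, f_to_enter=4.0):
    """Greedy forward selection using a partial F-to-enter rule."""
    selected, remaining = [], list(candidates)
    current = residual_ss([], y)  # intercept-only model
    while remaining:
        trials = {
            name: residual_ss([candidates[s] for s in selected + [name]], y)
            for name in remaining
        }
        best = min(trials, key=trials.get)
        df = len(y) - (len(selected) + 1) - 1  # residual degrees of freedom
        if df <= 0 or trials[best] <= 0:
            break  # guard against division by zero in the F statistic
        f_stat = (current - trials[best]) / (trials[best] / df)
        if f_stat < f_to_enter:
            break
        selected.append(best)
        remaining.remove(best)
        current = trials[best]
    return selected


# Synthetic data: y depends on x1 and x3 only; x2 is an irrelevant candidate.
candidates = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, 1, -1, -1, -1, -1, 1, 1],
    "x3": [0, 1, 0, 1, 1, 0, 1, 0],
}
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2]
y = [2 * a + 3 * c + e for a, c, e in zip(candidates["x1"], candidates["x3"], noise)]

chosen = forward_select(candidates, y)
print(chosen)
```

    Backward and stepwise selection follow the same pattern with a removal test added; all three rely on the significance tests whose assumptions the paper examines.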

  9. A COMPARISON OF STEPWISE AND FUZZY MULTIPLE REGRESSION ANALYSIS TECHNIQUES FOR MANAGING SOFTWARE PROJECT RISKS: ANALYSIS PHASE

    Abdelrafe Elzamly; Burairah Hussin

    2014-01-01

    Risk is not always avoidable, but it is controllable. The aim of this study is to identify whether those techniques are effective in reducing software failure. This motivates the authors to continue the effort to enrich the management of software project risks by considering mining and quantitative approaches with a large data set. In this study, two new techniques are introduced, namely stepwise multiple regression analysis and fuzzy multiple regression, to manage the software risks. Two evaluation proc...

  10. COLOR IMAGE RETRIEVAL BASED ON FEATURE FUSION THROUGH MULTIPLE LINEAR REGRESSION ANALYSIS

    K. Seetharaman

    2015-08-01

    This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, and the least-square estimation method is employed to estimate the parameters. The given input query image is segmented into various regions according to the structure of the image. The color and texture features are extracted on each region of the query image, and the features are fused together using the multiple linear regression model. The estimated parameters of the model, which is modeled based on the features, are formed as a vector called a feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images. The F-measure is applied to evaluate the performance of the proposed technique. The obtained results show that the proposed technique is comparable to the other existing techniques.

  11. Assessing Credit Default using Logistic Regression and Multiple Discriminant Analysis: Empirical Evidence from Bosnia and Herzegovina

    Deni Memić

    2015-01-01

    This article aims to assess credit default prediction on the banking market in Bosnia and Herzegovina nationwide as well as in its constitutional entities (Federation of Bosnia and Herzegovina and Republika Srpska). The ability to classify companies into different predefined groups, or to find an appropriate tool that would replace human assessment in classifying companies into good and bad buckets, has been one of the main interests of risk management researchers for a long time. We investigated the possibility and accuracy of default prediction using the traditional statistical methods logistic regression (logit) and multiple discriminant analysis (MDA) and compared their predictive abilities. The results show that the created models have high predictive ability. For logit models, some variables are more influential on the default prediction than others. Return on assets (ROA) is statistically significant in all four periods prior to default, having very high regression coefficients, or high impact on the model's ability to predict default. Similar results are obtained for MDA models. It is also found that predictive ability differs between logistic regression and multiple discriminant analysis.

  12. QSPR study of molar diamagnetic susceptibility of diverse organic compounds using multiple linear regression analysis

    S. Saaidpour; S. A. Zarei; F. Nasri

    2012-01-01

    The multiple linear regression (MLR) was used to build the linear quantitative structure-property relationship (QSPR) model for the prediction of the molar diamagnetic susceptibility (χm) for 140 diverse organic compounds using the three significant descriptors calculated from the molecular structures alone and selected by stepwise regression method. Stepwise regression was employed to develop a regression equation based on 100 training compounds, and predictive ability was tested on 40 compo...

  13. Estimate of Compressive Strength for Concrete using Ultrasonics by Multiple Regression Analysis Method

    Various types of ultrasonic techniques have been used for the estimation of the compressive strength of concrete structures. However, the conventional ultrasonic velocity method using only the longitudinal wave cannot determine the compressive strength of concrete structures with accuracy. In this paper, multiple parameters were introduced, e.g. velocity of shear wave, velocity of longitudinal wave, attenuation coefficient of shear wave, attenuation coefficient of longitudinal wave, combination condition, age and preservation method, and the multiple regression analysis method was applied to the determination of the compressive strength of concrete structures. The experimental results show that the velocity of the shear wave can estimate the compressive strength of concrete with more accuracy than the velocity of the longitudinal wave, and that the error range of the estimated compressive strength of concrete structures can be brought within approximately ± 10%.

  14. Regression Analysis

    Freund, Rudolf J; Sa, Ping

    2006-01-01

    The book provides complete coverage of the classical methods of statistical analysis. It is designed to give students an understanding of the purpose of statistical analyses, to allow the student to determine, at least to some degree, the correct type of statistical analyses to be performed in a given situation, and to have some appreciation of what constitutes good experimental design.

  15. Aspects Regarding the Multiple Regression Used in Macro-economic Analysis

    Constantin ANGHELACHE; Alexandru MANOLE; Ligia PRODAN; Andreea Gabriela BALTAC; Zoica DINCA (NICOLA)

    2014-01-01

    The regression function serves as a basis for carrying out numerous analyses of micro- or macroeconomic indicators. Information obtained by use of the simple linear regression model is not always sufficient to characterize changes in an economic phenomenon and, in particular, to identify its possible future evolution. To remedy these shortcomings, multiple regression models have been introduced in the literature, in which the dependent variable is defined on the basis of two or mor...

  16. ANALYSIS OF THE FINANCIAL PERFORMANCES OF THE FIRM, BY USING THE MULTIPLE REGRESSION MODEL

    Constantin Anghelache; Ioan Partachi

    2011-01-01

    The information obtained through the use of simple linear regression is not always enough to characterize the evolution of an economic phenomenon and, furthermore, to identify its possible future evolution. To remedy these drawbacks, the specialized literature includes multiple regression models, in which the evolution of the dependent variable is defined depending on two or more factorial variables.
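    For reference, the standard textbook form of such a multiple regression model with k factorial (explanatory) variables, together with its ordinary least squares estimator, can be written as below. The notation is generic and is not taken from the paper.

```latex
% Multiple linear regression for observations i = 1, ..., n
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i

% Matrix form and the ordinary least squares estimator,
% where X is the n x (k+1) design matrix with a leading column of ones
\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\hat{\boldsymbol{\beta}} = (X^{\top}X)^{-1}X^{\top}\mathbf{y}
```

    Simple linear regression is the special case k = 1, which is why it can miss effects carried by a second or third explanatory variable.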

  17. Combining multiple regression and principal component analysis for accurate predictions for column ozone in Peninsular Malaysia

    Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.

    2013-06-01

    This study encompasses columnar ozone modelling in Peninsular Malaysia. A data set of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2O vapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)], retrieved from NASA's Atmospheric Infrared Sounder (AIRS) for the entire period (2003-2008), was employed to develop models to predict the value of columnar ozone (O3) in the study area. A combined method, based on multiple regression together with principal component analysis (PCA) modelling, was used to predict columnar ozone. This combined approach was utilized to improve the prediction accuracy of columnar ozone. Separate analysis was carried out for the north east monsoon (NEM) and south west monsoon (SWM) seasons. The O3 was negatively correlated with CH4, H2O vapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM season periods. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameters as predictors. A variable selection method based on high loading of varimax rotated principal components was used to acquire subsets of the predictor variables to be included in the linear regression model of the atmospheric parameters. It was found that the increase in columnar O3 value is associated with an increase in the values of AST, SSKT, AT, and CO and with a drop in the levels of CH4, H2O vapour, RH, and MSP. The result of fitting the best models for the columnar O3 value using eight of the independent variables gave about the same values of R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.

  18. A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis

    Taneja, Abhishek

    2011-01-01

    The growing volume of data usually creates an interesting challenge for the need of data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz. factor analysis and multiple linear regression, for different sample sizes on three unique sets of data. The performance of the two data mining techniques is compared on parameters such as mean square error (MSE), R-square, adjusted R-square, condition number, root mean square error (RMSE), number of variables included in the prediction model, modified coefficient of efficiency, F-value, and test of normality. These parameters have been computed using various data mining tools like SPSS, XLstat, Stata, and MS-Excel. It is seen that for all the given datasets, factor analysis outperforms multiple linear re...

  19. On-line contextual influences during reading normal text: a multiple-regression analysis.

    Pynte, Joel; New, Boris; Kennedy, Alan

    2008-09-01

    On-line contextual influences during reading were examined in a series of multiple-regression analyses conducted on a large-scale corpus of eye-movement data, using Latent Semantic Analysis (LSA) to assess the degree of contextual constraints exerted on a given target word by the immediately prior word and by the prior sentence fragment. A decrease in inspection time was observed as contextual constraints increased. Word-level constraints exerted their influence both forward (on both single-fixation and gaze durations) and backward (on gaze duration only). An independent sentence-level effect was only visible in the forward direction, and only for gaze duration. Gaze duration was also sensitive to the depth of embedding of the target word in the syntactic structure. We conclude that both low-level and high-level contextual constraints can translate in the eye-movement record. PMID:18701125

  20. Empirical predictive models of daily relativistic electron flux at geostationary orbit: Multiple regression analysis

    Simms, Laura E.; Engebretson, Mark J.; Pilipenko, Viacheslav; Reeves, Geoffrey D.; Clilverd, Mark

    2016-04-01

    The daily maximum relativistic electron flux at geostationary orbit can be predicted well with a set of daily averaged predictor variables including previous day's flux, seed electron flux, solar wind velocity and number density, AE index, IMF Bz, Dst, and ULF and VLF wave power. As predictor variables are intercorrelated, we used multiple regression analyses to determine which are the most predictive of flux when other variables are controlled. Empirical models produced from regressions of flux on measured predictors from 1 day previous were reasonably effective at predicting novel observations. Adding previous flux to the parameter set improves the prediction of the peak of the increases but delays its anticipation of an event. Previous day's solar wind number density and velocity, AE index, and ULF wave activity are the most significant explanatory variables; however, the AE index, measuring substorm processes, shows a negative correlation with flux when other parameters are controlled. This may be due to the triggering of electromagnetic ion cyclotron waves by substorms that cause electron precipitation. VLF waves show lower, but significant, influence. The combined effect of ULF and VLF waves shows a synergistic interaction, where each increases the influence of the other on flux enhancement. Correlations between observations and predictions for this 1 day lag model ranged from 0.71 to 0.89 (average: 0.78). A path analysis of correlations between predictors suggests that solar wind and IMF parameters affect flux through intermediate processes such as ring current (Dst), AE, and wave activity.
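    A minimal sketch of the 1 day lag setup described above: each day's flux is paired with the predictor values from the previous day, and the pairs are fitted by ordinary least squares. The series, coefficients and function names are synthetic assumptions for illustration and are unrelated to the study's data.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] from the normal equations."""
    X = [[1.0] + list(row) for row in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def lag_one_design(series_by_name, target):
    """Pair target[t] with every predictor's value at t - 1."""
    names = list(series_by_name)
    rows = [[series_by_name[n][t - 1] for n in names] for t in range(1, len(target))]
    return rows, target[1:]


# Synthetic daily series: flux[t] = 0.5 * flux[t-1] + 1.0 * wind[t-1].
wind = [4.0, 4.5, 5.0, 4.2, 6.0, 5.5, 4.8]
flux = [2.0]
for t in range(1, len(wind)):
    flux.append(0.5 * flux[t - 1] + 1.0 * wind[t - 1])

lag_rows, lag_target = lag_one_design({"flux": flux, "wind": wind}, flux)
lag_coef = fit_ols(lag_rows, lag_target)
print(lag_coef)
```

    Including the previous day's flux as a predictor mirrors the persistence term discussed in the abstract; dropping it from `series_by_name` gives the variant driven only by external predictors.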

  1. Thermodynamic analysis of simple gas turbine cycle with multiple regression modelling and optimization

    In this study, thermodynamic and statistical analyses were performed on a gas turbine system, to assess the impact of some important operating parameters like CIT (Compressor Inlet Temperature), PR (Pressure Ratio) and TIT (Turbine Inlet Temperature) on its performance characteristics such as net power output, energy efficiency, exergy efficiency and fuel consumption. Each performance characteristic was enunciated as a function of operating parameters, followed by a parametric study and optimization. The results showed that the performance characteristics increase with an increase in the TIT and a decrease in the CIT, except fuel consumption which behaves oppositely. The net power output and efficiencies increase with the PR up to certain initial values and then start to decrease, whereas the fuel consumption always decreases with an increase in the PR. The results of exergy analysis showed the combustion chamber as a major contributor to the exergy destruction, followed by stack gas. Subsequently, multiple regression models were developed to correlate each of the response variables (performance characteristic) with the predictor variables (operating parameters). The regression model equations showed a significant statistical relationship between the predictor and response variables. (author)

  2. Thermodynamic Analysis of Simple Gas Turbine Cycle with Multiple Regression Modelling and Optimization

    Abdul Ghafoor Memon

    2014-03-01

    In this study, thermodynamic and statistical analyses were performed on a gas turbine system, to assess the impact of some important operating parameters like CIT (Compressor Inlet Temperature), PR (Pressure Ratio) and TIT (Turbine Inlet Temperature) on its performance characteristics such as net power output, energy efficiency, exergy efficiency and fuel consumption. Each performance characteristic was enunciated as a function of operating parameters, followed by a parametric study and optimization. The results showed that the performance characteristics increase with an increase in the TIT and a decrease in the CIT, except fuel consumption which behaves oppositely. The net power output and efficiencies increase with the PR up to certain initial values and then start to decrease, whereas the fuel consumption always decreases with an increase in the PR. The results of exergy analysis showed the combustion chamber as a major contributor to the exergy destruction, followed by stack gas. Subsequently, multiple regression models were developed to correlate each of the response variables (performance characteristic) with the predictor variables (operating parameters). The regression model equations showed a significant statistical relationship between the predictor and response variables.

  3. A COMPARISON OF STEPWISE AND FUZZY MULTIPLE REGRESSION ANALYSIS TECHNIQUES FOR MANAGING SOFTWARE PROJECT RISKS: ANALYSIS PHASE

    Abdelrafe Elzamly

    2014-01-01

    Risk is not always avoidable, but it is controllable. The aim of this study is to identify whether those techniques are effective in reducing software failure. This motivates the authors to continue the effort to enrich the management of software project risks by considering mining and quantitative approaches with a large data set. In this study, two new techniques are introduced, namely stepwise multiple regression analysis and fuzzy multiple regression, to manage the software risks. Two evaluation measures, MMRE and Pred(25), are used to compare the accuracy of the techniques. The model's accuracy is slightly better with stepwise multiple regression than with fuzzy multiple regression. This study will guide software managers to apply software risk management practices with real world software development organizations and verify the effectiveness of the new techniques and approaches on a software project. The study has been conducted on a group of software projects using a survey questionnaire. It is hoped that this will enable software managers to improve their decisions and increase the probability of software project success.
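
    The two accuracy measures named in this abstract have standard definitions: MMRE is the mean of |actual - predicted| / |actual|, and Pred(25) is the fraction of cases whose relative error is at most 25%. A minimal Python sketch follows; the function names and the sample numbers are illustrative assumptions, not data from the study.

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error over paired observations."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def pred(actual, predicted, level=25):
    """Fraction of cases whose relative error is at most `level` percent."""
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(a - p) / abs(a) <= level / 100)
    return hits / len(actual)


# Hypothetical effort values, for illustration only.
actual = [100.0, 200.0, 50.0, 80.0]
predicted = [110.0, 160.0, 55.0, 120.0]
model_mmre = mmre(actual, predicted)
model_pred25 = pred(actual, predicted)
```

    A lower MMRE and a higher Pred(25) both indicate a more accurate model, so two techniques can be compared by computing both measures on the same cases.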

  4. PUMA: a unified framework for penalized multiple regression analysis of GWAS data.

    Gabriel E Hoffman

    Penalized Multiple Regression (PMR) can be used to discover novel disease associations in GWAS datasets. In practice, proposed PMR methods have not been able to identify well-supported associations in GWAS that are undetectable by standard association tests and thus these methods are not widely applied. Here, we present a combined algorithmic and heuristic framework for PUMA (Penalized Unified Multiple-locus Association) analysis that solves the problems of previously proposed methods including computational speed, poor performance on genome-scale simulated data, and identification of too many associations for real data to be biologically plausible. The framework includes a new minorize-maximization (MM) algorithm for generalized linear models (GLM) combined with heuristic model selection and testing methods for identification of robust associations. The PUMA framework implements the penalized maximum likelihood penalties previously proposed for GWAS analysis (i.e. Lasso, Adaptive Lasso, NEG, MCP), as well as a penalty that has not been previously applied to GWAS (i.e. LOG). Using simulations that closely mirror real GWAS data, we show that our framework has high performance and reliably increases power to detect weak associations, while existing PMR methods can perform worse than single marker testing in overall performance. To demonstrate the empirical value of PUMA, we analyzed GWAS data for type 1 diabetes, Crohn's disease, and rheumatoid arthritis, three autoimmune diseases from the original Wellcome Trust Case Control Consortium. Our analysis replicates known associations for these diseases and we discover novel etiologically relevant susceptibility loci that are invisible to standard single marker tests, including six novel associations implicating genes involved in pancreatic function, insulin pathways and immune-cell function in type 1 diabetes; three novel associations implicating genes in pro- and anti-inflammatory pathways in Crohn

  5. Multiple Regressive Model Adaptive Control

    Garipov, Emil; Stoilkov, Teodor; Kalaykov, Ivan

    2008-01-01

    The essence of the ideas applied in this text consists in the development of a strategy for control of a continuous plant of arbitrary complexity by means of a set of discrete time-invariant linear controllers. Their number and tuned parameters correspond to the number and parameters of the linear time-invariant regressive models in the model bank, which approximate the complex plant dynamics at different operating points. The described strategy is known as Multiple Regressive Model Adaptive C...

  6. Using Robust Variance Estimation to Combine Multiple Regression Estimates with Meta-Analysis

    Williams, Ryan

    2013-01-01

    The purpose of this study was to explore the use of robust variance estimation for combining commonly specified multiple regression models and for combining sample-dependent focal slope estimates from diversely specified models. The proposed estimator obviates traditionally required information about the covariance structure of the dependent…

  7. Influence of plant root morphology and tissue composition on phenanthrene uptake: Stepwise multiple linear regression analysis

    Polycyclic aromatic hydrocarbons (PAHs) are contaminants that reside mainly in surface soils. Dietary intake of plant-based foods can make a major contribution to total PAH exposure. Little information is available on the relationship between root morphology and plant uptake of PAHs. An understanding of plant root morphologic and compositional factors that affect root uptake of contaminants is important and can inform both agricultural (chemical contamination of crops) and engineering (phytoremediation) applications. Five crop plant species are grown hydroponically in solutions containing the PAH phenanthrene. Measurements are taken for 1) phenanthrene uptake, 2) root morphology (specific surface area, volume, surface area, tip number and total root length) and 3) root tissue composition (water, lipid, protein and carbohydrate content). These factors are compared through Pearson's correlation and multiple linear regression analysis. The major factors which promote phenanthrene uptake are specific surface area and lipid content. Highlights: There is no correlation between phenanthrene uptake and total root length or water content. Specific surface area and lipid content are the most crucial factors for phenanthrene uptake. The contribution of specific surface area is greater than that of lipid content.

  8. Anomalous particle pinch and scaling of vin/D based on transport analysis and multiple regression

    Becker, G.; Kardaun, O.

    2007-01-01

    Predictions of density profiles in current tokamaks and ITER require a validated scaling relation for v_in/D, where v_in is the anomalous inward drift velocity and D is the anomalous diffusion coefficient. Transport analysis is necessary for determining the anomalous particle pinch from measured density profiles and for separating the impact of particle sources. A set of discharges in ASDEX Upgrade, DIII-D, JET and ASDEX is analysed using a special version of the 1.5-D BALDUR transport code. Profiles of ρ_s v_in/D, with ρ_s the effective separatrix radius, five other dimensionless parameters and many further quantities in the confinement zone are compiled, resulting in the dataset VIND1.dat, which covers a wide parameter range. Weighted multiple regression is applied to the ASDEX Upgrade subset, which leads to a two-term scaling $\rho_s v_{in}(x')/D(x') = 0.0432\,[(L_{T_e}(\bar{x}')/\rho_s)^{-2.58} + 7.13\,U_L^{1.55} \ldots$

  9. Multiple correlation and regression analysis of relation between amplitude and spectral characteristics of proton events and microwave burst parameters

    The results of studying the interconnection of parameters of solar cosmic ray (SCR) events and microwave (μ) bursts, obtained by the methods of multiple statistical analysis, are presented. It is shown using multiple correlation and regression analysis that the main peculiarities of the connection between μ-bursts and SCR events can be understood by accounting for the differences in the dynamics of electrons and protons in flare arcs of different sizes, supposing no SCR particle acceleration in the second flare phase

  10. Investigations upon the indefinite rolls quality assurance in multiple regression analysis

    The quality of rolling-mill rolls has been enhanced mainly through improvements in the chemical composition of the roll materials. Achieving an optimal chemical composition can be a technically efficient way to assure the exploitation properties, since the material from which the rolls are manufactured is of great importance in this respect. This paper continues to present the scientific results of our experimental research in the area of rolling rolls. The basic research contains concrete elements of immediate practical utility for metallurgical enterprises, for improving the quality of rolls, with the ultimate aim of increasing durability and safety in exploitation. This paper presents an analysis of the chemical composition and its influence upon the mechanical properties of indefinite cast iron rolls. We present some mathematical correlations and graphical interpretations between the hardness (on the working surface and on the necks) and the chemical composition. Using double and triple correlations is really helpful in foundry practice, as it allows us to determine variation boundaries for the chemical composition with a view to obtaining optimal values of the hardness. We suggest a mathematical interpretation of the influence of the chemical composition on the hardness of these indefinite rolling rolls. For this purpose we use multiple regression analysis, which can be an important statistical tool for the investigation of relationships between variables. Some of the mathematical modelling results can be described through a number of multi-component equations determined for spaces with 3 and 4 dimensions. Also, the regression surfaces, level curves and volumes of variation can be represented and interpreted by technologists, considering these as correlation diagrams between the analyzed variables. In this sense, these research results can be used in the engineers collectives of the

  11. Linear regression analysis

    Kılıç, Selim

    2013-01-01

    Linear regression is an approach to modeling the association between a numeric dependent variable y and one or more independent variables denoted X. The case of one explanatory variable in a regression model is called simple linear regression. With more than one explanatory variable, the model is called multiple linear regression. The dependent variable should be a numeric variable in linear regression. It is recommended to have at least 10 times as many cases as the number of independent variables...
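
    As an illustration of the multiple linear regression case described here, the coefficients can be obtained by solving the normal equations (X'X) b = X'y. The sketch below uses only the Python standard library; the function name `fit_ols` and the toy data are assumptions made for this example and do not come from the article.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + sum(bj * xj)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k+1) matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Toy data generated exactly from y = 1 + 2*x1 - 3*x2 (hypothetical values).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
coefficients = fit_ols(rows, y)
```

    With one column in `rows` the same function performs simple linear regression; in practice a numerical library's least-squares routine would normally be preferred for stability.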

  12. Physical and Cognitive-Affective Factors Associated with Fatigue in Individuals with Fibromyalgia: A Multiple Regression Analysis

    Muller, Veronica; Brooks, Jessica; Tu, Wei-Mo; Moser, Erin; Lo, Chu-Ling; Chan, Fong

    2015-01-01

    Purpose: The main objective of this study was to determine the extent to which physical and cognitive-affective factors are associated with fibromyalgia (FM) fatigue. Method: A quantitative descriptive design using correlation techniques and multiple regression analysis. The participants consisted of 302 members of the National Fibromyalgia &…

  13. Multiple linear regression analysis of bacterial deposition to polyurethane coatings after conditioning film formation in the marine environment

    Bakker, D.P.; Busscher, H.J.; Zanten, J. van; Vries, J. de; Klijnstra, J.W.; Mei, H.C. van der

    2004-01-01

    Many studies have shown relationships of substratum hydrophobicity, charge or roughness with bacterial adhesion, although bacterial adhesion is governed by interplay of different physico-chemical properties and multiple regression analysis would be more suitable to reveal mechanisms of bacterial adh

  14. Multiple linear regression analysis of bacterial deposition to polyurethane coating after conditioning film formation in the marine environment

    Bakker, Dewi P; Busscher, Henk J; van Zanten, Joyce; de Vries, Jacob; Klijnstra, Job W; van der Mei, Henny C

    2004-01-01

    Many studies have shown relationships of substratum hydrophobicity, charge or roughness with bacterial adhesion, although bacterial adhesion is governed by interplay of different physico-chemical properties and multiple regression analysis would be more suitable to reveal mechanisms of bacterial adh

  15. Error analysis of dimensionless scaling experiments with multiple points using linear regression

    A general method of error estimation in the case of multiple point dimensionless scaling experiments, using linear regression and standard error propagation, is proposed. The method reduces to the previous result of Cordey (2009 Nucl. Fusion 49 052001) in the case of a two-point scan. On the other hand, if the points follow a linear trend, it explains how the estimated error decreases as more points are added to the scan. Based on the analytical expression that is derived, it is argued that for a low number of points, adding points to the ends of the scanned range, rather than the middle, results in a smaller error estimate. (letter)
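
    The dependence on where the points are placed can be illustrated with the textbook standard error of a fitted slope, sigma / sqrt(sum((x - mean(x))^2)). This is a sketch under the assumption of independent, equal measurement errors sigma in y; it is not the analytical expression derived in the letter, and the scan positions below are hypothetical.

```python
import math


def slope_std_error(xs, sigma):
    """Standard error of the least-squares slope for equal y-errors sigma."""
    mean_x = sum(xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sigma / math.sqrt(sxx)


# A two-point scan, then two extra points placed at the ends or in the middle.
two_point = slope_std_error([1.0, 3.0], sigma=0.1)
ends_added = slope_std_error([1.0, 1.0, 3.0, 3.0], sigma=0.1)
middle_added = slope_std_error([1.0, 2.0, 2.0, 3.0], sigma=0.1)
```

    Under these assumptions, points added at the ends of the range increase the spread of x and shrink the slope error, while points added exactly at the centre leave it unchanged.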

  16. Regression analysis by example

    Chatterjee, Samprit

    2012-01-01

    Praise for the Fourth Edition: ""This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable."" -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded

  17. Understanding logistic regression analysis

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratios in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...
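
    The link between a fitted logistic coefficient and an odds ratio can be sketched as follows: exponentiating the coefficient of a variable gives the odds ratio for a one-unit increase in that variable. For brevity this example fits a single binary predictor by Newton-Raphson, whereas the article concerns several explanatory variables; the function name and the counts are assumptions chosen for illustration.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical 2x2 table: exposed 30 events / 20 non-events,
# unexposed 10 events / 40 non-events.
xs = [1] * 50 + [0] * 50
ys = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40
b0, b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)
```

    With a single binary predictor the result matches the cross-product ratio of the table, (30 × 40) / (20 × 10); with several predictors each exponentiated coefficient is an odds ratio adjusted for the others.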

  18. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    Ani Shabri; Ruhaidah Samsudin

    2014-01-01

    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scales. Then, the principal component analysis (PCA) is used in processing...

  19. A calibration method of Argo floats based on multiple regression analysis

    2006-01-01

    Argo floats are free-moving floats that report vertical profiles of salinity, temperature and pressure at regular time intervals. These floats give good measurements of temperature and pressure, but salinity measurements may show significant sensor drift with time. It is found that sensor drift with time is not purely linear as presupposed by Wong (2003). A new method is developed to calibrate conductivity data measured by Argo floats. In this method, Wong's objective analysis method was adopted to estimate the background climatological salinity field on potential temperature surfaces from nearby historical data in WOD01. Furthermore, temperature and time factors are taken into account, and stepwise regression was used for a time-varying or temperature-varying slope in potential conductivity space to correct the drift in these profiling float salinity data. The results show that salinity errors using this method are smaller than those of Wong's method, and that quantitative and qualitative analysis of the conductivity sensor can be carried out with our method.

  20. Multiple regression analysis in modelling of carbon dioxide emissions by energy consumption use in Malaysia

    Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat

    2015-04-01

    Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue is considered a great international concern and is primarily attributed to different fossil fuels. In this paper, a regression model is used for analyzing the causal relationship among CO2 emissions based on the energy consumption in Malaysia using time series data for the period of 1980-2010. The equations were developed using a regression model based on the eight major sources that contribute to the CO2 emissions such as non energy, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil and motor petrol. The related data were partly used to fit the regression model (1980-2000) and partly used to validate it (2001-2010). The results of the prediction model with the measured data showed a high correlation (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning of the population to comply with air quality standards.

  1. Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway

    2014-01-01

    Background. MicroRNA (miRNA) is a short and endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor for tumorigenesis of colorectal cancer (CRC), and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopt...

  2. Quantifying components of the hydrologic cycle in Virginia using chemical hydrograph separation and multiple regression analysis

    Sanford, Ward E.; Nelms, David L.; Pope, Jason P.; Selnick, David L.

    2012-01-01

    This study by the U.S. Geological Survey, prepared in cooperation with the Virginia Department of Environmental Quality, quantifies the components of the hydrologic cycle across the Commonwealth of Virginia. Long-term, mean fluxes were calculated for precipitation, surface runoff, infiltration, total evapotranspiration (ET), riparian ET, recharge, base flow (or groundwater discharge) and net total outflow. Fluxes of these components were first estimated on a number of real-time-gaged watersheds across Virginia. Specific conductance was used to distinguish and separate surface runoff from base flow. Specific-conductance data were collected every 15 minutes at 75 real-time gages for approximately 18 months between March 2007 and August 2008. Precipitation was estimated for 1971–2000 using PRISM climate data. Precipitation and temperature from the PRISM data were used to develop a regression-based relation to estimate total ET. The proportion of watershed precipitation that becomes surface runoff was related to physiographic province and rock type in a runoff regression equation. Component flux estimates from the watersheds were transferred to flux estimates for counties and independent cities using the ET and runoff regression equations. Only 48 of the 75 watersheds yielded sufficient data, and data from these 48 were used in the final runoff regression equation. The base-flow proportion for the 48 watersheds averaged 72 percent using specific conductance, a value that was substantially higher than the 61 percent average calculated using a graphical-separation technique (the USGS program PART). Final results for the study are presented as component flux estimates for all counties and independent cities in Virginia.

  3. Data of multiple regressions analysis between selected biomarkers related to glutamate excitotoxicity and oxidative stress in Saudi autistic patients.

    El-Ansary, Afaf

    2016-06-01

    This work demonstrates data of multiple regression analysis between nine biomarkers related to glutamate excitotoxicity and impaired detoxification as two mechanisms recently recorded as autism phenotypes. The presented data was obtained by measuring a panel of markers in 20 autistic patients aged 3-15 years and 20 age- and gender-matched healthy controls. Levels of GSH, glutathione status (GSH/GSSG), glutathione reductase (GR), glutathione-s-transferase (GST), thioredoxin (Trx), thioredoxin reductase (TrxR) and peroxidoxins (Prxs I and III), glutamate, glutamine, glutamate/glutamine ratio, glutamate dehydrogenase (GDH) in plasma and mercury (Hg) in red blood cells were determined in both groups. In multiple regression analysis, R² values, which describe the proportion or percentage of variance in the dependent variable attributed to the variance in the independent variables together, were calculated. Moreover, β coefficient values, which show the direction (either positive or negative) and the contribution of the independent variable relative to the other independent variables in explaining the variation of the dependent variable, were determined. A panel of inter-related markers was recorded. This paper contains data related to and supporting research articles currently published entitled "Mechanism of nitrogen metabolism-related parameters and enzyme activities in the pathophysiology of autism" [1], "Novel metabolic biomarkers related to sulfur-dependent detoxification pathways in autistic patients of Saudi Arabia" [2], and "A key role for an impaired detoxification mechanism in the etiology and severity of autism spectrum disorders" [3]. PMID:26933667

  4. Violence against Chinese female sex workers from their stable partners: a hierarchical multiple regression analysis.

    Zhang, Chen; Li, Xiaoming; Su, Shaobing; Hong, Yan; Zhou, Yuejiao; Tang, Zhenzhu; Shen, Zhiyong

    2015-01-01

    Limited data are available regarding risk factors that are related to intimate partner violence (IPV) against female sex workers (FSWs) in the context of stable partnerships. Out of the 1,022 FSWs, 743 reported ever having a stable partnership and 430 (more than half) of those reported experiencing IPV. Hierarchical multivariate regression revealed that some characteristics of stable partners (e.g., low education, alcohol use) and relationship stressors (e.g., frequent friction, concurrent partnerships) were independently predictive of IPV against FSWs. Public health professionals who design future violence prevention interventions targeting FSWs need to consider the influence of their stable partners. PMID:24730642

  5. Correlation Weights in Multiple Regression

    Waller, Niels G.; Jones, Jeff A.

    2010-01-01

    A general theory on the use of correlation weights in linear prediction has yet to be proposed. In this paper we take initial steps in developing such a theory by describing the conditions under which correlation weights perform well in population regression models. Using OLS weights as a comparison, we define cases in which the two weighting…

  6. Interpretation of Regressions with Multiple Proxies

    Darren Lubotsky; Martin Wittenberg

    2001-01-01

    We consider the situation in which there are multiple proxies for one unobserved explanatory variable in a linear regression and provide a procedure by which the coefficient of interest can be extracted "post hoc" from a multiple regression in which all the proxies are used simultaneously. This post hoc estimator is strictly superior in large samples to coefficients derived using any index or linear combination of the proxies that is created prior to the regression. To use an index created fr...

  7. Bayesian logistic regression analysis

    Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.

    2012-01-01

    In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuisance parameters, the Jacobian transformation is an

  8. Comparison of a neural network with multiple linear regression for quantitative analysis in ICP-atomic emission spectroscopy

    A two layer perceptron with backpropagation of error is used for quantitative analysis in ICP-AES. The network was trained by emission spectra of two interfering lines of Cd and As and the concentrations of both elements were subsequently estimated from mixture spectra. The spectra of the Cd and As lines were also used to perform multiple linear regression (MLR) via the calculation of the pseudoinverse S+ of the sensitivity matrix S. In the present paper it is shown that there exist close relations between the operation of the perceptron and the MLR procedure. These are most clearly apparent in the correlation between the weights of the backpropagation network and the elements of the pseudoinverse. Using MLR, the confidence intervals over the predictions are exploited to correct for the wavelength shift of the optical device. (orig.)
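
    The MLR step via the pseudoinverse can be sketched for two overlapping lines: if the columns of S are the pure-element spectra and m is a mixture spectrum, the concentration estimate is c = S+ m, with S+ = (S'S)^-1 S' when the columns are linearly independent. The spectra below are made-up numbers and the function name is an assumption; treating the columns of S as unit-concentration spectra is also an assumption of this sketch.

```python
def estimate_concentrations(line_a, line_b, mixture):
    """Least-squares solution of mixture = c_a*line_a + c_b*line_b."""
    # Entries of S'S and S'm, where S has the two pure spectra as columns.
    aa = sum(a * a for a in line_a)
    ab = sum(a * b for a, b in zip(line_a, line_b))
    bb = sum(b * b for b in line_b)
    am = sum(a * m for a, m in zip(line_a, mixture))
    bm = sum(b * m for b, m in zip(line_b, mixture))
    det = aa * bb - ab * ab
    # c = (S'S)^-1 S'm, i.e. the pseudoinverse S+ applied to the mixture.
    return ((bb * am - ab * bm) / det, (aa * bm - ab * am) / det)


# Hypothetical unit-concentration spectra of two interfering lines.
cd_line = [0.0, 1.0, 3.0, 1.0, 0.0]
as_line = [0.0, 0.0, 1.0, 3.0, 1.0]
mixture = [2.0 * c + 0.5 * a for c, a in zip(cd_line, as_line)]
cd_conc, as_conc = estimate_concentrations(cd_line, as_line, mixture)
```

    Because the estimate is linear in the measured spectrum, each row of S+ acts like a fixed set of weights applied to the spectrum.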

  9. A multiple linear regression analysis of hot corrosion attack on a series of nickel base turbine alloys

    Barrett, C. A.

    1985-01-01

    Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni base cast turbine alloys. The U transform (i.e., 1/sin (% A/100) to the 1/2) was shown to give the best estimate of the dependent variable, y. A complete second degree equation is described for the "centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition, linear terms for the minor elements C, B, and Zr were added for a basic 47 term equation. The best reduced equation was determined by the stepwise selection method with essentially 13 terms. The Cr term was found to be the most important, accounting for 60 percent of the explained variability in hot corrosion attack.

  10. Comparing Effects of Biologic Agents in Treating Patients with Rheumatoid Arthritis: A Multiple Treatment Comparison Regression Analysis.

    Ingunn Fride Tvete

    Rheumatoid arthritis patients have been treated with disease modifying anti-rheumatic drugs (DMARDs) and the newer biologic drugs. We sought to compare and rank the biologics with respect to efficacy. We performed a literature search identifying 54 publications encompassing 9 biologics. We conducted a multiple treatment comparison regression analysis letting the number experiencing a 50% improvement on the ACR score be dependent upon dose level and disease duration for assessing the comparable relative effect between biologics and placebo or DMARD. The analysis embraced all treatment and comparator arms over all publications. Hence, all measured effects of any biologic agent contributed to the comparison of all biologic agents relative to each other either given alone or combined with DMARD. We found the drug effect to be dependent on dose level, but not on disease duration, and the impact of a high versus low dose level was the same for all drugs (higher doses indicated a higher frequency of ACR50 scores). The ranking of the drugs when given without DMARD was certolizumab (ranked highest), etanercept, tocilizumab/abatacept and adalimumab. The ranking of the drugs when given with DMARD was certolizumab (ranked highest), tocilizumab, anakinra/rituximab, golimumab/infliximab/abatacept, adalimumab/etanercept [corrected]. Still, all drugs were effective. All biologic agents were effective compared to placebo, with certolizumab the most effective and adalimumab (without DMARD treatment) and adalimumab/etanercept (combined with DMARD treatment) the least effective. The drugs were in general more effective, except for etanercept, when given together with DMARDs.

  11. Multiple Linear Regression Models in Outlier Detection

    S.M.A.Khaleelur Rahman

    2012-02-01

    Identifying anomalous values in a real-world database is important both for improving the quality of the original data and for reducing the impact of anomalous values in the process of knowledge discovery in databases. Such anomalous values give useful information to the data analyst in discovering useful patterns. Through isolation, these data may be separated and analyzed. The analysis of outliers and influential points is an important step of regression diagnostics. In this paper, our aim is to detect the points which are very different from the other points. They do not seem to belong to a particular population and behave differently. If these influential points are removed, a different model results. The distinction between these points is not always obvious and clear. Hence several indicators are used for identifying and analyzing outliers. Existing methods of outlier detection are based on manual inspection of graphically represented data. In this paper, we present a new approach to automating the process of detecting and isolating outliers. The impact of anomalous values on the dataset has been established by using two indicators, DFFITS and Cook's D. The process is based on modeling the human perception of exceptional values by using multiple linear regression analysis.
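
    The two indicators named in this abstract can be computed from the residuals and leverages of a fitted regression. The sketch below uses the standard formulas for a single-predictor fit to keep it short (the paper works with multiple linear regression); the function name and data are assumptions, and the cut-offs mentioned afterwards are common rules of thumb rather than the authors' thresholds.

```python
import math


def influence_measures(xs, ys):
    """Return (cooks_d, dffits) lists for the fit y = b0 + b1*x."""
    n, p = len(xs), 2  # p = number of fitted coefficients
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b0 = mean_y - b1 * mean_x
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in resid)
    s2 = sse / (n - p)  # residual variance of the full fit
    cooks, dffits = [], []
    for x, e in zip(xs, resid):
        h = 1.0 / n + (x - mean_x) ** 2 / sxx  # leverage of this point
        cooks.append(e * e * h / (p * s2 * (1.0 - h) ** 2))
        # Residual variance with this point deleted, then the
        # externally studentized residual.
        s2_del = (sse - e * e / (1.0 - h)) / (n - p - 1)
        t = e / math.sqrt(s2_del * (1.0 - h))
        dffits.append(t * math.sqrt(h / (1.0 - h)))
    return cooks, dffits


# Hypothetical data close to y = 2x + 1, with an anomalous last value.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 15.1, 30.0]
cooks_d, dffits = influence_measures(xs, ys)
most_influential = max(range(len(xs)), key=lambda i: cooks_d[i])
```

    Commonly quoted rules of thumb flag a point when Cook's D exceeds 1 or when |DFFITS| exceeds 2·sqrt(p/n); here the last observation is flagged by both.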

  12. Multiple Imputations for Linear Regression Models

    Brownstone, David

    1991-01-01

    Rubin (1987) has proposed multiple imputations as a general method for estimation in the presence of missing data. Rubin’s results only strictly apply to Bayesian models, but Schenker and Welsh (1988) directly prove the consistency of multiple imputations inference when there are missing values of the dependent variable in linear regression models. This paper extends and modifies Schenker and Welsh’s theorems to give conditions where multiple imputations yield consistent inferences for bo...

  13. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    Ani Shabri

    2014-01-01

    Full Text Available Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. The time series prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.

  14. Relationships between each part of the spinal curves and upright posture using Multiple stepwise linear regression analysis.

    Boulet, Sebastien; Boudot, Elsa; Houel, Nicolas

    2016-05-01

    Back pain is a common reason for consultation in primary healthcare clinical practice, and has effects on daily activities and posture. Relationships between the whole spine and upright posture, however, remain unknown. The aim of this study was to identify the relationship between each spinal curve and centre of pressure position as well as velocity for healthy subjects. Twenty-one male subjects performed quiet stance in natural position. Each upright posture was then recorded using an optoelectronics system (Vicon Nexus) synchronized with two force plates. At each moment, polynomial interpolations of markers attached on the spine segment were used to compute cervical lordosis, thoracic kyphosis and lumbar lordosis angle curves. Mean of centre of pressure position and velocity was then computed. Multiple stepwise linear regression analysis showed that the position and velocity of centre of pressure associated with each part of the spinal curves were defined as best predictors of the lumbar lordosis angle (R² = 0.45; p = 1.65 × 10⁻¹⁰) and the thoracic kyphosis angle (R² = 0.54; p = 4.89 × 10⁻¹³) of healthy subjects in quiet stance. This study showed the relationships between each of cervical, thoracic, lumbar curvatures, and centre of pressure's fluctuation during free quiet standing using non-invasive full spinal curve exploration. PMID:26970888

  15. Computing multiple-output regression quantile regions

    Paindaveine, D.; Šiman, Miroslav

    2012-01-01

    Vol. 56, No. 4 (2012), pp. 840-853. ISSN 0167-9473 R&D Projects: GA MŠk(CZ) 1M06047 Institutional research plan: CEZ:AV0Z10750506 Keywords: halfspace depth * multiple-output regression * parametric linear programming * quantile regression Subject RIV: BA - General Mathematics Impact factor: 1.304, year: 2012 http://library.utia.cas.cz/separaty/2012/SI/siman-0376413.pdf

  16. The Geometry of Enhancement in Multiple Regression

    Waller, Niels G.

    2011-01-01

    In linear multiple regression, "enhancement" is said to occur when R² = b′r > r′r, where b is a p x 1 vector of standardized regression coefficients and r is a p x 1 vector of correlations between a criterion y and a set of standardized regressors, x. When p = 1 then b ≅ r and enhancement cannot…
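
    The inequality in this abstract can be checked numerically from a predictor correlation matrix and a vector of predictor-criterion correlations, using the standard identity b = Rxx⁻¹r for standardized coefficients. The sketch below is illustrative only: the function name `enhancement` and the correlation values are assumptions, chosen as a classic suppressor-style configuration.

```python
import numpy as np

def enhancement(Rxx, r):
    """Return (R2, r'r, enhanced) for predictor correlations Rxx and criterion correlations r."""
    b = np.linalg.solve(Rxx, r)   # standardized regression coefficients
    r2 = float(b @ r)             # R^2 = b'r
    sum_sq = float(r @ r)         # r'r, the sum of squared zero-order correlations
    return r2, sum_sq, r2 > sum_sq

# Hypothetical correlations: x2 is uncorrelated with y but correlated with x1
Rxx = np.array([[1.0, 0.5],
                [0.5, 1.0]])
r = np.array([0.5, 0.0])

r2, sum_sq, enhanced = enhancement(Rxx, r)
print(r2, sum_sq, enhanced)
```

    With uncorrelated predictors Rxx is the identity, so b equals r and R² equals r′r; enhancement requires correlation among the regressors.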

  17. Health Expenditures in Greece: A Multiple Least Squares Regression and Cointegration Analysis Using Bootstrap Simulation in EVIEWS

    Giovanis, Eleftherios

    2009-01-01

    This paper examines the factors that contribute most to explaining health expenditures in Greece. Two methods are applied: multiple regressions and vector error correction models are estimated, and unit root tests are applied to determine the order at which the variables are stationary. Because the available data are yearly and cover only the short period 1985-2006, the sample is small, so a bootstrap simulation is applied to improve the estimates.

  18. Ca analysis: An Excel based program for the analysis of intracellular calcium transients including multiple, simultaneous regression analysis

    Greensmith, David J.

    2014-01-01

    Here I present an Excel based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of...

  19. Multiple Linear Regression Analysis of Factors Affecting Real Property Price Index From Case Study Research In Istanbul/Turkey

    Denli, H. H.; Koc, Z.

    2015-12-01

    Estimation of real properties depending on standards is difficult to apply in time and location. Regression analysis constructs mathematical models which describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. When regression analysis is applied to real estate valuation, with properties presented in the real market with their current characteristics and quantifiers, the method helps to find the effective factors or variables in the formation of the value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with its characteristics to find a price index, based on information received from a real estate web page. The associated variables used for the analysis are age, size in m², number of floors of the building, floor number of the property and number of rooms. The price of the property represents the dependent variable, whereas the rest are independent variables. Prices of 60 properties have been used for the analysis. Locations of equal price have been found and plotted on the map, and equivalence curves have been drawn identifying the zones of equal value as lines.

  20. Spectroscopic determination of leaf biochemistry using band-depth analysis of absorption features and stepwise multiple linear regression

    Kokaly, R.F.; Clark, R.N.

    1999-01-01

    We develop a new method for estimating the biochemistry of plant material using spectroscopy. Normalized band depths calculated from the continuum-removed reflectance spectra of dried and ground leaves were used to estimate their concentrations of nitrogen, lignin, and cellulose. Stepwise multiple linear regression was used to select wavelengths in the broad absorption features centered at 1.73 μm, 2.10 μm, and 2.30 μm that were highly correlated with the chemistry of samples from eastern U.S. forests. Band depths of absorption features at these wavelengths were found to also be highly correlated with the chemistry of four other sites. A subset of data from the eastern U.S. forest sites was used to derive linear equations that were applied to the remaining data to successfully estimate their nitrogen, lignin, and cellulose concentrations. Correlations were highest for nitrogen (R² from 0.75 to 0.94). The consistent results indicate the possibility of establishing a single equation capable of estimating the chemical concentrations in a wide variety of species from the reflectance spectra of dried leaves. The extension of this method to remote sensing was investigated. The effects of leaf water content, sensor signal-to-noise and bandpass, atmospheric effects, and background soil exposure were examined. Leaf water was found to be the greatest challenge to extending this empirical method to the analysis of fresh whole leaves and complete vegetation canopies. The influence of leaf water on reflectance spectra must be removed to within 10%. Other effects were reduced by continuum removal and normalization of band depths. If the effects of leaf water can be compensated for, it might be possible to extend this method to remote sensing data acquired by imaging spectrometers to give estimates of nitrogen, lignin, and cellulose concentrations over large areas for use in ecosystem studies.

  1. Gene-based multiple regression association testing for combined examination of common and low frequency variants in quantitative trait analysis

    Yoo, Yun Joo; Sun, Lei; Shelley B Bull

    2013-01-01

    Multi-marker methods for genetic association analysis can be performed for common and low frequency SNPs to improve power. Regression models are an intuitive way to formulate multi-marker tests. In previous studies we evaluated regression-based multi-marker tests for common SNPs, and through identification of bins consisting of correlated SNPs, developed a multi-bin linear combination (MLC) test that is a compromise between a 1 df linear combination test and a multi-df global test. Bins of SN...

  2. Prediction of the processing factor for pesticides in apple juice by principal component analysis and multiple linear regression.

    Martin, L; Mezcua, M; Ferrer, C; Gil Garcia, M D; Malato, O; Fernandez-Alba, A R

    2013-01-01

    The main objective of this work was to establish a mathematical function that correlates pesticide residue levels in apple juice with the levels of the pesticides applied on the raw fruit, taking into account some of their physicochemical properties such as water solubility, the octanol/water partition coefficient, the organic carbon partition coefficient, vapour pressure and density. A mixture of 12 pesticides was applied to an apple tree; apples were collected after 10 days of application. After harvest, apples were treated with a mixture of three post-harvest pesticides and the fruits were then processed in order to obtain apple juice following a routine industrial process. The pesticide residue levels in the apple samples were analysed using two multi-residue methods based on LC-MS/MS and GC-MS/MS. The concentration of pesticides was determined in samples derived from the different steps of processing. The processing factors (the coefficient between residue level in the processed commodity and the residue level in the commodity to be processed) obtained for the full juicing process were found to vary among the different pesticides studied. In order to investigate the relationships between the levels of pesticide residue found in apple juice samples and their physicochemical properties, principal component analysis (PCA) was performed using two sets of samples (one of them using experimental data obtained in this work and the other including the data taken from the literature). In both cases the correlation was found between processing factors of pesticides in the apple juice and the negative logarithms (base 10) of the water solubility, octanol/water partition coefficient and organic carbon partition coefficient. The linear correlation between these physicochemical properties and the processing factor were established using a multiple linear regression technique. PMID:23281800

  3. Regression analysis in quantum language

    ISHIKAWA, Shiro

    2014-01-01

    Although regression analysis has a great history, we consider that it has always continued being confused. For example, the fundamental terms in regression analysis (e.g., "regression", "least-squares method", "explanatory variable", "response variable", etc.) seem to be historically conventional, that is, these words do not express the essence of regression analysis. Recently, we proposed quantum language (or, classical and quantum measurement theory), which is characterized as the linguisti...

  4. On directional multiple-output quantile regression

    Paindaveine, D.; Šiman, Miroslav

    2011-01-01

    Vol. 102, No. 2 (2011), pp. 193-212. ISSN 0047-259X R&D Projects: GA MŠk(CZ) 1M06047 Other grants: Commision EC(BE) Fonds National de la Recherche Scientifique Institutional research plan: CEZ:AV0Z10750506 Keywords: multivariate quantile * quantile regression * multiple-output regression * halfspace depth * portfolio optimization * value-at risk Subject RIV: BA - General Mathematics Impact factor: 0.879, year: 2011 http://library.utia.cas.cz/separaty/2011/SI/siman-0364128.pdf

  5. Regression Analysis A Constructive Critique

    Berk, Richard A

    2003-01-01

    Regression Analysis: A Constructive Critique identifies a wide variety of problems with regression analysis as it is commonly used and then provides a number of ways in which practice could be improved. Regression is most useful for data reduction, leading to relatively simple but rich and precise descriptions of patterns in a data set. The emphasis on description provides readers with an insightful rethinking from the ground up of what regression analysis can do, so that readers can better match regression analysis with useful empirical questions and improved policy-related research. "An

  6. Hierarchical regression for analyses of multiple outcomes.

    Richardson, David B; Hamra, Ghassan B; MacLehose, Richard F; Cole, Stephen R; Chu, Haitao

    2015-09-01

    In cohort mortality studies, there often is interest in associations between an exposure of primary interest and mortality due to a range of different causes. A standard approach to such analyses involves fitting a separate regression model for each type of outcome. However, the statistical precision of some estimated associations may be poor because of sparse data. In this paper, we describe a hierarchical regression model for estimation of parameters describing outcome-specific relative rate functions and associated credible intervals. The proposed model uses background stratification to provide flexible control for the outcome-specific associations of potential confounders, and it employs a hierarchical "shrinkage" approach to stabilize estimates of an exposure's associations with mortality due to different causes of death. The approach is illustrated in analyses of cancer mortality in 2 cohorts: a cohort of dioxin-exposed US chemical workers and a cohort of radiation-exposed Japanese atomic bomb survivors. Compared with standard regression estimates of associations, hierarchical regression yielded estimates with improved precision that tended to have less extreme values. The hierarchical regression approach also allowed the fitting of models with effect-measure modification. The proposed hierarchical approach can yield estimates of association that are more precise than conventional estimates when one wishes to estimate associations with multiple outcomes. PMID:26232395

  7. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.

  8. Multiple Regression Analyses in Clinical Child and Adolescent Psychology

    Jaccard, James; Guilamo-Ramos, Vincent; Johansson, Margaret; Bouris, Alida

    2006-01-01

    A major form of data analysis in clinical child and adolescent psychology is multiple regression. This article reviews issues in the application of such methods in light of the research designs typical of this field. Issues addressed include controlling covariates, evaluation of predictor relevance, comparing predictors, analysis of moderation,…

  9. A Study of Personality and Family- and School Environment and Possible Interactional Effects in 244 Swedish Children—A Multiple Regression Analysis

    Persson, Bertil

    2014-01-01

    The aim of the study was to examine relationships between psychosocial family- and school environment and personality as assessed by the Junior Eysenck Personality Questionnaire (EPQ-J) and possible personality interactional effects. The study was based on 244 Swedish girls and boys, 10-19 years old, who filled in the Family- and School Psychosocial Environment (FSPE) questionnaire and the EPQ-J. A multiple regression analysis showed that the FSPE-factor Family conflicts and school discipline...

  10. Multiple linear regression for isotopic measurements

    Garcia Alonso, J. I.

    2012-04-01

    There are two typical applications of isotopic measurements: the detection of natural variations in isotopic systems and the detection of man-made variations using enriched isotopes as indicators. For both types of measurements, accurate and precise isotope ratio measurements are required. For the so-called non-traditional stable isotopes, multicollector ICP-MS instruments are usually applied. In many cases, chemical separation procedures are required before accurate isotope measurements can be performed. The off-line separation of Rb and Sr or Nd and Sm is the classical procedure employed to eliminate isobaric interferences before multicollector ICP-MS measurement of Sr and Nd isotope ratios. This procedure also allows matrix separation so that precise and accurate Sr and Nd isotope ratios can be obtained. In our laboratory we have evaluated the separation of Rb-Sr and Nd-Sm isobars by liquid chromatography and on-line multicollector ICP-MS detection. The combination of this chromatographic procedure with multiple linear regression of the raw chromatographic data resulted in Sr and Nd isotope ratios with precisions and accuracies typical of off-line sample preparation procedures. On the other hand, methods for the labelling of individual organisms (such as a given plant, fish or animal) are required for population studies. We have developed a dual isotope labelling procedure which can be unique for a given individual, can be inherited in living organisms and is stable. The detection of the isotopic signature is also based on multiple linear regression. The labelling of fish and its detection in otoliths by Laser Ablation ICP-MS will be discussed using trout and salmon as examples. In conclusion, isotope measurement procedures based on multiple linear regression can be a viable alternative in multicollector ICP-MS measurements.

  11. Multiple Retrieval Models and Regression Models for Prior Art Search

    Lopez, Patrice; Romary, Laurent

    2009-01-01

    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression m...

  12. Improved spatial regression analysis of diffusion tensor imaging for lesion detection during longitudinal progression of multiple sclerosis in individual subjects

    Liu, Bilan; Qiu, Xing; Zhu, Tong; Tian, Wei; Hu, Rui; Ekholm, Sven; Schifitto, Giovanni; Zhong, Jianhui

    2016-03-01

    Subject-specific longitudinal DTI study is vital for investigation of pathological changes of lesions and disease evolution. Spatial Regression Analysis of Diffusion tensor imaging (SPREAD) is a non-parametric permutation-based statistical framework that combines spatial regression and resampling techniques to achieve effective detection of localized longitudinal diffusion changes within the whole brain at individual level without a priori hypotheses. However, boundary blurring and dislocation limit its sensitivity, especially towards detecting lesions of irregular shapes. In the present study, we propose an improved SPREAD (dubbed improved SPREAD, or iSPREAD) method by incorporating a three-dimensional (3D) nonlinear anisotropic diffusion filtering method, which provides edge-preserving image smoothing through a nonlinear scale space approach. The statistical inference based on iSPREAD was evaluated and compared with the original SPREAD method using both simulated and in vivo human brain data. Results demonstrated that the sensitivity and accuracy of the SPREAD method has been improved substantially by adapting nonlinear anisotropic filtering. iSPREAD identifies subject-specific longitudinal changes in the brain with improved sensitivity, accuracy, and enhanced statistical power, especially when the spatial correlation is heterogeneous among neighboring image pixels in DTI.

  13. A comparative study of multiple regression analysis and back propagation neural network approaches on plain carbon steel in submerged-arc welding

    ABHIJIT SARKAR; PRASENJIT DEY; R N RAI; SUBHAS CHANDRA SAHA

    2016-05-01

    Weld bead plays an important role in determining the quality of welding particularly in high heat input processes. This research paper presents the development of multiple regression analysis (MRA) and artificial neural network (ANN) models to predict weld bead geometry and HAZ width in submerged arc welding process. Design of experiments is based on Taguchi’s L16 orthogonal array by varying wire feed rate, transverse speed and stick out to develop a multiple regression model, which has been checked for adequacy and significance. Also, ANN model was accomplished with the back propagation approach in MATLAB program to predict bead geometry and HAZ width. Finally, the results of two prediction models were compared and analyzed. It is found that the error related to the prediction of bead geometry and HAZ width is smaller in ANN than MRA.

  14. Improving Regional Dynamic Downscaling with Multiple Linear Regression Model Using Components Principal Analysis: Precipitation over Amazon and Northeast Brazil

    Aline Gomes da Silva

    2014-01-01

    Full Text Available In the current context of climate change discussions, predictions of future scenarios of weather and climate are crucial for the generation of information of interest to the global community. Due to the atmosphere being a chaotic system, errors in predictions of future scenarios are systematically observed. Therefore, numerous techniques have been tested in order to generate more reliable predictions, and two techniques have excelled in science: dynamic downscaling, through regional models, and ensemble prediction, combining different outputs of climate models through the arithmetic average, in other words, a postprocessing of the output data. Thus, this paper proposes a method of postprocessing outputs of regional climate models. This method consists in using the statistical tool multiple linear regression by principal components for combining different simulations obtained by dynamic downscaling with the regional climate model (RegCM4). Tests for the Amazon and Northeast region of Brazil (South America) showed that the method provided a more realistic prediction in terms of average daily rainfall for the analyzed period, compared with the ensemble prediction made through the arithmetic average of the simulations. The method captured the extreme events (outliers) that the prediction by averaging missed. Data from the Tropical Rainfall Measuring Mission (TRMM) were used to evaluate the method.

  15. A comparison on parameter-estimation methods in multiple regression analysis with existence of multicollinearity among independent variables

    Hukharnsusatrue, A.

    2005-11-01

    Full Text Available The objective of this research is to compare methods for estimating multiple regression coefficients in the presence of multicollinearity among independent variables. The estimation methods are the Ordinary Least Squares method (OLS), Restricted Least Squares method (RLS), Restricted Ridge Regression method (RRR) and Restricted Liu method (RL), when restrictions are true and when restrictions are not true. The study used the Monte Carlo Simulation method. The experiment was repeated 1,000 times under each situation. The analyzed results of the data are demonstrated as follows. CASE 1: The restrictions are true. In all cases, RRR and RL methods have a smaller Average Mean Square Error (AMSE) than OLS and RLS method, respectively. RRR method provides the smallest AMSE when the level of correlations is high and also provides the smallest AMSE for all levels of correlations and all sample sizes when standard deviation is equal to 5. However, RL method provides the smallest AMSE when the level of correlations is low and medium, except in the case of standard deviation equal to 3 and small sample sizes, where RRR method provides the smallest AMSE. The AMSE varies with, most to least, respectively, level of correlations, standard deviation and number of independent variables but inversely with sample size. CASE 2: The restrictions are not true. In all cases, RRR method provides the smallest AMSE, except in the case of standard deviation equal to 1 and error of restrictions equal to 5%, where OLS method provides the smallest AMSE when the level of correlations is low or medium and there is a large sample size, but for small sample sizes, RL method provides the smallest AMSE. In addition, when error of restrictions is increased, OLS method provides the smallest AMSE for all levels of correlations and all sample sizes, except when the level of correlations is high and sample sizes small. Moreover, the case OLS method provides the smallest AMSE, the most RLS method has a smaller AMSE than
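
    The kind of Monte Carlo comparison described in this abstract can be sketched in a simplified form. The example below compares only OLS with ordinary (unrestricted) ridge regression, which is a stand-in for, not a reproduction of, the restricted estimators (RLS, RRR, RL) studied in the paper. The function name `simulate_amse`, the true coefficients, the ridge constant k and the sample size are all assumptions; only the 1,000 repetitions and a standard deviation of 5 echo settings mentioned above.

```python
import numpy as np

def simulate_amse(rho, n=20, sigma=5.0, k=1.0, reps=1000, seed=0):
    """Average squared coefficient error of OLS and ridge over repeated samples."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 1.0])                   # hypothetical true coefficients
    chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    se_ols = se_ridge = 0.0
    for _ in range(reps):
        X = rng.standard_normal((n, 2)) @ chol.T  # predictors with correlation rho
        y = X @ beta + sigma * rng.standard_normal(n)
        XtX, Xty = X.T @ X, X.T @ y
        b_ols = np.linalg.solve(XtX, Xty)
        b_ridge = np.linalg.solve(XtX + k * np.eye(2), Xty)
        se_ols += np.sum((b_ols - beta) ** 2)
        se_ridge += np.sum((b_ridge - beta) ** 2)
    return se_ols / reps, se_ridge / reps

amse_ols, amse_ridge = simulate_amse(rho=0.99)
print(amse_ols, amse_ridge)
```

    With highly correlated predictors, X′X is nearly singular and the OLS variance inflates, while the ridge term k stabilizes the solution at the cost of a small bias; this trade-off is what the AMSE criterion measures.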

  16. Interpretation of Standardized Regression Coefficients in Multiple Regression.

    Thayer, Jerome D.

    The extent to which standardized regression coefficients (beta values) can be used to determine the importance of a variable in an equation was explored. The beta value and the part correlation coefficient--also called the semi-partial correlation coefficient and reported in squared form as the incremental "r squared"--were compared for variables…

  17. Regression Computer Programs for Setwise Regression and Three Related Analysis of Variance Techniques.

    Williams, John D.; Lindem, Alfred C.

    Four computer programs using the general purpose multiple linear regression program have been developed. Setwise regression analysis is a stepwise procedure for sets of variables; there will be as many steps as there are sets. Covarmlt allows a solution to the analysis of covariance design with multiple covariates. A third program has three…

  18. Estimating the input function non-invasively for FDG-PET quantification with multiple linear regression analysis: simulation and verification with in vivo data

    A novel statistical method, namely Regression-Estimated Input Function (REIF), is proposed in this study for the purpose of non-invasive estimation of the input function for fluorine-18 2-fluoro-2-deoxy-d-glucose positron emission tomography (FDG-PET) quantitative analysis. We collected 44 patients who had undergone a blood sampling procedure during their FDG-PET scans. First, we generated tissue time-activity curves of the grey matter and the whole brain with a segmentation technique for every subject. Summations of different intervals of these two curves were used as a feature vector, which also included the net injection dose. Multiple linear regression analysis was then applied to find the correlation between the input function and the feature vector. After a simulation study with in vivo data, the data of 29 patients were applied to calculate the regression coefficients, which were then used to estimate the input functions of the other 15 subjects. Comparing the estimated input functions with the corresponding real input functions, the averaged error percentages of the area under the curve and the cerebral metabolic rate of glucose (CMRGlc) were 12.13±8.85 and 16.60±9.61, respectively. Regression analysis of the CMRGlc values derived from the real and estimated input functions revealed a high correlation (r=0.91). No significant difference was found between the real CMRGlc and that derived from our regression-estimated input function (Student's t test, P>0.05). The proposed REIF method demonstrated good abilities for input function and CMRGlc estimation, and represents a reliable replacement for the blood sampling procedures in FDG-PET quantification. (orig.)

  19. Regression, Discriminant Analysis, and Canonical Correlation Analysis with Homals

    Jan de Leeuw

    2009-01-01

    It is shown that the homals package in R can be used for multiple regression, multi-group discriminant analysis, and canonical correlation analysis. The homals solutions are only different from the more conventional ones in the way the dimensions are scaled by the eigenvalues.

  20. Entrepreneurial intention modeling using hierarchical multiple regression

    Marina Jeger

    2014-12-01

    Full Text Available The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
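
    The dependence on order of entry noted in this abstract can be illustrated with a small numerical sketch: predictors are entered in blocks and the increment in R² is recorded at each step. The function names `r_squared` and `hierarchical_r2`, the synthetic variables `a` and `b`, and their correlation are assumptions for illustration, not the study's data or model.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def hierarchical_r2(blocks, y):
    """Enter predictor blocks in order; return a list of (R^2, delta R^2) per step."""
    results, entered, prev = [], [], 0.0
    for block in blocks:
        entered.append(block)
        r2 = r_squared(np.column_stack(entered), y)
        results.append((r2, r2 - prev))
        prev = r2
    return results

# Hypothetical correlated predictors a and b
rng = np.random.default_rng(1)
a = rng.standard_normal(200)
b = 0.8 * a + 0.6 * rng.standard_normal(200)
y = a + b + rng.standard_normal(200)

a_first = hierarchical_r2([a, b], y)  # b entered after a
b_first = hierarchical_r2([b, a], y)  # b entered before a
print(a_first, b_first)
```

    The final R² is the same for both orders, but the share credited to `b` is much smaller when it is entered after the correlated predictor `a`, which is why the order of entry should follow from theory rather than convenience.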

  1. Heteroscedastic regression analysis method for mixed data

    FU Hui-min; YUE Xiao-rui

    2011-01-01

    The heteroscedastic regression model was established and the heteroscedastic regression analysis method was presented for mixed data composed of complete data, type-I censored data and type-II censored data from the location-scale distribution. The best unbiased estimations of regression coefficients, as well as the confidence limits of the location parameter and scale parameter, were given. Furthermore, the point estimations and confidence limits of percentiles were obtained. Thus, the traditional multiple regression analysis method, which is suitable only for complete data from the normal distribution, can be extended to the cases of heteroscedastic mixed data and the location-scale distribution. So the presented method has a broad range of promising applications.

  2. Pricing Single Malt Whisky : A Regression Analysis

    Bjartmar Hylta, Sanna; Lundquist, Emma

    2016-01-01

    This thesis examines the factors that affect the price of whisky. Multiple regression analysis is used to model the relationship between the identified covariates that are believed to impact the price of whisky. The optimal marketing strategy for whisky producers in the regions Islay and Campbeltown is discussed. This analysis is based on the Marketing Mix. Furthermore, a Porter’s five forces analysis, focusing on the regions Campbeltown and Islay, is examined. Finally the findings are summar...

  3. Multiple Retrieval Models and Regression Models for Prior Art Search

    Lopez, Patrice

    2009-01-01

    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend.

  4. Dimension Reduction of the Explanatory Variables in Multiple Linear Regression

    Filzmoser, P.; Croux, Christophe

    2003-01-01

    Abstract: In classical multiple linear regression analysis problems will occur if the regressors are either multicollinear or if the number of regressors is larger than the number of observations. In this note a new method is introduced which constructs orthogonal predictor variables in a way to have a maximal correlation with the dependent variable. The predictor variables are linear combinations of the original regressors. This method allows a major reduction of the number of predictors ...

  5. Shrinkage Estimation and Selection for Multiple Functional Regression

    LIAN, HENG

    2011-01-01

    Functional linear regression is a useful extension of simple linear regression and has been investigated by many researchers. However, the functional variable selection problem when multiple functional observations exist, which is the counterpart in the functional context of multiple linear regression, is seldom studied. Here we propose a method using group smoothly clipped absolute deviation penalty (gSCAD) which can perform regression estimation and variable selection simultaneously. We show t...

  6. Multiple Linear Regression Model Used in Economic Analyses

    Constantin ANGHELACHE; Madalina Gabriela ANGHEL; Ligia PRODAN; Cristina SACALA; Marius POPOVICI

    2014-01-01

    Multiple regression is a tool for analyzing the correlations between more than two variables, a situation that accounts for most cases in macro-economic studies. The best-known estimation method for multiple regression is the method of least squares. As in two-variable regression, we choose the sample regression function and minimize the sum of squared residuals. Another method that allows us to take into account the number of variables factor when d...
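
    As a minimal sketch of the least-squares idea mentioned above (choose the coefficients that minimize the sum of squared residuals), the following pure-Python example solves the normal equations for a small made-up data set. The function names and the data are assumptions for illustration, not material from the article.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Return [intercept, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def residual_sum_of_squares(rows, y, beta):
    """Sum of squared differences between observed and fitted values."""
    total = 0.0
    for row, yi in zip(rows, y):
        fitted = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
        total += (yi - fitted) ** 2
    return total


# Made-up data: y is built exactly as 1 + 2*x1 + 3*x2 (no noise), so the
# fit should recover those three coefficients up to rounding error.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

beta = fit_least_squares(rows, y)
print("coefficients:", [round(b, 6) for b in beta])
print("residual sum of squares:", residual_sum_of_squares(rows, y, beta))
```

    Solving the normal equations directly is the simplest way to show the principle; for ill-conditioned or large design matrices a QR- or SVD-based solver from a numerical library would normally be preferred.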

  7. Study relationship between inorganic and organic coal analysis with gross calorific value by multiple regression and ANFIS

    Chelgani, S.C.; Hart, B.; Grady, W.C.; Hower, J.C.

    2011-01-01

    The relationship between maceral content plus mineral matter and gross calorific value (GCV) for a wide range of West Virginia coal samples (from 6518 to 15330 BTU/lb; 15.16 to 35.66 MJ/kg) has been investigated by multivariable regression and adaptive neuro-fuzzy inference system (ANFIS). The stepwise least square mathematical method comparison between liptinite, vitrinite, plus mineral matter as input data sets with measured GCV reported a nonlinear correlation coefficient (R²) of 0.83. Using the same data set the correlation between the predicted GCV from the ANFIS model and the actual GCV reported an R² value of 0.96. It was determined that the GCV-based prediction methods, as used in this article, can provide a reasonable estimation of GCV. Copyright © Taylor & Francis Group, LLC.

  8. Fuzzy multiple linear regression: A computational approach

    Juang, C. H.; Huang, X. H.; Fleming, J. W.

    1992-01-01

    This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.

  9. Credit Scoring Problem Based on Regression Analysis

    Khassawneh, Bashar Suhil Jad Allah

    2014-01-01

    ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....

  10. Quantifying TiO2 Abundance of Lunar Soils: Partial Least Squares and Stepwise Multiple Regression Analysis for Determining Causal Effect

    Lin Li

    2011-01-01

    Partial least squares (PLS) regression was applied to the Lunar Soil Characterization Consortium (LSCC) dataset for spectral estimation of TiO2. The LSCC dataset was split into a number of subsets including the low-Ti, high-Ti, total mare soils, total highland, Apollo 16, and Apollo 14 soils to investigate the effects of interfering minerals and nonlinearity on the PLS performance. The PLS weight loading vectors were analyzed through stepwise multiple regression analysis (SMRA) to identify mineral species driving and interfering with the PLS performance. PLS exhibits high performance for estimating TiO2 for the LSCC low-Ti and high-Ti mare samples and both groups analyzed together. The results suggest that while the dominant TiO2-bearing minerals are few, additional PLS factors are required to compensate for the effects on the important PLS factors of minerals that are not highly correlated to TiO2, to accommodate nonlinear relationships between reflectance and TiO2, and to correct inconsistent mineral-TiO2 correlations between the high-Ti and low-Ti mare samples. Analysis of the LSCC highland soil samples indicates that the Apollo 16 soils are responsible for the large errors of TiO2 estimates when the soils are modeled with other subgroups. For the LSCC Apollo 16 samples, the dominant spectral effects of plagioclase over other dark minerals are primarily responsible for large errors of estimated TiO2. For the Apollo 14 soils, more accurate estimation for TiO2 is attributed to the positive correlation between a major TiO2-bearing component and TiO2, explaining why the Apollo 14 soils follow the regression trend when analyzed with other soils groups.

  11. A combined multiple regression-time series approach to process capability analysis when data are auto correlated

    The problem of performing process capability analysis when auto correlations are present is discussed. It is shown that when the systematic nonrandom phenomenon induced by autocorrelation is ignored the variance estimate obtained from the original data is no longer an appropriate estimate for use in the process capability analyses. A remedial measure based on an autoregressive integrated moving average model is proposed. It is also shown that the process variance estimated from the residual analysis yields appropriate results for the process capability indices

  12. Exploring the equity of GP practice prescribing rates for selected coronary heart disease drugs: a multiple regression analysis with proxies of healthcare need

    St Leger Antony S

    2005-02-01

    Background: There is a small, but growing body of literature highlighting inequities in GP practice prescribing rates for many drug therapies. The aim of this paper is to further explore the equity of prescribing for five major CHD drug groups and to quantify the amount of variation in GP practice prescribing rates that can be explained by a range of healthcare needs indicators (HCNIs). Methods: The study involved a cross-sectional secondary analysis in four primary care trusts (PCTs 1–4) in the North West of England, including 132 GP practices. Prescribing rates (average daily quantities per registered patient aged over 35 years) and HCNIs were developed for all GP practices. Analysis was undertaken using multiple linear regression. Results: Between 22–25% of the variation in prescribing rates for statins, beta-blockers and bendrofluazide was explained in the multiple regression models. Slightly more variation was explained for ACE inhibitors (31.6%) and considerably more for aspirin (51.2%). Prescribing rates were positively associated with CHD hospital diagnoses and procedures for all drug groups other than ACE inhibitors. The proportion of patients aged 55–74 years was positively related to all prescribing rates other than aspirin, where they were positively related to the proportion of patients aged >75 years. However, prescribing rates for statins and ACE inhibitors were negatively associated with the proportion of patients aged >75 years in addition to the proportion of patients from minority ethnic groups. Prescribing rates for aspirin, bendrofluazide and all CHD drugs combined were negatively associated with deprivation. Conclusion: Although around 25–50% of the variation in prescribing rates was explained by HCNIs, this varied markedly between PCTs and drug groups. Prescribing rates were generally characterised by both positive and negative associations with HCNIs, suggesting possible inequities in prescribing rates on the basis

  13. Elevated-temperature, strain-controlled fatigue data on Type 304 stainless steel. A compilation, multiple linear regression model, and statistical analysis

    Diercks, D R; Raske, D T

    1976-12-01

    The available elevated-temperature, strain-controlled, uniaxial fatigue data on Type 304 stainless steel (474 data points) are tabulated, and variables that influence cyclic life are divided into first- and second-order categories. The first-order variables, which include strain range, strain rate, temperature, and hold time, were used in a multiple linear regression analysis to describe the observed variation in fatigue life for zero and tension hold-time data. Goodness of fit, with respect to these variables, as well as the appropriateness of the transformations used are discussed. Prediction intervals are estimated, and comparisons between the regression equation curves and the data from which they were obtained are made. The second-order variables include the laboratories at which the data were generated, the different heats from which the test specimens were fabricated, and the heat treatments that preceded testing. These variables were statistically analyzed to determine their effect on fatigue life. The results are discussed, and the heats and heat treatments that are most resistant to fatigue damage under these loading and environmental conditions are identified.

  14. An Additive-Multiplicative Cox-Aalen Regression Model

    Scheike, Thomas H.; Zhang, Mei-Jie

    2002-01-01

    Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects

  15. Prognostic factors in patients with cervix cancer treated by radiation therapy: results of a multiple regression analysis

    A retrospective analysis of 965 patients with invasive cervix cancer treated by radiation therapy between 1976 and 1981 was performed in order to evaluate prognostic factors for disease-free survival (DFS) and pelvic control. FIGO stage was the most powerful prognostic factor followed by radiation dose and treatment duration (P values = 0.0001). If the analysis was limited to patients treated with radical doses of 75 Gy or more, dose was no longer significant. Young age at diagnosis, non-squamous histology and transfusion during treatment were also adverse prognostic factors for survival and control. Para-aortic nodal involvement on lymphogram was associated with a reduction in DFS (P = 0.0027), whereas pelvic lymph node involvement alone was not. In patients with Stage I and IIA disease, tumour size was the most powerful prognostic factor for survival (P = 0.0001) and the extent of pelvic sidewall involvement was significant in patients with Stage III tumours (P = 0.007). Histological grade appeared to be a predictive factor but was only recorded in 712 patients. These features should be considered in the staging of patients and in the design of clinical trials

  16. Retail sales forecasting with application the multiple regression

    Kuzhda, Tetyana

    2012-05-01

    The article begins with a formulation for predictive learning called the multiple regression model. A theoretical approach to the construction of regression models is described. The key information of the article is the mathematical formulation of the linear forecast equation that estimates the multiple regression model. Calculation of the quantitative value of the dependent variable forecast under the influence of independent variables is explained. This paper presents retail sales forecasting with multiple model estimation. Information obtained by multiple regression supports some of the most important decisions a retailer can make. Recently, the retail environment has been changing as a result of expected consumer income and advertising costs. Checks of the model's goodness of fit and statistical significance are explored in the article. Finally, the quantitative value of the retail sales forecast based on the multiple regression model is calculated.

  17. The use of multiple linear regression in property valuation

    Marko Pejić

    2013-05-01

    Property appraisal is of great importance for a country and its economy. Nowadays, a successful land management system cannot be imagined without a subsystem related to the market economy. Having information about land and its value offers broad possibilities for the market economy and strongly influences development of the real estate market. Special attention should be paid to mass appraisal methods and their use in developing the tax system and a framework for an appropriate property appraisal system. Multiple regression analysis is just one of the methods used for this purpose, and this article focuses on its characteristics and advantages in mass appraisal system development.

  18. Non-destructive evaluation of chlorophyll content in quinoa and amaranth leaves by simple and multiple regression analysis of RGB image components.

    Riccardi, M; Mele, G; Pulvento, C; Lavini, A; d'Andria, R; Jacobsen, S-E

    2014-06-01

    Leaf chlorophyll content provides valuable information about physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is expensive, laborious, and time consuming. Over the years alternative methods, rapid and non-destructive, have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimation of chlorophyll content in quinoa and amaranth leaves based on RGB components analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll provided by standard laboratory procedure. Single and multiple regression models using RGB color components as independent variables have been tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method. Sensitivity of the best regression models for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quick, more effective, and lower cost than SPAD. The proposed RGB models provided better correlation (highest R²) and prediction (lowest RMSEP) of the true value of foliar chlorophyll content and had a lower amount of noise in the whole range of chlorophyll studied compared with SPAD and other leaf image processing based models when applied to quinoa and amaranth. PMID:24442792

  19. Regression analysis with categorized regression calibrated exposure: some interesting findings

    Hjartåker Anette

    2006-07-01

    Background: Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods: We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC). Results: In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis. Conclusion: Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a

  20. Fuzzy Multiple Regression Model for Estimating Software Development Time

    Venus Marza

    2009-10-01

    As software becomes more complex and its scope dramatically increases, the importance of research on methods for estimating software development time has steadily increased, so accurate estimation is the main goal of software managers for reducing project risks. The purpose of this article is to introduce a new Fuzzy Multiple Regression approach, which has higher accuracy than other estimation methods. Furthermore, we compare the Fuzzy Multiple Regression model with the Fuzzy Logic model and the Multiple Regression model based on their accuracy.

  1. A multiple covariance approach to PLS regression with several predictor groups: Structural Equation Exploratory Regression

    Bry, Xavier; Verron, Thomas; Cazes, Pierre

    2008-01-01

    A variable group Y is assumed to depend upon R thematic variable groups X_1, ..., X_R. We assume that components in Y depend linearly upon components in the X_r's. In this work, we propose a multiple covariance criterion which extends that of PLS regression to this multiple predictor groups situation. On this criterion, we build a PLS-type exploratory method - Structural Equation Exploratory Regression (SEER) - that allows to simultaneously perform dimension reduction in groups and investiga...

  2. Synthesis analysis of regression models with a continuous outcome

    Zhou, Xiao-Hua; Hu, Nan; Hu, Guizhou; Root, Martin

    2009-01-01

    To estimate the multivariate regression model from multiple individual studies, it would be challenging to obtain results if the input from individual studies only provide univariate or incomplete multivariate regression information. Samsa et al. (J. Biomed. Biotechnol. 2005; 2:113–123) proposed a simple method to combine coefficients from univariate linear regression models into a multivariate linear regression model, a method known as synthesis analysis. However, the validity of this method...

  3. REPRESENTATIVE VARIABLES IN A MULTIPLE REGRESSION MODEL

    Barbu Bogdan POPESCU

    2013-02-01

    There are presented econometric models developed for analysis of banking exclusion of the economic crisis. Access to public goods and services is a condition „sine qua non” for open and efficient society. Availability of banking and payment of the entire population without discrimination in our opinion should be the primary objective of public service policy.

  4. REPRESENTATIVE VARIABLES IN A MULTIPLE REGRESSION MODEL

    Barbu Bogdan POPESCU; Lavinia Stefania TOTAN

    2013-01-01

    There are presented econometric models developed for analysis of banking exclusion of the economic crisis. Access to public goods and services is a condition „sine qua non” for open and efficient society. Availability of banking and payment of the entire population without discrimination in our opinion should be the primary objective of public service policy.

  5. ERP correlates of word production predictors in picture naming: a trial by trial multiple regression analysis from stimulus onset to response

    Valente, Andrea; Bürki, Audrey; Laganaro, Marina

    2014-01-01

    A major effort in cognitive neuroscience of language is to define the temporal and spatial characteristics of the core cognitive processes involved in word production. One approach consists in studying the effects of linguistic and pre-linguistic variables in picture naming tasks. So far, studies have analyzed event-related potentials (ERPs) during word production by examining one or two variables with factorial designs. Here we extended this approach by investigating simultaneously the effects of multiple theoretical relevant predictors in a picture naming task. High density EEG was recorded on 31 participants during overt naming of 100 pictures. ERPs were extracted on a trial by trial basis from picture onset to 100 ms before the onset of articulation. Mixed-effects regression models were conducted to examine which variables affected production latencies and the duration of periods of stable electrophysiological patterns (topographic maps). Results revealed an effect of a pre-linguistic variable, visual complexity, on an early period of stable electric field at scalp, from 140 to 180 ms after picture presentation, a result consistent with the proposal that this time period is associated with visual object recognition processes. Three other variables, word Age of Acquisition, Name Agreement, and Image Agreement influenced response latencies and modulated ERPs from ~380 ms to the end of the analyzed period. These results demonstrate that a topographic analysis fitted into the single trial ERPs and covering the entire processing period allows one to associate the cost generated by psycholinguistic variables to the duration of specific stable electrophysiological processes and to pinpoint the precise time-course of multiple word production predictors at once. PMID:25538546

  6. Multiple regression analyses in the prediction of aerospace instrument costs

    Tran, Linh

    The aerospace industry has been investing for decades in ways to improve its efficiency in estimating the project life cycle cost (LCC). One of the major focuses in the LCC is the cost/prediction of aerospace instruments done during the early conceptual design phase of the project. The accuracy of early cost predictions affects the project scheduling and funding, and it is often the major cause for project cost overruns. The prediction of instruments' cost is based on the statistical analysis of these independent variables: Mass (kg), Power (watts), Instrument Type, Technology Readiness Level (TRL), Destination: earth orbiting or planetary, Data rates (kbps), Number of bands, Number of channels, Design life (months), and Development duration (months). This author is proposing a cost prediction approach of aerospace instruments based on these statistical analyses: Clustering Analysis, Principal Components Analysis (PCA), Bootstrap, and multiple regressions (both linear and non-linear). In the proposed approach, the Cost Estimating Relationship (CER) will be developed for the dependent variable Instrument Cost by using a combination of multiple independent variables. "The Full Model" will be developed and executed to estimate the full set of nine variables. The SAS program, Excel, Automated Cost Estimating Integrated Tools (ACEIT) and Minitab are the tools to aid the analysis. Through the analysis, the cost drivers will be identified which will help develop an ultimate cost estimating software tool for the Instrument Cost prediction and optimization of future missions.

  7. Gaussian process regression analysis for functional data

    Shi, Jian Qing

    2011-01-01

    Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables. Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime

  8. Multiple kernel support vector regression for pricing nifty option

    Neetu Verma

    2015-09-01

    The goal of the present experiments is to investigate the use of multiple kernel learning as a tool for pricing options in the context of the Indian stock market for Nifty index options. In this paper, the fair price of an option is predicted by Multiple Kernel Support Vector Regression (MKLSVR) using linear combinations of kernels and by Single Kernel Support Vector Regression (SKSVR). Option prices depend strongly on different money market conditions such as deep-in-the-money, in-the-money, at-the-money, out-of-money and deep-out-of-money. The experimental study attempts to identify the forecasting errors with the help of mean square error, root mean square error, and normalized root mean square error between the market option prices and the option prices calculated by the model for all market conditions. The results reflect that multiple kernel support vector regression performed fairly well in comparison to support vector regression with a single kernel.

  9. Sample Sizes when Using Multiple Linear Regression for Prediction

    Knofczynski, Gregory T.; Mundfrom, Daniel

    2008-01-01

    When using multiple regression for prediction purposes, the issue of minimum required sample size often needs to be addressed. Using a Monte Carlo simulation, models with varying numbers of independent variables were examined and minimum sample sizes were determined for multiple scenarios at each number of independent variables. The scenarios…

  10. Vehicle Travel Time Predication based on Multiple Kernel Regression

    Wenjing Xu

    2014-07-01

    With the rapid development of transportation and the logistics economy, vehicle travel time prediction and planning have become an important topic in logistics. Travel time prediction, which is indispensable for traffic guidance, has become a key issue for researchers in this field. At present, the prediction of travel time is mainly short-term prediction, and the prediction methods include artificial neural networks, the Kalman filter and the support vector regression (SVR) method. However, these algorithms still have some shortcomings, such as high computation complexity and slow convergence rate. This paper exploits the learning ability of multiple kernel learning regression (MKLR) in nonlinear prediction and applies MKLR-based logistics planning to vehicle travel time prediction. The method for vehicle travel time prediction includes the following steps: (1) preprocessing historical data; (2) selecting an appropriate kernel function, training on the historical data and performing analysis; (3) predicting the vehicle travel time based on the trained model. The experimental results show that, in a comparison of different prediction methods, the vehicle travel time prediction method proposed in this paper achieves higher accuracy than the other methods. They also illustrate the feasibility and effectiveness of the proposed prediction method.

  11. On relationship between regression models and interpretation of multiple regression coefficients

    A N Varaksin; Panov, V. G.

    2012-01-01

    In this paper, we consider the problem of treating linear regression equation coefficients in the case of correlated predictors. It is shown that in general there are no natural ways of interpreting these coefficients similar to the case of single predictor. Nevertheless we suggest linear transformations of predictors, reducing multiple regression to a simple one and retaining the coefficient at variable of interest. The new variable can be treated as the part of the old variable that has no ...
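
    One standard way to reduce a multiple regression to a simple one while retaining the coefficient of the variable of interest is residualization (the Frisch-Waugh-Lovell result). The sketch below demonstrates that result on synthetic data; it is an assumption that this matches the spirit of the transformation the authors propose, since the abstract is truncated, and all names and data are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients [intercept, b1, ..., bk] for the given predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic correlated predictors (an assumption for illustration).
rng = random.Random(1)
n = 100
x2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.7 * b + rng.gauss(0, 1) for b in x2]
y = [2.0 + 1.5 * a - 0.5 * b + 0.3 * rng.gauss(0, 1) for a, b in zip(x1, x2)]

# Full multiple regression of y on x1 and x2.
beta_full = ols([x1, x2], y)

# Step 1: remove from x1 the part that is linearly explained by x2.
gamma = ols([x2], x1)
x1_resid = [a - (gamma[0] + gamma[1] * b) for a, b in zip(x1, x2)]

# Step 2: simple regression of y on the residualized x1.
beta_simple = ols([x1_resid], y)

print("x1 coefficient, multiple regression:", round(beta_full[1], 6))
print("x1 coefficient, simple regression on residualized x1:", round(beta_simple[1], 6))
```

    The two printed slopes agree up to rounding, which supports reading a multiple regression coefficient as the effect of the part of a predictor that is not linearly explained by the other predictors.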

  12. Steganalysis of LSB Image Steganography using Multiple Regression and Auto Regressive (AR) Model

    Souvik Bhattacharyya

    2011-07-01

    The staggering growth in communication technology and usage of public domain channels (i.e. the Internet) has greatly facilitated transfer of data. However, such open communication channels have greater vulnerability to security threats causing unauthorized information access. Traditionally, encryption is used to realize communication security. However, important information is not protected once decoded. Steganography is the art and science of communicating in a way which hides the existence of the communication. Important information is first hidden in host data, such as a digital image, text, video or audio, and then transmitted secretly to the receiver. Steganalysis is another important topic in information hiding, which is the art of detecting the presence of steganography. In this paper a novel technique for the steganalysis of images is presented. The proposed technique uses an auto-regressive model to detect the presence of the hidden messages, as well as to estimate the relative length of the embedded messages. Various auto-regressive parameters are used to classify cover images as well as stego images with the help of an SVM classifier. Multiple regression analysis of the cover carrier along with the stego carrier has been carried out in order to find out the existence of the negligible amount of the secret message. Experimental results demonstrate the effectiveness and accuracy of the proposed technique.

  13. Teasing out the effect of tutorials via multiple regression

    Chasteen, Stephanie V.

    2012-02-01

    We transformed an upper-division physics course using a variety of elements, including homework help sessions, tutorials, clicker questions with peer instruction, and explicit learning goals. Overall, the course transformations improved student learning, as measured by our conceptual assessment. Since these transformations were multi-faceted, we would like to understand the impact of individual course elements. Attendance at tutorials and homework help sessions was optional, and occurred outside the class environment. In order to identify the impact of these optional out-of-class sessions, given self-selection effects in student attendance, we performed a multiple regression analysis. Even when background variables are taken into account, tutorial attendance is positively correlated with student conceptual understanding of the material - though not with performance on course exams. Other elements that increase student time-on-task, such as homework help sessions and lectures, do not achieve the same impacts.

  14. Regression and regression analysis time series prediction modeling on climate data of Quetta, Pakistan

    Various statistical techniques were used on five years of data (1998-2002) on average humidity, rainfall, and maximum and minimum temperatures. Relationships based on regression analysis time series (RATS) were developed to determine the overall trend of these climate parameters, on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit for our polynomial regression analysis time series (PRATS). Correlations based on multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed to decipher the interdependence of weather parameters. Spearman's rank correlation and the Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, and indicated homoscedasticity in both. We also employed Bartlett's test for homogeneity of variances on the five-year rainfall and humidity data, which showed that the variances were not homogeneous for rainfall but were homogeneous for humidity. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)

  15. Simultaneous confidence bands in linear regression analysis

    Ah-Kine, Pascal Soon Shien

    2010-01-01

    A simultaneous confidence band provides useful information on the plausible range of an unknown regression model. For a simple linear regression model, the most frequently quoted bands in the statistical literature include the two-segment band, the three-segment band and the hyperbolic band, and for a multiple linear regression model, the most common bands in the statistical literature include the hyperbolic band and the constant width band. The optimality criteria for confid...

  16. Computing multiple-output regression quantile regions from projection quantiles

    Paindaveine, D.; Šiman, Miroslav

    2012-01-01

    Vol. 27, No. 1 (2012), pp. 29-49. ISSN 0943-4062 R&D Projects: GA MŠk(CZ) 1M06047 Institutional research plan: CEZ:AV0Z10750506 Keywords: directional quantile * halfspace depth * multiple-output regression * parametric programming * quantile regression Subject RIV: BA - General Mathematics Impact factor: 0.482, year: 2012 http://library.utia.cas.cz/separaty/2012/SI/siman-0376414.pdf

  17. Multivariate quantiles and multiple-output regression quantiles

    Hallin, Marc; Paindaveine, Davy; Siman, Miroslav

    2009-01-01

    A new multivariate concept of quantile, based on a directional version of Koenker and Bassett's traditional regression quantiles, is introduced for multivariate location and multiple-output regression problems. In their empirical version, those quantiles can be computed efficiently via linear programming techniques. Consistency, Bahadur representation and asymptotic normality results are established. Most importantly, the contours generated by those quantiles are shown to coincide with the cl...

  18. Local bilinear multiple-output quantile/depth regression

    Hallin, Marc; Lu, Zudi; Paindaveine, Davy; Šiman, Miroslav

    2015-01-01

    A new quantile regression concept, based on a directional version of Koenker and Bassett’s traditional single-output one, has been introduced in [ Ann. Statist. (2010) 38 635–669] for multiple-output location/linear regression problems. The polyhedral contours provided by the empirical counterpart of that concept, however, cannot adapt to unknown nonlinear and/or heteroskedastic dependencies. This paper therefore introduces local constant and local linear (actually, bilinear) versions o...

  19. Local Constant and Local Bilinear Multiple-Output Quantile Regression

    Hallin, Marc; Lu, Zudi; Paindaveine, Davy; Siman, Miroslav

    2012-01-01

    A new quantile regression concept, based on a directional version of Koenker and Bassett’s traditional single-output one, has been introduced in [Hallin, Paindaveine and Šiman, Annals of Statistics 2010, 635-703] for multiple-output regression problems. The polyhedral contours provided by the empirical counterpart of that concept, however, cannot adapt to nonlinear and/or heteroskedastic dependencies. This paper therefore introduces local constant and local linear versio...

  20. Elliptical multiple-output quantile regression and convex optimization

    Hallin, M.; Šiman, Miroslav

    2016-01-01

    Vol. 109, No. 1 (2016), pp. 232-237. ISSN 0167-7152 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords: quantile regression * elliptical quantile * multivariate quantile * multiple-output regression Subject RIV: BA - General Mathematics Impact factor: 0.595, year: 2014 http://library.utia.cas.cz/separaty/2016/SI/siman-0458243.pdf

  1. Analysis of Inflation in Turkey via Ridge Regression

    Duygu Tunali; Emel Şıklar

    2015-01-01

    The aim of this study is to analyze inflation in Turkey between the years 2003-2014 and to compare inflation in that period with inflation in the years 1963-1983. When multiple linear regression modeling is used for inflation analysis, a multicollinearity problem occurs between the independent variables. To eliminate this problem, ridge regression, one of the biased estimation methods, is used in this study. The ridge regression method gives smaller mean...

  2. A field operational test on valve-regulated lead-acid absorbent-glass-mat batteries in micro-hybrid electric vehicles. Part II. Results based on multiple regression analysis and tear-down analysis

    Schaeck, S.; Karspeck, T.; Ott, C.; Weirather-Koestner, D.; Stoermer, A. O.

    2011-03-01

    In the first part of this work [1] a field operational test (FOT) on micro-HEVs (hybrid electric vehicles) and conventional vehicles was introduced. Valve-regulated lead-acid (VRLA) batteries in absorbent glass mat (AGM) technology and flooded batteries were applied. The FOT data were analyzed by kernel density estimation. In this publication multiple regression analysis is applied to the same data. Square regression models without interdependencies are used. Hereby, capacity loss serves as dependent parameter and several battery-related and vehicle-related parameters as independent variables. Battery temperature is found to be the most critical parameter. It is proven that flooded batteries operated in the conventional power system (CPS) degrade faster than VRLA-AGM batteries in the micro-hybrid power system (MHPS). A smaller number of FOT batteries were applied in a vehicle-assigned test design where the test battery is repeatedly mounted in a unique test vehicle. Thus, vehicle category and specific driving profiles can be taken into account in multiple regression. Both parameters have only secondary influence on battery degradation, instead, extended vehicle rest time linked to low mileage performance is more serious. A tear-down analysis was accomplished for selected VRLA-AGM batteries operated in the MHPS. Clear indications are found that pSoC-operation with periodically fully charging the battery (refresh charging) does not result in sulphation of the negative electrode. Instead, the batteries show corrosion of the positive grids and weak adhesion of the positive active mass.

  3. Functional linear regression via canonical analysis

    He, Guozhong; Wang, Jane-Ling; Yang, Wenjing; doi:10.3150/09-BEJ228

    2011-01-01

    We study regression models for the situation where both dependent and independent variables are square-integrable stochastic processes. Questions concerning the definition and existence of the corresponding functional linear regression models and some basic properties are explored for this situation. We derive a representation of the regression parameter function in terms of the canonical components of the processes involved. This representation establishes a connection between functional regression and functional canonical analysis and suggests alternative approaches for the implementation of functional linear regression analysis. A specific procedure for the estimation of the regression parameter function using canonical expansions is proposed and compared with an established functional principal component regression approach. As an example of an application, we present an analysis of mortality data for cohorts of medflies, obtained in experimental studies of aging and longevity.

  4. Theoretical Aspects Regarding the Use of the Multiple Linear Regression Model in Economic Analyses

    Constantin ANGHELACHE; Ioan PARTACHI; Adina Mihaela DINU; Ligia PRODAN; Georgeta BARDAŞU (LIXANDRU)

    2013-01-01

    In this paper we have studied the dependence between GDP, final consumption and net investments. To analyze this correlation, the article proposes a multiple regression model, an extremely useful tool in economic analysis. The regression model described in the article considers GDP as the outcome variable and final consumption and net investment as the factorial variables.

  5. Confidence Intervals for an Effect Size Measure in Multiple Linear Regression

    Algina, James; Keselman, H. J.; Penfield, Randall D.

    2007-01-01

    The increase in the squared multiple correlation coefficient (ΔR²) associated with a variable in a regression equation is a commonly used measure of importance in regression analysis. The coverage probability that an asymptotic and percentile bootstrap confidence interval includes Δρ² was investigated. As expected,…
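
    As a rough illustration of the quantity discussed in this abstract, the following Python sketch computes the increase in R² from adding one predictor and a percentile bootstrap interval for it. The simulated data, the helper names `r_squared` and `delta_r2`, and the choice of 500 resamples are assumptions made for the example; none of them come from the study.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    resid = y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def delta_r2(X, y, j):
    """Increase in R^2 when column j is added to the remaining predictors."""
    return r_squared(X, y) - r_squared(np.delete(X, j, axis=1), y)

# Simulated data (illustrative only): three predictors, two with real effects.
rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 3))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)

est = delta_r2(X, y, 0)

# Percentile bootstrap: resample cases with replacement and recompute.
n_boot = 500
boot = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)
    boot[b] = delta_r2(X[idx], y[idx], 0)
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"delta R^2 = {est:.3f}, 95% percentile interval = ({ci_low:.3f}, {ci_high:.3f})")
```

    Because the reduced model is nested in the full one, the sample ΔR² cannot be negative, which is one reason the behaviour of such intervals near zero deserves the kind of coverage study the abstract describes.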

  6. Direction of Effects in Multiple Linear Regression Models.

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed. PMID:26609741

  7. Applied regression analysis: a research tool

    Pantula, Sastry; Dickey, David

    1998-01-01

    Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...

  8. Interpreting Multiple Linear Regression: A Guidebook of Variable Importance

    Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim

    2012-01-01

    Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…

  9. On relationship between regression models and interpretation of multiple regression coefficients

    Varaksin, A N

    2012-01-01

    In this paper, we consider the problem of treating linear regression equation coefficients in the case of correlated predictors. It is shown that in general there are no natural ways of interpreting these coefficients similar to the case of single predictor. Nevertheless we suggest linear transformations of predictors, reducing multiple regression to a simple one and retaining the coefficient at variable of interest. The new variable can be treated as the part of the old variable that has no linear statistical dependence on other presented variables.
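
    The abstract does not spell out the transformation, but its description matches the standard residualization construction (often called the Frisch-Waugh-Lovell result): replace the predictor of interest by its residual after regressing it on the other predictors. The Python sketch below checks this numerically on simulated data; the variable names, the helper `ols` and the data-generating model are assumptions for illustration and may differ from the paper's own construction.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)          # x1 is correlated with x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)
ones = np.ones(n)

# Multiple regression: y ~ 1 + x1 + x2
beta_full = ols(np.column_stack([ones, x1, x2]), y)

# Part of x1 that has no linear dependence on the other predictor
Z = np.column_stack([ones, x2])
x1_resid = x1 - Z @ ols(Z, x1)

# Simple regression of y on the transformed variable alone
beta_simple = ols(np.column_stack([ones, x1_resid]), y)

print(beta_full[1], beta_simple[1])  # the two slopes agree up to rounding
```

    The slope on the transformed variable equals the multiple regression coefficient of `x1`, which is the sense in which the multiple regression is reduced to a simple one while the coefficient of interest is retained.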

  10. Regression Analysis and the Sociological Imagination

    De Maio, Fernando

    2014-01-01

    Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.

  11. A comparative analysis of the effects of instructional design factors on student success in e-learning: multiple-regression versus neural networks

    Halil Ibrahim Cebeci

    2009-12-01

    Full Text Available This study explores the relationship between student performance and instructional design. The research was conducted at the E-Learning School at a university in Turkey. A list of design factors that had potential influence on student success was created through a review of the literature and interviews with relevant experts. From this, the five most important design factors were chosen. The experts scored 25 university courses on the extent to which they demonstrated the chosen design factors. Multiple-regression and supervised artificial neural network (ANN) models were used to examine the relationship between student grade point averages and the scores on the five design factors. The results indicated that there is no statistical difference between the two models. Both models identified the use of examples and applications as the most influential factor. The ANN model provided more information and was used to predict the course-specific factor values required for a desired level of success.

  12. Fundamental Analysis of the Linear Multiple Regression Technique for Quantification of Water Quality Parameters from Remote Sensing Data. Ph.D. Thesis - Old Dominion Univ.

    Whitlock, C. H., III

    1977-01-01

    Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to insure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least square fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.

  13. Application of Partial Least-Squares Regression Model on Temperature Analysis and Prediction of RCCD

    Yuqing Zhao; Zhenxian Xing

    2013-01-01

    This study, based on the temperature monitoring data of the Jiangya RCCD, uses the principle and method of partial least-squares regression to analyze and predict the temperature variation of the RCCD. By building a partial least-squares regression model, multiple correlation among the independent variables is overcome, and an organic combination of multiple linear regression and canonical correlation analysis is achieved. Compared with the general least-squares regression model result, it is more ...

  14. Bayes Linear Sufficiency in Non-exchangeable Multivariate Multiple Regressions

    Wooff, D. A.

    2014-01-01

    We consider sufficiency for Bayes linear revision for multivariate multiple regression problems, and in particular where we have a sequence of multivariate observations at different matrix design points, but with common parameter vector. Such sequences are not usually exchangeable. However, we show that there is a sequence of transformed observations which is exchangeable and we demonstrate that their mean is sufficient both for Bayes linear revision of the parameter vector and for prediction...

  15. A unified framework for model-based clustering, linear regression and multiple cluster structure detection

    Galimberti, Giuliano; Manisi, Annamaria; Soffritti, Gabriele

    2015-01-01

    A general framework for dealing with both linear regression and clustering problems is described. It includes Gaussian clusterwise linear regression analysis with random covariates and cluster analysis via Gaussian mixture models with variable selection. It also admits a novel approach for detecting multiple clusterings from possibly correlated sub-vectors of variables, based on a model defined as the product of conditionally independent Gaussian mixture models. A necessary condition for the ...

  16. Multiple regression analysis to assess the role of plankton on the distribution and speciation of mercury in water of a contaminated lagoon.

    Stoichev, T; Tessier, E; Amouroux, D; Almeida, C M; Basto, M C P; Vasconcelos, V M

    2016-11-15

    Spatial and seasonal variation of mercury species aqueous concentrations and distributions was carried out during six sampling campaigns at four locations within Laranjo Bay, the most mercury-contaminated area of the Aveiro Lagoon (Portugal). Inorganic mercury (IHg(II)) and methylmercury (MeHg) were determined in filter-retained (IHgPART, MeHgPART) and filtered (<0.45μm) fractions (IHg(II)DISS, MeHgDISS). The concentrations of IHgPART depended on site and on dilution with downstream particles. Similar processes were evidenced for MeHgPART, however, its concentrations increased for particles rich in phaeophytin (Pha). The concentrations of MeHgDISS, and especially those of IHg(II)DISS, increased with Pha concentrations in the water. Multiple regression models are able to depict MeHgPART, IHg(II)DISS and MeHgDISS concentrations with salinity and Pha concentrations exhibiting additive statistical effects and allowing separation of possible addition and removal processes. A link between phytoplankton/algae and consumers' grazing pressure in the contaminated area can be involved to increase concentrations of IHg(II)DISS and MeHgPART. These processes could lead to suspended particles enriched with MeHg and to the enhancement of IHg(II) and MeHg availability in surface waters and higher transfer to the food web. PMID:27484944

  17. Prediction on adsorption ratio of carbon dioxide to methane on coals with multiple linear regression

    YU Hong-guan; MENG Xian-ming; FAN Wei-tang; YE Jian-ping

    2007-01-01

    The multiple linear regression equations for the adsorption ratio of CO2/CH4 and coal quality indexes were built with SPSS software on the basis of existing coal quality data and the corresponding adsorption amounts of CO2 and CH4. The regression equations built were tested with data collected from some S, and the influences of coal quality indexes on the adsorption ratio of CO2/CH4 were studied by investigating the regression equations. The study results show that the regression equation for the adsorption ratio of CO2/CH4 and volatile matter, ash and moisture in coal can be obtained with multiple linear regression analysis, and that the influence of the same coal quality index with the degree of metamorphosis, or the influence of coal quality indexes for the same coal rank, on the adsorption ratio is not consistent.

  18. Multiple regression analysis on fire and socioeconomic environment

    蔡晶菁

    2012-01-01

    Using mathematical application software such as SPSS, Excel and MATLAB, fire and the socioeconomic environment were analyzed by scatter plots, correlation analysis, principal component analysis and regression analysis. Taking the national fire situation in 2009 as an example, the influence of socioeconomic environment indicators on fire was studied, which can provide a scientific basis and decision-making reference for better fire prevention and the coordinated development of the socioeconomic environment.

  19. Analysis of Inflation in Turkey via Ridge Regression

    Duygu Tunalı

    2015-12-01

    Full Text Available The aim of this study is to analyze inflation in Turkey between the years 2003-2014 and to compare inflation in that period with inflation in the years 1963-1983. When multiple linear regression modeling is used for inflation analysis, a multicollinearity problem occurs between the independent variables. To eliminate this problem, ridge regression, one of the biased estimation methods, is used in this study. The ridge regression method gives a smaller mean square error than the least squares estimator of the β parameters, without removing variables from the model.
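
    A minimal sketch of the idea, assuming the textbook ridge estimator (X'X + kI)⁻¹X'y on standardized predictors. The ridge constant `k`, the function name `ridge` and the simulated, nearly collinear data are illustrative assumptions and are not values from the study.

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimate (X'X + k I)^{-1} X'y; k = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Two nearly identical predictors to mimic multicollinearity (simulated data).
rng = np.random.default_rng(1)
n = 100
z = rng.normal(size=n)
X = np.column_stack([z + 0.01 * rng.normal(size=n),
                     z + 0.01 * rng.normal(size=n)])
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize the predictors
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)
y = y - y.mean()                             # centre the response

beta_ols = ridge(X, y, 0.0)    # unstable under collinearity
beta_ridge = ridge(X, y, 1.0)  # shrunk towards zero, both variables kept

print("OLS:  ", beta_ols)
print("ridge:", beta_ridge)
```

    The ridge coefficients are biased but far less variable than the least squares ones when predictors are nearly collinear, and no predictor has to be dropped. In practice `k` would be chosen from the data, for example with a ridge trace or cross-validation.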

  20. Poisson regression analysis of ungrouped data

    Loomis, D; Richardson, D.; Elliott, L

    2005-01-01

    Background: Poisson regression is routinely used for analysis of epidemiological data from studies of large occupational cohorts. It is typically implemented as a grouped method of data analysis in which all exposure and covariate information is categorised and person-time and events are tabulated.

  1. A Regression Analysis Model Based on Wavelet Networks

    XIONG Zheng-feng

    2002-01-01

    In this paper, an approach is proposed to combine wavelet networks and techniques of regression analysis. The resulting wavelet regression estimator is well suited for regression estimation of moderately large dimension, in particular for regressions with localized irregularities.

  2. Standardized Regression Coefficients as Indices of Effect Sizes in Meta-Analysis

    Kim, Rae Seon

    2011-01-01

    When conducting a meta-analysis, it is common to find many collected studies that report regression analyses, because multiple regression analysis is widely used in many fields. Meta-analysis uses effect sizes drawn from individual studies as a means of synthesizing a collection of results. However, indices of effect size from regression analyses…

  3. Hot Resistance Estimation for Dry Type Transformer Using Multiple Variable Regression, Multiple Polynomial Regression and Soft Computing Techniques

    M. Srinivasan

    2012-01-01

    Full Text Available Problem statement: This study presents a novel method for the determination of average winding temperature rise of transformers under its predetermined field operating conditions. Rise in the winding temperature was determined from the estimated values of winding resistance during the heat run test conducted as per IEC standard. Approach: The estimation of hot resistance was modeled using Multiple Variable Regression (MVR), Multiple Polynomial Regression (MPR) and soft computing techniques such as Artificial Neural Network (ANN) and Adaptive Neuro Fuzzy Inference System (ANFIS). The modeled hot resistance will help to find the load losses at any load situation without using complicated measurement set up in transformers. Results: These techniques were applied for the hot resistance estimation for dry type transformer by using the input variables cold resistance, ambient temperature and temperature rise. The results are compared and they show a good agreement between measured and computed values. Conclusion: According to our experiments, the proposed methods are verified using experimental results, which have been obtained from temperature rise test performed on a 55 kVA dry-type transformer.

  4. Precipitation interpolation in mountainous regions using multiple linear regression

    Hay, L.; Viger, R.; McCabe, G.

    1998-01-01

    Multiple linear regression (MLR) was used to spatially interpolate precipitation for simulating runoff in the Animas River basin of southwestern Colorado. MLR equations were defined for each time step using measured precipitation as dependent variables. Explanatory variables used in each MLR were derived for the dependent variable locations from a digital elevation model (DEM) using a geographic information system. The same explanatory variables were defined for a 5 × 5 km grid of the DEM. For each time step, the best MLR equation was chosen and used to interpolate precipitation onto the 5 × 5 km grid. The gridded values of precipitation provide a physically-based estimate of the spatial distribution of precipitation and result in reliable simulations of daily runoff in the Animas River basin.
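
    The workflow for a single time step can be sketched in Python as follows. The abstract does not list the explanatory variables, so easting, northing and elevation are assumed here, and the station data and the DEM are synthetic; the sketch shows the fit-at-stations, predict-on-grid pattern rather than the authors' actual model.

```python
import numpy as np

def design(v):
    """Add an intercept column to a matrix of explanatory variables."""
    return np.column_stack([np.ones(len(v)), v])

rng = np.random.default_rng(3)

# Synthetic stations: easting (km), northing (km), elevation (m).
n_sta = 30
stations = np.column_stack([rng.uniform(0, 50, n_sta),
                            rng.uniform(0, 50, n_sta),
                            rng.uniform(2000, 4000, n_sta)])
# Synthetic precipitation for one time step (hypothetical relationship).
precip = (2.0 + 0.01 * stations[:, 0] + 0.004 * stations[:, 2]
          + rng.normal(scale=0.5, size=n_sta))

# Fit the MLR equation at the station locations.
coef = np.linalg.lstsq(design(stations), precip, rcond=None)[0]

# Cell centres of a 5 x 5 km grid covering a 50 x 50 km area.
gx, gy = np.meshgrid(np.arange(2.5, 50, 5.0), np.arange(2.5, 50, 5.0))
g_elev = 2000 + 30 * gx + 10 * gy            # stand-in for DEM elevations
grid = np.column_stack([gx.ravel(), gy.ravel(), g_elev.ravel()])

# Apply the same equation to the gridded explanatory variables.
grid_precip = (design(grid) @ coef).reshape(gx.shape)
print(grid_precip.shape)
```

    Repeating the fit for every time step, and comparing candidate sets of explanatory variables at each step, would correspond to the "best MLR equation" selection mentioned in the abstract.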

  5. A multiple regression model for the Ft. Calhoun reactor coolant pump system

    Multiple regression analysis is one of the most widely used of all statistical tools. In this research paper, we introduce an application of fitting a multiple regression model to reactor coolant pump (RCP) data. The primary purpose of this research is to correlate the results obtained by Design of Experiments (DOE) and regression model fitting. Also, the idea behind using a regression model is to gain more detailed information from the RCP data than is provided by DOE. In engineering science, statistical quality control techniques have traditionally been applied to control manufacturing processes. An application to commercial nuclear power plant maintenance and control is presented that can greatly improve plant safety and reliability. The results obtained show that six out of ten parameters are within control specification limits and four parameters are not in a state of statistical control. The four parameters that are out of control adversely affect the regression model fitting, and the final prediction equation therefore does not predict the future response accurately. The analysis concludes that in order to fit the best regression model, one has to remove all out-of-control points from the data set, including dropping a variable from the model, to obtain better prediction of the response variable. (author)

  6. Regression analysis of post-CHF flow boiling data

    The successful application of statistical analysis in systematic investigations of heat transfer data for boiling water beyond the critical heat flux is described. Multiple linear regression analysis together with statistical tests of correlations and data were used in this study. Data from a number of experiments encompassing film and transition boiling in several geometries were correlated by boiling regime, by geometry, and in aggregate. Error estimates and uncertainty bounds were specified for all such correlations. (U.S.)

  7. Functional data analysis of generalized regression quantiles

    Guo, Mengmeng

    2013-11-05

    Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.
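
    The scalar building block of such an algorithm can be illustrated with a sample expectile, computed by iterating an asymmetrically weighted mean. This Python sketch is only that building block: the function name `expectile`, the tolerance and the simulated data are assumptions, and the functional principal components, splines and penalty described in the abstract are not reproduced.

```python
import numpy as np

def expectile(y, tau, tol=1e-10, max_iter=100):
    """Sample tau-expectile via iterated asymmetrically weighted least squares."""
    y = np.asarray(y, dtype=float)
    m = y.mean()
    for _ in range(max_iter):
        # Observations above the current estimate get weight tau, the rest 1 - tau.
        w = np.where(y > m, tau, 1.0 - tau)
        m_new = np.sum(w * y) / np.sum(w)    # weighted least-squares update
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

rng = np.random.default_rng(4)
sample = rng.normal(size=1000)

e10, e50, e90 = (expectile(sample, t) for t in (0.1, 0.5, 0.9))
print(e10, e50, e90)  # e50 coincides with the sample mean
```

    With tau = 0.5 the weights are symmetric and the expectile reduces to the mean, while values of tau close to 0 or 1 move the estimate into the tails.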

  8. Neighborhood social capital and crime victimization: comparison of spatial regression analysis and hierarchical regression analysis.

    Takagi, Daisuke; Ikeda, Ken'ichi; Kawachi, Ichiro

    2012-11-01

    Crime is an important determinant of public health outcomes, including quality of life, mental well-being, and health behavior. A body of research has documented the association between community social capital and crime victimization. The association between social capital and crime victimization has been examined at multiple levels of spatial aggregation, ranging from entire countries, to states, metropolitan areas, counties, and neighborhoods. In multilevel analysis, the spatial boundaries at level 2 are most often drawn from administrative boundaries (e.g., Census tracts in the U.S.). One problem with adopting administrative definitions of neighborhoods is that it ignores spatial spillover. We conducted a study of social capital and crime victimization in one ward of Tokyo city, using a spatial Durbin model with an inverse-distance weighting matrix that assigned each respondent a unique level of "exposure" to social capital based on all other residents' perceptions. The study is based on a postal questionnaire sent to 20-69 years old residents of Arakawa Ward, Tokyo. The response rate was 43.7%. We examined the contextual influence of generalized trust, perceptions of reciprocity, two types of social network variables, as well as two principal components of social capital (constructed from the above four variables). Our outcome measure was self-reported crime victimization in the last five years. In the spatial Durbin model, we found that neighborhood generalized trust, reciprocity, supportive networks and two principal components of social capital were each inversely associated with crime victimization. By contrast, a multilevel regression performed with the same data (using administrative neighborhood boundaries) found generally null associations between neighborhood social capital and crime. Spatial regression methods may be more appropriate for investigating the contextual influence of social capital in homogeneous cultural settings such as Japan. PMID

  9. Multiple regression models for energy use in air-conditioned office buildings in different climates

    An attempt was made to develop multiple regression models for office buildings in the five major climates in China: severe cold, cold, hot summer and cold winter, mild, and hot summer and warm winter. A total of 12 key building design variables were identified through parametric and sensitivity analysis, and considered as inputs in the regression models. The coefficient of determination R2 varies from 0.89 in Harbin to 0.97 in Kunming, indicating that 89-97% of the variations in annual building energy use can be explained by the changes in the 12 parameters. A pseudo-random number generator based on three simple multiplicative congruential generators was employed to generate random designs for evaluation of the regression models. The differences between regression-predicted and DOE-simulated annual building energy use are largely within 10%. It is envisaged that the regression models developed can be used to estimate the likely energy savings/penalty during the initial design stage when different building schemes and design concepts are being considered.

  10. Landslide Susceptibility Mapping Using Multiple Regression and GIS Tools in Tajan Basin, North of Iran

    Somayeh Mashari; Karim Solaimani; Ebrahim Omidvar

    2012-01-01

    Landslides are a natural hazard that causes extensive damage to the environment. Depending on the landform, several factors can cause a landslide. This research addresses the methodology for landslide susceptibility mapping using multiple regression analysis and GIS tools. Based on the initial hypothesis, ten factors were recognized as influencing landslides, namely geology, slope, aspect, distance from roads, faults and drainage network, soil capability, land use and rainfall. Crossin...

  11. Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction

    He, Dan; Kuhn, David; Parida, Laxmi

    2016-01-01

    Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other...

  12. Non-destructive evaluation of chlorophyll content in quinoa and amaranth leaves by simple and multiple regression analysis of RGB image components

    Riccardi, M.; Mele, G.; Pulvento, C.;

    2014-01-01

    Leaf chlorophyll content provides valuable information about physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is expensive, laborious, and time consuming. Over the years alternative methods, rapid and non-destructive, have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimation of chlorophyll content in quinoa and amaranth leaves based on RGB components analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll provided by... foliar chlorophyll content and had a lower amount of noise in the whole range of chlorophyll studied compared with SPAD and other leaf image processing based models when applied to quinoa and amaranth. © 2014 Springer Science+Business Media Dordrecht.

  13. Assessing the binding affinity of a selected class of DPP4 inhibitors using chemical descriptor-based multiple linear regression

    Jose Isagani Janairo; Gerardo Janairo; Frumencio Co; Derrick Ethelbhert Yu

    2011-01-01

    The activity of a selected class of DPP4 inhibitors was preliminarily assessed using chemical descriptors derived from AM1-optimized geometries. Using a multiple linear regression model, it was found that ΔE0, LUMO energy, area, molecular weight and ΔH0 are the significant descriptors that can adequately assess the binding affinity of the compounds. The derived multiple linear regression (MLR) model was validated using rigorous statistical analysis. The preliminary model suggests t...

  14. Genetic Algorithm Based Outlier Detection Using Bayesian Information Criterion in Multiple Regression Models Having Multicollinearity Problems

    ALMA, Özlem GÜRÜNLÜ; KURT, Serdar; UĞUR, Aybars

    2010-01-01

    Multiple linear regression models are widely used applied statistical techniques, and they are among the most useful devices for extracting and understanding the essential features of datasets. However, in multiple linear regression models problems arise when a serious outlier observation or multicollinearity is present in the data. In regression, however, the situation is somewhat more complex in the sense that some outlying points will have more influence on the regression than others. An important proble...

  15. Forecasting Gold Prices Using Multiple Linear Regression Method

    Z. Ismail

    2009-01-01

    Problem statement: Forecasting is a function in management to assist decision making. It is also described as the process of estimation in unknown future situations. In more general terms it is commonly known as prediction, which refers to estimation of time series or longitudinal type data. Gold is a precious yellow commodity once used as money. It was made illegal in USA 41 years ago, but is now once again accepted as a potential currency. The demand for this commodity is on the rise. Approach: The objective of this study was to develop a forecasting model for predicting gold prices based on economic factors such as inflation, currency price movements and others. Following the melt-down of US dollars, investors are putting their money into gold because gold plays an important role as a stabilizing influence for investment portfolios. Due to the increase in demand for gold in Malaysia and other parts of the world, it is necessary to develop a model that reflects the structure and pattern of the gold market and forecasts the movement of the gold price. The most appropriate approach to the understanding of gold prices is the Multiple Linear Regression (MLR) model. MLR is a study of the relationship between a single dependent variable and one or more independent variables, in this case with gold price as the single dependent variable. The fitted MLR model will be used to predict future gold prices. A naive model known as “forecast-1” was considered as a benchmark model in order to evaluate the performance of the model. Results: Many factors determine the price of gold and, based on “a hunch of experts”, several economic factors had been identified as having influence on gold prices. Variables such as Commodity Research Bureau future index (CRB); USD/Euro Foreign Exchange Rate (EUROUSD); Inflation rate (INF); Money Supply (M1); New York Stock Exchange (NYSE); Standard and Poor 500 (SPX); Treasury Bill (T-BILL) and US Dollar index (USDX) were considered to

  16. Forecasting Electrical Load using ANN Combined with Multiple Regression Method

    Saeed M. Badran; Ossama B. Abouelatta

    2012-01-01

    This paper combined artificial neural network and regression modeling methods to predict electrical load. We propose an approach for specific day, week and/or month load forecasting for electrical companies, taking into account the historical load. A modified technique, based on an artificial neural network (ANN) combined with linear regression, is therefore applied to the KSA electrical network, using its historical data to forecast electrical load demand up to the year 2020. T...

  17. Multiple predictor smoothing methods for sensitivity analysis

    The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (1) locally weighted regression (LOESS), (2) additive models, (3) projection pursuit regression, and (4) recursive partitioning regression. The indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present

  18. Multiple predictor smoothing methods for sensitivity analysis.

    Helton, Jon Craig; Storlie, Curtis B.

    2006-08-01

    The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (1) locally weighted regression (LOESS), (2) additive models, (3) projection pursuit regression, and (4) recursive partitioning regression. The indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present.

  19. Multiple linear regression and path analysis on influencing factors of depressive neurosis

    江家靖; 郭文斌 (corresponding author)

    2014-01-01

    Objective: To explore the influencing factors and mechanism of depressive neurosis by multiple linear regression and path analysis. Methods: Fifty-five inpatients in the open wards of the First Affiliated Hospital of Guangxi Medical University, diagnosed with neurosis by specialist physicians, were investigated. CES-D, ATQ, CTQ, SAS, SCSQ and SSRS, together with a self-esteem scale and an aggressive behavior scale, were used to explore the influencing factors of depressive neurosis. Multiple linear regression and path analysis were used to probe the influence of life events, coping style and social support on depression, and the mode of action of the main factors was analyzed. Results: Stress response, automatic thoughts, life events, subjective support, objective support, and active or negative coping were the main factors of depressive neurosis. The influences of the various factors on depression were different. Automatic thoughts and coping style could induce depression directly. Social support had no direct relation to depression; its path coefficient was not statistically significant. Life events could influence the occurrence of depression through indirect channels (mediated by coping style). Conclusions: The occurrence of depression is a combined effect of life events, coping style and social support. Multiple regression analysis and path analysis each have their own role in exploring the mechanism of depression, and their results are mutually complementary.

  20. Single and multiple index functional regression models with nonparametric link

    Chen, Dong; Hall, Peter; Müller, Hans-Georg

    2011-01-01

    Fully nonparametric methods for regression from functional data have poor accuracy from a statistical viewpoint, reflecting the fact that their convergence rates are slower than nonparametric rates for the estimation of high-dimensional functions. This difficulty has led to an emphasis on the so-called functional linear model, which is much more flexible than common linear models in finite dimension, but nevertheless imposes structural constraints on the relationship between predictors and re...

  1. The Linear and Non-displaced Estimator in Multiple Regression

    Constantin ANGHELACHE; Voineagu, Vergil; Alexandru MANOLE; Diana Valentina SOARE; Ligia PRODAN

    2013-01-01

    Under hypotheses IA and IB, OLS estimators are both linear and stationary. For the OLS estimator to attain the minimum variance among all linear and stationary estimators, and thus to be BLUE, the classical assumptions IIB and IIC must also hold. As in the case of two-variable regression, this means that the residual factors have to be homoscedastic and non-autocorrelated.

  2. Repeated Results Analysis for Middleware Regression Benchmarking

    Bulej, Lubomír; Kalibera, T.; Tůma, P.

    2005-01-01

    Vol. 60 (2005), pp. 345-358. ISSN 0166-5316. R&D Projects: GA ČR GA102/03/0672. Institutional research plan: CEZ:AV0Z10300504. Keywords: middleware benchmarking; regression benchmarking; regression testing. Subject RIV: JD - Computer Applications, Robotics. Impact factor: 0.756 (2005).

  3. Sliced Inverse Regression for Time Series Analysis

    Chen, Li-Sue

    1995-11-01

    In this thesis, general nonlinear models for time series data are considered. A basic form is $x_t = f(\beta_1^T X_{t-1}, \beta_2^T X_{t-1}, \ldots, \beta_k^T X_{t-1}, \varepsilon_t)$, where $x_t$ is an observed time series, $X_t$ is the first $d$ time lag vector $(x_t, x_{t-1}, \ldots, x_{t-d-1})$, $f$ is an unknown function, the $\beta_i$ are unknown vectors, and the $\varepsilon_t$ are independently distributed. Special cases include AR and TAR models. We investigate the feasibility of applying SIR/PHD (Li 1990, 1991), the sliced inverse regression and principal Hessian methods, in estimating the $\beta_i$. PCA (principal component analysis) is brought in to check one critical condition for SIR/PHD. Through simulation and a study of three well-known data sets (Canadian lynx, U.S. unemployment rate and sunspot numbers), we demonstrate how SIR/PHD can effectively retrieve the interesting low-dimension structures for time series data.

  4. Determinants of Serum PCBs in Adolescents and Adults: Regression Tree Analysis and Linear Regression Analysis

    Govarts, Eva; Den Hond, Elly; Schoeters, Greet; Bruckers, Liesbeth

    2010-01-01

    Regression tree analysis, a non-parametric method, was undertaken to identify predictors of the serum concentration of polychlorinated biphenyls (sum of marker PCBs 138, 153, and 180) in humans. This method was applied to biomonitoring data of the Flemish Environment and Health study (2002-2006) and included 1679 adolescents and 1583 adults. Potential predictor variables were collected via a self-administered questionnaire, assessing information on lifestyle, food intake, use of tobacco and a...

  5. FIELD WATER CAPASITY MODELLING ACCORDING TO SOIL TEXTURE USING PRINCIPLE COMPONENT REGRESSION ANALYSIS

    Kürşad ÖZKAN

    2009-01-01

    The purpose of the paper is to determine a model of soil field water capacity as a function of soil texture. At first, multiple regression analysis was used to determine a model. However, a multicollinearity problem was found in the model because of strong relationships among the independent variables. Therefore, principal component regression analysis was applied and the problem was solved. It is known that sand, dust and clay contents play important roles in field water capacity. But...
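
    As a rough illustration of why principal component regression helps here, the sketch below fits a model on deliberately collinear texture fractions. It is not the author's procedure: the data are synthetic, the variable names (`sand`, `dust`, `clay`) simply echo the abstract, and the helper `pcr_fit` and the choice of two components are assumptions made for this example.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: regress y on the leading PCs of X."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # SVD of the centered predictors; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                 # shape (p, k)
    scores = Xc @ V                         # component scores, shape (n, k)
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V @ gamma                        # map back to the original predictors
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Synthetic (hypothetical) texture data: the three fractions sum to 100,
# so the predictors are exactly collinear once an intercept is included.
rng = np.random.default_rng(0)
sand = rng.uniform(20, 70, 50)
clay = rng.uniform(5, 30, 50)
dust = 100 - sand - clay
X = np.column_stack([sand, dust, clay])
y = 10 + 0.1 * dust + 0.3 * clay + rng.normal(0, 0.5, 50)

intercept, beta = pcr_fit(X, y, n_components=2)
pred = intercept + X @ beta
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

    Dropping the third component discards the direction along which the three fractions cannot vary, which is the direction that makes ordinary least squares coefficients unstable.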

  6. [Application of SAS macro to evaluated multiplicative and additive interaction in logistic and Cox regression in clinical practices].

    Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q

    2016-05-10

    Conditional and unconditional logistic regression analyses are commonly used in case-control studies, whereas the Cox proportional hazards model is often used in survival data analysis. Most of the literature refers only to the main effect model; however, the generalized linear model differs from the general linear model, and interaction comprises multiplicative interaction and additive interaction. The former has only statistical significance, while the latter has biological significance. In this paper, macros were written in SAS 9.4 to calculate the contrast ratio, the attributable proportion due to interaction and the synergy index along with the interaction terms of logistic and Cox regression, and Wald, delta and profile likelihood confidence intervals were used to evaluate additive interaction, for reference in big data analysis in clinical epidemiology and in the analysis of genetic multiplicative and additive interactions. PMID:27188374
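
    The additive-interaction measures named above can be sketched in a few lines. This is not the authors' SAS macro: it is a minimal Python illustration using the standard definitions of the relative excess risk due to interaction (RERI, which seems to correspond to what the abstract calls the contrast ratio), the attributable proportion (AP) and the synergy index (S). The function names and the example odds ratios are hypothetical, and confidence intervals are omitted.

```python
import math

def additive_interaction(or10, or01, or11):
    """Additive-interaction measures from odds (or hazard) ratios.

    or10: exposure A only, or01: exposure B only, or11: both exposures,
    each relative to the doubly unexposed reference group.
    """
    reri = or11 - or10 - or01 + 1.0
    ap = reri / or11
    s = (or11 - 1.0) / ((or10 - 1.0) + (or01 - 1.0))
    return reri, ap, s

def from_coefficients(b_a, b_b, b_ab):
    """Same measures from a model with terms A, B and the product A*B."""
    or10 = math.exp(b_a)
    or01 = math.exp(b_b)
    or11 = math.exp(b_a + b_b + b_ab)
    return additive_interaction(or10, or01, or11)

# Hypothetical odds ratios, chosen only to make the arithmetic easy to follow.
reri, ap, s = additive_interaction(or10=2.0, or01=3.0, or11=7.0)
# The multiplicative interaction for the same numbers is exp(b_ab) = 7 / (2 * 3).
reri2, ap2, s2 = from_coefficients(math.log(2.0), math.log(3.0), math.log(7.0 / 6.0))
```

    RERI = 0, AP = 0 and S = 1 all indicate no interaction on the additive scale; in this example the joint effect exceeds the sum of the separate effects even though the multiplicative term is only slightly above 1.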

  7. Regression Analysis with a Stochastic Design Variable

    Sazak, Hakan S.; Moti L Tiku; Qamarul Islam, M.

    2006-01-01

    In regression models, the design variable has primarily been treated as a nonstochastic variable. In numerous situations, however, the design variable is stochastic. The estimation and hypothesis testing problems in such situations are considered. Real life examples are given.

  8. Spatial regression analysis on 32 years total column ozone data

    J. S. Knibbe

    2014-02-01

    Multiple-regression analyses have been performed on 32 years of total ozone column data that was spatially gridded with a 1° × 1.5° resolution. The total ozone data consists of the MSR (Multi Sensor Reanalysis; 1979–2008) and two years of assimilated SCIAMACHY ozone data (2009–2010). The two-dimensionality in this data-set allows us to perform the regressions locally and investigate spatial patterns of regression coefficients and their explanatory power. Seasonal dependencies of ozone on regressors are included in the analysis. A new physically oriented model is developed to parameterize stratospheric ozone. Ozone variations on non-seasonal timescales are parameterized by explanatory variables describing the solar cycle, stratospheric aerosols, the quasi-biennial oscillation (QBO), El Nino (ENSO) and stratospheric alternative halogens (EESC). For several explanatory variables, seasonally adjusted versions of these explanatory variables are constructed to account for the difference in their effect on ozone throughout the year. To account for seasonal variation in ozone, explanatory variables describing the polar vortex, geopotential height, potential vorticity and average day length are included. Results of this regression model are compared to those of a similar analysis based on a more commonly applied statistically oriented model. The physically oriented model provides spatial patterns in the regression results for each explanatory variable. The EESC has a significant depleting effect on ozone at high and mid-latitudes, the solar cycle affects ozone positively mostly in the Southern Hemisphere, stratospheric aerosols affect ozone negatively at high Northern latitudes, the effect of QBO is positive in the tropics and negative at mid to high latitudes, and ENSO affects ozone negatively between 30° N and 30° S, particularly over the Pacific. The contribution of explanatory variables describing seasonal ozone variation is generally

  9. Tightness of M-estimators for multiple linear regression in time series

    Johansen, Søren; Nielsen, Bent

    We show tightness of a general M-estimator for multiple linear regression in time series. The positive criterion function for the M-estimator is assumed lower semi-continuous and sufficiently large for large argument: Particular cases are the Huber-skip and quantile regression. Tightness requires...

  10. Modeling Lateral and Longitudinal Control of Human Drivers with Multiple Linear Regression Models

    Lenk, Jan; M, Claus

    2011-01-01

    In this paper, we describe results of modeling the lateral and longitudinal control behavior of drivers with simple linear multiple regression models. This approach fits into the Bayesian Programming (BP) approach (Bessi

  11. Multinomial Inverse Regression for Text Analysis

    Taddy, Matt

    2010-01-01

    Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. They are also very high dimensional and difficult to incorporate into statistical analyses. This article introduces a straightforward framework of sentiment-preserving dimension reduction for text data. Multinomial inverse regression is introduced as a general tool for simplifying predictor sets that can be represen...

  12. Egg hatchability prediction by multiple linear regression and artificial neural networks

    AC Bolzan; RAF Machado; JCZ Piaia

    2008-01-01

    An artificial neural network (ANN) was compared with a multiple linear regression statistical method to predict hatchability in an artificial incubation process. A feedforward neural network architecture was applied. The network was trained with the backpropagation algorithm on data obtained from industrial incubations. The ANN model was chosen as it fit the experimental data better than the multiple linear regression model, which used coefficients determi...

  13. Multiple regression technique for Pth degree polynominals with and without linear cross products

    Davis, J. W.

    1973-01-01

    A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.
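
    A minimal sketch of this kind of fit is given below. It is not the original program: it assumes that "linear cross products" means pairwise products x_i * x_j of the independent variables, and the function names, the synthetic data and the use of NumPy least squares are choices made for this illustration only.

```python
import numpy as np
from itertools import combinations

def design_matrix(X, degree, cross_products=False):
    """Columns: 1, then x_j, x_j**2, ..., x_j**degree for each variable j,
    optionally followed by the pairwise products x_i * x_j (i < j)."""
    n, m = X.shape
    cols = [np.ones(n)]
    for j in range(m):
        for p in range(1, degree + 1):
            cols.append(X[:, j] ** p)
    if cross_products:
        for i, j in combinations(range(m), 2):
            cols.append(X[:, i] * X[:, j])
    return np.column_stack(cols)

def fit_polynomial(X, y, degree, cross_products=False):
    """Least-squares fit; returns coefficients and the residual standard deviation."""
    A = design_matrix(X, degree, cross_products)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]
    return coef, np.sqrt(resid @ resid / dof)

# Synthetic, noise-free data with a genuine cross-product term.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(40, 2))
y = 1 + 2 * X[:, 0] - X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 1]

coef_with, sd_with = fit_polynomial(X, y, degree=2, cross_products=True)
coef_without, sd_without = fit_polynomial(X, y, degree=2, cross_products=False)
```

    Comparing the residual standard deviation of the two fits is one simple way to judge whether the cross-product terms are worth including; the F statistic and percent-error measures mentioned in the abstract would be computed from the same residuals.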

  14. Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA

    Ohlmacher, G.C.; Davis, J.C.

    2003-01-01

    Landslides in the hilly terrain along the Kansas and Missouri rivers in northeastern Kansas have caused millions of dollars in property damage during the last decade. To address this problem, a statistical method called multiple logistic regression has been used to create a landslide-hazard map for Atchison, Kansas, and surrounding areas. Data included digitized geology, slopes, and landslides, manipulated using ArcView GIS. Logistic regression relates predictor variables to the occurrence or nonoccurrence of landslides within geographic cells and uses the relationship to produce a map showing the probability of future landslides, given local slopes and geologic units. Results indicated that slope is the most important variable for estimating landslide hazard in the study area. Geologic units consisting mostly of shale, siltstone, and sandstone were most susceptible to landslides. Soil type and aspect ratio were considered but excluded from the final analysis because these variables did not significantly add to the predictive power of the logistic regression. Soil types were highly correlated with the geologic units, and no significant relationships existed between landslides and slope aspect. © 2003 Elsevier Science B.V. All rights reserved.

  15. Managing Software Project Risks (Analysis Phase) with Proposed Fuzzy Regression Analysis Modelling Techniques with Fuzzy Concepts

    Elzamly, Abdelrafe; Hussin, Burairah

    2014-01-01

    The aim of this paper is to propose new mining techniques by which we can study the impact of different risk management techniques and different software risk factors on software analysis development projects. The new mining technique uses fuzzy multiple regression analysis techniques with fuzzy concepts to manage the software risks in a software project and to mitigate risk through software process improvement. Top ten software risk factors in analysis phase and thirty risk management techni...

  16. Structure Coefficients versus Scoring Coefficients as Bases for Interpreting Emergent Variables in Multiple Regression and Related Techniques.

    Harris, Richard J.

    Interpretation of emergent variables on the basis of structure coefficients (zero order correlations between original and emergent variables) is potentially very misleading and should be avoided in favor of interpretation on the basis of scoring coefficients. This is most apparent in multiple regression analysis and its special case, two-group…

  17. Prediction of groundwater table and salinity fluctuations with a time series multiple regression technique

    Seeboonruang, U.

    2013-12-01

    Time series techniques have been extensively applied to research works of many academic disciplines, particularly those concerned with economics and environment. This paper presents application of a time series multiple linear regression technique to a groundwater system to predict groundwater level and salinity fluctuations in a saline area in the northeastern part of Thailand. Surface and groundwater interaction is the major mechanism controlling the shallow subsurface system and salinity of the area. The basic technique is based on the lagged correlation between hydrologic, and hydrogeological and environmental parameters. As a result of a large irrigation project in the area, several regulating gates have been installed to control flooding to the downstream rivers and to provide the upstream areas with sufficient irrigating water. From the lagged correlation analysis, the shallow groundwater and groundwater salinity fluctuation in the irrigating area are shown to be dependent upon the surface water levels at the installed regulated gates and prior rainfall. A set of multiple linear regression equations with lagged time dependent function are then formulated. The dependent variables are groundwater level and groundwater salinity while the independent variables are rainfall rates and water levels measured at the regulating gates. After calibration and verification, the model, as an alternative to the conventional method which requires detailed and continuous variables and is costlier, can be used to forecast and manage future groundwater systems.
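
    The lag-selection and regression steps described above can be sketched as follows. This is an illustration on synthetic data, not the author's model: the series names (`rain`, `gate`, `level`), the true lags of 3 and 1 steps, and the helper functions are all assumptions made for the example.

```python
import numpy as np

def lag(x, k):
    """Shift x forward by k steps; the first k entries become NaN."""
    out = np.full(len(x), np.nan)
    out[k:] = x[:len(x) - k]
    return out

def lagged_corr(x, y, k):
    """Correlation between x lagged by k steps and y, ignoring NaN rows."""
    xl = lag(x, k)
    ok = ~np.isnan(xl) & ~np.isnan(y)
    return np.corrcoef(xl[ok], y[ok])[0, 1]

# Synthetic series: groundwater level responds to rainfall 3 steps earlier
# and to the gate water level 1 step earlier (hypothetical lags).
rng = np.random.default_rng(2)
n = 200
rain = rng.normal(size=n)
gate = rng.normal(size=n)
level = 2.0 + 0.5 * lag(rain, 3) + 0.8 * lag(gate, 1)

# Step 1: choose the rainfall lag with the strongest lagged correlation.
best_rain_lag = max(range(6), key=lambda k: lagged_corr(rain, level, k))

# Step 2: fit the multiple linear regression on the lagged predictors.
A = np.column_stack([np.ones(n), lag(rain, best_rain_lag), lag(gate, 1)])
ok = ~np.isnan(A).any(axis=1) & ~np.isnan(level)
coef, *_ = np.linalg.lstsq(A[ok], level[ok], rcond=None)
```

    With real hydrological records the lagged correlations would be noisier, and calibration and verification would be done on separate periods, as the abstract indicates.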

  18. Survival Analysis with Multivariate adaptive Regression Splines

    Kriner, Monika

    2007-01-01

    Multivariate adaptive regression splines (MARS) are a useful tool to identify linear and nonlinear effects and interactions between two covariates. In this dissertation a new proposal to model survival type data with MARS is introduced. Martingale and deviance residuals of a Cox PH model are used as response in a common MARS approach to model functional forms of covariate effects as well as possible interactions in a data-driven way. Simulation studies prove that the new method yields a bett...

  19. Multiple Logistic Regression Analysis of Social Networking Services on the College Students' Emotions, Depression and Self-esteem

    王晨羽; 徐骞; 陈紫薇; 林育芳

    2015-01-01

    Objective: To explore the impact of social networking services (SNS) and various factors on emotions, depression, and self-esteem. Methods: The Chinese Affect Scale, the Center for Epidemiologic Studies Depression Scale (CES-D) and the Self-Esteem Scale (SES) were used to collect the data, which were described statistically and analyzed by multiple logistic regression using SPSS for Windows 19.0. Results: ① The 512 college students were 20.50 ± 1.49 years old on average. On average, they had used SNS for 7.16 ± 2.67 years, logged in to SNS 14.31 ± 15.96 times per day, and spent 2.81 ± 2.04 hours on SNS per day; the longest single session lasted 2.98 ± 2.76 hours. ② Logistic regression analysis on positive emotions showed that the OR of "engineering students" was 0.53 (P = 0.079) compared to "arts students". ③ Logistic regression analysis on negative emotions showed that the ORs of "age", "years of using SNS" and "time spent on SNS per day" were 1.14 (P = 0.063), 0.90 (P = 0.008) and 1.09 (P = 0.080). The OR of the students who could not stand to stop using SNS for a month was 2.41 (P = 0.003) compared to the students who would feel more relaxed. ④ Logistic regression analysis on CES-D showed that the OR of "years of using SNS" was 0.89 (P = 0.007). The ORs of the junior students and senior students were 1.69 (P = 0.086) and 2.74 (P = 0.002) compared to the freshmen. The ORs of the students who could not stand to stop using SNS for a month and of the students who did not care were 2.62 (P = 0.002) and 1.87 (P = 0.023) compared to the students who would feel more relaxed. ⑤ Logistic regression analysis on SES showed that the OR of "times logging in to SNS per day" was 1.01 (P = 0.056). The ORs of engineering students and science students were 0.56 (P = 0.046) and 0.49 (P = 0.028) compared to arts students. The OR of the students coming from cities was 1.27 (P = 0.032) compared to the students coming from towns and villages. Conclusion: ① Majors have an effect on positive emotions. Age, the years of using SNS, the time

  20. Simulation Experiments in Practice: Statistical Design and Regression Analysis

    Kleijnen, J.P.C.

    2007-01-01

    In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. The goal of this article is to change these traditional, naïve methods of design and analysis, because statistical theory proves that more information is obtained when applying Design Of Experiments (DOE) and linear regression analysis. Unfortunately, classic DOE and regression analysis assume a single simulation response that is normally and independen...

  1. Modeling the Philippines' real gross domestic product: A normal estimation equation for multiple linear regression

    Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.

    2016-02-01

    The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered as independent variables that can influence Real GDP in the Philippines (y): Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4). The researchers used a Normal Estimation Equation using matrices to create the model for Real GDP, with α = 0.01. They analyzed quarterly data from 1990 to 2013, acquired from the National Statistical Coordination Board (NSCB), for a total of 96 observations per variable. The data, particularly the dependent variable (y), underwent a logarithmic transformation to satisfy the assumptions of multiple linear regression analysis. The model was formulated using matrices in MATLAB. Based on the results, only three of the independent variables are significant for the dependent variable, namely Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), and hence can predict Real GDP (y). The coefficient of determination indicates that the independent variables explain 98.7% of the variation in the dependent variable. With 97.6% of the result in Paired T-Test, the predicted values obtained from the model showed no significant difference from the actual values of Real GDP. This research will be essential in appraising forthcoming changes to aid the Government in implementing policies for the development of the economy.
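
    The matrix form of the normal equations, beta = (X'X)^(-1) X'y, can be sketched as below. The authors worked in MATLAB on NSCB data; this Python version uses synthetic numbers, and the function name, the coefficients used to generate the data and the noise level are assumptions made purely for illustration.

```python
import numpy as np

def normal_equation_fit(X, y):
    """Solve (A'A) beta = A'y, where A is X with a leading column of ones."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.solve(A.T @ A, A.T @ y)
    fitted = A @ beta
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return beta, 1.0 - ss_res / ss_tot

# Synthetic stand-in for 96 quarterly observations of four predictors (x1..x4).
rng = np.random.default_rng(3)
X = rng.uniform(1, 10, size=(96, 4))
gdp = np.exp(0.5 + 0.30 * X[:, 0] + 0.05 * X[:, 1] + 0.20 * X[:, 2]
             - 0.10 * X[:, 3] + rng.normal(0, 0.05, 96))

# Regress the log-transformed dependent variable, as described in the abstract.
beta, r2 = normal_equation_fit(X, np.log(gdp))
```

    Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse; for badly conditioned predictors a QR- or SVD-based solver such as `np.linalg.lstsq` is usually the safer choice.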

  2. Multiattribute shopping models and ridge regression analysis

    Timmermans, HJP Harry

    1981-01-01

    Policy decisions regarding retailing facilities essentially involve multiple attributes of shopping centres. If mathematical shopping models are to contribute to these decision processes, their structure should reflect the multiattribute character of retailing planning. Examination of existing models shows that most operational shopping models include only two policy variables. A serious problem in the calibration of the existing multiattribute shopping models is that of multicollinearity ari...

  3. THE USE OF REGRESSION ANALYSIS IN MARKETING RESEARCH

    DUMIRESCU Luigi; Stanciu, Oana; Mihai TICHINDELEAN; Simona VINEREAN

    2012-01-01

    The purpose of the paper is to illustrate the applicability of the linear multiple regression model within a marketing research based on primary, quantitative data. The theoretical background of the developed regression model is the value-chain concept of relationship marketing. In this sense, the authors presume that the outcome variable of the model, the monetary value of one purchase, depends on the clients’ expectations regarding seven dimensions of the company’s offer. The paper is struc...

  4. Analysis of genome-wide association data by large-scale Bayesian logistic regression

    Wang Yuanjia; Sha Nanshi; Fang Yixin

    2009-01-01

    Abstract Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe multiple comparisons adjustment. Multivariate logistic regression is proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than the number of single-nucleotide polymorphisms (SNPs) or when correlation among SNPs is high, traditional multivariate logistic regression breaks down. To accommodate the scale of data fro...

  5. Nonparametric survival analysis using Bayesian Additive Regression Trees (BART).

    Sparapani, Rodney A; Logan, Brent R; McCulloch, Robert E; Laud, Purushottam W

    2016-07-20

    Bayesian additive regression trees (BART) provide a framework for flexible nonparametric modeling of relationships of covariates to outcomes. Recently, BART models have been shown to provide excellent predictive performance, for both continuous and binary outcomes, and exceeding that of its competitors. Software is also readily available for such outcomes. In this article, we introduce modeling that extends the usefulness of BART in medical applications by addressing needs arising in survival analysis. Simulation studies of one-sample and two-sample scenarios, in comparison with long-standing traditional methods, establish face validity of the new approach. We then demonstrate the model's ability to accommodate data from complex regression models with a simulation study of a nonproportional hazards scenario with crossing survival functions and survival function estimation in a scenario where hazards are multiplicatively modified by a highly nonlinear function of the covariates. Using data from a recently published study of patients undergoing hematopoietic stem cell transplantation, we illustrate the use and some advantages of the proposed method in medical investigations. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26854022

  6. The application of Stata multiple imputation command to analyze design of experiments with multiple regression

    Clara Novoa; Suleima Alkusari

    2012-01-01

    This talk exemplifies the application of the multiple imputation technique available in STATA to analyze a design of experiments with multiple responses and missing data. No imputation and multiple imputation methodologies are compared.

  7. SPECIFICS OF THE APPLICATIONS OF MULTIPLE REGRESSION MODEL IN THE ANALYSES OF THE EFFECTS OF GLOBAL FINANCIAL CRISES

    Željko V. Račić

    2010-12-01

    This paper aims to present the specifics of the application of the multiple linear regression model. The economic (financial) crisis is analyzed in terms of gross domestic product, which is a function of the foreign trade balance (on one hand) and credit cards, i.e. the indebtedness of the population on this basis (on the other hand), in the USA from 1999 to 2008. We used the extended application model, which shows how the analyst should run the whole development process of a regression model. This process began with simple statistical features and the application of regression procedures, and ended with residual analysis, intended for the study of the compatibility of data and model settings. This paper also analyzes the values of some standard statistics used in the selection of an appropriate regression model. Testing of the model is carried out with the use of the Statistics PASW 17 program.

  8. Applying Least Absolute Shrinkage Selection Operator and Akaike Information Criterion Analysis to Find the Best Multiple Linear Regression Models between Climate Indices and Components of Cow’s Milk

    Mohammad Reza Marami Milani

    2016-07-01

    This study focuses on multiple linear regression models relating six climate indices (temperature humidity THI, environmental stress ESI, equivalent temperature index ETI, heat load HLI, modified HLI (HLI new), and respiratory rate predictor RRP) with three main components of cow's milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage selection operator (LASSO) and the Akaike information criterion (AIC) techniques are applied to select the best model for milk predictands with the smallest number of climate predictors. Uncertainty estimation is employed by applying bootstrapping through resampling. Cross validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months from April to September, 2002 to 2010 are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors with p-value < 0.001 and R2 (0.50, 0.49, respectively). In summer, milk yield with independent variables of THI, ETI, and ESI shows the highest relation (p-value < 0.001) with R2 (0.69). For fat and protein the results are only marginal. This method is suggested for impact studies of climate variability/change in the agriculture and food science fields when short time series or data with large uncertainty are available.

  9. Stepwise Logistic Regression Analysis of Risk Factors of Multiple Organ Dysfunction Syndrome in Elderly

    谭清武; 李庆华

    2009-01-01

    Objective To study the risk factors of multiple organ dysfunction syndrome in the elderly (MODSE). Methods A retrospective study was conducted on the records of 393 retired military officers of division rank or above, aged over 60 and based in the Shijiazhuang area, who were hospitalized due to lung infection or developed lung infection in hospital from 2001 to 2006. According to whether the lung infection induced MODSE, the patients were divided into the MODSE group (n=169) and the non-MODSE group (n=224). Statistically significant risk factors were first screened by single factor analysis, and independent risk factors were then identified by stepwise logistic regression analysis. Results Single factor analysis showed that age, chronic obstructive pulmonary disease, chronic respiratory failure, pulmonary interstitial fibrosis, pulmonary heart disease, coronary heart disease, chronic cardiac insufficiency, cerebrovascular disease, cervical spondylosis, chronic hepatitis and cirrhosis, diabetes, hyperuricemia, chronic renal failure, malignant tumor, hemoglobin, albumin, urea nitrogen, creatinine and fasting blood glucose were risk factors of MODSE. Stepwise logistic regression analysis showed that chronic obstructive pulmonary disease, chronic respiratory failure, pulmonary fibrosis, chronic cardiac insufficiency, cerebrovascular disease, diabetes, chronic renal failure, low hemoglobin, low albumin, high urea nitrogen and high fasting blood glucose were independent risk factors of MODSE. Conclusion Chronic obstructive pulmonary disease, chronic respiratory failure, pulmonary fibrosis, chronic cardiac insufficiency, cerebrovascular disease, diabetes, chronic renal failure, low hemoglobin, low albumin, high urea nitrogen and high fasting blood glucose were independent risk factors of MODSE.

  10. Tightness of M-estimators for multiple linear regression in time series

    Johansen, Søren; Nielsen, Bent

    2016-01-01

    We show tightness of a general M-estimator for multiple linear regression in time series. The positive criterion function for the M-estimator is assumed lower semi-continuous and sufficiently large for large argument: Particular cases are the Huber-skip and quantile regression. Tightness requires an assumption on the frequency of small regressors. We show that this is satisfied for a variety of deterministic and stochastic regressors, including stationary and random walk regressors. The resul...

  11. Multivariate quantiles and multiple-output regression quantiles: from L1 optimization to halfspace depth

    Hallin, Marc; Paindaveine, Davy; Siman, Miroslav

    2008-01-01

    A new multivariate concept of quantile, based on a directional version of Koenker and Bassett’s traditional regression quantiles, is introduced for multivariate location and multiple-output regression problems. In their empirical version, those quantiles can be computed efficiently via linear programming techniques. Consistency, Bahadur representation and asymptotic normality results are established. Most importantly, the contours generated by those quantiles are shown to coincide with the cl...

  12. On asymptotics of t-type regression estimation in multiple linear model

    CUI Hengjian

    2004-01-01

    We consider a robust estimator (t-type regression estimator) of the multiple linear regression model obtained by maximizing the marginal likelihood of a scaled t-type error t-distribution. The marginal likelihood can also be applied to the de-correlated response when the within-subject correlation can be consistently estimated from an initial estimate of the model based on the independent working assumption. This paper shows that such a t-type estimator is consistent.

  13. Neutron multiplicity analysis tool

    Stewart, Scott L [Los Alamos National Laboratory

    2010-01-01

    I describe the capabilities of the EXCOM (EXcel based COincidence and Multiplicity) calculation tool which is used to analyze experimental data or simulated neutron multiplicity data. The input to the program is the count-rate data (including the multiplicity distribution) for a measurement, the isotopic composition of the sample and relevant dates. The program carries out deadtime correction and background subtraction and then performs a number of analyses. These are: passive calibration curve, known alpha and multiplicity analysis. The latter is done with both the point model and with the weighted point model. In the current application EXCOM carries out the rapid analysis of Monte Carlo calculated quantities and allows the user to determine the magnitude of sample perturbations that lead to systematic errors. Neutron multiplicity counting is an assay method used in the analysis of plutonium for safeguards applications. It is widely used in nuclear material accountancy by international (IAEA) and national inspectors. The method uses the measurement of the correlations in a pulse train to extract information on the spontaneous fission rate in the presence of neutrons from (α,n) reactions and induced fission. The measurement is relatively simple to perform and gives results very quickly (≤ 1 hour). By contrast, destructive analysis techniques are extremely costly and time consuming (several days). By improving the achievable accuracy of neutron multiplicity counting, a nondestructive analysis technique, it could be possible to reduce the use of destructive analysis measurements required in safeguards applications. The accuracy of a neutron multiplicity measurement can be affected by a number of variables such as density, isotopic composition, chemical composition and moisture in the material. In order to determine the magnitude of these effects on the measured plutonium mass a calculational tool, EXCOM, has been produced using VBA within Excel. This

  14. An automatic method for producing robust regression models from hyperspectral data using multiple simple genetic algorithms

    Sykas, Dimitris; Karathanassi, Vassilia

    2015-06-01

    This paper presents a new method for automatically determining the optimum regression model, which enables the estimation of a parameter. The concept is based on the combination of k spectral pre-processing algorithms (SPPAs) that enhance spectral features correlated to the desired parameter. Initially a pre-processing algorithm uses as input a single spectral signature and transforms it according to the SPPA function. A k-step combination of SPPAs uses k preprocessing algorithms serially. The result of each SPPA is used as input to the next SPPA, and so on until the k desired pre-processed signatures are reached. These signatures are then used as input to three different regression methods: the Normalized band Difference Regression (NDR), the Multiple Linear Regression (MLR) and the Partial Least Squares Regression (PLSR). Three Simple Genetic Algorithms (SGAs) are used, one for each regression method, for the selection of the optimum combination of k SPPAs. The performance of the SGAs is evaluated based on the RMS error of the regression models. The evaluation indicates not only the optimum SPPA combination but also the regression method that produces the optimum prediction model. The proposed method was applied to soil spectral measurements in order to predict Soil Organic Matter (SOM). In this study, the maximum value assigned to k was 3. PLSR yielded the highest accuracy while NDR's accuracy was satisfactory compared to its complexity. The MLR method showed severe drawbacks due to the presence of noise in terms of collinearity at the spectral bands. Most of the regression methods required a 3-step combination of SPPAs to achieve the highest performance. The selected preprocessing algorithms were different for each regression method, since each regression method handles the explanatory variables in a different way.

  15. Research on the impact factors of domestic old people's tourism consumption through multiple stepwise regression analysis

    章杰宽

    2011-01-01

    As population aging in our country becomes more and more pronounced, tourism for the elderly is rapidly becoming an important part of the tourism market. Experience and theory of tourism behavior have shown that travel frequency, residence time and amount of tourism consumption are the main indicators for measuring the attractiveness of a tourism destination. This paper presents an empirical study, carried out over more than two months of visits and surveys, based on questionnaires administered to elderly tourists at 12 main tourist attractions in Xi'an. Based on 800 questionnaires, the paper analyses the factors influencing the consumption behavior of domestic elderly tourists and employs multiple stepwise regression analysis to study the degree of influence of each factor. The results show that 13 main factors affect the travel behavior of older people: physical condition, income, attitude toward tourism, spouse, attitude of sons and daughters, related groups, tourism prices, distance, security, climatic conditions, food and accommodation, transport and tourism attraction. Among these factors, income and tourism attraction are the common factors affecting elderly tourists' travel frequency, residence time and amount of consumption per day, with income being the most critical. Specifically, elderly tourists' travel frequency is directly proportional to income, attitude toward tourism, attitude of sons and daughters, physical condition and tourism attraction, and is inversely proportional to distance. The old tourists' residence

  16. Multiple Linear Regression Analysis of Quality of Life in Children with Cerebral Palsy

    万瑞平; 刘振寰; 林青梅

    2011-01-01

    Objective To analyze the correlative factors influencing quality of life (QOL) in children with cerebral palsy (CP). Methods Eighty children with CP (CP group) and 80 healthy children (healthy control group) were evaluated with the Pediatric Quality of Life Inventory Version 4 (PedsQL4.0) to assess their QOL, and the differences in QOL were compared between the 2 groups. Children with CP were also assessed using the Gesell Developmental Scale (GDS) and the Gross Motor Function Classification System (GMFCS) to test their developmental quotient and severity, and the correlations among QOL, sex, family income, clinical type, GMFCS level and intelligence capacity were analyzed by multiple regression analysis. Results There were significant differences in physical function/aspect, emotional function, social function, psychological aspect and total QOL between the CP group and the healthy control group (all P < 0.01). Intelligence degree was positively correlated with the total QOL score. Severity degree and intelligence degree were positively correlated with the physical aspect, and age was negatively correlated with the physical aspect, while severity degree affected the physical aspect most. Intelligence degree was positively correlated with the psychological aspects. Conclusions The QOL of children with CP is impaired across the full scale. Intelligence capacity and physical function are important factors influencing the QOL of children with CP.

  17. Calculation of U, Ra, Th and K contents in uranium ore by multiple linear regression method

    A multiple linear regression method was used to compute γ spectra of uranium ore samples and to calculate contents of U, Ra, Th, and K. In comparison with the inverse matrix method, its advantage is that no standard samples of pure U, Ra, Th and K are needed for obtaining response coefficients

  18. Tumor regression of multiple bone metastases from breast cancer after administration of strontium-89 chloride (Metastron)

    We report a case of tumor regression of multiple bone metastases from breast carcinoma after administration of strontium-89 chloride. This case suggests that strontium-89 chloride can not only relieve bone metastases pain not responsive to analgesics, but may also have a tumoricidal effect on bone metastases

  19. A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants

    Cooper, Paul D.

    2010-01-01

    A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…

  20. Regression calibration for classical exposure measurement error in environmental epidemiology studies using multiple local surrogate exposures.

    Bateson, Thomas F; Wright, J Michael

    2010-08-01

    Environmental epidemiologic studies are often hierarchical in nature if they estimate individuals' personal exposures using ambient metrics. Local samples are indirect surrogate measures of true local pollutant concentrations which estimate true personal exposures. These ambient metrics include classical-type nondifferential measurement error. The authors simulated subjects' true exposures and their corresponding surrogate exposures as the mean of local samples and assessed the amount of bias attributable to classical and Berkson measurement error on odds ratios, assuming that the logit of risk depends on true individual-level exposure. The authors calibrated surrogate exposures using scalar transformation functions based on observed within- and between-locality variances and compared regression-calibrated results with naive results using surrogate exposures. The authors further assessed the performance of regression calibration in the presence of Berkson-type error. Following calibration, bias due to classical-type measurement error, resulting in as much as 50% attenuation in naive regression estimates, was eliminated. Berkson-type error appeared to attenuate logistic regression results less than 1%. This regression calibration method reduces effects of classical measurement error that are typical of epidemiologic studies using multiple local surrogate exposures as indirect surrogate exposures for unobserved individual exposures. Berkson-type error did not alter the performance of regression calibration. This regression calibration method does not require a supplemental validation study to compute an attenuation factor. PMID:20573838
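
    The calibration step described in this record can be sketched under strong simplifying assumptions. The sketch uses the textbook attenuation factor for classical error, λ = σ²_between / (σ²_between + σ²_within / n), and shrinks each locality mean toward the grand mean by λ. The abstract does not give the authors' exact transformation, so the function name, the equal-sample-size assumption and the toy data are all hypothetical.

```python
from statistics import mean, variance


def calibrate_surrogates(samples_by_locality):
    """Return (lambda, calibrated locality means).

    Assumes every locality has the same number of samples and that the
    locality mean carries classical (additive, independent) error.
    """
    means = [mean(s) for s in samples_by_locality]
    n = len(samples_by_locality[0])
    # Average within-locality sample variance (equal sample sizes assumed).
    within = mean(variance(s) for s in samples_by_locality)
    # The variance of the locality means estimates between + within / n.
    var_of_means = variance(means)
    between = max(var_of_means - within / n, 0.0)
    lam = between / var_of_means if var_of_means > 0 else 0.0
    grand = mean(means)
    # Shrink each surrogate toward the grand mean by the attenuation factor.
    calibrated = [grand + lam * (m - grand) for m in means]
    return lam, calibrated


# Toy data (hypothetical): three localities with two samples each.
localities = [[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]]
lam, calibrated = calibrate_surrogates(localities)
print(lam, calibrated)
```

    Regressing an outcome on the calibrated values instead of the raw locality means is what counteracts the attenuation. The record applies the idea to logistic models, where the correction is approximate rather than exact.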

  1. Regional flood frequency analysis using spatial proximity and basin characteristics: Quantile regression vs. parameter regression technique

    Ahn, Kuk-Hyun; Palmer, Richard

    2016-09-01

    Despite wide use of regression-based regional flood frequency analysis (RFFA) methods, the majority are based on either ordinary least squares (OLS) or generalized least squares (GLS). This paper proposes 'spatial proximity' based RFFA methods using the spatial lagged model (SLM) and spatial error model (SEM). The proposed methods are represented by two frameworks: the quantile regression technique (QRT) and parameter regression technique (PRT). The QRT develops prediction equations for flooding quantiles in average recurrence intervals (ARIs) of 2, 5, 10, 20, and 100 years whereas the PRT provides prediction of three parameters for the selected distribution. The proposed methods are tested using data incorporating 30 basin characteristics from 237 basins in Northeastern United States. Results show that generalized extreme value (GEV) distribution properly represents flood frequencies in the study gages. Also, basin area, stream network, and precipitation seasonality are found to be the most effective explanatory variables in prediction modeling by the QRT and PRT. 'Spatial proximity' based RFFA methods provide reliable flood quantile estimates compared to simpler methods. Compared to the QRT, the PRT may be recommended due to its accuracy and computational simplicity. The results presented in this paper may serve as one possible guidepost for hydrologists interested in flood analysis at ungaged sites.

  2. Innovation and market value: a quantile regression analysis

    Alex Coad; Rekha Rao

    2006-01-01

    We construct a new database by matching firm-level Compustat data to NBER patent data, for four 2-digit complex technology sectors. Whilst conventional regression estimators show that the stock market does recognise efforts at innovation, quantile regression analysis adds a new dimension to the literature, suggesting that the influence of innovation on market value varies dramatically across the market value distribution. For firms with a low value of Tobin's q, the stock market will barely r...

  3. Background stratified Poisson regression analysis of cohort data

    Richardson, David B.; Langholz, Bryan

    2011-01-01

    Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approa...

  4. Linear regression and sensitivity analysis in nuclear reactor design

    Highlights: • Presented a benchmark for the applicability of linear regression to complex systems. • Applied linear regression to a nuclear reactor power system. • Performed neutronics, thermal–hydraulics, and energy conversion using Brayton's cycle for the design of a GCFBR. • Performed detailed sensitivity analysis to a set of parameters in a nuclear reactor power system. • Modeled and developed reactor design using MCNP, regression using R, and thermal–hydraulics in Java. - Abstract: The paper presents a general strategy applicable for sensitivity analysis (SA) and uncertainty quantification analysis (UA) of parameters related to a nuclear reactor design. This work also validates the use of linear regression (LR) for predictive analysis in a nuclear reactor design. The analysis helps to determine the parameters on which an LR model can be fit for predictive analysis. For those parameters, a regression surface is created based on trial data and predictions are made using this surface. A general SA strategy for identifying the influential parameters that affect the operation of the reactor is described. Identification of design parameters and validation of the linearity assumption for the application of LR to reactor design is performed based on a set of tests. The testing methods used to determine the behavior of the parameters can be used as a general strategy for UA and SA of nuclear reactor models and thermal hydraulics calculations. A design of a gas cooled fast breeder reactor (GCFBR), with thermal–hydraulics and energy transfer, has been used for the demonstration of this method. MCNP6 is used to simulate the GCFBR design and perform the necessary criticality calculations. Java is used to build and run input samples and to extract data from the output files of MCNP6, and R is used to perform regression analysis, other multivariate variance analysis, and analysis of the collinearity of data

  5. Sintering equation: determination of its coefficients by experiments - using multiple regression

    Sintering is a method for volume-compression (or volume-contraction) of powdered or grained material by applying high temperature (below the melting point of the material). Maekipirtti tried to find an equation which describes the process of sintering by its main parameters: sintering time, sintering temperature and volume contraction. Such an equation is called a sintering equation. It also contains some coefficients which characterise the behaviour of the material during the process of sintering. These coefficients have to be determined by experiments. Here we show that some linear regressions will produce wrong coefficients, but multiple regression results in a useful sintering equation. (orig.)

  6. Multiple regression models for the prediction of the maximum obtainable thermal efficiency of organic Rankine cycles

    Larsen, Ulrik; Pierobon, Leonardo; Wronski, Jorrit;

    2014-01-01

    power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary...... conditions of the process. Hundreds of optimised cases with varied design parameters are used as observations in four multiple regression analyses. We analyse the model assumptions, prediction abilities and extrapolations, and compare the results with recent studies in the literature. The models are in...

  7. Multiple Stepwise Regression Analysis on Affecting Factors of Allergy in Sub-Health Scale

    崔利宏; 何裕民; 倪红梅

    2013-01-01

    Objective: To study the factors affecting allergy in sub-health people and to prevent the occurrence of allergy. Methods: Possible factors affecting allergy in 6,975 sub-health people were screened using the multiple stepwise regression method. Thirteen factors entered the multiple stepwise regression model, namely: education level, age, fatigue, digestion, sleep, autonomic nervous function, immunity, aging, constipation, depression, learning, memory, self-realization and sex life. Results: Statistical results showed that allergy was correlated with 19 aspects of four areas (body performance, psychological performance, social adaptation and sex life), as well as with age and education level. Among these, allergy was positively correlated with the 19 aspects of the four areas and with education level, and negatively correlated with age, all with high statistical significance (P < 0.01). Conclusion: The prevention of allergy should focus on overall adjustment and on strengthening people's physique. In clinical practice, the factors affecting allergy should be fully considered in order to avoid missed diagnosis, misdiagnosis, delayed treatment and reduced quality of life.

  8. Ratio Versus Regression Analysis: Some Empirical Evidence in Brazil

    Newton Carneiro Affonso da Costa Jr.

    2004-06-01

    This work compares the traditional methodology for ratio analysis, applied to a sample of Brazilian firms, with the alternative of regression analysis, applied to both cross-industry and intra-industry samples. The structural validity of the traditional methodology was tested through a model that represents its analogous regression format. The data are from 156 Brazilian public companies in nine industrial sectors for the year 1997. The results provide weak empirical support for the traditional ratio methodology, as it was verified that the validity of this methodology may differ between ratios.

  9. Time series analysis using semiparametric regression on oil palm production

    Yundari, Pasaribu, U. S.; Mukhaiyar, U.

    2016-04-01

    This paper presents a semiparametric kernel regression method, which has shown flexibility and ease of mathematical calculation, especially in estimating density and regression functions. The kernel function is continuous and produces a smooth estimate. The classical kernel density estimator is constructed by completely nonparametric analysis and works reasonably well for all forms of function. Here, we discuss parameter estimation in time series analysis. First, we assume that the parameters exist; then we use nonparametric estimation, and the combined approach is called semiparametric. The optimum bandwidth is selected by considering an approximation of the Mean Integrated Squared Error (MISE).
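
    As an illustration of the kernel smoothing idea in this record, the following is a minimal sketch of the standard Nadaraya-Watson estimator with a Gaussian kernel. It is a generic nonparametric estimator, not the authors' semiparametric method. The bandwidth is passed in by hand rather than chosen by minimising an MISE approximation, and the function names and toy data are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, a continuous and smooth kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys, with weights centred at x0.

    Raises ZeroDivisionError if x0 is so far from every x that all
    kernel weights underflow to zero.
    """
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Toy data (hypothetical): a noiseless linear trend.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
estimate = nadaraya_watson(2.0, xs, ys, bandwidth=1.0)
print(estimate)
```

    A smaller bandwidth follows the data more closely and a larger one gives a smoother curve, which is why the bandwidth choice mentioned in the abstract matters.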

  10. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    Ulbrich, N.; Bader, Jon B.

    2010-01-01

    Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.

  11. QSAR study of prolylcarboxypeptidase inhibitors by genetic algorithm: Multiple linear regressions

    Eslam Pourbasheer; Saadat Vahdani; Reza Aalizadeh; Alireza Banaei; Mohammad Reza Ganjali

    2015-07-01

    The predictive analysis based on quantitative structure activity relationships (QSAR) on benzimidazolepyrrolidinyl amides as prolylcarboxypeptidase (PrCP) inhibitors was performed. Molecules were represented by chemical descriptors that encode constitutional, topological, geometrical, and electronic structure features. The hierarchical clustering method was used to classify the dataset into training and test subsets. The important descriptors were selected with the aid of the genetic algorithm method. The QSAR model was constructed using multiple linear regression (MLR), and its robustness and predictability were verified by internal and external cross-validation methods. Furthermore, the calculation of the domain of applicability defines the area of reliable predictions. The root mean square errors (RMSE) of the training set and the test set for the GA-MLR model were calculated to be 0.176 and 0.279, and the correlation coefficients ($R^2$) were obtained to be 0.839 and 0.923, respectively. The proposed model has good stability, robustness and predictability when verified by internal and external validation.

  12. Multiple regression as a preventive tool for determining the risk of Legionella spp.

    Enrique Gea-Izquierdo

    2012-04-01

    To determine the interrelationship between health and hygiene conditions for the prevention of legionellosis, the composition of materials used in water distribution systems, the water origin and Legionella pneumophila risk. Material and methods. A descriptive study and multiple regression analysis were carried out on a sample of golf course sprinkler irrigation systems (n=31) pertaining to hotels located on the Costa del Sol (Malaga, Spain). The study was carried out in 2009. Results. There was a significant linear relation, with all the independent variables contributing significantly (p<0.05) to the model's fit. The relationship between water type and the risk of Legionella, as well as between the material composition and that risk, is linear and positive. In contrast, the relationship between health-hygiene conditions and Legionella risk is linear and negative. Conclusion. The characterization of Legionella pneumophila concentration, as defined by the risk in water and through use of the predictive method, can contribute to the consideration of new influence variables in the development of the agent, resulting in improved control and prevention of the disease.

  13. MANUFACTURING AND CONTINUOUS IMPROVEMENT AREAS USING PARTIAL LEAST SQUARE PATH MODELING WITH MULTIPLE REGRESSION COMPARISON

    Carlos Monge Perry

    2014-07-01

    Structural equation modeling (SEM) has traditionally been deployed in areas of marketing, consumer satisfaction and preferences, human behavior, and recently in strategic planning. These areas are considered its niches; however, there is a remarkable tendency in empirical research studies that indicates a more diversified use of the technique. This paper shows the application of structural equation modeling using partial least squares (PLS-SEM) in areas of manufacturing, quality, continuous improvement, operational efficiency, and environmental responsibility in Mexico's medium and large manufacturing plants, while using a small sample (n = 40). The results obtained from this PLS-SEM application are highly positive, relevant, and statistically significant. Also shown in this paper, to confirm the validity, reliability, and statistical power of PLS-SEM, is a comparative analysis against multiple regression, which gives results very similar to those obtained by PLS-SEM. This fact validates the use of PLS-SEM in nontraditional areas of scientific research, and invites the use of the technique in more diverse fields of scientific research.

  14. Multiple regression method to determine aerosol optical depth in atmospheric column in Penang, Malaysia

    Aerosol optical depth (AOD) from AERONET data has a very fine resolution, but air pollution index (API), visibility and relative humidity from ground truth measurements are coarse. To obtain the local AOD in the atmosphere, the relationship between AOD and these three parameters was determined using multiple regression analysis. Data from the southwest monsoon period (August to September, 2012) taken in Penang, Malaysia, were used to establish a quantitative relationship in which the AOD is modeled as a function of API, relative humidity, and visibility. The model with the highest correlation was used to predict AOD values during the southwest monsoon period. When aerosol is not uniformly distributed in the atmosphere, the predicted AOD can deviate strongly from the measured values. These deviating data can be removed by comparing the predicted AOD values with the actual AERONET data, which helps to investigate whether the non-uniform source of the aerosol is at the ground surface or at a higher altitude. The model can predict AOD accurately only if the aerosol is uniformly distributed in the atmosphere. Further study is needed to determine whether the model is suitable for predicting AOD not only in Penang but also in other states of Malaysia, or even globally.

  15. Multiple regression models for the prediction of the maximum obtainable thermal efficiency of organic Rankine cycles

    Much attention is focused on increasing the energy efficiency to decrease fuel costs and CO2 emissions throughout industrial sectors. The ORC (organic Rankine cycle) is a relatively simple but efficient process that can be used for this purpose by converting low and medium temperature waste heat to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary conditions of the process. Hundreds of optimised cases with varied design parameters are used as observations in four multiple regression analyses. We analyse the model assumptions, prediction abilities and extrapolations, and compare the results with recent studies in the literature. The models are in agreement with the literature, and they present an opportunity for accurate prediction of the potential of an ORC to convert heat sources with temperatures from 80 to 360 °C, without detailed knowledge or need for simulation of the process. - Highlights: • The maximum thermal efficiency of ORCs in hundreds of cases was analysed. • Multiple regression models were derived to predict the maximum obtainable efficiency of ORCs. • Using only key design parameters, the maximum obtainable efficiency can be evaluated. • The regression models decrease the resources needed to evaluate the maximum potential. • The models are statistically strong and in good agreement with the literature

  16. User's Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)

    Eng, Ken; Chen, Yin-Yu; Kiang, Julie E.

    2009-01-01

    Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce and familiarize the user with the weighted multiple-linear regression (WREG) program, and to also provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages with short records. The regional estimation equation results from a multiple-linear regression that relates the observable basin characteristics, such as drainage area, to streamflow characteristics.
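
    The core idea of a weighted multiple-linear regression can be sketched as solving the weighted normal equations (X'WX)b = X'Wy. The code below is a minimal pure-Python sketch under that assumption; it is not the WREG program's implementation, and the function names, the weights, and the data are hypothetical.

```python
def solve_augmented(M):
    """Solve the augmented system [A | b] by Gauss-Jordan elimination."""
    k = len(M)
    for c in range(k):
        # Partial pivoting: move the largest entry in column c to the diagonal.
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def weighted_least_squares(X, y, w):
    """Return [intercept, slope_1, ...] minimizing sum of w_i * residual_i^2."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    n, k = len(y), len(A[0])
    M = [
        [sum(w[r] * A[r][i] * A[r][j] for r in range(n)) for j in range(k)]
        + [sum(w[r] * A[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    return solve_augmented(M)

# Made-up example: one predictor, last observation given zero weight.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [3.0, 5.0, 7.0, 100.0]
w = [1.0, 1.0, 1.0, 0.0]
beta = weighted_least_squares(X, y, w)
print(beta)
```

    In a regional setting the weights would typically reflect how reliable each site's estimate is, for example through record length; how WREG constructs its weights is described in the guide itself, not here.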

  17. Analysis of stroke incidence impact factors based on stepwise multiple regression

    王建芳

    2013-01-01

    This paper analyzes the factors that influence stroke incidence. First, a large body of case information was processed through statistical analysis. Then a mathematical model was built by regression fitting, establishing the relationship between stroke incidence and air temperature, barometric pressure and humidity. Finally, some suggestions are made for high-risk groups. Together these give a complete answer to Problem C of the 2012 Higher Education Press Cup National Mathematical Contest in Modeling.

  18. Sparse Regression by Projection and Sparse Discriminant Analysis

    Qi, Xin

    2015-04-03

    © 2015, © American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.

  19. The use of weighted multiple linear regression to estimate QTL-by-QTL epistatic effects

    Jan Bocianowski

    2012-01-01

    Knowledge of the nature and magnitude of gene effects, as well as their contribution to the control of metric traits, is important in formulating efficient breeding programs for the improvement of plant genetics. Information concerning a genetic parameter such as the additive-by-additive epistatic effect can be useful in traditional breeding. This report describes the results obtained by applying weighted multiple linear regression to estimate the parameter connected with an additive-by-addit...

  20. Multiple Linear Regression Application on the Inter-Network Settlement of Internet

    YANG Qing-feng; ZHANG Qi-xiang; LÜ Ting-jie

    2006-01-01

    This paper develops an analytical framework to explain Internet interconnection settlement issues. The paper shows that multiple linear regression can be used in assessing the network value of Internet Backbone Providers (IBPs). By using the exchange rate of each network, we can define a rate of network value, which reflects the contribution of each network to interconnection and the interconnected network resource usage by each network.

  1. Regression analysis for solving diagnosis problem of children's health

    Cherkashina, Yu A.; Gerget, O. M.

    2016-04-01

    This paper reports research on the application of statistical techniques, namely regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, blood test parameters, gestational age, vascular endothelial growth factor) measured at 3-5 days of life. A detailed description of the studied medical data is given, and a binary logistic regression procedure is discussed. Basic results of the research are presented. A classification table of predicted and observed values is shown, and the overall percentage of correct recognition is determined. Regression equation coefficients are calculated, and the general regression equation is written from them. Based on the logistic regression results, ROC analysis was performed: the sensitivity and specificity of the model were calculated and ROC curves constructed. These mathematical techniques allow children's health to be diagnosed with a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have high practical importance in the professional activity of the author.

  2. Driven Factors Analysis of China’s Irrigation Water Use Efficiency by Stepwise Regression and Principal Component Analysis

    Renfu Jia; Shibiao Fang; Wenrong Tu; Zhilin Sun

    2016-01-01

    This paper introduces an integrated approach to find out the major factors influencing efficiency of irrigation water use in China. It combines multiple stepwise regression (MSR) and principal component analysis (PCA) to obtain more realistic results. In real world case studies, classical linear regression model often involves too many explanatory variables and the linear correlation issue among variables cannot be eliminated. Linearly correlated variables will cause the invalidity of the fac...

  3. Regression Analysis: Instructional Resource for Cost/Managerial Accounting

    Stout, David E.

    2015-01-01

    This paper describes a classroom-tested instructional resource, grounded in principles of active learning and constructivism, that embraces two primary objectives: "demystify" for accounting students technical material from statistics regarding ordinary least-squares (OLS) regression analysis--material that students may find obscure or…

  4. Exploratory regression analysis: a tool for selecting models and determining predictor importance.

    Braun, Michael T; Oswald, Frederick L

    2011-06-01

    Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1). The program investigates all $2^p - 1$ submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits. PMID:21298571
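
    To make the all-submodels idea concrete, the sketch below enumerates every non-empty subset of p predictors (there are 2^p - 1 of them), fits each by ordinary least squares, and records R squared. It is an illustrative stand-in written for this listing, not the authors' Excel program; the function names and the toy data are assumptions.

```python
from itertools import combinations

def ols_r2(X, y):
    """Fit y on the columns of X plus an intercept; return R squared."""
    n = len(y)
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Augmented normal equations [A'A | A'y].
    M = [
        [sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)]
        + [sum(A[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    fitted = [sum(b * a for b, a in zip(beta, row)) for row in A]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def all_submodels(X, y):
    """Map each non-empty tuple of predictor indices to its R squared."""
    p = len(X[0])
    results = {}
    for size in range(1, p + 1):
        for subset in combinations(range(p), size):
            Xs = [[row[j] for j in subset] for row in X]
            results[subset] = ols_r2(Xs, y)
    return results

# Toy data: three predictors; y depends only on the first one.
X = [[1, 2, 1], [2, 1, 0], [3, 4, 1], [4, 3, 0], [5, 6, 1], [6, 5, 1]]
y = [2 + 3 * row[0] for row in X]
ranking = sorted(all_submodels(X, y).items(), key=lambda kv: -kv[1])
for subset, r2 in ranking:
    print(subset, round(r2, 3))
```

    Importance indices such as those reviewed in the article are built from comparisons across these submodel fits; the exhaustive search grows exponentially in p, so it is practical only for a modest number of predictors.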

  5. Multiple predictor smoothing methods for sensitivity analysis: Description of techniques

    The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (i) locally weighted regression (LOESS), (ii) additive models, (iii) projection pursuit regression, and (iv) recursive partitioning regression. Then, in the second and concluding part of this presentation, the indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present

  6. Early Parallel Activation of Semantics and Phonology in Picture Naming: Evidence from a Multiple Linear Regression MEG Study.

    Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf

    2015-10-01

    The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200-400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset. PMID:25005037

  7. Specification and sensitivity analysis of cross-country growth regressions

    Thanasis Stengos; Theofanis P. Mamuneas; Pantelis Kalaitzidakis

    2002-01-01

    We compare the sensitivity analysis of cross-country growth regressions based on extreme bounds analysis to a more direct specification testing approach using non-nested hypotheses tests. The results suggest that those specifications that are adequate are also those that include two of the only few conditioning variables that are found to be robust, namely the standard deviation of inflation and the standard deviation of domestic credit.

  8. Predicting manual arm strength: A direct comparison between artificial neural network and multiple regression approaches.

    La Delfa, Nicholas J; Potvin, Jim R

    2016-02-29

    In ergonomics, strength prediction has typically been accomplished using linked-segment biomechanical models, and independent estimates of strength about each axis of the wrist, elbow and shoulder joints. It has recently been shown that multiple regression approaches, using the simple task-relevant inputs of hand location and force direction, may be a better method for predicting manual arm strength (MAS) capabilities. Artificial neural networks (ANNs) also serve as a powerful data fitting approach, but their application to occupational biomechanics and ergonomics is limited. Therefore, the purpose of this study was to perform a direct comparison between ANN and regression models, by evaluating their ability to predict MAS with identical sets of development and validation MAS data. Multi-directional MAS data were obtained from 95 healthy female participants at 36 hand locations within the reach envelope. ANN and regression models were developed using a random, but identical, sample of 85% of the MAS data (n=456). The remaining 15% of the data (n=80) were used to validate the two approaches. When compared to the development data, the ANN predictions had a much higher explained variance (90.2% vs. 66.5%) and much lower RMSD (9.3N vs. 17.2N) than the regression model. The ANN also performed better with the independent validation data ($r^2$=78.6%, RMSD=15.1) compared to the regression approach ($r^2$=65.3%, RMSD=18.6N). These results suggest that ANNs provide a more accurate and robust alternative to regression approaches, and should be considered more often in biomechanics and ergonomics evaluations. PMID:26876987

  9. Asymptotic Properties of Criteria for Selection of Variables in Multiple Regression

    Nishii, Ryuei

    1984-01-01

    In normal linear regression analysis, many model selection rules proposed from various viewpoints are available. For the information criteria AIC, FPE, $C_p$, PSS and BIC, the asymptotic distribution of the selected model and the asymptotic quadratic risk based on each criterion are explicitly obtained.
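
    For orientation, the sketch below computes commonly used forms of four of the criteria named in this record (AIC, BIC, FPE and Mallows' $C_p$) from a model's residual sum of squares. The formulas are the usual textbook versions for normal linear regression, given up to additive constants, and may differ in detail from the definitions used in the paper; PSS is omitted because it needs leave-one-out residuals. The function name and all numbers are made up.

```python
import math

def selection_criteria(rss, n, k, sigma2_full):
    """Criteria for a model with k coefficients fitted to n observations.

    rss          residual sum of squares of the candidate model
    sigma2_full  error variance estimate from the largest model (for Cp)
    """
    aic = n * math.log(rss / n) + 2 * k
    bic = n * math.log(rss / n) + k * math.log(n)
    fpe = (rss / n) * (n + k) / (n - k)
    cp = rss / sigma2_full - n + 2 * k
    return {"AIC": aic, "BIC": bic, "FPE": fpe, "Cp": cp}

# Two hypothetical candidate models fitted to the same 20 observations.
small = selection_criteria(rss=10.0, n=20, k=3, sigma2_full=0.5)
large = selection_criteria(rss=9.5, n=20, k=5, sigma2_full=0.5)
for name in ("AIC", "BIC", "FPE", "Cp"):
    print(name, round(small[name], 3), round(large[name], 3))
```

    Smaller values are preferred for each criterion. BIC's penalty grows with log n while the others keep a fixed penalty per coefficient, which is the kind of difference that asymptotic comparisons of selection rules examine.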

  10. Early cost estimating for road construction projects using multiple regression techniques

    Ibrahim Mahamid

    2011-12-01

    The objective of this study is to develop early cost estimating models for road construction projects using multiple regression techniques, based on 131 sets of data collected in the West Bank in Palestine. As cost estimates are required at the early stages of a project, consideration was given to the fact that the input data for the required regression model could be easily extracted from sketches or the scope definition of the project. Eleven regression models are developed to estimate the total cost of a road construction project in US dollars; 5 of them include bid quantities as input variables and 6 include road length and road width. The coefficient of determination $r^2$ for the developed models ranges from 0.92 to 0.98, which indicates that the values predicted by the models fit the real-life data well. The mean absolute percentage error (MAPE) of the developed regression models ranges from 13% to 31%; these results compare favorably with past research, which has shown that estimate accuracy in the early stages of a project is between ±25% and ±50%.
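
    The mean absolute percentage error quoted in this record is the average of |actual - predicted| / |actual|, expressed in percent. A minimal sketch follows; the function name and the cost figures are invented for illustration and are not taken from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical project costs (US dollars) and model estimates.
actual_cost = [100_000.0, 200_000.0, 400_000.0]
estimated_cost = [110_000.0, 180_000.0, 400_000.0]
print(f"MAPE = {mape(actual_cost, estimated_cost):.1f}%")
```

    Because each error is scaled by the actual value, MAPE weights a given dollar error more heavily on small projects than on large ones.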

  11. About the multiple linear regressions applied in studying the solvatochromic effects.

    Dorohoi, Dana-Ortansa

    2010-03-01

    Statistical analysis is applied to study solvatochromic effects using the solvent parameters (regressors) that influence the spectral shifts in electronic spectra. The data indicated which non-significant parameters and aberrant points (for which supplemental interactions were neglected in the theories used) should be eliminated from those subjected to multi-linear regression. A BASIC program allows these steps to be followed one by one. To exemplify the steps of the regression, the wavenumbers of the maximum of the π-π* absorption band of three benzene derivatives in various solvents were used. PMID:20089443

  12. Multiple regression analysis of urinary fluoride, saliva and plaque fluoride levels of adolescents with dental fluorosis

    于阳阳; 赵伟; 刘晓燕; 邹冬荣; 杨晓昀; 刘荣; 于晓峰; 营杰

    2016-01-01

    Objective To study the correlation between dental fluorosis, saliva and plaque fluoride levels and urinary fluoride values in adolescents with dental fluorosis. Methods A middle school was chosen as the survey point. Two hundred adolescents were examined for the degree of dental fluorosis by Dean's method and divided into four groups according to the severity of fluorosis (n = 52, 40, 28 and 80). A fluoride ion-specific electrode was used to measure the fluoride levels in dental plaque, saliva, urine and drinking water. The differences were analyzed by ANOVA. Correlations between the fluoride levels in dental plaque, saliva and urine and the degree of dental fluorosis were analyzed by multiple linear regression. Results The average fluoride content of drinking water was (2.20 ± 0.40) mg/L. Compared with controls, the fluoride concentrations in dental plaque, saliva and urine were higher in the light, medium and severe dental fluorosis groups [(1.55 ± 0.88), (1.94 ± 0.77), (2.74 ± 0.83) vs. (0.32 ± 0.20) mg/L; (4.44 ± 1.62), (8.09 ± 0.93), (10.72 ± 0.99) vs. (0.02 ± 0.01) mg/L; (31.77 ± 6.09), (57.98 ± 1.83), (65.98 ± 2.78) vs. (13.06 ± 2.11) μg/g, all P<0.05]. Urinary fluoride was correlated with fluoride in saliva and dental plaque (r = 0.245, 0.440, all P<0.05). Saliva fluoride was correlated with fluoride in dental plaque (r = 0.849, P<0.01). The degree of dental fluorosis was correlated with fluoride in urine and saliva (r = 0.497, 0.896, 0.924, all P<0.01). The multiple linear regression equation between fluoride in urine and the degree of dental fluorosis, fluoride in dental plaque and saliva was as follows: $y = 1.357 + 1.618x_1 + 0.001x_2 - 0.331x_3 \pm 0.69$. Conclusions The metabolism of fluoride in the body is related to the oral fluoride repository in adolescents with dental fluorosis. Fluoride in urine is influenced by plaque fluoride level, saliva fluoride concentration and the degree of dental

  13. Multiple trait model combining random regressions for daily feed intake with single measured performance traits of growing pigs

    Künzi Niklaus

    2002-01-01

    A random regression model for daily feed intake and a conventional multiple trait animal model for the four traits average daily gain on test (ADG), feed conversion ratio (FCR), carcass lean content and meat quality index were combined to analyse data from 1 449 castrated male Large White pigs performance tested in two French central testing stations in 1997. Group housed pigs fed ad libitum with electronic feed dispensers were tested from 35 to 100 kg live body weight. A quadratic polynomial in days on test was used as a regression function for weekly means of daily feed intake and to describe its residual variance. The same fixed (batch) and random (additive genetic, pen and individual permanent environmental) effects were used for regression coefficients of feed intake and single measured traits. Variance components were estimated by means of a Bayesian analysis using Gibbs sampling. Four Gibbs chains were run for 550 000 rounds each, from which 50 000 rounds were discarded as the burn-in period. Estimates of posterior means of covariance matrices were calculated from the remaining two million samples. Low heritabilities of linear and quadratic regression coefficients and their unfavourable genetic correlations with other performance traits reveal that altering the shape of the feed intake curve by direct or indirect selection is difficult.

  14. Prediction of the NO2 concentration data in an urban area using multiple regression and neuronal networks

    Dragomir, Carmelia Mariana; Voiculescu, Mirela; Constantin, Daniel-Eduard; Georgescu, Lucian Puiu

    2015-12-01

    The probability of exceeding EU limit values for NO2 concentrations has increased in many European cities. Meteorological parameters have an extremely important role in evaluating the dispersion of pollutants in various city areas. This paper focuses on meteorological variations and their impact on urban background NO2 concentrations in the city of Braila for 2009-2013. The dependence between measured NO2 data and meteorological parameters is analyzed using two modeling methods: multiple linear regression and artificial neural networks. The results calculated using the proposed models indicate that artificial neural networks can be applied in the analysis and forecasting of air quality.

  15. Optimization of rheological parameter for micro-bubble drilling fluids by multiple regression experimental design

    郑力会; 王金凤; 李潇鹏; 张燕; 李都

    2008-01-01

    In order to optimize the formula of a circulating micro-bubble drilling fluid with a plastic viscosity of 18 mPa·s, orthogonal and uniform experimental design methods were applied, and the plastic viscosities of 36 and 24 groups of agents were tested, respectively. These two experimental design methods show drawbacks: the amount of agent is difficult to determine, and the results are not fully optimized. Therefore, a multiple regression experimental method was used to design the experimental formula. By randomly selecting agents with amounts within the recommended range, 17 groups of drilling fluid formulas were designed, and the plastic viscosity of each experimental formula was measured. With plastic viscosity set as the objective function, a quadratic regression model was obtained through multiple regression, and its correlation coefficient meets the requirement. With target values of plastic viscosity set to 18, 20 and 22 mPa·s, respectively, 5 drilling fluid formulas were obtained by the trial method with accuracies of 0.000 3, 0.000 1 and 0.000 3. Two formulas were arbitrarily selected for each target value for experimental verification of the drilling fluid; the errors between theoretical and measured plastic viscosity are less than 5%, confirming that the regression model can be applied to optimizing the plastic viscosity of circulating micro-bubble drilling fluid. Depending on the precision required and other constraints on different drilling fluid formulations, the method allows the parameters of circulating micro-bubble drilling fluid to be optimized.

  16. Variable selection in multiple linear regression: The influence of individual cases

    SJ Steel

    2007-12-01

    The influence of individual cases in a data set is studied when variable selection is applied in multiple linear regression. Two different influence measures, based on the C_p criterion and Akaike's information criterion, are introduced. The relative change in the selection criterion when an individual case is omitted is proposed as the selection influence of the specific omitted case. Four standard examples from the literature are considered and the selection influence of the cases is calculated. It is argued that the selection procedure may be improved by taking the selection influence of individual data cases into account.

  17. Estimation of mass flow of seeds using fibre sensor and multiple linear regression modelling

    Al-Mallahi, A. A.; Kataoka, T

    2013-01-01

    A new methodology to estimate the mass of grain seeds, which flow in the shape of clumps, was suggested in this paper. The methodology used an off-the-shelf digital fibre sensor to detect the behaviour of the clumps and multiple linear regression modelling to estimate the mass by the parameters detected by the sensor which were the length and the density of the clumps. An indoor apparatus was used for modelling which resembled the sowing process using the grain drill. A fluted roller was inst...

  18. A study of partial F tests for multiple linear regression models

    Liu, Wei; Jamshidian, Mortaza

    2007-01-01

    Partial F tests play a central role in model selections in multiple linear regression models. This paper studies the partial F tests from the view point of simultaneous confidence bands. It first shows that there is a simultaneous confidence band associated naturally with a partial F test. This confidence band provides more information than the partial F test and the partial F test can be regarded as a side product of the confidence band. This view point of confidence bands also leads to insi...

  19. Poisson Regression Analysis of Illness and Injury Surveillance Data

    Frome E.L., Watkins J.P., Ellis E.D.

    2012-12-12

    The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra
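
    One common moment-based way to gauge extra-Poisson variation is the Pearson chi-square statistic divided by its residual degrees of freedom. The sketch below assumes that estimator and assumes the fitted means are already available from a Poisson fit; the abstract does not give the exact procedure, so the function names and numbers are illustrative only.

```python
import math


def pearson_dispersion(counts, fitted_means, n_params):
    """Moment estimate of the dispersion: Pearson chi-square over residual df."""
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi2 / (len(counts) - n_params)


def inflate_se(poisson_se, dispersion):
    """Quasi-likelihood standard error: the Poisson SE scaled by sqrt(dispersion)."""
    return poisson_se * math.sqrt(dispersion)


# Hypothetical event counts and fitted means from a model with 2 parameters.
counts = [2, 9, 9, 25, 12, 30]
fitted = [4.0, 4.0, 9.0, 16.0, 16.0, 25.0]
phi = pearson_dispersion(counts, fitted, n_params=2)
```

    A value of phi well above 1 suggests over-dispersion, and standard errors from the Poisson fit would be widened accordingly.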

  20. Prediction of Postpubertal Reproductive Potential According to Prepubertal Body Weight, Testicular Size, and Testosterone Concentration Using Multiple Regression Analysis in Kıvırcık Ram Lambs

    ELMAZ, Özkan; DİKMEN, Serdal; CİRİT, Ümit; DEMİR, Hıdır

    2008-01-01

    The relationship between the prepubertal body weight, testicular size, testosterone concentration, and postpubertal reproductive function was investigated in Kıvırcık ram lambs. The body weight, testicular size, and testosterone concentration were measured every 20 days between 60 and 420 days of age. Semen was collected from the ram lambs at 7, 8, 9, 10, 11, 12, 13 and 14 months of age. Data obtained were analyzed by best subsets regression model. We determined that body weight, scrotal circ...

  1. COX MULTIVARIATE REGRESSION ANALYSIS OF RECURRENCE FACTORS FOR COLONIC CARCINOMA

    杜寒松; 王国斌; 秦青平; 夏玉春; 司徒光伟

    2004-01-01

    Objective: To determine the independent prognostic factors in the recurrence of colonic carcinoma after curative resection. Methods: Two hundred and one patients undergoing curative resections for colonic carcinoma were investigated by univariate and Cox multivariate regression analyses. Ten factors that may contribute to the recurrence rate were analyzed. Results: Dukes stage, obstruction, postoperative chemotherapy, and the growth manner of the tumor were significantly associated with the recurrence rate of colonic carcinoma (P<0.05) by univariate analysis, while Dukes stage, obstruction, and postoperative chemotherapy were significant factors by multivariate analysis. Conclusion: Dukes stage, obstruction, and postoperative chemotherapy are independent prognostic factors in the recurrence of colonic carcinoma.

  2. Prediction of the cetane number of biodiesel using artificial neural networks and multiple linear regression

    Highlights: ► We obtained models for estimation of cetane number of biodiesel. ► Twenty-four neural networks using two topologies were evaluated. ► The best neural network for predicting the cetane number was selected. ► The best accuracy was obtained for the selected neural network. - Abstract: Models for estimating the cetane number of biodiesel from its fatty acid methyl ester composition using multiple linear regression and artificial neural networks were obtained in this work. To obtain the models, experimental data from literature reports covering 48 and 15 biodiesels were used in the modeling-training step and the validation step, respectively. Twenty-four neural networks using two topologies and different algorithms for the second training step were evaluated. The model obtained using multiple regression was compared with two other models from the literature and was able to predict the cetane number with 89% accuracy, with one outlier observed. A model to predict the cetane number using an artificial neural network was obtained with accuracy better than 92%, except for one outlier. The best neural network to predict the cetane number was a backpropagation network (11:5:1) using the Levenberg–Marquardt algorithm for the second step of the network training and showing R = 0.9544 for the validation data.

  3. MULTIPLE LOGISTIC REGRESSION MODEL TO PREDICT RISK FACTORS OF ORAL HEALTH DISEASES

    Parameshwar V Pandit; Javali, Shivalingappa B.

    2012-01-01

    Purpose: To analyze the dependence of oral health diseases, i.e., dental caries and periodontal disease, on a number of risk factors by applying a logistic regression model. Method: The cross-sectional study involved a systematic random sample of 1760 subjects with permanent dentition, aged 18-40 years, in Dharwad, Karnataka, India. Dharwad is situated in North Karnataka. The mean age was 34.26±7.28. The risk factors of dental caries and periodontal disease were established ...

  4. Semiparametric Allelic Tests for Mapping Multiple Phenotypes: Binomial Regression and Mahalanobis Distance.

    Majumdar, Arunabha; Witte, John S; Ghosh, Saurabh

    2015-12-01

    Binary phenotypes commonly arise due to multiple underlying quantitative precursors and genetic variants may impact multiple traits in a pleiotropic manner. Hence, simultaneously analyzing such correlated traits may be more powerful than analyzing individual traits. Various genotype-level methods, e.g., MultiPhen (O'Reilly et al.), have been developed to identify genetic factors underlying a multivariate phenotype. For univariate phenotypes, the usefulness and applicability of allele-level tests have been investigated. The test of allele frequency difference among cases and controls is commonly used for mapping case-control association. However, allelic methods for multivariate association mapping have not been studied much. In this article, we explore two allelic tests of multivariate association: one using a Binomial regression model based on inverted regression of genotype on phenotype (Binomial regression-based Association of Multivariate Phenotypes [BAMP]), and the other employing the Mahalanobis distance between two sample means of the multivariate phenotype vector for two alleles at a single-nucleotide polymorphism (Distance-based Association of Multivariate Phenotypes [DAMP]). These methods can incorporate both discrete and continuous phenotypes. Some theoretical properties for BAMP are studied. Using simulations, the power of the methods for detecting multivariate association is compared with that of the genotype-level test MultiPhen. The allelic tests yield marginally higher power than MultiPhen for multivariate phenotypes. For one/two binary traits under recessive mode of inheritance, allelic tests are found to be substantially more powerful. All three tests are applied to two different real data sets and the results offer some support for the simulation study. We propose a hybrid approach for testing multivariate association that implements MultiPhen when Hardy-Weinberg Equilibrium (HWE) is violated and BAMP otherwise, because the allelic approaches assume HWE

  5. Regression modeling strategies with applications to linear models, logistic and ordinal regression, and survival analysis

    Harrell, Jr., Frank E.

    2015-01-01

    This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap.  The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes.  This text realistically...

  6. Biplots in Reduced-Rank Regression

    Braak, ter, C.J.F.; Looman, C.W.N.

    1994-01-01

    Regression problems with a number of related response variables are typically analyzed by separate multiple regressions. This paper shows how these regressions can be visualized jointly in a biplot based on reduced-rank regression. Reduced-rank regression combines multiple regression and principal components analysis and can therefore be carried out with standard statistical packages. The proposed biplot highlights the major aspects of the regressions by displaying the least-squares approxima...

  7. Using the Coefficient of Determination R² to Test the Significance of Multiple Linear Regression

    Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

    2013-01-01

    This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
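
    A known result behind such tests is that, when all slopes are zero and the errors are normal, R² in a model with an intercept and k predictors fitted to n observations follows a Beta(k/2, (n-k-1)/2) distribution. The sketch below turns that fact into a Monte Carlo p-value using Python's standard library; it is an illustration only and may differ from the procedure in the article, and the function name and inputs are hypothetical.

```python
import random


def r2_null_pvalue(r2, n_obs, n_predictors, draws=20000, seed=0):
    """Monte Carlo p-value for H0: all slopes are zero, via the Beta law of R^2."""
    a = n_predictors / 2.0
    b = (n_obs - n_predictors - 1) / 2.0
    rng = random.Random(seed)
    # Fraction of simulated null R^2 values at least as large as the observed one.
    exceed = sum(rng.betavariate(a, b) >= r2 for _ in range(draws))
    return exceed / draws


# Hypothetical observed R^2 values for n = 30 observations and 3 predictors.
p_strong = r2_null_pvalue(0.50, n_obs=30, n_predictors=3)
p_weak = r2_null_pvalue(0.05, n_obs=30, n_predictors=3)
```

    A small p-value (as expected for R² = 0.50 here) speaks against the hypothesis that no predictor matters, while R² = 0.05 is unremarkable under the null.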

  8. Arch Height: A Regression Analysis of Different Measuring Parameters

    Hironmoy Roy

    2011-07-01

    Rationale: For measuring the height of the arch of the foot, either the standing navicular height or the talar height of the medial longitudinal arch was accepted in earlier days, whereas the ‘standing normalised navicular height’ is taken by present-day authors as the yardstick. Because these measurements are troublesome and time consuming, we practically do not opt for them in a busy OPD schedule and instead measure the arch height in the supine posture. Objectives: This study aimed to derive the regression between the standing arch-height values and their supine counterparts, so that the former can be predicted easily from the latter. Methodology: The study was carried out among 103 adult subjects within the purview of North Bengal Medical College & Hospital. From the x-ray films of their feet in supine and standing posture, the navicular and talar heights were determined and the records were analysed. Result: Statistically significant correlation followed by regression analysis yielded simple linear regression equations for predicting the standing arch-height values from the supine values, derived separately for males and females. Conclusion: Thus, from a known supine arch-height value, we can derive the respective standing arch height, as well as the ‘standing normalised navicular height’, indirectly, avoiding the entire troublesome maneuver in regular practice. The present study therefore recommends this method in clinical fields as a more rational and ideal approach to estimating arch height.

  9. Multiple linear regression to estimate time-frequency electrophysiological responses in single trials.

    Hu, L; Zhang, Z G; Mouraux, A; Iannetti, G D

    2015-05-01

    Transient sensory, motor or cognitive events elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist in stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability, which can reflect physiologically important information, is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical

  10. Meta-regression Analysis of the Chinese Labor Reallocation Effect

    Longhua YUE; Shiyan YANG; Rongtai SHEN

    2013-01-01

    Meta-regression analysis was applied to 23 papers on the effect of Chinese labor reallocation on economic growth. The results showed that whether the method of the World Bank (1996) or that of M. Syrquin (1986) was used had little impact on the results, while the calculation of the stock of physical capital had a positive impact on the results. Results from studies using panel data were larger than those obtained from time series data. The time span had little influence on the results. Therefore, it is necessary to measure the stock of physical capital in China accurately in order to evaluate the Chinese labor reallocation effect

  11. Finding determinants of audit delay by pooled OLS regression analysis

    Tina Vuko; Marko Čular

    2014-01-01

    The aim of this paper is to investigate determinants of audit delay. Audit delay is measured as the length of time (i.e. the number of calendar days) from the fiscal year-end to the audit report date. It is important to understand factors that influence audit delay since it directly affects the timeliness of financial reporting. The research is conducted on a sample of Croatian listed companies, covering the period of four years (from 2008 to 2011). We use pooled OLS regression analysis, mode...
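
    The dependent variable defined above is a simple date difference. A minimal sketch, with a hypothetical function name and made-up dates:

```python
from datetime import date


def audit_delay_days(fiscal_year_end, audit_report_date):
    """Number of calendar days from the fiscal year-end to the audit report date."""
    return (audit_report_date - fiscal_year_end).days


# Hypothetical company: fiscal year ends 31 Dec 2011, audit report dated 30 Mar 2012.
delay = audit_delay_days(date(2011, 12, 31), date(2012, 3, 30))
```

    Computed this way for each company and year, the delay would serve as the response in the pooled regression.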

  12. Multivariate study and regression analysis of gluten-free granola

    Lilian Maria Pagamunici

    2014-03-01

    This study developed a gluten-free granola and evaluated it during storage by applying multivariate and regression analysis to the sensory and instrumental parameters. The physicochemical, sensory, and nutritional characteristics of a product containing quinoa, amaranth and linseed were evaluated. The crude protein and lipid contents were 97.49 and 122.72 g kg-1 of food, respectively. The polyunsaturated/saturated and n-6:n-3 fatty acid ratios were 2.82 and 2.59:1, respectively. Granola had the best alpha-linolenic acid content, nutritional indices in the lipid fraction, and mineral content. Hygienic and sanitary conditions were good during storage, probably due to the low water activity of the formulation, which helped inhibit microbial growth. The sensory attributes ranged from 'like very much' to 'like slightly', and the regression models were highly fitted and correlated during the storage period. A reduction in the sensory attribute levels and in the product physical stabilisation was verified by principal component analysis. The use of the affective acceptance test and instrumental analysis combined with statistical methods allowed us to obtain promising results about the characteristics of gluten-free granola.

  13. Determining Balıkesir’s Energy Potential Using a Regression Analysis Computer Program

    Bedri Yüksel

    2014-01-01

    Solar power and wind energy are used concurrently during specific periods, while at other times only the more efficient is used, and hybrid systems make this possible. When establishing a hybrid system, the extent to which these two energy sources support each other needs to be taken into account. This paper is a study of the effects of wind speed, insolation levels, and the meteorological parameters of temperature and humidity on the energy potential in Balıkesir, in the Marmara region of Turkey. The relationship between the parameters was studied using a multiple linear regression method. Using a designed-for-purpose computer program, two different regression equations were derived, with wind speed being the dependent variable in the first and insolation levels in the second. The regression equations yielded accurate results. The computer program allowed for the rapid calculation of different acceptance rates. The results of the statistical analysis proved the reliability of the equations. An estimate of identified meteorological parameters and unknown parameters could be produced with a specified precision by using the regression analysis method. The regression equations also worked for the evaluation of energy potential.

  14. Spontaneous regression of multiple pulmonary metastatic nodules of hepatocarcinoma: a case report

    Although rare, spontaneous regression of either primary or metastatic malignant tumors in the absence of therapy or with inadequate therapy has been well documented. Since the earliest days of this century, various malignant tumors have been reported to disappear spontaneously or to be arrested in their growth, but cases of hepatocarcinoma have been very rare. From the literature, we were able to find 5 previously reported cases of hepatocarcinoma that showed spontaneous regression at the primary site. Recently we have seen a case of multiple pulmonary metastatic nodules of hepatocarcinoma that completely regressed spontaneously, and this forms the basis of the present case report. The patient was a 55-year-old male admitted to St. Mary's Hospital, Catholic Medical College, because of a hard palpable mass in the epigastrium on April 26, 1978. The admission PA chest roentgenogram revealed multiple small nodular densities scattered throughout both lung fields, especially in the lower zones and toward the peripheral portion. A hepatoscintigram revealed a large cold area involving the left lobe and intermediate zone of the liver. Alpha-fetoprotein and hepatitis B serum antigen tests were positive, whereas many other standard liver function tests turned out to be negative. A needle biopsy of the tumor revealed well-differentiated hepatocellular carcinoma. The patient was put on chemotherapy, which consisted of 5-FU 500 mg intravenously for 6 days from April 28 to May 3, 1978. The patient was discharged after this single course of 5-FU treatment and was on a herbal medicine, the nature and quantity of which are obscure. No other specific treatment was given. The second admission took place on Dec. 3, 1980 because of irregularity in bowel habits and dyspepsia. A follow-up PA chest roentgenogram obtained on the second admission revealed complete disappearance of the previously noted multiple pulmonary nodular lesions (Fig. 3). Follow-up liver scan revealed persistence of the cold area in the left lobe

  15. Multiple Time Scales and Longitudinal Measurements in Event History Analysis

    Danardono

    2005-01-01

    A general time-to-event data analysis known as event history analysis is considered. The focus is on the analysis of time-to-event data using Cox's regression model when the time to the event may be measured from different origins giving several observable time scales and when longitudinal measurements are involved. For the multiple time scales problem, procedures to choose a basic time scale in Cox's regression model are proposed. The connections between piecewise constant hazards, time-depe...

  16. The Comparison between Multiple Linear Regression Analysis and Path Analysis on the Influencing Factors of Subjective Wellbeing among Urban and Rural Residents

    徐曼; 柴云; 李涛; 卢丽; 刘冰

    2015-01-01

    Objective: To further explore the factors influencing subjective wellbeing among urban and rural residents, and the interactions among those factors, using multiple linear regression and path analysis. Methods: Using simple random sampling, a questionnaire survey was conducted among 1480 urban and rural residents with the Campbell subjective well-being index scale, and the data were compared and analyzed by multiple linear regression and path analysis. Results: The mean general well-being index score of urban and rural residents was 11.17 ± 1.99. Multiple linear regression analysis showed that future goals, stress coping styles, self-rated health status, hobbies, urban or rural place of residence, and leisure time were predictors of the well-being index, with standardized partial regression coefficients of 0.261, 0.182, 0.152, 0.066, 0.071, and 0.051, respectively. Path analysis found that future goals, stress coping styles, and self-rated health status acted directly on the well-being index, with path coefficients of 0.285, 0.191, and 0.160; hobbies, personal monthly income, education level, and age acted indirectly on the well-being index, with indirect effects of 0.08, -0.04, 0.10, and -0.07. Conclusion: Subjective wellbeing is related to multiple internal and external factors, including individual physiological, psychological, and social factors; multiple linear regression and path analysis each have their own emphasis and complement each other in exploring the factors influencing residents' subjective wellbeing and the relationships among them.

  17. Artificial neural network and multiple regression model for nickel(II) adsorption on powdered activated carbons.

    Hema, M; Srinivasan, K

    2011-07-01

    The nickel removal efficiency of powdered activated carbons from coconut oilcake, neem oilcake and commercial carbon was investigated using an artificial neural network. The effective parameters for the removal of nickel (%R) by the adsorption process, which included the pH, contact time (T), distinctiveness of activated carbon (Cn), amount of activated carbon (Cw) and initial concentration of nickel (Co), were investigated. The Levenberg-Marquardt (LM) back-propagation algorithm was used to train the network. The network topology was optimized by varying the number of hidden layers and the number of neurons in each hidden layer. The model was developed through training, validation and testing on subsets containing 60%, 20% and 20% of the total experimental data, respectively. A multiple regression equation was developed for the nickel adsorption system and its output was compared with both simulated and experimental outputs. The standard deviation (SD) with respect to the experimental output was considerably higher for the regression model than for the ANN model. The experimental data were best fitted by the artificial neural network. PMID:23029923

  18. Spontaneous Regression of Multiple Pulmonary Metastases After Radiofrequency Ablation of a Single Metastasis

    We report two cases of spontaneous regression of multiple pulmonary metastases occurring after radiofrequency ablation (RFA) of a single lung metastasis. To the best of our knowledge, these are the first such cases reported. These two patients presented with lung metastases progressive despite treatment with interleukin-2, interferon, or sorafenib but were safely ablated with percutaneous RFA under computed tomography guidance. Percutaneous RFA allowed control of the targeted tumors for >1 year. Distant lung metastases presented an objective response despite the fact that they received no targeted local treatment. Local ablative techniques, such as RFA, induce the release of tumor-degradation product, which is probably responsible for an immunologic reaction that is able to produce a response in distant tumors.

  19. Application of genetic algorithm - multiple linear regressions to predict the activity of RSK inhibitors

    Avval Zhila Mohajeri

    2015-01-01

    This paper deals with developing a linear quantitative structure-activity relationship (QSAR) model for predicting the RSK inhibition activity of some new compounds. A dataset consisting of 62 pyrazino [1,2-α] indole, diazepino [1,2-α] indole, and imidazole derivatives with known inhibitory activities was used. The multiple linear regression (MLR) technique, combined with the stepwise (SW) and genetic algorithm (GA) methods as variable selection tools, was employed. To further check the stability, robustness and predictability of the proposed models, internal and external validation techniques were used. Comparison of the results obtained indicates that the GA-MLR model is superior to the SW-MLR model and that it is applicable for designing novel RSK inhibitors.

  20. Semi-parametric Allelic Tests For Mapping Multiple Phenotypes: Binomial Regression And Mahalanobis Distance

    Majumdar, Arunabha; Witte, John S.; Ghosh, Saurabh

    2016-01-01

    Binary phenotypes commonly arise due to multiple underlying quantitative precursors. Genetic variants may impact multiple traits in a pleiotropic manner. Hence, simultaneously analyzing such correlated traits may be more powerful than analyzing individual traits. Various genotype-level methods, e.g. MultiPhen [O'Reilly et al., 2012], have been developed to identify genetic factors underlying a multivariate phenotype. For univariate phenotypes, the usefulness and applicability of allele-level tests have been investigated. The test of allele frequency difference among cases and controls is commonly used for mapping case-control association. However, allelic methods for multivariate association mapping have not been studied much. We explore two allelic tests of multivariate association: one using a Binomial regression model based on inverted regression of genotype on phenotype (BAMP), and the other employing the Mahalanobis distance between two sample means of the multivariate phenotype vector for two alleles at a SNP (DAMP). These methods can incorporate both discrete and continuous phenotypes. Some theoretical properties for BAMP are studied. Using simulations, the power of the methods for detecting multivariate association is compared with that of the genotype-level test MultiPhen. The allelic tests yield marginally higher power than MultiPhen for multivariate phenotypes. For one/two binary traits under recessive mode of inheritance, allelic tests are found to be substantially more powerful. All three tests are applied to two real data sets and the results offer some support for the simulation study. Since the allelic approaches assume Hardy-Weinberg Equilibrium (HWE), we propose a hybrid approach for testing multivariate association that implements MultiPhen when HWE is violated and BAMP otherwise. PMID:26493781

  1. Multiple linear stepwise regression of liver lipid levels: proton MR spectroscopy study in vivo at 3.0 T

    Objective: To analyze the correlations between liver lipid level determined by liver 3.0 T 1H-MRS in vivo and influencing factors using multiple linear stepwise regression. Methods: The prospective study of liver 1H-MRS was performed with a 3.0 T system and eight-channel torso phased-array coils using a PRESS sequence. Forty-four volunteers were enrolled in this study. Liver spectra were collected with a TR of 1500 ms, TE of 30 ms, volume of interest of 2 cm×2 cm×2 cm, and NSA of 64. The acquired raw proton MRS data were processed using the software program SAGE. For each MRS measurement, using water as the internal reference, the amplitude of the lipid signal was normalized to the sum of the signal from lipid and water to obtain the percentage of lipid within the liver. Descriptive statistics for height, weight, age, BMI, line width and water suppression were recorded, and Pearson analysis was applied to test their relationships. Multiple linear stepwise regression was used to build the statistical model for predicting liver lipid content. Results: Age (39.1±12.6) years, body weight (64.4±10.4) kg, BMI (23.3±3.1) kg/m2, line width (18.9±4.4) and water suppression (90.7±6.5)% were significantly correlated with liver lipid content (0.00 to 0.96%, median 0.02%); the r values were 0.11, 0.44, 0.40, 0.52 and -0.73, respectively (P<0.05). However, only age, BMI, line width, and water suppression entered the multiple linear regression equation. The liver lipid content prediction equation was as follows: Y = 1.395 - (0.021×water suppression) + (0.022×BMI) + (0.014×line width) - (0.004×age); the coefficient of determination was 0.613 and the corrected coefficient of determination was 0.59. Conclusion: The regression model fitted well, since the variables age, BMI, line width, and water suppression can explain about 60% of liver lipid content changes. (authors)
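
    The prediction equation quoted in this abstract can be evaluated directly. The sketch below only transcribes the printed coefficients; the function name is hypothetical, the units of Y are assumed to match the liver lipid content reported in the abstract, and the sample means are used purely as illustrative inputs.

```python
def predicted_liver_lipid(water_suppression, bmi, line_width, age):
    """Evaluate the regression equation as printed in the abstract."""
    return (1.395
            - 0.021 * water_suppression
            + 0.022 * bmi
            + 0.014 * line_width
            - 0.004 * age)


# Sample means reported in the abstract, used only as illustrative inputs.
y_at_means = predicted_liver_lipid(water_suppression=90.7, bmi=23.3,
                                   line_width=18.9, age=39.1)
```

    With these inputs the equation gives a value of about 0.11.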

  2. Detection and parameter estimation for quantitative trait loci using regression models and multiple markers

    Da, Yang; VanRaden, Paul; Schook, Lawrence

    2000-01-01

    A strategy of multi-step minimal conditional regression analysis has been developed to determine the existence of statistical testing and parameter estimation for a quantitative trait locus (QTL) that are unaffected by linked QTLs. The estimation of marker-QTL recombination frequency needs to consider only three cases: 1) the chromosome has only one QTL, 2) one side of the target QTL has one or more QTLs, and 3) either side of the target QTL has one or more QTLs. Ana...

  3. Regression analysis exploring teacher impact on student FCI post scores

    Mahadeo, Jonathan V.; Manthey, Seth R.; Brewe, Eric

    2013-01-01

    High School Modeling Workshops are designed to improve high school physics teachers' understanding of physics and how to teach using the Modeling method. The basic assumption is that the teacher plays a critical role in their students' physics education. This study investigated teacher impacts on students' Force Concept Inventory (FCI) scores, with the hope of identifying quantitative differences between teachers. This study examined student FCI scores from 18 teachers with at least a year of teaching high school physics. These data were then evaluated using a General Linear Model (GLM), which allowed a regression equation to be fitted to the data. This regression equation was used to predict student post FCI scores based on teacher ID, student pre FCI score, gender, and representation. The results show that 12 out of 18 teachers significantly impact their students' post FCI scores. The GLM further revealed that, of the 12 teachers, only five have a positive impact on student post FCI scores. Given these differences among teachers, it is our intention to extend our analysis to investigate pedagogical differences between them.

  4. Identifying Population Groups with Low Palliative Care Program Enrolment Using Classification and Regression Tree Analysis

    Gao, Jun; Lavergne, M. Ruth; McIntyre, Paul

    2013-01-01

    Classification and regression tree (CART) analysis was used to identify subpopulations with lower palliative care program (PCP) enrolment rates. CART analysis uses recursive partitioning to group predictors. The PCP enrolment rate was 72 percent for the 6,892 adults who died of cancer from 2000 to 2005 in two counties in Nova Scotia, Canada. The lowest PCP enrolment rates were for nursing home residents over 82 years (27 percent), a group residing more than 43 kilometres from the PCP (31 percent), and another group living less than two weeks after their cancer diagnosis (37 percent). The highest rate (86 percent) was for the 2,118 persons who received palliative radiation. Findings from multiple logistic regression (MLR) were provided for comparison. CART findings identified low PCP enrolment subpopulations that were defined by interactions among demographic, social, medical, and health system predictors. PMID:21805944
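
    The recursive partitioning that CART relies on can be illustrated with a single split search. The sketch below is a minimal illustration in Python, not the authors' analysis: the function names and the toy ages and enrolment flags are hypothetical, and Gini impurity is assumed as the split criterion. A full tree would apply the same search again inside each resulting group.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_threshold = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_score = score
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
    return best_threshold, best_score


# Hypothetical toy data: age in years and an enrolment flag (1 = enrolled).
ages = [60, 70, 75, 85, 88, 90]
enrolled = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_split(ages, enrolled)
print(threshold, impurity)
```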

  5. Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis

    Nielsen, Allan Aasbjerg

    2007-01-01

    estimates of relevant parameters in an over-determined system of equations which may arise from deliberately carrying out more measurements than actually needed to determine the set of desired parameters. An example may be the determination of a geographical position based on information from a number of...... Global Navigation Satellite System (GNSS) satellites also known as space vehicles (SV). It takes at least four SVs to determine the position (and the clock error) of a GNSS receiver. Often more than four SVs are used and we use adjustment to obtain a better estimate of the geographical position (and the...... different variables in an experiment or in a survey, etc. Regression analysis is probably one of the most used statistical techniques around. Dr. Anna B. O. Jensen provided insight and data for the Global Positioning System (GPS) example. Matlab code and sections that are considered as either traditional land...
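
    The idea of adjusting an over-determined system can be sketched with weighted least squares solved through the normal equations. This is a minimal illustration under stated assumptions, not the record's Matlab code: the function names, the five toy observations and their weights are hypothetical, and a small Gaussian elimination routine stands in for a linear algebra library.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def weighted_least_squares(design, obs, weights):
    """Minimize sum_i w_i * (obs_i - design_i . x)^2 via the normal equations."""
    p = len(design[0])
    ata = [[sum(w * row[j] * row[k] for row, w in zip(design, weights))
            for k in range(p)] for j in range(p)]
    atb = [sum(w * row[j] * y for row, y, w in zip(design, obs, weights))
           for j in range(p)]
    return solve_linear(ata, atb)


# Five observations, two unknowns (offset and rate): an over-determined system.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
design = [[1.0, t] for t in times]
obs = [1.0 + 2.0 * t for t in times]     # noise-free toy observations
weights = [1.0, 4.0, 4.0, 1.0, 0.25]     # hypothetical inverse variances
estimate = weighted_least_squares(design, obs, weights)
print(estimate)
```

    With noisy observations the weights decide how strongly each measurement pulls the estimate; with the noise-free toy data above any positive weights recover the same offset and rate.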

  6. A Visual Analytics Approach for Correlation, Classification, and Regression Analysis

    Steed, Chad A [ORNL; SwanII, J. Edward [Mississippi State University (MSU); Fitzpatrick, Patrick J. [Mississippi State University (MSU); Jankun-Kelly, T.J. [Mississippi State University (MSU)

    2012-02-01

    New approaches that combine the strengths of humans and machines are necessary to equip analysts with the proper tools for exploring today's increasingly complex, multivariate data sets. In this paper, a novel visual data mining framework, called the Multidimensional Data eXplorer (MDX), is described that addresses the challenges of today's data by combining automated statistical analytics with a highly interactive parallel coordinates based canvas. In addition to several intuitive interaction capabilities, this framework offers a rich set of graphical statistical indicators, interactive regression analysis, visual correlation mining, automated axis arrangements and filtering, and data classification techniques. The current work provides a detailed description of the system as well as a discussion of key design aspects and critical feedback from domain experts.

  7. Finding determinants of audit delay by pooled OLS regression analysis

    Tina Vuko

    2014-03-01

    The aim of this paper is to investigate determinants of audit delay. Audit delay is measured as the length of time (i.e., the number of calendar days) from the fiscal year-end to the audit report date. It is important to understand factors that influence audit delay since it directly affects the timeliness of financial reporting. The research is conducted on a sample of Croatian listed companies, covering a period of four years (from 2008 to 2011). We use pooled OLS regression analysis, modelling audit delay as a function of the following explanatory variables: audit firm type, audit opinion, profitability, leverage, inventory and receivables to total assets, absolute value of total accruals, company size and audit committee existence. Our results indicate that audit committee existence, profitability and leverage are statistically significant determinants of audit delay in Croatia.

  8. Logistic regression analysis on the risk factors of radiation pneumonitis

    Objective: To identify the risk factors of radiation pneumonitis (RP). Methods: A retrospective study was conducted on 101 patients with radiation pneumonitis using SPSS 8.0 software. Factors evaluated included: gender, age, pathology, clinical stage, irradiation dose, irradiation field size, history of smoking, cardiovascular disease, bronchitis, surgery, chemotherapy, lung infection, atelectasis, obstructive infection and pleural effusion. Univariate analysis was performed using the Chi-square test and multivariate analysis was performed using a logistic regression model. Results: Univariate analysis revealed that 10 factors, including pulmonary infection, atelectasis, obstructive infection, cardiovascular disease, bronchitis, chemotherapy, irradiation dose, number of days of radiation and irradiation field size, were significantly associated with radiation pneumonitis. Multivariate analysis showed that 9 factors (pulmonary infection, obstructive infection, atelectasis, pleural effusion, bronchitis, cardiovascular disease, chemotherapy, irradiation dose, and irradiation field size) were independent risk factors. Conclusion: Comprehensive consideration of accompanying disease, chemotherapy, dose, field size, etc. during the planning of radiotherapy can minimize the possibility of developing radiation pneumonitis.

  9. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.

    2016-01-01

    Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.

  10. CARAT-GxG: CUDA-Accelerated Regression Analysis Toolkit for Large-Scale Gene–Gene Interaction with GPU Computing System

    Sungyoung Lee; Min-Seok Kwon; Taesung Park

    2014-01-01

    In genome-wide association studies (GWAS), regression analysis has been most commonly used to establish an association between a phenotype and genetic variants, such as single nucleotide polymorphisms (SNPs). However, most applications of regression analysis have been restricted to the investigation of a single marker because of the large computational burden. Thus, there have been limited applications of regression analysis to multiple SNPs, including gene–gene interaction (GGI) in large-scale G...

  11. Analysis of multiple source/multiple informant data in Stata

    Nicholas Horton; Garrett Fitzmaurice

    2005-01-01

    We describe regression-based methods for analyzing multiple-source data arising from complex sample survey designs in Stata. We use the term multiple-source data to encompass all cases where data are simultaneously obtained from multiple informants, or raters (e.g., self-reports, family members, health care providers, administrators) or via different/parallel instruments, indicators or methods (e.g., symptom rating scales, standardized diagnostic interviews, or clinical diagnoses). We review ...

  12. Low-Cost Housing in Sabah, Malaysia: A Regression Analysis

    Dullah Mulok

    2009-02-01

    Low-cost housing plays a vital role in the development process, especially in providing accommodation to those who are less fortunate and the lower income group. This effort is also a step in overcoming the squatter problem, which could cripple the competitive drive of the local community, especially in the state of Sabah, Malaysia. This article attempts to look into the factors influencing low-cost housing in Sabah, namely the government’s budget (allocation) for low-cost housing projects and Sabah’s total population. At the same time, this study attempts to show the implications of the development and economic crises that occurred during the period 1971 to 2000 for the provision of low-cost houses in Sabah. Empirical analyses were conducted using the multiple linear regression method, stepwise regression and the dummy variable approach to demonstrate the link. The empirical results show that the government’s budget for low-cost housing is the main contributor to the provision of low-cost housing in Sabah. The empirical results also suggest that economic growth, namely Gross Domestic Product (GDP), did not have a significant effect on low-cost housing in Sabah. However, almost all major crises that have beset Malaysia’s economy had a significant and consistent effect on low-cost housing in Sabah, especially the financial crisis that occurred in mid-1997.

  13. Spatial regression analysis of traffic crashes in Seoul.

    Rhee, Kyoung-Ah; Kim, Joon-Ki; Lee, Young-Ihn; Ulfarsson, Gudmundur F

    2016-06-01

    Traffic crashes can be spatially correlated events and the analysis of the distribution of traffic crash frequency requires evaluation of parameters that reflect spatial properties and correlation. Typically this spatial aspect of crash data is not used in everyday practice by planning agencies and this contributes to a gap between research and practice. A database of traffic crashes in Seoul, Korea, in 2010 was developed at the traffic analysis zone (TAZ) level with a number of GIS developed spatial variables. Practical spatial models using available software were estimated. The spatial error model was determined to be better than the spatial lag model and an ordinary least squares baseline regression. A geographically weighted regression model provided useful insights about localization of effects. The results found that an increased length of roads with speed limit below 30 km/h and a higher ratio of residents below the age of 15 were correlated with lower traffic crash frequency, while a higher ratio of residents who moved to the TAZ, more vehicle-kilometers traveled, and a greater number of access points with speed limit difference between side roads and mainline above 30 km/h all increased the number of traffic crashes. This suggests, for example, that better control or design for merging lower speed roads with higher speed roads is important. A key result is that the length of bus-only center lanes had the largest effect on increasing traffic crashes. This is important as bus-only center lanes with bus stop islands have been increasingly used to improve transit times. Hence the potential negative safety impacts of such systems need to be studied further and mitigated through improved design of pedestrian access to center bus stop islands. PMID:26994374

  14. A New Measurement Equivalence Technique Based on Latent Class Regression as Compared with Multiple Indicators Multiple Causes

    Jamali, Jamshid; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman

    2016-01-01

    Background: Measurement equivalence is an essential prerequisite for making valid comparisons in mental health questionnaires across groups. In most methods used for assessing measurement equivalence, which is known as Differential Item Functioning (DIF), latent variables are assumed to be continuous. Objective: To compare a new method called Latent Class Regression (LCR), designed for discrete latent variables, with the multiple indicators multiple causes (MIMIC) model, a continuous latent variable technique, in assessing the measurement equivalence of the 12-item General Health Questionnaire (GHQ-12) across different subgroups of Iranian nurses. Methods: A cross-sectional survey was conducted in 2014 among 771 nurses working in the hospitals of Fars and Bushehr provinces of southern Iran. To identify Minor Psychiatric Disorders (MPD), the nurses completed self-report GHQ-12 questionnaires and sociodemographic questions. Two uniform-DIF detection methods, LCR and MIMIC, were applied for comparability when the GHQ-12 score was assumed to be discrete and continuous, respectively. Results: The result of fitting LCR with 2 classes indicated that 27.4% of the nurses had MPD. Gender was identified as an influential factor of the level of MPD. LCR and MIMIC agreed on the detection of DIF and DIF-free items by gender, age, education and marital status in 83.3, 100.0, 91.7 and 83.3% of cases, respectively. Conclusions: The results indicated that the GHQ-12 is, to a great degree, an invariant measure for the assessment of MPD among nurses. High convergence between the two methods suggests using the LCR approach in cases of a discrete latent variable, e.g. the GHQ-12, and adequate sample size. PMID:27482129

  15. An Analysis of Bank Service Satisfaction Based on Quantile Regression and Grey Relational Analysis

    Wen-Tsao Pan

    2016-01-01

    Bank service satisfaction is vital to the success of a bank. In this paper, we propose to use grey relational analysis to gauge the levels of service satisfaction of the banks. With grey relational analysis, we compared the effects of different variables on service satisfaction. We ranked the banks according to their levels of service satisfaction. We further used the quantile regression model to find the variables that affected the satisfaction of a customer at a specific quantile of satisfaction level. The result of the quantile regression analysis provided a bank manager with information to formulate policies to further promote satisfaction of the customers at different quantiles of satisfaction level. We also compared the prediction accuracies of the regression models at different quantiles. The experimental results showed that, among the seven quantile regression models, the median regression model has the best performance in terms of RMSE, RTIC, and CE performance measures.

  16. The role of multiple regression and exploratory data analysis in the development of leukemia incidence risk models for comparison of radionuclide air stack emissions from nuclear and coal power industries

    Risk associated with power generation must be identified to make intelligent choices between alternate power technologies. Radionuclide air stack emissions for a single coal plant and a single nuclear plant are used to compute the single plant leukemia incidence risk and total industry leukemia incidence risk. Leukemia incidence is the response variable as a function of radionuclide bone dose for the six proposed dose response curves considered. During normal operation a coal plant has higher radionuclide emissions than a nuclear plant and the coal industry has a higher leukemia incidence risk than the nuclear industry, unless a nuclear accident occurs. Variation of nuclear accident size allows quantification of the impact of accidents on the total industry leukemia incidence risk comparison. The leukemia incidence risk is quantified as the number of accidents of a given size for the nuclear industry leukemia incidence risk to equal the coal industry leukemia incidence risk. The general linear model is used to develop equations that relate the accident frequency required for equal industry risks to the magnitude of the nuclear emission. Exploratory data analysis revealed that the relationship between the natural log of accident number and the natural log of accident size is linear. (Author)
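
    A linear relationship between the natural log of accident number and the natural log of accident size is equivalent to a power law between the raw quantities. The sketch below fits such a line by ordinary least squares; it is only an illustration, with a hypothetical helper name and synthetic values (a prefactor of 50 and an exponent of -1.5) that are not taken from the study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic illustration only: number = 50 * size ** -1.5
sizes = [1.0, 2.0, 4.0, 8.0, 16.0]
numbers = [50.0 * s ** -1.5 for s in sizes]

intercept, slope = fit_line([math.log(s) for s in sizes],
                            [math.log(n) for n in numbers])
# The slope is the power-law exponent; exp(intercept) is the prefactor.
print(slope, math.exp(intercept))
```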

  17. Linear Regression Analysis With Missing Observations Among The Independent Variables in Animal Breeding

    Kayaalp, G.Tamer

    1999-01-01

    In animal breeding, when there is a relationship between the dependent (Y) and independent (X) variables, regression analysis is applied. But when one of the variables has one or more missing observations regression analysis cannot be applied. This paper illustrates and discusses a regression analysis in which the independent variable (X) has a missing observation.

  18. Multiple linear regression models for shear strength prediction and design of simply supported deep beams subjected to symmetrical point loads

    Panatchai Chetchotisak

    2015-09-01

    Because of nonlinear strain distributions caused by abrupt changes either in geometry or in loading in deep beams, the approach for conventional beams is not applicable. Consequently, the strut-and-tie model (STM) has been applied as the most rational and simple method for strength prediction and design of reinforced concrete deep beams. A deep beam is idealized by the STM as a truss-like structure consisting of diagonal concrete struts and tension ties. There have been numerous works proposing STMs for deep beams. However, uncertainty and complexity in shear strength computations of deep beams can be found in some STMs. Therefore, improved methods for predicting the shear strengths of deep beams are still needed. By means of a large experimental database of 406 deep beam test results covering a wide range of influencing parameters, several shapes and geometries of STM, and six state-of-the-art formulations of the efficiency factors found in the design codes and literature, new STMs for predicting the shear strength of simply supported reinforced concrete deep beams using multiple linear regression analysis are proposed in this paper. Furthermore, the regression diagnostics and the validation process are included in this study. Finally, two numerical examples are also provided for illustration.

  19. Optimization of end-members used in multiple linear regression geochemical mixing models

    Dunlea, Ann G.; Murray, Richard W.

    2015-11-01

    Tracking marine sediment provenance (e.g., of dust, ash, hydrothermal material, etc.) provides insight into contemporary ocean processes and helps construct paleoceanographic records. In a simple system with only a few end-members that can be easily quantified by a unique chemical or isotopic signal, chemical ratios and normative calculations can help quantify the flux of sediment from the few sources. In a more complex system (e.g., each element comes from multiple sources), more sophisticated mixing models are required. MATLAB codes published in Pisias et al. solidified the foundation for application of a Constrained Least Squares (CLS) multiple linear regression technique that can use many elements and several end-members in a mixing model. However, rigorous sensitivity testing to check the robustness of the CLS model is time and labor intensive. MATLAB codes provided in this paper reduce the time and labor involved and facilitate finding a robust and stable CLS model. By quickly comparing the goodness of fit between thousands of different end-member combinations, users are able to identify trends in the results that reveal the CLS solution uniqueness and the end-member composition precision required for a good fit. Users can also rapidly check that they have the appropriate number and type of end-members in their model. In the end, these codes improve the user's confidence that the final CLS model(s) they select are the most reliable solutions. These advantages are demonstrated by application of the codes in two case studies of well-studied datasets (Nazca Plate and South Pacific Gyre).

  20. Dental malocclusion and body posture in young subjects: A multiple regression study

    Giuseppe Perinetti

    2010-01-01

    OBJECTIVES: Controversial results have been reported on potential correlations between the stomatognathic system and body posture. We investigated whether malocclusal traits correlate with body posture alterations in young subjects to determine possible clinical applications. METHODS: A total of 122 subjects, including 86 males and 36 females (age range of 10.8-16.3 years), were enrolled. All subjects tested negative for temporomandibular disorders or other conditions affecting the stomatognathic system, except malocclusion. A dental occlusion assessment included phase of dentition, molar class, overjet, overbite, anterior and posterior crossbite, scissorbite, mandibular crowding and dental midline deviation. In addition, body posture was recorded through static posturography using a vertical force platform. Recordings were performed under two conditions, namely, (i) mandibular rest position (RP) and (ii) dental intercuspidal position (ICP). Posturographic parameters included the projected sway area and velocity and the antero-posterior and right-left load differences. Multiple regression models were run for both recording conditions to evaluate associations between each malocclusal trait and posturographic parameters. RESULTS: All of the posturographic parameters had large variability and were very similar between the two recording conditions. Moreover, a limited number of weakly significant correlations were observed, mainly for overbite and dentition phase, when using multivariate models. CONCLUSION: Our current findings, particularly with regard to the use of posturography as a diagnostic aid for subjects affected by dental malocclusion, do not support the existence of clinically relevant correlations between malocclusal traits and body posture.

  1. Framing an Nuclear Emergency Plan using Qualitative Regression Analysis

    Safety maintenance issues have arisen in the wake of the Fukushima disaster, and literature on disaster scenario investigation and theory development is lacking. This study deals with the difficulty of initiating research whose purpose depends on the content and problem setting of the phenomenon. The research design therefore follows an inductive approach, in which primary findings and written reports are interpreted and codified qualitatively. These data are classified inductively through thematic analysis to develop a conceptual framework related to several theoretical lenses. Moreover, framing the expected emergency plan as improvised business process models involves abundant abstraction and simplification of unstructured data. The structural methods of Qualitative Regression Analysis (QRA) and the Work System snapshot were applied to shape the data into the proposed model conceptualization using rigorous analyses. These methods were helpful in organising and summarizing the snapshot into an 'as-is' work system that is recommended as a 'to-be' work system for business process modelling. We conclude that these methods are useful for developing a comprehensive and structured research framework for future enhancement in business process simulation. (author)

  2. Analysis of retirement income adequacy using quantile regression: A case study in Malaysia

    Alaudin, Ros Idayuwati; Ismail, Noriszura; Isa, Zaidi

    2015-09-01

    Quantile regression is a statistical analysis that does not restrict attention to the conditional mean and therefore permits approximation of the whole conditional distribution of a response variable. Quantile regression is more robust to outliers than mean regression models. In this paper, we demonstrate how the quantile regression approach can be used to analyze the ratio of projected wealth to needs (wealth-needs ratio) during retirement.
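
    The robustness to outliers comes from the loss that quantile regression minimizes. The sketch below illustrates this in the simplest case, a model with only a constant term: minimizing the check (pinball) loss returns a sample quantile, whereas the mean is pulled toward an extreme value. The function names and the toy data are hypothetical, and the search over data points is an assumption that holds for this constant-only case.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def best_constant(data, tau):
    """Constant c minimizing the summed pinball loss of (y - c).

    For a constant-only model a minimizer always lies at a data point,
    so it is enough to try each observed value.
    """
    return min(sorted(set(data)),
               key=lambda c: sum(pinball_loss(y - c, tau) for y in data))


ratios = [1, 2, 3, 4, 100]            # hypothetical values with one outlier
median_fit = best_constant(ratios, 0.5)
lower_fit = best_constant(ratios, 0.25)
mean_fit = sum(ratios) / len(ratios)
print(median_fit, lower_fit, mean_fit)
```

    Full quantile regression replaces the constant with a linear predictor and minimizes the same loss, typically by linear programming.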

  3. Analysis of some methods for reduced rank Gaussian process regression

    Quinonero-Candela, J.; Rasmussen, Carl Edward

    2005-01-01

    proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank...

  4. A study on driving forces of star hotel's spatial distribution in Nanchang city based on multiple stepwise regression analysis and path analysis

    袁小红; 于之锋; 毛端谦

    2011-01-01

    In recent years, urban hotels have entered a new period of vigorous development, and star-rated hotels, as the main body of the tourist hotel industry, have played a positive leading and exemplary role for the hospitality industry and for tourism as a whole. Building on previous research and using statistical data for the municipal districts of Nanchang from 1990 to 2007, covering areas such as the economy and transportation, the paper applies multiple linear stepwise regression and path analysis to 12 factors that influence the scale and spatial distribution of star-rated hotels. The analysis identifies city traffic conditions, tourism income and urban air quality as the main driving forces behind the spatial distribution of star-rated hotels in Nanchang city. In addition, an improving investment environment and economic development policies will, to some extent, influence the redistribution of star-rated hotels in Nanchang.

  5. Design and analysis of experiments classical and regression approaches with SAS

    Onyiah, Leonard C

    2008-01-01

    Introductory Statistical Inference and Regression Analysis Elementary Statistical Inference Regression Analysis Experiments, the Completely Randomized Design (CRD)-Classical and Regression Approaches Experiments Experiments to Compare Treatments Some Basic Ideas Requirements of a Good Experiment One-Way Experimental Layout or the CRD: Design and Analysis Analysis of Experimental Data (Fixed Effects Model) Expected Values for the Sums of Squares The Analysis of Variance (ANOVA) Table Follow-Up Analysis to Check fo

  6. Comparison of hyperbolic and constant width simultaneous confidence bands in multiple linear regression under MVCS criterion

    W. Liu; Hayter, A. J.; Piegorsch, W W; Ah-Kine, P.

    2008-01-01

    A simultaneous confidence band provides useful information on the plausible range of the unknown regression model, and different confidence bands can often be constructed for the same regression model. For a simple regression line, Liu and Hayter (2007) propose use of the area of the confidence set corresponding to a confidence band as an optimality criterion in comparison of confidence bands; the smaller the area of the confidence set, the better the corresponding confidence band. This minim...

  7. Comparison of Hyperbolic and Constant Width Simultaneous Confidence Bands in Multiple Linear Regression under MVCS Criterion

    W. Liu; Hayter, A. J.; Piegorsch, W W

    2009-01-01

    A simultaneous confidence band provides useful information on the plausible range of the unknown regression model, and different confidence bands can often be constructed for the same regression model. For a simple regression line, it is proposed in Liu and Hayter (2007) to use the area of the confidence set that corresponds to a confidence band as an optimality criterion in comparison of confidence bands; the smaller is the area of the confidence set, the better is the corresponding confiden...

  8. A flexible count data regression model for risk analysis.

    Guikema, Seth D; Coffelt, Jeremy P; Goffelt, Jeremy P

    2008-02-01

    In many cases, risk and reliability analyses involve estimating the probabilities of discrete events such as hardware failures and occurrences of disease or death. There is often additional information in the form of explanatory variables that can be used to help estimate the likelihood of different numbers of events in the future through the use of an appropriate regression model, such as a generalized linear model. However, existing generalized linear models (GLM) are limited in their ability to handle the types of variance structures often encountered in using count data in risk and reliability analysis. In particular, standard models cannot handle both underdispersed data (variance less than the mean) and overdispersed data (variance greater than the mean) in a single coherent modeling framework. This article presents a new GLM based on a reformulation of the Conway-Maxwell Poisson (COM) distribution that is useful for both underdispersed and overdispersed count data and demonstrates this model by applying it to the assessment of electric power system reliability. The results show that the proposed COM GLM can fit data as well as the commonly used existing models for overdispersed data sets while outperforming these commonly used models for underdispersed data sets. PMID:18304118
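
    How one distribution can cover both underdispersed and overdispersed counts can be seen numerically. The sketch below uses the standard Conway-Maxwell Poisson parameterisation, P(Y = y) proportional to lam**y / (y!)**nu, which is an assumption here and not the reformulation proposed in the article; the function name and the truncation point y_max are also hypothetical choices.

```python
import math


def com_poisson_moments(lam, nu, y_max=200):
    """Mean and variance of a COM-Poisson variable, truncating the
    infinite normalizing sum at y_max (an approximation)."""
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1)
             for y in range(y_max + 1)]
    top = max(log_w)                       # rescale to avoid overflow
    w = [math.exp(v - top) for v in log_w]
    z = sum(w)
    mean = sum(y * wi for y, wi in enumerate(w)) / z
    var = sum((y - mean) ** 2 * wi for y, wi in enumerate(w)) / z
    return mean, var


# nu = 1 is the ordinary Poisson; nu > 1 gives variance below the mean
# and nu < 1 gives variance above the mean.
for nu in (0.5, 1.0, 2.0):
    mean, var = com_poisson_moments(3.0, nu)
    print(nu, round(mean, 3), round(var, 3))
```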

  9. Power analysis of principal components regression in genetic association studies

    Yan-feng SHEN; Jun ZHU

    2009-01-01

    Association analysis provides an opportunity to find genetic variants underlying complex traits. A principal components regression (PCR)-based approach was shown to outperform some competing approaches. However, a limitation of this method is that the principal components (PCs) selected from single nucleotide polymorphisms (SNPs) may be unrelated to the phenotype. In this article, we investigate the theoretical properties of such a method in more detail. We first derive the exact power function of the test based on PCR, and hence clarify the relationship between the test power and the degrees of freedom (DF). Next, we extend the PCR test to a general weighted PCs test, which provides a unified framework for understanding the properties of some related statistics. We then compare the performance of these tests. We also introduce several data-driven adaptive alternatives to overcome difficulties in the PCR approach. Finally, we illustrate our results using simulations based on real genotype data. The simulation study shows the risk of using the unsupervised rule to determine the number of PCs, and demonstrates that there is no single uniformly powerful method for detecting genetic variants.

  10. Simreg: a Software Including Some New Developments in Multiple Comparison and Simultaneous Confidence Bands for Linear Regression Models

    Mortaza Jamshidian; Wei Liu; Ying Zhang; Farid Jamishidian

    2005-01-01

    The problem of simultaneous inference and multiple comparison for comparing means of k (≥ 3) populations has been long studied in the statistics literature and is widely available in statistics literature. However, to date, the problem of multiple comparison of regression models has not found its way to the software. It is only recently that the computational aspects of this problem have been resolved in a general setting. SimReg employs this new methodology and provides users with software fo...

  11. Determinants of success of coronary angioplasty in patients with a chronic total occlusion: a multiple logistic regression model to improve selection of patients.

    Tan, K H; Sulke, N.; Taub, N A; Watts, E.; Karani, S.; Sowton, E

    1993-01-01

    OBJECTIVE--To study the determinants of success of coronary angioplasty in patients with chronic total occlusions, and to formulate a multiple logistic regression model to improve selection of patients. DESIGN--A retrospective analysis of clinical and angiographic data on a consecutive series of patients. PATIENTS--312 patients (mean age 55, range 31 to 79 years, 86% men) who underwent coronary angioplasty procedure for a chronic total occlusion between 1981 and 1992. RESULTS--Procedural succ...

  12. Use of a neural network and a multiple regression model to predict histologic grade of astrocytoma from MRI appearances

    Several MRI features of supratentorial astrocytomas are associated with high histologic grade by statistically significant p values. We sought to apply this information prospectively to a group of astrocytomas in the prediction of tumor grade. We used 10 MRI features of fibrillary astrocytomas from 52 patient studies to develop neural network and multiple linear regression models for practical use in predicting tumor grade. The models were tested prospectively on MR images from 29 patient studies. The performance of the models was compared against that of a radiologist. Neural network accuracy was 61% in distinguishing between low and high grade tumors. Multiple linear regression achieved an accuracy of 59%. Assessment of the images by a radiologist yielded 57% accuracy. We conclude that while certain MRI parameters may be statistically related to astrocytoma histologic grade, neural network and linear regression models cannot reliably use them to predict tumor grade. (orig.)

  13. Estimating the Coefficient of Cross-validity in Multiple Regression: A Comparison of Analytical and Empirical Methods.

    Kromrey, Jeffrey D.; Hines, Constance V.

    1996-01-01

    The accuracy of three analytical formulas for shrinkage estimation and four empirical techniques was investigated in a Monte Carlo study of the coefficient of cross-validity in multiple regression. Substantial statistical bias was evident for all techniques except the formula of M. W. Browne (1975) and multicross-validation. (SLD)

  14. Multiple linear regression analysis of the X-ray measurement and WOMAC, KUJALA, MELBOURNE scores of patellofemoral pain syndrome

    薛刚; 朱庆生; 朱锦宇; 姜炜

    2013-01-01

    Objective: To measure, on X-ray, the imaging parameters of knee joints with patellofemoral pain syndrome (PFPS) and to perform multiple linear regression analysis of these parameters against the WOMAC, KUJALA and MELBOURNE scoring systems. Methods: A total of 49 patients (51 knees) were selected according to inclusion and exclusion criteria. Ten parameters related to PFPS were measured: distal femoral valgus angle (DFVA, X1), proximal tibial varus angle (PTVA, X2), femoral angle (FA, X3), tibial angle (TA, X4), tibiofemoral angle (TFA, X5), Insall-Salvati ratio (ISR, X6), sulcus angle (SA, X7), lateral patellofemoral angle (LPA, X8), congruence angle (CA, X9) and patellofemoral index (PI, X10). WOMAC, KUJALA and MELBOURNE scores were recorded, and multiple linear regression equations were used to analyze the relationship between the imaging parameters and the scores. Results: All three multiple linear regression equations were statistically significant (P<0.05). WOMAC score: Y = -213.742 + 2.011 X5, F = 3.960, R² = 0.494; KUJALA score: Y = 125.835 - 24.475 X6 - 0.341 X7 - 0.992 X8, F = 32.732, R² = 0.891; MELBOURNE score: Y = 51.66 - 16.329 X6 - 5.47 X10, F = 22.178, R² = 0.856. Conclusions: (1) X-ray measurements of the knee reflect, to some extent, the three scores and knee function. (2) The KUJALA score evaluates PFPS fairly comprehensively; the Insall-Salvati ratio, sulcus angle and lateral patellofemoral angle on axial radiographs are the more important parameters and can be used clinically to assess functional recovery of PFPS patients before and after treatment. (3) Because the coefficients of determination for the KUJALA and MELBOURNE scores are large and the standard errors of the regression coefficients are small, the imaging parameters can be evaluated clinically by determining score values through statistical control.
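
    The reported KUJALA and MELBOURNE equations can be evaluated directly. The sketch below is only an illustration: the function names and the input measurements are hypothetical, and the third KUJALA term is read as X8 (LPA), which is an assumption based on the parameters named in the conclusions.

```python
def kujala_score(isr, sulcus_angle, lpa):
    """Reported KUJALA equation: Y = 125.835 - 24.475*X6 - 0.341*X7 - 0.992*X8.

    X8 (lateral patellofemoral angle) is an assumed reading of the third term.
    """
    return 125.835 - 24.475 * isr - 0.341 * sulcus_angle - 0.992 * lpa


def melbourne_score(isr, patellofemoral_index):
    """Reported MELBOURNE equation: Y = 51.66 - 16.329*X6 - 5.47*X10."""
    return 51.66 - 16.329 * isr - 5.47 * patellofemoral_index


# Hypothetical measurements for one knee; these are NOT data from the study.
example_knee = {"isr": 1.0, "sulcus_angle": 140.0, "lpa": 10.0, "pi": 1.5}

predicted_kujala = kujala_score(
    example_knee["isr"], example_knee["sulcus_angle"], example_knee["lpa"]
)
predicted_melbourne = melbourne_score(example_knee["isr"], example_knee["pi"])
print(round(predicted_kujala, 3), round(predicted_melbourne, 3))
```

    Both equations have negative coefficients on the Insall-Salvati ratio, so a higher ratio lowers the predicted score under either model.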

  15. Regression analysis of technical parameters affecting nuclear power plant performances

    Since the 1980s many studies have been conducted to explain good and bad performance of commercial nuclear power plants (NPPs), but no defined correlation has yet been found that is fully representative of plant operational experience. In early works, data availability and the number of operating power stations were both limited; therefore, results showed that specific technical characteristics of NPPs were supposed to be the main causal factors for successful plant operation. Although these aspects continue to play a significant role, later studies and observations showed that other factors concerning management and organization of the plant could instead be predominant when comparing utilities' operational and economic results. Utility quality, in a word, can be used to summarize all the managerial and operational aspects that seem to be effective in determining plant performance. In this paper operational data of a consistent sample of commercial nuclear power stations, out of the total 433 operating NPPs, are analyzed, mainly focusing on the last decade of operational experience. The sample consists of PWR and BWR technology, operated by utilities located in different countries, including the U.S., Japan, France, Germany and Finland. Multivariate regression is performed using Unit Capability Factor (UCF) as the dependent variable; this factor reflects the effectiveness of plant programs and practices in maximizing the available electrical generation and consequently provides an overall indication of how well plants are operated and maintained. Aspects that may not be real causal factors but which can have a consistent impact on the UCF, such as technology design, supplier, size and age, are included in the analysis as independent variables. (authors)

  16. INFLUENCE OF TOURISM SECTOR IN ALBANIAN GDP: ESTIMATION USING MULTIPLE REGRESSION METHOD

    Eglantina HYSA

    2012-06-01

    In recent years, the tourism sector has grown significantly in Albania, which has moved from a centralized economy to a liberal one since 1990. The tourism sector plays an important role in economic and social development. The contributions of this sector are reflected directly in the generation of national income. The two main components matching the tourism movements are the number of tourists and the number of overnights in hotels. Investments made in this sector could be expected to have a high positive influence on the country's GDP. This study seeks to identify the influence of tourists, their overnights in hotels and capital investment spending by all sectors directly involved in the tourism sector on tourism's total contribution to the gross domestic product of Albania during 1996-2009. A regression analysis has been performed taking as dependent variable the GDP generated by the tourism sector and as independent variables capital investment, tourist number and overnights in hotels. Even if all the variables have been found to be positively related, the variable 'overnights of foreigners and Albanians in hotels' has been found insignificant.

  17. Analysis of pulsed eddy current data using regression models for steam generator tube support structure inspection

    Buck, J. A.; Underhill, P. R.; Morelli, J.; Krause, T. W.

    2016-02-01

    Nuclear steam generators (SGs) are a critical component for ensuring safe and efficient operation of a reactor. Life management strategies are implemented in which SG tubes are regularly inspected by conventional eddy current testing (ECT) and ultrasonic testing (UT) technologies to size flaws, and safe operating life of SGs is predicted based on growth models. ECT, the more commonly used technique, due to the rapidity with which full SG tube wall inspection can be performed, is challenged when inspecting ferromagnetic support structure materials in the presence of magnetite sludge and multiple overlapping degradation modes. In this work, an emerging inspection method, pulsed eddy current (PEC), is being investigated to address some of these particular inspection conditions. Time-domain signals were collected by an 8 coil array PEC probe in which ferromagnetic drilled support hole diameter, depth of rectangular tube frets and 2D tube off-centering were varied. Data sets were analyzed with a modified principal components analysis (MPCA) to extract dominant signal features. Multiple linear regression models were applied to MPCA scores to size hole diameter as well as size rectangular outer diameter tube frets. Models were improved through exploratory factor analysis, which was applied to MPCA scores to refine selection for regression models inputs by removing nonessential information.

  18. Driven Factors Analysis of China’s Irrigation Water Use Efficiency by Stepwise Regression and Principal Component Analysis

    Renfu Jia

    2016-01-01

    This paper introduces an integrated approach to find out the major factors influencing efficiency of irrigation water use in China. It combines multiple stepwise regression (MSR) and principal component analysis (PCA) to obtain more realistic results. In real-world case studies, the classical linear regression model often involves too many explanatory variables, and the linear correlation among variables cannot be eliminated. Linearly correlated variables will invalidate the factor analysis results. To overcome this issue and reduce the number of variables, the PCA technique has been used in combination with MSR. As such, the irrigation water use status in China was analyzed to find out the five major factors that have significant impacts on irrigation water use efficiency. To illustrate the performance of the proposed approach, a calculation based on real data was conducted and the results are shown in this paper.

  19. Improved performance of a two-element TLD badge for determining gamma and beta doses using multiple linear regression

    The gamma/beta TLD badge used by OPPD consists of two TLD-700 chips (Harshaw G7 card), one of which (chip #2) is shielded by a 0.102 cm-thick aluminum filter, and the other (chip #1) is unshielded, as shown in Fig. 1. Standard procedure had been to determine the beta dose to the badge by subtracting the response of chip #2 from that of chip #1 and then dividing by a calibrated beta-sensitivity factor; the gamma dose was taken to be the response of chip #2 divided by the chip's gamma-sensitivity factor followed by the subtraction of the background dose. A problem with this procedure is penetration of energetic beta particles through the aluminum filter on chip #2 which causes an over-response. Due to the technique used to obtain the beta dose, this also results in an under-estimate of the beta dose. This problem has been corrected through application of multiple linear regression analysis on a large data base of pure gamma (137Cs), pure beta (90Sr), and mixed exposures. The outcome of the analysis is an algorithm that automatically corrects for penetration effects. Performance tests using the ANSI N13.11 standard are presented to show the improvement.

  20. Quantile regression provides a fuller analysis of speed data.

    Hewson, Paul

    2008-03-01

    Considerable interest already exists in terms of assessing percentiles of speed distributions, for example monitoring the 85th percentile speed is a common feature of the investigation of many road safety interventions. However, unlike the mean, where t-tests and ANOVA can be used to provide evidence of a statistically significant change, inference on these percentiles is much less common. This paper examines the potential role of quantile regression for modelling the 85th percentile, or any other quantile. Given that crash risk may increase disproportionately with increasing relative speed, it may be argued these quantiles are of more interest than the conditional mean. In common with the more usual linear regression, quantile regression admits a simple test as to whether the 85th percentile speed has changed following an intervention in an analogous way to using the t-test to determine if the mean speed has changed by considering the significance of parameters fitted to a design matrix. Having briefly outlined the technique and briefly examined an application with a widely published dataset concerning speed measurements taken around the introduction of signs in Cambridgeshire, this paper will demonstrate the potential for quantile regression modelling by examining recent data from Northamptonshire collected in conjunction with a "community speed watch" programme. Freely available software is used to fit these models and it is hoped that the potential benefits of using quantile regression methods when examining and analysing speed data are demonstrated. PMID:18329400
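
    As a minimal sketch of the idea behind quantile regression, the intercept-only case below recovers a sample 85th percentile by minimizing the check (pinball) loss. The speed values and function names are hypothetical and are not taken from the Cambridgeshire or Northamptonshire data; a full quantile regression would minimize the same loss over the coefficients of a design matrix.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def quantile_by_check_loss(values, tau):
    """Return the sample value that minimizes the total check loss.

    This is the intercept-only special case of quantile regression.
    """
    def total_loss(candidate):
        return sum(pinball_loss(v - candidate, tau) for v in values)

    return min(sorted(values), key=total_loss)


# Hypothetical spot speeds (mph), for illustration only.
speeds = [28, 30, 31, 33, 34, 35, 36, 38, 41, 47]

q85 = quantile_by_check_loss(speeds, 0.85)
mean_speed = sum(speeds) / len(speeds)
print(q85, mean_speed)
```

    With these made-up values the 85th percentile sits well above the mean, which is the kind of upper-tail behaviour the abstract argues is more relevant to crash risk than the conditional mean.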

  1. Buffalos milk yield analysis using random regression models

    A.S. Schierholt

    2010-02-01

    Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed), daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL) and from records of the EMBRAPA Amazônia Oriental (EAO) herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using Legendre polynomial functions from second to fourth order. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve) and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Information Criterion. The random effects regression model using third-order Legendre polynomials with four classes of the environmental effect was the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields at younger ages was close to unity, but at older ages it was low.

  2. The use of regression analysis for resource allocation by central government

    Goldstein, H

    1994-01-01

    The use of multiple regression models to determine weights for the prediction of UK local authority spending targets is criticised. It is argued that predictions based upon such regression weights are not valid estimates of 'need to spend' and in particular that they should not be used as the basis for decisions about 'capping'.

  3. Multiple Regression and Mediator Variables can be used to Avoid Double Counting when Economic Values are Derived using Stochastic Herd Simulation

    Østergaard, Søren; Ettema, Jehan Frans; Hjortø, Line;

    Multiple regression and model building with mediator variables was addressed to avoid double counting when economic values are estimated from data simulated with herd simulation modeling (using the SimHerd model). The simulated incidence of metritis was analyzed statistically as the independent...... variable, while using the traits representing the direct effects of metritis on yield, fertility and occurrence of other diseases as mediator variables. The economic value of metritis was estimated to be €78 per 100 cow-years for each 1% increase of metritis in the period of 1-100 days in milk in...... multiparous cows. The merit of using this approach was demonstrated since the economic value of metritis was estimated to be 81% higher when no mediator variables were included in the multiple regression analysis...

  4. Research on the High Star Hotel Staff Satisfaction Based on Multiple Regression Analysis: Taking the High Star Hotels in Changsha for Example

    王华丽

    2014-01-01

    The hotel staff satisfaction has been watched keenly by the hotel industry and the academia. In this paper, through investigation to the high star hotels in Changsha, the basic data are obtained and multiple regression analysis is used to study the influencing factors of hotel staff satisfaction. The results indicate that promotion prospect has the largest impact on employee satisfaction, followed by compensation, and the influence of work itself is not significant in statistical sense.

  5. SPECIFICS OF THE APPLICATIONS OF MULTIPLE REGRESSION MODEL IN THE ANALYSES OF THE EFFECTS OF GLOBAL FINANCIAL CRISES

    Račić, Željko V.; Baraković, Biljana T.

    2010-01-01

    This paper aims to present the specifics of the application of the multiple linear regression model. The economic (financial) crisis is analyzed in terms of gross domestic product, which is a function of the foreign trade balance (on one hand) and credit cards, i.e. the indebtedness of the population on this basis (on the other hand), in the USA (from 1999 to 2008). We used the extended application model which shows how the analyst should run the whole development process of a regression model. ...

  6. External Tank Liquid Hydrogen (LH2) Prepress Regression Analysis Independent Review Technical Consultation Report

    Parsons, Vickie S.

    2009-01-01

    The request to conduct an independent review of regression models, developed for determining the expected Launch Commit Criteria (LCC) External Tank (ET)-04 cycle count for the Space Shuttle ET tanking process, was submitted to the NASA Engineering and Safety Center (NESC) on September 20, 2005. The NESC team performed an independent review of regression models documented in Prepress Regression Analysis, Tom Clark and Angela Krenn, 10/27/05. This consultation consisted of a peer review by statistical experts of the proposed regression models provided in the Prepress Regression Analysis. This document is the consultation's final report.

  7. Regression Analysis with Block Missing Values and Variables Selection

    Chien-Pai Han

    2011-07-01

    We consider a regression model when a block of observations is missing, i.e. there is a group of observations with all the explanatory variables or covariates observed and another set of observations with only a block of the variables observed. We propose an estimator of the regression coefficients that is a combination of two estimators, one based on the observations with no missing variables, and the other based on the set of all observations after deleting the block of variables with missing values. The proposed combined estimator will be compared with the uncombined estimators. If the experimenter suspects that the variables with missing values may be deleted, a preliminary test will be performed to resolve the uncertainty. If the preliminary test of the null hypothesis that the regression coefficients of the variables with missing values are equal to zero is accepted, then only the data with no missing values are used for estimating the regression coefficients. Otherwise the combined estimator is used. This gives a preliminary test estimator. The properties of the preliminary test estimator and comparisons of the estimators are studied by a Monte Carlo study.

  8. Regression analysis of censored data using pseudo-observations

    Parner, Erik T.; Andersen, Per Kragh

    2010-01-01

    We draw upon a series of articles in which a method based on pseudovalues is proposed for direct regression modeling of the survival function, the restricted mean, and the cumulative incidence function in competing risks with right-censored data. The models, once the pseudovalues have been...

  9. Grades, Gender, and Encouragement: A Regression Discontinuity Analysis

    Owen, Ann L.

    2010-01-01

    The author employs a regression discontinuity design to provide direct evidence on the effects of grades earned in economics principles classes on the decision to major in economics and finds a differential effect for male and female students. Specifically, for female students, receiving an A for a final grade in the first economics class is…

  10. An application with multinomial logistic regression analysis

    Sadi Elasan; Sıddık Keskin

    2015-01-01

    Multinomial logistic regression analysis is one of the techniques used to examine relationships between independent and dependent variables when the dependent variable has three or more categories. In multinomial logistic regression analysis, one category of the dependent variable is taken as the reference category and the other categories are analyzed with respect to it. In this study "Multinomial Logistic Regression Analysis" was introduced and an application was done....
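
    The reference-category idea can be sketched as follows: the reference category has its linear predictor fixed at zero, and every other category gets its own intercept and slopes. The category names, coefficients and the function name below are hypothetical illustrations, not values from the study.

```python
import math


def multinomial_probs(x, coefs, reference="reference"):
    """Category probabilities for a reference-coded multinomial logit model.

    coefs maps each non-reference category to (intercept, list_of_slopes).
    The reference category has linear predictor 0 by construction.
    """
    scores = {reference: 0.0}
    for category, (intercept, slopes) in coefs.items():
        scores[category] = intercept + sum(b * xi for b, xi in zip(slopes, x))
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Hypothetical coefficients for a three-category outcome with two predictors.
coefs = {"category_b": (0.5, [1.0, -0.5]), "category_c": (-1.0, [0.2, 0.8])}
probs = multinomial_probs([1.0, 2.0], coefs)
print(probs)
```

    Each coefficient is interpreted relative to the reference: the log of the ratio of a category's probability to the reference probability equals that category's linear predictor.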

  11. Network-based multiple locus linkage analysis of expression traits

    Pan, Wei

    2009-01-01

    Motivation: We consider the problem of multiple locus linkage analysis for expression traits of genes in a pathway or a network. To capitalize on co-expression of functionally related genes, we propose a penalized regression method that maps multiple expression quantitative trait loci (eQTLs) for all related genes simultaneously while accounting for their shared functions as specified a priori by a gene pathway or network.

  12. Variable selection and regression analysis for the prediction of mortality rates associated with foodborne diseases.

    Amene, E; Hanson, L A; Zahn, E A; Wild, S R; Döpfer, D

    2016-07-01

    The purpose of this study was to apply a novel statistical method for variable selection and a model-based approach for filling data gaps in mortality rates associated with foodborne diseases using the WHO Vital Registration mortality dataset. Correlation analysis and elastic net regularization methods were applied to drop redundant variables and to select the most meaningful subset of predictors. Whenever predictor data were missing, multiple imputation was used to fill in plausible values. Cluster analysis was applied to identify similar groups of countries based on the values of the predictors. Finally, a Bayesian hierarchical regression model was fit to the final dataset for predicting mortality rates. From 113 potential predictors, 32 were retained after correlation analysis. Out of these 32 predictors, eight with non-zero coefficients were selected using the elastic net regularization method. Based on the values of these variables, four clusters of countries were identified. The uncertainty of predictions was large for countries within clusters lacking mortality rates, and it was low for a cluster that had mortality rate information. Our results demonstrated that, using Bayesian hierarchical regression models, a data-driven clustering of countries and a meaningful subset of predictors can be used to fill data gaps in foodborne disease mortality. PMID:26785774

  13. Varying-coefficient functional linear regression

    Wu, Yichao; Fan, Jianqing; Müller, Hans-Georg

    2010-01-01

    Functional linear regression analysis aims to model regression relations which include a functional predictor. The analog of the regression parameter vector or matrix in conventional multivariate or multiple-response linear regression models is a regression parameter function in one or two arguments. If, in addition, one has scalar predictors, as is often the case in applications to longitudinal studies, the question arises how to incorporate these into a functional regression model. We study...

  14. REGRESSION ANALYSIS OF PRODUCTIVITY USING MIXED EFFECT MODEL

    Siana Halim

    2007-01-01

    Production plants of a company are located in several areas spread across Middle and East Java. As the production process employs mostly manpower, we suspected that each location has different characteristics affecting the productivity. Thus, the production data may have a spatial and hierarchical structure. For fitting a linear regression using the ordinary techniques, we are required to make some assumptions about the nature of the residuals, i.e. that they are independent, identically and normally distributed. However, these assumptions are rarely fulfilled, especially for data that have a spatial and hierarchical structure. We worked out the problem using a mixed effect model. This paper discusses the model construction of productivity and several characteristics in the production line by taking location as a random effect. The simple model with high utility that satisfies the necessary regression assumptions was built using the free statistical software R, version 2.6.1.

  15. Retirement patterns in Hong Kong: A censored regression analysis

    Wing Suen

    1997-01-01

    This paper provides an overview of retirement patterns in Hong Kong on the basis of limited data. A censored regression model is used to infer the retirement age from people's current retirement status and their current age. This model is equivalent to a restricted probit model, and the interpretation of parameters is straightforward. The results clearly show a negative income effect on the retirement decision. The retirement age seems to be positively related to lifetime earnings but negativ...

  16. Model performance analysis and model validation in logistic regression

    Rosa Arboretti Giancristofaro

    2007-10-01

    In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.

  17. Hermite Regression Analysis of Multi-Modal Count Data

    David E. Giles

    2010-01-01

    We discuss the modeling of count data whose empirical distribution is both multi-modal and over-dispersed, and propose the Hermite distribution with covariates introduced through the conditional mean. The model is readily estimated by maximum likelihood, and nests the Poisson model as a special case. The Hermite regression model is applied to data for the number of banking and currency crises in IMF-member countries, and is found to out-perform the Poisson and negative binomial models.
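
    To illustrate how the Hermite distribution nests the Poisson, the sketch below uses the standard two-parameter form in which the count is Y1 + 2*Y2 with independent Poisson variables Y1 and Y2 (means a1 and a2). Whether the paper uses exactly this parameterization is an assumption; the function name and parameter values are hypothetical, and covariates are omitted.

```python
import math


def hermite_pmf(n, a1, a2):
    """P(X = n) for X = Y1 + 2*Y2, Y1 ~ Poisson(a1), Y2 ~ Poisson(a2)."""
    total = 0.0
    for j in range(n // 2 + 1):
        # j counts of the "double" component, n - 2j of the "single" one.
        total += (a1 ** (n - 2 * j)) * (a2 ** j) / (
            math.factorial(n - 2 * j) * math.factorial(j)
        )
    return math.exp(-(a1 + a2)) * total


# Hypothetical parameter values, for illustration only.
a1, a2 = 1.0, 0.5
support = range(61)  # truncation point chosen so the omitted tail is negligible
pmf = [hermite_pmf(n, a1, a2) for n in support]
mean = sum(n * p for n, p in zip(support, pmf))
variance = sum((n - mean) ** 2 * p for n, p in zip(support, pmf))
print(round(mean, 6), round(variance, 6))
```

    Under this form the mean is a1 + 2*a2 and the variance is a1 + 4*a2, so the variance exceeds the mean whenever a2 > 0, and setting a2 = 0 gives back the Poisson probabilities.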

  18. Multiple test procedures and the closure principle: a new look at multiple hypotheses testing in the linear regression model

    Alt, Raimund

    1991-01-01

    Summary: In this paper we show how to apply the closure test principle when testing linear hypotheses within the classical linear regression model. The closure test principle, which was introduced by Marcus/Peritz/Gabriel (1976), results in the construction of test procedures which are in general much more powerful than conventional test procedures like the Bonferroni procedure or the Scheffé procedure. A small simulation study provides some evidence of the superiority of closed test proc...
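
    A minimal sketch of the closure principle follows. It assumes each intersection hypothesis is tested with a Bonferroni local test, which is a simplification chosen for illustration and not the F-type tests a regression setting would normally use; the p-values and function names are hypothetical. An elementary hypothesis is rejected only if every intersection hypothesis containing it is rejected.

```python
from itertools import combinations


def bonferroni(pvalues, alpha=0.05):
    """Plain Bonferroni: compare every p-value with alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def closed_test(pvalues, alpha=0.05):
    """Closure principle with Bonferroni local tests (brute force over subsets)."""
    m = len(pvalues)
    indices = range(m)

    def local_reject(subset):
        # Bonferroni test of the intersection hypothesis over this subset.
        return min(pvalues[i] for i in subset) <= alpha / len(subset)

    decisions = []
    for i in indices:
        decisions.append(
            all(
                local_reject(subset)
                for size in range(1, m + 1)
                for subset in combinations(indices, size)
                if i in subset
            )
        )
    return decisions


# Hypothetical p-values for four elementary hypotheses.
pvals = [0.010, 0.015, 0.030, 0.200]
plain = bonferroni(pvals)
closed = closed_test(pvals)
print(plain, closed)
```

    With these made-up p-values the closed procedure rejects two hypotheses where plain Bonferroni rejects one, while both control the familywise error rate at the same level. The brute-force loop visits every subset, so it is only practical for a handful of hypotheses.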

  19. The Development and Demonstration of Multiple Regression Models for Operant Conditioning Questions.

    Fanning, Fred; Newman, Isadore

    Based on the assumption that inferential statistics can make the operant conditioner more sensitive to possible significant relationships, regression models were developed to test the statistical significance between slopes and Y intercepts of the experimental and control group subjects. These results were then compared to the traditional operant…

  20. Prediction of the Rock Mass Diggability Index by Using Fuzzy Clustering-Based, ANN and Multiple Regression Methods

    Saeidi, Omid; Torabi, Seyed Rahman; Ataei, Mohammad

    2014-03-01

    Rock mass classification systems are one of the most common ways of determining rock mass excavatability and related equipment assessment. However, the strength and weak points of such rating-based classifications have always been questionable. Such classification systems assign quantifiable values to predefined classified geotechnical parameters of rock mass. This causes particular ambiguities, leading to the misuse of such classifications in practical applications. Recently, intelligence system approaches such as artificial neural networks (ANNs) and neuro-fuzzy methods, along with multiple regression models, have been used successfully to overcome such uncertainties. The purpose of the present study is the construction of several models by using an adaptive neuro-fuzzy inference system (ANFIS) method with two data clustering approaches, including fuzzy c-means (FCM) clustering and subtractive clustering, an ANN and non-linear multiple regression to estimate the basic rock mass diggability index. A set of data from several case studies was used to obtain the real rock mass diggability index and compared to the predicted values by the constructed models. In conclusion, it was observed that ANFIS based on the FCM model shows higher accuracy and correlation with actual data compared to that of the ANN and multiple regression. As a result, one can use the assimilation of ANNs with fuzzy clustering-based models to construct such rigorous predictor tools.

  1. An evaluation of logic regression-based biomarker discovery across multiple intergenic regions for predicting host specificity in Escherichia coli.

    Zhi, Shuai; Li, Qiaozhi; Yasui, Yutaka; Banting, Graham; Edge, Thomas A; Topp, Edward; McAllister, Tim A; Neumann, Norman F

    2016-10-01

    Several studies have demonstrated that E. coli appears to display some level of host adaptation and specificity. Recent studies in our laboratory support these findings as determined by logic regression modeling of single nucleotide polymorphisms (SNP) in intergenic regions (ITGRs). We sought to determine the degree of host-specific information encoded in various ITGRs across a library of animal E. coli isolates using both whole genome analysis and a targeted ITGR sequencing approach. Our findings demonstrated that ITGRs across the genome encode various degrees of host-specific information. Incorporating multiple ITGRs (i.e., concatenation) into logic regression model building resulted in greater host-specificity and sensitivity outcomes in biomarkers, but the overall level of polymorphism in an ITGR did not correlate with the degree of host-specificity encoded in the ITGR. This suggests that distinct SNPs in ITGRs may be more important in defining host-specificity than overall sequence variation, explaining why traditional unsupervised learning phylogenetic approaches may be less informative in terms of revealing host-specific information encoded in DNA sequence. In silico analysis of 80 candidate ITGRs from publicly available E. coli genomes was performed as a tool for discovering highly host-specific ITGRs. In one ITGR (ydeR-yedS) we identified a SNP biomarker that was 98% specific for cattle and for which 92% of all E. coli isolates originating from cattle carried this unique biomarker. In the case of humans, a host-specific biomarker (98% specificity) was identified in the concatenated ITGR sequences of rcsD-ompC, ydeR-yedS, and rclR-ykgE, and for which 78% of E. coli originating from humans carried this biomarker. Interestingly, human-specific biomarkers were dominant in ITGRs regulating antibiotic resistance, whereas in cattle host-specific biomarkers were found in ITGRs involved in stress regulation. These data suggest that evolution towards host

  2. Characterization of engineered cartilage constructs using multiexponential T₂ relaxation analysis and support vector regression.

    Irrechukwu, Onyi N; Reiter, David A; Lin, Ping-Chang; Roque, Remigio A; Fishbein, Kenneth W; Spencer, Richard G

    2012-06-01

    Increased sensitivity in the characterization of cartilage matrix status by magnetic resonance (MR) imaging, through the identification of surrogate markers for tissue quality, would be of great use in the noninvasive evaluation of engineered cartilage. Recent advances in MR evaluation of cartilage include multiexponential and multiparametric analysis, which we now extend to engineered cartilage. We studied constructs which developed from chondrocytes seeded in collagen hydrogels. MR measurements of transverse relaxation times were performed on samples after 1, 2, 3, and 4 weeks of development. Corresponding biochemical measurements of sulfated glycosaminoglycan (sGAG) were also performed. sGAG per wet weight increased from 7.74±1.34 μg/mg in week 1 to 21.06±4.14 μg/mg in week 4. Using multiexponential T₂ analysis, we detected at least three distinct water compartments, with T₂ values and weight fractions of (45 ms, 3%), (200 ms, 4%), and (500 ms, 97%), respectively. These values are consistent with known properties of engineered cartilage and previous studies of native cartilage. Correlations between sGAG and MR measurements were examined using conventional univariate analysis with T₂ data from monoexponential fits with individual multiexponential compartment fractions and sums of these fractions, through multiple linear regression based on linear combinations of fractions, and, finally, with multivariate analysis using the support vector regression (SVR) formalism. The phenomenological relationship between T₂ from monoexponential fitting and sGAG exhibited a correlation coefficient of r²=0.56, comparable to the more physically motivated correlations between individual fractions or sums of fractions and sGAG; the correlation based on the sum of the two proteoglycan-associated fractions was r²=0.58. Correlations between measured sGAG and those calculated using standard linear regression were more modest, with r² in the range 0

  3. Partial least squares regression can aid in detecting differential abundance of multiple features in sets of metagenomic samples

    Ondrej Libiger

    2015-12-01

    It is now feasible to examine the composition and diversity of microbial communities (i.e., "microbiomes") that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology "Metastats" across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency

  4. Arch Height: A Regression Analysis of Different Measuring Parameters

    Hironmoy Roy; Kalyan Bhattacharya; Asit Chandra Roy; Samar Deb; Kuntala Ray

    2011-01-01

    Rationale: To measure the height of the arch of the foot, either the standing navicular height or the talar height of the medial longitudinal arch was accepted in earlier days, whereas the 'standing normalised navicular height' is taken as the yardstick by present-day authors. Because these measurements are troublesome and time consuming, in practice we do not opt for them in a busy OPD schedule and instead measure the arch height in the supine posture. Objectives: This study therefore aimed to derive the regression between the...

  5. Multi-stratified multiple regression tests of the linear/no-threshold theory of radon-induced lung cancer

    A plot of lung-cancer rates versus radon exposures in 965 US counties, or in all US states, has a strong negative slope, b, in sharp contrast to the strong positive slope predicted by linear/no-threshold theory. The discrepancy between these slopes exceeds 20 standard deviations (SD). Including smoking frequency in the analysis substantially improves fits to a linear relationship but has little effect on the discrepancy in b, because correlations between smoking frequency and radon levels are quite weak. Including 17 socioeconomic variables (SEV) in multiple regression analysis reduces the discrepancy to 15 SD. Data were divided into segments by stratifying on each SEV in turn, and on geography, and on both simultaneously, giving over 300 data sets to be analyzed individually, but negative slopes predominated. The slope is negative whether one considers only the most urban counties or only the most rural; only the richest or only the poorest; only the richest in the South Atlantic region or only the poorest in that region, etc.; and for all the strata in between. Since this is an ecological study, the well-known problems with ecological studies were investigated and found not to be applicable here. The "ecological fallacy" was shown not to apply in testing a linear/no-threshold theory, and the vulnerability to confounding is greatly reduced when confounding factors are only weakly correlated with radon levels, as is generally the case here. All confounding factors known to correlate with radon and with lung cancer were investigated quantitatively and found to have little effect on the discrepancy

  6. Digital soil mapping using multiple logistic regression on terrain parameters in southern Brazil

    Elvio Giasson; Robin Thomas Clarke; Alberto Vasconcellos Inda Junior; Gustavo Henrique Merten; Carlos Gustavo Tornquist

    2006-01-01

    Soil surveys are necessary sources of information for land use planning, but they are not always available. This study proposes the use of multiple logistic regressions to predict the occurrence of soil types based on reference areas. From a digitized soil map and terrain parameters derived from the digital elevation model in the ArcView environment, several sets of multiple logistic regressions were defined using the statistical software Minitab, establishing relationship between explanatory...

  7. Analysis of some methods for reduced rank Gaussian process regression

    Quinonero-Candela, J.; Rasmussen, Carl Edward

    2005-01-01

    While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent pro... covariance function hyperparameters and the support set. We propose a method for learning hyperparameters for a given support set. We also review the Sparse Greedy GP (SGGP) approximation (Smola and Bartlett, 2001), which is a way of learning the support set for given hyperparameters based on approximating... the posterior. We propose an alternative method to the SGGP that has better generalization capabilities. Finally we make experiments to compare the different ways of training a RRGP. We provide some Matlab code for learning RRGPs.

  8. Multiple regression models for the prediction of the maximum obtainable thermal efficiency of organic Rankine cycles

    Larsen, Ulrik; Pierobon, Leonardo; Wronski, Jorrit; Haglind, Fredrik

    2014-01-01

    Much attention is focused on increasing the energy efficiency to decrease fuel costs and CO2 emissions throughout industrial sectors. The ORC (organic Rankine cycle) is a relatively simple but efficient process that can be used for this purpose by converting low and medium temperature waste heat to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to deter...

  9. Additive Intensity Regression Models in Corporate Default Analysis

    Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo;

    2013-01-01

    We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of mo...

  10. Estimation of streamflow, base flow, and nitrate-nitrogen loads in Iowa using multiple linear regression models

    Schilling, K.E.; Wolter, C.F.

    2005-01-01

    Nineteen variables, including precipitation, soils and geology, land use, and basin morphologic characteristics, were evaluated to develop Iowa regression models to predict total streamflow (Q), base flow (Qb), storm flow (Qs) and base flow percentage (%Qb) in gauged and ungauged watersheds in the state. Discharge records from a set of 33 watersheds across the state for the 1980 to 2000 period were separated into Qb and Qs. Multiple linear regression found that 75.5 percent of long term average Q was explained by rainfall, sand content, and row crop percentage variables, whereas 88.5 percent of Qb was explained by these three variables plus permeability and floodplain area variables. Qs was explained by average rainfall and %Qb was a function of row crop percentage, permeability, and basin slope variables. Regional regression models developed for long term average Q and Qb were adapted to annual rainfall and showed good correlation between measured and predicted values. Combining the regression model for Q with an estimate of mean annual nitrate concentration, a map of potential nitrate loads in the state was produced. Results from this study have important implications for understanding geomorphic and land use controls on streamflow and base flow in Iowa watersheds and similar agriculture dominated watersheds in the glaciated Midwest. (JAWRA) (Copyright © 2005).

  11. Permutation-Based Adjustments for the Significance of Partial Regression Coefficients in Microarray Data Analysis

    Wagner, Brandie D.; Zerbe, Gary O; Mexal, Sharon; Leonard, Sherry S.

    2008-01-01

    The aim of this paper is to generalize permutation methods for multiple testing adjustment of significant partial regression coefficients in a linear regression model used for microarray data. Using a permutation method outlined by Anderson and Legendre [1999] and the permutation P-value adjustment from Simon et al. [2004], the significance of disease related gene expression will be determined and adjusted after accounting for the effects of covariates, which are not restricted to be categori...

  12. Exergy Analysis of a Subcritical Reheat Steam Power Plant with Regression Modeling and Optimization

    MUHIB ALI RAJPER

    2016-07-01

    In this paper, exergy analysis of a 210 MW SPP (Steam Power Plant) is performed. First, the plant is modeled and validated, followed by a parametric study to show the effects of various operating parameters on the performance parameters. The net power output, energy efficiency, and exergy efficiency are taken as the performance parameters, while the condenser pressure, main steam pressure, bled steam pressures, main steam temperature, and reheat steam temperature are nominated as the operating parameters. Moreover, multiple polynomial regression models are developed to correlate each performance parameter with the operating parameters. The performance is then optimized using the direct-search method. According to the results, the net power output, energy efficiency, and exergy efficiency are calculated as 186.5 MW, 31.37%, and 30.41%, respectively, under normal operating conditions as a base case. The condenser is a major contributor towards the energy loss, followed by the boiler, whereas the highest irreversibilities occur in the boiler and turbine. According to the parametric study, variation in the operating parameters greatly influences the performance parameters. The regression models appear to be good estimators of the performance parameters. The optimum net power output, energy efficiency, and exergy efficiency are obtained as 227.6 MW, 37.4%, and 36.4%, respectively, and were calculated along with the optimal values of the selected operating parameters.

  13. pKa prediction for acidic phosphorus-containing compounds using multiple linear regression with computational descriptors.

    Yu, Donghai; Du, Ruobing; Xiao, Ji-Chang

    2016-07-01

    Ninety-six acidic phosphorus-containing molecules with pKa 1.88 to 6.26 were collected and divided into training and test sets by random sampling. Structural parameters were obtained by density functional theory calculation of the molecules. The relationship between the experimental pKa values and structural parameters was obtained by multiple linear regression fitting for the training set, and tested with the test set; the R² values were 0.974 and 0.966 for the training and test sets, respectively. This regression equation quantitatively describes the influence of structural parameters on pKa and can be used to predict pKa values of similar structures, which is significant for the design of new acidic phosphorus-containing extractants. © 2016 Wiley Periodicals, Inc. PMID:27218266
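    The train/test evaluation described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the descriptors, coefficients, noise level, and the 72/24 split are synthetic and hypothetical, not the authors' data or their DFT-derived parameters. The sketch only shows how a multiple linear regression is fitted on a training set and how R² is then computed separately on the training and test sets.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)

# Synthetic stand-in for 96 molecules with 3 hypothetical descriptors.
n_molecules = 96
descriptors = rng.normal(size=(n_molecules, 3))
pka = 4.0 + descriptors @ np.array([0.8, -0.5, 0.3]) + rng.normal(0.0, 0.1, n_molecules)

# Random split into training and test sets (72/24 is an assumed ratio).
order = rng.permutation(n_molecules)
train_idx, test_idx = order[:72], order[72:]

def design(x):
    """Prepend an intercept column to the descriptor matrix."""
    return np.column_stack([np.ones(len(x)), x])

# Fit on the training set only, then score both sets with the same coefficients.
coef, *_ = np.linalg.lstsq(design(descriptors[train_idx]), pka[train_idx], rcond=None)
r2_train = r_squared(pka[train_idx], design(descriptors[train_idx]) @ coef)
r2_test = r_squared(pka[test_idx], design(descriptors[test_idx]) @ coef)
print(f"R2 train = {r2_train:.3f}, R2 test = {r2_test:.3f}")
```

    Scoring the held-out set with coefficients fitted only on the training set is what makes the test R² an estimate of predictive rather than descriptive fit.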

  14. Functional Multiple-Set Canonical Correlation Analysis

    Hwang, Heungsun; Jung, Kwanghee; Takane, Yoshio; Woodward, Todd S.

    2012-01-01

    We propose functional multiple-set canonical correlation analysis for exploring associations among multiple sets of functions. The proposed method includes functional canonical correlation analysis as a special case when only two sets of functions are considered. As in classical multiple-set canonical correlation analysis, computationally, the…

  15. Better prediction of protein contact number using a support vector regression analysis of amino acid sequence

    Yuan Zheng

    2005-10-01

    Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of Cβ atoms in other residues within a sphere around the Cβ atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either "contacted" or "non-contacted", the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary protein sequence and higher order consecutive protein structural and functional properties.

  16. Survival analysis of cervical cancer using stratified Cox regression

    Purnami, S. W.; Inayati, K. D.; Sari, N. W. Wulan; Chosuvivatwong, V.; Sriplung, H.

    2016-04-01

    Cervical cancer is one of the most widespread causes of cancer death among women worldwide, including in Indonesia. Most cervical cancer patients come to the hospital already at an advanced stage. As a result, treatment becomes more difficult and the risk of death can increase. One parameter that can be used to assess the success of treatment is the probability of survival. This study examines the survival of cervical cancer patients at Dr. Soetomo Hospital using stratified Cox regression based on six factors: age, stage, treatment initiation, companion disease, complication, and anemia. The stratified Cox model is used because one independent variable, stage, does not satisfy the proportional hazards assumption. The results of the stratified Cox model show that complication is a significant factor influencing the survival probability of cervical cancer patients. The obtained hazard ratio is 7.35, meaning that a cervical cancer patient with a complication has a hazard of death 7.35 times that of a patient without a complication. The adjusted survival curves showed that stage IV had the lowest probability of survival.
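    For readers unfamiliar with the model named in this abstract, the standard textbook form of a stratified Cox model is sketched below. The notation (strata index s, covariate vector x, coefficient vector β) is generic and assumed here, not taken from the paper. Each stratum, here the disease stage, gets its own baseline hazard, while the covariate effects are shared across strata, and a hazard ratio is the exponential of a coefficient.

```latex
% Stratified Cox model: separate baseline hazard h_{0s}(t) per stratum s,
% common regression coefficients \beta across strata.
h_s(t \mid x) = h_{0s}(t)\,\exp\!\left(\beta^{\top} x\right), \qquad s = 1, \dots, S

% Hazard ratio for a binary covariate x_j (e.g. complication: yes vs. no),
% holding the stratum and the other covariates fixed:
\mathrm{HR}_j = \frac{h_s(t \mid x_j = 1)}{h_s(t \mid x_j = 0)} = \exp(\beta_j)

% A reported hazard ratio of 7.35 therefore corresponds to
\beta_j = \ln 7.35 \approx 1.99
```

    Because the baseline hazard cancels within a stratum, the stratifying variable does not need to satisfy the proportional hazards assumption, which is the reason given in the abstract for stratifying on stage.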

  17. Factors associated with methadone treatment duration: a Cox regression analysis.

    Chao-Kuang Lin

    This study examined retention rates and associated predictors of methadone maintenance treatment (MMT) duration among 128 newly admitted patients in Taiwan. A semi-structured questionnaire was used to obtain demographic and drug use history. Daily records of methadone taken and test results for HIV, HCV, and morphine toxicology were taken from a computerized medical registry. Cox regression analyses were performed to examine factors associated with MMT duration. MMT retention rates were 80.5%, 68.8%, 53.9%, and 41.4% for 3, 6, 12, and 18 months, respectively. Excluding 38 patients incarcerated during the study period, retention rates were 81.1%, 73.3%, 61.1%, and 48.9% for 3 months, 6 months, 12 months, and 18 months, respectively. No participant seroconverted to HIV and 1 died during the 18-month follow-up. Results showed that being female, imprisonment, a longer distance from house to clinic, having a lower methadone dose after 30 days, being HCV positive, and being in the New Taipei City program predicted early patient dropout. The findings suggest favorable MMT outcomes of HIV seroincidence and mortality. Results indicate that the need to minimize travel distance and to provide programs that meet women's requirements justify expansion of MMT clinics in Taiwan.

  18. A comparison of multiple regression and neural network techniques for mapping in situ pCO2 data

    Using about 138,000 measurements of surface pCO2 in the Atlantic subpolar gyre (50-70 deg N, 60-10 deg W) during 1995-1997, we compare two methods of interpolation in space and time: a monthly distribution of surface pCO2 constructed using multiple linear regressions on position and temperature, and a self-organizing neural network approach. Both methods confirm characteristics of the region found in previous work, i.e. the subpolar gyre is a sink for atmospheric CO2 throughout the year, and exhibits a strong seasonal variability with the highest undersaturations occurring in spring and summer due to biological activity. As an annual average the surface pCO2 is higher than estimates based on available syntheses of surface pCO2. This supports earlier suggestions that the sink of CO2 in the Atlantic subpolar gyre has decreased over the last decade instead of increasing as previously assumed. The neural network is able to capture a more complex distribution than can be well represented by linear regressions, but both techniques agree relatively well on the average values of pCO2 and derived fluxes. However, when both techniques are used with a subset of the data, the neural network predicts the remaining data to a much better accuracy than the regressions, with a residual standard deviation ranging from 3 to 11 μatm. The subpolar gyre is a net sink of CO2 of 0.13 Gt-C/yr using the multiple linear regressions and 0.15 Gt-C/yr using the neural network, on average between 1995 and 1997. Both calculations were made with the NCEP monthly wind speeds converted to 10 m height and averaged between 1995 and 1997, and using the gas exchange coefficient of Wanninkhof

  19. A new approach to nuclear reactor design optimization using genetic algorithms and regression analysis

    desired power peaking limits, desired effective and infinite neutron multiplication factors, high fast fission factor, high thermal efficiency in the conversion from thermal energy to electrical energy using the Brayton cycle, and high fuel burn-up. It is to be noted that we have kept the total mass of the fuel constant. In this work, we present a module-based (modular) approach to perform the optimization, wherein we have defined the following modules: single fuel pin cell, whole core, thermal–hydraulics, and energy conversion. In each of the modules we have defined a specific set of parameters and optimization objectives. The GA system (GAS) and RS together play the role of optimizing each of the individual modules and integrating the modules to determine the final nuclear reactor core. However, implementation of GA could lead to a local minimum or a non-unique set of parameters that meet the specific optimization objectives. The GA code is built using Java, neutronic analysis using MCNP6, thermal–hydraulics calculations using Java, and regression analysis using R

  20. The application of a multiple regression model for aero radiometric data

    The data observed in the total channel of high sensitivity airborne γ-ray spectrometric surveys is selected as the dependent variable while those of the Th, K and U channels are considered as independent variables and a linear statistical model is assumed to relate them as $\mathrm{Total}_i = \alpha_0 + \beta_1 U_i + \beta_2 \mathrm{Th}_i + \beta_3 K_i + \varepsilon_i$, where $\beta_1$, $\beta_2$, $\beta_3$ are the partial regression coefficients and $\varepsilon_i$ is the error term. The estimated coefficients ($\beta_1$, $\beta_2$, $\beta_3$) are used to check on board the data acquisition system as well as to predict occasionally the more appropriate value of the data in case a single data item is not recorded correctly. (author)
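    The linear model in this abstract can be sketched as an ordinary least-squares fit. The following is a minimal illustration under stated assumptions: the channel counts, the coefficient values, and the function names `fit_total_count_model` and `predict_total` are all hypothetical and synthetic, and ordinary least squares is assumed as the estimator since the abstract does not name one.

```python
import numpy as np

def fit_total_count_model(u, th, k, total):
    """Least-squares fit of Total_i = a0 + b1*U_i + b2*Th_i + b3*K_i + e_i.

    Returns the coefficient vector [a0, b1, b2, b3].
    """
    design = np.column_stack([np.ones_like(u), u, th, k])
    coef, *_ = np.linalg.lstsq(design, total, rcond=None)
    return coef

def predict_total(coef, u, th, k):
    """Predicted total-channel value, e.g. to stand in for a badly recorded item."""
    return coef[0] + coef[1] * u + coef[2] * th + coef[3] * k

# Synthetic survey records (hypothetical count rates, not real data).
rng = np.random.default_rng(0)
n_records = 200
u = rng.uniform(5.0, 50.0, n_records)
th = rng.uniform(10.0, 80.0, n_records)
k = rng.uniform(20.0, 200.0, n_records)
true_coef = np.array([100.0, 8.0, 4.0, 2.0])  # assumed values for the demo
total = predict_total(true_coef, u, th, k) + rng.normal(0.0, 5.0, n_records)

coef = fit_total_count_model(u, th, k, total)
print("estimated [a0, b1, b2, b3]:", np.round(coef, 2))

# Replacement value for one record whose total channel was not recorded correctly.
replacement = predict_total(coef, u[0], th[0], k[0])
```

    Comparing freshly estimated coefficients against values from earlier, trusted flights is one way such a fit could serve as a consistency check on the acquisition system; the abstract does not say how the authors perform that check.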

  1. Using Multiple Regression in Estimating (semi) VOC Emissions and Concentrations at the European Scale

    Fauser, Patrik; Thomsen, Marianne; Pistocchi, Alberto;

    2010-01-01

    chemicals available in the European Chemicals Bureau risk assessment reports (RARs). The method suggests a simple linear relationship between Henry's Law constant, octanol-water coefficient, use and production volumes, and emissions and PECs on a regional scale in the European Union. Emissions and PECs are a result of a complex interaction between chemical properties, production and use patterns and geographical characteristics. A linear relationship cannot capture these complexities; however, it may be applied at a cost-efficient screening level for suggesting critical chemicals that are candidates for an in-depth risk assessment. Uncertainty measures are not available for the RAR data; however, uncertainties for the applied regression models are given in the paper. Evaluation of the methods reveals that between 79% and 93% of all emission and PEC estimates are within one order of magnitude of...

  2. Contextual Atlas Regression Forests: Multiple-Atlas-Based Automated Dose Prediction in Radiation Therapy.

    McIntosh, Chris; Purdie, Thomas G

    2016-04-01

    Radiation therapy is an integral part of cancer treatment, but to date it remains highly manual. Plans are created through optimization of dose volume objectives that specify intent to minimize, maximize, or achieve a prescribed dose level to clinical targets and organs. Optimization is NP-hard, requiring highly iterative and manual initialization procedures. We present a proof-of-concept for a method to automatically infer the radiation dose directly from the patient's treatment planning image based on a database of previous patients with corresponding clinical treatment plans. Our method uses regression forests augmented with density estimation over the most informative features to learn an automatic atlas-selection metric that is tailored to dose prediction. We validate our approach on 276 patients from 3 clinical treatment plan sites (whole breast, breast cavity, and prostate), with an overall dose prediction accuracies of 78.68%, 64.76%, 86.83% under the Gamma metric. PMID:26660888

  3. Multivariate Logistic Regression Analysis of the Risk Factors of Old People's Subsequent Multiple Organ Dysfunction Syndrome after Cerebral Hemorrhage

    鲍治诚; 夏雪龙; 王万华; 吴亚平

    2014-01-01

    Objective: To investigate the risk factors of cerebral hemorrhage with multiple organ dysfunction syndrome (MODS) in the elderly. Methods: A total of 80 elderly patients with cerebral hemorrhage (54 of them with MODS) treated in the Department of Neurology of the First People's Hospital of Kunshan from Feb. 2012 to Feb. 2014 were selected, and their clinical information was retrospectively analyzed. The risk factors of cerebral hemorrhage with MODS were analyzed by univariate and multivariate logistic regression analysis. Results: Univariate analysis showed that patient age, primary disease, the number of organ failures, nutritional status, immune function, and infection were associated with cerebral hemorrhage with MODS (P<0.05). Multivariate logistic regression analysis showed that primary disease, the number of organ failures, and infection were independent risk factors of cerebral hemorrhage with MODS (P<0.05). Conclusion: The main risk factors of cerebral hemorrhage with MODS in the elderly are primary disease, the number of organ failures, and infection. More attention should be given to these risk factors in clinical work, together with effective treatment, so as to effectively prevent the occurrence of MODS.

  4. Prediction of Rotor Spun Yarn Strength Using Adaptive Neuro-fuzzy Inference System and Linear Multiple Regression Methods

    NURWAHA Deogratias; WANG Xin-hou

    2008-01-01

    This paper presents a comparison study of two models for predicting the strength of rotor spun cotton yarns from fiber properties. The adaptive neuro-fuzzy inference system (ANFIS) and multiple linear regression models are used to predict the rotor spun yarn strength. Fiber properties and yarn count are used as inputs to train the two models, and the count-strength-product (CSP) is the target. The predictive performances of the two models are estimated and compared. We found that the ANFIS has better predictive power than the multiple linear regression model. The impact of each fiber property is also illustrated.

  5. A Comparison Of Artificial Neural Networks And Multiple Linear Regression Models As Predictors Of Discard Rates In Plastic Injection Molding

    Arikan Kargi, Vesile Sinem

    2015-01-01

    In today’s global competitive environment, it is important to be able to evaluate the efficient use of a firm’s resources. The aim of this study is to predict the discard rate for headlight frames before the project of an automotive sub-industry firm in Bursa. For this prediction, the multilayer perceptron model, the radial basis function network model and multiple linear regression models were used. Matlab R2010b software was used for the multilayer perceptron model and radial basis function...

  6. A comparative study between the use of artificial neural networks and multiple linear regression for caustic concentration prediction in a stage of alumina production

    Giovanni Leopoldo Rozza

    2015-09-01

    With the world becoming more of a global village each day, enterprises continuously seek to optimize their internal processes to hold or improve their competitiveness and make better use of natural resources. In this context, decision support tools are an underlying requirement. Such tools are helpful in predicting operational issues and avoiding rising costs, loss of productivity, work-related accident leave, or environmental disasters. This paper focuses on the prediction of the spent liquor caustic concentration of the Bayer process for alumina production. Measuring caustic concentration is essential to keep it at expected levels; otherwise quality issues might arise. The organization requests the caustic concentration from the chemical analysis laboratory once a day; such information is not enough to issue preventive actions to handle process inefficiencies, which will be known only after a new measurement on the next day. Therefore, this paper proposes a mathematical model, built using Multiple Linear Regression and Artificial Neural Network techniques, to predict the spent liquor's caustic concentration, so that preventive actions can occur in real time. The models were built using a software tool for numerical computation (MATLAB) and a statistical analysis software package (SPSS). The models' output (predicted caustic concentration) was compared with the real lab data. We found evidence suggesting superior results with the use of Artificial Neural Networks over the Multiple Linear Regression model. The results demonstrate that replacing laboratory analysis by the forecasting model to support technical staff in decision making could be feasible.

  7. Analysis of Filariasis Through Zero Inflated Poisson (ZIP) Regression Approach

    Mohammad Setyo Pramono; Herti Maryani; Sri Pingit Wulandari

    2014-01-01

    Background: Indonesia is an area endemic for tropical diseases, one of which is elephantiasis (filariasis). Filariasis is a filarial worm infection transmitted by mosquito bites. The Baseline Health Survey (Riskesdas) 2007 showed that the percentage of patients with filariasis in the province of Nanggroe Aceh Darussalam (NAD) was the largest in Indonesia. Methods: Secondary data analysis from Riskesdas 2007. The unit of analysis is the individual in NAD Province. Research focused on the r...

  8. Analyzing Multiple Outcomes: Is it Really Worth the use of Multivariate Linear Regression?

    Oliveira, Rosa; Teixeira-Pinto, Armando

    2015-01-01

    In health related research it is common to have multiple outcomes of interest in a single study. These outcomes are often analysed separately, ignoring the correlation between them. One would expect that a multivariate approach would be a more efficient alternative to individual analyses of each outcome. Surprisingly, this is not always the case. In this article we discuss different settings of linear models and compare the multivariate and univariate approaches. We show that for linear regre...

  9. Development of a User Interface for a Regression Analysis Software Tool

    Ulbrich, Norbert Manfred; Volden, Thomas R.

    2010-01-01

    An easy-to-use user interface was implemented in a highly automated regression analysis tool. The user interface was developed from the start to run on computers that use the Windows, Macintosh, Linux, or UNIX operating system. Many user interface features were specifically designed such that a novice or inexperienced user can apply the regression analysis tool with confidence. Therefore, the user interface's design minimizes interactive input from the user. In addition, reasonable default combinations are assigned to those analysis settings that influence the outcome of the regression analysis. These default combinations will lead to a successful regression analysis result for most experimental data sets. The user interface comes in two versions. The text user interface version is used for the ongoing development of the regression analysis tool. The official release of the regression analysis tool, on the other hand, has a graphical user interface that is more efficient to use. This graphical user interface displays all input file names, output file names, and analysis settings for a specific software application mode on a single screen which makes it easier to generate reliable analysis results and to perform input parameter studies. An object-oriented approach was used for the development of the graphical user interface. This choice keeps future software maintenance costs to a reasonable limit. Examples of both the text user interface and graphical user interface are discussed in order to illustrate the user interface's overall design approach.

  10. Multivariate Regression Analysis of Prognostic Factors in Colorectal Cancer

    YANG Zuli; WANG Jianping; WANG Lei; DONG Wenguang; HUANG Yihua; QIN Jianzhang; ZHAN Wenhua

    2003-01-01

    Objective: To evaluate the relationship between clinicopathologic features and prognosis of colorectal cancer after surgical treatment. Methods: The relationship between clinicopathological characteristics and prognosis of 941 patients with colorectal cancer after surgical treatment was investigated by univariate and multivariate analysis. Results: The overall 3- and 5-year survival rates of patients with colorectal cancer after surgical treatment were 63.2% and 60.8% respectively, with a median survival of 1841 days. Univariate analysis revealed that such factors as gross findings, degree of differentiation, depth of infiltration, nodal and distant metastasis and neoplastic intestinal obstruction were correlated with the survival rate. Dukes stages, gross tumor configuration, intramural spread and differentiation degree were shown to be available independent prognostic factors by multivariate analysis. Conclusion: Dukes stage, as the most important available independent prognostic factor for colorectal cancer (P<0.0005), can be used to assess the postoperative survival.

  11. A Noncentral "t" Regression Model for Meta-Analysis

    Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi

    2010-01-01

    In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…

  12. Performance and robustness of probabilistic river forecasts computed with quantile regression based on multiple independent variables

    Hoss, F.; Fischbeck, P. S.

    2015-09-01

    This study applies quantile regression (QR) to predict exceedance probabilities of various water levels, including flood stages, with combinations of deterministic forecasts, past forecast errors and rates of water level rise as independent variables. A computationally cheap technique to estimate forecast uncertainty is valuable, because many national flood forecasting services, such as the National Weather Service (NWS), only publish deterministic single-valued forecasts. The study uses data from the 82 river gauges, for which the NWS' North Central River Forecast Center issues forecasts daily. Archived forecasts for lead times of up to 6 days from 2001 to 2013 were analyzed. Besides the forecast itself, this study uses the rate of rise of the river stage in the last 24 and 48 h and the forecast error 24 and 48 h ago as predictors in QR configurations. When compared to just using the forecast as an independent variable, adding the latter four predictors significantly improved the forecasts, as measured by the Brier skill score and the continuous ranked probability score. Mainly, the resolution increases, as the forecast-only QR configuration already delivered high reliability. Combining the forecast with the other four predictors results in a much less favorable performance. Lastly, the forecast performance does not strongly depend on the size of the training data set but on the year, the river gauge, lead time and event threshold that are being forecast. We find that each event threshold requires a separate configuration or at least calibration.
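
    The Brier skill score named in this abstract can be sketched in a few lines. This is a minimal illustration, not the study's code: the reference forecast is assumed here to be the climatological base rate of the event, which the abstract does not specify, and the function names are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """Skill relative to a constant base-rate forecast (an assumed reference).

    1.0 is a perfect forecast; 0.0 is no better than the base rate.
    Undefined (division by zero) if the event always or never occurs.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference


# Illustrative (made-up) exceedance probabilities and observed exceedances.
forecast_probs = [0.9, 0.1, 0.7, 0.2]
observed = [1, 0, 1, 0]
print(brier_skill_score(forecast_probs, observed))
```

    A positive score means the probabilistic forecast beats the base-rate reference for that event threshold; each threshold would be scored separately.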

  13. Development of regression model for uncertainty analysis by response surface method in HANARO

    The feasibility of uncertainty analysis with a regression model in a reactor physics problem was investigated. A regression model was developed using the Response Surface Method as an alternative to the MCNP/ORIGEN2 code system, which is the uncertainty analysis tool for fission-produced molybdenum production. It was shown that developing a regression model for the reactor physics problem was possible by introducing the burnup parameter. In the regression model, the most important parameter affecting the uncertainty of the 99Mo yield ratio was fuel thickness. These results agree well with those of the Crude Monte Carlo Method for each parameter. The regression model developed in this research was shown to be suitable as an alternative model, because its coefficient of determination was 0.99.

  14. Relative accuracy of spatial predictive models for lynx Lynx canadensis derived using logistic regression-AIC, multiple criteria evaluation and Bayesian approaches

    Hejun KANG; Shelley M. ALEXANDER

    2009-01-01

    We compared probability surfaces derived using one set of environmental variables in three Geographic Information Systems (GIS)-based approaches: logistic regression and Akaike's Information Criterion (AIC), Multiple Criteria Evaluation (MCE), and Bayesian Analysis (specifically Dempster-Shafer theory). We used lynx Lynx canadensis as our focal species, and developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during winters from 1997 to 2000. The accuracy of the three spatial models was compared using a contingency table method. We determined the percentage of cases in which both presence and absence points were correctly classified (overall accuracy), the failure to predict a species where it occurred (omission error) and the prediction of presence where there was absence (commission error). Our overall accuracy showed the logistic regression approach was the most accurate (74.51%). The multiple criteria evaluation was intermediate (39.22%), while the Dempster-Shafer (D-S) theory model was the poorest (29.90%). However, omission and commission error tell us a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence in their model. We suggest that for our study area at least, the logistic regression model is optimal. However, where sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g. Dempster-Shafer) that would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans.
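
    The three error measures described in this abstract can be computed from a 2x2 contingency table roughly as follows. This is a sketch under assumptions: the function name is hypothetical, and the denominators (observed presences for omission, observed absences for commission) are one common convention that the abstract does not confirm.

```python
def accuracy_measures(tp, fn, fp, tn):
    """Return (overall accuracy, omission error, commission error).

    tp: presence points predicted as presence
    fn: presence points predicted as absence
    fp: absence points predicted as presence
    tn: absence points predicted as absence
    """
    total = tp + fn + fp + tn
    overall = (tp + tn) / total      # share of all points classified correctly
    omission = fn / (tp + fn)        # share of observed presences that were missed
    commission = fp / (fp + tn)      # share of observed absences predicted as presence
    return overall, omission, commission


# Illustrative (made-up) counts, not data from the study.
overall, omission, commission = accuracy_measures(tp=40, fn=10, fp=20, tn=30)
print(overall, omission, commission)
```

    Reporting all three numbers side by side shows the trade-off the abstract points to: a model can have the best overall accuracy while another has the lowest omission error.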

  15. Sequential Processing and the Matching-Stimulus Interval Effect in ERP Components: An Exploration of the Mechanism Using Multiple Regression

    Steiner, Genevieve Z.; Barry, Robert J.; Gonsalvez, Craig J.

    2016-01-01

    In oddball tasks, increasing the time between stimuli within a particular condition (target-to-target interval, TTI; nontarget-to-nontarget interval, NNI) systematically enhances N1, P2, and P300 event-related potential (ERP) component amplitudes. This study examined the mechanism underpinning these effects in ERP components recorded from 28 adults who completed a conventional three-tone oddball task. Bivariate correlations, partial correlations and multiple regression explored component changes due to preceding ERP component amplitudes and intervals found within the stimulus series, rather than constraining the task with experimentally constructed intervals, which has been adequately explored in prior studies. Multiple regression showed that for targets, N1 and TTI predicted N2, TTI predicted P3a and P3b, and Processing Negativity (PN), P3b, and TTI predicted reaction time. For rare nontargets, P1 predicted N1, NNI predicted N2, and N1 predicted Slow Wave (SW). Findings show that the mechanism is operating on separate stages of stimulus-processing, suggestive of either increased activation within a number of stimulus-specific pathways, or very long component generator recovery cycles. These results demonstrate the extent to which matching-stimulus intervals influence ERP component amplitudes and behavior in a three-tone oddball task, and should be taken into account when designing similar studies. PMID:27445774

  16. A Rapid Model Adaptation Technique for Emotional Speech Recognition with Style Estimation Based on Multiple-Regression HMM

    Ijima, Yusuke; Nose, Takashi; Tachibana, Makoto; Kobayashi, Takao

    In this paper, we propose a rapid model adaptation technique for emotional speech recognition which enables us to extract paralinguistic information as well as linguistic information contained in speech signals. This technique is based on style estimation and style adaptation using a multiple-regression HMM (MRHMM). In the MRHMM, the mean parameters of the output probability density function are controlled by a low-dimensional parameter vector, called a style vector, which corresponds to a set of the explanatory variables of the multiple regression. The recognition process consists of two stages. In the first stage, the style vector that represents the emotional expression category and the intensity of its expressiveness for the input speech is estimated on a sentence-by-sentence basis. Next, the acoustic models are adapted using the estimated style vector, and then standard HMM-based speech recognition is performed in the second stage. We assess the performance of the proposed technique in the recognition of simulated emotional speech uttered by both professional narrators and non-professional speakers.

  17. Parametric based thermo-environmental and exergoeconomic analyses of a combined cycle power plant with regression analysis and optimization

    Highlights: • Thermo-environmental and exergoeconomic models of a combined cycle power plant are defined. • Effects of various operating parameters on performance, CO2 emissions and costs are examined. • Multiple polynomial regression models are developed. • For various operating conditions, optimal operating parameters are determined. - Abstract: A combined cycle power plant is analyzed through thermo-environmental, exergoeconomic and statistical methods. The plant is first modeled and parametrically studied to examine the effects of various operating parameters on the thermo-environmental quantities, such as net power output, energy efficiency, exergy efficiency and CO2 emissions. These quantities are then correlated with operating parameters through multiple polynomial regression analysis. Moreover, exergoeconomic analysis is performed to look into the impact of operating parameters on fuel cost, capital cost and exergy destruction cost. The optimal operating parameters are then determined using the Nelder-Mead simplex method by defining two objective functions, namely exergy efficiency (maximized) and total cost (minimized). According to the parametric analysis, the operating parameters have significant effects on the performance and cost rates. The regression models appear to be good estimators of the response variables, as indicated by satisfactory R2 values. The optimization results show that the exergy efficiency is increased and cost rates are decreased by selecting the best trade-off values at different power output conditions.

  18. Temporal Synchronization Analysis for Improving Regression Modeling of Fecal Indicator Bacteria Levels

    Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...

  19. Understanding the causes of past nuclear plant availability performance through a multiple regression technique

    An analysis of past nuclear power plant availability performance is presented which covers the experience of 72 U.S. BWRs and PWRs currently in operation. This analysis quantitatively related availability to several design and organizational characteristics, including plant size, age, staffing levels, maintenance quality, turnover rates, and other factors. The results are presented in terms of physical (design), organizational, and external factors affecting plant performance.

  20. A review of the most relevant multiple regression models for sales forecasting in gas stations; Uma revisao dos principais modelos de regressao multipla para previsao de vendas de postos de combustiveis

    Wanke, Peter [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil). Instituto de Pesquisa e Pos-Graduacao em Administracao de Empresas (COPPEAD). Centro de Estudos em Logistica]

    2004-07-01

    In this paper, the most relevant multiple regression models for sales forecasting of gas stations, developed over the past ten years, are reviewed. The most significant variables related to gas station sales, the types of multiple regression models (linear or non-linear), and their most common uses in supporting decision making and their limits are presented. The predictive power of each model and its impact on decision-making, such as sensitivity analysis and confidence intervals for independent variables, are also discussed. Four models are presented, based on studies conducted in South Africa, Portugal and Brazil. In conclusion, suggestions for future developments are presented based on past developments. (author)

  1. Response Rate and Teaching Effectiveness in Institutional Student Evaluation of Teaching: A Multiple Linear Regression Study

    Al-Maamari, Faisal

    2015-01-01

    It is important to consider the question of whether teacher-, course-, and student-related factors affect student ratings of instructors in Student Evaluation of Teaching (SET) in English Language Teaching (ELT). This paper reports on a statistical analysis of SET in two large EFL programmes at a university setting in the Sultanate of Oman. I…

  2. Identification of Determinants of Sports Skill Level in Badminton Players Using the Multiple Regression Model

    Jaworski Janusz

    2016-03-01

    Purpose. The aim of the study was to evaluate somatic and functional determinants of sports skill level in badminton players at three consecutive stages of training. Methods. The study examined 96 badminton players aged 11 to 19 years. The scope of the study included somatic characteristics, physical abilities and neurosensory abilities. Thirty-nine variables were analysed in each athlete. Coefficients of multiple determination were used to evaluate the effect of structural and functional parameters on sports skill level in badminton players. Results. In the group of younger cadets, quality and effectiveness of playing were mostly determined by the level of physical abilities. In the group of cadets, the most important determinants were physical abilities, followed by somatic characteristics. In this group, coordination abilities were also important. In juniors, the most pronounced was a set of the variables that reflect physical abilities. Conclusions. Models of determination of sports skill level are most noticeable in the group of cadets. In all three groups of badminton players, the dominant effect on the quality of playing is due to a set of the variables that determine physical abilities.

  3. Regression analysis for a bottom-up approach to analyzing semi-prompt fission gamma yields

    Highlights: ► Fitting the semi-prompt non-resolved photon spectrum after fission. ► Energy–time dependence can be factorized. ► Physical model, statistical model, sampling procedure. ► The best fit is: lognormal for energy and F for time. - Abstract: We present an empirical model that describes the yield of gamma rays emitted by fission in the time interval from 20 to 958 ns following a fission event. The analysis is based on experimental data from neutron-induced fission of 235U and 239Pu. The model is devised by first using regression analysis to identify likely patterns in the data and to choose plausible fitting functions. We provide statistical and physical arguments in support of time and energy independence. The intensity of the emitted gamma rays can be described as a bivariate distribution that is the product of independent variates for energy and time. We test several plausible distribution families for the energy and time variates and use maximum likelihood and minimum χ2 to estimate distribution parameters. Because of the uncertainty in the experimental data, multiple combinations of variate pairs give rise to a surface that fits the observations plausibly well. The best-fit variate turns out to be lognormal in energy and F in time. The findings illustrated in this paper can be used to simulate gamma ray de-excitation from fission in Monte Carlo codes.

  4. Regression analysis in modeling of air surface temperature and factors affecting its value in Peninsular Malaysia

    Rajab, Jasim Mohammed; Jafri, Mohd. Zubir Mat; Lim, Hwee San; Abdullah, Khiruddin

    2012-10-01

    This study encompasses air surface temperature (AST) modeling in the lower atmosphere. Data on four atmospheric pollutant gases (CO, O3, CH4, and H2O), retrieved from the National Aeronautics and Space Administration Atmospheric Infrared Sounder (AIRS) from 2003 to 2008, were employed to develop a model to predict AST values in the Malaysian peninsula using the multiple regression method. For the entire period, the pollutants were highly correlated (R=0.821) with predicted AST. Comparisons among five stations in 2009 showed close agreement between the predicted AST and the observed AST from AIRS, especially in the southwest monsoon (SWM) season, within 1.3 K, and for in situ data, within 1 to 2 K. The validation results of AST with AST from AIRS showed a high correlation coefficient (R=0.845 to 0.918), indicating the model's efficiency and accuracy. Statistical analysis in terms of β showed that H2O (0.565 to 1.746) tended to contribute significantly to high AST values during the northeast monsoon season. Generally, these results clearly indicate the advantage of using the satellite AIRS data and a correlation analysis study to investigate the impact of atmospheric greenhouse gases on AST over the Malaysian peninsula. A model was developed that is capable of retrieving the Malaysian peninsula AST in all weather conditions, with total uncertainties ranging between 1 and 2 K.

  5. Analysis for Regression Model Behavior by Sampling Strategy for Annual Pollutant Load Estimation.

    Park, Youn Shik; Engel, Bernie A

    2015-11-01

    Water quality data are typically collected less frequently than streamflow data due to the cost of collection and analysis, and therefore water quality data may need to be estimated for additional days. Regression models are applicable to interpolate water quality data associated with streamflow data and have come to be extensively used, requiring relatively small amounts of data. There is a need to evaluate how well the regression models represent pollutant loads from intermittent water quality data sets. Both the specific regression model and water quality data frequency are important factors in pollutant load estimation. In this study, nine regression models from the Load Estimator (LOADEST) and one regression model from the Web-based Load Interpolation Tool (LOADIN) were evaluated with subsampled water quality data sets from daily measured water quality data sets for N, P, and sediment. Each water quality parameter had different correlations with streamflow, and the subsampled water quality data sets had various proportions of storm samples. The behaviors of the regression models differed not only by water quality parameter but also by proportion of storm samples. The regression models from LOADEST provided accurate and precise annual sediment and P load estimates using the water quality data of 20 to 40% storm samples. LOADIN provided more accurate and precise annual N load estimates than LOADEST. In addition, the results indicate that avoidance of water quality data extrapolation and availability of water quality data from storm events were crucial in annual pollutant load estimation using pollutant regression models. PMID:26641336

  6. Comparative Analysis of MOGA, NSGA-II and MOPSO for Regression Test Suite Optimization

    Zeeshan Anwar

    2014-01-01

    In Software Engineering, Regression Testing is a mandatory activity. Whenever a change in an existing system occurs and a new version appears, the unchanged portions need to be regression tested for any resulting undesirable effects. During the process of Regression Testing, the same test cases are executed repeatedly for the unmodified portion of the software. This activity is an overhead and consumes huge resources and budget. To save time and resources, researchers have proposed various techniques for Regression Test Suite Optimization. In this research, regression test suites are minimized using three Computational Intelligence multi-objective techniques for black box testing methods. These include: 1- Multi-Objective Genetic Algorithms (MOGA), 2- Non-Dominated Sorting Genetic Algorithm (NSGA-II) and 3- Multi-Objective Particle Swarm Optimization (MOPSO). The said techniques are applied to two published case studies and, through experimentation, the quality of these techniques is analyzed. Four quality metrics are defined to perform this analysis. The results of the research show that MOGA is better for reducing the size and thus execution time of the regression test suites as compared to MOPSO and NSGA-II. It was also found that the use of MOGA, NSGA-II and MOPSO is not safe for regression test suite optimization. This is because fault detection rate and requirement coverage are reduced after optimization of Regression Test Suites.

  7. Ballistic limit regression analysis for Space Station Freedom meteoroid and space debris protection system

    Jolly, William H.

    1992-01-01

    Relationships defining the ballistic limit of Space Station Freedom's (SSF) dual wall protection systems have been determined. These functions were regressed from empirical data found in Marshall Space Flight Center's (MSFC) Hypervelocity Impact Testing Summary (HITS) for the velocity range between three and seven kilometers per second. A stepwise linear least squares regression was used to determine the coefficients of several expressions that define a ballistic limit surface. Using statistical significance indicators and graphical comparisons to other limit curves, a final set of expressions is recommended for potential use in Probability of No Critical Flaw (PNCF) calculations for Space Station. The three equations listed below represent the mean curves for normal, 45 degree, and 65 degree obliquity ballistic limits, respectively, for a dual wall protection system consisting of a thin 6061-T6 aluminum bumper spaced 4.0 inches from a 0.125 inch thick 2219-T87 rear wall with multiple layer thermal insulation installed between the two walls. Normal obliquity is d(sub c) = 1.0514 v(exp 0.2983) t(sub 1)(exp 0.5228). Forty-five degree obliquity is d(sub c) = 0.8591 v(exp 0.0428) t(sub 1)(exp 0.2063). Sixty-five degree obliquity is d(sub c) = 0.2824 v(exp 0.1986) t(sub 1)(exp -0.3874). Plots of these curves are provided. A sensitivity study on the effects of using these new equations in the probability of no critical flaw analysis indicated a negligible increase in the performance of the dual wall protection system for SSF over the current baseline. The magnitude of the increase was 0.17 percent over 25 years on the MB-7 configuration run with the Bumper II program code.
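
    The three regressed curves in this abstract share the power-law form d_c = c * v^a * t_1^b, and can be evaluated with a short sketch like the one below. Several things here are assumptions rather than statements from the record: the normal-obliquity expression is read with the same form as the other two, the function and table names are hypothetical, and the units of d_c, v and t_1 are not given in the abstract, so none are implied.

```python
# Coefficients (c, a, b) for d_c = c * v**a * t1**b, copied from the abstract.
# Keys are the obliquity angles in degrees (0 = normal impact).
BALLISTIC_LIMIT_COEFFS = {
    0: (1.0514, 0.2983, 0.5228),
    45: (0.8591, 0.0428, 0.2063),
    65: (0.2824, 0.1986, -0.3874),
}


def critical_diameter(v, t1, obliquity_deg):
    """Evaluate the regressed mean ballistic limit curve.

    v:  impact velocity (the regression covers 3 to 7 km/s; values outside
        that range are extrapolations)
    t1: the t(sub 1) variable of the abstract, in its original units
    obliquity_deg: 0, 45 or 65
    """
    c, a, b = BALLISTIC_LIMIT_COEFFS[obliquity_deg]
    return c * v ** a * t1 ** b


# Compare the three curves at one illustrative point inside the fitted range.
for angle in (0, 45, 65):
    print(angle, critical_diameter(5.0, 1.0, angle))
```

    Because the fit is linear in the logarithms of the variables, a stepwise linear least squares procedure of the kind named in the abstract can estimate c, a and b from log-transformed test data.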

  8. Multiple linear analysis methods for the quantification of irreversibly binding radiotracers

    Kim, Su Jin; Lee, Jae Sung; Kim, Yu Kyeong; Frost, James; Wand, Gary; McCaul, Mary E.; Lee, Dong Soo

    2008-01-01

    Gjedde–Patlak graphical analysis (GPGA) has commonly been used to quantify the net accumulations (Kin) of radioligands that bind or are taken up irreversibly. We suggest an alternative approach (MLAIR: multiple linear analysis for irreversible radiotracers) for the quantification of these types of tracers. Two multiple linear regression model equations were derived from differential equations of the two-tissue compartment model with irreversible binding. Multiple linear analysis for irreversi...

  9. Evaluation of syngas production unit cost of bio-gasification facility using regression analysis techniques

    Deng, Yangyang; Parajuli, Prem B.

    2011-08-10

    Evaluation of economic feasibility of a bio-gasification facility needs understanding of its unit cost under different production capacities. The objective of this study was to evaluate the unit cost of syngas production at capacities from 60 through 1800 Nm3/h using an economic model with three regression analysis techniques (simple regression, reciprocal regression, and log-log regression). The preliminary result of this study showed that the reciprocal regression analysis technique had the best fit curve between per unit cost and production capacity, with sum of error squares (SES) lower than 0.001 and coefficient of determination (R2) of 0.996. The regression analysis techniques determined the minimum unit cost of syngas production for micro-scale bio-gasification facilities of $0.052/Nm3, under the capacity of 2,880 Nm3/h. The results of this study suggest that to reduce cost, facilities should run at a high production capacity. In addition, the contribution of this technique could be the new categorical criterion to evaluate micro-scale bio-gasification facility from the perspective of economic analysis.
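
    The comparison of simple, reciprocal and log-log regression by sum of error squares can be illustrated as follows. This is a hedged sketch, not the study's economic model: the helper names are hypothetical, and the capacity and cost values are synthetic, generated from a made-up reciprocal curve purely so the example runs.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def sum_sq_error(ys, preds):
    """Sum of error squares (SES) in the original cost units."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))


def compare_models(capacity, unit_cost):
    """Fit the three model forms and return the SES of each."""
    ses = {}
    # Simple regression: cost = a + b * capacity
    a, b = fit_line(capacity, unit_cost)
    ses["simple"] = sum_sq_error(unit_cost, [a + b * q for q in capacity])
    # Reciprocal regression: cost = a + b / capacity
    a, b = fit_line([1.0 / q for q in capacity], unit_cost)
    ses["reciprocal"] = sum_sq_error(unit_cost, [a + b / q for q in capacity])
    # Log-log regression: ln(cost) = a + b * ln(capacity)
    a, b = fit_line([math.log(q) for q in capacity],
                    [math.log(c) for c in unit_cost])
    ses["log-log"] = sum_sq_error(
        unit_cost, [math.exp(a) * q ** b for q in capacity])
    return ses


# Synthetic data (NOT from the study): a made-up reciprocal cost curve.
capacity = [60, 120, 300, 600, 1200, 1800]
unit_cost = [0.05 + 3.0 / q for q in capacity]
ses = compare_models(capacity, unit_cost)
print(min(ses, key=ses.get), ses)
```

    All three SES values are computed on the untransformed cost scale so that the model forms are compared on equal terms.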

  10. Regression analysis understanding and building business and economic models using Excel

    Wilson, J Holton

    2012-01-01

    The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe

  11. Regression and local control rates after radiotherapy for jugulotympanic paragangliomas: Systematic review and meta-analysis

    The primary treatment goal of radiotherapy for paragangliomas of the head and neck region (HNPGLs) is local control of the tumor, i.e. stabilization of tumor volume. Interestingly, regression of tumor volume has also been reported. Up to the present, no meta-analysis has been performed giving an overview of regression rates after radiotherapy in HNPGLs. The main objective was to perform a systematic review and meta-analysis to assess regression of tumor volume in HNPGL-patients after radiotherapy. A second outcome was local tumor control. Design of the study is systematic review and meta-analysis. PubMed, EMBASE, Web of Science, COCHRANE and Academic Search Premier and references of key articles were searched in March 2012 to identify potentially relevant studies. Considering the indolent course of HNPGLs, only studies with ⩾12 months follow-up were eligible. Main outcomes were the pooled proportions of regression and local control after radiotherapy as initial, combined (i.e. directly post-operatively or post-embolization) or salvage treatment (i.e. after initial treatment has failed) for HNPGLs. A meta-analysis was performed with an exact likelihood approach using a logistic regression with a random effect at the study level. Pooled proportions with 95% confidence intervals (CI) were reported. Fifteen studies were included, concerning a total of 283 jugulotympanic HNPGLs in 276 patients. Pooled regression proportions for initial, combined and salvage treatment were respectively 21%, 33% and 52% in radiosurgery studies and 4%, 0% and 64% in external beam radiotherapy studies. Pooled local control proportions for radiotherapy as initial, combined and salvage treatment ranged from 79% to 100%. Radiotherapy for jugulotympanic paragangliomas results in excellent local tumor control and therefore is a valuable treatment for these types of tumors. The effects of radiotherapy on regression of tumor volume remain ambiguous, although the data suggest that regression can

  12. A PANEL REGRESSION ANALYSIS OF HUMAN CAPITAL RELEVANCE IN SELECTED SCANDINAVIAN AND SE EUROPEAN COUNTRIES

    Filip Kokotovic

    2016-01-01

    The study of human capital relevance to economic growth is becoming increasingly important taking into account its relevance in many of the Sustainable Development Goals proposed by the UN. This paper conducted a panel regression analysis of selected SE European countries and Scandinavian countries using the Granger causality test and pooled panel regression. In order to test the relevance of human capital on economic growth, several human capital proxy variables were identified. ...

  13. Detrended fluctuation analysis as a regression framework: Estimating dependence at different scales

    Ladislav Kristoufek

    2014-01-01

    We propose a framework combining detrended fluctuation analysis with standard regression methodology. The method is built on detrended variances and covariances and it is designed to estimate regression parameters at different scales and under potential non-stationarity and power-law correlations. The former feature allows for distinguishing between effects for a pair of variables from different temporal perspectives. The latter ones make the method a significant improvement over the standard...

  14. Simultaneous optimization of nanocrystalline SnO2 thin film deposition using multiple linear regressions.

    Ebrahimiasl, Saeideh; Zakaria, Azmi

    2014-01-01

    A nanocrystalline SnO2 thin film was synthesized by a chemical bath method. The parameters affecting the energy band gap and surface morphology of the deposited SnO2 thin film were optimized using a semi-empirical method. Four parameters, including deposition time, pH, bath temperature and tin chloride (SnCl2·2H2O) concentration were optimized by a factorial method. The factorial used a Taguchi OA (TOA) design method to estimate certain interactions and obtain the actual responses. Statistical evidences in analysis of variance including high F-value (4,112.2 and 20.27), very low P-value (<0.012 and 0.0478), non-significant lack of fit, the determination coefficient (R2 equal to 0.978 and 0.977) and the adequate precision (170.96 and 12.57) validated the suggested model. The optima of the suggested model were verified in the laboratory and results were quite close to the predicted values, indicating that the model successfully simulated the optimum conditions of SnO2 thin film synthesis. PMID:24509767

  15. Simultaneous Optimization of Nanocrystalline SnO2 Thin Film Deposition Using Multiple Linear Regressions

    Saeideh Ebrahimiasl

    2014-02-01

    A nanocrystalline SnO2 thin film was synthesized by a chemical bath method. The parameters affecting the energy band gap and surface morphology of the deposited SnO2 thin film were optimized using a semi-empirical method. Four parameters, including deposition time, pH, bath temperature and tin chloride (SnCl2·2H2O) concentration were optimized by a factorial method. The factorial used a Taguchi OA (TOA) design method to estimate certain interactions and obtain the actual responses. Statistical evidences in analysis of variance including high F-value (4,112.2 and 20.27), very low P-value (<0.012 and 0.0478), non-significant lack of fit, the determination coefficient (R2 equal to 0.978 and 0.977) and the adequate precision (170.96 and 12.57) validated the suggested model. The optima of the suggested model were verified in the laboratory and results were quite close to the predicted values, indicating that the model successfully simulated the optimum conditions of SnO2 thin film synthesis.

  16. Seasonal forecasting of Bangladesh summer monsoon rainfall using simple multiple regression model

    Md Mizanur Rahman; M Rafiuddin; Md Mahbub Alam

    2013-04-01

    In this paper, the development of a statistical forecasting method for summer monsoon rainfall over Bangladesh is described. Predictors for Bangladesh summer monsoon (June–September) rainfall were identified from the large scale ocean–atmospheric circulation variables (i.e., sea-surface temperature, surface air temperature and sea level pressure). The predictors exhibited a significant relationship with Bangladesh summer monsoon rainfall during the period 1961–2007. After a detailed analysis of various global climate datasets, three predictors were selected. The model performance was evaluated during the period 1977–2007. The model performed well in its hindcasts of seasonal monsoon rainfall over Bangladesh. The RMSE and Heidke skill score for the 31 years were 8.13 and 0.37, respectively, and the correlation between the predicted and observed rainfall was 0.74. The BIAS of the forecasts (% of long period average, LPA) was −0.85 and the Hit score was 58%. The experimental forecasts of the 2008 summer monsoon rainfall based on the model were also found to be in good agreement with the observations.
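
    The verification scores named in this abstract (RMSE, correlation, Heidke skill score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the contingency table layout (rows are forecast categories, columns are observed categories) is an assumption, and the abstract does not say how rainfall was binned into categories.

```python
import numpy as np

def rmse(predicted, observed):
    """Root mean squared error between forecast and observed values."""
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((p - o) ** 2)))

def correlation(predicted, observed):
    """Pearson correlation between forecast and observed values."""
    return float(np.corrcoef(predicted, observed)[0, 1])

def heidke_skill_score(table):
    """Heidke skill score from a square contingency table.

    table[i, j] counts the cases forecast in category i and observed
    in category j (assumed layout).
    """
    t = np.asarray(table, dtype=float)
    n = t.sum()
    proportion_correct = np.trace(t) / n
    # Proportion expected to be correct by chance, from the marginals.
    expected = (t.sum(axis=1) @ t.sum(axis=0)) / n ** 2
    return float((proportion_correct - expected) / (1.0 - expected))
```

    A score of 1 means every forecast fell in the observed category, and a score of 0 means the forecasts were no better than chance given the marginal frequencies.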

  17. Joint Analysis of Multiple Traits in Rare Variant Association Studies.

    Wang, Zhenchuan; Wang, Xuexia; Sha, Qiuying; Zhang, Shuanglin

    2016-05-01

    The joint analysis of multiple traits has recently become popular since it can increase statistical power to detect genetic variants and there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. Currently, the majority of existing methods for the joint analysis of multiple traits test association between one common variant and multiple traits. However, the variant-by-variant methods for common variant association studies may not be optimal for rare variant association studies due to the allelic heterogeneity as well as the extreme rarity of individual variants. Current statistical methods for rare variant association studies are for one single trait only. In this paper, we propose an adaptive weighting reverse regression (AWRR) method to test association between multiple traits and rare variants in a genomic region. AWRR is robust to the directions of effects of causal variants and is also robust to the directions of association of traits. Using extensive simulation studies, we compare the performance of AWRR with canonical correlation analysis (CCA), Single-TOW, and the weighted sum reverse regression (WSRR). Our results show that, in all of the simulation scenarios, AWRR is consistently more powerful than CCA. In most scenarios, AWRR is more powerful than Single-TOW and WSRR. PMID:26990300

  18. The Effect of Ignoring Statistical Interactions in Regression Analyses Conducted in Epidemiologic Studies: An Example with Survival Analysis Using Cox Proportional Hazards Regression Model

    Vatcheva, KP; Lee, M; McCormick, JB; Rahbar, MH

    2016-01-01

    Objective: To demonstrate the adverse impact of ignoring statistical interactions in regression models used in epidemiologic studies. Study design and setting: Based on different scenarios that involved known values for the coefficient of the interaction term in Cox regression models, we generated 1000 samples of size 600 each. The simulated samples and a real-life data set from the Cameron County Hispanic Cohort were used to evaluate the effect of ignoring statistical interactions in these models. Results: Compared to correctly specified Cox regression models with interaction terms, misspecified models without interaction terms resulted in up to 8.95-fold bias in estimated regression coefficients. In contrast, when data were generated from a perfectly additive Cox proportional hazards regression model, including the interaction between the two covariates resulted in only 2% estimated bias in the main-effect regression coefficient estimates and did not alter the main finding of no significant interactions. Conclusions: When the effects are synergistic, the failure to account for an interaction effect could lead to bias and misinterpretation of the results, and in some instances to incorrect policy decisions. Best practices in regression analysis must include identification of interactions, including for analysis of data from epidemiologic studies.

  19. Multivariate regression analysis applied to the calibration of equipment used in pig meat classification in Romania.

    Savescu, Roxana Florenta; Laba, Marian

    2016-06-01

    This paper highlights the statistical methodology used in a dissection experiment carried out in Romania to calibrate and standardize two classification devices, OptiGrade PRO (OGP) and Fat-o-Meat'er (FOM). One hundred forty-five carcasses were measured using the two probes and dissected according to the European reference method. To derive prediction formulas for each device, multiple linear regression analysis was performed on the relationship between the reference lean meat percentage and the back fat and muscle thicknesses, using the ordinary least squares technique. The root mean squared error of prediction calculated using the leave-one-out cross validation met European Commission (EC) requirements. The application of the new prediction equations reduced the gap between the lean meat percentage measured with the OGP and FOM from 2.43% (average for the period Q3/2006-Q2/2008) to 0.10% (average for the period Q3/2008-Q4/2014), providing the basis for a fair payment system for the pig producers. PMID:26835835
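
    The validation step described here, ordinary least squares followed by a leave-one-out root mean squared error of prediction, can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and the two predictor columns merely stand in for back fat and muscle thickness; no data from the study are reproduced.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def loo_rmsep(X, y):
    """Root mean squared error of prediction under leave-one-out cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # drop observation i
        coef = fit_ols(X[keep], y[keep])       # refit on the remaining rows
        prediction = coef[0] + X[i] @ coef[1:]  # predict the held-out row
        errors.append(y[i] - prediction)
    return float(np.sqrt(np.mean(np.square(errors))))
```

    Each carcass is predicted by a model that never saw it, so the resulting error is a less optimistic estimate than the in-sample residual error.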

  20. Multiple factor analysis by example using R

    Pagès, Jérôme

    2014-01-01

    Multiple factor analysis (MFA) enables users to analyze tables of individuals and variables in which the variables are structured into quantitative, qualitative, or mixed groups. Written by the co-developer of this methodology, Multiple Factor Analysis by Example Using R brings together the theoretical and methodological aspects of MFA. It also includes examples of applications and details of how to implement MFA using an R package (FactoMineR). The first two chapters cover the basic factorial analysis methods of principal component analysis (PCA) and multiple correspondence analysis (MCA). The

  1. Thermo-environmental and economic analysis of simple and regenerative gas turbine cycles with regression modeling and optimization

    Highlights: • Thermodynamic models of simple and regenerative cycles are defined. • Exergy destruction rate of different components was determined. • Impact of important operating parameters on cycles’ characteristics was determined. • Multiple polynomial regression models were developed. • Optimization for optimal operating parameters was performed. Abstract: In this paper, thermo-environmental, economic and regression analyses of simple and regenerative gas turbine cycles are presented. First, thermodynamic models for both cycles are defined, the exergy destruction rate of different components is determined, and a parametric study is carried out to investigate the effects of compressor inlet temperature, turbine inlet temperature and compressor pressure ratio on the parameters that measure the cycles’ performance, environmental impact and costs. Subsequently, multiple polynomial regression (MPR) models are developed to correlate important response variables with predictor variables, and finally optimization is performed for optimal operating conditions. The results of the parametric study show a significant impact of operating parameters on the performance parameters, environmental impact and costs. According to the exergy analysis, the combustion chamber and exhaust stack are the two major sites where the largest exergy destruction/losses occur. Also, the total exergy destruction in the regenerative cycle is relatively lower, resulting in a higher exergy efficiency of the cycle. The MPR models also appear to be good estimators of the response variables, since they have very high R2 values. Finally, these models are used to determine the optimal operating parameters, which maximize the cycles’ performance and minimize CO2 emissions and costs.

  2. Distance Based Root Cause Analysis and Change Impact Analysis of Performance Regressions

    Junzan Zhou

    2015-01-01

    Performance regression testing is applied to uncover both performance and functional problems of software releases. A performance problem revealed by performance testing can be high response time, low throughput, or even being out of service. A mature performance testing process helps to detect software performance problems systematically. However, it is difficult to identify the root cause and evaluate the potential change impact. In this paper, we present an approach that leverages server-side logs to identify root causes of performance problems. First, server-side logs are used to recover the call tree of each business transaction. We define a novel distance-based metric computed from call trees for root cause analysis, and apply an inverted index from methods to business transactions for change impact analysis. Empirical studies show that our approach can effectively and efficiently help developers diagnose the root cause of performance problems.

  3. ANALYSIS OF RISING TUITION RATES IN THE UNITED STATES BASED ON CLUSTERING ANALYSIS AND REGRESSION MODELS

    Long Cheng

    2016-05-01

    Since higher education is one of the major driving forces for country development and social prosperity, and tuition plays a significant role in determining whether or not a person can afford to receive higher education, rising tuition is a topic of great concern today. It is therefore essential to understand which factors affect tuition and how they increase or decrease it. Many existing studies on rising tuition either lack large amounts of real data and proper quantitative models to support their conclusions, or focus on only a few factors that might affect tuition, and so fail to provide a comprehensive analysis. In this paper, we explore a wide variety of factors that might affect the tuition growth rate, using large amounts of authentic data and different quantitative methods such as clustering analysis and regression models.

  4. Laser-induced Breakdown spectroscopy quantitative analysis method via adaptive analytical line selection and relevance vector machine regression model

    Yang, Jianhong, E-mail: yangjianhong@me.ustb.edu.cn [School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, 100083 (China)]; Yi, Cancan; Xu, Jinwu [School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, 100083 (China)]; Ma, Xianghong [School of Engineering and Applied Science, Aston University, Birmingham B4 7ET (United Kingdom)]

    2015-05-01

    A new LIBS quantitative analysis method based on adaptive selection of analytical lines and a Relevance Vector Machine (RVM) regression model is proposed. First, a scheme for adaptively selecting analytical lines is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines that will be used as input variables of the regression model are determined adaptively according to the samples for both training and testing. Second, a LIBS quantitative analysis method based on RVM is presented. The intensities of the analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentrations are given in the form of a confidence interval of a probabilistic distribution, which is helpful for evaluating the uncertainty contained in the measured spectra. Chromium concentration analysis experiments on 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experimental results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine. - Highlights: • Both training and testing samples are considered for analytical lines selection. • The analytical lines are auto-selected based on the built-in characteristics of spectral lines. • The new method can achieve better prediction accuracy and modeling robustness. • Model predictions are given with confidence interval of probabilistic distribution.

  5. Laser-induced Breakdown spectroscopy quantitative analysis method via adaptive analytical line selection and relevance vector machine regression model

    A new LIBS quantitative analysis method based on adaptive selection of analytical lines and a Relevance Vector Machine (RVM) regression model is proposed. First, a scheme for adaptively selecting analytical lines is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines that will be used as input variables of the regression model are determined adaptively according to the samples for both training and testing. Second, a LIBS quantitative analysis method based on RVM is presented. The intensities of the analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentrations are given in the form of a confidence interval of a probabilistic distribution, which is helpful for evaluating the uncertainty contained in the measured spectra. Chromium concentration analysis experiments on 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experimental results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine. - Highlights: • Both training and testing samples are considered for analytical lines selection. • The analytical lines are auto-selected based on the built-in characteristics of spectral lines. • The new method can achieve better prediction accuracy and modeling robustness. • Model predictions are given with confidence interval of probabilistic distribution.

  6. Analysis and application of partial least square regression in arc welding process

    YANG Hai-lan; CAI Yan; BAO Ye-feng; ZHOU Yun

    2005-01-01

    Because of the correlation among the parameters, partial least squares regression (PLSR) was applied to build the model and obtain the regression equation. The improved algorithm greatly simplified the calculation process by reducing the amount of computation. An orthogonal design was adopted in this experiment. Every sample was strongly representative, which reduced the experimental time while still providing the overall test data. For the weld formation problem of high-current gas metal arc welding, the auxiliary analysis technique of PLSR was discussed, and the regression equation relating form factors (i.e. surface width, weld penetration and weld reinforcement) to process parameters (i.e. wire feed rate, wire extension, welding speed, gas flow, welding voltage and welding current) was given. The correlation structure among variables was analyzed, and a certain correlation was found between the independent variable matrix X and the dependent variable matrix Y. The regression analysis shows that the welding speed mainly influences the weld formation, while variation of the gas flow within a certain range has little influence on the formation of the weld. The fitting plot of regression accuracy is given. The fitting quality of the regression equation is basically satisfactory.

  7. Bayesian Analysis of Hazard Regression Models under Order Restrictions on Covariate Effects and Ageing

    Bhattacharjee, Arnab; Bhattacharjee, Madhuchhanda

    2007-01-01

    We propose Bayesian inference in hazard regression models where the baseline hazard is unknown, covariate effects are possibly age-varying (non-proportional), and there is multiplicative frailty with arbitrary distribution. Our framework incorporates a wide variety of order restrictions on covariate dependence and duration dependence (ageing). We propose estimation and evaluation of age-varying covariate effects when covariate dependence is monotone rather than proportional. In particular, we...

  8. An Analysis of Transit Bus Driver Distraction Using Multinomial Logistic Regression Models

    D'Souza, Kelwyn

    2012-01-01

    This paper explores the problem of distracted driving at a regional bus transit agency to identify the sources of distraction and provide an understanding of factors responsible for driver distraction. A risk range system was developed to classify the distracting activities into four risk zones. The high risk zone distracting activities were analyzed using multinomial logistic regression models to determine the impact of various factors on the multiple categorical levels of driver distraction...

  9. Econometric analysis of realized covariation: high frequency based covariance, regression, and correlation in financial economics

    Barndorff-Nielsen, Ole Eiler; Shephard, N.

    2004-01-01

    This paper analyses multivariate high frequency financial data using realized covariation. We provide a new asymptotic distribution theory for standard methods such as regression, correlation analysis, and covariance. It will be based on a fixed interval of time (e.g., a day or week), allowing the number of high frequency returns during this period to go to infinity. Our analysis allows us to study how high frequency correlations, regressions, and covariances change through time. In particular we provide confidence intervals for each of these quantities.

  10. Logistic Regression

    Grégoire, G.

    2014-12-01

    Logistic regression was originally intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via the odds and the odds ratio, which are presented in the introduction of the chapter. When the observations are obtained individually, we speak of binary logistic regression; when they are grouped, the logistic regression is said to be binomial. In our presentation we mainly focus on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihood ratio results and their use to compare nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing global effects, testing individual effects, selecting variables to build a model, measuring the fit of the model, and predicting new values. The methods are demonstrated on data sets using R. Finally we briefly consider the binomial case and the situation where we are interested in several events, that is the polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
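
    The link between a fitted coefficient and the odds ratio can be illustrated with a small maximum likelihood fit. The chapter itself works in R; the sketch below uses Python with NumPy instead, fits the binary model by Newton-Raphson, and uses a hypothetical function name and made-up counts. It is an assumption-laden illustration, not the chapter's code.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Binary logistic regression by Newton-Raphson maximum likelihood.

    Returns [intercept, slope_1, ...]. Assumes the data are not perfectly
    separated, so the likelihood has a finite maximum.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))      # fitted probabilities
        gradient = A.T @ (y - p)                 # score vector
        hessian = A.T @ (A * (p * (1 - p))[:, None])  # information matrix
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

# Made-up 2x2 example: 30 of 40 exposed and 10 of 40 unexposed have the event.
exposure = np.array([1] * 40 + [0] * 40)
event = np.array([1] * 30 + [0] * 10 + [1] * 10 + [0] * 30)
beta = fit_logistic(exposure, event)
odds_ratio = float(np.exp(beta[1]))  # exponentiated slope is the odds ratio
```

    With a single binary covariate, the exponentiated slope equals the cross-product ratio of the 2x2 table, here (30 × 30) / (10 × 10).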

  11. Financing demand forecast for the Jiangxi Fengcheng industrial park based on multiple regression and a grey forecasting model

    孙伟; 林芳琦

    2012-01-01

    To study the industrial capital requirement of the Fengcheng industrial park for 2012 to 2016, the authors apply the Cobb-Douglas production function, taking industrial added value, employment and fixed asset investment from 2005 to 2011 as the basic data, and build a forecasting model using multiple regression in SPSS. The Grey Forecasting System 2008 is then used to forecast industrial added value and employment with a grey forecasting model, from which the capital requirement for 2012 to 2016 is forecast and feasible policy suggestions are put forward.

  12. Land use regression modeling of intra-urban residential variability in multiple traffic-related air pollutants

    Baxter Lisa K

    2008-05-01

    Background: There is a growing body of literature linking GIS-based measures of traffic density to asthma and other respiratory outcomes. However, no consensus exists on which traffic indicators best capture variability in different pollutants or within different settings. As part of a study on childhood asthma etiology, we examined variability in outdoor concentrations of multiple traffic-related air pollutants within urban communities, using a range of GIS-based predictors and land use regression techniques. Methods: We measured fine particulate matter (PM2.5), nitrogen dioxide (NO2), and elemental carbon (EC) outside 44 homes representing a range of traffic densities and neighborhoods across Boston, Massachusetts and nearby communities. Multiple three to four-day average samples were collected at each home during winters and summers from 2003 to 2005. Traffic indicators were derived using Massachusetts Highway Department data and direct traffic counts. Multivariate regression analyses were performed separately for each pollutant, using traffic indicators, land use, meteorology, site characteristics, and central site concentrations. Results: PM2.5 was strongly associated with the central site monitor (R2 = 0.68). Additional variability was explained by total roadway length within 100 m of the home, smoking or grilling near the monitor, and block-group population density (R2 = 0.76). EC showed greater spatial variability, especially during winter months, and was predicted by roadway length within 200 m of the home. The influence of traffic was greater under low wind speed conditions, and concentrations were lower during summer (R2 = 0.52). NO2 showed significant spatial variability, predicted by population density and roadway length within 50 m of the home, modified by site characteristics (obstruction), and with higher concentrations during summer (R2 = 0.56). Conclusion: Each pollutant examined displayed somewhat different spatial patterns

  13. A Bayesian ridge regression analysis of congestion's impact on urban expressway safety.

    Shi, Qi; Abdel-Aty, Mohamed; Lee, Jaeyoung

    2016-03-01

    With the rapid growth of traffic in urban areas, concerns about congestion and traffic safety have been heightened. This study leveraged both Automatic Vehicle Identification (AVI) system and Microwave Vehicle Detection System (MVDS) installed on an expressway in Central Florida to explore how congestion impacts the crash occurrence in urban areas. Multiple congestion measures from the two systems were developed. To ensure more precise estimates of the congestion's effects, the traffic data were aggregated into peak and non-peak hours. Multicollinearity among traffic parameters was examined. The results showed the presence of multicollinearity especially during peak hours. As a response, ridge regression was introduced to cope with this issue. Poisson models with uncorrelated random effects, correlated random effects, and both correlated random effects and random parameters were constructed within the Bayesian framework. It was proven that correlated random effects could significantly enhance model performance. The random parameters model has similar goodness-of-fit compared with the model with only correlated random effects. However, by accounting for the unobserved heterogeneity, more variables were found to be significantly related to crash frequency. The models indicated that congestion increased crash frequency during peak hours while during non-peak hours it was not a major crash contributing factor. Using the random parameter model, the three congestion measures were compared. It was found that all congestion indicators had similar effects while Congestion Index (CI) derived from MVDS data was a better congestion indicator for safety analysis. Also, analyses showed that the segments with higher congestion intensity could not only increase property damage only (PDO) crashes, but also more severe crashes. In addition, the issues regarding the necessity to incorporate specific congestion indicator for congestion's effects on safety and to take care of the
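
    The study applies ridge regression inside Bayesian Poisson crash-frequency models. As a much simpler stand-in, the sketch below shows only the ordinary linear ridge estimator, to illustrate how the penalty stabilizes coefficients when predictors are nearly collinear. The function name, the penalty value and the data are hypothetical, and this is not the authors' model.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Linear ridge estimator on centered data: (X'X + lam*I)^-1 X'y.

    Centering removes the intercept so that it is not penalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    penalty = lam * np.eye(X.shape[1])
    return np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ yc)

# Two nearly collinear predictors (made-up numbers).
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
unpenalized = ridge_coefficients(X, y, 0.0)
shrunk = ridge_coefficients(X, y, 10.0)
```

    Increasing `lam` shrinks the coefficient vector toward zero, trading a little bias for lower variance when the predictors carry overlapping information.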

  14. Family Background Variables as Instruments for Education in Income Regressions: A Bayesian Analysis

    Hoogerheide, Lennart; Block, Joern H.; Thurik, Roy

    2012-01-01

    The validity of family background variables instrumenting education in income regressions has been much criticized. In this paper, we use data from the 2004 German Socio-Economic Panel and Bayesian analysis to analyze to what degree violations of the strict validity assumption affect the estimation results. We show that, in case of moderate direct…

  15. The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis

    Czekaj, Tomasz Gerard

    practically and politically relevant problems and to illustrate how nonparametric regression methods can be used in applied microeconomic production analysis both in panel data and cross-section data settings. The thesis consists of four papers. The first paper addresses problems of parametric and...

  16. Catching up with Harvard: Results from Regression Analysis of World Universities League Tables

    Li, Mei; Shankar, Sriram; Tang, Kam Ki

    2011-01-01

    This paper uses regression analysis to test if the universities performing less well according to Shanghai Jiao Tong University's world universities league tables are able to catch up with the top performers, and to identify national and institutional factors that could affect this catching up process. We have constructed a dataset of 461…

  17. Isolating the Effects of Training Using Simple Regression Analysis: An Example of the Procedure.

    Waugh, C. Keith

    This paper provides a case example of simple regression analysis, a forecasting procedure used to isolate the effects of training from an identified extraneous variable. This case example focuses on results of a three-day sales training program to improve bank loan officers' knowledge, skill-level, and attitude regarding solicitation and sale of…

  18. The use of artificial neural networks and multiple linear regression to predict rate of medical waste generation

    Predicting the amount of hospital waste produced is helpful for the storage, transportation and disposal stages of hospital waste management. Based on this fact, two predictor models, artificial neural networks (ANNs) and multiple linear regression (MLR), were applied to predict the rate of medical waste generation, both in total and for the separate types of sharp, infectious and general waste. In this study, a 5-fold cross-validation procedure on a database containing a total of 50 hospitals of Fars province (Iran) was used to verify the performance of the models. Three performance measures, MAR, RMSE and R2, were used to evaluate the performance of the models. The MLR, as a conventional model, obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as the more significant parameters. On the other hand, ANNs, a more powerful model that had not previously been introduced for predicting the rate of medical waste generation, showed high performance measure values, especially an R2 value of 0.99, confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving, which provides the opportunity for relating independent variables to dependent ones non-linearly. In conclusion, the obtained results showed that our ANN-based model approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in future.

  19. A parallel implementation of the network identification by multiple regression (NIR) algorithm to reverse-engineer regulatory gene networks.

    Francesco Gregoretti

    The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be very computationally intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than the other ready-to-use reverse engineering software. However it cannot be used with large networks with thousands of nodes--as is the case in biological networks--due to the high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm reaches a very good accuracy even for large gene networks, improving our understanding of the gene regulatory networks that is crucial for a wide range of biomedical applications.

  20. 2D Quantitative Structure-Property Relationship Study of Mycotoxins by Multiple Linear Regression and Support Vector Machine

    Fereshteh Shiri

    2010-08-01

    In the present work, support vector machines (SVMs) and multiple linear regression (MLR) techniques were used for quantitative structure–property relationship (QSPR) studies of retention time (tR) in standardized liquid chromatography–UV–mass spectrometry of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins) based on molecular descriptors calculated from the optimized 3D structures. By applying missing value, zero and multicollinearity tests with a cutoff value of 0.95, and a genetic algorithm method of variable selection, the most relevant descriptors were selected to build QSPR models. MLR and SVM methods were employed to build the QSPR models. The robustness of the QSPR models was characterized by statistical validation and the applicability domain (AD). The prediction results from the MLR and SVM models are in good agreement with the experimental values. The correlation and predictability measured by r2 and q2 are 0.931 and 0.932, respectively, for SVM and 0.923 and 0.915, respectively, for MLR. The applicability domain of the model was investigated using a Williams plot. The effects of different descriptors on the retention times are described.

  1. Ranking contributing areas of salt and selenium in the Lower Gunnison River Basin, Colorado, using multiple linear regression models

    Linard, Joshua I.

    2013-01-01

    Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.
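
    The two fit statistics reported for each model, adjusted r-squared and residual standard error, follow from standard formulas. The sketch below is illustrative only: the function names are hypothetical, `p` denotes the number of predictors excluding the intercept, and no values from the report are recomputed.

```python
import math

def adjusted_r2(r2, n, p):
    """Adjusted r-squared for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def residual_standard_error(residuals, p):
    """Square root of the residual sum of squares over n - p - 1 degrees of freedom."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return math.sqrt(rss / (n - p - 1))
```

    The adjustment penalizes each added predictor, which matters when many candidate geospatial variables are screened, as in an all-possible-combinations search.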

  2. A parallel implementation of the network identification by multiple regression (NIR) algorithm to reverse-engineer regulatory gene networks.

    Gregoretti, Francesco; Belcastro, Vincenzo; di Bernardo, Diego; Oliva, Gennaro

    2010-01-01

    The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be very computationally intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than the other ready-to-use reverse engineering software. However, it cannot be used with large networks with thousands of nodes--as is the case in biological networks--due to the high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm reaches a very good accuracy even for large gene networks, improving our understanding of the gene regulatory networks that is crucial for a wide range of biomedical applications. PMID:20422008

  3. Fast algorithm of the robust Gaussian regression filter for areal surface analysis

    In this paper, the general model of the Gaussian regression filter for areal surface analysis is explored. The intrinsic relationships between the linear Gaussian filter and the robust filter are addressed. A general mathematical solution for this model is presented. Based on this technique, a fast algorithm is created. Both simulated and practical engineering data (stochastic and structured) have been used in the testing of the fast algorithm. Results show that with the same accuracy, the processing time of the second-order nonlinear regression filters for a dataset of 1024 × 1024 points has been reduced to several seconds from the several hours of traditional algorithms.

  4. Analysis of designed experiments by stabilised PLS Regression and jack-knifing

    Martens, Harald; Høy, M.; Westad, F.;

    2001-01-01

    Pragmatical, visually oriented methods for assessing and optimising bi-linear regression models are described, and applied to PLS Regression (PLSR) analysis of multi-response data from controlled experiments. The paper outlines some ways to stabilise the PLSR method to extend its range of...... reliability of the linear and bi-linear model parameter estimates. The paper illustrates how the obtained PLSR "significance" probabilities are similar to those from conventional factorial ANOVA, but the PLSR is shown to give important additional overview plots of the main relevant structures in the multi...

  5. Regression analysis of non-contact acousto-thermal signature data

    Criner, Amanda; Schehl, Norman

    2016-05-01

    The non-contact acousto-thermal signature (NCATS) is a nondestructive evaluation technique with potential to detect fatigue in materials such as noisy titanium and polymer matrix composites. The underlying physical mechanisms and properties may be determined by parameter estimation via nonlinear regression. The nonlinear regression analysis formulation, including the underlying models, is discussed. Several models and associated data analyses are given along with the assumptions implicit in the underlying model. The results are anomalous. These anomalous results are evaluated with respect to the accuracy of the implicit assumptions.

  6. Methods and applications of linear models regression and the analysis of variance

    Hocking, Ronald R

    2013-01-01

    Praise for the Second Edition: "An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics. A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book

  7. Assessment of neural network, frequency ratio and regression models for landslide susceptibility analysis

    Pradhan, B.; Buchroithner, M. F.; Mansor, S.

    2009-04-01

    This paper presents the assessment results of three spatially based probabilistic models using Geoinformation Techniques (GIT) for landslide susceptibility analysis at Penang Island in Malaysia. Landslide locations within the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. Maps of the topography, soil type, lineaments and land cover were constructed from the spatial data sets. Nine landslide-related factors were extracted from the spatial database, and the neural network, frequency ratio and logistic regression coefficients of each factor were computed. Landslide susceptibility maps were drawn for the study area using neural network, frequency ratio and logistic regression models. For verification, the results of the analyses were compared with actual landslide locations in the study area. The verification results show that the frequency ratio model provides higher prediction accuracy than the ANN and regression models.

  8. Analysis of the Influence of Quantile Regression Model on Mainland Tourists’ Service Satisfaction Performance

    Wen-Cheng Wang

    2014-01-01

    It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
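
    As a hedged illustration of what fitting "under different quantiles" means, the sketch below uses the check (pinball) loss that quantile regression minimises. The scores are made-up numbers, not data from the study, and the model is reduced to a single constant so that the link to sample quantiles is visible.

```python
def pinball_loss(tau, observed, predicted):
    """Average check (pinball) loss at quantile level tau."""
    total = 0.0
    for y, y_hat in zip(observed, predicted):
        residual = y - y_hat
        # Under-predictions are weighted by tau, over-predictions by 1 - tau.
        total += tau * residual if residual >= 0 else (tau - 1.0) * residual
    return total / len(observed)


def best_constant(tau, observed):
    """Pick the observed value that minimises the pinball loss as a constant fit."""
    return min(observed, key=lambda c: pinball_loss(tau, observed, [c] * len(observed)))


# Hypothetical scores with one extreme value (illustration only).
scores = [1, 2, 3, 4, 100]
for tau in (0.25, 0.5, 0.75):
    print(tau, best_constant(tau, scores))
```

    With tau = 0.5 the minimiser is the median, which is why quantile fits are less sensitive to extreme responses than a least-squares fit.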

  9. Quantile Regression Analysis on Convergence of China’s Regional Economic Growth

    Kun; HE

    2014-01-01

    Using the quantile regression method, this paper made an empirical analysis of the convergence of China’s regional economic growth since the reform and opening-up. It first introduced the principle of the quantile regression method and related theories of convergence of economic growth. By discussing the interprovincial variation coefficient of GDP per capita, it carried out σ convergence analysis on economic growth and divided the 3 decades since the reform and opening-up into 3 stages. Then, it made a comparative analysis of absolute β convergence in the 3 stages using least-squares estimation and the quantile regression method, and also stressed the advantage of the quantile regression method. On this basis, it made an in-depth study of conditional β convergence at the 3 stages. Empirical results indicate that there is absolute and conditional convergence at the first stage, no convergence at the second stage, and weak convergence at the third stage. Finally, it discussed weak points in this study and came up with recommendations for future studies.
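
    The two convergence notions named in this abstract can be sketched as follows. This is a minimal illustration under textbook definitions (σ convergence as a falling coefficient of variation, absolute β convergence as a negative slope of growth on log initial income), fitted here by ordinary least squares rather than quantile regression; the regional figures are invented.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (sigma-convergence index)."""
    return statistics.pstdev(values) / statistics.fmean(values)


def beta_convergence_slope(initial, final, years):
    """OLS slope of average annual log growth on log initial income.

    A negative slope is the usual sign of absolute beta convergence.
    """
    x = [math.log(v) for v in initial]
    g = [(math.log(f) - math.log(i)) / years for i, f in zip(initial, final)]
    mx, mg = statistics.fmean(x), statistics.fmean(g)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxg = sum((xi - mx) * (gi - mg) for xi, gi in zip(x, g))
    return sxg / sxx


# Hypothetical GDP per capita for four regions at the start and end of a decade.
gdp_start = [1.0, 2.0, 4.0, 8.0]
gdp_end = [2.0, 3.0, 5.0, 9.0]
print(coefficient_of_variation(gdp_start), coefficient_of_variation(gdp_end))
print(beta_convergence_slope(gdp_start, gdp_end, years=10))
```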

  10. Informal Housing in Greece: A Multinomial Logistic Regression Analysis at the Regional Level

    Polyzos, Serafeim; MINETOS, Dionysios

    2014-01-01

    This paper deals with the primary causes of informal housing in Greece as well as the observed differentiations in informal housing patterns across space. The spatial level of analysis is the prefectural administrative level. The results of the multinomial logistic regression analysis indicate that Greek prefectures differ in the way they experience the informal housing phenomenon. An explanation for the observed differences may be the separate development paths followed and the ...

  11. Forecasting Model for IPTV Service in Korea Using Bootstrap Ridge Regression Analysis

    Lee, Byoung Chul; Kee, Seho; Kim, Jae Bum; Kim, Yun Bae

    The telecom firms in Korea are taking a new step to prepare for the next generation of convergence services, IPTV. In this paper we describe our analysis of an effective method for demand forecasting for IPTV broadcasting. We tried 3 types of scenarios based on some aspects of the IPTV potential market and compared the results. The forecasting method used in this paper is the multi-generation substitution model with bootstrap ridge regression analysis.

  12. Understanding child stunting in India: a comprehensive analysis of socio-economic, nutritional and environmental determinants using additive quantile regression.

    Nora Fenske

    BACKGROUND: Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. OBJECTIVE: We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. DESIGN: Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. RESULTS: At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. CONCLUSIONS: Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.

  13. Regression analysis to predict growth performance from dietary net energy in growing-finishing pigs.

    Nitikanchana, S; Dritz, S S; Tokach, M D; DeRouchey, J M; Goodband, R D; White, B J

    2015-06-01

    Data from 41 trials with multiple energy levels (285 observations) were used in a meta-analysis to predict growth performance based on dietary NE concentration. Nutrient and energy concentrations in all diets were estimated using the NRC ingredient library. Predictor variables examined for best fit models using Akaike information criteria included linear and quadratic terms of NE, BW, CP, standardized ileal digestible (SID) Lys, crude fiber, NDF, ADF, fat, ash, and their interactions. The initial best fit models included interactions between NE and CP or SID Lys. After removal of the observations that fed SID Lys below the suggested requirement, these terms were no longer significant. Including dietary fat in the model with NE and BW significantly improved the G:F prediction model, indicating that NE may underestimate the influence of fat on G:F. The meta-analysis indicated that, as long as diets are adequate for other nutrients (i.e., Lys), dietary NE is adequate to predict changes in ADG across different dietary ingredients and conditions. The analysis indicates that ADG increases with increasing dietary NE and BW but decreases when BW is above 87 kg. The G:F ratio improves with increasing dietary NE and fat but decreases with increasing BW. The regression equations were then evaluated by comparing the actual and predicted performance of 543 finishing pigs in 2 trials fed 5 dietary treatments, included 3 different levels of NE by adding wheat middlings, soybean hulls, dried distillers grains with solubles (DDGS; 8 to 9% oil), or choice white grease (CWG) to a corn-soybean meal-based diet. Diets were 1) 30% DDGS, 20% wheat middlings, and 4 to 5% soybean hulls (low energy); 2) 20% wheat middlings and 4 to 5% soybean hulls (low energy); 3) a corn-soybean meal diet (medium energy); 4) diet 2 supplemented with 3.7% CWG to equalize the NE level to diet 3 (medium energy); and 5) a corn-soybean meal diet with 3.7% CWG (high energy). 
Only small differences were observed

  14. Linear regression analysis of the gamma dose in fast neutron beams

    The dual dosimeter technique for determining both the absorbed dose of neutrons and photons in a mixed field has been applied to multiple dosimeter use. The data were analyzed by a linear regression method, which yields the neutron dose from the slope and the photon dose from the intercept; an estimate of the uncertainty of the photon dose can also be obtained. Measurements were made on a high energy neutron beam, and the photon dose was obtained both as a function of field size and depth in a tissue equivalent phantom.
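
    A minimal sketch of the slope-and-intercept idea, assuming each dosimeter reading follows R_i = k_i * D_n + D_g, where k_i is the dosimeter's relative neutron sensitivity and the photon sensitivity is taken as 1. That response model, the symbol names and the numbers below are assumptions made for illustration; the abstract does not give them.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, standard error of the intercept).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return slope, intercept, se_intercept


# Hypothetical relative neutron sensitivities and readings for five dosimeters.
k = [0.1, 0.3, 0.5, 0.8, 1.0]
readings = [0.46, 0.84, 1.26, 1.84, 2.26]
neutron_dose, photon_dose, photon_dose_se = fit_line(k, readings)
print(neutron_dose, photon_dose, photon_dose_se)
```

    Using more than two dosimeters is what makes the uncertainty estimate possible: with only two points the line fits exactly and no residual variance is left to estimate.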

  15. Applying support vector regression analysis on grip force level-related corticomuscular coherence

    Rong, Yao; Han, Xixuan; Hao, Dongmei; Cao, Liu; Wang, Qing; Li, Mingai; Duan, Lijuan; Zeng, Yanjun

    2014-01-01

    accessory muscle, this study proposed an expanded support vector regression (ESVR) algorithm to quantify the coherence between electroencephalogram (EEG) from sensorimotor cortex and surface electromyogram (EMG) from brachioradialis in upper limb. A measure called coherence proportion was introduced to...... compare the corticomuscular coherence in the alpha (7–15Hz), beta (15–30Hz) and gamma (30–45Hz) band at 25 % maximum grip force (MGF) and 75 % MGF. Results show that ESVR could reduce the influence of deflected signals and summarize the overall behavior of multiple coherence curves. Coherence proportion...

  16. Groping Toward Linear Regression Analysis: Newton's Analysis of Hipparchus' Equinox Observations

    Belenkiy, Ari

    2008-01-01

    In 1700, Newton, in designing a new universal calendar contained in the manuscripts known as Yahuda MS 24 from Jewish National and University Library at Jerusalem and analyzed in our recent article in Notes & Records Royal Society (59 (3), Sept 2005, pp. 223-54), attempted to compute the length of the tropical year using the ancient equinox observations reported by a famous Greek astronomer Hipparchus of Rhodes, ten in number. Though Newton had a very thin sample of data, he obtained a tropical year only a few seconds longer than the correct length. The reason lies in Newton's application of a technique similar to modern regression analysis. Actually he wrote down the first of the two so-called "normal equations" known from the Ordinary Least Squares method. Newton also had a vague understanding of qualitative variables. This paper concludes by discussing open historico-astronomical problems related to the inclination of the Earth's axis of rotation. In particular, ignorance about the long-range variation...

  17. Predicting Distribution and Inter-Annual Variability of Tropical Cyclone Intensity from a Stochastic, Multiple-Linear Regression Model

    Lee, C. Y.; Tippett, M. K.; Sobel, A. H.; Camargo, S. J.

    2014-12-01

    We are working towards the development of a new statistical-dynamical downscaling system to study the influence of climate on tropical cyclones (TCs). The first step is development of an appropriate model for TC intensity as a function of environmental variables. We approach this issue with a stochastic model consisting of a multiple linear regression model (MLR) for 12-hour intensity forecasts as a deterministic component, and a random error generator as a stochastic component. Similar to the operational Statistical Hurricane Intensity Prediction Scheme (SHIPS), MLR relates the surrounding environment to storm intensity, but with only essential predictors calculated from monthly-mean NCEP reanalysis fields (potential intensity, shear, etc.) and from persistence. The deterministic MLR is developed with data from 1981-1999 and tested with data from 2000-2012 for the Atlantic, Eastern North Pacific, Western North Pacific, Indian Ocean, and Southern Hemisphere basins. While the global MLR's skill is comparable to that of the operational statistical models (e.g., SHIPS), the distribution of the predicted maximum intensity from deterministic results has a systematic low bias compared to observations; the deterministic MLR creates almost no storms with intensities greater than 100 kt. The deterministic MLR can be significantly improved by adding the stochastic component, based on the distribution of random forecasting errors from the deterministic model compared to the training data. This stochastic component may be thought of as representing the component of TC intensification that is not linearly related to the environmental variables. We find that in order for the stochastic model to accurately capture the observed distribution of maximum storm intensities, the stochastic component must be auto-correlated across 12-hour time steps. 
This presentation also includes a detailed discussion of the distributions of other TC-intensity related quantities, as well as the inter
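
    The auto-correlated stochastic component described above can be illustrated with a first-order autoregressive, AR(1), error process. This is only a sketch of the general idea: the AR(1) form, the correlation phi and the spread sigma are assumptions chosen for the example, not values or a formulation taken from the presentation.

```python
import random


def ar1_errors(n_steps, phi, sigma, rng):
    """Generate AR(1) errors with lag-1 correlation phi and stationary std sigma."""
    innovation_sd = sigma * (1.0 - phi ** 2) ** 0.5
    errors = [rng.gauss(0.0, sigma)]
    for _ in range(n_steps - 1):
        # Each error carries over a fraction phi of the previous one.
        errors.append(phi * errors[-1] + rng.gauss(0.0, innovation_sd))
    return errors


def lag1_autocorrelation(series):
    """Sample correlation between consecutive elements of a series."""
    mean = sum(series) / len(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    den = sum((a - mean) ** 2 for a in series)
    return num / den


rng = random.Random(0)
# phi and sigma are hypothetical; each step stands for one 12-hour forecast.
errors = ar1_errors(5000, phi=0.7, sigma=5.0, rng=rng)
print(round(lag1_autocorrelation(errors), 2))
```

    In a simulation, one such error would be added to the deterministic MLR forecast at each step, so that a run of unusually strong intensification can persist across consecutive steps instead of averaging out.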

  18. Partially linear censored quantile regression

    Neocleous, T.; Portnoy, S.

    2009-01-01

    Censored regression quantile (CRQ) methods provide a powerful and flexible approach to the analysis of censored survival data when standard linear models are felt to be appropriate. In many cases however, greater flexibility is desired to go beyond the usual multiple regression paradigm. One area of common interest is that of partially linear models: one (or more) of the explanatory covariates are assumed to act on the response through a non-linear function. Here the CRQ approach of Portnoy (...

  19. A deformation analysis method of stepwise regression for bridge deflection prediction

    Shen, Yueqian; Zeng, Ying; Zhu, Lei; Huang, Teng

    2015-12-01

    Large-scale bridges are among the most important infrastructures, and their safe condition concerns people's daily activities and life safety. Monitoring of large-scale bridges is crucial since deformation might have occurred. How to obtain the deformation information and then judge the safe conditions are the key and difficult problems in the bridge deformation monitoring field. Deflection is an important index for evaluation of bridge safety. This paper proposes a forecasting model based on stepwise regression analysis. Based on the deflection monitoring data of Yangtze River Bridge, the main factors influencing deflection deformation are studied. The authors use the monitoring data to forecast the deformation value of a bridge deflection at different times from the perspective of non-bridge structure, and compare it to the forecasting of gray relational analysis based on linear regression. The results show that the accuracy and reliability of stepwise regression analysis are high, which provides a scientific basis for bridge operation management. Above all, the ideas of this research provide an effective method for bridge deformation analysis.

  20. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis

    Lo, Benjamin W. Y.; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R. Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A. H.

    2016-01-01

    Background: Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). Methods: The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Results: Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56–2.45, P decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH. PMID:27512607
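
    Recursive partitioning, as described in the Background, repeatedly looks for the split that makes the resulting subgroups most homogeneous. The sketch below shows one such step on a single variable using Gini impurity; the choice of Gini as the criterion and the toy ages and outcomes are assumptions for illustration and are not drawn from the Tirilazad database.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))  # no split unless impurity improves
    for (lo, _), (hi, _) in zip(pairs, pairs[1:]):
        if lo == hi:
            continue
        threshold = (lo + hi) / 2.0  # midpoint between neighbouring values
        left = [lab for val, lab in pairs if val <= threshold]
        right = [lab for val, lab in pairs if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical ages and dichotomized outcomes (1 = poor outcome).
ages = [35, 42, 51, 58, 66, 74]
poor_outcome = [0, 0, 0, 1, 1, 1]
print(best_split(ages, poor_outcome))
```

    A full tree would apply the same search to every candidate variable and then recurse into each subgroup until a stopping rule is met.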

  1. Regressão múltipla stepwise e hierárquica em Psicologia Organizacional: aplicações, problemas e soluções / Stepwise and hierarchical multiple regression in organizational psychology: Applications, problems and solutions

    Gardênia Abbad

    2002-01-01

    This article discusses applications of stepwise and hierarchical multiple regression analyses to research in organizational psychology. Strategies for identifying type I and II errors, and solutions to potential problems that may arise from such errors are proposed. In addition, phenomena such as suppression, complementarity, and redundancy are reviewed. The article presents examples of research where these phenomena occurred, and the manner in which they were explained by researchers. Some applications of multiple regression analyses to studies involving between-variable interactions are presented, along with tests used to analyze the presence of linearity among variables. Finally, some suggestions are provided for dealing with limitations implicit in multiple regression analyses (stepwise and hierarchical).

  2. Analysis of the Evolution of the Gross Domestic Product by Means of Cyclic Regressions

    Catalin Angelo Ioan

    2011-08-01

    In this article, we carry out an analysis of the regularity of the Gross Domestic Product of a country, in our case the United States. The analysis is based on a new method: cyclic regressions based on the Fourier series of a function. Another point of view is that of considering, instead of the growth rate of GDP, the speed of variation of this rate, computed as a numerical derivative. The obtained results show a cycle for this indicator of 71 years, the mean square error being 0.93%. The method described allows a prognosis of short-term trends in GDP.
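
    One simple reading of a regression on Fourier terms is sketched below: for each candidate period, fit a constant plus one cosine and one sine term by least squares, then keep the period with the smallest mean square error. The single-harmonic model, the function names and the synthetic series are assumptions for illustration; the article's actual procedure may differ.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_cycle(y, period):
    """Least-squares fit of y_t = a0 + a*cos(2*pi*t/period) + b*sin(2*pi*t/period)."""
    rows = [[1.0, math.cos(2 * math.pi * t / period), math.sin(2 * math.pi * t / period)]
            for t in range(len(y))]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(3)]
    coef = solve_linear(xtx, xty)  # normal equations
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mse = sum((yt - f) ** 2 for yt, f in zip(y, fitted)) / len(y)
    return coef, mse


def best_period(y, candidates):
    """Candidate period whose one-harmonic fit has the smallest mean square error."""
    return min(candidates, key=lambda p: fit_cycle(y, p)[1])


# Synthetic series with a 12-step cycle (not GDP data).
series = [3.0 + 2.0 * math.cos(2 * math.pi * t / 12) for t in range(48)]
print(best_period(series, range(4, 25)))
```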

  3. Regression And Time Series Analysis Of Loan Default At Minescho Cooperative Credit Union Tarkwa

    Otoo

    2015-08-01

    Lending in the form of loans is a principal business activity for banks, credit unions and other financial institutions. This forms a substantial amount of the banks' assets. However, when these loans are defaulted, it tends to have serious effects on the financial institutions. This study sought to determine the trend and forecast loan default at Minescho Credit Union, Tarkwa. Secondary data from the Credit Union were analyzed using Regression Analysis and the Box-Jenkins method of Time Series. From the Regression Analysis, there was a moderately strong relationship between the amount of loan default and time. Also, the amount of loan default had an increasing trend. The two-year forecast of the amount of loan default oscillated initially and remained constant from 2016 onwards.

  4. Forecasting municipal solid waste generation using prognostic tools and regression analysis.

    Ghinea, Cristina; Drăgoi, Elena Niculina; Comăniţă, Elena-Diana; Gavrilescu, Marius; Câmpean, Teofil; Curteanu, Silvia; Gavrilescu, Maria

    2016-11-01

    For adequate planning of waste management systems, the accurate forecast of waste generation is an essential step, since various factors can affect waste trends. Predictive and prognosis models are useful tools that provide reliable support for decision making processes. In this paper some indicators, such as number of residents, population age, urban life expectancy and total municipal solid waste, were used as input variables in prognostic models in order to predict the amount of solid waste fractions. We applied Waste Prognostic Tool, regression analysis and time series analysis to forecast municipal solid waste generation and composition by considering the Iasi, Romania case study. Regression equations were determined for six solid waste fractions (paper, plastic, metal, glass, biodegradable and other waste). Accuracy measures were calculated and the results showed that the S-curve trend model is the most suitable for municipal solid waste (MSW) prediction. PMID:27454099

  5. Regression analysis of growth responses to water depth in three wetland plant species

    Sorrell, Brian K; Tanner, Chris C; Brix, Hans

    2012-01-01

    ) differing in depth preferences in wetlands, using non-linear and quantile regression analyses to establish how flooding tolerance can explain field zonation. Methodology Plants were established for 8 months in outdoor cultures in waterlogged soil without standing water, and then randomly allocated to water...... depths from 0 – 0.5 m. Morphological and growth responses to depth were followed for 54 days before harvest, and then analysed by repeated measures analysis of covariance, and non-linear and quantile regression analysis (QRA), to compare flooding tolerances. Principal results Growth responses to depth...... differed between the three species, and were non-linear. P. tenax growth rapidly decreased in standing water > 0.25 m depth, C. secta growth increased initially with depth but then decreased at depths > 0.30 m, accompanied by increased shoot height and decreased shoot density, and T. orientalis was...

  6. Robust estimation for homoscedastic regression in the secondary analysis of case-control data

    Wei, Jiawei

    2012-12-04

    Primary analysis of case-control studies focuses on the relationship between disease D and a set of covariates of interest (Y, X). A secondary application of the case-control study, which is often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated owing to the case-control sampling, where the regression of Y on X is different from what it is in the population. Previous work has assumed a parametric distribution for Y given X and derived semiparametric efficient estimation and inference without any distributional assumptions about X. We take up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model, but otherwise the distribution of Y is unspecified. The semiparametric efficient approaches can be used to construct semiparametric efficient estimates, but they suffer from a lack of robustness to the assumed model for Y given X. We take an entirely different approach. We show how to estimate the regression parameters consistently even if the assumed model for Y given X is incorrect, and thus the estimates are model robust. For this we make the assumption that the disease rate is known or well estimated. The assumption can be dropped when the disease is rare, which is typically so for most case-control studies, and the estimation algorithm simplifies. Simulations and empirical examples are used to illustrate the approach.

  7. Knowledge and perception on tuberculosis transmission in Tanzania: Multinomial logistic regression analysis of secondary data

    Ismail, Abbas; Josephat, Peter

    2014-01-01

    Tuberculosis (TB) is one of the most important public health problems in Tanzania and was declared as a national public health emergency in 2006. Community and individual knowledge and perceptions are critical factors in the control of the disease. The objective of this study was to analyze the knowledge and perception on the transmission of TB in Tanzania. Multinomial Logistic Regression analysis was considered in order to quantify the impact of knowledge and perception on TB. The data used ...

  8. Detrended fluctuation analysis as a regression framework: Estimating dependence at different scales

    Krištoufek, Ladislav

    2015-01-01

    Vol. 91, No. 1 (2015), 022802-1-022802-5. ISSN 1539-3755. R&D Projects: GA ČR(CZ) GP14-11402P. Other grants: GA ČR(CZ) GAP402/11/0948. Institutional support: RVO:67985556. Keywords: Detrended cross-correlation analysis; Regression; Scales. Subject RIV: AH - Economics. Impact factor: 2.288, year: 2014. http://library.utia.cas.cz/separaty/2015/E/kristoufek-0452315.pdf

  9. Mixed-effects Poisson regression analysis of adverse event reports: The relationship between antidepressants and suicide

    Gibbons, Robert D.; Segawa, Eisuke; Karabatsos, George; Amatya, Anup K.; Bhaumik, Dulal K.; Brown, C Hendricks; Kapur, Kush; Marcus, Sue M.; Hur, Kwan; Mann, J. John

    2008-01-01

    A new statistical methodology is developed for the analysis of spontaneous adverse event (AE) reports from post-marketing drug surveillance data. The method involves both empirical Bayes (EB) and fully Bayes estimation of rate multipliers for each drug within a class of drugs, for a particular AE, based on a mixed-effects Poisson regression model. Both parametric and semiparametric models for the random-effect distribution are examined. The method is applied to data from Food and Drug Adminis...

  10. Perceived service quality and determination of the effect on service preference with logistic regression analysis

    Mehmet AKSARAYLI; SAYGIN, Özge

    2011-01-01

    In this study, students' perceived service quality level of Dokuz Eylul University (DEU) Buca Girl Dormitory Service is investigated by using SERVQUAL scale, which is a common service quality measure. Impacts of the dimensions of perceived service quality, which are tangibles, reliability, responsiveness, assurance, empathy, on preference and recommendation are investigated by logistic regression analysis. As a result, it is concluded that perceived service quality has impacts on preference a...

  11. Re-examining covariance risk dynamics in international stock markets using quantile regression analysis

    M. Y. L. Li; S. M. F. Yen

    2011-01-01

    This investigation is one of the first to adopt quantile regression (QR) technique to examine covariance risk dynamics in international stock markets. Feasibility of the proposed model is demonstrated in G7 stock markets. Additionally, two conventional random-coefficient frameworks, including time-varying betas derived from GARCH models and state-varying betas implied by Markov-switching models, are employed and subjected to comparative analysis. The empirical findings of this work are consis...

  12. Variable selection and regression analysis for graph-structured covariates with an application to genomics

    Li, Caiyan; Li, Hongzhe

    2010-01-01

    Graphs and networks are common ways of depicting biological information. In biology, many different biological processes are represented by graphs, such as regulatory networks, metabolic pathways and protein-protein interaction networks. This kind of a priori use of graphs is a useful supplement to the standard numerical data such as microarray gene expression data. In this paper we consider the problem of regression analysis and variable selection when the covariates are linked on a graph. ...

  13. Augmented kludge waveforms and Gaussian process regression for EMRI data analysis

    Chua, Alvin J K

    2016-01-01

    Extreme-mass-ratio inspirals (EMRIs) will be an important type of astrophysical source for future space-based gravitational-wave detectors. There is a trade-off between accuracy and computational speed for the EMRI waveform templates required in the analysis of data from these detectors. We discuss how the systematic error incurred by using faster templates may be reduced with improved models such as augmented kludge waveforms, and marginalised over with statistical techniques such as Gaussian process regression.

  14. Non-Stationary Hydrologic Frequency Analysis using B-Splines Quantile Regression

    Nasri, B.; St-Hilaire, A.; Bouezmarni, T.; Ouarda, T.

    2015-12-01

    Hydrologic frequency analysis is commonly used by engineers and hydrologists to provide the basic information for the planning, design and management of hydraulic structures and water resources systems under the assumption of stationarity. However, with increasing evidence of a changing climate, the assumption of stationarity may no longer be valid, and the results of conventional analysis may become questionable. In this study, we consider a framework for frequency analysis of extreme flows based on B-spline quantile regression, which allows modelling of non-stationary data that depend on covariates. Such covariates may have linear or nonlinear dependence. A Markov Chain Monte Carlo (MCMC) algorithm is used to estimate quantiles and their posterior distributions. A coefficient of determination for quantile regression is proposed to evaluate the fit of the proposed model at each quantile level. The method is applied to annual maximum and minimum streamflow records in Ontario, Canada. Climate indices are considered to describe the non-stationarity in these variables and to estimate the quantiles in this case. The results show large differences between the non-stationary quantiles and their stationary equivalents for annual maximum and minimum discharge with high annual non-exceedance probabilities. Keywords: Quantile regression, B-spline functions, MCMC, Streamflow, Climate indices, non-stationarity.
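    The quantile regression mentioned above rests on the check (pinball) loss. The following is a minimal sketch of that loss only, assuming a constant-only model and made-up streamflow values; it is not the B-spline/MCMC procedure of the abstract, and the names `pinball_loss`, `best_constant_quantile` and `flows` are hypothetical.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Mean check (pinball) loss of prediction q_pred at quantile level tau."""
    u = np.asarray(y, dtype=float) - q_pred
    # Under-predictions (u > 0) cost tau * u, over-predictions cost (1 - tau) * |u|.
    return float(np.mean(np.maximum(tau * u, (tau - 1.0) * u)))

def best_constant_quantile(y, tau):
    """Return the observed value that minimises the pinball loss (constant-only model)."""
    y = np.asarray(y, dtype=float)
    losses = [pinball_loss(y, c, tau) for c in y]
    return float(y[int(np.argmin(losses))])

# Hypothetical annual maximum flows (illustrative numbers, not from the study).
flows = np.array([120.0, 95.0, 210.0, 160.0, 180.0, 140.0, 300.0, 130.0, 170.0])
q50 = best_constant_quantile(flows, 0.5)
q90 = best_constant_quantile(flows, 0.9)
```

    Replacing the constant with a function of covariates (for example a B-spline basis times coefficients) and minimising the same loss gives a covariate-dependent quantile, which is the general idea behind non-stationary quantile estimates.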

  15. Combinatorial protocol in multiple linear regression/partial least-squares directed rationale for the caspase-3 inhibition activity of isoquinoline-1,3,4-trione derivatives.

    Sharma, B K; Pilania, P; Singh, P; Prabhakar, Y S

    2010-01-01

    The caspase-3 inhibition activity of isoquinoline-1,3,4-trione derivatives has been analysed with the topological and molecular features from Dragon software. Analysis of the structural features in conjunction with the biological endpoints in combinatorial protocol in multiple linear regression (CP-MLR) led to the identification of 45 descriptors for modelling the activity. The study clearly suggested the role of rotatable bonds, mean information on the distance degree equality, radial centricity, bond and structural information content of five-order neighbourhood symmetry, atomic van der Waals volumes and the presence or absence of certain structural fragments to optimise the caspase-3 inhibitory activity of titled compounds. The models developed and the participating descriptors advocate that the substituent groups of the isoquinoline moiety hold scope for further modification in the optimization of the caspase-3 inhibitory activity. Analysis of these descriptors in partial least squares (PLS) highlighted their relative significance in modulating the biological response. The selected descriptors are enriched with information corresponding to the activity when compared to the remaining ones. PMID:20373219

  16. Genetic analysis of carcass traits in beef cattle using random regression models.

    Englishby, T M; Banos, G; Moore, K L; Coffey, M P; Evans, R D; Berry, D P

    2016-04-01

    Livestock mature at different rates depending, in part, on their genetic merit; therefore, the optimal age at slaughter for progeny of certain sires may differ. The objective of the present study was to examine sire-level genetic profiles for carcass weight, carcass conformation, and carcass fat in cattle of multiple beef and dairy breeds, including crossbreeds. Slaughter records from 126,214 heifers and 124,641 steers aged between 360 and 1,200 d and from 86,089 young bulls aged between 360 and 720 d were used in the analysis; animals were from 15,127 sires. Variance components for each trait across age at slaughter were generated using sire random regression models that included quadratic polynomials for fixed and random effects; heterogeneous residual variances were assumed across ages. Heritability estimates across genders ranged from 0.08 (±0.02) to 0.34 (±0.02) for carcass weight, from 0.24 (±0.02) to 0.42 (±0.01) for conformation, and from 0.16 (±0.03) to 0.40 (±0.02) for fat score. Genetic correlations within each trait across ages weakened as the interval between ages compared lengthened but were all >0.64, suggesting a similar genetic background for each trait across different ages. Eigenvalues and eigenfunctions of the additive genetic covariance matrix revealed genetic variability among animals in their growth profiles for carcass traits, although most of the genetic variability was associated with the height of the growth profile. At the same age, a positive genetic correlation (0.60 to 0.78; SE ranged from 0.01 to 0.04) existed between carcass weight and conformation, whereas negative genetic correlations existed between fatness and both conformation (-0.46 to 0.08; SE ranged from 0.02 to 0.09) and carcass weight (-0.48 to -0.16; SE ranged from 0.02 to 0.14) at the same age. The estimated genetic parameters in the present study indicate genetic variability in the growth trajectory in cattle, which can be exploited through breeding programs and

  17. A Bayesian Approach for Evaluation of Determinants of Health System Efficiency Using Stochastic Frontier Analysis and Beta Regression.

    Şenel, Talat; Cengiz, Mehmet Ali

    2016-01-01

    In today's world, public expenditures on health are one of the most important issues for governments. These increased expenditures are putting pressure on public budgets. Therefore, health policy makers have focused on the performance of their health systems, and many countries have introduced reforms to improve that performance. This study investigates the most important determinants of healthcare efficiency for OECD countries using a second-stage approach for Bayesian Stochastic Frontier Analysis (BSFA). There are two steps in this study. First, we measure the healthcare efficiency of 29 OECD countries by BSFA using data from the OECD Health Database. In the second stage, we examine the multiple relationships between healthcare efficiency and the characteristics of healthcare systems across OECD countries using Bayesian beta regression. PMID:27118987

  18. Reduced Rank Regression

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...... eigenvalues and eigenvectors. We give a number of different applications to regression and time series analysis, and show how the reduced rank regression estimator can be derived as a Gaussian maximum likelihood estimator. We briefly mention asymptotic results...
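    As a rough illustration of the eigenvalue/eigenvector computation mentioned above, the sketch below implements the simplest (identity-weighted) reduced rank regression: fit ordinary least squares, then project the coefficients onto the leading right singular vectors of the fitted values. This is an assumption-laden simplification, not the Gaussian maximum likelihood estimator of the abstract, which weights by the residual covariance; the function name and the simulated data are hypothetical.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Identity-weighted reduced rank regression of Y on X (no intercept)."""
    # Step 1: unrestricted least-squares coefficients, shape (p, q).
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Step 2: leading right singular vectors of the fitted values X @ B_ols.
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T  # shape (q, rank)
    # Step 3: project the coefficients onto that subspace, giving rank <= `rank`.
    return B_ols @ V_r @ V_r.T

# Simulated example with a true rank-1 coefficient matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
B_true = rng.normal(size=(5, 1)) @ rng.normal(size=(1, 4))
Y = X @ B_true + 0.1 * rng.normal(size=(200, 4))
B_hat = reduced_rank_regression(X, Y, rank=1)
```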

  19. Sub-pixel estimation of tree cover and bare surface densities using regression tree analysis

    Carlos Augusto Zangrando Toneli

    2011-09-01

    Sub-pixel analysis is capable of generating continuous fields, which represent the spatial variability of certain thematic classes. The aim of this work was to develop numerical models to represent the variability of tree cover and bare surfaces within the study area. This research was conducted in the riparian buffer within a watershed of the São Francisco River in the North of Minas Gerais, Brazil. IKONOS and Landsat TM imagery were used with the GUIDE algorithm to construct the models. The results were two index images derived with regression trees for the entire study area, one representing tree cover and the other representing bare surface. The use of non-parametric and non-linear regression tree models presented satisfactory results to characterize wetland, deciduous and savanna patterns of forest formation.

  20. Identification of cotton properties to improve yarn count quality by using regression analysis

    Identification of raw material characteristics affecting yarn count variation was studied using statistical techniques. Regression analysis is used to meet the objective. Stepwise regression is used for model selection, and the coefficient of determination and mean squared error (MSE) criteria are used to identify the cotton properties that contribute to yarn count. Statistical assumptions of normality, autocorrelation and multicollinearity are evaluated using a probability plot, the Durbin-Watson test and the variance inflation factor (VIF), and then model fitting is carried out. It is found that invisible (INV), nepness (Nep), grayness (RD), cotton trash (TR) and uniformity index (VI) are the main contributing cotton properties for yarn count variation. The results are also verified by a Pareto chart. (author)
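    The multicollinearity check named above can be sketched as follows: the variance inflation factor of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The data here are simulated and the column names `a`, `b`, `c` are hypothetical; they are not the cotton properties of the study.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X, regressing it on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Simulated predictors: c is nearly a copy of a, b is independent of both.
rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + 0.1 * rng.normal(size=300)
vif = variance_inflation_factors(np.column_stack([a, b, c]))
```

    A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity; the threshold used in the study is not stated in the abstract.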

  1. The Effects of Agricultural Informatization on Agricultural Economic Growth: An Empirical Analysis Based on Regression Model

    Lingling; TAN

    2013-01-01

    This article selects some major factors influencing agricultural economic growth, such as labor, capital input, farmland area, fertilizer input and information input. It also selects some factors to explain information input, such as the number of websites owned, the types of books, magazines and newspapers published, the number of telephones owned per 100 households, the number of home computers owned per 100 households, farmers' spending on transportation and communication, culture, education, entertainment and services, and the total number of agricultural science and technology service personnel. Using a regression model, this article conducts regression analysis of cross-section data on 31 provinces, autonomous regions and municipalities in 2010. The results show that the building of information infrastructure, the use of means of information, and the popularization and promotion of knowledge of agricultural science and technology play an important role in promoting agricultural economic growth.

  2. Quantitative structure-property relationship study of n-octanol-water partition coefficients of some of diverse drugs using multiple linear regression

    A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structures of 150 organic drug compounds to their n-octanol-water partition coefficients (log Po/w). Molecular descriptors were derived solely from the 3D structures of the drug molecules. A genetic algorithm was also applied as a variable selection tool in the QSPR analysis. The models were constructed using 110 molecules as the training set, and predictive ability was tested using 40 compounds. Modeling of log Po/w of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR). Four descriptors for these compounds, molecular volume (MV) (geometrical), hydrophilic-lipophilic balance (HLB) (constitutional), hydrogen bond forming ability (HB) (electronic) and polar surface area (PSA) (electrostatic), are taken as inputs for the model. The use of descriptors calculated only from molecular structure eliminates the need for experimental determination of properties for use in the correlation and allows for the estimation of log Po/w for molecules not yet synthesized. Application of the developed model to a testing set of 40 organic drug compounds demonstrates that the model is reliable, with good predictive accuracy and a simple formulation. The prediction results are in good agreement with the experimental values. The root mean square error of prediction (RMSEP) and square correlation coefficient (R2) for the MLR model were 0.22 and 0.99 for the prediction set log Po/w
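    The two evaluation statistics quoted above can be sketched as follows. The 110/40 train/test split and the use of four descriptors follow the abstract; everything else (the simulated descriptors, the coefficients, the noise level and the function names) is an assumption for illustration and does not reproduce the reported values.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction on a held-out set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def squared_correlation(y_true, y_pred):
    """Square of the Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1] ** 2)

# Simulated stand-ins for four descriptors and a response (hypothetical values).
rng = np.random.default_rng(42)
n_train, n_test = 110, 40
X = rng.normal(size=(n_train + n_test, 4))
y = 2.0 + X @ np.array([0.8, -0.5, 0.3, -0.6]) + 0.1 * rng.normal(size=n_train + n_test)

# Fit multiple linear regression on the training rows only, then predict the test rows.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
y_pred = A[n_train:] @ coef
err = rmsep(y[n_train:], y_pred)
r2 = squared_correlation(y[n_train:], y_pred)
```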

  3. Optimization of U–Th fuel in heavy water moderated thermal breeder reactors using multivariate regression analysis and genetic algorithms

    Highlights: • A new method useful for the parametric analysis and optimization of reactor core designs. • This uses the strengths of genetic algorithms (GA) and regression splines. • The method is applied to the core fuel pin cell of a PHWR design. • Tools like Java and R, and codes like Serpent and Matlab, are used in this research. - Abstract: An analysis and optimization of a set of neutronics parameters of a thorium-fueled pressurized heavy water reactor core fuel has been performed. The analysis covers a detailed pin-cell analysis of a seed-blanket configuration, where the seed is composed of natural uranium and the blanket is composed of thorium. A genetic algorithm (GA) is used to optimize the input parameters to meet a specific set of objectives related to the infinite multiplication factor, the initial breeding ratio, and the effective microscopic cross-section of a specific nuclide. The core input parameters are the pitch-to-diameter ratio and the blanket material composition. A recursive-partitioning decision tree (rpart) multivariate regression model is used to perform a predictive analysis of the samples generated by the GA module. Reactor designs are usually complex and a simulation takes a significant amount of time to execute, so applying GA or any other global optimization technique directly is not feasible; we therefore present a new method of using rpart in conjunction with GA. With rpart, we do not need to run the neutronics simulation for all the inputs generated by the GA module; instead, we run the simulations for a predefined set of inputs, build a regression fit between the input and output parameters, and then use this fit to predict the output parameters for the inputs generated by GA. The rpart model is implemented as a library using the R programming language. The results suggest that the initial breeding ratio tends to increase due to a harder neutron spectrum; however, a softer neutron spectrum is desired to limit the

  4. A PANEL REGRESSION ANALYSIS OF HUMAN CAPITAL RELEVANCE IN SELECTED SCANDINAVIAN AND SE EUROPEAN COUNTRIES

    Filip Kokotovic

    2016-06-01

    The study of human capital relevance to economic growth is becoming increasingly important, taking into account its relevance in many of the Sustainable Development Goals proposed by the UN. This paper conducted a panel regression analysis of selected SE European countries and Scandinavian countries using the Granger causality test and pooled panel regression. In order to test the relevance of human capital to economic growth, several human capital proxy variables were identified. Aside from the human capital proxy variables, other explanatory variables were selected using stepwise regression, while the dependent variable was GDP. This paper concludes that there are significant structural differences in the economies of the two observed panels. Of the human capital proxy variables observed, for the panel of SE European countries only life expectancy was statistically significant and it had a negative impact on economic growth, while in the panel of Scandinavian countries total public expenditure on education had a statistically significant positive effect on economic growth. Based upon these results and existing studies, this paper concludes that human capital has a far more significant impact on economic growth in more developed economies.

  5. Bayesian Nonparametric Regression Analysis of Data with Random Effects Covariates from Longitudinal Measurements

    Ryu, Duchwan

    2010-09-28

    We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.

  6. Investigating bias in squared regression structure coefficients

    Nimon, Kim F.; Zientek, Linda R.; Thompson, Bruce

    2015-01-01

    The importance of structure coefficients and analogs of regression weights for analysis within the general linear model (GLM) has been well-documented. The purpose of this study was to investigate bias in squared structure coefficients in the context of multiple regression and to determine if a formula that had been shown to correct for bias in squared Pearson correlation coefficients and coefficients of determination could be used to correct for bias in squared regression structure coefficie...

  7. Multiple logistic regression applied to soil survey in Rio Grande do Sul State, Brazil

    Samuel Ribeiro Figueiredo

    2008-12-01

    hydrographic variables (distance to rivers, flow length, topographical wetness index, and stream power index). Multiple logistic regressions were established between the soil classes mapped on the basis of a traditional survey at a scale of 1:80,000 and the land variables calculated using the DEM. The regressions were used to calculate the probability of occurrence of each soil class. The final estimated soil map was drawn by assigning the soil class with the highest probability of occurrence to each cell. The general accuracy was evaluated at 58 % and the Kappa coefficient at 38 % in a comparison of the original soil map with the map estimated at the original scale. Simplifying the legend did little to increase the general accuracy of the map (general accuracy of 61 % and Kappa coefficient of 39 %). It was concluded that multiple logistic regressions have predictive potential as a tool for supervised soil mapping.
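    The mapping and evaluation steps described above (assign each cell the class with the highest predicted probability, then compare with the reference map) can be sketched as below. The probability table and reference labels are invented for illustration, and `overall_accuracy_and_kappa` is a hypothetical helper; the Kappa formula itself is the standard Cohen's kappa, (p_o - p_e) / (1 - p_e).

```python
import numpy as np

def overall_accuracy_and_kappa(reference, predicted):
    """Overall accuracy and Cohen's kappa from two label vectors."""
    reference = np.asarray(reference)
    predicted = np.asarray(predicted)
    classes = np.union1d(reference, predicted)
    index = {int(c): i for i, c in enumerate(classes)}
    # Confusion matrix: rows are reference classes, columns are predicted classes.
    cm = np.zeros((len(classes), len(classes)))
    for r, p in zip(reference, predicted):
        cm[index[int(r)], index[int(p)]] += 1
    n = cm.sum()
    p_observed = np.trace(cm) / n
    p_expected = float(cm.sum(axis=1) @ cm.sum(axis=0)) / n**2
    return float(p_observed), float((p_observed - p_expected) / (1.0 - p_expected))

# Hypothetical class probabilities for six map cells and three soil classes.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.4, 0.5],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
])
estimated_map = probs.argmax(axis=1)          # class with highest probability per cell
reference_map = np.array([0, 0, 1, 1, 2, 2])  # hypothetical surveyed classes
accuracy, kappa = overall_accuracy_and_kappa(reference_map, estimated_map)
```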

  8. Guide to using Multiple Regression in Excel (MRCX v.1.1) for Removal of River Stage Effects from Well Water Levels

    Mackley, Rob D.; Spane, Frank A.; Pulsipher, Trenton C.; Allwardt, Craig H.

    2010-09-01

    A software tool was created in Fiscal Year 2010 (FY10) that enables multiple-regression correction of well water levels for river-stage effects. This task was conducted as part of the Remediation Science and Technology project of CH2MHILL Plateau Remediation Company (CHPRC). This document contains an overview of the correction methodology and a user’s manual for Multiple Regression in Excel (MRCX) v.1.1. It also contains a step-by-step tutorial that shows users how to use MRCX to correct river effects in two different wells. This report is accompanied by an enclosed CD that contains the MRCX installer application and files used in the tutorial exercises.

  9. Analysis of sparse data in logistic regression in medical research: A newer approach

    S Devika

    2016-01-01

    Background and Objective: In the analysis of a dichotomous response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs) with very wide 95% confidence intervals (CIs) (OR: >999.999, 95% CI: 999.999). In this paper, we addressed this issue by using the penalized logistic regression (PLR) method. Materials and Methods: Data from a case-control study on hyponatremia and hiccups conducted in Christian Medical College, Vellore, Tamil Nadu, India were used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. Simulation datasets were created with different sample sizes and with different numbers of covariates. Results: A total of 23 cases and 50 controls were used for the analysis by the ordinary and PLR methods. The main exposure variable, hyponatremia, was present in nine (39.13%) of the cases and in four (8.0%) of the controls. Of the 23 hiccup cases, all were males, and among the controls, 46 (92.0%) were males. Thus, the complete separation between gender and the disease group led to an infinite OR with 95% CI (OR: >999.999, 95% CI: 999.999), whereas there was a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48) using PLR. After adjusting for all the confounding variables, hyponatremia entailed a 7.9 (95% CI: 2.06, 38.86) times higher risk for the development of hiccups using PLR, whereas the conventional method overestimated the risk (OR: 10.76; 95% CI: 2.17, 53.41). The simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates.
Conclusions: PLR is almost equal to the ordinary logistic regression when the sample size is large and is superior in small cell
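    To illustrate why a penalty yields a finite estimate under separation, here is a minimal sketch of Firth-type penalized logistic regression, which is one common form of PLR; the abstract does not say which penalty the authors used, so this is an assumption. The 2x2 gender counts (23 male cases, 0 female cases, 46 male controls, 4 female controls) come from the abstract, but this univariable fit is not expected to reproduce the reported OR of 5.35. The function name `firth_logistic` is hypothetical.

```python
import numpy as np

def firth_logistic(X, y, n_iter=200, tol=1e-10):
    """Firth-penalized logistic regression via Fisher scoring on the modified score."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))
        # Hat-matrix diagonals: h_i = w_i * x_i' (X'WX)^{-1} x_i.
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Modified score: the usual X'(y - p) plus the Firth correction term.
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Gender (1 = male) and outcome (1 = hiccup case) rebuilt from the abstract's counts.
male = np.array([1] * 23 + [1] * 46 + [0] * 4)
case = np.array([1] * 23 + [0] * 46 + [0] * 4)
X = np.column_stack([np.ones(len(male)), male])
beta = firth_logistic(X, case)
odds_ratio = float(np.exp(beta[1]))
```

    Ordinary maximum likelihood has no finite solution for these counts because no female is a case. For a model with only an intercept and one binary covariate, the Firth estimate coincides with adding 0.5 to each cell of the 2x2 table, which gives a finite OR.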

  10. Application of Multiple Linear Regression Models and Artificial Neural Networks on the Surface Ozone Forecast in the Greater Athens Area, Greece

    Moustris, K. P.; Nastos, P. T.; Larissi, I. K.; A. G. Paliatsos

    2012-01-01

    An attempt is made to forecast the daily maximum surface ozone concentration for the next 24 hours, within the greater Athens area (GAA). For this purpose, we applied Multiple Linear Regression (MLR) models against a forecasting model based on Artificial Neural Network (ANN) approach. The availability of basic meteorological parameters is of great importance in order to forecast the ozone’s concentration levels. Modelling was based on recorded meteorological and air pollution data from thirte...

  11. Multiple Regression and Mediator Variables can be used to Avoid Double Counting when Economic Values are Derived using Stochastic Herd Simulation

    Østergaard, Søren; Ettema, Jehan Frans; Hjortø, Line; Pedersen, Jørn; Kargo, Morten

    2014-01-01

    Multiple regression and model building with mediator variables was addressed to avoid double counting when economic values are estimated from data simulated with herd simulation modeling (using the SimHerd model). The simulated incidence of metritis was analyzed statistically as the independent variable, while using the traits representing the direct effects of metritis on yield, fertility and occurrence of other diseases as mediator variables. The economic value of metritis was estimated to ...

  12. An application with multinomial logistic regression analysis

    Sadi Elasan

    2015-01-01

    Multinomial logistic regression analysis is one of the analysis techniques used to examine relationships between independent and dependent variables when the dependent variable has three or more categories. In multinomial logistic regression analysis, one category of the dependent variable is taken as the reference category and the other categories are analyzed with respect to it. In this study, "Multinomial Logistic Regression Analysis" was introduced and an application was carried out. In the application, the trauma variable was coded in 4 categories [not abused (0), sexually abused (1), physically abused (2), sexually and physically abused (3)] and the effects of other variables on trauma were examined. As a result, it can be noted that multinomial logistic regression analysis is applicable when the response variable contains 3 or more categories.
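    The reference-category idea described above can be sketched as follows: the reference category has its coefficients fixed at zero, and each other category k has probability exp(x'b_k) / (1 + sum_j exp(x'b_j)). The four categories mirror the coding in the abstract, but the coefficient values, the covariate and the function name `multinomial_probs` are hypothetical.

```python
import numpy as np

def multinomial_probs(x, coefs):
    """Category probabilities; category 0 is the reference with zero coefficients.

    coefs has one row of coefficients per non-reference category.
    """
    eta = np.concatenate([[0.0], np.asarray(coefs, dtype=float) @ x])
    eta = eta - eta.max()  # subtract the maximum for numerical stability
    expo = np.exp(eta)
    return expo / expo.sum()

# Hypothetical coefficients (intercept, one covariate) for categories 1, 2 and 3.
coefs = np.array([
    [-1.0, 0.8],
    [-0.5, 0.4],
    [-2.0, 1.2],
])
x = np.array([1.0, 0.5])  # intercept term and a covariate value
probs = multinomial_probs(x, coefs)
# Each exp(coefficient) is an odds ratio relative to the reference category.
relative_odds = probs[1:] / probs[0]
```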

  13. An Econometric Analysis of Modulated Realised Covariance, Regression and Correlation in Noisy Diffusion Models

    Kinnebrock, Silja; Podolskij, Mark

    This paper introduces a new estimator to measure the ex-post covariation between high-frequency financial time series under market microstructure noise. We provide an asymptotic limit theory (including feasible central limit theorems) for standard methods such as regression, correlation analysis...... and covariance, for which we obtain the optimal rate of convergence. We demonstrate some positive semidefinite estimators of the covariation and construct a positive semidefinite estimator of the conditional covariance matrix in the central limit theorem. Furthermore, we indicate how the assumptions...

  14. Classification of Error-Diffused Halftone Images Based on Spectral Regression Kernel Discriminant Analysis

    Zhigao Zeng

    2016-01-01

    This paper proposes a novel algorithm to solve the challenging problem of classifying error-diffused halftone images. We first design the class feature matrices, after extracting the image patches according to their statistical characteristics, to classify the error-diffused halftone images. Then, spectral regression kernel discriminant analysis is used for feature dimension reduction. The error-diffused halftone images are finally classified using an idea similar to the nearest centroid classifier. As demonstrated by the experimental results, our method is fast and can achieve a high classification accuracy with the added benefit of robustness to noise.

  15. Functional Unfold Principal Component Regression Methodology for Analysis of Industrial Batch Process Data

    Mears, Lisa; Nørregaard, Rasmus; Sin, Gürkan;

    2016-01-01

    This work proposes a methodology utilizing functional unfold principal component regression (FUPCR), for application to industrial batch process data as a process modeling and optimization tool. The methodology is applied to an industrial fermentation dataset, containing 30 batches of a production...... It is shown that application of functional data analysis and the choice of variance scaling method have the greatest impact on the prediction accuracy. Considering the vast amount of batch process data continuously generated in industry, this methodology can potentially contribute as a tool to identify......

  16. SETUP OF RESOLUTIVE CRITERION FOR SEDIMENT-RELATED DISASTER WARNING INFORMATION USING LOGISTIC REGRESSION ANALYSIS

    Sugihara, Shigemitsu; Shinozaki, Tsuguhiro; Ohishi, Hiroyuki; Araki, Yoshinori; Furukawa, Kohei

    It is difficult to lift (deregulate) sediment-related disaster warning information, because it is difficult to quantify the risk of disaster after heavy rain. If the risk can be quantified according to the rainfall situation, it can serve as an indicator for lifting the warning. In this study, using logistic regression analysis, we quantified the risk according to the rainfall situation as the probability of disaster occurrence, and we analyzed how to set a criterion for lifting sediment-related disaster warning information. As a result, we improved the convenience of the method for evaluating the probability of disaster occurrence, which is useful for providing information on imminent situations.

  18. Characterization of sonographically indeterminate ovarian tumors with MR imaging. A logistic regression analysis

    Yamashita, Y. [Dept. of Radiology, Kumamoto Univ. School of Medicine (Japan); Hatanaka, Y. [Dept. of Radiology, Kumamoto Univ. School of Medicine (Japan); Torashima, M. [Dept. of Radiology, Kumamoto Univ. School of Medicine (Japan); Takahashi, M. [Dept. of Radiology, Kumamoto Univ. School of Medicine (Japan); Miyazaki, K. [Dept. of Obstetrics and Gynecology, Kumamoto Univ. School of Medicine (Japan); Okamura, H. [Dept. of Obstetrics and Gynecology, Kumamoto Univ. School of Medicine (Japan)

    1997-07-01

    Purpose: The goal of this study was to maximize the discrimination between benign and malignant masses in patients with sonographically indeterminate ovarian lesions by means of unenhanced and contrast-enhanced MR imaging, and to develop a computer-assisted diagnosis system. Material and Methods: Findings in precontrast and Gd-DTPA contrast-enhanced MR images of 104 patients with 115 sonographically indeterminate ovarian masses were analyzed, and the results were correlated with histopathological findings. Of 115 lesions, 65 were benign (23 cystadenomas, 13 complex cysts, 11 teratomas, 6 fibrothecomas, 12 others) and 50 were malignant (32 ovarian carcinomas, 7 metastatic tumors of the ovary, 4 carcinomas of the fallopian tubes, 7 others). A logistic regression analysis was performed to discriminate between benign and malignant lesions, and a model of a computer-assisted diagnosis was developed. This model was prospectively tested in 75 cases of ovarian tumors found at other institutions. Results: From the univariate analysis, the following parameters were selected as significant for predicting malignancy (p≤0.05): A solid or cystic mass with a large solid component or wall thickness greater than 3 mm; complex internal architecture; ascites; and bilaterality. Based on these parameters, a model of a computer-assisted diagnosis system was developed with the logistic regression analysis. To distinguish benign from malignant lesions, the maximum cut-off point was obtained between 0.47 and 0.51. In a prospective application of this model, 87% of the lesions were accurately identified as benign or malignant. (orig.)

  19. Trigonometric regressive spectral analysis: an innovative tool for evaluating the autonomic nervous system.

    Ziemssen, Tjalf; Reimann, Manja; Gasch, Julia; Rüdiger, Heinz

    2013-09-01

    Biological rhythms, describing the temporal variation of biological processes, are a characteristic feature of complex systems. The analysis of biological rhythms can provide important insights into the pathophysiology of different diseases, especially, in cardiovascular medicine. In the field of the autonomic nervous system, heart rate variability (HRV) and baroreflex sensitivity (BRS) describe important fluctuations of blood pressure and heart rate which are often analyzed by Fourier transformation. However, these parameters are stochastic with overlaying rhythmical structures. R-R intervals as independent variables of time are not equidistant. That is why the trigonometric regressive spectral (TRS) analysis--reviewed in this paper--was introduced, considering both the statistical and rhythmical features of such time series. The data segments required for TRS analysis can be as short as 20 s allowing for dynamic evaluation of heart rate and blood pressure interaction over longer periods. Beyond HRV, TRS also estimates BRS based on linear regression analyses of coherent heart rate and blood pressure oscillations. An additional advantage is that all oscillations are analyzed by the same (maximal) number of R-R intervals thereby providing a high number of individual BRS values. This ensures a high confidence level of BRS determination which, along with short recording periods, may be of profound clinical relevance. The dynamic assessment of heart rate and blood pressure spectra by TRS allows a more precise evaluation of cardiovascular modulation under different settings as has already been demonstrated in different clinical studies. PMID:23812502
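
    The idea of fitting trigonometric functions by regression to samples that are not equidistant in time can be sketched as follows. This is only a generic least-squares illustration, not the published TRS algorithm: the function names, the candidate frequency grid and the synthetic beat times are all assumptions made for the example.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_sinusoid(times, values, freq):
    """Least-squares fit of c + a*cos(wt) + b*sin(wt); returns (coefficients, residual sum of squares)."""
    w = 2.0 * math.pi * freq
    rows = [[1.0, math.cos(w * t), math.sin(w * t)] for t in times]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    coeffs = solve_linear(ata, aty)
    rss = sum((y - sum(c * x for c, x in zip(coeffs, r))) ** 2
              for r, y in zip(rows, values))
    return coeffs, rss


def best_frequency(times, values, candidates):
    """Pick the candidate frequency whose sinusoid leaves the smallest residual."""
    return min(candidates, key=lambda f: fit_sinusoid(times, values, f)[1])


# Hypothetical data: 25 unevenly spaced beat times (s) over roughly 20 s.
times = []
clock = 0.0
for i in range(25):
    times.append(clock)
    clock += 0.7 + 0.02 * ((i * 37) % 10)  # irregular beat-to-beat spacing

candidates = [0.05 * k for k in range(1, 9)]  # 0.05 Hz to 0.40 Hz
true_freq = candidates[1]                     # 0.10 Hz oscillation
values = [0.8 + 0.05 * math.sin(2.0 * math.pi * true_freq * s) for s in times]

best = best_frequency(times, values, candidates)
(_, cos_coef, sin_coef), _ = fit_sinusoid(times, values, best)
amplitude = math.hypot(cos_coef, sin_coef)
print(best, amplitude)
```

    Because the regression works directly on the recorded sample times, no resampling onto an equidistant grid is needed, which is the property the abstract emphasises.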

  20. Analysis of genomic signatures in prokaryotes using multinomial regression and hierarchical clustering

    Ussery, David; Bohlin, Jon; Skjerve, Eystein

    2009-01-01

    Recently there has been an explosion in the availability of bacterial genomic sequences, making possible an analysis of genomic signatures across more than 800 different bacterial chromosomes from a wide variety of environments. Using genomic signatures, we pair-wise compared 867 dif... clustering and multinomial regression analysis indicate that the genomic signature is shaped by many factors, and this may explain the varying ability to classify prokaryotic organisms below genus level.

  1. MULTIVARIATE STEPWISE LOGISTIC REGRESSION ANALYSIS ON RISK FACTORS OF VENTILATOR-ASSOCIATED PNEUMONIA IN COMPREHENSIVE ICU

    管军; 杨兴易; 赵良; 林兆奋; 郭昌星; 李文放

    2003-01-01

    Objective To investigate the incidence, crude mortality and independent risk factors of ventilator-associated pneumonia (VAP) in comprehensive ICU in China. Methods The clinical and microbiological data of all 97 patients receiving mechanical ventilation (>48 h) in our comprehensive ICU during 1999.1-2000.12 were retrospectively collected and analysed. First, several statistically significant risk factors were screened out with univariate analysis; then independent risk factors were determined with multivariate stepwise logistic regression analysis. Results The incidence of VAP was 54.64% (15.60 cases per 1000 ventilation days), and the crude mortality was 47.42%. The interval between the establishment of artificial airway and diagnosis of VAP was 6.9 ± 4.3 d. Univariate analysis suggested that indwelling naso-gastric tube, corticosteroid, acid inhibitor, third-generation cephalosporin/imipenem, non-infection lung disease, and extrapulmonary infection were the statistically significant risk factors of

  2. Robust Outlier Detection in Linear Regression

    Nethal K. Jajo; Xizhi Wu

    2004-01-01

    New methodology of robust outlier detection based on Robustly Studentized Robust Residuals (RSRR) examination is well established in linear regression analysis. Two new robust location estimators of linear regression parameters are developed in simple and multiple cases. Based on these robust estimators we obtain RSRR. We used RSRR to derive a new measure of distance to be used in outlier detection. A graphical display using new measure of distance is constructed for detecting multiple outlie...

  3. Automated particle identification through regression analysis of size, shape and colour

    Rodriguez Luna, J. C.; Cooper, J. M.; Neale, S. L.

    2016-04-01

    Rapid point-of-care diagnostic tests and tests to provide therapeutic information are now available for a range of specific conditions, from the measurement of blood glucose levels for diabetes to card agglutination tests for parasitic infections. Due to a lack of specificity, these tests are often then backed up by more conventional lab-based diagnostic methods; for example, a card agglutination test may be carried out for a suspected parasitic infection in the field and, if positive, a blood sample can then be sent to a lab for confirmation. The eventual diagnosis is often achieved by microscopic examination of the sample. In this paper we propose a computerized vision system for aiding in the diagnostic process; this system uses a novel particle recognition algorithm to improve specificity and speed during the diagnostic process. We will show the detection and classification of different types of cells in a diluted blood sample using regression analysis of their size, shape and colour. The first step is to define the objects to be tracked by a Gaussian Mixture Model for background subtraction and binary opening and closing for noise suppression. After subtracting the objects of interest from the background, the next challenge is to predict if a given object belongs to a certain category or not. This is a classification problem, and the output of the algorithm is a Boolean value (true/false). As such, the computer program should be able to "predict" with a reasonable level of confidence if a given particle belongs to the kind we are looking for or not. We show the use of a binary logistic regression analysis with three continuous predictors: size, shape and color histogram. The results suggest these variables could be very useful in a logistic regression equation, as they proved to have a relatively high predictive value on their own.
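
    A binary logistic regression on three continuous predictors, as described above, might look like the following minimal sketch. The feature values, the scaling of each predictor to the range 0 to 1, the learning rate and the plain gradient-descent fit are assumptions chosen for illustration; the record does not state how the authors fitted their model.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(weights, x):
    """Probability that a particle with features x belongs to the target class."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(features, labels, rate=0.5, steps=2000):
    """Fit [bias, w_size, w_shape, w_colour] by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(features[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(features, labels):
            err = predict_probability(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - rate * g / len(labels) for w, g in zip(weights, grad)]
    return weights


# Hypothetical, pre-scaled (size, shape, colour) descriptors for eight particles.
features = [
    (0.80, 0.90, 0.70), (0.75, 0.85, 0.65), (0.90, 0.80, 0.75), (0.85, 0.95, 0.60),
    (0.30, 0.40, 0.20), (0.25, 0.50, 0.30), (0.40, 0.35, 0.25), (0.35, 0.45, 0.15),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = particle of the kind being looked for

weights = fit_logistic(features, labels)
is_target = [predict_probability(weights, x) >= 0.5 for x in features]
print(is_target)
```

    Thresholding the predicted probability at 0.5 yields the Boolean output mentioned in the abstract; a different threshold could be chosen to trade sensitivity against specificity.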

  4. Regression analysis of growth responses to water depth in three wetland plant species

    Sorrell, Brian K.; Tanner, Chris C.; Brix, Hans

    2012-01-01

    Background and aims Plant species composition in wetlands and on lakeshores often shows dramatic zonation, which is frequently ascribed to differences in flooding tolerance. This study compared the growth responses to water depth of three species (Phormium tenax, Carex secta and Typha orientalis) differing in depth preferences in wetlands, using non-linear and quantile regression analyses to establish how flooding tolerance can explain field zonation. Methodology Plants were established for 8 months in outdoor cultures in waterlogged soil without standing water, and then randomly allocated to water depths from 0 to 0.5 m. Morphological and growth responses to depth were followed for 54 days before harvest, and then analysed by repeated-measures analysis of covariance, and non-linear and quantile regression analysis (QRA), to compare flooding tolerances. Principal results Growth responses to depth differed between the three species, and were non-linear. Phormium tenax growth decreased rapidly in standing water >0.25 m depth, C. secta growth increased initially with depth but then decreased at depths >0.30 m, accompanied by increased shoot height and decreased shoot density, and T. orientalis was unaffected by the 0- to 0.50-m depth range. In P. tenax the decrease in growth was associated with a decrease in the number of leaves produced per ramet and in C. secta the effect of water depth was greatest for the tallest shoots. Allocation patterns were unaffected by depth. Conclusions The responses are consistent with the principle that zonation in the field is primarily structured by competition in shallow water and by physiological flooding tolerance in deep water. Regression analyses, especially QRA, proved to be powerful tools in distinguishing genuine phenotypic responses to water depth from non-phenotypic variation due to size and developmental differences. PMID:23259044

  5. Canopy Height Estimation in French Guiana with LiDAR ICESat/GLAS Data Using Principal Component Analysis and Random Forest Regressions

    Ibrahim Fayad

    2014-11-01

    Estimating forest canopy height from large-footprint satellite LiDAR waveforms is challenging given the complex interaction between LiDAR waveforms, terrain, and vegetation, especially in dense tropical and equatorial forests. In this study, canopy height in French Guiana was estimated using multiple linear regression models and the Random Forest technique (RF). This analysis was either based on LiDAR waveform metrics extracted from the GLAS (Geoscience Laser Altimeter System) spaceborne LiDAR data and terrain information derived from the SRTM (Shuttle Radar Topography Mission) DEM (Digital Elevation Model) or on Principal Component Analysis (PCA) of GLAS waveforms. Results show that the best statistical model for estimating forest height based on waveform metrics and digital elevation data is a linear regression of waveform extent, trailing edge extent, and terrain index (RMSE of 3.7 m). For the PCA based models, better canopy height estimation results were observed using a regression model that incorporated both the first 13 principal components (PCs) and the waveform extent (RMSE = 3.8 m). Random Forest regressions revealed that the best configuration for canopy height estimation used all the following metrics: waveform extent, leading edge, trailing edge, and terrain index (RMSE = 3.4 m). Waveform extent was the variable that best explained canopy height, with an importance factor almost three times higher than those for the other three metrics (leading edge, trailing edge, and terrain index). Furthermore, the Random Forest regression incorporating the first 13 PCs and the waveform extent had a slightly-improved canopy height estimation in comparison to the linear model, with an RMSE of 3.6 m. In conclusion, multiple linear regressions and RF regressions provided canopy height estimations with similar precision using either LiDAR metrics or PCs. However, a regression model (linear regression or RF) based on the PCA of waveform samples with waveform

  6. Application of kernel principal component analysis and support vector regression for reconstruction of cardiac transmembrane potentials

    Non-invasively reconstructing the transmembrane potentials (TMPs) from body surface potentials (BSPs) constitutes one form of the inverse ECG problem that can be treated as a regression problem with multiple inputs and multiple outputs, and which can be solved using the support vector regression (SVR) method. In developing an effective SVR model, feature extraction is an important task for pre-processing the original input data. This paper proposes the application of principal component analysis (PCA) and kernel principal component analysis (KPCA) to the SVR method for feature extraction. Also, the genetic algorithm and the simplex optimization method are invoked to determine the hyper-parameters of the SVR. Based on the realistic heart-torso model, the equivalent double-layer source method is applied to generate the data set for training and testing the SVR model. The experimental results show that the SVR method with feature extraction (PCA-SVR and KPCA-SVR) can perform better than that without feature extraction (single SVR) in terms of the reconstruction of the TMPs on epi- and endocardial surfaces. Moreover, compared with the PCA-SVR, the KPCA-SVR features good approximation and generalization ability when reconstructing the TMPs.

  7. Digital soil mapping using multiple logistic regression on terrain parameters in southern Brazil

    Elvio Giasson

    2006-06-01

    Soil surveys are necessary sources of information for land use planning, but they are not always available. This study proposes the use of multiple logistic regressions to predict the occurrence of soil types based on reference areas. From a digitalized soil map and terrain parameters derived from the digital elevation model in the ArcView environment, several sets of multiple logistic regressions were defined using the statistical software Minitab, establishing relationships between explanatory terrain variables and soil types, using either the original legend or a simplified legend, and with or without stratification of the study area by drainage classes. Terrain parameters, such as elevation, distance to stream, flow accumulation, and topographic wetness index, were the variables that best explained soil distribution. Stratification by drainage classes did not have a significant effect. Simplification of the original legend increased the accuracy of the method in predicting soil distribution.

  8. Predicting the Surface Quality of Face Milled Aluminium Alloy Using a Multiple Regression Model and Numerical Optimization

    Simunovic, K.; Simunovic, G.; Saric, T.

    2013-10-01

    Surface roughness is a very significant indicator of surface quality. It represents an essential exploitation requirement and influences technological time and costs, i.e. productivity. For that reason, the main objective of this paper is to analyse the influence of face milling cutting parameters (number of revolutions, feed rate and depth of cut) on the surface roughness of aluminium alloy. Hence, a statistical (regression) model has been developed to predict the surface roughness by using the methodology of experimental design. A central composite design is chosen for fitting the response surface. Also, numerical optimization considering two goals simultaneously (minimum propagation of error and minimum roughness) was performed throughout the experimental region. In this way, the settings of cutting parameters causing the minimum variability in response were determined for the estimated variations of the significant regression factors.

  9. An evaluation of an operating BWR piping system damping during earthquake by applying auto regressive analysis

    The observation of the equipment and piping systems installed in an operating nuclear power plant during earthquakes is very important for evaluating and confirming the adequacy and the safety margin expected in the design stage. By analyzing observed earthquake records, valuable data concerning the behavior of these systems during earthquakes can be obtained, and information about their aseismic design parameters can be extracted. From these viewpoints, an earthquake observation system was installed in a reactor building in an operating plant. Up to now, the records of three earthquakes have been obtained with this system. In this paper, an example of the analysis of earthquake records is shown; the main purpose of the analysis was the evaluation of the vibration mode, natural frequency and damping factor of this piping system. Prior to the earthquake record analysis, the eigenvalue analysis for this piping system was performed. Auto-regressive analysis was applied to the observed acceleration time history which was obtained with a piping system installed in an operating BWR. The results of earthquake record analysis agreed well with the results of eigenvalue analysis. (Kako, I.)
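
    How an auto-regressive fit can yield a natural frequency and damping factor may be illustrated with a deliberately simplified sketch. It assumes a single vibration mode and a noise-free free-decay record, whereas the study analysed earthquake-excited acceleration histories; the function names, sampling interval and modal values below are hypothetical.

```python
import cmath
import math


def fit_ar2(x):
    """Least-squares estimate of a1, a2 in x[n] = a1*x[n-1] + a2*x[n-2]."""
    idx = range(2, len(x))
    s11 = sum(x[n - 1] * x[n - 1] for n in idx)
    s12 = sum(x[n - 1] * x[n - 2] for n in idx)
    s22 = sum(x[n - 2] * x[n - 2] for n in idx)
    b1 = sum(x[n] * x[n - 1] for n in idx)
    b2 = sum(x[n] * x[n - 2] for n in idx)
    det = s11 * s22 - s12 * s12
    return (b1 * s22 - b2 * s12) / det, (s11 * b2 - s12 * b1) / det


def modal_parameters(a1, a2, dt):
    """Convert AR(2) coefficients into (natural frequency in Hz, damping ratio)."""
    pole = (a1 + cmath.sqrt(a1 * a1 + 4.0 * a2)) / 2.0  # root of z^2 - a1*z - a2 = 0
    s = cmath.log(pole) / dt                             # continuous-time pole
    omega_n = abs(s)
    return omega_n / (2.0 * math.pi), -s.real / omega_n


# Hypothetical free-decay record of one mode: 5 Hz, 2 % damping, sampled at 100 Hz.
dt = 0.01
true_freq_hz, true_zeta = 5.0, 0.02
omega_n = 2.0 * math.pi * true_freq_hz
omega_d = omega_n * math.sqrt(1.0 - true_zeta ** 2)
record = [math.exp(-true_zeta * omega_n * k * dt) * math.cos(omega_d * k * dt)
          for k in range(300)]

a1, a2 = fit_ar2(record)
est_freq_hz, est_zeta = modal_parameters(a1, a2, dt)
print(est_freq_hz, est_zeta)
```

    Real records contain several modes, measurement noise and the earthquake input itself, so a practical analysis would need a higher model order and a way to separate structural poles from the excitation.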

  10. A least trimmed square regression method for second level FMRI effective connectivity analysis.

    Li, Xingfeng; Coyle, Damien; Maguire, Liam; McGinnity, Thomas Martin

    2013-01-01

    We present a least trimmed square (LTS) robust regression method to combine different runs/subjects for second/high level effective connectivity analysis. The basic idea of this method is to treat the extreme nonlinear model variability as outliers if they exceed a certain threshold. A bootstrap method for the LTS estimation is employed to detect model outliers. We compared the LTS robust method with a non-robust method using simulated and real datasets. The difference between LTS and the non-robust method for second level effective connectivity analysis is significant, suggesting the conventional non-robust method is easily affected by the model variability from the first level analysis. In addition, after these outliers are detected and excluded for the high level analysis, the model coefficients of the second level are combined within the framework of a mixed model. The variance of the mixed model is estimated using the Newton-Raphson (NR) type Levenberg-Marquardt algorithm. Three sets of real data are adopted to compare conventional methods which do not include random effects in the analysis with a mixed model for second level effective connectivity analysis. The results show that the conventional method is significantly different from the mixed model when greater model variability exists, suggesting there is a strong random effect, and the mixed model should be employed for the second level effective connectivity analysis. PMID:23093379

  11. Construction of exact simultaneous confidence bands in multiple linear regression with predictor variables constrained in an ellipsoidal region

    Liu, W.; Lin, S.

    2008-01-01

    A simultaneous confidence band provides useful information on the plausible range of the unknown regression model. Construction of a simultaneous confidence band has a history going back to Working and Hotelling (1929) and is often a hard problem when the region over which a confidence band is required is restricted and the number of predictor variables is more than one. This article considers the construction of exact one-sided and two-sided simultaneous confidence bands of hyperbolic shape ...

  12. Comparison between neural networks and multiple logistic regression to predict acute coronary syndrome in the emergency room

    Green, Michael; Björk, Jonas; Hansen, Jakob; Ekelund, Ulf; Edenbrandt, Lars; Ohlsson, Mattias

    2006-01-01

    Summary Objective Patients with suspicion of acute coronary syndrome (ACS) are difficult to diagnose and they represent a very heterogeneous group. Some require immediate treatment while others, with only minor disorders, may be sent home. Detecting ACS patients using a machine learning approach would be advantageous in many situations. Methods and materials Artificial neural network (ANN) ensembles and logistic regression models were trained on data from 634 patients pres...

  13. Analysis of dynamic multiplicity fluctuations at PHOBOS

    Chai, Zhengwei; PHOBOS Collaboration; Back, B. B.; Baker, M. D.; Ballintijn, M.; Barton, D. S.; Betts, R. R.; Bickley, A. A.; Bindel, R.; Budzanowski, A.; Busza, W.; Carroll, A.; Chai, Z.; Decowski, M. P.; García, E.; George, N.; Gulbrandsen, K.; Gushue, S.; Halliwell, C.; Hamblen, J.; Heintzelman, G. A.; Henderson, C.; Hofman, D. J.; Hollis, R. S.; Holynski, R.; Holzman, B.; Iordanova, A.; Johnson, E.; Kane, J. L.; Katzy, J.; Khan, N.; Kucewicz, W.; Kulinich, P.; Kuo, C. M.; Lin, W. T.; Manly, S.; McLeod, D.; Mignerey, A. C.; Nouicer, R.; Olszewski, A.; Pak, R.; Park, I. C.; Pernegger, H.; Reed, C.; Remsberg, L. P.; Reuter, M.; Roland, C.; Roland, G.; Rosenberg, L.; Sagerer, J.; Sarin, P.; Sawicki, P.; Skulski, W.; Steinberg, P.; Stephans, G. S. F.; Sukhanov, A.; Tang, J. L.; Trzupek, A.; Vale, C.; van Nieuwenhuizen, G. J.; Verdier, R.; Wolfs, F. L. H.; Wosiek, B.; Wozniak, K.; Wuosmaa, A. H.; Wyslouch, B.

    2005-01-01

    This paper presents the analysis of the dynamic fluctuations in the inclusive charged particle multiplicity measured by PHOBOS for Au+Au collisions at √s_NN = 200 GeV within the pseudo-rapidity range of -3 < η < 3. First the definition of the fluctuation observables used in this analysis is presented, together with a discussion of their physics meaning. Then the procedure for the extraction of dynamic fluctuations is described. Some preliminary results are included to illustrate the correlation features of the fluctuation observable. New dynamic fluctuations results will be available in a later publication.

  14. The Analysis Of The Correlations And Regressions Between Some Characters On A Wheat Isogenic Varieties Assortment

    Păniţă, Ovidiu

    2015-09-01

    In the years 2012-2014, an assortment of 25 isogenic lines of wheat (Triticum aestivum ssp. vulgare) was tested at Banu-Maracine DRS, the analyzed characters being the number of seeds/spike, seed weight/spike (g), no. of spikes/m2, weight of a thousand seeds (WTS) (g) and no. of emerged plants/m2. Based on the recorded data and their statistical processing, a number of links between these characters were identified. Regression models were also identified between some of the studied characters. Based on component analysis, no. of seeds/spike and seed weight/spike are the components that account for more than 88% of the variance, and a total of seven genotypes had positive scores for both factors.

  15. Stability and adaptability of runner peanut genotypes based on nonlinear regression and AMMI analysis

    Roseane Cavalcanti dos Santos

    2012-08-01

    The objective of this work was to estimate the stability and adaptability of pod and seed yield in runner peanut genotypes based on the nonlinear regression and AMMI analysis. Yield data from 11 trials, distributed in six environments and three harvests, carried out in the Northeast region of Brazil during the rainy season were used. Significant effects of genotypes (G), environments (E), and GE interactions were detected in the analysis, indicating different behaviors among genotypes in favorable and unfavorable environmental conditions. The genotypes BRS Pérola Branca and LViPE‑06 are more stable and adapted to the semiarid environment, whereas LGoPE‑06 is a promising material for pod production, despite being highly dependent on favorable environments.

  16. Statistical learning method in regression analysis of simulated positron spectral data

    Positron lifetime spectroscopy is a non-destructive tool for detection of radiation induced defects in nuclear reactor materials. This work concerns the applicability of the support vector machines method for the input data compression in the neural network analysis of positron lifetime spectra. It has been demonstrated that the SVM technique can be successfully applied to regression analysis of positron spectra. A substantial data compression of about 50% and 8% of the whole training set with two and three spectral components respectively has been achieved including a high accuracy of the spectra approximation. However, some parameters in the SVM approach such as the insensitivity zone ε and the penalty parameter C have to be chosen carefully to obtain a good performance. (author)
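
    The roles of the insensitivity zone ε and the penalty parameter C can be made concrete with the standard support vector regression objective for a linear model. The sketch below only evaluates that objective for given weights; it does not train a model, and the data, weights and parameter values are hypothetical. The helper that counts points outside the ε-tube reflects the assumption that the compression figures quoted above refer to the share of training points retained as support vectors.

```python
def epsilon_insensitive_loss(residual, epsilon):
    """Zero inside the tube of half-width epsilon, linear outside it."""
    return max(0.0, abs(residual) - epsilon)


def predict(weights, bias, x):
    """Linear model output for one input vector."""
    return bias + sum(w * v for w, v in zip(weights, x))


def svr_objective(weights, bias, inputs, targets, epsilon, penalty_c):
    """0.5*||w||^2 + C * sum of epsilon-insensitive losses."""
    regulariser = 0.5 * sum(w * w for w in weights)
    losses = [epsilon_insensitive_loss(y - predict(weights, bias, x), epsilon)
              for x, y in zip(inputs, targets)]
    return regulariser + penalty_c * sum(losses)


def fraction_outside_tube(weights, bias, inputs, targets, epsilon):
    """Share of points whose residual exceeds epsilon (these carry non-zero loss)."""
    outside = sum(1 for x, y in zip(inputs, targets)
                  if abs(y - predict(weights, bias, x)) > epsilon)
    return outside / len(targets)


# Hypothetical one-dimensional data and candidate model y = 1 + 2x.
inputs = [[0.0], [1.0], [2.0], [3.0]]
targets = [1.05, 2.95, 5.5, 7.0]
weights, bias = [2.0], 1.0

objective = svr_objective(weights, bias, inputs, targets, epsilon=0.1, penalty_c=10.0)
outside = fraction_outside_tube(weights, bias, inputs, targets, epsilon=0.1)
print(objective, outside)
```

    A wider ε leaves more points inside the tube and so tends to give a sparser model, while a larger C penalises the remaining deviations more heavily, which is why both have to be tuned together.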

  17. Within-session analysis of the extinction of pavlovian fear-conditioning using robust regression

    Vargas-Irwin, Cristina

    2010-06-01

    Traditionally, the analysis of extinction data in fear conditioning experiments has involved the use of standard linear models, mostly ANOVA of between-group differences of subjects that have undergone different extinction protocols, pharmacological manipulations or some other treatment. Although some studies report individual differences in quantities such as suppression rates or freezing percentages, these differences are not included in the statistical modeling. Within-subject response patterns are then averaged using coarse-grain time windows which can overlook these individual performance dynamics. Here we illustrate an alternative analytical procedure consisting of 2 steps: the estimation of a trend for within-session data and analysis of group differences in trend as main outcome. This procedure is tested on real fear-conditioning extinction data, comparing trend estimates via Ordinary Least Squares (OLS) and robust Least Median of Squares (LMS) regression estimates, as well as comparing between-group differences and analyzing mean freezing percentage versus LMS slopes as outcomes.
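
    The contrast between OLS and LMS trend estimates can be sketched on a single within-session series. The freezing percentages below are invented for illustration, and the LMS fit is approximated by searching only over lines through pairs of observations, a common shortcut rather than an exact solver; the record does not say which LMS implementation the study used.

```python
from itertools import combinations
from statistics import median


def ols_line(xs, ys):
    """Ordinary least squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lms_line(xs, ys):
    """Approximate Least Median of Squares: best line through any pair of points."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, no finite slope
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        criterion = median((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
        if best is None or criterion < best[0]:
            best = (criterion, slope, intercept)
    return best[1], best[2]


# Hypothetical session: freezing falls by 5 points per trial, except two late outliers.
trials = list(range(10))
freezing = [80, 75, 70, 65, 60, 55, 50, 45, 90, 90]

ols_slope, ols_intercept = ols_line(trials, freezing)
lms_slope, lms_intercept = lms_line(trials, freezing)
print(ols_slope, lms_slope)
```

    In this example the two aberrant trials are enough to flatten the OLS trend, while the LMS slope still follows the majority of trials, which is the kind of difference that matters when slopes are used as the outcome for group comparisons.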

  18. Using Spline Regression in Semi-Parametric Stochastic Frontier Analysis: An Application to Polish Dairy Farms

    Czekaj, Tomasz Gerard; Henningsen, Arne

    The estimation of technical efficiency comprises a vast literature in the field of applied production economics. There are two predominant approaches: the non-parametric and non-stochastic Data Envelopment Analysis (DEA) and the parametric Stochastic Frontier Analysis (SFA). The DEA is ... specifying an unsuitable functional form and thus, model misspecification and biased parameter estimates. Given these problems of the DEA and the SFA, Fan, Li and Weersink (1996) proposed a semi-parametric stochastic frontier model that estimates the production function (frontier) by non-parametric ...), Kumbhakar et al. (2007), and Henningsen and Kumbhakar (2009). The aim of this paper and its main contribution to the existing literature is the estimation of semi-parametric stochastic frontier models using a different non-parametric estimation technique: spline regression (Ma et al. 2011). We apply this...

  19. Factors affecting the outcome of excimer laser photorefractive keratectomy: a preliminary multivariable regression analysis

    Maguen, Ezra I.; Papaioannou, Thanassis; Nesburn, Anthony B.; Salz, James J.; Warren, Cathy; Grundfest, Warren S.

    1996-05-01

    Multivariable regression analysis was used to evaluate the combined effects of some preoperative and operative variables on the change of refraction following excimer laser photorefractive keratectomy for myopia (PRK). This analysis was performed on 152 eyes (at 6 months postoperatively) and 156 eyes (at 12 months postoperatively). The following variables were considered: intended refractive correction, patient age, treatment zone, central corneal thickness, average corneal curvature, and intraocular pressure. At 6 months after surgery, the cumulative R2 was 0.43 with 0.38 attributed to the intended correction and 0.06 attributed to the preoperative corneal curvature. At 12 months, the cumulative R2 was 0.37 where 0.33 was attributed to the intended correction, 0.02 to the preoperative corneal curvature, and 0.01 to both preoperative corneal thickness and to the patient age. Further model augmentation is necessary to account for the remaining variability and the behavior of the residuals.

  20. Modelling and analysis of turbulent datasets using Auto Regressive Moving Average processes

    We introduce a novel way to extract information from turbulent datasets by applying an Auto Regressive Moving Average (ARMA) statistical analysis. Such analysis goes well beyond the analysis of the mean flow and of the fluctuations and links the behavior of the recorded time series to a discrete version of a stochastic differential equation which is able to describe the correlation structure in the dataset. We introduce a new index Υ that measures the difference between the resulting analysis and the Obukhov model of turbulence, the simplest stochastic model reproducing both the Richardson law and the Kolmogorov spectrum. We test the method on datasets measured in a von Kármán swirling flow experiment. We found that the ARMA analysis is well correlated with spatial structures of the flow, and can discriminate between two different flows with comparable mean velocities, obtained by changing the forcing. Moreover, we show that Υ is highest in regions where shear layer vortices are present, thereby establishing a link between deviations from the Kolmogorov model and coherent structures. These deviations are consistent with the ones observed by computing the Hurst exponents for the same time series. We show that some salient features of the analysis are preserved when considering global instead of local observables. Finally, we analyze flow configurations with multistability features where the ARMA technique is efficient in discriminating different stability branches of the system.