WorldWideScience

Sample records for multiple regression analysis

  1. Multiple linear regression analysis

    Science.gov (United States)

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
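    The abstract does not say which stepwise variant the program uses. The sketch below shows forward selection with a partial F-to-enter test, assuming NumPy and ordinary least squares. The threshold `f_to_enter` is an illustrative stand-in for the user-supplied confidence level: an F-to-enter of about 4 corresponds roughly to the 5% level when the residual degrees of freedom are large. Function names are hypothetical, and full stepwise procedures also re-test entered variables for removal, which is omitted here.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X (X already holds the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def forward_stepwise(X, y, f_to_enter=4.0):
    """Forward selection: repeatedly add the predictor with the largest partial F
    statistic, stopping when no remaining predictor exceeds f_to_enter.
    Returns the selected column indices in order of entry."""
    n, p = X.shape
    selected = []
    current = np.ones((n, 1))  # start from the intercept-only model
    while True:
        rss_old = rss(current, y)
        best_f, best_j = 0.0, None
        for j in range(p):
            if j in selected:
                continue
            candidate = np.column_stack([current, X[:, j]])
            rss_new = rss(candidate, y)
            df_resid = n - candidate.shape[1]
            # Partial F for adding one variable: drop in RSS over residual mean square.
            f_stat = (rss_old - rss_new) / (rss_new / df_resid)
            if f_stat > best_f:
                best_f, best_j = f_stat, j
        if best_j is None or best_f < f_to_enter:
            return selected
        selected.append(best_j)
        current = np.column_stack([current, X[:, best_j]])
```

    For example, if `y` is built from columns 1 and 3 of `X` plus noise, `forward_stepwise(X, y)` should enter column 1 or 3 first and both within the first two steps; a noise column may occasionally follow, since each candidate is tested at roughly the 5% level.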

  2. Univariate Nonparametric Analysis of Variance Through Multiple Linear Regression

    Science.gov (United States)

    Huitema, Bradley E.

    1978-01-01

    Many methodologists are aware that parametric tests associated with the analysis of variance and the analysis of covariance can be computed using regression procedures. It is shown that multiple linear regression can also be employed to compute the Kruskal-Wallis nonparametric analysis of variance. (Author)
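    One standard form of this equivalence can be sketched as follows, assuming NumPy (function names are hypothetical, not from the paper): rank the pooled observations, regress the ranks on group dummy variables, and take H = (N - 1) * R^2. With tied values given average ranks, this quantity equals the tie-corrected Kruskal-Wallis statistic.

```python
import numpy as np


def average_ranks(x):
    """Ranks 1..N, with tied values given the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def kruskal_wallis_via_regression(values, groups):
    """Regress the ranks on group dummies; H = (N - 1) * R^2."""
    r = average_ranks(values)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    # Intercept plus one dummy per group except the first (reference) group.
    X = np.column_stack([np.ones(len(r))] +
                        [(groups == g).astype(float) for g in labels[1:]])
    beta, *_ = np.linalg.lstsq(X, r, rcond=None)
    fitted = X @ beta
    r2 = np.sum((fitted - r.mean()) ** 2) / np.sum((r - r.mean()) ** 2)
    return (len(r) - 1) * r2
```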

  3. MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM

    Directory of Open Access Journals (Sweden)

    Erika KULCSÁR

    2009-12-01

    This paper analyses the relationship between GDP in the hotels and restaurants sector (the dependent variable) and the following independent variables: overnight stays in establishments of tourist reception, arrivals in establishments of tourist reception, and investments in the hotels and restaurants sector, over the period 1995-2007. Using multiple regression analysis, I found that investments and tourist arrivals are significant predictors of GDP. Based on these results, I identified the components of the marketing mix that, in my opinion, require investment and could contribute to the positive development of tourist arrivals in establishments of tourist reception.

  4. Applied multiple regression/correlation analysis for the behavioral sciences

    CERN Document Server

    Cohen, Jacob; West, Stephen G

    2013-01-01

    This classic text on multiple regression is noted for its nonmathematical, applied, and data-analytic approach. Readers profit from its verbal-conceptual exposition and frequent use of examples. The applied emphasis provides clear illustrations of the principles and provides worked examples of the types of applications that are possible. Researchers learn how to specify regression models that directly address their research questions. An overview of the fundamental ideas of multiple regression and a review of bivariate correlation and regression and other elementary statistical concepts provide...

  5. Applied multiple regression correlation analysis for the behavioral sciences

    CERN Document Server

    Cohen, Patricia; Aiken, Leona S

    2014-01-01

    This classic text on multiple regression is noted for its nonmathematical, applied, and data-analytic approach. Readers profit from its verbal-conceptual exposition and frequent use of examples. The applied emphasis provides clear illustrations of the principles and provides worked examples of the types of applications that are possible. Researchers learn how to specify regression models that directly address their research questions. An overview of the fundamental ideas of multiple regression and a review of bivariate correlation and regression and other elementary statistical concepts provide a strong foundation for understanding the rest of the text. The third edition features an increased emphasis on graphics and the use of confidence intervals and effect size measures, and an accompanying CD with data for most of the numerical examples along with the computer code for SPSS, SAS, and SYSTAT. Applied Multiple Regression serves as both a textbook for graduate students and as a reference tool for researche...

  6. Factors that determine false recall: a multiple regression analysis.

    Science.gov (United States)

    Roediger, H L; Watson, J M; McDermott, K B; Gallo, D A

    2001-09-01

    In the Deese-Roediger-McDermott (DRM) paradigm, subjects study lists of words that are designed to elicit the recall of an associatively related critical item. The 55 lists we have developed provide levels of false recall ranging from .01 to .65, and understanding this variability should provide a key to understanding this memory illusion. Using a simultaneous multiple regression analysis, we assessed the contribution of seven factors in creating false recall of critical items in the DRM paradigm. This analysis accounted for approximately 68% of the variance in false recall, with two main predictors: associative connections from the study words to the critical item (r = +.73; semipartial r = +.60) and recallability of the lists (r = -.43; semipartial r = -.34). Taken together, the variance in false recall captured by these predictors accounted for 84% of the variance that can be explained, given the reliability of the false recall measures (r = .90). Therefore, the results of this analysis strongly constrain theories of false memory in this paradigm, suggesting that at least two factors determine the propensity of DRM lists to elicit false recall. The results fit well within the theoretical framework postulating that both semantic activation of the critical item and strategic monitoring processes influence the probability of false recall and false recognition in this paradigm. PMID:11700893
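    The semipartial correlations reported above can be computed from least-squares fits. In this minimal sketch, which assumes NumPy and uses illustrative names, the semipartial r for predictor j is the correlation between the outcome and the part of predictor j that remains after regressing it on the other predictors. Its square therefore equals the drop in R^2 when predictor j is removed from the model.

```python
import numpy as np


def _residualize(target, others):
    """Residuals of target after OLS regression on `others` plus an intercept."""
    Z = np.column_stack([np.ones(len(target)), others])
    beta, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return target - Z @ beta


def semipartial_r(y, X, j):
    """Correlation of y with the part of predictor j not shared with the other predictors."""
    others = np.delete(X, j, axis=1)
    xj_unique = _residualize(X[:, j], others)
    return float(np.corrcoef(y, xj_unique)[0, 1])
```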

  7. Using Robust Standard Errors to Combine Multiple Regression Estimates with Meta-Analysis

    Science.gov (United States)

    Williams, Ryan T.

    2012-01-01

    Combining multiple regression estimates with meta-analysis has continued to be a difficult task. A variety of methods have been proposed and used to combine multiple regression slope estimates with meta-analysis; however, most of these methods have serious methodological and practical limitations. The purpose of this study was to explore the use…

  8. Regression analysis for multiple-disease group testing data.

    Science.gov (United States)

    Zhang, Boan; Bilder, Christopher R; Tebbs, Joshua M

    2013-12-10

    Group testing, where individual specimens are composited into groups to test for the presence of a disease (or other binary characteristic), is a procedure commonly used to reduce the costs of screening a large number of individuals. Group testing data are unique in that only group responses may be available, but inferences are needed at the individual level. A further methodological challenge arises when individuals are tested in groups for multiple diseases simultaneously, because unobserved individual disease statuses are likely correlated. In this paper, we propose new regression techniques for multiple-disease group testing data. We develop an expectation-solution based algorithm that provides consistent parameter estimates and natural large-sample inference procedures. We apply our proposed methodology to chlamydia and gonorrhea screening data collected in Nebraska as part of the Infertility Prevention Project and to prenatal infectious disease screening data from Kenya. PMID:23703944

  9. An improved multiple linear regression and data analysis computer program package

    Science.gov (United States)

    Sidik, S. M.

    1972-01-01

    NEWRAP, an improved version of the earlier multiple linear regression programs RAPIER, CREDUC, and CRSPLT, allows a complete regression analysis, including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum-seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double-precision arithmetic.
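    The abstract does not describe NEWRAP's canonical-reduction algorithm. The sketch below shows the textbook canonical analysis of a fitted second-order surface y = b0 + x'b + x'Bx, assuming NumPy. The stationary point is x_s = -0.5 * inv(B) @ b, and the eigenvalues of B classify it as a maximum, a minimum, or a saddle point. The function name and argument layout are hypothetical.

```python
import numpy as np


def canonical_reduction(b0, b, B):
    """Canonical analysis of y = b0 + x'b + x'Bx, with B symmetric.
    Returns the stationary point, the response there, the eigenvalues and
    eigenvectors of B (canonical curvatures and axes), and the point's type."""
    b = np.asarray(b, dtype=float)
    B = np.asarray(B, dtype=float)
    x_s = -0.5 * np.linalg.solve(B, b)   # stationary point: gradient b + 2Bx = 0
    y_s = b0 + 0.5 * x_s @ b             # predicted response at x_s
    eigvals, eigvecs = np.linalg.eigh(B) # canonical form: y = y_s + sum(lambda_i * w_i**2)
    if np.all(eigvals < 0):
        kind = "maximum"
    elif np.all(eigvals > 0):
        kind = "minimum"
    else:
        kind = "saddle point"
    return x_s, y_s, eigvals, eigvecs, kind
```

    When the fitted model contains an interaction term b12*x1*x2, the off-diagonal entries B[0, 1] and B[1, 0] are each b12/2, while the diagonal holds the pure quadratic coefficients.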

  10. Analysis of γ spectra in airborne radioactivity measurements using multiple linear regressions

    International Nuclear Information System (INIS)

    This paper describes the calculation of net peak counts of the nuclide 137Cs at 662 keV in γ spectra from airborne radioactivity measurements using multiple linear regression. A mathematical model is established by analyzing every factor that contributes to the Cs peak counts in the spectra, and a multiple linear regression function is constructed. The calculation adopts stepwise regression, and insignificant factors are eliminated by an F-test. The regression results and their uncertainty are calculated by least-squares estimation, from which the Cs net peak counts and their uncertainty are obtained. Analysis results for an experimental spectrum are presented, and the influence of energy shift and energy resolution on the results is discussed. Compared with the spectrum stripping method, the multiple linear regression method does not require stripping ratios, the result depends only on the counts in the Cs peak, and the calculation uncertainty is reduced. (authors)

  11. A multiple regression analysis for accurate background subtraction in 99Tcm-DTPA renography

    International Nuclear Information System (INIS)

    A technique for accurate background subtraction in 99Tcm-DTPA renography is described. The technique is based on a multiple regression analysis of the renal curves and separate heart and soft tissue curves which together represent background activity. It is compared, in over 100 renograms, with a previously described linear regression technique. Results show that the method provides accurate background subtraction, even in very poorly functioning kidneys, thus enabling relative renal filtration and excretion to be accurately estimated. (author)

  12. Quantitative electron microscope autoradiography: application of multiple linear regression analysis

    International Nuclear Information System (INIS)

    A new method for the analysis of high resolution EM autoradiographs is described. It identifies labelled cell organelle profiles in sections on a strictly statistical basis and provides accurate estimates for their radioactivity without the need to make any assumptions about their size, shape and spatial arrangement. (author)

  13. Modeling of retardance in ferrofluid with Taguchi-based multiple regression analysis

    Science.gov (United States)

    Lin, Jing-Fung; Wu, Jyh-Shyang; Sheu, Jer-Jia

    2015-03-01

    Citric acid (CA) coated Fe3O4 ferrofluids were prepared by co-precipitation, and their magneto-optical retardance was measured with a Stokes polarimeter. Optimization and multiple regression of retardance in the ferrofluids were carried out by combining the Taguchi method with Excel. From nine tests on four parameters (pH of the suspension, molar ratio of CA to Fe3O4, volume of CA, and coating temperature), the order of influence of the parameters and the optimal parameter combination were identified. Multiple regression analysis and an F-test on the significance of the regression equation were performed. The model F value is much larger than the critical F value, with significance level P < 0.0001, so the regression model has statistically significant predictive ability. Substituting the optimal combination into the regression equation gives a retardance of 32.703°, 11.4% higher than the highest value obtained in the tests.
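    The overall F-test mentioned above compares explained and residual variance. This minimal sketch assumes NumPy and an intercept-plus-k-predictor model, with illustrative names: F = (SSR/k) / (SSE/(n - k - 1)), which is then compared with the critical value of the F distribution with (k, n - k - 1) degrees of freedom. If all four parameters entered a nine-run design as linear terms, that would give (4, 4) degrees of freedom.

```python
import numpy as np


def regression_f_test(X, y):
    """Overall F statistic for H0: all slope coefficients are zero.
    X holds the k predictors (no intercept column); an intercept is added here.
    Returns the coefficients, the F statistic and its degrees of freedom."""
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sse = float(resid @ resid)                    # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))      # total sum of squares
    ssr = sst - sse                               # regression sum of squares
    f_stat = (ssr / k) / (sse / (n - k - 1))
    return beta, f_stat, (k, n - k - 1)
```

    The critical value can be taken from F tables or, if SciPy is available, from `scipy.stats.f.ppf(0.95, k, n - k - 1)`.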

  14. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

    Science.gov (United States)

    Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.

    2006-01-01

    Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
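    A simple slope is the effect of x at a chosen value z0 of the moderator in y = b0 + b1*x + b2*z + b3*x*z. It equals b1 + b3*z0, with standard error sqrt(var(b1) + 2*z0*cov(b1, b3) + z0^2*var(b3)). The sketch below assumes NumPy and ordinary least squares. It is not the authors' published tool, and the function name is hypothetical.

```python
import numpy as np


def simple_slope(x, z, y, z0):
    """Slope of y on x at moderator value z0 in y = b0 + b1*x + b2*z + b3*x*z.
    Returns (slope, standard error, t statistic)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), x, z, x * z])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = float(resid @ resid) / (n - Z.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(Z.T @ Z)            # covariance of coefficient estimates
    slope = beta[1] + beta[3] * z0
    se = np.sqrt(cov[1, 1] + 2 * z0 * cov[1, 3] + z0 ** 2 * cov[3, 3])
    return slope, se, slope / se
```

    The t statistic has n - 4 degrees of freedom. Evaluating it over a grid of z0 values is one way to locate regions of significance.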

  15. A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis

    OpenAIRE

    Chauhan, R. K.; Abhishek Taneja

    2011-01-01

    The growing volume of data creates a challenge: the need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz. factor analysis and multiple linear regression, for different sample si...

  16. Multiple regression analysis of Jominy hardenability data for boron treated steels

    International Nuclear Information System (INIS)

    The relation between chemical composition and hardenability of boron-treated steels has been investigated using multiple regression analysis. A linear regression model was chosen. The free boron content that is effective for hardenability was calculated using a model proposed by Jansson. The regression analysis for 1261 steel heats provided equations that were statistically significant at the 95% level. All heats met the specification according to the Nordic countries producers' classification. The variation in chemical composition typically explained 80 to 90% of the variation in hardenability. In the regression analysis, elements that did not contribute significantly to the calculated hardness according to the F-test were eliminated. Carbon, silicon, manganese, phosphorus and chromium were important at all Jominy distances; nickel, vanadium, boron and nitrogen were important at distances above 6 mm. The regression analysis also showed that very few outliers, i.e. data points beyond four standard deviations, were present in the data set. The model has been used successfully in industrial practice, replacing some of the otherwise necessary Jominy tests. (orig.)

  17. COLOR IMAGE RETRIEVAL BASED ON FEATURE FUSION THROUGH MULTIPLE LINEAR REGRESSION ANALYSIS

    Directory of Open Access Journals (Sweden)

    K. Seetharaman

    2015-08-01

    This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, with the least-squares estimation method employed to estimate the parameters. The input query image is segmented into regions according to its structure. Color and texture features are extracted from each region of the query image and fused using the multiple linear regression model. The estimated model parameters form a vector, called the feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images, and the F-measure is applied to evaluate the performance of the proposed technique. The results show that the proposed technique is comparable to other existing techniques.
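    The Canberra distance between feature vectors p and q is the sum over components of |p_i - q_i| / (|p_i| + |q_i|). This minimal sketch assumes NumPy and follows the common convention that components where both values are zero contribute nothing. `rank_targets` is a hypothetical helper for ordering target images by their similarity to the query.

```python
import numpy as np


def canberra(p, q):
    """Canberra distance between two feature vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    num = np.abs(p - q)
    den = np.abs(p) + np.abs(q)
    # Components where both values are zero (0/0) contribute nothing, by convention.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den != 0)
    return float(terms.sum())


def rank_targets(query_vec, target_vecs):
    """Indices of target feature vectors, most similar (smallest distance) first."""
    dists = [canberra(query_vec, t) for t in target_vecs]
    return [int(i) for i in np.argsort(dists, kind="stable")]
```

    SciPy also provides `scipy.spatial.distance.canberra` for the same measure.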

  18. Assessing Credit Default using Logistic Regression and Multiple Discriminant Analysis: Empirical Evidence from Bosnia and Herzegovina

    Directory of Open Access Journals (Sweden)

    Deni Memić

    2015-01-01

    This article aims to assess credit default prediction on the banking market in Bosnia and Herzegovina, both nationwide and in its constitutional entities (the Federation of Bosnia and Herzegovina and Republika Srpska). The ability to classify companies into different predefined groups, or to find an appropriate tool that would replace human assessment in classifying companies into good and bad buckets, has long been one of the main interests of risk management researchers. We investigated the possibility and accuracy of default prediction using the traditional statistical methods of logistic regression (logit) and multiple discriminant analysis (MDA), and compared their predictive abilities. The results show that the created models have high predictive ability. For logit models, some variables are more influential on default prediction than others. Return on assets (ROA) is statistically significant in all four periods prior to default, with very high regression coefficients, i.e. a high impact on the model's ability to predict default. Similar results are obtained for MDA models. It is also found that predictive ability differs between logistic regression and multiple discriminant analysis.

  19. QSPR study of molar diamagnetic susceptibility of diverse organic compounds using multiple linear regression analysis

    OpenAIRE

    S. Saaidpour; S. A. Zarei; F. Nasri

    2012-01-01

    Multiple linear regression (MLR) was used to build a linear quantitative structure-property relationship (QSPR) model for predicting the molar diamagnetic susceptibility (χm) of 140 diverse organic compounds, using three significant descriptors calculated from the molecular structures alone and selected by the stepwise regression method. Stepwise regression was employed to develop a regression equation based on 100 training compounds, and predictive ability was tested on 40 compoun...

  20. A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis

    Directory of Open Access Journals (Sweden)

    R. K. Chauhan

    2011-04-01

    The growing volume of data creates a challenge: the need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz. factor analysis and multiple linear regression, for different sample sizes on three unique sets of data. The performance of the two techniques is compared on the following parameters: mean square error (MSE), R-square, adjusted R-square, condition number, root mean square error (RMSE), number of variables included in the prediction model, modified coefficient of efficiency, F-value, and test of normality. These parameters have been computed using various data mining tools such as SPSS, XLstat, Stata, and MS-Excel. For all the given datasets, factor analysis outperforms multiple linear regression. However, the absolute prediction accuracy varied among the three datasets, indicating that data distribution and data characteristics play a major role in choosing the correct prediction technique.
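    Several of the comparison criteria listed above can be computed directly from a least-squares fit. The sketch below assumes NumPy and uses illustrative names. MSE here divides by n; some regression tables instead divide the residual sum of squares by n - k - 1. The condition number is computed on standardized predictors, which is one common convention and not necessarily the authors' choice.

```python
import numpy as np


def fit_metrics(X, y):
    """Goodness-of-fit criteria for an OLS fit of y on the k columns of X plus an intercept."""
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    mse = float(np.mean(resid ** 2))  # mean of squared residuals (divides by n)
    r2 = 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return {
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "R2": r2,
        "adj_R2": adj_r2,
        # Ratio of largest to smallest singular value of the standardized
        # predictors; large values flag collinearity.
        "condition_number": float(np.linalg.cond((X - X.mean(0)) / X.std(0))),
    }
```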

  1. Prediction of Persian Gulf Sea Surface Temperature Using Multiple Regressions and Principal Components Analysis

    Directory of Open Access Journals (Sweden)

    A. Shirvani

    2005-10-01

    Since fluctuations of the Persian Gulf Sea Surface Temperature (PGSST) have a significant effect on winter precipitation, water resources and agricultural production in the south-western parts of Iran, the possibility of predicting winter SST was evaluated with a multiple regression model. The time series of PGSSTs for all seasons during 1947-1992 were considered as predictors, and the time series of MSSTs during 1948-1993 as the predictand. Principal components analysis was applied for data reduction and principal component extraction. Only the scores of the first four PCs (PC1 to PC4), which accounted for the total variance in the predictor field, were used as input for the regression analysis. Varimax rotation was applied to find the dependency of each principal component on the first time series of the PGSST. The results indicate that PC1 to PC4 are indicators of temperature changes during winter, autumn, spring and summer, respectively. In the regression model, PC1, PC2 and PC4 were significant at the 5% level, but PC3 was not. The significant variables account for 33.5% of the total variance in the winter PGSSTs. For predicting winter PGSST, the PGSST of the previous winter is particularly important, and autumn and summer temperatures also play a role.
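    Regressing on principal component scores, as described above, can be sketched as follows, assuming NumPy (names are hypothetical). The components come from a singular value decomposition of the centered predictor matrix, so they are covariance-based. The varimax rotation used in the study is omitted here.

```python
import numpy as np


def pc_regression(X, y, n_components):
    """Regress y on the scores of the leading principal components of X.
    Returns the coefficients (intercept first), the component scores, and the
    fraction of predictor variance explained by each retained component."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal axes; squared singular values give variances.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_components].T
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    Z = np.column_stack([np.ones(len(y)), scores])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta, scores, explained
```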

  2. A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis

    CERN Document Server

    Taneja, Abhishek

    2011-01-01

    The growing volume of data creates a challenge: the need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz. factor analysis and multiple linear regression, for different sample sizes on three unique sets of data. The performance of the two techniques is compared on the following parameters: mean square error (MSE), R-square, adjusted R-square, condition number, root mean square error (RMSE), number of variables included in the prediction model, modified coefficient of efficiency, F-value, and test of normality. These parameters have been computed using various data mining tools such as SPSS, XLstat, Stata, and MS-Excel. For all the given datasets, factor analysis outperforms multiple linear re...

  3. Multiple linear regression solvatochromic analysis of donor-acceptor imidazole derivatives.

    Science.gov (United States)

    Jayabharathi, J; Thanikachalam, V; Kalaiarasi, V; Ramanathan, P

    2015-01-01

    Catalytic synthesis of some polysubstituted imidazoles under solvent-free conditions is reported, and their characterization has been carried out using spectral techniques. Electronic spectral studies reveal that their solvatochromic behavior depends on both the polarity of the medium and the hydrogen bonding properties of the solvents. Specific hydrogen bonding interactions in polar solvents modulated the order of the two close-lying lowest singlet states. The solvent effect on absorption and emission spectral results has been analyzed by multiparametric regression analysis. Solvatochromic effects on the emission spectral position indicate the charge transfer (CT) character of the emitting singlet states in both polar and nonpolar environments. The fluorescence decays of the imidazoles fit satisfactorily to biexponential kinetics. These observations are consistent with quantum chemical calculations. PMID:25595056

  4. Thermodynamic analysis of simple gas turbine cycle with multiple regression modelling and optimization

    International Nuclear Information System (INIS)

    In this study, thermodynamic and statistical analyses were performed on a gas turbine system to assess the impact of important operating parameters such as CIT (Compressor Inlet Temperature), PR (Pressure Ratio) and TIT (Turbine Inlet Temperature) on its performance characteristics, such as net power output, energy efficiency, exergy efficiency and fuel consumption. Each performance characteristic was expressed as a function of the operating parameters, followed by a parametric study and optimization. The results showed that the performance characteristics increase with an increase in TIT and a decrease in CIT, except fuel consumption, which behaves oppositely. The net power output and efficiencies increase with PR up to certain values and then start to decrease, whereas fuel consumption always decreases with an increase in PR. The exergy analysis identified the combustion chamber as the major contributor to exergy destruction, followed by the stack gas. Subsequently, multiple regression models were developed to correlate each of the response variables (performance characteristics) with the predictor variables (operating parameters). The regression model equations showed a significant statistical relationship between the predictor and response variables. (author)

  5. Establishing multiple regression models for ozone sensitivity analysis to temperature variation in Taiwan

    Science.gov (United States)

    Liu, Pao-Wen Grace; Tsai, Jiun-Horng; Lai, Hsin-Chih; Tsai, Der-Min; Li, Li-Wei

    2013-11-01

    The sensitivity of air quality to meteorological variation has attracted attention since climate change became a global issue. The goal of this study is to investigate the sensitivity of ground-level ozone concentrations to temperature variation in Taiwan. Several multivariate regression models were built from historical ozone and meteorological data at three cities located in northern, mid-western, and southern Taiwan. Descriptive statistics indicate that pollution severity decreases in the order: southern (Pingtung), mid-western (Fengyuan), and northern (Hsichih) sites. Multiple regression models containing a principal component trigger variable effectively simulated the historical ozone exceedances during 2004-2009. Including the PC trigger improved R² from 0.38 (lowest) to 0.58 (highest). A high probability of detection and critical success index (mostly between 85% and 90%) and low false alarm rates (0-2.6%) were achieved in predicting high ozone days (≥100 ppb). The sensitivity analysis indicated that (1) ozone sensitivity was positively correlated with temperature variation, (2) the sensitivity levels were the reverse of the ozone problem severity, (3) sensitivity was most apparent in ozone seasons, and (4) sensitivity depended strongly on seasonality in the urban cities of Hsichih and Fengyuan, but weakly in the rural city of Pingtung.
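    The verification scores quoted above come from a 2-by-2 table of forecast versus observed high-ozone days. This minimal sketch in plain Python uses POD = hits / (hits + misses) and CSI = hits / (hits + misses + false alarms). The abstract does not say which false-alarm measure was used, so the sketch returns both the false alarm ratio and the false alarm rate. Names are illustrative.

```python
def _ratio(num, den):
    """num / den, or NaN when the denominator is zero (e.g. no observed events)."""
    return num / den if den else float("nan")


def forecast_scores(predicted, observed):
    """Contingency-table scores for binary forecasts (True = high-ozone day)."""
    hits = false_alarms = misses = correct_negatives = 0
    for p, o in zip(predicted, observed):
        if p and o:
            hits += 1
        elif p:
            false_alarms += 1
        elif o:
            misses += 1
        else:
            correct_negatives += 1
    return {
        "POD": _ratio(hits, hits + misses),
        "CSI": _ratio(hits, hits + misses + false_alarms),
        "FAR_ratio": _ratio(false_alarms, hits + false_alarms),
        "FAR_rate": _ratio(false_alarms, false_alarms + correct_negatives),
    }
```

    With ozone values in ppb, binary inputs could be formed as, for example, `[v >= 100 for v in forecast_ppb]`, where `forecast_ppb` is a hypothetical list of model predictions.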

  6. Analysis of longitudinal clinical trials with missing data using multiple imputation in conjunction with robust regression.

    Science.gov (United States)

    Mehrotra, Devan V; Li, Xiaoming; Liu, Jiajun; Lu, Kaifeng

    2012-12-01

    In a typical randomized clinical trial, a continuous variable of interest (e.g., bone density) is measured at baseline and fixed postbaseline time points. The resulting longitudinal data, often incomplete due to dropouts and other reasons, are commonly analyzed using parametric likelihood-based methods that assume multivariate normality of the response vector. If the normality assumption is deemed untenable, then semiparametric methods such as (weighted) generalized estimating equations are considered. We propose an alternative approach in which the missing data problem is tackled using multiple imputation, and each imputed dataset is analyzed using robust regression (M-estimation; Huber, 1973, Annals of Statistics 1, 799-821) to protect against potential non-normality/outliers in the original or imputed dataset. The robust analysis results from each imputed dataset are combined for overall estimation and inference using either the simple Rubin (1987, Multiple Imputation for Nonresponse in Surveys, New York: Wiley) method, or the more complex but potentially more accurate Robins and Wang (2000, Biometrika 87, 113-124) method. We use simulations to show that our proposed approach performs at least as well as the standard methods under normality, but is notably better under both elliptically symmetric and asymmetric non-normal distributions. A clinical trial example is used for illustration. PMID:22994905
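    Rubin's rules for combining the per-imputation results can be sketched as follows. The sketch assumes that each imputed dataset yields a scalar estimate and its variance, for example from a robust M-estimation fit. NumPy is assumed and the function name is illustrative; the Robins and Wang method is not shown.

```python
import numpy as np


def rubin_pool(estimates, variances):
    """Combine m imputed-data estimates of a scalar parameter with Rubin's rules.
    Returns the pooled estimate, its total variance, and the degrees of freedom."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()            # pooled point estimate
    w = u.mean()                # within-imputation variance
    b = q.var(ddof=1)           # between-imputation variance
    t = w + (1 + 1 / m) * b     # total variance
    # Rubin's degrees of freedom for the reference t distribution.
    df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2 if b > 0 else float("inf")
    return q_bar, t, df
```

    In Python, per-imputation Huber M-estimates could come from statsmodels' `RLM` with the `HuberT` norm, with each coefficient and its squared standard error passed to `rubin_pool`.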

  7. Influence of plant root morphology and tissue composition on phenanthrene uptake: Stepwise multiple linear regression analysis

    International Nuclear Information System (INIS)

    Polycyclic aromatic hydrocarbons (PAHs) are contaminants that reside mainly in surface soils. Dietary intake of plant-based foods can make a major contribution to total PAH exposure. Little information is available on the relationship between root morphology and plant uptake of PAHs. Understanding the root morphological and compositional factors that affect root uptake of contaminants is important and can inform both agricultural (chemical contamination of crops) and engineering (phytoremediation) applications. Five crop plant species were grown hydroponically in solutions containing the PAH phenanthrene. Measurements were taken of 1) phenanthrene uptake, 2) root morphology (specific surface area, volume, surface area, tip number and total root length) and 3) root tissue composition (water, lipid, protein and carbohydrate content). These factors were compared through Pearson's correlation and multiple linear regression analysis. The major factors promoting phenanthrene uptake are specific surface area and lipid content.

    Highlights:
    • There is no correlation between phenanthrene uptake and total root length or water content.
    • Specific surface area and lipid content are the most crucial factors for phenanthrene uptake.
    • The contribution of specific surface area is greater than that of lipid content; these are the two most important root morphological and compositional factors affecting phenanthrene uptake.

  8. Multiple Logistic Regression Analysis of Cigarette Use among High School Students

    Science.gov (United States)

    Adwere-Boamah, Joseph

    2011-01-01

    A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors in the 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target behavior of interest was frequent cigarette use. The five predictor variables included in the model were: a) race, b) frequency of…
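    A binary logistic regression like the one described is usually fitted by maximum likelihood. The minimal Newton-Raphson sketch below assumes NumPy and 0/1 outcomes. Names are illustrative; in practice a statistics package would normally be used.

```python
import numpy as np


def logistic_fit(X, y, n_iter=25, tol=1e-10):
    """Maximum-likelihood binary logistic regression by Newton-Raphson.
    X holds predictors without an intercept column; y is 0/1.
    Returns coefficients (intercept first) and their standard errors."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ beta)))        # predicted probabilities
        grad = Z.T @ (y - p)                        # score vector
        info = Z.T @ (Z * (p * (1 - p))[:, None])   # Fisher information matrix
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-(Z @ beta)))
    info = Z.T @ (Z * (p * (1 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se
```

    Exponentiating the coefficients, `np.exp(beta[1:])`, gives the odds ratios for the predictors. This method assumes the classes are not perfectly separated, since the estimates diverge under complete separation.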

  9. Regression analysis by example

    CERN Document Server

    Chatterjee, Samprit

    2012-01-01

    Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." (Journal of the American Statistical Association). Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded

  10. Error analysis of dimensionless scaling experiments with multiple points using linear regression

    International Nuclear Information System (INIS)

    A general method of error estimation in the case of multiple point dimensionless scaling experiments, using linear regression and standard error propagation, is proposed. The method reduces to the previous result of Cordey (2009 Nucl. Fusion 49 052001) in the case of a two-point scan. On the other hand, if the points follow a linear trend, it explains how the estimated error decreases as more points are added to the scan. Based on the analytical expression that is derived, it is argued that for a low number of points, adding points to the ends of the scanned range, rather than the middle, results in a smaller error estimate. (letter)
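    The effect of point placement can be illustrated with the textbook standard error of an OLS slope, SE = sigma / sqrt(sum((x_i - mean(x))^2)). This assumes a known, equal measurement error sigma at every point. It is a generic illustration, not the expression derived in the letter, and the function name is illustrative.

```python
import numpy as np


def slope_standard_error(x, sigma):
    """Standard error of the OLS slope for design points x, when each
    measurement has independent error with known standard deviation sigma."""
    x = np.asarray(x, dtype=float)
    return sigma / np.sqrt(np.sum((x - x.mean()) ** 2))
```

    For a scan over [0, 1] with sigma = 1, two points give SE of about 1.41. Adding two more points at the ends (x = 0, 0, 1, 1) reduces SE to 1.0. Adding them in the middle (x = 0, 0.5, 0.5, 1) leaves SE at about 1.41.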

  11. Precision Efficacy Analysis for Regression.

    Science.gov (United States)

    Brooks, Gordon P.

    When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross-validity approach to select sample sizes…

  12. Multiple imputation in quantile regression

    OpenAIRE

    Wei, Ying; Ma, Yanyuan; Carroll, Raymond J.

    2012-01-01

    We propose a multiple imputation estimator for parameter estimation in a quantile regression model when some covariates are missing at random. The estimation procedure fully utilizes the entire dataset to achieve increased efficiency, and the resulting coefficient estimators are root-n consistent and asymptotically normal. To protect against possible model misspecification, we further propose a shrinkage estimator, which automatically adjusts for possible bias. The finite sample performance o...

  13. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    OpenAIRE

    Ani Shabri; Ruhaidah Samsudin

    2014-01-01

    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelets and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose the original time series into several subseries at different scales. Then, principal component analysis (PCA) is used in processing...

  14. Multiple imputation in quantile regression.

    Science.gov (United States)

    Wei, Ying; Ma, Yanyuan; Carroll, Raymond J

    2012-01-01

    We propose a multiple imputation estimator for parameter estimation in a quantile regression model when some covariates are missing at random. The estimation procedure fully utilizes the entire dataset to achieve increased efficiency, and the resulting coefficient estimators are root-n consistent and asymptotically normal. To protect against possible model misspecification, we further propose a shrinkage estimator, which automatically adjusts for possible bias. The finite sample performance of our estimator is investigated in a simulation study. Finally, we apply our methodology to part of the Eating at America's Table Study data, investigating the association between two measures of dietary intake. PMID:24944347
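    The abstract does not spell out how estimates from the imputed datasets are combined. For orientation only, the sketch below shows Rubin's combining rules, the generic way multiply imputed estimates are pooled; it is not the authors' quantile-regression estimator, and the numbers are made up.

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine m completed-data estimates with Rubin's rules.
    Returns the pooled estimate and its total variance."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    w = statistics.fmean(variances)          # average within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (n - 1 divisor)
    total = w + (1 + 1 / m) * b
    return q_bar, total

# Hypothetical coefficient estimates and variances from m = 3 imputed datasets.
q, t = pool_rubin([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
print(q, t)
```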

  15. Multiple Regressions in Analysing House Price Variations

    Directory of Open Access Journals (Sweden)

    Aminah Md Yusof

    2012-03-01

    The application of rigorous statistical analysis to aid investment decision making is gaining momentum in the United States of America as well as the United Kingdom. In Malaysia, however, the response from local academics has been rather slow, and uptake is even slower among practitioners. This paper illustrates how Multiple Regression Analysis (MRA) and its extension, Hedonic Regression Analysis, have been used to explain price variation for selected houses in Malaysia. Each attribute that is theoretically identified as a price determinant is priced, and the perceived contribution of each is explicitly shown. The paper demonstrates how statistical analysis can be used to analyze property investment by considering multiple determinants. Considering various characteristics more rigorously enables better investment decision making.
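    A hedonic regression of this kind treats each coefficient as the implicit price of a house attribute. The minimal sketch below fits log price on three attributes by ordinary least squares; the attributes, houses, and coefficient values are all hypothetical, and the prices are generated without noise so that the recovered coefficients can be checked.

```python
import numpy as np

# Hypothetical attributes for 6 houses: floor area (m^2), bedrooms, distance to city centre (km).
X = np.array([
    [120, 3, 5.0],
    [ 90, 2, 8.0],
    [150, 4, 3.5],
    [ 75, 2, 12.0],
    [200, 5, 2.0],
    [110, 3, 6.5],
], dtype=float)

# Synthetic log prices generated from assumed implicit prices.
true_coef = np.array([11.0, 0.004, 0.05, -0.02])   # intercept, area, bedrooms, distance
X1 = np.column_stack([np.ones(len(X)), X])
log_price = X1 @ true_coef

# Each fitted coefficient approximates the proportional price change per unit of an attribute.
coef, *_ = np.linalg.lstsq(X1, log_price, rcond=None)
for name, c in zip(["intercept", "area", "bedrooms", "distance"], coef):
    print(f"{name:>9}: {c:+.4f}")
```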

  16. Application of multiple regression analysis to forecasting South Africa's electricity demand

    Scientific Electronic Library Online (English)

    Koen, Renee; Holloway, Jennifer.

    2014-11-01

    In a developing country such as South Africa, understanding the expected future demand for electricity is very important in various planning contexts. It is specifically important to understand how expected scenarios regarding population or economic growth can be translated into corresponding future [...] electricity usage patterns. This paper discusses a methodology for forecasting long-term electricity demand that was specifically developed for applying to such scenarios. The methodology uses a series of multiple regression models to quantify historical patterns of electricity usage per sector in relation to patterns observed in certain economic and demographic variables, and uses these relationships to derive expected future electricity usage patterns. The methodology has been used successfully to derive forecasts used for strategic planning within a private company as well as to provide forecasts to aid planning in the public sector. This paper discusses the development of the modelling methodology, provides details regarding the extensive data collection and validation processes followed during the model development, and reports on the relevant model fit statistics. The paper also shows that the forecasting methodology has to some extent been able to match the actual patterns, and therefore concludes that the methodology can be used to support planning by translating changes relating to economic and demographic growth, for a range of scenarios, into a corresponding electricity demand. The methodology therefore fills a particular gap within the South African long-term electricity forecasting domain.

  17. Use of Factor Analysis Scores in Multiple Regression Model for Estimation of Body Weight from Some Body Measurement in Muscovy Duck

    OpenAIRE

    D.M. Ogah; A.A. Alaga; M.O. Momoh

    2009-01-01

    Factor and multiple regression analyses were carried out on morphological traits (body length, body width, bill length, bill width, bill height, shank length, body height, head length, head width, neck length, wing length, chest circumference and body weight) of male and female Muscovy ducks. Obvious sexual dimorphism was exhibited between the sexes, and the relationships between body measurements and body weight were examined through factor and multiple linear regression analysis. Three factors had positi...
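    Using component scores as regressors sidesteps multicollinearity among correlated body measurements. The sketch below swaps in principal-component scores for the factor-analysis scores used in the study (a simplification) and uses simulated measurements; with all components retained the fitted values match ordinary multiple regression, and dropping trailing components gives principal-component regression.

```python
import numpy as np

def component_scores(X, k):
    """Principal-component scores of the standardized predictors (a stand-in for factor scores)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T            # n x k matrix of mutually uncorrelated scores

# Hypothetical body measurements driven by one underlying "size" dimension.
rng = np.random.default_rng(1)
size = rng.normal(size=100)
X = np.column_stack([size + 0.3 * rng.normal(size=100) for _ in range(4)])
weight = 2.0 + 1.5 * size + 0.2 * rng.normal(size=100)

F = component_scores(X, k=4)
F1 = np.column_stack([np.ones(len(weight)), F])
coef, *_ = np.linalg.lstsq(F1, weight, rcond=None)   # regress body weight on the scores
```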

  18. Fitting multiplicative models by robust alternating regressions

    OpenAIRE

    Croux, Christophe; Filzmoser, P.; Pison, G; Rousseeuw, Peter

    2003-01-01

    In this paper a robust approach for fitting multiplicative models is presented. Focus is on the factor analysis model, where we will estimate factor loadings and scores by a robust alternating regression algorithm. The approach is highly robust, and also works well when there are more variables than observations. The technique yields a robust biplot, depicting the interaction structure between individuals and variables. This biplot is not predetermined by outliers, which can be retrieved from...

  19. Multiple regression modeling of nonlinear data sets

    Science.gov (United States)

    Kravtsov, S.; Kondrashov, D.; Ghil, M.

    2003-04-01

    Application of multiple polynomial regression modeling to observational and model generated data sets is discussed. Here the form of classical multiple linear regression is generalized to a model that is still linear in its parameters, but includes general multivariate polynomials of predictor variables as the basis functions. The system's low-frequency evolution is assumed to be the result of deterministic, possibly nonlinear, dynamics excited by a temporally white, but geographically coherent and normally distributed white noise. In determining the appropriate structure of the latter, the multi-level generalization of multiple polynomial regression, where the residual stochastic forcing at a given level is subsequently modeled as a function of variables at this, and all preceding levels, has turned out to be useful. The number of levels is determined so that lag-0 covariance of the residual forcing converges to a constant matrix, while its lag-1 covariance vanishes. The method has been applied to the output from a three-layer quasi-geostrophic model, to the analysis of the Northern Hemisphere wintertime geopotential height anomalies, and to global sea-surface temperature (SST) data. In the former two cases, the nonlinear multi-regime structure of probability density function (PDF) constructed in the phase subspace of a few leading empirical orthogonal functions (EOFs), as well as the detailed spectrum of the data's temporal evolution, have been well reproduced by the regression simulations. We have given a simple dynamical interpretation of these results in terms of synoptic-eddy feedback on the system's low-frequency variability. In modeling of SST data, a simple way to include the seasonal cycle into the regression model has been developed. 
The regression simulation in this case produces ENSO events with maximum amplitude in December/January, while the positive events generally tend to have a larger amplitude than the negative events -- a feature that cannot be adequately represented in linear models. The method is expected to work well provided the data sample is long enough. For short data records, such as the SST record above, a wealth of techniques exists to improve the accuracy of the regression fit; the so-called partial least-squares fit turns out to be most useful. The extreme numerical efficiency and ease of interpretation make multi-level multiple polynomial regression an appealing tool for dynamical analysis of geophysical data.
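    The key point above is that the model stays linear in its parameters even when the basis functions are multivariate polynomials of the predictors. A minimal single-level sketch (no multi-level modelling of the residual forcing), with a hypothetical noise-free response:

```python
import itertools
import numpy as np

def poly_design(X, degree):
    """Design matrix with an intercept and all monomials of the predictors up to `degree`."""
    n, d = X.shape
    cols = [np.ones(n)]
    for deg in range(1, degree + 1):
        for idx in itertools.combinations_with_replacement(range(d), deg):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
# Hypothetical nonlinear response that is linear in the coefficients of a quadratic basis.
y = 1.0 + 0.5 * X[:, 0] - 2.0 * X[:, 0] * X[:, 1] + 0.3 * X[:, 1] ** 2

D = poly_design(X, degree=2)          # columns: 1, x0, x1, x0^2, x0*x1, x1^2
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
print(np.round(coef, 3))
```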

  20. Integrative analysis of multiple diverse omics datasets by sparse group multitask regression.

    Science.gov (United States)

    Lin, Dongdong; Zhang, Jigang; Li, Jingyao; He, Hao; Deng, Hong-Wen; Wang, Yu-Ping

    2014-01-01

    A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have had a remarkable impact on identifying susceptible biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms holds promise for improving the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms, and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: (1) treat the biomarker identification in each single study as a task and then combine them by multitask learning; (2) group variables from all studies for identifying significant genes; (3) enforce a sparsity constraint on groups of variables to overcome the "small sample, but large variables" problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, in our multitask model, and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with the conventional meta-analysis method. The results show that our sparse group multitask method significantly outperforms the meta-analysis method. In an application to our osteoporosis studies, 7 genes are identified as significant genes by our method and are found to have significant effects in three other independent studies used for validation. The most significant gene, SOD2, has been identified in our previous osteoporosis study involving the same expression dataset. 
Several other genes such as TREML2, HTR1E, and GLO1 are shown to be novel susceptible genes for osteoporosis, as confirmed from other studies. PMID:25364766
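    Penalties such as the sparse group lasso are typically handled inside proximal-gradient or block-coordinate algorithms. As a hedged illustration, and not the authors' algorithm, here is the standard proximal step for one group under the sparse group lasso penalty (lasso soft-thresholding followed by group-wise shrinkage); the sparse group ridge variant is not shown and the inputs are hypothetical.

```python
import math

def soft_threshold(v, lam):
    """Elementwise soft-thresholding: proximal operator of lam * ||v||_1."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]

def sparse_group_prox(v, lam1, lam2):
    """Proximal operator of lam1*||b||_1 + lam2*||b||_2 for a single group of coefficients."""
    u = soft_threshold(v, lam1)
    norm = math.sqrt(sum(x * x for x in u))
    if norm <= lam2:
        return [0.0] * len(u)          # the whole group is switched off
    scale = 1.0 - lam2 / norm          # shrink the surviving group toward zero
    return [scale * x for x in u]

print(sparse_group_prox([3.0, -1.0, 0.5], lam1=1.0, lam2=1.0))
```

    The L1 part zeroes individual coefficients within a group, while the L2 part can remove the group as a whole, which is what lets such penalties select both genes and individual variables.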

  1. Multiple regression and beyond an introduction to multiple regression and structural equation modeling

    CERN Document Server

    Keith, Timothy Z

    2014-01-01

    Multiple Regression and Beyond offers a conceptually oriented introduction to multiple regression (MR) analysis and structural equation modeling (SEM), along with analyses that flow naturally from those methods. By focusing on the concepts and purposes of MR and related methods, rather than the derivation and calculation of formulae, this book introduces material to students more clearly, and in a less threatening way. In addition to illuminating content necessary for coursework, the accessibility of this approach means students are more likely to be able to conduct research using MR or SEM--a

  2. Multiple regression analysis in modelling of carbon dioxide emissions by energy consumption use in Malaysia

    Science.gov (United States)

    Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat

    2015-04-01

    Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue is considered a major international concern and is primarily attributed to different fossil fuels. In this paper, a regression model is used to analyze the causal relationship between CO2 emissions and energy consumption in Malaysia using time series data for the period 1980-2010. The equations were developed using a regression model based on the eight major sources that contribute to CO2 emissions, such as non-energy use, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil, and motor petrol. Part of the data (1980-2000) was used to develop the regression model and the rest (2001-2010) to validate it. Comparing the model predictions with the measured data showed a high correlation (R² = 0.9544), indicating the model's accuracy and efficiency. These results can be used for early warning to help the population comply with air quality standards.

  3. REVAAM Model to determine a company's value by multiple valuation and linear regression analysis

    OpenAIRE

    Luis G. Acosta-Calzado; Humberto Murrieta-Romo; Carlos Acosta-Calzado

    2010-01-01

    This paper shows an alternative model to the widely used method of multiple valuation (or relative valuation) in order to calculate the value of a company by using either the Price Earnings (PE) and/or the Enterprise Value to Earnings Before Interest, Taxes, Depreciation and Amortization (EV/EBITDA). When calculating multiples, analysts tend to consider average multiples within an industry and apply them directly to the target company; however, we believe that ...

  4. A multiple linear regression analysis of hot corrosion attack on a series of nickel base turbine alloys

    Science.gov (United States)

    Barrett, C. A.

    1985-01-01

    Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni-base cast turbine alloys. The U transform (i.e., U = sin⁻¹[(%A/100)^(1/2)]) was shown to give the best estimate of the dependent variable, y. A complete second-degree equation is described for the "centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition, linear terms for the minor elements C, B, and Zr were added for a basic 47-term equation. The best reduced equation was determined by the stepwise selection method, with essentially 13 terms. The Cr term was found to be the most important, accounting for 60 percent of the explained variability in hot corrosion attack.
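    Interpreting the "U transform" as the arcsine square-root transform of the percentage attack, U = sin⁻¹√(%A/100) (an assumption about the original notation), the transform and its inverse can be sketched as follows. Transforms of this kind are commonly applied to percentage responses before fitting a linear regression.

```python
import math

def u_transform(percent_attack):
    """Arcsine square-root transform of a percentage: U = asin(sqrt(%A / 100)), in radians."""
    return math.asin(math.sqrt(percent_attack / 100.0))

def inverse_u(u):
    """Back-transform U (radians) to a percentage."""
    return 100.0 * math.sin(u) ** 2

print(u_transform(50.0), inverse_u(u_transform(12.5)))
```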

  5. Comparison of a neural network with multiple linear regression for quantitative analysis in ICP-atomic emission spectroscopy

    International Nuclear Information System (INIS)

    A two-layer perceptron with backpropagation of error is used for quantitative analysis in ICP-AES. The network was trained on emission spectra of two interfering lines of Cd and As, and the concentrations of both elements were subsequently estimated from mixture spectra. The spectra of the Cd and As lines were also used to perform multiple linear regression (MLR) via calculation of the pseudoinverse S+ of the sensitivity matrix S. In the present paper it is shown that there are close relations between the operation of the perceptron and the MLR procedure. These are most clearly apparent in the correlation between the weights of the backpropagation network and the elements of the pseudoinverse. Using MLR, the confidence intervals of the predictions are exploited to correct for the wavelength shift of the optical device. (orig.)
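    The MLR step described here, estimating concentrations as c = S⁺m from a mixture spectrum m, can be sketched with NumPy's `pinv`. The sensitivity values are hypothetical, and real spectra would add noise and the wavelength-shift correction mentioned in the abstract.

```python
import numpy as np

# Hypothetical sensitivity matrix S: rows are wavelengths, columns are elements (e.g. Cd, As).
S = np.array([
    [1.00, 0.20],
    [0.80, 0.50],
    [0.30, 0.90],
    [0.10, 1.00],
])
true_conc = np.array([2.0, 5.0])
mixture = S @ true_conc          # noise-free mixture spectrum: overlapping lines add linearly

S_plus = np.linalg.pinv(S)       # Moore-Penrose pseudoinverse S+
conc_hat = S_plus @ mixture      # least-squares concentration estimates
print(conc_hat)
```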

  6. Modeling the energy content of combustible ship-scrapping waste at Alang-Sosiya, India, using multiple regression analysis.

    Science.gov (United States)

    Reddy, M Srinivasa; Basha, Shaik; Joshi, H V; Sravan Kumar, V G; Jha, B; Ghosh, P K

    2005-01-01

    Alang-Sosiya, established in 1982, is the largest ship-scrapping yard in the world. Every year, an average of 171 ships with a mean weight of 2.10 × 10⁶ (±7.82 × 10⁵) light dead weight tonnage (LDT) are scrapped there. Apart from scrapped metals, this yard generates a massive amount of combustible solid waste in the form of waste wood, plastic, insulation material, paper, glass wool, thermocol pieces (polyurethane foam material), sponge, oiled rope, cotton waste, rubber, etc. In this study, multiple regression analysis was used to develop predictive models for the energy content of combustible ship-scrapping solid wastes. The scope of work comprised qualitative and quantitative estimation of solid waste samples and a sequential selection procedure for isolating variables. Three regression models were developed to correlate the energy content (net calorific value, LHV) with variables derived from material composition, proximate and ultimate analyses. For this particular waste, the performance of these models agrees well with the equations developed by other researchers (Dulong, Steuer, Scheurer-Kestner and Bento) for estimating the energy content of municipal solid waste. PMID:16009310
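    The sequential selection procedure mentioned above is not specified in detail. A common variant is forward selection by partial F statistics, sketched below on simulated data; the F-to-enter threshold of 4.0 is an assumed value, and practical stepwise routines usually also test already-entered variables for removal.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit with an intercept and the given columns."""
    D = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    r = y - D @ beta
    return float(r @ r)

def forward_select(X, y, f_enter=4.0):
    """Greedy forward selection: add the predictor with the largest partial F
    statistic while it exceeds f_enter (an assumed threshold)."""
    n, d = X.shape
    selected = []
    while len(selected) < d:
        base = rss(X, y, selected)
        best_j, best_f = None, -np.inf
        for j in sorted(set(range(d)) - set(selected)):
            new = rss(X, y, selected + [j])
            df_resid = n - len(selected) - 2       # intercept + selected + candidate
            f = (base - new) / (new / df_resid)
            if f > best_f:
                best_j, best_f = j, f
        if best_f < f_enter:
            break
        selected.append(best_j)
    return selected

# Hypothetical data: only columns 0 and 2 influence y.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + 3.0 * X[:, 2] + 0.5 * rng.normal(size=200)
sel = forward_select(X, y)
print(sel)
```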

  7. Robust analysis of the central tendency, simple and multiple regression and ANOVA: A step by step tutorial

    Directory of Open Access Journals (Sweden)

    Delphine S. Courvoisier

    2010-01-01

    After much effort and care have gone into running an experiment in social science, the data should not be ruined by an improper analysis. Classical methods, such as the mean, the usual simple and multiple linear regressions, and ANOVA, often require normality and the absence of outliers, conditions that are rarely met by data coming from experiments. To address this problem, researchers often use ad hoc methods such as the detection and deletion of outliers. In this tutorial, we show the shortcomings of such an approach. In particular, we show that outliers can sometimes be very difficult to detect and that the full inferential procedure is somewhat distorted by such a procedure. A more appropriate and modern approach is to use a robust procedure that provides estimation, inference and testing that are not influenced by outlying observations but correctly describe the structure of the bulk of the data. It can also provide diagnostics of the distance of any point or subject from the central tendency. Robust procedures can also be viewed as methods to check the appropriateness of the classical methods. To provide a step-by-step tutorial, we present descriptive analyses that allow researchers to make an initial check on whether the data meet the conditions of application. Next, we compare classical and robust alternatives to ANOVA and regression and discuss their advantages and disadvantages. Finally, we present indices and plots that are based on the residuals of the analysis and can be used to determine whether the conditions of application of the analyses are respected. Examples on data from psychological research illustrate each of these points, and for each analysis and plot, R code is provided so that readers can apply the techniques presented throughout the article.
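    The tutorial provides its own R code. Purely as an illustration of one robust alternative to least squares (not necessarily an estimator the tutorial uses), the sketch below computes a Huber M-estimate of a straight line by iteratively reweighted least squares. The tuning constant k = 1.345 and the MAD-based scale are conventional choices, and the data, with one gross outlier, are made up.

```python
import numpy as np

def huber_irls(x, y, k=1.345, n_iter=50):
    """Huber M-estimate of a straight line by iteratively reweighted least squares.
    The residual scale is re-estimated at each step from the MAD."""
    X1 = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)          # start from ordinary least squares
    for _ in range(n_iter):
        r = y - X1 @ beta
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / s
        w = k / np.maximum(u, k)                             # 1 for |u| <= k, k/|u| otherwise
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
    return beta

# Hypothetical data: y = 1 + 2x with small alternating noise and one gross outlier.
x = np.arange(10.0)
y = 1.0 + 2.0 * x + 0.1 * (-1.0) ** np.arange(10)
y[9] = 100.0

beta_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(10), x]), y, rcond=None)
beta_huber = huber_irls(x, y)
print("OLS slope:", beta_ols[1], "Huber slope:", beta_huber[1])
```

    The outlier drags the least-squares slope far from 2, while its weight in the Huber fit shrinks as the scale estimate tightens around the bulk of the data.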

  8. Linear Regression Analysis

    CERN Document Server

    Seber, George A F

    2012-01-01

    * Concise, mathematically clear, and comprehensive treatment of the subject.
    * Expanded coverage of diagnostics and methods of model fitting.
    * Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models.
    * More than 200 problems throughout the book plus outline solutions for the exercises.
    * This revision has been extensively class-tested.

  9. Multiple Output Regression with Latent Noise

    OpenAIRE

    Gillberg, Jussi; Marttinen, Pekka; Pirinen, Matti; Kangas, Antti J.; Soininen, Pasi; Ali, Mehreen; Havulinna, Aki S.; Järvelin, Marjo-Riitta; Ala-Korpela, Mika; Kaski, Samuel

    2014-01-01

    In high-dimensional data, structured noise, caused by observed and unobserved factors affecting multiple target variables simultaneously, poses a serious challenge for modeling by masking the often weak signal. Therefore, (1) explaining away the structured noise in multiple-output regression is of paramount importance. Additionally, (2) assumptions about the correlation structure of the regression weights are needed. We note that both can be formulated in a natural way in...

  10. Multiple Linear Regression Models in Outlier Detection

    Directory of Open Access Journals (Sweden)

    S.M.A.Khaleelur Rahman

    2012-02-01

    Identifying anomalous values in real-world databases is important both for improving the quality of the original data and for reducing the impact of anomalous values in the process of knowledge discovery in databases. Such anomalous values give the data analyst useful information for discovering patterns. Through isolation, these data may be separated and analyzed. The analysis of outliers and influential points is an important step in regression diagnostics. In this paper, our aim is to detect points that are very different from the other points. They do not seem to belong to a particular population and behave differently. Removing these influential points would lead to a different model. The distinction between these points is not always obvious and clear. Hence several indicators are used for identifying and analyzing outliers. Existing methods of outlier detection are based on manual inspection of graphically represented data. In this paper, we present a new approach to automating the process of detecting and isolating outliers. The impact of anomalous values on the dataset has been established by using two indicators, DFFITS and Cook's D. The process is based on modeling the human perception of exceptional values by using multiple linear regression analysis.
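    The two indicators named above have standard closed forms. A minimal NumPy sketch, on hypothetical data with one planted outlier, computes leverage, Cook's distance, and DFFITS for an ordinary least-squares fit; the cut-offs 4/n and 2√(p/n) are common rules of thumb, not values taken from the paper.

```python
import numpy as np

def influence_measures(X, y):
    """Leverage, Cook's distance and DFFITS for an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    n, p = X1.shape
    H = X1 @ np.linalg.pinv(X1)                        # hat matrix X (X'X)^-1 X'
    h = np.diag(H)
    e = y - H @ y                                      # ordinary residuals
    s2 = e @ e / (n - p)                               # residual variance estimate
    cooks_d = e**2 / (p * s2) * h / (1 - h) ** 2
    s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)   # leave-one-out variance
    t = e / np.sqrt(s2_i * (1 - h))                    # externally studentized residuals
    dffits = t * np.sqrt(h / (1 - h))
    return h, cooks_d, dffits

# Hypothetical data: a straight line with small noise and one planted outlier at index 9.
x = np.arange(10.0)
y = 1.0 + 2.0 * x + 0.1 * (-1.0) ** np.arange(10)
y[9] = 100.0

h, cooks_d, dffits = influence_measures(x, y)
n, p = len(y), 2
print("Cook's D flags:", np.where(cooks_d > 4 / n)[0])
print("DFFITS flags:", np.where(np.abs(dffits) > 2 * np.sqrt(p / n))[0])
```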

  11. Ca analysis: An Excel based program for the analysis of intracellular calcium transients including multiple, simultaneous regression analysis

    OpenAIRE

    Greensmith, David J.

    2014-01-01

    Here I present an Excel-based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of...

  12. A Dirty Model for Multiple Sparse Regression

    OpenAIRE

    Jalali, Ali; Ravikumar, Pradeep; Sanghavi, Sujay

    2011-01-01

    Sparse linear regression -- finding an unknown vector from linear measurements -- is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where several related vectors -- with partially shared support sets -- have to be recovered. A natural question in this setting is whether one can use the sharing to further decrease the overall number of samples required. A line of recent researc...
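    For context on the single-task problem the abstract starts from, here is a small coordinate-descent LASSO on simulated data with more variables than samples. It is a generic sketch, not the paper's "dirty model"; the penalty level λ = 0.1 and the data are assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1.
    No intercept: assumes y and the columns of X are centered."""
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                                   # current residual y - X b
    for _ in range(n_iter):
        for j in range(d):
            r += X[:, j] * b[j]                    # remove coordinate j's contribution
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]                    # add the updated contribution back
    return b

# Hypothetical sparse problem: 50 samples, 100 variables, 2 nonzero coefficients.
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 100))
X -= X.mean(axis=0)
beta_true = np.zeros(100)
beta_true[[3, 17]] = [3.0, -2.0]
y = X @ beta_true

b = lasso_cd(X, y, lam=0.1)
print(np.nonzero(np.abs(b) > 0.1)[0], np.round(b[[3, 17]], 2))
```

    Above λ_max = max|X'y|/n every coefficient is exactly zero, which is a quick sanity check on the soft-thresholding update.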

  13. DART: Dropouts meet Multiple Additive Regression Trees

    OpenAIRE

    Rashmi, K. V.; Gilad-Bachrach, Ran

    2015-01-01

    Multiple Additive Regression Trees (MART), an ensemble model of boosted regression trees, is known to deliver high prediction accuracy for diverse tasks, and it is widely used in practice. However, it suffers from an issue which we call over-specialization, wherein trees added at later iterations tend to impact the prediction of only a few instances and make negligible contribution towards the remaining instances. This negatively affects the performance of the model on unseen da...

  14. Computing multiple-output regression quantile regions.

    Czech Academy of Sciences Publication Activity Database

    Paindaveine, D.; Šiman, Miroslav

    2012-01-01

    Vol. 56, No. 4 (2012), pp. 840-853. ISSN 0167-9473 R&D Projects: GA MŠk(CZ) 1M06047 Institutional research plan: CEZ:AV0Z10750506 Keywords : halfspace depth * multiple-output regression * parametric linear programming * quantile regression Subject RIV: BA - General Mathematics Impact factor: 1.304, year: 2012 http://library.utia.cas.cz/separaty/2012/SI/siman-0376413.pdf

  15. Relating relapse and T2 lesion changes to disability progression in multiple sclerosis: a systematic literature review and regression analysis

    Science.gov (United States)

    2013-01-01

    Background In the treatment of multiple sclerosis (MS), the most important therapeutic aim of disease-modifying treatments (DMTs) is to prevent or postpone long-term disability. Given the typically slow progression observed in the majority of relapsing-remitting MS (RRMS) patients, the primary endpoint for most randomized clinical trials (RCTs) is a reduction in relapse rate. It is widely assumed that reducing relapse rate will slow disability progression. Similarly, MRI studies suggest that reducing T2 lesions will be associated with slowing long-term disability in MS. The objective of this study was to evaluate the relationship between treatment effects on relapse rates and active T2 lesions to differences in disease progression (as measured by the Expanded Disability Status Scale [EDSS]) in trials evaluating patients with clinically isolated syndrome (CIS), RRMS, and secondary progressive MS (SPMS). Methods A systematic literature review was conducted in Medline, Embase, CENTRAL, and PsycINFO to identify randomized trials published in English from January 1, 1993-June 3, 2013 evaluating DMTs in adult MS patients using keywords for CIS, RRMS, and SPMS combined with keywords for relapse and recurrence. Eligible studies were required to report outcomes of relapse and T2 lesion changes or disease progression in CIS, RRMS, or SPMS patients receiving DMTs and have a follow-up duration of at least 22 months. Ultimately, 40 studies satisfied these criteria for inclusion. Regression analyses were conducted on RCTs to relate differences between the effect of treatments on relapse rates and on active T2 lesions to differences between the effects of treatments on disease progression (as measured by EDSS). Results Regression analysis determined there is a substantive clinically and statistically significant association between concurrent treatment effects in relapse rate and EDSS; p?

  16. Regression analysis in quantum language

    OpenAIRE

    Ishikawa, Shiro

    2014-01-01

    Although regression analysis has a long history, we consider that it has always been a source of confusion. For example, the fundamental terms in regression analysis (e.g., "regression", "least-squares method", "explanatory variable", "response variable", etc.) seem to be historically conventional; that is, these words do not express the essence of regression analysis. Recently, we proposed quantum language (or, classical and quantum measurement theory), which is characterize...

  17. Gene-based multiple regression association testing for combined examination of common and low frequency variants in quantitative trait analysis

    OpenAIRE

    Yoo, Yun Joo; Sun, Lei; Bull, Shelley B.

    2013-01-01

    Multi-marker methods for genetic association analysis can be performed for common and low frequency SNPs to improve power. Regression models are an intuitive way to formulate multi-marker tests. In previous studies we evaluated regression-based multi-marker tests for common SNPs, and through identification of bins consisting of correlated SNPs, developed a multi-bin linear combination (MLC) test that is a compromise between a 1 df linear combination test and a multi-df global test. Bins of SN...

  18. Spectroscopic determination of leaf biochemistry using band-depth analysis of absorption features and stepwise multiple linear regression

    Science.gov (United States)

    Kokaly, R.F.; Clark, R.N.

    1999-01-01

    We develop a new method for estimating the biochemistry of plant material using spectroscopy. Normalized band depths calculated from the continuum-removed reflectance spectra of dried and ground leaves were used to estimate their concentrations of nitrogen, lignin, and cellulose. Stepwise multiple linear regression was used to select wavelengths in the broad absorption features centered at 1.73 μm, 2.10 μm, and 2.30 μm that were highly correlated with the chemistry of samples from eastern U.S. forests. Band depths of absorption features at these wavelengths were found to also be highly correlated with the chemistry of four other sites. A subset of data from the eastern U.S. forest sites was used to derive linear equations that were applied to the remaining data to successfully estimate their nitrogen, lignin, and cellulose concentrations. Correlations were highest for nitrogen (R² from 0.75 to 0.94). The consistent results indicate the possibility of establishing a single equation capable of estimating the chemical concentrations in a wide variety of species from the reflectance spectra of dried leaves. The extension of this method to remote sensing was investigated. The effects of leaf water content, sensor signal-to-noise and bandpass, atmospheric effects, and background soil exposure were examined. Leaf water was found to be the greatest challenge to extending this empirical method to the analysis of fresh whole leaves and complete vegetation canopies. The influence of leaf water on reflectance spectra must be removed to within 10%. Other effects were reduced by continuum removal and normalization of band depths. 
If the effects of leaf water can be compensated for, it might be possible to extend this method to remote sensing data acquired by imaging spectrometers to give estimates of nitrogen, lignin, and cellulose concentrations over large areas for use in ecosystem studies.
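
    The continuum-removal and band-depth normalization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: a straight-line continuum is drawn between two user-chosen shoulder wavelengths of a single absorption feature (a convex-hull continuum is another common choice), and depths are normalized by the depth at the band center, taken here as the deepest point. The function name and arguments are hypothetical, not the authors' code.

```python
import numpy as np

def normalized_band_depth(wavelengths, reflectance, left, right):
    """Continuum-removed band depth of one feature, normalized by its maximum depth.

    Assumes wavelengths are sorted and that a straight-line continuum between the
    shoulder wavelengths `left` and `right` is adequate for this feature.
    """
    wl = np.asarray(wavelengths, dtype=float)
    r = np.asarray(reflectance, dtype=float)
    mask = (wl >= left) & (wl <= right)
    wl_f, r_f = wl[mask], r[mask]
    # Straight-line continuum through the reflectance at the two shoulders.
    continuum = np.interp(wl_f, [wl_f[0], wl_f[-1]], [r_f[0], r_f[-1]])
    # Band depth = 1 - continuum-removed reflectance.
    depth = 1.0 - r_f / continuum
    # Normalize by the depth at the band center (the deepest point).
    return wl_f, depth / depth.max()

# Synthetic spectrum: a Gaussian absorption at 2.10 um on a sloping background.
wl = np.linspace(2.0, 2.2, 201)
base = 0.5 + 0.1 * (wl - 2.0)
refl = base * (1 - 0.2 * np.exp(-((wl - 2.1) / 0.02) ** 2))
w, nbd = normalized_band_depth(wl, refl, 2.0, 2.2)
```

    The normalized depths at the selected wavelengths would then serve as the predictors in the stepwise regression.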

  19. Multiple Outliers Detection Procedures in Linear Regression

    OpenAIRE

    Robiah Adnan; Mohd Nor Mohamad; Halim Setan

    2003-01-01

    This paper describes a procedure for identifying multiple outliers in linear regression. The procedure uses a robust fit, least trimmed squares (LTS), together with the single linkage clustering method to obtain the potential outliers. Multiple-case diagnostics are then used to identify the outliers among these potential outliers. The performance of this procedure is also compared to Sebert's method. Monte Carlo simulations are used in determining which procedure performed best in all o...
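
    The LTS fit that supplies the potential outliers can be sketched with random elemental starts followed by concentration ("C-") steps, which is one standard way of computing LTS. This is a minimal sketch, not the authors' implementation: the coverage h = (n + p + 1) // 2, the number of starts, and the 2.5 cutoff on scaled residuals are assumptions, and the single-linkage clustering and multiple-case diagnostic stages are not shown.

```python
import numpy as np

def lts_fit(X, y, h=None, n_starts=50, n_csteps=20, seed=0):
    """Least trimmed squares: minimize the sum of the h smallest squared residuals.

    X should already contain an intercept column if one is wanted.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2  # assumed default coverage
    rng = np.random.default_rng(seed)
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        subset = rng.choice(n, size=p, replace=False)  # elemental starting subset
        beta = np.linalg.lstsq(X[subset], y[subset], rcond=None)[0]
        for _ in range(n_csteps):
            # C-step: refit on the h cases with the smallest squared residuals.
            subset = np.argsort((y - X @ beta) ** 2)[:h]
            beta = np.linalg.lstsq(X[subset], y[subset], rcond=None)[0]
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta, h

def potential_outliers(X, y, beta, h, cutoff=2.5):
    """Flag cases whose LTS residual exceeds `cutoff` times a trimmed scale estimate."""
    r = y - X @ beta
    p = X.shape[1]
    s = np.sqrt(np.sort(r ** 2)[:h].sum() / (h - p))
    return np.flatnonzero(np.abs(r) > cutoff * s)
```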

  20. On directional multiple-output quantile regression.

    Czech Academy of Sciences Publication Activity Database

    Paindaveine, D.; Šiman, Miroslav

    2011-01-01

    Vol. 102, No. 2 (2011), pp. 193-212. ISSN 0047-259X R&D Projects: GA MŠk(CZ) 1M06047 Other grants: Commission EC(BE) Fonds National de la Recherche Scientifique Institutional research plan: CEZ:AV0Z10750506 Keywords: multivariate quantile * quantile regression * multiple-output regression * halfspace depth * portfolio optimization * value-at-risk Subject RIV: BA - General Mathematics Impact factor: 0.879, year: 2011 http://library.utia.cas.cz/separaty/2011/SI/siman-0364128.pdf

  1. Flexible Model Selection Criterion for Multiple Regression

    OpenAIRE

    Kunio Takezawa

    2012-01-01

    Predictors of a multiple linear regression equation selected by GCV (Generalized Cross Validation) may include undesirable predictors that have no linear functional relationship with the target variable and are chosen only by accident. This is because GCV estimates prediction error but does not control the probability of selecting predictors that are irrelevant to the target variable. To take this possibility into account, a new statistic “GCVf
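
    For an ordinary least-squares fit with p coefficients, the hat matrix has trace p, so GCV reduces to n·RSS/(n − p)². A minimal all-subsets selector using that formula is sketched below. The function names are hypothetical, an intercept is always included, and this shows plain GCV only, not the proposed GCVf statistic.

```python
import itertools
import numpy as np

def gcv_score(X, y):
    """GCV for an OLS fit: (RSS/n) / (1 - p/n)^2 = n * RSS / (n - p)^2."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * rss / (n - p) ** 2

def best_subset_by_gcv(X, y):
    """Evaluate every subset of columns (plus an intercept); return the GCV minimizer."""
    n, k = X.shape
    ones = np.ones((n, 1))
    best = (np.inf, ())
    for size in range(k + 1):
        for cols in itertools.combinations(range(k), size):
            Xs = np.hstack([ones, X[:, list(cols)]])
            best = min(best, (gcv_score(Xs, y), cols))
    return best[1], best[0]
```

    Because the selection only minimizes estimated prediction error, a weak irrelevant column can still enter the chosen subset, which is the behavior the abstract describes.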

  2. A general framework for multiple linear regression

    OpenAIRE

    Blanco, Víctor; Puerto, Justo; Salmerón, Román

    2015-01-01

    This paper presents a family of new methods for estimating the coefficients in multiple linear regression models. The novelty consists in considering distance-based residuals instead of the usual vertical distances, and in using different forms of aggregation criteria for those residuals. The most popular methods found in the specialized literature can be cast within this family as particular choices of the residuals and the aggregation criteria. Mathematical programming ...

  3. A Dirty Model for Multiple Sparse Regression

    CERN Document Server

    Jalali, Ali; Sanghavi, Sujay

    2011-01-01

    Sparse linear regression, that is, finding an unknown vector from linear measurements, is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where several related vectors with partially shared support sets have to be recovered. A natural question in this setting is whether one can use the sharing to further decrease the overall number of samples required. A line of recent research has studied the use of ℓ1/ℓq norm block-regularizations with q > 1 for such problems; however, depending on the level of sharing, these could actually perform worse in sample complexity than solving each problem separately and ignoring the sharing. We present a new method for multiple sparse linear regression that can leverage support and parameter overlap when it exists, but does not pay a penalty when it does not. The idea is very simple: we decompose the parameters into two components and regularize these differently. We show both theore...
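
    The single-task LASSO that this work builds on can be sketched with iterative soft-thresholding (ISTA), a proximal-gradient method. This is only a baseline illustration with hypothetical function names; the dirty model itself, which splits the coefficients into two differently penalized components, is not implemented here.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - step * grad, step * lam)
    return b
```

    With orthonormal columns the LASSO solution is simply the soft-thresholded X'y, which gives a convenient sanity check.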

  4. MODELLING THE EFFECT OF THE TREATMENT MEDIUM PH ON THE HEAT INACTIVATION OF ENTEROCOCCUS FAECIUM USING MULTIPLE REGRESSION ANALYSIS

    Directory of Open Access Journals (Sweden)

    S. CONDON

    2014-06-01

    Full Text Available The thermal inactivation of Enterococcus faecium under isothermal conditions in tryptic soy broth of different pH (4.0, 5.5 and 7.4) was studied. The bacterial cells were more sensitive at higher temperatures and in media of low pH. Decimal reduction times at 71 °C were 2.56, 0.39 and 0.03 min at pH 7.4, 5.5 and 4.0, respectively. At all temperatures and pH values assayed, the survival curves obtained were linear. A mathematical model based on first-order kinetics accurately described these survival curves. The relationship between log DT values and temperature was also linear. A mean z-value of 5 °C was established. A multiple linear regression model using four predictor variables (pH, T, pH² and T²) related the log of the DT value to pH and treatment temperature. The developed tertiary model satisfactorily predicted the heat inactivation of Enterococcus faecium under the treatment conditions investigated.
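
    Under first-order kinetics, D is the negative reciprocal of the slope of log10(survivors) against time, and z is the negative reciprocal of the slope of log10(D) against temperature. The helpers below are a minimal sketch of those textbook definitions. Their names are hypothetical, and the worked number assumes that the mean z of 5 °C from the abstract applies at pH 7.4.

```python
import numpy as np

def d_value(times_min, log10_survivors):
    """Decimal reduction time: D = -1 / slope of log10(N) versus time."""
    slope, _ = np.polyfit(times_min, log10_survivors, 1)
    return -1.0 / slope

def z_value(temps_c, d_values):
    """Temperature rise for a tenfold drop in D: z = -1 / slope of log10(D) versus T."""
    slope, _ = np.polyfit(temps_c, np.log10(d_values), 1)
    return -1.0 / slope

def d_at(temp_c, d_ref, t_ref, z):
    """Extrapolate D from a reference temperature using z."""
    return d_ref * 10 ** ((t_ref - temp_c) / z)

# D = 2.56 min at 71 C and pH 7.4 with z = 5 C gives about 0.256 min at 76 C.
predicted = d_at(76, 2.56, 71, 5)
```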

  5. Estimating the input function non-invasively for FDG-PET quantification with multiple linear regression analysis: simulation and verification with in vivo data

    International Nuclear Information System (INIS)

    A novel statistical method, namely Regression-Estimated Input Function (REIF), is proposed in this study for the non-invasive estimation of the input function for fluorine-18 2-fluoro-2-deoxy-d-glucose positron emission tomography (FDG-PET) quantitative analysis. We collected data from 44 patients who had undergone a blood sampling procedure during their FDG-PET scans. First, we generated tissue time-activity curves of the grey matter and the whole brain with a segmentation technique for every subject. Summations over different intervals of these two curves were used as a feature vector, which also included the net injection dose. Multiple linear regression analysis was then applied to find the correlation between the input function and the feature vector. After a simulation study with in vivo data, the data of 29 patients were used to calculate the regression coefficients, which were then used to estimate the input functions of the other 15 subjects. Comparing the estimated input functions with the corresponding real input functions, the averaged error percentages of the area under the curve and the cerebral metabolic rate of glucose (CMRGlc) were 12.13 ± 8.85% and 16.60 ± 9.61%, respectively. Regression analysis of the CMRGlc values derived from the real and estimated input functions revealed a high correlation (r = 0.91). No significant difference was found between the real CMRGlc and that derived from our regression-estimated input function (Student's t test, P > 0.05). The proposed REIF method performed well for input function and CMRGlc estimation, and represents a reliable replacement for the blood sampling procedures in FDG-PET quantification. (orig.)

  6. A comparison on parameter-estimation methods in multiple regression analysis with existence of multicollinearity among independent variables

    Directory of Open Access Journals (Sweden)

    Hukharnsusatrue, A.

    2005-11-01

    Full Text Available The objective of this research is to compare methods for estimating multiple regression coefficients when multicollinearity exists among the independent variables. The estimation methods are Ordinary Least Squares (OLS), Restricted Least Squares (RLS), Restricted Ridge Regression (RRR) and Restricted Liu (RL), evaluated both when the restrictions are true and when they are not. The study used Monte Carlo simulation, with the experiment repeated 1,000 times under each situation. The results are as follows. CASE 1: The restrictions are true. In all cases, the RRR and RL methods have a smaller Average Mean Square Error (AMSE) than the OLS and RLS methods, respectively. The RRR method provides the smallest AMSE when the level of correlation is high, and it also provides the smallest AMSE for all levels of correlation and all sample sizes when the standard deviation is equal to 5. However, the RL method provides the smallest AMSE when the level of correlation is low or medium, except that with a standard deviation of 3 and small sample sizes the RRR method provides the smallest AMSE. The AMSE varies directly with, from most to least influential, the level of correlation, the standard deviation and the number of independent variables, and inversely with sample size. CASE 2: The restrictions are not true. The RRR method generally provides the smallest AMSE. An exception occurs with a standard deviation of 1 and a 5% error of restrictions: there the OLS method provides the smallest AMSE when the level of correlation is low or medium and the sample size is large, whereas for small sample sizes the RL method provides the smallest AMSE. In addition, as the error of restrictions increases, the OLS method provides the smallest AMSE for all levels of correlation and all sample sizes, except when the level of correlation is high and the sample sizes are small. 
Moreover, in the cases where the OLS method provides the smallest AMSE, the RLS method mostly has a smaller AMSE than the RRR and RL methods when the level of correlation is low or medium and the sample sizes are large. The AMSE varies directly with, from most to least influential, the error of restrictions, the level of correlation, the standard deviation and the number of independent variables, and inversely with sample size, except that the error of restrictions does not affect the AMSE of the OLS method.
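
    A Monte Carlo comparison of this kind can be sketched as follows. This is an illustrative setup, not the study's design. It uses a fixed design with three equicorrelated predictors and no intercept, a true restriction β1 = β2, AMSE taken as the average squared Euclidean error of the coefficient vector, and ordinary (unrestricted) ridge regression with a fixed k = 1 substituted for the restricted ridge and Liu estimators. All parameter values are assumptions.

```python
import numpy as np

def simulate_amse(n=20, rho=0.99, sigma=5.0, k=1.0, reps=500, seed=0):
    """Average squared coefficient error of OLS, RLS and ridge under multicollinearity."""
    rng = np.random.default_rng(seed)
    p = 3
    beta = np.ones(p)
    cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)  # equicorrelated predictors
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)  # fixed design
    XtX = X.T @ X
    XtX_inv = np.linalg.inv(XtX)
    # True linear restriction R beta = r, here beta1 - beta2 = 0.
    R = np.array([[1.0, -1.0, 0.0]])
    r = np.zeros(1)
    A = XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T)
    sse = {"OLS": 0.0, "RLS": 0.0, "ridge": 0.0}
    for _ in range(reps):
        y = X @ beta + rng.normal(scale=sigma, size=n)
        b_ols = XtX_inv @ X.T @ y
        b_rls = b_ols + A @ (r - R @ b_ols)  # restricted least squares
        b_ridge = np.linalg.solve(XtX + k * np.eye(p), X.T @ y)
        for name, b in (("OLS", b_ols), ("RLS", b_rls), ("ridge", b_ridge)):
            sse[name] += np.sum((b - beta) ** 2)
    return {name: total / reps for name, total in sse.items()}

results = simulate_amse()
```

    With highly correlated predictors, the OLS variance along the near-singular directions dominates, so both the true restriction and ridge shrinkage reduce the AMSE.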

  7. A comparison between Joint Regression Analysis and the Additive Main and Multiplicative Interaction model: the robustness with increasing amounts of missing data

    Scientific Electronic Library Online (English)

    Paulo Canas, Rodrigues; Dulce Gamito Santinhos, Pereira; João Tiago, Mexia.

    2011-12-01

    Full Text Available This paper brings together the main properties of joint regression analysis (JRA), a model based on Finlay-Wilkinson regression for analysing multi-environment trials, and of the additive main effects and multiplicative interaction (AMMI) model. The study compares JRA and AMMI with particular focus on robustness to increasing amounts of randomly selected missing data. The application uses a data set from a durum wheat (Triticum turgidum L., Durum Group) breeding program conducted in Portugal. The two models identify similar dominant cultivars (JRA) and mega-environment winners (AMMI) for the same environments. However, JRA gave more stable results as the incidence of missing values increased.
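
    The Finlay-Wilkinson regression underlying JRA regresses each genotype's yields on an environmental index. The sketch below uses the classic index (environment mean minus grand mean). JRA itself may estimate the environmental indices differently, so treat this as an illustration of the basic regression only. The function name is hypothetical.

```python
import numpy as np

def finlay_wilkinson(yields):
    """yields: array (genotypes, environments).

    Returns the environmental index, plus per-genotype intercepts and slopes
    from regressing each genotype's yields on that index.
    """
    Y = np.asarray(yields, float)
    index = Y.mean(axis=0) - Y.mean()  # environment mean minus grand mean
    fits = [np.polyfit(index, row, 1) for row in Y]
    slopes = np.array([f[0] for f in fits])
    intercepts = np.array([f[1] for f in fits])
    return index, intercepts, slopes
```

    A slope above 1 marks a genotype that responds more than average to better environments. Because the index is built from the same table, the slopes average to exactly 1.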

  8. Multiple Linear Regression Models in Outlier Detection

    OpenAIRE

    S.M.A.Khaleelur Rahman; M. Mohamed Sathik; K. Senthamarai Kannan

    2012-01-01

    Identifying anomalous values in real-world databases is important both for improving the quality of the original data and for reducing the impact of anomalous values in the process of knowledge discovery in databases. Such anomalous values can give the data analyst useful information for discovering patterns. Through isolation, these data may be separated and analyzed. The analysis of outliers and influential points is an important step in regression diagnostics. In this paper, our ai...
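
    The standard single-case regression diagnostics mentioned above can be computed from the hat matrix: leverage, internally studentized residuals, and Cook's distance. This is a generic sketch of those textbook quantities, not the method of this paper. The function name is hypothetical.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Leverage, internally studentized residuals and Cook's distance for OLS.

    X must include an intercept column if one is wanted.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n, p = X.shape
    H = X @ np.linalg.pinv(X.T @ X) @ X.T  # hat matrix
    h = np.diag(H)  # leverages
    e = y - H @ y  # OLS residuals
    s2 = e @ e / (n - p)  # residual variance estimate
    r = e / np.sqrt(s2 * (1 - h))  # internally studentized residuals
    cooks = r ** 2 * h / (p * (1 - h))  # Cook's distance
    return h, r, cooks
```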

  9. Survival analysis and regression models.

    Science.gov (United States)

    George, Brandon; Seals, Samantha; Aban, Inmaculada

    2014-08-01

    Time-to-event outcomes are common in medical research because they offer more information than simply whether or not an event occurred. To handle these outcomes, as well as censored observations where the event was not observed during follow-up, survival analysis methods should be used. Kaplan-Meier estimation can be used to create graphs of the observed survival curves, while the log-rank test can be used to compare curves from different groups. If it is desired to test continuous predictors or multiple covariates at once, survival regression models such as the Cox model or the accelerated failure time (AFT) model should be used. The choice of model should depend on whether the model's assumption (proportional hazards for the Cox model, a parametric distribution of the event times for the AFT model) is met. The goal of this paper is to review basic concepts of survival analysis. Discussions relating the Cox model and the AFT model are provided. The use and interpretation of the survival models are illustrated using an artificially simulated dataset. PMID:24810431
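
    The Kaplan-Meier estimator mentioned above multiplies, over each distinct event time, the conditional probability of surviving past that time. This is a minimal sketch with a hypothetical function name. It follows the usual convention that subjects censored at an event time are still counted as at risk at that time.

```python
import numpy as np

def kaplan_meier(times, events):
    """Return distinct event times and the KM survival estimate just after each.

    events: 1 = event observed, 0 = right-censored.
    """
    t = np.asarray(times, float)
    d = np.asarray(events, int)
    surv, out_t, out_s = 1.0, [], []
    for u in np.unique(t[d == 1]):
        at_risk = np.sum(t >= u)  # still under observation just before u
        deaths = np.sum((t == u) & (d == 1))
        surv *= 1.0 - deaths / at_risk
        out_t.append(float(u))
        out_s.append(surv)
    return out_t, out_s
```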

  10. Dimension reduction of the explanatory variables in multiple linear regression

    OpenAIRE

    Filzmoser, P.; Croux, Christophe

    2003-01-01

    Abstract: In classical multiple linear regression analysis, problems occur if the regressors are multicollinear or if the number of regressors is larger than the number of observations. In this note a new method is introduced that constructs orthogonal predictor variables so as to have maximal correlation with the dependent variable. The predictor variables are linear combinations of the original regressors. This method allows a major reduction of the number of predictors ...

  11. Shrinkage Estimation and Selection for Multiple Functional Regression

    OpenAIRE

    Lian, Heng

    2011-01-01

    Functional linear regression is a useful extension of simple linear regression and has been investigated by many researchers. However, functional variable selection when multiple functional observations exist, which is the functional counterpart of variable selection in multiple linear regression, has seldom been studied. Here we propose a method using the group smoothly clipped absolute deviation (gSCAD) penalty, which can perform regression estimation and variable selection simulta...
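
    The SCAD penalty's behavior is easiest to see through its thresholding rule for an orthonormal design (Fan and Li, with the conventional a = 3.7). Small values are soft-thresholded, large ones are left unshrunk, and the rule interpolates linearly in between. The group version below applies the rule to a group's norm. This is a minimal sketch of the penalty's effect only, not the paper's functional gSCAD estimator. The function names are hypothetical.

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding of z (orthonormal-design solution)."""
    z = np.asarray(z, float)
    absz = np.abs(z)
    soft = np.sign(z) * np.maximum(absz - lam, 0.0)  # |z| <= 2*lam
    mid = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)  # 2*lam < |z| <= a*lam
    return np.where(absz <= 2 * lam, soft, np.where(absz <= a * lam, mid, z))

def group_scad_threshold(z, lam, a=3.7):
    """Apply SCAD thresholding to the Euclidean norm of a coefficient group."""
    z = np.asarray(z, float)
    norm = np.linalg.norm(z)
    if norm == 0.0:
        return np.zeros_like(z)
    return float(scad_threshold(norm, lam, a)) / norm * z
```

    A whole group set to zero corresponds to dropping that functional predictor, which is how group penalties perform selection.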

  12. Multiple Linear Regression Analysis Indicates Association of P-Glycoprotein Substrate or Inhibitor Character with Bitterness Intensity, Measured with a Sensor.

    Science.gov (United States)

    Yano, Kentaro; Mita, Suzune; Morimoto, Kaori; Haraguchi, Tamami; Arakawa, Hiroshi; Yoshida, Miyako; Yamashita, Fumiyoshi; Uchida, Takahiro; Ogihara, Takuo

    2015-09-01

    P-glycoprotein (P-gp) regulates the absorption of many drugs in the gastrointestinal tract and their accumulation in tumor tissues, but the basis of substrate recognition by P-gp remains unclear. Bitter-tasting phenylthiocarbamide, which stimulates taste receptor 2 member 38 (T2R38), increases P-gp activity and is a substrate of P-gp. This led us to hypothesize that bitterness intensity might be a predictor of P-gp inhibitor/substrate status. Here, we measured the bitterness intensity of a panel of P-gp substrates and nonsubstrates with various taste sensors, and used multiple linear regression analysis to examine the relationship between P-gp inhibitor/substrate status and various physical properties, including the intensity of bitter taste measured with the taste sensor. We calculated the first principal component score (PC1) as the representative value of bitterness, because the outputs of all taste sensors were significantly correlated. The P-gp substrates showed remarkably greater mean bitterness intensity than non-P-gp substrates. We found that the Km values of P-gp substrates were correlated with molecular weight, log P, and PC1 value, and the coefficient of determination (R²) of the linear regression equation was 0.63. This relationship might be useful as an aid to predicting P-gp substrate status at an early stage of drug discovery. © 2014 Wiley Periodicals, Inc. and the American Pharmacists Association J Pharm Sci 104:2789-2794, 2015. PMID:25545612

  13. Estimation of toxicity of ionic liquids in Leukemia Rat Cell Line and Acetylcholinesterase enzyme by principal component analysis, neural networks and multiple linear regressions.

    Science.gov (United States)

    Torrecilla, José S; García, Julián; Rojo, Ester; Rodríguez, Francisco

    2009-05-15

    Multiple linear regression (MLR), radial basis network (RB), and multilayer perceptron (MLP) neural network (NN) models have been explored for estimating the toxicity of ammonium, imidazolium, morpholinium, phosphonium, piperidinium, pyridinium, pyrrolidinium and quinolinium ionic liquid salts in the Leukemia Rat Cell Line (IPC-81) and Acetylcholinesterase (AChE), using only their empirical formulas (elemental composition) and molecular weights. The toxicity values were estimated as decadic logarithms of the half maximal effective concentration (EC50) in µM (log10 EC50). The models' performances were analyzed using statistical parameters, analysis of residuals, and central tendency and statistical dispersion tests. The MLP model estimates log10 EC50 in IPC-81 and AChE with mean prediction errors of less than 2.2% and 3.8%, respectively. PMID:18805639

  14. Synthesis analysis of regression models with a continuous outcome

    OpenAIRE

    Zhou, Xiao-Hua; Hu, Nan; Hu, Guizhou; Root, Martin

    2009-01-01

    To estimate a multivariate regression model from multiple individual studies, it can be challenging to obtain results if the individual studies provide only univariate or incomplete multivariate regression information. Samsa et al. (J. Biomed. Biotechnol. 2005; 2:113–123) proposed a simple method to combine coefficients from univariate linear regression models into a multivariate linear regression model, a method known as synthesis analysis. However, the validity of this method...
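
    The algebra behind combining univariate coefficients can be illustrated as follows. Each univariate slope is b_j = cov(x_j, y)/var(x_j), so cov(x, y) = diag(var x)·b, and the multiple-regression slopes are Cov(x)⁻¹ cov(x, y). This sketch assumes the predictors' covariance matrix is available, which is the kind of extra information such methods need. It is not necessarily Samsa et al.'s exact procedure, and the function name is hypothetical.

```python
import numpy as np

def multivariate_from_univariate(univ_slopes, cov_x):
    """Recover multiple-regression slopes from univariate slopes and Cov(x)."""
    b = np.asarray(univ_slopes, float)
    S = np.asarray(cov_x, float)
    cov_xy = np.diag(S) * b  # cov(x_j, y) = var(x_j) * b_j
    return np.linalg.solve(S, cov_xy)

# Check on simulated data against a direct multiple regression.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
X[:, 1] += 0.7 * X[:, 0]  # correlated predictors
y = 1 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=100)
S = np.cov(X, rowvar=False)
b_univ = [np.cov(X[:, j], y)[0, 1] / np.var(X[:, j], ddof=1) for j in range(3)]
beta = multivariate_from_univariate(b_univ, S)
```

    Computed from the same sample, the recovered slopes coincide with those of a direct multiple regression with an intercept.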

  15. Non-destructive evaluation of chlorophyll content in quinoa and amaranth leaves by simple and multiple regression analysis of RGB image components.

    Science.gov (United States)

    Riccardi, M; Mele, G; Pulvento, C; Lavini, A; d'Andria, R; Jacobsen, S-E

    2014-06-01

    Leaf chlorophyll content provides valuable information about the physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is expensive, laborious, and time consuming. Over the years, rapid and non-destructive alternative methods have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimating chlorophyll content in quinoa and amaranth leaves based on RGB component analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll provided by standard laboratory procedure. Single and multiple regression models using RGB color components as independent variables have been tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method. Sensitivity of the best regression models for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quicker, more effective, and lower in cost than SPAD. The proposed RGB models provided better correlation (highest R²) and prediction (lowest RMSEP) of the true value of foliar chlorophyll content and had less noise over the whole range of chlorophyll studied compared with SPAD and other leaf image processing based models when applied to quinoa and amaranth. PMID:24442792

  16. Non-destructive evaluation of chlorophyll content in quinoa and amaranth leaves by simple and multiple regression analysis of RGB image components

    DEFF Research Database (Denmark)

    Riccardi, M.; Mele, G.

    2014-01-01

    Leaf chlorophyll content provides valuable information about the physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is expensive, laborious, and time consuming. Over the years, rapid and non-destructive alternative methods have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimating chlorophyll content in quinoa and amaranth leaves based on RGB component analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll provided by standard laboratory procedure. Single and multiple regression models using RGB color components as independent variables have been tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method. Sensitivity of the best regression models for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quicker, more effective, and lower in cost than SPAD. The proposed RGB models provided better correlation (highest R²) and prediction (lowest RMSEP) of the true value of foliar chlorophyll content and had less noise over the whole range of chlorophyll studied compared with SPAD and other leaf image processing based models when applied to quinoa and amaranth.

  17. Fuzzy multiple linear regression: A computational approach

    Science.gov (United States)

    Juang, C. H.; Huang, X. H.; Fleming, J. W.

    1992-01-01

    This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.

  18. Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression

    Science.gov (United States)

    Beckstead, Jason W.

    2012-01-01

    The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…
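
    Classical suppression can be shown with a small worked example. Standardized coefficients solve R_xx β = r_xy, and R² = β·r_xy. In the hypothetical correlations below, x2 is uncorrelated with the criterion, yet it receives a nonzero (negative) weight and raises R² from 0.25 (x1 alone) to 1/3.

```python
import numpy as np

def standardized_betas(r_xx, r_xy):
    """Standardized regression weights and R^2 from correlation matrices."""
    r_xx = np.asarray(r_xx, float)
    r_xy = np.asarray(r_xy, float)
    beta = np.linalg.solve(r_xx, r_xy)
    r2 = float(beta @ r_xy)
    return beta, r2

# Hypothetical correlations: r(x1, y) = 0.5, r(x2, y) = 0, r(x1, x2) = 0.5.
beta, r2 = standardized_betas([[1.0, 0.5], [0.5, 1.0]], [0.5, 0.0])
# beta is about [0.667, -0.333] and r2 is about 0.333.
```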

  19. Regression analysis with categorized regression calibrated exposure: some interesting findings

    Directory of Open Access Journals (Sweden)

    Hjartåker Anette

    2006-07-01

    Full Text Available Abstract Background Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on a quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC). Results In cases where extra information is available through replicated measurements rather than validation data, regression calibration does not maintain important qualities of the true exposure distribution, so estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. However, regression calibration is vastly superior to the naive method when the medians of each category are used in the analysis. 
Conclusion Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a percentile scale. Relating back to the original scale of the exposure solves the problem. The conclusion applies to all regression models.

  20. Gaussian process regression analysis for functional data

    CERN Document Server

    Shi, Jian Qing

    2011-01-01

    Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables. Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime
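
    The basic Gaussian process regression computation can be sketched for a scalar input: the posterior mean and variance under a zero-mean prior with a squared-exponential (RBF) kernel and Gaussian noise, computed via a Cholesky factorization. The scalar-input setting, kernel choice and hyperparameter values are all assumptions for illustration. The book's functional-response models go well beyond this.

```python
import numpy as np

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / length ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2, length=1.0, var=1.0):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = rbf(x_train, x_train, length, var) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train, length, var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var_post = var - np.sum(v ** 2, axis=0)  # prior variance minus explained part
    return mean, var_post
```

    Far from the data, the posterior reverts to the prior (mean 0, variance `var`). Near the data, it tracks the observations with small variance.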

  1. A multiple covariance approach to PLS regression with several predictor groups: Structural Equation Exploratory Regression

    OpenAIRE

    Bry, Xavier; Verron, Thomas; Cazes, Pierre

    2008-01-01

    A variable group Y is assumed to depend upon R thematic variable groups X1, ..., XR. We assume that components in Y depend linearly upon components in the Xr's. In this work, we propose a multiple covariance criterion that extends that of PLS regression to this multiple-predictor-group situation. On this criterion, we build a PLS-type exploratory method, Structural Equation Exploratory Regression (SEER), that allows one to simultaneously perform dimension reduction in gr...

  2. Spatial regression analysis on 32 years total column ozone data

    OpenAIRE

    J. S. Knibbe; R. J. van der A; de Laat, A. T. J.

    2014-01-01

    Multiple regression analyses have been performed on 32 years of total ozone column data that were spatially gridded at 1° × 1.5° resolution. The total ozone data consist of the MSR (Multi Sensor Reanalysis; 1979–2008) and two years of assimilated SCIAMACHY ozone data (2009–2010). The two-dimensionality of this dataset allows us to perform the regressions locally and investigate spatial patterns of regression coefficients and their explanatory pow...

  3. A Spreadsheet Model for Teaching Regression Analysis.

    Science.gov (United States)

    Wood, William C.; O'Hare, Sharon L.

    1992-01-01

    Presents a spreadsheet model that is useful in introducing students to regression analysis and the computation of regression coefficients. Includes spreadsheet layouts and formulas so that the spreadsheet can be implemented. (Author)

  4. Regression and regression analysis time series prediction modeling on climate data of Quetta, Pakistan

    International Nuclear Information System (INIS)

    Various statistical techniques were used on five years of data (1998-2002) on average humidity, rainfall, and maximum and minimum temperatures. Relationships based on regression analysis time series (RATS) were developed to determine the overall trend of these climate parameters, on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of the goodness of fit of our polynomial regression analysis time series (PRATS). Multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) correlations were also developed to decipher the interdependence of weather parameters. Spearman's rank correlation and the Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our polynomial regression (PR) fit. The Breusch-Pagan test was applied to MLR and MLRATS, and indicated homoscedasticity. We also employed Bartlett's test for homogeneity of variances on the five-year rainfall and humidity data, which showed that the variances in the rainfall data were not homogeneous, whereas those in the humidity data were. Our results on regression and regression analysis time series show the best fit for prediction modeling on the climatic data of Quetta, Pakistan. (author)
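
    The Breusch-Pagan check for heteroscedasticity can be sketched in its studentized (Koenker) form: regress the squared OLS residuals on the regressors and take LM = n·R². Under homoscedasticity, LM is approximately chi-square with degrees of freedom equal to the number of regressors excluding the intercept. This is a minimal sketch with a hypothetical function name. The abstract does not say which variant of the test was used.

```python
import numpy as np

def breusch_pagan_lm(X, y):
    """Koenker's studentized Breusch-Pagan statistic, n * R^2.

    R^2 comes from regressing squared OLS residuals on X.
    X must include an intercept column.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2  # squared OLS residuals
    gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)
    fitted = X @ gamma
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2
```

    Compare the statistic with a chi-square critical value (about 3.84 at the 5% level for one slope regressor).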

  5. Application of Partial Least-Squares Regression Model on Temperature Analysis and Prediction of RCCD

    OpenAIRE

    Yuqing Zhao; Zhenxian Xing

    2013-01-01

    This study, based on the temperature monitoring data of the Jiangya RCCD, uses the principles and methods of partial least-squares regression to analyze and predict the temperature variation of the RCCD. By building a partial least-squares regression model, the multicollinearity of the independent variables is overcome, and an organic combination of multiple linear regression and canonical correlation analysis is achieved. Compared with the result of a general least-squares regression model, it is more ...
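
    A PLS1 regression (single response) can be sketched with the NIPALS algorithm. Each component's weight vector points in the direction of maximal covariance with the response, the resulting scores are mutually orthogonal, and X and y are deflated before the next component is extracted. This is a generic textbook sketch with hypothetical function names, not the study's model. With all components retained, it reproduces ordinary least squares.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS1 via NIPALS. Returns (coefficients, x_mean, y_mean)."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)  # weight: direction of maximal covariance with y
        t = Xk @ w  # score vector
        tt = t @ t
        p = Xk.T @ t / tt  # X loading
        c = yk @ t / tt  # y loading
        Xk = Xk - np.outer(t, p)  # deflate X
        yk = yk - c * t  # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients in X space
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (np.asarray(X, float) - x_mean) @ coef + y_mean
```

    Using fewer components than predictors is what stabilizes the fit when the predictors are collinear.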

  6. Vehicle Travel Time Prediction based on Multiple Kernel Regression

    Directory of Open Access Journals (Sweden)

    Wenjing Xu

    2014-07-01

    Full Text Available With the rapid development of transportation and the logistics economy, vehicle travel time prediction and planning have become important topics in logistics. Travel time prediction, which is indispensable for traffic guidance, has become a key issue for researchers in this field. At present, travel time prediction is mainly short-term, and the prediction methods include artificial neural networks, the Kalman filter and support vector regression (SVR). However, these algorithms still have shortcomings, such as high computational complexity and slow convergence. This paper exploits the learning ability of multiple kernel learning regression (MKLR) in nonlinear prediction to predict vehicle travel time for logistics planning. The vehicle travel time prediction method includes the following steps: (1) preprocessing historical data; (2) selecting an appropriate kernel function, training on the historical data and performing analysis; (3) predicting the vehicle travel time with the trained model. Experimental comparisons with other prediction methods show that the vehicle travel time prediction method proposed in this paper achieves higher accuracy, which illustrates its feasibility and effectiveness.
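
    The idea of regressing with a combination of kernels can be sketched with kernel ridge regression on a weighted sum of RBF kernels at several bandwidths. This is a simplification under stated assumptions. The kernel weights are fixed here, whereas genuine multiple kernel learning also learns them. The bandwidths, weights and regularization value are illustrative, and the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """exp(-gamma * ||a - b||^2) for all row pairs of A and B."""
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combined_kernel(A, B, gammas, weights):
    """Weighted sum of RBF kernels with different bandwidths."""
    return sum(w * rbf_kernel(A, B, g) for g, w in zip(gammas, weights))

def fit_kernel_ridge(X, y, gammas=(0.1, 1.0, 10.0), weights=(1 / 3, 1 / 3, 1 / 3), reg=1e-3):
    """Kernel ridge regression with a fixed kernel combination; returns a predictor."""
    K = combined_kernel(X, X, gammas, weights)
    alpha = np.linalg.solve(K + reg * np.eye(len(X)), y)
    return lambda X_new: combined_kernel(X_new, X, gammas, weights) @ alpha
```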

  7. Teasing out the effect of tutorials via multiple regression

    Science.gov (United States)

    Chasteen, Stephanie V.

    2012-02-01

    We transformed an upper-division physics course using a variety of elements, including homework help sessions, tutorials, clicker questions with peer instruction, and explicit learning goals. Overall, the course transformations improved student learning, as measured by our conceptual assessment. Since these transformations were multi-faceted, we would like to understand the impact of individual course elements. Attendance at tutorials and homework help sessions was optional, and occurred outside the class environment. In order to identify the impact of these optional out-of-class sessions, given self-selection effects in student attendance, we performed a multiple regression analysis. Even when background variables are taken into account, tutorial attendance is positively correlated with student conceptual understanding of the material - though not with performance on course exams. Other elements that increase student time-on-task, such as homework help sessions and lectures, do not achieve the same impacts.

  8. On relationship between regression models and interpretation of multiple regression coefficients

    OpenAIRE

    Varaksin, A. N.; Panov, V. G.

    2012-01-01

    In this paper, we consider the problem of interpreting linear regression coefficients in the case of correlated predictors. It is shown that, in general, there is no natural way of interpreting these coefficients similar to the case of a single predictor. Nevertheless, we suggest linear transformations of the predictors that reduce multiple regression to a simple one while retaining the coefficient of the variable of interest. The new variable can be treated as the part of the old var...
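    One well-known linear transformation with the property described above is the Frisch–Waugh–Lovell residualization: replace the predictor of interest by its residual after regressing it on the other predictors, and a simple regression on that residual returns the same coefficient as the full multiple regression. The abstract is truncated, so whether this matches the authors' exact construction is an assumption; the data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.standard_normal(n)
x1 = 0.8 * x2 + 0.6 * rng.standard_normal(n)   # x1 is correlated with x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + 0.3 * rng.standard_normal(n)

def ols(X, y):
    # Least-squares coefficients
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# Full multiple regression of y on [1, x1, x2]
b_full = ols(np.column_stack([ones, x1, x2]), y)

# Keep only the part of x1 not explained linearly by [1, x2]
Z = np.column_stack([ones, x2])
x1_resid = x1 - Z @ ols(Z, x1)

# Simple regression of y on the transformed predictor
b_simple = ols(np.column_stack([ones, x1_resid]), y)

print(b_full[1], b_simple[1])  # equal up to floating-point rounding
```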

  9. REPRESENTATIVE VARIABLES IN A MULTIPLE REGRESSION MODEL

    Directory of Open Access Journals (Sweden)

    Barbu Bogdan POPESCU

    2013-02-01

    Full Text Available Econometric models are presented for analyzing banking exclusion in the context of the economic crisis. Access to public goods and services is a „sine qua non” condition for an open and efficient society. In our opinion, making banking and payment services available to the entire population without discrimination should be the primary objective of public service policy.

  10. Local bilinear multiple-output quantile/depth regression

    OpenAIRE

    Hallin, Marc; Lu, Zudi; Paindaveine, Davy; Šiman, Miroslav

    2015-01-01

    A new quantile regression concept, based on a directional version of Koenker and Bassett's traditional single-output one, has been introduced in [Ann. Statist. (2010) 38 635-669] for multiple-output location/linear regression problems. The polyhedral contours provided by the empirical counterpart of that concept, however, cannot adapt to unknown nonlinear and/or heteroskedastic dependencies. This paper therefore introduces local constant and local linear (actually, bilinear)...
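    The directional concepts above build on the single-output Koenker–Bassett quantile, which is defined through the "check" loss. The sketch below illustrates only that single-output building block, not the multiple-output method: minimizing the summed check loss over a constant recovers a sample quantile. The sample is hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    # Koenker–Bassett check function: rho_tau(u) = u * (tau - 1{u < 0})
    return u * (tau - (u < 0).astype(float))

rng = np.random.default_rng(2)
y = rng.standard_normal(501)
tau = 0.25

# The summed check loss is piecewise linear in q with kinks at the data,
# so searching over the observations themselves finds a minimizer.
losses = [check_loss(y - q, tau).sum() for q in y]
q_hat = y[int(np.argmin(losses))]
print(q_hat, np.quantile(y, tau))  # the two values agree
```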

  11. Local Constant and Local Bilinear Multiple-Output Quantile Regression

    OpenAIRE

    Hallin, Marc; Lu, Zudi; Paindaveine, Davy; Siman, Miroslav

    2012-01-01

    A new quantile regression concept, based on a directional version of Koenker and Bassett’s traditional single-output one, has been introduced in [Hallin, Paindaveine and Šiman, Annals of Statistics 2010, 635-703] for multiple-output regression problems. The polyhedral contours provided by the empirical counterpart of that concept, however, cannot adapt to nonlinear and/or heteroskedastic dependencies. This paper therefore introduces local constant and local linear versions of those contours,...

  12. Computing multiple-output regression quantile regions from projection quantiles.

    Czech Academy of Sciences Publication Activity Database

    Paindaveine, D.; Šiman, Miroslav

    2012-01-01

    Vol. 27, No. 1 (2012), pp. 29-49. ISSN 0943-4062 R&D Projects: GA MŠk(CZ) 1M06047 Institutional research plan: CEZ:AV0Z10750506 Keywords: directional quantile * halfspace depth * multiple-output regression * parametric programming * quantile regression Subject RIV: BA - General Mathematics Impact factor: 0.482, year: 2012 http://library.utia.cas.cz/separaty/2012/SI/siman-0376414.pdf

  13. A Solution to Separation and Multicollinearity in Multiple Logistic Regression

    OpenAIRE

    Shen, Jianzhao; Gao, Sujuan

    2008-01-01

    In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models and it was ...
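    Firth's penalized likelihood can be sketched as Newton iterations on a modified score, U*(β) = Xᵀ(y − p + h(½ − p)), where h holds the diagonal of the weighted hat matrix. This is a minimal sketch of the generic Firth estimator for logistic regression, not the authors' screening-test procedure; the step cap is an assumption added for numerical stability, and the data are a hypothetical completely separated example for which the ordinary maximum likelihood estimate does not exist.

```python
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-8, max_step=5.0):
    """Firth-penalized logistic regression via Newton steps on the modified score."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (w[:, None] * X)          # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of W^1/2 X (X'WX)^-1 X' W^1/2, i.e. w_i * x_i' (X'WX)^-1 x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))  # Firth-modified score
        step = info_inv @ score
        largest = np.max(np.abs(step))
        if largest > max_step:                  # cap the step size for stability
            step *= max_step / largest
        beta += step
        if largest < tol:
            break
    return beta

# Completely separated toy data: every x < 0 has y = 0, every x > 0 has y = 1
x = np.array([-2.0, -1.0, 1.0, 2.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.0, 0.0, 1.0, 1.0])
beta = firth_logistic(X, y)
print(beta)  # finite estimates, although the ordinary MLE diverges here
```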

  14. Significant Tests of Coefficient Multiple Regressions by using Permutation Methods

    OpenAIRE

    Ali Shadrokh

    2011-01-01

    Tests of significance of a single partial regression coefficient in a multiple regression model are often made in situations where the standard assumptions underlying the probability calculation (for example, the assumption of normality of the random error term) do not hold. When the random error term fails to fulfill some of these assumptions, one needs to resort to other, nonparametric methods to carry out statistical inference. Permutation methods are a branch of nonparametric methods. This study c...
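    A permutation test for one partial coefficient can be sketched with the Freedman–Lane scheme: fit the reduced model without the predictor of interest, permute its residuals, rebuild pseudo-responses, and refit the full model. The abstract does not say which permutation scheme the study uses, so Freedman–Lane is an assumption here; for brevity the raw coefficient is used as the test statistic (the scheme is often stated with a t statistic instead), and the data are hypothetical.

```python
import numpy as np

def ols_coef(X, y):
    # Least-squares coefficients
    return np.linalg.lstsq(X, y, rcond=None)[0]

def freedman_lane_pvalue(X, y, j, n_perm=999, seed=0):
    """Two-sided permutation p-value for coefficient j (Freedman–Lane scheme)."""
    rng = np.random.default_rng(seed)
    observed = ols_coef(X, y)[j]
    X_reduced = np.delete(X, j, axis=1)           # model without predictor j
    fitted = X_reduced @ ols_coef(X_reduced, y)
    resid = y - fitted
    count = 0
    for _ in range(n_perm):
        y_star = fitted + rng.permutation(resid)  # permute reduced-model residuals
        if abs(ols_coef(X, y_star)[j]) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)

# Hypothetical data with a strong effect of x1
rng = np.random.default_rng(42)
n = 50
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
y = 2.0 * x1 + 0.5 * x2 + rng.standard_normal(n)
X = np.column_stack([np.ones(n), x1, x2])
p = freedman_lane_pvalue(X, y, j=1)
print(p)
```

    Because no normality assumption enters the reference distribution, the test remains usable when the error term is non-normal, which is the situation the abstract describes.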

  15. On directional multiple-output quantile regression

    OpenAIRE

    Paindaveine, Davy; Siman, Miroslav

    2009-01-01

    This paper sheds some new light on the multivariate (projectional) quantiles recently introduced in Kong and Mizera (2008). Contrary to the sophisticated set analysis used there, we adopt a more parametric approach and study the subgradient conditions associated with these quantiles. In this setup, we introduce Lagrange multipliers which can be interpreted in various interesting ways. We also link these quantiles with portfolio optimization and present an alternative proof that the resulting ...

  16. A multiple covariance approach to PLS regression with several predictor groups: Structural Equation Exploratory Regression

    CERN Document Server

    Bry, Xavier; Cazes, Pierre

    2008-01-01

    A variable group Y is assumed to depend upon R thematic variable groups X_1, ..., X_R. We assume that components in Y depend linearly upon components in the X_r's. In this work, we propose a multiple covariance criterion which extends that of PLS regression to this multiple predictor groups situation. On this criterion, we build a PLS-type exploratory method, Structural Equation Exploratory Regression (SEER), that allows one to simultaneously perform dimension reduction in groups and investigate the linear model of the components. SEER uses the multidimensional structure of each group. An application example is given.

  17. Applied regression analysis a research tool

    CERN Document Server

    Pantula, Sastry; Dickey, David

    1998-01-01

    Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...

  18. Regression Analysis and the Sociological Imagination

    Science.gov (United States)

    De Maio, Fernando

    2014-01-01

    Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.

  19. Dynamic Population Structure based PSO with Granular Computing for Unified Multiple Linear Regression

    OpenAIRE

    Chen Su-Fen

    2013-01-01

    Unified Multiple Linear Regression (UMLR) is a nonlinear programming model that unifies all kinds of multiple linear regression models, such as Principal Components Regression, Ridge Regression, Robust Regression and constrained regression. Although UMLR has exhibited excellent performance in some real applications, its optimization procedure is not yet satisfactory. This study proposes a novel Granular Computing-Particle Swarm Optimization (Grc-PSO) algorithm by ...

  20. A Comparison between the Linear Neural Network Method and the Multiple Linear Regression Method in the Modeling of Continuous Data

    OpenAIRE

    Guoli Wang; Jianhui Wu; Jianhua Wu; Xiaohong Wang

    2011-01-01

    Both linear neural network and multiple linear regression models can be used for multi-factor analysis and forecasting, but the data for a multiple linear regression model are required to meet conditions such as independence and normality, while the data for a linear neural network are only required to have a linear relationship. This article uses the same set of data to build a linear neural network model and a multiple linear regression model, and compares the abilities of fi...
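    A single linear neuron (no activation function) trained to minimize squared error is fitting the same model as multiple linear regression, so with enough gradient-descent iterations the two should reach the same coefficients. The sketch below illustrates that point on hypothetical data; the learning rate and iteration count are assumptions chosen for this toy problem and are not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = rng.standard_normal((n, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + 0.1 * rng.standard_normal(n)

# Single linear neuron trained by full-batch gradient descent on MSE / 2
w = np.zeros(3)
b = 0.0
lr = 0.1
for _ in range(2000):
    err = X @ w + b - y              # prediction error
    w -= lr * (X.T @ err) / n        # gradient with respect to the weights
    b -= lr * err.mean()             # gradient with respect to the bias

# Ordinary least squares on the same data
A = np.column_stack([np.ones(n), X])
coef = np.linalg.lstsq(A, y, rcond=None)[0]
print(np.round(coef, 3), np.round(np.r_[b, w], 3))
```

    Any practical difference between the two approaches therefore comes from the training procedure and the assumptions placed on the data, not from the fitted functional form.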

  1. Interpreting Multiple Linear Regression: A Guidebook of Variable Importance

    Science.gov (United States)

    Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim

    2012-01-01

    Multiple regression (MR) analyses are commonly employed in social science fields. Interpretations of results also commonly reflect an overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…
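    The limitation of beta weights is easy to see with correlated predictors. The sketch below compares beta weights with structure coefficients (the correlation between each predictor and the predicted scores), one commonly used complement; the abstract is truncated, so which methods the guidebook covers beyond beta weights is an assumption here, and the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x1 = rng.standard_normal(n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.81) * rng.standard_normal(n)  # correlated with x1
y = 1.0 * x1 + rng.standard_normal(n)                         # only x1 enters y

def z(v):
    # Standardize to mean 0, standard deviation 1
    return (v - v.mean()) / v.std()

Z = np.column_stack([z(x1), z(x2)])
beta = np.linalg.lstsq(Z, z(y), rcond=None)[0]   # standardized (beta) weights
y_hat = Z @ beta
structure = np.array([np.corrcoef(Z[:, k], y_hat)[0, 1] for k in range(2)])
print("beta weights:", beta.round(2))
print("structure coefficients:", structure.round(2))
```

    Here x2 receives a beta weight near zero, yet its structure coefficient is high because it shares most of its variance with the predicted scores; reading the beta weight alone would suggest x2 is unrelated to the outcome.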

  2. Fundamental Analysis of the Linear Multiple Regression Technique for Quantification of Water Quality Parameters from Remote Sensing Data. Ph.D. Thesis - Old Dominion Univ.

    Science.gov (United States)

    Whitlock, C. H., III

    1977-01-01

    Constituents whose radiance varies linearly with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies, provided accurate data can be obtained and the nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to ensure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least squares fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.

  3. A comparative analysis of the effects of instructional design factors on student success in e-learning: multiple-regression versus neural networks

    Directory of Open Access Journals (Sweden)

    Halil Ibrahim Cebeci

    2009-12-01

    Full Text Available This study explores the relationship between student performance and instructional design. The research was conducted at the E-Learning School of a university in Turkey. A list of design factors with potential influence on student success was created through a review of the literature and interviews with relevant experts. From this list, the five most important design factors were chosen. The experts scored 25 university courses on the extent to which they demonstrated the chosen design factors. Multiple-regression and supervised artificial neural network (ANN) models were used to examine the relationship between student grade point averages and the scores on the five design factors. The results indicated that there is no statistical difference between the two models. Both models identified the use of examples and applications as the most influential factor. The ANN model provided more information and was used to predict the course-specific factor values required for a desired level of success.

  4. Functional Data Analysis of Generalized Quantile Regressions

    OpenAIRE

    Guo, Mengmeng; Zhou, Lhan; Huang, Jianhua Z.; Härdle, Wolfgang Karl

    2013-01-01

    Generalized quantile regressions, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized quantile regressions. Our approach assumes that the generalized quantile regressions share some common features that can be summarized by a small number of pri...

  5. Linear regression analysis theory and computing

    CERN Document Server

    Yan, Xin

    2009-01-01

    This volume presents in detail the fundamental theories of linear regression analysis and diagnosis, as well as the relevant statistical computing techniques, so that readers are able to actually model data using the methods and techniques described in the book. It covers the fundamental theories of linear regression analysis and is extremely useful for future research in this area. Examples of regression analysis using the Statistical Analysis System (SAS) are also included. This book is suitable for graduate students who are either majoring in statistics/biostatistics or using line

  6. Testing Mediation Using Multiple Regression and Structural Equation Modeling Analyses in Secondary Data

    Science.gov (United States)

    Li, Spencer D.

    2011-01-01

    Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…

  7. Moderation analysis using a two-level regression model.

    Science.gov (United States)

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model. PMID:24337935
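    The moderated multiple regression (MMR) baseline that the paper compares against can be sketched as an ordinary least-squares fit with a product term: the coefficient of predictor × moderator measures how much the slope changes with the moderator. This shows only the MMR/LS baseline, not the proposed two-level NML model, and the variables and effect sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
educ = rng.normal(13, 3, n)        # hypothetical predictor
group = rng.integers(0, 2, n)      # hypothetical binary moderator
# The slope of y on educ is 2.0 in group 0 and 3.0 in group 1
y = 10 + 2.0 * educ + 1.0 * group + 1.0 * educ * group + rng.normal(0, 2, n)

# Moderated multiple regression: include the product term educ * group
X = np.column_stack([np.ones(n), educ, group, educ * group])
b0, b_educ, b_group, b_inter = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"slope in group 0: {b_educ:.2f}; slope in group 1: {b_educ + b_inter:.2f}")
```

    The error variance in this toy example is the same in both groups; the abstract's point is that when it differs (heteroscedasticity), the two-level model estimated by NML can be more efficient than this LS fit.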

  8. Modeling Oil Palm Yield Using Multiple Linear Regression and Robust M-regression

    OpenAIRE

    Azme Khamis; Zuhaimy Ismail; Khalid Haron; Ahmad Tarmizi Mohammed

    2006-01-01

    This study shows how a multiple linear regression model can be used to model palm oil yield. The methods are illustrated by examining time series data, with foliar nutrient composition as one of the independent variables and fresh fruit bunch as the dependent variable. Other independent variables include the nutrient balance ratio and major nutrient composition. This modeling approach is capable of identifying the significant contribution of each independent variable in improving the modelin...

  9. Multiple-Case Outlier Detection in Multiple Linear Regression Model Using Quantum-Inspired Evolutionary Algorithm

    OpenAIRE

    Salena Akter; Mozammel H. A. Khan

    2010-01-01

    In ordinary statistical methods, multiple outliers in multiple linear regression model are detected sequentially one after another, where smearing and masking effects give misleading results. If the potential multiple outliers can be detected simultaneously, smearing and masking effects can be avoided. Such multiple-case outlier detection is of combinatorial nature and 2^N-N-1 sets of possible outliers need to be tested, where N is the number of data points. This exhaustive search is practica...
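    The count 2^N − N − 1 is the number of index subsets of size two or more: all 2^N subsets, minus the empty set, minus the N single-case sets. A brute-force enumeration makes the combinatorial growth concrete. This sketch only counts candidate sets and does not implement the quantum-inspired search; the helper name is hypothetical.

```python
from itertools import combinations

def candidate_outlier_sets(n):
    """All index subsets of size >= 2, i.e. the multiple-case outlier candidates."""
    return [c for k in range(2, n + 1) for c in combinations(range(n), k)]

n = 10
sets = candidate_outlier_sets(n)
print(len(sets), 2**n - n - 1)  # 1013 1013

# The count roughly doubles with each extra data point
print(2**30 - 30 - 1)           # 1073741793 candidate sets for N = 30
```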

  10. General Dimensional Multiple-Output Support Vector Regressions and Their Multiple Kernel Learning.

    Science.gov (United States)

    Chung, Wooyong; Kim, Jisu; Lee, Heejin; Kim, Euntai

    2015-11-01

    Support vector regression has been considered as one of the most important regression or function approximation methodologies in a variety of fields. In this paper, two new general dimensional multiple output support vector regressions (MSVRs) named SOCPL1 and SOCPL2 are proposed. The proposed methods are formulated in the dual space and their relationship with the previous works is clearly investigated. Further, the proposed MSVRs are extended into the multiple kernel learning and their training is implemented by the off-the-shelf convex optimization tools. The proposed MSVRs are applied to benchmark problems and their performances are compared with those of the previous methods in the experimental section. PMID:25532215

  11. Multiple linear regression estimators with skew normal errors

    Science.gov (United States)

    Alhamide, A. A.; Ibrahim, K.; Alodat, M. T.

    2015-09-01

    The skew normal distribution is suitable for the analysis of skewed data. The purpose of this paper is to study the estimation of regression parameters under extended multivariate skew normal errors. Maximum likelihood estimators of the regression parameters are derived. A simulation study is carried out to investigate the performance of the derived estimators, and the standard errors associated with the respective parameter estimates are found to be quite small.

  12. Survival analysis and regression models

    OpenAIRE

    George, Brandon; Seals, Samantha; Aban, Inmaculada

    2014-01-01

    Time-to-event outcomes are common in medical research as they offer more information than simply whether or not an event occurred. To handle these outcomes, as well as censored observations where the event was not observed during follow-up, survival analysis methods should be used. Kaplan-Meier estimation can be used to create graphs of the observed survival curves, while the log-rank test can be used to compare curves from different groups. If it is desired to test continuous predictors or t...
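    The Kaplan–Meier estimator mentioned above multiplies, at each distinct event time t, the conditional survival probability 1 − d/n, where d is the number of events at t and n is the number still at risk just before t. Censored subjects leave the risk set without contributing an event. A minimal sketch on a hypothetical toy sample (censoring at a tied event time is treated as occurring after the event, the usual convention):

```python
import numpy as np

def kaplan_meier(time, event):
    """Return distinct event times and the Kaplan–Meier survival estimate S(t)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)              # still under observation just before t
        deaths = np.sum((time == t) & event)     # events observed exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Hypothetical times in months; event = 0 marks a censored observation
t, s = kaplan_meier([2, 3, 3, 5, 8, 8, 9, 12], [1, 1, 0, 1, 0, 1, 1, 0])
print(t, s)  # S(t) = 0.875, 0.75, 0.6, 0.45, 0.225 at t = 2, 3, 5, 8, 9
```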

  13. Multiple predictor smoothing methods for sensitivity analysis

    International Nuclear Information System (INIS)

    The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (1) locally weighted regression (LOESS), (2) additive models, (3) projection pursuit regression, and (4) recursive partitioning regression. The indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present

  14. Multiple predictor smoothing methods for sensitivity analysis.

    Energy Technology Data Exchange (ETDEWEB)

    Helton, Jon Craig; Storlie, Curtis B.

    2006-08-01

    The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (1) locally weighted regression (LOESS), (2) additive models, (3) projection pursuit regression, and (4) recursive partitioning regression. The indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present.
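    The first technique listed, locally weighted regression (LOESS), fits a weighted straight line around each target point using tricube weights that fall to zero at the edge of a nearest-neighbor window. The sketch below computes a single LOESS fitted value with one predictor; it omits the robustness iterations of full LOESS and the stepwise multi-predictor sensitivity procedure, and the span `frac` is an assumed default.

```python
import numpy as np

def loess_point(x, y, x0, frac=0.1):
    """Locally weighted linear fit at x0 with tricube weights (one LOESS step)."""
    n = len(x)
    k = max(int(np.ceil(frac * n)), 2)
    dist = np.abs(x - x0)
    h = np.sort(dist)[k - 1]                   # bandwidth: distance to k-th nearest point
    u = np.clip(dist / h, 0.0, 1.0)
    w = (1 - u**3) ** 3                        # tricube weights; zero outside the window
    A = np.column_stack([np.ones(n), x - x0])  # local line centered at x0
    sw = np.sqrt(w)
    coef = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return coef[0]                             # intercept = fitted value at x0

# Hypothetical noisy nonlinear relation between a model input and output
rng = np.random.default_rng(8)
x = np.linspace(0, 10, 101)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)
fit = np.array([loess_point(x, y, x0) for x0 in [2.0, 5.0, 8.0]])
print(fit.round(3), np.sin([2.0, 5.0, 8.0]).round(3))
```

    Because each local fit is linear, the smoother follows curvature that a single global linear or rank regression would miss, which is why such methods can give more informative sensitivity results for nonlinear models.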

  15. Multiple Linear Regression for Extracting Phrase Translation Pairs

    Directory of Open Access Journals (Sweden)

    Chun-Xiang Zhang

    2011-05-01

    Full Text Available Phrase translation pairs are very useful for bilingual lexicography, machine translation systems, cross-lingual information retrieval and many other applications in natural language processing. Phrase translation pairs are always extracted from bilingual sentence pairs. In this paper, we extract phrase translation pairs based on word alignment results of Chinese-English bilingual sentence pairs and parse trees of Chinese sentences, in order to decrease the influence of grammatical differences between Chinese and English. Discriminative features for evaluating extracted phrase translation pairs are proposed, including translation literality, phrase alignment probability and phrase length difference. A multiple linear regression model combined with an N-best strategy is employed to filter phrase translation pairs, in order to improve evaluation and filtering performance. Experimental results indicate that, among the three kinds of discriminative features, phrase alignment probability gives the best filtering performance for evaluating Chinese-English phrase translation pairs. When the multiple linear regression model combined with the N-best strategy is used, the F1 score reaches 86.24%.

  16. Fiscal Multipliers: A Meta Regression Analysis

    OpenAIRE

    Gechert, Sebastian; Will, Henner

    2012-01-01

    Since the fiscal expansion during the Great Recession 2008-2009 and the current European consolidation and austerity measures, the analysis of fiscal multiplier effects is back on the scientific agenda. The number of empirical studies is growing fast, tackling the issue with manifold model classes, identification strategies, and specifications. While plurality of methods seems to be a good idea to address a complicated issue, the results are far off consensus. We apply meta regression analysi...

  17. Hot Resistance Estimation for Dry Type Transformer Using Multiple Variable Regression, Multiple Polynomial Regression and Soft Computing Techniques

    Directory of Open Access Journals (Sweden)

    M. Srinivasan

    2012-01-01

    Full Text Available Problem statement: This study presents a novel method for determining the average winding temperature rise of transformers under their predetermined field operating conditions. The rise in winding temperature was determined from the estimated values of winding resistance during the heat run test conducted as per the IEC standard. Approach: The estimation of hot resistance was modeled using Multiple Variable Regression (MVR), Multiple Polynomial Regression (MPR) and soft computing techniques such as Artificial Neural Network (ANN) and Adaptive Neuro Fuzzy Inference System (ANFIS). The modeled hot resistance will help to find the load losses under any load condition without a complicated measurement setup in transformers. Results: These techniques were applied to hot resistance estimation for a dry-type transformer using the input variables cold resistance, ambient temperature and temperature rise. The results were compared and show good agreement between measured and computed values. Conclusion: The proposed methods were verified using experimental results obtained from a temperature rise test performed on a 55 kVA dry-type transformer.

  18. Precipitation interpolation in mountainous regions using multiple linear regression

    Science.gov (United States)

    Hay, L.; Viger, R.; McCabe, G.

    1998-01-01

    Multiple linear regression (MLR) was used to spatially interpolate precipitation for simulating runoff in the Animas River basin of southwestern Colorado. MLR equations were defined for each time step using measured precipitation as dependent variables. Explanatory variables used in each MLR were derived for the dependent variable locations from a digital elevation model (DEM) using a geographic information system. The same explanatory variables were defined for a 5 × 5 km grid of the DEM. For each time step, the best MLR equation was chosen and used to interpolate precipitation onto the 5 × 5 km grid. The gridded values of precipitation provide a physically-based estimate of the spatial distribution of precipitation and result in reliable simulations of daily runoff in the Animas River basin.

  19. Assessing the binding affinity of a selected class of DPP4 inhibitors using chemical descriptor-based multiple linear regression

    OpenAIRE

    Jose Isagani Janairo; Gerardo Janairo; Frumencio Co; Derrick Ethelbhert Yu

    2011-01-01

    The activity of a selected class of DPP4 inhibitors was preliminarily assessed using chemical descriptors derived from AM1-optimized geometries. Using a multiple linear regression model, it was found that ?E0, LUMO energy, area, molecular weight and ?H0 are the significant descriptors that can adequately assess the binding affinity of the compounds. The derived multiple linear regression (MLR) model was validated using rigorous statistical analysis. The preliminary model suggests t...

  20. Multiple regression models for energy use in air-conditioned office buildings in different climates

    International Nuclear Information System (INIS)

    An attempt was made to develop multiple regression models for office buildings in the five major climates in China: severe cold, cold, hot summer and cold winter, mild, and hot summer and warm winter. A total of 12 key building design variables were identified through parametric and sensitivity analysis and considered as inputs in the regression models. The coefficient of determination R² varies from 0.89 in Harbin to 0.97 in Kunming, indicating that 89-97% of the variation in annual building energy use can be explained by changes in the 12 parameters. A pseudo-random number generator based on three simple multiplicative congruential generators was employed to generate random designs for evaluation of the regression models. The differences between regression-predicted and DOE-simulated annual building energy use are largely within 10%. It is envisaged that the regression models developed can be used to estimate the likely energy savings or penalties during the initial design stage, when different building schemes and design concepts are being considered.
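    The abstract does not name its generator. A well-known generator built from exactly three multiplicative congruential generators is the Wichmann–Hill (1982) scheme, so the sketch below assumes it: each component advances as s ← a·s mod m, and the output is the fractional part of the sum of the three scaled states. The design-variable names and ranges in the usage lines are hypothetical, chosen only to show how random designs might be drawn.

```python
class WichmannHill:
    """Combination of three multiplicative congruential generators (Wichmann–Hill)."""

    def __init__(self, s1=1, s2=2, s3=3):
        # Seeds must be nonzero and smaller than their respective moduli
        self.s1, self.s2, self.s3 = s1, s2, s3

    def random(self):
        # Advance each multiplicative congruential generator
        self.s1 = (171 * self.s1) % 30269
        self.s2 = (172 * self.s2) % 30307
        self.s3 = (170 * self.s3) % 30323
        # Add the three scaled states and keep the fractional part: a value in [0, 1)
        return (self.s1 / 30269 + self.s2 / 30307 + self.s3 / 30323) % 1.0

# Hypothetical design variables and ranges, sampled uniformly
gen = WichmannHill(12345, 23456, 3456)
ranges = {"window_to_wall_ratio": (0.2, 0.8), "wall_u_value": (0.3, 2.0)}
design = {k: lo + (hi - lo) * gen.random() for k, (lo, hi) in ranges.items()}
print(design)
```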

  1. Functional linear regression analysis for longitudinal data

    CERN Document Server

    Yao, Fang; Müller, Hans-Georg; Wang, Jane-Ling

    2005-01-01

    We propose nonparametric methods for functional linear regression which are designed for sparse longitudinal data, where both the predictor and response are functions of a covariate such as time. Predictor and response processes have smooth random trajectories, and the data consist of a small number of noisy repeated measurements made at irregular times for a sample of subjects. In longitudinal studies, the number of repeated measurements per subject is often small and may be modeled as a discrete random number and, accordingly, only a finite and asymptotically nonincreasing number of measurements are available for each subject or experimental unit. We propose a functional regression approach for this situation, using functional principal component analysis, where we estimate the functional principal component scores through conditional expectations. This allows the prediction of an unobserved response trajectory from sparse measurements of a predictor trajectory. The resulting technique is flexible and allow...

  2. Dynamic Population Structure based PSO with Granular Computing for Unified Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Chen Su-Fen

    2013-01-01

    Full Text Available Unified Multiple Linear Regression (UMLR) is a nonlinear programming model that unifies all kinds of multiple linear regression models, such as Principal Components Regression, Ridge Regression, Robust Regression and constrained regression. Although UMLR has exhibited excellent performance in some real applications, its optimization procedure is not yet satisfactory. This study proposes a novel Granular Computing-Particle Swarm Optimization (Grc-PSO) algorithm, which introduces granular computing into standard PSO and is used to optimize the UMLR model. The experimental results show that the solution obtained by the Grc-PSO algorithm fits the real situation much better than those of other state-of-the-art algorithms.
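    The baseline that Grc-PSO extends, standard global-best particle swarm optimization, can be sketched as follows. This is not the paper's algorithm: it has no granular computing or dynamic population structure, and instead of the full UMLR objective it simply minimizes the ordinary least-squares error on hypothetical noiseless data, so the result can be checked against a direct least-squares solution. The inertia and acceleration constants are common textbook values, assumed here.

```python
import numpy as np

def pso_minimize(f, dim, n_particles=40, iters=300, w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Standard global-best particle swarm optimization."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5, 5, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                                  # each particle's best position
    pbest_val = np.array([f(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()          # swarm's best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + pull toward personal best + pull toward global best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest

# Hypothetical regression problem: minimize the sum of squared errors
rng = np.random.default_rng(6)
X = np.column_stack([np.ones(50), rng.standard_normal((50, 2))])
y = X @ np.array([1.0, 2.0, -0.5])
sse = lambda b: float(np.sum((X @ b - y) ** 2))
b_pso = pso_minimize(sse, dim=3)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_pso.round(4), b_ols.round(4))
```

    For plain least squares a swarm is unnecessary; the appeal of PSO is that the same loop accepts constrained or non-smooth objectives, such as the unified model, without requiring gradients.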

  3. Multiple Regression Redshift Calibration for Clusters of Galaxies

    Science.gov (United States)

    Kalinkov, M.; Kuneva, I.; Valtchanov, I.

    A new procedure has been developed for calibrating distances to ACO (Abell et al. 1989) clusters of galaxies. In the previous version of the Reference Catalog of ACO Clusters of Galaxies (Kalinkov & Kuneva 1992), various calibration schemes were compared. For Version 93 we have made some refinements. Photometric calibration has improved considerably since its early days, from the schemes of Rowan-Robinson (1972), Corwin (1974), Kalinkov & Kuneva (1975) and Mills & Hoskins (1977) to the more complicated ones of Leir & van den Bergh (1977), Postman et al. (1985), Kalinkov & Kuneva (1985, 1986, 1990), Scaramella et al. (1991) and Zucca et al. (1993). It was shown that the same calibration relation cannot be used for northern (A) and southern (ACO) clusters of galaxies, so the calibration has to be made separately for the two catalogs. Moreover, it is better if relations can be found for the 274 A clusters studied by the authors of ACO. We use the luminosity distance for H0 = 100 km/s/Mpc and q0 = 0.5, and we have 1200 clusters with measured redshifts. The first step is to fit log(z) on m10 (the magnitude of the tenth-ranked galaxy) for A clusters and on m1, m3 and m10 for ACO clusters. The second step is to account for the K-correction and the Scott effect (Postman et al. 1985) with an iterative process. To avoid the initial errors of the redshift estimates in the A and ACO catalogs, we adopt Hubble's law for the apparent radial distribution of galaxies in clusters. This enables us to calculate a new cluster richness from a preliminary redshift estimate; this is the third step. Next we study the correlation matrix between log(z) and prospective predictors: new richness groups, BM, RS and A types, radio and X-ray fluxes, apparent separations between the first three brightest galaxies, and mean population (gal/sq. deg). Multiple linear as well as nonlinear regression estimators are found.
Many clusters that deviate by more than 2.5 sigma are rejected. Each case is examined for observational errors, substructuring, foreground and background. Some of the clusters are doubtful and most probably have to be excluded from the catalogs. The multiple regressions allow us to estimate redshifts in the range 0.02 to 0.2 with an error of 7 percent.
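As a rough illustration of the first step combined with the 2.5-sigma rejection, the sketch below fits log(z) on m10 by least squares and iteratively drops clusters whose residual exceeds 2.5 times the residual standard deviation. It is a minimal sketch under stated assumptions, not the authors' procedure: it uses a single predictor, ignores the K-correction and Scott-effect iterations, and the function name `sigma_clipped_fit` and the synthetic data are hypothetical.

```python
import numpy as np


def sigma_clipped_fit(m10, log_z, n_sigma=2.5, max_iter=10):
    """Least-squares fit of log_z = a + b * m10 with iterative n_sigma clipping.

    Returns the coefficients (a, b) and a boolean mask of the points used
    in the final fit.
    """
    m10 = np.asarray(m10, dtype=float)
    log_z = np.asarray(log_z, dtype=float)

    def fit(mask):
        design = np.column_stack([np.ones(mask.sum()), m10[mask]])
        coef, *_ = np.linalg.lstsq(design, log_z[mask], rcond=None)
        return coef

    keep = np.ones(m10.size, dtype=bool)
    for _ in range(max_iter):
        coef = fit(keep)
        resid = log_z - (coef[0] + coef[1] * m10)
        sigma = resid[keep].std(ddof=2)  # two fitted parameters
        new_keep = np.abs(resid) <= n_sigma * sigma
        if np.array_equal(new_keep, keep):
            break  # the rejection set is stable
        keep = new_keep
    else:
        coef = fit(keep)  # no convergence: refit on the last mask
    return coef, keep


# Hypothetical synthetic clusters with one grossly deviant entry
rng = np.random.default_rng(0)
m10 = rng.uniform(13.0, 18.0, 50)
log_z = -4.0 + 0.2 * m10 + rng.normal(0.0, 0.01, 50)
log_z[0] += 1.0
coef, keep = sigma_clipped_fit(m10, log_z)
```

The deviant cluster inflates the residual scatter on the first pass but still lies far outside the 2.5-sigma band, so it is removed and the slope is then estimated from the remaining clusters.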

  4. Regression Analysis for the Social Sciences

    CERN Document Server

    Gordon, Rachel A A

    2012-01-01

    The book provides graduate students in the social sciences with the basic skills they need to estimate, interpret, present, and publish basic regression models using contemporary standards. Key features of the book include: interweaving the teaching of statistical concepts with examples developed for the course from publicly available social science data or drawn from the literature; thorough integration of teaching statistical theory with teaching data processing and analysis; teaching of both SAS and Stata "side-by-side" and use of chapter exercises in which students practice programming

  5. Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping

    Science.gov (United States)

    2013-01-01

    Background: Complex binary traits are influenced by many factors, including the main effects of many quantitative trait loci (QTLs), epistatic effects involving more than one QTL, environmental effects, and the effects of gene-environment interactions. Although a number of QTL mapping methods for binary traits have been developed, there is still no efficient and powerful method that can handle both main and epistatic effects of a relatively large number of possible QTLs. Results: In this paper, we use a Bayesian logistic regression model that includes both main and epistatic effects as the QTL model for binary traits. Our logistic regression model employs hierarchical priors for regression coefficients similar to those used in the Bayesian LASSO linear model for multiple QTL mapping of continuous traits. We develop efficient empirical Bayesian algorithms to infer the logistic regression model. Our simulation study shows that our algorithms can easily handle a QTL model with a large number of main and epistatic effects on a personal computer, and outperform five other methods examined, including the LASSO, HyperLasso, BhGLM, RVM and the single-QTL mapping method based on logistic regression, in terms of power of detection and false positive rate. The utility of our algorithms is also demonstrated through analysis of a real data set. A software package implementing the empirical Bayesian algorithms in this paper is freely available upon request. Conclusions: The EBLASSO logistic regression method can handle a large number of effects, possibly including main and epistatic QTL effects, environmental effects and the effects of gene-environment interactions. It will be a very useful tool for multiple-QTL mapping of complex binary traits. PMID:23410082

  6. Local bilinear multiple-output quantile/depth regression.

    Czech Academy of Sciences Publication Activity Database

    Hallin, M.; Lu, Z.; Paindaveine, D.; Šiman, Miroslav

    2015-01-01

    Vol. 21, No. 3 (2015), pp. 1435-1466. ISSN 1350-7265. R&D Projects: GA MŠk(CZ) 1M06047. Institutional support: RVO:67985556. Keywords: conditional depth * growth chart * halfspace depth * local bilinear regression * multivariate quantile * quantile regression * regression depth. Subject RIV: BA - General Mathematics. Impact factor: 1.161, year: 2014. http://library.utia.cas.cz/separaty/2015/SI/siman-0446857.pdf

  7. Sliced Inverse Regression for big data analysis

    OpenAIRE

    Kevin, Li

    2014-01-01

    Modern advances in computing power have greatly widened scientists' scope for gathering and investigating information on many variables. We describe sliced inverse regression (SIR) for reducing the dimension of the input variable x without going through any parametric or nonparametric model-fitting process. This method explores the simplicity of the inverse view of regression: instead of regressing the univariate output variable y against the multivariate x, we regress x against y. Forward r...
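The core SIR computation described above (standardize x, slice the range of y, and take the principal directions of the within-slice means of x) can be sketched in a few lines of NumPy. This is a minimal illustration assuming equal-count slices and a nonsingular covariance of x; the function name and the simulated data are hypothetical, not code from the article.

```python
import numpy as np


def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    """Estimate effective dimension-reduction directions by SIR."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize x: z = Sigma^{-1/2} (x - mean)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ inv_sqrt

    # "Regress x against y": slice the sorted y and average z within each slice
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of the weighted covariance of the slice means
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_directions]]
    beta = inv_sqrt @ top  # map back to the original x scale
    return beta / np.linalg.norm(beta, axis=0)


# Hypothetical example: y depends on x only through one direction, nonlinearly
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 4))
b_true = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)
y = (X @ b_true) ** 3 + rng.normal(0.0, 0.1, 2000)
beta = sliced_inverse_regression(X, y)
```

No link function is fitted at any point; the cubic relation is recovered only as a direction, which is the "inverse view" the abstract refers to.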

  8. Using Dominance Analysis to Determine Predictor Importance in Logistic Regression

    Science.gov (United States)

    Azen, Razia; Traxel, Nicole

    2009-01-01

    This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R² analogues were determined, and measures were selected that can be used to perform dominance analysis in logistic regression. A…
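Dominance analysis averages a predictor's incremental contribution to a fit measure over all subsets of the other predictors. The sketch below computes these general dominance weights using ordinary least-squares R² as the fit measure, which is an assumption made for brevity: the article's contribution is choosing logistic-regression R² analogues, which would be passed in through the hypothetical `fit_measure` argument instead. Enumerating every subset costs 2^p model fits, so this is only practical for a handful of predictors.

```python
import itertools

import numpy as np


def r_squared(X, y, cols):
    """OLS R^2 of y on an intercept plus the predictor columns in `cols`."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def general_dominance(X, y, fit_measure=r_squared):
    """Average incremental fit of each predictor over all subset sizes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    # Evaluate the fit measure once for every subset of predictors
    fit = {s: fit_measure(X, y, list(s))
           for k in range(p + 1)
           for s in itertools.combinations(range(p), k)}
    weights = np.zeros(p)
    for j in range(p):
        others = [c for c in range(p) if c != j]
        level_means = []
        for k in range(p):  # size of the subset that predictor j is added to
            gains = [fit[tuple(sorted(s + (j,)))] - fit[s]
                     for s in itertools.combinations(others, k)]
            level_means.append(np.mean(gains))
        weights[j] = np.mean(level_means)
    return weights


rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=500)
weights = general_dominance(X, y)
```

Because the weights average the incremental fits over every subset, they sum to the R² of the full model, which makes them easy to report as shares of explained variance.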

  9. Throughput Prediction of Fishing Goods Based on the Grey Multiple Linear Regression Method

    OpenAIRE

    Changping Chen; Changlu Zhou; Xueda Zhao; Yanna Zheng; Xianying Shi

    2014-01-01

    Based on the grey prediction method and the multiple linear regression method, a grey multiple linear regression method is presented. This method was applied to predict the throughput of fishing goods from the actual throughput data of five fishing ports. Comparing the results with those of the time-series one-dimensional linear regression method and the grey prediction method showed that this method of calculation and analysis was more effective and the forecasting precisio...

  10. Forecasting Gold Prices Using Multiple Linear Regression Method

    Directory of Open Access Journals (Sweden)

    Z. Ismail

    2009-01-01

    Full Text Available Problem statement: Forecasting is a management function that assists decision making. It is also described as the process of estimation in unknown future situations. In more general terms it is commonly known as prediction, which refers to the estimation of time-series or longitudinal data. Gold is a precious yellow commodity once used as money. It was made illegal in the USA 41 years ago, but is now once again accepted as a potential currency. The demand for this commodity is on the rise. Approach: The objective of this study was to develop a forecasting model for predicting gold prices based on economic factors such as inflation, currency price movements and others. Following the meltdown of the US dollar, investors are putting their money into gold because gold plays an important role as a stabilizing influence for investment portfolios. Due to the increase in demand for gold in Malaysia and other parts of the world, it is necessary to develop a model that reflects the structure and pattern of the gold market and forecasts movements in the gold price. The most appropriate approach to understanding gold prices is the Multiple Linear Regression (MLR) model. MLR studies the relationship between a single dependent variable and one or more independent variables; in this case, the gold price is the single dependent variable. The fitted MLR model will be used to predict future gold prices. A naive model known as "forecast-1" was used as a benchmark to evaluate the performance of the model. Results: Many factors determine the price of gold, and based on "a hunch of experts", several economic factors were identified as influencing gold prices.
Variables such as the Commodity Research Bureau futures index (CRB), the USD/Euro foreign exchange rate (EUROUSD), the inflation rate (INF), money supply (M1), the New York Stock Exchange (NYSE), the Standard and Poor's 500 (SPX), Treasury bills (T-BILL) and the US Dollar index (USDX) were considered to influence the prices. Parameter estimation for the MLR was carried out using the Statistical Package for the Social Sciences (SPSS), with the mean square error (MSE) as the fitness function to determine forecast accuracy. Conclusion: Two models were considered. The first model included all possible independent variables. It appeared to be useful for predicting the price of gold, with 85.2% of the sample variation in monthly gold prices explained by the model. The second model identified four independent variables as significant: CRB lagged by one period, EUROUSD lagged by one period, INF lagged by two periods and M1 lagged by two periods. In terms of prediction, the second model achieved a high level of predictive accuracy. The amount of variance explained was about 70%, and the regression coefficients also provide a means of assessing the relative importance of individual variables in the overall prediction of the gold price.
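To make the lagged-predictor setup concrete, the sketch below fits an MLR in which each economic variable enters with its own lag and compares its in-sample MSE with a naive benchmark. It assumes that "forecast-1" means predicting each month's price by the previous month's price; that interpretation, the variable names and the simulated series are hypothetical, and the study itself used SPSS rather than code like this.

```python
import numpy as np


def lagged_mlr(y, predictors, lags):
    """Fit y[t] = b0 + sum_k b_k * x_k[t - lag_k] by least squares.

    Returns the coefficients (intercept first, then in `lags` order), the
    in-sample MSE of the MLR fit and the MSE of a naive forecast y[t-1].
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    start = max(max(lags.values()), 1)  # first t with all lagged values available
    columns = [np.ones(T - start)]
    for name, lag in lags.items():
        x = np.asarray(predictors[name], dtype=float)
        columns.append(x[start - lag:T - lag])  # x[t - lag] for t = start..T-1
    X = np.column_stack(columns)
    target = y[start:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    mse_mlr = np.mean((target - X @ coef) ** 2)
    mse_naive = np.mean((target - y[start - 1:T - 1]) ** 2)
    return coef, mse_mlr, mse_naive


# Hypothetical monthly series: price driven by crb lagged 1 and inf lagged 2
rng = np.random.default_rng(3)
T = 120
crb, inf = rng.normal(size=T), rng.normal(size=T)
gold = np.full(T, 2.0)
gold[2:] += 0.5 * crb[1:-1] + 0.3 * inf[:-2] + rng.normal(0.0, 0.01, T - 2)
coef, mse_mlr, mse_naive = lagged_mlr(
    gold, {"crb": crb, "inf": inf}, {"crb": 1, "inf": 2})
```

All series are aligned on the largest lag, so every row of the design matrix uses only values that were already observed at time t.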

  11. Forecasting Electrical Load using ANN Combined with Multiple Regression Method

    OpenAIRE

    Saeed M. Badran; Ossama B. Abouelatta

    2012-01-01

    This paper combines artificial neural network and regression modeling methods to predict electrical load. We propose an approach to day-, week- and/or month-specific load forecasting for electrical companies that takes the historical load into account. A modified technique, based on an artificial neural network (ANN) combined with linear regression, is therefore applied to the KSA electrical network, using its historical data to forecast electrical load demand up to the year 2020. T...

  12. Analysis of Multiple Phenotypes

    OpenAIRE

    Kent, Jack W.

    2009-01-01

    The complex etiology of common diseases like cardiovascular disease, diabetes, hypertension, and rheumatoid arthritis has led investigators to focus on the genetics of correlated phenotypes and risk factors. Joint analysis of multiple disease-related phenotypes may reveal genes of pleiotropic effect and increase analytical power, but at the cost of increased analytical and computational complexity. All three data sets provided for analysis at the Genetic Analysis Workshop 16 offered multiple ...

  13. Spatial regression analysis on 32 years total column ozone data

    Directory of Open Access Journals (Sweden)

    J. S. Knibbe

    2014-02-01

    Full Text Available Multiple regression analyses have been performed on 32 years of total ozone column data, spatially gridded at 1° × 1.5° resolution. The total ozone data consist of the MSR (Multi Sensor Reanalysis; 1979–2008) and two years of assimilated SCIAMACHY ozone data (2009–2010). The two-dimensionality of this data set allows us to perform the regressions locally and investigate spatial patterns of regression coefficients and their explanatory power. Seasonal dependencies of ozone on regressors are included in the analysis. A new physically oriented model is developed to parameterize stratospheric ozone. Ozone variations on non-seasonal timescales are parameterized by explanatory variables describing the solar cycle, stratospheric aerosols, the quasi-biennial oscillation (QBO), El Niño (ENSO) and stratospheric alternative halogens (EESC). For several explanatory variables, seasonally adjusted versions are constructed to account for the difference in their effect on ozone throughout the year. To account for seasonal variation in ozone, explanatory variables describing the polar vortex, geopotential height, potential vorticity and average day length are included. Results of this regression model are compared to those of a similar analysis based on a more commonly applied statistically oriented model. The physically oriented model provides spatial patterns in the regression results for each explanatory variable. The EESC has a significant depleting effect on ozone at high and mid-latitudes; the solar cycle affects ozone positively, mostly in the Southern Hemisphere; stratospheric aerosols affect ozone negatively at high northern latitudes; the effect of the QBO is positive in the tropics and negative at mid to high latitudes; and ENSO affects ozone negatively between 30° N and 30° S, particularly over the Pacific.
The contribution of explanatory variables describing seasonal ozone variation is generally large at mid to high latitudes. We observe positive contributions to ozone from potential vorticity and day length, a negative effect of geopotential height on ozone, and variable ozone effects of the polar vortex in regions to the north and south of the polar vortices. Recovery of ozone is identified globally. However, recovery rates and uncertainties strongly depend on choices that can be made in defining the explanatory variables. In particular, the recovery rates over Antarctica might not be statistically significant. Furthermore, the results show no spatially homogeneous pattern as to which regression model and explanatory variables provide the best fit to the data and the most accurate estimates of the recovery rates. Overall, these results suggest that care has to be taken in determining ozone recovery rates, in particular for the Antarctic ozone hole.
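One common way to let a regressor's effect vary through the year, in the spirit of the seasonally adjusted explanatory variables described above, is to multiply it by annual harmonics; because each grid cell is fitted independently, all cells can be solved in a single least-squares call. This sketch assumes monthly data and one annual harmonic; it is not the paper's physically oriented model, and the regressor, function names and grid values are hypothetical.

```python
import numpy as np


def seasonal_design(month, regressors, n_harmonics=1):
    """Design matrix whose coefficients vary over the year.

    Columns: intercept, annual harmonics, then for each regressor x the terms
    x, x*cos(2*pi*k*m/12), x*sin(2*pi*k*m/12) for k = 1..n_harmonics.
    """
    m = np.asarray(month, dtype=float)
    harmonics = []
    for k in range(1, n_harmonics + 1):
        harmonics += [np.cos(2 * np.pi * k * m / 12),
                      np.sin(2 * np.pi * k * m / 12)]
    columns = [np.ones_like(m)] + harmonics
    for x in regressors:
        x = np.asarray(x, dtype=float)
        columns += [x] + [x * h for h in harmonics]
    return np.column_stack(columns)


def fit_all_cells(design, ozone):
    """Least squares for every grid cell at once; ozone has shape (time, cells)."""
    coef, *_ = np.linalg.lstsq(design, ozone, rcond=None)
    return coef  # shape (n_columns, cells)


rng = np.random.default_rng(4)
month = np.arange(384)  # 32 years of monthly values
solar = rng.normal(size=384)  # hypothetical standardized regressor
beta = 2.0 + 1.0 * np.cos(2 * np.pi * month / 12)  # seasonally varying effect
cell = (300 + 10 * np.cos(2 * np.pi * month / 12) + beta * solar
        + rng.normal(0.0, 0.1, 384))
ozone = np.column_stack([cell, cell + 5.0])  # two hypothetical grid cells
coef = fit_all_cells(seasonal_design(month, [solar]), ozone)
```

The effective coefficient of the regressor in a given month is then coef[3] + coef[4]·cos(2πm/12) + coef[5]·sin(2πm/12), and mapping these per-cell values is what produces spatial patterns of regression coefficients.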

  14. Single and multiple index functional regression models with nonparametric link

    OpenAIRE

    Chen, Dong; Hall, Peter; Müller, Hans-Georg

    2012-01-01

    Fully nonparametric methods for regression from functional data have poor accuracy from a statistical viewpoint, reflecting the fact that their convergence rates are slower than nonparametric rates for the estimation of high-dimensional functions. This difficulty has led to an emphasis on the so-called functional linear model, which is much more flexible than common linear models in finite dimension, but nevertheless imposes structural constraints on the relationship between...

  15. Vehicle Travel Time Prediction based on Multiple Kernel Regression

    OpenAIRE

    Wenjing Xu

    2014-01-01

    With the rapid development of transportation and the logistics economy, vehicle travel time prediction and planning have become an important topic in logistics. Travel time prediction, which is indispensable for traffic guidance, has become a key issue for researchers in this field. At present, travel time prediction is mainly short-term, and the prediction methods include artificial neural networks, the Kalman filter and support vector regression (SVR). However, these algo...

  16. A graphical analysis of cost-sensitive regression problems

    OpenAIRE

    Hernandez-Orallo, Jose

    2012-01-01

    Several efforts have been made to bring ROC analysis beyond (binary) classification, especially to regression. However, the mappings and possibilities of these proposals do not correspond to what we expect from the analysis of operating conditions, dominance, hybrid methods, etc. In this paper we present a new representation of regression models in the so-called regression ROC (RROC) space. The basic idea is to represent over-estimation on the x axis and under-estimation on t...
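Since the abstract is cut off, the sketch below assumes the usual RROC construction: a model's point is its total over-estimation (sum of positive errors) on the x axis and its total under-estimation (sum of negative errors, hence non-positive) on the y axis, and adding a constant shift to every prediction traces a curve through the space. The function names are hypothetical.

```python
import numpy as np


def rroc_point(y_true, y_pred):
    """Return (OVER, UNDER) for one regression model.

    OVER is the sum of positive errors (over-estimation) and UNDER the sum of
    negative errors (under-estimation, so UNDER <= 0), with error = y_pred - y_true.
    """
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return err[err > 0].sum(), err[err < 0].sum()


def rroc_curve(y_true, y_pred, shifts):
    """Points traced by adding a constant shift to every prediction."""
    y_pred = np.asarray(y_pred, dtype=float)
    return [rroc_point(y_true, y_pred + s) for s in shifts]


over, under = rroc_point([1.0, 2.0, 3.0], [2.0, 2.0, 1.0])
curve = rroc_curve([1.0, 2.0, 3.0], [2.0, 2.0, 1.0], [10.0])
```

In this example the errors are 1, 0 and -2, giving the point (1, -2); a large positive shift pushes the model onto the x axis, where it only over-estimates.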

  17. Multivariate quantiles and multiple-output regression quantiles: from L1 optimization to halfspace depth

    OpenAIRE

    Hallin, Marc; Paindaveine, Davy; Siman, Miroslav

    2008-01-01

    A new multivariate concept of quantile, based on a directional version of Koenker and Bassett's traditional regression quantiles, is introduced for multivariate location and multiple-output regression problems. In their empirical version, those quantiles can be computed efficiently via linear programming techniques. Consistency, Bahadur representation and asymptotic normality results are established. Most importantly, the contours generated by those quantiles are shown to co...
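The traditional Koenker and Bassett regression quantile that these multivariate quantiles build on is the solution of a linear program, and when the design matrix has full column rank an optimal solution can be taken at a vertex that interpolates p observations. The sketch below exploits that property by brute-force enumeration, which is exact but only feasible for tiny data sets; practical code would call an LP solver. It covers only the univariate-response case, not the directional multiple-output version introduced in the paper, and the function names are hypothetical.

```python
import itertools

import numpy as np


def check_loss(u, tau):
    """Koenker-Bassett check function summed over residuals u."""
    return np.sum(u * (tau - (u < 0)))


def regression_quantile(X, y, tau):
    """Exact tau-th regression quantile by enumerating LP vertices.

    An optimal solution interpolates p observations, so for tiny data sets
    every p-subset can be tried. Cost grows like C(n, p).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    best_beta, best_loss = None, np.inf
    for idx in itertools.combinations(range(n), p):
        rows = list(idx)
        A = X[rows]
        if abs(np.linalg.det(A)) < 1e-12:
            continue  # these points do not determine a unique fit
        beta = np.linalg.solve(A, y[rows])
        loss = check_loss(y - X @ beta, tau)
        if loss < best_loss:
            best_beta, best_loss = beta, loss
    return best_beta, best_loss


x = np.arange(6.0)
y = 1.0 + 2.0 * x
y[5] = 30.0  # one gross outlier
X = np.column_stack([np.ones_like(x), x])
beta, loss = regression_quantile(X, y, tau=0.5)
```

With tau = 0.5 this is least absolute deviations regression, so the fitted line passes through the five collinear points and ignores the outlier.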

  18. Modeling Lateral and Longitudinal Control of Human Drivers with Multiple Linear Regression Models

    OpenAIRE

    Lenk, Jan; M, Claus

    2011-01-01

    In this paper, we describe results of modeling the lateral and longitudinal control behavior of drivers with simple multiple linear regression models. This approach fits into the Bayesian Programming (BP) approach (Bessi

  19. SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

    Science.gov (United States)

    As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...

  20. MULTIPLE REGRESSION MODELS FOR HINDCASTING AND FORECASTING MIDSUMMER HYPOXIA IN THE GULF OF MEXICO

    Science.gov (United States)

    A new suite of multiple regression models was developed that describes the relationship between the area of bottom-water hypoxia along the northern Gulf of Mexico and Mississippi-Atchafalaya River nitrate concentration, total phosphorus (TP) concentration, and discharge. Variabil...

  1. Multiple regression technique for Pth degree polynomials with and without linear cross products

    Science.gov (United States)

    Davis, J. W.

    1973-01-01

    A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated so that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs, which evaluate the various constants in the model regression equation, were developed for the case without linear cross products and for the case incorporating such cross products. In addition, the significance of the estimated regression equation is considered, and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute percent errors are evaluated. The computer programs and their manner of use are described. Sample problems are included to illustrate the use and capability of the technique; they show the output formats and typical plots comparing computer results to each set of input data.
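The technique can be sketched with a design matrix that holds powers 1..P of each of the N variables, optionally extended with pairwise products x_i·x_j (an assumption about what "linear cross products" means), followed by the summary statistics listed above. This is a minimal NumPy sketch with hypothetical names, not the original programs; the percent errors assume the dependent variable is never zero.

```python
import itertools

import numpy as np


def poly_design(X, degree, cross_products=False):
    """Columns: 1, x_j**d for every variable j and d = 1..degree, and
    optionally the pairwise products x_i * x_j (i < j)."""
    X = np.asarray(X, dtype=float)
    n, n_vars = X.shape
    columns = [np.ones(n)]
    for j in range(n_vars):
        columns += [X[:, j] ** d for d in range(1, degree + 1)]
    if cross_products:
        columns += [X[:, i] * X[:, j]
                    for i, j in itertools.combinations(range(n_vars), 2)]
    return np.column_stack(columns)


def fit_summary(X, y, degree, cross_products=False):
    """Least-squares fit plus the summary statistics of the technique."""
    y = np.asarray(y, dtype=float)
    A = poly_design(X, degree, cross_products)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    n, n_terms = A.shape
    k = n_terms - 1  # regressors excluding the intercept
    sse = np.sum((y - fitted) ** 2)
    ssr = np.sum((fitted - y.mean()) ** 2)
    pct_err = 100.0 * np.abs((y - fitted) / y)  # assumes y has no zeros
    return {
        "coef": coef,
        "std_dev": np.sqrt(sse / (n - k - 1)),
        "F": (ssr / k) / (sse / (n - k - 1)),
        "max_abs_pct_err": pct_err.max(),
        "mean_abs_pct_err": pct_err.mean(),
    }


rng = np.random.default_rng(5)
X = rng.uniform(-1.0, 1.0, size=(60, 2))
x0, x1 = X[:, 0], X[:, 1]
y = 10 + x0 + 0.5 * x0**2 + 3 * x1 + 2 * x0 * x1 + rng.normal(0.0, 0.01, 60)
with_cp = fit_summary(X, y, degree=2, cross_products=True)
without_cp = fit_summary(X, y, degree=2)
```

Comparing the two summaries on the same data shows the effect of the cross-product term: without it, the x0·x1 interaction ends up in the residuals and the percent errors grow.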

  2. Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA

    Science.gov (United States)

    Ohlmacher, G.C.; Davis, J.C.

    2003-01-01

    Landslides in the hilly terrain along the Kansas and Missouri rivers in northeastern Kansas have caused millions of dollars in property damage during the last decade. To address this problem, a statistical method called multiple logistic regression has been used to create a landslide-hazard map for Atchison, Kansas, and surrounding areas. Data included digitized geology, slopes, and landslides, manipulated using ArcView GIS. Logistic regression relates predictor variables to the occurrence or nonoccurrence of landslides within geographic cells and uses the relationship to produce a map showing the probability of future landslides, given local slopes and geologic units. Results indicated that slope is the most important variable for estimating landslide hazard in the study area. Geologic units consisting mostly of shale, siltstone, and sandstone were most susceptible to landslides. Soil type and aspect ratio were considered but excluded from the final analysis because these variables did not significantly add to the predictive power of the logistic regression. Soil types were highly correlated with the geologic units, and no significant relationships existed between landslides and slope aspect. © 2003 Elsevier Science B.V. All rights reserved.
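Logistic regression of this kind models the log-odds of a landslide in each cell as a linear function of the predictors and is usually fitted by iteratively reweighted least squares (Newton-Raphson). The sketch below is a generic implementation on simulated cells, assuming slope in degrees and a 0/1 indicator for a shale-dominated geologic unit; the predictors, coefficients and function names are hypothetical and are not taken from the study, which used ArcView GIS for the data handling.

```python
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-A @ beta))
        weights = prob * (1.0 - prob)
        hessian = A.T @ (A * weights[:, None])
        step = np.linalg.solve(hessian, A.T @ (y - prob))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def landslide_probability(beta, X):
    """Predicted probability of a landslide for each cell."""
    A = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    return 1.0 / (1.0 + np.exp(-A @ beta))


rng = np.random.default_rng(6)
n_cells = 2000
slope_deg = rng.uniform(0.0, 30.0, n_cells)
shale = (rng.random(n_cells) < 0.4).astype(float)  # hypothetical unit indicator
logit = -6.0 + 0.2 * slope_deg + 1.0 * shale
landslide = (rng.random(n_cells) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
X = np.column_stack([slope_deg, shale])
beta = fit_logistic(X, landslide)
hazard = landslide_probability(beta, X)
```

Mapping `hazard` back onto the cell grid gives the kind of probability map the abstract describes.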

  3. Egg hatchability prediction by multiple linear regression and artificial neural networks

    OpenAIRE

    AC Bolzan; RAF Machado; JCZ Piaia

    2008-01-01

    An artificial neural network (ANN) was compared with a multiple linear regression statistical method to predict hatchability in an artificial incubation process. A feedforward neural network architecture was applied. Networks were trained with the backpropagation algorithm on data obtained from industrial incubations. The ANN model was chosen because its output fitted the experimental data better than the multiple linear regression model, which used coefficients determi...

  4. Applying Multiple Linear Regression and Neural Network to Predict Bank Performance

    OpenAIRE

    Nor Mazlina Abu Bakar; Izah Mohd Tahir

    2009-01-01

    Globalization and technological advancement have created a highly competitive market in the banking and finance industry. The performance of the industry depends heavily on the accuracy of decisions made at the managerial level. This study uses the multiple linear regression technique and a feed-forward artificial neural network to predict bank performance. The study aims to predict bank performance using multiple linear regression and a neural network. The study then evaluates the performance of the t...

  5. Comparison of Fuzzy Inference System and Multiple Regression to Predict Synthetic Envelopes Clogging

    OpenAIRE

    Bakhtiar Karimi; Farhad Mirzaei; Mohammad Javad Nahvinia; Behnam Ababaei

    2010-01-01

    Geosynthetic materials are being used with acceptable performance in soil and water projects worldwide. Geotextiles are one category of geosynthetics used in drainage systems. The first generation of geotextiles was used in the late 1950s as an alternative to gravel envelopes. In this research, two methods (multiple regression and a fuzzy inference system) are evaluated for predicting synthetic envelope clogging. In the multiple regression method, the correlation coefficients for PP450, PP700 an...

  6. Neutron multiplicity analysis tool

    Energy Technology Data Exchange (ETDEWEB)

    Stewart, Scott L [Los Alamos National Laboratory]

    2010-01-01

    I describe the capabilities of the EXCOM (EXcel based COincidence and Multiplicity) calculation tool which is used to analyze experimental data or simulated neutron multiplicity data. The input to the program is the count-rate data (including the multiplicity distribution) for a measurement, the isotopic composition of the sample and relevant dates. The program carries out deadtime correction and background subtraction and then performs a number of analyses. These are: passive calibration curve, known alpha and multiplicity analysis. The latter is done with both the point model and with the weighted point model. In the current application EXCOM carries out the rapid analysis of Monte Carlo calculated quantities and allows the user to determine the magnitude of sample perturbations that lead to systematic errors. Neutron multiplicity counting is an assay method used in the analysis of plutonium for safeguards applications. It is widely used in nuclear material accountancy by international (IAEA) and national inspectors. The method uses the measurement of the correlations in a pulse train to extract information on the spontaneous fission rate in the presence of neutrons from (α,n) reactions and induced fission. The measurement is relatively simple to perform and gives results very quickly (≤ 1 hour). By contrast, destructive analysis techniques are extremely costly and time consuming (several days). By improving the achievable accuracy of neutron multiplicity counting, a nondestructive analysis technique, it could be possible to reduce the use of destructive analysis measurements required in safeguards applications. The accuracy of a neutron multiplicity measurement can be affected by a number of variables such as density, isotopic composition, chemical composition and moisture in the material. In order to determine the magnitude of these effects on the measured plutonium mass a calculational tool, EXCOM, has been produced using VBA within Excel.
This program was developed to help speed the analysis of Monte Carlo neutron transport simulation (MCNP) data, and only requires the count-rate data to calculate the mass of material using INCC's analysis methods instead of the full neutron multiplicity distribution required to run analysis in INCC. This paper describes what is implemented within EXCOM, including the methods used, how the program corrects for deadtime, and how uncertainty is calculated. This paper also describes how to use EXCOM within Excel.

  7. Multiple predictor smoothing methods for sensitivity analysis: Description of techniques

    International Nuclear Information System (INIS)

    The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (i) locally weighted regression (LOESS), (ii) additive models, (iii) projection pursuit regression, and (iv) recursive partitioning regression. Then, in the second and concluding part of this presentation, the indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present.
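The point about nonlinear relationships can be illustrated with a single-input screen that compares the R² of a straight-line fit with the R² of a LOESS smooth for each input. The sketch below implements a basic LOESS (local linear fits with tricube weights over the nearest `frac` share of points); it is a simplification of the stepwise multi-input procedures described, and the function names and test function are hypothetical.

```python
import numpy as np


def loess_1d(x, y, frac=0.3):
    """Locally weighted linear regression (LOESS) with tricube weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(int(np.ceil(frac * n)), 3)  # points in each local neighbourhood
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = max(np.sort(d)[k - 1], 1e-12)  # distance to the k-th nearest point
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3
        A = np.column_stack([np.ones(n), x - x[i]])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]  # local intercept = smoothed value at x[i]
    return fitted


def r2(y, fitted):
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)


def sensitivity(inputs, output, frac=0.3):
    """R^2 of a linear fit and of a LOESS fit of the output on each input."""
    results = []
    for x in inputs.T:
        slope, intercept = np.polyfit(x, output, 1)
        results.append((r2(output, slope * x + intercept),
                        r2(output, loess_1d(x, output, frac))))
    return results


rng = np.random.default_rng(7)
inputs = rng.uniform(-1.0, 1.0, size=(300, 2))
output = inputs[:, 0] ** 2 + 0.1 * inputs[:, 1] + rng.normal(0.0, 0.01, 300)
(lin_x1, loess_x1), (lin_x2, loess_x2) = sensitivity(inputs, output)
```

Here the output depends on the first input through a symmetric quadratic, so the linear R² for that input is near zero even though the LOESS smooth captures most of the variance; a linear screen would wrongly rank the weakly linear second input as more important.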

  8. The Precision Efficacy Analysis for Regression Sample Size Method.

    Science.gov (United States)

    Brooks, Gordon P.; Barcikowski, Robert S.

    The general purpose of this study was to examine the efficiency of the Precision Efficacy Analysis for Regression (PEAR) method for choosing appropriate sample sizes in regression studies used for precision. The PEAR method, which is based on the algebraic manipulation of an accepted cross-validity formula, essentially uses an effect size to…

  9. The Determination of Polyethylene Glycol and Water in Archaeological Wood using Infrared Spectroscopy and Stepwise Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Rohan PATEL

    2012-03-01

    Full Text Available Polyethylene glycol (PEG) is the most common preservative in use for bulking and maintaining structural integrity in waterlogged wood. Conservators therefore need to be able to determine PEG concentrations in wood non-destructively. We present a study highlighting the application of infrared spectroscopy coupled with multivariate analysis techniques to predict the concentrations of polyethylene glycol 400 (PEG-400) and water simultaneously. This technique uses attenuated total reflectance (ATR) spectroscopy and unconstrained stepwise multiple linear regression (SMLR) analysis to predict multiple components in archaeological wood. Using this model, we have calculated the concentrations of PEG-400 and water in treated archaeological waterlogged wood samples.
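Stepwise MLR of this kind can be sketched as forward selection: starting from an intercept-only model, repeatedly add the spectral variable with the largest partial F statistic until none exceeds an entry threshold. The threshold of 4.0, the function names and the simulated "spectra" are assumptions for illustration; the authors' exact selection criteria are not given in the abstract.

```python
import numpy as np


def residual_ss(A, y):
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def forward_stepwise(X, y, f_enter=4.0):
    """Forward selection: repeatedly add the column with the largest partial F
    statistic, stopping when no candidate reaches f_enter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    selected = []
    A = np.ones((n, 1))  # start from the intercept-only model
    current_rss = residual_ss(A, y)
    while len(selected) < p:
        best = None
        for j in range(p):
            if j in selected:
                continue
            A_j = np.column_stack([A, X[:, j]])
            df_resid = n - A_j.shape[1]
            if df_resid <= 0:
                continue
            new_rss = residual_ss(A_j, y)
            f = (np.inf if new_rss == 0
                 else (current_rss - new_rss) / (new_rss / df_resid))
            if best is None or f > best[0]:
                best = (f, j, A_j, new_rss)
        if best is None or best[0] < f_enter:
            break
        _, j, A, current_rss = best
        selected.append(j)
    return selected


rng = np.random.default_rng(8)
spectra = rng.normal(size=(40, 10))  # hypothetical absorbances at 10 wavenumbers
peg = 2.0 * spectra[:, 3] - 1.5 * spectra[:, 7] + rng.normal(0.0, 0.1, 40)
chosen = forward_stepwise(spectra, peg)
```

With only 40 samples and 10 candidate bands, a noise band can occasionally pass an entry threshold of 4.0 by chance, which is one reason stepwise models are usually checked on held-out samples.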

  10. A regression analysis of NHL cap hits

    OpenAIRE

    Flogvall, Carl; Nordenskjöld, Stefan

    2014-01-01

    This report studies whether multiple linear regression can be used to predict the cap hit of NHL hockey forwards. Data were collected during the 2010-2011, 2011-2012, and 2012-2013 seasons. The chosen variables were common hockey statistics and a few non-hockey-related ones, such as origin and age. The initial model was improved by removing insignificant covariates, detected by a BIC test and p-values. The final model consisted of 291 players and had an adjusted R² of 0.7820. Of the covaria...
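Pruning covariates by BIC can be sketched as backward elimination: drop, one at a time, the covariate whose removal lowers the BIC most, and stop when no removal helps. The sketch uses the Gaussian BIC n·ln(RSS/n) + m·ln(n), with m the number of fitted coefficients, and also reports the adjusted R². It ignores the p-value checks the report mentions, and all names and data are hypothetical.

```python
import numpy as np


def ols_fit_stats(A, y):
    """Adjusted R^2 and Gaussian BIC (up to an additive constant) of an OLS fit."""
    n, m = A.shape
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    k = m - 1  # number of covariates
    adj_r2 = 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))
    bic = n * np.log(rss / n) + m * np.log(n)
    return adj_r2, bic


def backward_eliminate_bic(X, y):
    """Drop covariates one at a time while doing so lowers the BIC."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    def design(cols):
        return np.column_stack([np.ones(n)] + [X[:, c] for c in cols])

    kept = list(range(X.shape[1]))
    current_bic = ols_fit_stats(design(kept), y)[1]
    while kept:
        trials = [(ols_fit_stats(design([c for c in kept if c != j]), y)[1], j)
                  for j in kept]
        best_bic, drop = min(trials)
        if best_bic >= current_bic:
            break
        kept.remove(drop)
        current_bic = best_bic
    return kept, ols_fit_stats(design(kept), y)[0]


rng = np.random.default_rng(9)
stats = rng.normal(size=(100, 5))  # hypothetical player covariates
cap_hit = 1.0 + 2.0 * stats[:, 0] + 0.5 * stats[:, 2] + rng.normal(0.0, 0.5, 100)
kept, adj_r2 = backward_eliminate_bic(stats, cap_hit)
```

The ln(n) penalty per coefficient is what makes BIC stricter than a plain R² comparison: a covariate survives only if it reduces the residual sum of squares enough to pay for its extra parameter.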

  11. Variable Importance in Multiple Regression and Canonical Correlation.

    Science.gov (United States)

    Thompson, Bruce

    This paper explains in user-friendly terms why multivariate statistics are so important in educational research. The basic logic of canonical correlation analysis is presented as a simple or bivariate Pearson "r" procedure. It is noted that all statistical tests implicitly involve the calculation of least squares weights, and that all parametric…
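The idea that canonical correlation is at heart a Pearson r can be checked numerically: the first canonical correlation equals the Pearson correlation between the two weighted composites Xa and Yb. The sketch below computes the weights from the whitened cross-covariance matrix; it is an illustrative NumPy implementation with hypothetical names, not code from the paper.

```python
import numpy as np


def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T


def first_canonical_pair(X, Y):
    """First canonical correlation and the weight vectors for X and Y."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    Yc = np.asarray(Y, dtype=float) - np.mean(Y, axis=0)
    n = len(Xc)
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return s[0], Kx @ U[:, 0], Ky @ Vt[0]


rng = np.random.default_rng(10)
X = rng.normal(size=(200, 3))
Y = (np.column_stack([X[:, 0] + X[:, 1], X[:, 2] - X[:, 0]])
     + rng.normal(size=(200, 2)))
rho, a, b = first_canonical_pair(X, Y)
# The canonical correlation is the Pearson r between the weighted composites
r = np.corrcoef(X @ a, Y @ b)[0, 1]
```

With a single column in Y, the first canonical correlation reduces to the multiple correlation R of an ordinary least-squares regression, which connects canonical analysis back to least-squares weights.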

  12. PRINCIPAL COMPONENTS ANALYSIS AND PARTIAL LEAST SQUARES REGRESSION

    Science.gov (United States)

    The mathematics behind the techniques of principal component analysis and partial least squares regression is presented in detail, starting from the appropriate extreme conditions. The meaning of the resultant vectors and many of their mathematical interrelationships are also pres...

  13. Combined linkage and segregation analysis using regressive models.

    OpenAIRE

    Bonney, G E; Lathrop, G.M.; Lalouel, J M

    1988-01-01

    Regressive models for segregation analysis have been extended to include multivariate data and linked marker loci. The new models have been applied to data from two pedigrees segregating a gene for cardiovascular disease.

  14. Improved Estimation in Multiple Linear Regression Models with Measurement Error and General Constraint

    OpenAIRE

    LIANG, Hua; Song, Weixing

    2009-01-01

    In this paper, we define two restricted estimators for the regression parameters in a multiple linear regression model with measurement errors when prior information for the parameters is available. We then construct two sets of improved estimators which include the preliminary test estimator, the Stein-type estimator and the positive rule Stein type estimator for both slope and intercept, and examine their statistical properties such as the asymptotic distributional quadratic biases, the asy...

  15. 3D Regression Heat Map Analysis of Population Study Data.

    Science.gov (United States)

    Klemm, Paul; Lawonn, Kai; Glaser, Sylvia; Niemann, Uli; Hegenscheid, Katrin; Volzke, Henry; Preim, Bernhard

    2016-01-01

    Epidemiological studies comprise heterogeneous data about a subject group to define disease-specific risk factors. These data contain information (features) about a subject's lifestyle and medical status, as well as medical image data. Statistical regression analysis is used to evaluate these features and to identify feature combinations indicating a disease (the target feature). We propose an approach to analyzing epidemiological data sets that incorporates all features in an exhaustive regression-based analysis. This approach combines all independent features w.r.t. a target feature. It provides a visualization that reveals insights into the data by highlighting relationships. The 3D Regression Heat Map, a novel 3D visual encoding, acts as an overview of the whole data set. It shows all combinations of two to three independent features with a specific target disease. Slicing through the 3D Regression Heat Map allows for the detailed analysis of the underlying relationships. Expert knowledge about disease-specific hypotheses can be included in the analysis by adjusting the regression model formulas. Furthermore, the influences of features can be assessed using a difference view comparing different calculation results. We applied our 3D Regression Heat Map method to a hepatic steatosis data set to reproduce results from a data mining-driven analysis. A qualitative analysis was conducted on a breast density data set. We were able to derive new hypotheses about relations between breast density and breast lesions with breast cancer. With the 3D Regression Heat Map, we present a visual overview of epidemiological data that allows, for the first time, an interactive regression-based analysis of large feature sets with respect to a disease. PMID:26529689

  16. The Study on Technology Innovation of Chinese Enterprises by Regression Analysis

    OpenAIRE

    ZIYAN ZHANG; xungang zheng

    2011-01-01

    Using China Science and Technology Data from recent years, we apply multiple regression to analyze the factors influencing technology innovation and demonstrate the impact of significant and non-significant factors related to China's investment expenditure policies for technological innovation, so as to provide guidance and reference for enhancing China's technological innovation capability and promoting domestic economic development.

  17. Egg hatchability prediction by multiple linear regression and artificial neural networks

    Scientific Electronic Library Online (English)

    Bolzan, AC; Machado, RAF; Piaia, JCZ.

    2008-06-01

    Full Text Available An artificial neural network (ANN) was compared with a multiple linear regression statistical method to predict hatchability in an artificial incubation process. A feedforward neural network architecture was applied. The network was trained with the backpropagation algorithm on data obtained [...] from industrial incubations. The ANN model was chosen because its output fit the experimental data better than the multiple linear regression model, whose coefficients were determined by the least squares method. The simulation results of these approaches indicate that this ANN can be used to predict incubation performance.

  18. An automatic method for producing robust regression models from hyperspectral data using multiple simple genetic algorithms

    Science.gov (United States)

    Sykas, Dimitris; Karathanassi, Vassilia

    2015-06-01

    This paper presents a new method for automatically determining the optimum regression model for estimating a parameter. The concept relies on combining k spectral pre-processing algorithms (SPPAs) that enhance spectral features correlated with the desired parameter. Initially, a pre-processing algorithm takes a single spectral signature as input and transforms it according to the SPPA function. A k-step combination of SPPAs applies k pre-processing algorithms serially: the output of each SPPA is used as input to the next SPPA, and so on until the k desired pre-processed signatures are obtained. These signatures are then used as input to three different regression methods: Normalized band Difference Regression (NDR), Multiple Linear Regression (MLR) and Partial Least Squares Regression (PLSR). Three Simple Genetic Algorithms (SGAs), one for each regression method, are used to select the optimum combination of k SPPAs. The performance of the SGAs is evaluated based on the RMS error of the regression models. The evaluation identifies not only the optimum SPPA combination but also the regression method that produces the optimum prediction model. The proposed method was applied to soil spectral measurements in order to predict Soil Organic Matter (SOM). In this study, the maximum value assigned to k was 3. PLSR yielded the highest accuracy, while NDR's accuracy was satisfactory considering its low complexity. MLR showed severe drawbacks due to noise in the form of collinearity among the spectral bands. Most of the regression methods required a 3-step combination of SPPAs to achieve the highest performance. The selected pre-processing algorithms differed for each regression method, since each method handles the explanatory variables differently.

  19. Linear regression analysis of survival data with missing censoring indicators.

    Science.gov (United States)

    Wang, Qihua; Dinse, Gregg E

    2011-04-01

    Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial. PMID:20559722

  20. Kinetic analysis of tumor regression during the course of radiotherapy

    International Nuclear Information System (INIS)

    A model of tumor regression during the course of radiotherapy is presented, which considers the lethal effect of radiation, the elimination of dying cells and the reproduction of surviving cells. The model was applied to multiple lung metastases in a woman, which were treated with different doses per fraction. The regression of tumor volume was measured from chest radiographs taken once a week during the treatment. The radiosensitivity of the tumor cells estimated by the model was D0 = 150 R and n = 5. The half-life of dying cells in the tumor was 15 days. (orig.)
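The abstract reports D0 and n but does not give the model equations. These two parameters conventionally belong to the multi-target single-hit survival curve, S(D) = 1 - (1 - exp(-D/D0))^n. The sketch below assumes that form. It also assumes first-order (exponential) clearance of lethally damaged cells at the reported 15-day half-life. The helper names and the 200 R per fraction schedule are hypothetical. This is not the authors' full kinetic model, which also includes reproduction of the surviving cells.

```python
import math

D0 = 150.0        # R, radiosensitivity parameter reported in the abstract
N_TARGETS = 5     # extrapolation number n reported in the abstract
HALF_LIFE = 15.0  # days, half-life of dying cells reported in the abstract


def surviving_fraction(dose, d0=D0, n=N_TARGETS):
    """Multi-target single-hit model (assumed form): S = 1 - (1 - exp(-D/D0))**n."""
    return 1.0 - (1.0 - math.exp(-dose / d0)) ** n


def surviving_after_fractions(dose_per_fraction, fractions):
    """Fraction surviving repeated equal fractions, ignoring repair and regrowth."""
    return surviving_fraction(dose_per_fraction) ** fractions


def dying_cells_remaining(initial, days, half_life=HALF_LIFE):
    """First-order elimination of lethally damaged cells (assumed exponential)."""
    return initial * 0.5 ** (days / half_life)


# Hypothetical schedule: 200 R per fraction, 5 fractions.
print(round(surviving_fraction(200.0), 3))
print(round(surviving_after_fractions(200.0, 5), 4))
print(dying_cells_remaining(100.0, 30.0))  # 25.0 after two half-lives
```

Because n > 1, the survival curve has a shoulder: small doses kill proportionally fewer cells than a pure exponential with the same D0 would. The 15-day half-life explains why visible volume regression lags behind cell killing.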

  1. Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg

    2007-01-01

    This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy including land surveying and satellite positioning applications. In these fields regression is often termed adjustment. The note also contains a couple of typical land surveying and satellite positioning application examples. In these application areas we are typically interested in the parameters in the model typically 2- or 3-D positions and not in predictive modelling which is oft...

  2. A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants

    Science.gov (United States)

    Cooper, Paul D.

    2010-01-01

    A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
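The regression behind such a procedure can be sketched as follows. The sketch assumes the usual anharmonic-oscillator expression for the vibronic band positions:

nu(v', v'') = Te + we'(v' + 1/2) - wexe'(v' + 1/2)^2 - [we''(v'' + 1/2) - wexe''(v'' + 1/2)^2]

This expression is linear in the five unknown constants. A single multiple linear regression on the columns 1, (v' + 1/2), (v' + 1/2)^2, (v'' + 1/2) and (v'' + 1/2)^2 therefore recovers all five. The described method uses a Microsoft Excel function; the plain-Python normal-equation solver here is a stand-in. The "true" constants are round illustrative numbers, not measured iodine values.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def design_row(v_up, v_low):
    """Regressors 1, a, a^2, b, b^2 with a = v' + 1/2 and b = v'' + 1/2."""
    a, b = v_up + 0.5, v_low + 0.5
    return [1.0, a, a * a, b, b * b]


# Illustrative constants in cm^-1 (round numbers, not measured I2 values).
TRUE = {"Te": 15800.0, "we_up": 125.0, "wexe_up": 0.8, "we_low": 214.0, "wexe_low": 0.6}


def band_position(v_up, v_low, c=TRUE):
    """Band position from the assumed anharmonic-oscillator expression."""
    a, b = v_up + 0.5, v_low + 0.5
    upper = c["Te"] + c["we_up"] * a - c["wexe_up"] * a * a
    lower = c["we_low"] * b - c["wexe_low"] * b * b
    return upper - lower


pairs = [(vu, vl) for vu in range(5, 25) for vl in range(0, 3)]
X = [design_row(vu, vl) for vu, vl in pairs]
y = [band_position(vu, vl) for vu, vl in pairs]

c0, c1, c2, c3, c4 = fit(X, y)
# Map regression coefficients back to molecular constants (note the signs).
estimates = {"Te": c0, "we_up": c1, "wexe_up": -c2, "we_low": -c3, "wexe_low": c4}
for name, value in estimates.items():
    print(f"{name}: {value:.4f}")
```

With real spectra, the band positions carry measurement noise. The standard errors of the five coefficients then matter as much as their values.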

  3. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

    Science.gov (United States)

    Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...

  4. Calculation of U, Ra, Th and K contents in uranium ore by multiple linear regression method

    International Nuclear Information System (INIS)

    A multiple linear regression method was used to process γ spectra of uranium ore samples and to calculate the contents of U, Ra, Th and K. Compared with the inverse matrix method, its advantage is that no standard samples of pure U, Ra, Th and K are needed to obtain the response coefficients.

  5. Tumor regression of multiple bone metastases from breast cancer after administration of strontium-89 chloride (Metastron)

    OpenAIRE

    Heianna, Joichi; Miyauchi, Takaharu; Endo, Wataru; Miura, Naoki; Terui, Kazuyuki; Kamata, Syuichi; Hashimoto, Manabu

    2014-01-01

    We report a case of tumor regression of multiple bone metastases from breast carcinoma after administration of strontium-89 chloride. This case suggests that strontium-89 chloride can not only relieve bone metastasis pain that does not respond to analgesics, but may also have a tumoricidal effect on bone metastases.

  6. Tumor regression of multiple bone metastases from breast cancer after administration of strontium-89 chloride (Metastron)

    International Nuclear Information System (INIS)

    We report a case of tumor regression of multiple bone metastases from breast carcinoma after administration of strontium-89 chloride. This case suggests that strontium-89 chloride can not only relieve bone metastasis pain that does not respond to analgesics, but may also have a tumoricidal effect on bone metastases.

  7. Regression Model Optimization for the Analysis of Experimental Data

    Science.gov (United States)

    Ulbrich, N.

    2009-01-01

    A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold-dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
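A PRESS residual is the prediction error for one observation when the model is fitted without that observation. Its standard deviation therefore rewards models that predict well rather than models that merely interpolate the data. The sketch below computes PRESS residuals by brute-force leave-one-out refitting. For least squares this matches the hat-matrix shortcut e_i / (1 - h_ii). It then uses their standard deviation to choose between two candidate models. The data, the fixed disturbance pattern and the candidate term sets are hypothetical. The singular-value, term-significance and near-linear-dependency constraints described above are not reproduced.

```python
import statistics


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def press_residuals(X, y):
    """Leave-one-out prediction errors y_i - yhat_(i)."""
    residuals = []
    for i in range(len(y)):
        beta = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        prediction = sum(b * v for b, v in zip(beta, X[i]))
        residuals.append(y[i] - prediction)
    return residuals


def press_std(X, y):
    """Search metric: standard deviation of the PRESS residuals."""
    return statistics.stdev(press_residuals(X, y))


# Hypothetical data: quadratic response plus a small fixed disturbance.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
noise = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03, 0.02, -0.01]
ys = [1 + 2 * x + 0.5 * x * x + e for x, e in zip(xs, noise)]

candidates = {
    "linear": [[1.0, x] for x in xs],
    "quadratic": [[1.0, x, x * x] for x in xs],
}
scores = {name: press_std(X, ys) for name, X in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

Brute-force refitting costs one fit per observation. That is fine for small calibration sets, and it makes the definition explicit. For large data sets the hat-matrix shortcut gives the same residuals from a single fit.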

  8. Evaluation Applications of Regression Analysis with Time-Series Data.

    Science.gov (United States)

    Veney, James E.

    1993-01-01

    The application of time series analysis is described, focusing on the use of regression analysis for analyzing time series in a way that may make it more readily available to an evaluation practice audience. Practical guidelines are suggested for decision makers in government, health, and social welfare agencies. (SLD)

  9. The Sage handbook of regression analysis and causal inference

    CERN Document Server

    Best, Henning

    2014-01-01

    Covering both general and advanced aspects of multivariate methods, this handbook focuses on regression analysis of cross-sectional and longitudinal data with an emphasis on causal analysis and provides readers with an introduction to and exploration of a large range of techniques.

  10. High-Dose Vitamin C Promotes Regression of Multiple Pulmonary Metastases Originating from Hepatocellular Carcinoma.

    Science.gov (United States)

    Seo, Min-Seok; Kim, Ja-Kyung; Shim, Jae-Yong

    2015-09-01

    We report a case of regression of multiple pulmonary metastases, which originated from hepatocellular carcinoma after treatment with intravenous administration of high-dose vitamin C. A 74-year-old woman presented to the clinic for her cancer-related symptoms such as general weakness and anorexia. After undergoing initial transarterial chemoembolization (TACE), local recurrence with multiple pulmonary metastases was found. She refused further conventional therapy, including sorafenib tosylate (Nexavar). She did receive high doses of vitamin C (70 g), which were administered into a peripheral vein twice a week for 10 months, and multiple pulmonary metastases were observed to have completely regressed. She then underwent subsequent TACE, resulting in remission of her primary hepatocellular carcinoma. PMID:26256994

  11. Multivariate sparse group lasso for the multivariate multiple linear regression with an arbitrary group structure.

    Science.gov (United States)

    Li, Yanming; Nan, Bin; Zhu, Ji

    2015-06-01

    We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functional groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study. PMID:25732839

  12. Ratio Versus Regression Analysis: Some Empirical Evidence in Brazil

    Directory of Open Access Journals (Sweden)

    Newton Carneiro Affonso da Costa Jr.

    2004-06-01

    Full Text Available This work compares the traditional methodology of ratio analysis, applied to a sample of Brazilian firms, with the alternative methodology of regression analysis, using both cross-industry and intra-industry samples. The structural validity of the traditional methodology was tested through a model that represents its analogous regression format. The data come from 156 Brazilian public companies in nine industrial sectors for the year 1997. The results provide weak empirical support for the traditional ratio methodology, since the validity of this methodology was found to differ between ratios.

  13. Sintering equation: determination of its coefficients by experiments - using multiple regression

    International Nuclear Information System (INIS)

    Sintering is a method for the volume compression (or volume contraction) of powdered or granular material by applying high temperature (below the melting point of the material). Maekipirtti tried to find an equation that describes the sintering process in terms of its main parameters: sintering time, sintering temperature and volume contraction. Such an equation is called a sintering equation. It also contains some coefficients that characterise the behaviour of the material during sintering. These coefficients have to be determined experimentally. Here we show that some linear regressions produce wrong coefficients, whereas multiple regression results in a useful sintering equation. (orig.)

  14. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    Science.gov (United States)

    Ulbrich, N.; Bader, Jon B.

    2010-01-01

    Calibration data of a wind tunnel sting balance were processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration, the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation in which the terms of a regression model of a balance can be derived directly from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.

  15. Analysis of Production Efficiency, Scale and Elasticity Using the Cobb-Douglas Approach and Multiple Regression

    Directory of Open Access Journals (Sweden)

    Yuliastuti Ramadhani

    2011-06-01

    Full Text Available Generally, productivity is interpreted as the relation between input and output, that is, the ratio of output to input. The measurement of productivity is one of the major indicators in assessing a company's competitiveness. PT Taman Batu Alam is a natural stone company that, as it grows, has continually sought to increase productivity by making improvements in production. The measurement and performance analysis of the transformation process were carried out using multiple regression analysis. This model was selected because its form is simple and easy to understand. It directly depicts the performance measures, namely the efficiency index and the production function, which shows the elasticity of each input used to produce the output. The calculations show that the efficiency index was 5.57 for 2007 and 1094.44 for 2008. Returns to scale were increasing in 2007 and decreasing in 2008. The input elasticities for 2007 were 0.39 for raw material, 0.22 for labour and 0.42 for overhead expenses. For 2008 they were 0.39 for raw material, 0.165 for labour and 0.237 for overhead expenses.
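The link between the reported elasticities and returns to scale can be checked directly. Assume the usual Cobb-Douglas form Q = A * M^a * L^b * O^c, with M raw material, L labour and O overhead. Taking logarithms gives ln Q = ln A + a ln M + b ln L + c ln O. A multiple regression on log-transformed data therefore estimates the elasticities a, b and c as its slope coefficients. Their sum indicates returns to scale: above 1 means increasing returns, below 1 decreasing returns. The sketch below applies only this last step to the elasticities reported in the abstract. The function name is hypothetical.

```python
def returns_to_scale(elasticities, tol=1e-9):
    """Classify returns to scale from the sum of Cobb-Douglas output elasticities."""
    total = sum(elasticities.values())
    if total > 1.0 + tol:
        return total, "increasing"
    if total < 1.0 - tol:
        return total, "decreasing"
    return total, "constant"


# Elasticities as reported in the abstract.
reported = {
    2007: {"raw_material": 0.39, "labour": 0.22, "overhead": 0.42},
    2008: {"raw_material": 0.39, "labour": 0.165, "overhead": 0.237},
}

for year, elasticities in reported.items():
    total, kind = returns_to_scale(elasticities)
    print(year, round(total, 3), kind)
# 2007 1.03 increasing
# 2008 0.792 decreasing
```

Both sums agree with the abstract's statement that returns to scale were increasing in 2007 and decreasing in 2008.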

  16. Multiple regression as a preventive tool for determining the risk of Legionella spp.

    Directory of Open Access Journals (Sweden)

    Enrique Gea-Izquierdo

    2012-04-01

    Full Text Available To determine the interrelationship between health and hygiene conditions for the prevention of legionellosis, the composition of materials used in water distribution systems, the water origin and the risk of Legionella pneumophila. Material and methods. A descriptive study and a multiple regression analysis were carried out on a sample of golf course sprinkler irrigation systems (n=31) belonging to hotels located on the Costa del Sol (Malaga, Spain). The study was carried out in 2009. Results. A significant linear relationship was found, with all the independent variables contributing significantly (p<0.05) to the model's fit. The relationships between water type and Legionella risk, and between material composition and Legionella risk, are linear and positive. In contrast, the relationship between health-hygiene conditions and Legionella risk is linear and negative. Conclusion. Characterizing Legionella pneumophila concentration, as defined by the risk in water, through the predictive method can help identify new variables that influence the development of the agent, resulting in improved control and prevention of the disease.

  17. QSAR study of prolylcarboxypeptidase inhibitors by genetic algorithm: Multiple linear regressions

    Indian Academy of Sciences (India)

    Eslam Pourbasheer; Saadat Vahdani; Reza Aalizadeh; Alireza Banaei; Mohammad Reza Ganjali

    2015-07-01

    A predictive analysis based on quantitative structure-activity relationships (QSAR) was performed on benzimidazolepyrrolidinyl amides as prolylcarboxypeptidase (PrCP) inhibitors. Molecules were represented by chemical descriptors that encode constitutional, topological, geometrical and electronic structure features. The hierarchical clustering method was used to divide the dataset into training and test subsets. The important descriptors were selected with the aid of the genetic algorithm method. The QSAR model was constructed using multiple linear regression (MLR), and its robustness and predictability were verified by internal and external cross-validation methods. Furthermore, the calculation of the domain of applicability defines the area of reliable predictions. The root mean square errors (RMSE) of the training set and the test set for the GA-MLR model were 0.176 and 0.279, and the correlation coefficients (R²) were 0.839 and 0.923, respectively. The proposed model shows good stability, robustness and predictability when verified by internal and external validation.
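The two reported statistics can be computed as follows. The sketch uses the coefficient of determination, R² = 1 - SS_res / SS_tot. Some QSAR studies instead report the squared Pearson correlation between observed and predicted activities, and the two can differ on an external test set. The activity values in the usage lines are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs. predicted activities for a small test set.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(round(rmse(observed, predicted), 3), r_squared(observed, predicted))
# 0.577 0.5
```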

  18. Spatial data, analysis, and regression - a mini course

    Directory of Open Access Journals (Sweden)

    Daniel Arribas-Bel

    2014-10-01

    Full Text Available This resource contains the materials and structure suggested to run a mini course of approximately 14 hours on spatial data, analysis and regression. The course is structured along four lectures and four labs that require the use of computers.

  19. Multiple regression models for the prediction of the maximum obtainable thermal efficiency of organic Rankine cycles

    International Nuclear Information System (INIS)

    Much attention is focused on increasing the energy efficiency to decrease fuel costs and CO2 emissions throughout industrial sectors. The ORC (organic Rankine cycle) is a relatively simple but efficient process that can be used for this purpose by converting low and medium temperature waste heat to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary conditions of the process. Hundreds of optimised cases with varied design parameters are used as observations in four multiple regression analyses. We analyse the model assumptions, prediction abilities and extrapolations, and compare the results with recent studies in the literature. The models are in agreement with the literature, and they present an opportunity for accurate prediction of the potential of an ORC to convert heat sources with temperatures from 80 to 360 °C, without detailed knowledge or need for simulation of the process. - Highlights: • The maximum thermal efficiency of ORCs in hundreds of cases was analysed. • Multiple regression models were derived to predict the maximum obtainable efficiency of ORCs. • Using only key design parameters, the maximum obtainable efficiency can be evaluated. • The regression models decrease the resources needed to evaluate the maximum potential. • The models are statistically strong and in good agreement with the literature

  20. User's Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)

    Science.gov (United States)

    Eng, Ken; Chen, Yin-Yu; Kiang, Julie E.

    2009-01-01

    Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce and familiarize the user with the weighted multiple-linear regression (WREG) program, and to also provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages with short records. The regional estimation equation results from a multiple-linear regression that relates the observable basin characteristics, such as drainage area, to streamflow characteristics.

  1. Estimation of Parameters in Heteroscedastic Multiple Regression Model using Leverage Based Near-Neighbors

    Directory of Open Access Journals (Sweden)

    H. Midi

    2009-01-01

    Full Text Available In this study, we propose a Leverage Based Near-Neighbor (LBNN) method, in which prior information on the structure of the heteroscedastic error is not required. In the proposed LBNN method, weights are determined not from the near-neighbor values of the explanatory variables, but from their corresponding leverage values, so that it can be readily applied to a multiple regression model. Both the empirical and Monte Carlo simulation results show that the LBNN method offers substantial improvement over the existing methods. The LBNN significantly reduced the standard errors of the estimates and also the standard errors of residuals for both simple and multiple linear regression models. Hence, the LBNN can be regarded as a reliable alternative to other existing methods that deal with heteroscedastic errors when the form of heteroscedasticity is unknown.

  2. Mass estimation of loose parts in nuclear power plant based on multiple regression

    International Nuclear Information System (INIS)

    Based on the application of the Hilbert–Huang transform to non-stationary signals, and on the relation between the mass of a loose part in a nuclear power plant and its frequency content, this paper proposes a new method for loose part mass estimation based on the marginal Hilbert–Huang spectrum (MHS) and multiple regression. The frequency spectrum of a loose part in a nuclear power plant can be expressed by the MHS. A multiple regression model constructed from the MHS features of the impact signals is used to predict the unknown mass of a loose part. A simulated experiment verified that the method is feasible and that the errors of the results are acceptable. (paper)

  3. Neural network and multiple linear regression to predict school children dimensions for ergonomic school furniture design.

    Science.gov (United States)

    Agha, Salah R; Alnahhal, Mohammed J

    2012-11-01

    The current study investigates the possibility of obtaining the anthropometric dimensions, critical to school furniture design, without measuring all of them. The study first selects some anthropometric dimensions that are easy to measure. Two methods are then used to check if these easy-to-measure dimensions can predict the dimensions critical to the furniture design. These methods are multiple linear regression and neural networks. Each dimension that is deemed necessary to ergonomically design school furniture is expressed as a function of some other measured anthropometric dimensions. Results show that out of the five dimensions needed for chair design, four can be related to other dimensions that can be measured while children are standing. Therefore, the method suggested here would definitely save time and effort and avoid the difficulty of dealing with students while measuring these dimensions. In general, it was found that neural networks perform better than multiple linear regression in the current study. PMID:22365329

  4. The use of weighted multiple linear regression to estimate QTL-by-QTL epistatic effects

    OpenAIRE

    Jan Bocianowski

    2012-01-01

    Knowledge of the nature and magnitude of gene effects, as well as their contribution to the control of metric traits, is important in formulating efficient breeding programs for the improvement of plant genetics. Information concerning a genetic parameter such as the additive-by-additive epistatic effect can be useful in traditional breeding. This report describes the results obtained by applying weighted multiple linear regression to estimate the parameter connected with an additive-by-addit...

  5. An Algorithm to Estimate Continuous-time Traffic Speed Using Multiple Regression Model

    OpenAIRE

    Xin Jin; Suk-Kyo Hong; Qiang Ma

    2006-01-01

    In this study we present a novel algorithm to estimate continuous-time traffic speed data using multiple regression based on the correlated speed and then compare its results to other baseline missing speed prediction methods with real freeway traffic speed data. Since this approach has greater generalization ability for given real speed data, it is believed that this model will also perform well for all time-series missing data estimation fields.

  6. Estimation of Parameters in Heteroscedastic Multiple Regression Model using Leverage Based Near-Neighbors

    OpenAIRE

    H. Midi; Rana, S; A. H. M. R. Imon

    2009-01-01

    In this study, we propose a Leverage Based Near-Neighbor (LBNN) method where prior information on the structure of the heteroscedastic error is not required. In the proposed LBNN method, weights are determined not from the near-neighbor values of the explanatory variables, but from their corresponding leverage values so that it can be readily applied to a multiple regression model. Both the empirical and Monte Carlo simulation results show that the LBNN method offers substantial improvement o...

  7. Multiple linear regression MOS for short-term wind power forecast

    OpenAIRE

    Ranaboldo, Matteo

    2011-01-01

    Short-term (0 - 36 h ahead) wind power forecast is a central issue for the correct management of a grid connected wind farm. A combination of physical and statistical treatments to post-process Numerical Weather Predictions (NWP) outputs is needed for successful short-term wind power forecasts. One of the most promising and effective approaches for statistical treatment is the Model Output Statistics (MOS) technique. In this study a MOS based on multiple linear regression is proposed: the mod...

  8. Statistical studies of mortality and air pollution multiple regression analyses by cause of death

    Energy Technology Data Exchange (ETDEWEB)

    Lipfert, F.W.

    1980-10-01

    Multiple regression analyses relating community air quality, socioeconomic variables, and mortality rates for all cancers, respiratory system cancer, respiratory disease, and external causes, for U.S. cities during 1969-71 are presented. Socioeconomic variables included an index of cigarette smoking, which was highly significant. Most air pollution variables were not significant, with the exception of the trace metal manganese, which was associated with cancers and respiratory disease. (33 references, 8 tables)

  9. Including climate into the assessment of future fish recruitment, using multiple regression models.

    OpenAIRE

    Stiansen, Jan Erik; Aglen, Asgeir; Bogstad, Bjarte; Loeng, Harald; Mehl, Sigbjørn; Nakken, Odd; Ottersen, Geir; Svendsen, Einar

    2005-01-01

    Climate variability has generally not been included in the assessment of fish stocks in the Barents Sea and Norwegian Sea. However, in recent years there has been a focus on implementing climate variability in the assessment for several stocks in both areas. A promising approach, using linear multiple regression models, has been applied for short time projections of recruitment of Northeast Arctic cod, Norwegian spring spawning herring and Barents Sea capelin. Environmental factors influence ...

  10. Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg

    2007-01-01

    This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy, including land surveying and satellite positioning applications. In these fields regression is often termed adjustment. The note also contains a couple of typical land surveying and satellite positioning application examples. In these application areas we are typically interested in the parameters in the model, typically 2- or 3-D positions, and not in predictive modelling, which is often the main concern in other regression analysis applications. Adjustment is often used to obtain estimates of relevant parameters in an over-determined system of equations, which may arise from deliberately carrying out more measurements than actually needed to determine the set of desired parameters. An example may be the determination of a geographical position based on information from a number of Global Navigation Satellite System (GNSS) satellites, also known as space vehicles (SV). It takes at least four SVs to determine the position (and the clock error) of a GNSS receiver. Often more than four SVs are used, and we use adjustment to obtain a better estimate of the geographical position (and the clock error) and to obtain estimates of the uncertainty with which the position is determined. Regression analysis is used in many other fields of application in the natural, technical and social sciences. Examples may be curve fitting, calibration, establishing relationships between different variables in an experiment or in a survey, etc. Regression analysis is probably one of the most used statistical techniques around. Dr. Anna B. O. Jensen provided insight and data for the Global Positioning System (GPS) example. Matlab code and sections that are considered either traditional land surveying material or advanced material are typeset with smaller fonts. General comments, or comments on, for example, unavoidable typos, shortcomings and errors, are most welcome.
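The core computation of such an adjustment is the weighted least-squares solution of an over-determined linear system A x = b. The estimate is x_hat = (A^T P A)^-1 A^T P b, with weights P = diag(1 / sigma_i^2). The estimated parameter covariance is s0^2 (A^T P A)^-1, where s0^2 = v^T P v / (n - u) is the a posteriori variance factor and v are the residuals. The sketch below applies this to a small hypothetical levelling network: a fixed benchmark A, the unknown heights of B and C, and three observed height differences. The note itself uses Matlab and a GNSS example, so the network, numbers and function names here are illustrative only.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_lsq(A, b, sigmas):
    """Weighted least-squares adjustment: estimates, residuals Ax - b, covariance."""
    w = [1.0 / s ** 2 for s in sigmas]  # P = diag(1 / sigma^2)
    u = len(A[0])
    normal = [[sum(wi * row[i] * row[j] for row, wi in zip(A, w)) for j in range(u)]
              for i in range(u)]
    rhs = [sum(wi * row[i] * bi for row, bi, wi in zip(A, b, w)) for i in range(u)]
    x = solve(normal, rhs)
    v = [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    s0_sq = sum(wi * vi ** 2 for wi, vi in zip(w, v)) / (len(b) - u)  # redundancy n - u
    # Covariance s0^2 * N^-1, with N^-1 built column by column from unit vectors.
    cols = [solve(normal, [1.0 if k == j else 0.0 for k in range(u)]) for j in range(u)]
    cov = [[s0_sq * cols[j][i] for j in range(u)] for i in range(u)]
    return x, v, cov


H_A = 100.0  # fixed benchmark height in metres (hypothetical)
# Unknowns [H_B, H_C]; observations dH_AB, dH_BC, dH_AC in metres (hypothetical).
A = [[1.0, 0.0], [-1.0, 1.0], [0.0, 1.0]]
b = [1.002 + H_A, 2.001, 2.998 + H_A]
sigmas = [0.001, 0.001, 0.001]

x, v, cov = weighted_lsq(A, b, sigmas)
print([round(h, 4) for h in x])  # [101.0003, 102.9997]
```

The loop misclosure of 5 mm (1.002 + 2.001 - 2.998) is spread equally over the three equally weighted observations. Unequal sigmas would shift more of the correction onto the less precise observations.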

  11. [Correlation-regression analysis of the impact of motor transport on human health].

    Science.gov (United States)

    Shavrak, E I; Shapkina, T S; Shavrak, D S

    2009-01-01

    Statistical analysis of the time series and spatial data of sociohygienic monitoring has yielded models of simple and multiple linear regression that reflect the impact of motor transport on human health, together with their statistical stability. The morbidity rates have been ranked by their response to changes in motor transport values. The contribution of environmental factors to the morbidity rates has been determined. Areas of application for the constructed models are proposed. PMID:19354177

  12. Solar Flare Prediction Using Kernel-based Regression Analysis

    Science.gov (United States)

    Fu, Gang; Jing, J.; Song, H.; Shih, F. Y.; Wang, H.

    2007-05-01

    Automated forecasting of the onset of solar flares from the analysis of photospheric magnetogram data remains an essential and challenging task. In this study, we present a novel kernel-based regression method for predicting the probability distribution of the flare index of an active region. The target variable, the soft X-ray flare index, quantifies the flare productivity of an active region within the chosen time window. The predictor vector of an active region includes several magnetic parameters derivable from the MDI line-of-sight magnetograms (e.g., total unsigned magnetic flux, the length of the magnetic neutral lines with strong magnetic gradient, etc.). By applying kernel functions, the predictor vectors are implicitly mapped into a high-dimensional feature space that is more informative than the original input space. The regression analysis is then conducted in this feature space. Compared with conventional statistical regression analysis, kernel-based methods have shown great advantages. Details of the method and the data analysis procedure are first described in the paper. We then apply the method to a large sample dataset of active regions (NOAA 7961 - 10933). The experimental results are presented, showing that our method is of practical significance in automated flare forecasting.
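    Kernel-based regression of the kind described above can be illustrated with kernel ridge regression, sketched below in Python/NumPy. The abstract does not say which kernel or regression model was used, so the Gaussian (RBF) kernel, the ridge penalty, the parameter values and the toy 1-D data are all assumptions, and the sketch predicts a point value rather than the probability distribution of the flare index that the authors estimate. The "implicit mapping" shows up in the fact that only the kernel matrix K is ever formed; the high-dimensional feature vectors are never computed.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


class KernelRidge:
    """Kernel ridge regression: ridge regression in the implicit RBF feature space."""

    def __init__(self, gamma=1.0, alpha=1e-3):
        self.gamma = gamma    # kernel width parameter (assumed; would be tuned)
        self.alpha = alpha    # ridge penalty (assumed; would be tuned)

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        K = rbf_kernel(self.X_train, self.X_train, self.gamma)
        n = K.shape[0]
        # Dual coefficients solve (K + alpha * I) c = y
        self.dual_coef = np.linalg.solve(K + self.alpha * np.eye(n), np.asarray(y, dtype=float))
        return self

    def predict(self, X_new):
        K_new = rbf_kernel(np.asarray(X_new, dtype=float), self.X_train, self.gamma)
        return K_new @ self.dual_coef


# Toy 1-D problem standing in for magnetic-parameter predictor vectors
X = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]
y = np.sin(X[:, 0])
model = KernelRidge(gamma=1.0, alpha=1e-3).fit(X, y)
pred = model.predict(np.array([[1.0], [2.5], [4.0]]))
```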

  13. Early cost estimating for road construction projects using multiple regression techniques

    Directory of Open Access Journals (Sweden)

    Ibrahim Mahamid

    2011-12-01

    Full Text Available The objective of this study is to develop early cost estimating models for road construction projects using multiple regression techniques, based on 131 sets of data collected in the West Bank in Palestine. As cost estimates are required at the early stages of a project, consideration was given to the fact that the input data for the required regression model could be easily extracted from sketches or the scope definition of the project. Eleven regression models were developed to estimate the total cost of road construction projects in US dollars; 5 of them include bid quantities as input variables and 6 include road length and road width. The coefficient of determination r² for the developed models ranges from 0.92 to 0.98, which indicates that the predicted values from the forecast models fit the real-life data well. The values of the mean absolute percentage error (MAPE) of the developed regression models range from 13% to 31%; these results compare favorably with past research, which has shown that estimate accuracy in the early stages of a project is between ±25% and ±50%.
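    The two accuracy measures quoted above can be computed as follows. This Python/NumPy sketch uses the standard definitions of R² and MAPE; the function names and the three cost figures are hypothetical and are not taken from the study.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


# Hypothetical actual vs. estimated project costs (US dollars)
actual = [100.0, 200.0, 400.0]
estimated = [110.0, 180.0, 400.0]
print(r_squared(actual, estimated), mape(actual, estimated))
```

    Here R² is about 0.989 while MAPE is about 6.7%. The two measures can diverge because R² is dominated by the largest projects, whereas MAPE weights every project's relative error equally, so a high r² and a sizeable MAPE can coexist.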

  14. Multivariate Regression Analysis of Gravitational Waves from Rotating Core Collapse

    OpenAIRE

    Engels, William J.; Frey, Raymond; Ott, Christian D.

    2014-01-01

    We present a new multivariate regression model for analysis and parameter estimation of gravitational waves observed from well but not perfectly modeled sources such as core-collapse supernovae. Our approach is based on a principal component decomposition of simulated waveform catalogs. Instead of reconstructing waveforms by direct linear combination of physically meaningless principal components, we solve via least squares for the relationship that encodes the connection be...

  15. Cefoperazone: regression analysis, disk content, and disk susceptibility testing considerations.

    OpenAIRE

    Wright, D. N.; Welch, D. F.; Saxon, B A; Clark, S.J.; Matsen, J. M.

    1982-01-01

    Cefoperazone agar dilution minimal inhibitory concentration (MIC) susceptibility results were compared with zones of inhibition produced by disk diffusion susceptibility testing. Disks containing 30, 50, 75, and 100 micrograms of cefoperazone were tested for purposes of regression line comparisons and error rate-bounded analysis. Results suggest that if the MIC equivalent of susceptibility is 32 micrograms/ml, either a 50-micrograms disk with zone sizes of susceptibility (S) greater than or e...

  16. Image metadata estimation using independent component analysis and regression

    OpenAIRE

    Blighe, Michael; Le Borgne, Hervé; O'Connor, Noel E

    2006-01-01

    In this paper, we describe an approach to camera metadata estimation using regression based on Independent Component Analysis (ICA). Semantic scene classification of images using camera metadata related to capture conditions has had some success in the past. However, different makes and models of camera capture different types of metadata and this severely hampers the application of this kind of approach in real systems that consist of photos captured by many different users. We propose to ad...

  17. Regression analysis of a chemical reaction fouling model

    International Nuclear Information System (INIS)

    A previously reported mathematical model for the initial chemical reaction fouling of a heated tube is critically examined in the light of the experimental data for which it was developed. A regression analysis of the model with respect to that data shows that the reference point upon which the two adjustable parameters of the model were originally based was well chosen, albeit fortuitously. (author). 3 refs., 2 tabs., 2 figs

  18. The use of weighted multiple linear regression to estimate QTL-by-QTL epistatic effects

    Scientific Electronic Library Online (English)

    Jan, Bocianowski.

    Full Text Available Knowledge of the nature and magnitude of gene effects, as well as their contribution to the control of metric traits, is important in formulating efficient breeding programs for the improvement of plant genetics. Information concerning a genetic parameter such as the additive-by-additive epistatic effect can be useful in traditional breeding. This report describes the results obtained by applying weighted multiple linear regression to estimate the parameter connected with an additive-by-additive epistatic interaction. Three weight variants were used: (1) standard weights based on estimated variances, (2) different weights for minimal, maximal and other lines, and (3) different weights for extreme and other lines. The approach described here combines two methods of estimation, one based on phenotypic observations and the other using molecular marker data. The comparison was done using Monte Carlo simulations. The results show that the application of weighted regression to the marker data yielded estimates similar to those obtained by phenotypic methods.
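    As a sketch of the weighting idea only (not the report's estimator of the epistatic parameter), weighted multiple linear regression can be written in a few lines of Python/NumPy. The weights below follow the spirit of variant (1), inverse estimated variances; the data, the function name and the inclusion of an intercept are illustrative assumptions.

```python
import numpy as np


def weighted_linear_regression(X, y, weights):
    """Weighted multiple linear regression with an intercept.

    Minimizes sum_i w_i * (y_i - b0 - x_i @ b)^2 by rescaling each row by sqrt(w_i)
    and solving an ordinary least-squares problem.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef    # [intercept, slope_1, ..., slope_k]


# Hypothetical data: two predictors, weights = 1 / (estimated variance) per observation
X = np.array([[0, 1], [1, 0], [2, 2], [3, 1], [4, 3], [5, 2]], dtype=float)
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
y_noisy = y.copy()
y_noisy[2] += 10.0                                   # one very imprecise observation ...
variances = np.array([1.0, 1.0, 1e12, 1.0, 1.0, 1.0])
coef = weighted_linear_regression(X, y_noisy, 1.0 / variances)  # ... is effectively ignored
```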

  19. Multivariate quantiles and multiple-output regression quantiles: From $L_1$ optimization to halfspace depth

    CERN Document Server

    Hallin, Marc; Šiman, Miroslav; 10.1214/09-AOS723

    2010-01-01

    A new multivariate concept of quantile, based on a directional version of Koenker and Bassett's traditional regression quantiles, is introduced for multivariate location and multiple-output regression problems. In their empirical version, those quantiles can be computed efficiently via linear programming techniques. Consistency, Bahadur representation and asymptotic normality results are established. Most importantly, the contours generated by those quantiles are shown to coincide with the classical halfspace depth contours associated with the name of Tukey. This relation allows not only for efficient depth contour computations by means of parametric linear programming, but also for transferring asymptotic results such as Bahadur representations from the quantile to the depth universe. Finally, linear programming duality opens the way to promising developments in depth-related multivariate rank-based inference.

  20. Poisson Regression Analysis of Illness and Injury Surveillance Data

    Energy Technology Data Exchange (ETDEWEB)

    Frome E.L., Watkins J.P., Ellis E.D.

    2012-12-12

    The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational database and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in tabular and graphical form, and interpretation of the model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine whether interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack of fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. 
In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra-Poisson variation. The R open source software environment for statistical computing and graphics is used for analysis. Additional details about R and the data that were used in this report are provided in an Appendix. Information on how to obtain R and utility functions that can be used to duplicate results in this report are provided.
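    The log-linear main-effects model above can be sketched in Python/NumPy as a Poisson regression with log(person time) as an offset, fitted by iteratively reweighted least squares (IRLS). The report's analysis was done in R and this is not its code; the function name and the toy table are hypothetical, and the Pearson chi-square dispersion estimate shown here is one common moment-based adjustment for extra-Poisson variation, not the report's score test.

```python
import numpy as np


def poisson_rate_regression(X, counts, person_time, max_iter=100, tol=1e-10):
    """Log-linear Poisson regression, log E[count] = log(person_time) + X @ beta.

    Fitted by IRLS. X should include a column of ones for the intercept.
    Returns beta, model-based standard errors, the Pearson dispersion
    estimate and the deviance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(counts, dtype=float)
    offset = np.log(np.asarray(person_time, dtype=float))
    n, p = X.shape
    mu = y + 0.5                                   # starting values avoid log(0)
    eta = np.log(mu)
    beta = np.zeros(p)
    for _ in range(max_iter):
        z = eta - offset + (y - mu) / mu           # working response (offset removed)
        XtW = X.T * mu                             # IRLS weights equal mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = offset + X @ beta
        mu = np.exp(eta)
        if converged:
            break
    se = np.sqrt(np.diag(np.linalg.inv((X.T * mu) @ X)))
    dispersion = np.sum((y - mu) ** 2 / mu) / (n - p)   # about 1 without extra-Poisson variation
    dev_terms = -(y - mu)
    pos = y > 0                                    # y * log(y / mu) is taken as 0 when y = 0
    dev_terms[pos] += y[pos] * np.log(y[pos] / mu[pos])
    deviance = 2.0 * np.sum(dev_terms)
    return beta, se, dispersion, deviance


# Hypothetical stratified table: intercept, sex indicator, age group (0-3)
X = np.column_stack([np.ones(8), [0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2, 3]])
person_time = np.array([1000, 2000, 1500, 500, 1200, 800, 3000, 2500], dtype=float)
true_beta = np.array([np.log(0.01), 0.5, 0.2])
expected = person_time * np.exp(X @ true_beta)    # noise-free "counts" for a consistency check
beta, se, dispersion, deviance = poisson_rate_regression(X, expected, person_time)
```

    With real counts, a dispersion clearly above 1 signals over-dispersion; quasi-likelihood standard errors are then obtained by multiplying `se` by the square root of the dispersion.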

  1. Minimax and minimax adaptive estimation in multiplicative regression : locally bayesian approach

    CERN Document Server

    Chichignoud, M

    2010-01-01

    The paper deals with nonparametric estimation in regression with multiplicative noise. Using local polynomial fitting and the Bayesian approach, we construct an estimator that is minimax on an isotropic Hölder class. Next, applying Lepski's method, we propose an estimator that is optimally adaptive over the collection of isotropic Hölder classes. To prove the optimality of the proposed procedure we establish, in particular, an exponential inequality for the deviation of the locally Bayesian estimator from the parameter to be estimated. These theoretical results are illustrated by a simulation study.

  2. Using the Coefficient of Determination R² to Test the Significance of Multiple Linear Regression

    Science.gov (United States)

    Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

    2013-01-01

    This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
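    For orientation, the textbook link between R² and the usual overall significance test of a multiple linear regression is given below, assuming normally distributed errors, an intercept and p predictors fitted to n observations; it is offered as background and is not necessarily the article's exact procedure. Because F is a monotone increasing function of R², comparing R² with a quantile of the Beta distribution is equivalent to the usual F test.

```latex
R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}, \qquad
F = \frac{R^2 / p}{(1 - R^2)/(n - p - 1)} \sim F_{p,\, n-p-1}
\quad \text{and} \quad
R^2 \sim \mathrm{Beta}\!\left(\frac{p}{2},\, \frac{n - p - 1}{2}\right)
\quad \text{under } H_0 : \beta_1 = \cdots = \beta_p = 0 .
```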

  3. Quantile regression analysis of Arctic sea ice extent

    Science.gov (United States)

    Silva, M. E.; Barbosa, S. M.; Antunes, Luís; Rocha, Conceição

    2009-04-01

    Surface and satellite-based observations show a decrease in the Arctic sea ice extent during the past 46 years, with a minimum in 2007. Climate models are in near-universal agreement that Arctic sea ice extent will decline through the 21st century as a consequence of global warming, and many studies predict a seasonally ice-free Arctic as soon as 2012. Much of the analysis of the ice extent time series, as in most climate studies based on observational data, has focused on computing deterministic linear trends by ordinary least squares, which characterize the rate of change of the conditional mean. However, in climate data and climate change studies a broader description of the data is desirable, in particular of changes in the spread or shape of the distribution over time. Quantile regression extends the classical linear regression framework from the estimation of conditional mean models to the estimation of conditional quantile models. Here, quantile regression is applied to analyse the time series of Arctic sea ice extent from January 1979 to December 2007, available at the National Snow and Ice Data Center.
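    As a minimal illustration of the conditional-quantile idea, the sketch below fits a straight-line quantile regression y = a + b*t by minimizing the Koenker-Bassett check loss. It relies on the fact that the linear program defining quantile regression has an optimal solution passing through as many observations as there are coefficients (two here, given at least two distinct t values), so trying every line through a pair of points gives an exact answer for small data sets. This brute-force search is an illustrative assumption, far slower than the simplex or interior-point solvers used in practice, and the toy data are hypothetical rather than sea-ice records.

```python
import itertools

import numpy as np


def check_loss(residuals, tau):
    """Koenker-Bassett check function summed over residuals: u * (tau - 1[u < 0])."""
    r = np.asarray(residuals, dtype=float)
    return np.sum(r * (tau - (r < 0).astype(float)))


def quantile_line(t, y, tau):
    """Exact straight-line quantile regression of y on (1, t) for small samples.

    Tries every line through two observations (O(n^3) overall); for illustration only.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for i, j in itertools.combinations(range(len(t)), 2):
        if t[i] == t[j]:
            continue
        slope = (y[j] - y[i]) / (t[j] - t[i])
        intercept = y[i] - slope * t[i]
        loss = check_loss(y - intercept - slope * t, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


t = np.arange(6.0)
y = 2.0 * t + 1.0
y[5] = 100.0                          # one gross outlier
a, b = quantile_line(t, y, 0.5)       # median regression recovers intercept 1 and slope 2
ols_slope = np.polyfit(t, y, 1)[0]    # least-squares slope is pulled up by the outlier
```

    Fitting several values of tau (for example 0.1, 0.5 and 0.9) to the same series and comparing the slopes is what reveals changes in spread that a single least-squares trend cannot show.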

  4. Prediction of the cetane number of biodiesel using artificial neural networks and multiple linear regression

    International Nuclear Information System (INIS)

    Highlights: • We obtained models for estimating the cetane number of biodiesel. • Twenty-four neural networks using two topologies were evaluated. • The best neural network for predicting the cetane number was selected. • The best accuracy was obtained for the selected neural network. - Abstract: Models for estimating the cetane number of biodiesel from its fatty acid methyl ester composition using multiple linear regression and artificial neural networks were obtained in this work. To obtain models to predict the cetane number, experimental data from literature reports covering 48 and 15 biodiesels were used in the modeling/training step and the validation step, respectively. Twenty-four neural networks using two topologies and different algorithms for the second training step were evaluated. The model obtained using multiple regression was compared with two other models from the literature and was able to predict the cetane number with 89% accuracy, with one outlier observed. A model to predict the cetane number using an artificial neural network was obtained with an accuracy better than 92%, except for one outlier. The best neural network to predict the cetane number was a backpropagation network (11:5:1) using the Levenberg–Marquardt algorithm for the second step of network training, showing R = 0.9544 for the validation data.

  5. Multivariate study and regression analysis of gluten-free granola

    Scientific Electronic Library Online (English)

    Lilian Maria, Pagamunici; Aloisio Henrique Pereira de, Souza; Aline Kirie, Gohara; Alline Aparecida Freitas, Silvestre; Jesuí Vergílio, Visentainer; Nilson Evelázio de, Souza; Sandra Terezinha Marques, Gomes; Makoto, Matsushita.

    2014-03-01

    Full Text Available This study developed a gluten-free granola and evaluated it during storage with the application of multivariate and regression analysis of the sensory and instrumental parameters. The physicochemical, sensory, and nutritional characteristics of a product containing quinoa, amaranth and linseed were evaluated. The crude protein and lipid contents were 97.49 and 122.72 g kg-1 of food, respectively. The polyunsaturated/saturated and n-6:n-3 fatty acid ratios were 2.82 and 2.59:1, respectively. The granola had the best alpha-linolenic acid content, nutritional indices in the lipid fraction, and mineral content. Hygienic and sanitary conditions were good during storage, probably due to the low water activity of the formulation, which helped inhibit microbial growth. The sensory attributes ranged from 'like very much' to 'like slightly', and the regression models were well fitted and correlated during the storage period. A reduction in the sensory attribute levels and in the physical stabilisation of the product was verified by principal component analysis. The use of the affective acceptance test and instrumental analysis combined with statistical methods allowed us to obtain promising results about the characteristics of gluten-free granola.

  6. Multivariate study and regression analysis of gluten-free granola

    Directory of Open Access Journals (Sweden)

    Lilian Maria Pagamunici

    2014-03-01

    Full Text Available This study developed a gluten-free granola and evaluated it during storage with the application of multivariate and regression analysis of the sensory and instrumental parameters. The physicochemical, sensory, and nutritional characteristics of a product containing quinoa, amaranth and linseed were evaluated. The crude protein and lipid contents were 97.49 and 122.72 g kg-1 of food, respectively. The polyunsaturated/saturated and n-6:n-3 fatty acid ratios were 2.82 and 2.59:1, respectively. The granola had the best alpha-linolenic acid content, nutritional indices in the lipid fraction, and mineral content. Hygienic and sanitary conditions were good during storage, probably due to the low water activity of the formulation, which helped inhibit microbial growth. The sensory attributes ranged from 'like very much' to 'like slightly', and the regression models were well fitted and correlated during the storage period. A reduction in the sensory attribute levels and in the physical stabilisation of the product was verified by principal component analysis. The use of the affective acceptance test and instrumental analysis combined with statistical methods allowed us to obtain promising results about the characteristics of gluten-free granola.

  7. Multiple linear stepwise regression of liver lipid levels: proton MR spectroscopy study in vivo at 3.0 T

    International Nuclear Information System (INIS)

    Objective: To analyze the correlations between liver lipid level determined by in vivo 3.0 T liver 1H-MRS and influencing factors using multiple linear stepwise regression. Methods: The prospective study of liver 1H-MRS was performed with a 3.0 T system and eight-channel torso phased-array coils using a PRESS sequence. Forty-four volunteers were enrolled in this study. Liver spectra were collected with a TR of 1500 ms, TE of 30 ms, volume of interest of 2 cm×2 cm×2 cm, and NSA of 64. The acquired raw proton MRS data were processed using the software program SAGE. For each MRS measurement, using water as the internal reference, the amplitude of the lipid signal was normalized to the sum of the lipid and water signals to obtain the percentage of lipid within the liver. Height, weight, age, BMI, line width and water suppression were recorded, and Pearson correlation analysis was applied to test their relationships. Multiple linear stepwise regression was used to build the statistical model for predicting liver lipid content. Results: Age (39.1±12.6) years, body weight (64.4±10.4) kg, BMI (23.3±3.1) kg/m², line width (18.9±4.4) and water suppression (90.7±6.5)% had significant correlations with liver lipid content (0.00 to 0.96%, median 0.02%), with r of 0.11, 0.44, 0.40, 0.52 and -0.73, respectively (P<0.05). However, only age, BMI, line width and water suppression entered the multiple linear regression equation. The prediction equation for liver lipid content was: Y = 1.395 - (0.021×water suppression) + (0.022×BMI) + (0.014×line width) - (0.004×age); the coefficient of determination was 0.613 and the adjusted coefficient of determination was 0.59. Conclusion: The regression model fitted well, since age, BMI, line width and water suppression can explain about 60% of the variation in liver lipid content. (authors)
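    The stepwise procedure can be sketched as forward selection in Python/NumPy. The abstract does not state the entry criterion; this sketch assumes the Akaike information criterion (AIC) for a Gaussian linear model, whereas classical stepwise programs usually use F-to-enter or p-value thresholds and may also remove variables. The function names and the simulated data are hypothetical; only the sample size of 44 mimics the study.

```python
import numpy as np


def gaussian_aic(design, y):
    """AIC (up to an additive constant) of an OLS fit of y on the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    n, k = design.shape
    return n * np.log(rss / n) + 2 * k


def forward_stepwise(X, y):
    """Greedy forward selection: add the predictor that lowers AIC most; stop when none does."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    selected = []
    design = np.ones((n, 1))                       # start from the intercept-only model
    current_aic = gaussian_aic(design, y)
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = [gaussian_aic(np.column_stack([design, X[:, j]]), y) for j in candidates]
        best = int(np.argmin(scores))
        if scores[best] >= current_aic:
            break                                  # no candidate improves the criterion
        current_aic = scores[best]
        selected.append(candidates[best])
        design = np.column_stack([design, X[:, candidates[best]]])
    return selected


rng = np.random.default_rng(0)
X = rng.normal(size=(44, 5))                       # five hypothetical candidate predictors
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=44)
print(forward_stepwise(X, y))                      # indices of the selected predictors
```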

  8. Multiple linear regression to estimate time-frequency electrophysiological responses in single trials

    Science.gov (United States)

    Hu, L.; Zhang, Z.G.; Mouraux, A.; Iannetti, G.D.

    2015-01-01

    Transient sensory, motor or cognitive events elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist of stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability, which can carry physiologically important information, is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; and (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). 
The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical oscillations, obtaining single-trial estimates of response latency, frequency, and magnitude. This permits within-subject statistical comparisons, correlation with pre-stimulus features, and integration of simultaneously recorded EEG and fMRI. PMID:25665966

  9. Regression modeling strategies with applications to linear models, logistic and ordinal regression, and survival analysis

    CERN Document Server

    Harrell, Jr., Frank E.

    2015-01-01

    This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap.  The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes.  This text realistically...

  10. MISSING DATA IN REGRESSION MODELS FOR NON-COMMENSURATE MULTIPLE OUTCOMES

    OpenAIRE

    Teixeira-Pinto, Armando; Normand, Sharon-Lise

    2011-01-01

    Biomedical research often involves the measurement of multiple outcomes in different scales (continuous, binary and ordinal). A common approach for the analysis of such data is to ignore the potential correlation among the outcomes and model each outcome separately. This can lead not only to loss of efficiency but also to biased estimates in the presence of missing data. We address the problem of missing data in the context of multiple non-commensurate outcomes. The consequences of missing da...

  11. Spontaneous regression of multiple pulmonary metastatic nodules of hepatocarcinoma: a case report

    International Nuclear Information System (INIS)

    Spontaneous regression of either primary or metastatic malignant tumors in the absence of therapy, or with inadequate therapy, has been well documented. Since the earliest days of this century, various malignant tumors have been reported to disappear spontaneously or to stop growing, but cases involving hepatocarcinoma have been very rare. In the literature, we found 5 previously reported cases of hepatocarcinoma that showed spontaneous regression at the primary site. Recently we saw a case of multiple pulmonary metastatic nodules of hepatocarcinoma that regressed completely and spontaneously, and this forms the basis of the present case report. The patient, a 55-year-old male, was admitted to St. Mary's Hospital, Catholic Medical College, because of a hard palpable mass in the epigastrium on April 26, 1978. The admission PA chest roentgenogram revealed multiple small nodular densities scattered throughout both lung fields, especially in the lower zones and toward the periphery. A hepatoscintigram revealed a large cold area involving the left lobe and intermediate zone of the liver. Alpha-fetoprotein and hepatitis B serum antigen tests were positive, whereas many other standard liver function tests were negative. A needle biopsy of the tumor revealed well-differentiated hepatocellular carcinoma. The patient received chemotherapy consisting of 5-FU 500 mg intravenously for 6 days, from April 28 to May 3, 1978. The patient was discharged after this single course of 5-FU treatment and took a herbal medicine of unknown nature and quantity. No other specific treatment was given. The second admission took place on Dec. 3, 1980 because of irregular bowel habits and dyspepsia. A follow-up PA chest roentgenogram obtained at the second admission revealed complete disappearance of the previously noted multiple pulmonary nodular lesions (Fig. 3). 
A follow-up liver scan revealed persistence of the cold area in the left lobe, with a slight decrease in size. The patient was discharged again without any specific prescription after negative results were confirmed on various clinical studies, including an upper GI series and a colon study. At the time of writing, the patient is doing well without apparent medical problems.

  12. Comparative Analysis Of Least Square Regression And Fixed Effect Panel Data Regression Using Road Traffic Accident In Nigeria

    Directory of Open Access Journals (Sweden)

    J.A. Kupolusi

    2015-01-01

    Full Text Available In this work, an attempt was made to critically analyze the effect of the Federal Road Safety Corps (FRSC) on various categories of road traffic accidents in Nigeria over a period of time, across all the states of the federation, including the Federal Capital Territory. This was done using a panel data regression model. The conventional OLS estimator applied to panel data leads to inconsistent estimates of the regression parameters because it does not adequately handle individual-specific effects. A better estimation method was therefore used in this analysis to obtain more reliable results that can be used to predict likely future occurrences. Among all the estimation methods considered, only fixed-effects panel data regression with a heteroscedasticity-consistent variance-covariance matrix gave consistent estimates of the regression parameters.
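    The fixed-effects (within) estimator referred to above can be sketched in Python/NumPy: subtracting each state's own means from the response and the regressors removes the state-specific intercepts before ordinary least squares is applied. The toy panel below is hypothetical and built so that the state effects are correlated with the regressor, the situation in which pooled OLS is inconsistent; heteroscedasticity-consistent standard errors are not computed in this sketch.

```python
import numpy as np


def within_estimator(y, X, entity):
    """Fixed-effects slope estimates via the within (entity-demeaning) transformation."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    entity = np.asarray(entity)
    y_dm, X_dm = y.copy(), X.copy()
    for g in np.unique(entity):
        rows = entity == g
        y_dm[rows] -= y[rows].mean()               # remove the entity's mean response
        X_dm[rows] -= X[rows].mean(axis=0)         # remove the entity's mean regressors
    slopes, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return slopes


# Hypothetical panel: 3 states observed in 4 periods, true slope 2
entity = np.repeat([0, 1, 2], 4)
x = np.concatenate([np.arange(0, 4), np.arange(5, 9), np.arange(10, 14)]).astype(float)
state_effect = np.repeat([30.0, 15.0, 0.0], 4)     # state effects correlated with x
y = state_effect + 2.0 * x
print(within_estimator(y, x, entity))              # fixed effects recover the slope 2
print(np.polyfit(x, y, 1)[0])                      # pooled OLS slope is badly biased
```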

  13. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    Science.gov (United States)

    dos Santos, T. S.; Mendes, D.; Torres, R. R.

    2015-08-01

    Several studies have been devoted to dynamic and statistical downscaling for the analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANN) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over three regions of South America: the Amazon, Northeastern Brazil and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used GCM experiments for the 20th century (RCP Historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANN significantly outperforms the MLR downscaling of monthly precipitation variability.
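    Multiple linear regression on principal components (principal component regression) can be sketched in Python/NumPy as below. In a downscaling setting the columns of X would be large-scale GCM predictor fields and y the observed monthly precipitation at one location; the function names, the number of retained components k and the random data are assumptions, since the abstract does not give these details.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Principal component regression: OLS of y on the first k principal-component scores of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt are principal axes
    V_k = Vt[:k].T                                      # loadings of the k leading components
    scores = Xc @ V_k                                   # principal-component scores
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    coef = V_k @ gamma                                  # map back to the original predictors
    intercept = y_mean - x_mean @ coef
    return intercept, coef


def pcr_predict(intercept, coef, X_new):
    return intercept + np.asarray(X_new, dtype=float) @ coef


rng = np.random.default_rng(42)
X = rng.normal(size=(60, 8))                            # e.g. 60 months, 8 predictor fields
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=60)
intercept, coef = pcr_fit(X, y, k=3)                    # keep 3 components
y_hat = pcr_predict(intercept, coef, X)
```

    Keeping only the leading components discards directions of small predictor variance, which is what stabilizes the regression when the predictor fields are strongly correlated; with k equal to the number of predictors the fit reduces to ordinary multiple linear regression.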

  14. Multiple regression models for the prediction of the maximum obtainable thermal efficiency of organic Rankine cycles

    DEFF Research Database (Denmark)

    Larsen, Ulrik; Pierobon, Leonardo

    2014-01-01

    Much attention is focused on increasing the energy efficiency to decrease fuel costs and CO2 emissions throughout industrial sectors. The ORC (organic Rankine cycle) is a relatively simple but efficient process that can be used for this purpose by converting low and medium temperature waste heat to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary conditions of the process. Hundreds of optimised cases with varied design parameters are used as observations in four multiple regression analyses. We analyse the model assumptions, prediction abilities and extrapolations, and compare the results with recent studies in the literature. The models are in agreement with the literature, and they present an opportunity for accurate prediction of the potential of an ORC to convert heat sources with temperatures from 80 to 360 °C, without detailed knowledge or need for simulation of the process. © 2013 Elsevier Ltd. All rights reserved

  15. Random regressions models to describe the genetic variation of milk yield over multiple parities in Buffaloes

    Directory of Open Access Journals (Sweden)

    H. Tonhati

    2010-02-01

    Full Text Available The objectives of this study were to estimate (co)variance functions for additive genetic and permanent environmental effects, as well as the genetic parameters for milk yield over multiple parities, using random regression models (RRM). Records of 4,757 complete lactations of Murrah breed buffaloes from 12 herds were analyzed. Ages at calving were between 2 and 11 years. The model included the additive genetic and permanent environmental random effects and the fixed effects of contemporary groups (herd, year and calving season) and milking frequency (1 or 2). A cubic regression on Legendre orthogonal polynomials of age was used to model the mean trend. The additive genetic and permanent environmental effects were modeled by Legendre orthogonal polynomials. Residual variances were considered homogeneous or heterogeneous, modeled through variance functions or step functions with 5, 7 or 10 classes. Results from Akaike's and Schwarz's Bayesian information criteria indicated that an RRM considering a third-order polynomial for the additive genetic and permanent environmental effects and a step function with 5 classes for residual variances fitted best. Heritability estimates obtained by this model varied from 0.10 to 0.28. Genetic correlations were high between consecutive ages, but decreased as the intervals between ages increased.
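    The Legendre-polynomial covariates used in random regression models can be sketched with NumPy's `numpy.polynomial.legendre.legvander`. Ages are first rescaled to [-1, 1], the interval on which Legendre polynomials are orthogonal; the rescaling bounds (2 and 11 years, from the abstract), the optional normalization factor sqrt((2j + 1)/2) often used in the animal-breeding literature, the function name and the yield curve are assumptions here. Only a fixed cubic mean trend is fitted; the random additive genetic and permanent environmental regressions require a mixed-model solver that this sketch does not include.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_covariates(age, degree, age_min=2.0, age_max=11.0, normalized=False):
    """Matrix of Legendre polynomials P_0..P_degree evaluated at ages rescaled to [-1, 1]."""
    t = 2.0 * (np.asarray(age, dtype=float) - age_min) / (age_max - age_min) - 1.0
    phi = legendre.legvander(t, degree)                 # column j holds P_j(t)
    if normalized:
        phi = phi * np.sqrt((2.0 * np.arange(degree + 1) + 1.0) / 2.0)
    return phi


# Hypothetical cubic mean trend of milk yield on age at calving (years)
ages = np.arange(2.0, 12.0)
mean_yield = 1500.0 + 120.0 * ages - 9.0 * ages ** 2 + 0.2 * ages ** 3
Phi = legendre_covariates(ages, degree=3)
coef, *_ = np.linalg.lstsq(Phi, mean_yield, rcond=None)
fitted = Phi @ coef                                     # a cubic is reproduced exactly
```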

  16. Application of genetic algorithm - multiple linear regressions to predict the activity of RSK inhibitors

    Directory of Open Access Journals (Sweden)

    Avval Zhila Mohajeri

    2015-01-01

    Full Text Available This paper deals with developing a linear quantitative structure-activity relationship (QSAR) model for predicting the RSK inhibition activity of some new compounds. A dataset consisting of 62 pyrazino[1,2-a]indole, diazepino[1,2-a]indole, and imidazole derivatives with known inhibitory activities was used. The multiple linear regression (MLR) technique, combined with the stepwise (SW) and genetic algorithm (GA) methods as variable selection tools, was employed. To further check the stability, robustness and predictability of the proposed models, internal and external validation techniques were used. Comparison of the results indicates that the GA-MLR model is superior to the SW-MLR model and that it is applicable for designing novel RSK inhibitors.
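    The stepwise (SW) variable selection mentioned above can be sketched, assuming NumPy, as forward selection with a partial F-to-enter rule. The threshold of 4.0, the function names and the simulated descriptors are hypothetical; the GA-based selection that the authors found superior is not reproduced here.

```python
import numpy as np


def _rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def forward_stepwise(X, y, f_enter=4.0):
    """Forward selection for MLR: add the descriptor that most reduces RSS
    while its partial F statistic exceeds f_enter (assumed threshold)."""
    n, p = X.shape
    ones = np.ones((n, 1))
    selected = []
    rss_cur = _rss(ones, y)  # intercept-only model
    while len(selected) < p:
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            rss_new = _rss(np.hstack([ones, X[:, selected + [j]]]), y)
            if rss_new < best_rss:
                best_j, best_rss = j, rss_new
        df_resid = n - (len(selected) + 2)  # intercept + current + candidate
        f_stat = (rss_cur - best_rss) / (best_rss / df_resid)
        if f_stat < f_enter:
            break
        selected.append(best_j)
        rss_cur = best_rss
    return selected


# Hypothetical descriptors: activity depends on descriptors 1 and 3 only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)
print(forward_stepwise(X, y))  # descriptors 1 and 3 are expected to enter first
```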

  17. Genome-wide association analysis by lasso penalized logistic regression

    OpenAIRE

    Wu, Tong Tong; Chen, Yi Fang; Hastie, Trevor; Sobel, Eric; Lange, Kenneth

    2009-01-01

    Motivation: In ordinary regression, imposition of a lasso penalty makes continuous model selection straightforward. Lasso penalized regression is particularly advantageous when the number of predictors far exceeds the number of observations.
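    A minimal sketch of lasso-penalized logistic regression, assuming NumPy and 0/1 outcomes, fitted by proximal gradient descent (ISTA): each step takes a gradient step on the mean logistic loss and then soft-thresholds the coefficients, which sets many of them exactly to zero. This is a generic illustration with hypothetical names and simulated data, not the algorithm used by the authors.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_logistic(X, y, lam, n_iter=3000):
    """Minimize mean logistic loss + lam * ||beta||_1 by ISTA.

    The intercept b0 is not penalized; y must contain 0/1 labels.
    """
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])
    # The gradient of the mean logistic loss is Lipschitz with constant <= ||Xa||_2^2 / (4n)
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        eta = b0 + X @ beta
        prob = 0.5 * (1.0 + np.tanh(0.5 * eta))  # numerically stable sigmoid
        resid = prob - y
        b0 -= step * resid.mean()
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return b0, beta


# Hypothetical data with many more predictors than observations
rng = np.random.default_rng(1)
n, p = 100, 500
X = rng.normal(size=(n, p))
eta = 3.0 * X[:, 0] - 3.0 * X[:, 1]
y = (rng.random(n) < 0.5 * (1.0 + np.tanh(0.5 * eta))).astype(float)
b0, beta = lasso_logistic(X, y, lam=0.1)
print(np.flatnonzero(beta)[:10])  # indices of predictors kept by the penalty
```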

  18. Evaluating geographically weighted regression models for environmental chemical risk analysis.

    Science.gov (United States)

    Czarnota, Jenna; Wheeler, David C; Gennings, Chris

    2015-01-01

    In the evaluation of cancer risk related to environmental chemical exposures, the effect of many correlated chemicals on disease is often of interest. The relationship between correlated environmental chemicals and health effects is not always constant across a study area, as exposure levels may change spatially due to various environmental factors. Geographically weighted regression (GWR) has been proposed to model spatially varying effects. However, concerns about collinearity effects, including regression coefficient sign reversal (i.e., the reversal paradox), may limit the applicability of GWR for environmental chemical risk analysis. A penalized version of GWR, the geographically weighted lasso (GWL), has been proposed to remediate the collinearity effects in GWR models. In this study, we used simulation to assess the ability of GWR and GWL to correctly identify spatially varying chemical effects for a mixture of correlated chemicals within a study area. Our results showed that GWR suffered from the reversal paradox, while GWL overpenalized the effects for the chemical most strongly related to the outcome. PMID:25983546
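    The core of GWR can be sketched as a separate weighted least-squares fit at every location, with weights that decay with distance. This minimal sketch assumes NumPy, a Gaussian kernel and a fixed, user-chosen bandwidth (all assumptions); the penalized GWL variant is not shown.

```python
import numpy as np


def gwr_coefficients(coords, X, y, bandwidth):
    """Local coefficients at every observation location (Gaussian-kernel GWR).

    coords: (n, 2) locations; X: (n, p) predictors; y: (n,) outcome.
    Returns an (n, p + 1) array; column 0 is the local intercept.
    """
    n = X.shape[0]
    Xa = np.column_stack([np.ones(n), X])
    betas = np.empty((n, Xa.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)  # nearby observations weigh more
        XtW = Xa.T * w  # equivalent to Xa.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ Xa, XtW @ y)
    return betas


# Hypothetical exposure whose effect grows from 1 to 3 across the study area
rng = np.random.default_rng(2)
u = np.linspace(0.0, 1.0, 200)
coords = np.column_stack([u, np.zeros_like(u)])
x = rng.normal(size=(200, 1))
y = (1.0 + 2.0 * u) * x[:, 0]
b = gwr_coefficients(coords, x, y, bandwidth=0.1)
print(b[0, 1], b[-1, 1])  # local slopes near each end of the area
```

    With correlated chemicals as columns of `X`, these local fits are where the collinearity and sign-reversal problems described above can arise.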

  19. Logistic regression analysis on the risk factors of radiation pneumonitis

    International Nuclear Information System (INIS)

    Objective: To identify the risk factors of radiation pneumonitis (RP). Methods: A retrospective study was conducted on 101 patients with radiation pneumonitis using SPSS 8.0 software. Factors evaluated included gender, age, pathology, clinical stage, irradiation dose, irradiation field size, history of smoking, cardiovascular disease, bronchitis, surgery, chemotherapy, lung infection, atelectasis, obstructive infection and pleural effusion. Univariate analysis was performed using the chi-square test, and multivariate analysis using a logistic regression model. Results: Univariate analysis identified 10 factors significantly associated with radiation pneumonitis, including pulmonary infection, atelectasis, obstructive infection, cardiovascular disease, bronchitis, chemotherapy, irradiation dose, number of days of radiation and irradiation field size. Multivariate analysis showed that 9 factors (pulmonary infection, obstructive infection, atelectasis, pleural effusion, bronchitis, cardiovascular disease, chemotherapy, irradiation dose and irradiation field size) were independent risk factors. Conclusion: Comprehensive consideration of accompanying disease, chemotherapy, dose, field size, etc., during radiotherapy planning can help minimize the risk of developing radiation pneumonitis.

  20. A Visual Analytics Approach for Correlation, Classification, and Regression Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Steed, Chad A [ORNL]; Swan II, J. Edward [Mississippi State University (MSU)]; Fitzpatrick, Patrick J. [Mississippi State University (MSU)]; Jankun-Kelly, T.J. [Mississippi State University (MSU)]

    2012-02-01

    New approaches that combine the strengths of humans and machines are necessary to equip analysts with the proper tools for exploring today's increasingly complex, multivariate data sets. This paper describes a novel visual data mining framework, the Multidimensional Data eXplorer (MDX), which addresses the challenges of today's data by combining automated statistical analytics with a highly interactive, parallel-coordinates-based canvas. In addition to several intuitive interaction capabilities, the framework offers a rich set of graphical statistical indicators, interactive regression analysis, visual correlation mining, automated axis arrangement and filtering, and data classification techniques. The paper provides a detailed description of the system, a discussion of key design aspects, and critical feedback from domain experts.

  1. A Quantile Regression Analysis of Micro-lending's Poverty Impact

    Directory of Open Access Journals (Sweden)

    Stephen W. Polk

    2012-07-01

    Full Text Available This paper evaluates the impact of a microlending program on reducing measured poverty within its client population, with the aim of improving that impact. We analyze a database of over 18,000 women microfinance clients of the Negros Women for Tomorrow Foundation (NWTF), which uses the Progress out of Poverty (PPI) Scorecard as a measure of poverty. Analysis using both OLS and quantile multivariate regression models shows how observable borrower attributes affect the ability of clients to reduce their measured poverty. Loan size, duration, and the economic activity supported all have strongly identifiable effects. Moreover, the estimates suggest which among the poor are receiving the greatest effective help from the program. The results offer specific advice to the NWTF and other micro-lenders: impact is greatest with fewer, larger loans in particular economic sectors (sari-sari, service and trade), but it requires patience, as each additional year increases the client's average change in poverty score.
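    Quantile regression estimates a chosen conditional quantile (for example the median or the 90th percentile) instead of the conditional mean. A minimal sketch, assuming NumPy, that approximately minimizes the quantile check loss with a majorize-minimize scheme, i.e. repeated weighted least squares; the function name, the smoothing constant `eps` and the simulated data are assumptions, and a linear-programming solver would give the exact solution.

```python
import numpy as np


def quantile_regression(X, y, tau, eps=1e-5, n_iter=500):
    """Approximate linear quantile regression by majorize-minimize (reweighted LS).

    Minimizes sum(rho_tau(y - Xa @ beta)) with rho_tau(r) = r * (tau - (r < 0)),
    after smoothing |r| near zero with eps. Returns [intercept, slopes...].
    """
    n = X.shape[0]
    Xa = np.column_stack([np.ones(n), X])
    beta = np.linalg.lstsq(Xa, y, rcond=None)[0]  # start from OLS
    col_sums = Xa.sum(axis=0)  # Xa.T @ ones
    for _ in range(n_iter):
        r = y - Xa @ beta
        w = 1.0 / (eps + np.abs(r))  # weights of the quadratic majorizer of |r|
        XtW = Xa.T * w
        beta = np.linalg.solve(XtW @ Xa, XtW @ y + (2.0 * tau - 1.0) * col_sums)
    return beta


# Hypothetical outcome (e.g. change in poverty score) versus one borrower attribute
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, size=(1000, 1))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=1000)
for tau in (0.1, 0.5, 0.9):
    print(tau, quantile_regression(x, y, tau))
```

    Comparing coefficients across `tau` shows whether an attribute matters more for clients at the low or high end of the outcome distribution, which OLS alone cannot reveal.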

  2. Low-Cost Housing in Sabah, Malaysia: A Regression Analysis

    Directory of Open Access Journals (Sweden)

    Dullah Mulok

    2009-02-01

    Full Text Available Low-cost housing plays a vital role in the development process, especially in providing accommodation to the less fortunate and the lower-income group. This effort is also a step toward overcoming the squatter problem, which could cripple the competitive drive of the local community, especially in the state of Sabah, Malaysia. This article examines the factors influencing low-cost housing in Sabah, namely the government's budget (allocation) for low-cost housing projects and Sabah's total population. The study also examines the implications of development and of the economic crises that occurred during the period 1971 to 2000 for the provision of low-cost houses in Sabah. Empirical analyses were conducted using multiple linear regression, the stepwise method and the dummy variable approach to demonstrate the link. The empirical results show that the government's budget for low-cost housing is the main contributor to the provision of low-cost housing in Sabah. The results also suggest that economic growth, measured by Gross Domestic Product (GDP), did not have a significant effect on low-cost housing in Sabah. However, almost all major crises that beset Malaysia's economy had a significant and consistent effect on low-cost housing in Sabah, especially the financial crisis that occurred in mid-1997.
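    The dummy variable approach can be sketched, assuming NumPy, as an ordinary least-squares fit with a 0/1 indicator for crisis years alongside the continuous predictors. All names, the crisis years and the numbers below are hypothetical, and the stepwise part of the analysis is not shown.

```python
import numpy as np


def crisis_design(budget, population, years, crisis_years):
    """Design matrix [1, budget, population, crisis dummy] (hypothetical helper)."""
    years = np.asarray(years)
    crisis = np.isin(years, list(crisis_years)).astype(float)  # 1 in a crisis year, else 0
    return np.column_stack([np.ones(len(years)), budget, population, crisis])


def ols(X, y):
    """Ordinary least-squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(4)
years = np.arange(1971, 2001)
budget = rng.uniform(10.0, 100.0, size=years.size)  # hypothetical allocations
population = np.linspace(0.7, 2.6, years.size)      # hypothetical, in millions
X = crisis_design(budget, population, years, crisis_years={1997, 1998})
# Noise-free toy data: the crisis dummy shifts provision down by 3 units
units = 5.0 + 0.8 * budget + 1.5 * population - 3.0 * X[:, 3]
print(ols(X, units))  # recovers about [5.0, 0.8, 1.5, -3.0]
```

    The dummy's coefficient is read as the average shift in provision during crisis years, holding budget and population fixed.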

  3. Multiple regression equations to estimate the content of breast muscles, meat, and fat in Muscovy ducks.

    Science.gov (United States)

    Kleczek, K; Wawro, K; Wilkiewicz-Wawro, E; Makowski, W

    2006-07-01

    The aim of the present study was to derive multiple regression equations for in vivo estimation of the carcass lean and fat content in Muscovy ducks. The experimental materials consisted of 240 White Muscovy ducklings (120 male and 120 female). One hundred sixteen females aged 10 wk and 112 males aged 12 wk were slaughtered. Before slaughter the ducks were weighed, and the following body measurements were taken: humerus length, drumstick length, chest girth, breast-bone crest length, width between the humeral bones, chest depth, and breast muscle thickness. The coefficients of simple correlation between carcass tissue components and body measurements were calculated. It was found that live body weight was highly correlated with the weights of all tissue components (r = 0.701 to 0.857). In males a significant interrelation was found between breast muscle weight and all body measurements, whereas in females breast muscle weight was correlated with breast-bone crest length, chest girth, width between the humeral bones, chest depth, and breast muscle thickness only. In both males and females the carcass lean content was closely correlated with drumstick length, breast-bone crest length, chest girth, and width between the humeral bones. In drakes the carcass fat content was closely correlated with all body measurements, whereas in hens significant correlations were observed between the carcass fat content and chest girth, width between the humeral bones, and chest depth only. The coefficients of simple correlation between the percentages of carcass tissue components and body measurements were generally low and statistically nonsignificant. Twelve multiple regression equations formulated based on the body measurements of live ducks were verified with respect to the accuracy of estimation of the content of breast muscles, meat, and fat with skin in the carcass. These equations give small SE of the estimate (Sy = 23.3 to 83.8 g), high values of coefficients of multiple correlation between the dependent variable and the set of independent variables, and high values of determination coefficients. PMID:16830875
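    A minimal sketch, assuming NumPy, of fitting one such equation and computing the reported fit statistics: the standard error of the estimate Sy = sqrt(SSE / (n - p - 1)), the coefficient of multiple correlation R and the coefficient of determination R². The measurements below are simulated and hypothetical, not the study's data.

```python
import numpy as np


def regression_equation(X, y):
    """Fit y = b0 + X @ b by OLS; return coefficients, Sy, R and R^2."""
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    resid = y - Xa @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - sse / sst  # coefficient of determination
    s_y = np.sqrt(sse / (n - p - 1))  # standard error of the estimate, in units of y (g)
    return beta, s_y, np.sqrt(r2), r2


# Hypothetical live-bird measurements and breast muscle weight
rng = np.random.default_rng(5)
n = 120
live_weight = rng.normal(3000.0, 300.0, n)  # g
chest_girth = rng.normal(35.0, 2.0, n)      # cm
X = np.column_stack([live_weight, chest_girth])
breast_muscle = 0.1 * live_weight + 8.0 * chest_girth + rng.normal(0.0, 25.0, n)
beta, s_y, R, R2 = regression_equation(X, breast_muscle)
print(beta, s_y, R, R2)
```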

  4. Development of Multiple Regression and Neural Network Models for Assessment of Blasting Dust at a Large Surface Coal Mine

    Directory of Open Access Journals (Sweden)

    T.A. Renaldy

    2011-01-01

    Full Text Available [...] developed for prediction of particulate matter. The performance of the multiple regression models was assessed. For the development of neural network models, a feed-forward network trained with a back-propagation learning algorithm was used. The performance of the neural network was determined in terms of the correlation coefficient (R) and mean square error (MSE). The optimum number of hidden neurons was chosen as the one giving the lowest MSE and the highest R. The results indicated that the network can predict particulate concentrations better than the multiple regression models.

  5. Prediction of radiation levels in residences: A methodological comparison of CART [Classification and Regression Tree Analysis] and conventional regression

    International Nuclear Information System (INIS)

    In environmental epidemiology, trace and toxic substance concentrations frequently have highly skewed distributions ranging over one or more orders of magnitude, and prediction by conventional regression is often poor. Classification and Regression Tree Analysis (CART) is an alternative in such contexts. To compare the techniques, two Pennsylvania data sets and three independent variables are used: house radon progeny (RnD) and gamma levels as predicted by construction characteristics in 1330 houses; and ?200 house radon (Rn) measurements as predicted by topographic parameters. CART may identify structural variables of interest not identified by conventional regression, and vice versa, but in general the regression models are similar. CART has major advantages in dealing with other common characteristics of environmental data sets, such as missing values, continuous variables requiring transformations, and large sets of potential independent variables. CART is most useful in the identification and screening of independent variables, greatly reducing the need for cross-tabulations and nested breakdown analyses. There is no need to discard cases with missing values for the independent variables, because surrogate variables are intrinsic to CART. The tree-structured approach is also independent of the scale on which the independent variables are measured, so transformations are unnecessary. CART identifies important interactions as well as main effects. The major advantages of CART appear to be in exploring data. Once the important variables are identified, conventional regressions seem to yield results that are similar but more interpretable to most audiences. 12 refs., 8 figs., 10 tabs
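    The basic CART operation, choosing the single split of one variable that most reduces the within-node sum of squares, can be sketched as follows (NumPy assumed; the function name is hypothetical, and a full tree would apply the step recursively and add pruning and surrogate splits, which are not shown). Because only the ordering of x matters, a monotone transformation such as a logarithm yields the same partition.

```python
import numpy as np


def best_split(x, y):
    """Return (threshold, sse) of the split on x minimizing total within-node SSE."""
    order = np.argsort(x)
    xs, ys = np.asarray(x, float)[order], np.asarray(y, float)[order]
    n = len(xs)
    csum, csum2 = np.cumsum(ys), np.cumsum(ys ** 2)
    best_sse, best_thr = np.inf, None
    for i in range(1, n):  # left node = first i sorted cases
        if xs[i] == xs[i - 1]:
            continue  # cannot split between tied x values
        sl, sl2 = csum[i - 1], csum2[i - 1]
        sr, sr2 = csum[-1] - sl, csum2[-1] - sl2
        # SSE of a node = sum(y^2) - (sum y)^2 / count
        sse = (sl2 - sl ** 2 / i) + (sr2 - sr ** 2 / (n - i))
        if sse < best_sse:
            best_sse, best_thr = sse, (xs[i - 1] + xs[i]) / 2.0
    return best_thr, best_sse


# Hypothetical construction characteristic versus radon level
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 1.0, 1.0, 10.0, 10.0, 10.0])
print(best_split(x, y))             # threshold 3.5, SSE 0
print(best_split(np.log(x), y)[1])  # same partition after a log transform, SSE 0
```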

  6. Use of generalized regression models for the analysis of stress-rupture data

    International Nuclear Information System (INIS)

    The design of components for operation in an elevated-temperature environment often requires detailed consideration of the creep and creep-rupture properties of the construction materials involved. Techniques for the analysis and extrapolation of creep data have been widely discussed. This paper presents a generalized regression approach to the analysis of such data. The approach has been applied to multiple-heat data sets for types 304 and 316 austenitic stainless steel, ferritic 2¼Cr-1Mo steel, and the high-nickel austenitic alloy 800H. Analyses of data for single heats of several materials are also presented, and all results appear good. The techniques presented are a simple yet flexible and powerful means for the analysis and extrapolation of creep and creep-rupture data.

  7. The role of multiple regression and exploratory data analysis in the development of leukemia incidence risk models for comparison of radionuclide air stack emissions from nuclear and coal power industries

    International Nuclear Information System (INIS)

    Risk associated with power generation must be identified to make intelligent choices between alternative power technologies. Radionuclide air stack emissions for a single coal plant and a single nuclear plant are used to compute the single-plant leukemia incidence risk and the total industry leukemia incidence risk. Leukemia incidence is the response variable, as a function of radionuclide bone dose, for each of the six proposed dose-response curves considered. During normal operation a coal plant has higher radionuclide emissions than a nuclear plant, and the coal industry has a higher leukemia incidence risk than the nuclear industry, unless a nuclear accident occurs. Varying the nuclear accident size allows the impact of accidents on the total industry leukemia incidence risk comparison to be quantified. The comparison is expressed as the number of accidents of a given size required for the nuclear industry leukemia incidence risk to equal the coal industry leukemia incidence risk. The general linear model is used to develop equations that relate the accident frequency required for equal industry risks to the magnitude of the nuclear emission. Exploratory data analysis revealed that the natural log of accident number is linearly related to the natural log of accident size. (Author)

  8. Framing an Nuclear Emergency Plan using Qualitative Regression Analysis

    International Nuclear Information System (INIS)

    Safety maintenance issues have arisen since the Fukushima disaster, and literature on disaster scenario investigation and theory development is scarce. This study addresses the initial difficulty of defining the research purpose, which relates to the content and problem setting of the phenomenon. The research design therefore follows an inductive approach in which primary findings and written reports are interpreted and coded qualitatively. These data are classified inductively through thematic analysis to develop a conceptual framework drawing on several theoretical lenses. Moreover, framing the expected emergency plan as improvised business process models involves abundant unstructured data that require abstraction and simplification. The structured methods of Qualitative Regression Analysis (QRA) and the Work System snapshot were applied to shape the data rigorously into the proposed model. These methods helped organize and summarize the snapshot into an 'as-is' work system, from which a 'to-be' work system is recommended for business process modelling. We conclude that these methods are useful for developing a comprehensive and structured research framework for future enhancement in business process simulation. (author)

  9. Exact Analysis of Squared Cross-Validity Coefficient in Predictive Regression Models

    Science.gov (United States)

    Shieh, Gwowen

    2009-01-01

    In regression analysis, the notion of population validity is of theoretical interest for describing the usefulness of the underlying regression model, whereas the presumably more important concept of population cross-validity represents the predictive effectiveness for the regression equation in future research. It appears that the inference…

  10. The utility of regression-based norms in interpreting the minimal assessment of cognitive function in multiple sclerosis (MACFIMS).

    Science.gov (United States)

    Parmenter, Brett A; Testa, S Marc; Schretlen, David J; Weinstock-Guttman, Bianca; Benedict, Ralph H B

    2010-01-01

    The Minimal Assessment of Cognitive Function in Multiple Sclerosis (MACFIMS) is a consensus neuropsychological battery with established reliability and validity. One of the difficulties in implementing the MACFIMS in clinical settings is the reliance on manualized norms from disparate sources. In this study, we derived regression-based norms for the MACFIMS, using a unique data set to control for standard demographic variables (i.e., age, age², sex, education). Multiple sclerosis (MS) patients (n = 395) and healthy volunteers (n = 100) did not differ in age, level of education, sex, or race. Multiple regression analyses were conducted on the performance of the healthy adults, and the resulting models were used to predict MS performance on the MACFIMS battery. This regression-based approach identified higher rates of impairment than manualized norms for many of the MACFIMS measures. These findings suggest that there are advantages to developing new norms from a single sample using the regression-based approach. We conclude that the regression-based norms presented here provide a valid alternative for identifying cognitive impairment as measured by the MACFIMS. PMID:19796441
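    The regression-based norming idea can be sketched, assuming NumPy: fit a regression on healthy controls using age, age², sex and education, then express each patient's score as a z-score relative to the demographically predicted value. The function names and simulated data are hypothetical, and any impairment cutoff on z would be a separate choice not taken from the source.

```python
import numpy as np


def norm_design(age, sex, educ):
    """Demographic predictors: intercept, age, age^2, sex (0/1), years of education."""
    age = np.asarray(age, dtype=float)
    return np.column_stack([np.ones(age.size), age, age ** 2, sex, educ])


def fit_norms(age, sex, educ, score):
    """Fit the normative regression on healthy controls; return coefficients and residual SD."""
    X = norm_design(age, sex, educ)
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    resid = score - X @ beta
    sd = np.sqrt(resid @ resid / (len(score) - X.shape[1]))
    return beta, sd


def norm_z(beta, sd, age, sex, educ, score):
    """Demographically adjusted z-score: (observed - predicted) / residual SD."""
    return (np.asarray(score, dtype=float) - norm_design(age, sex, educ) @ beta) / sd


# Hypothetical healthy-control data for one test of the battery
rng = np.random.default_rng(6)
n = 100
age = rng.uniform(20, 65, n)
sex = rng.integers(0, 2, n)
educ = rng.uniform(10, 20, n)
score = 60 - 0.3 * age + 1.2 * educ + rng.normal(0, 5, n)
beta, sd = fit_norms(age, sex, educ, score)

# A hypothetical patient scoring 2 residual SDs below the predicted value
patient = norm_design([45.0], [1], [16.0]) @ beta - 2.0 * sd
print(norm_z(beta, sd, [45.0], [1], [16.0], patient))  # approximately -2
```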

  11. Multiple Regression and Mediator Variables can be used to Avoid Double Counting when Economic Values are Derived using Stochastic Herd Simulation

    DEFF Research Database (Denmark)

    Østergaard, Søren; Ettema, Jehan Frans

    Multiple regression and model building with mediator variables were used to avoid double counting when economic values are estimated from data simulated with herd simulation modeling (using the SimHerd model). The simulated incidence of metritis was analyzed statistically as the independent variable, while the traits representing the direct effects of metritis on yield, fertility and occurrence of other diseases were used as mediator variables. The economic value of metritis was estimated to be €78 per 100 cow-years for each 1% increase of metritis in the period of 1-100 days in milk in multiparous cows. The merit of this approach was demonstrated by the finding that the economic value of metritis was estimated to be 81% higher when no mediator variables were included in the multiple regression analysis.

  12. Adaptive regression analysis: theory and applications in econometrics

    Directory of Open Access Journals (Sweden)

    J. García Pérez

    2003-01-01

    Full Text Available In this work we (a) discuss some theoretical and computational difficulties in the regression analysis of dependences describing the behaviour of heterogeneous systems, (b) offer a set of new techniques adapted to the regression analysis of heterogeneous dependences, and (c) demonstrate the advantages of applying these new techniques in econometrics.

  13. Use of a neural network and a multiple regression model to predict histologic grade of astrocytoma from MRI appearances

    International Nuclear Information System (INIS)

    Several MRI features of supratentorial astrocytomas are associated with high histologic grade by statistically significant p values. We sought to apply this information prospectively to a group of astrocytomas in the prediction of tumor grade. We used 10 MRI features of fibrillary astrocytomas from 52 patient studies to develop neural network and multiple linear regression models for practical use in predicting tumor grade. The models were tested prospectively on MR images from 29 patient studies. The performance of the models was compared against that of a radiologist. Neural network accuracy was 61 % in distinguishing between low and high grade tumors. Multiple linear regression achieved an accuracy of 59 %. Assessment of the images by a radiologist yielded 57 % accuracy. We conclude that while certain MRI parameters may be statistically related to astrocytoma histologic grade, neural network and linear regression models cannot reliably use them to predict tumor grade. (orig.)

  14. The Overall Odds Ratio as an Intuitive Effect Size Index for Multiple Logistic Regression: Examination of Further Refinements

    Science.gov (United States)

    Le, Huy; Marcus, Justin

    2012-01-01

    This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…

  15. Estimating the Coefficient of Cross-validity in Multiple Regression: A Comparison of Analytical and Empirical Methods.

    Science.gov (United States)

    Kromrey, Jeffrey D.; Hines, Constance V.

    1996-01-01

    The accuracy of three analytical formulas for shrinkage estimation and four empirical techniques was investigated in a Monte Carlo study of the coefficient of cross-validity in multiple regression. Substantial statistical bias was evident for all techniques except the formula of M. W. Browne (1975) and multicross-validation. (SLD)

  16. Maternal multiple micronutrient supplementation and pregnancy outcomes in developing countries: meta-analysis and meta-regression

    Scientific Electronic Library Online (English)

    Kawai, Kosuke; Spiegelman, Donna; Shankar, Anuraj H; Fawzi, Wafaie W.

    2011-06-01

    Full Text Available OBJECTIVE: To systematically review randomized controlled trials comparing the effect of supplementation with multiple micronutrients versus iron and folic acid on pregnancy outcomes in developing countries. METHODS: MEDLINE and EMBASE were searched. Outcomes of interest were birth weight, low birth weight, small size for gestational age, perinatal mortality and neonatal mortality. Pooled relative risks (RRs) were estimated by random effects models. Sources of heterogeneity were explored through subgroup meta-analyses and meta-regression. FINDINGS: Multiple micronutrient supplementation was more effective than iron and folic acid supplementation at reducing the risk of low birth weight (RR: 0.86; 95% confidence interval, CI: 0.79-0.93) and of small size for gestational age (RR: 0.85; 95% CI: 0.78-0.93). Micronutrient supplementation had no overall effect on perinatal mortality (RR: 1.05; 95% CI: 0.90-1.22), although substantial heterogeneity was evident (I²=58%; P for heterogeneity=0.008). Subgroup and meta-regression analyses suggested that micronutrient supplementation was associated with a lower risk of perinatal mortality in trials in which >50% of mothers had formal education (RR: 0.93; 95% CI: 0.82-1.06) or in which supplementation was initiated after a mean of 20 weeks of gestation (RR: 0.88; 95% CI: 0.80-0.97). CONCLUSION: Maternal education or gestational age at initiation of supplementation may have contributed to the observed heterogeneous effects on perinatal mortality. The safety, efficacy and effective delivery of maternal micronutrient supplementation require further research.
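    Pooled relative risks from a random-effects model can be sketched with the DerSimonian-Laird estimator, a common choice; the abstract does not state which estimator was used, so this is an assumption. The sketch assumes NumPy, takes per-trial log relative risks with their standard errors, and also returns I² computed from Cochran's Q. The input values in the example are made up.

```python
import numpy as np


def random_effects_pool(log_rr, se, z=1.96):
    """DerSimonian-Laird random-effects pooling of log relative risks.

    Returns (pooled RR, (lower, upper) 95% CI, tau^2, I^2).
    """
    y = np.asarray(log_rr, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    w = 1.0 / v                                # fixed-effect (inverse-variance) weights
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)         # Cochran's Q
    k = y.size
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)         # between-trial variance
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    w_re = 1.0 / (v + tau2)                    # random-effects weights
    mu = np.sum(w_re * y) / np.sum(w_re)
    se_mu = np.sqrt(1.0 / np.sum(w_re))
    ci = (np.exp(mu - z * se_mu), np.exp(mu + z * se_mu))
    return np.exp(mu), ci, tau2, i2


# Three hypothetical trials with clearly different effects
rr, ci, tau2, i2 = random_effects_pool([-0.5, 0.0, 0.5], [0.1, 0.1, 0.1])
print(round(tau2, 2), round(i2, 2))  # 0.24 0.96
```

    A large I², as in this toy example, is the kind of heterogeneity that motivates the subgroup and meta-regression analyses described above.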

  17. Regression analysis of mean lifetime: exploring nonlinear relationship with heteroscedasticity.

    Science.gov (United States)

    Sun, Zhiping; Jiang, Wenxin

    2007-01-01

    As a generalization of the accelerated failure time models, we consider parametric models of lifetime Y, where the conditional mean E(Y|X;beta) can depend nonlinearly on the covariates X and some parameters beta. The error distribution can be heteroscedastic and dependent on X. With observed data subject to right censoring, we propose regression analysis for beta based on Kaplan-Meier estimates of the means over several regions of X. Consistency and asymptotic distributional properties of the estimators are established under general conditions. A resulting estimator of beta is shown to be the sum of two possibly dependent asymptotically normal quantities, based on which conservative confidence intervals and tests are derived. Simulation studies are conducted to investigate the performance of the proposed estimator and to compare it with the Buckley-James method. To illustrate the methodology, we study an example with kidney transplant data, where a nonlinear relationship called "mixtures-of-experts", proposed in the neural networks literature, is used to model the relationship between the survival time and the age of the patients. PMID:22550647

  18. Improved performance of a two-element TLD badge for determining gamma and beta doses using multiple linear regression

    International Nuclear Information System (INIS)

    The gamma/beta TLD badge used by OPPD consists of two TLD-700 chips (Harshaw G7 card), one of which (chip #2) is shielded by a 0.102 cm-thick aluminum filter, while the other (chip #1) is unshielded, as shown in Fig. 1. The standard procedure had been to determine the beta dose to the badge by subtracting the response of chip #2 from that of chip #1 and then dividing by a calibrated beta-sensitivity factor; the gamma dose was taken to be the response of chip #2 divided by the chip's gamma-sensitivity factor, followed by subtraction of the background dose. A problem with this procedure is that energetic beta particles penetrate the aluminum filter on chip #2, causing an over-response. Because of the subtraction used to obtain the beta dose, this also results in an underestimate of the beta dose. This problem has been corrected by applying multiple linear regression analysis to a large database of pure gamma (137Cs), pure beta (90Sr), and mixed exposures. The outcome of the analysis is an algorithm that automatically corrects for penetration effects. Performance tests using the ANSI N13.11 standard are presented to show the improvement.

  19. Comparison of Hyperbolic and Constant Width Simultaneous Confidence Bands in Multiple Linear Regression under MVCS Criterion

    OpenAIRE

    Liu, W.; Hayter, A. J.; Piegorsch, W. W.

    2009-01-01

    A simultaneous confidence band provides useful information on the plausible range of the unknown regression model, and different confidence bands can often be constructed for the same regression model. For a simple regression line, it is proposed in Liu and Hayter (2007) to use the area of the confidence set that corresponds to a confidence band as an optimality criterion in comparison of confidence bands; the smaller is the area of the confidence set, the better is the corresponding confiden...

  20. Change Impact Analysis Based Regression Testing of Web Services

    OpenAIRE

    Chaturvedi, Animesh

    2014-01-01

    Reducing the effort required to make changes in web services is one of the primary goals in web service projects maintenance and evolution. Normally, functional and non-functional testing of a web service is performed by testing the operations specified in its WSDL. The regression testing is performed by identifying the changes made thereafter to the web service code and the WSDL. In this thesis, we present a tool-supported approach to perform efficient regression testing of...

  1. Mendelian Randomization Analysis With Multiple Genetic Variants Using Summarized Data

    OpenAIRE

    Burgess, Stephen; Butterworth, Adam; Thompson, Simon G

    2013-01-01

    Genome-wide association studies, which typically report regression coefficients summarizing the associations of many genetic variants with various traits, are potentially a powerful source of data for Mendelian randomization investigations. We demonstrate how such coefficients from multiple variants can be combined in a Mendelian randomization analysis to estimate the causal effect of a risk factor on an outcome. The bias and efficiency of estimates based on summarized data are compared to th...
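    One common way to combine summarized coefficients from several variants is the inverse-variance weighted (IVW) estimator, which weights each variant's ratio of outcome association to risk factor association by its precision. A minimal sketch, assuming NumPy, uncorrelated variants and a first-order standard error; it is not necessarily the exact estimator evaluated in the paper, and the numbers are hypothetical.

```python
import numpy as np


def ivw_estimate(beta_x, beta_y, se_y):
    """Inverse-variance weighted causal estimate from summarized associations.

    beta_x: variant-risk factor coefficients; beta_y, se_y: variant-outcome
    coefficients and their standard errors. Returns (estimate, standard error).
    """
    bx, by, sy = (np.asarray(a, dtype=float) for a in (beta_x, beta_y, se_y))
    denom = np.sum(bx ** 2 / sy ** 2)
    estimate = np.sum(bx * by / sy ** 2) / denom
    return estimate, 1.0 / np.sqrt(denom)  # first-order standard error


# Three hypothetical variants whose outcome effects are half their risk factor effects
est, se = ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.01, 0.01])
print(est, se)  # about 0.5 and 0.0267
```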

  2. The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard

    2013-01-01

    This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression function. However, the a priori specification of a functional form involves the risk of choosing one that is not similar to the “true” but unknown relationship between the regressors and the dependent vari...
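
    In contrast to the parametric approach, a nonparametric kernel regression estimates E[y | x] without a prespecified functional form. The sketch below uses the Nadaraya-Watson (local-constant) estimator with a Gaussian kernel and a user-chosen bandwidth; both choices are assumptions for illustration, not necessarily the estimators used in the thesis.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x_train, y_train, x0, bandwidth):
    """Local-constant kernel regression estimate of E[y | x = x0]."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [gaussian_kernel((xi - x0) / bandwidth) for xi in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    # Weighted average of the responses, with weights decaying away from x0.
    return sum(w * yi for w, yi in zip(weights, y_train)) / total
```

    Smaller bandwidths track the data more closely at the cost of higher variance; in practice the bandwidth is usually chosen by a data-driven rule such as cross-validation.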

  3. Linear Maximum Likelihood Regression Analysis for Untransformed Log-Normally Distributed Data

    OpenAIRE

    Sara M. Gustavsson; Sandra Johannesson; Gerd Sallsten; Eva M. Andersson

    2012-01-01

    Medical research data are often skewed and heteroscedastic. It has therefore become practice to log-transform data in regression analysis, in order to stabilize the variance. Regression analysis on log-transformed data estimates the relative effect, whereas it is often the absolute effect of a predictor that is of interest. We propose a maximum likelihood (ML)-based approach to estimate a linear regression model on log-normal, heteroscedastic data. The new method was evaluated with a large si...
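
    The distinction between relative and absolute effects can be made concrete for the conventional log-transformed model, log Y = a + b x + e with e ~ N(0, sigma^2), under which E[Y | x] = exp(a + b x + sigma^2 / 2). This sketch illustrates that standard model only, not the authors' proposed ML method; the parameter values in the test are hypothetical.

```python
import math


def lognormal_mean(a, b, sigma, x):
    """E[Y | x] when log Y = a + b*x + e with e ~ N(0, sigma^2)."""
    return math.exp(a + b * x + 0.5 * sigma ** 2)


def unit_effects(a, b, sigma, x):
    """Relative (ratio) and absolute (difference) effect of raising x by one unit."""
    m0 = lognormal_mean(a, b, sigma, x)
    m1 = lognormal_mean(a, b, sigma, x + 1.0)
    return m1 / m0, m1 - m0
```

    A one-unit increase in x multiplies the mean by exp(b) at every x, but the absolute change in the mean grows with x, which is why the relative effect alone can mislead when the absolute effect is of interest.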

  4. Varying-coefficient functional linear regression

    OpenAIRE

    Wu, Yichao; Fan, Jianqing; Müller, Hans-Georg

    2011-01-01

    Functional linear regression analysis aims to model regression relations which include a functional predictor. The analog of the regression parameter vector or matrix in conventional multivariate or multiple-response linear regression models is a regression parameter function in one or two arguments. If, in addition, one has scalar predictors, as is often the case in applications to longitudinal studies, the question arises how to incorporate these into a functional regressi...

  5. Multiple logistic regression model of signalling practices of drivers on urban highways

    Science.gov (United States)

    Puan, Othman Che; Ibrahim, Muttaka Na'iya; Zakaria, Rozana

    2015-05-01

    Signalling is a way of informing other road users, especially conflicting drivers, of a driver's intention to change course. Other road users are exposed to hazardous situations and to the risk of accidents if a driver who changes course fails to signal as required. This paper describes the application of a logistic regression model to the analysis of drivers' signalling practices on multilane highways, based on possible factors affecting a driver's decision, such as the driver's gender, vehicle type, vehicle speed, and traffic flow intensity. Data on these factors were collected manually. More than 2000 drivers who performed a lane-changing manoeuvre on two sections of multilane highways were observed. The study found that a relatively large proportion of drivers failed to give any signal when changing lanes. Although the proportion of drivers who failed to signal before a lane-changing manoeuvre is high, female drivers complied better than male drivers. A binary logistic model was developed to represent the probability that a driver signals before a lane-changing manoeuvre. The model indicates that the driver's gender, the type of vehicle driven, vehicle speed, and traffic volume influence the driver's decision to signal before changing lanes on a multilane urban highway. In terms of vehicle type, about 97% of motorcyclists failed to comply with the signalling requirement. The proportion of non-compliant drivers under stable traffic flow conditions is much higher than when the flow is relatively heavy. This is consistent with the data, which indicate a high degree of non-compliance when the average speed of the traffic stream is relatively high.
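
    A binary logistic model of the kind described maps a linear predictor to a probability through the logistic function. The sketch below assumes a hypothetical coding of the factors (for example 0/1 indicators for female drivers and for motorcycles, speed in km/h, flow in vehicles per hour); the coefficients are placeholders, since the paper's estimates are not reported here.

```python
import math


def logistic(eta):
    """Numerically stable logistic function 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # avoids overflow of exp(-eta) for very negative eta
    return z / (1.0 + z)


def signal_probability(coefs, features):
    """P(driver signals) under a binary logistic model.

    coefs:    intercept followed by one coefficient per feature (hypothetical values)
    features: predictor values in the same order as coefs[1:]
    """
    if len(coefs) != len(features) + 1:
        raise ValueError("need an intercept plus one coefficient per feature")
    eta = coefs[0] + sum(b * x for b, x in zip(coefs[1:], features))
    return logistic(eta)


def odds_ratio(coef):
    """Multiplicative change in the odds of signalling per one-unit increase."""
    return math.exp(coef)
```

    With invented coefficients, `signal_probability([1.2, 0.6, -2.5, -0.02, 0.0004], [1, 0, 80.0, 1500.0])` would give the modelled probability that a female car driver at 80 km/h in a flow of 1500 veh/h signals, and `odds_ratio(0.6)` the odds multiplier for the female indicator.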

  6. Buffalos milk yield analysis using random regression models

    Directory of Open Access Journals (Sweden)

    A.S. Schierholt

    2010-02-01

    Data comprising 1,719 milk yield records from 357 females (predominantly of the Murrah breed), daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL) and from records of the EMBRAPA Amazônia Oriental (EAO) herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed with different models using Legendre polynomial functions of second to fourth order. The random regression models included the effects of herd-year and month of parity date of the control; regression coefficients for age of females (to describe the fixed part of the lactation curve); and random regression coefficients related to the direct genetic and permanent environment effects. The models were compared using the Akaike Information Criterion. The random regression model using third-order Legendre polynomials with four classes of the environmental effect was the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields at younger ages was close to unity, but at older ages it was low.
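
    Random regression models of this kind typically use Legendre polynomials of a standardized time covariate (age or days in milk rescaled to [-1, 1]) as regressors. The sketch below computes normalized Legendre covariates via Bonnet's recurrence; the scaling factor sqrt((2n + 1) / 2) is a convention often used in animal-breeding applications, and the age range in the test is hypothetical. Because "order" is used inconsistently in the literature, `n_coef` here simply counts polynomials (degrees 0 to n_coef - 1).

```python
import math


def standardize(t, t_min, t_max):
    """Linearly map t from [t_min, t_max] onto [-1, 1]."""
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)


def legendre_covariates(t, t_min, t_max, n_coef):
    """Normalized Legendre polynomials of degrees 0..n_coef-1 at standardized t.

    Uses Bonnet's recurrence (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
    and scales each P_n by sqrt((2n + 1) / 2).
    """
    if n_coef < 1:
        raise ValueError("n_coef must be at least 1")
    x = standardize(t, t_min, t_max)
    p = [1.0, x]
    for n in range(1, n_coef - 1):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return [math.sqrt((2 * n + 1) / 2.0) * pn for n, pn in enumerate(p[:n_coef])]
```

    Each record at a given age then contributes one row of these covariates to the fixed and random regression parts of the model.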

  7. MIMCA: Multiple imputation for categorical variables with multiple correspondence analysis

    OpenAIRE

    Audigier, Vincent; Husson, François; Josse, Julie

    2015-01-01

    We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal components method dedicated to categorical data: multiple correspondence analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating a small number of parameters due to the dimensionality reduction property of ...

  8. Use of multiple regression models in the study of sandhopper orientation under natural conditions

    Science.gov (United States)

    Marchetti, Giovanni M.; Scapini, Felicita

    2003-10-01

    In sandhoppers (Amphipoda; Talitridae), typical dwellers of the supralittoral zone of sandy beaches, orientation with respect to the sun and landscape vision is adapted to the local direction of the shoreline. Variation in this behavioural adaptation can be related to the characteristics of the beach. Measures of orientation with respect to the shoreline direction can thus be used as a tool to assess beach stability versus changeability, once the sources of variation are correctly interpreted. Orientation of animals can be studied by statistical analysis of the directions taken after release in nature. In this paper some new tools for exploring directional data are reviewed, with special emphasis on non-parametric smoothers and regression models. Results from a large study of one species of sandhopper, Talitrus saltator (Montagu), from an exposed sandy beach in northeastern Tunisia are presented. Seasonal differences in orientation behaviour were shown, with higher scatter in autumn than in spring. The higher scatter in autumn depended on both intrinsic (sex) and external (climatic conditions and landscape visibility) factors and was related to the tendency of this species to migrate towards the dune in anticipation of winter conditions.
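
    Scatter in directional data such as release directions is commonly summarized by the mean resultant length R-bar: 1 for perfectly concentrated directions, near 0 for widely scattered ones. This is a minimal sketch of those standard circular summaries, not of the smoothers or regression models reviewed in the paper; angles are assumed to be in degrees.

```python
import math


def circular_summary(angles_deg):
    """Mean direction (degrees in [0, 360)) and mean resultant length R-bar."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    r_bar = math.hypot(c, s)  # 1 - r_bar is the circular variance
    mean_dir = math.degrees(math.atan2(s, c)) % 360.0
    return mean_dir, r_bar
```

    Averaging unit vectors rather than raw angles avoids the wrap-around problem: directions of 350 and 10 degrees average to about 0 degrees, not 180.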

  9. INFLUENCE OF TOURISM SECTOR IN ALBANIAN GDP: ESTIMATION USING MULTIPLE REGRESSION METHOD

    Directory of Open Access Journals (Sweden)

    Eglantina HYSA

    2012-06-01

    In recent years, the tourism sector has grown significantly in Albania, which since 1990 has moved from a centralized economy to a liberal one. The tourism sector plays an important role in economic and social development, and its contributions are reflected directly in the generation of national income. The two main components matching tourism movements are the number of tourists and the number of overnight stays in hotels. Investments in this sector could be expected to have a strong positive influence on the country's GDP. This study seeks to identify the influence of tourists, their overnight stays in hotels, and capital investment spending by all sectors directly involved in tourism on the total contribution of tourism to the gross domestic product of Albania during 1996-2009. A regression analysis was performed taking GDP generated by the tourism sector as the dependent variable and capital investment, tourist numbers, and overnight stays in hotels as the independent variables. Although all the variables were found to be positively related, the variable ‘overnights of foreigners and Albanians in hotels’ was found to be insignificant.

  10. Unplanned dilution and ore loss prediction in longhole stoping mines via multiple regression and artificial neural network analyses

    Scientific Electronic Library Online (English)

    Jang, H.; Topal, E.; Kawamura, Y.

    2015-05-01

    Unplanned dilution and ore loss directly influence not only the productivity of underground stopes but also the profitability of the entire mining process. Stope dilution results from complex interactions among a number of factors and cannot be predicted prior to mining. In this study, unplanned dilution and ore loss prediction models were established using multiple linear and nonlinear regression analysis (MLRA and MNRA), as well as an artificial neural network (ANN) method, based on 1067 datasets with ten causative factors from three underground longhole stoping mines in Western Australia. Models were established for individual mines, as well as a general model that includes all of the mine datasets. The correlation coefficient (R) was used to evaluate the methods; for the general model, the values for MLRA, MNRA, and ANN were 0.419, 0.438, and 0.719, respectively. Considering that the current unplanned dilution and ore loss prediction for the mines investigated yielded an R of 0.088, the ANN model results are noteworthy. The proposed ANN model can be used directly as a practical tool to predict unplanned dilution and ore loss in mines, which will not only enhance productivity but will also be beneficial for stope planning and design.
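
    The correlation coefficient R used to compare the models can be computed as below. This is a plain-Python sketch for illustration, not the authors' code; on Python 3.10+, `statistics.correlation` computes the same Pearson coefficient.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mo = sum(observed) / n
    mp = sum(predicted) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy / math.sqrt(sxx * syy)
```

    Note that R measures linear association rather than agreement: predictions that are consistently double the observed values still give R = 1, so R is often reported alongside an error measure.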

  11. Analysis on Train Stopping Accuracy based on Regression Algorithms

    Directory of Open Access Journals (Sweden)

    Lin Ma

    2014-05-01

    Stopping accuracy is one of the most important indexes of the efficiency of automatic train operation (ATO) systems. Traditional stopping control algorithms in ATO systems have some drawbacks, as many factors are not taken into account. Large amounts of field-collected data on stopping accuracy contain many factors (e.g. system delays, stopping time, net pressure) that affect stopping accuracy. In this paper, three popular data mining methods are proposed to analyze train stopping accuracy. First, we identify fifteen factors that have an impact on stopping accuracy. Then, ridge regression, lasso regression, and elastic net regression are employed to build models reflecting the relationship between the fifteen factors and stopping accuracy. The three models are then compared using the Akaike information criterion (AIC), a model selection criterion that considers the trade-off between accuracy and complexity. The computational results show that the elastic net regression model performs best in terms of AIC. Finally, we obtain the parameters that make the train stop more accurately, which can serve as a reference for improving the stopping accuracy of ATO systems.
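
    For least-squares models with Gaussian errors, AIC can be written (up to an additive constant) as n ln(RSS / n) + 2k. The sketch below ranks candidate models by that formula. It assumes each model is summarized by its residual sum of squares and a parameter count k; for penalized fits such as ridge, lasso, and elastic net, k is usually replaced by an effective degrees-of-freedom estimate, and the abstract does not say which the authors used. The RSS and k values in the test are invented.

```python
import math


def gaussian_aic(rss, n, k):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant.

    rss: residual sum of squares; n: number of observations;
    k: number of (effective) parameters.
    """
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    return n * math.log(rss / n) + 2 * k


def best_by_aic(candidates, n):
    """Name of the candidate with the smallest AIC; candidates maps name -> (rss, k)."""
    scores = {name: gaussian_aic(rss, n, k) for name, (rss, k) in candidates.items()}
    return min(scores, key=scores.get)
```

    Only differences in AIC between models fitted to the same data are meaningful, which is why the additive constant can be dropped.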

  12. Permutation-Based Adjustments for the Significance of Partial Regression Coefficients in Microarray Data Analysis

    OpenAIRE

    Wagner, Brandie D.; Zerbe, Gary O; Mexal, Sharon; Leonard, Sherry S.

    2008-01-01

    The aim of this paper is to generalize permutation methods for multiple testing adjustment of significant partial regression coefficients in a linear regression model used for microarray data. Using a permutation method outlined by Anderson and Legendre [1999] and the permutation P-value adjustment from Simon et al. [2004], the significance of disease related gene expression will be determined and adjusted after accounting for the effects of covariates, which are not restricted to be categori...

  13. Correlating phosphoproteomic signaling to castration resistant prostate cancer survival through regression analysis

    OpenAIRE

    Lescarbeau, Reynald; Kaplan, David L.

    2014-01-01

    Prostate cancer most commonly presents as initially castration dependent; however, in a minority of patients the disease progresses to a castration-resistant state. Here, approaches to correlating alterations in the phosphoproteome to androgen-independent cell survival in the LNCaP, PC3, and MDa-PCa-2b cell lines are discussed. The performance of four regression techniques (multiple linear, ridge, principal component, and partial least squares regression) is compared. The predictive perfo...

  14. Multi-stratified multiple regression tests of the linear/no-threshold theory of radon-induced lung cancer

    International Nuclear Information System (INIS)

    A plot of lung-cancer rates versus radon exposures in 965 US counties, or in all US states, has a strong negative slope, b, in sharp contrast to the strong positive slope predicted by linear/no-threshold theory. The discrepancy between these slopes exceeds 20 standard deviations (SD). Including smoking frequency in the analysis substantially improves fits to a linear relationship but has little effect on the discrepancy in b, because correlations between smoking frequency and radon levels are quite weak. Including 17 socioeconomic variables (SEV) in multiple regression analysis reduces the discrepancy to 15 SD. Data were divided into segments by stratifying on each SEV in turn, on geography, and on both simultaneously, giving over 300 data sets to be analyzed individually, but negative slopes predominated. The slope is negative whether one considers only the most urban counties or only the most rural; only the richest or only the poorest; only the richest in the South Atlantic region or only the poorest in that region, and so on; and for all the strata in between. Since this is an ecological study, the well-known problems with ecological studies were investigated and found not to be applicable here. The “ecological fallacy” was shown not to apply in testing a linear/no-threshold theory, and the vulnerability to confounding is greatly reduced when confounding factors are only weakly correlated with radon levels, as is generally the case here. All confounding factors known to correlate with radon and with lung cancer were investigated quantitatively and found to have little effect on the discrepancy.
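
    The discrepancy "in standard deviations" can be illustrated for a single stratum: fit the least-squares slope b with its standard error, then express the gap to a theoretically predicted slope in units of that standard error. This sketch assumes simple (one-predictor) regression and treats the predicted slope as fixed; it does not reproduce the paper's multi-variable, multi-stratum analysis, and the test data are made up.

```python
import math


def ols_slope(x, y):
    """Least-squares slope of y on x and its standard error."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length sequences with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance uses n - 2 d.o.f.
    return b, se


def discrepancy_in_sd(b_observed, se_observed, b_predicted):
    """Number of standard errors separating the observed slope from a predicted one."""
    return (b_predicted - b_observed) / se_observed
```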

  15. A Meta-Regression Analysis of Forest Carbon Offset Costs

    OpenAIRE

    Kooten, G.C. van; Laaksonen-Craig, S.; Wang, Y

    2009-01-01

    The main focus of efforts to mitigate climate change is on the avoidance of fossil fuel emissions. However, the Kyoto Protocol rules permit the use of forestry activities that create carbon offset credits. These could obviate the need for lifestyle-changing reductions in fossil fuel use. For policy purposes, it is therefore necessary to determine the cost-effectiveness of creating forest sink carbon credits. In this study, meta-regression analyses with 1047 observations from 68 studies are use...

  16. Model performance analysis and model validation in logistic regression

    Directory of Open Access Journals (Sweden)

    Rosa Arboretti Giancristofaro

    2007-10-01

    In this paper a new model validation procedure for a logistic regression model is presented. First, we briefly review different model validation techniques. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for assessing the performance of a given model, using an example taken from a management study.

  17. Are Fiscal Multipliers Regime-Dependent? A Meta Regression Analysis

    OpenAIRE

    Gechert, Sebastian; Rannenberg, Ansgar

    2014-01-01

    The study examines whether fiscal multiplier effects are systematically larger in downturns than in upturns. To this end, it carries out a meta-regression analysis that evaluates a novel dataset of 98 empirical studies with over 1800 observations of multiplier effects and controls for the regime dependence of multipliers. It finds that spending multipliers are 0.6 to 0.8 points higher in downturns. Moreover, spending-side [multipliers] exceed...

  18. Evaluating Geographically Weighted Regression Models for Environmental Chemical Risk Analysis

    OpenAIRE

    Czarnota, Jenna; Wheeler, David C.; Gennings, Chris

    2015-01-01

    In the evaluation of cancer risk related to environmental chemical exposures, the effect of many correlated chemicals on disease is often of interest. The relationship between correlated environmental chemicals and health effects is not always constant across a study area, as exposure levels may change spatially due to various environmental factors. Geographically weighted regression (GWR) has been proposed to model spatially varying effects. However, concerns about collinearity effects, incl...
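
    Geographically weighted regression fits a separate weighted least-squares regression at each location, with weights that decay with distance. The sketch below estimates a local intercept and slope for a single predictor using a fixed Gaussian kernel; the single predictor, the kernel, and the bandwidth are simplifying assumptions, not the specification evaluated in the paper. `math.dist` requires Python 3.8+.

```python
import math


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Local intercept and slope at `target` by geographically weighted least squares.

    coords: (easting, northing) pairs; x, y: predictor and response values.
    Weights use a fixed Gaussian kernel, w_i = exp(-0.5 * (d_i / bandwidth) ** 2).
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        raise ValueError("no weighted variation in x near the target")
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope
```

    A smaller bandwidth lets the local coefficients vary more sharply across space, at the cost of fitting each location on fewer effective observations.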

  19. Analysis of some methods for reduced rank Gaussian process regression

    DEFF Research Database (Denmark)

    Quinonero-Candela, J.; Rasmussen, Carl Edward

    2005-01-01

    While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While GPs are in general equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists of learning both the covariance function hyperparameters and the support set. We propose a method for learning hyperparameters for a given support set. We also review the Sparse Greedy GP (SGGP) approximation (Smola and Bartlett, 2001), which is a way of learning the support set for given hyperparameters based on approximating the posterior. We propose an alternative method to the SGGP that has better generalization capabilities. Finally, we perform experiments to compare the different ways of training an RRGP. We provide some Matlab code for learning RRGPs.

  20. BRGLM, Interactive Linear Regression Analysis by Least Square Fit

    International Nuclear Information System (INIS)

    1 - Description of program or function: BRGLM is an interactive program written to fit general linear regression models by least squares and to provide a variety of statistical diagnostic information about the fit. Stepwise and all-subsets regression can also be carried out. There are facilities for interactive data management (e.g. setting missing-value flags, data transformations) and tools for constructing design matrices for the more commonly used models, such as factorials, cubic splines, and autoregressions.

    2 - Method of solution: The least squares computations are based on the orthogonal (QR) decomposition of the design matrix obtained using the modified Gram-Schmidt algorithm.

    3 - Restrictions on the complexity of the problem: The current release of BRGLM allows maxima of 1000 observations, 99 variables, and 3000 words of main memory workspace. For a problem with N observations and P variables, the number of words of main memory storage required is MAX(N*(P+6), N*P+P*P+3*N, 3*P*P+6*N). Any linear model may be fit, although the in-memory workspace will have to be increased for larger problems.
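
    The method of solution described above (least squares via a QR decomposition computed with modified Gram-Schmidt) can be sketched in plain Python. This is an illustrative reimplementation assuming a full-rank design matrix stored as a list of columns, not BRGLM's own code; a production version would test rank deficiency against a tolerance rather than exact zero.

```python
def mgs_qr(columns):
    """QR factorization by modified Gram-Schmidt.

    columns: the design matrix as a list of columns (each a list of floats).
    Returns (q, r): q is a list of orthonormal columns, r an upper-triangular
    matrix stored as a list of rows.
    """
    p = len(columns)
    v = [list(col) for col in columns]  # working copies, updated in place
    q = []
    r = [[0.0] * p for _ in range(p)]
    for j in range(p):
        norm = sum(t * t for t in v[j]) ** 0.5
        if norm == 0.0:
            raise ValueError("design matrix is rank deficient")
        r[j][j] = norm
        q.append([t / norm for t in v[j]])
        # "Modified" step: immediately remove the q_j component from every
        # remaining column, using the already-updated columns.
        for k in range(j + 1, p):
            r[j][k] = sum(a * b for a, b in zip(q[j], v[k]))
            v[k] = [a - r[j][k] * b for a, b in zip(v[k], q[j])]
    return q, r


def least_squares(columns, y):
    """Coefficients minimizing ||X b - y||, via R b = Q^T y and back-substitution."""
    q, r = mgs_qr(columns)
    p = len(columns)
    qty = [sum(a * b for a, b in zip(q[j], y)) for j in range(p)]
    b = [0.0] * p
    for j in reversed(range(p)):
        b[j] = (qty[j] - sum(r[j][k] * b[k] for k in range(j + 1, p))) / r[j][j]
    return b
```

    For instance, `least_squares([[1.0] * 4, [0.0, 1.0, 2.0, 3.0]], [1.0, 3.0, 5.0, 7.0])` recovers an intercept of 1 and a slope of 2, up to rounding. Working on Q and R avoids forming X^T X, whose condition number is the square of that of X.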

  1. Detrended fluctuation analysis as a regression framework: Estimating dependence at different scales

    CERN Document Server

    Kristoufek, Ladislav

    2014-01-01

    We propose a novel framework combining detrended fluctuation analysis with standard regression methodology. The method is built on detrended variances and covariances and is designed to estimate regression parameters at different scales and under potential non-stationarity and power-law correlations. Selected examples from physics, finance, and environmental sciences illustrate the usefulness of the framework.
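
    One way to realize the scale-specific regression described above is to divide the detrended covariance of the two integrated series by the detrended variance of the explanatory series at scale s, beta(s) = F2_xy(s) / F2_x(s). The sketch assumes non-overlapping windows and linear detrending within each window; both are simplifying choices that may differ from the paper's exact construction, and the helper names are hypothetical.

```python
def _profile(series):
    """Cumulative sum of deviations from the mean (the integrated series)."""
    mean = sum(series) / len(series)
    out, total = [], 0.0
    for v in series:
        total += v - mean
        out.append(total)
    return out


def _detrend(segment):
    """Residuals of a least-squares straight line fitted against t = 0..s-1."""
    s = len(segment)
    t_mean = (s - 1) / 2.0
    y_mean = sum(segment) / s
    stt = sum((t - t_mean) ** 2 for t in range(s))
    sty = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(segment))
    slope = sty / stt
    return [v - (y_mean + slope * (t - t_mean)) for t, v in enumerate(segment)]


def dfa_beta(x, y, scale):
    """Scale-specific regression coefficient beta(s) = F2_xy(s) / F2_x(s)."""
    if len(x) != len(y) or scale < 3 or scale > len(x):
        raise ValueError("need equal-length series and 3 <= scale <= len(x)")
    px, py = _profile(x), _profile(y)
    fxy = fxx = 0.0
    # Normalizing constants are identical for both sums, so they cancel in the ratio.
    for start in range(0, len(x) - scale + 1, scale):
        rx = _detrend(px[start:start + scale])
        ry = _detrend(py[start:start + scale])
        fxy += sum(a * b for a, b in zip(rx, ry))
        fxx += sum(a * a for a in rx)
    return fxy / fxx
```

    Evaluating `dfa_beta` over a range of scales gives a profile of the dependence of y on x across time horizons.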

  2. Use of Structure Coefficients in Published Multiple Regression Articles: Beta Is Not Enough.

    Science.gov (United States)

    Courville, Troy; Thompson, Bruce

    2001-01-01

    Reviewed articles published in the "Journal of Applied Psychology" (JAP) to determine how interpretations might have differed if standardized regression coefficients and structure coefficients (or bivariate "r"s of predictors with the criterion) had been interpreted. Summarizes some dramatic misinterpretations or incomplete interpretations.…
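
    For a two-predictor model, the standardized (beta) weights and the structure coefficients can both be computed from the pairwise correlations: a structure coefficient is the predictor's correlation with the criterion divided by the multiple correlation R. The sketch below is illustrative and limited to two predictors; the function names are hypothetical.

```python
import math


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def beta_and_structure(x1, x2, y):
    """Standardized betas and structure coefficients for a two-predictor regression."""
    r1y, r2y, r12 = corr(x1, y), corr(x2, y), corr(x1, x2)
    denom = 1.0 - r12 ** 2
    if denom <= 0.0:
        raise ValueError("predictors are perfectly collinear")
    beta1 = (r1y - r2y * r12) / denom
    beta2 = (r2y - r1y * r12) / denom
    r_mult = math.sqrt(beta1 * r1y + beta2 * r2y)  # multiple correlation R
    if r_mult == 0.0:
        raise ValueError("R is zero; structure coefficients are undefined")
    return (beta1, beta2), (r1y / r_mult, r2y / r_mult)
```

    When two predictors are strongly correlated, one beta weight can be near zero while its structure coefficient remains large, which is the kind of misinterpretation the article warns about.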

  3. Estimation of streamflow, base flow, and nitrate-nitrogen loads in Iowa using multiple linear regression models

    Science.gov (United States)

    Schilling, K.E.; Wolter, C.F.

    2005-01-01

    Nineteen variables, including precipitation, soils and geology, land use, and basin morphologic characteristics, were evaluated to develop Iowa regression models to predict total streamflow (Q), base flow (Qb), storm flow (Qs), and base flow percentage (%Qb) in gauged and ungauged watersheds in the state. Discharge records from a set of 33 watersheds across the state for the 1980 to 2000 period were separated into Qb and Qs. Multiple linear regression found that 75.5 percent of long-term average Q was explained by rainfall, sand content, and row crop percentage variables, whereas 88.5 percent of Qb was explained by these three variables plus permeability and floodplain area variables. Qs was explained by average rainfall, and %Qb was a function of row crop percentage, permeability, and basin slope variables. Regional regression models developed for long-term average Q and Qb were adapted to annual rainfall and showed good correlation between measured and predicted values. By combining the regression model for Q with an estimate of mean annual nitrate concentration, a map of potential nitrate loads in the state was produced. Results from this study have important implications for understanding geomorphic and land use controls on streamflow and base flow in Iowa watersheds and similar agriculture-dominated watersheds in the glaciated Midwest. (JAWRA) (Copyright © 2005).
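
    Statements such as "75.5 percent of long-term average Q was explained" refer to the coefficient of determination R². The sketch below computes R² and the adjusted R², which penalizes additional predictors; the adjustment matters when several predictors are fitted to a small number of sites. The example values in the test are illustrative only.

```python
def r_squared(observed, fitted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    n = len(observed)
    if n != len(fitted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean = sum(observed) / n
    ss_tot = sum((o - mean) ** 2 for o in observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (plus an intercept)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```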

  4. Multiple factor analysis by example using R

    CERN Document Server

    Pagès, Jérôme

    2014-01-01

    Multiple factor analysis (MFA) enables users to analyze tables of individuals and variables in which the variables are structured into quantitative, qualitative, or mixed groups. Written by the co-developer of this methodology, Multiple Factor Analysis by Example Using R brings together the theoretical and methodological aspects of MFA. It also includes examples of applications and details of how to implement MFA using an R package (FactoMineR). The first two chapters cover the basic factorial analysis methods of principal component analysis (PCA) and multiple correspondence analysis (MCA). The

  5. Selection of higher order regression models in the analysis of multi-factorial transcription data

    OpenAIRE

    Prazeres Da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W.; Mansmann, Ulrich; Buch, Thorsten; Tresch, Achim

    2014-01-01

    Introduction: Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment invest...

  6. Regularized Multiple-Set Canonical Correlation Analysis

    Science.gov (United States)

    Takane, Yoshio; Hwang, Heungsun; Abdi, Herve

    2008-01-01

    Multiple-set canonical correlation analysis (Generalized CANO or GCANO for short) is an important technique because it subsumes a number of interesting multivariate data analysis techniques as special cases. More recently, it has also been recognized as an important technique for integrating information from multiple sources. In this paper, we…

  7. A comparison of multiple regression and neural network techniques for mapping in situ pCO2 data

    International Nuclear Information System (INIS)

    Using about 138,000 measurements of surface pCO2 in the Atlantic subpolar gyre (50-70°N, 60-10°W) during 1995-1997, we compare two methods of interpolation in space and time: a monthly distribution of surface pCO2 constructed using multiple linear regressions on position and temperature, and a self-organizing neural network approach. Both methods confirm characteristics of the region found in previous work, i.e. the subpolar gyre is a sink for atmospheric CO2 throughout the year, and exhibits a strong seasonal variability with the highest undersaturations occurring in spring and summer due to biological activity. As an annual average the surface pCO2 is higher than estimates based on available syntheses of surface pCO2. This supports earlier suggestions that the sink of CO2 in the Atlantic subpolar gyre has decreased over the last decade instead of increasing as previously assumed. The neural network is able to capture a more complex distribution than can be well represented by linear regressions, but both techniques agree relatively well on the average values of pCO2 and derived fluxes. However, when both techniques are used with a subset of the data, the neural network predicts the remaining data to a much better accuracy than the regressions, with a residual standard deviation ranging from 3 to 11 µatm. The subpolar gyre is a net sink of CO2 of 0.13 Gt-C/yr using the multiple linear regressions and 0.15 Gt-C/yr using the neural network, on average between 1995 and 1997. Both calculations were made with the NCEP monthly wind speeds converted to 10 m height and averaged between 1995 and 1997, and using the gas exchange coefficient of Wanninkhof

  8. Regression analysis of country effects using multilevel data: A cautionary tale

    OpenAIRE

    Bryan, Mark L.; Jenkins, Stephen P.

    2013-01-01

    Cross-national differences in outcomes are often analysed using regression analysis of multilevel country datasets, examples of which include the ECHP, ESS, EU-SILC, EVS, ISSP, and SHARE. We review the regression methods applicable to this data structure, pointing out problems with the assessment of country-level factors that appear not to be widely appreciated, and illustrate our arguments using Monte-Carlo simulations and analysis of women's employment probabilities and work hours using EU ...

  9. The use of cognitive ability measures as explanatory variables in regression analysis

    OpenAIRE

    Junker, Brian; Steuerle Schofield, Lynne; Taylor, Lowell J.

    2012-01-01

    Cognitive ability measures are often taken as explanatory variables in regression analysis, e.g., as a factor affecting a market outcome such as an individual's wage, or a decision such as an individual's education acquisition. Cognitive ability is a latent construct; its true value is unobserved. Nonetheless, researchers often assume that a test score, constructed via standard psychometric practice from individuals' responses to test items, can be safely used in regression analysis. We exami...

  10. Econometric analysis of realized covariation: high frequency based covariance, regression, and correlation in financial economics

    DEFF Research Database (Denmark)

    Barndorff-Nielsen, Ole Eiler; Shephard, N.

    2004-01-01

    This paper analyses multivariate high frequency financial data using realized covariation. We provide a new asymptotic distribution theory for standard methods such as regression, correlation analysis, and covariance. It will be based on a fixed interval of time (e.g., a day or week), allowing the number of high frequency returns during this period to go to infinity. Our analysis allows us to study how high frequency correlations, regressions, and covariances change through time. In particular w...
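
    Within a single interval, the realized quantities reduce to sums of products of high-frequency returns: realized covariance, the realized regression coefficient (realized covariance divided by realized variance), and realized correlation. The sketch below computes these point estimates from equal-length return series; it does not implement the asymptotic distribution theory developed in the paper.

```python
import math


def realized_measures(returns_x, returns_y):
    """Realized covariance, realized beta of y on x, and realized correlation."""
    if len(returns_x) != len(returns_y) or not returns_x:
        raise ValueError("need two non-empty return series of equal length")
    rcov = sum(a * b for a, b in zip(returns_x, returns_y))
    rvar_x = sum(a * a for a in returns_x)
    rvar_y = sum(b * b for b in returns_y)
    beta = rcov / rvar_x
    correlation = rcov / math.sqrt(rvar_x * rvar_y)
    return rcov, beta, correlation
```

    Computing these day by day gives the time-varying correlations and regressions the abstract refers to.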

  11. Economic growth and electricity consumption: Auto regressive distributed lag analysis

    Scientific Electronic Library Online (English)

    Bildirici, Melike E.; Bakirtas, Tahsin; Kayikci, Fazil.

    Knowledge of the direction of causality between electricity consumption and economic growth is of primary importance if appropriate energy policies and energy conservation measures are to be devised. This study estimates the causal relationship between electricity consumption and economic growth at the per capita and aggregate levels. The study uses the price and income elasticities of total and industrial electricity demand, estimated with the autoregressive distributed lag (ARDL) method, for several developed and developing countries, including the US, UK, Canada, Japan, China, India, Brazil, Italy, France, Turkey, and South Africa. There is evidence to support the growth hypothesis for the US, China, Canada, and Brazil, and the conservation hypothesis for India, Turkey, South Africa, Japan, the UK, France, and Italy.

  12. Tobacco outlet density and demographics: a geographically weighted regression analysis.

    Science.gov (United States)

    Mayers, Raymond Sanchez; Wiggins, Lyna L; Fulghum, Fontaine H; Peterson, N Andrew

    2012-10-01

    Previous studies have indicated that tobacco outlets seem to be clustered in low-income minority neighborhoods. This study used a cross-sectional design to examine the relationships among minority status, median household income, population density, commercial land use, and the location of tobacco outlets at the census tract level in Polk County, Iowa. Using geographically weighted regression, this study re-examines one previously carried out in the same location by Schneider et al. (Prevention Science 6: 319-325, 2005). Contrary to that and some other previous studies, this research found no relationship between tobacco outlet density and percent Hispanic, and found a negative relationship with two variables: being African American and median household income. Positive significant relationships were found with population density and land use. PMID:22538505

  13. Regression Analysis and Analysis Of Variance for EN353 and 20MnCr5 Alloyed Steels for Drilling Cutting Forces

    OpenAIRE

    Keerthiprasad.K; Prof Narendra Babu

    2014-01-01

    In recent years, alloy steels have been widely used in the aerospace and automotive industries. Machining these materials requires a better understanding of cutting processes with regard to accuracy and efficiency. This study addresses the modelling of the machinability of EN353 and 20MnCr5 materials. In this study, multiple regression analysis (MRA) is used to investigate the influence of some parameters on the thrust force and torque in the drilling of alloy steel materials...

  14. Positive-shrinkage and Pretest Estimation in Multiple Regression: A Monte Carlo study with Applications

    OpenAIRE

    Raheem, SM Enayetur; Ahmed, S. Ejaz

    2011-01-01

    Consider the problem of predicting a response variable using a set of covariates in a linear regression model. If it is a priori known or suspected that a subset of the covariates does not significantly contribute to the overall fit of the model, a restricted model that excludes these covariates may be sufficient. If, on the other hand, the subset provides useful information, a shrinkage method combines restricted and unrestricted estimators to obtain the parameter estimat...
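
    The pretest and positive-part shrinkage estimators can each be written in a few lines. A minimal sketch, assuming the restricted estimate sets the suspect coefficients to zero, T is a test statistic (for example a Wald statistic) for that restriction, and k >= 3 restrictions are tested. The positive-part form below is the standard Stein-type construction, not necessarily the exact variant the authors study, and all numbers are made up.

```python
def pretest(beta_u, beta_r, T, crit):
    """Pretest estimator: keep the unrestricted fit only if the test rejects H0."""
    return list(beta_u) if T > crit else list(beta_r)


def positive_shrinkage(beta_u, beta_r, T, k):
    """Positive-part Stein-type estimator.

    beta_r + max(0, 1 - (k - 2) / T) * (beta_u - beta_r)
    k -- number of restrictions tested (k >= 3); T -- test statistic for H0
    """
    if k < 3:
        raise ValueError("Stein-type shrinkage needs k >= 3 restrictions")
    # Truncate the shrinkage factor at zero (the "positive part")
    c = 0.0 if T <= k - 2 else 1.0 - (k - 2) / T
    return [r + c * (u - r) for u, r in zip(beta_u, beta_r)]


beta_u = [1.0, 0.4, -0.3, 0.2]   # hypothetical full-model estimates
beta_r = [1.1, 0.0, 0.0, 0.0]    # hypothetical restricted-model estimates
print(positive_shrinkage(beta_u, beta_r, T=4.0, k=3))
```

    Unlike the pretest estimator, which jumps between the two fits at the critical value, the shrinkage estimator moves smoothly toward the unrestricted fit as the evidence against the restriction grows.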

  15. Using Negative Binomial Regression Analysis to Predict Software Faults: A Study of Apache Ant

    Directory of Open Access Journals (Sweden)

    Liguo Yu

    2012-07-01

    Full Text Available Negative binomial regression has been proposed as an approach to predicting fault-prone software modules. However, little work has been reported on the strengths, weaknesses and applicability of this method. In this paper, we present an in-depth study of the effectiveness of using negative binomial regression to predict fault-prone software modules under two different conditions, self-assessment and forward assessment. The performance of the negative binomial regression model is also compared with that of another popular fault prediction model, binary logistic regression. The study is performed on six versions of an open-source object-oriented project, Apache Ant. The study shows that (1) the performance of forward assessment is better than, or at least the same as, that of self-assessment; (2) in predicting fault-prone modules, the negative binomial regression model could not outperform the binary logistic regression model; and (3) negative binomial regression is effective in predicting multiple errors in one module.
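
    What distinguishes negative binomial regression from Poisson regression is its extra dispersion parameter, which lets the variance of fault counts exceed their mean. A minimal sketch of the NB2 probability mass function, assuming the common parameterization with mean mu and variance mu + alpha*mu**2; the log-linear predictor and its coefficients below are hypothetical, not the paper's fitted model.

```python
import math


def nb2_pmf(y, mu, alpha):
    """P(Y = y) for a negative binomial with mean mu and variance mu + alpha*mu**2."""
    r = 1.0 / alpha  # "size" parameter
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


# In NB regression the mean is linked to predictors by mu = exp(b0 + b1*x).
# Hypothetical module with 400 lines of code and made-up coefficients:
mu = math.exp(-1.0 + 0.5 * math.log(400))
print(round(nb2_pmf(0, mu, 0.5), 4))  # probability that this module has no faults
```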

  16. Multiple regression analysis of PM10 concentrations as a function of meteorological elements for Porto Alegre, Rio Grande do Sul State, in 2005 and 2006

    Directory of Open Access Journals (Sweden)

    Angela Radünz Lazzari

    2011-01-01

    Full Text Available Air is an efficient medium for the dispersal of atmospheric pollutants, and its behavior depends on the atmospheric movements that occur in the troposphere. In Porto Alegre, Rio Grande do Sul State, heavy daily traffic and a concentration of industries may be responsible for atmospheric emissions. In the present work, we studied the behavior of daily concentrations of particulate matter (PM10) in this city, considering the influence of meteorological variables. Data analysis was performed using descriptive statistics, linear correlation and multiple regression. Data were provided by the State Foundation of Environmental Protection Henrique Luiz Roessler - RS (FEPAM) and the National Institute of Meteorology (INMET). The analysis showed that PM10 concentrations, measured daily at 4:00 p.m., did not exceed national air quality standards. The meteorological elements that influenced PM10 concentrations were daily average wind speed and daily average radiation, with negative relations, and daily average air temperature and north and northwest wind directions, with positive relations. The wind directions that contribute significantly to lowering concentrations at the measured sites are east and southeast.

  17. Performance of an Axisymmetric Rocket Based Combined Cycle Engine During Rocket Only Operation Using Linear Regression Analysis

    Science.gov (United States)

    Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.

    1998-01-01

    The all-rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix, and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that, for both the full flow and gas generator configurations, increasing the mixer-ejector area ratio and the rocket area ratio increases performance, while increasing the mixer-ejector inlet area ratio and the mixer-ejector length-to-diameter ratio decreases performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was not found to be statistically significant.
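
    Combining a design-of-experiments test matrix with multiple linear regression is simplest when factors are coded as -1 and +1. A minimal sketch, assuming a two-level full factorial design and a purely hypothetical response; the study's actual test matrix, its six parameters and the CFD-derived responses are not reproduced here. In such an orthogonal design X'X = n·I, so each least-squares coefficient reduces to a simple average.

```python
from itertools import product


def full_factorial(k):
    """All 2**k runs of a two-level design in coded units (-1, +1)."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def coded_coefficients(design, y):
    """Least-squares coefficients for y = b0 + sum(bj * xj).

    Because the design is orthogonal (X'X = n*I), bj = mean(xj * y).
    """
    n = len(y)
    b0 = sum(y) / n
    return [b0] + [sum(run[j] * yi for run, yi in zip(design, y)) / n
                   for j in range(len(design[0]))]


design = full_factorial(3)                        # 8 runs, 3 hypothetical factors
y = [3 + 2 * r[0] - r[1] for r in design]         # made-up response; factor 3 inert
print(coded_coefficients(design, y))  # [3.0, 2.0, -1.0, 0.0]
```

    A coefficient near zero (here the third factor) is the kind of term a significance test would flag as not statistically significant.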

  18. Multivariate Regression Approach To Integrate Multiple Satellite And Tide Gauge Data For Real Time Sea Level Prediction

    DEFF Research Database (Denmark)

    Cheng, Yongcun; Andersen, Ole Baltazar; Knudsen, Per

    2010-01-01

    The Sea Level Thematic Assembly Center in the EU FP7 MyOcean project aims to build a sea level service for multiple satellite sea level observations at a European level for GMES marine applications. It aims to improve sea level related products to guarantee the sustainability and quality of the GMES marine core service. One such added value will be a multivariate regression model of the sea level variability of multisatellite and in-situ tide gauge observations, with the aim of improved future hig...

  19. Selecting Modeling Techniques for Outcome Prediction: Comparison of Artificial Neural Networks, Classification and Regression Trees, and Linear Regression Analysis for Predicting Medical Rehabilitation Outcomes

    OpenAIRE

    Walters, Deborah K. W.; Linn, Richard T.; Kulas, Margaret; Cuddihy, Elisabeth; Wu, Chonghua; Granger, Carl V.

    1999-01-01

    A multitude of techniques exists for modeling medical outcomes. One problem for the researcher is how to select an appropriate modeling technique for a given task. This paper addresses the problem through an analysis of the strengths and weaknesses of three techniques and a case study in which the three techniques are applied to the task of predicting medical rehabilitation outcomes. The three techniques selected were linear regression analysis (LRA), classification and regression trees (...

  20. Comparative Analysis of MOGA, NSGA-II and MOPSO for Regression Test Suite Optimization

    Directory of Open Access Journals (Sweden)

    Zeeshan Anwar

    2014-01-01

    Full Text Available In software engineering, regression testing is a mandatory activity. Whenever a change in an existing system occurs and a new version appears, the unchanged portions need to be regression tested for any resulting undesirable effects. During regression testing, the same test cases are executed repeatedly for the unmodified portions of the software. This activity is an overhead and consumes considerable resources and budget. To save time and resources, researchers have proposed various techniques for regression test suite optimization. In this research, regression test suites are minimized using three computational intelligence multi-objective techniques for black-box testing methods: (1) Multi-Objective Genetic Algorithms (MOGA), (2) Non-Dominated Sorting Genetic Algorithm (NSGA-II) and (3) Multi-Objective Particle Swarm Optimization (MOPSO). These techniques are applied to two published case studies, and their quality is analyzed through experimentation using four defined quality metrics. The results show that MOGA is better than MOPSO and NSGA-II at reducing the size, and thus the execution time, of regression test suites. It was also found that MOGA, NSGA-II and MOPSO are not safe for regression test suite optimization, because the fault detection rate and requirement coverage are reduced after optimization of regression test suites.
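
    All three techniques rely on Pareto dominance to trade off competing objectives. A minimal sketch of the dominance test and the first non-dominated front, assuming two hypothetical objectives per candidate suite: size (to minimise) and requirement coverage (to maximise). It is not a full MOGA, NSGA-II or MOPSO implementation, and the candidate values are made up.

```python
def dominates(a, b):
    """True if suite a is no worse than b on both objectives and better on one.

    Each suite is a (size, coverage) pair: size is minimised, coverage maximised.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    better = a[0] < b[0] or a[1] > b[1]
    return no_worse and better


def pareto_front(candidates):
    """Return the candidates not dominated by any other (NSGA-II's first front)."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]


suites = [(10, 0.95), (6, 0.90), (8, 0.85), (4, 0.70), (6, 0.80)]
print(pareto_front(suites))  # [(10, 0.95), (6, 0.9), (4, 0.7)]
```

    The front makes the trade-off reported above concrete: the smallest suite on the front also has the lowest coverage.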

  1. Additive Intensity Regression Models in Corporate Default Analysis

    DEFF Research Database (Denmark)

    Lando, David; Medhat, Mamdouh

    2013-01-01

    We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of model checking techniques to identify misspecifications. In our final model, we find evidence of time-variation in the effects of distance-to-default and short-to-long term debt. We also identify interactions between distance-to-default and other covariates, and the quick ratio covariate is significant. None of our macroeconomic covariates are significant.

  2. Evaluation of syngas production unit cost of bio-gasification facility using regression analysis techniques

    Energy Technology Data Exchange (ETDEWEB)

    Deng, Yangyang; Parajuli, Prem B.

    2011-08-10

    Evaluating the economic feasibility of a bio-gasification facility requires understanding its unit cost under different production capacities. The objective of this study was to evaluate the unit cost of syngas production at capacities from 60 to 1,800 Nm³/h using an economic model with three regression analysis techniques (simple regression, reciprocal regression and log-log regression). The preliminary results showed that the reciprocal regression technique gave the best-fitting curve between unit cost and production capacity, with a sum of error squares (SES) lower than 0.001 and a coefficient of determination (R²) of 0.996. The regression analysis techniques determined a minimum unit cost of syngas production for micro-scale bio-gasification facilities of $0.052/Nm³, at a capacity of 2,880 Nm³/h. The results of this study suggest that, to reduce cost, facilities should run at a high production capacity. In addition, this technique could provide a new categorical criterion for evaluating micro-scale bio-gasification facilities from the perspective of economic analysis.
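
    A reciprocal model, unit_cost = a + b/capacity, is linear in 1/capacity, so it can be fitted with ordinary simple regression after transforming the predictor. A minimal sketch that also reports R²; the capacities and costs below are synthetic, not the study's data.

```python
def fit_reciprocal(capacity, unit_cost):
    """Fit unit_cost = a + b / capacity by least squares on z = 1/capacity.

    Returns (a, b, r_squared).
    """
    z = [1.0 / c for c in capacity]
    n = len(z)
    zm, ym = sum(z) / n, sum(unit_cost) / n
    szz = sum((zi - zm) ** 2 for zi in z)
    b = sum((zi - zm) * (yi - ym) for zi, yi in zip(z, unit_cost)) / szz
    a = ym - b * zm
    ss_res = sum((yi - (a + b * zi)) ** 2 for zi, yi in zip(z, unit_cost))
    ss_tot = sum((yi - ym) ** 2 for yi in unit_cost)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic, noise-free costs following 0.05 + 30/capacity
cap = [60, 120, 300, 600, 1200, 1800]
cost = [0.05 + 30 / c for c in cap]
print(fit_reciprocal(cap, cost))
```

    As capacity grows, the fitted cost falls toward the asymptote a, which is the shape behind the conclusion that higher production capacity lowers unit cost.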

  3. Principal Component and Multiple Regression Analyses for the Estimation of Suspended Sediment Yield in Ungauged Basins of Northern Thailand

    OpenAIRE

    Piyawat Wuttichaikitcharoen; Mukand Singh Babel

    2014-01-01

    Predicting sediment yield is necessary for good land and water management in any river basin. However, sediment data are sometimes unavailable or sparse, which makes estimating sediment yield a daunting task. The present study investigates the factors influencing suspended sediment yield using principal component analysis (PCA). Additionally, regression relationships for estimating suspended sediment yield, based on the key factors selected from the PCA, are develope...

  4. Thermo-environmental and economic analysis of simple and regenerative gas turbine cycles with regression modeling and optimization

    International Nuclear Information System (INIS)

    Highlights:
    • Thermodynamic models of simple and regenerative cycles are defined.
    • The exergy destruction rate of the different components was determined.
    • The impact of important operating parameters on the cycles’ characteristics was determined.
    • Multiple polynomial regression models were developed.
    • Optimization for optimal operating parameters was performed.

    Abstract: In this paper, thermo-environmental, economic and regression analyses of simple and regenerative gas turbine cycles are presented. First, thermodynamic models for both cycles are defined; the exergy destruction rate of the different components is determined, and a parametric study is carried out to investigate the effects of compressor inlet temperature, turbine inlet temperature and compressor pressure ratio on the parameters that measure the cycles’ performance, environmental impact and costs. Subsequently, multiple polynomial regression (MPR) models are developed to correlate important response variables with predictor variables, and finally optimization is performed to find the optimal operating conditions. The results of the parametric study show a significant impact of operating parameters on the performance parameters, environmental impact and costs. According to the exergy analysis, the combustion chamber and exhaust stack are the two major sites where the largest exergy destruction and losses occur. The total exergy destruction in the regenerative cycle is also relatively lower, resulting in a higher exergy efficiency for that cycle. The MPR models also proved to be good estimators of the response variables, with very high R² values. Finally, these models are used to determine the optimal operating parameters, which maximize the cycles’ performance and minimize CO2 emissions and costs.
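
    A multiple polynomial regression is still linear in its coefficients: the predictors (for example compressor inlet temperature, turbine inlet temperature and compressor pressure ratio) are expanded into polynomial terms, and the expanded rows are fitted by ordinary least squares. A minimal sketch of that expansion step; the degree, the term ordering and the function name are illustrative assumptions, not the paper's model.

```python
from itertools import combinations_with_replacement
from math import prod


def poly_terms(x, degree):
    """Expand predictors x into all monomials up to `degree`, with an intercept.

    For x = [a, b] and degree 2 this gives [1, a, b, a*a, a*b, b*b].
    """
    row = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            row.append(prod(x[i] for i in idx))
    return row


print(poly_terms([2, 3], 2))  # [1.0, 2, 3, 4, 6, 9]
```

    With three predictors a full quadratic model already has 10 terms, which is why a design with enough distinct operating points is needed before such a model can be fitted reliably.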

  5. Some studies on cutting force and temperature in machining Ti-6Al-4V alloy using regression analysis and ANOVA

    OpenAIRE

    Satyanarayana, K; Ashok Kumar Sahoo; Ramanuj Kumar; G. Venkateswara Rao

    2013-01-01

    The present work deals with the cutting forces and cutting temperature produced during turning of titanium alloy Ti-6Al-4V with PVD TiN-coated tungsten carbide inserts in a dry environment. First-order mathematical models were developed using multiple regression analysis, and the process parameters were optimized using contour plots. The models showed high coefficients of determination (R² = 0.964 and 0.989, explaining 96.4% and 98.9% of the variability in the cutting force and cutting temperature...

  6. Tree-Ring Growth Response of Common Ash (Fraxinus excelsior L.) to Climatic Variables Using Multiple Regressions

    Directory of Open Access Journals (Sweden)

    H. Jalilvand

    2008-01-01

    Full Text Available This study was done in the Forest Park of Noor. To determine the tree-ring response to climatic variations, 35 cores were taken from a dominant natural stand of common ash (Fraxinus excelsior L.). The aim of this study was to find which climatic variables affect the ring-width growth of ash in the current growing year and in previous years (one, two and three years before the current growing year), using multiple regression models, in northern Iran. In total, 85 annual, monthly, seasonal and growing-season climatic variables of precipitation, temperature, heat index, evapotranspiration and water balance were analyzed. The best multiple regression models explained 83 percent of the total variance in the growth of common ash. The results show that the growth of common ash was more closely related to the climatic variations of previous years than to those of the current year. The most effective climatic variations were those of the first and second preceding years (55%). Evapotranspiration in July and September and precipitation in May of the second previous year, and precipitation in March of the third previous year, all positively affected the growth of this species. This study revealed that ash favors warmer conditions in the early and middle growing season when moisture is available, and precipitation in the early months of the growing season (Ordibehesht-Khordad) of the two previous years.

  7. ESTIMATE OF CO2 EFFLUX OF SOIL, OF A TRANSITION FOREST IN NORTHWEST OF MATO GROSSO STATE, USING MULTIPLE REGRESSION

    Directory of Open Access Journals (Sweden)

    Carla Maria Abido Valentini

    2008-03-01

    Full Text Available Many research groups have been studying the contribution of tropical forests to the global carbon cycle and the climatic consequences of replacing forests with pastures. Considering that soil CO2 efflux is the largest component of the carbon cycle of the biosphere, this work derived an equation for estimating the soil CO2 efflux of an area of transition forest, using a multiple regression model for time-series data of temperature and soil moisture. The study was carried out in the northwest of Mato Grosso, Brazil (11°24.75’S; 55°19.50’W), in a transition forest between cerrado and Amazon forest, 50 km from Sinop County. Soil CO2 efflux, temperature and soil moisture were measured each month throughout one year. The annual mean soil CO2 efflux was 7.5 ± 0.6 (mean ± SE) μmol m-2 s-1, and the annual mean soil temperature was 25.06 ± 0.12 (mean ± SE) °C. The study indicated that humidity had a strong influence on soil CO2 efflux; however, the results were more significant using a multiple regression model that estimated the logarithm of soil CO2 efflux, considering time, soil moisture and the interaction between time duration and the inverse of soil temperature.

  8. Parent Progeny regression analysis in F2 and F3 generations of rice

    OpenAIRE

    Anilkumar, C. Vanniarajan and J. Ramalingam

    2011-01-01

    Parent-progeny regression analysis involving the F2 and F3 generations of two rice crosses was undertaken to estimate the genetic potential transferred from one generation to the next by adopting three levels of selection for single-plant yield. Significant positive correlation and regression were observed in both crosses at the positive level of selection (mean + 1SD) between F3 means and the corresponding F2 values, indicating that selection for single-plant yield at these levels would be effective in both...
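
    The analysis described above combines two steps: selecting F2 plants above a threshold and regressing F3 family means on the selected F2 values. A minimal sketch, assuming a mean + k·SD selection rule on single-plant yield; the function names and yields are hypothetical, not the study's data.

```python
from statistics import mean, stdev


def regression_slope(x, y):
    """Least-squares slope of y on x."""
    xm, ym = mean(x), mean(y)
    return (sum((a - xm) * (b - ym) for a, b in zip(x, y))
            / sum((a - xm) ** 2 for a in x))


def selected_pairs(f2, f3, k=1.0):
    """Keep F2 plants whose yield exceeds mean + k*SD, paired with their F3 means."""
    cut = mean(f2) + k * stdev(f2)
    return [(a, b) for a, b in zip(f2, f3) if a > cut]


# Hypothetical single-plant yields (F2) and F3 family means
f2 = [10, 12, 14, 16, 30, 32]
f3 = [11, 12, 13, 15, 26, 27]
pairs = selected_pairs(f2, f3, k=1.0)
print(pairs)                              # [(30, 26), (32, 27)]
print(regression_slope(*zip(*pairs)))     # 0.5
```

    The slope indicates how much of the F2 superiority reappears in the F3 progeny; a positive, significant slope at a given selection level is what supports selecting at that level.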

  9. Regression and local control rates after radiotherapy for jugulotympanic paragangliomas: Systematic review and meta-analysis

    International Nuclear Information System (INIS)

    The primary treatment goal of radiotherapy for paragangliomas of the head and neck region (HNPGLs) is local control of the tumor, i.e. stabilization of tumor volume. Interestingly, regression of tumor volume has also been reported. Up to the present, no meta-analysis has been performed giving an overview of regression rates after radiotherapy in HNPGLs. The main objective was to perform a systematic review and meta-analysis to assess regression of tumor volume in HNPGL patients after radiotherapy. A second outcome was local tumor control. The study design is a systematic review and meta-analysis. PubMed, EMBASE, Web of Science, COCHRANE and Academic Search Premier and references of key articles were searched in March 2012 to identify potentially relevant studies. Considering the indolent course of HNPGLs, only studies with ≥12 months of follow-up were eligible. Main outcomes were the pooled proportions of regression and local control after radiotherapy as initial, combined (i.e. directly post-operatively or post-embolization) or salvage treatment (i.e. after initial treatment has failed) for HNPGLs. A meta-analysis was performed with an exact likelihood approach using a logistic regression with a random effect at the study level. Pooled proportions with 95% confidence intervals (CI) were reported. Fifteen studies were included, concerning a total of 283 jugulotympanic HNPGLs in 276 patients. Pooled regression proportions for initial, combined and salvage treatment were respectively 21%, 33% and 52% in radiosurgery studies and 4%, 0% and 64% in external beam radiotherapy studies. Pooled local control proportions for radiotherapy as initial, combined and salvage treatment ranged from 79% to 100%. Radiotherapy for jugulotympanic paragangliomas results in excellent local tumor control and is therefore a valuable treatment for these types of tumors.
The effects of radiotherapy on regression of tumor volume remain ambiguous, although the data suggest that regression can be achieved at least in some patients. More research is needed to identify predictors of treatment success.

  10. Application of Binary Regression Analysis in the Prescription Pattern of Antidepressants

    OpenAIRE

    Dr. Indrajit Banerjee, MBBS, MD; Dr. Indraneel Banerjee, MBBS, MS, MRCS; Bedanta Roy; Dr. Brijesh Sathian, MD (AM), PhD

    2013-01-01

    Background: In Nepal, several research studies have been reported using percentages or cross-tabulation methods, but the use of logistic regression methodology lags behind among researchers. Objectives: The main objective of this study was to find the role of logistic regression analysis in the prescription pattern of antidepressants in hospitalized patients at a tertiary care center in Western Nepal. Methods: A hospital-based study was done between 1st October 2009 and 31st March 2010 at Ps...

  11. The application of a multiple regression model for aero radiometric data

    International Nuclear Information System (INIS)

    The data observed in the total channel of high-sensitivity airborne γ-ray spectrometric surveys are selected as the dependent variable, while those of the Th, K and U channels are considered independent variables, and a linear statistical model is assumed to relate them as $\text{Total}_i = \beta_0 + \beta_1 U_i + \beta_2 \text{Th}_i + \beta_3 K_i + \varepsilon_i$, where $\beta_1, \beta_2, \beta_3$ are the partial regression coefficients and $\varepsilon_i$ is the error term. The estimated coefficients ($\hat\beta_1, \hat\beta_2, \hat\beta_3$) are used on board to check the data acquisition system, and occasionally to predict a more appropriate value when a single data item is not recorded correctly. (author)
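
    Once the partial regression coefficients are estimated, they can be applied record by record, either to fill a missing total-count value or to flag a suspicious one. A minimal sketch, assuming the coefficients are already fitted; the coefficient values, the tolerance and the record layout are hypothetical, not taken from the survey.

```python
def predict_total(coef, u, th, k):
    """Predicted total-count value b0 + b1*U + b2*Th + b3*K."""
    b0, b1, b2, b3 = coef
    return b0 + b1 * u + b2 * th + b3 * k


def check_record(coef, record, tol):
    """Return (total, flagged): fill a missing total, or flag a large residual.

    record -- dict with keys 'U', 'Th', 'K' and 'Total' (None if not recorded)
    tol    -- hypothetical acceptance threshold on |observed - predicted|
    """
    pred = predict_total(coef, record["U"], record["Th"], record["K"])
    if record["Total"] is None:
        return pred, False          # substitute the model prediction
    return record["Total"], abs(record["Total"] - pred) > tol


coef = (5.0, 2.0, 1.5, 3.0)         # made-up fitted coefficients
print(check_record(coef, {"U": 10, "Th": 20, "K": 4, "Total": None}, 5.0))  # (67.0, False)
print(check_record(coef, {"U": 10, "Th": 20, "K": 4, "Total": 90}, 5.0))    # (90, True)
```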

  12. The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard

    2013-01-01

    This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression function. However, the a priori specification of a functional form involves the risk of choosing one that is not similar to the “true” but unknown relationship between the regressors and the dependent variable. This problem, known as parametric misspecification, can result in biased parameter estimates and hence, also in biased measures, which are derived from the estimated parameters. This, in turn, can result in incorrect economic conclusions and recommendations for managers, politicians and decision makers in general. This PhD thesis focuses on a nonparametric econometric approach that can be used to avoid this problem. The main objective is to investigate the applicability of the nonparametric kernel regression method in applied production analysis. The focus of the empirical analyses included in this thesis is the agricultural sector in Poland. Data on Polish farms are used to investigate practically and politically relevant problems and to illustrate how nonparametric regression methods can be used in applied microeconomic production analysis both in panel data and cross-section data settings. The thesis consists of four papers. The first paper addresses problems of parametric and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences within a nonparametric panel data regression framework. 
The fourth paper analyses the technical efficiency of dairy farms with environmental output using nonparametric kernel regression in a semiparametric stochastic frontier analysis. The results provided in this PhD thesis show that nonparametric kernel methods are well-suited to econometric production analysis and can outperform traditional parametric methods. Although the empirical focus of this thesis is on the application of nonparametric kernel regression in applied production analysis, the findings are also applicable to econometric estimations in general.
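
    Kernel regression replaces a pre-specified functional form with a locally weighted average of the data, which is how it sidesteps parametric misspecification. A minimal sketch of the simplest variant, the Nadaraya-Watson (local-constant) estimator with a Gaussian kernel; the thesis may use other kernels, local-linear fits and data-driven bandwidths, none of which are reproduced here.

```python
import math


def nadaraya_watson(x_train, y_train, x0, h):
    """Kernel-weighted average of y around x0 (Gaussian kernel, bandwidth h)."""
    w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x_train]
    return sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w)


# Hypothetical input/output data with a curved, unknown relationship
x = [0, 1, 2, 3, 4, 5, 6]
y = [0.0, 1.8, 3.1, 3.9, 4.4, 4.7, 4.9]
print([round(nadaraya_watson(x, y, x0, 1.0), 2) for x0 in (1, 3, 5)])
```

    The bandwidth h controls the bias-variance trade-off: a small h follows the data closely, while a large h approaches a global average.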

  13. SAS PARTIAL LEAST SQUARES REGRESSION FOR ANALYSIS OF SPECTROSCOPIC DATA

    Science.gov (United States)

    The objective was to investigate the potential of SAS PLS to perform chemometric analysis of spectroscopic data. As implemented, SAS can perform only type II PLS, PCR and RRR. While possessing several algorithms for PLS, various cross-validation options, the ability to mean center and variance sca...

  14. Advanced GIS Exercise: Predicting Rainfall Erosivity Index Using Regression Analysis

    Science.gov (United States)

    Post, Christopher J.; Goddard, Megan A.; Mikhailova, Elena A.; Hall, Steven T.

    2006-01-01

    Graduate students from a variety of agricultural and natural resource fields are incorporating geographic information systems (GIS) analysis into their graduate research, creating a need for teaching methodologies that help students understand advanced GIS topics for use in their own research. Graduate-level GIS exercises help students understand…

  15. Robust regression applied to fractal/multifractal analysis.

    OpenAIRE

    Portilla, F.; Valencia Delfa, José Luis; Tarquis Alfonso, Ana Maria; Saa Requejo, Antonio

    2012-01-01

    Fractal and multifractal concepts have grown increasingly popular in soil analysis in recent years, along with the development of fractal models. One common step is to calculate the slope of a linear fit, usually by the least squares method. This should not be a special problem; however, in many situations with experimental data, the researcher has to select the range of scales at which to work, neglecting the rest of the points, to achieve the best linearity that in thi...
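
    One robust alternative to the least-squares slope on a log-log plot is the Theil-Sen estimator, the median of all pairwise slopes. A minimal sketch comparing the two on synthetic box-counting data with one corrupted point; the paper's own robust-regression procedure may differ, and Theil-Sen is named here only as an illustrative choice.

```python
import math
from statistics import median


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    return (sum((a - xm) * (b - ym) for a, b in zip(x, y))
            / sum((a - xm) ** 2 for a in x))


def theil_sen_slope(x, y):
    """Median of all pairwise slopes; robust to a minority of outlying points."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x)) for j in range(i + 1, len(x))
              if x[j] != x[i]]
    return median(slopes)


# Synthetic data N(eps) = 100 * eps**-1.5, so the true log-log slope is -1.5
scales = [1, 2, 4, 8, 16, 32]
x = [math.log(s) for s in scales]
y = [math.log(100) - 1.5 * xi for xi in x]
y[0] += 2.0  # one corrupted point at the finest scale
print(round(ols_slope(x, y), 3), round(theil_sen_slope(x, y), 3))
```

    The single bad point pulls the least-squares slope noticeably away from -1.5, while the median of pairwise slopes stays on the true value, which removes the need to discard scales by hand in this simple case.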

  16. Using Multiple Regression in Estimating (semi) VOC Emissions and Concentrations at the European Scale

    DEFF Research Database (Denmark)

    Fauser, Patrik; Thomsen, Marianne

    2010-01-01

    This paper proposes a simple method for estimating emissions and predicted environmental concentrations (PECs) in water and air for organic chemicals that are used in household products and industrial processes. The method has been tested on existing data for 63 organic high-production volume chemicals available in the European Chemicals Bureau risk assessment reports (RARs). The method suggests a simple linear relationship between Henry's Law constant, octanol-water coefficient, use and production volumes, and emissions and PECs on a regional scale in the European Union. Emissions and PECs are a result of a complex interaction between chemical properties, production and use patterns and geographical characteristics. A linear relationship cannot capture these complexities; however, it may be applied at a cost-efficient screening level for suggesting critical chemicals that are candidates for an in-depth risk assessment. Uncertainty measures are not available for the RAR data; however, uncertainties for the applied regression models are given in the paper. Evaluation of the methods reveals that between 79% and 93% of all emission and PEC estimates are within one order of magnitude of the reported RAR values. Bearing in mind that the domain of the method comprises organic industrial high-production volume chemicals, four chemicals, prioritized in the Water Framework Directive and the Stockholm Convention on Persistent Organic Pollutants, were used to test the method for estimated emissions and PECs, with corresponding uncertainty intervals, in air and water at regional EU level.
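
    The evaluation criterion quoted above, where an estimate counts as a hit if it lies within one order of magnitude of the reported value, is easy to compute on a log scale. A minimal sketch with made-up numbers; the function name is hypothetical.

```python
import math


def share_within_factor(estimates, reference, factor=10.0):
    """Fraction of estimates within `factor` of the reference (all values positive)."""
    hits = sum(1 for e, r in zip(estimates, reference)
               if abs(math.log10(e / r)) <= math.log10(factor))
    return hits / len(estimates)


# Hypothetical emission estimates versus reported values (same units)
print(share_within_factor([1, 40, 0.05, 200], [2, 5, 1, 1]))  # 0.5
```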

  17. Data Analysis and Approximate Models: Model Choice, Location-Scale, Analysis of Variance, Nonparametric Regression and Image Analysis

    CERN Document Server

    Davies, Patrick Laurie

    2014-01-01

    Forgoing any concept of truth, Data Analysis and Approximate Models: Model Choice, Location-Scale, Analysis of Variance, Nonparametric Regression and Image Analysis presents statistical analysis/inference based on approximate models. Developed by the author, this approach consistently treats models as approximations to data, not to some underlying truth. The book first highlights problems with concepts such as likelihood and efficiency and covers the definition of approximation and its consequences. A chapter on discrete data then presents the total variation metric as well as the Kullback–Leibler and chi-squared discrepancies as measures of fit. After focusing on outliers, the book discusses the location-scale problem, including approximation intervals, and gives a new treatment of higher-way ANOVA. The next several chapters describe novel procedures of nonparametric regression based on approximation. The final chapter assesses a range of statistical topics, from the likelihood principle to asymptotics and...

  18. A robust multiple-locus method for quantitative trait locus analysis of non-normally distributed multiple traits.

    Science.gov (United States)

    Li, Z; Möttönen, J; Sillanpää, M J

    2015-12-01

    Linear regression-based quantitative trait loci/association mapping methods such as least squares commonly assume normality of residuals. In genetic studies of plants or animals, some quantitative traits may not follow a normal distribution because the data include outlying observations or were collected from multiple sources, and in such cases normal regression methods may lose some statistical power to detect quantitative trait loci. In this work, we propose a robust multiple-locus regression approach for analyzing multiple quantitative traits without a normality assumption. In our method, the objective function is least absolute deviation (LAD), which corresponds to assuming multivariate Laplace-distributed residual errors. This distribution has heavier tails than the normal distribution. In addition, we adopt a group LASSO penalty to produce shrinkage estimates of the marker effects and to describe the genetic correlation among phenotypes. Our LAD-LASSO approach is less sensitive to outliers and is more appropriate for the analysis of data with skewed phenotype distributions. Another application of our robust approach is the missing-phenotype problem in multiple-trait analysis, where missing phenotype items can simply be filled with extreme values and treated as outliers. The efficiency of the LAD-LASSO approach is illustrated on both simulated and real data sets. PMID:26174023
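
    LAD fitting replaces squared residuals with absolute ones, which limits the pull of outliers. A minimal one-predictor sketch (no intercept, no LASSO penalty, a single trait), relying on the fact that the LAD slope through the origin, which minimises sum |y_i - b·x_i| = sum |x_i|·|y_i/x_i - b|, is a weighted median of the ratios y/x with weights |x|. The paper's multivariate LAD-LASSO with a group penalty is considerably more involved, and the data below are made up.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v


def lad_slope(x, y):
    """LAD estimate of b in y = b*x (no intercept): minimises sum |y - b*x|."""
    # Points with x == 0 add |y| whatever b is, so they do not affect the minimiser
    data = [(yi / xi, abs(xi)) for xi, yi in zip(x, y) if xi != 0]
    return weighted_median([r for r, _ in data], [w for _, w in data])


# Hypothetical marker/phenotype data with one outlying phenotype
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 100]
print(lad_slope(x, y))  # 2.0 (least squares through the origin gives about 10.2)
```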

  19. Regression Analysis on the Chemical Descriptors of a Selected Class of DPP4 Inhibitors

    OpenAIRE

    Jose Isagani B. Janairo; Frumencio F. Co; Gerardo C. Janairo; Derrick Ethelbhert C. Yu

    2010-01-01

    The activity of a selected class of DPP4 inhibitors was assessed using quantum-chemical and physical descriptors. Using a multiple linear regression model, it was found that ΔE, LUMO energy, dipole, area, volume, molecular weight and ΔH are the significant descriptors that can adequately assess the activity of the compounds. The model suggests that bulky and electrophilic inhibitors are desired. Furthermore, pair interactions between ΔE and dipole, as well as between LUMO energy and dipole, were dete...

  20. Experimental and regression analysis for multi cylinder diesel engine operated with hybrid fuel blends

    OpenAIRE

    Gopal Rajendiran; Kavandappa-Goundar Mayilsamy; Ramasamy Subramanian; Natarajan Nedunchezhian; Ramasamy Venkatachalam

    2014-01-01

    The purpose of this research work is to build a multiple linear regression model for the characteristics of a multicylinder diesel engine using multicomponent blends (diesel-pongamia methyl ester-ethanol) as fuel. Nine blends were tested by varying diesel (100 to 10% by vol.) and biodiesel (80 to 10% by vol.) while keeping ethanol constant at 10%. The brake thermal efficiency, smoke, oxides of nitrogen, carbon dioxide, maximum cylinder pressure, angle of maximum ...

  1. Analysis of multiple primary cancers

    International Nuclear Information System (INIS)

    From January 1971 to August 1979, 4156 patients with malignant tumors other than brain tumors were registered at the Department of Radiotherapy, National Sapporo Hospital. Seventy-one of them had multiple primary cancers, an incidence of 1.71% in our series. One patient had four separate primary cancers, arising in the cervix uteri, the sigmoid colon, the thymus and the stomach. In 27 cases (38.0%), the cancers occurred within 1 year of each other. The longest interval was 33 years. Five cases were considered to be radiation-induced cancers; they developed secondarily in the irradiated region between 5 and 26 years after the completion of irradiation. Twenty-five percent of the patients had a family history of cancer. (author)

  2. Fuzzy logic vs. multiple regression model for personnel selection

    Scientific Electronic Library Online (English)

    Díaz-Contreras, Carlos A.; Aguilera-Rojas, Alejandra; Guillén-Barrientos, Nathaly

    2014-10-01

    Full Text Available Hiring new personnel, or reassigning existing staff to specific tasks, is an important decision, because getting it right determines the very survival of the company. In this context, it becomes relevant to have a personnel selection model that considers the ambiguous information and degrees of uncertainty involved in evaluating applicants' qualitative ratings, and that can deliver accurate and precise results, thereby ensuring good performance in the position and reducing the risk associated with bringing in new people. In this work, a personnel selection model under conditions of uncertainty was built using fuzzy logic, taking as input the job descriptions of a retail company and using overlapping triangular fuzzy variables. It was compared with a classical multiple regression model. The results showed that, in this case, the multiple regression model is more efficient than the chosen fuzzy logic model.

  3. The Contribution of Regression Analysis to the Elimination of Gender Based Wage Discrimination in Academia: A Simulation.

    Science.gov (United States)

    Raymond, Richard D.; And Others

    1990-01-01

    Describes the use of regression analysis in eliminating sex discrimination in a university's salary structure and examines regression models usually accepted by courts. Estimates salary regressions for a large midwestern university for 1983-84 in a simulation exercise exploring alternative elimination methods. Includes 23 references and 11 court…

  4. Regression analysis with missing data and unknown colored noise: application to the MICROSCOPE space mission

    CERN Document Server

    Baghi, Q; Bergé, J; Christophe, B; Touboul, P; Rodrigues, M

    2015-01-01

    The analysis of physical measurements must often cope with highly correlated noise and with interruptions caused by outliers, saturation events or transmission losses. We assess the impact of missing data on the performance of linear regression analysis involving the fit of modeled or measured time series. We show that data gaps can significantly alter the precision of the regression parameter estimation in the presence of colored noise, due to the frequency leakage of the noise power. We present a regression method that cancels this effect and estimates the parameters of interest with a precision comparable to the complete-data case, even if the noise power spectral density (PSD) is not known a priori. The method is based on an autoregressive (AR) fit of the noise, which allows us to build an approximate generalized least squares estimator approaching the minimal variance bound. The method, which can be applied to any similar data processing, is tested on simulated measurements of the MICROSCOPE space mission, whos...

  5. Development of an empirical model of turbine efficiency using the Taylor expansion and regression analysis

    International Nuclear Information System (INIS)

    An empirical model of turbine efficiency is necessary for control- and/or diagnosis-oriented simulation and is useful for simulating and analyzing the dynamic performance of turbine equipment and systems, such as air cycle refrigeration systems, power plants, turbine engines and turbochargers. Existing empirical models of turbine efficiency are insufficient because none has a form suitable for air cycle refrigeration turbines. This work critically reviews empirical models of turbine efficiency (called mean value models in some literature) and develops an empirical model in the desired form for air cycle refrigeration, the dominant cooling approach in aircraft environmental control systems. The Taylor series and regression analysis are used to build the model: the Taylor series expands functions involving the polytropic exponent, and regression analysis finalizes the model. Measured data from a turbocharger turbine and two air cycle refrigeration turbines are used for the regression analysis. The proposed model is compact and able to represent the turbine efficiency map. Its predictions agree very well with the measured data, with a corrected coefficient of determination Rc² ? 0.96 and a mean absolute percentage deviation of 1.19% for the three turbines.
    Highlights:
    - Performed a critical review of empirical models of turbine efficiency.
    - Developed an empirical model in the desired form for air cycle refrigeration, using the Taylor expansion and regression analysis.
    - Verified the method for developing the empirical model.
    - Verified the model.
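
    The two goodness-of-fit figures quoted above can be computed as follows. This sketch assumes that the "corrected coefficient of determination" is the usual adjusted R² and that the mean absolute percentage deviation is taken relative to the measured values; the abstract does not give the formulas, and the function names and data below are hypothetical.

```python
def r_squared(y, y_hat):
    """Ordinary coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y, y_hat, n_predictors):
    """R^2 corrected for the number of fitted predictors (intercept excluded)."""
    n = len(y)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than fitted parameters")
    return 1.0 - (1.0 - r_squared(y, y_hat)) * (n - 1) / (n - n_predictors - 1)


def mean_absolute_percentage_deviation(y, y_hat):
    """Mean of |y - y_hat| / |y| in percent; assumes no measured value is zero."""
    return 100.0 * sum(abs((obs - fit) / obs) for obs, fit in zip(y, y_hat)) / len(y)


# Hypothetical measured vs. predicted values, not data from the paper.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(adjusted_r_squared(measured, predicted, n_predictors=1))  # about 0.987
print(mean_absolute_percentage_deviation(measured, predicted))  # about 5.33
```

    The adjustment penalizes extra predictors, so a compact model with a high corrected R² is not simply benefiting from having more terms.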

  6. Sequential Monte Carlo tracking of the marginal artery by multiple cue fusion and random forest regression.

    Science.gov (United States)

    Cherry, Kevin M; Peplinski, Brandon; Kim, Lauren; Wang, Shijun; Lu, Le; Zhang, Weidong; Liu, Jianfei; Wei, Zhuoshi; Summers, Ronald M

    2015-01-01

    Given the potential importance of marginal artery localization in automated registration in computed tomography colonography (CTC), we have devised a semi-automated method of marginal vessel detection employing sequential Monte Carlo tracking (also known as particle filtering tracking) by multiple cue fusion based on intensity, vesselness, organ detection, and minimum spanning tree information for poorly enhanced vessel segments. We then employed a random forest algorithm for intelligent cue fusion and decision making which achieved high sensitivity and robustness. After applying a vessel pruning procedure to the tracking results, we achieved statistically significantly improved precision compared to a baseline Hessian detection method (2.7% versus 75.2%, p<0.001). This method also showed statistically significantly improved recall rate compared to a 2-cue baseline method using fewer vessel cues (30.7% versus 67.7%, p<0.001). These results demonstrate that marginal artery localization on CTC is feasible by combining a discriminative classifier (i.e., random forest) with a sequential Monte Carlo tracking mechanism. In so doing, we present the effective application of an anatomical probability map to vessel pruning as well as a supplementary spatial coordinate system for colonic segmentation and registration when this task has been confounded by colon lumen collapse. PMID:25461335

  7. Multi-Modal Multi-Task Learning for Joint Prediction of Multiple Regression and Classification Variables in Alzheimer’s Disease

    OpenAIRE

    Zhang, Daoqiang; Shen, Dinggang

    2011-01-01

    Many machine learning and pattern classification methods have been applied to the diagnosis of Alzheimer’s disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). Recently, rather than predicting categorical variables as in classification, several pattern regression methods have also been used to estimate continuous clinical variables from brain images. However, most existing regression methods focus on estimating multiple clinical variables separately and thus cannot uti...

  8. NEW IDEA FOR THE TOPOLOGICAL INDEX EVALUATION AND TREATISE MULTIPLE REGRESSION WITH THREE INDEPENDENT VARIABLES: SATURATED HYDROCARBONS USED LIKE A MODEL

    Scientific Electronic Library Online (English)

    E, CORNWELL.

    2006-03-01

    Full Text Available In the QSRR field, a novel, easy-to-use parameter (Vc) was designed for evaluating classical topological indices (W, ¹chi, Z, MTI) and two new-generation ones (Xu, ¹chih). Regression between Vc and ¹chih gave a correlation coefficient (r) of 0.9992, a surprisingly high value compared with those found [...] commonly in QSPR/QSAR studies. Using the Vc parameter, an approach to multiple regression with three independent variables is presented. A model set of 35 saturated hydrocarbons was used.

  9. Spline Nonparametric Regression Analysis of Stress-Strain Curve of Confined Concrete

    Directory of Open Access Journals (Sweden)

    Tavio Tavio

    2008-01-01

    Full Text Available Due to the enormous uncertainties in confinement models associated with the maximum compressive strength and ductility of concrete confined by rectilinear ties, spline nonparametric regression analysis is proposed herein as an alternative approach. The statistical evaluation is based on 128 large-scale column specimens of either normal- or high-strength concrete tested under uniaxial compression. The main advantage of this kind of analysis is that it can be applied when the trend of the relation between predictor and response variables is not obvious. The error in the analysis can therefore be minimized, since it does not depend on the assumption of a particular shape of the curve. This provides greater flexibility in application. The results of the statistical analysis indicate that the stress-strain curves of confined concrete obtained from spline nonparametric regression analysis are in good agreement with the experimental curves available in the literature.

  10. Partially linear censored quantile regression

    OpenAIRE

    Neocleous, T.; Portnoy, S.

    2009-01-01

    Censored regression quantile (CRQ) methods provide a powerful and flexible approach to the analysis of censored survival data when standard linear models are felt to be appropriate. In many cases however, greater flexibility is desired to go beyond the usual multiple regression paradigm. One area of common interest is that of partially linear models: one (or more) of the explanatory covariates are assumed to act on the response through a non-linear function. Here the CRQ approach of Portnoy (...

  11. Isolating the Effects of Training Using Simple Regression Analysis: An Example of the Procedure.

    Science.gov (United States)

    Waugh, C. Keith

    This paper provides a case example of simple regression analysis, a forecasting procedure used to isolate the effects of training from an identified extraneous variable. This case example focuses on results of a three-day sales training program to improve bank loan officers' knowledge, skill-level, and attitude regarding solicitation and sale of…

  12. Semiparametric modeling and estimation of heteroscedasticity in regression analysis of cross-sectional data

    OpenAIRE

    Van Keilegom, Ingrid; Wang, Lan

    2010-01-01

    We consider the problem of modeling heteroscedasticity in semiparametric regression analysis of cross-sectional data. Existing work in this setting is rather limited and mostly adopts a fully nonparametric variance structure. This approach is hampered by the curse of dimensionality in practical applications. Moreover, the corresponding asymptotic theory is largely restricted to estimators that minimize certain smooth objective functions. The asymptotic derivation thus excludes semiparametric quant...

  13. Study of quantitative structure - property methods of linear regression analysis and neural networks

    Directory of Open Access Journals (Sweden)

    ?.?. ??????

    2007-02-01

    Full Text Available The dependence of protonation on the values of molecular descriptors for various classes of organic compounds is modeled using multidimensional regression analysis and neural networks. The advantage of the neural network method for describing quantitative structure-property relationships is shown.

  14. Catching up with Harvard: Results from Regression Analysis of World Universities League Tables

    Science.gov (United States)

    Li, Mei; Shankar, Sriram; Tang, Kam Ki

    2011-01-01

    This paper uses regression analysis to test if the universities performing less well according to Shanghai Jiao Tong University's world universities league tables are able to catch up with the top performers, and to identify national and institutional factors that could affect this catching up process. We have constructed a dataset of 461…

  15. Declining Bias and Gender Wage Discrimination? A Meta-Regression Analysis

    Science.gov (United States)

    Jarrell, Stephen B.; Stanley, T. D.

    2004-01-01

    The meta-regression analysis reveals a strong tendency for discrimination estimates to fall, although wage discrimination against women still exists. The biasing effects of researchers' gender and of not correcting for selection bias have weakened, and changes in the labor market have made them less important.

  16. Comparison of Artificial Neural Network, Genetic Programming, Genetic Algorithm, and Multiple Linear Regression for Water Quality Modeling

    Science.gov (United States)

    Tufail, M.; Ormsbee, L.

    2006-12-01

    In a watershed framework, the selection of a particular type of water quality model depends on several factors such as the complexity of the process being modeled, input data requirements, modeling objectives, and model applicability. For most applications, process-based simulation models or mechanistic models are routinely used to quantify the response of different hydrologic and water quality processes occurring in a watershed. In a complex watershed, the modeling objectives may require the use of multiple models of varying complexity. For instance, both a watershed-scale loading model and a receiving water model may be needed for a watershed of sufficient complexity in which both point and non-point sources of pollution are being modeled. Recently, inductive or data-driven models have increasingly been used for applications in watershed management. Examples of inductive models range from simple linear regression models to more complex nonlinear models based on artificial neural networks. Both linear and non-linear inductive models can be used to fit a mathematical model to a given data set in order to represent a process. Inductive or data-driven models are becoming more and more popular as substitutes for process-based models in a number of applications, owing to their ease of use and simplicity. For instance, inductive models may be preferred where 1) computational expense is a critical issue, 2) the process-based deductive models are over-parameterized and cannot be adequately calibrated, 3) budgetary constraints do not allow for a complex deductive model, and 4) quick and simple models are needed for integration into an optimal management framework for evaluating multiple scenarios in a relatively short period of time. Both explicit and implicit inductive models can be developed in such applications. While implicit inductive models require output from a calibrated mechanistic model of the watershed, explicit inductive models can be easily developed using raw data collected for the process being modeled. More recently, inductive models derived using evolutionary and biological principles are becoming increasingly popular. These include artificial intelligence-based models such as artificial neural networks, genetic algorithms, and genetic programming. This paper will compare these techniques among themselves as well as with a simple baseline technique such as multiple linear regression models for application to water quality modeling in a watershed management framework. Example applications include modeling water quality parameters such as pathogens, dissolved oxygen, total nitrogen, and total phosphorus in an urban watershed.

  17. Analysis of radial velocity variations in multiple planetary systems

    CERN Document Server

    Pál, András

    2010-01-01

    The study of multiple extrasolar planetary systems offers the opportunity to constrain planetary masses and orbital inclinations through the detection of mutual perturbations. The analysis of precise radial velocity measurements might reveal these planet-planet interactions and yield a more accurate view of such planetary systems. As in generic data-modelling problems, a fit to a radial velocity data series has a set of unknown parameters whose parametric derivatives are needed both by the regression methods and by the uncertainty estimates. In this paper, an algorithm is described that aids the computation of such derivatives when planetary perturbations are not neglected. The application of the algorithm is demonstrated on the planetary systems of HD 73526, HD 128311 and HD 155358. In addition to the functions related to radial velocity analysis, the actual implementation of the algorithm contains functions that compute spatial coordinates, velocities and barycent...

  18. Methods and applications of linear models: regression and the analysis of variance

    CERN Document Server

    Hocking, Ronald R

    2013-01-01

    Praise for the Second Edition: "An essential desktop reference book . . . it should definitely be on your bookshelf." (Technometrics). A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book

  19. Automatic regression analysis for use in a complex system of evaluation of plant genetic resources

    Directory of Open Access Journals (Sweden)

    Cs. ARKOSSY

    1984-08-01

    Full Text Available In accordance with the general requirements for computerization in gene banks and germplasm research, a computer program has been written for the analysis of univariate responses in crop germplasm evaluation. The program is written in COBOL and runs on a FELIX C-256 computer. The modules of the program allow for: (1) data control and error listing; (2) computation of the regression function; (3) listing of the differences between the measured and computed values; (4) sorting of the individual samples; (5) construction of two-dimensional scattergrams of the measured values, with the regression line shown simultaneously; (6) listing of the examined samples in the sequence required for evaluation.

  20. Fast algorithm of the robust Gaussian regression filter for areal surface analysis

    International Nuclear Information System (INIS)

    In this paper, the general model of the Gaussian regression filter for areal surface analysis is explored. The intrinsic relationships between the linear Gaussian filter and the robust filter are addressed. A general mathematical solution for this model is presented. Based on this technique, a fast algorithm is created. Both simulated and practical engineering data (stochastic and structured) have been used to test the fast algorithm. Results show that, with the same accuracy, the processing time of the second-order nonlinear regression filters for a dataset of 1024 × 1024 points has been reduced from several hours with traditional algorithms to several seconds.

  1. Detrended fluctuation analysis as a regression framework: Estimating dependence at different scales

    Science.gov (United States)

    Kristoufek, Ladislav

    2015-02-01

    We propose a framework combining detrended fluctuation analysis with standard regression methodology. The method is built on detrended variances and covariances and it is designed to estimate regression parameters at different scales and under potential nonstationarity and power-law correlations. The former feature allows for distinguishing between effects for a pair of variables from different temporal perspectives. The latter ones make the method a significant improvement over the standard least squares estimation. Theoretical claims are supported by Monte Carlo simulations. The method is then applied on selected examples from physics, finance, environmental science, and epidemiology. For most of the studied cases, the relationship between variables of interest varies strongly across scales.
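
    The idea of a scale-specific regression coefficient built from detrended variances and covariances can be sketched as the ratio of the detrended covariance of the two profiles to the detrended variance of the explanatory profile at scale s. This is a minimal sketch under stated assumptions, not the paper's exact estimator: it assumes non-overlapping boxes, linear (first-order) detrending, and discards the leftover tail of the series; `profile`, `linear_detrend` and `dfa_beta` are hypothetical names, and the test series is invented.

```python
def profile(series):
    """Cumulative sum of deviations from the mean (the DFA 'profile')."""
    mean = sum(series) / len(series)
    total, out = 0.0, []
    for value in series:
        total += value - mean
        out.append(total)
    return out


def linear_detrend(segment):
    """Residuals of a least-squares straight line fitted to one box."""
    n = len(segment)
    t_mean = (n - 1) / 2
    s_mean = sum(segment) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (s - s_mean) for t, s in enumerate(segment)) / s_tt
    return [s - (s_mean + slope * (t - t_mean)) for t, s in enumerate(segment)]


def dfa_beta(x, y, scale):
    """Slope of y on x at one scale: detrended covariance / detrended variance."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if scale < 3 or scale > len(x):
        raise ValueError("scale must be between 3 and len(x)")
    px, py = profile(x), profile(y)
    cov = var = 0.0
    for start in range(0, len(x) - scale + 1, scale):  # non-overlapping boxes
        rx = linear_detrend(px[start:start + scale])
        ry = linear_detrend(py[start:start + scale])
        cov += sum(a * b for a, b in zip(rx, ry))
        var += sum(a * a for a in rx)
    return cov / var  # normalizing constants cancel in the ratio


# Hypothetical deterministic series, not data from the paper.
x = [((i * 7919) % 101) - 50 for i in range(400)]
y = [2 * v + 1 for v in x]  # exact linear dependence: beta(s) should be 2 at every scale
for s in (10, 50, 100):
    print(s, dfa_beta(x, y, s))
```

    With real data, evaluating `dfa_beta` over a range of scales is what allows the relationship to be compared across temporal perspectives, as the abstract describes.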

  2. QUANTITATIVE STRUCTURE–PROPERTY RELATIONSHIP (QSPR) STUDY OF KOVATS RETENTION INDICES OF SOME ADAMANTANE DERIVATIVES BY THE GENETIC ALGORITHM AND MULTIPLE LINEAR REGRESSION (GA-MLR) METHOD

    Directory of Open Access Journals (Sweden)

    Z. Bayat

    2011-05-01

    Full Text Available A quantitative structure–property relationship (QSPR) study was performed to develop models that relate the structures of 65 adamantane derivatives to their Kovats retention indices (RI). Molecular descriptors were derived solely from the 3D structures of the compounds. A genetic algorithm was also applied as a variable selection tool in the QSPR analysis. The models were constructed using 52 molecules as the training set, and their predictive ability was tested using 13 compounds. The RI of adamantane derivatives was modeled as a function of the theoretically derived descriptors by multiple linear regression (MLR). The usefulness of quantum chemical descriptors, calculated at the DFT level with the 6-311+G** basis set, for the QSAR study of adamantane derivatives was examined. Using descriptors calculated only from molecular structure eliminates the need for experimental determination of properties for use in the correlation and allows RI to be estimated for molecules not yet synthesized. Application of the developed model to a test set of 13 drug organic compounds demonstrates that the model is reliable, with good predictive accuracy and a simple formulation. The predictions are in good agreement with the experimental values. A multiparametric equation containing at most four descriptors at the B3LYP/6-31+G** level, with good statistical quality (R²train = 0.913, Ftrain = 97.67, R²test = 0.770, Ftest = 3.21, Q²LOO = 0.895, R²adj = 0.904, Q²LGO = 0.844), was obtained by stepwise multiple linear regression.
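
    The leave-one-out statistic Q²LOO reported above can be illustrated with a single descriptor: each compound is removed in turn, the model is refitted, and the squared prediction errors (PRESS) replace the ordinary residual sum of squares. This sketch does not reproduce the genetic-algorithm or stepwise descriptor selection; it assumes one descriptor and Q² = 1 − PRESS/SS_tot with SS_tot taken about the full-data mean (some authors use variants). `fit_line` and `r2_and_q2_loo` are hypothetical names and the data are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for a single descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def r2_and_q2_loo(xs, ys):
    """Return (R^2, Q^2_LOO); Q^2 uses the PRESS of leave-one-out predictions."""
    n = len(ys)
    my = sum(ys) / n
    ss_tot = sum((y - my) ** 2 for y in ys)
    a, b = fit_line(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    press = 0.0
    for i in range(n):
        # Refit without observation i, then predict the held-out value.
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a_i + b_i * xs[i])) ** 2
    return 1.0 - sse / ss_tot, 1.0 - press / ss_tot


# Hypothetical descriptor / retention-index pairs, not data from the paper.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
retention = [1.2, 1.9, 3.2, 3.8, 5.3, 5.9]
r2, q2 = r2_and_q2_loo(descriptor, retention)
print(r2, q2)  # for an OLS fit, Q^2_LOO never exceeds R^2
```

    Because each held-out prediction error is at least as large as the corresponding in-sample residual, PRESS is never smaller than the residual sum of squares, so a Q²LOO close to R² (0.895 vs. 0.913 above) indicates limited overfitting.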

  3. Key To Effective English Remedial Education: Intimation Derived From Multiple Regression

    Science.gov (United States)

    Zhang, Rong; Ishino, Fukuya

    2009-05-01

    With the rapid decrease in the younger population, Japanese universities and colleges face the challenging task of meeting their annual quotas for incoming students. Admission criteria have been lowered, and students with a broad range of scholastic abilities are being accepted by higher education institutions. The decline in freshmen's academic performance is said to be the most crucial factor hindering the implementation of effective curriculum education. Many universities and colleges have had to establish remedial education programs to deal with this problem, which arises from the limited room for student selection. This paper reports on an English remedial education program carried out at Nishinippon Institute of Technology, Japan, examining the validity of its course settings and optimizing the prediction models for students' post-course score changes. The analysis focuses on the determinants shown to be responsible for the improvement in students' English proficiency, supporting the argument that more effective English remedial education can be achieved through appropriate instruction and teaching methodology in courses at different levels.

  4. Seasonal forecasting of Bangladesh summer monsoon rainfall using simple multiple regression model

    Indian Academy of Sciences (India)

    Md Mizanur Rahman; M Rafiuddin; Md Mahbub Alam

    2013-04-01

    In this paper, the development of a statistical forecasting method for summer monsoon rainfall over Bangladesh is described. Predictors for Bangladesh summer monsoon (June–September) rainfall were identified from large-scale ocean–atmosphere circulation variables (i.e., sea-surface temperature, surface air temperature and sea level pressure). The predictors exhibited a significant relationship with Bangladesh summer monsoon rainfall during the period 1961–2007. After a detailed analysis of various global climate datasets, three predictors were selected. The model performance was evaluated over the period 1977–2007. The model showed better performance in its hindcasts of seasonal monsoon rainfall over Bangladesh. The RMSE and Heidke skill score for the 31 years were 8.13 and 0.37, respectively, and the correlation between predicted and observed rainfall was 0.74. The BIAS of the forecasts (as % of the long period average, LPA) was −0.85, and the Hit score was 58%. The experimental forecast of the 2008 summer monsoon rainfall based on the model was also found to be in good agreement with the observations.
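
    Three of the verification scores quoted above (RMSE, correlation and the Heidke skill score) can be computed as follows. This is a sketch under assumptions: the abstract does not say how forecasts were categorized for the Heidke score, so the example assumes three categories (below, near, above normal); the BIAS and Hit score are not implemented. Function names and data are hypothetical.

```python
from collections import Counter
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)


def heidke_skill_score(observed_cat, predicted_cat):
    """HSS = (hit rate - chance hit rate) / (1 - chance hit rate)."""
    n = len(observed_cat)
    hit_rate = sum(o == p for o, p in zip(observed_cat, predicted_cat)) / n
    obs_counts, pred_counts = Counter(observed_cat), Counter(predicted_cat)
    # Chance agreement expected from the marginal category frequencies.
    chance = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / n ** 2
    if chance == 1.0:
        raise ValueError("HSS is undefined when chance agreement is 1")
    return (hit_rate - chance) / (1.0 - chance)


# Hypothetical seasonal categories, not the paper's data.
obs = ["below", "near", "above", "near", "above", "below"]
fcst = ["below", "near", "near", "near", "above", "above"]
print(heidke_skill_score(obs, fcst))  # about 0.5
```

    An HSS of 0 means no skill beyond chance and 1 means a perfect categorical forecast, so the reported 0.37 indicates modest skill over chance.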

  5. Assessment of neural network, frequency ratio and regression models for landslide susceptibility analysis

    Science.gov (United States)

    Pradhan, B.; Buchroithner, M. F.; Mansor, S.

    2009-04-01

    This paper presents the results of assessing three spatially based probabilistic models, using Geoinformation Techniques (GIT), for landslide susceptibility analysis on Penang Island, Malaysia. Landslide locations within the study area were identified by interpreting aerial photographs and satellite images, supported by field surveys. Maps of topography, soil type, lineaments and land cover were constructed from the spatial data sets. Nine landslide-related factors were extracted from the spatial database, and the neural network, frequency ratio and logistic regression coefficients of each factor were computed. Landslide susceptibility maps were drawn for the study area using the neural network, frequency ratio and logistic regression models. For verification, the results of the analyses were compared with actual landslide locations in the study area. The verification results show that the frequency ratio model provides higher prediction accuracy than the ANN and regression models.

  6. Formal Specification Language Based IaaS Cloud Workload Regression Analysis

    OpenAIRE

    Singh, Sukhpal; Chana, Inderveer

    2014-01-01

    Cloud Computing is an emerging area for accessing computing resources. In general, Cloud service providers offer services that can be clustered into three categories: SaaS, PaaS and IaaS. This paper discusses Cloud workload analysis and proposes an efficient Cloud workload resource mapping technique. It aims to provide a means of understanding and investigating IaaS Cloud workloads and resources. In this paper, regression analysis is used to analyze the Clou...

  7. A meta-regression analysis of benchmarking studies on water utilities market structure

    OpenAIRE

    Carvalho, Pedro; Marques, Rui Cunha; Berg, Sanford

    2011-01-01

    This paper updates the literature on water utility benchmarking studies carried out worldwide, focusing on scale and scope economies. Using meta-regression analysis, the study investigates which variables from published studies influence these economies. Our analysis led to several conclusions. The results indicate that there is a higher probability of finding diseconomies of scale and scope in large utilities; however, only the results for scale economies are significant. Diseconomies of sca...

  8. Factors predicting the failure of Bernese periacetabular osteotomy: a meta-regression analysis

    OpenAIRE

    Sambandam, Senthil Nathan; Hull, Jason; Jiranek, William A.

    2008-01-01

    There is no clear evidence regarding the outcome of Bernese periacetabular osteotomy (PAO) in different patient populations. We performed a systematic meta-regression analysis of 23 eligible studies. There were 1,113 patients, of whom 61 underwent total hip arthroplasty (THA) (the endpoint) as a result of failed Bernese PAO. Univariate analysis revealed a significant correlation between THA and the presence of grade 2/grade 3 arthritis, Merle de’Aubigne score (MDS), Harris hip score and Tonnis angle, ...

  9. Statistical Properties of Multivariate Distance Matrix Regression for High-Dimensional Data Analysis

    OpenAIRE

    Zapala, Matthew A; Schork, Nicholas J.

    2012-01-01

    Multivariate distance matrix regression (MDMR) analysis is a statistical technique that allows researchers to relate P variables to an additional M factors collected on N individuals, where P ≫ N. The technique can be applied to a number of research settings involving high-dimensional data types such as DNA sequence data, gene expression microarray data, and imaging data. MDMR analysis involves computing the distance between all pairs of individuals with respect to P variables of interest and...

  10. Analysis of the Influence of Quantile Regression Model on Mainland Tourists' Service Satisfaction Performance

    Science.gov (United States)

    Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen

    2014-01-01

    It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwanese economy. How the Taiwanese government formulates measures that satisfy both sides is therefore a central concern, and Taiwan must improve the facilities and service quality of its tourism industry to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis from grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and overall satisfaction performance was used as the dependent variable in a quantile regression analysis of the relationship between the dependent variable at different quantiles and the independent variables. Finally, the study compared the predictive accuracy of the least mean regression model with that of each quantile regression model, as a reference for researchers. The results showed that, in addition to occupation and age, other variables could also affect the overall satisfaction performance of mainland tourists. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
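Quantile regression at level τ (for example, the Q0.25 model above) minimizes the check, or pinball, loss instead of the squared error. The plain-Python sketch below shows only that loss, under a hypothetical function name. Fitting a full model would also require a linear-programming or iterative solver, which is not shown. The sketch also checks that a constant minimizing the loss is the sample τ-quantile.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean check (pinball) loss for quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        u = y - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * u if u >= 0 else (tau - 1) * u
    return total / len(y_true)


# The constant prediction that minimizes the loss is the tau-quantile of the data.
data = list(range(1, 11))
losses = {q: pinball_loss(data, [q] * len(data), 0.25) for q in data}
best_constant = min(losses, key=losses.get)
```

With τ = 0.5 the loss is half the absolute error, so the median regression is the special case.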

  11. Functional regression analysis using an F test for longitudinal data with large numbers of repeated measures.

    Science.gov (United States)

    Yang, Xiaowei; Shen, Qing; Xu, Hongquan; Shoptaw, Steven

    2007-03-30

    Longitudinal data sets from certain fields of biomedical research often consist of several variables repeatedly measured on each subject yielding a large number of observations. This characteristic complicates the use of traditional longitudinal modelling strategies, which were primarily developed for studies with a relatively small number of repeated measures per subject. An innovative way to model such 'wide' data is to apply functional regression analysis, an emerging statistical approach in which observations of the same subject are viewed as a sample from a functional space. Shen and Faraway introduced an F test for linear models with functional responses. This paper illustrates how to apply this F test and functional regression analysis to the setting of longitudinal data. A smoking cessation study for methadone-maintained tobacco smokers is analysed for demonstration. In estimating the treatment effects, the functional regression analysis provides meaningful clinical interpretations, and the functional F test provides consistent results supported by a mixed-effects linear regression model. A simulation study is also conducted under the condition of the smoking data to investigate the statistical power for the F test, Wilks' likelihood ratio test, and the linear mixed-effects model using AIC. PMID:16817228

  12. Comparison of Neural Networks Prediction and Regression Analysis (MLR and PCR) in Modelling Nonlinear System

    Directory of Open Access Journals (Sweden)

    Zainal Ahmad

    2007-10-01

    Different methods for modelling nonlinear systems are investigated in this paper. Neural network (NN) techniques, multiple linear regression (MLR) and principal component regression (PCR) are applied to two nonlinear systems: a sine function and a distillation column. To study these three methods, all data were obtained from simulation and then separated into training, testing and validation sets. Among these approaches, the NN approach based on the nonlinear prediction technique gives very good performance for both case studies. It is also shown that the MLR model suffers from glitches due to collinearity of the input variables, whereas the PCR model gives good prediction output. In conclusion, the NN methods give consistent results, with the least sum of squared errors (SSE) on unseen data compared with the other two techniques.
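The collinearity problem described above, and how principal component regression sidesteps it, can be sketched with NumPy. The example uses synthetic data, not the paper's setup, and `pcr_fit` and `mlr_fit` are hypothetical helper names. PCR regresses the response on the leading principal components of the centred inputs. It then maps the coefficients back to the original variables.

```python
import numpy as np


def mlr_fit(X, y):
    """Ordinary least squares with an intercept; returns (slopes, intercept)."""
    X = np.asarray(X, dtype=float)
    Xd = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xd, np.asarray(y, dtype=float), rcond=None)
    return coef[1:], coef[0]


def pcr_fit(X, y, k):
    """Principal component regression keeping the first k components."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are principal directions
    V = Vt[:k].T                                       # (p, k) loadings
    T = Xc @ V                                         # (n, k) component scores
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    beta = V @ gamma                                   # back to original-variable coefficients
    return beta, y_mean - x_mean @ beta


rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=200)])  # two nearly collinear inputs
y = 2 * x1 + 0.1 * rng.normal(size=200)
beta_mlr, _ = mlr_fit(X, y)       # individual coefficients are poorly determined
beta_pcr, _ = pcr_fit(X, y, k=1)  # weight shared roughly equally, about (1, 1)
```

Keeping all p components reproduces the MLR fit exactly. Dropping the low-variance components is what stabilizes the coefficients.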

  13. Field-scale variation in colloid dispersibility and transport : multiple linear regressions to soil physico-chemical and structural properties

    DEFF Research Database (Denmark)

    Nørgaard, Trine; Møldrup, Per

    2014-01-01

    Colloids are potential carriers for strongly sorbing chemicals in macroporous soils, but predicting the amount of colloids readily available for facilitated chemical transport is an unsolved challenge. This study addresses potential key parameters and predictive indicators for assessing colloid dispersibility and transport at the field scale. Samples representing three measurement scales (1-2 mm aggregates, intact 100 cm³ rings, and intact 6283 cm³ columns) were retrieved from the topsoil of a 1.69 ha agricultural field in a 15 m × 15 m grid (65 locations). These samples were used to determine soil dispersibility as well as 24 comparison parameters, including textural, chemical, and structural (e.g., air permeability) soil properties. Soil dispersibility was determined (i) using a laser diffraction method on 1-2 mm aggregates equilibrated to an initial matric potential of −100 cm H2O, (ii) using end-over-end shaking of 6.06 cm (diam.) × 3.48 cm (height) intact soil rings equilibrated to an initial matric potential of −5 cm H2O, and (iii) as the accumulated amount of particles leached from 20 cm × 20 cm intact soil columns after 6.5 hr (60 mm accumulated outflow). At all three scales, soil dispersibility was higher in samples collected from the northern part of the field. This is where the greatest leaching of pesticides was observed in a horizontal well at ~3.5 m depth during a 9-year monitoring program. This suggests that all three dispersibility methods are relevant for field-scale mapping of areas with enhanced risk of colloid-facilitated transport. Subsequently, multiple linear regression (MLR) analyses were used to predict soil dispersibility at all three sample scales from the 24 measured, geo-referenced parameters. The aim was to produce sets of only a few promising indicator parameters for evaluating soil stability and particle mobilization at the field scale. At each scale, the MLR analyses were run separately using all locations, only northern locations, and only southern locations in the field.
We found that different independent variables were included in the regression models as the sample scale increased from aggregate to column level. Generally, the predictive power of the regression models was better at the 1-2 mm aggregate scale than at the intact 100 cm³ and 20 cm × 20 cm scales. Overall, the results suggested that different drivers controlled soil dispersibility at the three scales and in the two sub-areas of the field. Predictions of soil dispersibility and the risk of colloid-facilitated chemical transport will therefore need to be highly scale- and area-specific.

  14. A Menu-Driven Software Package of Bayesian Nonparametric (and Parametric) Mixed Models for Regression Analysis and Density Estimation

    OpenAIRE

    Karabatsos, George

    2015-01-01

    Much of applied statistics involves regression analysis of data. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models. Currently, this package gives the user a choice of 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal l...

  15. Land use regression modeling of intra-urban residential variability in multiple traffic-related air pollutants

    Directory of Open Access Journals (Sweden)

    Baxter Lisa K

    2008-05-01

    Background: There is a growing body of literature linking GIS-based measures of traffic density to asthma and other respiratory outcomes. However, no consensus exists on which traffic indicators best capture variability in different pollutants or within different settings. As part of a study on childhood asthma etiology, we examined variability in outdoor concentrations of multiple traffic-related air pollutants within urban communities, using a range of GIS-based predictors and land use regression techniques. Methods: We measured fine particulate matter (PM2.5), nitrogen dioxide (NO2), and elemental carbon (EC) outside 44 homes representing a range of traffic densities and neighborhoods across Boston, Massachusetts and nearby communities. Multiple three- to four-day average samples were collected at each home during winters and summers from 2003 to 2005. Traffic indicators were derived using Massachusetts Highway Department data and direct traffic counts. Multivariate regression analyses were performed separately for each pollutant, using traffic indicators, land use, meteorology, site characteristics, and central site concentrations. Results: PM2.5 was strongly associated with the central site monitor (R² = 0.68). Additional variability was explained by total roadway length within 100 m of the home, smoking or grilling near the monitor, and block-group population density (R² = 0.76). EC showed greater spatial variability, especially during winter months, and was predicted by roadway length within 200 m of the home. The influence of traffic was greater under low wind speed conditions, and concentrations were lower during summer (R² = 0.52). NO2 showed significant spatial variability, predicted by population density and roadway length within 50 m of the home, modified by site characteristics (obstruction), and with higher concentrations during summer (R² = 0.56).
Conclusion: Each pollutant examined displayed somewhat different spatial patterns within urban neighborhoods and was differently related to local traffic and meteorology. Our results indicate a need for multi-pollutant exposure modeling to disentangle causal agents in epidemiological studies. They also call for further investigation of site-specific and meteorological modification of the traffic-concentration relationship in urban neighborhoods.

  16. Multi-task Regression using Minimal Penalties

    OpenAIRE

    Solnon, Matthieu; Arlot, Sylvain; Bach, Francis

    2012-01-01

    In this paper we study the kernel multiple ridge regression framework, which we refer to as multi-task regression, using penalization techniques. The theoretical analysis of this problem shows that the key element appearing for an optimal calibration is the covariance matrix of the noise between the different tasks. We present a new algorithm to estimate this covariance matrix, based on the concept of minimal penalty, which was previously used in the single-task regression f...

  17. MRI Texture Analysis in Multiple Sclerosis

    OpenAIRE

    Yunyan Zhang

    2011-01-01

    Multiple sclerosis (MS) is a complicated disease characterized by heterogeneous pathology that varies across individuals. Accurate identification and quantification of pathological changes may facilitate a better understanding of disease pathogenesis and progression and help identify novel therapies for MS patients. Texture analysis evaluates interpixel relationships that generate characteristic organizational patterns in an image, many of which are beyond the ability of visual perception. Gi...

  18. Regression Analysis of Effective Factor on People Participation in Protecting, Revitalizing, Developing and Using Renewable Natural Resources in Ilam Province from the View of Users

    OpenAIRE

    Bagher Arayesh; Sayed J. Hosseini

    2010-01-01

    Problem statement: The purpose of this study was a regression analysis of the factors affecting people's participation in protecting, revitalizing, developing and using renewable natural resources in Ilam province. Approach: This was a causal-comparative, applied study. The sample was drawn from natural resource users. Results: The sample size was 317 users. For sample selection, stratified, cluster and multiple sampling were utilized. The main tools for gathering...

  19. Regression analysis to predict growth performance from dietary net energy in growing-finishing pigs.

    Science.gov (United States)

    Nitikanchana, S; Dritz, S S; Tokach, M D; DeRouchey, J M; Goodband, R D; White, B J

    2015-06-01

    Data from 41 trials with multiple energy levels (285 observations) were used in a meta-analysis to predict growth performance based on dietary NE concentration. Nutrient and energy concentrations in all diets were estimated using the NRC ingredient library. Predictor variables were examined for best-fit models using the Akaike information criterion. They included linear and quadratic terms of NE, BW, CP, standardized ileal digestible (SID) Lys, crude fiber, NDF, ADF, fat, ash, and their interactions. The initial best-fit models included interactions between NE and CP or SID Lys. After removal of the observations in which SID Lys was fed below the suggested requirement, these terms were no longer significant. Including dietary fat in the model with NE and BW significantly improved the G:F prediction model, indicating that NE may underestimate the influence of fat on G:F. The meta-analysis indicated that, as long as diets are adequate for other nutrients (i.e., Lys), dietary NE is adequate to predict changes in ADG across different dietary ingredients and conditions. The analysis indicates that ADG increases with increasing dietary NE and BW but decreases when BW is above 87 kg. The G:F ratio improves with increasing dietary NE and fat but decreases with increasing BW. The regression equations were then evaluated by comparing the actual and predicted performance of 543 finishing pigs in 2 trials fed 5 dietary treatments. These treatments included 3 different levels of NE, obtained by adding wheat middlings, soybean hulls, dried distillers grains with solubles (DDGS; 8 to 9% oil), or choice white grease (CWG) to a corn-soybean meal-based diet. Diets were 1) 30% DDGS, 20% wheat middlings, and 4 to 5% soybean hulls (low energy); 2) 20% wheat middlings and 4 to 5% soybean hulls (low energy); 3) a corn-soybean meal diet (medium energy); 4) diet 2 supplemented with 3.7% CWG to equalize the NE level to diet 3 (medium energy); and 5) a corn-soybean meal diet with 3.7% CWG (high energy). 
Only small differences were observed between predicted and observed values of ADG and G:F except for the low-energy diet containing the greatest fiber content (30% DDGS diet), where ADG and G:F were overpredicted by 3 to 6%. Therefore, the prediction equations provided a good estimation of the growth rate and feed efficiency of growing-finishing pigs fed different levels of dietary NE except for the pigs fed the low-energy diet containing the greatest fiber content. PMID:26115270

  20. Application of Binary Regression Analysis in the Prescription Pattern of Antidepressants

    Directory of Open Access Journals (Sweden)

    Dr.Indrajit Banerjee, MBBS, MD

    2013-05-01

    Background: In Nepal, several research studies report results using percentages or cross-tabulation, but researchers have been slow to adopt logistic regression methodology. Objectives: The main objective of this study was to examine the role of logistic regression analysis in describing the prescription pattern of antidepressants among hospitalized patients in a tertiary care center in Western Nepal. Methods: A hospital-based study was conducted between 1 October 2009 and 31 March 2010 in the Psychiatry Ward of Manipal Teaching Hospital, Nepal. The Z test, chi-square test and binary logistic regression were used for the analysis. We calculated odds ratios (OR) and their 95% confidence intervals (95% CI) P-value 10000, 2.63 times more in Hindus and 1.197 times more in Brahmins than any other ethnic groups. 9.179 times more tendency of prescribing antidepressants by trade names in case of unemployed patients as compared to employed patients in Nepal. Conclusion: Binary logistic regression plays an important role in understanding the drug utilization pattern of mood elevators in Western Nepal.
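Odds ratios and 95% confidence intervals of the kind reported above follow from a fitted logistic coefficient β and its standard error: the odds ratio is exp(β) and the interval is exp(β ± 1.96·SE). The sketch below is plain Python with hypothetical function names, and it assumes Wald intervals. It also covers the special case of a single binary predictor. There, the logistic coefficient equals the log odds ratio of the 2×2 table, and its standard error is given by Woolf's formula.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio with a Wald confidence interval (z = 1.96 for 95%)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def odds_ratio_2x2(a, b, c, d, z=1.96):
    """2x2 table: a, b = exposed with/without outcome; c, d = unexposed with/without outcome."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's standard error of the log OR
    return odds_ratio_ci(log_or, se, z)
```

For example, `odds_ratio_2x2(20, 10, 10, 20)` (hypothetical counts) gives an odds ratio of 4 with a 95% interval that excludes 1.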

  1. Linear regression analysis of the gamma dose in fast neutron beams

    International Nuclear Information System (INIS)

    The dual dosimeter technique for determining the absorbed doses of both neutrons and photons in a mixed field has been extended to the use of multiple dosimeters. The data were analyzed by a linear regression method that yields the neutron dose from the slope and the photon dose from the intercept; an estimate of the uncertainty of the photon dose can also be obtained. Measurements were made in a high-energy neutron beam, and the photon dose was obtained as a function of both field size and depth in a tissue-equivalent phantom.
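The slope/intercept idea can be sketched as follows. The model is an assumption for illustration, not taken from the abstract: each dosimeter i is taken to read r_i = k_i·D_N + D_G, where k_i is its neutron sensitivity relative to its photon sensitivity (normalized to 1). An ordinary least-squares line of r against k then gives D_N as the slope and D_G as the intercept. The standard error of the intercept quantifies the uncertainty of the photon dose. Function and variable names are hypothetical.

```python
import math


def dual_dosimeter_fit(k, r):
    """Least-squares fit of r_i = k_i * D_N + D_G over several dosimeters.

    k: relative neutron sensitivities (photon sensitivity assumed to be 1)
    r: dosimeter readings
    Returns (D_N, D_G, standard error of D_G). Needs at least three dosimeters.
    """
    n = len(k)
    k_bar, r_bar = sum(k) / n, sum(r) / n
    sxx = sum((ki - k_bar) ** 2 for ki in k)
    sxy = sum((ki - k_bar) * (ri - r_bar) for ki, ri in zip(k, r))
    d_n = sxy / sxx                # slope -> neutron dose
    d_g = r_bar - d_n * k_bar      # intercept -> photon dose
    sse = sum((ri - (d_g + d_n * ki)) ** 2 for ki, ri in zip(k, r))
    s = math.sqrt(sse / (n - 2))   # residual standard deviation
    se_d_g = s * math.sqrt(1 / n + k_bar ** 2 / sxx)
    return d_n, d_g, se_d_g


k = [0.05, 0.3, 0.6, 0.95]               # hypothetical relative neutron sensitivities
readings = [2.0 * ki + 0.5 for ki in k]  # synthetic: D_N = 2.0, D_G = 0.5, no noise
d_n, d_g, se_d_g = dual_dosimeter_fit(k, readings)
```

With only two dosimeters the line passes exactly through both points, leaving no residual degrees of freedom. That is why extra dosimeters are what make an uncertainty estimate possible in this formulation.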

  2. The use of artificial neural networks and multiple linear regression to predict rate of medical waste generation

    International Nuclear Information System (INIS)

    Predicting the amount of hospital waste produced helps with the storage, transportation and disposal stages of hospital waste management. On this basis, two predictive models, artificial neural networks (ANNs) and multiple linear regression (MLR), were applied to predict the rate of medical waste generation, both in total and for the sharp, infectious and general types. In this study, a 5-fold cross-validation procedure on a database of 50 hospitals in Fars province (Iran) was used to verify the performance of the models. Three performance measures, MAR, RMSE and R², were used to evaluate the models. MLR, as a conventional model, obtained poor values of these performance measures. However, MLR identified hospital capacity and bed occupancy as the more significant parameters. ANNs, a more powerful model that had not previously been applied to predicting the rate of medical waste generation, showed high performance values. In particular, the ANNs reached an R² of 0.99, confirming a good fit to the data. These satisfactory results could be attributed to the nonlinear nature of ANNs, which allows independent variables to be related to dependent ones nonlinearly. In conclusion, the results show that the ANN-based approach is very promising and may play a useful role in developing a more cost-effective waste management strategy in the future.
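A 5-fold cross-validation of the MLR baseline can be sketched with NumPy. This is a hypothetical illustration, not the study's code. The abstract does not define MAR, so the mean absolute error (MAE) is reported alongside RMSE and R² as a stand-in. Each observation is predicted by a model fitted on the other four folds, and the metrics are computed on these out-of-fold predictions.

```python
import numpy as np


def kfold_mlr_scores(X, y, k=5, seed=0):
    """Out-of-fold RMSE, MAE and R^2 for ordinary least squares with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    pred = np.empty(n)
    for test in folds:
        train = np.setdiff1d(idx, test)
        X_train = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
        pred[test] = np.column_stack([np.ones(len(test)), X[test]]) @ coef
    resid = y - pred
    return {
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "MAE": float(np.mean(np.abs(resid))),
        "R2": float(1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)),
    }


rng = np.random.default_rng(0)
features = rng.normal(size=(50, 2))  # synthetic stand-ins, e.g. capacity and bed occupancy
waste = 3.0 + features @ np.array([1.5, -2.0])  # noise-free synthetic response
scores = kfold_mlr_scores(features, waste, k=5)
```

The same fold split can be reused for an ANN so that both models are scored on identical held-out hospitals.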

  3. Ranking contributing areas of salt and selenium in the Lower Gunnison River Basin, Colorado, using multiple linear regression models

    Science.gov (United States)

    Linard, Joshua I.

    2013-01-01

    Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated that the models could be improved by including more detailed geospatial data and using a more rigorous model-development method. After all possible combinations of geospatial variables were evaluated, four multiple linear regression models were developed. They estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.
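The two fit statistics quoted above can be computed from a fitted model's residuals, as sketched below. Here n is the number of observations and p the number of predictors, excluding the intercept. This is a generic plain-Python sketch with hypothetical function names. Because the models were fitted to log-transformed yields, the residual standard error is in those log units rather than in the original yield units.

```python
import math


def r_squared(y, fitted):
    """Coefficient of determination from observed and fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Penalize R^2 for the number of predictors p (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def residual_standard_error(residuals, p):
    """sqrt(SSE / (n - p - 1)) for a model with p predictors plus an intercept."""
    n = len(residuals)
    return math.sqrt(sum(e * e for e in residuals) / (n - p - 1))
```

Adjusted R² only rises when an added predictor reduces the residual variance by more than chance would. That makes it a reasonable tie-breaker when all possible variable combinations are compared.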

  4. 2D Quantitative Structure-Property Relationship Study of Mycotoxins by Multiple Linear Regression and Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Fereshteh Shiri

    2010-08-01

    In the present work, support vector machines (SVMs) and multiple linear regression (MLR) were used for quantitative structure–property relationship (QSPR) studies of the retention time (tR) of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins) in standardized liquid chromatography–UV–mass spectrometry. The models were based on molecular descriptors calculated from the optimized 3D structures. The most relevant descriptors were selected by applying missing-value, zero and multicollinearity tests with a cutoff value of 0.95, followed by a genetic algorithm for variable selection. MLR and SVM methods were then employed to build the QSPR models. The robustness of the models was characterized by statistical validation and the applicability domain (AD). The predictions of the MLR and SVM models are in good agreement with the experimental values. The correlation (r²) and predictability (q²) measures are 0.931 and 0.932, respectively, for SVM, and 0.923 and 0.915, respectively, for MLR. The applicability domain of the model was investigated using a Williams plot. The effects of different descriptors on the retention times are described.
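A descriptor pre-filter in the spirit of the missing-value and multicollinearity tests can be sketched as below. This is an assumption-laden illustration, not the paper's procedure. Descriptors with missing values or zero variance are dropped; zero variance is only one possible reading of the "zero" test. A descriptor is kept only if its absolute Pearson correlation with every already-kept descriptor is at most the 0.95 cutoff, and descriptor order decides which member of a correlated pair survives. Descriptor names and values in the example are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def filter_descriptors(descriptors, cutoff=0.95):
    """descriptors: dict name -> list of values, all the same length."""
    kept = {}
    for name, values in descriptors.items():
        if any(v is None for v in values):   # missing-value test
            continue
        if max(values) == min(values):       # zero-variance descriptor (assumed reading of 'zero' test)
            continue
        if all(abs(pearson(values, other)) <= cutoff for other in kept.values()):
            kept[name] = values              # not collinear with anything kept so far
    return list(kept)


selected = filter_descriptors({
    "d1": [1, 2, 3, 4],
    "d2": [2, 4, 6, 8.1],   # nearly collinear with d1, dropped
    "d3": [4, 1, 3, 2],
    "d4": [5, 5, 5, 5],     # constant, dropped
    "d5": [1, None, 2, 3],  # missing value, dropped
})
```

In a full workflow, a variable-selection step such as the genetic algorithm mentioned above would then search only among the surviving descriptors.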

  5. A Parallel Implementation of the Network Identification by Multiple Regression (NIR) Algorithm to Reverse-Engineer Regulatory Gene Networks

    Science.gov (United States)

    Gregoretti, Francesco; Belcastro, Vincenzo; di Bernardo, Diego; Oliva, Gennaro

    2010-01-01

    The reverse engineering of gene regulatory networks using gene expression profile data has become crucial for gaining novel biological knowledge. Advances in microarray technologies are producing large amounts of data that need to be analyzed. Using current reverse engineering algorithms to analyze large data sets can be very computationally intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than other ready-to-use reverse engineering software. However, its high time and space complexity prevents its use on large networks with thousands of nodes, as is the case for biological networks. In this work, we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation reaches very good accuracy even for large gene networks, improving our understanding of gene regulatory networks, which is crucial for a wide range of biomedical applications. PMID:20422008

  6. An application with multinomial logistic regression analysis

    Directory of Open Access Journals (Sweden)

    Sadi Elasan

    2015-01-01

    Multinomial logistic regression analysis is a technique used to examine relationships between independent variables and a dependent variable that has three or more categories. One category of the dependent variable is taken as the reference category, and the other categories are analyzed with respect to it. In this study, multinomial logistic regression analysis was introduced and applied. In the application, the trauma variable was coded into four categories [no abuse (0), sexual abuse (1), physical abuse (2), sexual and physical abuse (3)], and the effects of other variables on trauma were examined. The results indicate that multinomial logistic regression analysis is applicable when the response variable has three or more categories.
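How the reference category enters the model can be sketched as follows. Each non-reference category k gets its own linear predictor η_k = b0_k + b_k·x, and the reference category's predictor is fixed at 0. The probabilities are the softmax of these predictors, so log(p_k / p_ref) = η_k. The coefficients in the example are hypothetical, not estimates from the study.

```python
import math


def multinomial_probs(x, coefs, reference):
    """Category probabilities for a baseline-category multinomial logit.

    coefs maps each non-reference category to (intercept, [slopes]);
    the reference category's linear predictor is fixed at 0.
    """
    eta = {reference: 0.0}
    for category, (b0, slopes) in coefs.items():
        eta[category] = b0 + sum(b * xi for b, xi in zip(slopes, x))
    m = max(eta.values())                       # subtract the max for numerical stability
    weights = {c: math.exp(v - m) for c, v in eta.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Hypothetical coefficients for one covariate, with trauma coded as in the study.
probs = multinomial_probs(
    [1.0],
    {"sexual abuse (1)": (0.5, [0.2]),
     "physical abuse (2)": (-0.3, [0.1]),
     "sexual and physical abuse (3)": (-1.0, [0.4])},
    reference="no abuse (0)",
)
```

Exponentiating a slope gives the relative risk ratio for that category versus the reference, which is how such coefficients are usually reported.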

  7. Transference of hydrologic information through multiple linear regression, with best predictor variables selection

    Scientific Electronic Library Online (English)

    Daniel F., Campos-Aranda.

    2011-12-01

    Long records of annual hydrological data are needed to obtain a truer picture of their variability, as well as reliable estimates of their statistical properties. To obtain such records, it is common to use additional data sources and transfer techniques. One such technique is multiple linear regression. Its numerical application implies the optimal selection of nearby long records (regressors), so that the extension of the short record is a reliable estimate. This selection process involves three analyses: 1) how to define the best estimates, 2) which regression equations to investigate, and 3) which model has the best predictive ability. For the first analysis, four criteria based on the sums of squares of the residuals are presented. For the second, all possible regressions are investigated, since problems of hydrological information transfer involve at most five regressors. For the third, selecting the best predictive model, residual analysis and cross-validation are used. The numerical application described is an extension of the annual runoff volume record at the Platón Sánchez hydrometric station of the Tempoal river system, in Hydrological Region No. 26 (Pánuco, Mexico). Four regressors are used: the records of the other gauging stations in that system. The study concludes that even in problems with multicollinearity, the selection criteria and analyses lead to consistent results and yield the best regression equations. The similarity of the results obtained with the selected regression models generates confidence in the adopted estimates.
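With at most five candidate regressors there are only 2^5 − 1 = 31 subsets, so every regression can be fitted. The NumPy sketch below does so and reports two residual-sum-of-squares criteria. The first is the residual mean square s². The second is the leave-one-out PRESS statistic, obtained from the hat-matrix leverages as a form of cross-validation. These two criteria are assumptions chosen for illustration, since the abstract does not name its four criteria. Function names and station labels are hypothetical.

```python
import itertools

import numpy as np


def all_subsets(X, y, names):
    """Fit every non-empty subset of regressors; rank the fits by PRESS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    results = []
    for r in range(1, X.shape[1] + 1):
        for cols in itertools.combinations(range(X.shape[1]), r):
            Xd = np.column_stack([np.ones(n), X[:, list(cols)]])
            coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
            resid = y - Xd @ coef
            sse = float(resid @ resid)
            # Leverages give leave-one-out residuals without refitting n models.
            h = np.diag(Xd @ np.linalg.pinv(Xd.T @ Xd) @ Xd.T)
            press = float(np.sum((resid / (1 - h)) ** 2))
            results.append({"vars": [names[c] for c in cols], "SSE": sse,
                            "s2": sse / (n - r - 1), "PRESS": press})
    return sorted(results, key=lambda d: d["PRESS"])


rng = np.random.default_rng(0)
records = rng.normal(size=(30, 4))  # synthetic stand-ins for four nearby station records
target = 1 + 2 * records[:, 0] - 3 * records[:, 2] + 0.1 * rng.normal(size=30)
ranking = all_subsets(records, target, ["st1", "st2", "st3", "st4"])
```

SSE alone always favours the largest subset. This is why criteria that penalize extra regressors (s²) or measure out-of-sample error (PRESS) are used to choose among them.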

  8. Classification of the ionic composition of irrigation water using multiple linear regression

    Scientific Electronic Library Online (English)

    Celsemy E., Maia; Elís R.C. de, Morais; Maurício de, Oliveira.

    2001-04-01

    This work was conducted with the objective of developing a methodology for classifying the ionic composition of irrigation water using multiple linear regression. Electrical conductivity was the dependent variable, and the concentrations of cations and anions (calcium, sodium, potassium, carbonate, bicarbonate and chlorides) were the independent variables. The water was classified according to the weight of each ion in the statistical model. The secondary data source was the database of the Water Analysis and Soil Fertility Laboratory of the Escola Superior de Agricultura de Mossoró (Laboratório de Análises de Água e Fertilidade do Solo, LAAFS/ESAM), containing analyses of water samples collected by farmers of the region. The regressions were fitted using the stepwise regression procedure. With this criterion, the contribution of each variable to the fitted model varied. That contribution was estimated from the increase in the regression sum of squares as each independent variable was added to the model. The degree of fit of the models depended on the geological formation of the watersheds and on whether the water was collected from a river or a tubular well. Based on pre-established criteria, water from sources in the calcareous Chapada do Apodi region was classified as calcic-sodic, calcic or chloride. These classes correspond to tubular wells, piezometric wells (wells drilled in unconfined water, known in the region as poço amazonas) and surface water from rivers and lagoons, respectively. Water from the Baixo Açu region was classified as sodic, magnesian-sodic or sodic for tubular wells (drilled in the Açu sedimentary formation), piezometric wells and surface water, respectively.
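A forward version of the stepwise procedure can be sketched with NumPy as below. At each step it adds the ion giving the largest increase in the regression sum of squares. This is an illustration under assumptions: the entry threshold (partial F of 4) and the ion labels on the synthetic data are hypothetical. A full stepwise regression would also re-test already entered variables for removal, which this sketch omits.

```python
import numpy as np


def forward_stepwise(X, y, names, f_to_enter=4.0):
    """Forward selection by largest increase in the regression sum of squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    selected, steps = [], []
    remaining = list(range(X.shape[1]))
    sse_current = float(np.sum((y - y.mean()) ** 2))  # intercept-only model
    while remaining:
        # Try each remaining variable and keep the one that lowers SSE the most.
        best_j, best_sse = None, None
        for j in remaining:
            Xd = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
            sse = float(np.sum((y - Xd @ coef) ** 2))
            if best_sse is None or sse < best_sse:
                best_j, best_sse = j, sse
        gain = sse_current - best_sse                  # increase in regression SS
        df_resid = n - len(selected) - 2               # intercept + selected + candidate
        if gain / (best_sse / df_resid) < f_to_enter:  # partial F-to-enter test
            break
        steps.append((names[best_j], gain))
        selected.append(best_j)
        remaining.remove(best_j)
        sse_current = best_sse
    return steps


rng = np.random.default_rng(0)
ions = rng.normal(size=(60, 4))  # synthetic stand-ins for ion concentrations
ec = 0.8 * ions[:, 0] + 0.5 * ions[:, 1] + 0.1 * rng.normal(size=60)
steps = forward_stepwise(ions, ec, ["Ca", "Na", "K", "Cl"])
```

Each entry in `steps` pairs a variable with the regression sum of squares it added. This is the quantity the abstract uses to weigh each ion's contribution.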

  9. Robust Outlier Detection in Linear Regression

    OpenAIRE

    Jajo, Nethal K.; Xizhi Wu

    2004-01-01

    A new methodology of robust outlier detection in linear regression analysis is established, based on examination of Robustly Studentized Robust Residuals (RSRR). Two new robust location estimators of linear regression parameters are developed for the simple and multiple cases. Based on these robust estimators we obtain RSRR, which we use to derive a new distance measure for outlier detection. A graphical display using the new distance measure is constructed for detecting multiple outlie...
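    The following is a rough illustration of residual-based robust outlier screening in the simple-regression case. It is not the RSRR estimators developed in the paper, whose formulas are not given here. The sketch fits a line with the Theil-Sen estimator (median of pairwise slopes) and scales the residuals by the median absolute deviation (MAD). It then flags points whose robust standardized residual exceeds a cutoff. The cutoff of 3 and the function names are assumptions.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Robust line fit: slope = median pairwise slope, intercept = median of y - slope*x."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    b = median(slopes)
    a = median(yi - b * xi for xi, yi in zip(x, y))
    return a, b

def robust_outliers(x, y, cutoff=3.0):
    """Return indices whose robust standardized residual exceeds cutoff, plus all scores."""
    a, b = theil_sen(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    med = median(resid)
    # 1.4826 * MAD estimates the standard deviation under normal errors.
    mad = 1.4826 * median(abs(r - med) for r in resid)
    if mad == 0:
        raise ValueError("MAD is zero; residual scale cannot be estimated")
    scores = [(r - med) / mad for r in resid]
    return [i for i, s in enumerate(scores) if abs(s) > cutoff], scores
```

Because both the fit and the scale are medians, a single gross outlier cannot drag the line or inflate the scale, unlike ordinary studentized OLS residuals.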

  10. Analysis of the Evolution of the Gross Domestic Product by Means of Cyclic Regressions

    OpenAIRE

    Catalin Angelo Ioan; Gina Ioan

    2011-01-01

    In this article, we analyze the regularity of the Gross Domestic Product of a country, in our case the United States. The analysis is based on a new method, cyclic regression based on the Fourier series of a function. A further perspective is to consider, instead of the GDP growth rate itself, the speed of variation of this rate, computed as a numerical derivative. The obtained results show a 71-year cycle for this indicator, the mean squa...

  11. Analysis of the Evolution of the Gross Domestic Product by Means of Cyclic Regressions

    Directory of Open Access Journals (Sweden)

    Catalin Angelo Ioan

    2011-08-01

    Full Text Available In this article, we analyze the regularity of the Gross Domestic Product of a country, in our case the United States. The analysis is based on a new method, cyclic regression based on the Fourier series of a function. A further perspective is to consider, instead of the GDP growth rate itself, the speed of variation of this rate, computed as a numerical derivative. The obtained results show a 71-year cycle for this indicator, the mean square error being 0.93%. The method described allows a prognosis of short-term trends in GDP.
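    The abstract names two ingredients: a growth-rate series differentiated numerically, and a regression on Fourier (sine/cosine) terms. Both can be sketched as follows. The sketch assumes NumPy, a known candidate period and a small number of harmonics. The authors' exact cyclic-regression formulation and their GDP data are not reproduced here, and `fit_cyclic` and `variation_speed` are hypothetical names.

```python
import numpy as np

def fourier_design(t, period, n_harmonics):
    """Design matrix with columns 1, cos(2*pi*k*t/P), sin(2*pi*k*t/P) for k = 1..n_harmonics."""
    cols = [np.ones_like(t, dtype=float)]
    for k in range(1, n_harmonics + 1):
        w = 2 * np.pi * k * t / period
        cols += [np.cos(w), np.sin(w)]
    return np.column_stack(cols)

def fit_cyclic(t, y, period, n_harmonics=2):
    """Least-squares Fourier regression; returns coefficients and fitted values."""
    A = fourier_design(t, period, n_harmonics)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

def variation_speed(rate, dt=1.0):
    """Speed of variation of a rate series, via central differences (one-sided at the ends)."""
    return np.gradient(rate, dt)
```

One way to look for a cycle length is to refit over a grid of candidate periods and compare the mean squared error of each fit. The abstract reports a 71-year cycle with a mean square error of 0.93%, but not the search procedure behind it.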

  12. An application of principal component analysis and logistic regression to facilitate production scheduling decision support system: an automotive industry case

    Science.gov (United States)

    Mehrjoo, Saeed; Bashiri, Mahdi

    2013-05-01

    Production planning and control (PPC) systems have to deal with rising complexity and dynamics. The complexity of planning tasks stems from the many variables and dynamic factors arising from the uncertainties surrounding PPC. The literature on exact scheduling algorithms, simulation approaches, and heuristic methods in production planning is extensive. However, these methods seem to be inefficient because of daily fluctuations in real factories. Decision support systems can give production planners productive tools for making feasible and prompt decisions in effective and robust production planning. In this paper, we propose a robust decision support tool for detailed production planning based on multivariate statistical methods, namely principal component analysis and logistic regression. The proposed approach has been applied to a real case in the Iranian automotive industry. In the presence of multisource uncertainties, applying the proposed method to the selected case increases the accuracy of daily production planning compared with the existing method.

  13. PERFORMANCE OF RIDGE REGRESSION ESTIMATOR METHODS ON SMALL SAMPLE SIZE BY VARYING CORRELATION COEFFICIENTS: A SIMULATION STUDY

    OpenAIRE

    Anwar Fitrianto; Lee Ceng Yik

    2014-01-01

    When the independent variables in a multiple linear regression model are highly linearly correlated, the analysis can be misleading. This happens when the multiple linear regression analysis is based on the common Ordinary Least Squares (OLS) method. In this situation, a ridge regression estimator is recommended. We conduct a simulation study to compare the performance of the ridge regression estimator and OLS. We found that Hoerl and Kennard ridge regression estimation method has better performan...
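    Here is a minimal sketch of the ridge estimator compared with OLS, assuming NumPy, standardized predictors and a centered response. The shrinkage constant uses the Hoerl-Kennard-Baldwin rule k = p·s²/(β̂ᵀβ̂). This rule is one common member of the Hoerl-Kennard family, but the abstract does not say which variant the study used. The function names are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def ridge(X, y, k):
    """Ridge estimate (X'X + kI)^(-1) X'y on centered data; k = 0 gives OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def hkb_k(X, y):
    """Hoerl-Kennard-Baldwin shrinkage constant computed from the OLS fit."""
    n, p = X.shape
    beta = ridge(X, y, 0.0)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p - 1)  # one extra degree of freedom lost to centering
    return p * s2 / (beta @ beta)
```

Under strong collinearity the OLS coefficients tend to be large and unstable. Any k > 0 shrinks the coefficient vector toward zero, trading a little bias for a reduction in variance.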

  14. High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis

    OpenAIRE

    Daye, Z. John; Chen, Jinbo; Li, Hongzhe

    2011-01-01

    We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a nov...
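    The idea of modeling the mean and the variance together can be illustrated in a low-dimensional setting by two-stage feasible weighted least squares. First fit OLS. Then regress the log squared residuals on the covariates to estimate a variance function. Finally, refit with inverse-variance weights. This sketch assumes NumPy and a log-linear variance model. It is not the penalized high-dimensional method proposed in the paper, and `fwls` is a hypothetical name.

```python
import numpy as np

def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

def fwls(X, y):
    """Two-stage feasible WLS with a log-linear variance model log(sigma^2) = A @ gamma."""
    A = add_intercept(X)
    beta_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta_ols
    # Variance stage: regress log squared residuals; the small constant avoids log(0).
    gamma, *_ = np.linalg.lstsq(A, np.log(resid ** 2 + 1e-12), rcond=None)
    w = 1.0 / np.exp(A @ gamma)  # weights up to a constant factor, which WLS ignores
    sw = np.sqrt(w)
    beta_wls, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta_ols, beta_wls, gamma
```

The intercept of the variance regression is biased by a constant, because E[log χ²₁] is not zero. That only rescales all weights equally, so the weighted mean fit is unaffected.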

  15. Ultrametric Wavelet Regression of Multivariate Time Series: Application to Colombian Conflict Analysis

    OpenAIRE

    Murtagh, Fionn; Spagat, Michael; Restrepo, Jorge A.

    2011-01-01

    We first pursue the study of how hierarchy provides a well-adapted tool for the analysis of change. Then, using a time sequence-constrained hierarchical clustering, we develop the practical aspects of a new approach to wavelet regression. This provides a new way to link hierarchical relationships in a multivariate time series data set with external signals. Violence data from the Colombian conflict in the years 1990 to 2004 is used throughout. We conclude with some proposals...

  16. Personality disorders, violence, and antisocial behavior: a systematic review and meta-regression analysis.

    OpenAIRE

    Yu, R.; Geddes, JR; Fazel, S

    2012-01-01

    The risk of antisocial outcomes in individuals with personality disorder (PD) remains uncertain. The authors synthesize the current evidence on the risks of antisocial behavior, violence, and repeat offending in PD. They explore sources of heterogeneity in risk estimates through a systematic review and meta-regression analysis of observational studies comparing antisocial outcomes in personality disordered individuals with control groups. Fourteen studies examined risk of antisocial and ...
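    Meta-regression, as used in this review and in several other entries in this listing, regresses study-level effect estimates on study characteristics. Each study is weighted by the inverse of its sampling variance. The sketch below is a fixed-effect version, assuming NumPy. A random-effects meta-regression would additionally add an estimated between-study variance τ² to each study's variance. The moderator values in the usage check are illustrative, not taken from the review.

```python
import numpy as np

def meta_regression(effects, se, moderators):
    """Fixed-effect meta-regression by inverse-variance weighted least squares.

    effects: study estimates; se: their standard errors; moderators: (k,) or (k, m) array.
    Returns coefficients (intercept first) and their standard errors.
    """
    y = np.asarray(effects, float)
    w = 1.0 / np.asarray(se, float) ** 2
    A = np.column_stack([np.ones(len(y)), moderators])
    AtW = A.T * w                  # A^T W without forming the diagonal matrix W
    cov = np.linalg.inv(AtW @ A)   # (A^T W A)^(-1), treating the variances as known
    beta = cov @ (AtW @ y)
    return beta, np.sqrt(np.diag(cov))
```

A moderator coefficient that differs clearly from zero indicates a study characteristic associated with the size of the estimated effect. Such a moderator is one source of the heterogeneity that these reviews try to explain.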

  17. The effects of exchange rate variability on international trade: a Meta-Regression Analysis

    OpenAIRE

    Ćorić, Bruno; Pugh, Geoffrey Thomas

    2008-01-01

    The trade effects of exchange rate variability have been an issue in international economics for the past 30 years. The contribution of this paper is to apply meta-regression analysis (MRA) to the empirical literature. On average, exchange rate variability exerts a negative effect on international trade. Yet MRA confirms the view that this result is highly conditional, by identifying factors that help to explain why estimated trade effects vary from significantly negative ...

  18. Meteorological elements in the growth and development of selected cereal crop - applicability of regression analysis method.

    Czech Academy of Sciences Publication Activity Database

    Trnka, M.; Žalud, Z.; Semerádová, Daniela; Dubrovský, Martin

    Brno : Česká bioklimatologická společnost, 2002 - (Rožnovský, J.; Litschmann, T.), s. - ISBN 80-85813-99-8. [14. Česká-Slovenská bioklimatologická konference. Lednice na Moravě (CZ), 02.09.2002-04.09.2002] R&D Projects: GA ČR GA521/02/0827 Institutional research plan: CEZ:AV0Z3042911 Keywords : regression analysis * spring barley Subject RIV: DG - Atmosphere Sciences, Meteorology

  19. Automatic regression analysis for use in a complex system of evaluation of plant genetic resources

    OpenAIRE

    Szabo, Attila T.; Arkossy, Cs.

    1984-01-01

    In accordance with the general requirements for computerization in gene banks and germplasm research, a computer program has been compiled for the analysis of univariate response in crop germplasm evaluation. The program is compiled in COBOL and runs on a FELIX C-256 computer. The different modules of the program allow for: (1) data control and error listing; (2) computation of the regression function; (3) listing of the differences between the measured and computed values; (4) sorting o...
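    The module sequence listed above (data control, fitting the regression function, listing measured-minus-computed differences, sorting) maps naturally onto a small pipeline. The original program was written in COBOL, and the Python sketch below is only a modern illustration. It assumes a straight-line regression of one response on one covariate and sorting by absolute difference; the abstract is truncated before it says what is sorted. The function name `regression_report` is hypothetical.

```python
def regression_report(records):
    """Fit y = a + b*x and list measured vs computed values.

    records: iterable of (accession_id, x, y) tuples; x or y may be missing (None).
    Returns (a, b, rejected_ids, rows); rows are sorted by absolute difference, largest first.
    """
    clean, rejected = [], []
    # (1) Data control: keep records whose x and y convert to numbers, list the rest.
    for acc, x, y in records:
        try:
            clean.append((acc, float(x), float(y)))
        except (TypeError, ValueError):
            rejected.append(acc)
    n = len(clean)
    if n < 2:
        raise ValueError("need at least two valid records")
    # (2) Regression function by ordinary least squares.
    mx = sum(x for _, x, _ in clean) / n
    my = sum(y for _, _, y in clean) / n
    sxx = sum((x - mx) ** 2 for _, x, _ in clean)
    sxy = sum((x - mx) * (y - my) for _, x, y in clean)
    b = sxy / sxx
    a = my - b * mx
    # (3) Measured value, computed value and their difference for each record.
    rows = [(acc, y, a + b * x, y - (a + b * x)) for acc, x, y in clean]
    # (4) Sort by absolute difference (an assumed sort key).
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return a, b, rejected, rows
```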

  20. Real Estate and the Stock Market: A Meta-Regression Analysis

    OpenAIRE

    Gurdgiev, Constantin; Lucey, Brian Michael

    2011-01-01

    The real estate finance literature provides diverse and contradictory findings regarding the relationship between the real estate market and the stock market. Despite the importance of this relationship to the economy in general, relatively little is known about what causes such differences. In this paper, by applying meta-regression analysis to the empirical studies in the area, a significant step is made towards objectively integrating and synthesising the results and ident...

  1. Mixed-effects Poisson regression analysis of adverse event reports: The relationship between antidepressants and suicide

    OpenAIRE

    Gibbons, Robert D.; Segawa, Eisuke; Karabatsos, George; Amatya, Anup K.; Bhaumik, Dulal K.; Brown, C. Hendricks; Kapur, Kush; Marcus, Sue M.; Hur, Kwan; Mann, J. John

    2008-01-01

    A new statistical methodology is developed for the analysis of spontaneous adverse event (AE) reports from post-marketing drug surveillance data. The method involves both empirical Bayes (EB) and fully Bayes estimation of rate multipliers for each drug within a class of drugs, for a particular AE, based on a mixed-effects Poisson regression model. Both parametric and semiparametric models for the random-effect distribution are examined. The method is applied to data from Food and Drug Adminis...
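    The empirical Bayes idea of shrinking the rate multipliers for each drug within a class toward a class-wide value can be illustrated with a conjugate gamma-Poisson model. This is a simplified stand-in, not the paper's mixed-effects Poisson regression. It assumes observed report counts y_i with known expected counts E_i. It also assumes a Gamma(α, β) prior on the rate multipliers, with β as a rate parameter, and method-of-moments estimates of α and β. It uses NumPy, and all names are hypothetical.

```python
import numpy as np

def eb_rate_multipliers(y, E, min_var=1e-8):
    """Empirical Bayes posterior means of rate multipliers under y_i ~ Poisson(lambda_i * E_i).

    lambda_i ~ Gamma(alpha, beta) (mean alpha/beta); alpha and beta by method of moments.
    Returns the posterior means and the raw ratios y_i / E_i.
    """
    y = np.asarray(y, float)
    E = np.asarray(E, float)
    m = y.sum() / E.sum()   # pooled (class-wide) rate multiplier
    r = y / E               # raw per-drug multipliers
    # Var(r_i) = m / E_i (Poisson noise) + tau^2 (between-drug variance); solve for tau^2.
    tau2 = max(np.mean((r - m) ** 2) - m * np.mean(1.0 / E), min_var)
    alpha, beta = m ** 2 / tau2, m / tau2
    # Conjugate update: posterior mean is a precision-weighted blend of m and r_i.
    return (alpha + y) / (beta + E), r
```

Drugs with little exposure (small E_i) are pulled strongly toward the pooled value. Drugs with large exposure keep estimates close to their own observed ratio.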

  2. Clinical and multiple gene expression variables in survival analysis of breast cancer: Analysis with the hypertabastic survival model

    OpenAIRE

    Tabatabai, Mohammad A.; Eby, Wayne M.; Nimeh, Nadim; Li, Hong; Singh, Karan P.

    2012-01-01

    Background: We explore the benefits of applying a new proportional hazard model to analyze survival of breast cancer patients. As a parametric model, the hypertabastic survival model offers a closer fit to experimental data than Cox regression, and furthermore provides explicit survival and hazard functions which can be used as additional tools in the survival analysis. In addition, one of our main concerns is utilization of multiple gene expression variables. Our analysis treats the ...

  3. Reduced Rank Regression

    DEFF Research Database (Denmark)

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating eigenvalues and eigenvectors. We give a number of different applications to regression and time series analysis, and show how the reduced rank regression estimator can be derived as a Gaussian maximum lik...
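    Here is a minimal sketch of reduced rank regression with identity weighting, assuming NumPy. It computes the OLS coefficient matrix and then projects it onto the leading r eigenvectors of the fitted-value cross-product matrix. The Gaussian maximum likelihood estimator mentioned in the abstract also weights by the residual covariance, which leads to the canonical-correlation form. That weighting is omitted here, and the function name is hypothetical.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Least-squares estimate of B (p x q) subject to rank(B) <= rank, identity weighting."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ B_ols
    # Leading eigenvectors of Y_hat' Y_hat (q x q), in order of decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(Y_hat.T @ Y_hat)
    V = eigvecs[:, np.argsort(eigvals)[::-1][:rank]]
    # Project the OLS coefficients onto the rank-r response subspace.
    return B_ols @ V @ V.T
```

Because Y - XB splits into the OLS residual plus a part orthogonal to it, the rank constraint only affects the fitted values. Truncating their eigen-decomposition therefore gives the least-squares rank-r solution.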

  4. Prediction of large esophageal varices in cirrhotic patients using classification and regression tree analysis

    Scientific Electronic Library Online (English)

    Hong, Wan-dong; Dong, Le-mei; Jiang, Zen-cai; Zhu, Qi-huai; Jin, Shu-Qing.

    Full Text Available OBJECTIVES: Recent guidelines recommend that all cirrhotic patients undergo endoscopic screening for esophageal varices. Identifying cirrhotic patients with esophageal varices by noninvasive predictors would allow endoscopy to be restricted to patients with a high risk of having varices. This study aimed to develop a decision model based on classification and regression tree analysis for the prediction of large esophageal varices in cirrhotic patients. METHODS: A total of 309 cirrhotic patients (training sample, 187 patients; test sample, 122 patients) were included. Within the training sample, classification and regression tree analysis was used to identify predictors and a prediction model for large esophageal varices. The prediction model was then further evaluated in the test sample and across different Child-Pugh classes. RESULTS: The prevalence of large esophageal varices in cirrhotic patients was 50.8%. The classification and regression tree analysis produced a tree model consisting of spleen width, portal vein diameter and prothrombin time. This model achieved a diagnostic accuracy of 84% for prediction of large esophageal varices. When patients were regrouped into two risk groups, the rate of varices was 83.2% in the high-risk group and 15.2% in the low-risk group. The accuracy of the tree model was maintained in the test sample and across different Child-Pugh classes. CONCLUSIONS: A decision tree model consisting of spleen width, portal vein diameter and prothrombin time may be useful for predicting large esophageal varices in cirrhotic patients.

  5. Canopy Height Estimation in French Guiana with LiDAR ICESat/GLAS Data Using Principal Component Analysis and Random Forest Regressions

    Directory of Open Access Journals (Sweden)

    Ibrahim Fayad

    2014-11-01

    Full Text Available Estimating forest canopy height from large-footprint satellite LiDAR waveforms is challenging given the complex interaction between LiDAR waveforms, terrain, and vegetation, especially in dense tropical and equatorial forests. In this study, canopy height in French Guiana was estimated using multiple linear regression models and the Random Forest technique (RF). One set of models used LiDAR waveform metrics extracted from the GLAS (Geoscience Laser Altimeter System) spaceborne LiDAR data, together with terrain information derived from the SRTM (Shuttle Radar Topography Mission) DEM (Digital Elevation Model). The other set used Principal Component Analysis (PCA) of GLAS waveforms. Results show that the best statistical model for estimating forest height based on waveform metrics and digital elevation data is a linear regression of waveform extent, trailing edge extent, and terrain index (RMSE of 3.7 m). For the PCA-based models, better canopy height estimation results were observed using a regression model that incorporated both the first 13 principal components (PCs) and the waveform extent (RMSE = 3.8 m). Random Forest regressions revealed that the best configuration for canopy height estimation used all the following metrics: waveform extent, leading edge, trailing edge, and terrain index (RMSE = 3.4 m). Waveform extent was the variable that best explained canopy height, with an importance factor almost three times higher than those for the other three metrics (leading edge, trailing edge, and terrain index). Furthermore, the Random Forest regression incorporating the first 13 PCs and the waveform extent gave a slightly improved canopy height estimation compared with the linear model, with an RMSE of 3.6 m. In conclusion, multiple linear regressions and RF regressions provided canopy height estimations with similar precision using either LiDAR metrics or PCs. However, a regression model (linear regression or RF) based on the PCA of waveform samples with waveform extent information is an interesting alternative for canopy height estimation. It does not require several metrics that are difficult to derive from GLAS waveforms in dense forests, such as those in French Guiana.

  6. Multiple relational analysis method for uranium mineral metallogenic prediction

    International Nuclear Information System (INIS)

    After an introduction to the basic principle of relational analysis, a multiple relational analysis method for predicting uranium mineral resources is proposed. The method is especially efficient where known ore deposits or ore-bearing units are scarce. Where other prediction methods fail, it proves to work well, with good reliability and accuracy. It is fully illustrated with the examples presented.

  7. Transferencia de información de crecientes mediante regresión lineal múltiple / Transfer of flood information through multiple linear regression

    Scientific Electronic Library Online (English)

    Campos-Aranda, Daniel Francisco.

    2011-09-01

    Full Text Available Records of annual maximum flows (floods) are used for the hydrologic sizing of hydraulic protection and crossing structures. Logically, the longer the available series, the more reliable their probabilistic estimates or predictions. Therefore, this work describes in detail the technique of transferring flood information through multiple linear regression to extend short records on the basis of longer nearby series. The work also examines whether such a transfer is statistically worthwhile. The mathematical formulation is presented simply, using the matrix solution. A numerical example then extends the series of the Platón Sánchez hydrometric station on the Tempoal River in the state of Veracruz, Mexico, using several nearby records. Finally, the conclusions highlight the simplicity of the procedure and suggest its systematic application.
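    Record extension by multiple linear regression can be sketched in two steps. First, fit the short-record station's annual maxima on the concurrent annual maxima of nearby long-record stations, using the matrix (normal-equation) solution. Then predict the years missing at the short station. This is a bare sketch assuming NumPy and an ordinary least-squares fit. It does not reproduce the statistical checks the paper uses to decide whether a transfer is worthwhile, and `extend_record` is a hypothetical name.

```python
import numpy as np

def extend_record(short, long_records):
    """Fill missing years of a short series from nearby long series by MLR.

    short: (n,) annual maxima with np.nan where the station has no record.
    long_records: (n, m) concurrent annual maxima at m nearby stations.
    Returns the extended series and the coefficients (intercept first).
    """
    short = np.asarray(short, float)
    X = np.column_stack([np.ones(len(short)), long_records])
    obs = ~np.isnan(short)
    # Matrix solution of the normal equations (X'X) b = X'y over the overlapping years.
    b = np.linalg.solve(X[obs].T @ X[obs], X[obs].T @ short[obs])
    extended = short.copy()
    extended[~obs] = X[~obs] @ b
    return extended, b
```

In practice the overlap period must be long enough, relative to the number of nearby stations, for the fitted coefficients to be stable. Regression estimates also understate the variance of the extended series, which matters for the later frequency analysis.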

  8. Multiple linear regression to develop strength scaled equations for knee and elbow joints based on age, gender and segment mass

    DEFF Research Database (Denmark)

    D'Souza, Sonia; Rasmussen, John

    2012-01-01

    Background: The next fifty years will see a drastic increase in the older population. Among other effects, ageing causes a decrease in strength. It is necessary to provide safe and comfortable environments for the elderly. To achieve this, digital human modelling has proved to be a useful and valuable ergonomic tool. Objective: To investigate age and gender effects on the torque-producing ability of the knee and elbow in older adults. To create strength scaled equations based on age, gender, upper/lower limb lengths and masses using multiple linear regression. To reduce the number of dependent parameters based on statistical redundancies, and then validate these equations. Methods: 283 subjects (141 males, 142 females) aged 50-59 years (54.9 ± 2.9), 60-69 years (65.4 ± 2.9) and 70-79 years (73.7 ± 2.7) were tested for maximal voluntary isometric torque of the right knee extensors and elbow flexors. Results: Males were significantly stronger than females across all age groups. Elbow peak torque (EPT) was better preserved from the 60s to the 70s, whereas knee peak torque (KPT) decreased significantly (P < 0.05) across all age groups. This held true for both males and females. Gender, thigh mass and age best predicted KPT (R² = 0.60). Gender, forearm mass and age best predicted EPT (R² = 0.75). Good cross-validation was established for both elbow and knee models. Conclusion: This cross-sectional study of muscle strength created and validated strength scaled equations of EPT and KPT using only gender, segment mass and age.

  9. Multiple regression models of δ¹³C and δ¹⁵N for fish populations in the eastern Gulf of Mexico

    Science.gov (United States)

    Radabaugh, Kara R.; Peebles, Ernst B.

    2014-08-01

    Multiple regression models were created to explain spatial and temporal variation in the δ¹³C and δ¹⁵N values of fish populations on the West Florida Shelf (eastern Gulf of Mexico, USA). Extensive trawl surveys from three time periods were used to acquire muscle samples from seven groundfish species. Isotopic variation (δ¹³Cvar and δ¹⁵Nvar) was calculated as the deviation from the isotopic mean of each fish species. Static spatial data and dynamic water quality parameters were used to create models predicting δ¹³Cvar and δ¹⁵Nvar in three fish species that were caught in the summers of 2009 and 2010. Additional data sets were then used to determine the accuracy of the models for predicting isotopic variation (1) in a different time period (fall 2010) and (2) among four entirely different fish species that were collected during summer 2009. The δ¹⁵Nvar model was relatively stable and could be applied to different time periods and species with similar accuracy (mean absolute errors 0.31-0.33‰). The δ¹³Cvar model had a lower predictive capability, with mean absolute errors ranging from 0.42 to 0.48‰. δ¹⁵N trends are likely linked to gradients in nitrogen fixation and Mississippi River influence on the West Florida Shelf, while δ¹³C trends may be linked to changes in algal species, photosynthetic fractionation, and abundance of benthic vs. planktonic basal resources. These models of isotopic variability may be useful for future stable isotope investigations of trophic level, basal resource use, and animal migration on the West Florida Shelf.

  10. Regression analysis with missing data and unknown colored noise: Application to the MICROSCOPE space mission

    Science.gov (United States)

    Baghi, Quentin; Métris, Gilles; Bergé, Joël; Christophe, Bruno; Touboul, Pierre; Rodrigues, Manuel

    2015-03-01

    The analysis of physical measurements often has to cope with highly correlated noise and with interruptions caused by outliers, saturation events, or transmission losses. We assess the impact of missing data on the performance of linear regression analysis involving the fit of modeled or measured time series. We show that data gaps can significantly alter the precision of the regression parameter estimation in the presence of colored noise, due to the frequency leakage of the noise power. We present a regression method that cancels this effect and estimates the parameters of interest with a precision comparable to the complete data case, even if the noise power spectral density (PSD) is not known a priori. The method is based on an autoregressive fit of the noise, which allows us to build an approximate generalized least squares estimator approaching the minimal variance bound. The method, which can be applied to any similar data processing, is tested on simulated measurements of the MICROSCOPE space mission, whose goal is to test the weak equivalence principle (WEP) with a precision of 10⁻¹⁵. In this particular context the signal of interest is the WEP violation signal expected to be found around a well-defined frequency. We test our method with different gap patterns and noise of known PSD and find that the results agree with the mission requirements, decreasing the uncertainty by a factor of 60 with respect to ordinary least squares methods. We show that it also provides a test of significance to assess the uncertainty of the measurement.
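    The core idea is to build an approximate generalized least-squares estimator from an autoregressive model of the noise. It can be sketched for the simplest case: an AR(1) noise model on a gap-free series, estimated from OLS residuals. The data are then prewhitened and refit by OLS, a Cochrane-Orcutt-style step. The paper's method handles data gaps and higher-order autoregressive fits, and this NumPy sketch does neither. The function name `ar1_gls` is hypothetical.

```python
import numpy as np

def ar1_gls(A, y):
    """Approximate GLS for y = A @ beta + e with AR(1) noise e_t = phi*e_(t-1) + u_t.

    Returns (beta_gls, beta_ols, phi). The first observation is dropped by the
    prewhitening step (Cochrane-Orcutt); a Prais-Winsten variant would keep it.
    """
    beta_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ beta_ols
    phi = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])  # least-squares AR(1) coefficient
    # Prewhiten data and design: u_t = e_t - phi*e_(t-1) is approximately white.
    y_w = y[1:] - phi * y[:-1]
    A_w = A[1:] - phi * A[:-1]
    beta_gls, *_ = np.linalg.lstsq(A_w, y_w, rcond=None)
    return beta_gls, beta_ols, phi
```

Prewhitening removes the correlation that makes OLS inefficient under colored noise, so the second fit approximates the minimum-variance GLS estimate.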

  11. An Econometric Analysis of Modulated Realised Covariance, Regression and Correlation in Noisy Diffusion Models

    DEFF Research Database (Denmark)

    Kinnebrock, Silja; Podolskij, Mark

    2008-01-01

    This paper introduces a new estimator to measure the ex-post covariation between high-frequency financial time series under market microstructure noise. We provide an asymptotic limit theory (including feasible central limit theorems) for standard methods such as regression, correlation analysis and covariance, for which we obtain the optimal rate of convergence. We demonstrate some positive semidefinite estimators of the covariation and construct a positive semidefinite estimator of the conditional covariance matrix in the central limit theorem. Furthermore, we indicate how the assumptions on the noise process can be relaxed and how our method can be applied to non-synchronous observations. We also present an empirical study of how high-frequency correlations, regressions and covariances change through time.

  12. Sub-pixel estimation of tree cover and bare surface densities using regression tree analysis

    Directory of Open Access Journals (Sweden)

    Carlos Augusto Zangrando Toneli

    2011-09-01

    Full Text Available Sub-pixel analysis is capable of generating continuous fields, which represent the spatial variability of certain thematic classes. The aim of this work was to develop numerical models to represent the variability of tree cover and bare surfaces within the study area. This research was conducted in the riparian buffer within a watershed of the São Francisco River in the North of Minas Gerais, Brazil. IKONOS and Landsat TM imagery were used with the GUIDE algorithm to construct the models. The results were two index images derived with regression trees for the entire study area, one representing tree cover and the other representing bare surface. The use of non-parametric and non-linear regression tree models presented satisfactory results to characterize wetland, deciduous and savanna patterns of forest formation.

  13. Analysis of ontogenetic spectra of populations of plants and lichens via ordinal regression

    Science.gov (United States)

    Sofronov, G. Yu.; Glotov, N. V.; Ivanov, S. M.

    2015-03-01

    Ontogenetic spectra of plants and lichens tend to vary across the populations. This means that if several subsamples within a sample (or a population) were collected, then the subsamples would not be homogeneous. Consequently, the statistical analysis of the aggregated data would not be correct, which could potentially lead to false biological conclusions. In order to take into account the heterogeneity of the subsamples, we propose to use ordinal regression, which is a type of generalized linear regression. In this paper, we study the populations of cowberry Vaccinium vitis-idaea L. and epiphytic lichens Hypogymnia physodes (L.) Nyl. and Pseudevernia furfuracea (L.) Zopf. We obtain estimates for the proportions of between-sample variability in the total variability of the ontogenetic spectra of the populations.

  14. Microcomputer application of non-linear regression analysis to metal-ligand equilibria.

    Science.gov (United States)

    Taylor, P D; Morrison, I E; Hider, R C

    1988-07-01

    A non-linear least-squares regression program is described which is suitable for PC-compatible microcomputers. The program is written in GWBASIC, but compiled to run with the Intel 8087 fast numeric processor. Subroutines which simulate functions are compiled separately from the main program. Parameters are optimized by a Gauss-Newton-Marquardt algorithm which can be provided with either analytically or numerically calculated partial derivatives. Multi-component potentiometric titrations are simulated and parameters optimized by using analytical derivatives. Spectrophotometric titrations are also simulated, but absorptivities are optimized by linear regression while stability constants are optimized non-linearly by using numerical derivatives. Provision is made for "global analysis" of parameters. The experimental points can be displayed on screen, along with the "best" fit and the speciation. The program is demonstrated here by the determination of the pKa values and stability constants of a hydroxypyridinone ligand and its complexes with Fe(III). PMID:18964564
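    The Gauss-Newton-Marquardt (Levenberg-Marquardt) update mentioned above can be sketched in a few lines. This is a generic illustration assuming NumPy and forward-difference partial derivatives, which correspond to the numerical option the abstract mentions. The damping schedule, the exponential test model and the function names are assumptions, not details of the original GWBASIC program.

```python
import numpy as np

def numeric_jacobian(f, p, h=1e-7):
    """Forward-difference Jacobian of the residual vector f at parameters p."""
    r0 = f(p)
    J = np.empty((r0.size, p.size))
    for j in range(p.size):
        step = h * max(1.0, abs(p[j]))
        dp = p.copy()
        dp[j] += step
        J[:, j] = (f(dp) - r0) / step
    return J

def levenberg_marquardt(f, p0, lam=1e-3, max_iter=100, tol=1e-10):
    """Minimize sum(f(p)**2) with Marquardt-damped Gauss-Newton steps."""
    p = np.asarray(p0, float)
    r = f(p)
    cost = r @ r
    for _ in range(max_iter):
        J = numeric_jacobian(f, p)
        JtJ = J.T @ J
        # Marquardt damping scales the diagonal of J'J rather than adding a plain identity.
        step = np.linalg.solve(JtJ + lam * np.diag(np.diag(JtJ)), -(J.T @ r))
        r_new = f(p + step)
        new_cost = r_new @ r_new
        if new_cost < cost:
            p, r, cost = p + step, r_new, new_cost
            lam /= 10  # success: move toward pure Gauss-Newton
            if np.linalg.norm(step) < tol * (1 + np.linalg.norm(p)):
                break
        else:
            lam *= 10  # failure: move toward small gradient-descent steps
    return p
```

Large λ gives short, safe steps far from the optimum, and small λ recovers the fast Gauss-Newton convergence near it. This balance is why the Marquardt modification is a common default for fitting equilibrium constants.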

  15. Diversity Performance Analysis on Multiple HAP Networks.

    Science.gov (United States)

    Dong, Feihong; Li, Min; Gong, Xiangwu; Li, Hongjun; Gao, Fengyue

    2015-01-01

    One of the main design challenges in wireless sensor networks (WSNs) is achieving a high-data-rate transmission for individual sensor devices. The high altitude platform (HAP) is an important communication relay platform for WSNs and next-generation wireless networks. Multiple-input multiple-output (MIMO) techniques provide the diversity and multiplexing gain, which can improve the network performance effectively. In this paper, a virtual MIMO (V-MIMO) model is proposed by networking multiple HAPs with the concept of multiple assets in view (MAV). In a shadowed Rician fading channel, the diversity performance is investigated. The probability density function (PDF) and cumulative distribution function (CDF) of the received signal-to-noise ratio (SNR) are derived. In addition, the average symbol error rate (ASER) with BPSK and QPSK is given for the V-MIMO model. The system capacity is studied for both perfect channel state information (CSI) and unknown CSI individually. The ergodic capacity with various SNR and Rician factors for different network configurations is also analyzed. The simulation results validate the effectiveness of the performance analysis. It is shown that the performance of the HAPs network in WSNs can be significantly improved by utilizing the MAV to achieve overlapping coverage, with the help of the V-MIMO techniques. PMID:26134102

  16. Diversity Performance Analysis on Multiple HAP Networks

    Directory of Open Access Journals (Sweden)

    Feihong Dong

    2015-06-01

    Full Text Available One of the main design challenges in wireless sensor networks (WSNs) is achieving a high-data-rate transmission for individual sensor devices. The high altitude platform (HAP) is an important communication relay platform for WSNs and next-generation wireless networks. Multiple-input multiple-output (MIMO) techniques provide the diversity and multiplexing gain, which can improve the network performance effectively. In this paper, a virtual MIMO (V-MIMO) model is proposed by networking multiple HAPs with the concept of multiple assets in view (MAV). In a shadowed Rician fading channel, the diversity performance is investigated. The probability density function (PDF) and cumulative distribution function (CDF) of the received signal-to-noise ratio (SNR) are derived. In addition, the average symbol error rate (ASER) with BPSK and QPSK is given for the V-MIMO model. The system capacity is studied for both perfect channel state information (CSI) and unknown CSI individually. The ergodic capacity with various SNR and Rician factors for different network configurations is also analyzed. The simulation results validate the effectiveness of the performance analysis. It is shown that the performance of the HAPs network in WSNs can be significantly improved by utilizing the MAV to achieve overlapping coverage, with the help of the V-MIMO techniques.

  17. Global sensitivity analysis of transmission line fault-locating algorithms using sparse grid regression

    International Nuclear Information System (INIS)

    Computation of distance to fault on an electrical transmission line is affected by many sources of uncertainty, including parameter setting errors, measurement errors, absence of information and incomplete modelling of a system under fault conditions. In this paper we propose an application of variance-based global sensitivity measures to the evaluation of fault location algorithms. The main goal of the evaluation is to identify the factors, and their interactions, that contribute to the variability of the fault locator output. This analysis is based on the results of Sparse Grid Regression. The method builds a functional ANOVA model that represents the fault locator output as a function of the uncertain factors. The ANOVA model provides a tool for interpretation and sensitivity analysis. In practice, such analysis can help in functional performance tests. It is especially useful in selecting the optimal fault location algorithm (device) for a specific application, in the calibration process and in building confidence in a fault location function result. The paper concludes with an application example that demonstrates the use of the proposed methodology in testing and comparing some commonly used fault location algorithms. This example is also used to demonstrate the numerical efficiency of the proposed Sparse Grid Regression method for this type of application, in comparison with the Quasi-Monte Carlo approach. - Highlights: • A Sparse Grid Regression (SGR) method has been developed and presented in the paper. • The SGR method is able to fit an ANOVA model to input/output data of a black-box function. • The SGR provides variance-based sensitivities to be used for Global Sensitivity Analysis (GSA). • The SGR algorithm relies on numerical multi-dimensional integration on a sparse grid. • The application example presented is GSA of fault-locating algorithms used in electrical networks.

  18. Quantitative structure-property relationship study of n-octanol-water partition coefficients of some of diverse drugs using multiple linear regression

    International Nuclear Information System (INIS)

    A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structures of 150 organic drug compounds to their n-octanol-water partition coefficients (log Po/w). Molecular descriptors were derived solely from the 3D structures of the drug molecules, and a genetic algorithm was applied as a variable selection tool in the QSPR analysis. The models were constructed using 110 molecules as the training set, and their predictive ability was tested on the remaining 40 compounds. log Po/w was modelled as a function of the theoretically derived descriptors by multiple linear regression (MLR). Four descriptors are taken as model inputs: molecular volume (MV, geometrical), hydrophilic-lipophilic balance (HLB, constitutional), hydrogen-bond-forming ability (HB, electronic) and polar surface area (PSA, electrostatic). Because the descriptors are calculated only from molecular structure, no experimental determination of properties is needed for the correlation, and log Po/w can be estimated for molecules not yet synthesized. Applying the model to the test set of 40 drug compounds shows that it is reliable, with good predictive accuracy and a simple formulation; the predictions agree well with the experimental values. For the MLR model, the root mean square error of prediction (RMSEP) and the squared correlation coefficient (R2) for log Po/w of the prediction set were 0.22 and 0.99, respectively.
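The MLR workflow in this abstract (fit on a training set, then report RMSEP and R2 on a held-out prediction set) can be sketched with NumPy. The function names are hypothetical, the data are synthetic, and R2 is computed here as the squared Pearson correlation between observed and predicted values, which is one common reading of "squared correlation coefficient".

```python
import numpy as np


def fit_mlr(x, y):
    """Ordinary least squares for y = b0 + x @ b; returns [b0, b1, ..., bk]."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, x):
    return coef[0] + x @ coef[1:]


def rmsep(y_obs, y_pred):
    """Root mean square error of prediction on a held-out set."""
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))


def squared_correlation(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1] ** 2)


# Synthetic data: 150 compounds, 4 descriptors, 110/40 train/prediction split
rng = np.random.default_rng(42)
x = rng.normal(size=(150, 4))
y = 0.5 + x @ np.array([1.2, -0.8, 0.3, 0.6]) + rng.normal(scale=0.1, size=150)
coef = fit_mlr(x[:110], y[:110])
y_hat = predict(coef, x[110:])
print(rmsep(y[110:], y_hat), squared_correlation(y[110:], y_hat))
```

The split mirrors the study's 110/40 design only in size; the generating coefficients are invented for illustration.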

  19. Analysis of designed experiments by stabilised PLS Regression and jack-knifing

    DEFF Research Database (Denmark)

    Martens, Harald; Høy, M.

    2001-01-01

    Pragmatic, visually oriented methods for assessing and optimising bi-linear regression models are described and applied to PLS Regression (PLSR) analysis of multi-response data from controlled experiments. The paper outlines some ways to stabilise the PLSR method to extend its range of applicability to the analysis of effects in designed experiments. Two ways of passifying unreliable variables are shown. A method for estimating the reliability of the cross-validated prediction error RMSEP is demonstrated. Some recently developed jack-knifing extensions are illustrated for estimating the reliability of the linear and bi-linear model parameter estimates. The paper illustrates how the PLSR "significance" probabilities obtained are similar to those from conventional factorial ANOVA, but PLSR is shown to give important additional overview plots of the main relevant structures in the multi-response data. The study is part of an ongoing effort to establish a cognitively simple and versatile approach to multivariate data analysis, with reliability assessment based on the data at hand and with little need for abstract distribution theory [H. Martens, M. Martens, Multivariate Analysis of Quality. An Introduction, Wiley, Chichester, UK, 2001].

  20. Transition from a predictive multiple linear regression model to an explanatory simple nonlinear regression model with higher level of prediction: A systems dynamics approach

    Scientific Electronic Library Online (English)

    Roberto, Baeza-Serrato; José Antonio, Vázquez-López.

    2014-06-01

    Full Text Available One of the main assumptions of linear regression analysis is the existence of a causal relationship between the variables analyzed, which regression analysis itself cannot demonstrate. This paper demonstrates the causality between the variables analyzed through the construction and analysis of the feedback between the variables under study, expressed in a causal diagram and validated through dynamic simulation. The major contribution of this research is the proposal to use a system dynamics approach to develop a method of transition from a predictive multiple linear regression model to an explanatory simple nonlinear regression model, which increases the predictive level of the model. The mean square error (MSE) is taken as the prediction criterion. The validation was performed with three linear regression models obtained experimentally in a textile company, showing a way to increase the reliability of prediction models.

  1. Evaluation of a LASSO regression approach on the unrelated samples of Genetic Analysis Workshop 17

    Science.gov (United States)

    2011-01-01

    The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, using real sequence data for 3,205 genes annotated by the 1000 Genomes Project and simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare, with 87% of the SNPs having a minor allele frequency less than 0.05. To detect rare SNPs, we performed least absolute shrinkage and selection operator (LASSO) regression and F tests at the gene level and calculated the generalized degrees of freedom to avoid selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches on the mini-exome data and compare their performance in terms of power and false positive rates. In most situations the LASSO approach is more powerful than the linear regression and collapsing methods. We also note the difficulty in determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs: if a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power is much improved. PMID:22373385
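To make the LASSO step concrete, here is a minimal cyclic coordinate-descent solver for the objective (1/2n)·||y - Xb||² + λ·||b||₁ using soft-thresholding. It is a generic sketch that assumes centred genotype columns and a centred quantitative trait. It does not reproduce the paper's gene-level F tests or generalized-degrees-of-freedom correction, and `lasso_cd` is a hypothetical name.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(x, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Assumes the columns of x and the response y are centred (no intercept).
    """
    n, p = x.shape
    b = np.zeros(p)
    col_sq = (x ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # y - x @ b with b = 0
    for _ in range(n_iter):
        for j in range(p):
            # correlation of x_j with the partial residual (excluding x_j)
            rho = x[:, j] @ resid / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid -= x[:, j] * (new_bj - b[j])  # keep the residual in sync
            b[j] = new_bj
    return b
```

With a large enough λ every coefficient is set exactly to zero, and as λ decreases variables enter one by one; this built-in selection is what makes LASSO attractive when most SNPs carry no signal.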

  2. Guide to using Multiple Regression in Excel (MRCX v.1.1) for Removal of River Stage Effects from Well Water Levels

    Energy Technology Data Exchange (ETDEWEB)

    Mackley, Rob D.; Spane, Frank A.; Pulsipher, Trenton C.; Allwardt, Craig H.

    2010-09-01

    A software tool was created in Fiscal Year 2010 (FY10) that enables multiple-regression correction of well water levels for river-stage effects. This task was conducted as part of the Remediation Science and Technology project of CH2M HILL Plateau Remediation Company (CHPRC). This document contains an overview of the correction methodology and a user’s manual for Multiple Regression in Excel (MRCX) v.1.1. It also contains a step-by-step tutorial that shows users how to use MRCX to correct for river effects in two different wells. The report is accompanied by a CD containing the MRCX installer application and the files used in the tutorial exercises.
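MRCX itself is an Excel tool. The Python sketch below only illustrates the general idea of a multiple-regression river-stage correction, under the assumption that the well response is linear in the current and a few lagged river-stage readings. The function name and the `n_lags` parameter are hypothetical, not MRCX settings.

```python
import numpy as np


def remove_river_effect(well, river, n_lags=3):
    """Regress well levels on current and lagged river stage and subtract
    the fitted river-stage component.

    Returns (corrected series, coefficients [intercept, lag0, lag1, ...]).
    The first n_lags samples are dropped because their lags are unavailable.
    """
    rows = len(well) - n_lags
    # column k holds river stage lagged by k samples, aligned with well[n_lags:]
    lagged = np.column_stack(
        [river[n_lags - k : n_lags - k + rows] for k in range(n_lags + 1)]
    )
    design = np.column_stack([np.ones(rows), lagged])
    coef, *_ = np.linalg.lstsq(design, well[n_lags:], rcond=None)
    river_component = lagged @ coef[1:]
    return well[n_lags:] - river_component, coef
```

The intercept is deliberately kept in the corrected series, so what remains is the well level with the estimated river-driven fluctuations removed.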

  3. Analysis of geographical disparities in temporal trends of health outcomes using space-time joinpoint regression

    Science.gov (United States)

    Goovaerts, Pierre

    2013-06-01

    Analyzing temporal trends in health outcomes can provide a more comprehensive picture of the burden of a disease like cancer and generate new insights about the impact of various interventions. In the United States such analyses are increasingly conducted using joinpoint regression outside a spatial framework, which overlooks the significant variation in cancer incidence among U.S. counties and states. This paper presents several innovative ways to account for space in joinpoint regression: (1) prior filtering of noise in the data by binomial kriging, with the kriging variance used as a measure of reliability in weighted least-squares regression, (2) detection of significant boundaries between adjacent counties based on tests of parallelism of time trends and on confidence intervals of the annual percent change of rates, and (3) creation of spatially compact groups of counties with similar temporal trends by applying hierarchical cluster analysis to the results of the boundary analysis. The approach is illustrated using time series of the proportions of prostate cancer cases diagnosed at a late stage each year in every county of Florida since the 1980s. The annual percent change (APC) in late-stage diagnosis and the onset years of significant declines vary greatly across Florida. Most counties with a non-significant average APC are located in the north-western part of Florida, known as the Panhandle, which is more rural than other parts of the state. The number of significant boundaries peaked in the early 1990s, when the prostate-specific antigen (PSA) test became widely available, a temporal trend that suggests geographical disparities in the implementation and/or impact of the new screening procedure, in particular as it became available.
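The annual percent change comes from a log-linear trend: if log(rate) = b0 + b1·year, then APC = 100·(exp(b1) - 1). The sketch below fits a single segment by weighted least squares with weights 1/variance. It assumes the supplied variances play the role of the kriging variances; a full joinpoint analysis would additionally search for the change points between segments. The function name is hypothetical.

```python
import math


def weighted_apc(years, rates, variances):
    """Annual percent change from a weighted log-linear fit
    log(rate) = b0 + b1 * year, using weights 1 / variance."""
    w = [1.0 / v for v in variances]
    y = [math.log(r) for r in rates]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, years)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, years, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, years))
    slope = sxy / sxx  # weighted least-squares slope on the log scale
    return 100.0 * (math.exp(slope) - 1.0)
```

For a series that falls by exactly 5 % per year, the log-linear fit is exact and the function returns an APC of -5 whatever the weights; weights only matter when noisy years must be down-weighted.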

  4. New introduction to multiple time series analysis

    CERN Document Server

    Lütkepohl, Helmut

    2005-01-01

    When I worked on my Introduction to Multiple Time Series Analysis (Lütkepohl (1991)), a suitable textbook for this field was not available. Given the great importance these methods have gained in applied econometric work, it is perhaps not surprising in retrospect that the book was quite successful. Now, almost one and a half decades later the field has undergone substantial development and, therefore, the book does not cover all topics of my own courses on the subject anymore. Therefore, I started to think about a serious revision of the book when I moved to the European University Institu

  5. Explaining temporal trends in annualized relapse rates in placebo groups of randomized controlled trials in relapsing multiple sclerosis: systematic review and meta-regression

    OpenAIRE

    Steinvorth, Simon M.; Röver, Christian; Schneider, Simon; Nicholas, Richard; Straube, Sebastian; Friede, Tim

    2013-01-01

    Background: Recent studies have shown a decrease in annualised relapse rates (ARRs) in placebo groups of randomised controlled trials (RCTs) in relapsing multiple sclerosis (RMS). Methods: We conducted a systematic literature search of RCTs in RMS. Data on eligibility criteria and baseline characteristics were extracted and tested for significant trends over time. A meta-regression was conducted to estimate their contribution to the decrease of trial ARRs over time. R...

  6. FREQFIT: Computer program which performs numerical regression and statistical chi-squared goodness of fit analysis

    International Nuclear Information System (INIS)

    The computer program FREQFIT is designed to perform regression and statistical chi-squared goodness of fit analysis on one-dimensional or two-dimensional data. The program features an interactive user dialogue, numerous help messages, an option for screen or line printer output, and the flexibility to use practically any commercially available graphics package to create plots of the program's results. FREQFIT is written in Microsoft QuickBASIC, for IBM-PC compatible computers. A listing of the QuickBASIC source code for the FREQFIT program, a user manual, and sample input data, output, and plots are included. 6 refs., 1 fig
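FREQFIT is written in QuickBASIC. As a reminder of the statistic it reports, the Pearson chi-squared goodness-of-fit value is the sum of (O - E)² / E over the bins. The Python function below is a hypothetical minimal version and does not reproduce FREQFIT's regression, dialogue or plotting features.

```python
def chi_squared_statistic(observed, expected):
    """Pearson chi-squared goodness-of-fit statistic: sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts in four bins against a uniform expectation
stat = chi_squared_statistic([18, 22, 20, 40], [25, 25, 25, 25])
print(stat)
```

Here the statistic is 308/25 = 12.32, which exceeds 7.815, the 5 % critical value for 3 degrees of freedom (4 bins minus 1, with no fitted parameters), so the uniform hypothesis would be rejected.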

  7. KINETIC ANALYSIS OF HIGH-NITROGEN ENERGETIC MATERIALS USING MULTIVARIATE NONLINEAR REGRESSION

    Energy Technology Data Exchange (ETDEWEB)

    Campbell, M. S. (Mary Stinecipher); Rabie, R. L. (Ronald L.); Diaz-Acosta, I. (Irina); Pulay, P. (Peter)

    2001-01-01

    New high-nitrogen energetic materials were synthesized by Hiskey and Naud. J. Opfermann reported a new tool for finding the probable model of complex reactions using multivariate non-linear regression analysis of DSC and TGA data from several measurements run at different heating rates. This study takes the kinetic parameters of the different steps and identifies which reaction step is responsible for the runaway reaction by comparing the critical temperature predicted from the Frank-Kamenetskii equation with the one found experimentally using the modified Henkin test.

  8. Analysis of reactor noise by multi-variate auto-regressive model

    International Nuclear Information System (INIS)

    The multi-variate auto-regressive model has recently been applied to the noise analysis of nuclear reactor systems. From this standpoint, a system identification study was performed at the Japan Power Demonstration Reactor-2 (JPDR-2), 45 MWt, using pseudo-random signals. The aim of this paper is to further extend and refine this identification problem based on the measured data. Emphasis is placed on the fact that the results obtained by the non-parametric method can be justified by the parametric one. The feedback map is also elucidated by estimating the noise contribution rate. The computational results show the effectiveness of the procedure. (author)
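A multivariate autoregressive model expresses the current vector of signals as a linear function of its past values. As a minimal sketch, assuming a first-order model and zero-mean signals, the coefficient matrix can be estimated by least squares. The paper's model order and its noise-contribution-rate analysis are not reproduced, and `fit_var1` is a hypothetical name.

```python
import numpy as np


def fit_var1(series):
    """Least-squares estimate of A in x_t = A x_{t-1} + e_t.

    series: (T, k) array of zero-mean observations, one column per signal.
    """
    past = series[:-1]
    present = series[1:]
    # In row form x_t^T = x_{t-1}^T A^T, so lstsq returns A^T
    a_t, *_ = np.linalg.lstsq(past, present, rcond=None)
    return a_t.T
```

An off-diagonal entry of the estimated matrix measures how strongly one signal feeds into another one step later, which is the kind of cross-coupling a reactor feedback map describes.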

  9. Surface Roughness Prediction Model using Zirconia Toughened Alumina (ZTA) Turning Inserts: Taguchi Method and Regression Analysis

    Science.gov (United States)

    Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath

    2015-05-01

    In the present study, the Taguchi parameter design method and regression analysis are applied to optimize the cutting conditions for surface finish when machining AISI 4340 steel with newly developed yttria-based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through a wet chemical co-precipitation route followed by a powder metallurgy process. Experiments were carried out based on an L9 orthogonal array with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and the signal-to-noise ratio (SNR), using the smaller-the-better criterion, the optimal cutting condition was found to be A3B1C1, i.e. a cutting speed of 420 m/min, a depth of cut of 0.5 mm and a feed rate of 0.12 m/min. Analysis of Variance (ANOVA) is applied to determine the significance and percentage contribution of each parameter. A mathematical model of surface roughness as a function of these independent variables was developed using regression analysis. The values predicted by the model and the experimental values are very close to each other, supporting the significance of the model. A confirmation run was carried out at a 95 % confidence level to verify the optimized result, and the values obtained are within the prescribed limit.
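The smaller-the-better signal-to-noise ratio is SN = -10·log10((1/n)·Σ y²), and the optimal setting is read off by choosing, for each factor, the level with the highest mean SN across the orthogonal array. The sketch below assumes the standard L9 array (first three columns) and uses hypothetical function names; it is not the authors' code.

```python
import math

# Standard L9(3^4) orthogonal array, first three columns (levels 1-3)
L9 = [
    (1, 1, 1), (1, 2, 2), (1, 3, 3),
    (2, 1, 2), (2, 2, 3), (2, 3, 1),
    (3, 1, 3), (3, 2, 1), (3, 3, 2),
]


def sn_smaller_is_better(values):
    """Taguchi S/N ratio in dB for a response that should be minimised."""
    return -10.0 * math.log10(sum(v * v for v in values) / len(values))


def best_levels(design, sn_ratios):
    """For each factor, return the level (1-3) with the highest mean S/N."""
    best = []
    for factor in range(len(design[0])):
        means = []
        for level in (1, 2, 3):
            vals = [sn for run, sn in zip(design, sn_ratios) if run[factor] == level]
            means.append(sum(vals) / len(vals))
        best.append(means.index(max(means)) + 1)
    return best
```

Because every level of one factor is paired with every level of each other factor exactly once in the L9 array, the per-level means separate the factor effects; a result of `[3, 1, 1]` corresponds to the A3B1C1 notation used in the abstract.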

  10. Stepwise and hierarchical multiple regression in organizational psychology: Applications, problems and solutions

    Directory of Open Access Journals (Sweden)

    Gardênia Abbad

    2002-01-01

    Full Text Available This article discusses applications of stepwise and hierarchical multiple regression analyses, which are widely used in organizational psychology research. Strategies are proposed for identifying Type I and Type II errors and for solving the problems that may arise from such errors. In addition, phenomena such as suppression, complementarity and redundancy in multiple regression equations are reviewed. The article presents examples of research in which these patterns of association between variables occurred, and the strategies researchers used to interpret them. Applications of multiple regression analyses to studies of interactions between variables are presented, along with tests for assessing the linearity of relationships between variables. Finally, suggestions are provided for dealing with the limitations of multiple regression analyses (stepwise and hierarchical).

  11. Stepwise and hierarchical multiple regression in organizational psychology: Applications, problems and solutions

    Scientific Electronic Library Online (English)

    Gardênia, Abbad; Cláudio Vaz, Torres.

    Full Text Available This article discusses applications of stepwise and hierarchical multiple regression analyses, which are widely used in organizational psychology research. Strategies are proposed for identifying Type I and Type II errors and for solving the problems that may arise from such errors. In addition, phenomena such as suppression, complementarity and redundancy in multiple regression equations are reviewed. The article presents examples of research in which these patterns of association between variables occurred, and the strategies researchers used to interpret them. Applications of multiple regression analyses to studies of interactions between variables are presented, along with tests for assessing the linearity of relationships between variables. Finally, suggestions are provided for dealing with the limitations of multiple regression analyses (stepwise and hierarchical).
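Hierarchical regression enters predictors in researcher-chosen blocks and tracks the increase in R² at each step. A minimal NumPy sketch, with hypothetical function names and ordinary least squares assumed at each step:

```python
import numpy as np


def r_squared(x, y):
    """R^2 of an OLS fit of y on the columns of x plus an intercept."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def hierarchical_steps(blocks, y):
    """Enter predictor blocks in order; return (cumulative R^2, R^2 change)
    for each step."""
    results, entered, previous = [], [], 0.0
    for block in blocks:
        entered.append(block)
        r2 = r_squared(np.column_stack(entered), y)
        results.append((r2, r2 - previous))
        previous = r2
    return results
```

Because the models are nested, the R² change is never negative; a block that adds nothing beyond the earlier ones shows a change close to zero. In stepwise regression, by contrast, the software rather than the researcher chooses the entry order.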

  12. Multiple imputation and complete case analysis in logistic regression models: a practical assessment of the impact of incomplete covariate data

    Scientific Electronic Library Online (English)

    Vitor Passos, Camargos; Cibele Comini, César; Waleska Teixeira, Caiaffa; Cesar Coelho, Xavier; Fernando Augusto, Proietti.

    2011-12-01

    Full Text Available Researchers in the health field often deal with the problem of incomplete databases. Complete Case Analysis (CCA), which restricts the analysis to subjects with complete data, reduces the sample size and may produce biased estimates. Based on statistical principles, Multiple Imputation (MI) uses all collected data and is recommended as an alternative to CCA. Data from the Saúde em Beagá study, a household survey in which 4,048 adults from two of the nine health districts of the city of Belo Horizonte, Minas Gerais State, Brazil, took part in 2008-2009, were used to evaluate CCA and different MI approaches in the context of logistic models with incomplete covariate data. Peculiarities of some variables in this study made it possible to approximate a situation in which the missing data of a covariate are recovered, so that the results before and after recovery could be compared. Even the simplest MI approach performed better than CCA, since its results were closer to the post-recovery results.
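Estimates from multiply imputed data sets are usually combined with Rubin's rules. The pooled estimate is the mean over the m imputed analyses, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. A minimal sketch, assuming the inputs are, for example, logistic regression coefficients (log odds ratios) and their squared standard errors; `pool_rubin` is a hypothetical name.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    estimates: point estimates (e.g. log odds ratios), one per imputed data set
    variances: their squared standard errors
    Returns (pooled estimate, total variance). Requires m >= 2.
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # mean within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

The between-imputation term is what a single imputation ignores; it inflates the standard error to reflect uncertainty about the missing covariate values.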

  13. Multiple regression analysis of PM10 concentration in relation to meteorological elements for Porto Alegre, Rio Grande do Sul State, in 2005 and 2006 - doi: 10.4025/actascitechnol.v33i1.9627

    Directory of Open Access Journals (Sweden)

    Rosana de Cassia de Souza Schneider

    2011-03-01

    Full Text Available Air is an efficient medium for the dispersal of atmospheric pollutants, and its behavior depends on the atmospheric movements that occur in the troposphere. In Porto Alegre, Rio Grande do Sul State, there is heavy daily traffic and a concentration of industries that may be responsible for atmospheric emissions. In the present work we studied the behavior of daily concentrations of particulate matter (PM10) in this city, considering the influence of meteorological elements. Data analysis was performed using descriptive statistics, linear correlation and multiple regression. Data were provided by the State Foundation of Environmental Protection Henrique Luiz Roessler - RS (FEPAM) and the National Institute of Meteorology (INMET). The analysis showed that PM10 concentrations, measured daily at 4:00 p.m., did not exceed national air quality standards; and that the meteorological elements influencing PM10 concentrations were daily average wind speed and daily average radiation, with negative relationships, and daily average air temperature and north and northwest wind directions, with positive relationships. The wind directions that contribute significantly to lowering concentrations at the measurement sites are east and southeast.

  14. Integrative Data Analysis: The Simultaneous Analysis of Multiple Data Sets

    Science.gov (United States)

    Curran, Patrick J.; Hussong, Andrea M.

    2009-01-01

    There are both quantitative and methodological techniques that foster the development and maintenance of a cumulative knowledge base within the psychological sciences. Most noteworthy of these techniques is meta-analysis, which allows for the synthesis of summary statistics drawn from multiple studies when the original data are not available.…

  15. Regression Analysis and Analysis Of Variance for EN353 and 20MnCr5 Alloyed Steels for Drilling Cutting Forces

    Directory of Open Access Journals (Sweden)

    Keerthiprasad.K

    2014-08-01

    Full Text Available In recent years, alloy steels have been widely used in the aerospace and automotive industries. Machining these materials requires a better understanding of cutting processes with regard to accuracy and efficiency. This study addresses the modelling of the machinability of EN353 and 20MnCr5 materials. Multiple regression analysis (MRA) is used to investigate the influence of several parameters on the thrust force and torque in the drilling of these alloy steels. The models were identified using cutting speed, feed rate and depth as input data and the thrust force and torque as output data. The statistical analysis showed that feed rate (f) was the most significant parameter in the drilling process, while spindle speed appeared insignificant. Because spindle speed was insignificant, it can be set either at the highest value to obtain a high material removal rate or at the lowest value to prolong tool life, depending on the needs of the application. The mathematical model is a power regression model in the three parameters mentioned above.
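A power regression model of the form F = C·v^a·f^b·d^c can be fitted by taking logarithms, which turns it into a linear model in log v, log f and log d. A minimal NumPy sketch; the function name and the log-linear least-squares fit are assumptions, since the abstract does not state how the model was estimated.

```python
import numpy as np


def fit_power_model(speed, feed, depth, force):
    """Fit force = C * speed**a * feed**b * depth**c by least squares on
    log(force); returns (C, a, b, c)."""
    design = np.column_stack(
        [np.ones(len(force)), np.log(speed), np.log(feed), np.log(depth)]
    )
    coef, *_ = np.linalg.lstsq(design, np.log(force), rcond=None)
    return (float(np.exp(coef[0])), *map(float, coef[1:]))
```

An exponent near zero (as reported here for spindle speed) means the response barely changes with that factor. Note that least squares on the log scale minimises relative rather than absolute errors in force.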

  16. Experimental Studies on Surface Roughness in Drilling MDF Composite Panels using Taguchi and Regression Analysis Method

    Directory of Open Access Journals (Sweden)

    M.I. Rizwan Jamal

    2012-01-01

    Full Text Available Medium Density Fiberboard (MDF) panels are suitable for many exterior and interior industrial applications. The surface roughness of MDF plays an important role, since any surface irregularities affect the final quality of the product. In the present study, regression models were developed to predict surface roughness in drilling MDF panels with carbide step drills, using spindle speed, feed rate and drill diameter as model variables. Taguchi's design of experiments was used to collect surface roughness values, and the orthogonal array (OA) and analysis of variance (ANOVA) were employed to study the surface roughness characteristics of the drilling operation. The objective is to establish a correlation between spindle speed, feed rate and drill diameter and the surface roughness of an MDF panel. The experiments were conducted according to a Taguchi L27 orthogonal array with different cutting conditions. ANOVA and the F-test were used to check the validity of the regression model and to determine the parameters significantly affecting surface roughness. The statistical analysis showed that feed rate was the most influential parameter on surface roughness. The microstructure of the drilled surfaces was also studied by scanning electron microscopy (SEM). The SEM investigations revealed that drilling MDF panels with step drills produces surface striations and waviness, which increase significantly with feed rate.

  17. Statistical learning method in regression analysis of simulated positron spectral data

    International Nuclear Information System (INIS)

    Positron lifetime spectroscopy is a non-destructive tool for detecting radiation-induced defects in nuclear reactor materials. This work concerns the applicability of the support vector machine (SVM) method to input data compression in the neural network analysis of positron lifetime spectra. It has been demonstrated that the SVM technique can be successfully applied to regression analysis of positron spectra. Substantial data compression, to about 50 % and 8 % of the whole training set for spectra with two and three components respectively, has been achieved while maintaining high accuracy of the spectra approximation. However, some parameters of the SVM approach, such as the insensitivity zone ε and the penalty parameter C, have to be chosen carefully to obtain good performance. (author)
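The insensitivity zone ε and the penalty C enter support vector regression through the ε-insensitive loss: residuals with |r| ≤ ε cost nothing, and only samples on or outside the zone become support vectors, which is presumably the source of the training-set compression reported here. A minimal sketch of the loss and the resulting primal objective, with hypothetical function names:

```python
def epsilon_insensitive_loss(y_obs, y_pred, epsilon):
    """Sum of max(|r| - epsilon, 0); residuals with |r| <= epsilon cost nothing."""
    return sum(max(abs(o - p) - epsilon, 0.0) for o, p in zip(y_obs, y_pred))


def svr_primal_objective(weight_norm_sq, y_obs, y_pred, epsilon, c):
    """SVR primal objective: 0.5 * ||w||^2 + C * (epsilon-insensitive loss)."""
    return 0.5 * weight_norm_sq + c * epsilon_insensitive_loss(y_obs, y_pred, epsilon)
```

A wider zone lets more samples fall inside it (fewer support vectors, more compression, coarser fit), while a larger C penalises points outside the zone more heavily; this trade-off is why both parameters need careful tuning.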

  18. Within-session analysis of the extinction of pavlovian fear-conditioning using robust regression

    Directory of Open Access Journals (Sweden)

    Vargas-Irwin, Cristina

    2010-06-01

    Full Text Available Traditionally, the analysis of extinction data in fear conditioning experiments has involved standard linear models, mostly ANOVA of between-group differences among subjects that have undergone different extinction protocols, pharmacological manipulations or some other treatment. Although some studies report individual differences in quantities such as suppression rates or freezing percentages, these differences are not included in the statistical modeling. Within-subject response patterns are then averaged over coarse-grained time windows, which can overlook these individual performance dynamics. Here we illustrate an alternative two-step analytical procedure: estimation of a trend for within-session data, followed by analysis of group differences in trend as the main outcome. The procedure is tested on real fear-conditioning extinction data, comparing trend estimates from Ordinary Least Squares (OLS) and robust Least Median of Squares (LMS) regression, as well as comparing between-group differences and analyzing mean freezing percentage versus LMS slopes as outcomes.
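The contrast between OLS and Least Median of Squares trend estimates can be seen on a short within-session series. The sketch below computes the LMS line by exhaustive search over lines through pairs of points, a simple (not the fastest) method that is feasible only for short series. The function names and the trial data in the usage note are hypothetical.

```python
from itertools import combinations
from statistics import median


def ols_line(x, y):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def lms_line(x, y):
    """Least Median of Squares line via exhaustive search over lines through
    pairs of points; returns (intercept, slope)."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, slope undefined
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = median((yk - intercept - slope * xk) ** 2 for xk, yk in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, intercept, slope)
    return best[1], best[2]
```

For trials x = 1..10 with freezing y = 80 - 5x, except two outlying trials (y = 10 at trial 3 and y = 90 at trial 8), `lms_line` recovers intercept 80 and slope -5, while the OLS slope is pulled to about -1.8.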

  19. Factors affecting the outcome of excimer laser photorefractive keratectomy: a preliminary multivariable regression analysis

    Science.gov (United States)

    Maguen, Ezra I.; Papaioannou, Thanassis; Nesburn, Anthony B.; Salz, James J.; Warren, Cathy; Grundfest, Warren S.

    1996-05-01

    Multivariable regression analysis was used to evaluate the combined effects of some preoperative and operative variables on the change in refraction following excimer laser photorefractive keratectomy (PRK) for myopia. This analysis was performed on 152 eyes (at 6 months postoperatively) and 156 eyes (at 12 months postoperatively). The following variables were considered: intended refractive correction, patient age, treatment zone, central corneal thickness, average corneal curvature, and intraocular pressure. At 6 months after surgery, the cumulative R² was 0.43, with 0.38 attributed to the intended correction and 0.06 to the preoperative corneal curvature. At 12 months, the cumulative R² was 0.37, of which 0.33 was attributed to the intended correction, 0.02 to the preoperative corneal curvature, and 0.01 each to preoperative corneal thickness and patient age. Further model augmentation is necessary to account for the remaining variability and the behavior of the residuals.

  20. Modelling and analysis of turbulent datasets using Auto Regressive Moving Average processes

    International Nuclear Information System (INIS)

    We introduce a novel way to extract information from turbulent datasets by applying an Auto Regressive Moving Average (ARMA) statistical analysis. Such analysis goes well beyond the analysis of the mean flow and of the fluctuations and links the behavior of the recorded time series to a discrete version of a stochastic differential equation which is able to describe the correlation structure in the dataset. We introduce a new index ? that measures the difference between the resulting analysis and the Obukhov model of turbulence, the simplest stochastic model reproducing both Richardson law and the Kolmogorov spectrum. We test the method on datasets measured in a von Kármán swirling flow experiment. We found that the ARMA analysis is well correlated with spatial structures of the flow, and can discriminate between two different flows with comparable mean velocities, obtained by changing the forcing. Moreover, we show that the ? is highest in regions where shear layer vortices are present, thereby establishing a link between deviations from the Kolmogorov model and coherent structures. These deviations are consistent with the ones observed by computing the Hurst exponents for the same time series. We show that some salient features of the analysis are preserved when considering global instead of local observables. Finally, we analyze flow configurations with multistability features where the ARMA technique is efficient in discriminating different stability branches of the system
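    The autoregressive part of an ARMA model is itself a regression of the series on its own past values. The sketch below covers only the simplest case, an AR(1) process (ARMA(1,0)) estimated by conditional least squares; the record does not specify the model orders or estimation method used for the von Kármán data, and fitting moving-average terms requires nonlinear estimation that is not shown here. The simulated series is hypothetical.

```python
import numpy as np

def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with Gaussian white noise e_t."""
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, sigma, size=n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def fit_ar1(x):
    """Conditional least-squares estimate of phi and of the noise variance.

    This is an OLS regression of x_t on x_{t-1} (no intercept, after
    removing the sample mean).
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    lagged, current = x[:-1], x[1:]
    phi = float(lagged @ current / (lagged @ lagged))
    resid = current - phi * lagged
    return phi, float(resid.var())

phi_hat, noise_var = fit_ar1(simulate_ar1(0.7, 5000))
print(phi_hat, noise_var)  # close to 0.7 and 1.0
```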

  1. Multiple tumors. Analysis of 50 patients

    International Nuclear Information System (INIS)

    Multiple primary neoplasms have been described since the late nineteenth century; Warren and Gates established the clinicopathological criteria for their diagnosis. Their clinical frequency ranges from 1.5% to 5.4% of cancers, and from 5% to 11% in autopsy series. In recent years second tumors have become more frequent, probably owing to new staging strategies, closer patient follow-up and therapeutic results that have improved survival after the first diagnosis. Objective: To analyze 50 patients with multiple tumors treated at the HCFF.AA Oncology Service between 1/1997 and 1/2004. Patients and methods: We included patients registered at the HCFF.AA with 2 or more histologically documented malignant tumors. Medical records were reviewed for age, sex, date of diagnosis and tumor type, and the frequency of these tumors and the interval between their appearances were analyzed. Results: 50 patients were included, 2.0% of the 2400 registered patients. The mean age was 61 years (range 36-89). The median interval between the first and second tumor was 28 months (range 0-300). The most common tumors were breast carcinoma (23), non-melanoma skin tumors (15), colon adenocarcinoma (12), prostate (8) and kidney (6). By timing of appearance, 10 were synchronous and 40 metachronous. Breast tumors were most often associated with endometrial (5), ovarian (3), colon (3) and kidney (3) tumors. Of the 50 patients, 42 had 2 tumors and 8 had 3 tumors. Conclusions: The frequency of multiple neoplasms in our series and their mode of presentation over time do not differ from those reported by other authors. Follow-up of cancer patients and advances in diagnosis and therapy lead to more diagnoses of second tumors, which pose a new therapeutic challenge.

  2. The Contribution of Schooling to the Cognitive Development of Secondary Education Students in Cyprus: An Application of Regression Discontinuity with Multiple Cut-Off Points

    Science.gov (United States)

    Kyriakides, Leonidas; Luyten, Hans

    2009-01-01

    This article reports the results of a study in which the basic regression-discontinuity approach to assess the effect of 1 year of schooling is extended. The data analysis covers the 6 grades of secondary education in Cyprus and thus assesses the contribution of secondary education to the cognitive development of 12- to 18-year-old students. A…

  3. Quantitative laser-induced breakdown spectroscopy data using peak area step-wise regression analysis: an alternative method for interpretation of Mars science laboratory results

    Energy Technology Data Exchange (ETDEWEB)

    Clegg, Samuel M [Los Alamos National Laboratory]; Barefield, James E [Los Alamos National Laboratory]; Wiens, Roger C [Los Alamos National Laboratory]; Dyar, Melinda D [MT HOLYOKE COLLEGE]; Schafer, Martha W [LSU]; Tucker, Jonathan M [MT HOLYOKE COLLEGE]

    2008-01-01

    The ChemCam instrument on the Mars Science Laboratory (MSL) will include a laser-induced breakdown spectrometer (LIBS) to quantify major and minor elemental compositions. The traditional analytical chemistry approach to calibration curves for these data regresses a single diagnostic peak area against concentration for each element. This approach contrasts with a new multivariate method in which elemental concentrations are predicted by step-wise multiple regression analysis based on areas of a specific set of diagnostic peaks for each element. The method is tested on LIBS data from igneous and metamorphosed rocks. Between 4 and 13 partial regression coefficients are needed to describe each elemental abundance accurately (i.e., with a regression line of R² > 0.9995 for the relationship between predicted and measured elemental concentration) for all major and minor elements studied. Validation plots suggest that the method is limited at present by the small data set, and will work best for prediction of concentration when a wide variety of compositions and rock types has been analyzed.
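    Step-wise selection of diagnostic peaks can be sketched as greedy forward selection. This is an illustration under assumptions: the columns of `X` stand for hypothetical peak areas, and the entry criterion (largest gain in R², stopping when the gain falls below `min_gain`) is chosen for simplicity; the record does not say which entry or exit criterion the authors used.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def forward_stepwise(X, y, min_gain=1e-3, max_terms=None):
    """Greedy forward selection of columns of X by R^2 gain.

    Returns (selected column indices in order of entry, final R^2).
    """
    n_features = X.shape[1]
    limit = n_features if max_terms is None else min(max_terms, n_features)
    selected, best_r2 = [], 0.0
    while len(selected) < limit:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = {j: r_squared(X[:, selected + [j]], y) for j in candidates}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            break  # no remaining peak improves the fit enough
        selected.append(j_best)
        best_r2 = scores[j_best]
    return selected, best_r2
```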

  4. Quantitative structure-property relationship modeling of water-to-wet butyl acetate partition coefficient of 76 organic solutes using multiple linear regression and artificial neural network.

    Science.gov (United States)

    Dashtbozorgi, Zahra; Golmohammadi, Hassan

    2010-12-01

    The main aim of this study was the development of a quantitative structure-property relationship method using an artificial neural network (ANN) for predicting the water-to-wet butyl acetate partition coefficients of organic solutes. As a first step, a genetic algorithm-multiple linear regression model was developed; the descriptors appearing in this model were used as inputs for the ANN. These descriptors are the principal moment of inertia C (I(C)), area-weighted surface charge of hydrogen-bonding donor atoms (HACA-2), Kier and Hall index (order 2) (²χ), Balaban index (J), minimum bond order of a C atom (P(C)) and relative negative-charged SA (RNCS). Then a 6-4-1 neural network was generated for the prediction of water-to-wet butyl acetate partition coefficients of 76 organic solutes. Comparing the results obtained from the multiple linear regression and ANN models shows that the statistical parameters (Fisher ratio, correlation coefficient and standard error) of the ANN model are better than those of the regression model, which indicates that a nonlinear model can simulate the relationship between the structural descriptors and the partition coefficients of the investigated molecules more accurately. PMID:21082679

  5. Improved Regression Analysis of Temperature-Dependent Strain-Gage Balance Calibration Data

    Science.gov (United States)

    Ulbrich, N.

    2015-01-01

    An improved approach is discussed that may be used to directly include first- and second-order temperature effects in the load prediction algorithm of a wind tunnel strain-gage balance. The improved approach was designed for the Iterative Method, which fits strain-gage outputs as a function of calibration loads and uses a load iteration scheme during the wind tunnel test to predict loads from measured gage outputs. The improved approach assumes that the strain-gage balance is at a constant uniform temperature when it is calibrated and used. First, the method introduces a new independent variable for the regression analysis of the balance calibration data. The new variable is defined as the difference between the uniform temperature of the balance and a global reference temperature. This reference temperature should be the primary calibration temperature of the balance so that, if needed, a tare load iteration can be performed. Then, two temperature-dependent terms are included in the regression models of the gage outputs: the temperature difference itself and the square of the temperature difference. Simulated temperature-dependent data obtained from Triumph Aerospace's 2013 calibration of NASA's ARC-30K five-component semi-span balance are used to illustrate the application of the improved approach.
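    Adding the temperature terms amounts to appending two columns, ΔT and ΔT², to the design matrix of each gage-output regression. The sketch below is a simplification: it models a gage output as linear in the applied loads plus those two temperature terms, whereas real balance calibration models (and the Iterative Method's load iteration) involve many more terms; the function names and the synthetic data in the test are hypothetical.

```python
import numpy as np

def design_matrix(loads, temps, t_ref):
    """Columns: intercept, one column per load, dT and dT^2 (dT = T - t_ref)."""
    loads = np.asarray(loads, dtype=float)
    if loads.ndim == 1:
        loads = loads[:, None]
    dT = np.asarray(temps, dtype=float) - t_ref
    return np.column_stack([np.ones(len(dT)), loads, dT, dT ** 2])

def fit_gage_output(loads, temps, gage_output, t_ref):
    """Least-squares coefficients of one gage output (hypothetical model)."""
    A = design_matrix(loads, temps, t_ref)
    coef, *_ = np.linalg.lstsq(A, np.asarray(gage_output, dtype=float), rcond=None)
    return coef
```

    Choosing `t_ref` equal to the primary calibration temperature makes ΔT = 0 at that temperature, so the temperature columns vanish there and the model reduces to the ordinary load-only fit.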

  6. The Use of Logistic Regression in the Analysis of Data Concerning Good Medical Practice

    Directory of Open Access Journals (Sweden)

    Damon MN

    2002-06-01

    Full Text Available Logistic regression is one of the most commonly used explanatory multivariate models in epidemiology. Its use, made easier by modern statistical software, allows researchers to control for confounding bias. It measures the odds ratio, which quantifies the association between a given event, represented by a dichotomous variable, and the factors likely to influence it, represented by explanatory variables. The choice of explanatory variables included in the model is based on prior knowledge of the study subject and aims to account for confounding factors that have already been identified. The authors explain the fundamental principles of logistic regression and the steps involved in its application. Using two examples (the quality of follow-up care given to diabetics, and in-hospital mortality after acute myocardial infarction), they demonstrate the value this statistical tool can have in studies performed by the medical service of the national health care fund, particularly in studies designed to evaluate professional practice.
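    The link between a logistic coefficient and the odds ratio can be checked numerically. This sketch fits a logistic model by Newton-Raphson on a hypothetical 2×2 exposure/outcome table (the article's diabetes and myocardial-infarction data are not reproduced). With a single binary factor, exp(β₁) reproduces the crude odds ratio (30·90)/(70·10); adding confounders as extra columns of `X` would turn it into an adjusted odds ratio.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    X must already contain an intercept column. Returns the coefficients.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        W = p * (1.0 - p)                         # Bernoulli variances
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * W[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Hypothetical table: exposed 30 events / 70 non-events,
# unexposed 10 events / 90 non-events.
exposure = np.repeat([1, 1, 0, 0], [30, 70, 10, 90])
outcome = np.repeat([1, 0, 1, 0], [30, 70, 10, 90])
X = np.column_stack([np.ones(len(exposure)), exposure])
beta = fit_logistic(X, outcome)
odds_ratio = np.exp(beta[1])
print(odds_ratio)  # about 3.857 = (30*90)/(70*10)
```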

  7. Personality disorders, violence, and antisocial behavior: a systematic review and meta-regression analysis.

    Science.gov (United States)

    Yu, Rongqin; Geddes, John R; Fazel, Seena

    2012-10-01

    The risk of antisocial outcomes in individuals with personality disorder (PD) remains uncertain. The authors synthesize the current evidence on the risks of antisocial behavior, violence, and repeat offending in PD, and they explore sources of heterogeneity in risk estimates through a systematic review and meta-regression analysis of observational studies comparing antisocial outcomes in personality-disordered individuals with control groups. Fourteen studies examined the risk of antisocial and violent behavior in 10,007 individuals with PD, compared with over 12 million general population controls. There was a substantially increased risk of violent outcomes in studies with all PDs (random-effects pooled odds ratio [OR] = 3.0, 95% CI = 2.6 to 3.5). Meta-regression revealed that antisocial PD and gender were associated with higher risks (p = .01 and .07, respectively). The odds of all antisocial outcomes were also elevated. Twenty-five studies reported the risk of repeat offending in PD compared with other offenders. The risk of a repeat offense was also increased (fixed-effects pooled OR = 2.4, 95% CI = 2.2 to 2.7) in offenders with PD. The authors conclude that although PD is associated with antisocial outcomes and repeat offending, the risk appears to differ by PD category, gender, and whether individuals are offenders or not. PMID:23013345
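    Random-effects pooled odds ratios like the one quoted above are commonly computed with the DerSimonian-Laird estimator. The sketch assumes that estimator and per-study log odds ratios with known variances; the record does not state which estimator or software the authors used. Meta-regression would go one step further by regressing the log odds ratios on study-level covariates with weights 1/(vᵢ + τ²).

```python
import numpy as np

def dersimonian_laird(log_or, var):
    """Random-effects pooling of log odds ratios (DerSimonian-Laird).

    Returns (pooled OR, (lower, upper) 95% CI, between-study variance tau^2).
    """
    y = np.asarray(log_or, dtype=float)
    v = np.asarray(var, dtype=float)
    w = 1.0 / v                                   # fixed-effect weights
    y_fe = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fe) ** 2)               # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                 # truncated at zero
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    y_re = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    lo, hi = y_re - 1.96 * se, y_re + 1.96 * se
    return float(np.exp(y_re)), (float(np.exp(lo)), float(np.exp(hi))), float(tau2)
```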

  8. Investigation of the relationship between very warm days in Romania and large-scale atmospheric circulation using multiple linear regression approach

    Science.gov (United States)

    Barbu, N.; Cuculeanu, V.; Stefan, S.

    2015-08-01

    The aim of this study is to investigate the relationship between the frequency of very warm days (TX90p) in Romania and large-scale atmospheric circulation for winter (December-February) and summer (June-August) between 1962 and 2010. To achieve this, two catalogues from COST Action 733 were used to derive daily circulation types. Seasonal occurrence frequencies of the circulation types were calculated and used as predictors in the multiple linear regression model (MLRM) for estimating winter and summer TX90p values at 85 synoptic stations covering all of Romania. A forward selection procedure was used to find adequate predictor combinations, and those combinations were tested for collinearity. The performance of the MLRMs was quantified by the explained variance. Furthermore, the leave-one-out cross-validation procedure was applied and the root-mean-squared error skill score was calculated at station level to obtain reliable evidence of MLRM robustness. This analysis shows that MLRM performance is higher in winter than in summer, owing to the annual cycle of incoming insolation and to local factors such as orography and surface albedo variations. MLRM performance varies distinctly between regions, being high in winter for the eastern and southern parts of the country and in summer for the western part. One can conclude that the MLRM generally captures TX90p variability quite well and shows potential for statistical downscaling of TX90p values based on circulation types.
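    Leave-one-out cross-validation and an RMSE skill score can be sketched for any linear predictor set. Assumptions: the predictors stand for seasonal circulation-type frequencies at one station, and the skill score is taken as 1 - RMSE_model / RMSE_reference with a leave-one-out climatological mean as the reference; the record does not give the exact reference forecast used.

```python
import numpy as np

def loo_predictions(X, y):
    """Leave-one-out predictions from an OLS model with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i            # drop season i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef                   # predict the held-out season
    return preds

def rmse_skill_score(y, preds):
    """1 - RMSE(model) / RMSE(reference); reference = leave-one-out mean."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    ref = (y.sum() - y) / (n - 1)                # mean of the other n-1 values
    rmse_model = np.sqrt(np.mean((y - preds) ** 2))
    rmse_ref = np.sqrt(np.mean((y - ref) ** 2))
    return 1.0 - rmse_model / rmse_ref
```

    A score of 1 means perfect held-out predictions, 0 means no improvement over climatology, and negative values mean the regression does worse than climatology.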

  9. Characterization of breast masses by dynamic enhanced MR imaging. A logistic regression analysis

    International Nuclear Information System (INIS)

    Purpose: To identify features useful for differentiating malignant from benign breast neoplasms using multivariate analysis of MR imaging findings. Material and Methods: In a retrospective analysis, 61 patients with 64 breast masses underwent MR imaging, and the time-signal intensity curves of precontrast, dynamic and postcontrast images were quantitatively analyzed. Statistical analysis was performed using a logistic regression model, which was prospectively tested in another 34 patients with suspected breast masses. Results: Univariate analysis revealed that the most reliable indicator of malignancy was the appearance of the tumor border, followed by the washout ratio, internal architecture after contrast enhancement, and peak time. The factors significantly associated with malignancy were an irregular tumor border, followed by washout ratio, internal architecture, and peak time. For differentiating benign from malignant lesions, the optimal cut-off point was found to lie between 0.47 and 0.51. In a prospective application of this model, 91% of the lesions were correctly classified as benign or malignant. Conclusion: The combination of contrast-enhanced dynamic and postcontrast-enhanced MR imaging provided accurate data for the diagnosis of malignant breast neoplasms. The model had an accuracy of 91% (sensitivity 90%, specificity 93%). (orig.)
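    Once a logistic model outputs a probability of malignancy, sensitivity, specificity and accuracy follow from the chosen cut-off. The sketch assumes hypothetical probabilities and labels; the record reports only the cut-off range and the resulting accuracy, not the fitted model.

```python
import numpy as np

def classify_at_cutoff(prob_malignant, is_malignant, cutoff=0.5):
    """Sensitivity, specificity and accuracy for a probability cut-off."""
    p = np.asarray(prob_malignant, dtype=float)
    truth = np.asarray(is_malignant, dtype=bool)
    pred = p >= cutoff                      # predicted malignant
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    return {
        "sensitivity": tp / (tp + fn),      # malignant lesions detected
        "specificity": tn / (tn + fp),      # benign lesions correctly cleared
        "accuracy": (tp + tn) / len(p),
    }

# Hypothetical model outputs for six lesions (1 = malignant).
print(classify_at_cutoff([0.9, 0.8, 0.3, 0.6, 0.2, 0.1], [1, 1, 1, 0, 0, 0]))
```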

  10. Regression analysis between body and head measurements of Chinese alligators (Alligator sinensis) in the captive population

    Directory of Open Access Journals (Sweden)

    Wu, X. B.

    2006-06-01

    Full Text Available Four body-size and fourteen head-size measurements were taken from each Chinese alligator (Alligator sinensis) using measurements adapted from Verdade. Regression equations between body-size and head-size variables are presented for predicting body size from head dimensions. The coefficients of determination between body- and head-size variables in the captive animals can be considered extremely high, which means that most of the head-size variables studied can be useful for predicting body length. The multivariate allometric analysis indicated that the head elongates, as in most other crocodilian species. The allometric coefficients of snout length (SL) and lower ramus (LM) were greater than those of the other head variables, which was considered possibly related to fighting and to prey. In contrast, the allometric coefficients for the orbit variables (OW, OL) and the postorbital cranial roof (LCR) were lower than those of the other variables.
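    Allometric coefficients are usually estimated as slopes on log-log axes. This sketch uses a bivariate power law, head = a·body^b, fitted by ordinary least squares on logarithms; that is a simplification of the multivariate allometric analysis described in the record, and the measurements in the test are synthetic.

```python
import numpy as np

def allometric_coefficient(body_length, head_measure):
    """Fit log(head) = log(a) + b*log(body); return (a, b).

    b > 1 means the head measure grows faster than body length
    (positive allometry); b < 1 means it grows more slowly.
    """
    b, log_a = np.polyfit(np.log(body_length), np.log(head_measure), 1)
    return float(np.exp(log_a)), float(b)
```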

  11. Application of multivariate linear regression for determination of ash content in coal by XRF analysis

    International Nuclear Information System (INIS)

    Measurements of excited and backscattered fluorescence radiation intensity were applied to ash content determination in coal samples. An Si(Li) detector and the low-energy X- and gamma-ray sources 55Fe, 109Cd, 238Pu and 241Am were used. The measurement facility, consisting of an argon-filled proportional counter and a 238Pu radiation source, was tested and compared with other radioanalytical methods for ash content determination. The evaluation of results was based on the Snedecor F test and the analysis of the root mean square error of estimate. The best results were obtained when the 55Fe source was used. In the multivariate linear regression, the Si Kα and Ca Kα intensities and the backscattered radiation intensity were selected as the independent variables best related to the ash content of coal. (author)

  12. Using instant messaging to enhance the interpersonal relationships of Taiwanese adolescents: evidence from quantile regression analysis.

    Science.gov (United States)

    Lee, Yueh-Chiang; Sun, Ya Chung

    2009-01-01

    Even though internet use by adolescents has grown exponentially, little is known about the correlation between their interaction via Instant Messaging (IM) and the evolution of their interpersonal relationships in real life. In the present study, 369 junior high school students in Taiwan responded to questions regarding their IM usage and their dispositional measures of real-life interpersonal relationships. Descriptive statistics, factor analysis, and quantile regression methods were used to analyze the data. Results indicate that (1) IM helps define adolescents' self-identity (forming and maintaining individual friendships) and social identity (belonging to a peer group), and (2) IM use affects the development of interpersonal relationships, as adolescents appear to use IM to improve their interpersonal relationships in real life. PMID:19435175

  13. Model selection for marginal regression analysis of longitudinal data with missing observations and covariate measurement error.

    Science.gov (United States)

    Shen, Chung-Wei; Chen, Yi-Hau

    2015-10-01

    Missing observations and covariate measurement error commonly arise in longitudinal data. However, existing methods for model selection in marginal regression analysis of longitudinal data fail to address the potential bias resulting from these issues. To tackle this problem, we propose a new model selection criterion, the Generalized Longitudinal Information Criterion, which is based on an approximately unbiased estimator of the expected quadratic error of a candidate marginal model, accounting for both data missingness and covariate measurement error. The simulation results reveal that the proposed method performs quite well in the presence of missing data and covariate measurement error. In contrast, naive procedures that ignore these complexities may perform quite poorly. The proposed method is applied to data from the Taiwan Longitudinal Study on Aging to assess the relationship of depression with health and social status in the elderly, accommodating measurement error in the covariate as well as missing observations. PMID:26012353

  14. Deterministic Assessment of Continuous Flight Auger Construction Durations Using Regression Analysis

    Directory of Open Access Journals (Sweden)

    Hossam E. Hosny

    2015-07-01

    Full Text Available One of the primary functions of construction equipment management is to calculate the production rate of equipment, which is a major input to time estimates, cost estimates and overall project planning. Accordingly, it is crucial for stakeholders to be able to compute equipment production rates, ideally with an accurate, reliable and easy-to-use tool. The objective of this research is to provide a simple model that specialists can use to predict the duration of a proposed Continuous Flight Auger job. The model was developed by first prioritizing factors using a technique based on expert judgment and then applying multiple regression analysis to a representative sample. The model was then validated on a selected sample of projects. The average error of the model was calculated to be about 3%-6%.

  15. Collaborative Regression

    OpenAIRE

    Gross, Samuel M.; Tibshirani, Robert

    2014-01-01

    We consider the scenario where one observes an outcome variable and sets of features from multiple assays, all measured on the same set of samples. One approach that has been proposed for dealing with this type of data is ``sparse multiple canonical correlation analysis'' (sparse mCCA). All of the current sparse mCCA techniques are biconvex and thus have no guarantees about reaching a global optimum. We propose a method for performing sparse supervised canonical correlation ...

  16. Application of least squares support vector regression and linear multiple regression for modeling removal of methyl orange onto tin oxide nanoparticles loaded on activated carbon and activated carbon prepared from Pistacia atlantica wood.

    Science.gov (United States)

    Ghaedi, M; Rahimi, Mahmoud Reza; Ghaedi, A M; Tyagi, Inderjeet; Agarwal, Shilpi; Gupta, Vinod Kumar

    2016-01-01

    Two novel and eco-friendly adsorbents, namely tin oxide nanoparticles loaded on activated carbon (SnO2-NP-AC) and activated carbon prepared from the wood of the tree Pistacia atlantica (AC-PAW), were used for the rapid removal and fast adsorption of methyl orange (MO) from the aqueous phase. The dependence of MO removal on various influential adsorption parameters was well modeled and optimized using multiple linear regression (MLR) and least squares support vector regression (LSSVR). The optimal parameters for the LSSVR model were found to be a γ value of 0.76 and a σ² of 0.15. For the test data set, the LSSVR model gave a mean square error (MSE) of 0.0010 and a coefficient of determination (R²) of 0.976, while the MLR model gave an MSE of 0.0037 and an R² of 0.897. The adsorption equilibrium and kinetic data were well fitted by the Langmuir isotherm model and by the second-order and intra-particle diffusion kinetic models, respectively. Small amounts of the proposed SnO2-NP-AC and AC-PAW (0.015 g and 0.08 g) can achieve rapid removal of methyl orange (>95%). The maximum adsorption capacities of SnO2-NP-AC and AC-PAW were 250 mg g⁻¹ and 125 mg g⁻¹, respectively. PMID:26414425
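    LSSVR replaces the inequality constraints of standard SVR with equality constraints, so training reduces to solving one linear system. The sketch assumes the usual LS-SVM regression formulation with an RBF kernel, a regularization constant `gamma` and a kernel width `sigma2`. Kernel-width conventions differ between toolboxes (σ² versus 2σ²), so tuned values cannot be transferred between implementations directly, and no adsorption data are reproduced here.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """K[i, j] = exp(-||a_i - b_j||^2 / sigma2)."""
    d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / sigma2)

def lssvr_fit(X, y, gamma, sigma2):
    """Solve the LS-SVM regression dual system for (b, alpha).

    [ 0   1^T           ] [b    ]   [0]
    [ 1   K + I / gamma ] [alpha] = [y]
    """
    n = len(y)
    K = rbf_kernel(X, X, sigma2)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(M, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]

def lssvr_predict(X_train, alpha, b, X_new, sigma2):
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b
```

    A useful check is that the fitted values at the training points equal `y - alpha / gamma`, which follows directly from the second block row of the system; `gamma` therefore controls how far the fit may deviate from the data.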

  17. Spatial-Temporal Variations of Turbidity and Ocean Current Velocity of the Ariake Sea Area, Kyushu, Japan Through Regression Analysis with Remote Sensing Satellite Data

    OpenAIRE

    Yuichi Sarusawa; Kohei Arai

    2013-01-01

    A regression-analysis-based method for estimating turbidity and ocean current velocity from remote sensing satellite data is proposed. Through regression analysis of MODIS data against measured turbidity and ocean current velocity, a regression equation that allows estimation of turbidity and ocean current velocity is obtained. Using this regression equation together with long-term MODIS data, trends in turbidity and ocean current velocity in the Ariake Sea area are clarified. It is also confirmed tha...

  18. Applying support vector regression analysis on grip force level-related corticomuscular coherence

    DEFF Research Database (Denmark)

    Rong, Yao; Han, Xixuan

    2014-01-01

    Voluntary motor performance is the result of cortical commands driving muscle actions. Corticomuscular coherence can be used to examine the functional coupling or communication between the human brain and muscles. To investigate the effects of grip force level on corticomuscular coherence in an accessory muscle, this study proposed an expanded support vector regression (ESVR) algorithm to quantify the coherence between the electroencephalogram (EEG) from the sensorimotor cortex and the surface electromyogram (EMG) from the brachioradialis in the upper limb. A measure called coherence proportion was introduced to compare corticomuscular coherence in the alpha (7–15 Hz), beta (15–30 Hz) and gamma (30–45 Hz) bands at 25 % and 75 % of maximum grip force (MGF). Results show that ESVR could reduce the influence of deflected signals and summarize the overall behavior of multiple coherence curves. Coherence proportion is more sensitive to grip force level than coherence area. Significantly higher corticomuscular coherence occurred in the alpha (p<0.01) and beta (p<0.01) bands during 75 % MGF, but in the gamma band (p<0.01) during 25 % MGF. The results suggest that, as grip intensity increases, the sensorimotor cortex might control the activity of an accessory muscle for hand grip by changing functional corticomuscular coupling in certain frequency bands (alpha, beta and gamma).

  19. Artificial regressions

    OpenAIRE

    Davidson, Russell; MacKinnon, James

    2001-01-01

    Associated with every popular nonlinear estimation method is at least one 'artificial' linear regression. We define an artificial regression in terms of three conditions that it must satisfy. Then we show how artificial regressions can be useful for numerical optimization, testing hypotheses, and computing parameter estimates. Several existing artificial regressions are discussed and are shown to satisfy the defining conditions, and a new artificial regression for regression models with heter...
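    The best-known artificial regression is the Gauss-Newton regression for nonlinear least squares: regressing the current residuals on the derivatives of the regression function gives the update step, and at convergence the artificial regression has (numerically) zero coefficients. The exponential model and starting values below are illustrative assumptions, not an example from the paper.

```python
import numpy as np

def gauss_newton_exp(x, y, a, b, n_iter=50, tol=1e-12):
    """Fit y = a * exp(b * x) by iterating the Gauss-Newton regression.

    At each step the current residuals are regressed (by OLS) on the
    Jacobian of the model function; the OLS coefficients are the update.
    """
    for _ in range(n_iter):
        f = a * np.exp(b * x)
        resid = y - f
        # Derivatives of f with respect to a and b: the artificial regressors.
        J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
        step, *_ = np.linalg.lstsq(J, resid, rcond=None)
        a, b = a + step[0], b + step[1]
        if np.max(np.abs(step)) < tol:
            break
    return a, b

x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(0.5 * x)
print(gauss_newton_exp(x, y, a=1.8, b=0.4))  # converges to about (2.0, 0.5)
```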

  20. Some studies on cutting force and temperature in machining Ti-6Al-4V alloy using regression analysis and ANOVA

    Directory of Open Access Journals (Sweden)

    K.Satyanarayana

    2013-06-01

    Full Text Available The present work deals with the cutting forces and cutting temperature produced during turning of titanium alloy Ti-6Al-4V with PVD TiN-coated tungsten carbide inserts under dry conditions. First-order mathematical models were developed using multiple regression analysis, and the process parameters were optimized using contour plots. The models showed high coefficients of determination (R² = 0.964 and 0.989), explaining 96.4 % and 98.9 % of the variability in the cutting force and cutting temperature respectively, which indicates a good fit and high significance of the models. The developed mathematical models relate the cutting force and temperature to the process parameters with a good degree of approximation. From the contour plots, the optimal parametric combination for lowest cutting force is v3 (75 m/min) – f1 (0.25 mm/rev). Similarly, the optimal parametric combination for minimum temperature is v1 (45 m/min) – f1 (0.25 mm/rev). Cutting speed is found to be the most significant parameter for cutting forces, followed by feed. For cutting temperature, feed is found to be the most influential parameter, followed by cutting speed.

  1. Regression anatomy, revealed

    OpenAIRE

    Filoso, Valerio

    2010-01-01

    The Regression Anatomy (RA) theorem (Angrist and Pischke 2009) is an alternative formulation of the Frisch-Waugh-Lovell (FWL) theorem (Frisch and Waugh 1933; Lovell 1963), a key finding in the algebra of OLS multiple regression models. In this paper, we present a command, reganat, that implements the RA method graphically. This addition complements the built-in Stata command avplot in the validation of linear models, producing two-dimensional scatterplots and regression lines obtained controlli...
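    The RA/FWL result can be verified numerically without Stata. This Python sketch (not the `reganat` command itself) uses simulated data: the coefficient on `x1` from the full multiple regression equals the bivariate slope of `y` on the residual of `x1` after regressing it on the other regressors. That residual-versus-outcome pair is what a regression-anatomy plot displays.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(42)
n = 200
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)           # x1 correlated with the control x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

const = np.ones(n)
full = ols(np.column_stack([const, x1, x2]), y)

# Regression anatomy: residualize x1 on the other regressors (const, x2) ...
Z = np.column_stack([const, x2])
x1_tilde = x1 - Z @ ols(Z, x1)
# ... then the bivariate slope of y on that residual equals the
# multiple-regression coefficient on x1.
slope_ra = (x1_tilde @ y) / (x1_tilde @ x1_tilde)
print(full[1], slope_ra)
```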

  2. Thermodynamic dissociation constants of silychristin, silybin, silydianin and mycophenolate by the regression analysis of spectrophotometric data

    International Nuclear Information System (INIS)

    Mixed dissociation constants of four drug acids, i.e. silychristin, silybin, silydianin and mycophenolate, at ionic strengths I in the range 0.01 to 0.30 and at temperatures of 25 and 37 °C were determined using the SQUAD(84) regression analysis program applied to pH-spectrophotometric titration data. The proposed strategy of efficient experimentation in protonation constant determination, followed by a computational strategy for building the chemical model and determining the protonation constants, is demonstrated on the protonation equilibria of silychristin. The thermodynamic dissociation constant pKaT was estimated by non-linear regression of {pKa, I} data at 25 and 37 °C: for silychristin pKa,1T=6.52(16) and 6.62(1), pKa,2T=7.22(13) and 7.41(5), pKa,3T=8.96(9) and 8.94(9), pKa,4T=10.17(7) and 10.03(8), pKa,5T=11.89(4) and 11.63(7); for silybin pKa,1T=7.00(4) and 6.86(5), pKa,2T=8.77(11) and 8.77(3), pKa,3T=9.57(8) and 9.62(1), pKa,4T=11.66(3) and 11.38(1); for silydianin pKa,1T=6.64(7) and 7.10(6), pKa,2T=7.78(5) and 8.93(1), pKa,3T=9.66(9) and 10.06(11), pKa,4T=10.71(7) and 10.77(7), pKa,5T=12.26(5) and 12.14(5); for mycophenolate pKaT=8.32(1) and 8.14(1). Goodness-of-fit tests using various regression diagnostics enabled the reliability of the parameter estimates to be assessed.

  3. Using Negative Binomial Regression Analysis to Predict Software Faults: A Study of Apache Ant

    OpenAIRE

    Liguo Yu

    2012-01-01

    Negative binomial regression has been proposed as an approach to predicting fault-prone software modules. However, little work has been reported on the strengths, weaknesses, and applicability of this method. In this paper, we present an in-depth study of the effectiveness of using negative binomial regression to predict fault-prone software modules under two different conditions, self-assessment and forward assessment. The performance of the negative binomial regression model is also c...

  4. The Jackknife Interval Estimation of Parameters in Partial Least Squares Regression Model for Poverty Data Analysis

    Directory of Open Access Journals (Sweden)

    Pudji Ismartini

    2010-08-01

    Full Text Available One of the major problems facing data modelling in social research is multicollinearity. Multicollinearity can have a significant impact on the quality and stability of the fitted regression model. The common classical regression technique based on Least Squares estimation is highly sensitive to multicollinearity. In such settings, Partial Least Squares Regression (PLSR) is a useful and flexible tool for statistical model building; however, PLSR yields only point estimates. This paper constructs interval estimates for PLSR regression parameters by applying the Jackknife technique to poverty data. A SAS macro program is developed to obtain the Jackknife interval estimator for PLSR.
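    A delete-one jackknife interval for regression coefficients can be sketched in a few lines. This is not the authors' SAS macro: it assumes a one-component PLS1 model as the estimator (the record does not state how many components were used), jackknife standard errors sqrt((n - 1)/n · Σ(θ₍ᵢ₎ - θ̄)²), and t-based intervals; the data in the test are synthetic.

```python
import numpy as np
from scipy import stats

def pls1_one_component(X, y):
    """Coefficients of a one-component PLS1 model fitted to centred data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)           # weight vector
    t = Xc @ w                       # score vector
    q = (t @ yc) / (t @ t)           # y-loading
    return w * q                     # coefficients on the original X columns

def jackknife_interval(estimator, X, y, level=0.95):
    """Delete-one jackknife standard errors and t-based intervals.

    Returns (full-sample estimate, lower bounds, upper bounds, standard errors).
    """
    n = len(y)
    full = np.asarray(estimator(X, y))
    loo = np.array([estimator(np.delete(X, i, axis=0), np.delete(y, i))
                    for i in range(n)])
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean(axis=0)) ** 2, axis=0))
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 1)
    return full, full - t_crit * se, full + t_crit * se, se
```

    For the sample mean, this jackknife standard error reduces exactly to s/√n, which is a convenient sanity check before applying it to PLSR coefficients under multicollinearity.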

  5. Flexible Expectile Regression in Reproducing Kernel Hilbert Space

    OpenAIRE

    Yang, Yi; Zhang, Teng; Zou, Hui

    2015-01-01

    The expectile, first introduced by Newey and Powell (1987) in the econometrics literature, has recently become increasingly popular in risk management and capital allocation for financial institutions owing to desirable properties such as coherence and elicitability. The current standard tool for expectile regression analysis is the multiple linear expectile regression proposed by Newey and Powell in 1987. The growing applications of expectile regression motivate us to develop...
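
    The simplest case of expectile regression, an intercept-only model, shows the asymmetric least squares idea: observations above the current estimate are weighted by tau and those at or below it by 1 - tau, and the weighted mean is iterated to a fixed point. This hypothetical sketch is not the kernel (RKHS) method of the paper; at tau = 0.5 it reduces to the ordinary mean.

```python
def expectile(y, tau, tol=1e-10, max_iter=1000):
    """tau-expectile of a sample via iteratively reweighted asymmetric least squares.

    Intercept-only illustration; covariates would turn each step into a
    weighted least-squares regression instead of a weighted mean.
    """
    m = sum(y) / len(y)  # the 0.5-expectile is the mean, so start there
    for _ in range(max_iter):
        # Asymmetric weights: tau above the current estimate, 1 - tau otherwise.
        w = [tau if yi > m else 1 - tau for yi in y]
        new_m = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m
```

    For example, the 0.9-expectile of `[1.0, 2.0, 3.0, 4.0]` is 3.5, since 0.1 × (-2.5 - 1.5 - 0.5) + 0.9 × 0.5 = 0.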

  6. Quantile Regression in the Study of Developmental Sciences

    OpenAIRE

    Petscher, Yaacov; Logan, Jessica A. R.

    2013-01-01

    Linear regression analysis is one of the most common techniques applied in developmental research, but it estimates only the average relation between the predictor(s) and the outcome. This study describes quantile regression, which estimates the relations between the predictor(s) and the outcome at multiple points of the outcome’s distribution. Using data from the High School and Beyond and U.S. Sustained Effects Study databases, quantile regression is demonstra...

  7. Using Mixture Regression to Identify Varying Effects: A Demonstration with Paternal Incarceration

    Science.gov (United States)

    Dyer, W. Justin; Pleck, Joseph; McBride, Brent

    2012-01-01

    The most widely used techniques for identifying the varying effects of stressors involve testing moderator effects via interaction terms in regression or multiple-group analysis in structural equation modeling. The authors present mixture regression as an alternative approach. In contrast to more widely used approaches, mixture regression...

  8. High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis.

    Science.gov (United States)

    Daye, Z John; Chen, Jinbo; Li, Hongzhe

    2012-03-01

    We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in and apply our method to an expression quantitative trait loci (eQTLs) study of 112 yeast segregants. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and lead to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis. PMID:22547833

  9. Cigarette Smoking Habits among Men and Women in Turkey: A Meta Regression Analysis

    Directory of Open Access Journals (Sweden)

    F Sahin Mutlu

    2006-06-01

    Full Text Available Smoking has become more prevalent in Turkey than in western countries during the past decade. This study was conducted to estimate gender-related smoking parameters with minimum variance. Of the 92 studies of smoking habits conducted in Turkey from 1981 to 2003, 60 were deemed appropriate for meta-analysis and meta-regression analysis. The proportions of men and women smoking cigarettes were 0.51 and 0.35, respectively. For 1996 and earlier, the proportions were 0.52 for men and 0.35 for women; for the years after 1996 they were 0.41 for men and 0.32 for women. In the DerSimonian and Laird random-effects model, the odds ratio expressing the tendency of men to smoke compared with women was 1.894 for the period 1981-2003. Heterogeneity between studies was apparent (Q = 1560.91, P < 0.001), and the tau-squared test was also significant (τ² = 0.55, z = 6.29, P < 0.001). We propose that effective precautions be considered, especially the introduction of laws to reduce smoking in both sexes, with particular attention to women.
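
    The DerSimonian and Laird random-effects pooling named above can be sketched as follows. This is a minimal illustration, assuming each study supplies an effect (for example a log odds ratio) and its within-study variance; the function name and inputs are hypothetical and not the Turkish survey data. Cochran's Q measures heterogeneity, and the moment estimator of τ² (truncated at zero) inflates every study's variance before re-weighting.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling of study effects with the DerSimonian-Laird tau^2."""
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, q, tau2
```

    For example, `dersimonian_laird([0.0, 2.0], [1.0, 1.0])` gives Q = 2, τ² = 1 and a pooled effect of 1.0 with standard error 1.0. When effects are log odds ratios, exponentiating the pooled value gives the pooled odds ratio.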

  10. Mixed-effects Poisson regression analysis of adverse event reports: the relationship between antidepressants and suicide.

    Science.gov (United States)

    Gibbons, Robert D; Segawa, Eisuke; Karabatsos, George; Amatya, Anup K; Bhaumik, Dulal K; Brown, C Hendricks; Kapur, Kush; Marcus, Sue M; Hur, Kwan; Mann, J John

    2008-05-20

    A new statistical methodology is developed for the analysis of spontaneous adverse event (AE) reports from post-marketing drug surveillance data. The method involves both empirical Bayes (EB) and fully Bayes estimation of rate multipliers for each drug within a class of drugs, for a particular AE, based on a mixed-effects Poisson regression model. Both parametric and semiparametric models for the random-effect distribution are examined. The method is applied to data from Food and Drug Administration (FDA)'s Adverse Event Reporting System (AERS) on the relationship between antidepressants and suicide. We obtain point estimates and 95 per cent confidence (posterior) intervals for the rate multiplier for each drug (e.g. antidepressants), which can be used to determine whether a particular drug has an increased risk of association with a particular AE (e.g. suicide). Confidence (posterior) intervals that do not include 1.0 provide evidence for either significant protective or harmful associations of the drug and the adverse effect. We also examine EB, parametric Bayes, and semiparametric Bayes estimators of the rate multipliers and associated confidence (posterior) intervals. Results of our analysis of the FDA AERS data revealed that newer antidepressants are associated with lower rates of suicide adverse event reports compared with older antidepressants. We recommend improvements to the existing AERS system, which are likely to improve its public health value as an early warning system. PMID:18404622

  11. Generalized multilevel function-on-scalar regression and principal component analysis.

    Science.gov (United States)

    Goldsmith, Jeff; Zipunnikov, Vadim; Schrack, Jennifer

    2015-06-01

    This manuscript considers regression models for generalized, multilevel functional responses: functions are generalized in that they follow an exponential family distribution and multilevel in that they are clustered within groups or subjects. This data structure is increasingly common across scientific domains and is exemplified by our motivating example, in which binary curves indicating physical activity or inactivity are observed for nearly 600 subjects over 5 days. We use a generalized linear model to incorporate scalar covariates into the mean structure, and decompose subject-specific and subject-day-specific deviations using multilevel functional principal components analysis. Thus, functional fixed effects are estimated while accounting for within-function and within-subject correlations, and major directions of variability within and between subjects are identified. Fixed effect coefficient functions and principal component basis functions are estimated using penalized splines; model parameters are estimated in a Bayesian framework using Stan, a programming language that implements a Hamiltonian Monte Carlo sampler. Simulations designed to mimic the application have good estimation and inferential properties with reasonable computation times for moderate datasets, in both cross-sectional and multilevel scenarios; code is publicly available. In the application we identify effects of age and BMI on the time-specific change in probability of being active over a 24-hour period; in addition, the principal components analysis identifies the patterns of activity that distinguish subjects and days within subjects. PMID:25620473

  12. Using Spline Regression in Semi-Parametric Stochastic Frontier Analysis: An Application to Polish Dairy Farms

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    The estimation of technical efficiency has generated a vast literature in applied production economics. There are two predominant approaches: the non-parametric, non-stochastic Data Envelopment Analysis (DEA) and the parametric Stochastic Frontier Analysis (SFA). DEA is criticised because it cannot account for statistical noise, such as random production shocks and measurement errors, which are inherent in almost all production data sets. SFA, in contrast, is criticised because it requires the specification of a functional form, which risks choosing an unsuitable form and thus model misspecification and biased parameter estimates. Given these problems of DEA and SFA, Fan, Li and Weersink (1996) proposed a semi-parametric stochastic frontier model that estimates the production function (frontier) by non-parametric regression based on kernel estimators. This approach combines the virtues of DEA and SFA while avoiding their drawbacks: it avoids the specification of a functional form and at the same time accounts for statistical noise. More recently, this approach was used by Henderson and Simar (2005), Kumbhakar et al. (2007), and Henningsen and Kumbhakar (2009). The aim of this paper, and its main contribution to the existing literature, is the estimation of semi-parametric stochastic frontier models using a different non-parametric estimation technique: spline regression (Ma et al. 2011). We apply this approach to the Polish dairy sector using a panel data set of Polish dairy farms from 2004 to 2010. The Polish dairy sector has changed considerably since Poland's integration into the European Union: the number of dairy producers decreased by one third, and the average herd size increased from 3.8 to 5.7 cows per farm between 2004 and 2010. Farms with small herds (fewer than 30 dairy cows) are expected to quit, and the number of large farms (more than 100 dairy cows) is expected to increase. A thorough empirical study of the technical efficiency and scale efficiency of Polish dairy farms therefore contributes insight into this dynamic process. Furthermore, we compare and evaluate the results of this spline-based semi-parametric stochastic frontier model against those of other semi-parametric stochastic frontier models and of traditional parametric stochastic frontier models.

    References:
    Fan, Y., Li, Q., Weersink, A. (1996), Semiparametric Estimation of Stochastic Production Frontier Models, Journal of Business and Economic Statistics.
    Henderson, D. J., Simar, L. (2005), A Fully Nonparametric Stochastic Frontier Model for Panel Data, University of New York.
    Henningsen, A., Kumbhakar, S. C. (2009), Semiparametric Stochastic Frontier Analysis: An Application to Polish Farms During Transition, paper presented at EWEPA, Pisa, Italy.
    Kumbhakar, S. C., Park, B. U., Simar, L., Tsionas, E. G. (2007), Nonparametric Stochastic Frontiers: A Local Maximum Likelihood Approach, Journal of Econometrics.
    Ma, S., Racine, J. S., Yang, L. (2011), Spline regression in the presence of categorical predictors, Working Paper.

  13. Determining spectroscopic redshifts by using k nearest neighbor regression. I. Description of method and analysis

    Science.gov (United States)

    Kügler, S. D.; Polsterer, K.; Hoecker, M.

    2015-04-01

    Context. In astronomy, new approaches to processing and analyzing the exponentially increasing amount of data are unavoidable. For spectra, such as those in the Sloan Digital Sky Survey (SDSS) spectral database, templates of well-known classes are usually used for classification. If the template fit fails, wrong spectral properties (e.g. redshift) are derived. Validating the derived properties is key to understanding the caveats of the template-based method. Aims: In this paper we present a method for statistically computing the redshift z based on a similarity approach. This allows us to determine redshifts in spectra from emission and absorption features without using any predefined model. Additionally, we show how to determine the redshift from single features. As a consequence we are, for example, able to filter objects that show multiple redshift components. Methods: The redshift calculation is performed by comparing predefined regions in the spectra and individually applying a nearest neighbor regression model to each predefined emission and absorption region. Results: The choice of the model parameters controls the quality and the completeness of the redshifts. For approximately 90% of the 16 000 analyzed spectra of our reference and test sample, a reliable redshift can be computed, a completeness comparable to that of SDSS (96%). For every individually tested feature, the redshift calculation yields a precision comparable to the overall precision of the SDSS redshifts. Using the new method, we could also identify 14 spectra with a significant shift between emission and absorption lines or between emission lines. The results already show the immense power of this simple machine-learning approach for investigating huge databases such as the SDSS.
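
    The core of k nearest neighbor regression fits in a few lines: the prediction for a query is the mean target of its k closest training points. This Python sketch is a hypothetical simplification, assuming plain Euclidean distance on small feature tuples; the paper instead compares predefined emission and absorption regions of real spectra, with a separate model per region.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Predict a target as the mean of the k nearest training targets.

    Distances are Euclidean (math.dist, Python 3.8+); feature vectors here
    are hypothetical stand-ins for spectral regions.
    """
    dists = sorted(
        (math.dist(xi, query), yi) for xi, yi in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(yi for _, yi in nearest) / len(nearest)
```

    For example, `knn_regress([(0.0,), (1.0,), (2.0,), (10.0,)], [0.0, 1.0, 2.0, 10.0], (1.1,), k=3)` averages the targets 1.0, 2.0 and 0.0 and returns 1.0. The value of k trades smoothness against sensitivity to local structure, which is one of the model parameters that controls quality and completeness.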

  14. A Bayesian Hierarchical Non-Linear Regression Model in Receiver Operating Characteristic Analysis of Clustered Continuous Diagnostic Data

    OpenAIRE

    Zou, Kelly H.; O’Malley, A. James

    2005-01-01

    Receiver operating characteristic (ROC) analysis is a useful evaluative method of diagnostic accuracy. A Bayesian hierarchical nonlinear regression model for ROC analysis was developed. A validation analysis of diagnostic accuracy was conducted using prospective multi-center clinical trial prostate cancer biopsy data collected from three participating centers. The gold standard was based on radical prostatectomy to determine local and advanced disease. To evaluate the diagnostic performance o...

  15. Partial least squares (PLS) regression and its application to coal analysis

    Scientific Electronic Library Online (English)

    Alciaturi, Carlos E.; Escobar, Marcos E.; De La Cruz, Carlos; Rincón, Carlos

    2003-12-01

    Full Text Available Instrumental methods of chemical analysis use the relationships between the signal obtained and a property of the system under study (generally a concentration). Advances in electronics and computing have made possible rapid progress in data acquisition, transmission and processing. The application of various mathematical methods to computing concentrations and other properties from instrumental data is known as chemometrics, an area of intense activity owing to its wide applications in the chemical and process industries and in environmental studies. One of the most widely used methods in chemometrics is partial least squares (PLS). This method, related to principal components regression (PCR), has theoretical and computational advantages that have led to countless applications; tens of thousands of references to linear PLS alone can be found on the Internet. This article explains the fundamentals of the method and shows an application to predicting properties of mineral coals from mid-infrared data, with the aim of developing fast, non-destructive methods of analysis for these materials.
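
    The fundamentals of single-response PLS (PLS1) can be illustrated with the textbook NIPALS algorithm: each component takes the direction of maximal covariance between the X residuals and y, computes scores and loadings, and deflates both. This pure-Python sketch is an assumption-labelled illustration, not the authors' implementation; the function names and toy data are hypothetical, and real mid-infrared spectra would have many correlated wavelength columns.

```python
def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS on mean-centred data (pure Python sketch)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(p)] for r in X]  # X residual matrix
    f = [yi - y_mean for yi in y]  # y residual vector
    comps = []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance, E' f, normalised.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain: stop early
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        pl = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt  # y loading
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * pl[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, pl, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by repeating the centring and deflation steps."""
    x_mean, y_mean, comps = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    yhat = y_mean
    for w, pl, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        yhat += q * t
        e = [ej - t * pj for ej, pj in zip(e, pl)]
    return yhat
```

    With as many components as there are (linearly independent) predictors, PLS reproduces ordinary least squares; with fewer, it acts as a regularised regression, which is why it suits collinear spectral data. Choosing the number of components by cross-validation is the usual practice.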

  16. Using the classical linear regression model in analysis of the dependences of conveyor belt life

    Directory of Open Access Journals (Sweden)

    Miriam Andrejiová

    2013-12-01

    Full Text Available The paper deals with a classical linear regression model of the dependence of conveyor belt life on selected parameters: thickness of the paint layer, width and length of the belt, conveyor speed, and quantity of transported material. The first part of the article covers the design of the regression model, point and interval estimation of its parameters, and verification of the statistical significance of the model and of its parameters. The second part deals with the identification of influential and extreme values that can affect the estimation of the regression model parameters. The third part focuses on the assumptions of the classical regression model, i.e. verification of the independence, normality and homoscedasticity of the residuals.
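
    Point estimation in the classical multiple linear regression model amounts to solving the normal equations (X'X)b = X'y. The pure-Python sketch below does this with Gauss-Jordan elimination; it is a hypothetical illustration, and the two toy predictors stand in for the paper's belt parameters. For larger or ill-conditioned problems a QR-based solver is numerically preferable, and the residual diagnostics the paper discusses (independence, normality, homoscedasticity) are not shown.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows without an intercept column; an intercept is prepended,
    so the returned coefficients are [b0, b1, ..., bk].
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented system [X'X | X'y], one row per coefficient.
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    # The system is now diagonal: divide each right-hand side by its pivot.
    return [A[i][p] / A[i][i] for i in range(p)]
```

    For example, data generated exactly from y = 1 + 2x1 + 3x2 at the points (0, 0), (1, 0), (0, 1), (1, 1), (2, 1) yield coefficients of approximately [1.0, 2.0, 3.0].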

  17. Quantile Regression Methods

    DEFF Research Database (Denmark)

    Fitzenberger, Bernd; Wilke, Ralf Andreas

    2015-01-01

    Quantile regression is emerging as a popular statistical approach that complements the estimation of conditional mean models. While the latter focuses on only one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. It can provide evidence for a statistical relationship between two variables even if the mean regression model does not. We provide a short informal introduction to the principle of quantile regression, including an illustrative application from empirical labor market research. This is followed by a brief sketch of the underlying statistical model for linear quantile regression based on a cross-section sample. We summarize various important extensions of the model, including the nonlinear quantile regression model, censored quantile regression, and quantile regression for time-series data. We also discuss a number of more recent extensions of the quantile regression model to censored data, duration data, and endogeneity, and we describe how quantile regression can be used for decomposition analysis. Finally, we identify several key issues that should be addressed by future research, and we provide an overview of quantile regression implementations in major statistics software. Our treatment of the topic is based on the perspective of applied researchers using quantile regression in their empirical work.
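
    The principle of quantile regression rests on the check (pinball) loss ρ_τ(u) = u(τ - 1[u < 0]): minimizing its sum yields a conditional τ-quantile, just as minimizing squared error yields a conditional mean. The hypothetical Python sketch below shows the intercept-only case, where the minimizer is a sample τ-quantile. A linear model with covariates would minimize the same loss over coefficient vectors, typically by linear programming, which this sketch does not attempt.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile_by_check_loss(y, tau):
    """Intercept-only quantile regression: minimize total check loss.

    The objective is piecewise linear with kinks at the observations, so a
    minimizer can always be found among the data points themselves.
    """
    return min(y, key=lambda q: sum(check_loss(yi - q, tau) for yi in y))
```

    For the values 1 to 10, τ = 0.5 returns a median (5 or 6; the loss is flat between them) and τ = 0.9 returns 9 or 10. Negative residuals are penalized with weight 1 - τ and positive ones with weight τ, which is what pushes the fit toward the upper or lower part of the distribution.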

  18. Analysis of PEM fuel cell experimental data using Principal Component Analysis and Multi linear regression

    OpenAIRE

    Placca, Latevi; Kouta, Raed; Candusso, Denis; Blachot, Jean-François; Charon, Willy

    2010-01-01

    Polarisation curves performed at the Fuel Cell System Laboratory (FC LAB) at Belfort on a PEM fuel cell stack using a homemade, fully instrumented test bench led to more than 100 time-dependent variables. Visualising and analysing all the different test variables is complex. In this work, we show how the Principal Component Analysis (PCA) method helps to explore correlations between variables and similarities between measurements at a specific sampling time (individuals). To complete this ...

  19. Depression is the main determinant of quality of life in multiple sclerosis: a classification-regression (CART) study

    OpenAIRE

    Mauro, Alessandro

    2006-01-01

    PURPOSE: Quality of life in multiple sclerosis has often been measured through the SF-36 questionnaire. In this study, validation of the SF-36 summary scores, its 'physical' component, and its 'mental' component was attempted by exploring the joint predictive power of disability (EDSS score), of anxiety and depression (HADS-A and -D scores, respectively), and of disease duration, progression type, age, gender and marital status. METHOD: The sample consisted of 75 patients suffering from multi...

  20. Prognostic factors in inoperable adenocarcinoma of the lung: A multivariate regression analysis of 259 patients

    DEFF Research Database (Denmark)

    Sørensen, Jens Benn; Badsberg, Jens Henrik

    1989-01-01

    The prognostic factors for survival in advanced adenocarcinoma of the lung were investigated in a consecutive series of 259 patients treated with chemotherapy. Twenty-eight pretreatment variables were investigated using Cox's multivariate regression model, including histological subtypes and degree of differentiation, the new international staging system for lung cancer, and seven laboratory parameters. Staging of the patients included bone marrow examination but was otherwise non-extensive, without routine bone, liver, and brain scans. Factors predicting poor survival were low performance status, stage IV disease, no prior nonradical resection, liver metastases, high white blood cell count and lactate dehydrogenase values, and low aspartate aminotransferase values. Nonradical resection may not be a prognostic factor because of the resection itself; rather, it may indicate patients with minimal disease spread. Liver metastases were of limited clinical value as a prognostic factor because they were detected in only seven cases in this patient population. A new Cox analysis ignoring the influence of this variable revealed no important variables other than those in the former Cox model (performance status, stage, surgical resection, WBC, aspartate aminotransferase, and lactate dehydrogenase). This simplified model appears to be a feasible clinical tool, allowing prognostic stratification of patients as soon as their inoperability is known.

  1. Improving precision of X-ray fluorescence analysis of lanthanide mixtures using partial least squares regression

    Science.gov (United States)

    Kirsanov, Dmitry; Panchuk, Vitaly; Goydenko, Alexander; Khaydukova, Maria; Semenov, Valentin; Legin, Andrey

    2015-11-01

    This study addresses the problem of simultaneous quantitative analysis of six lanthanides (Ce, Pr, Nd, Sm, Eu, Gd) in mixed solutions by two different X-ray fluorescence techniques: energy-dispersive (EDX) and total reflection (TXRF). The concentration of each lanthanide was varied in the range 10⁻⁶ to 10⁻³ mol/L, the low values being around the detection limit of the method. This resulted in XRF spectra with a very poor signal-to-noise ratio and overlapping bands for EDX, while only the latter problem was observed for TXRF. Numerical calibration by ordinary least squares was shown to fail to provide reasonable precision in quantifying individual lanthanides. Partial least squares (PLS) regression was able to overcome these spectral shortcomings and yielded adequate calibration models for both techniques, with RMSEP (root mean squared error of prediction) values around 10⁻⁵ mol/L. It was demonstrated that the comparatively simple and inexpensive EDX method can achieve precision similar to that of the more sophisticated TXRF when the spectra are treated by PLS.

  2. Variable Selection for Functional Logistic Regression in fMRI Data Analysis

    Directory of Open Access Journals (Sweden)

    Nedret BILLOR

    2015-03-01

    Full Text Available This study was motivated by a classification problem in functional magnetic resonance imaging (fMRI), a noninvasive imaging technique that allows an experimenter to take images of a subject's brain over time. fMRI studies usually have a small number of subjects, and because we assume that a smooth underlying curve describes the observations, the resulting datasets are extremely high-dimensional and functional in nature. High dimensionality is one of the biggest problems in the statistical analysis of fMRI data, and better classification methods are also needed. A major advantage of fMRI is its noninvasiveness: improved statistical classification methods could aid the development of noninvasive diagnostic techniques for mental illness or even degenerative diseases such as Alzheimer's. In this paper, we develop a variable selection technique based on L1 regularization (group lasso) that tackles high dimensionality and correlation in fMRI data for the functional logistic regression model, where the response is binary and represents two separate classes and the predictors are functional. We assess our method with a simulation study and an application to a real fMRI dataset.

  3. Comparison of Bayesian and Classical Analysis of Weibull Regression Model: A Simulation Study

    Directory of Open Access Journals (Sweden)

    İmran Kurt Ömürlü

    2011-01-01

    Full Text Available Objective: The purpose of this study was to compare the performance of the classical Weibull Regression Model (WRM) and the Bayesian-WRM under varying conditions using Monte Carlo simulations. Material and Methods: Data generated by our simulation algorithm were analysed with both the classical WRM and the Bayesian-WRM under varying informative priors and sample sizes. Sample sizes of n = 50, 100 and 250 were used, and informative prior values for b1 were taken from a normal prior distribution. For each situation, 1000 simulations were performed. Results: The Bayesian-WRM with a proper informative prior performed well, with very little bias. In all sample sizes, the bias of the Bayesian-WRM increased as the priors moved further from reliable values. Furthermore, with proper priors the Bayesian-WRM produced predictions with smaller standard errors than the classical WRM in both small and large samples. Conclusion: In this simulation study, the Bayesian-WRM performed better than the classical method when the analysis incorporated expert opinion and historical knowledge about the parameters. Consequently, the Bayesian-WRM should be preferred when reliable informative priors exist; otherwise, the classical WRM should be preferred.

  4. Analysis of neutral particle emission containing a fast ion tail by use of a non-linear regression

    International Nuclear Information System (INIS)

    We present a program for the analysis of neutral particle emission detected by a single-channel analyzer, which may easily be modified to handle data from a multichannel analyzer. In particular, the program uses nonlinear regression to fit the data and therefore correctly handles cases where the Maxwellian velocity distribution function is distorted by a high-energy ion population.

  5. Identification of Sexually Abused Female Adolescents at Risk for Suicidal Ideations: A Classification and Regression Tree Analysis

    Science.gov (United States)

    Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois

    2013-01-01

    This study explored the clinical profiles of 77 female adolescent survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…

  6. Multivariate calibration of spectral data using dual-domain regression analysis

    International Nuclear Information System (INIS)

    To date, few efforts have been made to take simultaneous advantage of the local nature of spectral data in both the time and frequency domains in a single regression model. We describe here the use of a novel chemometrics algorithm using the wavelet transform. We call the algorithm dual-domain regression, as the regression step defines a weighted model in the time domain based on the contributions of parallel, frequency-domain models made from wavelet coefficients reflecting different scales. In principle, any regression method can be used, and implementations of the algorithm using partial least squares regression and principal component regression are reported here. The performance of the models produced by the algorithm is generally superior to that of regular partial least squares (PLS) or principal component regression (PCR) models applied to data restricted to a single domain. Dual-domain PLS and PCR algorithms are applied to near-infrared (NIR) spectral datasets of Cargill corn samples and to sets of spectra collected on batch chemical reactions run in different reactors, to illustrate the improved robustness of the modeling.

  7. Application of Bootstrap Sample-Resample Method in Logistic Regression in Analysis of Breast Cancer Data

    Directory of Open Access Journals (Sweden)

    H Zeraati

    2006-05-01

    Full Text Available Background and Aim: The purpose of this study was to assess the accuracy of the bootstrap method in logistic regression and to explore its use in logistic regression models when the sample size is insufficient. Materials and Methods: We used data from 150 patients who had undergone surgery at the Cancer Institute, Emam Khomeini Hospital, from 1999 to 2001, and then drew repeated samples of size 50 from these 150 patients. Results: An appropriate model was fitted to the initial data using ordinary logistic regression, and confidence intervals and standard errors were computed for all regression coefficients. In many situations the sample size is insufficient and the conditions for using ordinary logistic regression are not met. In these cases the bootstrap method not only produces more accurate estimates of the regression coefficients but, with repeated sampling, produces estimates very close to the true values. This holds for the regression coefficients, their confidence intervals and their standard errors. Conclusion: In this study we show the optimal number of replications and the optimal sample size when using the bootstrap method in studies involving relatively small sample sizes.
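
    A percentile bootstrap for a logistic regression coefficient can be sketched without a model-fitting library in one special case: with a single binary predictor, the maximum-likelihood logistic slope equals the sample log odds ratio. The Python sketch below relies on that fact and is a hypothetical illustration, not the study's procedure; it resamples n out of n patients (the study drew samples of size 50 from 150), and it adds 0.5 to every cell (an assumption, the Haldane-Anscombe correction) so that resamples with an empty cell stay finite, which slightly shrinks the estimate.

```python
import math
import random


def log_odds_ratio(x, y, correction=0.5):
    """Log odds ratio of binary y between the x == 1 and x == 0 groups.

    With correction=0 and no empty cells this equals the logistic-regression
    slope MLE for one binary predictor; 0.5 per cell keeps resamples finite.
    """
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1) + correction
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0) + correction
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1) + correction
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0) + correction
    return math.log((a * d) / (b * c))


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=1):
    """Bootstrap standard error and percentile interval for the logistic slope."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        stats.append(log_odds_ratio([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    mean = sum(stats) / n_boot
    se = math.sqrt(sum((s - mean) ** 2 for s in stats) / (n_boot - 1))
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return se, (lo, hi)
```

    With several or continuous covariates, each resample would instead be refitted with a full logistic regression routine; the resampling and percentile logic stay the same.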

  8. A covariance regression model

    OpenAIRE

    Hoff, Peter D.; Niu, Xiaoyue

    2011-01-01

    Classical regression analysis relates the expectation of a response variable to a linear combination of explanatory variables. In this article, we propose a covariance regression model that parameterizes the covariance matrix of a multivariate response vector as a parsimonious quadratic function of explanatory variables. The approach is analogous to the mean regression model, and is similar to a factor analysis model in which the factor loadings depend on the explanatory var...

  9. Duloxetine compared with fluoxetine and venlafaxine: use of meta-regression analysis for indirect comparisons

    Directory of Open Access Journals (Sweden)

    Lançon Christophe

    2006-07-01

    Full Text Available Abstract Background Data comparing duloxetine with existing antidepressant treatments are limited. A comparison of duloxetine with fluoxetine has been performed, but no comparison has been undertaken with venlafaxine, the other antidepressant in the same therapeutic class with a significant market share. In the absence of relevant data to assess the place that duloxetine should occupy in the therapeutic arsenal, indirect comparisons are the most rigorous approach. We conducted a systematic review of the efficacy of duloxetine, fluoxetine and venlafaxine versus placebo in the treatment of Major Depressive Disorder (MDD), and performed indirect comparisons through meta-regressions. Methods The bibliography of the Agency for Health Care Policy and Research and the CENTRAL, Medline, and Embase databases were searched using advanced search strategies based on a combination of text and index terms. The search focused on randomized placebo-controlled clinical trials involving adult patients treated for acute-phase Major Depressive Disorder. All outcomes were derived so as to take account of varying placebo responses across studies. The primary outcome was treatment efficacy as measured by the Hedges' g effect size. Secondary outcomes were response and dropout rates as measured by log odds ratios. Meta-regressions were run to compare the drugs indirectly. Sensitivity analyses, assessing the influence of individual studies on the results and the influence of patients' characteristics, were run. Results 22 studies involving fluoxetine, 9 involving duloxetine and 8 involving venlafaxine were selected. Using indirect comparison methodology, estimated effect sizes for efficacy compared with duloxetine were 0.11 [-0.14;0.36] for fluoxetine and 0.22 [0.06;0.38] for venlafaxine. Response log odds ratios were -0.21 [-0.44;0.03] and 0.70 [0.26;1.14], respectively. Dropout log odds ratios were -0.02 [-0.33;0.29] and 0.21 [-0.13;0.55], respectively. 
Sensitivity analyses showed that the results were consistent. Conclusion Fluoxetine was not statistically different from duloxetine in either tolerability or efficacy. Venlafaxine was significantly superior to duloxetine in all analyses except dropout rate. In the absence of relevant data from head-to-head comparison trials, the results suggest that venlafaxine is superior to duloxetine and that duloxetine does not differ from fluoxetine.
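
    The two effect measures named above can be computed per trial as sketched below. This assumes the usual textbook definitions (pooled-SD standardized mean difference with Hedges' small-sample correction J, and the Woolf variance for a log odds ratio); the variance formula for g is one common large-sample approximation, and the sign convention (positive = drug better than placebo) is an assumption. These are the per-study inputs a meta-regression would then model.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with Hedges' small-sample correction J."""
    sp = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sp                     # Cohen's d with pooled SD
    j = 1.0 - 3.0 / (4.0 * (n_t + n_c) - 9.0)       # small-sample correction factor
    g = j * d
    var_g = (n_t + n_c) / (n_t * n_c) + g ** 2 / (2.0 * (n_t + n_c))
    return g, var_g


def log_odds_ratio(a, b, c, d):
    """2x2 table: a/b = responders/non-responders on drug, c/d = on placebo.
    Returns (log OR, Woolf variance)."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d
```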

  10. Analysis of possible application of regression analysis method in automation of fitting numerical data

    International Nuclear Information System (INIS)

    The report presents an analysis of a possible calculation procedure for automated data fitting. The problem is defined in the first part, where it is shown that its solution requires optimisation under invariance conditions (stochastic errors), which is part of the theory of planning optimal experiments. A rough review of knowledge in this field is given. In the second part of the report, some statistical and optimisation methods are analysed in more detail with a view to their use in automated fitting. An evaluation of a possibly relevant calculation procedure is presented.

  11. Regression Analysis of Effective Factor on People Participation in Protecting, Revitalizing, Developing and Using Renewable Natural Resources in Ilam Province from the View of Users

    Directory of Open Access Journals (Sweden)

    Bagher Arayesh

    2010-01-01

    Full Text Available Problem statement: The purpose of this study was a regression analysis of the factors affecting people's participation in protecting, revitalizing, developing and using renewable natural resources in Ilam province. Approach: This was a causal-comparative, applied study. The sample was taken from natural resources users. Results: The sample size was 317 users. Stratified, cluster and multiple sampling were used for sample selection. The main data-gathering tool was a questionnaire, whose validity and reliability were established through expert review and a pilot study; its Cronbach's alpha was 0.88. Descriptive and inferential statistics were used, and the data were analyzed with SPSS 15. Correlation and multiple regression were employed to test the hypotheses. Conclusion: The results indicated that level of education, rate of media use, users' trust in natural resources executives, consulting with users before implementing plans, number of cattle, kind of occupation, users' membership in public institutions and organizations, social status of users, technical knowledge of users, the present status of natural resources extension plans, political and lawful support of users, amount of loan received by users, and organizing nature assistants play a significant role in people's participation in protecting, revitalizing, developing and using renewable natural resources.
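
    The questionnaire reliability reported above is Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total score). A minimal sketch using sample variances; the data layout (one list of scores per item, respondents in the same order) is an assumption for illustration.

```python
import statistics


def cronbach_alpha(items):
    """items[i][s] = score of respondent s on item i (k items, same respondents)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    item_var = sum(statistics.variance(scores) for scores in items)
    return k / (k - 1) * (1.0 - item_var / statistics.variance(totals))
```

    Three identical items give alpha = 1, the upper bound for perfectly consistent items; less consistent items push alpha toward 0.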

  12. Variables that influence HIV-1 cerebrospinal fluid viral load in cryptococcal meningitis: a linear regression analysis

    Directory of Open Access Journals (Sweden)

    Cecchini Diego M

    2009-11-01

    Full Text Available Abstract Background The central nervous system is considered a sanctuary site for HIV-1 replication. Variables associated with HIV cerebrospinal fluid (CSF) viral load in the context of opportunistic CNS infections are poorly understood. Our objective was to evaluate the relation between: (1) CSF HIV-1 viral load and CSF cytological and biochemical characteristics (leukocyte count, protein concentration, cryptococcal antigen titer); (2) CSF HIV-1 viral load and HIV-1 plasma viral load; and (3) CSF leukocyte count and the peripheral blood CD4+ T lymphocyte count. Methods We prospectively collected and analyzed pre-treatment, paired CSF and plasma samples from antiretroviral-naive HIV-positive patients with cryptococcal meningitis treated at the Francisco J Muñiz Hospital, Buenos Aires, Argentina (period: 2004 to 2006). We measured HIV CSF and plasma levels by polymerase chain reaction using the Cobas Amplicor HIV-1 Monitor Test version 1.5 (Roche). Data were processed with Statistix 7.0 software (linear regression analysis). Results Samples from 34 patients were analyzed. CSF leukocyte count showed a statistically significant correlation with CSF HIV-1 viral load (r = 0.4, 95% CI = 0.13-0.63, p = 0.01). No correlation was found with plasma viral load, CSF protein concentration or cryptococcal antigen titer. A positive correlation was found between peripheral blood CD4+ T lymphocyte count and CSF leukocyte count (r = 0.44, 95% CI = 0.125-0.674, p = 0.0123). Conclusion Our study suggests that CSF leukocyte count influences CSF HIV-1 viral load in patients with meningitis caused by Cryptococcus neoformans.
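
    Correlations with 95% confidence intervals, as reported above, are often computed with Fisher's z-transformation, z = atanh(r) with standard error 1/sqrt(n - 3). The abstract does not say which interval method Statistix used, so this is a sketch of one standard approach, not a reproduction of the reported numbers.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959963984540054):
    """Approximate 95% CI for a correlation: back-transform atanh(r) +/- z*SE."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

    For r = 0.4 and n = 34 this gives roughly 0.07 to 0.65; a published interval can differ if r was rounded or another method was used.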

  13. Changes of platelet GMP-140 in diabetic nephropathy and its multi-factor regression analysis

    International Nuclear Information System (INIS)

    The relation of platelet GMP-140 and its related factors to diabetic nephropathy was studied. The study included 144 patients with diabetes mellitus without nephropathy (group without DN, mean disease duration 25.5 ± 18.6 months), 80 patients with diabetic nephropathy (group DN, mean disease duration 58.7 ± 31.6 months) and 50 normal controls. Platelet GMP-140, plasma α1-MG and β2-MG, and 24-hour urine albumin (ALB), IgG, α1-MG and β2-MG were measured by RIA, HbA1c by chromatographic separation, and FBG, PBG, Ch, TG, HDL and FG by biochemical methods. All data were processed by computer with t-tests and linear regression, and multi-factor analysis was also performed. The levels of platelet GMP-140, FG, DBP, TG, HbA1c and PBG in group DN were significantly higher than those of the group without DN and the normal controls (P 0.05), while they were higher than those of normal controls. Multi-factor analysis of platelet GMP-140 with TG, DBP and HbA1c was performed in 80 patients with DN (P 1C are the independent factors enhancing the activation of platelets. The disturbance of lipid metabolism in type II diabetes mellitus may also enhance the activation of platelets. Elevation of blood pressure may accelerate the initiation and deterioration of DN, in which change of platelet GMP-140 is an independent factor. Elevation of HbA1c and blood glucose is closely related to diabetic nephropathy.

  14. The non-condition logistic regression analysis of the reason of hypothyroidism after hyperthyroidism with 131I treatment

    International Nuclear Information System (INIS)

    There are many opinions on the causes of hypothyroidism after 131I treatment of hyperthyroidism, but few scientific analyses and reports on the question. Unconditional (non-conditional) logistic regression solved this problem successfully and offers higher scientific value and confidence in risk factor analysis. Data from 748 follow-up patients were analysed by unconditional logistic regression. The results showed that the half-life and the 131I dose were the main causes of the incidence of hypothyroidism. The degree of confidence is 92.4%.
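
    In an unconditional logistic model, each fitted coefficient translates into an odds ratio, exp(beta), per one-unit increase of its covariate (for example, a unit of 131I dose). A minimal sketch with a Wald 95% interval; the coefficient and standard-error values in any call are hypothetical, not results from this study.

```python
import math


def odds_ratio_ci(beta, se, z_crit=1.959963984540054):
    """Odds ratio exp(beta) and Wald 95% CI from a logistic-regression coefficient."""
    return math.exp(beta), math.exp(beta - z_crit * se), math.exp(beta + z_crit * se)


def predicted_risk(intercept, coefs, covariates):
    """P(hypothyroidism) = 1 / (1 + exp(-(intercept + sum(b_k * x_k))))."""
    eta = intercept + sum(b * x for b, x in zip(coefs, covariates))
    return 1.0 / (1.0 + math.exp(-eta))
```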

  15. Handling Missing Values with Regularized Iterative Multiple Correspondence Analysis

    OpenAIRE

    Josse, Julie; Chavent, Marie; Liquet, Benoit; Husson, François

    2012-01-01

    A common approach to deal with missing values in multivariate exploratory data analysis consists in minimizing the loss function over all non-missing elements. This can be achieved by EM-type algorithms where an iterative imputation of the missing values is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative multiple correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). This algorithm, based ...

  16. Is the Sexual Behaviour of HIV Patients on Antiretroviral therapy safe or risky in Sub-Saharan Africa? Meta-Analysis and Meta-Regression

    Directory of Open Access Journals (Sweden)

    Berhan Asres

    2012-05-01

    Full Text Available Abstract Background Reports on the sexual behavior of people on antiretroviral therapy (ART) are inconsistent. We selected 14 articles that compared the sexual behavior of people with and without ART for this analysis. Methods We included both cross-sectional studies that compared different ART-naïve and ART-experienced participants and longitudinal studies examining the behavior of the same individuals before and after starting ART. Meta-analyses were performed both stratified by type of study and combined. Outcome variables assessed for association with ART experience were any sexual activity, unprotected sex and having multiple sexual partners. Random-effects models were applied to determine the overall odds ratios. Sub-group analyses and meta-regression analyses were performed to examine sources of heterogeneity among the studies. Sensitivity analysis was also conducted to evaluate the stability of the overall odds ratio in the presence of outliers. Results The meta-analysis failed to show a statistically significant association of any sexual activity with ART experience. It did, however, show an overall statistically significant reduction in any unprotected sex, in having multiple sexual partners and in unprotected sex with partners who were HIV negative or of unknown HIV status with ART experience. Meta-regression showed no interaction between duration of ART use or recall period of sexual behavior and the sexual activity variables. However, there was an association between the percentage of married or cohabiting participants included in a study and reductions in the practice of unprotected sex with ART. Conclusion In general, this meta-analysis demonstrated a significant reduction in risky sexual behavior among people on ART in sub-Saharan Africa. Future studies should investigate the reproducibility and continuity of the observed positive behavioural changes as ART use extends to a decade or more.
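
    The abstract states that random-effects models were used for the overall odds ratios but does not name the estimator. The sketch below assumes the common DerSimonian-Laird method-of-moments estimator for the between-study variance tau^2, applied to per-study log odds ratios; exp(pooled) is then the overall odds ratio.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling of study effects (e.g. log odds ratios).

    Returns (pooled effect, its standard error, tau^2, Cochran's Q)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw      # fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    k = len(effects)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0        # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]               # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2, q
```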

  17. Trends in Multiple Criteria Decision Analysis

    CERN Document Server

    Ehrgott, Matthias; Greco, Salvatore

    2010-01-01

    Multiple Criteria Decision Making (MCDM) is the study of methods and procedures by which concerns about multiple conflicting criteria can be formally incorporated into the management planning process. A key area of research in OR/MS, MCDM is now being applied in many new areas, including GIS systems, AI, and group decision making. This volume is in effect the third in a series of Springer books by these editors (all in the ISOR series), and it brings all the latest developments in MCDM into focus. Looking at developments in the applications, methodologies and foundations of MCDM, it presents r

  18. Application of ordinal logistic regression analysis in determining risk factors of child malnutrition in Bangladesh

    OpenAIRE

    Das Sumonkanti; Rahman Rajwanur M

    2011-01-01

    Abstract Background The study attempts to develop an ordinal logistic regression (OLR) model, instead of the traditional binary logistic regression (BLR) model, to identify the determinants of child malnutrition, using data from the Bangladesh Demographic and Health Survey 2004. Methods Based on the weight-for-age anthropometric index (Z-score), child nutrition status is categorized into three groups: severely undernourished (< -3.0), moderately undernourished (-3.0 to -2.01) and nourished (≥ -2.0...
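
    The three ordered groups and an ordinal logistic model over them can be sketched as follows. This assumes the proportional-odds (cumulative logit) form P(Y <= j | x) = sigmoid(theta_j - eta) with eta = x . beta, ordered from severely undernourished to nourished; the sign convention differs between software packages, and the cut-points in any call are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nutrition_category(z_score):
    """Weight-for-age Z-score groups as described in the abstract."""
    if z_score < -3.0:
        return "severely undernourished"
    if z_score < -2.0:
        return "moderately undernourished"
    return "nourished"


def ordinal_probs(cutpoints, eta):
    """Category probabilities from cumulative logits theta_1 < theta_2 < ..."""
    cum = [sigmoid(theta - eta) for theta in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]
```

    A single slope vector beta shifts all cumulative logits together, which is what lets one OLR model replace several binary models.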

  19. Pascal panretinal laser ablation and regression analysis in proliferative diabetic retinopathy: Manchester Pascal Study Report 4

    OpenAIRE

    Muqit, MM; Marcellino, GR; Henson, DB; Young, LB; Turner, GS; Stanga, PE

    2011-01-01

    AIMS: To quantify the 20-ms Pattern Scan Laser (Pascal) panretinal laser photocoagulation (PRP) ablation dosage required for regression of proliferative diabetic retinopathy (PDR), and to explore factors related to long-term regression. METHODS: We retrospectively studied a cohort of patients who participated in a randomised clinical trial, the Manchester Pascal Study. In all, 36 eyes of 22 patients were investigated over a follow-up period of 18 months. Primary outcome measures included visu...

  20. Adulteration of Argentinean milk fats with animal fats: Detection by fatty acids analysis and multivariate regression techniques.

    Science.gov (United States)

    Rebechi, S R; Vélez, M A; Vaira, S; Perotti, M C

    2016-02-01

    The aims of the present study were to test the accuracy of the fatty acid ratios established by Argentinean legislation for detecting adulteration of milk fat with animal fats, and to propose a regression model suitable for evaluating such adulteration. For this purpose, 70 milk fat, 10 tallow and 7 lard samples were collected and analyzed by gas chromatography. The data were used to simulate arithmetically adulterated milk fat samples at 0%, 2%, 5%, 10% and 15% for both animal fats. The fatty acid ratios failed to distinguish adulterated milk fats containing less than 15% tallow or lard. For each adulterant, Multiple Linear Regression (MLR) was applied, and a model was chosen and validated. For this, calibration and validation matrices were constructed from genuine and adulterated milk fat samples. The models were able to detect adulteration of milk fat at levels greater than 10% for tallow and 5% for lard. PMID:26304443
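
    The "arithmetic" simulation of adulterated samples can be sketched as a weighted average of two fatty-acid profiles, assuming mixing by mass so that each fatty acid's share is (1 - level) * milk + level * adulterant. The profile values below are placeholders for illustration only, not measured values from the study, and the C18:0/C14:0 ratio is a hypothetical example rather than a criterion from the legislation.

```python
def adulterate(milk, adulterant, level):
    """Arithmetic mixture of two fatty-acid profiles (g/100 g fatty acids)."""
    return {fa: (1.0 - level) * milk[fa] + level * adulterant[fa] for fa in milk}


# Placeholder subset of profiles, NOT data from the study.
milk_fat = {"C14:0": 11.0, "C16:0": 30.0, "C18:0": 10.0, "C18:1": 22.0}
tallow = {"C14:0": 3.0, "C16:0": 25.0, "C18:0": 20.0, "C18:1": 40.0}

for level in (0.0, 0.02, 0.05, 0.10, 0.15):      # adulteration levels from the abstract
    mix = adulterate(milk_fat, tallow, level)
    ratio = mix["C18:0"] / mix["C14:0"]           # hypothetical ratio for illustration
    print(f"{level:.0%} tallow: C18:0/C14:0 = {ratio:.3f}")
```

    Simulated profiles like these can then form the rows of a calibration matrix for an MLR model that predicts the adulteration level.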

  1. Contrastive analysis of multiple exciton generation theories

    Science.gov (United States)

    Tan, Hengyu; Chang, Qing

    2015-10-01

    Multiple exciton generation (MEG) is an effect in which semiconductor nanocrystal (NC) quantum dots (QDs) generate multiple excitons (electron-hole pairs) by absorbing a single high-energy photon. It can convert the photon energy in excess of the bandgap (Eg) into new excitons instead of losing it as heat, and so improve the photovoltaic performance of solar cells. However, there is no unified theory of MEG. The main MEG theories can be divided into three types. The first is impact ionization, which explains MEG in the conventional way: a photogenerated exciton becomes multiple excitons through Coulomb interactions between carriers. The second is coherent superposition of excitonic states: multiple excitons are generated by the coherent superposition of a single photogenerated exciton state with enough excess momentum and a two-exciton state with the same momentum. The third is excitation via virtual excitonic states: the nanocrystal vacuum generates a virtual biexciton through Coulomb coupling between two valence band electrons, and the virtual biexciton, by absorbing a photon through an intraband optical transition, is converted into a real biexciton. This paper describes the influence of MEG on solar photoelectric conversion efficiency and summarizes and analyzes the fundamentals of the different MEG theories, their merits and demerits, the experimental measurement of MEG, and methods for calculating generation efficiency.

  2. Extremal quantile regression

    CERN Document Server

    Chernozhukov, V

    2005-01-01

    Quantile regression is an important tool for estimation of conditional quantiles of a response Y given a vector of covariates X. It can be used to measure the effect of covariates not only in the center of a distribution, but also in the upper and lower tails. This paper develops a theory of quantile regression in the tails. Specifically, it obtains the large sample properties of extremal (extreme order and intermediate order) quantile regression estimators for the linear quantile regression model with the tails restricted to the domain of minimum attraction and closed under tail equivalence across regressor values. This modeling setup combines restrictions of extreme value theory with leading homoscedastic and heteroscedastic linear specifications of regression analysis. In large samples, extreme order regression quantiles converge weakly to argmin functionals of stochastic integrals of Poisson processes that depend on regressors, while intermediate regression quantiles and their functionals converge to nor...
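
    Quantile regression estimates minimize the check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}). The intercept-only case below shows that minimizing the summed check loss recovers a sample tau-quantile; the linear model replaces q with x^T beta and is usually solved as a linear program, which this sketch does not attempt. As tau approaches 0 or 1 the minimizer is pinned to a few extreme observations, which is why the extremal case needs the extreme-value asymptotics described above.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(ys, tau):
    """Sample tau-quantile as the minimizer of sum rho_tau(y - q); a minimizer
    always exists among the observations, so searching them suffices."""
    return min(sorted(ys), key=lambda q: sum(check_loss(y - q, tau) for y in ys))
```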

  3. Introduction to regression graphics

    CERN Document Server

    Cook, R Dennis

    2009-01-01

    Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques such as plot rotation. The authors have written their own regression code, called R-code, in the Xlisp-Stat language; it is a nearly complete system for linear regression analysis and can be used as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava

  4. A flexible mixed-effect negative binomial regression model for detecting unusual increases in MRI lesion counts in individual multiple sclerosis patients.

    Science.gov (United States)

    Kondo, Yumi; Zhao, Yinshan; Petkau, John

    2015-06-15

    We develop a new modeling approach to enhance a recently proposed method to detect increases of contrast-enhancing lesions (CELs) on repeated magnetic resonance imaging, which have been used as an indicator for potential adverse events in multiple sclerosis clinical trials. The method signals patients with unusual increases in CEL activity by estimating the probability of observing CEL counts as large as those observed on a patient's recent scans conditional on the patient's CEL counts on previous scans. This conditional probability index (CPI), computed based on a mixed-effect negative binomial regression model, can vary substantially depending on the choice of distribution for the patient-specific random effects. Therefore, we relax this parametric assumption to model the random effects with an infinite mixture of beta distributions, using the Dirichlet process, which effectively allows any form of distribution. To our knowledge, no previous literature considers a mixed-effect regression for longitudinal count variables where the random effect is modeled with a Dirichlet process mixture. As our inference is in the Bayesian framework, we adopt a meta-analytic approach to develop an informative prior based on previous clinical trials. This is particularly helpful at the early stages of trials when less data are available. Our enhanced method is illustrated with CEL data from 10 previous multiple sclerosis clinical trials. Our simulation study shows that our procedure estimates the CPI more accurately than parametric alternatives when the patient-specific random effect distribution is misspecified and that an informative prior improves the accuracy of the CPI estimates. PMID:25784219
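
    The core quantity behind the conditional probability index is the probability of seeing a lesion count at least as large as the one observed. A simplified sketch, assuming a negative binomial with fixed mean `mu` > 0 and dispersion `size` (variance mu + mu^2/size): the paper's CPI additionally conditions on previous scans through a patient-specific random effect modeled with a Dirichlet process mixture, which is not reproduced here.

```python
import math


def nb_pmf(y, mu, size):
    """Negative binomial P(Y = y) with mean mu and dispersion size."""
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(size / (size + mu)) + y * math.log(mu / (size + mu)))
    return math.exp(log_p)


def nb_upper_tail(y_obs, mu, size):
    """P(Y >= y_obs): small values flag an unusually large count."""
    below = sum(nb_pmf(y, mu, size) for y in range(y_obs))
    return max(0.0, 1.0 - below)
```

    As `size` grows the distribution approaches a Poisson with mean `mu`; small `size` values give the extra-Poisson variability typical of CEL counts.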

  5. Window Regression: A Spatial-Temporal Analysis to Estimate Pixels Classified as Low-Quality in MODIS NDVI Time Series

    Directory of Open Access Journals (Sweden)

    Julio Cesar de Oliveira

    2014-04-01

    Full Text Available MODerate resolution Imaging Spectroradiometer (MODIS) data are widely used in multitemporal analysis of various Earth-related phenomena, such as vegetation phenology, land use/land cover change, deforestation monitoring, and time series analysis. In general, the MODIS products used for multitemporal analysis are composite mosaics of the best pixels over a certain period of time. However, bad pixels that affect the time series analysis are common in these composites. We present a filtering methodology that considers the pixel's position (location in space) and time (position in the temporal data series) to define a new value for the bad pixel. This methodology, called Window Regression (WR), estimates the value of the point of interest from a regression analysis of the data selected by a spatial-temporal window. The spatial window consists of the eight pixels neighboring the pixel under evaluation, and the temporal window selects a set of dates close to the date of interest (either earlier or later). Noise of varying intensity was simulated over time and space using the MOD13Q1 product. The method presented and other techniques (4253H twice, Mean Value Iteration (MVI) and Savitzky–Golay) were evaluated using the Mean Absolute Percentage Error (MAPE) and the Akaike Information Criterion (AIC). The tests revealed the consistently superior performance of the Window Regression approach in estimating new Normalized Difference Vegetation Index (NDVI) values, irrespective of the intensity of the simulated noise.
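
    One simplified reading of Window Regression, sketched under assumptions: over the good dates inside the temporal window, regress the center pixel on the mean of its eight neighbors, then predict the bad value from the neighbor mean on the bad date. The published method may choose windows, predictors and weights differently; `half_window`, the data layout and the synthetic series are all hypothetical. MAPE is included because the abstract uses it for evaluation.

```python
def neighbor_mean(neighbors, t):
    """Mean of the 8 neighboring pixel series at date t."""
    return sum(series[t] for series in neighbors) / len(neighbors)


def window_regression_estimate(center, neighbors, bad, t, half_window=3):
    """Replace a low-quality value center[t] using a least-squares line of the
    center pixel on its 8-neighbor mean over nearby good dates."""
    xs, ys = [], []
    for s in range(max(0, t - half_window), min(len(center), t + half_window + 1)):
        if s != t and not bad[s]:
            xs.append(neighbor_mean(neighbors, s))
            ys.append(center[s])
    if len(xs) < 2:
        raise ValueError("not enough good dates in the temporal window")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return my                     # flat neighbors: fall back to the window mean
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my + slope * (neighbor_mean(neighbors, t) - mx)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic NDVI-like series (illustrative): center tracks its neighbors linearly.
nm = [0.3 + 0.02 * t for t in range(10)]
neighbors = [nm] * 8
center = [0.05 + 0.9 * v for v in nm]
truth = center[5]
center[5] = -0.2                      # simulated low-quality (e.g. cloudy) value
bad = [t == 5 for t in range(10)]
estimate = window_regression_estimate(center, neighbors, bad, 5)
```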

  6. Approaches to data analysis of multiple-choice questions

    OpenAIRE

    Lin Ding; Robert Beichner

    2009-01-01

    This paper introduces five commonly used approaches to analyzing multiple-choice test data. They are classical test theory, factor analysis, cluster analysis, item response theory, and model analysis. Brief descriptions of the goals and algorithms of these approaches are provided, together with examples illustrating their applications in physics education research. We minimize mathematics, instead placing emphasis on data interpretation using these approaches.
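
    Of the five approaches, classical test theory is the simplest to compute. A minimal sketch of two standard item statistics, assuming 0/1 scoring: item difficulty (proportion correct) and discrimination as the corrected item-total (point-biserial) correlation, where the item itself is removed from the total. The response matrix in the usage is hypothetical.

```python
import math


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")           # undefined when a score vector is constant
    return sxy / math.sqrt(sxx * syy)


def item_statistics(responses):
    """responses[s][i] = 1 if student s answered item i correctly, else 0.
    Returns (difficulty, discrimination) for each item."""
    totals = [sum(row) for row in responses]
    stats = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        rest = [t - x for t, x in zip(totals, item)]   # total score without this item
        stats.append((sum(item) / len(item), pearson(item, rest)))
    return stats


res = item_statistics([[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]])
```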

  7. Subchannel analysis of multiple CHF events

    International Nuclear Information System (INIS)

    The phenomenon of multiple CHF events in rod bundle heat transfer tests, referring to the occurrence of CHF on more than one rod or at more than one location on one rod is examined. The adequacy of some of the subchannel CHF correlations presently used in the nuclear industry in predicting higher order CHF events is ascertained based on local coolant conditions obtained with the COBRA IIIC subchannel code. The rod bundle CHF data obtained at the Heat Transfer Research Facility of Columbia University are examined for multiple CHF events using a combination of statistical analyses and parametric studies. The above analyses are applied to the study of three data sets of tests simulating both PWR and BWR reactor cores with uniform and non-uniform axial heat flux distributions. The CHF correlations employed in this study include: (1) CE-1 correlation, (2) B and W-2 correlation, (3) W-3 correlation, and (4) Columbia correlation

  8. Transferencia regional de información hidrológica mediante regresión lineal múltiple de tipo ridge / Regional transference of hydrologic information through multiple linear regression of ridge type

    Scientific Electronic Library Online (English)

    Daniel F., Campos-Aranda.

    2013-08-01

    Full Text Available When long annual records of runoff, rainfall or floods from a region with similar hydrological response are used to extend a short series through the statistical technique of multiple linear regression (MLR), those records, because of their intrinsic similarity, are likely to give rise to a problem of multicollinearity. This problem should be detected and quantified to determine whether it is acceptable, moderate, strong or serious, and to look for alternatives to fitting the MLR by least squares of the residuals. 
In this study, multicollinearity was diagnosed through variance inflation factors and condition indices based on the eigenvalues. In addition, a biased fitting of the MLR, known as Ridge regression, is presented as an alternative method. A numerical application to the Tempoal river system, in Hydrological Region No. 26 (Pánuco, México), is described, in which the short record of annual runoff volumes at the Platón Sánchez hydrometric station was completed on the basis of the four other gauging stations that have long records. It is concluded that the principal advantages of Ridge regression are the ease of handling transfers with six or more regressors and the simplicity of its implementation and development by means of the Ridge trace.
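
    Both diagnostics in this abstract can be computed from the predictor correlation matrix R. A minimal sketch, assuming the standardized (correlation-form) formulation: VIF_j is the j-th diagonal element of R^-1, and ridge coefficients are (R + kI)^-1 r_xy for a ridge constant k, so printing them over a grid of k values gives a simple ridge trace. Condition indices (from the eigenvalues of R) are not computed here. The data are hypothetical, not the Tempoal records.

```python
import math


def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for j in range(c, n + 1):
                    M[r][j] -= f * M[c][j]
    return [M[i][n] / M[i][i] for i in range(n)]


def corr_matrix(cols):
    return [[corr(ci, cj) for cj in cols] for ci in cols]


def vifs(cols):
    """VIF_j = j-th diagonal element of the inverse predictor correlation matrix."""
    R = corr_matrix(cols)
    p = len(cols)
    return [solve(R, [1.0 if i == j else 0.0 for i in range(p)])[j] for j in range(p)]


def ridge_standardized(cols, y, k):
    """Ridge coefficients on standardized variables: (R + k I)^-1 r_xy."""
    R = corr_matrix(cols)
    p = len(cols)
    A = [[R[i][j] + (k if i == j else 0.0) for j in range(p)] for i in range(p)]
    return solve(A, [corr(c, y) for c in cols])


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]                 # corr(x1, x2) = 0.8, so each VIF = 1 / 0.36
y = [3.1, 3.9, 7.2, 7.8, 10.1]
for k in (0.0, 0.1, 0.5, 1.0):       # a small ridge trace
    print(k, ridge_standardized([x1, x2], y, k))
```

    With k = 0 the ridge solution equals ordinary least squares on standardized data; the trace is read for the smallest k at which the coefficients stabilize.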

  9. Significant drivers of the virtual water trade evaluated with a multivariate regression analysis

    Science.gov (United States)

    Tamea, Stefania; Laio, Francesco; Ridolfi, Luca

    2014-05-01

    International trade of food is vital for the food security of many countries, which rely on trade to compensate for agricultural production insufficient to feed the population. At the same time, food trade has implications for the distribution and use of water resources, because through the international trade of food commodities, countries virtually displace the water used for food production, known as "virtual water". Trade thus implies a network of virtual water fluxes from exporting to importing countries, which has been estimated to displace more than 2 billion m3 of water per year, or about 2% of the annual global precipitation over land. It is thus important to adequately identify the dynamics and the controlling factors of the virtual water trade, since it supports and enables world food security. Using the FAOSTAT database of international trade and the virtual water content available from the Water Footprint Network, we reconstructed 25 years (1986-2010) of virtual water fluxes. We then analyzed the dependence of exchanged fluxes on a set of major relevant factors, which includes population, gross domestic product, arable land, virtual water embedded in agricultural production and dietary consumption, and geographical distance between countries. Significant drivers were identified by means of a multivariate regression analysis, applied separately to the export and import fluxes of each country; temporal trends are outlined and the relative importance of drivers is assessed by a commonality analysis. Results indicate that population, gross domestic product and geographical distance are the major drivers of virtual water fluxes, with a minor (but non-negligible) contribution from the agricultural production of exporting countries. 
Such drivers have become relevant for an increasing number of countries over the years, with an increasing share of variance explained by the distance between countries and a decreasing role of gross domestic product. The worldwide adjusted coefficient of determination of the fitted gravity-law model is 0.57 (in 2010), and it has increased over time, confirming the good descriptive capability of the selected drivers for the virtual water trade.
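
    A gravity-law model of trade flows is commonly fitted in log-linear form. The sketch below assumes log F = b0 + b1 log(population) + b2 log(GDP) + b3 log(distance) + error and fits it by ordinary least squares on synthetic country pairs with known coefficients, then computes the adjusted coefficient of determination. The paper's exact specification and its commonality analysis are not reproduced; all variable names and values are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for j in range(c, n + 1):
                    M[r][j] -= f * M[c][j]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def adjusted_r2(X, y, beta):
    n, p = len(y), len(beta) - 1              # p predictors besides the intercept
    fitted = [sum(b * xi for b, xi in zip(beta, row)) for row in X]
    my = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - (ss_res / ss_tot) * (n - 1) / (n - p - 1)


# Synthetic pairs: log F = 1.0 + 0.8 log(pop) + 0.6 log(gdp) - 1.2 log(dist) + noise.
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    pop, gdp, dist = rng.uniform(1e6, 1e8), rng.uniform(1e9, 1e12), rng.uniform(100, 1e4)
    row = [1.0, math.log(pop), math.log(gdp), math.log(dist)]
    X.append(row)
    y.append(1.0 + 0.8 * row[1] + 0.6 * row[2] - 1.2 * row[3] + rng.gauss(0.0, 0.1))
beta = ols(X, y)
```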

  10. Application of Robust Regression and Bootstrap in Productivity Analysis of GERD Variable in EU27

    Directory of Open Access Journals (Sweden)

    Dagmar Blatná

    2014-06-01

    Full Text Available GERD is one of the Europe 2020 headline indicators being tracked within the Europe 2020 strategy. The headline target is for GERD to reach 3% within the EU by 2020. Eurostat defines "GERD" as total gross domestic expenditure on research and experimental development as a percentage of GDP. GERD depends on numerous factors of a general economic background, namely employment, innovation and research, and science and technology. The values of these indicators vary among European countries, and consequently outliers can be anticipated in the corresponding analyses. In such a case, the classical statistical approach (the least squares method) can be highly unreliable, whereas robust regression methods are an acceptable and useful tool. The aim of the present paper is to demonstrate the advantages of robust regression and the applicability of the bootstrap approach in regression based on both classical and robust methods.
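
    The contrast between least squares and robust regression can be illustrated with a Huber M-estimator fitted by iteratively reweighted least squares, one standard robust method (the abstract does not say which robust estimators the paper uses, so this choice is an assumption). The tuning constant c = 1.345 and the MAD-based scale are conventional defaults; the data are synthetic with one gross outlier.

```python
import statistics


def weighted_line(xs, ys, w):
    """Weighted least-squares intercept and slope."""
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line(xs, ys, c=1.345, max_iter=200, tol=1e-10):
    """Huber M-estimate of a straight line by iteratively reweighted least squares."""
    a, b = weighted_line(xs, ys, [1.0] * len(xs))        # start from ordinary LS
    for _ in range(max_iter):
        res = [y - (a + b * x) for x, y in zip(xs, ys)]
        med = statistics.median(res)
        s = statistics.median([abs(r - med) for r in res]) / 0.6745   # MAD scale
        if s == 0.0:
            break
        w = [1.0 if abs(r / s) <= c else c / abs(r / s) for r in res]  # Huber weights
        a_new, b_new = weighted_line(xs, ys, w)
        done = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b


xs = list(range(10))
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.0, 0.25, -0.15, 0.0]
ys = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]
ys[9] = 60.0                          # one gross outlier (the line predicts about 19)
a_ols, b_ols = weighted_line(xs, ys, [1.0] * len(xs))
a_hub, b_hub = huber_line(xs, ys)
```

    The outlier pulls the least-squares slope far from 2, while the Huber fit down-weights it; bootstrapping either estimator (resampling the (x, y) pairs) gives the kind of interval comparison the paper describes.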

  11. Solar radiation analysis and regression coefficients for the Vhembe Region, Limpopo Province, South Africa

    Scientific Electronic Library Online (English)

    Sophie T, Mulaudzi; Vaithianathaswami, Sankaran; Meena D, Lysko.

    Full Text Available Given the limited availability of reliable observed solar irradiance data in rural parts of South Africa, a correlation equation of the Angström-Prescott linear type has been used to estimate the regression coefficients in the Vhembe District, Limpopo Province, South Africa. Five stations were selected for the study, with the greatest distance between stations less than 180 km. Monthly regression coefficients were derived for each station from an observation dataset of sunshine duration hours and global horizontal irradiance. The correlation coefficients appear to be above 0.9. The representative Angström-Prescott model for the Vhembe Region was found by collating the data for each station and then averaging the respective correlation coefficients. This paper presents the generated regression coefficients for each station and for the Vhembe Region.
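
    The Angström-Prescott relation is H/H0 = a + b (S/S0), where H is global irradiation, H0 extraterrestrial irradiation, S sunshine duration and S0 maximum possible sunshine duration. A minimal least-squares sketch returning a, b and the correlation coefficient. The synthetic data are generated with a = 0.25 and b = 0.50, the often-quoted defaults, purely for illustration; they are not the Vhembe coefficients.

```python
import math


def fit_angstrom_prescott(sunshine_fraction, clearness_index):
    """Least-squares fit of H/H0 = a + b * (S/S0); returns (a, b, r)."""
    n = len(sunshine_fraction)
    mx = sum(sunshine_fraction) / n
    my = sum(clearness_index) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(sunshine_fraction, clearness_index))
    sxx = sum((x - mx) ** 2 for x in sunshine_fraction)
    syy = sum((y - my) ** 2 for y in clearness_index)
    b = sxy / sxx
    return my - b * mx, b, sxy / math.sqrt(sxx * syy)


def estimate_global_irradiation(h0, sunshine_fraction, a, b):
    """H = H0 * (a + b * S/S0), with H0 the extraterrestrial irradiation."""
    return h0 * (a + b * sunshine_fraction)


s_frac = [0.45, 0.55, 0.62, 0.70, 0.78, 0.85]       # synthetic monthly S/S0 values
kt = [0.25 + 0.5 * x for x in s_frac]               # synthetic H/H0 values
a, b, r = fit_angstrom_prescott(s_frac, kt)
```

    A per-station, per-month fit of this kind yields the monthly coefficient tables the paper reports.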

  12. REGRESSION ANALYSIS OF LINEAR BODY MEASUREMENTS ON LIVE WEIGHT IN SUDANESE SHUGOR SHEEP

    Directory of Open Access Journals (Sweden)

    A.M. MUSA

    2012-02-01

    Full Text Available In this research, linear regression models were developed to estimate body weight from various linear body measurements of Sudanese Shugor sheep. Simple regression models were formed with body weight (Bwt) as the dependent variable and heart girth (HG), height at withers (HTW) and height at hip (HTH) as independent variables. The best prediction equation for body weight was determined using the constant (β), the number of variables in the equation, the mean square error (MSE) and the coefficient of determination (R²). The model including heart girth, height at withers and height at hip was the best-fitting model (β = -47.54, MSE = 9.39, R² = 0.61) for estimating body weight in Sudanese Shugor sheep in this study.
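The model comparison above relies on three fitted quantities: the intercept, the MSE and R². The sketch below fits a multiple linear regression with the normal equations and reports all three. The measurements and coefficients are hypothetical and invented for illustration. They are not the study's data. Only the intercept -47.54 echoes the abstract.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(predictors, y):
    """Fit y = b0 + b1*x1 + ... via the normal equations (X'X) b = X'y.

    Returns (coefficients, mse, r2); the MSE divides the residual sum of
    squares by n minus the number of estimated coefficients.
    """
    X = [[1.0] + list(row) for row in predictors]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bi * xi for bi, xi in zip(beta, r)) for r in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return beta, sse / (len(y) - p), 1 - sse / sst
```

Usage, with hypothetical (HG, HTW, HTH) rows in cm: `fit_ols(rows, weights)` returns the intercept followed by one slope per measurement.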

  13. The Determination of Polyethylene Glycol and Water in Archaeological Wood using Infrared Spectroscopy and Stepwise Multiple Linear Regression

    OpenAIRE

    Patel, Rohan; Bingham, Jessica; Daniel, Shanna; Watkins-Kenney, Sarah; Kennedy, Anthony

    2012-01-01

    Polyethylene glycol (PEG) is the most common preservative used for bulking and maintaining the structural integrity of waterlogged wood. Conservators therefore need to be able to determine PEG concentrations in wood non-destructively. We present a study highlighting the application of infrared spectroscopy coupled with multivariate analysis techniques to predict the concentrations of polyethylene glycol 400 (PEG-400) and water simultaneously. This technique uses attenuated total...

  14. Partial covariate adjusted regression

    OpenAIRE

    Şentürk, Damla; Nguyen, Danh V

    2009-01-01

    Covariate adjusted regression (CAR) is a recently proposed adjustment method for regression analysis where neither the response nor the predictors are directly observed (Şentürk and Müller, 2005). The available data have been distorted by unknown functions of an observable confounding covariate. CAR provides consistent estimators for the coefficients of the regression between the variables of interest, adjusted for the confounder. We develop a broader class of partial covariate adjusted regressio...

  15. JT-60 configuration parameters for feedback control determined by regression analysis

    International Nuclear Information System (INIS)

    The stepwise regression procedure was applied to obtain measurement formulas for the equilibrium parameters used in the feedback control of JT-60. This procedure automatically selects the variables necessary for the measurements, including variables that would be unlikely to be chosen from physical considerations alone. Stable regression equations with small multicollinearity were obtained, and it was experimentally confirmed that the measurement formulas obtained through this procedure were accurate enough to be applied to the feedback control of plasma configurations in JT-60. (author)
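Stepwise selection is described above only in outline. The sketch below shows its forward half. At each step it adds the candidate that most reduces the residual sum of squares and stops when the partial F statistic falls below an F-to-enter threshold. Several details are assumptions, not the JT-60 procedure: the threshold of 4.0, the absence of a removal step, and the Gram-Schmidt bookkeeping. `forward_stepwise` is a hypothetical name.

```python
def forward_stepwise(columns, y, f_enter=4.0):
    """Forward selection with an F-to-enter rule (simplified stepwise variant).

    columns: dict mapping predictor name -> list of values; y: response values.
    Returns the selected predictor names in order of entry.
    """
    n = len(y)
    ybar = sum(y) / n
    resid = [v - ybar for v in y]  # residuals of the intercept-only model
    # Centre candidates so they are orthogonal to the intercept.
    basis = {k: [v - sum(c) / n for v in c] for k, c in columns.items()}
    selected = []
    while basis:
        rss = sum(r * r for r in resid)
        best, best_gain = None, 0.0
        for name, q in basis.items():
            qq = sum(v * v for v in q)
            if qq < 1e-12:  # candidate is collinear with those already chosen
                continue
            gain = sum(r * v for r, v in zip(resid, q)) ** 2 / qq  # drop in RSS
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            break
        df = n - len(selected) - 2  # residual df after adding `best`
        remaining = rss - best_gain
        if df <= 0 or (remaining > 1e-12 and best_gain / (remaining / df) < f_enter):
            break
        q = basis.pop(best)
        qq = sum(v * v for v in q)
        coef = sum(r * v for r, v in zip(resid, q)) / qq
        resid = [r - coef * v for r, v in zip(resid, q)]
        # Orthogonalise the remaining candidates against the entered predictor.
        for name in list(basis):
            c = basis[name]
            proj = sum(a * b for a, b in zip(c, q)) / qq
            basis[name] = [a - proj * b for a, b in zip(c, q)]
        selected.append(best)
    return selected
```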

  16. Generalized linear regression analysis of association of universal helmet laws with motorcyclist fatality rates.

    Science.gov (United States)

    Morris, C Craig

    2006-01-01

    This study evaluates the association of universal helmet laws with U.S. motorcyclist fatality rates from 1993 through 2002 using climate measures as statistical controls for motorcycling activity via quasi-maximum likelihood generalized linear regression analyses. Results revealed that motorcyclist fatalities and injuries are strongly associated with normalized heating degree days and precipitation inches, and that universal helmet laws are associated with lower motorcyclist fatality rates when these climate measures, and their interaction, are statistically controlled. This study shows that climate measures have considerable promise as indirect measures (proxies) of motorcycling activity in generalized linear regression studies. PMID:16202466

  17. Study on traffic noise level of Sylhet by multiple regression analysis associated with health hazards

    Directory of Open Access Journals (Sweden)

    J. B. Alam, M. Jobair Bin Alam, M. M. Rahman, A. K. Dikshit, S. K. Khan

    Full Text Available The study reports the level of traffic-induced noise pollution in Sylhet City. For this purpose, noise levels were measured at thirty-seven major locations in the city from 7 am to 11 pm on working days. At all locations, the noise level remained far above the acceptable limit at all times. Noise levels on main roads near residential, hospital and educational areas were above the recommended level (65 dBA). The predictive equations were found to be 60-70% correlated with the measured noise levels. The study suggests that vulnerable institutions such as schools and hospitals should be located about 60 m away from the roadside unless special noise-mitigation measures are used.

  18. Integrative analysis of multiple diverse omics datasets by sparse group multitask regression

    OpenAIRE

    Lin, DongDong; Zhang, Jigang; Li, Jingyao; He, Hao; Deng, Hong-Wen; Wang, Yu-Ping

    2014-01-01

    A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have had a remarkable impact on identifying susceptibility biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies across different genetic levels/platforms holds promise for improving the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely spa...

  19. Gaussian Process Regression Networks

    OpenAIRE

    Wilson, Andrew Gordon; Knowles, David A.; Ghahramani, Zoubin

    2011-01-01

    We introduce a new regression framework, Gaussian process regression networks (GPRN), which combines the structural properties of Bayesian neural networks with the non-parametric flexibility of Gaussian processes. This model accommodates input dependent signal and noise correlations between multiple response variables, input dependent length-scales and amplitudes, and heavy-tailed predictive distributions. We derive both efficient Markov chain Monte Carlo and variational Bay...

  20. Determining the Relationship between U.S. County-Level Adult Obesity Rate and Multiple Risk Factors by PLS Regression and SVM Modeling Approaches

    Directory of Open Access Journals (Sweden)

    Chau-Kuang Chen

    2015-02-01

    Full Text Available Data from the Centers for Disease Control and Prevention (CDC) have shown that the obesity rate among adults doubled within the past two decades. This upsurge was the result of changes in human behavior and environment. Partial least squares (PLS) regression and support vector machine (SVM) models were used to determine the relationship between the U.S. county-level adult obesity rate and multiple risk factors. The outcome variable was the adult obesity rate. The 23 risk factors were categorized into four domains of the social ecological model: biological/behavioral factors, socioeconomic status, food environment, and physical environment. Of the 23 risk factors related to adult obesity, the top eight significant risk factors with high normalized importance were identified, including physical inactivity, natural amenity, percent of households receiving SNAP benefits, and percent of all restaurants being fast food. The study results were consistent with those in the literature. The study showed that the adult obesity rate was influenced by biological/behavioral factors, socioeconomic status, food environment, and physical environment, as embedded in social ecological theory. Analyzing multiple risk factors of obesity in communities may lead to more comprehensive and integrated policies and intervention programs to address this population-based problem.

  1. Cognitive analysis of multiple sclerosis utilizing fuzzy cluster means

    Directory of Open Access Journals (Sweden)

    Imianvan Anthony Agboizebeta

    2012-01-01

    Full Text Available Multiple sclerosis, often called MS, is a disease that affects the central nervous system (the brain and spinal cord). Myelin insulates nerve cells, improves the conduction of impulses along the nerves and is important for maintaining the health of the nerves. In multiple sclerosis, inflammation causes the myelin to disappear. Genetic factors, environmental issues and viral infection may also play a role in developing the disease. MS is characterized by life-threatening symptoms such as loss of balance, hearing problems and depression. The application of Fuzzy Cluster Means (FCM), or Fuzzy C-Means, analysis to the diagnosis of different forms of multiple sclerosis is the focal point of this paper. Cluster analysis involves a sequence of methodological and analytical decision steps that enhance the quality and meaning of the clusters produced. Uncertainties associated with the analysis of multiple sclerosis test data are eliminated by the system.
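Fuzzy C-Means alternates two updates. The first assigns each point a membership in every cluster, u_ik = 1 / Σ_j (d_ik / d_jk)^(2/(m-1)). The second moves each centre to the membership-weighted mean, v_i = Σ_k u_ik^m x_k / Σ_k u_ik^m. The sketch below uses one-dimensional points for brevity. Some choices are assumptions, not taken from the paper: the even spread of initial centres across the data range, the fuzzifier m = 2 and the stopping tolerance. `fuzzy_c_means` is a hypothetical name.

```python
def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-9):
    """Fuzzy c-means clustering of 1-D points.

    Returns (centres, memberships); memberships[i][k] is the degree to which
    point k belongs to cluster i, as computed in the final iteration.
    """
    lo, hi = min(points), max(points)
    # Deterministic start (an assumption): centres spread evenly over the range.
    centres = [lo + (i + 0.5) * (hi - lo) / c for i in range(c)]
    u = []
    for _ in range(iters):
        # Membership update.
        u = [[0.0] * len(points) for _ in range(c)]
        for k, x in enumerate(points):
            d = [abs(x - v) for v in centres]
            if min(d) == 0.0:  # point coincides with a centre: crisp membership
                u[d.index(0.0)][k] = 1.0
                continue
            for i in range(c):
                u[i][k] = 1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
        # Centre update: weighted mean with weights u**m.
        new = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(len(points))]
            total = sum(w)
            new.append(sum(wk * x for wk, x in zip(w, points)) / total if total > 0 else centres[i])
        shift = max(abs(a - b) for a, b in zip(new, centres))
        centres = new
        if shift < tol:
            break
    return centres, u
```

Unlike k-means, every point keeps a graded membership in every cluster, and each point's memberships sum to 1.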

  2. Spontaneous regression of osteochondromas

    Energy Technology Data Exchange (ETDEWEB)

    Hoshi, Manabu; Takami, Masatsugu; Hashimoto, Ryouji; Okamoto, Takashi; Yanagida, Ikuhisa; Matsumura, Akira; Noguchi, Kazuko [Yodogawa Christian Hospital, Department of Orthopaedic Surgery, Osaka (Japan)]

    2007-06-15

    Spontaneous regression of an osteochondroma is an infrequent event. In this report, two cases with spontaneous regression of osteochondromas are presented. The first case was a solitary osteochondroma of the pedunculated type involving the right proximal humerus in a 7-year-old boy. This lesion resolved over 15 months of observation. The second case was a 3-year-old girl with multiple osteochondromatosis, in whom sessile osteochondromas of the right tibia and left fibula regressed over 33 months. The mechanism of this phenomenon is discussed with a review of previous reports. Regarding treatment, careful observation may be acceptable for typical osteochondromas, especially in young children. (orig.)

  3. The Analysis of Nonstationary Time Series Using Regression, Correlation and Cointegration

    Directory of Open Access Journals (Sweden)

    Søren Johansen

    2012-06-01

    Full Text Available There are simple well-known conditions for the validity of regression and correlation as statistical tools. We analyse by examples the effect of nonstationarity on inference using these methods and compare them to model-based inference using the cointegrated vector autoregressive model. Finally, we analyse some monthly US interest rate data as an illustration of the methods.
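One classic effect of nonstationarity on regression inference is spurious regression. If two independent random walks are regressed on each other, the usual slope t-test rejects "no relationship" far more often than its nominal 5%. The simulation below sketches that effect. It is not taken from the paper, which compares such inference with the cointegrated VAR model. The sample size, replication count, seed and the normal cut-off of 1.96 are illustrative assumptions. The function names are hypothetical.

```python
import itertools
import random

def slope_t_stat(x, y):
    """t statistic for the slope in the OLS fit y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b / (sse / (n - 2) / sxx) ** 0.5

def rejection_rate(n=100, reps=200, random_walk=True, seed=1):
    """Share of replications with |t| > 1.96 for two independent series."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        ex = [rng.gauss(0, 1) for _ in range(n)]
        ey = [rng.gauss(0, 1) for _ in range(n)]
        if random_walk:  # integrate the shocks: x_t = x_{t-1} + e_t
            ex = list(itertools.accumulate(ex))
            ey = list(itertools.accumulate(ey))
        if abs(slope_t_stat(ex, ey)) > 1.96:
            hits += 1
    return hits / reps
```

For independent white noise the rate stays near 0.05. For independent random walks it is typically well above one half, even though the series are unrelated.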

  4. Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis

    Science.gov (United States)

    Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John

    2012-01-01

    Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…

  5. The analysis of nonstationary time series using regression, correlation and cointegration

    DEFF Research Database (Denmark)

    Johansen, Søren

    2012-01-01

    There are simple well-known conditions for the validity of regression and correlation as statistical tools. We analyse by examples the effect of nonstationarity on inference using these methods and compare them to model-based inference using the cointegrated vector autoregressive model. Finally, we analyse some monthly US interest rate data as an illustration of the methods.

  6. Further Insight and Additional Inference Methods for Polynomial Regression Applied to the Analysis of Congruence

    Science.gov (United States)

    Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti

    2010-01-01

    In their seminal paper, Edwards and Parry (1993) presented polynomial regression as a better alternative to difference scores in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…
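In the polynomial regression approach to congruence, the outcome is modelled as Z = b0 + b1·X + b2·Y + b3·X² + b4·X·Y + b5·Y². The response surface is usually summarized along the congruence line (Y = X) and the incongruence line (Y = -X). Substituting each line into the model gives the slope and curvature quantities commonly labelled a1 to a4, sketched below. `congruence_surface_tests` is a hypothetical helper. Their standard errors, which the cited paper's inference methods concern, are not computed here.

```python
def congruence_surface_tests(b1, b2, b3, b4, b5):
    """Slopes and curvatures of Z = b0 + b1*X + b2*Y + b3*X**2 + b4*X*Y + b5*Y**2.

    Substituting Y = X gives Z = b0 + (b1 + b2) X + (b3 + b4 + b5) X**2;
    substituting Y = -X gives Z = b0 + (b1 - b2) X + (b3 - b4 + b5) X**2.
    """
    return {
        "a1": b1 + b2,       # slope along the congruence line Y = X
        "a2": b3 + b4 + b5,  # curvature along Y = X
        "a3": b1 - b2,       # slope along the incongruence line Y = -X
        "a4": b3 - b4 + b5,  # curvature along Y = -X
    }
```

For example, the squared difference score Z = (X - Y)² corresponds to b3 = 1, b4 = -2 and b5 = 1. It yields a flat congruence line (a1 = a2 = 0) and upward curvature a4 = 4 across it. This is the constrained shape that polynomial regression lets the data test rather than assume.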

  7. An application of nonparametric Cox regression model in reliability analysis: A case study.

    Czech Academy of Sciences Publication Activity Database

    Volf, Petr

    2004-01-01

    Vol. 40, No. 5 (2004), pp. 639-648. ISSN 0023-5954 R&D Projects: GA ČR GA201/02/0049; GA ČR GA402/01/0539 Institutional research plan: CEZ:AV0Z1075907 Keywords : hazard rate * nonparametric regression * Cox model Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.224, year: 2004

  8. On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis

    Science.gov (United States)

    Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas

    2011-01-01

    The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed using the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…

  9. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    Science.gov (United States)

    Ulbrich, Norbert Manfred

    2013-01-01

    A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.

  10. Selection on plasticity of seasonal life-history traits using random regression mixed model analysis.

    Science.gov (United States)

    Brommer, Jon E; Kontiainen, Pekka; Pietiäinen, Hannu

    2012-04-01

    Theory considers the covariation of seasonal life-history traits as an optimal reaction norm, implying that deviating from this reaction norm reduces fitness. However, the estimation of reaction-norm properties (i.e., elevation, linear slope, and higher order slope terms) and of selection on these properties is statistically challenging. We here advocate the use of random regression mixed models to estimate reaction-norm properties and the use of bivariate random regression to estimate selection on these properties within a single model. We illustrate the approach with random regression mixed models fitted to 1115 observations of clutch sizes and laying dates of 361 female Ural owls (Strix uralensis) collected over 31 years, showing that (1) there is variation across individuals in the slope of their clutch size-laying date relationship, and that (2) there is selection on the slope of the reaction norm between these two traits. Hence, natural selection potentially drives the negative covariance in clutch size and laying date in this species. The random-regression approach is hampered by an inability to estimate nonlinear selection, but avoids a number of disadvantages (stats-on-stats, connecting reaction-norm properties to fitness). The approach is of value in describing and studying selection on behavioral reaction norms (behavioral syndromes) or life-history reaction norms. The approach can also be extended to consider the genetic underpinning of reaction-norm properties. PMID:22837818

  11. A Gauss-Newton-Based Broyden’s Class Algorithm for Parameters of Regression Analysis

    OpenAIRE

    Xiangrong Li; Xupei Zhao

    2011-01-01

    In this paper, a Gauss-Newton-based Broyden’s class method for estimating the parameters of regression problems is presented. The global convergence of the method is established under suitable conditions. Numerical results suggest that the proposed method is promising.
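For context, the plain Gauss-Newton iteration underlying such methods solves (JᵀJ)·δ = Jᵀr at each step, where r is the residual vector and J the Jacobian of the model. The sketch below applies it to an exponential model y ≈ a·exp(b·x). It is ordinary Gauss-Newton only. It does not reproduce the paper's Broyden-class quasi-Newton update, and the model, starting values and tolerances are illustrative assumptions. `gauss_newton_exp` is a hypothetical name.

```python
import math

def gauss_newton_exp(x, y, a, b, iters=50, tol=1e-12):
    """Plain Gauss-Newton fit of y ≈ a * exp(b * x), starting from (a, b)."""
    for _ in range(iters):
        e = [math.exp(b * xi) for xi in x]
        r = [yi - a * ei for yi, ei in zip(y, e)]   # residuals y - f(x)
        ja = e                                       # df/da
        jb = [a * xi * ei for xi, ei in zip(x, e)]   # df/db
        # Normal equations (J^T J) delta = J^T r, solved for the 2x2 case.
        s_aa = sum(v * v for v in ja)
        s_ab = sum(u * v for u, v in zip(ja, jb))
        s_bb = sum(v * v for v in jb)
        g_a = sum(u * v for u, v in zip(ja, r))
        g_b = sum(u * v for u, v in zip(jb, r))
        det = s_aa * s_bb - s_ab * s_ab
        if det == 0.0:  # J^T J singular: stop rather than divide by zero
            break
        da = (s_bb * g_a - s_ab * g_b) / det
        db = (s_aa * g_b - s_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b
```

Plain Gauss-Newton carries no global convergence guarantee from poor starting points. Establishing such a guarantee for their variant is the point of the paper's analysis.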

  12. Atypical antipsychotics in the treatment of schizophrenia: systematic overview and meta-regression analysis

    OpenAIRE

    Geddes, J.; Freemantle, N.; Harrison, P.; Bebbington, P.

    2000-01-01

    OBJECTIVE: To develop an evidence base for recommendations on the use of atypical antipsychotics for patients with schizophrenia. DESIGN: Systematic overview and meta-regression analyses of randomised controlled trials, as a basis for formal development of guidelines. SUBJECTS: 12 649 patients in 52 randomised trials comparing atypical antipsychotics (amisulpride, clozapine, olanzapine, quetiapine, risperidone, and sertindole) with conventional antipsychotics (usually haloperidol or chlorprom...

  13. Rotation and Noise Invariant Near-Infrared Face Recognition by means of Zernike Moments and Spectral Regression Discriminant Analysis.

    Czech Academy of Sciences Publication Activity Database

    Farokhi, S.; Shamsuddin, S. M.; Flusser, Jan; Sheikh, U. U.; Khansari, M.; Jafari-Khouzani, K.

    2013-01-01

    Vol. 22, No. 1 (2013), pp. 1-11. ISSN 1017-9909 R&D Projects: GA ČR GAP103/11/1552 Keywords : face recognition * infrared imaging * image moments Subject RIV: JD - Computer Applications, Robotics Impact factor: 0.850, year: 2013 http://library.utia.cas.cz/separaty/2013/ZOI/flusser-rotation and noise invariant near-infrared face recognition by means of zernike moments and spectral regression discriminant analysis.pdf

  14. A Hybrid Sales Forecasting Scheme by Combining Independent Component Analysis with K-Means Clustering and Support Vector Regression

    OpenAIRE

    Chi-Jie Lu; Chi-Chang Chang

    2014-01-01

    Sales forecasting plays an important role in operating a business since it can be used to determine the inventory level required to meet consumer demand and avoid under- or overstocking. Improving the accuracy of sales forecasting has become an important issue in operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA) with K-means clustering and support vector regression (SVR). The proposed scheme first uses the ...

  15. Brief psychological therapies for anxiety and depression in primary care: meta-analysis and meta-regression

    OpenAIRE

    Cape, John; Whittington, Craig; Buszewicz, Marta; Wallace, Paul; Underwood, Lisa

    2010-01-01

    Background: Psychological therapies provided in primary care are usually briefer than in secondary care. There has been no recent comprehensive review comparing their effectiveness for common mental health problems. We aimed to compare the effectiveness of different types of brief psychological therapy administered within primary care across and between anxiety, depressive and mixed disorders. Methods: Meta-analysis and meta-regression of randomized controlled trials of brief psycholog...

  16. A synthesis of the effects of exchange rate uncertainty on international trade via Meta-Regression analysis

    OpenAIRE

    Bouoiyour, Jamal; Selmi, Refk

    2015-01-01

    The main focus of this paper is to survey the literature that investigates the effects of exchange rate uncertainty on international trade. Specifically, we carry out a meta-regression analysis of 42 studies with 810 estimates. We show that the empirical studies on this link exhibit substantial publication selection and a significant genuine effect of exchange rate volatility on trade flows after correction for publication bias. Moreover, we find that most of the variables that may help ex...
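A common way to separate publication selection from a genuine effect in meta-regression is the FAT-PET specification. Each study's t statistic is regressed on its precision: t_i = β1 + β0·(1/SE_i). This is equivalent to weighted least squares of effect_i = β0 + β1·SE_i with weights 1/SE_i². The intercept β1 tests funnel asymmetry (publication selection). The slope β0 estimates the effect corrected for that bias. The abstract does not state the exact specification used, so treating it as FAT-PET is an assumption. `fat_pet` is a hypothetical name, and moderator variables are omitted.

```python
def fat_pet(effects, std_errors):
    """FAT-PET meta-regression: t_i = beta1 + beta0 * (1 / se_i), fitted by OLS.

    Returns the bias-corrected effect (beta0) and the funnel-asymmetry term (beta1).
    """
    t = [e / s for e, s in zip(effects, std_errors)]  # study t statistics
    prec = [1 / s for s in std_errors]                # precision 1/SE
    n = len(t)
    mp, mt = sum(prec) / n, sum(t) / n
    spp = sum((p - mp) ** 2 for p in prec)
    beta0 = sum((p - mp) * (ti - mt) for p, ti in zip(prec, t)) / spp
    beta1 = mt - beta0 * mp
    return {"pet_effect": beta0, "fat_bias": beta1}
```

If reported effects grow with their standard errors, as under selective publication of significant results, the FAT term is non-zero. The PET slope then gives the effect extrapolated to a perfectly precise study.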

  17. Robust Principal Component Analysis and Geographically Weighted Regression: Urbanization in the Twin Cities Metropolitan Area of Minnesota

    OpenAIRE

    Ghosh, Debarchana; Manson, Steven M.

    2008-01-01

    In this paper, we present a hybrid approach, robust principal component geographically weighted regression (RPCGWR), in examining urbanization as a function of both extant urban land use and the effect of social and environmental factors in the Twin Cities Metropolitan Area (TCMA) of Minnesota. We used remotely sensed data to treat urbanization via the proxy of impervious surface. We then integrated two different methods, robust principal component analysis (RPCA) and geographically weighted ...

  18. A new monochromator with multiple offset cylindrical lenses 2: Aberration analysis and its applications

    Science.gov (United States)

    Ogawa, Takashi; Cho, Boklae

    2015-11-01

    In this article, we continue our investigation and offer a complementary discussion of our newly proposed monochromator (MC). It consists of multiple offset cylindrical lenses and achieves high performance with a simple structure. We simulate beam profiles over an extensive current range by means of a ray trace method. Through a multiple regression analysis, we derive the aberrations of the MC up to the third rank. The second-order aperture aberration and lateral energy dispersion are canceled on the exit image plane, which is a crucial condition when MCs are applied to electron microscopes. These aberrations enable the interpretation of the beam profiles for various currents and energy deviations. In addition, they describe how MC performance measures, such as the energy spread and brightness, depend on beam current for various source conditions. This information is essential for implementing the MC in an electron microscope. With improved spatial and energy resolution, the microscope can reveal new information about various specimens. In addition, the simple and robust structure of the MC will satisfy the demand from industry. This study also contributes to charged particle optics theory by presenting a practical example of aberration computation, through the ray trace method and regression analysis, for an optical system too complicated for aberration integrals to be established.

  19. Applied linear regression

    CERN Document Server

    Weisberg, Sanford

    2005-01-01

    Master linear regression techniques with a new edition of a classic text. Reviews of the Second Edition: "I found it enjoyable reading and so full of interesting material that even the well-informed reader will probably find something new . . . a necessity for all of those who do linear regression." (Technometrics, February 1987) "Overall, I feel that the book is a valuable addition to the now considerable list of texts on applied linear regression. It should be a strong contender as the leading text for a first serious course in regression analysis." (American Scientist, May-June 1987)

  20. Content Analysis of Turkish Studies about the Multiple Intelligences Theory

    Science.gov (United States)

    Saban, Ahmet

    2009-01-01

    Recently, there has been a significant increase in the number of multiple intelligences (MI) studies in Turkey. Consequently, a systematic analysis of these studies is crucial in order to see the present situation and future trends in the field of education. In this way, it is also hoped that the current analysis will offer an avenue…