Controlling the Type I Error Rate in Stepwise Regression Analysis.
Pohlmann, John T.
Three procedures used to control Type I error rate in stepwise regression analysis are forward selection, backward elimination, and true stepwise. In the forward selection method, a model of the dependent variable is formed by choosing the single best predictor; then the second predictor which makes the strongest contribution to the prediction of…
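The forward selection step described in this abstract can be illustrated with a minimal sketch. It assumes an F-to-enter threshold of 4.0 and uses made-up data; the names `forward_selection`, `rss`, `solve` and `f_to_enter` are hypothetical and do not come from the paper, which may use a different entry rule.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of a least squares fit of y on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def forward_selection(predictors, y, f_to_enter=4.0):
    """Add the best remaining predictor until its partial F falls below f_to_enter."""
    selected = []
    current = rss([], y)  # intercept-only model
    n = len(y)
    while len(selected) < len(predictors):
        candidates = [name for name in predictors if name not in selected]
        trials = {name: rss([predictors[s] for s in selected + [name]], y)
                  for name in candidates}
        best = min(trials, key=trials.get)
        df_resid = n - len(selected) - 2  # intercept + selected + candidate
        if df_resid <= 0:
            break
        f_stat = (current - trials[best]) / (trials[best] / df_resid)
        if f_stat < f_to_enter:
            break
        selected.append(best)
        current = trials[best]
    return selected


# Hypothetical data: y depends on x1 and x2 but not on x3.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 2, 1, 2, 1, 2, 1],
    "x3": [4, 4, 1, 1, 3, 3, 2, 2],
}
y = [13.1, 10.8, 19.05, 17.15, 24.9, 23.2, 30.85, 28.95]
chosen = forward_selection(predictors, y)
print(chosen)
```

Because every step tests several candidates and keeps the best one, the nominal significance level of a single F test understates the overall Type I error rate, which is the concern the abstract raises.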
Freund, Rudolf J; Sa, Ping
2006-01-01
The book provides complete coverage of the classical methods of statistical analysis. It is designed to give students an understanding of the purpose of statistical analyses, to allow the student to determine, at least to some degree, the correct type of statistical analysis to be performed in a given situation, and to give some appreciation of what constitutes good experimental design.
Regression analysis by example
Chatterjee, Samprit; Hadi, Ali S
2012-01-01
.... The emphasis continues to be on exploratory data analysis rather than statistical theory. The coverage offers in-depth treatment of regression diagnostics, transformation, multicollinearity, logistic regression, and robust regression...
Regression analysis by example
Chatterjee, Samprit
2012-01-01
Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded
LI XinTian; TIAN Hui; CAI GuoBiao
2013-01-01
This paper presents three-dimensional numerical simulations of the hybrid rocket motor with the hydrogen peroxide (HP) and hydroxyl terminated polybutadiene (HTPB) propellant combination and investigates the fuel regression rate distribution characteristics of different fuel types. The numerical models are established to couple the Navier-Stokes equations with turbulence, chemical reactions, solid fuel pyrolysis and solid-gas interfacial boundary conditions. Simulation results including the temperature contours and fuel regression rate distributions are presented for the tube, star and wagon wheel grains. The results demonstrate that the trends of the regression rate along the axis are similar for all fuel types: the rate decreases sharply near the leading edge of the fuel and then gradually increases with increasing axial location. The regression rates of the star and wagon wheel grains show apparent three-dimensional characteristics, and they are higher in the regions of the fuel surface near the central core oxidizer flow. The average regression rates increase as the oxidizer mass flux rises for all of the fuel types. However, under the same oxidizer mass flux, the average regression rates of the star and wagon wheel grains are much larger than that of the tube grain due to their lower hydraulic diameters.
Saadah, Nicholas H; van Hout, Fabienne M A; Schipperus, Martin R; le Cessie, Saskia; Middelburg, Rutger A; Wiersum-Osselton, Johanna C; van der Bom, Johanna G
2017-09-01
We estimated rates for common plasma-associated transfusion reactions and compared reported rates for various plasma types. We performed a systematic review and meta-analysis of peer-reviewed articles that reported plasma transfusion reaction rates. Random-effects pooled rates were calculated and compared between plasma types. Meta-regression was used to compare various plasma types with regard to their reported plasma transfusion reaction rates. Forty-eight studies reported transfusion reaction rates for fresh-frozen plasma (FFP; mixed-sex and male-only), amotosalen INTERCEPT FFP, methylene blue-treated FFP, and solvent/detergent-treated pooled plasma. Random-effects pooled average rates for FFP were: allergic reactions, 92/10^5 units transfused (95% confidence interval [CI], 46-184/10^5 units transfused); febrile nonhemolytic transfusion reactions (FNHTRs), 12/10^5 units transfused (95% CI, 7-22/10^5 units transfused); transfusion-associated circulatory overload (TACO), 6/10^5 units transfused (95% CI, 1-30/10^5 units transfused); transfusion-related acute lung injury (TRALI), 1.8/10^5 units transfused (95% CI, 1.2-2.7/10^5 units transfused); and anaphylactic reactions, 0.8/10^5 units transfused (95% CI, 0-45.7/10^5 units transfused). Risk differences between plasma types were not significant for allergic reactions, TACO, or anaphylactic reactions. Methylene blue-treated FFP led to fewer FNHTRs than FFP (risk difference = -15.3 FNHTRs/10^5 units transfused; 95% CI, -24.7 to -7.1 reactions/10^5 units transfused); and male-only FFP led to fewer cases of TRALI than mixed-sex FFP (risk difference = -0.74 TRALI/10^5 units transfused; 95% CI, -2.42 to -0.42 injuries/10^5 units transfused). Meta-regression demonstrates that the rate of FNHTRs is lower for methylene blue-treated compared with FFP, and the rate of TRALI is lower for male-only than for mixed-sex FFP; whereas no significant differences are observed between plasma types for allergic
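As a rough illustration of how a random-effects pooled rate can be computed, the sketch below applies the DerSimonian-Laird estimator to log rates. This is an assumption made for illustration: the abstract does not say which estimator the authors used, the event counts are invented, and the name `pooled_rate_random_effects` is hypothetical.

```python
import math


def pooled_rate_random_effects(events, units):
    """Pool per-study rates on the log scale with DerSimonian-Laird weights."""
    logs = [math.log(e / u) for e, u in zip(events, units)]
    variances = [1.0 / e for e in events]  # approximate variance of a log rate
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, logs)) / sum(w)
    # Cochran's Q and the method-of-moments between-study variance tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(events) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_star, logs)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return math.exp(mu), math.exp(mu - 1.96 * se), math.exp(mu + 1.96 * se), tau2


# Hypothetical studies: reaction counts and units transfused.
events = [12, 30, 7, 55]
units = [20_000, 25_000, 15_000, 40_000]
rate, lower, upper, tau2 = pooled_rate_random_effects(events, units)
print(f"{rate * 1e5:.1f} per 10^5 units (95% CI {lower * 1e5:.1f}-{upper * 1e5:.1f})")
```

Pooling on the log scale keeps the confidence limits positive, and a larger tau^2 widens the interval by giving small and large studies more similar weights.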
Seber, George A F
2012-01-01
* Concise, mathematically clear, and comprehensive treatment of the subject.
* Expanded coverage of diagnostics and methods of model fitting.
* Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models.
* More than 200 problems throughout the book plus outline solutions for the exercises.
* This revision has been extensively class-tested.
Multiple linear regression analysis
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Mehmet Arslan
2010-06-01
The objective of the study is to evaluate the risk-reward relationship and relative performance of 4 different groups of mutual funds. To this end, daily return data of 12 mutual funds (3 type variable funds; 3 B type variable funds; 3 A type stock funds and 3 A type exchange traded funds) were used, together with the daily return of the market index (imkb100) and the daily return of the riskless rate, for the period from January 2006 to February 2010. The 180-day maturity T-Bill was selected to represent the riskless rate. To determine the performance of the mutual funds, the Sharpe ratio, M2 measure, Treynor index, Jensen index, Sortino ratio, T2 ratio and Valuation ratio were applied, and these indicators produced conflicting results in ranking the mutual funds. The timing and selection capabilities of the fund managers were then determined by applying simple regression and quadratic regression. Interestingly, all funds were found to have a positive coefficient, indicating positive selection capability of the managers; but in terms of timing capability only one fund manager showed success. Finally, to determine the extent to which mean returns differ between the mutual funds, the market index (imkb100) and the riskless rate (180-day T-Bill), the results of the analysis revealed that the mean returns of individual securities differ at the P≤0.01 level. This shows instability in returns and poor ex-ante forecast modeling capability.
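Three of the measures named in this abstract can be sketched from their textbook definitions: the Sharpe ratio (mean excess return over its standard deviation), the Treynor index (mean excess return over beta) and the Jensen index (the intercept of the excess-return regression on the market). The return series below are invented, and `performance_measures` is a hypothetical name, not the author's procedure.

```python
from statistics import mean, stdev


def performance_measures(fund, market, riskless):
    """Compute Sharpe, beta, Treynor and Jensen's alpha from per-period returns."""
    excess_fund = [f - r for f, r in zip(fund, riskless)]
    excess_mkt = [m - r for m, r in zip(market, riskless)]
    mean_f, mean_m = mean(excess_fund), mean(excess_mkt)
    n = len(fund)
    cov = sum((a - mean_f) * (b - mean_m)
              for a, b in zip(excess_fund, excess_mkt)) / (n - 1)
    var_m = sum((b - mean_m) ** 2 for b in excess_mkt) / (n - 1)
    beta = cov / var_m  # slope of the simple regression on the market
    return {
        "sharpe": mean_f / stdev(excess_fund),
        "beta": beta,
        "treynor": mean_f / beta,
        "jensen_alpha": mean_f - beta * mean_m,  # regression intercept
    }


# Hypothetical daily returns for one fund, the market index and the riskless rate.
market = [0.010, -0.004, 0.006, 0.002, -0.008, 0.012]
fund = [0.012, -0.006, 0.007, 0.001, -0.009, 0.015]
riskless = [0.0003] * 6

fund_stats = performance_measures(fund, market, riskless)
index_stats = performance_measures(market, market, riskless)  # sanity check
print(fund_stats)
```

The measures divide by different risk quantities (total volatility versus market beta), which is one reason they can rank the same funds differently, as the abstract reports.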
Heteroscedastic regression analysis method for mixed data
FU Hui-min; YUE Xiao-rui
2011-01-01
The heteroscedastic regression model was established and the heteroscedastic regression analysis method was presented for mixed data composed of complete data, type-I censored data and type-II censored data from the location-scale distribution. The best unbiased estimations of regression coefficients, as well as the confidence limits of the location parameter and scale parameter were given. Furthermore, the point estimations and confidence limits of percentiles were obtained. Thus, the traditional multiple regression analysis method which is only suitable to the complete data from normal distribution can be extended to the cases of heteroscedastic mixed data and the location-scale distribution. So the presented method has a broad range of promising applications.
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Common pitfalls in statistical analysis: Logistic regression.
Ranganathan, Priya; Pramesh, C S; Aggarwal, Rakesh
2017-01-01
Logistic regression analysis is a statistical technique to evaluate the relationship between various predictor variables (either categorical or continuous) and an outcome which is binary (dichotomous). In this article, we discuss logistic regression analysis and the limitations of this technique.
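A minimal sketch of the technique the abstract describes: a logistic regression with one continuous predictor and a binary outcome, fitted by plain gradient ascent on the log-likelihood. The data, the learning rate and the names `fit_logistic` and `predict` are assumptions chosen for illustration; statistical packages normally use a Newton-type solver.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, learning_rate=0.1, iterations=5000):
    """Estimate intercept b0 and slope b1 by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        errors = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return b0, b1


def predict(b0, b1, value):
    """Predicted probability that the outcome equals 1 at the given predictor value."""
    return sigmoid(b0 + b1 * value)


# Hypothetical data: the outcome becomes more likely as the predictor grows.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
print(f"odds ratio per unit of x: {math.exp(b1):.2f}")
```

The exponentiated slope is the odds ratio per unit increase in the predictor, which is the quantity usually reported from such a model.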
Principal component regression analysis with SPSS.
Liu, R X; Kuang, J; Gong, Q; Hou, X L
2003-06-01
The paper introduces the indices for diagnosing multicollinearity, the basic principle of principal component regression, and the method for determining the 'best' equation. It uses an example to describe how to carry out principal component regression analysis with SPSS 10.0, including all calculation steps of the principal component regression and all operations of the linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance caused by multicollinearity. With SPSS, the analysis becomes simpler, faster and accurate.
Regression Analysis by Example. 5th Edition
Chatterjee, Samprit; Hadi, Ali S.
2012-01-01
Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…
Tomita, Hiroyuki; Yoshii, Yuzuru; Kobayashi, Yukiyasu; Minezaki, Takeo; Enya, Keigo; Suganuma, Masahiro; Aoki, Tsutomu; Koshida, Shintaro; Yamauchi, Masahiro
2006-01-01
We propose a new method of analysing a variable component for type 1 active galactic nuclei (AGNs) in the near-infrared wavelength region. This analysis uses a multiple regression technique and divides the variable component into two components originating in the accretion disk at the center of AGNs and from the dust torus that far surrounds the disk. Applying this analysis to the long-term $VHK$ monitoring data of MCG+08-11-011 that were obtained by the MAGNUM project, we found that the $(H-K)$-color temperature of the dust component is $T = 1635$K $\pm20$K, which agrees with the sublimation temperature of dust grains, and that the time delay of $K$ to $H$ variations is $\Delta t\approx 6$ days, which indicates the existence of a radial temperature gradient in the dust torus. As for the disk component, we found that the power-law spectrum of $f_\
Functional linear regression via canonical analysis
He, Guozhong; Wang, Jane-Ling; Yang, Wenjing; doi:10.3150/09-BEJ228
2011-01-01
We study regression models for the situation where both dependent and independent variables are square-integrable stochastic processes. Questions concerning the definition and existence of the corresponding functional linear regression models and some basic properties are explored for this situation. We derive a representation of the regression parameter function in terms of the canonical components of the processes involved. This representation establishes a connection between functional regression and functional canonical analysis and suggests alternative approaches for the implementation of functional linear regression analysis. A specific procedure for the estimation of the regression parameter function using canonical expansions is proposed and compared with an established functional principal component regression approach. As an example of an application, we present an analysis of mortality data for cohorts of medflies, obtained in experimental studies of aging and longevity.
Using Regression Mixture Analysis in Educational Research
Cody S. Ding
2006-11-01
Conventional regression analysis is typically used in educational research. Usually such an analysis implicitly assumes that a common set of regression parameter estimates captures the population characteristics represented in the sample. In some situations, however, this implicit assumption may not be realistic, and the sample may contain several subpopulations such as high math achievers and low math achievers. In these cases, conventional regression models may provide biased estimates since the parameter estimates are constrained to be the same across subpopulations. This paper advocates the applications of regression mixture models, also known as latent class regression analysis, in educational research. Regression mixture analysis is more flexible than conventional regression analysis in that latent classes in the data can be identified and regression parameter estimates can vary within each latent class. An illustration of regression mixture analysis is provided based on a dataset of authentic data. The strengths and limitations of the regression mixture models are discussed in the context of educational research.
Applied regression analysis: a research tool
Pantula, Sastry; Dickey, David
1998-01-01
Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...
Regression Analysis and the Sociological Imagination
De Maio, Fernando
2014-01-01
Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.
Relative risk regression analysis of epidemiologic data.
Prentice, R L
1985-11-01
Relative risk regression methods are described. These methods provide a unified approach to a range of data analysis problems in environmental risk assessment and in the study of disease risk factors more generally. Relative risk regression methods are most readily viewed as an outgrowth of Cox's regression and life model. They can also be viewed as a regression generalization of more classical epidemiologic procedures, such as that due to Mantel and Haenszel. In the context of an epidemiologic cohort study, relative risk regression methods extend conventional survival data methods and binary response (e.g., logistic) regression models by taking explicit account of the time to disease occurrence while allowing arbitrary baseline disease rates, general censorship, and time-varying risk factors. This latter feature is particularly relevant to many environmental risk assessment problems wherein one wishes to relate disease rates at a particular point in time to aspects of a preceding risk factor history. Relative risk regression methods also adapt readily to time-matched case-control studies and to certain less standard designs. The uses of relative risk regression methods are illustrated and the state of development of these procedures is discussed. It is argued that asymptotic partial likelihood estimation techniques are now well developed in the important special case in which the disease rates of interest have interpretations as counting process intensity functions. Estimation of relative risks processes corresponding to disease rates falling outside this class has, however, received limited attention. The general area of relative risk regression model criticism has, as yet, not been thoroughly studied, though a number of statistical groups are studying such features as tests of fit, residuals, diagnostics and graphical procedures. Most such studies have been restricted to exponential form relative risks as have simulation studies of relative risk estimation
Spinocerebellar ataxia type 2 presenting with cognitive regression in childhood.
Ramocki, Melissa B; Chapieski, Lynn; McDonald, Ryan O; Fernandez, Fabio; Malphrus, Amy D
2008-09-01
Spinocerebellar ataxia type 2 typically presents in adulthood with progressive ataxia, dysarthria, tremor, and slow saccadic eye movements. Childhood-onset spinocerebellar ataxia type 2 is rare, and only the infantile-onset form has been well characterized clinically. This article describes a girl who met all developmental milestones until age 3 1/2 years, when she experienced cognitive regression that preceded motor regression by 6 months. A diagnosis of spinocerebellar ataxia type 2 was delayed until she presented to the emergency department at age 7 years. This report documents the results of her neuropsychologic evaluation at both time points. This case broadens the spectrum of spinocerebellar ataxia type 2 presentation in childhood, highlights the importance of considering a spinocerebellar ataxia in a child who presents with cognitive regression only, and extends currently available clinical information to help clinicians discuss the prognosis in childhood spinocerebellar ataxia type 2.
Xin Fang
2016-11-01
The epidemiological evidence for a dose-response relationship between magnesium intake and risk of type 2 diabetes mellitus (T2D) is sparse. The aim of the study was to summarize the evidence for the association of dietary magnesium intake with risk of T2D and evaluate the dose-response relationship. We conducted a systematic review and meta-analysis of prospective cohort studies that reported dietary magnesium intake and risk of incident T2D. We identified relevant studies by searching major scientific literature databases and grey literature resources from their inception to February 2016. We included cohort studies that provided risk ratios, i.e., relative risks (RRs), odds ratios (ORs) or hazard ratios (HRs), for T2D. Linear dose-response relationships were assessed using random-effects meta-regression. Potential nonlinear associations were evaluated using restricted cubic splines. A total of 25 studies met the eligibility criteria. These studies comprised 637,922 individuals including 26,828 with a T2D diagnosis. Compared with the lowest magnesium consumption group in the population, the risk of T2D was reduced by 17% across all the studies; 19% in women and 16% in men. A statistically significant linear dose-response relationship was found between incremental magnesium intake and T2D risk. After adjusting for age and body mass index, the risk of T2D incidence was reduced by 8%–13% per 100 mg/day increment in dietary magnesium intake. There was no evidence to support a nonlinear dose-response relationship between dietary magnesium intake and T2D risk. The combined data supports a role for magnesium in reducing risk of T2D, with a statistically significant linear dose-response pattern within the reference dose range of dietary intake among Asian and US populations. The evidence from Europe and black people is limited and more prospective studies are needed for the two subgroups.
Kauhl, Boris; Pieper, Jonas; Schweikart, Jürgen; Keste, Andrea; Moskwyn, Marita
2017-02-16
Understanding which population groups in which locations are at higher risk for type 2 diabetes mellitus (T2DM) allows efficient and cost-effective interventions targeting these risk-populations in great need in specific locations. The goal of this study was to analyze the spatial distribution of T2DM and to identify the location-specific, population-based risk factors using global and local spatial regression models. To display the spatial heterogeneity of T2DM, bivariate kernel density estimation was applied. An ordinary least squares regression model (OLS) was applied to identify population-based risk factors of T2DM. A geographically weighted regression model (GWR) was then constructed to analyze the spatially varying association between the identified risk factors and T2DM. T2DM is especially concentrated in the east and outskirts of Berlin. The OLS model identified proportions of persons aged 80 and older, persons without migration background, long-term unemployment, households with children and a negative association with single-parenting households as socio-demographic risk groups. The results of the GWR model point out important local variations of the strength of association between the identified risk factors and T2DM. The risk factors for T2DM depend largely on the socio-demographic composition of the neighborhoods in Berlin and highlight that a one-size-fits-all approach is not appropriate for the prevention of T2DM. Future prevention strategies should be tailored to target location-specific risk-groups. © Georg Thieme Verlag KG Stuttgart · New York.
Robust Mediation Analysis Based on Median Regression
Yuan, Ying; MacKinnon, David P.
2014-01-01
Mediation analysis has many applications in psychology and the social sciences. The most prevalent methods typically assume that the error distribution is normal and homoscedastic. However, this assumption may rarely be met in practice, which can affect the validity of the mediation analysis. To address this problem, we propose robust mediation analysis based on median regression. Our approach is robust to various departures from the assumption of homoscedasticity and normality, including heavy-tailed, skewed, contaminated, and heteroscedastic distributions. Simulation studies show that under these circumstances, the proposed method is more efficient and powerful than standard mediation analysis. We further extend the proposed robust method to multilevel mediation analysis, and demonstrate through simulation studies that the new approach outperforms the standard multilevel mediation analysis. We illustrate the proposed method using data from a program designed to increase reemployment and enhance mental health of job seekers. PMID:24079925
Functional data analysis of generalized regression quantiles
Guo, Mengmeng
2013-11-05
Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.
Credit Scoring Problem Based on Regression Analysis
Khassawneh, Bashar Suhil Jad Allah
2014-01-01
ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....
Takagi, Daisuke; Ikeda, Ken'ichi; Kawachi, Ichiro
2012-11-01
Crime is an important determinant of public health outcomes, including quality of life, mental well-being, and health behavior. A body of research has documented the association between community social capital and crime victimization. The association between social capital and crime victimization has been examined at multiple levels of spatial aggregation, ranging from entire countries, to states, metropolitan areas, counties, and neighborhoods. In multilevel analysis, the spatial boundaries at level 2 are most often drawn from administrative boundaries (e.g., Census tracts in the U.S.). One problem with adopting administrative definitions of neighborhoods is that it ignores spatial spillover. We conducted a study of social capital and crime victimization in one ward of Tokyo city, using a spatial Durbin model with an inverse-distance weighting matrix that assigned each respondent a unique level of "exposure" to social capital based on all other residents' perceptions. The study is based on a postal questionnaire sent to residents of Arakawa Ward, Tokyo, aged 20-69 years. The response rate was 43.7%. We examined the contextual influence of generalized trust, perceptions of reciprocity, two types of social network variables, as well as two principal components of social capital (constructed from the above four variables). Our outcome measure was self-reported crime victimization in the last five years. In the spatial Durbin model, we found that neighborhood generalized trust, reciprocity, supportive networks and two principal components of social capital were each inversely associated with crime victimization. By contrast, a multilevel regression performed with the same data (using administrative neighborhood boundaries) found generally null associations between neighborhood social capital and crime. Spatial regression methods may be more appropriate for investigating the contextual influence of social capital in homogeneous cultural settings such as Japan.
Katakami, Naoto; Shiraiwa, Toshihiko; Yoshii, Hidenori; Gosho, Masahiko; Shimomura, Iichiro; Watada, Hirotaka
2017-01-01
Background. The effect of dipeptidyl peptidase-4 (DPP-4) inhibitors on the regression of carotid IMT remains largely unknown. The present study aimed to clarify whether sitagliptin, a DPP-4 inhibitor, could regress carotid intima-media thickness (IMT) in insulin-treated patients with type 2 diabetes mellitus (T2DM). Methods. This is an exploratory analysis of a randomized trial in which we investigated the effect of sitagliptin on the progression of carotid IMT in insulin-treated patients with T2DM. Here, we compared the efficacy of sitagliptin treatment on the number of patients who showed regression of carotid IMT of ≥0.10 mm in a post hoc analysis. Results. The percentages of patients who showed regression of mean-IMT-CCA (28.9% in the sitagliptin group versus 16.4% in the conventional group, P = 0.022) and left max-IMT-CCA (43.0% in the sitagliptin group versus 26.2% in the conventional group, P = 0.007), but not right max-IMT-CCA, were higher in the sitagliptin treatment group compared with those in the non-DPP-4 inhibitor treatment group. In multiple logistic regression analysis, sitagliptin treatment significantly achieved higher target attainment of mean-IMT-CCA ≥0.10 mm and right and left max-IMT-CCA ≥0.10 mm compared to conventional treatment. Conclusions. Our data suggested that DPP-4 inhibitors were associated with the regression of carotid atherosclerosis in insulin-treated T2DM patients. This study has been registered with the University Hospital Medical Information Network Clinical Trials Registry (UMIN000007396). PMID:28250768
Remaining Phosphorus Estimate Through Multiple Regression Analysis
M. E. ALVES; A. LAVORENTI
2006-01-01
The remaining phosphorus (Prem), the P concentration that remains in solution after shaking soil with 0.01 mol L-1 CaCl2 containing 60 μg mL-1 P, is a very useful index for studies related to the chemistry of variable charge soils. Although the Prem determination is a simple procedure, the possibility of estimating accurate values of this index from easily and/or routinely determined soil properties can be very useful for practical purposes. The present research evaluated the Prem estimation through multiple regression analysis in which routinely determined soil chemical data, soil clay content and soil pH measured in 1 mol L-1 NaF (pHNaF) figured as Prem predictor variables. The Prem can be estimated with acceptable accuracy using the above-mentioned approach, and pHNaF not only substitutes for clay content as a predictor variable but also confers more accuracy to the Prem estimates.
Common pitfalls in statistical analysis: Linear regression analysis.
Aggarwal, Rakesh; Ranganathan, Priya
2017-01-01
In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis.
Sliced Inverse Regression for Time Series Analysis
Chen, Li-Sue
1995-11-01
In this thesis, general nonlinear models for time series data are considered. A basic form is $x_t = f(\beta_1^T X_{t-1}, \beta_2^T X_{t-1}, \ldots, \beta_k^T X_{t-1}, \varepsilon_t)$, where $x_t$ is an observed time series, $X_t$ is the first d time lag vector, $(x_t, x_{t-1}, \ldots, x_{t-d-1})$, f is an unknown function, the $\beta_i$'s are unknown vectors, and the $\varepsilon_t$'s are independently distributed. Special cases include AR and TAR models. We investigate the feasibility of applying SIR/PHD (Li 1990, 1991) (the sliced inverse regression and principal Hessian methods) in estimating the $\beta_i$'s. PCA (principal component analysis) is brought in to check one critical condition for SIR/PHD. Through simulation and a study on three well-known data sets (Canadian lynx, U.S. unemployment rate and sunspot numbers), we demonstrate how SIR/PHD can effectively retrieve the interesting low-dimensional structures for time series data.
Factors associated with remission and/or regression of microalbuminuria in type 2 diabetes mellitus.
Ono, Tetsuichiro; Shikata, Kenichi; Obika, Mikako; Miyatake, Nobuyuki; Kodera, Ryo; Hirota, Daisyo; Wada, Jun; Kataoka, Hitomi; Ogawa, Daisuke; Makino, Hirofumi
2014-01-01
The aim of this study was to clarify the factors associated with the remission and/or regression of microalbuminuria in Japanese patients with type 2 diabetes mellitus. We retrospectively analyzed the data of 130 patients with type 2 diabetes mellitus with microalbuminuria for 2-6 years (3.39±1.31 years). Remission was defined as improving from microalbuminuria to normoalbuminuria using the albumin/creatinine ratio (ACR), and regression of microalbuminuria was defined as a decrease in ACR of 50% or more from baseline. Progression of microalbuminuria was defined as progressing from microalbuminuria to overt proteinuria during the follow-up period. Among 130 patients with type 2 diabetes mellitus with microalbuminuria, 57 and 13 patients were defined as having remission and regression, respectively, while 26 patients progressed to overt proteinuria. Sex (female), higher HDL cholesterol and lower HbA1c were determinant factors associated with remission/regression of microalbuminuria by logistic regression analysis. Lower systolic blood pressure (SBP) was also correlated with remission/regression, but not at a significant level. These results suggest that proper control of blood glucose, BP and lipid profiles may be associated with remission and/or regression of type 2 diabetes mellitus with microalbuminuria in clinical practice.
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Stability Analysis for Regularized Least Squares Regression
Rudin, Cynthia
2005-01-01
We discuss stability for a class of learning algorithms with respect to noisy labels. The algorithms we consider are for regression, and they involve the minimization of regularized risk functionals, such as $L(f) := \frac{1}{N} \sum_i (f(x_i) - y_i)^2 + \lambda \|f\|_H^2$. We shall call the algorithm 'stable' if, when $y_i$ is a noisy version of $f^*(x_i)$ for some function $f^*$ in H, the output of the algorithm converges to $f^*$ as the regularization term and noise simultaneously vanish. We consider two flavors of...
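The regularized risk functional quoted in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that f is linear, f(x) = w·x, and that the H-norm is the Euclidean norm of w; the abstract itself concerns a general function space H. The function name `ridge_fit` and the synthetic data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize (1/N) * sum((X @ w - y)**2) + lam * ||w||^2 over w."""
    n, d = X.shape
    # Setting the gradient to zero gives (X^T X + N * lam * I) w = X^T y.
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

# Synthetic, noise-free data from a known linear function (assumed example).
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y_clean = X @ w_true

w_noiseless = ridge_fit(X, y_clean, lam=0.0)   # no noise, no penalty
w_shrunk = ridge_fit(X, y_clean, lam=10.0)     # heavy penalty shrinks w
```

With no noise and no penalty the fit recovers the generating weights, while a large penalty pulls the estimate toward zero; this is the trade-off whose limit the stability notion above describes.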
Study of Mechanical Properties of Wool Type Fabrics using ANCOVA Regression Model
Hristian, L.; Ostafe, M. M.; Manea, L. R.; Apostol, L. L.
2017-06-01
This work studies the variation of tensile strength across four groups of wool-type fabrics as a function of the fiber composition, the tensile strength of the warp yarns and the technological density of the weft yarns, using an ANCOVA regression model. ANCOVA checks the correlation between a dependent variable and the covariate independent variables and removes the variability from the dependent variable that can be accounted for by the covariates. Analysis of covariance models combine analysis of variance with regression analysis techniques. Regarding design, ANCOVA models explain the dependent variable by combining categorical (qualitative) independent variables with continuous (quantitative) variables. There are special extensions to ANCOVA calculations to estimate parameters for both categorical and continuous variables. However, ANCOVA models can also be calculated by multiple regression analysis, using a design matrix with a mix of dummy-coded qualitative variables and quantitative variables.
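The closing remark of this abstract, that an ANCOVA model can be fitted as a multiple regression on a dummy-coded design matrix, can be sketched as follows. The group labels, covariate values and coefficients are made-up illustrative numbers, not data from the study, and `ancova_design` is a hypothetical helper name.

```python
import numpy as np

def ancova_design(groups, covariate):
    """Build [intercept | dummies (first level is the reference) | covariate]."""
    levels = sorted(set(groups))
    cols = [np.ones(len(groups))]
    for lev in levels[1:]:
        cols.append(np.array([1.0 if g == lev else 0.0 for g in groups]))
    cols.append(np.asarray(covariate, dtype=float))
    return np.column_stack(cols), levels

# Synthetic example: three fabric groups, one continuous covariate.
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
x = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3], dtype=float)
offset = {"A": 0.0, "B": 2.0, "C": 5.0}          # assumed group effects
y = np.array([10.0 + offset[g] + 1.5 * xi for g, xi in zip(groups, x)])

D, levels = ancova_design(groups, x)
coef, *_ = np.linalg.lstsq(D, y, rcond=None)     # ordinary least squares
```

Here `coef` holds the intercept, the two group contrasts relative to level "A", and the common covariate slope, in that order.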
Epistasis analysis for quantitative traits by functional regression model.
Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao
2014-06-01
The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10^{-10}) in the ESP, and 11 were replicated in the CHARGE-S study.
Hecht, Jeffrey B.
The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…
Regression Commonality Analysis: A Technique for Quantitative Theory Building
Nimon, Kim; Reio, Thomas G., Jr.
2011-01-01
When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…
General Nature of Multicollinearity in Multiple Regression Analysis.
Liu, Richard
1981-01-01
Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)
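One common way to quantify the multicollinearity discussed in this abstract is the variance inflation factor (VIF). The sketch below is an illustration chosen here, not a method taken from the article; it uses the identity that the VIFs are the diagonal entries of the inverse of the predictor correlation matrix, and the simulated predictors are assumptions.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), read off the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Simulated predictors: x3 is almost a copy of x1, x2 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.05 * rng.normal(size=500)
vif = variance_inflation_factors(np.column_stack([x1, x2, x3]))
```

The nearly collinear pair (x1, x3) receives very large VIFs, while the unrelated predictor x2 stays close to 1.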
Predicting Nigeria budget allocation using regression analysis: A data mining approach
Budget is used by the Government as a guiding tool for planning and management of its resources to aid in ...
Polygraph Test Results Assessment by Regression Analysis Methods
K. A. Leontiev
2014-01-01
The paper considers the problem of defining the importance of the questions asked to the examinee in judicial and psychophysiological polygraph examination by methods of mathematical statistics. It offers a classification algorithm based on logistic regression as an optimum Bayesian classifier, considering weight coefficients of information for the polygraph-recorded physiological parameters with no condition for independence of the measured signs. Binary classification is performed on the results of the polygraph examination with preliminary normalization and standardization of primary results, with a check of the hypothesis that the distribution of the obtained data is normal, and with calculation of coefficients of linear regression between input values and responses by the method of maximum likelihood. Further, the logistic curve divides signs into two classes, "significant" and "insignificant". The efficiency of the model is estimated by means of ROC (Receiver Operator Characteristics) analysis. It is shown that the necessary minimum sample has to contain the results of at least 45 measurements. This approach ensures a reliable result provided that an expert polygraphologist possesses sufficient qualification and follows testing techniques.
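The ROC analysis mentioned in this abstract is often summarized by the area under the curve (AUC). The following is a minimal sketch of that summary only, assuming binary labels (1 = "significant", 0 = "insignificant") and made-up classifier scores; it is not the authors' procedure.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores from a fitted logistic model.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

In this toy example eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9.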
An Original Stepwise Multilevel Logistic Regression Analysis of Discriminatory Accuracy
Merlo, Juan; Wagner, Philippe; Ghith, Nermin
2016-01-01
BACKGROUND AND AIM: Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of associations (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach that disting...
3D Regression Heat Map Analysis of Population Study Data.
Klemm, Paul; Lawonn, Kai; Glaßer, Sylvia; Niemann, Uli; Hegenscheid, Katrin; Völzke, Henry; Preim, Bernhard
2016-01-01
Epidemiological studies comprise heterogeneous data about a subject group to define disease-specific risk factors. These data contain information (features) about a subject's lifestyle, medical status as well as medical image data. Statistical regression analysis is used to evaluate these features and to identify feature combinations indicating a disease (the target feature). We propose an analysis approach of epidemiological data sets by incorporating all features in an exhaustive regression-based analysis. This approach combines all independent features w.r.t. a target feature. It provides a visualization that reveals insights into the data by highlighting relationships. The 3D Regression Heat Map, a novel 3D visual encoding, acts as an overview of the whole data set. It shows all combinations of two to three independent features with a specific target disease. Slicing through the 3D Regression Heat Map allows for the detailed analysis of the underlying relationships. Expert knowledge about disease-specific hypotheses can be included into the analysis by adjusting the regression model formulas. Furthermore, the influences of features can be assessed using a difference view comparing different calculation results. We applied our 3D Regression Heat Map method to a hepatic steatosis data set to reproduce results from a data mining-driven analysis. A qualitative analysis was conducted on a breast density data set. We were able to derive new hypotheses about relations between breast density and breast lesions with breast cancer. With the 3D Regression Heat Map, we present a visual overview of epidemiological data that allows for the first time an interactive regression-based analysis of large feature sets with respect to a disease.
Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis
Nielsen, Allan Aasbjerg
2007-01-01
This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy, including land surveying and satellite positioning applications. In these fields regression is often termed adjustment. The note also contains a couple of typical land surveying and satellite positioning application examples. In these application areas we are typically interested in the parameters in the model, typically 2- or 3-D positions, and not in predictive modelling, which is often the main concern in other regression analysis applications. Adjustment is often used to obtain...
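A linear weighted least squares adjustment of the kind this note describes can be sketched in a few lines. The notation (design matrix A, weights equal to inverse observation variances) follows common usage and is an assumption here, as are the two example observations; the note's own examples are not reproduced.

```python
import numpy as np

def weighted_least_squares(A, obs, sigma):
    """Solve (A^T P A) x = A^T P obs with weights P = diag(1 / sigma^2)."""
    P = np.diag(1.0 / np.asarray(sigma, dtype=float) ** 2)
    N = A.T @ P @ A                      # normal-equation matrix
    x_hat = np.linalg.solve(N, A.T @ P @ obs)
    cov = np.linalg.inv(N)               # covariance of the estimate
    return x_hat, cov

# Two measurements of the same quantity with different precision.
A = np.array([[1.0], [1.0]])
obs = np.array([10.0, 12.0])
x_hat, cov = weighted_least_squares(A, obs, sigma=[1.0, 2.0])
```

With standard deviations 1 and 2 the weights are 1 and 0.25, so the adjusted value is the weighted mean (10 + 0.25 × 12) / 1.25 = 10.4, pulled toward the more precise observation.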
Ahn, Kuk-Hyun; Palmer, Richard
2016-09-01
Despite wide use of regression-based regional flood frequency analysis (RFFA) methods, the majority are based on either ordinary least squares (OLS) or generalized least squares (GLS). This paper proposes 'spatial proximity' based RFFA methods using the spatial lagged model (SLM) and spatial error model (SEM). The proposed methods are represented by two frameworks: the quantile regression technique (QRT) and parameter regression technique (PRT). The QRT develops prediction equations for flooding quantiles in average recurrence intervals (ARIs) of 2, 5, 10, 20, and 100 years whereas the PRT provides prediction of three parameters for the selected distribution. The proposed methods are tested using data incorporating 30 basin characteristics from 237 basins in Northeastern United States. Results show that generalized extreme value (GEV) distribution properly represents flood frequencies in the study gages. Also, basin area, stream network, and precipitation seasonality are found to be the most effective explanatory variables in prediction modeling by the QRT and PRT. 'Spatial proximity' based RFFA methods provide reliable flood quantile estimates compared to simpler methods. Compared to the QRT, the PRT may be recommended due to its accuracy and computational simplicity. The results presented in this paper may serve as one possible guidepost for hydrologists interested in flood analysis at ungaged sites.
Projection-type estimation for varying coefficient regression models
Lee, Young K; Park, Byeong U; 10.3150/10-BEJ331
2012-01-01
In this paper we introduce new estimators of the coefficient functions in the varying coefficient regression model. The proposed estimators are obtained by projecting the vector of the full-dimensional kernel-weighted local polynomial estimators of the coefficient functions onto a Hilbert space with a suitable norm. We provide a backfitting algorithm to compute the estimators. We show that the algorithm converges at a geometric rate under weak conditions. We derive the asymptotic distributions of the estimators and show that the estimators have the oracle properties. This is done for the general order of local polynomial fitting and for the estimation of the derivatives of the coefficient functions, as well as the coefficient functions themselves. The estimators turn out to have several theoretical and numerical advantages over the marginal integration estimators studied by Yang, Park, Xue and Härdle [J. Amer. Statist. Assoc. 101 (2006) 1212-1227].
Research and analyze of physical health using multiple regression analysis
T. S. Kyi
2014-01-01
This paper presents research that attempts to build a mathematical model of "healthy people" using regression analysis. The factors are physical parameters of the person (such as heart rate, lung capacity, blood pressure, breath holding, weight-height coefficient, flexibility of the spine, muscles of the shoulder belt, abdominal muscles, squatting, etc.), and the response variable is an indicator of physical working capacity. Multiple regression analysis yielded useful models that can predict the physical performance of boys aged fourteen to seventeen years. This paper presents the development of the regression model for sixteen-year-old boys and analyzes the results.
Regression Model Optimization for the Analysis of Experimental Data
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
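The search metric named in this abstract, the standard deviation of the PRESS residuals, can be sketched as follows. This is an illustrative reading, not the Ames algorithm: it uses the standard hat-matrix shortcut for leave-one-out residuals, takes the sample standard deviation (ddof=1) as an assumption, and compares just two hypothetical candidate models on synthetic data.

```python
import numpy as np

def press_residuals(X, y):
    """Leave-one-out prediction residuals via the shortcut e_i / (1 - h_ii)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    H = X @ np.linalg.pinv(X)            # hat matrix
    return resid / (1.0 - np.diag(H))

def press_std(X, y):
    """Search metric: sample standard deviation of the PRESS residuals."""
    return press_residuals(X, y).std(ddof=1)

# Synthetic response that is truly quadratic in x, with small noise.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 + 2.0 * x + 3.0 * x**2 + 0.1 * rng.normal(size=x.size)

X_lin = np.column_stack([np.ones_like(x), x])
X_quad = np.column_stack([np.ones_like(x), x, x**2])
metric_lin = press_std(X_lin, y)
metric_quad = press_std(X_quad, y)
```

A search that minimizes this metric would prefer the quadratic candidate here, since omitting the x² term inflates the leave-one-out prediction errors.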
Adjusting for Cell Type Composition in DNA Methylation Data Using a Regression-Based Approach.
Jones, Meaghan J; Islam, Sumaiya A; Edgar, Rachel D; Kobor, Michael S
2017-01-01
Analysis of DNA methylation in a population context has the potential to uncover novel gene and environment interactions as well as markers of health and disease. In order to find such associations it is important to control for factors which may mask or alter DNA methylation signatures. Since tissue of origin and coinciding cell type composition are major contributors to DNA methylation patterns, and can easily confound important findings, it is vital to adjust DNA methylation data for such differences across individuals. Here we describe the use of a regression method to adjust for cell type composition in DNA methylation data. We specifically discuss what information is required to adjust for cell type composition and then provide detailed instructions on how to perform cell type adjustment on high dimensional DNA methylation data. This method has been applied mainly to Illumina 450K data, but can also be adapted to pyrosequencing or genome-wide bisulfite sequencing data.
Simulation Experiments in Practice : Statistical Design and Regression Analysis
Kleijnen, J.P.C.
2007-01-01
In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. Statistical theory proves that more information is obtained when applying Design Of Experiments (DOE) and linear regression analysis. Unfortunately, classic t
Evaluation Applications of Regression Analysis with Time-Series Data.
Veney, James E.
1993-01-01
The application of time series analysis is described, focusing on the use of regression analysis for analyzing time series in a way that may make it more readily available to an evaluation practice audience. Practical guidelines are suggested for decision makers in government, health, and social welfare agencies. (SLD)
The Analysis of the Regression-Discontinuity Design in R
Thoemmes, Felix; Liao, Wang; Jin, Ze
2017-01-01
This article describes the analysis of regression-discontinuity designs (RDDs) using the R packages rdd, rdrobust, and rddtools. We discuss similarities and differences between these packages and provide directions on how to use them effectively. We use real data from the Carolina Abecedarian Project to show how an analysis of an RDD can be…
Online Statistical Modeling (Regression Analysis) for Independent Responses
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Nowadays, statistical models have been developed in various directions to model various types of data and complex relationships. A rich variety of advanced and recent statistical modelling methods is mostly available in open source software (one of them is R). However, these advanced methods are not very friendly to novice R users, since they are based on programming scripts or a command line interface. Our research aims to develop a web interface (based on R and shiny), so that the most recent and advanced statistical modelling methods are readily available, accessible and applicable on the web. We have previously made interfaces in the form of e-tutorials for several modern and advanced statistical modelling methods in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including models using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web interface makes statistical modelling easier to apply and makes models easier to compare in order to find the most appropriate model for the data.
董洪峰
2013-01-01
[Objective] To explore the risk factors of type 2 diabetes among residents in the Tengzhou urban area and to provide a basis for developing intervention strategies and measures. [Methods] A 1:1 frequency-matched case-control study was conducted from May to October 2011 in 96 cases of type 2 diabetes diagnosed at Tengzhou Central People's Hospital. [Results] In the multivariate non-conditional logistic regression analysis, five factors finally entered the model: family history of diabetes, waist-to-hip ratio > 0.9, hyperlipidemia, hypertension and physical exercise, with OR values of 4.31, 3.77, 3.54, 1.65 and 0.51, respectively. [Conclusion] People with a family history of diabetes, waist-to-hip ratio > 0.9, hyperlipidemia and hypertension are prone to type 2 diabetes, and regular physical exercise is a protective factor.
Joint regression analysis and AMMI model applied to oat improvement
Oliveira, A.; Oliveira, T. A.; Mejza, S.
2012-09-01
In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena Sativa L.) grain yield was carried out using data of the Portuguese Plant Breeding Board, sample of the 22 different genotypes during the years 2002, 2003 and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of the regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model, to study and to estimate phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae-package available in R software for AMMI model analysis.
高晓虹; 宋桂荣; 辛萍; 马莉; 闻红; 张萍; 高政南; 宋光华
2005-01-01
BACKGROUND: Diabetes mellitus is a chronic metabolic disease caused by various factors, such as environmental and inherited factors, and its cause is not yet clear. This study aims to investigate these risk factors in the onset of diabetes mellitus, and is of significance for the primary and secondary prevention of diabetes mellitus. OBJECTIVE: To investigate the risk factors of type 2 diabetes mellitus to provide evidence for proper intervention. DESIGN: Cross-sectional study based on diagnosis. SETTING: Department of epidemiology in a university and department of endocrinology in a university hospital. PARTICIPANTS: The subjects were residents who had lived in the urban and rural areas of Dalian for more than 5 years and were older than 40 years. Stratified cluster random sampling was carried out among natural persons, who were divided into two groups according to city and country. In total 2 500 persons were sampled, 1 250 from each group. The subjects who had been diagnosed with diabetes mellitus were included. METHODS: A questionnaire survey was administered to all subjects, together with physical examinations such as height, weight, waist girth, hip girth, blood pressure and blood glucose. Patients with type 2 diabetes mellitus were taken as the case group and normal subjects as the control group. Unconditional univariate and multivariate logistic regression were used. MAIN OUTCOME MEASURES: Unconditional logistic regression analysis of risk factors of type 2 diabetes mellitus with single factor analysis and multivariate analysis. RESULTS: Family history of diabetes mellitus (OR = 2.339), obesity [body mass index (BMI), OR = 1.462], systolic pressure (OR = 1.016), hyperlipidemia (OR = 1.615) and age (OR = 1.043) were the major risk factors for type 2 diabetes mellitus. CONCLUSION: Family history of diabetes mellitus, increased systolic pressure, obesity, high blood lipids and age are risk factors for type 2 diabetes
Ratio Versus Regression Analysis: Some Empirical Evidence in Brazil
Newton Carneiro Affonso da Costa Jr.
2004-06-01
This work compares the traditional methodology for ratio analysis, applied to a sample of Brazilian firms, with the alternative of regression analysis, for both cross-industry and intra-industry samples. The structural validity of the traditional methodology was tested through a model that represents its analogous regression format. The data are from 156 Brazilian public companies in nine industrial sectors for the year 1997. The results provide weak empirical support for the traditional ratio methodology, as the validity of this methodology was found to differ between ratios.
Time series analysis using semiparametric regression on oil palm production
Yundari, Pasaribu, U. S.; Mukhaiyar, U.
2016-04-01
This paper presents a semiparametric kernel regression method, which has shown flexibility and ease of mathematical calculation, especially in estimating density and regression functions. The kernel function is continuous and produces a smooth estimate. The classical kernel density estimator is constructed by completely nonparametric analysis and works reasonably well for all forms of function. Here, we discuss parameter estimation in time series analysis. First, we assume that the parameters exist; then we use nonparametric estimation, and the combined approach is called semiparametric. The optimum bandwidth is selected by considering an approximation of the Mean Integrated Squared Error (MISE).
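The nonparametric building block behind kernel regression can be sketched with a Nadaraya-Watson estimator. This shows only that smoothing step, with a Gaussian kernel and a hand-picked bandwidth as assumptions; the semiparametric time series model and the MISE-based bandwidth selection of the paper are not reproduced.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """Kernel-weighted average of y_train around each query point."""
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    u = (x_query[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)              # Gaussian kernel weights
    return (w @ y_train) / w.sum(axis=1)

# Noise-free samples of a smooth curve (assumed example data).
x = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(x)
estimate = nadaraya_watson(x, y, [np.pi / 2], bandwidth=0.2)
```

The estimate at the peak is slightly below the true value of 1 because averaging over a neighbourhood flattens curvature; a smaller bandwidth reduces this bias at the cost of a noisier estimate on real data.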
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.
Sparse Regression by Projection and Sparse Discriminant Analysis
Qi, Xin
2015-04-03
© 2015, © American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.
Regression analysis for solving diagnosis problem of children's health
Cherkashina, Yu A.; Gerget, O. M.
2016-04-01
This paper reports the application of statistical techniques, namely regression analysis, to assessing the health status of children in the neonatal period from medical data (hemostatic parameters, blood test parameters, gestational age, and vascular endothelial growth factor) measured at 3-5 days of life. A detailed description of the medical data is given, and a binary logistic regression procedure is discussed. The main results are presented: a classification table of predicted versus observed values is shown, and the overall percentage of correct classification is determined. The regression coefficients are calculated and the general regression equation is written from them. Based on the logistic regression results, ROC analysis was performed: the sensitivity and specificity of the model are calculated and ROC curves are constructed. These techniques allow children's health to be diagnosed with a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have high practical importance in the professional activity of the author.
Visual category recognition using Spectral Regression and Kernel Discriminant Analysis
Tahir, M.A.; Kittler, J.; Mikolajczyk, K.; Yan, F.; van de Sande, K.E.A.; Gevers, T.
2009-01-01
Visual category recognition (VCR) is one of the most important tasks in image and video indexing. Spectral methods have recently emerged as a powerful tool for dimensionality reduction and manifold learning. Recently, Spectral Regression combined with Kernel Discriminant Analysis (SR-KDA) has been s
M. Srinivasan
2012-01-01
Problem statement: This study presents a novel method for determining the average winding temperature rise of transformers under predetermined field operating conditions. The rise in winding temperature was determined from the estimated values of winding resistance during the heat run test conducted as per the IEC standard. Approach: The estimation of hot resistance was modeled using Multiple Variable Regression (MVR), Multiple Polynomial Regression (MPR), and soft computing techniques such as Artificial Neural Network (ANN) and Adaptive Neuro Fuzzy Inference System (ANFIS). The modeled hot resistance helps to find the load losses at any load situation without a complicated measurement setup in transformers. Results: These techniques were applied to hot resistance estimation for a dry-type transformer using the input variables cold resistance, ambient temperature, and temperature rise. The results are compared and show good agreement between measured and computed values. Conclusion: The proposed methods are verified using experimental results obtained from a temperature rise test performed on a 55 kVA dry-type transformer.
Liu, Xiang; Saat, M Rapik; Qin, Xiao; Barkan, Christopher P L
2013-10-01
Derailments are the most common type of freight-train accidents in the United States. Derailments cause damage to infrastructure and rolling stock, disrupt services, and may cause casualties and harm the environment. Accordingly, derailment analysis and prevention has long been a high priority in the rail industry and government. Despite the low probability of a train derailment, the potential for severe consequences justifies the need to better understand the factors influencing train derailment severity. In this paper, a zero-truncated negative binomial (ZTNB) regression model is developed to estimate the conditional mean of train derailment severity. Recognizing that the mean is not the only statistic describing a data distribution, a quantile regression (QR) model is also developed to estimate derailment severity at different quantiles. The two regression models together provide a better understanding of the train derailment severity distribution. Results of this work can be used to estimate train derailment severity under various operational conditions and by different accident causes. This research is intended to provide insights regarding the development of cost-efficient train safety policies. Copyright © 2013 Elsevier Ltd. All rights reserved.
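The difference between modelling the mean and modelling a quantile can be illustrated with the check (pinball) loss that quantile regression minimizes. The following is only a minimal sketch of the intercept-only case, not the model from the paper; the function names and the severity values are made up for illustration.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss for quantile level tau, with residual = observed - fitted."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def constant_quantile_fit(values, tau):
    """Intercept-only quantile regression.

    The total loss is piecewise linear and convex in the constant, so a
    minimizer can always be found among the observed values themselves.
    """
    return min(values, key=lambda q: sum(pinball_loss(v - q, tau) for v in values))


# Hypothetical severity values (illustrative only, not data from the paper).
severity = [1, 2, 3, 4, 100]

q50 = constant_quantile_fit(severity, 0.5)   # median-type summary
q90 = constant_quantile_fit(severity, 0.9)   # upper-tail summary
mean_severity = sum(severity) / len(severity)
```

With a heavily skewed sample such as this one, the mean is pulled far above the median, while the upper quantile isolates the severe tail, which is the motivation the abstract gives for pairing a conditional-mean model with a quantile model.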
On asymptotics of t-type regression estimation in multiple linear model
None listed
2004-01-01
We consider a robust estimator (t-type regression estimator) of the multiple linear regression model obtained by maximizing the marginal likelihood of a scaled t-type error t-distribution. The marginal likelihood can also be applied to the de-correlated response when the within-subject correlation can be consistently estimated from an initial estimate of the model based on the independent working assumption. This paper shows that such a t-type estimator is consistent.
Graphical evaluation of the ridge-type robust regression estimators in mixture experiments.
Erkoc, Ali; Emiroglu, Esra; Akay, Kadri Ulas
2014-01-01
In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, effects due to the combined outlier-multicollinearity problem can be reduced to a certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in cases where multicollinearity and outliers are present during the analysis of mixture experiments. Also, for selection of the biasing parameter, we use fraction of design space plots to evaluate the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on the Hald cement data set.
MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM
Erika KULCSÁR
2009-12-01
This paper analyzes the relationship between GDP in the hotels and restaurants sector (the dependent variable) and the following independent variables: overnight stays in tourist accommodation establishments, arrivals in tourist accommodation establishments, and investments in the hotels and restaurants sector, over the period 1995-2007. Using multiple regression analysis, I found that investments and tourist arrivals are significant predictors of the GDP dependent variable. Based on these results, I identified those components of the marketing mix which in my opinion require investment and which could contribute to the positive development of tourist arrivals in tourist accommodation establishments.
Principal regression analysis and the index leverage effect
Reigneron, Pierre-Alain; Allez, Romain; Bouchaud, Jean-Philippe
2011-09-01
We revisit the index leverage effect, which can be decomposed into a volatility effect and a correlation effect. We investigate the latter using a matrix regression analysis that we call 'Principal Regression Analysis' (PRA) and for which we provide some analytical (using Random Matrix Theory) and numerical benchmarks. We find that downward index trends increase the average correlation between stocks (as measured by the most negative eigenvalue of the conditional correlation matrix) and make the market mode more uniform. Upward trends, on the other hand, also increase the average correlation between stocks but rotate the corresponding market mode away from uniformity. There are two time scales associated with these effects, a short one on the order of a month (20 trading days), and a longer one on the order of a year. We also find indications of a leverage effect for sectorial correlations, which reveals itself in the second and third modes of the PRA.
Poisson Regression Analysis of Illness and Injury Surveillance Data
Frome E.L., Watkins J.P., Ellis E.D.
2012-12-12
The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra
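The moment-based adjustment for extra-Poisson variation mentioned in this abstract is commonly done with the Pearson dispersion estimate: the Pearson chi-square divided by the residual degrees of freedom. The sketch below assumes that standard estimator (the abstract does not spell out the exact procedure), and the counts are invented for illustration.

```python
import math


def pearson_dispersion(counts, fitted_means, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Values well above 1 suggest over-dispersion relative to the Poisson model.
    """
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi_square / (len(counts) - n_params)


def adjusted_standard_error(poisson_se, dispersion):
    """Quasi-likelihood adjustment: inflate the Poisson standard error by sqrt(phi)."""
    return poisson_se * math.sqrt(dispersion)


# Hypothetical counts; an intercept-only Poisson fit gives every cell the sample mean.
counts = [0, 10, 0, 10]
fitted = [sum(counts) / len(counts)] * len(counts)   # 5.0 for each cell

phi = pearson_dispersion(counts, fitted, n_params=1)

# Poisson standard error of the log-rate in the intercept-only model: 1 / sqrt(sum of means).
naive_se = 1.0 / math.sqrt(sum(fitted))
robust_se = adjusted_standard_error(naive_se, phi)
```

Here the dispersion estimate is far above 1, so the adjusted standard error is noticeably wider than the naive Poisson one, which is the practical consequence of the over-dispersion the abstract describes.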
A regressed phase analysis for coupled joint systems.
Wininger, Michael
2011-01-01
This study aims to address shortcomings of the relative phase analysis, a widely used method for assessment of coupling among joints of the lower limb. Goniometric data from 15 individuals with spastic diplegic cerebral palsy were recorded from the hip and knee joints during ambulation on a flat surface, and from a single healthy individual with no known motor impairment, over at least 10 gait cycles. The minimum relative phase (MRP) revealed substantial disparity in the timing and severity of the instance of maximum coupling, depending on which reference frame was selected: MRP(knee-hip) differed from MRP(hip-knee) by 16.1±14% of gait cycle and 50.6±77% difference in scale. Additionally, several relative phase portraits contained discontinuities which may contribute to error in phase feature extraction. These vagaries can be attributed to the predication of relative phase analysis on a transformation into the velocity-position phase plane, and the extraction of phase angle by the discontinuous arc-tangent operator. Here, an alternative phase analysis is proposed, wherein kinematic data is transformed into a profile of joint coupling across the entire gait cycle. By comparing joint velocities directly via a standard linear regression in the velocity-velocity phase plane, this regressed phase analysis provides several key advantages over relative phase analysis including continuity, commutativity between reference frames, and generalizability to many-joint systems.
Forecasting urban water demand: A meta-regression analysis.
Sebri, Maamar
2016-12-01
Water managers and planners require accurate water demand forecasts over the short, medium and long term for many purposes. These range from assessing water supply needs over spatial and temporal patterns to optimizing future investments and planning future allocations across competing sectors. This study surveys the empirical literature on urban water demand forecasting using the meta-analytical approach. Specifically, using more than 600 estimates, a meta-regression analysis is conducted to identify explanations of cross-study variation in the accuracy of urban water demand forecasting. Our study finds that accuracy depends significantly on study characteristics, including demand periodicity, modeling method, forecasting horizon, model specification and sample size. The meta-regression results remain robust to the different estimators employed as well as to a series of sensitivity checks. The importance of these findings lies in the conclusions and implications drawn out for regulators, policymakers and academics alike. Copyright © 2016. Published by Elsevier Ltd.
A New Approach in Regression Analysis for Modeling Adsorption Isotherms
Dana D. Marković
2014-01-01
Numerous regression approaches to isotherm parameter estimation appear in the literature. Real insight into the proper modeling pattern can be achieved only by testing methods on a very large number of cases. Experimentally, this cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, the winning error function for the homoscedastic case is ordinary least squares, while for heteroscedastic noise the use of orthogonal distance regression or Marquardt's percent standard deviation is suggested. When experiments are repeated three times, the simple method of weighted least squares performed as well as the more complicated orthogonal distance regression method.
Methods of Detecting Outliers in A Regression Analysis Model ...
PROF. O. E. OSUAGWU
2013-06-01
Jun 1, 2013 ... Capacity), X2 (Design Pressure), X3 (Boiler Type), X4 (Drum Type) were used. The analysis of the ... 1.2 Identification Of Outliers. There is no such thing as a simple test. However, there are many ..... Psychological. Bulletin, 95 ...
Detecting overdispersion in count data: A zero-inflated Poisson regression analysis
Afiqah Muhamad Jamil, Siti; Asrul Affendi Abdullah, M.; Kek, Sie Long; Nor, Maria Elena; Mohamed, Maryati; Ismail, Norradihah
2017-09-01
This study focuses on analysing count data of butterfly communities in Jasin, Melaka. For a count dependent variable, the Poisson regression model is known as the benchmark model for regression analysis. Continuing from previous literature that used Poisson regression analysis, this study uses zero-inflated Poisson (ZIP) regression analysis to gain greater precision in analysing the count data of butterfly communities in Jasin, Melaka. Poisson regression should be abandoned in favour of count data models that are capable of taking the extra zeros into account explicitly, and one of the most popular such models is the ZIP regression model. The butterfly community data, referred to in this study as the number of subjects, were collected in Jasin, Melaka and consist of 131 subject visits. Since the researchers are considering the number of subjects, the data set consists of five families of butterfly, which represent the five variables involved in the analysis (the types of subjects). The ZIP analysis used the SAS procedure for overdispersion in analysing zero values, and the main purpose of continuing the previous study is to compare which model is better when zero values exist in the observed count data. The analysis used AIC, BIC and the Vuong test at the 5% significance level to achieve the objectives. The findings indicate the presence of over-dispersion in analysing zero values. The ZIP regression model is better than the Poisson regression model when zero values exist.
Residential behavioural energy savings : a meta-regression analysis
Tiedemann, K.H. [BC Hydro, Burnaby, BC (Canada)]
2009-07-01
Increasing attention is being given to opportunities for residential behavioural energy savings, as developed countries attempt to reduce energy use and greenhouse gas emissions. Several utility companies have undertaken pilot programs aimed at understanding which interventions are most effective in reducing residential energy consumption through behavioural change. This paper presented the first meta-regression analysis of residential behavioural energy savings. The study focused on interventions which affected household energy-related behaviours and, as a result, household energy use. The paper described rational choice theory, the theory of planned behaviour, and the integration of rational choice theory and the adjusted expectancy values theory in a simple framework. The paper also discussed the review of various social, psychological and economics journals and databases. The results of the studies were presented. A basic concept in meta-regression analysis is the effect size, which is defined as the program effect divided by the standard error of the program effect. A lengthy review of the literature found twenty-eight treatments from ten experiments for which an effect size could be calculated. The experiments involved classifying treatments according to whether the interventions were information, goal setting, feedback, rewards or combinations of these interventions. The impact of these alternative interventions on the effect size was then modelled using White's robust regression. Five regression models were compared on the basis of Akaike's information criterion. It was found that model 5, which used all of the regressors, was the preferred model. It was concluded that the theory of planned behaviour is more appropriate in the context of analysis of behavioural change and energy use. 21 refs., 4 tabs.
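The two quantities this abstract relies on can be written down directly: the effect size (program effect divided by its standard error) and Akaike's information criterion for comparing candidate models. The sketch below uses the textbook form AIC = 2k - 2 ln L; the model labels, log-likelihoods and effect values are hypothetical and are not taken from the paper.

```python
def effect_size(program_effect, standard_error):
    """Effect size as defined in the abstract: effect divided by its standard error."""
    return program_effect / standard_error


def aic(log_likelihood, n_params):
    """Akaike's information criterion; smaller values are preferred."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical treatment: a 6.0% saving estimated with a standard error of 2.0%.
es = effect_size(6.0, 2.0)

# Hypothetical candidate models: label -> (maximized log-likelihood, number of parameters).
candidates = {"A": (-40.0, 2), "B": (-30.0, 6)}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
```

In this made-up comparison the larger model wins because its gain in log-likelihood outweighs the penalty for the extra parameters, which is the kind of trade-off an AIC-based comparison of nested regressor sets resolves.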
Meta-regression Analysis of the Chinese Labor Reallocation Effect
YUE, Longhua; YANG, Shiyan; SHEN, Rongtai
2013-01-01
Meta-regression analysis was applied to 23 papers on the effect of Chinese labor reallocation on economic growth. The results showed that using the method of the World Bank (1996) or of M. Syrquin (1986) had little impact on the results, while the calculation of the stock of physical capital had a positive impact. Results obtained from panel data were larger than those obtained from time series data. The time span had little influence on the results. It is therefore necessary to measure the stock of physical capital in China accurately in order to evaluate the Chinese labor reallocation effect.
Multivariate study and regression analysis of gluten-free granola
Lilian Maria Pagamunici
2014-03-01
This study developed a gluten-free granola and evaluated it during storage with the application of multivariate and regression analysis of the sensory and instrumental parameters. The physicochemical, sensory, and nutritional characteristics of a product containing quinoa, amaranth and linseed were evaluated. The crude protein and lipid contents were 97.49 and 122.72 g kg-1 of food, respectively. The polyunsaturated/saturated and n-6:n-3 fatty acid ratios were 2.82 and 2.59:1, respectively. The granola had the best alpha-linolenic acid content, nutritional indices in the lipid fraction, and mineral content. Hygienic and sanitary conditions were good during storage, probably due to the low water activity of the formulation, which helped to inhibit microbial growth. The sensory attributes ranged from 'like very much' to 'like slightly', and the regression models were highly fitted and correlated during the storage period. A reduction in the sensory attribute levels and in the physical stabilisation of the product was verified by principal component analysis. The use of the affective acceptance test and instrumental analysis combined with statistical methods allowed us to obtain promising results about the characteristics of gluten-free granola.
Regression analysis application for designing the vibration dampers
A. V. Ivanov
2014-01-01
Multi-frequency vibration dampers protect overhead power lines and fiber optic communication channels against Aeolian vibrations. For maximum efficiency, the natural frequencies of the dampers should be evenly distributed over the entire operating frequency range from 3 to 150 Hz. The traditional approach to damper design is to investigate damper features using full-scale models. A conclusion on the damper capabilities is then drawn, and design changes are made to achieve the required natural frequencies. The article describes a direct optimization method for designing dampers. This method leads to a clear-cut definition of the geometrical and mass parameters of dampers from their natural frequencies. The direct design method is based on an active plan and a design experiment. Based on regression analysis, a regression model is obtained as a second-order polynomial to establish a unique relation between the input design parameters (element dimensions, the weights of the cargos) and the output design parameters (natural frequencies). Different damper design problems are considered using the developed regression models. It has been found that the mathematical models relating the input design parameters to the output ones achieve satisfactory accuracy. Depending on the number of input parameters and the nature of the restrictions, the statement of the design problem, including an optimization one, can differ when the restrictions on design parameters must meet conflicting requirements. The proposed optimization method for solving the direct design problem allows us to determine the damper element dimensions directly for any natural frequencies and, at the initial stage of the analysis, to identify problems with no solution using the methods of nonlinear programming. The developed approach can be successfully applied to the design of various mechanical systems with complicated nonlinear interactions between the input and output parameters.
Optimum short-time polynomial regression for signal analysis
A SREENIVASA MURTHY; CHANDRA SEKHAR SEELAMANTULA; T V SREENIVAS
2016-11-01
We propose a short-time polynomial regression (STPR) for time-varying signal analysis. The advantage of using polynomials is that the notion of a spectrum is not needed and the signals can be analyzed in the time domain over short durations. In the presence of noise, such modeling becomes important, because the polynomial approximation performs smoothing leading to noise suppression. The problem of optimal smoothing depends on the duration over which a fixed-order polynomial regression is performed. Considering the STPR of a noisy signal, we derive the optimal smoothing window by minimizing the mean-square error (MSE). For a fixed polynomial order, the smoothing window duration depends on the rate of signal variation, which, in turn, depends on its derivatives. Since the derivatives are not available a priori, exact optimization is not feasible. However, approximate optimization can be achieved using only the variance expressions and the intersection-of-confidence-intervals (ICI) technique. The ICI technique is based on a consistency measure across confidence intervals corresponding to different window lengths. An approximate asymptotic analysis to determine the optimal confidence interval width shows that the asymptotic expressions are the same irrespective of whether one starts with a uniform sampling grid or a nonuniform one. Simulation results on sinusoids, chirps, and electrocardiogram (ECG) signals, and comparisons with standard wavelet denoising techniques, show that the proposed method is robust particularly in the low signal-to-noise ratio regime.
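The basic operation behind short-time polynomial regression, fitting a fixed-order polynomial over a short window and reading off the fitted value at the window centre, can be sketched as follows. This is a simplified illustration with a fixed window length on a uniform grid; it does not implement the ICI-based window selection described in the abstract, and the function names are hypothetical.

```python
def polyfit_ls(t, y, order):
    """Least-squares polynomial coefficients (lowest degree first) via normal equations."""
    m = order + 1
    # Normal equations A c = b with A[j][k] = sum t^(j+k) and b[j] = sum y * t^j.
    A = [[sum(ti ** (j + k) for ti in t) for k in range(m)] for j in range(m)]
    b = [sum(yi * ti ** j for ti, yi in zip(t, y)) for j in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for k in range(col, m):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for j in reversed(range(m)):
        s = b[j] - sum(A[j][k] * coeffs[k] for k in range(j + 1, m))
        coeffs[j] = s / A[j][j]
    return coeffs


def stpr_smooth(y, half_width, order):
    """Smooth y by a local polynomial fit around each sample (requires half_width >= order)."""
    n = len(y)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        local_t = [float(j - i) for j in range(lo, hi)]  # time measured from the centre
        coeffs = polyfit_ls(local_t, y[lo:hi], order)
        smoothed.append(coeffs[0])  # polynomial value at local time 0
    return smoothed


# A quadratic signal is reproduced by a second-order local fit (up to rounding).
quadratic = [0.5 * k * k - k + 3.0 for k in range(20)]
recovered = stpr_smooth(quadratic, half_width=3, order=2)
```

Widening `half_width` averages over more samples and suppresses more noise, but biases the estimate when the signal varies faster than the polynomial order can follow; that bias-variance trade-off is what the optimal-window analysis in the abstract addresses.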
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
2014-01-01
The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
A Visual Analytics Approach for Correlation, Classification, and Regression Analysis
Steed, Chad A [ORNL]; Swan II, J. Edward [Mississippi State University (MSU)]; Fitzpatrick, Patrick J. [Mississippi State University (MSU)]; Jankun-Kelly, T.J. [Mississippi State University (MSU)]
2012-02-01
New approaches that combine the strengths of humans and machines are necessary to equip analysts with the proper tools for exploring today's increasingly complex, multivariate data sets. In this paper, a novel visual data mining framework, called the Multidimensional Data eXplorer (MDX), is described that addresses the challenges of today's data by combining automated statistical analytics with a highly interactive parallel coordinates based canvas. In addition to several intuitive interaction capabilities, this framework offers a rich set of graphical statistical indicators, interactive regression analysis, visual correlation mining, automated axis arrangements and filtering, and data classification techniques. The current work provides a detailed description of the system as well as a discussion of key design aspects and critical feedback from domain experts.
Cardiorespiratory fitness and laboratory stress: a meta-regression analysis.
Jackson, Erica M; Dishman, Rod K
2006-01-01
We performed a meta-regression analysis of 73 studies that examined whether cardiorespiratory fitness mitigates cardiovascular responses during and after acute laboratory stress in humans. The cumulative evidence indicates that fitness is related to slightly greater reactivity, but better recovery. However, effects varied according to several study features and were smallest in the better controlled studies. Fitness did not mitigate integrated stress responses such as heart rate and blood pressure, which were the focus of most of the studies we reviewed. Nonetheless, potentially important areas, particularly hemodynamic and vascular responses, have been understudied. Women, racial/ethnic groups, and cardiovascular patients were underrepresented. Randomized controlled trials, including naturalistic studies of real-life responses, are needed to clarify whether a change in fitness alters putative stress mechanisms linked with cardiovascular health.
Multivariate Regression Analysis of Gravitational Waves from Rotating Core Collapse
Engels, William J; Ott, Christian D
2014-01-01
We present a new multivariate regression model for analysis and parameter estimation of gravitational waves observed from well but not perfectly modeled sources such as core-collapse supernovae. Our approach is based on a principal component decomposition of simulated waveform catalogs. Instead of reconstructing waveforms by direct linear combination of physically meaningless principal components, we solve via least squares for the relationship that encodes the connection between chosen physical parameters and the principal component basis. Although our approach is linear, the waveforms' parameter dependence may be non-linear. For the case of gravitational waves from rotating core collapse, we show, using statistical hypothesis testing, that our method is capable of identifying the most important physical parameters that govern waveform morphology in the presence of simulated detector noise. We also demonstrate our method's ability to predict waveforms from a principal component basis given a set of physical ...
Spatial regression analysis of traffic crashes in Seoul.
Rhee, Kyoung-Ah; Kim, Joon-Ki; Lee, Young-ihn; Ulfarsson, Gudmundur F
2016-06-01
Traffic crashes can be spatially correlated events and the analysis of the distribution of traffic crash frequency requires evaluation of parameters that reflect spatial properties and correlation. Typically this spatial aspect of crash data is not used in everyday practice by planning agencies and this contributes to a gap between research and practice. A database of traffic crashes in Seoul, Korea, in 2010 was developed at the traffic analysis zone (TAZ) level with a number of GIS developed spatial variables. Practical spatial models using available software were estimated. The spatial error model was determined to be better than the spatial lag model and an ordinary least squares baseline regression. A geographically weighted regression model provided useful insights about localization of effects. The results found that an increased length of roads with speed limit below 30 km/h and a higher ratio of residents below age of 15 were correlated with lower traffic crash frequency, while a higher ratio of residents who moved to the TAZ, more vehicle-kilometers traveled, and a greater number of access points with speed limit difference between side roads and mainline above 30 km/h all increased the number of traffic crashes. This suggests, for example, that better control or design for merging lower speed roads with higher speed roads is important. A key result is that the length of bus-only center lanes had the largest effect on increasing traffic crashes. This is important as bus-only center lanes with bus stop islands have been increasingly used to improve transit times. Hence the potential negative safety impacts of such systems need to be studied further and mitigated through improved design of pedestrian access to center bus stop islands.
刘明哲
2015-01-01
Objective: To explore the main risk factors for type 2 diabetes mellitus (T2DM) complicated by cardiovascular disease (CVD). Methods: A total of 128 patients with T2DM and CVD were selected as the CVD group and 107 patients with T2DM alone as the control group; logistic regression was used to analyze the risk factors for concurrent CVD. Results: The risk of CVD in patients with a family history of T2DM was 1.535 times that of other patients (OR = 1.535, 95% CI = 1.145, 2.057, P = 0.036); in patients on a vegetarian diet it was 41.3% of that of other patients (OR = 0.413, 95% CI = 0.210, 0.815, P = 0.024); and in patients with hypertension it was 2.077 times that of other patients (OR = 2.077, 95% CI = 1.301, 2.813, P = 0.010). For each 1 mmol/L rise in TG, PBG, LDL-C, and HDL-C, the risk of concurrent CVD was, respectively, 1.192 times (OR = 1.192, 95% CI = 1.012, 1.372, P = 0.023), 1.125 times (OR = 1.125, 95% CI = 1.043, 1.218, P = 0.028), 1.712 times (OR = 1.712, 95% CI = 1.203, 2.231, P = 0.009), and 42.6% (OR = 0.426, 95% CI = 0.239, 0.776, P = 0.011) of that of other patients. For each 1% increase in HbA1c, the risk of concurrent CVD was 1.284 times that of other patients (OR = 1.284, 95% CI = 1.132, 1.413, P = 0.013); for each 1 kg/m2 increase in BMI, it was 1.508 times that of other patients (OR = 1.508, 95% CI = 1.143, 1.825, P = 0.026); and for each 1 mL/mmHg × 100 increase in C2, it was 33.9% of that of other patients (OR = 1.508, 95% CI = 1.143, 1.825, P = 0.026). Conclusion: Family history of T2DM, hypertension, TG, PBG, LDL-C, HbA1c, and BMI are major risk factors for T2DM with CVD; vegetarian diet, HDL-C, and C2 are protective factors.
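Odds ratios with 95% confidence intervals, as reported in this abstract, are commonly obtained by exponentiating a logistic regression coefficient and its Wald limits. The sketch below shows only that conversion; the function name and the coefficient and standard error passed to it are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) from a logistic coefficient and its standard error.

    Assumes a Wald-type interval: exp(beta - z*se) to exp(beta + z*se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient and standard error, for illustration only.
or_, lower, upper = odds_ratio_ci(beta=0.25, se=0.10)
print(f"OR = {or_:.3f}, 95% CI = {lower:.3f}, {upper:.3f}")
```

An odds ratio below 1 (a negative coefficient) corresponds to what the abstract calls a protective factor.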
Simultaneous estimation and variable selection in median regression using Lasso-type penalty.
Xu, Jinfeng; Ying, Zhiliang
2010-06-01
We consider median regression with a LASSO-type penalty term for variable selection. With a fixed number of variables in the regression model, a two-stage method is proposed for simultaneous estimation and variable selection in which the degree of penalty is chosen adaptively. A Bayesian information criterion type approach is proposed and used to obtain a data-driven procedure that is proved to select asymptotically optimal tuning parameters automatically. It is shown that the resultant estimator achieves the so-called oracle property. The combination of median regression and the LASSO penalty is computationally easy to implement via standard linear programming. A random perturbation scheme can be used to obtain a simple estimator of the standard error. Simulation studies are conducted to assess the finite-sample performance of the proposed method. We illustrate the methodology with a real example.
Assessing the effects of different types of covariates for binary logistic regression
Hamid, Hamzah Abdul; Wah, Yap Bee; Xie, Xian-Jin; Rahman, Hezlin Aryani Abd
2015-02-01
It is well known that the distribution of the independent variable(s) may affect many statistical procedures. This paper investigates and illustrates the effect of different types of covariates on parameter estimation in a binary logistic regression model. A simulation study with different sample sizes and different types of covariates (uniform, normal, skewed) was carried out. Results showed that the parameters of the binary logistic regression model are severely overestimated when the sample size is less than 150 for covariates that have normal or uniform distributions, while the parameters are underestimated when the distribution of the covariate is skewed. Parameter estimation improves for all types of covariates when the sample size is large, that is, at least 500.
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions, for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have been published to date. Two previously published examples of CART analysis in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Kauhl, Boris; Schweikart, Jürgen; Krafft, Thomas; Keste, Andrea; Moskwyn, Marita
2016-11-03
The provision of general practitioners (GPs) in Germany still relies mainly on the ratio of inhabitants to GPs at relatively large scales and barely accounts for an increased prevalence of chronic diseases among the elderly and socially underprivileged populations. Type 2 Diabetes Mellitus (T2DM) is one of the major cost-intensive diseases with high rates of potentially preventable complications. Provision of healthcare and access to preventive measures is necessary to reduce the burden of T2DM. However, current studies on the spatial variation of T2DM in Germany are mostly based on survey data, which do not only underestimate the true prevalence of T2DM, but are also only available on large spatial scales. The aim of this study is therefore to analyse the spatial distribution of T2DM at fine geographic scales and to assess location-specific risk factors based on data of the AOK health insurance. To display the spatial heterogeneity of T2DM, a bivariate, adaptive kernel density estimation (KDE) was applied. The spatial scan statistic (SaTScan) was used to detect areas of high risk. Global and local spatial regression models were then constructed to analyze socio-demographic risk factors of T2DM. T2DM is especially concentrated in rural areas surrounding Berlin. The risk factors for T2DM consist of proportions of 65-79 year olds, 80 + year olds, unemployment rate among the 55-65 year olds, proportion of employees covered by mandatory social security insurance, mean income tax, and proportion of non-married couples. However, the strength of the association between T2DM and the examined socio-demographic variables displayed strong regional variations. The prevalence of T2DM varies at the very local level. Analyzing point data on T2DM of northeastern Germany's largest health insurance provider thus allows very detailed, location-specific knowledge about increased medical needs. Risk factors associated with T2DM depend largely on the place of residence of the
Giuseppe Palermo
2009-05-01
Giuseppe Palermo (1), Paolo Piraino (2), Hans-Dieter Zucht (3). (1) Digilab BioVision GmbH, Hannover, Germany; (2) Dr Paolo Piraino Statistical Consulting, Rende (CS), Italy; (3) Proteome Sciences R&D GmbH and C. KG, Frankfurt am Main, Germany. Abstract: Multivariate partial least square (PLS) regression allows the modeling of complex biological events by considering different factors at the same time. It is unaffected by data collinearity, representing a valuable method for modeling high-dimensional biological data (as derived from genomics, proteomics and peptidomics). In the presence of multiple responses, it is of particular interest how to appropriately “dissect” the model to reveal the importance of single attributes with regard to individual responses (for example, variable selection). In this paper, the performance of multivariate PLS regression coefficients in selecting relevant predictors for different responses in omics-type data was investigated by means of a receiver operating characteristic (ROC) analysis. For this purpose, simulated data, mimicking the covariance structures of microarray and liquid chromatography mass spectrometric data, were used to generate matrices of predictors and responses. The relevant predictors were set a priori. The influences of noise, the source of data with different covariance structure, and the size of relevant predictors were investigated. Results demonstrate the applicability of PLS regression coefficients in selecting variables for each response of a multivariate PLS in omics-type data. Comparisons with other feature selection methods, such as variable importance in the projection scores, principal component regression, and least absolute shrinkage and selection operator regression, were also provided. Keywords: partial least square regression, regression coefficients, variable selection, biomarker discovery, omics-data
A general framework for the use of logistic regression models in meta-analysis.
Simmonds, Mark C; Higgins, Julian Pt
2016-12-01
Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy.
An Effect Size for Regression Predictors in Meta-Analysis
Aloe, Ariel M.; Becker, Betsy Jane
2012-01-01
A new effect size representing the predictive power of an independent variable from a multiple regression model is presented. The index, denoted as r_sp, is the semipartial correlation of the predictor with the outcome of interest. This effect size can be computed when multiple predictor variables are included in the regression model…
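For the special case of two predictors, the semipartial correlation of the first predictor with the outcome can be written in terms of the three pairwise Pearson correlations. The sketch below uses that textbook two-predictor formula; it is an illustration under that assumption, not the computation described by the authors, and the helper names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def semipartial(y, x1, x2):
    """Semipartial correlation of x1 with y, with x2 removed from x1 only."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (r_y1 - r_y2 * r_12) / math.sqrt(1.0 - r_12 ** 2)
```

When the two predictors are uncorrelated, the semipartial correlation reduces to the ordinary correlation between the predictor and the outcome.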
An Analysis of Bank Service Satisfaction Based on Quantile Regression and Grey Relational Analysis
Wen-Tsao Pan
2016-01-01
Bank service satisfaction is vital to the success of a bank. In this paper, we propose to use the grey relational analysis to gauge the levels of service satisfaction of the banks. With the grey relational analysis, we compared the effects of different variables on service satisfaction. We gave ranks to the banks according to their levels of service satisfaction. We further used the quantile regression model to find the variables that affected the satisfaction of a customer at a specific quantile of satisfaction level. The result of the quantile regression analysis provided a bank manager with information to formulate policies to further promote satisfaction of the customers at different quantiles of satisfaction level. We also compared the prediction accuracies of the regression models at different quantiles. The experiment result showed that, among the seven quantile regression models, the median regression model has the best performance in terms of RMSE, RTIC, and CE performance measures.
Regression Analysis of Restricted Mean Survival Time Based on Pseudo-Observations
Andersen, Per Kragh; Hansen, Mette Gerster; Klein, John P.
2004-01-01
censoring; hazard function; health economics; mean survival time; pseudo-observations; regression model; restricted mean survival time; survival analysis
Regression and kriging analysis for grid power factor estimation
Rajesh Guntaka
2014-12-01
The measurement of power factor (PF) in electrical utility grids is a mainstay of load balancing and is also a critical element of transmission and distribution efficiency. The measurement of PF dates back to the earliest periods of electrical power distribution to public grids. In the wide-area distribution grid, measurement of current waveforms is trivial and may be accomplished at any point in the grid using a current tap transformer. However, voltage measurement requires reference to ground and so is more problematic, and measurements are normally constrained to points that have ready and easy access to a ground source. We present two mathematical analysis methods, based on kriging and on linear least squares estimation (LLSE) (regression), to derive PF at nodes with unknown voltages that are within a perimeter of sample nodes with ground reference across a selected power grid. Our results indicate an average error of 1.884%, which is within acceptable tolerances for PF measurements that are used in load balancing tasks.
A simplified procedure of linear regression in a preliminary analysis
Silvia Facchinetti
2013-05-01
The analysis of a large statistical data set can be guided by the study of a particularly interesting variable Y (the regressed variable) and an explanatory variable X, chosen among the remaining variables and jointly observed. The study gives a simplified procedure to obtain the functional link between the variables, y = y(x), by partitioning the data set into m subsets, in each of which the observations are summarized by location indices (mean or median) of X and Y. Polynomial models for y(x) of order r are considered to verify the characteristics of the given procedure; in particular, we assume r = 1 and 2. The distributions of the parameter estimators are obtained by simulation when the fitting is done for m = r + 1. The results are compared, in terms of distribution and efficiency, with those obtained by ordinary least squares. The study also gives some considerations on the consistency of the parameter estimates obtained by the given procedure.
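A minimal sketch of the idea for the linear case (r = 1, so m = r + 1 = 2 subsets) follows. It assumes the partition is made by ordering the observations on X and splitting them in half, and that means are used as the location index; the abstract does not state how the partition is formed, so both choices, and the function name, are assumptions for illustration.

```python
def group_means_line(x, y):
    """Fit y = a + b*x through the (mean X, mean Y) points of two subsets.

    Assumption: the two subsets are the lower and upper halves of the
    observations after sorting by X.
    """
    pairs = sorted(zip(x, y))
    half = len(pairs) // 2
    points = []
    for group in (pairs[:half], pairs[half:]):
        mean_x = sum(p[0] for p in group) / len(group)
        mean_y = sum(p[1] for p in group) / len(group)
        points.append((mean_x, mean_y))
    (x1, y1), (x2, y2) = points
    b = (y2 - y1) / (x2 - x1)   # slope of the line through the two summary points
    a = y1 - b * x1             # intercept
    return a, b
```

With m = r + 1 summary points the polynomial of order r passes exactly through them, so no least squares step is needed in this reduced problem.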
Fast nonlinear regression method for CT brain perfusion analysis.
Bennink, Edwin; Oosterbroek, Jaap; Kudo, Kohsuke; Viergever, Max A; Velthuis, Birgitta K; de Jong, Hugo W A M
2016-04-01
Although computed tomography (CT) perfusion (CTP) imaging enables rapid diagnosis and prognosis of ischemic stroke, current CTP analysis methods have several shortcomings. We propose a fast nonlinear regression method with a box-shaped model (boxNLR) that has important advantages over the current state-of-the-art method, block-circulant singular value decomposition (bSVD). These advantages include improved robustness to attenuation curve truncation, extensibility, and unified estimation of perfusion parameters. The method is compared with bSVD and with a commercial SVD-based method. The three methods were quantitatively evaluated by means of a digital perfusion phantom, described by Kudo et al. and qualitatively with the aid of 50 clinical CTP scans. All three methods yielded high Pearson correlation coefficients ([Formula: see text]) with the ground truth in the phantom. The boxNLR perfusion maps of the clinical scans showed higher correlation with bSVD than the perfusion maps from the commercial method. Furthermore, it was shown that boxNLR estimates are robust to noise, truncation, and tracer delay. The proposed method provides a fast and reliable way of estimating perfusion parameters from CTP scans. This suggests it could be a viable alternative to current commercial and academic methods.
Influencing Academic Library Use in Tanzania: A Multiple Regression Analysis
Leocardia L Juventus
2016-12-01
Library use is influenced by many factors. This study uses multiple regression analysis to ascertain the connection between the level of library use and a few of these factors, based on questionnaire responses from 158 undergraduate students who use academic libraries in two Tanzanian universities: Muhimbili University of Health and Allied Sciences (MUHAS) and Hubert Kairuki Memorial University (HKMU). It was found that users of academic libraries in Tanzania are influenced by the need to search and access online materials, check for new books or other resources, check out books and other materials, and enjoy a friendly environment for study. However, their library use is not influenced by either the free wireless network or consultation with librarians. It is argued that academic libraries need to devise and implement plans that can make these libraries better learning environments and platforms to drive socio-economic development, particularly in developing nations such as Tanzania. It is further argued that this can be enhanced through investment in modern academic library infrastructure.
A novel multiobjective evolutionary algorithm based on regression analysis.
Song, Zhiming; Wang, Maocai; Dai, Guangming; Vasile, Massimiliano
2015-01-01
As is known, the Pareto set of a continuous multiobjective optimization problem with m objective functions is a piecewise continuous (m - 1)-dimensional manifold in the decision space under some mild conditions. How to utilize this regularity to design multiobjective optimization algorithms has become a research focus. In this paper, based on this regularity, a model-based multiobjective evolutionary algorithm with regression analysis (MMEA-RA) is put forward to solve continuous multiobjective optimization problems with variable linkages. In the algorithm, the optimization problem is modelled as a promising area in the decision space by a probability distribution, and the centroid of the probability distribution is an (m - 1)-dimensional piecewise continuous manifold. The least squares method is used to construct such a model. A selection strategy based on nondominated sorting is used to choose the individuals for the next generation. The new algorithm is tested and compared with NSGA-II and RM-MEDA. The results show that MMEA-RA outperforms RM-MEDA and NSGA-II on the test instances with variable linkages. At the same time, MMEA-RA has higher efficiency than the other two algorithms. A few shortcomings of MMEA-RA have also been identified and discussed in this paper.
Standardized Regression Coefficients as Indices of Effect Sizes in Meta-Analysis
Kim, Rae Seon
2011-01-01
When conducting a meta-analysis, it is common to find many collected studies that report regression analyses, because multiple regression analysis is widely used in many fields. Meta-analysis uses effect sizes drawn from individual studies as a means of synthesizing a collection of results. However, indices of effect size from regression analyses…
Harrell, Jr, Frank E
2015-01-01
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Design and analysis of experiments classical and regression approaches with SAS
Onyiah, Leonard C
2008-01-01
Introductory Statistical Inference and Regression Analysis Elementary Statistical Inference Regression Analysis Experiments, the Completely Randomized Design (CRD)-Classical and Regression Approaches Experiments Experiments to Compare Treatments Some Basic Ideas Requirements of a Good Experiment One-Way Experimental Layout or the CRD: Design and Analysis Analysis of Experimental Data (Fixed Effects Model) Expected Values for the Sums of Squares The Analysis of Variance (ANOVA) Table Follow-Up Analysis to Check fo
Buffalos milk yield analysis using random regression models
A.S. Schierholt
2010-02-01
Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed), daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL) and from records of the EMBRAPA Amazônia Oriental (EAO) herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using Legendre polynomial functions of second to fourth order. The random regression models included the effects of herd-year and month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve); and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Information Criterion. The random regression model using third-order Legendre polynomials with four classes of the environmental effect was the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields at younger ages was close to unity, but at older ages it was low.
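Random regression models of this kind use Legendre polynomials of a standardized time or age variable as covariates. The sketch below shows how such covariates might be generated; it uses unnormalized polynomials and a linear rescaling of the time variable to [-1, 1], both of which are assumptions for illustration (the abstract does not state the normalization or the time scale), and all names are hypothetical.

```python
def legendre(order, x):
    """Unnormalized Legendre polynomial P_order(x) via Bonnet's recurrence."""
    p_prev, p = 1.0, x
    if order == 0:
        return p_prev
    for n in range(1, order):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p

def standardize(t, t_min, t_max):
    """Map t from [t_min, t_max] onto [-1, 1], the domain of the polynomials."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0

def legendre_covariates(t, t_min, t_max, max_order=3):
    """Covariates P_0 .. P_max_order evaluated at the standardized time point."""
    x = standardize(t, t_min, t_max)
    return [legendre(k, x) for k in range(max_order + 1)]
```

Each record then contributes one row of such covariates, and the random regression coefficients attached to them describe animal-specific deviations from the fixed curve.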
Quantile regression provides a fuller analysis of speed data.
Hewson, Paul
2008-03-01
Considerable interest already exists in terms of assessing percentiles of speed distributions, for example monitoring the 85th percentile speed is a common feature of the investigation of many road safety interventions. However, unlike the mean, where t-tests and ANOVA can be used to provide evidence of a statistically significant change, inference on these percentiles is much less common. This paper examines the potential role of quantile regression for modelling the 85th percentile, or any other quantile. Given that crash risk may increase disproportionately with increasing relative speed, it may be argued these quantiles are of more interest than the conditional mean. In common with the more usual linear regression, quantile regression admits a simple test as to whether the 85th percentile speed has changed following an intervention in an analogous way to using the t-test to determine if the mean speed has changed by considering the significance of parameters fitted to a design matrix. Having briefly outlined the technique and briefly examined an application with a widely published dataset concerning speed measurements taken around the introduction of signs in Cambridgeshire, this paper will demonstrate the potential for quantile regression modelling by examining recent data from Northamptonshire collected in conjunction with a "community speed watch" programme. Freely available software is used to fit these models and it is hoped that the potential benefits of using quantile regression methods when examining and analysing speed data are demonstrated.
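Quantile regression estimates a quantile by minimizing the asymmetric "check" (pinball) loss instead of squared error. When the design matrix contains only an intercept and a before/after indicator, the fit reduces to the sample quantile of each group, and the indicator coefficient is the change in that quantile. The sketch below illustrates this special case with a brute-force search over the observed values; the function names are hypothetical and no data from the paper are used.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def fitted_quantile(values, tau):
    """Constant c minimizing the total pinball loss, searched over observed values."""
    return min(values, key=lambda c: sum(pinball_loss(v - c, tau) for v in values))

def quantile_shift(before, after, tau=0.85):
    """Change in the tau-th quantile between two groups (after minus before)."""
    return fitted_quantile(after, tau) - fitted_quantile(before, tau)
```

Assessing whether such a shift is statistically significant requires standard errors for the fitted coefficients, which this sketch does not compute.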
Analysis of retirement income adequacy using quantile regression: A case study in Malaysia
Alaudin, Ros Idayuwati; Ismail, Noriszura; Isa, Zaidi
2015-09-01
Quantile regression is a statistical method that does not restrict attention to the conditional mean and therefore permits approximation of the whole conditional distribution of a response variable. Quantile regression is also more robust to outliers than mean regression models. In this paper, we demonstrate how the quantile regression approach can be used to analyze the ratio of projected wealth to needs (wealth-needs ratio) during retirement.
Analysis of Functional Data with Focus on Multinomial Regression and Multilevel Data
Mousavi, Seyed Nourollah
Functional data analysis (FDA) is a fast-growing area of statistical research with an increasingly diverse range of applications in economics, medicine, agriculture, chemometrics, etc. Functional regression is an area of FDA which has received the most attention, in both application and methodological development. Our main ... and the prediction of the response at time t depends only on the concurrently observed predictor. We introduce a version of this model for multilevel functional data of the type subjectunit, with the unit-level data being functional observations. Finally, in the fourth paper we show how registration can be applied...
Multiple regression for physiological data analysis: the problem of multicollinearity.
Slinker, B K; Glantz, S A
1985-07-01
Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
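One common way to quantify how much predictor correlation inflates the standard errors is the variance inflation factor (VIF), a standard diagnostic that is not named in the abstract. For two predictors it depends only on their Pearson correlation r, as VIF = 1 / (1 - r^2). The sketch below covers only this two-predictor case; the function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def vif_two_predictors(x1, x2):
    """Variance inflation factor shared by both predictors in a two-predictor model."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)
```

A VIF of 1 means the predictors are uncorrelated; values far above 1 indicate that the coefficient variances are inflated by that factor relative to the uncorrelated case.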
Analysis of some methods for reduced rank Gaussian process regression
Quinonero-Candela, J.; Rasmussen, Carl Edward
2005-01-01
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning...
Regression analysis of censored data using pseudo-observations
Parner, Erik T.; Andersen, Per Kragh
2010-01-01
We draw upon a series of articles in which a method based on pseudovalues is proposed for direct regression modeling of the survival function, the restricted mean, and the cumulative incidence function in competing risks with right-censored data. The models, once the pseudovalues have been...
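Pseudovalues are jackknife-type quantities: for an estimator computed on all n subjects and on the sample with subject i left out, the i-th pseudovalue is n times the full estimate minus (n - 1) times the leave-one-out estimate. The sketch below shows that construction for a generic estimator and demonstrates it with the plain sample mean on uncensored data, where the pseudovalues equal the observations themselves. With right-censored data the estimator would instead be based on, for example, the Kaplan-Meier curve, which is not implemented here; all names are hypothetical.

```python
def pseudo_values(estimator, data):
    """Jackknife pseudovalues: n * theta_hat - (n - 1) * theta_hat_without_i."""
    n = len(data)
    full = estimator(data)
    return [
        n * full - (n - 1) * estimator(data[:i] + data[i + 1:])
        for i in range(n)
    ]

def sample_mean(values):
    """Plain mean, standing in for a survival-based estimator in this sketch."""
    return sum(values) / len(values)

# With no censoring, the pseudovalues of the mean recover each observation.
print(pseudo_values(sample_mean, [1.0, 2.0, 6.0]))
```

The pseudovalues can then serve as responses in a regression model, one per subject, whether or not that subject's event time was censored.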
Measuring Habituation in Infants: An Approach Using Regression Analysis.
Ashmead, Daniel H.; Davis, DeFord L.
1996-01-01
Used computer simulations to examine effectiveness of different criteria for measuring infant visual habituation. Found that a criterion based on fitting a second-order polynomial regression function to looking-time data produced more accurate estimation of looking times and higher power for detecting novelty effects than did the traditional…
Grades, Gender, and Encouragement: A Regression Discontinuity Analysis
Owen, Ann L.
2010-01-01
The author employs a regression discontinuity design to provide direct evidence on the effects of grades earned in economics principles classes on the decision to major in economics and finds a differential effect for male and female students. Specifically, for female students, receiving an A for a final grade in the first economics class is…
Trend Analysis of Cancer Mortality and Incidence in Panama, Using Joinpoint Regression Analysis
Politis, Michael; Higuera, Gladys; Chang, Lissette Raquel; Gomez, Beatriz; Bares, Juan; Motta, Jorge
2015-01-01
Cancer is one of the leading causes of death worldwide and its incidence is expected to increase in the future. In Panama, cancer is also one of the leading causes of death. In 1964, a nationwide cancer registry was started and it was restructured and improved in 2012. The aim of this study is to utilize Joinpoint regression analysis to study the trends of the incidence and mortality of cancer in Panama in the last decade. Cancer mortality was estimated from the Panamanian National Institute of Census and Statistics Registry for the period 2001 to 2011. Cancer incidence was estimated from the Panamanian National Cancer Registry for the period 2000 to 2009. The Joinpoint Regression Analysis program, version 4.0.4, was used to calculate trends by age-adjusted incidence and mortality rates for selected cancers. Overall, the trend of age-adjusted cancer mortality in Panama has declined over the last 10 years (−1.12% per year). The cancers for which there was a significant increase in the trend of mortality were female breast cancer and ovarian cancer; while the highest increases in incidence were shown for breast cancer, liver cancer, and prostate cancer. Significant decrease in the trend of mortality was evidenced for the following: prostate cancer, lung and bronchus cancer, and cervical cancer; with respect to incidence, only oral and pharynx cancer in both sexes had a significant decrease. Some cancers showed no significant trends in incidence or mortality. This study reveals contrasting trends in cancer incidence and mortality in Panama in the last decade. Although Panama is considered an upper middle income nation, this study demonstrates that some cancer mortality trends, like the ones seen in cervical and lung cancer, behave similarly to the ones seen in high income countries. In contrast, other types, like breast cancer, follow a pattern seen in countries undergoing a transition to a developed economy with its associated lifestyle, nutrition, and
Regression of gadolinium-enhanced lesions in patients affected by neurofibromatosis type 1.
Lucchetta, Marta; Manara, Renzo; Perilongo, Giorgio; Clementi, Maurizio; Trevisson, Eva
2016-03-01
Neurofibromatosis type I is a genetic condition with autosomal dominant transmission, characterized by neurocutaneous involvement and a predisposition to tumor development. Central nervous system manifestations include benign areas of dysmyelination and possibly hazardous glial tumors whose clinical management can be challenging. Here, we report on three patients diagnosed with neurofibromatosis type I whose brain MRI follow-up showed the presence of gadolinium-enhancing lesions which spontaneously regressed. In none of the three cases did the lesions show any clinical correlate; they eventually presented a striking reduction in size, and gadolinium enhancement disappeared although no specific therapy was administered during the follow-up. Although their nature remains undetermined, these lesions presented a benign evolution. However, they might be misdiagnosed as potentially life-threatening tumors. Hitherto, similar behavior has been described only in scattered cases, and we believe these findings may be of particular interest for the clinical management of patients affected by neurofibromatosis type I.
REGRESSION ANALYSIS OF PRODUCTIVITY USING MIXED EFFECT MODEL
Siana Halim
2007-01-01
Production plants of a company are located in several areas spread across Middle and East Java. As the production process employs mostly manpower, we suspected that each location has different characteristics affecting the productivity. Thus, the production data may have a spatial and hierarchical structure. Fitting a linear regression using ordinary techniques requires some assumptions about the nature of the residuals, namely that they are independent and identically and normally distributed. However, these assumptions are rarely fulfilled, especially for data that have a spatial and hierarchical structure. We worked out the problem using a mixed effect model. This paper discusses the model construction of productivity and several characteristics in the production line, taking location as a random effect. A simple model with high utility that satisfies the necessary regression assumptions was built using the free statistical software R, version 2.6.1.
Academic Achievement, Intelligence, and Creativity: A Regression Surface Analysis.
Marjoribanks, K
1976-01-01
Data collected on 400 12-year-old English school children were used to examine relations between measures of intelligence, creativity and academic achievement. Complex multiple regression models, which included terms to account for the possible interaction and curvilinear relations between intelligence, creativity and academic achievement were used to construct regression surfaces. The surfaces showed that the traditional threshold hypothesis, which suggests that beyond a certain level of intelligence academic achievement is related increasingly to creativity and ceases to be related strongly to intelligence, was not supported. For some areas of academic performance the results suggest an alternate proposition, that creativity ceases to be related to achievement after a threshold level of intelligence has been reached. It was also found that at high levels of verbal ability, non-verbal ability and creativity appeared to have differential relations with academic achievement.
An, Lihua; Fung, Karen Y; Krewski, Daniel
2010-09-01
Spontaneous adverse event reporting systems are widely used to identify adverse reactions to drugs following their introduction into the marketplace. In this article, a James-Stein type shrinkage estimation strategy was developed in a Bayesian logistic regression model to analyze pharmacovigilance data. This method is effective in detecting signals as it combines information and borrows strength across medically related adverse events. Computer simulation demonstrated that the shrinkage estimator is uniformly better than the maximum likelihood estimator in terms of mean squared error. This method was used to investigate the possible association of a series of diabetic drugs and the risk of cardiovascular events using data from the Canada Vigilance Online Database.
Model performance analysis and model validation in logistic regression
Rosa Arboretti Giancristofaro
2007-10-01
In this paper a new model validation procedure for a logistic regression model is presented. First, we briefly review different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for assessing the performance of a given model, using an example taken from a management study.
Hiroyuki Nakamoto
2014-01-01
The human body is covered with soft skin that has tactile receptors inside. The skin deforms along a contact surface, and the tactile receptors detect the mechanical deformation. The detection of this mechanical deformation is essential for tactile sensation. We propose a magnetic type tactile sensor which has a soft surface and eight magnetoresistive elements. The soft surface has a permanent magnet inside, and the magnetoresistive elements under the soft surface measure the magnetic flux density of the magnet. The tactile sensor estimates the displacement and the rotation on the surface based on the change of the magnetic flux density. Determining an estimate equation is difficult because the displacement and the rotation cannot be determined geometrically from the magnetic flux density. In this paper, a stepwise regression analysis determines the estimate equation. The outputs of the magnetoresistive elements are used as explanatory variables, and the three-axis displacement and the two-axis rotation are response variables in the regression analysis. We confirm through simulation and experiment that the regression analysis is effective for determining the estimate equations. The results show that the tactile sensor measures both the displacement and the rotation generated on the surface by using the determined equation.
Kleijnen, J.P.C.
1995-01-01
This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for
Analysis of ontogenetic spectra of populations of plants and lichens via ordinal regression
Sofronov, G. Yu.; Glotov, N. V.; Ivanov, S. M.
2015-03-01
Ontogenetic spectra of plants and lichens tend to vary across the populations. This means that if several subsamples within a sample (or a population) were collected, then the subsamples would not be homogeneous. Consequently, the statistical analysis of the aggregated data would not be correct, which could potentially lead to false biological conclusions. In order to take into account the heterogeneity of the subsamples, we propose to use ordinal regression, which is a type of generalized linear regression. In this paper, we study the populations of cowberry Vaccinium vitis-idaea L. and epiphytic lichens Hypogymnia physodes (L.) Nyl. and Pseudevernia furfuracea (L.) Zopf. We obtain estimates for the proportions of between-sample variability in the total variability of the ontogenetic spectra of the populations.
Lingling; TAN
2013-01-01
This article selects some major factors influencing agricultural economic growth, such as labor, capital input, farmland area, fertilizer input, and information input. It also selects some factors to explain information input, such as the number of websites owned, the types of books, magazines, and newspapers published, the number of telephones owned per 100 households, the number of home computers owned per 100 households, farmers' spending on transportation and communication, culture, education, entertainment, and services, and the total number of agricultural science and technology service personnel. Using a regression model, this article conducts a regression analysis of cross-section data on 31 provinces, autonomous regions, and municipalities in 2010. The results show that the building of information infrastructure, the use of information tools, and the popularization and promotion of knowledge of agricultural science and technology play an important role in promoting agricultural economic growth.
Tso, Geoffrey K.F.; Yau, Kelvin K.W. [City University of Hong Kong, Kowloon, Hong Kong (China). Department of Management Sciences
2007-09-15
This study presents three modeling techniques for the prediction of electricity energy consumption. In addition to the traditional regression analysis, decision tree and neural networks are considered. Model selection is based on the square root of average squared error. In an empirical application to an electricity energy consumption study, the decision tree and neural network models appear to be viable alternatives to the stepwise regression model in understanding energy consumption patterns and predicting energy consumption levels. With the emergence of the data mining approach for predictive modeling, different types of models can be built in a unified platform: to implement various modeling techniques, assess the performance of different models and select the most appropriate model for future prediction.
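The selection criterion named in this abstract, the square root of the average squared error (RMSE), can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function names `rmse` and `select_model`, the model labels, and all numbers are made up for the example and are not results from the study.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Square root of the average squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def select_model(y_true, predictions):
    """Return the name of the candidate with the lowest RMSE, plus all scores."""
    scores = {name: rmse(y_true, pred) for name, pred in predictions.items()}
    return min(scores, key=scores.get), scores

# Hypothetical held-out consumption values and hypothetical model predictions
y = [10.0, 12.0, 15.0, 11.0]
preds = {
    "regression": [11.0, 12.0, 14.0, 11.0],
    "decision_tree": [10.0, 13.0, 15.0, 11.0],
    "neural_network": [9.0, 12.0, 17.0, 12.0],
}
best, scores = select_model(y, preds)
```

In practice the predictions would come from a held-out validation set, so that the comparison does not simply reward the most flexible model.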
Analysis of some methods for reduced rank Gaussian process regression
Quinonero-Candela, J.; Rasmussen, Carl Edward
2005-01-01
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent...... Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning...
Parsons, Vickie S.
2009-01-01
The request to conduct an independent review of regression models, developed for determining the expected Launch Commit Criteria (LCC) External Tank (ET)-04 cycle count for the Space Shuttle ET tanking process, was submitted to the NASA Engineering and Safety Center (NESC) on September 20, 2005. The NESC team performed an independent review of regression models documented in Prepress Regression Analysis, Tom Clark and Angela Krenn, 10/27/05. This consultation consisted of a peer review by statistical experts of the proposed regression models provided in the Prepress Regression Analysis. This document is the consultation's final report.
Selenium Exposure and Cancer Risk: an Updated Meta-analysis and Meta-regression.
Cai, Xianlei; Wang, Chen; Yu, Wanqi; Fan, Wenjie; Wang, Shan; Shen, Ning; Wu, Pengcheng; Li, Xiuyang; Wang, Fudi
2016-01-20
The objective of this study was to investigate the associations between selenium exposure and cancer risk. We identified 69 studies and applied meta-analysis, meta-regression and dose-response analysis to the available evidence. The results indicated that high selenium exposure had a protective effect on cancer risk (pooled OR = 0.78; 95% CI: 0.73-0.83). The results of linear and nonlinear dose-response analysis indicated that high serum/plasma selenium and toenail selenium were effective in cancer prevention. However, we did not find a protective effect of selenium supplements. High selenium exposure may have different effects on specific types of cancer. It decreased the risk of breast cancer, lung cancer, esophageal cancer, gastric cancer, and prostate cancer, but it was not associated with colorectal cancer, bladder cancer, and skin cancer.
Simulation Experiments in Practice : Statistical Design and Regression Analysis
Kleijnen, J.P.C.
2007-01-01
In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. The goal of this article is to change these traditional, naïve methods of design and analysis, because statistical theory proves that more information is obta
Casey P Durand
INTRODUCTION: Statistical interactions are a common component of data analysis across a broad range of scientific disciplines. However, the statistical power to detect interactions is often undesirably low. One solution is to elevate the Type 1 error rate so that important interactions are not missed in a low power situation. To date, no study has quantified the effects of this practice on power in a linear regression model. METHODS: A Monte Carlo simulation study was performed. A continuous dependent variable was specified, along with three types of interactions: continuous variable by continuous variable; continuous by dichotomous; and dichotomous by dichotomous. For each of the three scenarios, the interaction effect sizes, sample sizes, and Type 1 error rate were varied, resulting in a total of 240 unique simulations. RESULTS: In general, power to detect the interaction effect was either so low or so high at α = 0.05 that raising the Type 1 error rate only served to increase the probability of including a spurious interaction in the model. A small number of scenarios were identified in which an elevated Type 1 error rate may be justified. CONCLUSIONS: Routinely elevating Type 1 error rate when testing interaction effects is not an advisable practice. Researchers are best served by positing interaction effects a priori and accounting for them when conducting sample size calculations.
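The kind of Monte Carlo power study this abstract describes can be sketched for the continuous-by-continuous case as below. This is only an illustrative sketch under stated assumptions, not the study's actual design: the function name `interaction_power`, the main-effect sizes of 0.3, the standard normal predictors and errors, the sample size, and the use of a large-sample normal critical value in place of the exact t critical value are all choices made for the example.

```python
import numpy as np
from statistics import NormalDist

def interaction_power(n, beta_int, alpha=0.05, n_sims=2000, seed=0):
    """Estimate the rejection rate of the test on the x1*x2 coefficient."""
    rng = np.random.default_rng(seed)
    # Assumption: normal critical value approximates the t critical value
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        x1 = rng.standard_normal(n)
        x2 = rng.standard_normal(n)
        # Hypothetical data-generating model with main effects of 0.3
        y = 0.3 * x1 + 0.3 * x2 + beta_int * x1 * x2 + rng.standard_normal(n)
        X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        # Standard error of the interaction coefficient (column index 3)
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[3, 3])
        rejections += abs(beta[3] / se) > crit
    return rejections / n_sims

power_05 = interaction_power(n=100, beta_int=0.2, alpha=0.05)
power_20 = interaction_power(n=100, beta_int=0.2, alpha=0.20)
null_rate = interaction_power(n=100, beta_int=0.0, alpha=0.05)
```

Comparing `power_05` with `power_20` shows how much power is gained by raising α, while `null_rate` estimates how often a spurious interaction would be retained when none exists.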
STANDARDIZING TYPE Ia SUPERNOVA ABSOLUTE MAGNITUDES USING GAUSSIAN PROCESS DATA REGRESSION
Kim, A. G.; Aldering, G.; Aragon, C.; Bailey, S.; Childress, M.; Fakhouri, H. K.; Nordin, J. [Physics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720 (United States); Thomas, R. C. [Computational Cosmology Center, Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road MS 50B-4206, Berkeley, CA 94720 (United States); Antilogus, P.; Bongard, S.; Canto, A.; Cellier-Holzem, F.; Guy, J. [Laboratoire de Physique Nucleaire et des Hautes Energies, Universite Pierre et Marie Curie Paris 6, Universite Denis Diderot Paris 7, CNRS-IN2P3, 4 place Jussieu, F-75252 Paris Cedex 05 (France); Baltay, C. [Department of Physics, Yale University, New Haven, CT 06250-8121 (United States); Buton, C.; Kerschhaggl, M.; Kowalski, M. [Physikalisches Institut, Universitaet Bonn, Nussallee 12, D-53115 Bonn (Germany); Chotard, N. [Tsinghua Center for Astrophysics, Tsinghua University, Beijing 100084 (China); Copin, Y.; Gangler, E. [Universite de Lyon, F-69622 Lyon (France); and others
2013-04-01
We present a novel class of models for Type Ia supernova time-evolving spectral energy distributions (SEDs) and absolute magnitudes: they are each modeled as stochastic functions described by Gaussian processes. The values of the SED and absolute magnitudes are defined through well-defined regression prescriptions, so that data directly inform the models. As a proof of concept, we implement a model for synthetic photometry built from the spectrophotometric time series from the Nearby Supernova Factory. Absolute magnitudes at peak B brightness are calibrated to 0.13 mag in the g band and to as low as 0.09 mag in the z = 0.25 blueshifted i band, where the dispersion includes contributions from measurement uncertainties and peculiar velocities. The methodology can be applied to spectrophotometric time series of supernovae that span a range of redshifts to simultaneously standardize supernovae together with fitting cosmological parameters.
Standardizing Type Ia Supernova Absolute Magnitudes Using Gaussian Process Data Regression
Kim, A G; Aldering, G; Antilogus, P; Aragon, C; Bailey, S; Baltay, C; Bongard, S; Buton, C; Canto, A; Cellier-Holzem, F; Childress, M; Chotard, N; Copin, Y; Fakhouri, H K; Gangler, E; Guy, J; Kerschhaggl, M; Kowalski, M; Nordin, J; Nugent, P; Paech, K; Pain, R; Pécontal, E; Pereira, R; Perlmutter, S; Rabinowitz, D; Rigault, M; Runge, K; Saunders, C; Scalzo, R; Smadja, G; Tao, C; Weaver, B A; Wu, C
2013-01-01
We present a novel class of models for Type Ia supernova time-evolving spectral energy distributions (SED) and absolute magnitudes: they are each modeled as stochastic functions described by Gaussian processes. The values of the SED and absolute magnitudes are defined through well-defined regression prescriptions, so that data directly inform the models. As a proof of concept, we implement a model for synthetic photometry built from the spectrophotometric time series from the Nearby Supernova Factory. Absolute magnitudes at peak $B$ brightness are calibrated to 0.13 mag in the $g$-band and to as low as 0.09 mag in the $z=0.25$ blueshifted $i$-band, where the dispersion includes contributions from measurement uncertainties and peculiar velocities. The methodology can be applied to spectrophotometric time series of supernovae that span a range of redshifts to simultaneously standardize supernovae together with fitting cosmological parameters.
Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.
Ferrari, Alberto
2017-02-16
Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, distributional properties of information entropy as a random variable have seldom been the object of study, leading to researchers mainly using linear models or simulation-based analytical approach to assess differences in information content, when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results coming from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set coming from a real experiment on animal communication.
Air Pollution Analysis using Ontologies and Regression Models
Parul Choudhary
2016-07-01
Rapidly throughout the world economy, "the expansive Web" in the "world" explosive growth, rapidly growing market characterized by short product cycles exists and the demand for increased flexibility as well as the extensive use of a new data vision managed data society. A new socio-economic system that relies more and more on movement and allocation results in data whose daily existence, refinement, economy and adjust the exchange industry. Cooperative Engineering Co-operation and multi-disciplinary installed on people's cooperation is a good example. Semantic Web is a new form of Web content that is meaningful to computers and additional approved another example. Communication, vision sharing and exchanging data Society's are new commercial bet. Urban air pollution modeling and data processing techniques need elevated Association. Artificial intelligence in countless ways and breakthrough technologies can solve environmental problems from uneven offers. A method for data to formal ontology means a true meaning and lack of ambiguity to allow us to portray memo. In this work we survey regression model for ontologies and air pollution.
Survival analysis of cervical cancer using stratified Cox regression
Purnami, S. W.; Inayati, K. D.; Sari, N. W. Wulan; Chosuvivatwong, V.; Sriplung, H.
2016-04-01
Cervical cancer is one of the most common causes of cancer death among women worldwide, including in Indonesia. Most cervical cancer patients come to the hospital at an already advanced stage. As a result, treatment becomes more difficult and the risk of death can increase. One parameter that can be used to assess the success of treatment is the probability of survival. This study examines the survival of cervical cancer patients at Dr. Soetomo Hospital using stratified Cox regression based on six factors: age, stage, treatment initiation, companion disease, complication, and anemia. The stratified Cox model is used because one independent variable, stage, does not satisfy the proportional hazards assumption. The results of the stratified Cox model show that complication is a significant factor influencing the survival probability of cervical cancer patients. The estimated hazard ratio is 7.35, meaning that a cervical cancer patient with complications has a risk of dying 7.35 times that of a patient without complications. The adjusted survival curves show that stage IV had the lowest probability of survival.
Introduction to mixed modelling beyond regression and analysis of variance
Galwey, N W
2007-01-01
Mixed modelling is one of the most promising and exciting areas of statistical analysis, enabling more powerful interpretation of data through the recognition of random effects. However, many perceive mixed modelling as an intimidating and specialized technique.
Dhanya, S; Kumari Roshni, V S
2016-01-01
Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. Additionally the paper also presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform.
A regression analysis on the green olives debittering
Kopsidas, Gerassimos C.
1991-12-01
In this paper, a regression model is fitted that gives the debittering time $t$ as a function of the sodium hydroxide concentration $C$ and the debittering temperature $T$ for the debittering of medium-size green olive fruit of the Conservolea variety. The model has the simple form $t = a_0 C^{a_1} e^{a_2/T}$, where $a_0$, $a_1$, and $a_2$ are constants. The values of $a_0$, $a_1$, and $a_2$ are determined by the method of least squares from a set of experimental data. The fitted model is very satisfactory for the conditions in which Greek green olives are debittered.
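One common way to fit a model of the form t = a0 * C^a1 * exp(a2 / T) by least squares is to take logarithms, since ln t = ln a0 + a1 ln C + a2 / T is linear in the unknown constants. The sketch below assumes this log-linear approach and an absolute temperature scale; the abstract does not say whether the authors linearized the model, and the function name `fit_debittering_model`, the constants, and the data are invented purely for illustration.

```python
import numpy as np

def fit_debittering_model(C, T, t):
    """Fit t = a0 * C**a1 * exp(a2 / T) by ordinary least squares on ln t."""
    C, T, t = (np.asarray(v, dtype=float) for v in (C, T, t))
    # Design matrix for ln t = ln a0 + a1 * ln C + a2 * (1 / T)
    X = np.column_stack([np.ones_like(C), np.log(C), 1.0 / T])
    coef, *_ = np.linalg.lstsq(X, np.log(t), rcond=None)
    return np.exp(coef[0]), coef[1], coef[2]

# Synthetic, noise-free data generated from made-up constants
# (assumption: T is an absolute temperature; units of C and t are arbitrary)
C = np.array([1.0, 1.5, 2.0, 2.5, 1.0, 2.0])
T = np.array([288.0, 288.0, 298.0, 298.0, 308.0, 308.0])
t = 0.002 * C ** -1.2 * np.exp(2500.0 / T)

a0, a1, a2 = fit_debittering_model(C, T, t)
```

Note that least squares on ln t minimizes relative rather than absolute errors in t; a nonlinear least-squares fit on the original scale would weight the observations differently.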
A Quality Assessment Tool for Non-Specialist Users of Regression Analysis
Argyrous, George
2015-01-01
This paper illustrates the use of a quality assessment tool for regression analysis. It is designed for non-specialist "consumers" of evidence, such as policy makers. The tool provides a series of questions such consumers of evidence can ask to interrogate regression analysis, and is illustrated with reference to a recent study published…
A Noncentral "t" Regression Model for Meta-Analysis
Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi
2010-01-01
In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…
An improved multiple linear regression and data analysis computer program package
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
Random Decrement and Regression Analysis of Traffic Responses of Bridges
Asmussen, J. C.; Ibrahim, S. R.; Brincker, Rune
The topic of this paper is the estimation of modal parameters from ambient data by applying the Random Decrement technique. The data from the Queensborough Bridge over the Fraser River in Vancouver, Canada have been applied. The loads producing the dynamic response are ambient, e.g. wind, traffic...... and small ground motion. The Random Decrement technique is used to estimate the correlation function or the free decays from the ambient data. From these functions, the modal parameters are extracted using the Ibrahim Time Domain method. The possible influence of the traffic mass load on the bridge...... of the analysis using the Random Decrement technique are compared with results from an analysis based on fast Fourier transformations....
Random Decrement and Regression Analysis of Traffic Responses of Bridges
Asmussen, J. C.; Ibrahim, S. R.; Brincker, Rune
1996-01-01
The topic of this paper is the estimation of modal parameters from ambient data by applying the Random Decrement technique. The data from the Queensborough Bridge over the Fraser River in Vancouver, Canada have been applied. The loads producing the dynamic response are ambient, e.g. wind, traffic...... and small ground motion. The Random Decrement technique is used to estimate the correlation function or the free decays from the ambient data. From these functions, the modal parameters are extracted using the Ibrahim Time Domain method. The possible influence of the traffic mass load on the bridge...... of the analysis using the Random Decrement technique are compared with results from an analysis based on fast Fourier transformations....
Analysis of cost regression and post-accident absence
Wojciech, Drozd
2017-07-01
The article presents issues related to the costs of work safety. It proves the thesis that economic aspects cannot be overlooked in the effective management of occupational health and safety and that adequate expenditure on safety can bring tangible benefits to the company. Reliable analysis of this problem is essential for describing the problem of work safety. The article attempts to carry out such an analysis using the procedures of mathematical statistics [1, 2, 3].
唐景华; 李骏; 梁金玲; 刘红
2013-01-01
Objective: To explore factors related to plasma fibrinogen (FIB) levels in patients with newly diagnosed type 2 diabetes. Methods: Clinical data were collected for 67 hospitalized patients with newly diagnosed type 2 diabetes treated in our hospital from January 2011 to August 2012. Using t-tests and multiple linear regression analysis (stepwise method), we retrospectively analyzed the relationship between plasma FIB and the patients' age, disease duration, blood pressure, fasting blood glucose (FBG), triglyceride (TG), total cholesterol (TC), low-density lipoprotein (LDL-CH), high-density lipoprotein (HDL-CH), fasting C-peptide (FCP), glycosylated hemoglobin (HbA1c), glycosylated serum protein (GSP), plasma homocysteine (Hcy), serum uric acid (UA), and C-reactive protein (CRP). Results: The independent variable UA entered the regression model (F=7.904, P=0.013), and the model was statistically significant; the other independent variables did not enter the model. UA and FIB were linearly related [regression coefficient B=1.996, t=3.212, P=0.006, 95% CI (0.672, 3.321)] and positively correlated (R=0.658, R2=0.432). Conclusion: Serum UA level is a predictor of plasma FIB level in patients with newly diagnosed type 2 diabetes; controlling blood UA levels can help to reduce the level of plasma FIB.
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... to avoid this problem. The main objective is to investigate the applicability of the nonparametric kernel regression method in applied production analysis. The focus of the empirical analyses included in this thesis is the agricultural sector in Poland. Data on Polish farms are used to investigate...... practically and politically relevant problems and to illustrate how nonparametric regression methods can be used in applied microeconomic production analysis both in panel data and cross-section data settings. The thesis consists of four papers. The first paper addresses problems of parametric...
Measurement and Analysis of Test Suite Volume Metrics for Regression Testing
S Raju
2014-01-01
Regression testing intends to ensure that a software application works as specified after changes are made to it during maintenance. It is an important phase in the software development lifecycle. Regression testing is the re-execution of some subset of test cases that have already been executed. It is an expensive process used to detect defects due to regressions. Regression testing has been used to support software-testing activities and to assure appropriate quality through several versions of a software product during its development and maintenance. Regression testing assures the quality of modified applications. In this proposed work, a study and analysis of metrics related to test suite volume was undertaken. It was shown that the software under test needs more test cases after changes are made to it. A comparative analysis was performed to find the change in test suite size before and after the regression test.
An innovative land use regression model incorporating meteorology for exposure analysis.
Su, Jason G; Brauer, Michael; Ainslie, Bruce; Steyn, Douw; Larson, Timothy; Buzzelli, Michael
2008-02-15
The advent of spatial analysis and geographic information systems (GIS) has led to studies of chronic exposure and health effects based on the rationale that intra-urban variations in ambient air pollution concentrations are as great as inter-urban differences. Such studies typically rely on local spatial covariates (e.g., traffic, land use type) derived from circular areas (buffers) to predict concentrations/exposures at receptor sites, as a means of averaging the annual net effect of meteorological influences (i.e., wind speed, wind direction and insolation). This is the approach taken in the now popular land use regression (LUR) method. However, spatial studies of chronic exposures and temporal studies of acute exposures have not been adequately integrated. This paper presents an innovative LUR method implemented in a GIS environment that reflects both temporal and spatial variability and considers the role of meteorology. The new source area LUR integrates wind speed, wind direction and cloud cover/insolation to estimate hourly nitric oxide (NO) and nitrogen dioxide (NO2) concentrations from land use types (i.e., road network, commercial land use). These concentrations are then used as covariates to regress against NO and NO2 measurements at various receptor sites across the Vancouver region and are compared directly with estimates from a regular LUR. The results show that, when variability in seasonal concentration measurements is present, the source area LUR (SA-LUR) model is a better option for concentration estimation.
Analysis of sparse data in logistic regression in medical research: A newer approach
S Devika
2016-01-01
Background and Objective: In the analysis of a dichotomous response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs) with very wide 95% confidence intervals (CIs) (OR: >999.999, 95% CI: 999.999). In this paper, we addressed this issue by using the penalized logistic regression (PLR) method. Materials and Methods: Data from a case-control study on hyponatremia and hiccups conducted at Christian Medical College, Vellore, Tamil Nadu, India were used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. Simulation datasets were created with different sample sizes and different numbers of covariates. Results: A total of 23 cases and 50 controls were used for the analysis with the ordinary and PLR methods. The main exposure variable, hyponatremia, was present in nine (39.13%) of the cases and in four (8.0%) of the controls. Of the 23 hiccup cases, all were males, and among the controls, 46 (92.0%) were males. Thus, the complete separation between gender and the disease group led to an infinite OR with 95% CI (OR: >999.999, 95% CI: 999.999), whereas there was a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48) using PLR. After adjusting for all the confounding variables, hyponatremia entailed a 7.9 (95% CI: 2.06, 38.86) times higher risk for the development of hiccups using PLR, whereas the conventional method overestimated the risk (OR: 10.76; 95% CI: 2.17, 53.41). The simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates. Conclusions: PLR is almost equal to the ordinary logistic regression when the sample size is large and is superior in small cell
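The separation problem described in this abstract can be illustrated with the gender counts it reports (23 of 23 cases and 46 of 50 controls were male). The sketch below is not the PLR method of the paper: it swaps in the much simpler Haldane-Anscombe continuity correction (adding 0.5 to every cell of the 2x2 table) only to show how an empty cell makes the crude odds ratio diverge and how a small correction keeps it finite. The function name and the unadjusted, single-covariate setting are assumptions made for illustration, so the corrected value is not expected to match the adjusted OR of 5.35 quoted above.

```python
import math

def odds_ratio(a, b, c, d, correction=0.0):
    """Odds ratio for a 2x2 table.

    a, b: exposed / unexposed counts among cases
    c, d: exposed / unexposed counts among controls
    correction: constant added to every cell (0.5 = Haldane-Anscombe)
    """
    a, b, c, d = (v + correction for v in (a, b, c, d))
    if b * c == 0:
        return math.inf  # an empty cell in the denominator: the estimate diverges
    return (a * d) / (b * c)

# Gender counts from the abstract: male/female among cases, male/female among controls.
crude = odds_ratio(23, 0, 46, 4)
corrected = odds_ratio(23, 0, 46, 4, correction=0.5)
print(crude, round(corrected, 2))
```

The crude estimate is infinite because no female cases were observed, while the corrected table gives a finite value. A penalized likelihood fit achieves a similar stabilizing effect inside a full multivariable model.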
Automated particle identification through regression analysis of size, shape and colour
Rodriguez Luna, J. C.; Cooper, J. M.; Neale, S. L.
2016-04-01
Rapid point-of-care diagnostic tests and tests to provide therapeutic information are now available for a range of specific conditions, from the measurement of blood glucose levels for diabetes to card agglutination tests for parasitic infections. Due to a lack of specificity, these tests are often backed up by more conventional lab-based diagnostic methods; for example, a card agglutination test may be carried out for a suspected parasitic infection in the field and, if positive, a blood sample can then be sent to a lab for confirmation. The eventual diagnosis is often achieved by microscopic examination of the sample. In this paper we propose a computerized vision system for aiding the diagnostic process; this system uses a novel particle recognition algorithm to improve specificity and speed during the diagnostic process. We show the detection and classification of different types of cells in a diluted blood sample using regression analysis of their size, shape and colour. The first step is to define the objects to be tracked, using a Gaussian Mixture Model for background subtraction and binary opening and closing for noise suppression. After separating the objects of interest from the background, the next challenge is to predict whether a given object belongs to a certain category or not. This is a classification problem, and the output of the algorithm is a Boolean value (true/false). As such, the computer program should be able to "predict" with a reasonable level of confidence whether a given particle belongs to the kind we are looking for. We show the use of a binary logistic regression analysis with three continuous predictors: size, shape and colour histogram. The results suggest that these variables could be very useful in a logistic regression equation, as they proved to have a relatively high predictive value on their own.
GOAL PROGRAMMING ALGORITHM FOR A TYPE OF LEAST ABSOLUTE VALUE REGRESSION PROBLEM
SHI Kuiran; XIAO Tiaojun; ZHANG Weirong
2004-01-01
This paper develops a goal programming algorithm to solve a type of least absolute value (LAV) problem. First, we simplify the simplex algorithm by proving the existence of solutions to the problem. Then, we present a goal programming algorithm on the basis of the original techniques. Theoretical analysis and numerical results indicate that the new method uses fewer deviation variables and consumes less computational time than current LAV methods.
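To make the LAV criterion concrete, the following minimal sketch fits a straight line by minimizing the sum of absolute deviations. It is not the goal programming algorithm of the paper. It relies instead on the standard linear programming result that some optimal LAV line passes through at least two data points, so for small data sets every pair of points can simply be tried. The function name and the toy data are assumptions for illustration.

```python
from itertools import combinations

def lav_line(xs, ys):
    """Least-absolute-value line y = a + b*x by exhaustive search over point pairs."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical candidate line cannot be written as y = a + b*x
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        cost = sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best  # (sum of absolute deviations, intercept, slope)

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 30]  # the last point is a deliberate outlier
cost, a, b = lav_line(xs, ys)
print(cost, a, b)
```

On this toy data the fitted line follows the four collinear points and leaves the whole misfit on the outlier, which is the robustness property that motivates LAV regression. The exhaustive search is quadratic in the number of points, which is why dedicated linear or goal programming formulations are preferred in practice.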
Mandel, Kaisey S; Kirshner, Robert P
2014-01-01
We investigate the correlations between the peak intrinsic colors of Type Ia supernovae (SN Ia) and their expansion velocities at maximum light, measured from the Si II 6355 Å spectral feature. We construct a new hierarchical Bayesian regression model and Gibbs sampler to estimate the dependence of the intrinsic colors of a SN Ia on its ejecta velocity, while accounting for the random effects of intrinsic scatter, measurement error, and reddening by host galaxy dust. The method is applied to the apparent color data from BVRI light curves and Si II velocity data for 79 nearby SN Ia. Comparison of the apparent color distributions of high velocity (HV) and normal velocity (NV) supernovae reveals significant discrepancies in B-V and B-R, but not other colors. Hence, they are likely due to intrinsic color differences originating in the B-band, rather than dust reddening. The mean intrinsic B-V and B-R color differences between HV and NV groups are 0.06 ± 0.02 and 0.09 ± 0.02 mag, respectively. Under a linear m...
Robust analysis of trends in noisy tokamak confinement data using geodesic least squares regression
Verdoolaege, G.; Shabbir, A.; Hornung, G.
2016-11-01
Regression analysis is a very common activity in fusion science for unveiling trends and parametric dependencies, but it can be a difficult matter. We have recently developed the method of geodesic least squares (GLS) regression that is able to handle errors in all variables, is robust against data outliers and uncertainty in the regression model, and can be used with arbitrary distribution models and regression functions. We here report on first results of application of GLS to estimation of the multi-machine scaling law for the energy confinement time in tokamaks, demonstrating improved consistency of the GLS results compared to standard least squares.
Development of a User Interface for a Regression Analysis Software Tool
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
An easy-to-use user interface was implemented in a highly automated regression analysis tool. The user interface was developed from the start to run on computers that use the Windows, Macintosh, Linux, or UNIX operating system. Many user interface features were specifically designed such that a novice or inexperienced user can apply the regression analysis tool with confidence. Therefore, the user interface's design minimizes interactive input from the user. In addition, reasonable default combinations are assigned to those analysis settings that influence the outcome of the regression analysis. These default combinations will lead to a successful regression analysis result for most experimental data sets. The user interface comes in two versions. The text user interface version is used for the ongoing development of the regression analysis tool. The official release of the regression analysis tool, on the other hand, has a graphical user interface that is more efficient to use. This graphical user interface displays all input file names, output file names, and analysis settings for a specific software application mode on a single screen, which makes it easier to generate reliable analysis results and to perform input parameter studies. An object-oriented approach was used for the development of the graphical user interface. This choice keeps future software maintenance costs to a reasonable limit. Examples of both the text user interface and graphical user interface are discussed in order to illustrate the user interface's overall design approach.
A factor analysis-multiple regression model for source apportionment of suspended particulate matter
Okamoto, Shin'ichi; Hayashi, Masayuki; Nakajima, Masaomi; Kainuma, Yasutaka; Shiozawa, Kiyoshige
A factor analysis-multiple regression (FA-MR) model has been used for a source apportionment study in the Tokyo metropolitan area. By a varimax rotated factor analysis, five source types could be identified: refuse incineration, soil and automobile, secondary particles, sea salt and steel mill. Quantitative estimations using the FA-MR model corresponded to the calculated contributing concentrations determined by using a weighted least-squares CMB model. However, the source type of refuse incineration identified by the FA-MR model was similar to that of biomass burning, rather than that produced by an incineration plant. The estimated contributions of sea salt and steel mill by the FA-MR model contained those of other sources, which have the same temporal variation of contributing concentrations. This symptom was caused by a multicollinearity problem. Although this result shows the limitation of the multivariate receptor model, it gives useful information concerning source types and their distribution by comparing with the results of the CMB model. In the Tokyo metropolitan area, the contributions from soil (including road dust), automobile, secondary particles and refuse incineration (biomass burning) were larger than the industrial contributions (fuel oil combustion and steel mill). However, since vanadium is highly correlated with SO₄²⁻ and other secondary particle related elements, a major portion of secondary particles is considered to be related to fuel oil combustion.
Chen, Hui-Fang; Jin, Kuan-Yu; Wang, Wen-Chung
2017-01-01
Extreme response style (ERS) is prevalent in Likert- or rating-type data, but previous research has not adequately addressed its impact on differential item functioning (DIF) assessments. This study aimed to fill this knowledge gap and examined the influence of ERS on the performance of logistic regression (LR) approaches in DIF detection, including ordinal logistic regression (OLR) and logistic discriminant function analysis (LDFA). Results indicated that both the standard OLR and LDFA yielded severely inflated false positive rates as the magnitude of the differences in ERS between two groups increased. This study proposed a class of modified LR approaches to eliminate the ERS effect on DIF assessment. These proposed modifications showed satisfactory control of false positive rates when no DIF items existed, and yielded better control of false positive rates and more accurate true positive rates under DIF conditions than the conventional LR approaches did. In conclusion, the proposed modifications are recommended in survey research when there are multiple groups or cultural groups.
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... function. However, the a priori specification of a functional form involves the risk of choosing one that is not similar to the “true” but unknown relationship between the regressors and the dependent variable. This problem, known as parametric misspecification, can result in biased parameter estimates...... and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...
Telmo, C; Lousada, J; Moreira, N
2010-06-01
The gross calorific value (GCV) and the proximate, ultimate and chemical analyses of debarked wood in Portugal were studied for future use in the wood pellet industry, and the results were compared with CEN/TS 14961. The relationships between GCV and the ultimate and chemical analyses were determined by backward stepwise multiple regression. The hardwood-softwood treatment did not result in statistically significant differences for the proximate, ultimate and chemical analyses. Statistically significant differences were found in carbon for national hardwoods versus softwoods; in volatile matter, fixed carbon, carbon and oxygen for national versus tropical hardwoods; and, in the chemical analysis, in F for national hardwoods versus softwoods and in Br for national versus tropical hardwoods. GCV was highly positively related to C (0.79***) and negatively related to O (-0.71***). The final independent variables of the model were C, O, S, Zn, Ni and Br, with R²=0.86 and F=27.68***. Hydrogen did not contribute statistically to the energy content.
Mandel, Kaisey S.; Kirshner, Robert P. [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Foley, Ryan J., E-mail: kmandel@cfa.harvard.edu [Astronomy Department, University of Illinois at Urbana-Champaign, 1002 West Green Street, Urbana, IL 61801 (United States)
2014-12-20
We investigate the statistical dependence of the peak intrinsic colors of Type Ia supernovae (SNe Ia) on their expansion velocities at maximum light, measured from the Si II λ6355 spectral feature. We construct a new hierarchical Bayesian regression model, accounting for the random effects of intrinsic scatter, measurement error, and reddening by host galaxy dust, and implement a Gibbs sampler and deviance information criteria to estimate the correlation. The method is applied to the apparent colors from BVRI light curves and Si II velocity data for 79 nearby SNe Ia. The apparent color distributions of high-velocity (HV) and normal velocity (NV) supernovae exhibit significant discrepancies for B – V and B – R, but not other colors. Hence, they are likely due to intrinsic color differences originating in the B band, rather than dust reddening. The mean intrinsic B – V and B – R color differences between HV and NV groups are 0.06 ± 0.02 and 0.09 ± 0.02 mag, respectively. A linear model finds significant slopes of –0.021 ± 0.006 and –0.030 ± 0.009 mag (10³ km s⁻¹)⁻¹ for intrinsic B – V and B – R colors versus velocity, respectively. Because the ejecta velocity distribution is skewed toward high velocities, these effects imply non-Gaussian intrinsic color distributions with skewness up to +0.3. Accounting for the intrinsic-color-velocity correlation results in corrections to A_V extinction estimates as large as –0.12 mag for HV SNe Ia and +0.06 mag for NV events. Velocity measurements from SN Ia spectra have the potential to diminish systematic errors from the confounding of intrinsic colors and dust reddening affecting supernova distances.
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions.
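As a rough illustration of the spreadsheet-level calculation mentioned in point 4, the sketch below computes ordinary least squares (Model I) and ordinary least products (Model II) coefficients side by side. It assumes the usual definition of the OLP slope as the ratio of the standard deviations of y and x, signed by the correlation; the function name and sample data are hypothetical, and no confidence intervals are attempted, since the abstract notes that these require bootstrapping.

```python
import math

def ols_and_olp(xs, ys):
    """Return (intercept, slope) pairs for OLS (Model I) and OLP (Model II) fits."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ols_slope = sxy / sxx
    # OLP slope: sd(y) / sd(x), carrying the sign of the covariance
    olp_slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return {
        "ols": (my - ols_slope * mx, ols_slope),
        "olp": (my - olp_slope * mx, olp_slope),
    }

# Hypothetical measurements with error in both variables
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
fits = ols_and_olp(xs, ys)
print(fits)
```

Because the OLS slope equals the OLP slope multiplied by the correlation coefficient, the OLP slope is never smaller in magnitude than the OLS slope, and the two coincide only when the points are perfectly collinear.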
Wang, Ming; Flanders, W Dana; Bostick, Roberd M; Long, Qi
2012-12-20
Measurement error is common in epidemiological and biomedical studies. When biomarkers are measured in batches or groups, measurement error is potentially correlated within each batch or group. In regression analysis, most existing methods are not applicable in the presence of batch-specific measurement error in predictors. We propose a robust conditional likelihood approach to account for batch-specific error in predictors when batch effect is additive and the predominant source of error, which requires no assumptions on the distribution of measurement error. Although a regression model with batch as a categorical covariable yields the same parameter estimates as the proposed conditional likelihood approach for linear regression, this result does not hold in general for all generalized linear models, in particular, logistic regression. Our simulation studies show that the conditional likelihood approach achieves better finite sample performance than the regression calibration approach or a naive approach without adjustment for measurement error. In the case of logistic regression, our proposed approach is shown to also outperform the regression approach with batch as a categorical covariate. In addition, we also examine a 'hybrid' approach combining the conditional likelihood method and the regression calibration method, which is shown in simulations to achieve good performance in the presence of both batch-specific and measurement-specific errors. We illustrate our method by using data from a colorectal adenoma study.
Deng, Yangyang; Parajuli, Prem B.
2011-08-10
Evaluation of the economic feasibility of a bio-gasification facility requires an understanding of its unit cost under different production capacities. The objective of this study was to evaluate the unit cost of syngas production at capacities from 60 through 1800 Nm³/h using an economic model with three regression analysis techniques (simple regression, reciprocal regression, and log-log regression). The preliminary result of this study showed that the reciprocal regression analysis technique gave the best-fit curve between per unit cost and production capacity, with a sum of error squares (SES) lower than 0.001 and a coefficient of determination (R²) of 0.996. The regression analysis techniques determined the minimum unit cost of syngas production for micro-scale bio-gasification facilities to be $0.052/Nm³, at a capacity of 2,880 Nm³/h. The results of this study suggest that, to reduce cost, facilities should run at a high production capacity. In addition, this technique could contribute a new categorical criterion for evaluating micro-scale bio-gasification facilities from the perspective of economic analysis.
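A reciprocal regression of the kind named in this abstract can be sketched as an ordinary least squares fit after transforming capacity to its reciprocal, that is, unit cost = a + b / capacity. The model form, the function names and the synthetic noise-free data below are assumptions chosen for illustration; none of the numbers are taken from the study.

```python
def fit_line(zs, ys):
    """Ordinary least squares fit of y = a + b*z; returns (a, b)."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    b = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum((z - mz) ** 2 for z in zs)
    return my - b * mz, b

def fit_reciprocal(capacities, unit_costs):
    """Fit unit_cost = a + b / capacity by regressing on z = 1 / capacity."""
    return fit_line([1.0 / c for c in capacities], unit_costs)

# Hypothetical capacity grid (Nm3/h) and synthetic costs generated with a = 0.05, b = 3.0
capacities = [60, 120, 300, 600, 1200, 1800]
unit_costs = [0.05 + 3.0 / c for c in capacities]
a, b = fit_reciprocal(capacities, unit_costs)
print(a, b)
```

In this form the term b / capacity shrinks as capacity grows, so the fitted unit cost falls toward the asymptote a, which matches the abstract's qualitative conclusion that running at high capacity lowers unit cost.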
Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors
Xibin Zhang
2016-04-01
This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to a nonparametric regression model of the Australian All Ordinaries returns and to the kernel density estimation of gross domestic product (GDP) growth rates among Organisation for Economic Co-operation and Development (OECD) and non-OECD countries.
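For readers unfamiliar with the Nadaraya-Watson estimator mentioned above, here is a minimal sketch for a single continuous regressor with a Gaussian kernel. It deliberately leaves out the discrete regressors and the Bayesian bandwidth sampling that are the subject of the paper: the bandwidth h is simply supplied by hand, and all names and data are illustrative assumptions.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the kernel weight."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted average of ys at the point x0 with bandwidth h."""
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Toy data: two observations placed symmetrically around the evaluation point
estimate = nadaraya_watson(1.0, [0.0, 2.0], [1.0, 3.0], h=0.5)
print(estimate)
```

A small h makes the estimate follow the nearest observations closely, while a large h pulls it toward the overall mean of the responses. That trade-off is the reason bandwidth selection, by cross-validation or by a sampling algorithm, matters so much.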
Regression analysis understanding and building business and economic models using Excel
Wilson, J Holton
2012-01-01
The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe
Zhang, Hong-guang; Lu, Jian-gang
2016-02-01
To overcome the problems of significant differences among samples and nonlinearity between the property and the spectra of samples in spectral quantitative analysis, a local regression algorithm is proposed in this paper. In this algorithm, the net analyte signal (NAS) method is first used to obtain the net analyte signal of the calibration samples and the unknown samples; then the Euclidean distance between the net analyte signal of an unknown sample and those of the calibration samples is calculated and used as a similarity index. According to this similarity index, a local calibration set is selected individually for each unknown sample. Finally, a local PLS regression model is built on the local calibration set of each unknown sample. The proposed method was applied to a set of near-infrared spectra of meat samples. The results demonstrate that the prediction precision and model complexity of the proposed method are superior to those of the global PLS regression method and the conventional local regression algorithm based on spectral Euclidean distance.
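The selection step of this algorithm, picking a local calibration set by Euclidean distance, can be sketched as a nearest-neighbour search. In the sketch below the vectors merely stand in for net analyte signals; computing the NAS itself and fitting the local PLS model are outside its scope, and the function name, the set size k and the toy vectors are assumptions.

```python
import math

def select_local_set(unknown, calibration, k):
    """Indices of the k calibration vectors closest to `unknown` in Euclidean distance."""
    distances = [(math.dist(unknown, vec), idx) for idx, vec in enumerate(calibration)]
    return [idx for _, idx in sorted(distances)[:k]]

# Toy stand-ins for net analyte signal vectors
calibration = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
local_indices = select_local_set((1.0, 1.0), calibration, k=2)
print(local_indices)
```

Each unknown sample gets its own index list, and a separate regression model would then be trained on just those calibration samples.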
Distance Based Root Cause Analysis and Change Impact Analysis of Performance Regressions
Junzan Zhou
2015-01-01
Performance regression testing is applied to uncover both performance and functional problems in software releases. A performance problem revealed by performance testing can be high response time, low throughput, or even the service being unavailable. A mature performance testing process helps to detect software performance problems systematically. However, it is difficult to identify the root cause and evaluate the potential change impact. In this paper, we present an approach that leverages server-side logs to identify the root causes of performance problems. First, server-side logs are used to recover the call tree of each business transaction. We then define a novel distance-based metric computed from call trees for root cause analysis, and apply an inverted index from methods to business transactions for change impact analysis. Empirical studies show that our approach can effectively and efficiently help developers diagnose the root cause of performance problems.
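The inverted index from methods to business transactions can be pictured with a small sketch. It assumes that each recovered call tree has already been flattened into the set of method names it contains; the transaction names, method names and function names are hypothetical, and the paper's distance-based metric is not reproduced here.

```python
from collections import defaultdict

def build_inverted_index(call_trees):
    """Map each method name to the set of business transactions that call it."""
    index = defaultdict(set)
    for transaction, methods in call_trees.items():
        for method in methods:
            index[method].add(transaction)
    return index

def impacted_transactions(index, changed_methods):
    """Union of all transactions whose call tree touches any changed method."""
    impacted = set()
    for method in changed_methods:
        impacted |= index.get(method, set())
    return impacted

# Hypothetical call trees, flattened to method-name sets
call_trees = {
    "checkout": {"Cart.total", "Payment.charge", "Db.query"},
    "search": {"Catalog.find", "Db.query"},
    "login": {"Auth.verify"},
}
index = build_inverted_index(call_trees)
print(sorted(impacted_transactions(index, ["Db.query"])))
```

Looking up a changed method then takes a single dictionary access per method, rather than a scan over every call tree.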
Family system dynamics and type 1 diabetic glycemic variability: a vector-auto-regressive model.
Günther, Moritz Philipp; Winker, Peter; Böttcher, Claudia; Brosig, Burkhard
2013-06-01
Statistical approaches rooted in econometric methodology, so far foreign to the psychiatric and psychological realms have provided exciting and substantial new insights into complex mind-body interactions over time and individuals. Over 120 days, this structured diary study explored the mutual interactions of emotions within a classic 3-person family system with its Type 1 diabetic adolescent's daily blood glucose variability. Glycemic variability was measured through daily standard deviations of blood glucose determinations (at least 3 per day). Emotions were captured individually utilizing the self-assessment manikin on affective valence (negative-positive), activation (calm-excited), and control (dominated-dominant). Auto- and cross-correlating the stationary absolute (level) values of the mutually interacting parallel time series data sets through vector autoregression (VAR, grounded in econometric theory) allowed for the formulation of 2 concordant models. Applying Cholesky Impulse Response Analysis at a 95% confidence interval, we provided evidence for an adolescent being happy, calm, and in control to exhibit less glycemic variability and hence diabetic derailment. A nondominating mother and a happy father seemed to also reduce glycemic variability. Random shocks increasing glycemic variability affected only the adolescent and her father: In 1 model, the male parent felt in charge; in the other, he calmed down while his daughter turned sad. All reactions to external shocks lasted for less than 4 full days. Extant literature on affect and glycemic variability in Type 1 diabetic adolescents as well as challenges arising from introducing econometric theory to the field were discussed.
Mohammad Ali HORMOZI
2015-06-01
We analyzed the effect of chemical fertilizer, seed, biocide, farm machinery and labor hours on the production of paddy rice in Khuzestan province, in the southwestern part of Iran. We test two methods: linear regression and a neural network. We conclude that the results obtained by a neural network with no hidden layer and by linear regression are close to each other. We maintain that, for a data set of this type, regression analysis yields more reliable results than a neural network. The results suggest that machinery has a very clear positive effect on yield, while fertilizer and labor do not affect it. One can say that increasing the amount of a "useful input" does not necessarily increase paddy production.
Quantile regression for the statistical analysis of immunological data with many non-detects
Eilers Paul HC; Röder Esther; Savelkoul Huub FJ; van Wijk Roy
2012-01-01
Background: Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Methods and results: Quantile regression, a genera...
Cronk, Ryan; Bartram, Jamie
2017-09-21
Sufficient, safe, and continuously available water services are important for human development and health yet many water systems in low- and middle-income countries are nonfunctional. Monitoring data were analyzed using regression and Bayesian networks (BNs) to explore factors influencing the functionality of 82 503 water systems in Nigeria and Tanzania. Functionality varied by system type. In Tanzania, Nira handpumps were more functional than Afridev and India Mark II handpumps. Higher functionality was associated with fee collection in Nigeria. In Tanzania, functionality was higher if fees were collected monthly rather than in response to system breakdown. Systems in Nigeria were more likely to be functional if they were used for both human and livestock consumption. In Tanzania, systems managed by private operators were more functional than community-managed systems. The BNs found strong dependencies between functionality and system type and administrative unit (e.g., district). The BNs predicted functionality increased from 68% to 89% in Nigeria and from 53% to 68% in Tanzania when best observed conditions were in place. Improvements to water system monitoring and analysis of monitoring data with different modeling techniques may be useful for identifying water service improvement opportunities and informing evidence-based decision-making for better management, policy, programming, and practice.
Construction of risk prediction model of type 2 diabetes mellitus based on logistic regression
Li Jian
2017-01-01
Objective: to construct a multi-factor prediction model for the individual risk of T2DM, and to explore new ideas for early warning, prevention and personalized health services for T2DM. Methods: logistic regression techniques were used to screen the risk factors for T2DM and construct the risk prediction model. Results: The logistic regression equation of the risk prediction model for males was logit(P) = BMI × 0.735 + vegetables × (−0.671) + age × 0.838 + diastolic pressure × 0.296 + physical activity × (−2.287) + sleep × (−0.009) + smoking × 0.214; for females it was logit(P) = BMI × 1.979 + vegetables × (−0.292) + age × 1.355 + diastolic pressure × 0.522 + physical activity × (−2.287) + sleep × (−0.010). For males, the area under the ROC curve was 0.83, the sensitivity 0.72 and the specificity 0.86; for females, the area under the ROC curve was 0.84, the sensitivity 0.75 and the specificity 0.90. Conclusion: The model data come from a nested case-control study; the risk prediction model was established using mature logistic regression techniques and shows high predictive sensitivity, specificity and stability.
The efficiency of modified jackknife and ridge type regression estimators: a comparison
Sharad Damodar Gore
2008-09-01
A common problem in multiple regression models is multicollinearity, which produces undesirable effects on the least squares estimator. To circumvent this problem, two well known estimation procedures are often suggested in the literature. They are Generalized Ridge Regression (GRR) estimation suggested by Hoerl and Kennard and Jackknifed Ridge Regression (JRR) estimation suggested by Singh et al. The GRR estimation leads to a reduction in the sampling variance, whereas JRR leads to a reduction in the bias. In this paper, we propose a new estimator, namely the Modified Jackknife Ridge Regression Estimator (MJR). It is based on a criterion that combines the ideas underlying both the GRR and JRR estimators. We have investigated standard properties of this new estimator. From a simulation study, we find that the new estimator often outperforms the LASSO, and it is superior to both GRR and JRR estimators, using the mean squared error criterion. The conditions under which the MJR estimator is better than the other two competing estimators have been investigated.
Exploratory regression analysis: a tool for selecting models and determining predictor importance.
Braun, Michael T; Oswald, Frederick L
2011-06-01
Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1). The program investigates all 2^p − 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits.
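The count of 2^p − 1 submodels is the number of non-empty subsets of p predictors. The sketch below only enumerates those subsets; it is not the Excel program described in the abstract, the predictor names are hypothetical, and fitting a regression to each subset is left out.

```python
from itertools import combinations


def all_submodels(predictors):
    """Yield every non-empty subset of the predictors, smallest subsets first."""
    for size in range(1, len(predictors) + 1):
        for subset in combinations(predictors, size):
            yield subset


# Hypothetical predictor names, for illustration only.
predictors = ["x1", "x2", "x3"]
submodels = list(all_submodels(predictors))

# With p = 3 there are 2**3 - 1 = 7 candidate submodels.
for subset in submodels:
    print(subset)
```

The number of submodels doubles with each added predictor, so exhaustive enumeration is practical only for a modest p.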
Yuliana Yuliana
2010-06-01
Quantitative Electronic Structure Activity Relationship (QSAR) analysis of a series of benzalacetones has been investigated based on semi-empirical PM3 calculation data using Principal Components Regression (PCR). The investigation was based on the antimutagenic activity of benzalacetone compounds (expressed as log 1/IC50), studied as a linear correlation with latent variables (Tx) obtained by transforming atomic net charges using Principal Component Analysis (PCA). The QSAR equation was determined based on the distribution of selected components and then analysed with PCR. The result is described by the following QSAR equation: log 1/IC50 = 6.555 + (2.177)·T1 + (2.284)·T2 + (1.933)·T3. The equation was significant at the 95% level with statistical parameters n = 28, r = 0.766, SE = 0.245, Fcalculation/Ftable = 3.780, and gave a PRESS result of 0.002. This means that there were only relatively few deviations between the experimental and theoretical data of antimutagenic activity. New types of benzalacetone derivative compounds were designed and their theoretical activities were predicted based on the best QSAR equation. It was found that compounds number 29, 30, 31, 32, 33, 35, 36, 37, 38, 40, 41, 42, 44, 47, 48, 49 and 50 have a relatively high antimutagenic activity. Keywords: QSAR; antimutagenic activity; benzalacetone; atomic net charge
Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki
2014-12-01
This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.
Kalkavan, Halime; Sharma, Piyush; Kasper, Stefan; Helfrich, Iris; Pandyra, Aleksandra A.; Gassa, Asmae; Virchow, Isabel; Flatz, Lukas; Brandenburg, Tim; Namineni, Sukumar; Heikenwalder, Mathias; Höchst, Bastian; Knolle, Percy A.; Wollmann, Guido; von Laer, Dorothee; Drexler, Ingo; Rathbun, Jessica; Cannon, Paula M.; Scheu, Stefanie; Bauer, Jens; Chauhan, Jagat; Häussinger, Dieter; Willimsky, Gerald; Löhning, Max; Schadendorf, Dirk; Brandau, Sven; Schuler, Martin; Lang, Philipp A.; Lang, Karl S.
2017-01-01
Immune-mediated effector molecules can limit cancer growth, but lack of sustained immune activation in the tumour microenvironment restricts antitumour immunity. New therapeutic approaches that induce a strong and prolonged immune activation would represent a major immunotherapeutic advance. Here we show that the arenaviruses lymphocytic choriomeningitis virus (LCMV) and the clinically used Junin virus vaccine (Candid#1) preferentially replicate in tumour cells in a variety of murine and human cancer models. Viral replication leads to prolonged local immune activation, rapid regression of localized and metastatic cancers, and long-term disease control. Mechanistically, LCMV induces antitumour immunity, which depends on the recruitment of interferon-producing Ly6C+ monocytes and additionally enhances tumour-specific CD8+ T cells. In comparison with other clinically evaluated oncolytic viruses and to PD-1 blockade, LCMV treatment shows promising antitumoural benefits. In conclusion, therapeutically administered arenavirus replicates in cancer cells and induces tumour regression by enhancing local immune responses. PMID:28248314
Modeling of retardance in ferrofluid with Taguchi-based multiple regression analysis
Lin, Jing-Fung; Wu, Jyh-Shyang; Sheu, Jer-Jia
2015-03-01
The citric acid (CA) coated Fe3O4 ferrofluids are prepared by a co-precipitation method and the magneto-optical retardance property is measured by a Stokes polarimeter. Optimization and multiple regression of retardance in ferrofluids are executed by combining Taguchi method and Excel. From the nine tests for four parameters, including pH of suspension, molar ratio of CA to Fe3O4, volume of CA, and coating temperature, influence sequence and excellent program are found. Multiple regression analysis and F-test on the significance of regression equation are performed. It is found that the model F value is much larger than Fcritical and significance level P <0.0001. So it can be concluded that the regression model has statistically significant predictive ability. Substituting excellent program into equation, retardance is obtained as 32.703°, higher than the highest value in tests by 11.4%.
Barndorff-Nielsen, Ole Eiler; Shephard, N.
2004-01-01
This paper analyses multivariate high frequency financial data using realized covariation. We provide a new asymptotic distribution theory for standard methods such as regression, correlation analysis, and covariance. It will be based on a fixed interval of time (e.g., a day or week), allowing...... the number of high frequency returns during this period to go to infinity. Our analysis allows us to study how high frequency correlations, regressions, and covariances change through time. In particular we provide confidence intervals for each of these quantities....
Analysis of Functional Data with Focus on Multinomial Regression and Multilevel Data
Mousavi, Seyed Nourollah
Functional data analysis (FDA) is a fast growing area in statistical research with increasingly diverse range of application from economics, medicine, agriculture, chemometrics, etc. Functional regression is an area of FDA which has received the most attention both in aspects of application...... and methodological development. Our main...
Factor analysis and multiple regression between topography and precipitation on Jeju Island, Korea
Um, Myoung-Jin; Yun, Hyeseon; Jeong, Chang-Sam; Heo, Jun-Haeng
2011-11-01
Summary: In this study, new factors that influence precipitation were extracted from geographic variables using factor analysis, which allow for an accurate estimation of orographic precipitation. Correlation analysis was also used to examine the relationship between nine topographic variables from digital elevation models (DEMs) and the precipitation in Jeju Island. In addition, a spatial analysis was performed in order to verify the validity of the regression model. From the results of the correlation analysis, it was found that all of the topographic variables had a positive correlation with the precipitation. The relations between the variables also changed in accordance with a change in the precipitation duration. However, upon examining the correlation matrix, no significant relationship between the latitude and the aspect was found. According to the factor analysis, eight topographic variables (latitude being the exception) were found to have a direct influence on the precipitation. Three factors were then extracted from the eight topographic variables. By directly comparing the multiple regression model with the factors (model 1) to the multiple regression model with the topographic variables (model 3), it was found that model 1 did not violate the limits of statistical significance and multicollinearity. As such, model 1 was considered to be appropriate for estimating the precipitation when taking into account the topography. In the study of model 1, the multiple regression model using factor analysis was found to be the best method for estimating the orographic precipitation on Jeju Island.
Analysing count data of Butterflies communities in Jasin, Melaka: A Poisson regression analysis
Afiqah Muhamad Jamil, Siti; Asrul Affendi Abdullah, M.; Kek, Sie Long; Nor, Maria Elena; Mohamed, Maryati; Ismail, Norradihah
2017-09-01
Count outcomes are typically highly right-skewed because they often contain a large number of zeros. The butterfly community data were collected in Jasin, Melaka, and consist of 131 subject visits. This paper analyses these count data, including the zero observations, using Poisson regression, which is assumed to be better suited to a counting process. Software for Poisson regression is readily available and increasingly widely used in many fields of research; here the data were analysed using SAS. The purpose of the analysis comprised the framework of identifying the concerns. In addition, the Poisson regression analysis is used to assess how well the data fit, and hence how reliable the count data are to use. The findings indicate that the highest and lowest numbers of subjects come from the third family (Nymphalidae) and the fifth family (Hesperiidae), respectively, and that the Poisson distribution seems to fit the zero values.
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
Baghi, Q; Bergé, J; Christophe, B; Touboul, P; Rodrigues, M
2015-01-01
The analysis of physical measurements often copes with highly correlated noises and interruptions caused by outliers, saturation events or transmission losses. We assess the impact of missing data on the performance of linear regression analysis involving the fit of modeled or measured time series. We show that data gaps can significantly alter the precision of the regression parameter estimation in the presence of colored noise, due to the frequency leakage of the noise power. We present a regression method which cancels this effect and estimates the parameters of interest with a precision comparable to the complete data case, even if the noise power spectral density (PSD) is not known a priori. The method is based on an autoregressive (AR) fit of the noise, which allows us to build an approximate generalized least squares estimator approaching the minimal variance bound. The method, which can be applied to any similar data processing, is tested on simulated measurements of the MICROSCOPE space mission, whos...
Spline Nonparametric Regression Analysis of Stress-Strain Curve of Confined Concrete
Tavio Tavio
2008-01-01
Due to enormous uncertainties in confinement models associated with the maximum compressive strength and ductility of concrete confined by rectilinear ties, the implementation of spline nonparametric regression analysis is proposed herein as an alternative approach. The statistical evaluation is carried out based on 128 large-scale column specimens of either normal- or high-strength concrete tested under uniaxial compression. The main advantage of this kind of analysis is that it can be applied when the trend of the relation between predictor and response variables is not obvious. The error in the analysis can, therefore, be minimized so that it does not depend on the assumption of a particular shape of the curve. This provides higher flexibility in the application. The results of the statistical analysis indicate that the stress-strain curves of confined concrete obtained from the spline nonparametric regression analysis are in good agreement with the experimental curves available in the literature.
C. Makendran
2015-01-01
Prediction models for low volume village roads in India are developed to evaluate the progression of different types of distress such as roughness, cracking, and potholes. Even though the Government of India invests large sums in road construction every year, poor control over the quality of road construction and its subsequent maintenance is leading to faster road deterioration. In this regard, it is essential that scientific maintenance procedures be developed on the basis of the performance of low volume flexible pavements. Considering the above, an attempt has been made in this research to develop prediction models to understand the progression of roughness, cracking, and potholes in flexible pavements exposed to little or no routine maintenance. Distress data were collected from low volume rural roads covering about 173 stretches spread across Tamil Nadu state in India. Based on the collected data, distress prediction models have been developed using multiple linear regression analysis. Further, the models have been validated using independent field data. It can be concluded that the models developed in this study can serve as useful tools for practicing engineers maintaining flexible pavements on low volume roads.
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis
Maarten van Smeden
2016-11-01
Background: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. Methods: The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. Results: The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. Conclusions: The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
Family background variables as instruments for education in income regressions: A Bayesian analysis
L.F. Hoogerheide (Lennart); J.H. Block (Jörn); A.R. Thurik (Roy)
2012-01-01
The validity of family background variables instrumenting education in income regressions has been much criticized. In this paper, we use data from the 2004 German Socio-Economic Panel and Bayesian analysis to analyze to what degree violations of the strict validity assumption affect the
A systematic review and meta-regression analysis of mivacurium for tracheal intubation
Vanlinthout, L.E.H.; Mesfin, S.H.; Hens, N.; Vanacker, B.F.; Robertson, E.N.; Booij, L.H.D.J.
2014-01-01
We systematically reviewed factors associated with intubation conditions in randomised controlled trials of mivacurium, using random-effects meta-regression analysis. We included 29 studies of 1050 healthy participants. Four factors explained 72.9% of the variation in the probability of excellent in
Schulz, Wolfram
Differences in student knowledge about democracy, institutions, and citizenship and students' skills in interpreting political communication were studied through multilevel regression analysis of results from the second International Education Association (IEA) Study. This study provides data on 14-year-old students from 28 countries in Europe,…
Isolating the Effects of Training Using Simple Regression Analysis: An Example of the Procedure.
Waugh, C. Keith
This paper provides a case example of simple regression analysis, a forecasting procedure used to isolate the effects of training from an identified extraneous variable. This case example focuses on results of a three-day sales training program to improve bank loan officers' knowledge, skill-level, and attitude regarding solicitation and sale of…
Multiple Logistic Regression Analysis of Cigarette Use among High School Students
Adwere-Boamah, Joseph
2011-01-01
A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…
Bates, Reid A.; Holton, Elwood F., III; Burnett, Michael F.
1999-01-01
A case study of learning transfer demonstrates the possible effect of influential observation on linear regression analysis. A diagnostic method that tests for violation of assumptions, multicollinearity, and individual and multiple influential observations helps determine which observation to delete to eliminate bias. (SK)
Declining Bias and Gender Wage Discrimination? A Meta-Regression Analysis
Jarrell, Stephen B.; Stanley, T. D.
2004-01-01
The meta-regression analysis reveals that there is a strong tendency for discrimination estimates to fall, and that wage discrimination against women exists. The biasing effect of researchers' gender of not correcting for selection bias has weakened and changes in the labor market have made it less important.
2011-01-01
Atherosclerotic vascular disease, diabetes mellitus (DM) and dementia are major global health problems. Both endogenous and exogenous factors activate genes functioning in biological processes. This review article focuses on gene-activation mechanisms that regress atherosclerosis, eliminate DM type 2 (DM2), and prevent cognitive decline and dementia. Gene-activating compounds upregulating functions of liver endoplasmic reticulum (ER) and affecting lipid and protein metabolism, increase ER siz...
Diabetes mortality in Serbia, 1991-2015 (a nationwide study): A joinpoint regression analysis.
Ilic, Milena; Ilic, Irena
2017-02-01
The aim of this study was to analyze the mortality trends of diabetes mellitus in Serbia (excluding the Autonomous Province of Kosovo and Metohia). A population-based cross-sectional study analyzing diabetes mortality in Serbia in the period 1991-2015 was carried out based on official data. The age-standardized mortality rates (per 100,000) were calculated by direct standardization, using the European Standard Population. Average annual percentage of change (AAPC) and the corresponding 95% confidence interval (CI) were computed using the joinpoint regression analysis. More than 63,000 (about 27,000 of men and 36,000 of women) diabetes deaths occurred in Serbia from 1991 to 2015. Death rates from diabetes were almost equal in men and in women (about 24.0 per 100,000) and place Serbia among the countries with the highest diabetes mortality rates in Europe. Since 1991, mortality from diabetes in men significantly increased by +1.2% per year (95% CI 0.7-1.7), but non-significantly increased in women by +0.2% per year (95% CI -0.4 to 0.7). Increased trends in diabetes type 1 mortality rates were significant in both genders in Serbia. Trends in mortality for diabetes type 2 showed a significant decrease in both genders since 2010. Given that diabetes mortality trends showed different patterns during the studied period, our results imply that further observation of trend is needed. Copyright © 2016 Primary Care Diabetes Europe. Published by Elsevier Ltd. All rights reserved.
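Direct standardization, as mentioned in the abstract, weights each age-specific death rate by the share of that age group in a standard population. The sketch below shows the arithmetic only; the age groups, death counts, populations and weights are invented for illustration and are not the study's data or the actual European Standard Population weights.

```python
def age_standardized_rate(deaths, population, standard_weights, per=100_000):
    """Directly standardized rate: weighted mean of age-specific rates."""
    if not (len(deaths) == len(population) == len(standard_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(standard_weights)
    weighted_sum = sum(
        weight * (d / n)  # age-specific rate times its standard weight
        for d, n, weight in zip(deaths, population, standard_weights)
    )
    return per * weighted_sum / total_weight


# Hypothetical data for three age groups (illustration only).
deaths = [2, 10, 40]
population = [100_000, 80_000, 50_000]
standard_weights = [50, 30, 20]  # made-up standard population shares

rate = age_standardized_rate(deaths, population, standard_weights)
```

Using the same standard weights for every year and country is what makes the resulting rates comparable across populations with different age structures.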
The study of correlative factors to resistin and its regression equation in type 2 diabetes
None listed
2008-01-01
Objective: To study the changes of resistin and the relationship between resistin and other indexes in patients with type 2 diabetes mellitus. Methods: Seventy patients with type 2 diabetes were chosen and divided into three groups according to weight index, and 15 healthy persons were chosen as controls. ELISA was adopted to determine resistin concentration. Oxydase method was adopted to measure blood sugar. Radioimmunoassay was used to measure insulin level. Results: Resistin concentration of patient groups [(23...
Analysis of designed experiments by stabilised PLS Regression and jack-knifing
Martens, Harald; Høy, M.; Westad, F.
2001-01-01
Pragmatical, visually oriented methods for assessing and optimising bi-linear regression models are described, and applied to PLS Regression (PLSR) analysis of multi-response data from controlled experiments. The paper outlines some ways to stabilise the PLSR method to extend its range...... the reliability of the linear and bi-linear model parameter estimates. The paper illustrates how the obtained PLSR "significance" probabilities are similar to those from conventional factorial ANOVA, but the PLSR is shown to give important additional overview plots of the main relevant structures in the multi...
Automatic regression analysis for use in a complex system of evaluation of plant genetic resources
Cs. ARKOSSY
1984-08-01
In accordance with the general requirements regarding computerization in gene banks and germplasm research, a computer program has been compiled for the analysis of univariate response in crop germplasm evaluation. The program is written in COBOL and run on a FELIX C-256 computer. The different modules of the program allow for: (1) data control and error listing; (2) computation of the regression function; (3) listing of the difference between the values measured and computed; (4) sorting of the individual samples; (5) construction of scattergrams in two dimensions for measured values with the simultaneous representation of the regression line; (6) listing of examined samples in a sequence required in evaluation.
Methods and applications of linear models regression and the analysis of variance
Hocking, Ronald R
2013-01-01
Praise for the Second Edition: "An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book
Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H
2017-05-10
We described the time trend of the acute myocardial infarction (AMI) incidence rate in Tianjin from 1999 to 2013 with the Cochran-Armitage trend (CAT) test and linear regression analysis, and the results were compared. Based on the actual population, the CAT test had much stronger statistical power than linear regression analysis for both the overall incidence trend and the age-specific incidence trend (Cochran-Armitage trend P value < linear regression P value). The statistical power of the CAT test decreased, while the result of linear regression analysis remained the same, when the population size was reduced by 100 times and the AMI incidence rate remained unchanged. The two statistical methods have their advantages and disadvantages. It is necessary to choose the statistical method according to the fitting degree of the data, or to comprehensively analyze the results of the two methods.
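The dependence of the CAT test on population size can be seen from its test statistic, which uses the event counts and denominators directly rather than the rates alone. The sketch below implements the standard Cochran-Armitage Z statistic with equally spaced scores; the yearly counts are invented for illustration and are not the Tianjin data.

```python
import math


def cochran_armitage_z(events, totals, scores=None):
    """Cochran-Armitage trend statistic Z for binomial counts in ordered groups."""
    if scores is None:
        scores = list(range(len(events)))  # equally spaced scores 0, 1, 2, ...
    n_total = sum(totals)
    p_bar = sum(events) / n_total  # pooled event proportion
    t_stat = sum(t * (r - n * p_bar) for t, r, n in zip(scores, events, totals))
    s1 = sum(n * t for n, t in zip(totals, scores))
    s2 = sum(n * t * t for n, t in zip(totals, scores))
    variance = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n_total)
    return t_stat / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical counts: same rates, population 100 times smaller in the second case.
z_full = cochran_armitage_z([1000, 1200, 1400], [1_000_000] * 3)
z_small = cochran_armitage_z([10, 12, 14], [10_000] * 3)
```

With the rates held fixed, shrinking every count by a factor of 100 shrinks Z by a factor of 10, which is consistent with the loss of power reported in the abstract; a linear regression fitted to the yearly rates alone would not change.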
Greensmith, David J
2014-01-01
Here I present an Excel based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of change of Ca can be measured using multiple, simultaneous regression analysis. I demonstrate that this program performs equally well as commercially available software, but has numerous advantages, namely creating a simplified, self-contained analysis workflow.
Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek
2016-04-01
The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. The protection of conservation interests at the same time as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail which were subjected to changes in type or level of use or subjected to extreme weather events; (4) monitoring of dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of
Cirulli, N; Ballini, A; Cantore, S; Farronato, D; Inchingolo, F; Dipalma, G; Gatto, M R; Alessandri Bonetti, G
2015-01-01
Mixed dentition analysis forms a critical aspect of early orthodontic treatment. In fact, an accurate space analysis is one of the important criteria in determining whether the treatment plan may involve serial extraction, guidance of eruption, space maintenance, space regaining or just periodic observation of the patients. The aim of the present study was to calculate linear regression equations for mixed dentition space analysis by measuring mesiodistal tooth widths on 230 dental casts obtained from southern Italian patients (118 females, 112 males, mean age 15±3 years). Student's t-test or the Wilcoxon test for independent and paired samples was used to determine right/left side and male/female differences. Using the sum of the mesiodistal diameters of the 4 mandibular incisors as the predictor for the sum of the widths of the canines and premolars in the mandibular mixed dentition, a new linear regression equation was found: y = 0.613x + 7.294 (r = 0.701) for both genders in a southern Italian population. To better estimate the size of leeway space, a new regression equation was found to calculate the mesiodistal size of the second premolar using the sum of the four mandibular incisors, canine and first premolar as a predictor. The equation is y = 0.241x + 1.224 (r = 0.732). In conclusion, new regression equations were derived for a southern Italian population.
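As a small illustration of how the two reported equations would be applied, the sketch below simply evaluates them. The function names and the input widths are hypothetical and chosen only for the example; the coefficients are the ones quoted in the abstract, and all widths are assumed to be in millimetres.

```python
def predict_canine_premolar_sum(incisor_sum_mm: float) -> float:
    """Reported equation y = 0.613x + 7.294.

    x: sum of the mesiodistal widths of the four mandibular incisors (mm).
    y: predicted sum of canine and premolar widths (mm).
    """
    return 0.613 * incisor_sum_mm + 7.294


def predict_second_premolar(predictor_sum_mm: float) -> float:
    """Reported equation y = 0.241x + 1.224.

    x: sum of the four mandibular incisors, canine and first premolar (mm).
    y: predicted mesiodistal width of the second premolar (mm).
    """
    return 0.241 * predictor_sum_mm + 1.224


# Hypothetical measurements, for illustration only.
canine_premolar_sum = predict_canine_premolar_sum(23.0)
second_premolar = predict_second_premolar(37.0)
print(round(canine_premolar_sum, 3), round(second_premolar, 3))
```

With correlations of about 0.70 to 0.73, such predictions carry substantial individual error, so they are better read as estimates than as exact values.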
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
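To make the idea of fitting "under different quantiles" concrete, here is a minimal sketch of the pinball (check) loss that quantile regression minimizes. It is not the authors' model: the data are made up, and instead of a regression on eight satisfaction items it fits only a constant, which is enough to show that the minimizer is the q-th quantile rather than the mean.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (check) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(y, q, candidates):
    """Return the candidate constant prediction with the smallest pinball loss."""
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), q))


# Made-up responses with one extreme value, for illustration only.
y = [1, 2, 3, 4, 100]
median_fit = best_constant(y, 0.5, y)    # picks the median, not the mean (22)
lower_fit = best_constant(y, 0.25, y)    # picks a lower-quartile value
print(median_fit, lower_fit)
```

A full quantile regression replaces the constant with a linear function of the predictors and minimizes the same loss, which is why models at Q0.25, Q0.5 and Q0.75 can have different coefficients and different predictive accuracy.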
Regression Analysis of Top of Descent Location for Idle-thrust Descents
Stell, Laurel; Bronsvoort, Jesper; McDonald, Greg
2013-01-01
In this paper, multiple regression analysis is used to model the top of descent (TOD) location of user-preferred descent trajectories computed by the flight management system (FMS) on over 1000 commercial flights into Melbourne, Australia. The independent variables cruise altitude, final altitude, cruise Mach, descent speed, wind, and engine type were also recorded or computed post-operations. Both first-order and second-order models are considered, where cross-validation, hypothesis testing, and additional analysis are used to compare models. This identifies the models that should give the smallest errors if used to predict TOD location for new data in the future. A model that is linear in TOD altitude, final altitude, descent speed, and wind gives an estimated standard deviation of 3.9 nmi for TOD location given the trajectory parameters, which means about 80% of predictions would have error less than 5 nmi in absolute value. This accuracy is better than demonstrated by other ground automation predictions using kinetic models. Furthermore, this approach would enable online learning of the model. Additional data or further knowledge of algorithms is necessary to conclude definitively that no second-order terms are appropriate. Possible applications of the linear model are described, including enabling arriving aircraft to fly optimized descents computed by the FMS even in congested airspace. In particular, a model for TOD location that is linear in the independent variables would enable decision support tool human-machine interfaces for which a kinetic approach would be computationally too slow.
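The step from a 3.9 nmi standard deviation to "about 80% of predictions within 5 nmi" can be checked with a short calculation. The sketch below assumes zero-mean, normally distributed prediction errors, which the abstract does not state explicitly; the function name is hypothetical.

```python
import math


def prob_abs_error_below(threshold: float, sigma: float) -> float:
    """P(|e| < threshold) for e ~ Normal(0, sigma^2) (assumed error model)."""
    return math.erf(threshold / (sigma * math.sqrt(2.0)))


# Standard deviation and threshold quoted in the abstract, both in nmi.
coverage = prob_abs_error_below(5.0, 3.9)
print(f"{coverage:.3f}")
```

Under this assumption, 5 nmi corresponds to about 1.28 standard deviations, which is the familiar two-sided 80% point of the normal distribution.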
Pineda, Silvia; Real, Francisco X; Kogevinas, Manolis; Carrato, Alfredo; Chanock, Stephen J; Malats, Núria; Van Steen, Kristel
2015-12-01
Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise such as data heterogeneity, the smaller number of individuals in comparison to the number of parameters, multicollinearity, and interpretation and validation of results due to their complexity and lack of knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct for multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and ENET) when exploring relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected for each gene probe within a 1 Mb window upstream and downstream of the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CpG, and Global models, the latter integrating SNPs and CpGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CpGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA) and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose allows reducing computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease
Ussery, David; Bohlin, Jon; Skjerve, Eystein
2009-01-01
Recently there has been an explosion in the availability of bacterial genomic sequences, making possible now an analysis of genomic signatures across more than 800 different bacterial chromosomes, from a wide variety of environments. Using genomic signatures, we pair-wise compared 867...... different genomic DNA sequences, taken from chromosomes and plasmids more than 100,000 base-pairs in length. Hierarchical clustering was performed on the outcome of the comparisons before a multinomial regression model was fitted. The regression model included the cluster groups as the response variable...... AT content. Small improvements to the regression model, although significant, were also obtained by factors such as sequence size, habitat, growth temperature, selective pressure measured as oligonucleotide usage variance, and oxygen requirement. The statistics obtained using hierarchical clustering...
Bayesian Method of Moments (BMOM) Analysis of Mean and Regression Models
Zellner, Arnold
2008-01-01
A Bayesian method of moments/instrumental variable (BMOM/IV) approach is developed and applied in the analysis of the important mean and multiple regression models. Given a single set of data, it is shown how to obtain posterior and predictive moments without the use of likelihood functions, prior densities and Bayes' Theorem. The posterior and predictive moments, based on a few relatively weak assumptions, are then used to obtain maximum entropy densities for parameters, realized error terms and future values of variables. Posterior means for parameters and realized error terms are shown to be equal to certain well known estimates and rationalized in terms of quadratic loss functions. Conditional maxent posterior densities for means and regression coefficients given scale parameters are in the normal form while scale parameters' maxent densities are in the exponential form. Marginal densities for individual regression coefficients, realized error terms and future values are in the Laplace or double-exponenti...
Groping Toward Linear Regression Analysis: Newton's Analysis of Hipparchus' Equinox Observations
Belenkiy, Ari
2008-01-01
In 1700, Newton, in designing a new universal calendar contained in the manuscripts known as Yahuda MS 24 from Jewish National and University Library at Jerusalem and analyzed in our recent article in Notes & Records Royal Society (59 (3), Sept 2005, pp. 223-54), attempted to compute the length of the tropical year using the ancient equinox observations reported by a famous Greek astronomer Hipparchus of Rhodes, ten in number. Though Newton had a very thin sample of data, he obtained a tropical year only a few seconds longer than the correct length. The reason lies in Newton's application of a technique similar to modern regression analysis. Actually he wrote down the first of the two so-called "normal equations" known from the Ordinary Least Squares method. Newton also had a vague understanding of qualitative variables. This paper concludes by discussing open historico-astronomical problems related to the inclination of the Earth's axis of rotation. In particular, ignorance about the long-range variation...
Laura K. Frank
2015-07-01
Reduced rank regression (RRR) is an innovative technique to establish dietary patterns related to biochemical risk factors for type 2 diabetes, but has not been applied in sub-Saharan Africa. In a hospital-based case-control study for type 2 diabetes in Kumasi (diabetes cases, 538; controls, 668) dietary intake was assessed by a specific food frequency questionnaire. After random split of our study population, we derived a dietary pattern in the training set using RRR with adiponectin, HDL-cholesterol and triglycerides as responses and 35 food items as predictors. This pattern score was applied to the validation set, and its association with type 2 diabetes was examined by logistic regression. The dietary pattern was characterized by a high consumption of plantain, cassava, and garden egg, and a low intake of rice, juice, vegetable oil, eggs, chocolate drink, sweets, and red meat; the score correlated positively with serum triglycerides and negatively with adiponectin. The multivariate-adjusted odds ratio of type 2 diabetes for the highest quintile compared to the lowest was 4.43 (95% confidence interval: 1.87–10.50, p for trend < 0.001). The identified dietary pattern increases the odds of type 2 diabetes in urban Ghanaians, which is mainly attributed to increased serum triglycerides.
Frank, Laura K; Jannasch, Franziska; Kröger, Janine; Bedu-Addo, George; Mockenhaupt, Frank P; Schulze, Matthias B; Danquah, Ina
2015-07-07
Reduced rank regression (RRR) is an innovative technique to establish dietary patterns related to biochemical risk factors for type 2 diabetes, but has not been applied in sub-Saharan Africa. In a hospital-based case-control study for type 2 diabetes in Kumasi (diabetes cases, 538; controls, 668) dietary intake was assessed by a specific food frequency questionnaire. After random split of our study population, we derived a dietary pattern in the training set using RRR with adiponectin, HDL-cholesterol and triglycerides as responses and 35 food items as predictors. This pattern score was applied to the validation set, and its association with type 2 diabetes was examined by logistic regression. The dietary pattern was characterized by a high consumption of plantain, cassava, and garden egg, and a low intake of rice, juice, vegetable oil, eggs, chocolate drink, sweets, and red meat; the score correlated positively with serum triglycerides and negatively with adiponectin. The multivariate-adjusted odds ratio of type 2 diabetes for the highest quintile compared to the lowest was 4.43 (95% confidence interval: 1.87-10.50, p for trend < 0.001). The identified dietary pattern increases the odds of type 2 diabetes in urban Ghanaians, which is mainly attributed to increased serum triglycerides.
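The reported odds ratio and its confidence interval can be read on the log scale, where logistic regression estimates live. The sketch below assumes a Wald-type 95% interval that is symmetric in log odds (a common default, but not confirmed by the abstract); under that assumption the geometric mean of the bounds recovers the point estimate and the interval width gives the standard error of the log odds ratio. The function name is hypothetical.

```python
import math


def log_scale_summary(lower: float, upper: float, z: float = 1.96):
    """Recover the point estimate and SE(log OR) from a Wald-type interval.

    Assumes the interval is exp(log(OR) +/- z * SE), i.e. symmetric on the
    log-odds scale.
    """
    center = math.sqrt(lower * upper)                      # geometric mean of the bounds
    se_log = (math.log(upper) - math.log(lower)) / (2 * z)
    return center, se_log


# Bounds quoted in the abstract for the highest vs. lowest quintile.
center, se_log = log_scale_summary(1.87, 10.50)
print(round(center, 2), round(se_log, 2))
```

The recovered center agrees with the reported odds ratio of 4.43, which is consistent with the interval having been built on the log scale.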
Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem
2017-07-01
All of the quantitative landslide susceptibility mapping (QLSM) methods require two basic data types, namely, landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on the type of landslides, nature of triggers and LIF, the accuracy of the QLSM methods differs. Moreover, how to balance the number of 0 (nonoccurrence) and 1 (occurrence) in the training set obtained from the landslide inventory and how to select which of the 1's and 0's to include in QLSM models play a critical role in the accuracy of the QLSM. Although the performance of various QLSM methods is largely investigated in the literature, the challenge of training set construction is not adequately investigated for the QLSM methods. In order to tackle this challenge, in this study three different training set selection strategies along with the original data set are used for testing the performance of three different regression methods, namely Logistic Regression (LR), Bayesian Logistic Regression (BLR) and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, namely non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1's and their surrounding neighborhood. A randomly selected group of landslide sites and their neighborhood are considered in the analyses, with parameters similar to those of NNS. It is found that the LR-PRS, FLR-PRS and BLR-Whole Data set-ups, in that order, yield the best fits among the alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for the model's performance.
Replica analysis of overfitting in regression models for time-to-event data
Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.
2017-09-01
Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.
FRICTION MODELING OF Al-Mg ALLOY SHEETS BASED ON MULTIPLE REGRESSION ANALYSIS AND NEURAL NETWORKS
Hirpa G. Lemu
2017-03-01
This article reports a proposed approach to describing frictional resistance in sheet metal forming processes that enables determination of the friction coefficient value under a wide range of friction conditions without performing time-consuming experiments. The motivation for this proposal is the fact that a considerable number of factors affect the friction coefficient value, and as a result building an analytical friction model for specified process conditions is practically impossible. In this proposed approach, a mathematical model of friction behaviour is created using multiple regression analysis and artificial neural networks. The regression analysis was performed using a subroutine in MATLAB programming code and STATISTICA Neural Networks was utilized to build an artificial neural networks model. The effect of different training strategies on the quality of neural networks was studied. The results of strip drawing friction tests were used as input variables for the regression model and for training radial basis function networks, generalized regression neural networks and multilayer networks. Four kinds of Al-Mg alloy sheets were used as a test material.
Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.
Hu, Yi-Chung
2014-01-01
On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.
Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms
2014-01-01
On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets. PMID:25110755
Zuccoli, G.; Ferrozzi, F.; Bassi, P. [Department of Radiology, University of Parma, V. Gramsci, 14, I-43100 Parma (Italy); Sigorini, M.; Virdis, R. [Department of Paediatrics, University of Parma, V. Gramsci, 14, I-43100 Parma (Italy); Bellomi, M. [Division of Radiology, European Institute of Oncology, Milan (Italy)
2000-07-01
A patient with neurofibromatosis type 1 was found to have an enhancing mass in the hypothalamus and in the anterior optic pathway. A 3-month MR study showed a reduction in the size and enhancement of the mass. At a 9-month MR follow-up the mass disappeared and ceased to enhance. This report shows the unusual behaviour of a hypothalamic/chiasmatic mass confirming that in such asymptomatic cases the conservative management can be considered the treatment of choice. (orig.)
Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data.
Ying, Gui-Shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard
2017-04-01
To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field in the elderly. When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI -0.03 to 0.32D, p = 0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, p = 0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller p-values, while analysis of the worse eye provided larger p-values than mixed effects models and marginal models. In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision.
Tejera-Vaquerizo, A; Martín-Cuevas, P; Gallego, E; Herrera-Acosta, E; Traves, V; Herrera-Ceballos, E; Nagore, E
2015-04-01
The main aim of this study was to identify predictors of sentinel lymph node (SN) metastasis in cutaneous melanoma. This was a retrospective cohort study of 818 patients in 2 tertiary-level hospitals. The primary outcome variable was SN involvement. Independent predictors were identified using multiple logistic regression and a classification and regression tree (CART) analysis. Ulceration, tumor thickness, and a high mitotic rate (≥6 mitoses/mm²) were independently associated with SN metastasis in the multiple regression analysis. The most important predictor in the CART analysis was Breslow thickness. Absence of an inflammatory infiltrate, patient age, and tumor location were predictive of SN metastasis in patients with tumors thicker than 2 mm. In the case of thinner melanomas, the predictors were mitotic rate (>6 mitoses/mm²), presence of ulceration, and tumor thickness. Patient age, mitotic rate, and tumor thickness and location were predictive of survival. A high mitotic rate predicts a higher risk of SN involvement and worse survival. CART analysis improves the prediction of regional metastasis, resulting in better clinical management of melanoma patients. It may also help select suitable candidates for inclusion in clinical trials. Copyright © 2014 Elsevier España, S.L.U. and AEDV. All rights reserved.
Akkaya, Ali Volkan [Department of Mechanical Engineering, Yildiz Technical University, 34349 Besiktas, Istanbul (Turkey)
2009-02-15
In this paper, multiple nonlinear regression models for estimating the higher heating value (HHV) of coals are developed using proximate analysis data obtained generally from low rank coal samples on an as-received basis. In this modeling study, three main model structures, which depend on the number of proximate analysis parameters used as independent variables (moisture, ash, volatile matter and fixed carbon), are first categorized. Second, sub-model structures with different arrangements of the independent variables are considered. Each sub-model structure is analyzed with a number of model equations in order to find the best fitting model using the multiple nonlinear regression method. Based on the results of the nonlinear regression analysis, the best model for each sub-structure is determined. Among them, the models giving the highest correlation for the three main structures are selected. Although all three selected models predict HHV rather accurately, the model involving four independent variables provides the most accurate estimation of HHV. Additionally, when the chosen model with four independent variables and a literature model are tested with extra proximate analysis data, the model developed in this study gives a more accurate prediction of the HHV of coals. It can be concluded that the developed model is an effective tool for HHV estimation of low rank coals. (author)
Analysis of the Evolution of the Gross Domestic Product by Means of Cyclic Regressions
Catalin Angelo Ioan
2011-08-01
In this article, we carry out an analysis of the regularity of the Gross Domestic Product of a country, in our case the United States. The analysis is based on a new method: cyclic regressions based on the Fourier series of a function. Another point of view is to consider, instead of the growth rate of GDP, the speed of variation of this rate, computed as a numerical derivative. The obtained results show a cycle of 71 years for this indicator, the mean square error being 0.93%. The method described allows a prognosis of short-term trends in GDP.
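The abstract does not spell out the cyclic regression procedure, so the following is only a generic sketch of the underlying idea: regressing a series on the first harmonic of a Fourier series with a given period. It assumes evenly spaced observations covering a whole number of cycles, in which case the least-squares coefficients reduce to the simple projections used here. The function name, the period and the synthetic series are all hypothetical.

```python
import math


def fit_single_cycle(values, period):
    """Fit y_t ~ mean + a*cos(2*pi*t/period) + b*sin(2*pi*t/period).

    Valid as a least-squares fit when t = 0..n-1 is evenly spaced and n is a
    whole multiple of the period (the cosine and sine regressors are then
    orthogonal to each other and to the constant).
    """
    n = len(values)
    mean = sum(values) / n
    a = 2.0 / n * sum(v * math.cos(2 * math.pi * t / period) for t, v in enumerate(values))
    b = 2.0 / n * sum(v * math.sin(2 * math.pi * t / period) for t, v in enumerate(values))
    return mean, a, b


# Synthetic series with a known cycle (two full cycles of length 8).
period = 8
series = [
    2.0 + 3.0 * math.cos(2 * math.pi * t / period) + 0.5 * math.sin(2 * math.pi * t / period)
    for t in range(16)
]
mean, a, b = fit_single_cycle(series, period)
print(round(mean, 6), round(a, 6), round(b, 6))
```

For real data that do not span whole cycles, the same model would be fitted by ordinary least squares, and candidate periods could be compared by their mean square error.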
Multiple Regression Analysis of Unconfined Compression Strength of Mine Tailings Matrices
Mahmood Ali A.
2017-01-01
As part of a novel approach of sustainable development of mine tailings, experimental and numerical analysis is carried out on newly formulated tailings matrices. Several physical characteristic tests are carried out including the unconfined compression strength test to ascertain the integrity of these matrices when subjected to loading. The current paper attempts a multiple regression analysis of the unconfined compressive strength test results of these matrices to investigate the most pertinent factors affecting their strength. Results of this analysis showed that the suggested equation is reasonably applicable to the range of binder combinations used.
Milena Ilic
BACKGROUND: Limited data on mortality from malignant lymphatic and hematopoietic neoplasms have been published for Serbia. METHODS: The study covered the population of Serbia during the 1991-2010 period. Mortality trends were assessed using joinpoint regression analysis. RESULTS: The trend for overall death rates from malignant lymphoid and haematopoietic neoplasms significantly decreased by -2.16% per year from 1991 through 1998, and then significantly increased by +2.20% per year for the 1998-2010 period. The growth during the entire period was on average +0.8% per year (95% CI 0.3 to 1.3). Mortality was higher among males than among females in all age groups. According to the comparability test, mortality trends from malignant lymphoid and haematopoietic neoplasms in men and women were parallel (final selected model failed to reject parallelism, P = 0.232). Among the younger Serbian population (0-44 years old) in both sexes, trends significantly declined in males for the entire period, while in females 15-44 years of age mortality rates significantly declined only from 2003 onwards. The mortality trend significantly increased in the elderly in both genders (by +1.7% in males and +1.5% in females in the 60-69 age group, and +3.8% in males and +3.6% in females in the 70+ age group). According to the comparability test, the mortality trend for Hodgkin's lymphoma differed significantly from mortality trends for all other types of malignant lymphoid and haematopoietic neoplasms (P<0.05). CONCLUSION: The unfavourable mortality trend in Serbia requires targeted intervention for risk factor control, early diagnosis and modern therapy.
Hüseyin BUDAK
2012-11-01
Credit scoring is a vital topic for banks, since limited financial resources need to be used more effectively. Banks use several credit scoring methods. One of them is to estimate whether a credit-demanding customer’s repayment order will be regular or not. In this study, artificial neural networks and logistic regression analysis were used to support banks’ credit risk prediction and to estimate whether a credit-demanding customer’s repayment order will be regular or not. The results of the study showed that the artificial neural network method is more reliable than logistic regression analysis for estimating a credit-demanding customer’s repayment order.
Forecasting municipal solid waste generation using prognostic tools and regression analysis.
Ghinea, Cristina; Drăgoi, Elena Niculina; Comăniţă, Elena-Diana; Gavrilescu, Marius; Câmpean, Teofil; Curteanu, Silvia; Gavrilescu, Maria
2016-11-01
For adequate planning of waste management systems, an accurate forecast of waste generation is an essential step, since various factors can affect waste trends. Predictive and prognosis models are useful tools that provide reliable support for decision-making processes. In this paper, indicators such as number of residents, population age, urban life expectancy, and total municipal solid waste were used as input variables in prognostic models in order to predict the amount of solid waste fractions. We applied Waste Prognostic Tool, regression analysis and time series analysis to forecast municipal solid waste generation and composition by considering the Iasi, Romania case study. Regression equations were determined for six solid waste fractions (paper, plastic, metal, glass, biodegradable and other waste). Accuracy measures were calculated and the results showed that the S-curve trend model is the most suitable for municipal solid waste (MSW) prediction.
Regression analysis of growth responses to water depth in three wetland plant species
Sorrell, Brian K; Tanner, Chris C; Brix, Hans
2012-01-01
) differing in depth preferences in wetlands, using non-linear and quantile regression analyses to establish how flooding tolerance can explain field zonation. Methodology Plants were established for 8 months in outdoor cultures in waterlogged soil without standing water, and then randomly allocated to water...... depths from 0 – 0.5 m. Morphological and growth responses to depth were followed for 54 days before harvest, and then analysed by repeated measures analysis of covariance, and non-linear and quantile regression analysis (QRA), to compare flooding tolerances. Principal results Growth responses to depth...... differed between the three species, and were non-linear. P. tenax growth rapidly decreased in standing water > 0.25 m depth, C. secta growth increased initially with depth but then decreased at depths > 0.30 m, accompanied by increased shoot height and decreased shoot density, and T. orientalis...
Regression And Time Series Analysis Of Loan Default At Minescho Cooperative Credit Union Tarkwa
Otoo
2015-08-01
Lending in the form of loans is a principal business activity for banks, credit unions and other financial institutions, and loans form a substantial share of these institutions' assets. However, when loans are defaulted on, the effects on the financial institutions tend to be serious. This study sought to determine the trend in, and to forecast, loan default at Minescho Credit Union, Tarkwa. Secondary data from the Credit Union were analyzed using regression analysis and the Box-Jenkins method of time series analysis. The regression analysis showed a moderately strong relationship between the amount of loan default and time, and the amount of loan default had an increasing trend. The two-year forecast of the amount of loan default oscillated initially and remained constant from 2016 onwards.
Regression Analysis of Right-censored Failure Time Data with Missing Censoring Indicators
Ping Chen; Ren He; Jun-shan Shen; Jian-guo Sun
2009-01-01
This paper discusses regression analysis of right-censored failure time data when censoring indicators are missing for some subjects. Several methods have been developed for the analysis under different situations and especially, Goetghebeur and Ryan[4] considered the situation where both the failure time and the censoring time follow the proportional hazards models marginally and developed an estimating equation approach. One limitation of their approach is that the two baseline hazard functions were assumed to be proportional to each other. We consider the same problem and present an efficient estimation procedure for regression parameters that does not require the proportionality assumption. An EM algorithm is developed and the method is evaluated by a simulation study, which indicates that the proposed methodology performs well for practical situations. An illustrative example is provided.
[Regression analysis of an instrumental conditioned tentacular reflex in the edible snail].
Stepanov, I I; Kuntsevich, S V; Lokhov, M I
1989-01-01
Regression analysis showed that the learning curves of the conditioned tentacle reflex can be approximated by an exponential mathematical model. The reflex was retained for more than three weeks. There were some quantitative differences between conditioning of the right and the left tentacle. During the spring period the reflex formed in every session, but it was not retained between sessions. The conditioned tentacle reflex may be employed in neuropharmacological studies.
Nop Sopipan
2013-01-01
The aim of this study was to forecast the returns for the Stock Exchange of Thailand (SET) Index by adding some explanatory variables and a stationary autoregressive term of order p (AR(p)) to the mean equation of returns. In addition, we used Principal Component Analysis (PCA) to remove possible complications caused by multicollinearity. Results showed that the multiple regression based on PCA has the best performance.
Monitoring heavy metal Cr in soil based on hyperspectral data using regression analysis
Zhang, Ningyu; Xu, Fuyun; Zhuang, Shidong; He, Changwei
2016-10-01
Heavy metal pollution in soils is one of the most critical problems for global ecology and environmental safety. Hyperspectral remote sensing is fast, low in cost, and involves little risk or damage, and it provides a good method for detecting heavy metals in soil. This paper proposes applying stepwise multiple regression between the spectral data and the measured amount of heavy metal Cr at sample points in soil, for environmental protection monitoring. A FieldSpec HandHeld spectroradiometer was used to collect reflectance spectra of the sample points over the wavelength range of 325-1075 nm. The spectral data were then preprocessed to reduce the influence of external factors; the preprocessing methods included the first-order differential equation, the second-order differential equation and continuum removal. Stepwise multiple regression equations were established accordingly, and the accuracy of each equation was tested. The results showed that the first-order differential equation gives the best accuracy, which makes it feasible to predict the content of heavy metal Cr using stepwise multiple regression.
Robust estimation for homoscedastic regression in the secondary analysis of case-control data
Wei, Jiawei
2012-12-04
Primary analysis of case-control studies focuses on the relationship between disease D and a set of covariates of interest (Y, X). A secondary application of the case-control study, which is often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated owing to the case-control sampling, where the regression of Y on X is different from what it is in the population. Previous work has assumed a parametric distribution for Y given X and derived semiparametric efficient estimation and inference without any distributional assumptions about X. We take up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model, but otherwise the distribution of Y is unspecified. The semiparametric efficient approaches can be used to construct semiparametric efficient estimates, but they suffer from a lack of robustness to the assumed model for Y given X. We take an entirely different approach. We show how to estimate the regression parameters consistently even if the assumed model for Y given X is incorrect, and thus the estimates are model robust. For this we make the assumption that the disease rate is known or well estimated. The assumption can be dropped when the disease is rare, which is typically so for most case-control studies, and the estimation algorithm simplifies. Simulations and empirical examples are used to illustrate the approach.
Non-Stationary Hydrologic Frequency Analysis using B-Splines Quantile Regression
Nasri, B.; St-Hilaire, A.; Bouezmarni, T.; Ouarda, T.
2015-12-01
Hydrologic frequency analysis is commonly used by engineers and hydrologists to provide the basic information for planning, design and management of hydraulic structures and water resources systems under the assumption of stationarity. However, with increasing evidence of a changing climate, the assumption of stationarity may no longer be valid and the results of conventional analysis may become questionable. In this study, we consider a framework for frequency analysis of extreme flows based on B-Splines quantile regression, which allows modeling of non-stationary data that depend on covariates. Such covariates may have linear or nonlinear dependence. A Markov Chain Monte Carlo (MCMC) algorithm is used to estimate quantiles and their posterior distributions. A coefficient of determination for quantile regression is proposed to evaluate the estimation of the proposed model for each quantile level. The method is applied to annual maximum and minimum streamflow records in Ontario, Canada. Climate indices are considered to describe the non-stationarity in these variables and to estimate the quantiles in this case. The results show large differences between the non-stationary quantiles and their stationary equivalents for annual maximum and minimum discharge with high annual non-exceedance probabilities. Keywords: Quantile regression, B-Splines functions, MCMC, Streamflow, Climate indices, non-stationarity.
Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan
2017-01-01
This study aimed to assess the incidence of pulmonary hypertension (PH) in newly diagnosed hyperthyroid patients and to find a simple model of the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by echocardiography and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). To identify the factors causing pulmonary hypertension, the statistical method of comparing arithmetic means was used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by a linear or non-linear function. By applying the linear regression method described by a first-degree equation, the line of regression (linear model) was determined; by applying the non-linear regression method described by a second-degree equation, a parabola-type curve of regression (non-linear or polynomial model) was determined. We compared and validated these two models by calculating the determination coefficient (criterion 1), comparing residuals (criterion 2), applying the AIC criterion (criterion 3) and using the F-test (criterion 4). In the H-group, 47% had pulmonary hypertension that was completely reversible on attaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known factors (level of free thyroxin, pulmonary vascular resistance, cardiac output) and new factors identified in this study (pretreatment period, age, systolic blood pressure). According to the four criteria and to clinical judgment, we consider the polynomial model (graphically parabola-type) to be better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second
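The comparison of a first-degree and a second-degree model by determination coefficient, AIC and F-test can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the data are hypothetical, the AIC is the Gaussian least-squares form up to an additive constant (with k counting regression coefficients only), and only the F statistic is computed, not its p-value.

```python
import math

def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] from the normal equations."""
    p = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(p)] for i in range(p)]
    aty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(p)]
    return solve(ata, aty)

def residual_ss(xs, ys, coef):
    """Residual sum of squares of a fitted polynomial."""
    return sum((y - sum(c * x ** k for k, c in enumerate(coef))) ** 2
               for x, y in zip(xs, ys))

def compare(xs, ys):
    """Fit linear and quadratic models; report R^2, AIC and the F statistic."""
    n = len(xs)
    mean_y = sum(ys) / n
    tss = sum((y - mean_y) ** 2 for y in ys)
    out = {}
    for name, deg in (("linear", 1), ("quadratic", 2)):
        coef = polyfit(xs, ys, deg)
        rss = residual_ss(xs, ys, coef)
        k = deg + 1  # number of regression coefficients
        out[name] = {"coef": coef, "rss": rss, "r2": 1.0 - rss / tss,
                     "aic": n * math.log(rss / n) + 2 * k}
    # F statistic for the one extra (quadratic) term in the nested comparison.
    rss1, rss2 = out["linear"]["rss"], out["quadratic"]["rss"]
    out["f_stat"] = (rss1 - rss2) / (rss2 / (n - 3))
    return out

# Hypothetical predictor and response values with visible curvature.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 7.2, 11.8, 18.1, 26.2, 35.7, 47.1]
result = compare(xs, ys)
```

A higher R^2, a lower AIC and a large F statistic for the quadratic term would all point toward the polynomial model; the AIC is the only one of the three that penalises the extra parameter directly.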
Hunter, Paul R
2009-12-01
Household water treatment (HWT) is being widely promoted as an appropriate intervention for reducing the burden of waterborne disease in poor communities in developing countries. A recent study has raised concerns about the effectiveness of HWT, in part because of concerns over the lack of blinding and in part because of considerable heterogeneity in the reported effectiveness of randomized controlled trials. This study set out to investigate the causes of this heterogeneity and so identify factors associated with good health gains. Studies identified in an earlier systematic review and meta-analysis were supplemented with more recently published randomized controlled trials. A total of 28 separate studies of randomized controlled trials of HWT with 39 intervention arms were included in the analysis. Heterogeneity was studied using the "metareg" command in Stata. Initial analyses with single candidate predictors were undertaken and all variables significant at the P ceramic filter all other interventions were much less effective (Biosand 0.247, 0.073; chlorine and safe waste storage 0.295, 0.061; combined coagulant-chlorine 0.2349, 0.067; SODIS 0.302, 0.068). A Monte Carlo model predicted that over 12 months ceramic filters were likely to be still effective at reducing disease, whereas SODIS, chlorination, and coagulation-chlorination had little if any benefit. Indeed, these three interventions are predicted to have the same or less effect than what may be expected due purely to reporting bias in unblinded studies. With the currently available evidence, ceramic filters are the most effective form of HWT in the long term; disinfection-only interventions, including SODIS, appear to have poor if any long-term public health benefit.
Deni Memić
2015-01-01
This article aims to assess credit default prediction on the banking market in Bosnia and Herzegovina nationwide, as well as in its constitutional entities (Federation of Bosnia and Herzegovina and Republika Srpska). The ability to classify companies into different predefined groups, or to find an appropriate tool that would replace human assessment in classifying companies into good and bad buckets, has long been one of the main interests of risk management researchers. We investigated the possibility and accuracy of default prediction using the traditional statistical methods of logistic regression (logit) and multiple discriminant analysis (MDA), and compared their predictive abilities. The results show that the created models have high predictive ability. For logit models, some variables are more influential on the default prediction than others. Return on assets (ROA) is statistically significant in all four periods prior to default, with very high regression coefficients, that is, a high impact on the model's ability to predict default. Similar results are obtained for MDA models. It is also found that predictive ability differs between logistic regression and multiple discriminant analysis.
Correlation Study and Regression Analysis of Drinking Water Quality in Kashan City, Iran
Mohammad Mehdi HEYDARI
2013-06-01
Chemical and statistical regression analysis was carried out on drinking water samples from five fields (21 sampling wells) with a hot and dry climate in Kashan city, central Iran. Samples were collected from October 2006 to May 2007 (25 - 30 °C). Comparing the results with the drinking water quality standards issued by the World Health Organization (WHO), it is found that some of the water samples are not potable. Hydrochemical facies using a Piper diagram indicate that in most parts of the city, the chemical character of water is dominated by NaCl. All samples showed sulfate and sodium ion content higher, and K+ and F- content lower, than the permissible limit. A strongly positive correlation is observed between TDS and EC (R = 0.995) and between Ca2+ and TH (R = 0.948). The results showed that several regression relations share the same correlation coefficients: (I) pH -TH, EC -TH (R = 0.520); (II) NO3- -pH, TH-pH (R = 0.520); (III) Ca2+-SO42-, TH-SO42-, Cl- -SO42- (R = 0.630). The results revealed that systematic calculation of correlation coefficients between water parameters, together with regression analysis, provides a useful means for rapid monitoring of water quality.
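The correlation coefficients quoted between pairs of water parameters are Pearson coefficients of the usual form. A minimal sketch of the calculation is shown here; the EC and TDS readings and the 0.64 conversion factor are hypothetical values chosen only so that the relation is exactly linear.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical electrical conductivity (EC) readings and TDS values
# derived from them by an assumed constant factor.
ec = [500.0, 750.0, 1000.0, 1500.0, 2000.0]
tds = [0.64 * v for v in ec]
r = pearson_r(ec, tds)
```

With real measurements the coefficient falls below 1, and a value close to 1 is what makes one parameter usable as a quick proxy for the other.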
Sub-pixel estimation of tree cover and bare surface densities using regression tree analysis
Carlos Augusto Zangrando Toneli
2011-09-01
Sub-pixel analysis is capable of generating continuous fields, which represent the spatial variability of certain thematic classes. The aim of this work was to develop numerical models to represent the variability of tree cover and bare surfaces within the study area. This research was conducted in the riparian buffer within a watershed of the São Francisco River in the North of Minas Gerais, Brazil. IKONOS and Landsat TM imagery were used with the GUIDE algorithm to construct the models. The results were two index images derived with regression trees for the entire study area, one representing tree cover and the other representing bare surface. The use of non-parametric and non-linear regression tree models presented satisfactory results to characterize wetland, deciduous and savanna patterns of forest formation.
THE PROGNOSIS OF RUSSIAN DEFENSE INDUSTRY DEVELOPMENT IMPLEMENTED THROUGH REGRESSION ANALYSIS
L.M. Kapustina
2007-03-01
The article presents the results of an investigation of the major internal and external factors that influence the development of the defense industry, as well as the results of a regression analysis that quantifies each factor's contribution to the growth rate of the Russian defense industry. On the basis of the calculated regression dependences, the authors produced a medium-term prognosis for the defense industry. Optimistic and inertial versions of the defense product growth rate for the period up to 2009 are based on scenario conditions for the Russian economy worked out by the Ministry of economy and development. In conclusion, the authors point out which factors and conditions have the largest impact on the successful and stable operation of the Russian defense industry.
COLOR IMAGE RETRIEVAL BASED ON FEATURE FUSION THROUGH MULTIPLE LINEAR REGRESSION ANALYSIS
K. Seetharaman
2015-08-01
This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, with the least-squares estimation method employed to estimate the parameters. The input query image is segmented into various regions according to the structure of the image. Color and texture features are extracted from each region of the query image, and the features are fused together using the multiple linear regression model. The estimated parameters of the model, which is built on the features, form a vector called a feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images. The F-measure is applied to evaluate the performance of the proposed technique. The results show that the proposed technique is comparable to other existing techniques.
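The Canberra distance used to compare feature vectors sums, over components, the absolute difference divided by the sum of absolute values. A minimal sketch follows; skipping components where both entries are zero is a common convention assumed here, and the example vectors are hypothetical.

```python
def canberra(u, v):
    """Canberra distance: sum over i of |u_i - v_i| / (|u_i| + |v_i|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:  # a 0/0 term is treated as contributing nothing
            total += abs(a - b) / denom
    return total

# Hypothetical feature vectors for a query image and a target image.
query = [1.0, 0.0]
target = [3.0, 0.0]
d = canberra(query, target)
```

Each term lies between 0 and 1, so every component carries comparable weight regardless of its scale, which is one reason the measure suits vectors of mixed features.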
Espino, Natalia V.
Foreign Object Debris/Damage (FOD) is a costly and high-risk problem that aeronautics companies such as Boeing and Lockheed Martin, among others, face on their production lines every day. They spend an average of $350 thousand per year fixing FOD problems. FOD can put the lives of pilots, passengers and other crew at high risk. FOD refers to any type of foreign object, particle, debris or agent in the manufacturing environment that could contaminate or damage the product or otherwise undermine quality control standards. FOD falls into the following categories: panstock, manufacturing debris, tools/shop aids, consumables and trash. Although aeronautics companies have put many prevention plans in place, such as housekeeping and "clean as you go" philosophies, training, and the use of RFID for tooling control, none of them has completely eradicated the problem. This research presents a logistic regression model to predict the probability of FOD type under specific circumstances such as workstation, month and aircraft/jet being built. FOD Quality Assurance Reports for the last three years were provided by an aeronautical company for this study. By predicting the type of FOD, custom reduction/elimination plans can be put in place and the problem thereby diminished. Different aircraft were analyzed, and a separate model was developed for each through the same methodology. The results presented are predictions of FOD type for each aircraft and workstation throughout the year, obtained by applying the proposed logistic regression models. This research would help the aeronautics industry to address the FOD problem correctly, identify root causes and establish actual reduction/elimination plans.
A refined method for multivariate meta-analysis and meta-regression.
Jackson, Daniel; Riley, Richard D
2014-02-20
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects' standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. Copyright © 2013 John Wiley & Sons, Ltd.
Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T
2016-12-20
Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
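For reference, the frequentist DerSimonian and Laird procedure used as the comparator can be sketched in a few lines. This is the standard moment estimator of the between-study variance followed by inverse-variance pooling, not the data-augmentation method of the paper; the effect sizes and variances are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Return (tau2, pooled_effect, standard_error) for a random-effects meta-analysis."""
    k = len(effects)
    w = [1.0 / v for v in variances]                     # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                   # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]       # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return tau2, pooled, se

# Two hypothetical studies with equal within-study variance.
tau2, pooled, se = dersimonian_laird([0.0, 1.0], [0.01, 0.01])
```

With only a handful of studies Q is based on very few degrees of freedom, which is why the resulting between-study variance estimate is imprecise and why external information on heterogeneity can help.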
Hidalgo, Mª Dolores; Gómez-Benito, Juana; Zumbo, Bruno D.
2014-01-01
The authors analyze the effectiveness of the R[superscript 2] and delta log odds ratio effect size measures when using logistic regression analysis to detect differential item functioning (DIF) in dichotomous items. A simulation study was carried out, and the Type I error rate and power estimates under conditions in which only statistical testing…
Ryu, Duchwan
2010-09-28
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.
Ryu, Duchwan; Li, Erning; Mallick, Bani K
2011-06-01
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves.
Evaluation of a LASSO regression approach on the unrelated samples of Genetic Analysis Workshop 17.
Guo, Wei; Elston, Robert C; Zhu, Xiaofeng
2011-11-29
The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, using real sequence data for 3,205 genes annotated by the 1000 Genomes Project and simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare, with 87% of the SNPs having a minor allele frequency less than 0.05. For rare SNP detection, in this study we performed a least absolute shrinkage and selection operator (LASSO) regression and F tests at the gene level and calculated the generalized degrees of freedom to avoid any selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches in this mini-exome data and compare their performance in terms of power and false positive rates. In most situations the LASSO approach is more powerful than linear regression and collapsing methods. We also note the difficulty in determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs. If a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power will be much improved.
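The collapsing step described above, which sums rare SNPs below an allele frequency threshold, can be sketched as follows. This is a minimal illustration that assumes genotypes are coded as minor-allele counts (0, 1 or 2); the function names and the toy genotype matrix are hypothetical, and the subsequent regression of the trait on the burden score is not shown.

```python
def minor_allele_freq(genotypes):
    """Minor allele frequency of one SNP from per-individual allele counts (0, 1, 2)."""
    return sum(genotypes) / (2.0 * len(genotypes))

def collapse_rare(snp_matrix, maf_threshold=0.05):
    """Sum, per individual, the minor-allele counts over SNPs with MAF below the threshold.

    snp_matrix[i][j] is the minor-allele count of individual i at SNP j.
    """
    n_snps = len(snp_matrix[0])
    rare = [j for j in range(n_snps)
            if minor_allele_freq([row[j] for row in snp_matrix]) < maf_threshold]
    return [sum(row[j] for j in rare) for row in snp_matrix]

# Toy data: 12 individuals, 3 SNPs. SNP 0 is common; SNPs 1 and 2 each have one carrier.
genotypes = [[1, 0, 0] for _ in range(12)]
genotypes[0][1] = 1
genotypes[5][2] = 1
burden = collapse_rare(genotypes, maf_threshold=0.05)
```

The result depends directly on `maf_threshold`, which illustrates the difficulty noted in the abstract of choosing an optimal threshold for the collapsing method.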
Wilson, Lauren; Bhatnagar, Prachi; Townsend, Nick
2017-07-01
We aimed to study the time trends underlying a change from cardiovascular disease (CVD) to cancer as the most common cause of age-standardized mortality in the UK between 1983 and 2013. A retrospective trend analysis of the World Health Organization mortality database for mortality from all cancers, all CVDs, and their three most common types, by sex and age. Age-standardized mortality rates were adjusted to the 2013 European Standard Population and analyzed using joinpoint regression analysis for annual percent changes. The difference in mortality rate between total CVD and cancer narrowed over the study period as age-standardized mortality from CVD decreased more steeply than cancer in both sexes. We observed higher overall rates for both diseases in men compared to women, with high mortality rates from ischemic heart disease and lung cancer in men. Joinpoint regression analysis indicated that trends of decreasing rates of CVD have increased over time while decreasing trends in cancer mortality rates have slowed down since the 1990s. The lowest improvements in mortality rates were for cancer in those over 75 years of age and lung cancer in women. In 2011, the age-standardized mortality rate for cancer exceeded that of CVD in both sexes in the UK. These changing trends in mortality may support evidence for changes in policy and resource allocation in the UK.
Javali Shivalingappa
2010-01-01
Aim: The study aimed to determine the factors associated with periodontal disease (at different levels of severity) by using different regression models for ordinal data. Design: A cross-sectional design was employed using clinical examination and the 'questionnaire with interview' method. Materials and Methods: The study was conducted from June 2008 to October 2008 in Dharwad, Karnataka, India. It involved a systematic random sample of 1760 individuals aged 18-40 years. The periodontal disease examination was conducted using the Community Periodontal Index for Treatment Needs (CPITN). Statistical Analysis Used: Regression models for ordinal data with different built-in link functions were used to determine the factors associated with periodontal disease. Results: The ordinal regression models with four built-in link functions (logit, probit, Clog-log and nlog-log) displayed similar results, with negligible differences in the significant factors associated with periodontal disease. Factors such as religion, caste, sources of drinking water, timings for sweet consumption, timings for cleaning or brushing the teeth, and materials used for brushing teeth were significantly associated with periodontal disease in all ordinal models. Conclusions: The ordinal regression model with Clog-log is a better fit for determining significant factors associated with periodontal disease than models with logit, probit and nlog-log built-in link functions. Factors such as caste and time for sweet consumption are negatively associated with periodontal disease, whereas religion, sources of drinking water, timings for cleaning or brushing the teeth, and materials used for brushing teeth are significantly and positively associated with periodontal disease.
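The four link functions differ only in how a linear predictor is mapped to a cumulative probability. The sketch below shows the corresponding inverse links and how they yield category probabilities in a cumulative ordinal model; the parameterisation P(Y <= j) = F(threshold_j - linear predictor) and all numeric values are assumptions made for illustration.

```python
import math

# Inverse link functions: map a linear predictor to a cumulative probability.
INVERSE_LINKS = {
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "probit": lambda eta: 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0))),
    "cloglog": lambda eta: 1.0 - math.exp(-math.exp(eta)),   # complementary log-log
    "nloglog": lambda eta: math.exp(-math.exp(-eta)),        # negative log-log
}

def category_probs(thresholds, linear_predictor, link="logit"):
    """Category probabilities for an ordinal outcome with increasing thresholds."""
    inv = INVERSE_LINKS[link]
    cumulative = [inv(t - linear_predictor) for t in thresholds] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs

# Hypothetical thresholds for a four-level severity score and one linear predictor value.
thresholds = [-1.0, 0.0, 1.5]
all_probs = {name: category_probs(thresholds, 0.3, name) for name in INVERSE_LINKS}
```

The logit and probit links are symmetric, whereas the two log-log links are skewed in opposite directions, so they can fit better when the higher or the lower categories are the more probable ones.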
Hofland, G.S.; Barton, C.C.
1990-10-01
The computer program FREQFIT is designed to perform regression and statistical chi-squared goodness of fit analysis on one-dimensional or two-dimensional data. The program features an interactive user dialogue, numerous help messages, an option for screen or line printer output, and the flexibility to use practically any commercially available graphics package to create plots of the program's results. FREQFIT is written in Microsoft QuickBASIC, for IBM-PC compatible computers. A listing of the QuickBASIC source code for the FREQFIT program, a user manual, and sample input data, output, and plots are included. 6 refs., 1 fig.
Gunay, Ahmet [Deparment of Environmental Engineering, Faculty of Engineering and Architecture, Balikesir University (Turkey)], E-mail: ahmetgunay2@gmail.com
2007-09-30
The experimental data of ammonium exchange by natural Bigadic clinoptilolite were evaluated using nonlinear regression analysis. Three two-parameter isotherm models (Langmuir, Freundlich and Temkin) and three three-parameter isotherm models (Redlich-Peterson, Sips and Khan) were used to analyse the equilibrium data. The fit of the isotherm models was assessed using the standard normalization error procedure (SNE) and the coefficient of determination (R²). The HYBRID error function provided the lowest sum of normalized errors, and the Khan model performed best in modeling the equilibrium data. Thermodynamic investigation indicated that ammonium removal by clinoptilolite was favorable at lower temperatures and exothermic in nature.
Zhigao Zeng
2016-01-01
This paper proposes a novel algorithm to solve the challenging problem of classifying error-diffused halftone images. We first design the class feature matrices, after extracting the image patches according to their statistical characteristics, to classify the error-diffused halftone images. Spectral regression kernel discriminant analysis is then used for feature dimension reduction. Finally, the error-diffused halftone images are classified using an idea similar to the nearest centroids classifier. As demonstrated by the experimental results, our method is fast and achieves a high classification accuracy rate, with the added benefit of robustness to noise.
A systematic review and meta-regression analysis of mivacurium for tracheal intubation
Vanlinthout, L.E.H.; Mesfin, S.H.; Hens, Niel; Vanacker, B. F.; Robertson, E. N.; Booij, L. H. D. J.
2014-01-01
We systematically reviewed factors associated with intubation conditions in randomised controlled trials of mivacurium, using random-effects meta-regression analysis. We included 29 studies of 1050 healthy participants. Four factors explained 72.9% of the variation in the probability of excellent intubation conditions: mivacurium dose, 24.4%; opioid use, 29.9%; time to intubation and age together, 18.6%. The odds ratio (95% CI) for excellent intubation was 3.14 (1.65–5.73) for doubling the mi...
Mehdi Najafi; Seyed Mohammad Esmaiel Jalali; Reza KhaloKakaie; Farrokh Forouhandeh
2015-01-01
During underground coal gasification (UCG), whereby coal is converted to syngas in situ, a cavity is formed in the coal seam. The cavity growth rate (CGR), or the moving rate of the gasification face, is affected by controllable factors (operation pressure, gasification time, geometry of the UCG panel) and uncontrollable factors (coal seam properties). The CGR is usually predicted by mathematical models and laboratory experiments, which are time consuming, cumbersome and expensive. In this paper, a new simple model for CGR is developed using non-linear regression analysis, based on data from 11 UCG field trials. The empirical model compares satisfactorily with the Perkins model and can reliably predict CGR.
Kinnebrock, Silja; Podolskij, Mark
This paper introduces a new estimator to measure the ex-post covariation between high-frequency financial time series under market microstructure noise. We provide an asymptotic limit theory (including feasible central limit theorems) for standard methods such as regression, correlation analysis...... and covariance, for which we obtain the optimal rate of convergence. We demonstrate some positive semidefinite estimators of the covariation and construct a positive semidefinite estimator of the conditional covariance matrix in the central limit theorem. Furthermore, we indicate how the assumptions on the noise...
Regression analysis as an objective tool of economic management of rolling mill
Š. Vilamová
2015-07-01
The ability to optimize costs plays a key role in maintaining competitiveness of the company, because without detailed knowledge of costs, companies are not able to make the right decisions that will ensure their long-term growth. The aim of this article is to outline the problematic areas related to company costs and to contribute to a debate on the method used to determine the amount of fixed and variable costs, their monitoring and follow-up control. This article presents a potential use of regression analysis as an objective tool of economic management in metallurgical companies, as these companies have several specific features
Estimating the causes of traffic accidents using logistic regression and discriminant analysis.
Karacasu, Murat; Ergül, Barış; Altin Yavuz, Arzu
2014-01-01
Factors that affect traffic accidents have been analysed in various ways. In this study, we use the methods of logistic regression and discriminant analysis to determine the damages due to injury and non-injury accidents in the Eskisehir Province. Data were obtained from the accident reports of the General Directorate of Security in Eskisehir; 2552 traffic accidents between January and December 2009 were investigated regarding whether they resulted in injury. According to the results, the effects of traffic accidents were reflected in the variables. These results provide a wealth of information that may aid future measures toward the prevention of undesired results.
Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.
Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H
2006-01-01
Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
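The Sobel test mentioned in this abstract can be sketched as follows. The formula is the standard first-order Sobel statistic; the path coefficients and standard errors in the usage lines are hypothetical illustration values, not figures from the study.

```python
import math
from statistics import NormalDist

def sobel_test(a, se_a, b, se_b):
    """First-order Sobel test for an indirect (mediated) effect a*b.

    a, se_a: coefficient and standard error for predictor -> mediator.
    b, se_b: coefficient and standard error for mediator -> outcome
             (controlling for the predictor).
    Returns the z statistic and a two-sided p-value under normality.
    """
    z = (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

A significant z indicates that the indirect path through the mediator (here, for example, NP-associated flashbacks) carries a non-zero share of the effect.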
Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath
2016-01-01
In the present study, the Taguchi parameter design method and regression analysis are applied to optimize the cutting conditions for surface finish while machining AISI 4340 steel with newly developed yttria-based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through a wet chemical co-precipitation route followed by a powder metallurgy process. Experiments have been carried out based on an L9 orthogonal array with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and the signal-to-noise ratio (SNR), the optimal cutting condition under the smaller-is-better criterion is A3B1C1, i.e. a cutting speed of 420 m/min, a depth of cut of 0.5 mm and a feed rate of 0.12 m/min. Analysis of Variance (ANOVA) is applied to find the significance and percentage contribution of each parameter. A mathematical model of surface roughness has been developed using regression analysis as a function of the above-mentioned independent variables. The values predicted by the developed model and the experimental values are found to be very close to each other, supporting the significance of the model. A confirmation run has been carried out at the 95% confidence level to verify the optimized result, and the values obtained are within the prescribed limit.
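The smaller-is-better signal-to-noise ratio used in Taguchi analysis can be sketched as below. The formula is the standard one, SNR = -10 log10(mean of y²); the roughness readings in the usage lines are hypothetical and are not measurements from the study.

```python
import math

def sn_smaller_is_better(values):
    """Taguchi smaller-is-better S/N ratio in decibels.

    values: repeated measurements of the response (e.g. surface
    roughness) for one experimental run. A higher S/N is better.
    """
    mean_square = sum(v * v for v in values) / len(values)
    return -10.0 * math.log10(mean_square)

if __name__ == "__main__":
    # Hypothetical roughness readings (micrometres) for two runs.
    run_a = [0.52, 0.55, 0.50]
    run_b = [1.10, 1.05, 1.20]
    print(sn_smaller_is_better(run_a), sn_smaller_is_better(run_b))
```

For each factor, the level with the highest mean S/N across the runs of the orthogonal array is taken as the preferred setting, which is how a combination such as A3B1C1 is read off.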
管军; 杨兴易; 赵良; 林兆奋; 郭昌星; 李文放
2003-01-01
Objective: To investigate the incidence, crude mortality and independent risk factors of ventilator-associated pneumonia (VAP) in a comprehensive ICU in China. Methods: The clinical and microbiological data of all 97 patients receiving mechanical ventilation (>48 hr) in our comprehensive ICU from January 1999 to December 2000 were retrospectively collected and analysed. First, several statistically significant risk factors were screened out with univariate analysis; then independent risk factors were determined with multivariate stepwise logistic regression analysis. Results: The incidence of VAP was 54.64% (15.60 cases per 1000 ventilation days) and the crude mortality was 47.42%. The interval between the establishment of an artificial airway and the diagnosis of VAP was 6.9 ± 4.3 d. Univariate analysis suggested that indwelling naso-gastric tube, corticosteroid, acid inhibitor, third-generation cephalosporin/imipenem, non-infection lung disease, and extrapulmonary infection were the statistically significant risk factors of
Singh, S.; Jaishi, H. P.; Tiwari, R. P.; Tiwari, R. C.
2017-07-01
This paper reports the analysis of soil radon data recorded in the seismic zone-V, located in the northeastern part of India (latitude 23.73N, longitude 92.73E). Continuous measurements of soil-gas emission along Chite fault in Mizoram (India) were carried out with the replacement of solid-state nuclear track detectors at weekly interval. The present study was done for the period from March 2013 to May 2015 using LR-115 Type II detectors, manufactured by Kodak Pathe, France. In order to reduce the influence of meteorological parameters, statistical analysis tools such as multiple linear regression and artificial neural network have been used. Decrease in radon concentration was recorded prior to some earthquakes that occurred during the observation period. Some false anomalies were also recorded which may be attributed to the ongoing crustal deformation which was not major enough to produce an earthquake.
Selection of higher order regression models in the analysis of multi-factorial transcription data.
Olivia Prazeres da Costa
INTRODUCTION: Many studies examine gene expression data that have been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains and a mock control), and treatment/non-treatment with interferon-γ. RESULTS: We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. CONCLUSIONS: We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
Ya-Nan Ma
BACKGROUND: There have been few published studies on spirometric reference values for healthy children in China. We hypothesize that there would have been changes in lung function that would not have been precisely predicted by the existing spirometric reference equations. The objective of the study was to develop more accurate predictive equations for spirometric reference values for children aged 9 to 15 years in Northeast China. METHODOLOGY/PRINCIPAL FINDINGS: Spirometric measurements were obtained from 3,922 children, including 1,974 boys and 1,948 girls, who were randomly selected from five cities of Liaoning province, Northeast China, using the ATS (American Thoracic Society) and ERS (European Respiratory Society) standards. The data were then randomly split into a training subset containing 2078 cases and a validation subset containing 1844 cases. Predictive equations used multiple linear regression techniques with three predictor variables: height, age and weight. Model goodness of fit was examined using the coefficient of determination (R²) and adjusted R². The predicted values were compared with those obtained from the existing spirometric reference equations. The results showed that the prediction equations using linear regression analysis performed well for most spirometric parameters. Paired t-tests were used to compare the predicted values obtained from the developed and existing spirometric reference equations based on the validation subset. The t-test for males was not statistically significant (p>0.01). The predictive accuracy of the developed equations was higher than that of the existing equations, and the predictive ability of the model was also validated. CONCLUSION/SIGNIFICANCE: We developed prediction equations using linear regression analysis of spirometric parameters for children aged 9-15 years in Northeast China. These equations represent the first attempt at predicting lung function for Chinese children following the ATS
The analysis of internet addiction scale using multivariate adaptive regression splines.
Kayri, M
2010-01-01
Determining the real effects on internet dependency with an unbiased and robust statistical method is crucial. MARS is a new non-parametric method used in the literature for parameter estimation in cause-and-effect research. MARS can both produce legible model curves and make unbiased parametric predictions. To examine the performance of MARS, its findings are compared with those of Classification and Regression Tree (C&RT) analysis, which is considered in the literature to be efficient in revealing correlations between variables. The data set for the study is taken from "The Internet Addiction Scale" (IAS), which attempts to reveal the addiction levels of individuals. The population of the study consists of 754 secondary school students (301 female and 443 male students, with 10 missing data). The MARS 2.0 trial version was used for the MARS analysis, and the C&RT analysis was done with SPSS. MARS obtained six basis functions for the model, and the regression equation of the model was derived from these six functions. For the predicted variable, MARS showed that average daily Internet-use time, the purpose of Internet use, students' grade and mothers' occupation had a significant effect (P...) on dependency level prediction. The study also observed that MARS reveals the extent to which a variable considered significant changes the character of the model.
Julia Gasch
BACKGROUND: Differences in spontaneous and drug-induced baroreflex sensitivity (BRS) have been attributed to its different operating ranges. The current study attempted to compare BRS estimates during cardiovascular steady state and pharmacological stimulation using an innovative algorithm for dynamic determination of baroreflex gain. METHODOLOGY/PRINCIPAL FINDINGS: Forty-five volunteers underwent the modified Oxford maneuver in the supine and 60° tilted positions, with blood pressure and heart rate being continuously recorded. Drug-induced BRS estimates were calculated from data obtained by bolus injections of nitroprusside and phenylephrine. Spontaneous indices were derived from data obtained during rest (stationary) and under pharmacological stimulation (non-stationary) using the algorithm of trigonometric regressive spectral analysis (TRS). Spontaneous and drug-induced BRS values were significantly correlated and displayed directionally similar changes under different situations. Using the Bland-Altman method, systematic differences between spontaneous and drug-induced estimates were found, revealing that the discrepancy can be as large as the gain itself. Fixed bias was not evident with ordinary least products regression. The correlation and agreement between the estimates increased significantly when BRS was calculated by TRS in non-stationary mode during the drug injection period. TRS-BRS significantly increased during phenylephrine and decreased under nitroprusside. CONCLUSIONS/SIGNIFICANCE: The TRS analysis provides a reliable, non-invasive assessment of human BRS not only under static steady-state conditions, but also during pharmacological perturbation of the cardiovascular system.
Buck, J. A.; Underhill, P. R.; Morelli, J.; Krause, T. W.
2016-02-01
Nuclear steam generators (SGs) are a critical component for ensuring safe and efficient operation of a reactor. Life management strategies are implemented in which SG tubes are regularly inspected by conventional eddy current testing (ECT) and ultrasonic testing (UT) technologies to size flaws, and the safe operating life of SGs is predicted based on growth models. ECT, the more commonly used technique because of the rapidity with which full SG tube wall inspection can be performed, is challenged when inspecting ferromagnetic support structure materials in the presence of magnetite sludge and multiple overlapping degradation modes. In this work, an emerging inspection method, pulsed eddy current (PEC), is investigated to address some of these particular inspection conditions. Time-domain signals were collected by an 8-coil array PEC probe while ferromagnetic drilled support hole diameter, depth of rectangular tube frets and 2D tube off-centering were varied. Data sets were analyzed with a modified principal components analysis (MPCA) to extract dominant signal features. Multiple linear regression models were applied to MPCA scores to size the hole diameter as well as the rectangular outer diameter tube frets. Models were improved through exploratory factor analysis, which was applied to MPCA scores to refine the selection of regression model inputs by removing nonessential information.
Patnaik, Surya N.; Guptill, James D.; Hopkins, Dale A.; Lavelle, Thomas M.
2000-01-01
The NASA Engine Performance Program (NEPP) can configure and analyze almost any type of gas turbine engine that can be generated through the interconnection of a set of standard physical components. In addition, the code can optimize engine performance by changing adjustable variables under a set of constraints. However, for engine cycle problems at certain operating points, the NEPP code can encounter difficulties: nonconvergence in the currently implemented Powell's optimization algorithm and deficiencies in the Newton-Raphson solver during engine balancing. A project was undertaken to correct these deficiencies. Nonconvergence was avoided through a cascade optimization strategy, and deficiencies associated with engine balancing were eliminated through neural network and linear regression methods. An approximation-interspersed cascade strategy was used to optimize the engine's operation over its flight envelope. Replacement of Powell's algorithm by the cascade strategy improved the optimization segment of the NEPP code. The performance of the linear regression and neural network methods as alternative engine analyzers was found to be satisfactory. This report considers two examples, a supersonic mixed-flow turbofan engine and a subsonic waverotor-topped engine, to illustrate the results, and it discusses insights gained from the improved version of the NEPP code.
Czekaj, Tomasz Gerard; Henningsen, Arne
The estimation of the technical efficiency comprises a vast literature in the field of applied production economics. There are two predominant approaches: the non-parametric and non-stochastic Data Envelopment Analysis (DEA) and the parametric Stochastic Frontier Analysis (SFA). The DEA...... of specifying an unsuitable functional form and thus, model misspecification and biased parameter estimates. Given these problems of the DEA and the SFA, Fan, Li and Weersink (1996) proposed a semi-parametric stochastic frontier model that estimates the production function (frontier) by non-parametric......), Kumbhakar et al. (2007), and Henningsen and Kumbhakar (2009). The aim of this paper and its main contribution to the existing literature is the estimation semi-parametric stochastic frontier models using a different non-parametric estimation technique: spline regression (Ma et al. 2011). We apply...
Within-session analysis of the extinction of pavlovian fear-conditioning using robust regression
Vargas-Irwin, Cristina
2010-06-01
Traditionally, the analysis of extinction data in fear conditioning experiments has involved the use of standard linear models, mostly ANOVA of between-group differences of subjects that have undergone different extinction protocols, pharmacological manipulations or some other treatment. Although some studies report individual differences in quantities such as suppression rates or freezing percentages, these differences are not included in the statistical modeling. Within-subject response patterns are then averaged using coarse-grain time windows, which can overlook these individual performance dynamics. Here we illustrate an alternative analytical procedure consisting of two steps: the estimation of a trend for within-session data, and the analysis of group differences in trend as the main outcome. This procedure is tested on real fear-conditioning extinction data, comparing trend estimates via Ordinary Least Squares (OLS) and robust Least Median of Squares (LMS) regression estimates, as well as comparing between-group differences and analyzing mean freezing percentage versus LMS slopes as outcomes.
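The contrast between OLS and LMS trend estimates can be sketched for a single subject as follows. This is an illustrative approximation, not the authors' procedure: the LMS fit searches only lines through pairs of observed points (an elemental-subset approximation), and the freezing percentages in the usage lines are hypothetical.

```python
from itertools import combinations
from statistics import mean, median

def ols_line(x, y):
    """Ordinary least squares slope and intercept for one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def lms_line(x, y):
    """Approximate Least Median of Squares line.

    Tries the line through every pair of points and keeps the one
    whose squared residuals have the smallest median.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot serve as a trend
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = median((yi - (intercept + slope * xi)) ** 2
                      for xi, yi in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, slope, intercept)
    return best[1], best[2]

if __name__ == "__main__":
    # Hypothetical within-session freezing percentages over 8 trials,
    # with one aberrant trial at the end.
    trials = [1, 2, 3, 4, 5, 6, 7, 8]
    freezing = [80, 70, 60, 50, 40, 30, 20, 95]
    print("OLS:", ols_line(trials, freezing))
    print("LMS:", lms_line(trials, freezing))
```

The per-subject slope returned by either function can then serve as the outcome in a between-group comparison; the LMS slope is much less affected by a single aberrant trial than the OLS slope.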
Păniţă, Ovidiu
2015-09-01
In the years 2012-2014, an assortment of 25 isogenic lines of wheat (Triticum aestivum ssp. vulgare) was tested at Banu-Maracine DRS. The characters analyzed were the number of seeds/spike, seed weight/spike (g), number of spikes/m2, weight of a thousand seeds (WTS) (g) and number of emerged plants/m2. Based on the recorded data and their statistical processing, a number of links between these characters were identified, and valid regression models were found between some of the studied characters. Based on component analysis, number of seeds/spike and seed weight/spike are the components that account for more than 88% of the variance, and a total of seven genotypes had positive scores for both factors.
A frailty model approach for regression analysis of multivariate current status data.
Chen, Man-Hua; Tong, Xingwei; Sun, Jianguo
2009-11-30
This paper discusses regression analysis of multivariate current status failure time data (The Statistical Analysis of Interval-censoring Failure Time Data. Springer: New York, 2006), which occur quite often in, for example, tumorigenicity experiments and epidemiologic investigations of the natural history of a disease. For the problem, several marginal approaches have been proposed that model each failure time of interest individually (Biometrics 2000; 56:940-943; Statist. Med. 2002; 21:3715-3726). In this paper, we present a full likelihood approach based on the proportional hazards frailty model. For estimation, an Expectation Maximization (EM) algorithm is developed and simulation studies suggest that the presented approach performs well for practical situations. The approach is applied to a set of bivariate current status data arising from a tumorigenicity experiment.
Roseane Cavalcanti dos Santos
2012-08-01
The objective of this work was to estimate the stability and adaptability of pod and seed yield in runner peanut genotypes based on nonlinear regression and AMMI analysis. Yield data from 11 trials, distributed in six environments and three harvests, carried out in the Northeast region of Brazil during the rainy season were used. Significant effects of genotypes (G), environments (E), and GE interactions were detected in the analysis, indicating different behaviors among genotypes in favorable and unfavorable environmental conditions. The genotypes BRS Pérola Branca and LViPE‑06 are more stable and adapted to the semiarid environment, whereas LGoPE‑06 is a promising material for pod production, despite being highly dependent on favorable environments.
Evaluation of Visual Field Progression in Glaucoma: Quasar Regression Program and Event Analysis.
Díaz-Alemán, Valentín T; González-Hernández, Marta; Perera-Sanz, Daniel; Armas-Domínguez, Karintia
2016-01-01
To determine the sensitivity, specificity and agreement between the Quasar program, glaucoma progression analysis (GPA II) event analysis and expert opinion in the detection of glaucomatous progression. The Quasar program is based on linear regression analysis of both mean defect (MD) and pattern standard deviation (PSD). Each series of visual fields was evaluated by three methods; Quasar, GPA II and four experts. The sensitivity, specificity and agreement (kappa) for each method was calculated, using expert opinion as the reference standard. The study included 439 SITA Standard visual fields of 56 eyes of 42 patients, with a mean of 7.8 ± 0.8 visual fields per eye. When suspected cases of progression were considered stable, sensitivity and specificity of Quasar, GPA II and the experts were 86.6% and 70.7%, 26.6% and 95.1%, and 86.6% and 92.6% respectively. When suspected cases of progression were considered as progressing, sensitivity and specificity of Quasar, GPA II and the experts were 79.1% and 81.2%, 45.8% and 90.6%, and 85.4% and 90.6% respectively. The agreement between Quasar and GPA II when suspected cases were considered stable or progressing was 0.03 and 0.28 respectively. The degree of agreement between Quasar and the experts when suspected cases were considered stable or progressing was 0.472 and 0.507. The degree of agreement between GPA II and the experts when suspected cases were considered stable or progressing was 0.262 and 0.342. The combination of MD and PSD regression analysis in the Quasar program showed better agreement with the experts and higher sensitivity than GPA II.
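The three agreement measures reported in this abstract can be computed from a 2x2 table as in the sketch below. The formulas are the standard definitions of sensitivity, specificity and Cohen's kappa; the counts in the usage lines are hypothetical and do not come from the study.

```python
def diagnostic_agreement(tp, fp, fn, tn):
    """Sensitivity, specificity and Cohen's kappa for a 2x2 table.

    tp, fn: reference-positive cases the method did / did not flag.
    fp, tn: reference-negative cases the method did / did not flag.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of both raters.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return sensitivity, specificity, kappa

if __name__ == "__main__":
    # Hypothetical counts, with expert opinion as the reference standard.
    sens, spec, kappa = diagnostic_agreement(tp=40, fp=5, fn=10, tn=45)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} kappa={kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why two methods can both look accurate against the experts and still agree poorly with each other.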
Rodríguez-Barranco, Miguel; Tobías, Aurelio; Redondo, Daniel; Molina-Portillo, Elena; Sánchez, María José
2017-03-17
Meta-analysis is very useful to summarize the effect of a treatment or a risk factor for a given disease. Studies often report results based on log-transformed variables in order to meet the principal assumptions of a linear regression model. If this is the case for some, but not all, studies, the effects need to be homogenized. We derived a set of formulae to transform absolute changes into relative ones, and vice versa, to allow all results to be included in a meta-analysis. We applied our procedure to all possible combinations of log-transformed independent or dependent variables. We also evaluated it in a simulation based on two variables either normally or asymmetrically distributed. In all the scenarios, and based on different change criteria, the effect size estimated by the derived set of formulae was equivalent to the real effect size. To avoid biased estimates of the effect, this procedure should be used with caution in the case of independent variables with asymmetric distributions that differ significantly from the normal distribution. We illustrate the procedure by applying it to a meta-analysis on the potential effects on neurodevelopment in children exposed to arsenic and manganese. The proposed procedure has been shown to be valid and capable of expressing the effect size of a linear regression model based on different change criteria in the variables. Homogenizing the results from different studies beforehand allows them to be combined in a meta-analysis, independently of whether the transformations had been performed on the dependent and/or independent variables.
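How a coefficient is read under each log-transformation scenario can be sketched as below. These are the standard textbook interpretations for natural-log transformations, given here as an assumption-laden illustration; they are not the set of formulae derived by the authors, and the function names are hypothetical.

```python
import math

def pct_change_log_outcome(beta, delta_x=1.0):
    """ln(Y) = a + beta*X: percent change in Y per delta_x increase in X."""
    return (math.exp(beta * delta_x) - 1.0) * 100.0

def abs_change_log_predictor(beta, pct_increase_x):
    """Y = a + beta*ln(X): absolute change in Y for a given % increase in X."""
    return beta * math.log(1.0 + pct_increase_x / 100.0)

def pct_change_log_log(beta, pct_increase_x):
    """ln(Y) = a + beta*ln(X): percent change in Y for a given % increase in X."""
    return ((1.0 + pct_increase_x / 100.0) ** beta - 1.0) * 100.0

if __name__ == "__main__":
    # Illustrative coefficient only.
    print(pct_change_log_outcome(0.05))
    print(abs_change_log_predictor(-2.0, 50.0))
    print(pct_change_log_log(0.3, 10.0))
```

Expressing every study's effect on one common scale (for instance, change in the outcome per 50% increase in exposure) is what makes the estimates poolable.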
Parallel Approach for Time Series Analysis with General Regression Neural Networks
J.C. Cuevas-Tello
2012-04-01
The accuracy of time delay estimation given pairs of irregularly sampled time series is of great relevance in astrophysics. However, the computational time is also important because large data sets must be studied. Besides introducing a new approach for time delay estimation, this paper presents a parallel approach to obtain a fast algorithm for time delay estimation. The neural network architecture that we use is the General Regression Neural Network (GRNN). For the parallel approach, we use the Message Passing Interface (MPI) on a Beowulf-type cluster and on a Cray supercomputer, and we also use the Compute Unified Device Architecture (CUDA™) language on Graphics Processing Units (GPUs). We demonstrate that, with our approach, fast algorithms can be obtained for time delay estimation on large data sets with the same accuracy as state-of-the-art methods.
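The core GRNN prediction step can be sketched in a few lines. This is a serial, one-dimensional illustration of the usual Gaussian-kernel weighted average, with a hypothetical smoothing parameter and data; it is not the paper's time delay estimator or its parallel implementation.

```python
import math

def grnn_predict(x_query, x_train, y_train, sigma):
    """General Regression Neural Network prediction at one query point.

    Each training sample contributes its target value weighted by a
    Gaussian kernel of its distance to the query; sigma is the
    smoothing (spread) parameter.
    """
    weights = [math.exp(-((x_query - xi) ** 2) / (2.0 * sigma ** 2))
               for xi in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all training points")
    return sum(w * yi for w, yi in zip(weights, y_train)) / total

if __name__ == "__main__":
    # Hypothetical irregularly sampled series (time, flux).
    times = [0.0, 0.7, 1.9, 2.4, 4.1]
    flux = [1.0, 1.4, 2.2, 2.0, 0.9]
    print(grnn_predict(1.0, times, flux, sigma=0.5))
```

Because every query point is evaluated independently of the others, the loop over query points divides naturally across MPI ranks or GPU threads.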
Regression analysis based on conditional likelihood approach under semi-competing risks data.
Hsieh, Jin-Jian; Huang, Yu-Ting
2012-07-01
Medical studies often involve semi-competing risks data, which consist of two types of events, namely a terminal event and a non-terminal event. Because the non-terminal event may be dependently censored by the terminal event, it is not possible to make inference on the non-terminal event without extra assumptions. Therefore, this study assumes that the dependence structure between the non-terminal event and the terminal event follows a copula model, and lets the marginal regression models of the non-terminal event and the terminal event both follow time-varying effect models. This study uses a conditional likelihood approach to estimate the time-varying coefficient of the non-terminal event, and proves the large sample properties of the proposed estimator. Simulation studies show that the proposed estimator performs well. This study also uses the proposed method to analyze data from the AIDS Clinical Trial Group (ACTG 320).
Askelöf, P; Korsfeldt, M; Mannervik, B
1976-10-01
Knowledge of the error structure of a given set of experimental data is a necessary prerequisite for incisive analysis and for discrimination between alternative mathematical models of the data set. A reaction system consisting of glutathione S-transferase A (glutathione S-aryltransferase), glutathione, and 3,4-dichloro-1-nitrobenzene was investigated under steady-state conditions. It was found that the experimental error increased with initial velocity, v, and that the variance (estimated by replicates) could be described by a polynomial in v, $\mathrm{Var}(v) = K_0 + K_1 v + K_2 v^2$, or by a power function, $\mathrm{Var}(v) = K_0 + K_1 v^{K_2}$. These equations were good approximations irrespective of whether different v values were generated by changing substrate or enzyme concentrations. The selection of these models was based mainly on experiments involving varying enzyme concentration, which, unlike v, is not considered a stochastic variable. Different models of the variance, expressed as functions of enzyme concentration, were examined by regression analysis, and the models could then be transformed to functions in which velocity is substituted for enzyme concentration owing to the proportionality between these variables. Thus, neither the absolute nor the relative error was independent of velocity, a result previously obtained for glutathione reductase in this laboratory [BioSystems 7, 101-119 (1975)]. If the experimental errors or velocities were standardized by division with their corresponding mean velocity value they showed a normal (Gaussian) distribution provided that the coefficient of variation was approximately constant for the data considered. Furthermore, it was established that the errors in the independent variables (enzyme and substrate concentrations) were small in comparison with the error in the velocity determinations. For weighting in regression analysis the inverted value of the local variance in each experimental point should be used. It was found that the
Measuring treatment and scale bias effects by linear regression in the analysis of OHI-S scores.
Moore, B J
1977-05-01
A linear regression model is presented for estimating unbiased treatment effects from OHI-S scores. An example is given to illustrate an analysis and to compare results of an unbiased regression estimator with those based on a biased simple difference estimator.
The Impact of Outliers on Net-Benefit Regression Model in Cost-Effectiveness Analysis.
Wen, Yu-Wen; Tsai, Yi-Wen; Wu, David Bin-Chia; Chen, Pei-Fen
2013-01-01
Ordinary least squares (OLS) regression has been widely used to analyze patient-level data in cost-effectiveness analysis (CEA). However, the estimates, inference and decision making in the economic evaluation based on OLS estimation may be biased by the presence of outliers. Robust estimation, by contrast, can remain unaffected and provide results that are resistant to outliers. The objective of this study is to explore the impact of outliers on net-benefit regression (NBR) in CEA using OLS and to propose a potential solution by using robust estimations, i.e. Huber M-estimation, Hampel M-estimation, Tukey's bisquare M-estimation, MM-estimation and least trimmed squares estimation. Simulations under different outlier-generating scenarios and an empirical example were used to obtain the regression estimates of NBR by OLS and five robust estimations. Empirical size and empirical power of both OLS and robust estimations were then compared in the context of hypothesis testing. Simulations showed that the five robust approaches compared with OLS estimation led to lower empirical sizes and achieved higher empirical powers in testing cost-effectiveness. Using a real example of antiplatelet therapy, the estimated incremental net-benefit by OLS estimation was lower than those by robust approaches because of outliers in cost data. Robust estimations demonstrated higher probability of cost-effectiveness compared to OLS estimation. The presence of outliers can bias the results of NBR and its interpretations. Robust estimation in NBR is recommended as an appropriate method to avoid such biased decision making.
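To make the contrast between OLS and Huber M-estimation concrete, here is a minimal sketch that fits a straight line by iteratively reweighted least squares. It is not the study's net-benefit model: the data are hypothetical, and the tuning constant k = 1.345 and the MAD-based scale estimate are conventional choices assumed here rather than settings reported in the abstract.

```python
import statistics

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def huber_line_fit(x, y, k=1.345, iterations=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from the OLS fit
    for _ in range(iterations):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        med = statistics.median(r)
        # Robust scale: median absolute deviation, rescaled for normal errors.
        scale = max(statistics.median(abs(ri - med) for ri in r) / 0.6745, 1e-12)
        # Huber weights: 1 inside the threshold, shrinking with |residual| outside.
        w = [1.0 if abs(ri) <= k * scale else k * scale / abs(ri) for ri in r]
        a, b = weighted_line_fit(x, y, w)
    return a, b

# Hypothetical data: true line y = 1 + 2x with small noise and one gross outlier.
x = list(range(10))
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.05, -0.15, 0.1, 0.0]
y = [1.0 + 2.0 * xi + ni for xi, ni in zip(x, noise)]
y[9] = 100.0  # outlier (the uncontaminated value would be about 19)

ols = weighted_line_fit(x, y, [1.0] * len(x))
huber = huber_line_fit(x, y)
print("OLS slope:", ols[1], "Huber slope:", huber[1])
```

The single contaminated point pulls the OLS slope far from 2, while the Huber fit bounds that point's influence.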
Stature estimation from footprint measurements in Indian Tamils by regression analysis
T. Nataraja Moorthy
2014-03-01
Stature estimation is of particular interest to forensic scientists for its importance in human identification. A footprint is a valuable piece of physical evidence encountered at crime scenes, and its identification can help narrow down the suspects and establish the identity of the criminals. Analysis of footprints helps in estimating an individual's stature because of the strong correlation between footprint and height. Foot impressions are still found at crime scenes, since offenders often remove their footwear either to avoid noise or to gain a better grip in climbing walls, etc., while entering or exiting. In Asian countries like India, there are people who still have the habit of walking barefoot. The present study aims to estimate the stature in a sample of 2,040 bilateral footprints collected from 1,020 healthy adult male Indian Tamils, an ethnic group in Tamilnadu State, India, who consented to participate in the study and who range in age from 19 to 42 years old; this study will help to generate population-specific equations using a simple linear regression statistical method. All footprint lengths exhibit a statistically significant positive correlation with stature (p-value < 0.01), and the correlation coefficient (r) ranges from 0.546 to 0.578. The accuracy of the regression equations was verified by comparing the estimated stature with the actual stature. Regression equations derived in this research can be used to estimate stature from complete or even partial footprints among Indian Tamils.
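The kind of population-specific equation described here has the form stature = a + b × footprint length. The sketch below shows how such an equation and its correlation coefficient could be computed; the measurements are invented for illustration and are not the study's data, and `fit_simple_regression` is a hypothetical helper name.

```python
import math

def fit_simple_regression(x, y):
    """Return (intercept, slope, pearson_r) for the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)

# Hypothetical footprint lengths (cm) and statures (cm); NOT the study's data.
footprint_cm = [22.5, 23.1, 24.0, 24.8, 25.5, 26.2]
stature_cm = [160.2, 163.0, 166.5, 168.1, 172.4, 173.9]

a, b, r = fit_simple_regression(footprint_cm, stature_cm)
predicted = a + b * 24.5  # estimated stature for a 24.5 cm footprint
print(f"stature = {a:.2f} + {b:.2f} * length, r = {r:.3f}, prediction = {predicted:.1f} cm")
```

Comparing such predictions with measured statures is the verification step the abstract mentions.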
Fu, Yuan-Yuan; Wang, Ji-Hua; Yang, Gui-Jun; Song, Xiao-Yu; Xu, Xin-Gang; Feng, Hai-Kuan
2013-05-01
The major limitation of using existing vegetation indices for crop biomass estimation is that they approach a saturation level asymptotically for a certain range of biomass. In order to resolve this problem, band depth analysis and partial least squares regression (PLSR) were combined to establish a winter wheat biomass estimation model in the present study. The models based on the combination of band depth analysis and PLSR were then compared with the models based on common vegetation indices in terms of estimation accuracy. Band depth analysis was conducted in the visible spectral domain (550-750 nm). Band depth, band depth ratio (BDR), normalized band depth index, and band depth normalized to area were utilized to represent band depth information. Among the calibrated estimation models, the models based on the combination of band depth analysis and PLSR reached higher accuracy than those based on the vegetation indices. Among them, the combination of BDR and PLSR achieved the highest accuracy (R² = 0.792, RMSE = 0.164 kg·m⁻²). The results indicated that the combination of band depth analysis and PLSR could overcome the saturation problem and improve the biomass estimation accuracy when winter wheat biomass is large.
Varga Csaba
2012-10-01
Background: Identifying risk factors for Salmonella Enteritidis (SE) infections in Ontario will assist public health authorities to design effective control and prevention programs to reduce the burden of SE infections. Our research objective was to identify risk factors for acquiring SE infections with various phage types (PT) in Ontario, Canada. We hypothesized that certain PTs (e.g., PT8 and PT13a) have specific risk factors for infection. Methods: Our study included endemic SE cases with various PTs whose isolates were submitted to the Public Health Laboratory-Toronto from January 20th to August 12th, 2011. Cases were interviewed using a standardized questionnaire that included questions pertaining to demographics, travel history, clinical symptoms, contact with animals, and food exposures. A multinomial logistic regression method using the Generalized Linear Latent and Mixed Model procedure and a case-case study design were used to identify risk factors for acquiring SE infections with various PTs in Ontario, Canada. In the multinomial logistic regression model, the outcome variable had three categories representing human infections caused by SE PT8, PT13a, and all other SE PTs (i.e., non-PT8/non-PT13a) as a referent category to which the other two categories were compared. Results: In the multivariable model, SE PT8 was positively associated with contact with dogs (OR=2.17, 95% CI 1.01-4.68) and negatively associated with pepper consumption (OR=0.35, 95% CI 0.13-0.94), after adjusting for age categories and gender, and using exposure periods and health regions as random effects to account for clustering. Conclusions: Our study findings offer interesting hypotheses about the role of phage type-specific risk factors. Multinomial logistic regression analysis and the case-case study approach are novel methodologies to evaluate associations among SE infections with different PTs and various risk factors.
Shuai Wang
2014-10-01
Accurate prediction of the remaining useful life (RUL) of lithium-ion batteries is important for battery management systems. Traditional empirical data-driven approaches for RUL prediction usually require multidimensional physical characteristics including the current, voltage, usage duration, battery temperature, and ambient temperature. From a capacity fading analysis of lithium-ion batteries, it is found that the energy efficiency and battery working temperature are closely related to the capacity degradation, which account for all performance metrics of lithium-ion batteries with regard to the RUL and the relationships between some performance metrics. Thus, we devise a non-iterative prediction model based on flexible support vector regression (F-SVR) and an iterative multi-step prediction model based on support vector regression (SVR), using the energy efficiency and battery working temperature as input physical characteristics. The experimental results show that the proposed prognostic models have high prediction accuracy by using fewer dimensions for the input data than the traditional empirical models.
Improved Regression Analysis of Temperature-Dependent Strain-Gage Balance Calibration Data
Ulbrich, N.
2015-01-01
An improved approach is discussed that may be used to directly include first and second order temperature effects in the load prediction algorithm of a wind tunnel strain-gage balance. The improved approach was designed for the Iterative Method that fits strain-gage outputs as a function of calibration loads and uses a load iteration scheme during the wind tunnel test to predict loads from measured gage outputs. The improved approach assumes that the strain-gage balance is at a constant uniform temperature when it is calibrated and used. First, the method introduces a new independent variable for the regression analysis of the balance calibration data. The new variable is designed as the difference between the uniform temperature of the balance and a global reference temperature. This reference temperature should be the primary calibration temperature of the balance so that, if needed, a tare load iteration can be performed. Then, two temperature-dependent terms are included in the regression models of the gage outputs. They are the temperature difference itself and the square of the temperature difference. Simulated temperature-dependent data obtained from Triumph Aerospace's 2013 calibration of NASA's ARC-30K five component semi-span balance is used to illustrate the application of the improved approach.
Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway
Fengfeng Wang
2014-01-01
Background. MicroRNA (miRNA) is a short and endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor for tumorigenesis of colorectal cancer (CRC), and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopted to identify the significantly associated miRNAs targeting a set of candidate genes frequently involved in colorectal cancer MSI and CIN pathways. Multiple linear regression analysis was used to construct the model and find the significant mRNA-miRNA associations. We identified three significantly associated mRNA-miRNA pairs: BCL2 was positively associated with miR-16 and SMAD4 was positively associated with miR-567 in the CRC tissue, while MSH6 was positively associated with miR-142-5p in the normal tissue. As for the whole model, BCL2 and SMAD4 models were not significant, and MSH6 model was significant. The significant associations were different in the normal and the CRC tissues. Conclusion. Our results have laid down a solid foundation in exploration of novel CRC mechanisms, and identification of miRNA roles as oncomirs or tumor suppressor mirs in CRC.
Gayou, Olivier; Das, Shiva K; Zhou, Su-Min; Marks, Lawrence B; Parda, David S; Miften, Moyed
2008-12-01
A given outcome of radiotherapy treatment can be modeled by analyzing its correlation with a combination of dosimetric, physiological, biological, and clinical factors, through a logistic regression fit of a large patient population. The quality of the fit is measured by the combination of the predictive power of this particular set of factors and the statistical significance of the individual factors in the model. We developed a genetic algorithm (GA), in which a small sample of all the possible combinations of variables are fitted to the patient data. New models are derived from the best models, through crossover and mutation operations, and are in turn fitted. The process is repeated until the sample converges to the combination of factors that best predicts the outcome. The GA was tested on a data set that investigated the incidence of lung injury in NSCLC patients treated with 3DCRT. The GA identified a model with two variables as the best predictor of radiation pneumonitis: the V30 (p=0.048) and the ongoing use of tobacco at the time of referral (p=0.074). This two-variable model was confirmed as the best model by analyzing all possible combinations of factors. In conclusion, genetic algorithms provide a reliable and fast way to select significant factors in logistic regression analysis of large clinical studies.
Rubio, Francisco J.
2016-02-09
We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information.
Abdul Ghafoor Memon
2014-03-01
In this study, thermodynamic and statistical analyses were performed on a gas turbine system, to assess the impact of some important operating parameters like CIT (Compressor Inlet Temperature), PR (Pressure Ratio) and TIT (Turbine Inlet Temperature) on its performance characteristics such as net power output, energy efficiency, exergy efficiency and fuel consumption. Each performance characteristic was enunciated as a function of operating parameters, followed by a parametric study and optimization. The results showed that the performance characteristics increase with an increase in the TIT and a decrease in the CIT, except fuel consumption which behaves oppositely. The net power output and efficiencies increase with the PR up to certain initial values and then start to decrease, whereas the fuel consumption always decreases with an increase in the PR. The results of exergy analysis showed the combustion chamber as a major contributor to the exergy destruction, followed by stack gas. Subsequently, multiple regression models were developed to correlate each of the response variables (performance characteristic) with the predictor variables (operating parameters). The regression model equations showed a significant statistical relationship between the predictor and response variables.
Exergy Analysis of a Subcritical Reheat Steam Power Plant with Regression Modeling and Optimization
MUHIB ALI RAJPER
2016-07-01
In this paper, exergy analysis of a 210 MW SPP (Steam Power Plant) is performed. Firstly, the plant is modeled and validated, followed by a parametric study to show the effects of various operating parameters on the performance parameters. The net power output, energy efficiency, and exergy efficiency are taken as the performance parameters, while the condenser pressure, main steam pressure, bled steam pressures, main steam temperature, and reheat steam temperature are nominated as the operating parameters. Moreover, multiple polynomial regression models are developed to correlate each performance parameter with the operating parameters. The performance is then optimized by using the direct-search method. According to the results, the net power output, energy efficiency, and exergy efficiency are calculated as 186.5 MW, 31.37% and 30.41%, respectively, under normal operating conditions as a base case. The condenser is a major contributor towards the energy loss, followed by the boiler, whereas the highest irreversibilities occur in the boiler and turbine. According to the parametric study, variation in the operating parameters greatly influences the performance parameters. The regression models appear to be good estimators of the performance parameters. The optimum net power output, energy efficiency and exergy efficiency are obtained as 227.6 MW, 37.4% and 36.4%, respectively, which have been calculated along with optimal values of selected operating parameters.
Classification of Effective Soil Depth by Using Multinomial Logistic Regression Analysis
Chang, C. H.; Chan, H. C.; Chen, B. A.
2016-12-01
Classification of effective soil depth is a task of determining the slopeland utilizable limitation in Taiwan. The "Slopeland Conservation and Utilization Act" categorizes the slopeland into agriculture and husbandry land, land suitable for forestry and land for enhanced conservation according to factors including average slope, effective soil depth, soil erosion and parental rock. However, site investigation of the effective soil depth requires cost-effective field work. This research aimed to classify the effective soil depth by using multinomial logistic regression with the environmental factors. The Wen-Shui Watershed, located in central Taiwan, was selected as the study area. The multinomial logistic regression analysis was performed with the assistance of a Geographic Information System (GIS). The effective soil depth was categorized into four levels: deeper, deep, shallow and shallower. The environmental factors of slope, aspect, digital elevation model (DEM), curvature and normalized difference vegetation index (NDVI) were selected for classifying the soil depth. An error matrix was then used to assess the model accuracy. The results showed an overall accuracy of 75%. Finally, a map of effective soil depth was produced to help planners and decision makers in determining the slopeland utilizable limitation in the study area.
Yu, Rongqin; Geddes, John R; Fazel, Seena
2012-10-01
The risk of antisocial outcomes in individuals with personality disorder (PD) remains uncertain. The authors synthesize the current evidence on the risks of antisocial behavior, violence, and repeat offending in PD, and they explore sources of heterogeneity in risk estimates through a systematic review and meta-regression analysis of observational studies comparing antisocial outcomes in personality disordered individuals with control groups. Fourteen studies examined risk of antisocial and violent behavior in 10,007 individuals with PD, compared with over 12 million general population controls. There was a substantially increased risk of violent outcomes in studies with all PDs (random-effects pooled odds ratio [OR] = 3.0, 95% CI = 2.6 to 3.5). Meta-regression revealed that antisocial PD and gender were associated with higher risks (p = .01 and .07, respectively). The odds of all antisocial outcomes were also elevated. Twenty-five studies reported the risk of repeat offending in PD compared with other offenders. The risk of a repeat offense was also increased (fixed-effects pooled OR = 2.4, 95% CI = 2.2 to 2.7) in offenders with PD. The authors conclude that although PD is associated with antisocial outcomes and repeat offending, the risk appears to differ by PD category, gender, and whether individuals are offenders or not.
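A fixed-effects pooled odds ratio of the kind reported here is commonly obtained by inverse-variance weighting on the log-odds scale. The sketch below assumes that approach, recovers each study's standard error from its 95% confidence interval, and uses made-up study values; it is not a reconstruction of the authors' analysis, and `pooled_odds_ratio` is a hypothetical name.

```python
import math

def pooled_odds_ratio(studies, z=1.96):
    """Inverse-variance fixed-effect pooling.

    studies: list of (odds_ratio, ci_low, ci_high) with 95% confidence limits.
    Returns (pooled_or, pooled_ci_low, pooled_ci_high).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for odds_ratio, ci_low, ci_high in studies:
        log_or = math.log(odds_ratio)
        # Standard error recovered from the CI width on the log scale.
        se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
        weight = 1.0 / se ** 2
        weighted_sum += weight * log_or
        weight_total += weight
    pooled_log = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se))

# Hypothetical study-level estimates; NOT taken from the review.
example_studies = [(2.0, 1.5, 2.7), (3.1, 2.0, 4.8), (2.5, 1.9, 3.3)]
pooled, low, high = pooled_odds_ratio(example_studies)
print(f"pooled OR = {pooled:.2f} (95% CI {low:.2f} to {high:.2f})")
```

A random-effects model, as used for the violence outcome in the abstract, would additionally add a between-study variance term to each study's weight.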
A quantile regression approach to the analysis of the quality of life determinants in the elderly
Serena Broccoli
2013-05-01
Objective. The aim of this study is to explain the effect of important covariates on the health-related quality of life (HRQoL) in elderly subjects. Methods. Data were collected within a longitudinal study that involves 5256 subjects, aged ≥ 65. The Visual Analogue Scale included in the EQ-5D Questionnaire, the EQ-VAS, was used to obtain a synthetic measure of quality of life. To model the EQ-VAS score, a quantile regression analysis was employed. This methodological approach was preferred to an OLS regression because of the typical distribution of the EQ-VAS score. The main covariates are: amount of weekly physical activity, reported problems in Activity of Daily Living (ADL), presence of cardiovascular diseases, diabetes, hypercholesterolemia, hypertension, joint pain, as well as socio-demographic information. Main Results. (1) Even a low level of physical activity significantly influences quality of life in a positive way; (2) ADL problems, at least one cardiovascular disease and joint pain strongly decrease the quality of life.
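Quantile regression replaces the squared-error criterion of OLS with the asymmetric "pinball" (check) loss, which is why it copes better with skewed scores. The sketch below shows only the intercept-only case, where the loss-minimizing constant is the sample quantile; the scores are hypothetical, and `pinball_loss` and `best_constant_quantile` are illustrative names rather than anything from the study.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (check) loss for quantile level tau in (0, 1)."""
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        # Under-predictions cost tau per unit, over-predictions cost (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)

def best_constant_quantile(y, tau):
    """Observed value that minimizes the pinball loss as a constant predictor."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), tau))

# Hypothetical left-skewed scores on a 0-100 scale; NOT the study's data.
scores = [20, 55, 60, 70, 75, 80, 80, 85, 90, 95]
for tau in (0.25, 0.5, 0.75):
    print(tau, best_constant_quantile(scores, tau))
```

A full quantile regression minimizes the same loss over the coefficients of a linear predictor instead of a single constant, one fit per quantile level.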
Luoma, P V
2011-07-01
Atherosclerotic vascular disease, diabetes mellitus (DM) and dementia are major global health problems. Both endogenous and exogenous factors activate genes functioning in biological processes. This review article focuses on gene-activation mechanisms that regress atherosclerosis, eliminate DM type 2 (DM2), and prevent cognitive decline and dementia. Gene-activating compounds upregulating functions of liver endoplasmic reticulum (ER) and affecting lipid and protein metabolism, increase ER size through membrane synthesis, and produce an antiatherogenic plasma lipoprotein profile. Numerous gene-activators regress atherosclerosis and reduce the occurrence of atherosclerotic disease. The gene-activators increase glucose disposal rate and insulin sensitivity and, by restoring normal glucose and insulin levels, remove metabolic syndrome and DM2. Patients with DM2 show an improvement of plasma lipoprotein profile and glucose tolerance together with increase in liver phospholipid (PL) and cytochrome (CYP) P450. The gene-activating compounds induce hepatic protein and PL synthesis, and upregulate enzymes including CYPs and glucokinase, nuclear receptors, apolipoproteins and ABC (ATP-binding cassette) transporters. They induce reparation of ER structures and eliminate consequences of ER stress. Healthy living habits activate mechanisms that maintain high levels of HDL and apolipoprotein AI, promote health, and prevent cognitive decline and dementia. Agonists of liver X receptor (LXR) reduce amyloid in brain plaques and improve cognitive performance in mouse models of Alzheimer's disease. The gene activation increases the capacity to withstand cellular stress and to repair cellular damage and increases life span. Life free of major health problems and in good cognitive health promotes well-being and living a long and active life.
Factors predicting the failure of Bernese periacetabular osteotomy: a meta-regression analysis.
Sambandam, Senthil Nathan; Hull, Jason; Jiranek, William A
2009-12-01
There is no clear evidence regarding the outcome of Bernese periacetabular osteotomy (PAO) in different patient populations. We performed systematic meta-regression analysis of 23 eligible studies. There were 1,113 patients, of whom 61 had total hip arthroplasty (THA) (endpoint) as a result of failed Bernese PAO. Univariate analysis revealed significant correlation between THA and presence of grade 2/grade 3 arthritis, Merle d'Aubigné score (MDS), Harris hip score and Tonnis angle, change in lateral centre edge (LCE) angle, late proximal femoral osteotomies (PFO), and heterotopic ossification (HO) resection. Multivariate analysis showed that the odds of having THA increase with grade 2/grade 3 osteoarthritis (3.36 times), joint penetration (3.12 times), low preoperative MDS (1.59 times), late PFO (1.59 times), presence of preoperative subluxation (1.22 times), previous hip operations (1.14 times), and concomitant PFO (1.09 times). In the absence of randomised controlled studies, the findings of this analysis can help the surgeon to make treatment decisions.
A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis
Taneja, Abhishek
2011-01-01
The growing volume of data creates a need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz., factor analysis and multiple linear regression, for different sample sizes on three unique sets of data. The performance of the two data mining techniques is compared on parameters such as mean square error (MSE), R-square, adjusted R-square, condition number, root mean square error (RMSE), number of variables included in the prediction model, modified coefficient of efficiency, F-value, and test of normality. These parameters have been computed using various data mining tools like SPSS, XLstat, Stata, and MS-Excel. It is seen that for all the given datasets, factor analysis outperforms multiple linear re...
Boruah, Deb K; Dhingani, Dhaval D; Achar, Sashidhar; Prakash, Arjun; Augustine, Antony; Sanyal, Shantiranjan; Gogoi, Manoj; Mahanta, Kangkana
2016-01-01
Objective: The aim of this study was to evaluate the magnetic resonance imaging (MRI) findings of caudal regression syndrome (CRS) and concomitant anomalies in pediatric patients. Materials and Methods: A hospital-based cross-sectional retrospective study was conducted. The study group comprised 21 pediatric patients presenting to the Departments of Radiodiagnosis and Pediatric Surgery in a tertiary care hospital from May 2011 to April 2016. All patients were initially evaluated clinically followed by MRI. Results: In our study, 21 pediatric patients were diagnosed with sacral agenesis/dysgenesis related to CRS. According to the Pang's classification, 2 (9.5%) patients were Type I, 5 (23.8%) patients were Type III, 7 (33.3%) patients were Type IV, and 7 (33.3%) patients were of Type V CRS. Clinically, 17 (81%) patients presented with urinary incontinence, 6 (28.6%) with fecal incontinence, 9 patients (42.9%) had poor gluteal musculatures and shallow intergluteal cleft, 7 (33.3%) patients had associated subcutaneous mass over spine, and 6 (28.6%) patients presented with distal leg muscle atrophy. MRI showed wedge-shaped conus termination in 5 (23.8%) patients and bulbous conus termination in 3 (14.3%) patients above the L1 vertebral level falling into Group 1 CRS while 7 (33.3%) patients had tethered cord and 6 (28.6%) patients had stretched conus falling into Group 2 CRS. Conclusion: MRI is the ideal modality for detailed evaluation of the status of the vertebra, spinal cord, intra- and extra-dural lesions and helps in early diagnosis, detailed preoperative MRI evaluation and assessing concomitant anomalies and guiding further management with early institution of treatment to maximize recovery. PMID:27833778
Wu, X. B.
2006-06-01
Four body-size and fourteen head-size measurements were taken from each Chinese alligator (Alligator sinensis) according to the measurements adapted from Verdade. Regression equations between body-size and head-size variables were presented to predict body size from head dimension. The coefficients of determination of captive animals concerning body- and head-size variables can be considered extremely high, which means most of the head-size variables studied can be useful for predicting body length. The result of multivariate allometric analysis indicated that the head elongates as in most other species of crocodilians. The allometric coefficients of snout length (SL) and lower ramus (LM) were greater than those of other variables of the head, which was considered to be possibly correlated with fights and prey. On the contrary, allometric coefficients for the variables of the orbit (OW, OL) and postorbital cranial roof (LCR) were lower than those of other variables.
Deterministic Assessment of Continuous Flight Auger Construction Durations Using Regression Analysis
Hossam E. Hosny
2015-07-01
One of the primary functions of construction equipment management is to calculate the production rate of equipment, which is a major input to time estimates, cost estimates and overall project planning. Accordingly, it is crucial for stakeholders to be able to compute equipment production rates. This may be achieved using an accurate, reliable and easy tool. The objective of this research is to provide a simple model that can be used by specialists to predict the duration of a proposed Continuous Flight Auger job. The model was obtained using a prioritizing technique based on expert judgment and then multiple regression analysis based on a representative sample. The model was then validated on a selected sample of projects. The average error of the model was calculated to be about 3%-6%.
Biological stability in drinking water: a regression analysis of influencing factors
LU Wei; ZHANG Xiao-jian
2005-01-01
Some parameters, such as assimilable organic carbon (AOC), chloramine residual, water temperature, and water residence time, were measured in drinking water from distribution systems in a northern city of China. The measurement results illustrate that when chloramine residual is more than 0.3 mg/L or AOC content is below 50 μg/L, the biological stability of drinking water can be controlled. Both chloramine residual and AOC have a good relationship with Heterotrophic Plate Counts (HPC) (log value); the correlation coefficients were -0.64 and 0.33, respectively. By regression analysis of the survey data, a statistical equation is presented, and it is concluded that disinfectant residual exerts the strongest influence on bacterial growth and AOC is a suitable index to assess the biological stability of drinking water.
Logistic Regression Analysis on Factors Affecting Adoption of Rice-Fish Farming in North Iran
Seyyed Ali NOORHOSSEINI-NIYAKI; Mohammad Sadegh ALLAHYARI
2012-01-01
We evaluated the factors influencing the adoption of rice-fish farming in the Tavalesh region near the Caspian Sea in northern Iran. We conducted a survey with open-ended questions. Data were collected from 184 respondents (61 adopters and 123 non-adopters) randomly sampled from selected villages and analyzed using logistic regression and multiresponse analysis. Family size, number of contacts with an extension agent, participation in extension-education activities, membership in social institutions and the presence of farm workers were the most important socioeconomic factors for the adoption of the rice-fish farming system. In addition, economic problems were the most common issue reported by adopters. Other issues such as lack of access to appropriate fish food, losses of fish, lack of access to high quality fish fingerlings and dehydration and poor water quality were also important to a number of farmers.
ANALYSIS OF TUITION GROWTH RATES BASED ON CLUSTERING AND REGRESSION MODELS
Long Cheng
2016-07-01
Tuition plays a significant role in determining whether a student can afford higher education, which is one of the major driving forces for country development and social prosperity. It is therefore necessary to fully understand what factors might affect tuition and how they affect it. However, many existing studies on the tuition growth rate either lack sufficient real data and proper quantitative models to support their conclusions, or focus on only a few factors that might affect the tuition growth rate, failing to make a comprehensive analysis. In this paper, we explore a wide variety of factors that might affect the tuition growth rate by use of large amounts of authentic data and different quantitative methods such as clustering and regression models.
A generalized Defries-Fulker regression framework for the analysis of twin data.
Lazzeroni, Laura C; Ray, Amrita
2013-01-01
Twin studies compare the similarity between monozygotic twins to that between dizygotic twins in order to investigate the relative contributions of latent genetic and environmental factors influencing a phenotype. Statistical methods for twin data include likelihood estimation and Defries-Fulker regression. We propose a new generalization of the Defries-Fulker model that fully incorporates the effects of observed covariates on both members of a twin pair and is robust to violations of the Normality assumption. A simulation study demonstrates that the method is competitive with likelihood analysis. The Defries-Fulker strategy yields new insight into the parameter space of the twin model and provides a novel, prediction-based interpretation of twin study results that unifies continuous and binary traits. Due to the simplicity of its structure, extensions of the model have the potential to encompass generalized linear models, censored and truncated data; and gene by environment interactions.
牛东晓; 刘达; 邢棉
2008-01-01
A combined model based on principal components analysis (PCA) and generalized regression neural network (GRNN) was adopted to forecast electricity price in day-ahead electricity market. PCA was applied to mine the main influence on day-ahead price, avoiding the strong correlation between the input factors that might influence electricity price, such as the load of the forecasting hour, other history loads and prices, weather and temperature; then GRNN was employed to forecast electricity price according to the main information extracted by PCA. To prove the efficiency of the combined model, a case from PJM (Pennsylvania-New Jersey-Maryland) day-ahead electricity market was evaluated. Compared to back-propagation (BP) neural network and standard GRNN, the combined method reduces the mean absolute percentage error about 3%.
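The comparison above is stated in terms of mean absolute percentage error (MAPE). As a minimal sketch of how that metric is usually computed, the following uses only the Python standard library; the function name `mape` and the price values are hypothetical illustrations, not data from the PJM case.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent.

    Assumes every actual value is non-zero; a zero actual price would
    make the percentage error undefined.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical hourly day-ahead prices (illustrative only).
actual_prices = [40.0, 50.0, 80.0, 100.0]
forecast_prices = [44.0, 45.0, 84.0, 95.0]
error = mape(actual_prices, forecast_prices)
print(f"MAPE = {error:.2f}%")
```

Comparing two forecasting models on the same hours then amounts to calling `mape` once per model and taking the difference.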
Sensitivity Analysis to Select the Most Influential Risk Factors in a Logistic Regression Model
Jassim N. Hussain
2008-01-01
The traditional variable selection methods for survival data depend on iteration procedures, and control of this process assumes tuning parameters that are problematic and time consuming, especially if the models are complex and have a large number of risk factors. In this paper, we propose a new method based on the global sensitivity analysis (GSA) to select the most influential risk factors. This contributes to simplification of the logistic regression model by excluding the irrelevant risk factors, thus eliminating the need to fit and evaluate a large number of models. Data from medical trials are suggested as a way to test the efficiency and capability of this method and as a way to simplify the model. This leads to construction of an appropriate model. The proposed method ranks the risk factors according to their importance.
Melanin and blood concentration in human skin studied by multiple regression analysis: experiments
Shimada, M.; Yamada, Y.; Itoh, M.; Yatagai, T.
2001-09-01
Knowledge of the mechanism of human skin colour and measurement of melanin and blood concentration in human skin are needed in the medical and cosmetic fields. The absorbance spectrum from reflectance at the visible wavelength of human skin increases under several conditions such as a sunburn or scalding. The change of the absorbance spectrum from reflectance including the scattering effect does not correspond to the molar absorption spectrum of melanin and blood. The modified Beer-Lambert law is applied to the change in the absorbance spectrum from reflectance of human skin as the change in melanin and blood is assumed to be small. The concentration of melanin and blood was estimated from the absorbance spectrum reflectance of human skin using multiple regression analysis. Estimated concentrations were compared with the measured one in a phantom experiment and this method was applied to in vivo skin.
Shen, Chung-Wei; Chen, Yi-Hau
2015-10-01
Missing observations and covariate measurement error commonly arise in longitudinal data. However, existing methods for model selection in marginal regression analysis of longitudinal data fail to address the potential bias resulting from these issues. To tackle this problem, we propose a new model selection criterion, the Generalized Longitudinal Information Criterion, which is based on an approximately unbiased estimator for the expected quadratic error of a considered marginal model accounting for both data missingness and covariate measurement error. The simulation results reveal that the proposed method performs quite well in the presence of missing data and covariate measurement error. On the contrary, the naive procedures without taking care of such complexity in data may perform quite poorly. The proposed method is applied to data from the Taiwan Longitudinal Study on Aging to assess the relationship of depression with health and social status in the elderly, accommodating measurement error in the covariate as well as missing observations.
Mears, Lisa; Nørregaard, Rasmus; Sin, Gürkan;
2016-01-01
This work proposes a methodology utilizing functional unfold principal component regression (FUPCR), for application to industrial batch process data as a process modeling and optimization tool. The methodology is applied to an industrial fermentation dataset, containing 30 batches of a production process operating at Novozymes A/S. Following the FUPCR methodology, the final product concentration could be predicted with an average prediction error of 7.4%. Multiple iterations of preprocessing were applied by implementing the methodology to identify the best data handling methods for the model. It is shown that application of functional data analysis and the choice of variance scaling method have the greatest impact on the prediction accuracy. Considering the vast amount of batch process data continuously generated in industry, this methodology can potentially contribute as a tool to identify...
A Note on Penalized Regression Spline Estimation in the Secondary Analysis of Case-Control Data
Gazioglu, Suzan
2013-05-25
Primary analysis of case-control studies focuses on the relationship between disease (D) and a set of covariates of interest (Y, X). A secondary application of the case-control study, often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated due to the case-control sampling, and to avoid the biased sampling that arises from the design, it is typical to use the control data only. In this paper, we develop penalized regression spline methodology that uses all the data, and improves precision of estimation compared to using only the controls. A simulation study and an empirical example are used to illustrate the methodology.
Evaluating Non-Linear Regression Models in Analysis of Persian Walnut Fruit Growth
I. Karamatlou
2016-02-01
Introduction: Persian walnut (Juglans regia L.) is a large, wind-pollinated, monoecious, dichogamous, long-lived, perennial tree cultivated for its high quality wood and nuts throughout the temperate regions of the world. Growth model methodology has been widely used in the modeling of plant growth. Mathematical models are important tools to study plant growth and agricultural systems. These models can be applied for decision-making and designing management procedures in horticulture. Through growth analysis, planning for planting systems, fertilization, pruning operations, harvest time as well as obtaining economical yield can be more accessible. Non-linear models are more difficult to specify and estimate than linear models. This research was aimed to study non-linear regression models based on data obtained from fruit weight, length and width. Selecting the best models which explain the inherent fruit growth pattern of Persian walnut was a further goal of this study. Materials and Methods: The experimental material comprised 14 Persian walnut genotypes propagated by seed collected from a walnut orchard in Golestan province, Minoudasht region, Iran, at latitude 37°04'N; longitude 55°32'E; altitude 1060 m, in a silt loam soil type. These genotypes were selected as a representative sampling of the many walnut genotypes available throughout Northeastern Iran. The age range of walnut trees was 30 to 50 years. The annual mean temperature at the location is 16.3°C, with annual mean rainfall of 690 mm. The data used here are the averages of walnut fresh fruit measurements, recorded in gram/millimeter/day in 2011. According to the data distribution pattern, several equations have been proposed to describe sigmoidal growth patterns. Here, we used double-sigmoid and logistic–monomolecular models to evaluate fruit growth based on fruit weight and 4 different regression models including Richards, Gompertz, Logistic and Exponential growth for evaluation
Chang-zhi CHENG
2011-06-01
Objective: To explore the risk factors of complication of acute renal failure (ARF) in war injuries of limbs. Methods: The clinical data of 352 patients with limb injuries admitted to 303 Hospital of PLA from 1968 to 2002 were retrospectively analyzed. The patients were divided into an ARF group (n=9) and a non-ARF group (n=343) according to the occurrence of ARF, and a case-control study was carried out. Ten factors which might lead to death were analyzed by logistic regression to screen the risk factors for ARF, including causes of trauma, shock after injury, time of admission to hospital after injury, injured sites, combined trauma, number of surgical procedures, presence of foreign matters, features of fractures, amputation, and tourniquet time. Results: Fifteen of the 352 patients died (4.3%); among them 7 patients (46.7%) died of ARF, 3 (20.0%) of pulmonary embolism, 3 (20.0%) of gas gangrene, and 2 (13.3%) of multiple organ failure. Univariate analysis revealed that shock, time before admission to hospital, amputation and tourniquet time were the risk factors for ARF in the wounded with limb injuries, while the logistic regression analysis showed only amputation was the risk factor for ARF (P < 0.05). Conclusion: ARF is the primary cause of death in the wounded with limb injury. Prompt and accurate treatment and optimal time for amputation may be beneficial to decreasing the incidence and mortality of ARF in the wounded with severe limb injury and ischemic necrosis.
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficients in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficients in the other two areas, the case of Selangor based on the logistic coefficients of Cameron showed the highest (90%) prediction accuracy, whereas the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross
Josef Smolle
2001-01-01
Objective: To evaluate the feasibility of the CART (Classification and Regression Tree) procedure for the recognition of microscopic structures in tissue counter analysis. Methods: Digital microscopic images of H&E stained slides of normal human skin and of primary malignant melanoma were overlaid with regularly distributed square measuring masks (elements), and grey value, texture and colour features within each mask were recorded. In the learning set, elements were interactively labeled as representing either connective tissue of the reticular dermis, other tissue components or background. Subsequently, CART models were based on these data sets. Results: Implementation of the CART classification rules into the image analysis program showed that in an independent test set 94.1% of elements classified as connective tissue of the reticular dermis were correctly labeled. Automated measurements of the total amount of tissue and of the amount of connective tissue within a slide showed high reproducibility (r=0.97 and r=0.94, respectively; p < 0.001). Conclusions: CART procedure in tissue counter analysis yields simple and reproducible classification rules for tissue elements.
Koizumi, Itsuro; Yamamoto, Shoichiro; Maekawa, Koji
2006-10-01
Isolation by distance is usually tested by the correlation of genetic and geographic distances separating all pairwise combinations of populations. However, this method can be significantly biased by only a few highly diverged populations and loses the information on individual populations. To detect outlier populations and investigate the relative strengths of gene flow and genetic drift for each population, we propose a decomposed pairwise regression analysis. This analysis was applied to the well-described one-dimensional stepping-stone system of stream-dwelling Dolly Varden charr (Salvelinus malma). When genetic and geographic distances were plotted for all pairs of 17 tributary populations, the correlation was significant but weak (r² = 0.184). Seven outlier populations were determined based on the systematic bias of the regression residuals, followed by Akaike's information criteria. The best model, with 10 populations included, showed a strong pattern of isolation by distance (r² = 0.758), suggesting equilibrium between gene flow and genetic drift in these populations. Each outlier population was also analysed by plotting pairwise genetic and geographic distances against the 10 nonoutlier populations, and categorized into one of three patterns: strong genetic drift, genetic drift with limited gene flow, and a high level of gene flow. These classifications were generally consistent with a priori predictions for each population (physical barrier, population size, anthropogenic impacts). Combining the genetic analysis with field observations, Dolly Varden in this river appeared to form a mainland-island or source-sink metapopulation structure. The generality of the method will merit many types of spatial genetic analyses.
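The per-population step described above (regressing one population's pairwise genetic distances on its geographic distances to a reference set) can be sketched with ordinary least squares. This is only an illustration under stated assumptions: the helper names `ols_fit` and `focal_regression`, the five populations and all distances are hypothetical, and the residual-based outlier detection and AIC model comparison from the abstract are not reproduced.

```python
def ols_fit(x, y):
    """Simple least-squares line; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r_squared


def focal_regression(focal, references, geo, gen):
    """Regress genetic on geographic distance for one focal population."""
    others = [name for name in references if name != focal]
    x = [geo[frozenset((focal, name))] for name in others]
    y = [gen[frozenset((focal, name))] for name in others]
    return ols_fit(x, y)


# Hypothetical positions (km) along a one-dimensional stream system.
positions = {"A": 0.0, "B": 10.0, "C": 20.0, "D": 30.0, "E": 40.0}
names = sorted(positions)
geo, gen = {}, {}
for i, p in enumerate(names):
    for q in names[i + 1:]:
        key = frozenset((p, q))
        distance = abs(positions[p] - positions[q])
        geo[key] = distance
        # Made-up rule: population E is strongly diverged from every other
        # population regardless of distance; the rest follow a clean trend.
        gen[key] = 0.40 + 0.001 * distance if "E" in key else 0.01 * distance

references = ["A", "B", "C", "D"]
slope_a, intercept_a, r2_a = focal_regression("A", references, geo, gen)
slope_e, intercept_e, r2_e = focal_regression("E", references, geo, gen)
print("A:", slope_a, intercept_a, r2_a)
print("E:", slope_e, intercept_e, r2_e)
```

In this toy data the fitted line for E has a high intercept and a shallow slope compared with A, which is the kind of contrast a per-population regression makes visible.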
Ermarth, Anna; Bryce, Matthew; Woodward, Stephanie; Stoddard, Gregory; Book, Linda; Jensen, M Kyle
2017-03-01
Celiac disease is detected using serology and endoscopy analyses. We used multiple statistical analyses of a geographically isolated population in the United States to determine whether a single serum screening can identify individuals with celiac disease. We performed a retrospective study of 3555 pediatric patients (18 years old or younger) in the intermountain West region of the United States from January 1, 2008, through September 30, 2013. All patients had undergone serologic analyses for celiac disease, including measurement of antibodies to tissue transglutaminase (TTG) and/or deamidated gliadin peptide (DGP), and had duodenal biopsies collected within the following year. Modified Marsh criteria were used to identify patients with celiac disease. We developed models to identify patients with celiac disease using logistic regression and classification and regression tree (CART) analysis. Single use of a test for serum level of IgA against TTG identified patients with celiac disease with 90% sensitivity, 90% specificity, a 61% positive predictive value (PPV), a 90% negative predictive value, and an area under the receiver operating characteristic curve value of 0.91; these values were higher than those obtained from assays for IgA against DGP or IgG against TTG plus DGP. Not including the test for DGP antibody caused only 0.18% of celiac disease cases to be missed. Level of TTG IgA 7-fold the upper limit of normal (ULN) identified patients with celiac disease with a 96% PPV and 100% specificity. Using CART analysis, we found a level of TTG IgA 3.2-fold the ULN and higher to most accurately identify patients with celiac disease (PPV, 89%). Multivariable CART analysis showed that a level of TTG IgA 2.5-fold the ULN and higher was sufficient to identify celiac disease in patients with type 1 diabetes (PPV, 88%). Serum level of IgA against TTG in patients with versus those without trisomy 21 did not affect diagnosis predictability in CART analysis. In a population
Genetic analysis of somatic cell score in Norwegian cattle using random regression test-day models.
Odegård, J; Jensen, J; Klemetsdal, G; Madsen, P; Heringstad, B
2003-12-01
The dataset used in this analysis contained a total of 341,736 test-day observations of somatic cell scores from 77,110 primiparous daughters of 1965 Norwegian Cattle sires. Initial analyses, using simple random regression models without genetic effects, indicated that use of homogeneous residual variance was appropriate. Further analyses were carried out by use of a repeatability model and 12 random regression sire models. Legendre polynomials of varying order were used to model both permanent environmental and sire effects, as were the Wilmink function, the Lidauer-Mäntysaari function, and the Ali-Schaeffer function. For all these models, heritability estimates were lowest at the beginning (0.05 to 0.07) and higher at the end (0.09 to 0.12) of lactation. Genetic correlations between somatic cell scores early and late in lactation were moderate to high (0.38 to 0.71), whereas genetic correlations for adjacent DIM were near unity. Models were compared based on likelihood ratio tests, Bayesian information criterion, Akaike information criterion, residual variance, and predictive ability. Based on prediction of randomly excluded observations, models with 4 coefficients for permanent environmental effect were preferred over simpler models. More highly parameterized models did not substantially increase predictive ability. Evaluation of the different model selection criteria indicated that a reduced order of fit for sire effects was desirable. Models with zeroth- or first-order of fit for sire effects and higher order of fit for permanent environmental effects probably underestimated sire variance. The chosen model had Legendre polynomials with 3 coefficients for sire, and 4 coefficients for permanent environmental effects. For this model, trajectories of sire variance and heritability were similar assuming either homogeneous or heterogeneous residual variance structure.
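Random regression models of this kind use Legendre polynomials of days in milk (DIM) as covariates. A minimal sketch of building such covariates follows. It assumes DIM is rescaled linearly to [-1, 1], uses the unnormalized polynomials (normalized versions are also common in practice), and takes a hypothetical lactation window of 5 to 305 days; none of these details is given in the abstract.

```python
def standardize_dim(dim, dim_min=5, dim_max=305):
    """Map days in milk linearly onto [-1, 1] (window is an assumption)."""
    return 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0


def legendre_basis(x, n_coef):
    """Return [P_0(x), ..., P_{n_coef - 1}(x)] using Bonnet's recurrence."""
    values = [1.0, x][:n_coef]
    for k in range(1, n_coef - 1):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return values


# Three covariates (e.g. for a sire effect) at mid-lactation,
# four covariates (e.g. for a permanent environmental effect) at the end.
mid_lactation = legendre_basis(standardize_dim(155), 3)
end_lactation = legendre_basis(standardize_dim(305), 4)
print(mid_lactation)
print(end_lactation)
```

Each test-day record would contribute one such row of covariates, with the random regression coefficients estimated per animal or sire.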
Generalized multilevel function-on-scalar regression and principal component analysis.
Goldsmith, Jeff; Zipunnikov, Vadim; Schrack, Jennifer
2015-06-01
This manuscript considers regression models for generalized, multilevel functional responses: functions are generalized in that they follow an exponential family distribution and multilevel in that they are clustered within groups or subjects. This data structure is increasingly common across scientific domains and is exemplified by our motivating example, in which binary curves indicating physical activity or inactivity are observed for nearly 600 subjects over 5 days. We use a generalized linear model to incorporate scalar covariates into the mean structure, and decompose subject-specific and subject-day-specific deviations using multilevel functional principal components analysis. Thus, functional fixed effects are estimated while accounting for within-function and within-subject correlations, and major directions of variability within and between subjects are identified. Fixed effect coefficient functions and principal component basis functions are estimated using penalized splines; model parameters are estimated in a Bayesian framework using Stan, a programming language that implements a Hamiltonian Monte Carlo sampler. Simulations designed to mimic the application have good estimation and inferential properties with reasonable computation times for moderate datasets, in both cross-sectional and multilevel scenarios; code is publicly available. In the application we identify effects of age and BMI on the time-specific change in probability of being active over a 24-hour period; in addition, the principal components analysis identifies the patterns of activity that distinguish subjects and days within subjects.
The value of a statistical life: a meta-analysis with a mixed effects regression model.
Bellavance, François; Dionne, Georges; Lebeau, Martin
2009-03-01
The value of a statistical life (VSL) is a very controversial topic, but one which is essential to the optimization of governmental decisions. We see a great variability in the values obtained from different studies. The source of this variability needs to be understood, in order to offer public decision-makers better guidance in choosing a value and to set clearer guidelines for future research on the topic. This article presents a meta-analysis based on 39 observations obtained from 37 studies (from nine different countries) which all use a hedonic wage method to calculate the VSL. Our meta-analysis is innovative in that it is the first to use the mixed effects regression model [Raudenbush, S.W., 1994. Random effects models. In: Cooper, H., Hedges, L.V. (Eds.), The Handbook of Research Synthesis. Russel Sage Foundation, New York] to analyze studies on the value of a statistical life. We conclude that the variability found in the values studied stems in large part from differences in methodologies.
Binary Logistic Regression Analysis of Foramen Magnum Dimensions for Sex Determination
Kamath, Venkatesh Gokuldas
2015-01-01
Purpose. The structural integrity of foramen magnum is usually preserved in fire accidents and explosions due to its resistant nature and secluded anatomical position and this study attempts to determine its sexing potential. Methods. The sagittal and transverse diameters and area of foramen magnum of seventy-two skulls (41 male and 31 female) from south Indian population were measured. The analysis was done using Student's t-test, linear correlation, histogram, Q-Q plot, and Binary Logistic Regression (BLR) to obtain a model for sex determination. The predicted probabilities of BLR were analysed using Receiver Operating Characteristic (ROC) curve. Result. BLR analysis and ROC curve revealed that the predictability of the dimensions in sexing the crania was 69.6% for sagittal diameter, 66.4% for transverse diameter, and 70.3% for area of foramen. Conclusion. The sexual dimorphism of foramen magnum dimensions is established. However, due to considerable overlapping of male and female values, it is unwise to singularly rely on the foramen measurements. However, considering the high sex predictability percentage of its dimensions in the present study and the studies preceding it, the foramen measurements can be used to supplement other sexing evidence available so as to precisely ascertain the sex of the skeleton. PMID:26346917
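The abstract evaluates predicted probabilities from the logistic model with an ROC curve. One common summary of such a curve is the area under it (AUC), which equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the function name `roc_auc`, the scores and the labels are hypothetical, and the abstract does not state that its reported percentages are AUC values.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of "male" and true labels (1 = male).
predicted = [0.90, 0.80, 0.35, 0.60, 0.40, 0.20]
true_sex = [1, 1, 1, 0, 0, 0]
auc = roc_auc(predicted, true_sex)
print(f"AUC = {auc:.3f}")
```

The quadratic loop is fine for a sample of 72 skulls; a rank-based formula would be preferable for large datasets.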
A calibration method of Argo floats based on multiple regression analysis
None listed
2006-01-01
Argo floats are free-moving floats that report vertical profiles of salinity, temperature and pressure at regular time intervals. These floats give good measurements of temperature and pressure, but salinity measurements may show significant sensor drift with time. It is found that sensor drift with time is not purely linear as presupposed by Wong (2003). A new method is developed to calibrate conductivity data measured by Argo floats. In this method, Wong's objective analysis method was adopted to estimate the background climatological salinity field on potential temperature surfaces from nearby historical data in WOD01. Furthermore, temperature and time factors are taken into account, and stepwise regression was used for a time-varying or temperature-varying slope in potential conductivity space to correct the drift in these profiling float salinity data. The result shows that salinity errors using this method are smaller than those of Wong's method, and that quantitative and qualitative analysis of the conductivity sensor can be carried out with our method.
Rajab, Jasim Mohammed; Jafri, Mohd. Zubir Mat; Lim, Hwee San; Abdullah, Khiruddin
2012-10-01
This study encompasses air surface temperature (AST) modeling in the lower atmosphere. Data on four atmospheric pollutant gases (CO, O3, CH4, and H2O), retrieved from the National Aeronautics and Space Administration Atmospheric Infrared Sounder (AIRS) from 2003 to 2008, were employed to develop a model to predict AST values in the Malaysian peninsula using the multiple regression method. For the entire period, the pollutants were highly correlated (R=0.821) with predicted AST. Comparisons among five stations in 2009 showed close agreement between the predicted AST and the observed AST from AIRS, especially in the southwest monsoon (SWM) season, within 1.3 K, and for in situ data, within 1 to 2 K. The validation results of AST with AST from AIRS showed high correlation coefficients (R=0.845 to 0.918), indicating the model's efficiency and accuracy. Statistical analysis in terms of β showed that H2O (0.565 to 1.746) tended to contribute significantly to high AST values during the northeast monsoon season. Generally, these results clearly indicate the advantage of using the satellite AIRS data and a correlation analysis study to investigate the impact of atmospheric greenhouse gases on AST over the Malaysian peninsula. A model was developed that is capable of retrieving the Malaysian peninsula AST in all weather conditions, with total uncertainties ranging between 1 and 2 K.
Machine learning of swimming data via wisdom of crowd and regression analysis.
Xie, Jiang; Xu, Junfu; Nie, Celine; Nie, Qing
2017-04-01
Every performance, in an officially sanctioned meet, by a registered USA swimmer is recorded into an online database with times dating back to 1980. For the first time, statistical analysis and machine learning methods are systematically applied to 4,022,631 swim records. In this study, we investigate performance features for all strokes as a function of age and gender. The variances in performance of males and females for different ages and strokes were studied, and the correlations of performances for different ages were estimated using the Pearson correlation. Regression analyses show the performance trends for both males and females at different ages and suggest critical ages for peak training. Moreover, we assess twelve popular machine learning methods to predict or classify swimmer performance. Each method exhibited different strengths or weaknesses in different cases, indicating no one method could predict well for all strokes. To address this problem, we propose a new method by combining multiple inference methods to derive Wisdom of Crowd Classifier (WoCC). Our simulation experiments demonstrate that the WoCC is a consistent method with better overall prediction accuracy. Our study reveals several new age-dependent trends in swimming and provides an accurate method for classifying and predicting swimming times.
VanEngelsdorp, Dennis; Speybroeck, Niko; Evans, Jay D; Nguyen, Bach Kim; Mullin, Chris; Frazier, Maryann; Frazier, Jim; Cox-Foster, Diana; Chen, Yanping; Tarpy, David R; Haubruge, Eric; Pettis, Jeffrey S; Saegerman, Claude
2010-10-01
Colony collapse disorder (CCD), a syndrome whose defining trait is the rapid loss of adult worker honey bees, Apis mellifera L., is thought to be responsible for a minority of the large overwintering losses experienced by U.S. beekeepers since the winter 2006-2007. Using the same data set developed to perform a monofactorial analysis (PloS ONE 4: e6481, 2009), we conducted a classification and regression tree (CART) analysis in an attempt to better understand the relative importance and interrelations among different risk variables in explaining CCD. Fifty-five exploratory variables were used to construct two CART models: one model with and one model without a cost of misclassifying a CCD-diagnosed colony as a non-CCD colony. The resulting model tree that permitted for misclassification had a sensitivity and specificity of 85 and 74%, respectively. Although factors measuring colony stress (e.g., adult bee physiological measures, such as fluctuating asymmetry or mass of head) were important discriminating values, six of the 19 variables having the greatest discriminatory value were pesticide levels in different hive matrices. Notably, coumaphos levels in brood (a miticide commonly used by beekeepers) had the highest discriminatory value and were highest in control (healthy) colonies. Our CART analysis provides evidence that CCD is probably the result of several factors acting in concert, making afflicted colonies more susceptible to disease. This analysis highlights several areas that warrant further attention, including the effect of sublethal pesticide exposure on pathogen prevalence and the role of variability in bee tolerance to pesticides on colony survivorship.
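Sensitivity and specificity, as reported for the CART model above, follow directly from the counts in a two-by-two classification table. A minimal sketch is given below; the function name `diagnostic_metrics` and the counts are hypothetical and were chosen only so that the result matches the 85% and 74% figures, since the abstract does not give the underlying table.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard two-by-two classification summaries as fractions."""
    return {
        "sensitivity": tp / (tp + fn),  # diagnosed colonies correctly flagged
        "specificity": tn / (tn + fp),  # control colonies correctly cleared
        "ppv": tp / (tp + fp),          # flagged colonies that are true cases
        "npv": tn / (tn + fn),          # cleared colonies that are true controls
    }


# Hypothetical counts (not taken from the study).
metrics = diagnostic_metrics(tp=85, fp=26, tn=74, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Assigning a higher cost to missing a CCD-diagnosed colony, as in one of the two models described, would typically trade some specificity for higher sensitivity.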
Pudji Ismartini
2010-08-01
One of the major problems facing data modelling in the social area is multicollinearity. Multicollinearity can have a significant impact on the quality and stability of the fitted regression model. The common classical regression technique based on least squares estimation is highly sensitive to multicollinearity. In such problems, Partial Least Squares Regression (PLSR) is a useful and flexible tool for statistical model building; however, PLSR yields only point estimates. This paper constructs interval estimates for PLSR regression parameters by applying the jackknife technique to poverty data. A SAS macro programme is developed to obtain the jackknife interval estimator for PLSR.
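The jackknife idea can be sketched in a few lines. The example below is an illustration under stated assumptions, not the paper's SAS macro: it uses Python, swaps an ordinary least-squares slope in for the PLSR coefficients, uses made-up data, and builds the interval from a normal quantile (a t quantile is also common). The function names are hypothetical.

```python
from statistics import NormalDist, mean


def slope(xs, ys):
    """Ordinary least-squares slope of y on x (stand-in for a PLSR coefficient)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def jackknife_interval(xs, ys, level=0.95):
    """Leave-one-out jackknife: full-sample estimate, standard error, interval."""
    n = len(xs)
    full = slope(xs, ys)
    # Re-estimate the slope n times, each time dropping one observation.
    loo = [slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(n)]
    loo_mean = mean(loo)
    se = ((n - 1) / n * sum((b - loo_mean) ** 2 for b in loo)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation (assumption)
    return full, se, (full - z * se, full + z * se)


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
est, se, (lo, hi) = jackknife_interval(xs, ys)
print(f"slope={est:.3f}, jackknife SE={se:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

The same leave-one-out loop applies to any estimator that returns a coefficient, which is what makes the jackknife attractive for methods such as PLSR that lack closed-form standard errors.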
Safety Analysis versus Type Inference with Partial Types
Schwartzbach, Michael Ignatieff; Palsberg, Jens
1992-01-01
Safety analysis is an algorithm for determining if a term in an untyped lambda calculus with constants is safe, i.e., if it does not cause an error during evaluation. This ambition is also shared by algorithms for type inference. Safety analysis and type inference are based on rather different perspectives, however. Safety analysis is global in that it can only analyze a complete program. In contrast, type inference is local in that it can analyze pieces of a program in isolation. In this paper we prove that safety analysis is sound, relative to both a strict and a lazy operational semantics. We also prove that safety analysis accepts strictly more safe lambda terms than does type inference for simple types. The latter result demonstrates that global program analysis can be more precise than local ones.
Naghshpour, Shahdad
2012-01-01
Regression analysis is the most commonly used statistical method in the world. Although few would characterize this technique as simple, regression is in fact both simple and elegant. The complexity that many attribute to regression analysis is often a reflection of their lack of familiarity with the language of mathematics. But regression analysis can be understood even without a mastery of sophisticated mathematical concepts. This book provides the foundation and will help demystify regression analysis using examples from economics and with real data to show the applications of the method. T
LI Chang-ping; ZHI Xin-yue; MA Jun; CUI Zhuang; ZHU Zi-long; ZHANG Cui; HU Liang-ping
2012-01-01
Background: Various methods can be applied to build predictive models for clinical data with a binary outcome variable. This research aims to explore the process of constructing common predictive models, logistic regression (LR), decision tree (DT) and multilayer perceptron (MLP), and to focus on specific details when applying these methods: what preconditions should be satisfied, how to set the parameters of the model, how to screen variables and build accurate models quickly and efficiently, and how to assess the generalization ability (that is, prediction performance) reliably by the Monte Carlo method in the case of small sample size. Methods: All 274 patients (137 with type 2 diabetes mellitus and diabetic peripheral neuropathy and 137 with type 2 diabetes mellitus without diabetic peripheral neuropathy) from the Metabolic Disease Hospital in Tianjin participated in the study. There were 30 variables such as sex, age, glycosylated hemoglobin, etc. On account of the small sample size, the classification and regression tree (CART) and the chi-squared automatic interaction detector tree (CHAID) were combined by means of 100 times 5-7 fold stratified cross-validation to build the DT. The MLP was constructed using the Schwarz Bayes Criterion to choose the number of hidden layers and hidden layer units, along with the Levenberg-Marquardt (L-M) optimization algorithm, weight decay and a preliminary training method. Subsequently, LR was applied by the best subset method with the Akaike Information Criterion (AIC) to make the best use of information and avoid overfitting. Eventually, a 10 to 100 times 3-10 fold stratified cross-validation method was used to compare the generalization ability of DT, MLP and LR in view of the areas under the receiver operating characteristic (ROC) curves (AUC). Results: The AUC of DT, MLP and LR were 0.8863, 0.8536 and 0.8802, respectively. As the larger the AUC of a specific prediction model is, the higher diagnostic ability presents, MLP performed optimally, and then
Comparative analysis of regression and artificial neural network models for wind speed prediction
Bilgili, Mehmet; Sahin, Besir
2010-11-01
In this study, wind speed was modeled by linear regression (LR), nonlinear regression (NLR) and artificial neural network (ANN) methods. A three-layer feedforward artificial neural network structure was constructed and a backpropagation algorithm was used for the training of ANNs. To get a successful simulation, firstly, the correlation coefficients between all of the meteorological variables (wind speed, ambient temperature, atmospheric pressure, relative humidity and rainfall) were calculated taking two variables in turn for each calculation. All independent variables were added to the simple regression model. Then, the method of stepwise multiple regression was applied for the selection of the “best” regression equation (model). Thus, the best independent variables were selected for the LR and NLR models and also used in the input layer of the ANN. The results obtained by all methods were compared to each other. Finally, the ANN method was found to provide better performance than the LR and NLR methods.
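The first step described above, computing correlation coefficients "taking two variables in turn", amounts to evaluating the Pearson coefficient for every pair of variables. A minimal sketch follows; the variable names mirror those in the abstract, but the values are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up monthly values, for illustration only.
data = {
    "wind_speed": [3.1, 3.4, 2.9, 4.0, 4.4, 3.8],
    "temperature": [8.0, 10.5, 15.2, 20.1, 25.3, 27.0],
    "pressure": [1015.0, 1013.2, 1011.8, 1009.5, 1007.9, 1008.4],
}

# One coefficient per unordered pair of variables.
pairwise = {(a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)}
for (a, b), r in pairwise.items():
    print(f"r({a}, {b}) = {r:+.3f}")
```

Variables that correlate strongly with wind speed are natural first candidates for a stepwise selection, although stepwise procedures judge each candidate by its contribution given the variables already in the model.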
Using the classical linear regression model in analysis of the dependences of conveyor belt life
Miriam Andrejiová
2013-12-01
The paper deals with the classical linear regression model of the dependence of conveyor belt life on some selected parameters: thickness of paint layer, width and length of the belt, conveyor speed and quantity of transported material. The first part of the article is about regression model design, point and interval estimation of parameters, verification of statistical significance of the model, and about the parameters of the proposed regression model. The second part of the article deals with identification of influential and extreme values that can have an impact on estimation of regression model parameters. The third part focuses on assumptions of the classical regression model, i.e. on verification of independence assumptions, normality and homoscedasticity of residuals.
Types and concept analysis for legacy systems
Kuipers, T.; Moonen, L.M.F.
2000-01-01
We combine type inference and concept analysis in order to gain insight into legacy software systems. Type inference for Cobol yields the types for variables and program parameters. These types are used to perform mathematical concept analysis on legacy systems. We have developed ConceptRefinery, a
Effect of acute hypoxia on cognition: A systematic review and meta-regression analysis.
McMorris, Terry; Hale, Beverley J; Barwood, Martin; Costello, Joseph; Corbett, Jo
2017-03-01
A systematic meta-regression analysis of the effects of acute hypoxia on the performance of central executive and non-executive tasks, and the effects of the moderating variables, arterial partial pressure of oxygen (PaO2) and hypobaric versus normobaric hypoxia, was undertaken. Studies were included if they were performed on healthy humans; within-subject design was used; data were reported giving the PaO2 or that allowed the PaO2 to be estimated (e.g. arterial oxygen saturation and/or altitude); and the duration of being in a hypoxic state prior to cognitive testing was ≤6 days. Twenty-two experiments met the criteria for inclusion and demonstrated a moderate, negative mean effect size (g=-0.49, 95% CI -0.64 to -0.34, p<0.001). There were no significant differences between central executive and non-executive, perception/attention and short-term memory, tasks. Low (35-60 mmHg) PaO2 was the key predictor of cognitive performance (R²=0.45, p<0.001) and this was independent of whether the exposure was in hypobaric hypoxic or normobaric hypoxic conditions.
Fernández-Fernández, Mario; Rodríguez-González, Pablo; García Alonso, J Ignacio
2016-10-01
We have developed a novel, rapid and easy calculation procedure for Mass Isotopomer Distribution Analysis based on multiple linear regression which allows the simultaneous calculation of the precursor pool enrichment and the fraction of newly synthesized labelled proteins (fractional synthesis) using linear algebra. To test this approach, we used the peptide RGGGLK as a model tryptic peptide containing three subunits of glycine. We selected glycine labelled in two ¹³C atoms (¹³C₂-glycine) as labelled amino acid to demonstrate that spectral overlap is not a problem in the proposed methodology. The developed methodology was tested first in vitro by changing the precursor pool enrichment from 10 to 40% of ¹³C₂-glycine. Secondly, a simulated in vivo synthesis of proteins was designed by combining the natural abundance RGGGLK peptide and 10 or 20% ¹³C₂-glycine at 1:1, 1:3 and 3:1 ratios. Precursor pool enrichments and fractional synthesis values were calculated with satisfactory precision and accuracy using a simple spreadsheet. This novel approach can provide a relatively rapid and easy means to measure protein turnover based on stable isotope tracers. Copyright © 2016 John Wiley & Sons, Ltd.
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
Boldizsar Nagy
2017-05-01
In the present study the biosorption characteristics of Cd(II) and Zn(II) ions from monocomponent aqueous solutions by Agaricus bisporus macrofungus were investigated. The initial metal ion concentrations, contact time, initial pH and temperature were the parameters that influenced the biosorption. Maximum removal efficiencies up to 76.10% and 70.09% (318 K) for Cd(II) and Zn(II), respectively, and adsorption capacities up to 3.49 and 2.39 mg/g for Cd(II) and Zn(II), respectively, at the highest concentration, were calculated. The experimental data were analyzed using pseudo-first- and pseudo-second-order kinetic models and various isotherm models in linear and nonlinear (CMA-ES optimization algorithm) regression, and thermodynamic parameters were calculated. The results showed that the biosorption process of both studied metal ions followed pseudo-second-order kinetics, while equilibrium is best described by the Sips isotherm. The changes in morphological structure after heavy metal-biomass interactions were evaluated by SEM analysis. Our results confirmed that the macrofungus A. bisporus could be used as a cost-effective, efficient biosorbent for the removal of Cd(II) and Zn(II) from aqueous synthetic solutions.
A systematic review and meta-regression analysis of mivacurium for tracheal intubation.
Vanlinthout, L E H; Mesfin, S H; Hens, N; Vanacker, B F; Robertson, E N; Booij, L H D J
2014-12-01
We systematically reviewed factors associated with intubation conditions in randomised controlled trials of mivacurium, using random-effects meta-regression analysis. We included 29 studies of 1050 healthy participants. Four factors explained 72.9% of the variation in the probability of excellent intubation conditions: mivacurium dose, 24.4%; opioid use, 29.9%; time to intubation and age together, 18.6%. The odds ratio (95% CI) for excellent intubation was 3.14 (1.65-5.73) for doubling the mivacurium dose, 5.99 (2.14-15.18) for adding opioids to the intubation sequence, and 6.55 (6.01-7.74) for increasing the delay between mivacurium injection and airway insertion from 1 to 2 min in subjects aged 25 years and 2.17 (2.01-2.69) for subjects aged 70 years, p < 0.001 for all. We conclude that good conditions for tracheal intubation are more likely by delaying laryngoscopy after injecting a higher dose of mivacurium with an opioid, particularly in older people.
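An odds ratio multiplies odds, not probabilities, so its effect on the chance of excellent intubation conditions depends on the starting point. The sketch below converts a baseline probability to odds, applies an odds ratio from the abstract, and converts back; the 40% baseline is a hypothetical value chosen for illustration, not a figure from the review.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability obtained after multiplying the baseline odds by an odds ratio."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


baseline = 0.40  # hypothetical baseline probability of excellent conditions
for label, odds_ratio in [("doubling the mivacurium dose", 3.14),
                          ("adding an opioid", 5.99)]:
    p = apply_odds_ratio(baseline, odds_ratio)
    print(f"{label}: {baseline:.0%} -> {p:.0%}")
```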
Uchimoto, Takeaki; Iwao, Yasunori; Hattori, Hiroaki; Noguchi, Shuji; Itai, Shigeru
2013-01-01
The interaction of the effects of the triglycerin full behenate (TR-FB) concentration and the mixing time on lubrication and tablet properties was analyzed under a two-factor central composite design, and compared with that of magnesium stearate (Mg-St). Various amounts of lubricant (0.07-3.0%) were added to granules and mixed for 1-30 min. A multiple linear regression analysis was performed to identify the effect of the mixing conditions on each physicochemical property. The mixing conditions did not significantly affect the lubrication properties of TR-FB. For tablet properties, tensile strength decreased and disintegration time increased when the lubricant concentration and the mixing time were increased for Mg-St. The direct interaction of the Mg-St concentration and the mixing time had a significant negative effect on the disintegration time. In contrast, none of the TR-FB mixing conditions affected the tablet properties. In addition, the range of mixing conditions which satisfied the lubrication and tablet property criteria was broader for TR-FB than for Mg-St, suggesting that TR-FB allows tablets with high quality attributes to be produced consistently. Therefore, TR-FB is a potential lubricant alternative to Mg-St.
Shayan, Zahra; Mohammad Gholi Mezerji, Naser; Shayan, Leila; Naseri, Parisa
2015-11-03
Logistic regression (LR) and linear discriminant analysis (LDA) are two popular statistical models for prediction of group membership. Although they are very similar, LDA makes more assumptions about the data. When categorical and continuous variables are used simultaneously, the optimal choice between the two models is questionable. In most studies, classification error (CE) is used to discriminate between subjects in several groups, but this index is not suitable for predicting the accuracy of the outcome. The present study compared LR and LDA models using classification indices. This cross-sectional study selected 243 cancer patients. Sample sets of different sizes (n = 50, 100, 150, 200, 220) were randomly selected and the CE, B, and Q classification indices were calculated by the LR and LDA models. CE revealed a lack of superiority for one model over the other, but the results showed that LR performed better than LDA for the B and Q indices in all situations. No significant effect of sample size on CE was noted for selection of an optimal model. Assessment of the accuracy of prediction of real data indicated that the B and Q indices are appropriate for selection of an optimal model. The results of this study showed that LR performs better in some cases and LDA in others when based on CE. The CE index is not appropriate for classification, although the B and Q indices performed better and offered more efficient criteria for comparison and discrimination between groups.
Dai, Wensheng; Wu, Jui-Yu; Lu, Chi-Jie
2014-01-01
Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating a feature extraction method and a prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) was applied to the sales forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of an IT chain store. Experimental results from real sales data show that the sales forecasting scheme integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting.
Comparison of Bayesian and Classical Analysis of Weibull Regression Model: A Simulation Study
İmran KURT ÖMÜRLÜ
2011-01-01
Objective: The purpose of this study was to compare the performance of the classical Weibull Regression Model (WRM) and Bayesian-WRM under varying conditions using Monte Carlo simulations. Material and Methods: Data were generated and analysed with both the classical WRM and Bayesian-WRM under varying informative priors and sample sizes using our simulation algorithm. In the simulation studies, the sample sizes were n=50, 100 and 250, and informative prior values using a normal prior distribution with […] were selected for b1. For each situation, 1000 simulations were performed. Results: Bayesian-WRM with a proper informative prior performed well, with very little bias. The bias of Bayesian-WRM increased as the priors moved away from reliable values, at all sample sizes. Furthermore, given proper priors, Bayesian-WRM produced predictions with smaller standard errors than the classical WRM in both small and large samples. Conclusion: In this simulation study, Bayesian-WRM performed better than the classical method when the analysis took account of expert opinion and historical knowledge about the parameters. Consequently, Bayesian-WRM should be preferred when reliable informative priors exist; otherwise, the classical WRM should be preferred.
A Vehicle Traveling Time Prediction Method Based on Grey Theory and Linear Regression Analysis
TU Jun; LI Yan-ming; LIU Cheng-liang
2009-01-01
Vehicle traveling time prediction is an important part of research on intelligent transportation systems. To date, various kinds of methods have been proposed for vehicle traveling time prediction, but few consider both time and space. In this paper, a vehicle traveling time prediction method based on grey theory (GT) and linear regression analysis (LRA) is presented. With respect to time, we use the historical data sequence of bus speed on a certain road to predict the future bus speed on that road by GT. With respect to space, we calculate the traffic affecting factors between various roads by LRA. Using these factors we can predict the vehicle's speed on the next road if the vehicle's speed on the current road is known. Finally, we use the time factor and the space factor as the weighting factors of the two results predicted by GT and LRA, respectively, to find the final result, thus calculating the vehicle's traveling time. The method also considers such factors as dwell time, thus making the prediction more accurate.
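The time-based half of such a scheme is commonly implemented with the GM(1,1) grey model. The sketch below is a generic GM(1,1) forecaster plus a weighted combination step; it is not the authors' implementation. The speed values are made up, the function names are hypothetical, and normalising the two weights by their sum is an assumption, since the abstract does not say how the time and space factors are applied.

```python
from math import exp


def gm11_forecast(x0, steps=1):
    """Fit a GM(1,1) grey model to a positive series and forecast `steps` values.

    Note: a perfectly constant series gives a = 0 and is not handled here.
    """
    n = len(x0)
    # Accumulated generating series x1[k] = x0[0] + ... + x0[k].
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values, then least-squares fit of x0[k] = -a * z[k] + b.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    mz, my = sum(z) / len(z), sum(y) / len(y)
    fit_slope = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
                 / sum((zi - mz) ** 2 for zi in z))
    a, b = -fit_slope, my - fit_slope * mz

    def x1_hat(k):
        # Time response of the accumulated series at index k.
        return (x0[0] - b / a) * exp(-a * k) + b / a

    # Difference the accumulated forecasts to get back to the original scale.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


def combine(gt_speed, lra_speed, w_time, w_space):
    """Weighted average of the two predictions (normalisation is an assumption)."""
    return (w_time * gt_speed + w_space * lra_speed) / (w_time + w_space)


history = [31.0, 32.4, 33.1, 34.9, 36.2, 37.5]  # made-up bus speeds (km/h)
gt_speed = gm11_forecast(history)[0]
final_speed = combine(gt_speed, lra_speed=35.0, w_time=0.6, w_space=0.4)
print(f"GT forecast={gt_speed:.2f}, combined={final_speed:.2f}")
```

Travel time over a road segment then follows as segment length divided by the combined speed, with dwell time added separately.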
Variable Selection for Functional Logistic Regression in fMRI Data Analysis
Nedret BILLOR
2015-03-01
This study was motivated by a classification problem in Functional Magnetic Resonance Imaging (fMRI), a noninvasive imaging technique which allows an experimenter to take images of a subject's brain over time. As fMRI studies usually have a small number of subjects and we assume that there is a smooth, underlying curve describing the observations in fMRI data, this results in incredibly high-dimensional datasets that are functional in nature. High dimensionality is one of the biggest problems in statistical analysis of fMRI data. There is also a need for the development of better classification methods. One of the main advantages of the fMRI technique is its noninvasiveness. If statistical classification methods are improved, it could aid the advancement of noninvasive diagnostic techniques for mental illness or even degenerative diseases such as Alzheimer's. In this paper, we develop a variable selection technique, which tackles high dimensionality and correlation problems in fMRI data, based on L1 regularization (group lasso) for the functional logistic regression model, where the response is binary and represents two separate classes and the predictors are functional. We assess our method with a simulation study and an application to a real fMRI dataset.
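Group lasso selects variables by shrinking whole groups of coefficients to zero at once. The core update in many group-lasso solvers is block soft-thresholding, the proximal operator of the penalty on a group's Euclidean norm. The sketch below shows that operator only; treating each functional predictor's basis coefficients as one group is an assumption about how the groups would be formed, and the numbers are illustrative.

```python
from math import sqrt


def group_soft_threshold(beta_group, threshold):
    """Block soft-thresholding: shrink a coefficient group toward zero as a unit."""
    norm = sqrt(sum(b * b for b in beta_group))
    if norm <= threshold:
        # The whole group is removed, i.e. the predictor is deselected.
        return [0.0] * len(beta_group)
    scale = 1.0 - threshold / norm
    return [scale * b for b in beta_group]


# Illustrative coefficient groups, one per (hypothetical) functional predictor.
print(group_soft_threshold([3.0, 4.0], 1.0))   # norm 5: shrunk but kept
print(group_soft_threshold([0.3, 0.4], 1.0))   # norm 0.5: set to zero
```

Because a group is either kept or zeroed in its entirety, a functional predictor enters or leaves the model as a whole rather than one basis coefficient at a time.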
Magura, Stephen; Cleland, Charles M.; Tonigan, J. Scott
2013-01-01
Objective: The objective of the study is to determine whether Alcoholics Anonymous (AA) participation leads to reduced drinking and problems related to drinking within Project MATCH (Matching Alcoholism Treatments to Client Heterogeneity), an existing national alcoholism treatment data set. Method: The method used is structural equation modeling of panel data with cross-lagged partial regression coefficients. The main advantage of this technique for the analysis of AA outcomes is that potential reciprocal causation between AA participation and drinking behavior can be explicitly modeled through the specification of finite causal lags. Results: For the outpatient subsample (n = 952), the results strongly support the hypothesis that AA attendance leads to increases in alcohol abstinence and reduces drinking/problems, whereas a causal effect in the reverse direction is unsupported. For the aftercare subsample (n = 774), the results are not as clear but also suggest that AA attendance leads to better outcomes. Conclusions: Although randomized controlled trials are the surest means of establishing causal relations between interventions and outcomes, such trials are rare in AA research for practical reasons. The current study successfully exploited the multiple data waves in Project MATCH to examine evidence of causality between AA participation and drinking outcomes. The study obtained unique statistical results supporting the effectiveness of AA primarily in the context of primary outpatient treatment for alcoholism. PMID:23490566
Wensheng Dai
2014-01-01
Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating a feature extraction method and a prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) was applied to the sales forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of an IT chain store. Experimental results from real sales data show that the sales forecasting scheme integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting.
Sih, A M; Kopp, D M; Tang, J H; Rosenberg, N E; Chipungu, E; Harfouche, M; Moyo, M; Mwale, M; Wilkinson, J P
2016-04-01
To compare primiparous and multiparous women who develop obstetric fistula (OF) and to assess predictors of fistula location. Cross-sectional study. Fistula Care Centre at Bwaila Hospital, Lilongwe, Malawi. Women with OF who presented between September 2011 and July 2014 with a complete obstetric history were eligible for the study. Women with OF were surveyed for their obstetric history. Women were classified as multiparous if prior vaginal or caesarean delivery was reported. The location of the fistula was determined at operation: OF involving the urethra, bladder neck, and midvagina were classified as low; OF involving the vaginal apex, cervix, uterus, and ureters were classified as high. Demographic information was compared between primiparous and multiparous women using chi-squared and Mann-Whitney U-tests. Multivariate logistic regression models were implemented to assess the relationship between variables of interest and fistula location. During the study period, 533 women presented for repair, of which 452 (84.8%) were included in the analysis. The majority (56.6%) were multiparous when the fistula formed. Multiparous women were more likely to have laboured […] fistula location (37.5 versus 11.2%, P […]) […] fistula. Multiparity was common in our cohort, and these women were more likely to have a high fistula. Additional research is needed to understand the aetiology of high fistula including potential iatrogenic causes. Multiparity and caesarean delivery were associated with a high tract fistula in our Malawian cohort. © 2016 Royal College of Obstetricians and Gynaecologists.
Sih, Allison M.; Kopp, Dawn M.; Tang, Jennifer H.; Rosenberg, Nora E.; Chipungu, Ennet; Harfouche, Melike; Moyo, Margaret; Mwale, Mwawi; Wilkinson, Jeffrey P.
2016-01-01
Objective: To compare primiparous and multiparous women who develop obstetric fistula (OF) and to assess predictors of fistula location. Design: Cross-sectional study. Setting: Fistula Care Center at Bwaila Hospital, Lilongwe, Malawi. Population: Women with OF who presented between September 2011 and July 2014 with a complete obstetric history were eligible for the study. Methods: Women with OF were surveyed for their obstetric history. Women were classified as multiparous if prior vaginal or cesarean delivery was reported. Location of fistula was determined at operation. OF involving the urethra, bladder neck, and midvagina were classified as low; OF involving the vaginal apex, cervix, uterus, and ureters were classified as high. Main Outcome Measures: Demographic information was compared between primiparous and multiparous women using Chi-squared and Mann-Whitney U tests. Multivariate logistic regression models were implemented to assess the relationship between variables of interest and fistula location. Results: During the study period, 533 women presented for repair, of which 452 (84.8%) were included in the analysis. The majority (56.6%) were multiparous when the fistula formed. Multiparous women were more likely to have labored less than a day (62.4% vs 44.5%, p […]) […] fistula location (37.5% vs 11.2%, p […]) […] fistula. Conclusions: Multiparity was common in our cohort, and these women were more likely to have a high fistula. Additional research is needed to understand the etiology of high fistula including potential iatrogenic causes. PMID:26853525
Margherita Velucchi
2014-09-01
Labor productivity is very complex to analyze across time, sectors and countries. In Italy in particular, labor productivity has shown a prolonged slowdown, but sector analyses highlight the presence of specific niches that have good levels of productivity and performance. This paper investigates how firms' characteristics might have affected the dynamics of labor productivity in Italian service and manufacturing firms in recent years (1998-2007), comparing them and focusing on some relevant sectors. We use an original micro-level panel from the Italian National Institute of Statistics (ISTAT) and a longitudinal quantile regression approach that allows us to show that labor productivity is highly heterogeneous across sectors and that the links between labor productivity and firms' characteristics are not constant across quantiles. We show that average estimates obtained via GLS do not capture the complex dynamics and heterogeneity of service and manufacturing firms' labor productivity. Using this approach, we show that innovativeness and human capital, in particular, have a very strong impact on fostering the labor productivity of less productive firms. From the sector analysis of four service sectors (restaurants & hotels, trade distributors, trade shops and legal & accountants) we show that heterogeneity is more intense at the sector level, and we derive some common features that may be useful in terms of policy implications.
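Quantile regression differs from mean-based estimators such as GLS in the loss it minimises: the asymmetric "check" (pinball) loss, whose minimiser is a quantile rather than the mean. The sketch below illustrates this for the simplest case, a model with only a constant; the productivity values are made up, and the longitudinal structure used in the paper is not modelled.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(ys, tau):
    """Observed value that minimises the total pinball loss (a sample quantile)."""
    return min(ys, key=lambda c: sum(pinball_loss(y - c, tau) for y in ys))


# Made-up labor productivity values for nine firms.
productivity = [12.0, 15.5, 18.0, 21.0, 24.5, 30.0, 41.0, 58.0, 95.0]
for tau in (0.25, 0.50, 0.75):
    print(f"tau={tau:.2f}: fitted constant = {best_constant(productivity, tau)}")
```

Replacing the constant with a linear function of firm characteristics and minimising the same loss at several values of tau gives one coefficient vector per quantile, which is how effects can differ between less and more productive firms.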
Juan Merlo
Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of association (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach that distinguishes between "specific" (measures of association) and "general" (measures of variance) contextual effects. Using two empirical examples, we illustrate the methodology, interpret the results and discuss the implications of this kind of analysis in public health. We analyse 43,291 individuals residing in 218 neighbourhoods in the city of Malmö, Sweden in 2006. We study two individual outcomes (psychotropic drug use and choice of private vs. public general practitioner, GP) for which the relative importance of neighbourhood as a source of individual variation differs substantially. In Step 1 of the analysis, we evaluate the OR and the area under the receiver operating characteristic (AUC) curve for individual-level covariates (i.e., age, sex and individual low income). In Step 2, we assess general contextual effects using the AUC. Finally, in Step 3 the OR for a specific neighbourhood characteristic (i.e., neighbourhood income) is interpreted jointly with the proportional change in variance (PCV) and the proportion of ORs in the opposite direction (POOR) statistics. For both outcomes, information on individual characteristics (Step 1) provides a low discriminatory accuracy (AUC = 0.616 for psychotropic drugs; = 0.600 for choosing a private GP). Accounting for neighbourhood of residence (Step 2) only improved the AUC for choosing a private GP (+0.295 units). High neighbourhood income (Step 3) was strongly associated with choosing a private GP (OR = 3.50) but the PCV was only 11% and the POOR 33%. Applying an innovative stepwise multilevel analysis, we observed that, in Malmö, the neighbourhood context per se had a negligible influence on individual use of psychotropic drugs, but
An, Xin; Xu, Shuo; Zhang, Lu-Da; Su, Shi-Guang
2009-01-01
In the present paper, on the basis of LS-SVM algorithm, we built a multiple dependent variables LS-SVM (MLS-SVM) regression model whose weights can be optimized, and gave the corresponding algorithm. Furthermore, we theoretically explained the relationship between MLS-SVM and LS-SVM. Sixty four broomcorn samples were taken as experimental material, and the sample ratio of modeling set to predicting set was 51 : 13. We first selected randomly and uniformly five weight groups in the interval [0, 1], and then in the way of leave-one-out (LOO) rule determined one appropriate weight group and parameters including penalizing parameters and kernel parameters in the model according to the criterion of the minimum of average relative error. Then a multiple dependent variables quantitative analysis model was built with NIR spectrum and simultaneously analyzed three chemical constituents containing protein, lysine and starch. Finally, the average relative errors between actual values and predicted ones by the model of three components for the predicting set were 1.65%, 6.47% and 1.37%, respectively, and the correlation coefficients were 0.9940, 0.8392 and 0.8825, respectively. For comparison, LS-SVM was also utilized, for which the average relative errors were 1.68%, 6.25% and 1.47%, respectively, and the correlation coefficients were 0.9941, 0.8310 and 0.8800, respectively. It is obvious that MLS-SVM algorithm is comparable to LS-SVM algorithm in modeling analysis performance, and both of them can give satisfying results. The result shows that the model with MLS-SVM algorithm is capable of doing multi-components NIR quantitative analysis synchronously. Thus MLS-SVM algorithm offers a new multiple dependent variables quantitative analysis approach for chemometrics. In addition, the weights have certain effect on the prediction performance of the model with MLS-SVM, which is consistent with our intuition and is validated in this study. Therefore, it is necessary to optimize
Hao, Lingxin
2007-01-01
Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao
Gizaw, Mesgana Seyoum; Gan, Thian Yew
2016-07-01
Regional Flood Frequency Analysis (RFFA) is a statistical method widely used to estimate flood quantiles of catchments with limited streamflow data. In addition, when estimating the flood quantiles of ungauged sites, only a limited number of stations with complete datasets may be available from hydrologically similar, surrounding catchments. Besides traditional regression-based RFFA methods, recent applications of machine learning algorithms such as the artificial neural network (ANN) have shown encouraging results in regional flood quantile estimation. Another novel machine learning technique that is becoming widely applied in the hydrologic community is Support Vector Regression (SVR). In this study, an RFFA model based on SVR was developed to estimate regional flood quantiles for two study areas, one with 26 catchments located in southeastern British Columbia (BC) and another with 23 catchments located in southern Ontario (ON), Canada. The SVR-RFFA model for both study sites was developed from 13 sets of physiographic and climatic predictors for the historical period. The Ef (Nash-Sutcliffe coefficient) and R2 of the SVR-RFFA model were about 0.7 when estimating flood quantiles of 10, 25, 50 and 100 year return periods, which indicates satisfactory model performance in both study areas. In addition, the SVR-RFFA model also performed well based on other goodness-of-fit statistics such as BIAS (mean bias) and BIASr (relative BIAS). When the amount of data available for training RFFA models was limited, the SVR-RFFA model was found to perform better than an ANN-based RFFA model, with a significantly lower median CV (coefficient of variation) of the estimated flood quantiles. The SVR-RFFA model was then used to project changes in flood quantiles over the two study areas under the impact of climate change using the RCP4.5 and RCP8.5 climate projections of five Coupled Model Intercomparison Project (CMIP5) GCMs (Global Climate Models) for the 2041
Lançon Christophe
2006-07-01
Background: Data comparing duloxetine with existing antidepressant treatments are limited. A comparison of duloxetine with fluoxetine has been performed, but no comparison with venlafaxine, the other antidepressant in the same therapeutic class with a significant market share, has been undertaken. In the absence of relevant data to assess the place that duloxetine should occupy in the therapeutic arsenal, indirect comparisons are the most rigorous approach available. We conducted a systematic review of the efficacy of duloxetine, fluoxetine and venlafaxine versus placebo in the treatment of Major Depressive Disorder (MDD), and performed indirect comparisons through meta-regressions. Methods: The bibliography of the Agency for Health Care Policy and Research and the CENTRAL, Medline, and Embase databases were searched using advanced search strategies based on a combination of text and index terms. The search focused on randomized placebo-controlled clinical trials involving adult patients treated for acute phase Major Depressive Disorder. All outcomes were derived to take account of varying placebo responses across studies. The primary outcome was treatment efficacy as measured by Hedges' g effect size. Secondary outcomes were response and dropout rates as measured by log odds ratios. Meta-regressions were run to indirectly compare the drugs. Sensitivity analyses assessing the influence of individual studies on the results and the influence of patients' characteristics were run. Results: 22 studies involving fluoxetine, 9 involving duloxetine and 8 involving venlafaxine were selected. Using indirect comparison methodology, estimated effect sizes for efficacy compared with duloxetine were 0.11 [-0.14;0.36] for fluoxetine and 0.22 [0.06;0.38] for venlafaxine. Response log odds ratios were -0.21 [-0.44;0.03] and 0.70 [0.26;1.14]. Dropout log odds ratios were -0.02 [-0.33;0.29] and 0.21 [-0.13;0.55]. Sensitivity analyses showed that results were
Byers, John A
2013-08-01
Dose-response curves of the effects of semiochemicals on neurophysiology and behavior are reported in many articles in insect chemical ecology. Most curves are shown in figures representing points connected by straight lines, in which the x-axis has order of magnitude increases in dosage vs. responses on the y-axis. The lack of regression curves indicates that the nature of the dose-response relationship is not well understood. Thus, a computer model was developed to simulate a flux of various numbers of pheromone molecules (10^3 to 5 × 10^6) passing by 10^4 receptors distributed among 10^6 positions along an insect antenna. Each receptor was depolarized by at least one strike by a molecule, and subsequent strikes had no additional effect. The simulations showed that with an increase in pheromone release rate, the antennal response would increase in a convex fashion and not in a logarithmic relation as suggested previously. Non-linear regression showed that a family of kinetic formation functions fit the simulated data nearly perfectly (R^2 > 0.999). This is reasonable because olfactory receptors have proteins that bind to the pheromone molecule and are expected to exhibit enzyme kinetics. Over 90 dose-response relationships reported in the literature of electroantennographic and behavioral bioassays in the laboratory and field were analyzed by the logarithmic and kinetic formation functions. This analysis showed that in 95% of the cases, the kinetic functions explained the relationships better than the logarithmic (mean of about 20% better). The kinetic curves become sigmoid when graphed on a log scale on the x-axis. Dose-catch relationships in the field are similar to dose-EAR (effective attraction radius, in which a spherical radius indicates the trapping effect of a lure) and the circular EARc in two dimensions used in mass trapping models. The use of kinetic formation functions for dose-response curves of attractants, and kinetic decay curves for
A Bayesian ridge regression analysis of congestion's impact on urban expressway safety.
Shi, Qi; Abdel-Aty, Mohamed; Lee, Jaeyoung
2016-03-01
With the rapid growth of traffic in urban areas, concerns about congestion and traffic safety have been heightened. This study leveraged both Automatic Vehicle Identification (AVI) system and Microwave Vehicle Detection System (MVDS) installed on an expressway in Central Florida to explore how congestion impacts the crash occurrence in urban areas. Multiple congestion measures from the two systems were developed. To ensure more precise estimates of the congestion's effects, the traffic data were aggregated into peak and non-peak hours. Multicollinearity among traffic parameters was examined. The results showed the presence of multicollinearity especially during peak hours. As a response, ridge regression was introduced to cope with this issue. Poisson models with uncorrelated random effects, correlated random effects, and both correlated random effects and random parameters were constructed within the Bayesian framework. It was proven that correlated random effects could significantly enhance model performance. The random parameters model has similar goodness-of-fit compared with the model with only correlated random effects. However, by accounting for the unobserved heterogeneity, more variables were found to be significantly related to crash frequency. The models indicated that congestion increased crash frequency during peak hours while during non-peak hours it was not a major crash contributing factor. Using the random parameter model, the three congestion measures were compared. It was found that all congestion indicators had similar effects while Congestion Index (CI) derived from MVDS data was a better congestion indicator for safety analysis. Also, analyses showed that the segments with higher congestion intensity could not only increase property damage only (PDO) crashes, but also more severe crashes. In addition, the issues regarding the necessity to incorporate specific congestion indicator for congestion's effects on safety and to take care of the
Pradhan, Biswajeet
Recently, in 2006 and 2007, heavy monsoon rainfall triggered floods along Malaysia's east coast as well as in the southern state of Johor. The hardest hit areas were along the east coast of peninsular Malaysia in the states of Kelantan, Terengganu and Pahang. In the south, the city of Johor was particularly hard hit. The floods cost nearly a billion ringgit in property damage and many lives. The extent of damage could have been reduced or minimized if an early warning system had been in place. This paper deals with flood susceptibility analysis using a logistic regression model. We have evaluated the flood susceptibility and the effect of flood-related factors along the Kelantan river basin using the Geographic Information System (GIS) and remote sensing data. Previously flooded areas were extracted from archived Radarsat images using image processing tools. Flood susceptibility mapping was conducted in the study area along the Kelantan River using Radarsat imagery and then enlarged to 1:25,000 scale. Topographical, hydrological and geological data and satellite images were collected, processed, and constructed into a spatial database using GIS and image processing. The factors chosen that influence flood occurrence were: topographic slope, topographic aspect, topographic curvature, DEM and distance from river drainage, all from the topographic database; flow direction and flow accumulation, extracted from the hydrological database; geology and distance from lineament, taken from the geologic database; land use from SPOT satellite images; soil texture from the soil database; and the vegetation index value from SPOT satellite images. Flood susceptible areas were analyzed and mapped using the probability-logistic regression model. Results indicate that mapping of flood-prone areas can be performed at 1:25,000, which is comparable to some conventional flood hazard map scales. The flood-prone areas delineated on these maps correspond to areas that would be inundated by significant flooding
The Analysis of Internet Addiction Scale Using Multivariate Adaptive Regression Splines
M Kayri
2010-12-01
Background: Determining the real effects on Internet dependency with unbiased and robust statistical methods is crucial. MARS is a new non-parametric method used in the literature for parameter estimation in cause-and-effect research. MARS can both obtain legible model curves and make unbiased parametric predictions. Methods: In order to examine the performance of MARS, MARS findings are compared to Classification and Regression Tree (C&RT) findings, which are considered in the literature to be efficient in revealing correlations between variables. The data set for the study is taken from "The Internet Addiction Scale" (IAS), which attempts to reveal addiction levels of individuals. The population of the study consists of 754 secondary school students (301 female, 443 male students, with 10 missing data). The MARS 2.0 trial version was used for the MARS analysis, and the C&RT analysis was done in SPSS. Results: MARS obtained six basis functions for the model. The regression equation of the model was derived from these six functions. For the predicted variable, MARS showed that the predictors average daily Internet-use time, purpose of Internet use, grade of students and occupations of mothers had a significant effect (P < 0.05). In this comparative study, MARS obtained different findings from C&RT in dependency level prediction. Conclusion: This study observed that MARS revealed the extent to which each variable considered significant changes the character of the model.
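The basis functions that MARS builds are typically hinge functions of the form max(0, x - t) or max(0, t - x), combined linearly. The following minimal Python sketch shows how such a model is evaluated for a single predictor. The coefficients, knots and function names are hypothetical illustrations and are not taken from the study.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot) if direction is +1,
    max(0, knot - x) if direction is -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate a one-predictor MARS-style model.

    terms is a list of (coefficient, knot, direction) tuples; all values
    used below are made up for illustration only.
    """
    return intercept + sum(c * hinge(x, k, d) for c, k, d in terms)


# Hypothetical model: intercept 1.0 plus two hinge terms.
example_terms = [(2.0, 3.0, 1), (0.5, 4.0, -1)]
prediction = mars_predict(5.0, 1.0, example_terms)
```

Each hinge term contributes only on one side of its knot, which is what lets the fitted curve change slope at data-driven breakpoints.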
Souadka Amine
2010-04-01
Background: The incidence of liver hydatid cyst (LHC) rupture ranges from 15% to 40% of all cases, and most ruptures involve the bile duct tree. Patients with biliocystic communication (BCC) have specific clinical and therapeutic features. The purpose of this study was to determine which patients with LHC may develop BCC using classification and regression tree (CART) analysis. Methods: A retrospective study of 672 patients with liver hydatid cyst treated at the surgery department "A" at Ibn Sina University Hospital, Rabat, Morocco. Fourteen risk factors for BCC occurrence were entered into CART analysis to build an algorithm that best predicts the occurrence of BCC. Results: The incidence of BCC was 24.5%. High-risk subgroups were patients with jaundice and thick pericyst (risk 73.2%) and patients with thick pericyst and no jaundice, aged 36.5 years or younger, with no past history of LHC (risk 40.5%). The developed CART model has a sensitivity of 39.6%, a specificity of 93.3%, a positive predictive value of 65.6%, a negative predictive value of 82.6% and a classification accuracy of 80.1%. The discriminating ability of the model was good (82%). Conclusion: We developed a simple classification tool to identify LHC patients at high risk of BCC during a routine clinic visit (based only on clinical history and examination followed by ultrasonography). Predictive factors were based on pericyst aspect, jaundice, age, past history of liver hydatidosis and morphological Gharbi cyst aspect. We think that this classification can be useful for directing patients efficiently to appropriate medical structures.
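The performance figures quoted in the abstract above (sensitivity, specificity, predictive values, accuracy) all derive from the four cells of a confusion matrix. The minimal Python sketch below shows those definitions. The counts in the example are hypothetical and are not the study's data; the class and function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # true positives: predicted BCC, BCC present
    fp: int  # false positives: predicted BCC, BCC absent
    tn: int  # true negatives: predicted no BCC, BCC absent
    fn: int  # false negatives: predicted no BCC, BCC present


def diagnostic_metrics(c):
    """Return the standard diagnostic accuracy measures as fractions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = ConfusionCounts(tp=40, fp=10, tn=90, fn=60)
metrics = diagnostic_metrics(example)
```

A model can combine low sensitivity with high specificity, as in the abstract, when it flags few patients but is usually right about those it flags.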
Integrative analysis of multiple diverse omics datasets by sparse group multitask regression
Dongdong eLin
2014-10-01
A variety of high throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have remarkable impact on identifying susceptible biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms has the promise to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: (1) treat the biomarker identification in each single study as a task and then combine them by multitask learning; (2) group variables from all studies for identifying significant genes; (3) enforce sparse constraint on groups of variables to overcome the ‘small sample, but large variables’ problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, in our multitask model, and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with the conventional meta-analysis method. The results show that our sparse group multitask method outperforms the meta-analysis method significantly. In an application to our osteoporosis studies, 7 genes are identified as significant genes by our method and are found to have significant effects in three other independent studies for validation. The most significant gene SOD2 has been identified in our previous osteoporosis study involving the same expression dataset. Several other genes such as TREML2, HTR1E and GLO1 are shown to be novel susceptible genes for osteoporosis, as confirmed
Cecchini Diego M
2009-11-01
Background: The central nervous system is considered a sanctuary site for HIV-1 replication. Variables associated with HIV cerebrospinal fluid (CSF) viral load in the context of opportunistic CNS infections are poorly understood. Our objective was to evaluate the relation between: (1) CSF HIV-1 viral load and CSF cytological and biochemical characteristics (leukocyte count, protein concentration, cryptococcal antigen titer); (2) CSF HIV-1 viral load and HIV-1 plasma viral load; and (3) CSF leukocyte count and the peripheral blood CD4+ T lymphocyte count. Methods: We prospectively collected and analyzed pre-treatment, paired CSF and plasma samples from antiretroviral-naive HIV-positive patients with cryptococcal meningitis who were treated at the Francisco J Muñiz Hospital, Buenos Aires, Argentina (period: 2004 to 2006). We measured HIV CSF and plasma levels by polymerase chain reaction using the Cobas Amplicor HIV-1 Monitor Test version 1.5 (Roche). Data were processed with Statistix 7.0 software (linear regression analysis). Results: Samples from 34 patients were analyzed. CSF leukocyte count showed a statistically significant correlation with CSF HIV-1 viral load (r = 0.4, 95% CI = 0.13-0.63, p = 0.01). No correlation was found with the plasma viral load, CSF protein concentration or cryptococcal antigen titer. A positive correlation was found between peripheral blood CD4+ T lymphocyte count and the CSF leukocyte count (r = 0.44, 95% CI = 0.125-0.674, p = 0.0123). Conclusion: Our study suggests that CSF leukocyte count influences CSF HIV-1 viral load in patients with meningitis caused by Cryptococcus neoformans.
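The abstract above reports correlation coefficients with 95% confidence intervals but does not state how the intervals were computed. One common approach is the Fisher z-transformation, sketched below in Python under that assumption; it may not be the method the authors or their software used, and the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for r via the Fisher z-transformation.

    z = atanh(r) is treated as normal with standard error 1 / sqrt(n - 3);
    the interval is formed on the z scale and mapped back with tanh.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Interval for a correlation of 0.44 observed in 34 paired samples.
lower, upper = fisher_ci(0.44, 34)
```

With small samples the interval is wide, so a statistically significant correlation of this size is still compatible with a weak underlying association.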
Nakasone, Yutaka; Ikeda, Osamu; Yamashita, Yasuyuki; Kudoh, Kouichi; Shigematsu, Yoshinori; Harada, Kazunori
2007-01-01
We applied multivariate analysis to the clinical findings in patients with acute gastrointestinal (GI) hemorrhage and compared the relationship between these findings and angiographic evidence of extravasation. Our study population consisted of 46 patients with acute GI bleeding. They were divided into two groups. In group 1 we retrospectively analyzed 41 angiograms obtained in 29 patients (age range, 25-91 years; average, 71 years). Their clinical findings, including the shock index (SI), diastolic blood pressure, hemoglobin, platelet counts, and age, were quantitatively analyzed. In group 2, consisting of 17 patients (age range, 21-78 years; average, 60 years), we prospectively applied statistical analysis by a logistic regression model to their clinical findings and then assessed 21 angiograms obtained in these patients to determine whether our model was useful for predicting the presence of angiographic evidence of extravasation. On 18 of 41 (43.9%) angiograms in group 1 there was evidence of extravasation; in 3 patients it was demonstrated only by selective angiography. Factors significantly associated with angiographic visualization of extravasation were the SI and patient age. For differentiation between cases with and cases without angiographic evidence of extravasation, the maximum cutoff point was between 0.51 and 0.53. Of the 21 angiograms obtained in group 2, 13 (61.9%) showed evidence of extravasation; in 1 patient it was demonstrated only on selective angiograms. We found that in 90% of the cases, the prospective application of our model correctly predicted the angiographically confirmed presence or absence of extravasation. We conclude that in patients with GI hemorrhage, angiographic visualization of extravasation is associated with the pre-embolization SI. Patients with a high SI value should undergo study to facilitate optimal treatment planning.
Stegeman, J.A.; Vernooij, J.C.M.; Khalifa, O.A.; Broek, van den J.; Mevius, D.J.
2006-01-01
In this study, we investigated the change in the resistance of Enterococcus faecium strains isolated from Dutch broilers against erythromycin and virginiamycin in 1998, 1999 and 2001 by logistic regression analysis and survival analysis. The E. faecium strains were isolated from caecal samples that
Regression models for air pollution and daily mortality: analysis of data from Birmingham, Alabama
Smith, R.L. [University of North Carolina, Chapel Hill, NC (United States). Dept. of Statistics; Davis, J.M. [North Carolina State University, Raleigh, NC (United States). Dept. of Marine, Earth and Atmospheric Sciences; Sacks, J. [National Institute of Statistical Sciences, Research Triangle Park, NC (United States); Speckman, P. [University of Missouri, Columbia, MO (United States). Dept. of Statistics; Styer, P.
2000-11-01
In recent years, a very large literature has built up on the human health effects of air pollution. Many studies have been based on time series analyses in which daily mortality counts, or some other measure such as hospital admissions, have been decomposed through regression analysis into contributions based on long-term trend and seasonality, meteorological effects, and air pollution. There has been a particular focus on particulate air pollution represented by PM10 (particulate matter of aerodynamic diameter 10 μm or less), though in recent years more attention has been given to very small particles of diameter 2.5 μm or less. Most of the existing data studies, however, are based on PM10 because of the wide availability of monitoring data for this variable. The persistence of the resulting effects across many different studies is widely cited as evidence that this is not mere statistical association, but indeed establishes a causal relationship. These studies have been cited by the United States Environmental Protection Agency (USEPA) as justification for a tightening of particulate matter standards in the 1997 revision of the National Ambient Air Quality Standard (NAAQS), which is the basis for air pollution regulation in the United States. The purpose of the present paper is to propose a systematic approach to the regression analyses that are central to this kind of research. We argue that the results may depend on a number of ad hoc features of the analysis, including which meteorological variables to adjust for, and the manner in which different lagged values of particulate matter are combined into a single 'exposure measure'. We also examine the question of whether the effects are linear or nonlinear, with particular attention to the possibility of a 'threshold effect', i.e. that significant effects occur only above some threshold. These points are illustrated with a data set from Birmingham, Alabama, first cited by
Cape John
2010-06-01
Background: Psychological therapies provided in primary care are usually briefer than in secondary care. There has been no recent comprehensive review comparing their effectiveness for common mental health problems. We aimed to compare the effectiveness of different types of brief psychological therapy administered within primary care across and between anxiety, depressive and mixed disorders. Methods: Meta-analysis and meta-regression of randomized controlled trials of brief psychological therapies of adult patients with anxiety, depression or mixed common mental health problems treated in primary care compared to primary care treatment as usual. Results: Thirty-four studies, involving 3962 patients, were included. Most were of brief cognitive behaviour therapy (CBT; n = 13), counselling (n = 8) or problem solving therapy (PST; n = 12). There was differential effectiveness between studies of CBT, with studies of CBT for anxiety disorders having a pooled effect size [d -1.06, 95% confidence interval (CI) -1.31 to -0.80] greater than that of studies of CBT for depression (d -0.33, 95% CI -0.60 to -0.06) or studies of CBT for mixed anxiety and depression (d -0.26, 95% CI -0.44 to -0.08). Counselling for depression and mixed anxiety and depression (d -0.32, 95% CI -0.52 to -0.11) and problem solving therapy (PST) for depression and mixed anxiety and depression (d -0.21, 95% CI -0.37 to -0.05) were also effective. Controlling for diagnosis, meta-regression found no difference between CBT, counselling and PST. Conclusions: Brief CBT, counselling and PST are all effective treatments in primary care, but effect sizes are low compared to longer length treatments. The exception is brief CBT for anxiety, which has comparable effect sizes.
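Pooled effect sizes such as those reported above are weighted averages of study-level effects. The Python sketch below illustrates the simplest variant, inverse-variance fixed-effect pooling, with standard errors recovered from 95% confidence intervals. This is an assumption made for illustration: the abstract does not state which pooling model was used (a random-effects model is common in such reviews), and the inputs below are hypothetical, not the review's study data.

```python
import math


def se_from_ci(lower, upper, z_crit=1.959964):
    """Recover a standard error from a symmetric confidence interval."""
    return (upper - lower) / (2.0 * z_crit)


def pool_fixed_effect(effects, standard_errors):
    """Inverse-variance fixed-effect pooling.

    Each study is weighted by 1 / SE^2; returns (pooled effect, pooled SE).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical study-level effect sizes (d) and standard errors.
effects = [-0.2, -0.4]
ses = [0.1, 0.1]
pooled_d, pooled_se = pool_fixed_effect(effects, ses)
ci = (pooled_d - 1.959964 * pooled_se, pooled_d + 1.959964 * pooled_se)
```

More precise studies (smaller standard errors) dominate the pooled estimate, which is why a few large trials can outweigh many small ones.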
A Vector Auto Regression Model Applied to Real Estate Development Investment: A Statistic Analysis
Liu, Fengyun; Matsuno, Shuji; Malekian, Reza; Yu, Jin; Li, Zhixiong
2016-01-01
.... The above theoretical model is empirically evidenced with VAR (Vector Auto Regression) methodology. A panel VAR model shows that land leasing and real estate price appreciation positively affect local government general fiscal revenue...
Das Sumonkanti; Rahman Rajwanur M
2011-01-01
Background: The study attempts to develop an ordinal logistic regression (OLR) model to identify the determinants of child malnutrition, instead of the traditional binary logistic regression (BLR) model, using data from the Bangladesh Demographic and Health Survey 2004. Methods: Based on the weight-for-age anthropometric index (Z-score), child nutrition status is categorized into three groups: severely undernourished (< -3.0), moderately undernourished (-3.0 to -2.01) and nourished (≥-2.0...
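The three ordered outcome categories described above can be derived from the weight-for-age Z-score with a simple threshold rule. The Python sketch below encodes the cut-offs as stated in the abstract; the function name and the category labels as strings are illustrative choices.

```python
def nutrition_category(z_score):
    """Map a weight-for-age Z-score to the three ordered categories.

    Cut-offs follow the abstract: below -3.0 is severely undernourished,
    from -3.0 up to (but not including) -2.0 is moderately undernourished,
    and -2.0 or above is nourished.
    """
    if z_score < -3.0:
        return "severely undernourished"
    if z_score < -2.0:
        return "moderately undernourished"
    return "nourished"


# Example Z-scores (made up) and their ordered categories.
categories = [nutrition_category(z) for z in (-3.4, -2.5, -0.7)]
```

Keeping all three ordered levels, rather than collapsing them to undernourished versus nourished, is what allows an ordinal model to be fitted instead of a binary one.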
Glass, Edmund R; Dozmorov, Mikhail G
2016-10-06
The goal of many human disease-oriented studies is to detect molecular mechanisms different between healthy controls and patients. Yet, commonly used gene expression measurements from blood samples suffer from variability of cell composition. This variability hinders the detection of differentially expressed genes and is often ignored. Combined with cell counts, heterogeneous gene expression may provide deeper insights into the gene expression differences on the cell type-specific level. Published computational methods use linear regression to estimate cell type-specific differential expression, and a global cutoff to judge significance, such as False Discovery Rate (FDR). Yet, they do not consider many artifacts hidden in high-dimensional gene expression data that may negatively affect linear regression. In this paper we quantify the parameter space affecting the performance of linear regression (sensitivity of cell type-specific differential expression detection) on a per-gene basis. We evaluated the effect of sample sizes, cell type-specific proportion variability, and mean squared error on sensitivity of cell type-specific differential expression detection using linear regression. Each parameter affected variability of cell type-specific expression estimates and, subsequently, the sensitivity of differential expression detection. We provide the R package, LRCDE, which performs linear regression-based cell type-specific differential expression (deconvolution) detection on a gene-by-gene basis. Accounting for variability around cell type-specific gene expression estimates, it computes per-gene t-statistics of differential detection, p-values, t-statistic-based sensitivity, group-specific mean squared error, and several gene-specific diagnostic metrics. The sensitivity of linear regression-based cell type-specific differential expression detection differed for each gene as a function of mean squared error, per group sample sizes, and variability of the proportions
Regression analysis to predict growth performance from dietary net energy in growing-finishing pigs.
Nitikanchana, S; Dritz, S S; Tokach, M D; DeRouchey, J M; Goodband, R D; White, B J
2015-06-01
Data from 41 trials with multiple energy levels (285 observations) were used in a meta-analysis to predict growth performance based on dietary NE concentration. Nutrient and energy concentrations in all diets were estimated using the NRC ingredient library. Predictor variables examined for best fit models using Akaike information criteria included linear and quadratic terms of NE, BW, CP, standardized ileal digestible (SID) Lys, crude fiber, NDF, ADF, fat, ash, and their interactions. The initial best fit models included interactions between NE and CP or SID Lys. After removal of the observations that fed SID Lys below the suggested requirement, these terms were no longer significant. Including dietary fat in the model with NE and BW significantly improved the G:F prediction model, indicating that NE may underestimate the influence of fat on G:F. The meta-analysis indicated that, as long as diets are adequate for other nutrients (i.e., Lys), dietary NE is adequate to predict changes in ADG across different dietary ingredients and conditions. The analysis indicates that ADG increases with increasing dietary NE and BW but decreases when BW is above 87 kg. The G:F ratio improves with increasing dietary NE and fat but decreases with increasing BW. The regression equations were then evaluated by comparing the actual and predicted performance of 543 finishing pigs in 2 trials fed 5 dietary treatments, included 3 different levels of NE by adding wheat middlings, soybean hulls, dried distillers grains with solubles (DDGS; 8 to 9% oil), or choice white grease (CWG) to a corn-soybean meal-based diet. Diets were 1) 30% DDGS, 20% wheat middlings, and 4 to 5% soybean hulls (low energy); 2) 20% wheat middlings and 4 to 5% soybean hulls (low energy); 3) a corn-soybean meal diet (medium energy); 4) diet 2 supplemented with 3.7% CWG to equalize the NE level to diet 3 (medium energy); and 5) a corn-soybean meal diet with 3.7% CWG (high energy). Only small differences were observed
Bui The Hung
Evidence-based medicine (EBM) has developed as the dominant paradigm of assessment of evidence that is used in clinical practice. Since its development, EBM has been applied to integrate the best available research into diagnosis and treatment with the purpose of improving patient care. In the EBM era, a hierarchy of evidence has been proposed, including various types of research methods, such as meta-analysis (MA), systematic review (SRV), randomized controlled trial (RCT), case report (CR), practice guideline (PGL), and so on. Although there are numerous studies examining the impact and importance of specific cases of EBM in clinical practice, there is a lack of research quantitatively measuring publication trends in the growth and development of EBM. Therefore, a bibliometric analysis was constructed to determine the scientific productivity of EBM research over decades. The NCBI PubMed database was used to search, retrieve and classify publications according to research method and year of publication. Joinpoint regression analysis was undertaken to analyze trends in research productivity and the prevalence of individual research methods. Analysis indicates that MA and SRV, which are classified as the highest ranking of evidence in EBM, accounted for a relatively small but auspicious number of publications. For most research methods, the annual percent change (APC) indicates a consistent increase in publication frequency. MA, SRV and RCT show the highest rate of publication growth in the past twenty years. Only controlled clinical trials (CCT) show a non-significant reduction in publications over the past ten years. Higher quality research methods, such as MA, SRV and RCT, are showing continuous publication growth, which suggests an acknowledgement of the value of these methods. This study provides the first quantitative assessment of research method publication trends in EBM.
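The annual percent change (APC) mentioned above is conventionally obtained from a log-linear fit: if ln(count) = a + b × year within a segment, then APC = (exp(b) - 1) × 100. The Python sketch below computes this for a single segment with ordinary least squares. It is a simplification: it does not search for joinpoints, and the publication counts are synthetic values generated for illustration.

```python
import math


def annual_percent_change(years, counts):
    """APC from a least-squares fit of ln(count) on year (one segment)."""
    logs = [math.log(c) for c in counts]
    n = len(years)
    mean_year = sum(years) / n
    mean_log = sum(logs) / n
    slope = sum((y - mean_year) * (v - mean_log) for y, v in zip(years, logs))
    slope /= sum((y - mean_year) ** 2 for y in years)
    return (math.exp(slope) - 1.0) * 100.0


# Synthetic series growing by exactly 10% per year.
years = [2000, 2001, 2002, 2003, 2004]
counts = [100 * 1.1 ** i for i in range(5)]
apc = annual_percent_change(years, counts)
```

A full joinpoint analysis additionally tests where the slope changes and reports one APC per segment between the detected change points.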
Expert Involvement Predicts mHealth App Downloads: Multivariate Regression Analysis of Urology Apps
Osório, Luís; Cavadas, Vitor; Fraga, Avelino; Carrasquinho, Eduardo; Cardoso de Oliveira, Eduardo; Castelo-Branco, Miguel; Roobol, Monique J
2016-01-01
Background Urological mobile medical (mHealth) apps are gaining popularity with both clinicians and patients. mHealth is a rapidly evolving and heterogeneous field, with some urology apps being downloaded over 10,000 times and others not at all. The factors that contribute to medical app downloads have yet to be identified, including the hypothetical influence of expert involvement in app development. Objective The objective of our study was to identify predictors of the number of urology app downloads. Methods We reviewed urology apps available in the Google Play Store and collected publicly available data. Multivariate ordinal logistic regression evaluated the effect of publicly available app variables on the number of apps being downloaded. Results Of 129 urology apps eligible for study, only 2 (1.6%) had >10,000 downloads, with half having ≤100 downloads and 4 (3.1%) having none at all. Apps developed with expert urologist involvement (P=.003), optional in-app purchases (P=.01), higher user rating (P<.001), and more user reviews (P<.001) were more likely to be installed. App cost was inversely related to the number of downloads (P<.001). Only data from the Google Play Store and the developers’ websites, but not other platforms, were publicly available for analysis, and the level and nature of expert involvement was not documented. Conclusions The explicit participation of urologists in app development is likely to enhance its chances to have a higher number of downloads. This finding should help in the design of better apps and further promote urologist involvement in mHealth. Official certification processes are required to ensure app quality and user safety. PMID:27421338
Witt, Katrina; van Dorn, Richard; Fazel, Seena
2013-01-01
Previous reviews on risk and protective factors for violence in psychosis have produced contrasting findings. There is therefore a need to clarify the direction and strength of association of risk and protective factors for violent outcomes in individuals with psychosis. We conducted a systematic review and meta-analysis using 6 electronic databases (CINAHL, EBSCO, EMBASE, Global Health, PsycINFO, PUBMED) and Google Scholar. Studies were identified that reported factors associated with violence in adults diagnosed, using DSM or ICD criteria, with schizophrenia and other psychoses. We considered non-English language studies and dissertations. Risk and protective factors were meta-analysed if reported in three or more primary studies. Meta-regression examined sources of heterogeneity. A novel meta-epidemiological approach was used to group similar risk factors into one of 10 domains. Sub-group analyses were then used to investigate whether risk domains differed for studies reporting severe violence (rather than aggression or hostility) and studies based in inpatient (rather than outpatient) settings. There were 110 eligible studies reporting on 45,533 individuals, 8,439 (18.5%) of whom were violent. A total of 39,995 (87.8%) were diagnosed with schizophrenia, 209 (0.4%) were diagnosed with bipolar disorder, and 5,329 (11.8%) were diagnosed with other psychoses. Dynamic (or modifiable) risk factors included hostile behaviour, recent drug misuse, non-adherence with psychological therapies (p values < 0.001), higher poor impulse control scores, recent substance misuse, recent alcohol misuse (p values < 0.01), and non-adherence with medication (p value < 0.05). We also examined a number of static factors, the strongest of which were criminal history factors. When restricting outcomes to severe violence, these associations did not change materially. In studies investigating inpatient violence, associations differed in strength but not direction. Certain dynamic risk factors are strongly associated with increased violence risk in individuals with psychosis and their role in risk assessment and management warrants further examination.
Zhang, Yiwei; Pan, Wei
2015-03-01
Genome-wide association studies (GWAS) have been established as a major tool to identify genetic variants associated with complex traits, such as common diseases. However, GWAS may suffer from false positives and false negatives due to confounding population structures, including known or unknown relatedness. Another important issue is unmeasured environmental risk factors. Among many methods for adjusting for population structures, two approaches stand out: one is principal component regression (PCR) based on principal component analysis, which is perhaps the most popular due to its early appearance, simplicity, and general effectiveness; the other is based on a linear mixed model (LMM) that has emerged recently as perhaps the most flexible and effective, especially for samples with complex structures as in model organisms. As shown previously, the PCR approach can be regarded as an approximation to an LMM; such an approximation depends on the number of the top principal components (PCs) used, the choice of which is often difficult in practice. Hence, in the presence of population structure, the LMM appears to outperform the PCR method. However, due to the different treatments of fixed vs. random effects in the two approaches, we show an advantage of PCR over LMM: in the presence of an unknown but spatially confined environmental confounder (e.g., environmental pollution or lifestyle), the PCs may be able to implicitly and effectively adjust for the confounder whereas the LMM cannot. Accordingly, to adjust for both population structures and nongenetic confounders, we propose a hybrid method combining the use and, thus, strengths of PCR and LMM. We use real genotype data and simulated phenotypes to confirm the above points, and establish the superior performance of the hybrid method across all scenarios.
Fushimi, Akihiro; Kawashima, Hiroto; Kajihara, Hideo
Understanding the contribution of each emission source of air pollutants to ambient concentrations is important to establish effective measures for risk reduction. We have developed a source apportionment method based on an atmospheric dispersion model and multiple linear regression analysis (MLR) in conjunction with ambient concentrations simultaneously measured at points in a grid network. We used a Gaussian plume dispersion model developed by the US Environmental Protection Agency called the Industrial Source Complex model (ISC) in the method. Our method does not require emission amounts or source profiles. The method was applied to the case of benzene in the vicinity of the Keiyo Central Coastal Industrial Complex (KCCIC), one of the biggest industrial complexes in Japan. Benzene concentrations were simultaneously measured from December 2001 to July 2002 at sites in a grid network established in the KCCIC and the surrounding residential area. The method was used to estimate benzene emissions from the factories in the KCCIC and from automobiles along a section of a road, and then the annual average contribution of the KCCIC to the ambient concentrations was estimated based on the estimated emissions. The estimated contributions of the KCCIC were 65% inside the complex, 49% at 0.5-km sites, 35% at 1.5-km sites, 20% at 3.3-km sites, and 9% at a 5.6-km site. The estimated concentrations agreed well with the measured values. The estimated emissions from the factories and the road were slightly larger than those reported in the first Pollutant Release and Transfer Register (PRTR). These results support the reliability of our method. This method can be applied to other chemicals or regions to achieve reasonable source apportionments.
Spalj, Stjepan; Spalj, Vedrana Tudor; Ivanković, Luida; Plancak, Darije
2014-03-01
The aim of this study was to explore the patterns of oral health-related risk behaviours in relation to dental status, attitudes, motivation and knowledge among Croatian adolescents. The assessment was conducted in a sample of 750 male subjects, military recruits aged 18-28 in Croatia, using a questionnaire and clinical examination. The mean number of decayed, missing and filled teeth (DMFT) and the Significant Caries Index (SiC) were calculated. Multiple logistic regression models were created for analysis. Although the models of risk behaviours were statistically significant, their explanatory values were quite low. Five of them (rare toothbrushing, not using hygiene auxiliaries, rarely visiting the dentist, toothache as the primary reason to visit the dentist, and demand for tooth extraction due to toothache) had the highest explanatory values, ranging from 21% to 29%, and correctly classified 73-89% of subjects. Toothache as the primary reason to visit the dentist, extraction as the preferred therapy when toothache occurs, not having brushing education in school and frequent gingival bleeding were significantly related to the population with high caries experience (DMFT ≥ 14 according to SiC), producing odds ratios of 1.6 (95% CI 1.07-2.46), 2.1 (95% CI 1.29-3.25), 1.8 (95% CI 1.21-2.74) and 2.4 (95% CI 1.21-2.74), respectively. The DMFT ≥ 14 model had a low explanatory value of 6.5% and correctly classified 83% of subjects. It can be concluded that oral health-related risk behaviours are interrelated. A poor association was seen between attitudes concerning oral health and oral health-related risk behaviours, indicating insufficient motivation to change lifestyle and habits. Self-reported oral hygiene habits were not strongly related to dental status.
Regression analysis of time trends in perinatal mortality in Germany 1980-1993.
Scherb, H; Weigelt, E; Brüske-Hohlfeld, I
2000-02-01
Numerous investigations have been carried out on the possible impact of the Chernobyl accident on the prevalence of anomalies at birth and on perinatal mortality. In many cases the studies were aimed at the detection of differences of pregnancy outcome measurements between regions or time periods. Most authors conclude that there is no evidence of a detrimental physical effect on congenital anomalies or other outcomes of pregnancy following the accident. In this paper, we report on statistical analyses of time trends of perinatal mortality in Germany. Our main intention is to investigate whether perinatal mortality, as reflected in official records, was increased in 1987 as a possible effect of the Chernobyl accident. We show that, in Germany as a whole, there was a significantly elevated perinatal mortality proportion in 1987 as compared to the trend function. The increase is 4.8% (p = 0.0046) of the expected perinatal death proportion for 1987. Even more pronounced levels of 8.2% (p = 0.0458) and 8.5% (p = 0.0702) may be found in the higher contaminated areas of the former German Democratic Republic (GDR), including West Berlin, and of Bavaria, respectively. To investigate the impact of statistical models on results, we applied three standard regression techniques. The observed significant increase in 1987 is independent of the statistical model used. Stillbirth proportions show essentially the same behavior as perinatal death proportions, but the results for all of Germany are nonsignificant due to the smaller numbers involved. Analysis of the association of stillbirth proportions with the (137)Cs deposition on a district level in Bavaria discloses a significant relationship. Our results are in contrast to those of many analyses of the health consequences of the Chernobyl accident and contradict the present radiobiologic knowledge. As we are dealing with highly aggregated data, other causes or artifacts may explain the observed effects. Hence, the findings
Flexible survival regression modelling
Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben
2009-01-01
Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varyi...
Zhang, Man; Liu, Xu-Hua; He, Xiong-Kui; Zhang, Lu-Da; Zhao, Long-Lian; Li, Jun-Hui
2010-05-01
In the present paper, ridge regression for near-infrared (NIR) spectroscopy quantitative analysis was studied using 66 wheat samples as test material. The NIR-ridge regression model for determination of protein content was established from the NIR spectral data of 44 wheat samples and used to predict the protein content of the other 22 samples. The average relative error between the predicted results and the Kjeldahl values (chemical analysis values) was 0.01518. The predicted results were also compared with those obtained by the partial least squares (PLS) method, showing that ridge regression is a suitable choice for NIR spectroscopy quantitative analysis. Furthermore, one effective way to reduce the disturbance to the predictive capacity of the quantitative analysis model caused by irrelevant information is to screen the wavelength information. To select the spectral information that carries more content information and is more strongly related to the composition or nature of the samples, and thereby improve the model's predictive accuracy, ridge regression was used to select wavelength information in this paper. The NIR-ridge regression model was established with the spectral information at 4 wavelength points, selected from 1297 wavelength points, to predict the protein content of the 22 samples. The average relative error was 0.0137 and the correlation coefficient reached 0.9817 between the predicted results and the Kjeldahl values. The results showed that ridge regression was able to screen the essential wavelength information from a large amount of spectral information. It not only simplifies the model and effectively reduces the disturbance resulting from collinear information, but also has practical significance for designing special NIR analysis instruments for analyzing specific components in particular samples.
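Ridge regression stabilises the least squares solution when predictors are nearly collinear, as neighbouring NIR wavelengths typically are. The sketch below solves the regularised normal equations (XᵀX + λI)β = Xᵀy directly; the function name `ridge_fit`, the penalty value and the tiny two-column data set are assumptions for illustration and are not taken from the paper.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y by Gaussian elimination.
    X is a list of rows; no intercept is fitted (centre the data first)."""
    p = len(X[0])
    # Regularised normal equations, stored as an augmented matrix.
    M = []
    for i in range(p):
        row = [sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
               for j in range(p)]
        row.append(sum(r[i] * yi for r, yi in zip(X, y)))
        M.append(row)
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = sum(M[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (M[i][p] - s) / M[i][i]
    return beta

# Two nearly collinear predictors (synthetic); y depends on the first only.
X = [[1.0, 1.01], [2.0, 1.98], [3.0, 3.02], [4.0, 3.99]]
y = [2.0, 4.0, 6.0, 8.0]
beta_ols = ridge_fit(X, y, 0.0)    # lam = 0 reduces to least squares
beta_ridge = ridge_fit(X, y, 1.0)  # lam = 1 shrinks the coefficients
print(beta_ols, beta_ridge)
```

With the penalty switched on, the weight is shared between the two correlated columns and the coefficient vector becomes shorter, which is the shrinkage behaviour that makes ridge estimates less sensitive to collinearity.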
Significant drivers of the virtual water trade evaluated with a multivariate regression analysis
Tamea, Stefania; Laio, Francesco; Ridolfi, Luca
2014-05-01
International trade of food is vital for the food security of many countries, which rely on trade to compensate for an agricultural production insufficient to feed the population. At the same time, food trade has implications for the distribution and use of water resources, because through the international trade of food commodities, countries virtually displace the water used for food production, known as "virtual water". Trade thus implies a network of virtual water fluxes from exporting to importing countries, which has been estimated to displace more than 2 billion m3 of water per year, or about 2% of the annual global precipitation above land. It is thus important to adequately identify the dynamics and the controlling factors of the virtual water trade, because it supports and enables world food security. Using the FAOSTAT database of international trade and the virtual water content available from the Water Footprint Network, we reconstructed 25 years (1986-2010) of virtual water fluxes. We then analyzed the dependence of exchanged fluxes on a set of major relevant factors, which include population, gross domestic product, arable land, virtual water embedded in agricultural production and dietary consumption, and geographical distance between countries. Significant drivers have been identified by means of a multivariate regression analysis, applied separately to the export and import fluxes of each country; temporal trends are outlined and the relative importance of drivers is assessed by a commonality analysis. Results indicate that population, gross domestic product and geographical distance are the major drivers of virtual water fluxes, with a minor (but non-negligible) contribution given by the agricultural production of exporting countries. Such drivers have become relevant for an increasing number of countries throughout the years, with an increasing variance explained by the distance between countries and a decreasing role of the gross
Lamichhane, Archana P.; Liese, Angela D.; Urbina, Elaine M.; Crandell, Jamie L.; Jaacks, Lindsay M.; Dabelea, Dana; Black, Mary Helen; Merchant, Anwar T.; Mayer-Davis, Elizabeth J.
2014-01-01
BACKGROUND/OBJECTIVES Youth with type 1 diabetes (T1DM) are at substantially increased risk for adverse vascular outcomes, but little is known about the influence of dietary behavior on cardiovascular disease (CVD) risk profile. We aimed to identify dietary intake patterns associated with CVD risk factors and evaluate their impact on arterial stiffness (AS) measures collected thereafter in a cohort of youth with T1DM. SUBJECTS/METHODS Baseline diet data from a food frequency questionnaire and CVD risk factors (triglycerides, LDL-cholesterol, systolic BP, HbA1c, C-reactive protein and waist circumference) were available for 1,153 youth aged ≥10 years with T1DM from the SEARCH for Diabetes in Youth Study. A dietary intake pattern was identified using 33 food-groups as predictors and six CVD risk factors as responses in reduced rank regression (RRR) analysis. Associations of this RRR-derived dietary pattern with AS measures [augmentation index(AIx75), n=229; pulse wave velocity(PWV), n=237; and brachial distensibility(BrachD), n=228] were then assessed using linear regression. RESULTS The RRR-derived pattern was characterized by high intakes of sugar-sweetened beverages (SSB) and diet soda, eggs, potatoes and high-fat meats, and low intakes of sweets/desserts and low-fat dairy; major contributors were SSB and diet soda. This pattern captured the largest variability in adverse CVD risk profile and was subsequently associated with AIx75 (β=0.47; p<0.01). The mean difference in AIx75 concentration between the highest and the lowest dietary pattern quartiles was 4.3% in fully adjusted model. CONCLUSIONS Intervention strategies to reduce consumption of unhealthful foods and beverages among youth with T1DM may significantly improve CVD risk profile and ultimately reduce the risk for AS. PMID:24865480
Sjolie, A.K.; Klein, R.; Porta, M.;
2008-01-01
BACKGROUND: Diabetic retinopathy remains a leading cause of visual loss in people of working age. We examined whether candesartan treatment could slow the progression and, secondly, induce regression of retinopathy in people with type 2 diabetes. METHODS: We did a randomised, double-blind, parall...
Denli, H. H.; Koc, Z.
2015-12-01
Valuing real property according to standards is difficult to apply consistently across time and location. Regression analysis constructs mathematical models that describe or explain relationships that may exist between variables. The problem of identifying price differences between properties in order to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. Applied to real estate valuation, with properties taken as offered on the real market with their current characteristics and quantifiers, the method helps to find the factors or variables that are effective in the formation of value. In this study, prices of housing for sale in Zeytinburnu, a district of Istanbul, are related to the characteristics of the housing to find a price index, based on information obtained from a real estate web page. The variables used for the analysis are age, size in m2, number of floors in the building, floor number of the property and number of rooms. The price of the property is the dependent variable, whereas the rest are independent variables. Prices of 60 properties were used for the analysis. Locations of equal price were found and plotted on the map, and equivalence curves were drawn to identify zones of equal value as lines.
Fuzzy multinomial logistic regression analysis: A multi-objective programming approach
Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan
2017-05-01
Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, especially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate parameters of the multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus the ML approach. Results show that the new proposed model outperforms ML in cases of small datasets.
Application of Robust Regression and Bootstrap in Productivity Analysis of GERD Variable in EU27
Dagmar Blatná
2014-06-01
GERD is one of the Europe 2020 headline indicators being tracked within the Europe 2020 strategy. The headline target is for GERD to reach 3% within the EU by 2020. Eurostat defines "GERD" as total gross domestic expenditure on research and experimental development as a percentage of GDP. GERD depends on numerous factors of the general economic background, namely employment, innovation and research, and science and technology. The values of these indicators vary among the European countries, and consequently the occurrence of outliers can be anticipated in corresponding analyses. In such a case, the classical statistical approach, the least squares method, can be highly unreliable, and robust regression methods represent an acceptable and useful tool. The aim of the present paper is to demonstrate the advantages of robust regression and the applicability of the bootstrap approach in regression based on both classical and robust methods.
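The bootstrap idea mentioned above can be illustrated with a pairs bootstrap for a regression slope. This is a hedged sketch under stated assumptions: the data are synthetic, the function names `ols_slope` and `bootstrap_slope_ci` are hypothetical, and ordinary least squares is used as the estimator; a robust estimator could be substituted for `ols_slope` without changing the resampling logic.

```python
import random

def ols_slope(xs, ys):
    """Least squares slope of y on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: slope is undefined
        by = [ys[i] for i in idx]
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic data close to the line y = 2x (assumed, for illustration).
xs = [float(x) for x in range(1, 11)]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.05, 0.0, -0.15, 0.1]
ys = [2.0 * x + e for x, e in zip(xs, noise)]
lower, upper = bootstrap_slope_ci(xs, ys)
print(lower, upper)
```

Resampling whole (x, y) pairs avoids assuming a particular error distribution, which is one reason the bootstrap pairs naturally with robust estimators.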
Kamruzzaman, Md; Mamun, A S M A; Bakar, Sheikh Muhammad Abu; Saw, Aik; Kamarul, T; Islam, Md Nurul; Hossain, Md Golam
2016-11-21
The aim of this study was to investigate the socioeconomic and demographic factors influencing the body mass index (BMI) of non-pregnant married Bangladeshi women of reproductive age. Secondary (hierarchical) data from the 2011 Bangladesh Demographic and Health Survey, collected using two-stage stratified cluster sampling, were used. Two-level linear regression analysis was performed to remove the cluster effect of the variables. The mean BMI of married non-pregnant Bangladeshi women was 21.60±3.86 kg/m2, and the prevalence of underweight, overweight and obesity was 22.8%, 14.9% and 3.2%, respectively. After removing the cluster effect, age and age at first marriage were found to be positively related, and number of children negatively related, with women's BMI. Lower BMI was especially found among women from rural areas and poor families, with an uneducated husband, with no television at home and who were currently breast-feeding. Age, total children ever born, age at first marriage, type of residence, education level, level of husband's education, wealth index, having a television at home and practising breast-feeding were found to be important predictors of the BMI of married Bangladeshi non-pregnant women of reproductive age. This information could be used to identify sections of the Bangladeshi population that require special attention, and to develop more effective strategies to resolve the problem of malnutrition.
Spatial Quantile Regression In Analysis Of Healthy Life Years In The European Union Countries
Trzpiot Grażyna
2016-12-01
The paper investigates the impact of selected factors on the healthy life years of men and women in the EU countries. Multiple quantile spatial autoregression models are used in order to account for substantial differences in healthy life years and life quality across the EU members. Quantile regression allows studying dependencies between variables in different quantiles of the response distribution. Moreover, this statistical tool is robust against violations of the classical regression assumption about the distribution of the error term. Parameters of the models were estimated using the instrumental variable method (Kim, Muller 2004), whereas the confidence intervals and p-values were bootstrapped.
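Quantile regression replaces the squared error of least squares with the asymmetric check (pinball) loss. The sketch below illustrates only that loss for a constant-only model, where the minimiser is a sample quantile; it does not reproduce the spatial autoregressive or instrumental variable estimation used in the paper, and the names `pinball_loss` and `best_constant` and the data are assumptions for illustration.

```python
def pinball_loss(residual, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def best_constant(ys, tau):
    """Observed value that minimises the total pinball loss.
    For a constant-only model this is a sample tau-quantile."""
    return min(ys, key=lambda c: sum(pinball_loss(y - c, tau) for y in ys))

# Synthetic sample with one large outlier.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(ys, 0.5)  # tau = 0.5 targets the median
upper_fit = best_constant(ys, 0.9)   # tau = 0.9 targets the upper tail
print(median_fit, upper_fit)
```

The median fit ignores the size of the outlier, which illustrates why the method is described as robust to the error distribution, while a high quantile deliberately tracks the upper tail.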
JT-60 configuration parameters for feedback control determined by regression analysis
Matsukawa, Makoto; Hosogane, Nobuyuki; Ninomiya, Hiromasa
1991-12-01
The stepwise regression procedure was applied to obtain measurement formulas for equilibrium parameters used in the feedback control of JT-60. This procedure automatically selects variables necessary for the measurements, and selects a set of variables which are not likely to be picked up by physical considerations. Regression equations with stable and small multicollinearity were obtained and it was experimentally confirmed that the measurement formulas obtained through this procedure were accurate enough to be applicable to the feedback control of plasma configurations in JT-60.
Katrina Witt
BACKGROUND: Previous reviews on risk and protective factors for violence in psychosis have produced contrasting findings. There is therefore a need to clarify the direction and strength of association of risk and protective factors for violent outcomes in individuals with psychosis. METHOD: We conducted a systematic review and meta-analysis using 6 electronic databases (CINAHL, EBSCO, EMBASE, Global Health, PsycINFO, PUBMED) and Google Scholar. Studies were identified that reported factors associated with violence in adults diagnosed, using DSM or ICD criteria, with schizophrenia and other psychoses. We considered non-English language studies and dissertations. Risk and protective factors were meta-analysed if reported in three or more primary studies. Meta-regression examined sources of heterogeneity. A novel meta-epidemiological approach was used to group similar risk factors into one of 10 domains. Sub-group analyses were then used to investigate whether risk domains differed for studies reporting severe violence (rather than aggression or hostility) and studies based in inpatient (rather than outpatient) settings. FINDINGS: There were 110 eligible studies reporting on 45,533 individuals, 8,439 (18.5%) of whom were violent. A total of 39,995 (87.8%) were diagnosed with schizophrenia, 209 (0.4%) were diagnosed with bipolar disorder, and 5,329 (11.8%) were diagnosed with other psychoses. Dynamic (or modifiable) risk factors included hostile behaviour, recent drug misuse, non-adherence with psychological therapies (p values < 0.001), higher poor impulse control scores, recent substance misuse, recent alcohol misuse (p values < 0.01), and non-adherence with medication (p value < 0.05). We also examined a number of static factors, the strongest of which were criminal history factors. When restricting outcomes to severe violence, these associations did not change materially. In studies investigating inpatient violence, associations differed in strength but not
Robinson, Jo; Spittal, Matthew J; Carter, Greg
2016-01-01
Objective To examine the efficacy of psychological and psychosocial interventions for reductions in repeated self-harm. Design We conducted a systematic review, meta-analysis and meta-regression to examine the efficacy of psychological and psychosocial interventions to reduce repeat self-harm in adults. We included a sensitivity analysis of studies with a low risk of bias for the meta-analysis. For the meta-regression, we examined whether the type, intensity (primary analyses) and other components of intervention or methodology (secondary analyses) modified the overall intervention effect. Data sources A comprehensive search of MEDLINE, PsycInfo and EMBASE (from 1999 to June 2016) was performed. Eligibility criteria for selecting studies Randomised controlled trials of psychological and psychosocial interventions for adult self-harm patients. Results Forty-five trials were included with data available from 36 (7354 participants) for the primary analysis. Meta-analysis showed a significant benefit of all psychological and psychosocial interventions combined (risk ratio 0.84; 95% CI 0.74 to 0.96; number needed to treat=33); however, sensitivity analyses showed that this benefit was non-significant when restricted to a limited number of high-quality studies. Meta-regression showed that the type of intervention did not modify the treatment effects. Conclusions Consideration of a psychological or psychosocial intervention over and above treatment as usual is worthwhile; with the public health benefits of ensuring that this practice is widely adopted potentially worth the investment. However, the specific type and nature of the intervention that should be delivered is not yet clear. Cognitive–behavioural therapy or interventions with an interpersonal focus and targeted on the precipitants to self-harm may be the best candidates on the current evidence. Further research is required. PMID:27660314
Salas, M.M.; Nascimento, G.G.; Vargas-Ferreira, F.; Tarquinio, S.B.; Huysmans, M.C.D.N.J.M.; Demarco, F.F.
2015-01-01
OBJECTIVE: The aim of the present study was to assess the influence of diet in tooth erosion presence in children and adolescents by meta-analysis and meta-regression. DATA: Two reviewers independently performed the selection process and the quality of studies was assessed. SOURCES: Studies publishe
Muller, Veronica; Brooks, Jessica; Tu, Wei-Mo; Moser, Erin; Lo, Chu-Ling; Chan, Fong
2015-01-01
Purpose: The main objective of this study was to determine the extent to which physical and cognitive-affective factors are associated with fibromyalgia (FM) fatigue. Method: A quantitative descriptive design using correlation techniques and multiple regression analysis. The participants consisted of 302 members of the National Fibromyalgia &…
Swets, Marije; Dekker, Jack; van Emmerik-van Oortmerssen, Katelijne; Smid, Geert E.; Smit, Filip; de Haan, Lieuwe; Schoevers, Robert A.
Aims: The aims of this study were to conduct a meta-analysis and meta-regression to estimate the prevalence rates for obsessive compulsive symptoms (OCS) and obsessive compulsive disorder (OCD) in schizophrenia, and to investigate what influences these prevalence rates. Method: Studies were
Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois
2013-01-01
This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…
On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis
Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas
2011-01-01
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…
Li, T.; Sun, L.; Zou, L.
2009-01-01
This study assesses the impact of government shareholding on corporate performance using a sample of 643 non-financial companies listed on the Chinese stock exchanges. In view of the controversial empirical findings in the literature and the limitations of the least squares regressions, we adopt the
Genetic analysis of tolerance to infections using random regressions: a simulation study
Kause, A.
2011-01-01
Tolerance to infections is the ability of a host to limit the impact of a given pathogen burden on host performance. This simulation study demonstrated the merit of using random regressions to estimate unbiased genetic variances for tolerance slope and its genetic correlations with other traits,
The Analysis of Nonstationary Time Series Using Regression, Correlation and Cointegration
Johansen, Søren
2012-01-01
There are simple well-known conditions for the validity of regression and correlation as statistical tools. We analyse by examples the effect of nonstationarity on inference using these methods and compare them to model based inference using the cointegrated vector autoregressive model. Finally we...
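The effect of nonstationarity on correlation can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes two independent Gaussian random walks, and the function names, sample size, replication count, and seed are all hypothetical choices. It compares the average absolute correlation of the levels with that of the first differences.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def random_walk(n, rng):
    """Cumulative sum of n independent standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def mean_abs_correlation(n_obs=200, n_rep=200, seed=0):
    """Average |r| for independent random walks: levels vs. first differences."""
    rng = random.Random(seed)
    levels, diffs = 0.0, 0.0
    for _ in range(n_rep):
        x, y = random_walk(n_obs, rng), random_walk(n_obs, rng)
        dx = [b - a for a, b in zip(x, x[1:])]
        dy = [b - a for a, b in zip(y, y[1:])]
        levels += abs(pearson(x, y))
        diffs += abs(pearson(dx, dy))
    return levels / n_rep, diffs / n_rep


# The two series are unrelated by construction, yet the levels tend to
# show much larger correlations than the stationary differences.
levels_r, diffs_r = mean_abs_correlation()
```

The levels of the two walks are unrelated by construction, so a large average correlation between them reflects the nonstationarity rather than any real relationship.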
An Analysis of the Indicator Saturation Estimator as a Robust Regression Estimator
Johansen, Søren; Nielsen, Bent
An algorithm suggested by Hendry (1999) for estimation in a regression with more regressors than observations, is analyzed with the purpose of finding an estimator that is robust to outliers and structural breaks. This estimator is an example of a one-step M-estimator based on Huber's skip functi...
Schlechtingen, Meik; Santos, Ilmar
2011-01-01
This paper presents the research results of a comparison of three different model based approaches for wind turbine fault detection in online SCADA data, by applying developed models to five real measured faults and anomalies. The regression based model as the simplest approach to build a normal ...
Hoeflinger, Jennifer L; Hoeflinger, Daniel E; Miller, Michael J
2017-01-01
Herein, an open-source method to generate quantitative bacterial growth data from high-throughput microplate assays is described. The bacterial lag time, maximum specific growth rate, doubling time and delta OD are reported. Our method was validated by carbohydrate utilization of lactobacilli, and visual inspection revealed 94% of regressions were deemed excellent.
Quantile regression for the statistical analysis of immunological data with many non-detects
Eilers, P.H.C.; Roder, E.; Savelkoul, H.F.J.; Wijk, van R.G.
2012-01-01
Background Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techni
Quantile regression for the statistical analysis of immunological data with many non-detects
P.H.C. Eilers (Paul); E. Röder (Esther); H.F.J. Savelkoul (Huub); R. Gerth van Wijk (Roy)
2012-01-01
Background: Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced stati
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm
Ulbrich, Norbert Manfred
2013-01-01
A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.
Wiley, Kristofor R.
2013-01-01
Many of the social and emotional needs that have historically been associated with gifted students have been questioned on the basis of recent empirical evidence. Research on the topic, however, is often limited by sample size, selection bias, or definition. This study addressed these limitations by applying linear regression methodology to data…
Baylor, Carolyn; Yorkston, Kathryn; Bamer, Alyssa; Britton, Deanna; Amtmann, Dagmar
2010-01-01
Purpose: To explore variables associated with self-reported communicative participation in a sample (n = 498) of community-dwelling adults with multiple sclerosis (MS). Method: A battery of questionnaires was administered online or on paper per participant preference. Data were analyzed using multiple linear backward stepwise regression. The…
Panagiotis Kasteridis
To test the impact of a UK pay-for-performance indicator, the Quality and Outcomes Framework (QOF) dementia review, on three types of hospital admission for people with dementia: emergency admissions where dementia was the primary diagnosis; emergency admissions for ambulatory care sensitive conditions (ACSCs); and elective admissions for cataract, hip replacement, hernia, prostate disease, or hearing loss. Count data regression analyses of hospital admissions from 8,304 English general practices from 2006/7 to 2010/11. We identified relevant admissions from national Hospital Episode Statistics and aggregated them to practice level. We merged these with practice-level data on the QOF dementia review. In the base case, the exposure measure was the reported QOF register. As dementia is commonly under-diagnosed, we tested a predicted practice register based on consensus estimates. We adjusted for practice characteristics including measures of deprivation and uptake of a social benefit to purchase care services (Attendance Allowance). In the base case analysis, higher QOF achievement had no significant effect on any type of hospital admission. However, when the predicted register was used to account for under-diagnosis, a one-percentage point improvement in QOF achievement was associated with a small reduction in emergency admissions for both dementia (-0.1%; P=0.011) and ACSCs (-0.1%; P=0.001). In areas of greater deprivation, uptake of Attendance Allowance was consistently associated with significantly lower emergency admissions. In all analyses, practices with a higher proportion of nursing home patients had significantly lower admission rates for elective and emergency care. In one of three analyses at practice level, the QOF review for dementia was associated with a small but significant reduction in unplanned hospital admissions. Given the rising prevalence of dementia, increasing pressures on acute hospital beds and poor outcomes associated with
Knox, Matthew C; Edye, Michael
2016-04-01
Surgical antibiotic prophylaxis is frequently reported in the literature to be suboptimal, a finding having both clinical and public health implications. This study aimed to calculate rates and patterns of adherence to guidelines at two sites and identify extrinsic contributing factors. A retrospective analysis was conducted over two 12-mo periods during 2013-2014 at the metropolitan Blacktown Hospital and regional Lismore Base Hospital, New South Wales, Australia. A group of 400 patients undergoing abdominal general surgery was selected via simple random sampling (n = 200 per site). Medical records were reviewed, and prophylactic antibiotic regimens were compared with the Australian guideline, Therapeutic Guidelines: Antibiotic (v. 14) with respect to drug choice, dosage, timing of administration, and duration of administration. The overall rate of adherence to the guidelines was 16.5% at Blacktown Hospital and 19.5% at Lismore Base Hospital. At each site, prophylaxis was administered to more than 95% of patients and was inappropriately withheld in 4%. Drug choice was the most frequent error type, specifically involving inappropriate omission of metronidazole and use of newer-generation cephalosporins. Errors in the timing of administration also were frequent, with prophylaxis typically occurring excessively early. Logistic regression identified emergency surgery as independently associated with prophylactic errors in both the Blacktown Hospital (p antibiotic prophylactic guidelines was poor at both the metropolitan and regional sites. Choice of antibiotic and timing of administration were identified as major error types. Consideration should be given to multidisciplinary involvement of anesthetists, implementation of focused interventions with an emphasis on emergency settings, and further research correlating antibiotic use with clinical significance.
Kahane, Leo H
2007-01-01
Using a friendly, nontechnical approach, the Second Edition of Regression Basics introduces readers to the fundamentals of regression. Accessible to anyone with an introductory statistics background, this book builds from a simple two-variable model to a model of greater complexity. Author Leo H. Kahane weaves four engaging examples throughout the text to illustrate not only the techniques of regression but also how this empirical tool can be applied in creative ways to consider a broad array of topics. New to the Second Edition Offers greater coverage of simple panel-data estimation:
Investigations upon the indefinite rolls quality assurance in multiple regression analysis
Kiss, I.
2012-04-01
The quality of rolling-mill rolls has improved mainly through better chemical composition of the roll materials. Achieving an optimal chemical composition is a technically efficient way to assure service properties, since the material from which the rolls are manufactured is of primary importance. This paper continues to present the scientific results of our experimental research on rolling-mill rolls. The research contains elements of immediate practical use to metallurgical enterprises for improving roll quality, with the ultimate aim of increasing durability and safety in service. The paper analyzes the chemical composition of indefinite cast iron rolls and its influence on their mechanical properties. We present mathematical correlations and graphical interpretations relating hardness (on the working surface and on the necks) to chemical composition. Double and triple correlations are helpful in foundry practice because they allow variation limits to be determined for the chemical composition so as to obtain optimal hardness values. We suggest a mathematical interpretation of the influence of chemical composition on the hardness of these indefinite rolls, using multiple regression analysis, an important statistical tool for investigating relationships between variables. The modeling results are described by a number of multi-component equations determined for spaces with 3 and 4 dimensions. The regression surfaces, level curves, and volumes of variation can also be represented and interpreted by technologists as correlation diagrams between the analyzed variables. In this sense, these research results can be used in the engineers
Lorenzo-Seva, Urbano; Ferrando, Pere J
2011-03-01
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the equation regression, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
Rolling Regressions with Stata
Kit Baum
2004-01-01
This talk will describe some work underway to add a "rolling regression" capability to Stata's suite of time series features. Although commands such as "statsby" permit analysis of non-overlapping subsamples in the time domain, they are not suited to the analysis of overlapping (e.g. "moving window") samples. Both moving-window and widening-window techniques are often used to judge the stability of time series regression relationships. We will present an implementation of a rolling regression...
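The difference between moving-window and widening-window estimation can be sketched outside Stata. The example below is an illustrative Python sketch, not the implementation described in the talk: the function names, the single-regressor model, and the data are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rolling_regression(x, y, window, expanding=False):
    """Fit OLS on overlapping samples ending at each time index.

    expanding=False: moving window of fixed length `window`.
    expanding=True: widening window that always starts at index 0.
    Returns a list of (end_index, intercept, slope).
    """
    results = []
    for end in range(window, len(x) + 1):
        start = 0 if expanding else end - window
        a, b = ols_slope_intercept(x[start:end], y[start:end])
        results.append((end - 1, a, b))
    return results


# Hypothetical series whose slope changes from 1 to 3 halfway through.
x = list(range(20))
y = [xi * 1.0 if xi < 10 else 10.0 + (xi - 10) * 3.0 for xi in x]
moving = rolling_regression(x, y, window=5)
widening = rolling_regression(x, y, window=5, expanding=True)
```

In this constructed series the moving-window slope reaches the new value once the window lies entirely after the break, while the widening-window slope keeps averaging over both regimes, which is why the two are used together to judge stability.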
Veilleux, Andrea G.; Stedinger, Jery R.; Eash, David A.
2012-01-01
This paper summarizes methodological advances in regional log-space skewness analyses that support flood-frequency analysis with the log Pearson Type III (LP3) distribution. A Bayesian Weighted Least Squares/Generalized Least Squares (B-WLS/B-GLS) methodology that relates observed skewness coefficient estimators to basin characteristics in conjunction with diagnostic statistics represents an extension of the previously developed B-GLS methodology. B-WLS/B-GLS has been shown to be effective in two California studies. B-WLS/B-GLS uses B-WLS to generate stable estimators of model parameters and B-GLS to estimate the precision of those B-WLS regression parameters, as well as the precision of the model. The study described here employs this methodology to develop a regional skewness model for the State of Iowa. To provide cost effective peak-flow data for smaller drainage basins in Iowa, the U.S. Geological Survey operates a large network of crest stage gages (CSGs) that only record flow values above an identified recording threshold (thus producing a censored data record). CSGs are different from continuous-record gages, which record almost all flow values and have been used in previous B-GLS and B-WLS/B-GLS regional skewness studies. The complexity of analyzing a large CSG network is addressed by using the B-WLS/B-GLS framework along with the Expected Moments Algorithm (EMA). Because EMA allows for the censoring of low outliers, as well as the use of estimated interval discharges for missing, censored, and historic data, it complicates the calculations of effective record length (and effective concurrent record length) used to describe the precision of sample estimators because the peak discharges are no longer solely represented by single values. Thus new record length calculations were developed. The regional skewness analysis for the State of Iowa illustrates the value of the new B-WLS/B-GLS methodology with these new extensions.
Regional variation in the prevalence of E. coli O157 in cattle: a meta-analysis and meta-regression.
Md Zohorul Islam
Escherichia coli O157 (EcO157) infection has been recognized as an important global public health concern. But information on the prevalence of EcO157 in cattle at the global and at the wider geographical levels is limited, if not absent. This is the first meta-analysis to investigate the point prevalence of EcO157 in cattle at the global level and to explore the factors contributing to variation in prevalence estimates. Seven electronic databases (CAB Abstracts, PubMed, Biosis Citation Index, Medline, Web of Knowledge, Scirus and Scopus) were searched for relevant publications from 1980 to 2012. A random-effects meta-analysis model was used to produce the pooled estimates. The potential sources of between-study heterogeneity were identified using meta-regression. A total of 140 studies comprising 220,427 cattle were included in the meta-analysis. The prevalence estimate of EcO157 in cattle at the global level was 5.68% (95% CI, 5.16-6.20). The random-effects pooled prevalence estimates in Africa, Northern America, Oceania, Europe, Asia and Latin America-Caribbean were 31.20% (95% CI, 12.35-50.04), 7.35% (95% CI, 6.44-8.26), 6.85% (95% CI, 2.41-11.29), 5.15% (95% CI, 4.21-6.09), 4.69% (95% CI, 3.05-6.33) and 1.65% (95% CI, 0.77-2.53), respectively. Between-study heterogeneity was evident in most regions. World region (p<0.001), type of cattle (p<0.001) and, to some extent, specimens (p = 0.074) as well as method of pre-enrichment (p = 0.110) were identified as factors for variation in the prevalence estimates of EcO157 in cattle. The prevalence of the organism seems to be higher in the African and Northern American regions. The important factors that might influence the estimates of EcO157 are type of cattle and kind of screening specimen. Their roles need to be determined and they should be properly handled in any survey to estimate the true prevalence of EcO157.
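A random-effects pooled prevalence of the kind reported above can be sketched as follows. The abstract does not say which estimator was used; this sketch assumes the DerSimonian-Laird between-study variance and untransformed proportions with binomial variances, and the function name and study counts are hypothetical.

```python
import math


def pooled_prevalence_dl(events, totals):
    """Random-effects pooled proportion (DerSimonian-Laird, assumed here).

    events[i] positives out of totals[i] animals in study i.
    Returns (pooled, (ci_low, ci_high), tau2).
    """
    if len(events) < 2:
        raise ValueError("need at least two studies")
    props = [e / n for e, n in zip(events, totals)]
    # Binomial variance of a raw proportion; requires 0 < p < 1 in every study.
    variances = [p * (1 - p) / n for p, n in zip(props, totals)]
    w = [1 / v for v in variances]
    fixed = sum(wi * p for wi, p in zip(w, props)) / sum(w)
    # Cochran's Q and the method-of-moments between-study variance tau2.
    q = sum(wi * (p - fixed) ** 2 for wi, p in zip(w, props))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(props) - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * p for wi, p in zip(w_star, props)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical counts, not data from the meta-analysis.
pooled, ci, tau2 = pooled_prevalence_dl([5, 30, 60], [100, 200, 150])
```

When the studies disagree more than sampling error allows, tau2 is positive and the study weights become more equal, which widens the confidence interval relative to a fixed-effect pooling.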
Lee, Soo Min; Lee, Jae-Won
2014-11-01
In this study, the optimal conditions for biomass torrefaction were determined by comparing the gain of energy content to the weight loss of biomass from the final products. Torrefaction experiments were performed at temperatures ranging from 220 to 280°C using 20-80 min reaction times. Polynomial regression models ranging from the 1st to the 3rd order were used to determine a relationship between the severity factor (SF) and calorific value or weight loss. The intersection of two regression models for calorific value and weight loss was determined and assumed to be the optimized SF. The optimized SFs on each biomass ranged from 6.056 to 6.372. Optimized torrefaction conditions were determined at various reaction times of 15, 30, and 60 min. The average optimized temperature was 248.55°C in the studied biomass when torrefaction was performed for 60 min.
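The intersection step can be sketched as fitting one polynomial per response and solving for the SF at which the two fitted curves cross. This is a minimal sketch under stated assumptions: both responses are taken to be on a common relative scale, the data are hypothetical, and the helper names (`polyfit`, `polyval`, `intersection`) are illustrative rather than the authors' code.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial coefficients [c0, c1, ...], lowest order first."""
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y*x^i.
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            f = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - s) / a[i][i]
    return coef


def polyval(coef, x):
    """Evaluate a polynomial given lowest-order-first coefficients."""
    return sum(c * x ** i for i, c in enumerate(coef))


def intersection(coef1, coef2, lo, hi, tol=1e-9):
    """Bisection for the x in [lo, hi] where the two fitted curves are equal."""
    def diff(x):
        return polyval(coef1, x) - polyval(coef2, x)

    if diff(lo) * diff(hi) > 0:
        raise ValueError("no sign change in [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(lo) * diff(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Hypothetical measurements: energy gain (1st order) and weight loss (2nd order).
sf = [5.0, 5.5, 6.0, 6.5, 7.0]
gain = [4.0, 5.0, 6.0, 7.0, 8.0]
loss = [1.0, 2.25, 4.0, 6.25, 9.0]
sf_opt = intersection(polyfit(sf, gain, 1), polyfit(sf, loss, 2), 5.0, 7.0)
```

Bisection needs the two curves to cross exactly once inside the bracketing interval, so the interval should be restricted to the range of SF values actually measured.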
Gmur, Stephan; Vogt, Daniel; Zabowski, Darlene; Moskal, L Monika
2012-01-01
The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R² 0.91 (p organic matter R² 0.98 (p organic matter for upper soil horizons in a nondestructive method.
Das Sumonkanti
2011-11-01
Background: The study attempts to develop an ordinal logistic regression (OLR) model to identify the determinants of child malnutrition, instead of the traditional binary logistic regression (BLR) model, using data from the Bangladesh Demographic and Health Survey 2004. Methods: Based on the weight-for-age anthropometric index (Z-score), child nutrition status is categorized into three groups: severely undernourished ( Results: All the models determine that age of child, birth interval, mothers' education, maternal nutrition, household wealth status, child feeding index, and incidence of fever, ARI and diarrhoea were the significant predictors of child malnutrition; however, results of PPOM were more precise than those of other models. Conclusion: These findings clearly justify that OLR models (POM and PPOM) are appropriate to find predictors of malnutrition instead of BLR models.
Zhang, Kun; Huang, Feifei; Chen, Jie; Cai, Qingqing; Wang, Tong; Zou, Rong; Zuo, Zhiyi; Wang, Jingfeng; Huang, Hui
2014-11-01
Overweight and obesity are associated with adverse cardiovascular outcomes. However, the role of overweight and obesity in left ventricular hypertrophy (LVH) of hypertensive patients is controversial. The aim of the current meta-analysis was to evaluate the influence of overweight and obesity on LVH regression in the hypertensive population. Twenty-eight randomized controlled trials comprising 2403 hypertensive patients (mean age range: 43.8-66.7 years) were identified. Three groups were divided according to body mass index: normal weight, overweight, and obesity groups. Compared with the normal-weight group, LVH regression in the overweight and obesity groups was more obvious with less reduction of systolic blood pressure after antihypertensive therapies (P regressing LVH in overweight and obese hypertensive patients (19.27 g/m, 95% confidence interval [15.25, 23.29], P regression was found in 24-h ambulatory blood pressure monitoring (ABPM) group and in relatively young patients (40-60 years' old) group (P Overweight and obesity are independent risk factors for LVH in hypertensive patients. Intervention at an early age and monitoring by ABPM may facilitate therapy-induced LVH regression in overweight and obese hypertensive patients.
Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H
2017-02-01
At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification.
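The area under the receiver operating characteristic curve used to compare the models can be computed directly from predicted risks and observed outcomes. The sketch below uses the rank-based (Mann-Whitney) formulation; the function name and the example scores are hypothetical and are not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predicted risk per patient; labels: 1 = event, 0 = no event.
    Equals the probability that a randomly chosen event case receives a
    higher score than a randomly chosen non-event case, ties counting 0.5.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted 3-month mortality risks and outcomes.
perfect = auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
partial = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

A value of 0.5 corresponds to a model that ranks patients no better than chance, and 1.0 to one that ranks every death above every survivor.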
Predicting Student Success on the Texas Chemistry STAAR Test: A Logistic Regression Analysis
Johnson, William L.; Johnson, Annabel M.; Johnson, Jared
2012-01-01
Background: The context is the new Texas STAAR end-of-course testing program. Purpose: The authors developed a logistic regression model to predict who would pass-or-fail the new Texas chemistry STAAR end-of-course exam. Setting: Robert E. Lee High School (5A) with an enrollment of 2700 students, Tyler, Texas. Date of the study was the 2011-2012…
Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway
Fengfeng Wang; S. C. Cesar Wong; Lawrence W. C. Chan; Cho, William C. S.; S. P. Yip; Yung, Benjamin Y. M.
2014-01-01
Background. MicroRNA (miRNA) is a short and endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor for tumorigenesis of colorectal cancer (CRC), and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopt...
Javali Shivalingappa; Pandit Parameshwar
2010-01-01
Aim: The study aimed to determine the factors associated with periodontal disease (different levels of severity) by using different regression models for ordinal data. Design: A cross-sectional design was employed using clinical examination and 'questionnaire with interview' method. Materials and Methods: The study was conducted during June 2008 to October 2008 in Dharwad, Karnataka, India. It involved a systematic random sample of 1760 individuals aged 18-40 years. The periodon...
Unification of regression-based methods for the analysis of natural selection.
Morrissey, Michael B; Sakrejda, Krzysztof
2013-07-01
Regression analyses are central to characterization of the form and strength of natural selection in nature. Two common analyses that are currently used to characterize selection are (1) least squares-based approximation of the individual relative fitness surface for the purpose of obtaining quantitatively useful selection gradients, and (2) spline-based estimation of (absolute) fitness functions to obtain flexible inference of the shape of functions by which fitness and phenotype are related. These two sets of methodologies are often implemented in parallel to provide complementary inferences of the form of natural selection. We unify these two analyses, providing a method whereby selection gradients can be obtained for a given observed distribution of phenotype and characterization of a function relating phenotype to fitness. The method allows quantitatively useful selection gradients to be obtained from analyses of selection that adequately model nonnormal distributions of fitness, and provides unification of the two previously separate regression-based fitness analyses. We demonstrate the method by calculating directional and quadratic selection gradients associated with a smooth regression-based generalized additive model of the relationship between neonatal survival and the phenotypic traits of gestation length and birth mass in humans.
SHAO, Xueguang; CHEN, Da; XU, Heng; LIU, Zhichao; CAI, Wensheng
2009-01-01
Partial least-squares (PLS) regression has been presented as a powerful tool for spectral quantitative measurement. However, the improvement of the robustness and stability of PLS models is still needed, because it is difficult to build a stable model when complex samples are analyzed or outliers are contained in the calibration data set. To achieve the purpose, a robust ensemble PLS technique based on probability resampling was proposed, which is named RE-PLS. In the proposed method, a probability is firstly obtained for each calibration sample from its residual in a robust regression. Then, multiple PLS models are constructed based on probability resampling. At last, the multiple PLS models are used to predict unknown samples by taking the average of the predictions from the multiple models as final prediction result. To validate the effectiveness and universality of the proposed method, it was applied to two different sets of NIR spectra. The results show that RE-PLS can not only effectively avoid the interference of outliers but also enhance the precision of prediction and the stability of PLS regression. Thus, it may provide a useful tool for multivariate calibration with multiple outliers.
Clegg, Samuel M [Los Alamos National Laboratory; Barefield, James E [Los Alamos National Laboratory; Wiens, Roger C [Los Alamos National Laboratory; Dyar, Melinda D [MT HOLYOKE COLLEGE; Schafer, Martha W [LSU; Tucker, Jonathan M [MT HOLYOKE COLLEGE
2008-01-01
The ChemCam instrument on the Mars Science Laboratory (MSL) will include a laser-induced breakdown spectrometer (LIBS) to quantify major and minor elemental compositions. The traditional analytical chemistry approach to calibration curves for these data regresses a single diagnostic peak area against concentration for each element. This approach contrasts with a new multivariate method in which elemental concentrations are predicted by step-wise multiple regression analysis based on areas of a specific set of diagnostic peaks for each element. The method is tested on LIBS data from igneous and metamorphosed rocks. Between 4 and 13 partial regression coefficients are needed to describe each elemental abundance accurately (i.e., with a regression line of R² > 0.9995 for the relationship between predicted and measured elemental concentration) for all major and minor elements studied. Validation plots suggest that the method is limited at present by the small data set, and will work best for prediction of concentration when a wide variety of compositions and rock types has been analyzed.
Julio Cesar de Oliveira
2014-04-01
MODerate resolution Imaging Spectroradiometer (MODIS) data are largely used in multitemporal analysis of various Earth-related phenomena, such as vegetation phenology, land use/land cover change, deforestation monitoring, and time series analysis. In general, the MODIS products used to undertake multitemporal analysis are composite mosaics of the best pixels over a certain period of time. However, it is common to find bad pixels in the composition that affect the time series analysis. We present a filtering methodology that considers the pixel position (location in space) and time (position in the temporal data series) to define a new value for the bad pixel. This methodology, called Window Regression (WR), estimates the value of the point of interest, based on the regression analysis of the data selected by a spatial-temporal window. The spatial window is represented by eight pixels neighboring the pixel under evaluation, and the temporal window selects a set of dates close to the date of interest (either earlier or later). Intensities of noises were simulated over time and space, using the MOD13Q1 product. The method presented and other techniques (4253H twice, Mean Value Iteration (MVI) and Savitzky–Golay) were evaluated using the Mean Absolute Percentage Error (MAPE) and Akaike Information Criteria (AIC). The tests revealed the consistently superior performance of the Window Regression approach to estimate new Normalized Difference Vegetation Index (NDVI) values irrespective of the intensity of the noise simulated.
Frndak, Seth E; Smerbeck, Audrey M; Irwin, Lauren N; Drake, Allison S; Kordovski, Victoria M; Kunker, Katrina A; Khan, Anjum L; Benedict, Ralph H B
2016-10-01
We endeavored to clarify how distinct co-occurring symptoms relate to the presence of negative work events in employed multiple sclerosis (MS) patients. Latent profile analysis (LPA) was utilized to elucidate common disability patterns by isolating patient subpopulations. Samples of 272 employed MS patients and 209 healthy controls (HC) were administered neuroperformance tests of ambulation, hand dexterity, processing speed, and memory. Regression-based norms were created from the HC sample. LPA identified latent profiles using the regression-based z-scores. Finally, multinomial logistic regression tested for negative work event differences among the latent profiles. Four profiles were identified via LPA: a common profile (55%) characterized by slightly below average performance in all domains, a broadly low-performing profile (18%), a poor motor abilities profile with average cognition (17%), and a generally high-functioning profile (9%). Multinomial regression analysis revealed that the uniformly low-performing profile demonstrated a higher likelihood of reported negative work events. Employed MS patients with co-occurring motor, memory and processing speed impairments were most likely to report a negative work event, classifying them as uniquely at risk for job loss.
Introduction to regression graphics
Cook, R Dennis
2009-01-01
Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava
Analysis of extreme drinking in patients with alcohol dependence using Pareto regression.
Das, Sourish; Harel, Ofer; Dey, Dipak K; Covault, Jonathan; Kranzler, Henry R
2010-05-20
We developed a novel Pareto regression model with an unknown shape parameter to analyze extreme drinking in patients with Alcohol Dependence (AD). We used the generalized linear model (GLM) framework and the log-link to include the covariate information through the scale parameter of the generalized Pareto distribution. We proposed a Bayesian method based on Ridge prior and Zellner's g-prior for the regression coefficients. A simulation study indicated that the proposed Bayesian method performs better than the existing likelihood-based inference for the Pareto regression. We examined two issues of importance in the study of AD. First, we tested whether a single nucleotide polymorphism within the GABRA2 gene, which encodes a subunit of the GABA(A) receptor and has been associated with AD, influences 'extreme' alcohol intake; second, we examined the efficacy of three psychotherapies for alcoholism in treating extreme drinking behavior. We found an association between extreme drinking behavior and GABRA2. We also found that, at baseline, men with a high-risk GABRA2 allele had a significantly higher probability of extreme drinking than men with no high-risk allele. However, men with a high-risk allele responded to the therapy better than those with two copies of the low-risk allele. Women with high-risk alleles also responded to the therapy better than those with two copies of the low-risk allele, while women who received the cognitive behavioral therapy had better outcomes than those receiving either of the other two therapies. Among men, motivational enhancement therapy was the best for the treatment of the extreme drinking behavior.
L. Monika Moskal
2012-08-01
The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R^{2} 0.91 (p < 0.01) at 403, 470, 687, and 846 nm spectral band widths, carbonate R^{2} 0.95 (p < 0.01) at 531 and 898 nm band widths, total carbon R^{2} 0.93 (p < 0.01) at 400, 409, 441 and 907 nm band widths, and organic matter R^{2} 0.98 (p < 0.01) at 300, 400, 441, 832 and 907 nm band widths. Use of the 400 to 1,000 nm electromagnetic range utilizing regression trees provided a powerful, rapid and inexpensive method for assessing nitrogen, carbon, carbonate and organic matter for upper soil horizons in a nondestructive method.
Broe, Rebecca; Rasmussen, Malin Lundberg; Frydkjaer-Olsen, Ulrik;
2014-01-01
The aim was to investigate the long-term incidence of proliferative diabetic retinopathy (PDR), and progression and regression of diabetic retinopathy (DR) and associated risk factors in young Danish patients with Type 1 diabetes mellitus. In 1987-89, a pediatric cohort involving approximately 75...... % of all children with Type 1 diabetes in Denmark retinopathy graded and all relevant diabetic parameters assessed. Of those, 185 (54.6 %) were evaluated again in 2011 for the same clinical parameters. All retinal images...... were graded using modified early treatment of DR study for 1995 and 2011. In 1995, mean age was 21.0 years and mean diabetes duration 13.5 years. The 16-year incidence of proliferative retinopathy, 2-step progression and 2-step regression of DR was 31.0, 64.4 and 0.0 %, respectively, while...
REGRESSIVE ANALYSIS OF BRAKING EFFICIENCY OF M1 CATEGORY VEHICLES WITH ANTI-BLOCKING BRAKE SYSTEM
О. Sarayev
2015-07-01
The problem of assessing the effectiveness of vehicle braking after a road accident is considered. For the first time for modern vehicle models equipped with anti-lock brakes, regression models were obtained that describe the relationship between the coefficient of traction and the random variable of steady deceleration. This is consistent with the stochastic nature of the physical object, which is the process of vehicle braking, unlike the previously adopted method of formalizing this process with a deterministic function.
Vesnin, V. L.; Muradov, V. G.
2012-09-01
Absorption spectra of multicomponent hydrocarbon mixtures based on n-heptane and isooctane with addition of benzene (up to 1%) and toluene and o-xylene (up to 20%) were investigated experimentally in the region of the first overtones of the hydrocarbon groups (λ = 1620-1780 nm). It was shown that their concentrations could be determined separately by using a multiple linear regression method. The optimum result was obtained by including four wavelengths at 1671, 1680, 1685, and 1695 nm, which took into account absorption of CH groups in benzene, toluene, and o-xylene and CH3 groups, respectively.
Psoriasis regression analysis of MHC loci identifies shared genetic variants with vitiligo.
Kun-Ju Zhu
Psoriasis is a common inflammatory skin disease with genetic components of both the immune system and the epidermis. The PSOR1 locus (6q21) has been strongly associated with psoriasis; however, it is difficult to identify additional independent associations due to strong linkage disequilibrium in the MHC region. We performed stepwise regression analyses of more than 3,000 SNPs in the MHC region genotyped using Human 610-Quad (Illumina) in 1,139 cases with psoriasis and 1,132 controls of the Han Chinese population to search for additional independent associations. With four regression models obtained, two SNPs, rs9468925 in HLA-C/HLA-B and rs2858881 in HLA-DQA2, were repeatedly selected in all models, suggesting that multiple loci outside the PSOR1 locus were associated with psoriasis. More importantly, we find that rs9468925 in HLA-C/HLA-B is associated with both psoriasis and vitiligo, providing the first important evidence that two major skin diseases share a common genetic locus in the MHC, and a basis for elucidating the molecular mechanism of skin disorders.
Walker, Mary Ellen; Anonson, June; Szafron, Michael
2015-01-01
The relationship between political environment and health services accessibility (HSA) has not been the focus of any specific studies. The purpose of this study was to address this gap in the literature by examining the relationship between political environment and HSA. The relationship of the HSA indicators (physicians, nurses and hospital beds per 10 000 people) with political environment was analyzed with multiple least-squares regression using the components of democracy (electoral processes and pluralism, functioning of government, political participation, political culture, and civil liberties). The components of democracy were represented by the 2011 Economist Intelligence Unit Democracy Index (EIUDI) sub-scores. While controlling for a country's geographic location and level of democracy, we found that two components of a nation's political environment, functioning of government and political participation, and their interaction had significant relationships with the three HSA indicators. These study findings are of significance to health professionals because they examine the political contexts in which citizens access health services, they come from research that is the first of its kind, and they help explain the effect political environment has on health. © The Author 2014. Published by Oxford University Press on behalf of Royal Society of Tropical Medicine and Hygiene. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Das, Sumonkanti; Rahman, Rajwanur M
2011-11-14
The study attempts to develop an ordinal logistic regression (OLR) model to identify the determinants of child malnutrition instead of developing traditional binary logistic regression (BLR) model using the data of Bangladesh Demographic and Health Survey 2004. Based on weight-for-age anthropometric index (Z-score) child nutrition status is categorized into three groups-severely undernourished (malnutrition and severe malnutrition if the proportional odds assumption satisfies. The assumption is satisfied with low p-value (0.144) due to violation of the assumption for one co-variate. So partial proportional odds model (PPOM) and two BLR models have also been developed to check the applicability of the OLR model. Graphical test has also been adopted for checking the proportional odds assumption. All the models determine that age of child, birth interval, mothers' education, maternal nutrition, household wealth status, child feeding index, and incidence of fever, ARI & diarrhoea were the significant predictors of child malnutrition; however, results of PPOM were more precise than those of other models. These findings clearly justify that OLR models (POM and PPOM) are appropriate to find predictors of malnutrition instead of BLR models.
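For readers unfamiliar with the proportional odds model (POM) mentioned above, the following minimal sketch shows how category probabilities follow from cumulative logits. It assumes the common parameterization logit P(Y <= j) = theta_j - beta . x, with one coefficient vector shared across all thresholds (this sharing is the proportional odds assumption). The function names, thresholds, and coefficients are hypothetical and are not taken from the study.

```python
import math


def sigmoid(value):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-value))


def category_probabilities(thresholds, coefficients, predictors):
    """Category probabilities under a proportional odds model.

    thresholds: increasing cut-points theta_1 < ... < theta_{J-1}
    coefficients, predictors: the shared slopes and one subject's covariates
    Returns a list of J probabilities, ordered from lowest to highest category.
    """
    eta = sum(b * x for b, x in zip(coefficients, predictors))
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cumulative = [sigmoid(theta - eta) for theta in thresholds] + [1.0]
    probabilities = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probabilities.append(cumulative[j] - cumulative[j - 1])
    return probabilities


# Hypothetical three-category example (two thresholds, one covariate).
print(category_probabilities([-1.0, 1.0], [0.8], [0.5]))
```

A partial proportional odds model relaxes the shared-slope assumption by letting selected coefficients differ across thresholds, which is the situation the abstract describes for one covariate.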
Excel Implementation of Principal Component Regression Analysis
林建华
2015-01-01
Excel is a very powerful office application. This paper presents the complete algorithm and detailed procedure for principal component regression analysis using Excel's built-in formulas and functions. The calculated results are the same as those given by professional statistical software. Therefore, implementing principal component regression analysis in Excel is feasible.
Matson, Johnny L.; Kozlowski, Alison M.
2010-01-01
Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…
Nick, Todd G; Campbell, Kathleen M
2007-01-01
The Medical Subject Headings (MeSH) thesaurus used by the National Library of Medicine defines logistic regression models as "statistical models which describe the relationship between a qualitative dependent variable (that is, one which can take only certain discrete values, such as the presence or absence of a disease) and an independent variable." Logistic regression models are used to study effects of predictor variables on categorical outcomes and normally the outcome is binary, such as presence or absence of disease (e.g., non-Hodgkin's lymphoma), in which case the model is called a binary logistic model. When there are multiple predictors (e.g., risk factors and treatments) the model is referred to as a multiple or multivariable logistic regression model and is one of the most frequently used statistical models in medical journals. In this chapter, we examine both simple and multiple binary logistic regression models and present related issues, including interaction, categorical predictor variables, continuous predictor variables, and goodness of fit.
Ghosh, Debarchana; Manson, Steven M.
2008-01-01
In this paper, we present a hybrid approach, robust principal component geographically weighted regression (RPCGWR), in examining urbanization as a function of both extant urban land use and the effect of social and environmental factors in the Twin Cities Metropolitan Area (TCMA) of Minnesota. We used remotely sensed data to treat urbanization via the proxy of impervious surface. We then integrated two different methods, robust principal component analysis (RPCA) and geographically weighted ...
Brahma, K.C.; Pal, B.K.; Das, C. [CMPDI, Bhubaneswar (India)
2005-07-01
Different models of vibration studies are examined. A case analysis to determine the parameters governing the prediction of blast vibration in an opencast coal mine is described. A regression model was developed to evaluate peak particle velocity (PPV) of the blast. The results are applicable to forecasting ground vibration before blasting and to the design of various parameters in controlled blasting. 16 refs., 1 fig., 1 tab.
Stauffer ME; Weisenfluh L; Morrison A
2013-01-01
Melissa E Stauffer, Lauren Weisenfluh, Alan Morrison. SCRIBCO, Effort, PA, USA. Background: Triglyceride levels were found to be independently predictive of the development of primary coronary heart disease in epidemiologic studies. The objective of this study was to determine whether triglyceride levels were predictive of cardiovascular events in randomized controlled trials (RCTs) of lipid-modifying drugs. Methods: We performed a systematic review and meta-regression analysis of 40 RCTs of lipid...
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels
Linear regression analysis of oxygen ionic conductivity in co-doped electrolyte
XIE Guang-yuan; LI Jian; PU Jian; GUO Mi
2006-01-01
A mathematical model for the estimation of oxygen-ion conductivity of doped ZrO2 and CeO2 electrolytes was established based on the assumptions that the electronic conduction and defect association can be neglected. A linear regression method was employed to determine the parameters in the model. This model was confirmed by the published conductivity data of the doped ZrO2 and CeO2 electrolytes. In addition, a series of compositions in the Ce0.8Gd0.2-xMxO1.9-δ system (M is the co-dopant) was prepared, and their high-temperature conductivities were measured. The model was further validated by the measured conductivity data.
Sørensen, Jens Benn; Badsberg, Jens Henrik; Olsen, Jens
1989-01-01
The prognostic factors for survival in advanced adenocarcinoma of the lung were investigated in a consecutive series of 259 patients treated with chemotherapy. Twenty-eight pretreatment variables were investigated by use of Cox's multivariate regression model, including histological subtypes...... and degree of differentiation, the new international staging system for lung cancer, and seven laboratory parameters. Staging of the patients included bone marrow examination but was otherwise nonextensive without routine bone, liver, and brain scans. Factors predicting poor survival were low performance...... status, stage IV disease, no prior nonradical resection, liver metastases, high values of white blood cell count and lactate dehydrogenase, and low values of aspartate aminotransaminase. The nonradical resection may not be a prognostic factor because of the resection itself but may rather serve...
The Long-Term Impact of Human Capital Investment on GDP: A Panel Cointegrated Regression Analysis
Ahmet Gökçe Akpolat
2014-01-01
This study aims to determine the long-run impact of physical and human capital on GDP by using a panel data set of 13 developed and 11 developing countries over the period 1970–2010. Gross fixed capital formation is used as the physical capital indicator, while education expenditures and life expectancy at birth are used as human capital indicators. Panel DOLS and FMOLS cointegrated regression models are used to detect the magnitude and sign of the cointegration relationship and to compare the effect of these physical and human capital variables across the two country groups. According to the panel DOLS and FMOLS models, the impact of physical capital and education expenditures on GDP is higher in the developed countries than in the developing countries. On the other hand, the impact of life expectancy at birth on GDP is higher in the developing countries.
Słania J.
2014-10-01
The article presents the process of production of coated electrodes and their welding properties. The factors concerning the welding properties and the currently applied method of assessment are given. The testing methodology, based on measuring and recording instantaneous values of welding current and welding arc voltage, is discussed. An algorithm for creating the reference database of an expert system that aids the assessment of covered electrodes' welding properties is shown. The stability of voltage–current characteristics is discussed. Statistical factors of the instantaneous welding current and welding arc voltage waveforms, used for determining welding process stability, are presented. The results for the welding properties of coated electrodes are compared. The article presents the results of linear regression as well as the impact of the independent variables on the welding process performance. Finally, the conclusions drawn from the research are given.
Jonathan E. Leightner
2012-01-01
The omitted variables problem is one of the most serious problems in regression analysis. The standard approach to the omitted variables problem is to find instruments, or proxies, for the omitted variables, but this approach makes strong assumptions that are rarely met in practice. This paper introduces best projection reiterative truncated projected least squares (BP-RTPLS), the third generation of a technique that solves the omitted variables problem without using proxies or instruments. This paper presents a theoretical argument that BP-RTPLS produces unbiased reduced form estimates when there are omitted variables. This paper also provides simulation evidence that shows OLS produces between 250% and 2450% more errors than BP-RTPLS when there are omitted variables and when measurement and round-off error is 1 percent or less. In an example, the government spending multiplier is estimated using annual data for the USA between 1929 and 2010.
A Vector Auto Regression Model Applied to Real Estate Development Investment: A Statistic Analysis
Fengyun Liu
2016-10-01
This study analyzes the economic system dynamics of investment in real estate involving four main participants in China. Local governments limit the supply of commercial and residential land to raise fiscal revenue, and expand debts by land mortgage to develop industrial zones and parks. Led by local government, banks and real estate development enterprises forge a coalition on real estate investment and facilitate real estate price appreciation. The above theoretical model is supported empirically with the Vector Auto Regression (VAR) methodology. A panel VAR model shows that land leasing and real estate price appreciation positively affect local government general fiscal revenue. Additional VAR models find that bank credit, private funds, and foreign funds each have strong positive dynamic effects on housing prices. Housing prices also have a strong positive impact on speculation from private funds and hot money.
Kang, Seung-Wan; Byun, Gukdo; Park, Hun-Joon
2014-12-01
This paper presents empirical research into the relationship between leader-follower value congruence in social responsibility and the level of ethical satisfaction for employees in the workplace. 163 dyads were analyzed, each consisting of a team leader and an employee working at a large manufacturing company in South Korea. Following current methodological recommendations for congruence research, polynomial regression and response surface modeling methodologies were used to determine the effects of value congruence. Results indicate that leader-follower value congruence in social responsibility was positively related to the ethical satisfaction of employees. Furthermore, employees' ethical satisfaction was stronger when aligned with a leader with high social responsibility. The theoretical and practical implications are discussed.
Spatial Growth Regressions for the convergence analysis of renewable energy consumption in Europe
Lara Fontanella
2013-10-01
In recent years there has been increasing awareness of problems related to economic growth and of the conditions under which some socio-economic variables measured on European countries tend to converge over time towards a common level. This paper is concerned with the use of energy from renewable sources and considers the extent to which EU countries meet the binding commitment to reach a fifth of energy consumption from renewable sources by 2020. In discussing empirical results on the economic growth pattern of 28 countries in the period 1995-2010, we make use of several spatial growth regression models. We show that the proposed models are able to capture the complexity of the phenomenon, including the possibility of estimating site-specific convergence parameters and the identification of convergence clubs.
Hall, Rob; Fienberg, Stephen
2011-01-01
Preserving the privacy of individual databases when carrying out statistical calculations has a long history in statistics and has been the focus of much recent attention in machine learning. In this paper, we present a protocol for computing logistic regression when the data are held by separate parties without actually combining information sources, by exploiting results from the literature on multi-party secure computation. We provide only the final result of the calculation, in contrast to other methods that share intermediate values and thus present an opportunity for compromise of values in the combined database. Our paper has two themes: (1) the development of a secure protocol for computing the logistic parameters, and a demonstration of its performance in practice, and (2) an amended protocol that speeds up the computation of the logistic function. We illustrate the nature of the calculations and their accuracy using an extract of data from the Current Population Survey divided between two parties.
Assessment of participation bias in cohort studies: systematic review and meta-regression analysis.
Silva Junior, Sérgio Henrique Almeida da; Santos, Simone M; Coeli, Cláudia Medina; Carvalho, Marilia Sá
2015-11-01
The proportion of non-participation in cohort studies, if associated with both the exposure and the probability of occurrence of the event, can introduce bias in the estimates of interest. The aim of this study is to evaluate the impact of participation and its characteristics in longitudinal studies. A systematic review (MEDLINE, Scopus and Web of Science) for articles describing the proportion of participation in the baseline of cohort studies was performed. Among the 2,964 initially identified, 50 were selected. The average proportion of participation was 64.7%. Using a meta-regression model with mixed effects, only age, year of baseline contact and study region (borderline) were associated with participation. Considering the decrease in participation in recent years, and the cost of cohort studies, it is essential to gather information to assess the potential for non-participation, before committing resources. Finally, journals should require the presentation of this information in the papers.
Assessment of participation bias in cohort studies: systematic review and meta-regression analysis
Sérgio Henrique Almeida da Silva Junior
2015-11-01
The proportion of non-participation in cohort studies, if associated with both the exposure and the probability of occurrence of the event, can introduce bias in the estimates of interest. The aim of this study is to evaluate the impact of participation and its characteristics in longitudinal studies. A systematic review (MEDLINE, Scopus and Web of Science) for articles describing the proportion of participation in the baseline of cohort studies was performed. Among the 2,964 initially identified, 50 were selected. The average proportion of participation was 64.7%. Using a meta-regression model with mixed effects, only age, year of baseline contact and study region (borderline) were associated with participation. Considering the decrease in participation in recent years, and the cost of cohort studies, it is essential to gather information to assess the potential for non-participation, before committing resources. Finally, journals should require the presentation of this information in the papers.
A transcriptome analysis by lasso penalized Cox regression for pancreatic cancer survival.
Wu, Tong Tong; Gong, Haijun; Clarke, Edmund M
2011-12-01
Pancreatic cancer is the fourth leading cause of cancer deaths in the United States with five-year survival rates less than 5% due to rare detection in early stages. Identification of genes that are directly correlated to pancreatic cancer survival is crucial for pancreatic cancer diagnostics and treatment. However, no existing GWAS or transcriptome studies are available for addressing this problem. We apply lasso penalized Cox regression to a transcriptome study to identify genes that are directly related to pancreatic cancer survival. This method is capable of handling the right censoring effect of survival times and the ultrahigh dimensionality of genetic data. A cyclic coordinate descent algorithm is employed to rapidly select the most relevant genes and eliminate the irrelevant ones. Twelve genes have been identified and verified to be directly correlated to pancreatic cancer survival time and can be used for the prediction of future patient's survival.
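The cyclic coordinate descent idea is easiest to see on the ordinary linear lasso rather than on the Cox partial likelihood used in the paper, so the sketch below deliberately swaps in that simpler setting. It minimizes (1/2n) * ||y - X b||^2 + penalty * ||b||_1 by updating one coefficient at a time with a soft-threshold step. All names (`soft_threshold`, `lasso_coordinate_descent`, `penalty`, `sweeps`) are illustrative assumptions, and the fixed number of sweeps stands in for a proper convergence check.

```python
def soft_threshold(value, penalty):
    """Shrink value towards zero by penalty; exactly zero inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, sweeps=200):
    """Cyclic coordinate descent for the linear lasso (no intercept).

    rows: list of observations, each a list of p predictor values
    targets: list of n responses
    """
    n = len(rows)
    p = len(rows[0])
    beta = [0.0] * p
    residuals = list(targets)  # residuals for beta = 0
    for _ in range(sweeps):
        for j in range(p):
            column = [row[j] for row in rows]
            scale = sum(x * x for x in column) / n
            if scale == 0.0:
                continue  # an all-zero predictor can never enter the model
            # Correlation of predictor j with the partial residual.
            rho = sum(x * (r + x * beta[j])
                      for x, r in zip(column, residuals)) / n
            updated = soft_threshold(rho, penalty) / scale
            shift = updated - beta[j]
            if shift != 0.0:
                residuals = [r - x * shift
                             for x, r in zip(column, residuals)]
                beta[j] = updated
    return beta
```

The soft-threshold step is what sets weak predictors to exactly zero, which is how irrelevant genes are eliminated in a high-dimensional setting; the Cox version replaces the squared-error loss with the partial likelihood, which is not shown here.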
Noor Zaitun Yahaya
2017-01-01
This paper investigated the use of boosted regression trees (BRTs) to draw an inference about daytime and nighttime ozone formation in a coastal environment. Hourly ground-level ozone data for a full calendar year in 2010 were obtained from the Kemaman (CA 002) air quality monitoring station. A BRT model was developed using hourly ozone data as the response variable and nitric oxide (NO), nitrogen dioxide (NO2), nitrogen oxides (NOx), and meteorological parameters as explanatory variables. The ozone BRT algorithm model was constructed from multiple regression models, and the 'best iteration' of the BRT model was selected by optimizing prediction performance. Sensitivity testing of the BRT model was conducted to determine the best parameters and good explanatory variables. A number of trees between 2,500 and 3,500, a learning rate of 0.01, and an interaction depth of 5 were found to be the best settings for developing the ozone boosting model. The performance of the O3 boosting models was assessed: the fraction of predictions within a factor of two (FAC2), the coefficient of determination (R2), and the index of agreement (IOA) were 0.93, 0.69, and 0.73 for daytime and 0.79, 0.55, and 0.69 for nighttime, respectively. Results showed that the model developed was within the acceptable range and could be used to understand ozone formation and identify potential sources of ozone for estimating O3 concentrations during daytime and nighttime. Results indicated that wind speed, wind direction, relative humidity, and temperature were the most dominant variables influencing ozone formation. Finally, empirical evidence was obtained that wind blowing from coastal areas towards the interior region, especially from industrial areas, produces high ozone levels.
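The three evaluation statistics quoted above can be computed as in the following sketch. It assumes the usual air-quality definitions, which the abstract does not spell out: FAC2 as the share of predictions with 0.5 <= predicted/observed <= 2, R2 as the squared Pearson correlation, and IOA as Willmott's original index of agreement. The function names are hypothetical.

```python
def fac2(observed, predicted):
    """Fraction of predictions within a factor of two of the observations.

    Pairs with a non-positive observation are skipped (assumed convention).
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o > 0]
    inside = sum(1 for o, p in pairs if 0.5 <= p / o <= 2.0)
    return inside / len(pairs)


def r_squared(observed, predicted):
    """Squared Pearson correlation between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    soo = sum((o - mean_o) ** 2 for o in observed)
    spp = sum((p - mean_p) ** 2 for p in predicted)
    sop = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted))
    return sop * sop / (soo * spp)


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement: 1 is a perfect match, 0 is none."""
    mean_o = sum(observed) / len(observed)
    squared_error = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    potential_error = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
                          for o, p in zip(observed, predicted))
    return 1.0 - squared_error / potential_error
```

Reporting all three together is useful because FAC2 is robust to a few large misses, while R2 and IOA respond to how well the overall pattern and magnitude are reproduced.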
Gender roles and binge drinking among Latino emerging adults: a latent class regression analysis.
Vaughan, Ellen L; Wong, Y Joel; Middendorf, Katharine G
2014-09-01
Gender roles are often cited as a culturally specific predictor of drinking among Latino populations. This study used latent class regression to test the relationships between gender roles and binge drinking in a sample of Latino emerging adults. Participants were Latino emerging adults who participated in Wave III of the National Longitudinal Study of Adolescent Health (N = 2,442). A subsample of these participants (n = 660) completed the Bem Sex Role Inventory--Short. We conducted latent class regression using 3 dimensions of gender roles (femininity, social masculinity, and personal masculinity) to predict binge drinking. Results indicated a 3-class solution. In Class 1, the protective personal masculinity class, personal masculinity (e.g., being a leader, defending one's own beliefs) was associated with a reduction in the odds of binge drinking. In Class 2, the nonsignificant class, gender roles were not related to binge drinking. In Class 3, the mixed masculinity class, personal masculinity was associated with a reduction in the odds of binge drinking, whereas social masculinity (e.g., forceful, dominant) was associated with an increase in the odds of binge drinking. Post hoc analyses found that females, those born outside the United States, and those with greater English language usage were at greater odds of being in Class 1 (vs. Class 2). Males, those born outside the United States, and those with greater Spanish language usage were at greater odds of being in Class 3 (vs. Class 2). Directions for future research and implications for practice with Latino emerging adults are discussed.
Arch Index: An Easier Approach for Arch Height (A Regression Analysis)
Hironmoy Roy
2012-04-01
Background: Arch height is usually estimated in the supine posture, but the literature regards this as neither correct nor scientific and favours standing x-rays or the arch index as the yardstick. Standing x-rays can be troublesome in a busy OPD, whereas an ink footprint on a simple graph sheet is easier and cheaper to document and requires almost no machinery or expertise. Objective: This study aimed to redefine the relationship between radiological standing arch heights and the arch index by correlation and regression, so that the radiographic standing arch-height values can be derived indirectly from the arch index without taking the x-ray. Methods: The study involved 103 adult subjects attending a tertiary care hospital in North Bengal. The standing navicular and talar heights were measured from standing x-rays of the foot and ‘normalised’ by foot length. In parallel, footprints were obtained to compute the arch index. The variables were analysed with SPSS software. Result: The arch index showed significant negative correlations and simple linear regressions with standing navicular height, standing talar height, and the standing normalised navicular and talar heights, analysed in each sex separately, with supporting mathematical equations. Conclusion: To measure the standing arch height in a busy OPD, it is wise to take the footprint first. Once the arch index is known, it can be entered into the equations derived here to predict the standing arch heights in either sex.
Zardo, Pauline; Collie, Alex
2014-10-09
Use of research evidence in public health policy decision-making is affected by a range of contextual factors operating at the individual, organisational and external levels. Context-specific research is needed to target and tailor research translation intervention design and implementation to ensure that factors affecting research in a specific context are addressed. Whilst such research is increasing, there remain relatively few studies that have quantitatively assessed the factors that predict research use in specific public health policy environments. A quantitative survey was designed and implemented within two public health policy agencies in the Australian state of Victoria. Binary logistic regression analyses were conducted on survey data provided by 372 participants. Univariate logistic regression analyses of 49 factors revealed 26 factors that significantly predicted research use independently. The 26 factors were then tested in a single model and five factors emerged as significant predictors of research over and above all other factors. The five key factors that significantly predicted research use were the following: relevance of research to day-to-day decision-making, skills for research use, internal prompts for use of research, intention to use research within the next 12 months and the agency for which the individual worked. These findings suggest that individual- and organisational-level factors are the critical factors to target in the design of interventions aiming to increase research use in this context. In particular, relevance of research and skills for research use would be necessary to target. The likelihood for research use increased 11- and 4-fold for those who rated highly on these factors. This study builds on previous research and contributes to the currently limited number of quantitative studies that examine use of research evidence in a large sample of public health policy and program decision-makers within a specific context. The
An analysis of first-time blood donors return behaviour using regression models.
Kheiri, S; Alibeigi, Z
2015-08-01
Blood products have a vital role in saving many patients' lives. The aim of this study was to analyse blood donor return behaviour. Using a cross-sectional follow-up design of 5-year duration, 864 first-time donors who had donated blood were selected using a systematic sampling. The behaviours of donors via three response variables, return to donation, frequency of return to donation and the time interval between donations, were analysed based on logistic regression, negative binomial regression and Cox's shared frailty model for recurrent events respectively. Successful return to donation rated at 49·1% and the deferral rate was 13·3%. There was a significant reverse relationship between the frequency of return to donation and the time interval between donations. Sex, body weight and job had an effect on return to donation; weight and frequency of donation during the first year had a direct effect on the total frequency of donations. Age, weight and job had a significant effect on the time intervals between donations. Aging decreases the chances of return to donation and increases the time interval between donations. Body weight affects the three response variables, i.e. the higher the weight, the more the chances of return to donation and the shorter the time interval between donations. There is a positive correlation between the frequency of donations in the first year and the total number of return to donations. Also, the shorter the time interval between donations is, the higher the frequency of donations. © 2015 British Blood Transfusion Society.
Optimization of Game Formats in U-10 Soccer Using Logistic Regression Analysis
Amatria Mario
2016-12-01
Small-sided games provide young soccer players with better opportunities to develop their skills and progress as individual and team players. There is, however, little evidence on the effectiveness of different game formats in different age groups, and furthermore, these formats can vary between and even within countries. The Royal Spanish Soccer Association replaced the traditional grassroots 7-a-side format (F-7) with the 8-a-side format (F-8) in the 2011-12 season and the country’s regional federations gradually followed suit. The aim of this observational methodology study was to investigate which of these formats best suited the learning needs of U-10 players transitioning from 5-a-side futsal. We built a multiple logistic regression model to predict the success of offensive moves depending on the game format and the area of the pitch in which the move was initiated. Success was defined as a shot at the goal. We also built two simple logistic regression models to evaluate how the game format influenced the acquisition of technical-tactical skills. It was found that the probability of a shot at the goal was higher in F-7 than in F-8 for moves initiated in the Creation Sector-Own Half (0.08 vs 0.07) and the Creation Sector-Opponent's Half (0.18 vs 0.16). The probability was the same (0.04) in the Safety Sector. Children also had more opportunities to control the ball and pass or take a shot in the F-7 format (0.24 vs 0.20), and these were also more likely to be successful in this format (0.28 vs 0.19).
Hivert, Marie-France; Jablonski, Kathleen A.; Perreault, Leigh; Saxena, Richa; McAteer, Jarred B.; Franks, Paul W.; Hamman, Richard F.; Kahn, Steven E.; Haffner, Steven; Meigs, James B.; Altshuler, David; Knowler, William C.; Florez, Jose C.
2011-01-01
OBJECTIVE Over 30 loci have been associated with risk of type 2 diabetes at genome-wide statistical significance. Genetic risk scores (GRSs) developed from these loci predict diabetes in the general population. We tested if a GRS based on an updated list of 34 type 2 diabetes–associated loci predicted progression to diabetes or regression toward normal glucose regulation (NGR) in the Diabetes Prevention Program (DPP). RESEARCH DESIGN AND METHODS We genotyped 34 type 2 diabetes–associated variants in 2,843 DPP participants at high risk of type 2 diabetes from five ethnic groups representative of the U.S. population, who had been randomized to placebo, metformin, or lifestyle intervention. We built a GRS by weighting each risk allele by its reported effect size on type 2 diabetes risk and summing these values. We tested its ability to predict diabetes incidence or regression to NGR in models adjusted for age, sex, ethnicity, waist circumference, and treatment assignment. RESULTS In multivariate-adjusted models, the GRS was significantly associated with increased risk of progression to diabetes (hazard ratio [HR] = 1.02 per risk allele [95% CI 1.00–1.05]; P = 0.03) and a lower probability of regression to NGR (HR = 0.95 per risk allele [95% CI 0.93–0.98]; P < 0.0001). At baseline, a higher GRS was associated with a lower insulinogenic index (P < 0.001), confirming an impairment in β-cell function. We detected no significant interaction between GRS and treatment, but the lifestyle intervention was effective in the highest quartile of GRS (P < 0.0001). CONCLUSIONS A high GRS is associated with increased risk of developing diabetes and lower probability of returning to NGR in high-risk individuals, but a lifestyle intervention attenuates this risk. PMID:21378175
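The weighted genetic risk score described in the methods (each risk allele weighted by its reported effect size, then summed) can be sketched as below. The allele counts and weights are hypothetical placeholders, not values from the study, and the function name is an assumption.

```python
def genetic_risk_score(allele_counts, effect_sizes):
    """Sum of risk-allele counts (0, 1 or 2 per locus) times per-allele weights."""
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("one effect size is needed per locus")
    return sum(count * weight for count, weight in zip(allele_counts, effect_sizes))


# Hypothetical participant genotyped at three loci, with made-up weights
# standing in for published per-allele effect sizes.
counts = [2, 1, 0]
weights = [0.10, 0.20, 0.30]
grs = genetic_risk_score(counts, weights)
```

In this toy case the participant carries three risk alleles in total, and the score is the weighted sum 2 × 0.10 + 1 × 0.20 + 0 × 0.30.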
Zhang, Xiaona; Sun, Xiaoxuan; Wang, Junhong; Tang, Liou; Xie, Anmu
2017-01-01
Rapid eye movement sleep behavior disorder (RBD) is thought to be one of the most frequent preceding symptoms of Parkinson's disease (PD). However, the prevalence of RBD in PD reported in published studies is still inconsistent. We conducted a meta-analysis and meta-regression analysis to estimate the pooled prevalence. We searched the electronic databases PubMed, ScienceDirect, EMBASE and EBSCO up to June 2016 for related articles. STATA 12.0 statistics software was used to analyse the available data from each study. The prevalence of RBD in PD patients in each study was combined into a pooled prevalence with a 95 % confidence interval (CI). Subgroup analysis and meta-regression analysis were performed to search for the causes of the heterogeneity. A total of 28 studies with 6869 PD cases were deemed eligible and included in our meta-analysis based on the inclusion and exclusion criteria. The pooled prevalence of RBD in PD was 42.3 % (95 % CI 37.4-47.1 %). In subgroup analysis and meta-regression analysis, we found that the important causes of heterogeneity were the diagnostic criteria for RBD and the age of PD patients (P = 0.016 and P = 0.019, respectively). The results indicate that nearly half of PD patients suffer from RBD. Older age and longer duration are risk factors for RBD in PD. The minimal diagnostic criteria for RBD according to the International Classification of Sleep Disorders can be used to diagnose RBD in daily practice if polysomnography is not necessary.
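How per-study prevalences might be combined into a pooled estimate with a 95 % CI can be sketched as follows. The abstract does not say which pooling model was run in STATA, so this sketch assumes a DerSimonian-Laird random-effects model on untransformed proportions. The function name and study counts are hypothetical.

```python
from math import sqrt


def pooled_prevalence(events, totals):
    """Random-effects (DerSimonian-Laird) pooled proportion and 95% CI.

    Assumes every study proportion lies strictly between 0 and 1.
    """
    p = [e / n for e, n in zip(events, totals)]
    var = [pi * (1 - pi) / n for pi, n in zip(p, totals)]
    w = [1 / v for v in var]
    fixed = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)
    # Cochran's Q and the between-study variance tau^2.
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, p))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(p) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in var]
    pooled = sum(wi * pi for wi, pi in zip(w_re, p)) / sum(w_re)
    se = sqrt(1 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se


# Hypothetical studies: RBD cases and PD sample sizes.
estimate, lower, upper = pooled_prevalence([40, 55, 30], [100, 120, 90])
```

When the studies disagree more than sampling error alone would explain, tau2 becomes positive and the interval widens accordingly.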
Olive, David J
2017-01-01
This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
T. Nataraja Moorthy
2015-05-01
The human foot has been studied for a variety of reasons, i.e., for forensic as well as non-forensic purposes by anatomists, forensic scientists, anthropologists, physicians, podiatrists, and numerous other groups. An aspect of human identification that has received scant attention from forensic anthropologists is the study of human feet and the footprints made by the feet. The present study, conducted during 2013-2014, aimed to derive population-specific regression equations to estimate stature from the footprint anthropometry of indigenous adult Bidayuhs in the east of Malaysia. The study sample consisted of 480 bilateral footprints collected using a footprint kit from 240 Bidayuhs (120 males and 120 females), who consented to taking part in the study. Their ages ranged from 18 to 70 years. Stature was measured using a portable body meter device (SECA model 206). The data were analyzed using PASW Statistics version 20. In this investigation, better results were obtained in terms of the correlation coefficient (R) between stature and various footprint measurements and regression analysis in estimating the stature. The R values showed a positive and statistically significant (p < 0.001) relationship between the two parameters. The correlation coefficients in the pooled sample (0.861–0.882) were comparatively higher than those of males (0.762–0.795) and females (0.722–0.765) analysed separately. This study provided regression equations to estimate stature from footprints in the Bidayuh population. The results showed that the regression equations without sex indicators performed significantly better than models with sex indicators. The regression equations derived for a pooled sample can be used to estimate stature even when the sex of the person who left the footprint is unknown, as in real crime scenes.
Weisberg, Sanford
2005-01-01
Master linear regression techniques with a new edition of a classic text. Reviews of the Second Edition: "I found it enjoyable reading and so full of interesting material that even the well-informed reader will probably find something new . . . a necessity for all of those who do linear regression." -Technometrics, February 1987 "Overall, I feel that the book is a valuable addition to the now considerable list of texts on applied linear regression. It should be a strong contender as the leading text for a first serious course in regression analysis." -American Scientist, May-June 1987
Yun Joo Yoo
2013-11-01
Multi-marker methods for genetic association analysis can be performed for common and low frequency SNPs to improve power. Regression models are an intuitive way to formulate multi-marker tests. In previous studies we evaluated regression-based multi-marker tests for common SNPs, and through identification of bins consisting of correlated SNPs, developed a multi-bin linear combination (MLC) test that is a compromise between a 1df linear combination test and a multi-df global test. Bins of SNPs in high linkage disequilibrium (LD) are identified, and a linear combination of individual SNP statistics is constructed within each bin. Association with the phenotype is then represented by an overall statistic whose df equals the number of bins. In this report we evaluate multi-marker tests for SNPs that occur at low frequencies. There are many linear and quadratic multi-marker tests that are suitable for common or low frequency variant analysis. We compared the performance of the MLC tests with various linear and quadratic statistics in joint or marginal regressions. For these comparisons, we performed a simulation study of genotypes and quantitative traits for 85 genes with many low frequency SNPs based on HapMap Phase III. We compared the tests using 1) set of all SNPs in a gene, 2) set of common SNPs in a gene (MAF≥5%), 3) set of low frequency SNPs (1%≤MAF
Nishida, Keiichiro
2013-02-01
The purpose of this study was to quantitatively evaluate Akahori's preoperative classification of cubital tunnel syndrome. We analyzed the results for 57 elbows that were treated by a simple decompression procedure from 1997 to 2004. The relationship between each item of Akahori's preoperative classification and clinical stage was investigated based on the parameter distribution. We evaluated Akahori's classification system using multiple regression analysis, and investigated the association between the stage and treatment results. The usefulness of the regression equation was evaluated by analysis of variance of the expected and observed scores. In the parameter distribution, each item of Akahori's classification was mostly associated with the stage, but it was difficult to judge the severity of palsy. In the mathematical evaluation, the most effective item in determining the stage was sensory conduction velocity. It was demonstrated that the established regression equation was highly reliable (R = 0.922). Akahori's preoperative classification can also be used in postoperative classification, and this classification was correlated with postoperative prognosis. Our results indicate that Akahori's preoperative classification is a suitable system. It is reliable, reproducible and well-correlated with the postoperative prognosis. In addition, the established prediction formula is useful to reduce the diagnostic complexity of Akahori's classification.
Mokhtari, Mehdi; Miri, Mohammad; Nikoonahad, Ali; Jalilian, Ali; Naserifar, Razi; Ghaffari, Hamid Reza; Kazembeigi, Farogh
2016-11-01
The aim of this study was to investigate the impact of environmental factors on cutaneous leishmaniasis (CL) prevalence and morbidity in Ilam province, western Iran, a known endemic area for this disease. Accurate locations of 3237 CL patients diagnosed from 2013 to 2015, their demographic information, and data on 17 potentially predictive environmental variables (PPEVs) were prepared for use in Geographic Information System (GIS) and Land-Use Regression (LUR) analysis. The prevalence, risk, and predictive risk maps were produced using the Inverse Distance Weighting (IDW) model in GIS software. Regression analysis was used to determine how environmental variables affect CL prevalence. All maps and regression models were developed based on the annual and three-year average CL prevalence. The results showed a statistically significant relationship (P value ≤ 0.05) between CL prevalence and 11 (64%) PPEVs, which were elevation, population, rainfall, temperature, urban land use, poorland, dry farming, inceptisol and aridisol soils, and forest and irrigated lands. The highest probability of CL prevalence was predicted in the west of the study area, along the frontier with Iraq. An inverse relationship was found between CL prevalence and environmental factors including elevation, covering soil, rainfall, and agricultural irrigation, while the relationship was positive for temperature, urban land use, and population density. Environmental factors were found to be important predictive variables for CL prevalence and should be considered in management strategies for CL control.
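The Inverse Distance Weighting step used for the maps can be illustrated with a minimal sketch. It assumes planar coordinates and a distance exponent of 2, a common default that the abstract does not confirm. The sample points and the function name are hypothetical.

```python
from math import hypot


def idw_estimate(known_points, x, y, power=2.0):
    """Interpolate a value at (x, y) from (xi, yi, value) samples.

    Each sample is weighted by 1 / distance**power, so nearby samples
    dominate the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in known_points:
        distance = hypot(x - xi, y - yi)
        if distance == 0.0:
            return value  # query coincides with a sampled location
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical prevalence values (cases per 1000) at two village locations.
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
midpoint_value = idw_estimate(samples, 1.0, 0.0)
```

A query point midway between two samples receives equal weights and therefore their plain average.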
Schümberg, Katharina; Polyakova, Maryna; Steiner, Johann; Schroeter, Matthias L
2016-01-01
S100B has been linked to glial pathology in several psychiatric disorders. Previous studies found higher S100B serum levels in patients with schizophrenia compared to healthy controls, and a number of covariates influencing the size of this effect have been proposed in the literature. Here, we conducted a meta-analysis and meta-regression analysis on alterations of serum S100B in schizophrenia in comparison with healthy control subjects. The meta-analysis followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement to guarantee a high quality and reproducibility. With strict inclusion criteria 19 original studies could be included in the quantitative meta-analysis, comprising a total of 766 patients and 607 healthy control subjects. The meta-analysis confirmed higher values of the glial serum marker S100B in schizophrenia if compared with control subjects. Meta-regression analyses revealed significant effects of illness duration and clinical symptomatology, in particular the total score of the Positive and Negative Syndrome Scale (PANSS), on serum S100B levels in schizophrenia. In sum, results confirm glial pathology in schizophrenia that is modulated by illness duration and related to clinical symptomatology. Further studies are needed to investigate mechanisms and mediating factors related to these findings.
Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.
1998-01-01
The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.
Chang, Yu-Hung; Lei, Chen-Chou; Lin, Kun-Chen; Chang, Dao-Ming; Hsieh, Chang-Hsun; Lee, Yau-Jiunn
2016-09-01
To investigate the association of serum uric acid level with renal function change in patients with type 2 diabetes mellitus (T2DM). T2DM patients who had been followed-up for at least 3 years were included. Participants were categorized into stable, progression, or regression groups according to their change in chronic kidney disease (CKD) stage. During the follow-up period, all numeric values of metabolic factors, including the uric acid level and the medication possession rate, were calculated in order to investigate their associations with CKD development. Multivariate Cox regression analyses were used to identify independent factors associated with change in CKD. A total of 2367 T2DM patients were enrolled in this study and followed-up for a mean of 4.6 years. The numbers of patients in the stable, progression and regression groups were 1133 (47.9%), 487 (20.6%), and 747 (31.5%), respectively. The progression group had the highest serum uric acid level (6.9 ± 1.8 mg/dL), and the regression group had the lowest uric acid level (5.4 ± 1.5 mg/dL). In addition, we found that the serum uric acid level was an independent factor associated with CKD progression when the value exceeded 6.3 mg/dL. A lower uric acid level could be beneficial for CKD improvement in T2DM patients with stage 3-5 CKD. Our data indicated that the serum uric acid level is associated with CKD regression and progression and suggested that a high normal serum uric acid level should be closely monitored in patients with T2DM. Copyright © 2015 John Wiley & Sons, Ltd.
Tosun Erdi
2017-01-01
This study was aimed at estimating the variation of several engine control parameters within the rotational speed-load map, using regression analysis and artificial neural network techniques. Duration of injection, specific fuel consumption, exhaust gas at turbine inlet, and within the catalytic converter brick were chosen as the output parameters for the models, while engine speed and brake mean effective pressure were selected as independent variables for prediction. Measurements were performed on a turbocharged direct injection spark ignition engine fueled with gasoline. A three-layer feed-forward structure and back-propagation algorithm was used for training the artificial neural network. It was concluded that this technique is capable of predicting engine parameters with better accuracy than linear and non-linear regression techniques.
Ghosh, Debarchana; Manson, Steven M
2008-01-01
In this paper, we present a hybrid approach, robust principal component geographically weighted regression (RPCGWR), in examining urbanization as a function of both extant urban land use and the effect of social and environmental factors in the Twin Cities Metropolitan Area (TCMA) of Minnesota. We used remotely sensed data to treat urbanization via the proxy of impervious surface. We then integrated two different methods, robust principal component analysis (RPCA) and geographically weighted regression (GWR) to create an innovative approach to model urbanization. The RPCGWR results show significant spatial heterogeneity in the relationships between proportion of impervious surface and the explanatory factors in the TCMA. We link this heterogeneity to the "sprawling" nature of urban land use that has moved outward from the core Twin Cities through to their suburbs and exurbs.
A Vector Approach to Regression Analysis and Its Implications to Heavy-Duty Diesel Emissions
McAdams, H.T.
2001-02-14
An alternative approach is presented for the regression of response data on predictor variables that are not logically or physically separable. The methodology is demonstrated by its application to a data set of heavy-duty diesel emissions. Because of the covariance of fuel properties, it is found advantageous to redefine the predictor variables as vectors, in which the original fuel properties are components, rather than as scalars each involving only a single fuel property. The fuel property vectors are defined in such a way that they are mathematically independent and statistically uncorrelated. Because the available data set does not allow definitive separation of vehicle and fuel effects, and because test fuels used in several of the studies may be unrealistically contrived to break the association of fuel variables, the data set is not considered adequate for development of a full-fledged emission model. Nevertheless, the data clearly show that only a few basic patterns of fuel-property variation affect emissions and that the number of these patterns is considerably less than the number of variables initially thought to be involved. These basic patterns, referred to as "eigenfuels," may reflect blending practice in accordance with their relative weighting in specific circumstances. The methodology is believed to be widely applicable in a variety of contexts. It promises an end to the threat of collinearity and the frustration of attempting, often unrealistically, to separate variables that are inseparable.
Tao Gao
2014-01-01
Extreme precipitation is likely to be one of the most severe meteorological disasters in China; however, studies of the physical factors affecting precipitation extremes, and corresponding prediction models, are not readily available. From a new point of view, the sensible heat flux (SHF) and latent heat flux (LHF), which have significant impacts on summer extreme rainfall in the Yangtze River basin (YRB), have been quantified, and the impact factors were then selected. Firstly, a regional extreme precipitation index was applied to determine Regions of Significant Correlation (RSC) by analyzing the spatial distribution of correlation coefficients between this index and SHF, LHF, and sea surface temperature (SST) on a global ocean scale; then the time series of SHF, LHF, and SST in the RSCs during 1967–2010 were selected. Furthermore, other factors that significantly affect variations in precipitation extremes over the YRB were also selected. Multiple stepwise regression and leave-one-out cross-validation (LOOCV) were used to analyze and test the influencing factors and the statistical prediction model. The correlation coefficient between the observed regional extreme index and the model simulation result is 0.85, significant at the 99% level. This suggests that the forecast skill was acceptable, although many aspects of the prediction model should be improved.
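The leave-one-out cross-validation step can be sketched as follows. For brevity the sketch uses a single predictor and ordinary least squares in place of the multiple stepwise regression of the study, and all names and data are hypothetical. Each year is predicted from a model fitted to the remaining years, and skill is summarised by the correlation between the observed and held-out predicted values.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope


def loocv_predictions(x, y):
    """Predict each y[i] from a model fitted without observation i."""
    predictions = []
    for i in range(len(x)):
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(intercept + slope * x[i])
    return predictions


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return cov / sqrt(sum((ai - mean_a) ** 2 for ai in a)
                      * sum((bi - mean_b) ** 2 for bi in b))


# Hypothetical predictor (e.g. a heat-flux anomaly) and extreme-rainfall index.
flux = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5]
index = [1.0, 1.9, 2.1, 2.8, 3.6, 3.9]
skill = pearson_r(index, loocv_predictions(flux, index))
```

Because every prediction comes from a model that never saw the held-out year, the resulting correlation is a less optimistic skill estimate than the in-sample fit.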
Snyder, Carolyn W.
2016-09-01
Statistical challenges often preclude comparisons among different sea surface temperature (SST) reconstructions over the past million years. Inadequate consideration of uncertainty can result in misinterpretation, overconfidence, and biased conclusions. Here I apply Bayesian hierarchical regressions to analyze local SST responsiveness to climate changes for 54 SST reconstructions from across the globe over the past million years. I develop methods to account for multiple sources of uncertainty, including the quantification of uncertainty introduced from absolute dating into interrecord comparisons. The estimates of local SST responsiveness explain 64% (62% to 77%, 95% interval) of the total variation within each SST reconstruction with a single number. There is remarkable agreement between SST proxy methods, with the exception of Mg/Ca proxy methods estimating muted responses at high latitudes. The Indian Ocean exhibits a muted response in comparison to other oceans. I find a stable estimate of the proposed "universal curve" of change in local SST responsiveness to climate changes as a function of sin²(latitude) over the past 400,000 years: SST change at 45°N/S is larger than the average tropical response by a factor of 1.9 (1.5 to 2.6, 95% interval) and explains 50% (35% to 58%, 95% interval) of the total variation between each SST reconstruction. These uncertainty and statistical methods are well suited for application across paleoclimate and environmental data series intercomparisons.
Regression analysis of the structure function for reliability evaluation of continuous-state system
Gamiz, M.L.; Martinez Miranda, M.D.
2010-02-15
Technical systems are designed to perform an intended task with an admissible range of efficiency. According to this idea, it is permissible that the system runs among different levels of performance, in addition to complete failure and the perfect functioning one. As a consequence, reliability theory has evolved from binary-state systems to the most general case of continuous-state system, in which the state of the system changes over time through some interval on the real number line. In this context, obtaining an expression for the structure function becomes difficult, compared to the discrete case, with difficulty increasing as the number of components of the system increases. In this work, we propose a method to build a structure function for a continuum system by using multivariate nonparametric regression techniques, in which certain analytical restrictions on the variable of interest must be taken into account. Once the structure function is obtained, some reliability indices of the system are estimated. We illustrate our method via several numerical examples.
Kang, Gumin; Lee, Kwangchil; Park, Haesung; Lee, Jinho; Jung, Youngjean; Kim, Kyoungsik; Son, Boongho; Park, Hyoungkuk
2010-06-15
Mixed hydrofluoric and nitric acids are widely used as an etchant for the pickling of stainless steels. Cost reduction and procedure optimization in the manufacturing process can be facilitated by optically detecting the concentration of the mixed acids. In this work, we developed a novel method that allows us to obtain the concentrations of hydrofluoric acid (HF) and nitric acid (HNO3) in mixture samples with high accuracy. The experiments were carried out at room temperature on mixed acids consisting of HF (0.5-3 wt%) and HNO3 (2-12 wt%). Fourier transform Raman spectroscopy was used to measure the concentration of the mixed acids HF and HNO3, because the mixture sample has several strong Raman bands caused by the vibrational modes of each acid in this spectrum. The calibration of spectral data was performed using the partial least squares regression method, which is ideal for local range data treatment. Several figures of merit (FOM) were calculated using the concept of net analyte signal (NAS) to evaluate the performance of our methodology.
Blood gas tensions in adult asthma: a systematic review and meta-regression analysis.
Johansen, Troels; Johansen, Peter; Dahl, Ronald
2014-11-01
The last half-century has seen substantial changes in asthma treatment and care. We investigated whether arterial blood gas parameters in acute and non-acute asthma have changed historically. We performed a systematic search of the literature for studies reporting P(aO2) , P(aCO2) and forced expiratory volume in 1 s, percentage of predicted (FEV1%). For each of the blood gas parameters, meta-regression analyses examined its association with four background variables: the publication year, mean FEV1%, mean age and female fraction in the respective studies. After screening, we included 43 articles comprising 61 datasets published between 1967 and 2013. In studies of habitual-state asthma, mean P(aO2) was positively associated with the publication year (p = 0.001) and negatively with mean age (p blood gas levels were unassociated with publication year and mean age, mean P(aO2) was positively associated with FEV1% (p arterial pH associated with any of the predictor variables. In studies of habitual-state asthma, mean reported P(aO2) and P(aCO2) levels were found to have increased since 1967. In acute asthma studies, mean P(aO2) and P(aCO2) were associated with mean FEV1% but not with either publication year or patient age.
Expert Involvement Predicts mHealth App Downloads: Multivariate Regression Analysis of Urology Apps.
Pereira-Azevedo, Nuno; Osório, Luís; Cavadas, Vitor; Fraga, Avelino; Carrasquinho, Eduardo; Cardoso de Oliveira, Eduardo; Castelo-Branco, Miguel; Roobol, Monique J
2016-07-15
Urological mobile medical (mHealth) apps are gaining popularity with both clinicians and patients. mHealth is a rapidly evolving and heterogeneous field, with some urology apps being downloaded over 10,000 times and others not at all. The factors that contribute to medical app downloads have yet to be identified, including the hypothetical influence of expert involvement in app development. The objective of our study was to identify predictors of the number of urology app downloads. We reviewed urology apps available in the Google Play Store and collected publicly available data. Multivariate ordinal logistic regression evaluated the effect of publicly available app variables on the number of apps being downloaded. Of 129 urology apps eligible for study, only 2 (1.6%) had >10,000 downloads, with half having ≤100 downloads and 4 (3.1%) having none at all. Apps developed with expert urologist involvement (P=.003), optional in-app purchases (P=.01), higher user rating (PApp cost was inversely related to the number of downloads (Papp development is likely to enhance its chances to have a higher number of downloads. This finding should help in the design of better apps and further promote urologist involvement in mHealth. Official certification processes are required to ensure app quality and user safety.
Cabras, Stefano; Castellanos, Maria Eugenia; Perra, Silvia
2014-11-20
This paper considers the problem of selecting a set of regressors when the response variable is distributed according to a specified parametric model and observations are censored. Under a Bayesian perspective, the most widely used tools are Bayes factors (BFs), which are undefined when improper priors are used. In order to overcome this issue, fractional (FBF) and intrinsic (IBF) BFs have become common tools for model selection. Both depend on the size, Nt, of a minimal training sample (MTS), while the IBF also depends on the specific MTS used. In the case of regression with censored data, the definition of an MTS is problematic because only uncensored data allow the improper prior to be turned into a proper posterior, and also because full exploration of the space of the MTSs, which also includes censored observations, is needed to avoid bias in model selection. To address this concern, a sequential MTS was proposed, but it has the drawback of increasing the number of possible MTSs as Nt becomes random. For this reason, we explore the behaviour of the FBF, contextualizing its definition to censored data. We show that these are consistent, providing also the corresponding fractional prior. Finally, a large simulation study and an application to real data are used to compare IBF, FBF and the well-known Bayesian information criterion.
A Vector Approach to Regression Analysis and Its Implications to Heavy-Duty Diesel Emissions
McAdams, H.T.
2001-02-14
An alternative approach is presented for the regression of response data on predictor variables that are not logically or physically separable. The methodology is demonstrated by its application to a data set of heavy-duty diesel emissions. Because of the covariance of fuel properties, it is found advantageous to redefine the predictor variables as vectors, in which the original fuel properties are components, rather than as scalars each involving only a single fuel property. The fuel property vectors are defined in such a way that they are mathematically independent and statistically uncorrelated. Because the available data set does not allow definitive separation of vehicle and fuel effects, and because test fuels used in several of the studies may be unrealistically contrived to break the association of fuel variables, the data set is not considered adequate for development of a full-fledged emission model. Nevertheless, the data clearly show that only a few basic patterns of fuel-property variation affect emissions and that the number of these patterns is considerably less than the number of variables initially thought to be involved. These basic patterns, referred to as ''eigenfuels,'' may reflect blending practice in accordance with their relative weighting in specific circumstances. The methodology is believed to be widely applicable in a variety of contexts. It promises an end to the threat of collinearity and the frustration of attempting, often unrealistically, to separate variables that are inseparable.
Mahmut Zortuk
2016-08-01
The Environmental Kuznets Curve (EKC) introduces an inverted U-shaped relationship between environmental pollution and economic development. The inverted U-shaped curve is seen as a complete pattern for developed economies. However, our study tests the EKC for the developing transition economies of the European Union; therefore, our results could make a significant contribution to the literature. In this paper, the relationship between carbon dioxide (CO2) emissions, gross domestic product (GDP), energy use and urban population is investigated in the Transition Economies (Bulgaria, Croatia, Czech Republic, Estonia, Hungary, Latvia, Lithuania, Poland, Romania, Slovakia and Slovenia). The Environmental Kuznets Curve is tested by panel smooth transition regression for these economies for the period 1993–2010. As a result of the study, the null hypothesis of linearity was rejected, and the no-remaining-nonlinearity test showed that a smooth transition exists between two regimes (the first below $5176 GDP per capita and the second above $5176 GDP per capita) in the related period for these economies.
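The two-regime idea in the abstract above can be sketched with the logistic transition function commonly used in smooth transition regression. In this Python sketch only the threshold of 5176 comes from the abstract; the smoothness value `gamma`, the slope values, and the function names are hypothetical choices for illustration, not estimates from the study.

```python
import math

def transition(q, gamma, c):
    """Logistic transition function: near 0 well below c, near 1 well above c, 0.5 at q == c."""
    return 1.0 / (1.0 + math.exp(-gamma * (q - c)))

def effective_slope(q, beta_low, beta_shift, gamma, c):
    """Regression slope that moves smoothly from beta_low to beta_low + beta_shift."""
    return beta_low + beta_shift * transition(q, gamma, c)

THRESHOLD = 5176.0   # GDP per capita threshold reported in the abstract
GAMMA = 0.002        # hypothetical smoothness; larger values give a sharper switch

# Hypothetical slopes: positive in the low-income regime, negative in the high-income one.
for gdp in (1000.0, 5176.0, 9000.0):
    print(gdp, effective_slope(gdp, 1.0, -2.0, GAMMA, THRESHOLD))
```

As `gamma` grows the function approaches a step at the threshold, so an abrupt threshold model is a limiting case of the smooth one.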
Shirali, Mahmoud; Nielsen, Vivi Hunnicke; Møller, Steen Henrik
Heritability of residual feed intake (RFI) increased from low to high over the growing period in male and female mink. The lowest heritability for RFI (male: 0.04 ± 0.01 standard deviation (SD); female: 0.05 ± 0.01 SD) was in early and the highest heritability (male: 0.33 ± 0.02; female: 0.34 ± 0.02 SD) was achieved at the late growth stages. The genetic correlation between different growth stages for RFI showed a high association (0.91 to 0.98) between early and late growing periods. However, phenotypic correlations were lower from 0.29 to 0.50. The residual variances were substantially higher at the end compared to the early growing period suggesting that heterogeneous residual variance should be considered for analyzing feed efficiency data in mink. This study suggests random regression methods are suitable for analyzing feed efficiency and that genetic selection for RFI in mink is promising.
Nora Fenske
BACKGROUND: Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. OBJECTIVE: We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. DESIGN: Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. RESULTS: At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. CONCLUSIONS: Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.
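Quantile regression, as used in the abstract above, replaces the squared-error loss of ordinary regression with the asymmetric "pinball" (check) loss. The Python sketch below shows the simplest case, a constant-only model, where minimising that loss recovers a sample quantile. The five data values and the function names are made up for illustration and are unrelated to the survey data.

```python
def pinball_loss(y, pred, tau):
    """Mean check loss of a constant prediction `pred` at quantile level `tau`."""
    total = 0.0
    for yi in y:
        diff = yi - pred
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y)

def best_constant(y, tau):
    """Constant minimising the pinball loss; a minimiser always exists among the data points."""
    return min(sorted(set(y)), key=lambda c: pinball_loss(y, c, tau))

# Hypothetical values with one outlier.
values = [1, 2, 3, 4, 100]
print(best_constant(values, 0.5))   # the median, unaffected by the outlier
print(best_constant(values, 0.35))  # a lower quantile
```

A full additive quantile regression replaces the constant with a sum of covariate effects but minimises the same loss, which is why it can describe the lower tail of the height-for-age distribution rather than only its mean.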
Parappagoudar, Mahesh B.; Pratihar, Dilip K.; Datta, Gouranga L.
2008-08-01
A cement-bonded moulding sand system takes a fairly long time to attain the required strength. Hence, the moulds prepared with cement as a bonding material will have to wait a long time for the metal to be poured. In this work, an accelerator was used to accelerate the process of developing the bonding strength. Regression analysis was carried out on the experimental data collected as per statistical design of experiments (DOE) to establish input-output relationships of the process. The experiments were conducted to measure compression strength and hardness (output parameters) by varying the input variables, namely amount of cement, amount of accelerator, water in the form of cement-to-water ratio, and testing time. A two-level full-factorial design was used for linear regression model, whereas a three-level central composite design (CCD) had been utilized to develop non-linear regression model. Surface plots and main effects plots were used to study the effects of amount of cement, amount of accelerator, water and testing time on compression strength, and mould hardness. It was observed from both the linear as well as non-linear models that amount of cement, accelerator, and testing time have some positive contributions, whereas cement-to-water ratio has negative contribution to both the above responses. Compression strength was found to have linear relationship with the amount of cement and accelerator, and non-linear relationship with the remaining process parameters. Mould hardness was seen to vary linearly with testing time and non-linearly with the other parameters. Analysis of variance (ANOVA) was performed to test statistical adequacy of the models. Twenty random test cases were considered to test and compare their performances. Non-linear regression models were found to perform better than the linear models for both the responses. An attempt was also made to express compression strength of the moulding sand system as a function of mould hardness.
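To make the two-level full-factorial design in the abstract above concrete, the following Python sketch enumerates the 16 runs for four coded factors and computes main effects as the difference between the mean response at the high and low levels. The response function and its coefficients are invented for illustration (the signs merely echo the directions reported in the abstract), and all names are hypothetical.

```python
from itertools import product

FACTORS = ["cement", "accelerator", "cement_to_water", "testing_time"]

def full_factorial(k):
    """All 2**k runs of a two-level design in coded units (-1 = low, +1 = high)."""
    return [list(run) for run in product((-1, 1), repeat=k)]

def main_effects(design, response):
    """Mean response at the high level minus mean response at the low level, per factor."""
    effects = []
    for j in range(len(design[0])):
        high = [r for run, r in zip(design, response) if run[j] == 1]
        low = [r for run, r in zip(design, response) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects

def toy_response(run):
    """Made-up linear response in coded units; NOT the fitted model from the study."""
    a, b, c, d = run
    return 10.0 + 2.0 * a + 1.0 * b - 1.5 * c + 0.5 * d

design = full_factorial(4)
response = [toy_response(run) for run in design]
effects = dict(zip(FACTORS, main_effects(design, response)))
print(effects)
```

In a balanced two-level design each main effect equals twice the corresponding coefficient of the coded linear model, which is why such designs are a convenient basis for fitting a linear regression.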
Kiessling Arndt H
2011-10-01
Background: We assessed the hemodynamic performance of various prostheses and the clinical outcomes after aortic valve replacement, in different age groups. Methods: One-hundred-and-twenty patients with isolated aortic valve stenosis were included in this prospective randomized trial and allocated in three age-groups to receive either pulmonary autograft (PA, n = 20) or mechanical prosthesis (MP, Edwards Mira n = 20) in group 1 (age 75. Clinical outcomes and hemodynamic performance were evaluated at discharge, six months and one year. Results: In group 1, patients with PA had significantly lower mean gradients than the MP (2.6 vs. 10.9 mmHg, p = 0.0005) with comparable left ventricular mass regression (LVMR). Morbidity included 1 stroke in the PA population and 1 gastrointestinal bleeding in the MP subgroup. In group 2, mean gradients did not differ significantly between both populations (7.0 vs. 8.9 mmHg, p = 0.81). The rate of LVMR and EF were comparable at 12 months; each group with one mortality. Morbidity included 1 stroke and 1 gastrointestinal bleeding in the stentless and 3 bleeding complications in the MP group. In group 3, mean gradients did not differ significantly (7.8 vs 6.5 mmHg, p = 0.06). Postoperative EF and LVMR were comparable. There were 3 deaths in the stented group and no mortality in the stentless group. Morbidity included 1 endocarditis and 1 stroke in the stentless compared to 1 endocarditis, 1 stroke and one pulmonary embolism in the stented group. Conclusions: Clinical outcomes justify valve replacement with either valve substitute in the respective age groups. The PA hemodynamically outperformed the MPs. Stentless valves, however, did not demonstrate significantly superior hemodynamics or outcomes in comparison to stented bioprosthesis or MPs.
Risk assessment of dengue fever in Zhongshan, China: a time-series regression tree analysis.
Liu, K-K; Wang, T; Huang, X-D; Wang, G-L; Xia, Y; Zhang, Y-T; Jing, Q-L; Huang, J-W; Liu, X-X; Lu, J-H; Hu, W-B
2017-02-01
Dengue fever (DF) is the most prevalent and rapidly spreading mosquito-borne disease globally. Control of DF is limited by barriers to vector control and integrated management approaches. This study aimed to explore the potential risk factors for autochthonous DF transmission and to estimate the threshold effects of high-order interactions among risk factors. A time-series regression tree model was applied to estimate the hierarchical relationship between reported autochthonous DF cases and the potential risk factors including the timeliness of DF surveillance systems (median time interval between symptom onset date and diagnosis date, MTIOD), mosquito density, imported cases and meteorological factors in Zhongshan, China from 2001 to 2013. We found that MTIOD was the most influential factor in autochthonous DF transmission. Monthly autochthonous DF incidence rate increased by 36·02-fold [relative risk (RR) 36·02, 95% confidence interval (CI) 25·26-46·78, compared to the average DF incidence rate during the study period] when the 2-month lagged moving average of MTIOD was >4·15 days and the 3-month lagged moving average of the mean Breteau Index (BI) was ⩾16·57. If the 2-month lagged moving average MTIOD was between 1·11 and 4·15 days and the monthly maximum diurnal temperature range at a lag of 1 month was <9·6 °C, the monthly mean autochthonous DF incidence rate increased by 14·67-fold (RR 14·67, 95% CI 8·84-20·51, compared to the average DF incidence rate during the study period). This study demonstrates that the timeliness of DF surveillance systems, mosquito density and diurnal temperature range play critical roles in the autochthonous DF transmission in Zhongshan. Better assessment and prediction of the risk of DF transmission is beneficial for establishing scientific strategies for DF early warning surveillance and control.
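The threshold effects described in the abstract above come from a regression tree, whose basic building block is a search for the cut point that best separates low and high responses. This Python sketch finds a single split by minimising the summed squared error of the two resulting groups. The six data points and the function names are invented for illustration and do not reproduce the study's data or thresholds.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total_sse) for the best single split of y on x."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical data: delay in days versus a monthly incidence measure.
delay_days = [1, 2, 3, 4, 5, 6]
incidence = [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]
print(best_split(delay_days, incidence))
```

A full tree applies the same search recursively within each group and across several predictors, which is how interactions between factors appear as nested thresholds.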
Maternal heavy alcohol use and toddler behavior problems: a fixed effects regression analysis.
Knudsen, Ann Kristin; Ystrom, Eivind; Skogen, Jens Christoffer; Torgersen, Leila
2015-10-01
Using data from the longitudinal Norwegian Mother and Child Cohort Study, the aims of the current study were to examine associations between postnatal maternal heavy alcohol use and toddler behavior problems, taking both observed and unobserved confounding factors into account by employing fixed effects regression models. Postnatal maternal heavy alcohol use (defined as drinking alcohol 4 or more times a week, or drinking 7 units or more per alcohol use episode) and toddler internalizing and externalizing behavior problems were assessed when the toddlers were aged 18 and 36 months. Maternal psychopathology, civil status and negative life events in the last year were included as time-variant covariates. Maternal heavy alcohol use was associated with toddler internalizing and externalizing behavior problems (p < 0.001) in the population when examined with generalized estimating equation models. The associations disappeared when observed and unobserved sources of confounding were taken into account in the fixed effects models [p = 0.909 for externalizing behaviors (b = 0.002, SE = 0.021), p = 0.928 for internalizing behaviors (b = 0.002, SE = 0.023)], with an even further reduction of the estimates with the inclusion of time-variant confounders. No causal effect was found between postnatal maternal heavy alcohol use and toddler behavior problems. Increased levels of behavior problems among toddlers of heavy drinking mothers should therefore be attributed to other adverse characteristics associated with these mothers, toddlers and families. This should be taken into account when interventions aimed at at-risk families identified by maternal heavy alcohol use are planned and conducted.
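The contrast in the abstract above between population-level associations and fixed effects estimates can be illustrated with the within transformation: subtracting each unit's own mean removes anything that is constant for that unit, observed or not. In this Python sketch the data are synthetic and noise-free, the true effect of the exposure is set to zero, and the unit-level trait, the two waves, and all function names are assumptions made for the example.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def within_transform(values, unit_ids):
    """Subtract each unit's own mean, removing all time-invariant unit effects."""
    groups = {}
    for v, u in zip(values, unit_ids):
        groups.setdefault(u, []).append(v)
    unit_means = {u: mean(vs) for u, vs in groups.items()}
    return [v - unit_means[u] for v, u in zip(values, unit_ids)]

# Synthetic two-wave panel (hypothetical): the outcome depends only on an
# unobserved unit-level trait that also happens to be correlated with the exposure.
TRUE_EFFECT = 0.0
units, exposure, outcome = [], [], []
for i in range(5):
    trait = 3.0 * i                          # unobserved, constant within unit
    for x in (i, i + 1 + (i % 2)):           # exposure at wave 1 and wave 2
        units.append(i)
        exposure.append(float(x))
        outcome.append(trait + TRUE_EFFECT * x)

pooled = ols_slope(exposure, outcome)        # confounded by the unit-level trait
fixed_effects = ols_slope(within_transform(exposure, units),
                          within_transform(outcome, units))
print(pooled, fixed_effects)
```

The pooled slope is clearly positive even though the exposure has no effect, while the within estimate recovers zero. Time-varying confounders are not removed by this transformation, which is why the study also adjusts for time-variant covariates.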
Unemployment and psychosocial outcomes to age 30: A fixed-effects regression analysis.
Fergusson, David M; McLeod, Geraldine F; Horwood, L John
2014-08-01
We aimed to examine the associations between exposure to unemployment and psychosocial outcomes over the period from 16 to 30 years, using data from a well-studied birth cohort. Data were collected over the course of the Christchurch Health and Development Study, a longitudinal study of a birth cohort of 1265 children, born in Christchurch in 1977, who have been studied to age 30. Assessments of unemployment and psychosocial outcomes (mental health, substance abuse/dependence, criminal offending, adverse life events and life satisfaction) were obtained at ages 18, 21, 25 and 30. Prior to adjustment, an increasing duration of unemployment was associated with significant increases in the risk of all psychosocial outcomes. These associations were adjusted for confounding using conditional, fixed-effects regression techniques. The analyses showed significant (p unemployment and major depression (p = 0.05), alcohol abuse/dependence (p = 0.043), illicit substance abuse/dependence (p = 0.017), property/violent offending (p unemployment. The findings suggested that the association between unemployment and psychosocial outcomes was likely to involve a causal process in which unemployment led to increased risks of adverse psychosocial outcomes. Effect sizes were estimated using attributable risk; exposure to unemployment accounted for between 4.2 and 14.0% (median 10.8%) of the risk of experiencing the significant psychosocial outcomes. The findings of this study suggest that exposure to unemployment had small but pervasive effects on psychosocial adjustment in adolescence and young adulthood. © The Royal Australian and New Zealand College of Psychiatrists 2014.
A non-linear regression method for CT brain perfusion analysis
Bennink, E.; Oosterbroek, J.; Viergever, M. A.; Velthuis, B. K.; de Jong, H. W. A. M.
2015-03-01
CT perfusion (CTP) imaging allows for rapid diagnosis of ischemic stroke. Generation of perfusion maps from CTP data usually involves deconvolution algorithms providing estimates for the impulse response function in the tissue. We propose the use of a fast non-linear regression (NLR) method that we postulate has similar performance to the current academic state-of-art method (bSVD), but that has some important advantages, including the estimation of vascular permeability, improved robustness to tracer-delay, and very few tuning parameters, that are all important in stroke assessment. The aim of this study is to evaluate the fast NLR method against bSVD and a commercial clinical state-of-art method. The three methods were tested against a published digital perfusion phantom earlier used to illustrate the superiority of bSVD. In addition, the NLR and clinical methods were also tested against bSVD on 20 clinical scans. Pearson correlation coefficients were calculated for each of the tested methods. All three methods showed high correlation coefficients (>0.9) with the ground truth in the phantom. With respect to the clinical scans, the NLR perfusion maps showed higher correlation with bSVD than the perfusion maps from the clinical method. Furthermore, the perfusion maps showed that the fast NLR estimates are robust to tracer-delay. In conclusion, the proposed fast NLR method provides a simple and flexible way of estimating perfusion parameters from CT perfusion scans, with high correlation coefficients. This suggests that it could be a better alternative to the current clinical and academic state-of-art methods.