A Solution to Separation and Multicollinearity in Multiple Logistic Regression.
Shen, Jianzhao; Gao, Sujuan
2008-10-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models are often seriously biased, or fail to exist at all, because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models, and it was shown to reduce bias and the non-existence problems. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither approach solves the problem that the other addresses. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study.
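The idea of combining Firth's penalty with a ridge term can be illustrated with a minimal sketch. It is not the authors' algorithm: the objective (log-likelihood plus 0.5*log|X'WX| minus 0.5*ridge*||beta||^2), the function name `double_penalized_logit`, the default `ridge=0.1` and the toy data are all assumptions made for illustration.

```python
import numpy as np

def double_penalized_logit(X, y, ridge=0.1, max_iter=200, tol=1e-8):
    """Logistic regression with Firth's bias-reducing penalty plus a ridge term.

    Assumed objective: loglik(beta) + 0.5*log|X'WX| - 0.5*ridge*||beta||^2.
    X must already contain an intercept column if one is wanted.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (w[:, None] * X)  # Fisher information X'WX
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
        h = w * np.einsum("ij,ij->i", X @ np.linalg.inv(info), X)
        # Firth-modified score, minus the gradient of the ridge penalty
        score = X.T @ (y - p + h * (0.5 - p)) - ridge * beta
        step = np.linalg.solve(info + ridge * np.eye(X.shape[1]), score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Perfectly separated toy data: x < 0 -> y = 0, x > 0 -> y = 1
x = np.array([-2.0, -1.0, 1.0, 2.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.0, 0.0, 1.0, 1.0])
beta_hat = double_penalized_logit(X, y)
print(beta_hat)
```

On these separated data an unpenalized fit would push the slope toward infinity, whereas the penalized iteration settles on a finite slope.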
Interpreting Multiple Logistic Regression Coefficients in Prospective Observational Studies
1982-11-01
prompted close examination of the issue at a workshop on hypertriglyceridemia where some of the cautions and perspectives given in this paper were...characteristics. If this is not the interest, then to isolate and understand the effect of a characteristic on CHD when it could be one of several interacting...also easily extended to the case when several independent variables are modeled in a multiple logistic equation. In this instance, if x1, x2, ..., x are
National Research Council Canada - National Science Library
Bielecki, John
2003-01-01
.... Previous research has demonstrated that a two-step logistic and multiple regression methodology for predicting cost growth produces desirable results versus traditional single-step regression...
Understanding logistic regression analysis.
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
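A small sketch can make the link between a logistic regression coefficient and an odds ratio concrete. The 2x2 table below is hypothetical, and `fit_logit` is an illustrative Newton-Raphson routine rather than any particular package's API. With a single binary predictor, the exponentiated slope should reproduce the classical cross-product odds ratio.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (w[:, None] * X)
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Hypothetical 2x2 table: 30 of 100 exposed and 10 of 100 unexposed subjects have the event
exposed = np.repeat([1.0, 1.0, 0.0, 0.0], [30, 70, 10, 90])
event = np.repeat([1.0, 0.0, 1.0, 0.0], [30, 70, 10, 90])
X = np.column_stack([np.ones_like(exposed), exposed])

beta = fit_logit(X, event)
odds_ratio = np.exp(beta[1])           # odds ratio for exposure from the model
cross_product = (30 * 90) / (70 * 10)  # classical 2x2 odds ratio
print(odds_ratio, cross_product)
```

Adding further columns to `X` would give odds ratios adjusted for those variables, which is the confounding control the abstract refers to.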
Directory of Open Access Journals (Sweden)
MILAD TAZIK
2017-11-01
Identifying cases in which road crashes result in fatality or injury of drivers may help improve their safety. In this study, datasets of crashes that occurred on the Tehran-Qom freeway, Iran, were examined using three models (multiple logistic regression, Bayesian logistic and classification tree) to analyse the contribution of several variables to fatal accidents. For the multiple logistic regression and Bayesian logistic models, the odds ratio was calculated for each variable. The model that best suited the identification of accident severity was determined based on the AIC and DIC criteria. Based on the results of these two models, rollover crashes (OR = 14.58, 95% CI: 6.8-28.6), not using a seat belt (OR = 5.79, 95% CI: 3.1-9.9), exceeding speed limits (OR = 4.02, 95% CI: 1.8-7.9) and being female (OR = 2.91, 95% CI: 1.1-6.1) were the most important factors in driver fatalities. In addition, the results of the classification tree model verified the findings of the other models.
Suzuki, Taku; Iwamoto, Takuji; Shizu, Kanae; Suzuki, Katsuji; Yamada, Harumoto; Sato, Kazuki
2017-05-01
This retrospective study was designed to investigate prognostic factors for postoperative outcomes for cubital tunnel syndrome (CubTS) using multiple logistic regression analysis with a large number of patients. Eighty-three patients with CubTS who underwent surgeries were enrolled. The following potential prognostic factors for disease severity were selected according to previous reports: sex, age, type of surgery, disease duration, body mass index, cervical lesion, presence of diabetes mellitus, Workers' Compensation status, preoperative severity, and preoperative electrodiagnostic testing. Postoperative severity of disease was assessed 2 years after surgery by Messina's criteria, an outcome measure specific to CubTS. Bivariate analysis was performed to select candidate prognostic factors for multiple linear regression analyses. Multiple logistic regression analysis was conducted to identify the association between postoperative severity and selected prognostic factors. Both bivariate and multiple linear regression analysis revealed only preoperative severity as an independent risk factor for poor prognosis, while other factors did not show any significant association. Although conflicting results exist regarding prognosis of CubTS, this study supports evidence from previous studies and concludes that early surgical intervention portends the most favorable prognosis. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.
Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.
Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A
2016-01-01
Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.
Hosmer, David W; Sturdivant, Rodney X
2013-01-01
A new edition of the definitive guide to logistic regression modeling for health science and other applications. This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-
Akkus, Zeki; Camdeviren, Handan; Celik, Fatma; Gur, Ali; Nas, Kemal
2005-09-01
To determine the risk factors of osteoporosis using a multiple binary logistic regression method and to assess the risk variables for osteoporosis, which is a major and growing health problem in many countries. We conducted a case-control study consisting of 126 postmenopausal healthy women as the control group and 225 postmenopausal osteoporotic women as the case group. The study was carried out in the Department of Physical Medicine and Rehabilitation, Dicle University, Diyarbakir, Turkey, between 1999 and 2002. The data from the 351 participants were collected using a standard questionnaire that contains 43 variables. A multiple logistic regression model was then used to evaluate the data and to find the best regression model. The regression model correctly classified 80.1% (281/351) of the participants. The specificity of the model was 67% (84/126 of the control group), while the sensitivity was 88% (197/225 of the case group). We found the distribution of the standardized residuals of the final model to be exponential using the Kolmogorov-Smirnov test (p=0.193). The receiver operating characteristic curve indicated that the model successfully predicted patients at risk for osteoporosis. This study suggests that low levels of dietary calcium intake, physical activity, education, and longer duration of menopause are independent predictors of the risk of low bone density in our population. Adequate dietary calcium intake in combination with maintaining daily physical activity, increasing educational level, decreasing birth rate, and duration of breast-feeding may contribute to healthy bones and play a role in practical prevention of osteoporosis in Southeast Anatolia. In addition, the findings of the present study indicate that using a multivariate statistical method such as multiple logistic regression in osteoporosis, which may be influenced by many variables, is better than univariate statistical evaluation.
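The classification figures quoted in this abstract can be checked with a few lines of arithmetic. The helper name `classification_summary` is illustrative. The counts are the ones reported above.

```python
def classification_summary(true_pos, n_cases, true_neg, n_controls):
    """Return sensitivity, specificity and overall accuracy as fractions."""
    sensitivity = true_pos / n_cases
    specificity = true_neg / n_controls
    accuracy = (true_pos + true_neg) / (n_cases + n_controls)
    return sensitivity, specificity, accuracy

# Counts quoted in the abstract: 197 of 225 cases and 84 of 126 controls
sens, spec, acc = classification_summary(197, 225, 84, 126)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}, accuracy={acc:.3f}")
```

The three fractions round to the reported 88%, 67% and 80.1%, and 197 + 84 = 281 matches the overall count.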
Kim, Yoonsang; Emery, Sherry
2013-01-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415
Screening for ketosis using multiple logistic regression based on milk yield and composition.
Kayano, Mitsunori; Kataoka, Tomoko
2015-11-01
Multiple logistic regression was applied to milk yield and composition data for 632 records of healthy cows and 61 records of ketotic cows in Hokkaido, Japan. The purpose was to diagnose ketosis based on milk yield and composition simultaneously. The cows were divided into two groups: (1) multiparous, including 314 healthy cows and 45 ketotic cows and (2) primiparous, including 318 healthy cows and 16 ketotic cows, since nutritional status, milk yield and composition are affected by parity. Multiple logistic regression was applied to these groups separately. For multiparous cows, milk yield (kg/day/cow) and protein-to-fat (P/F) ratio in milk were significant factors (P<...) ... ketosis. For primiparous cows, lactose content (%), solid not fat (SNF) content (%) and milk urea nitrogen (MUN) content (mg/dl) were significantly associated with ketosis (P<...) ... ketosis, provided the sensitivity, specificity and AUC values of (1) 0.711, 0.726 and 0.781; and (2) 0.678, 0.767 and 0.738, respectively.
Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry
2013-08-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
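As a rough illustration of one of the three estimation methods named above, the sketch below uses Gauss-Hermite quadrature to integrate a single normal random intercept out of a logistic model. It is a one-dimensional toy, not the multi-effect setting the article studies. The function name `marginal_prob` and the choice of 20 nodes are assumptions.

```python
import numpy as np

def marginal_prob(eta, sigma, n_nodes=20):
    """P(y=1) for a random-intercept logit model, integrating out b ~ N(0, sigma^2).

    Gauss-Hermite rule for a normal density:
    integral f(b) N(b; 0, sigma^2) db ~= sum_k w_k f(sqrt(2)*sigma*x_k) / sqrt(pi).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * nodes
    p = 1.0 / (1.0 + np.exp(-(eta + b)))
    return float(np.sum(weights * p) / np.sqrt(np.pi))

print(marginal_prob(eta=1.0, sigma=0.0))  # no random effect: plain logistic probability
print(marginal_prob(eta=1.0, sigma=2.0))  # marginal probability is pulled toward 0.5
```

With several correlated random effects the same rule has to be applied over a grid whose size grows exponentially with the number of effects, which is one reason the computation becomes intensive.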
Bonellie, Sandra R
2012-10-01
To illustrate the use of regression and logistic regression models to investigate changes over time in size of babies, particularly in relation to social deprivation, age of the mother and smoking. Mean birthweight has been found to be increasing in many countries in recent years, but there is still a group of babies who are born with low birthweights. Population-based retrospective cohort study. Multiple linear regression and logistic regression models are used to analyse data on term 'singleton births' from Scottish hospitals between 1994 and 2003. Mothers who smoke are shown to give birth to lighter babies on average, a difference of approximately 0.57 standard deviations lower (95% confidence interval 0.55-0.58) when adjusted for sex and parity. These mothers are also more likely to have babies that are low birthweight (odds ratio 3.46, 95% confidence interval 3.30-3.63) compared with non-smokers. Low birthweight is 30% more likely where the mother lives in the most deprived areas compared with the least deprived (odds ratio 1.30, 95% confidence interval 1.21-1.40). Smoking during pregnancy is shown to have a detrimental effect on the size of infants at birth. This effect explains some, though not all, of the observed socioeconomic birthweight. It also explains much of the observed birthweight differences by the age of the mother. Identifying mothers at greater risk of having a low birthweight baby has important implications for the care and advice this group receives. © 2012 Blackwell Publishing Ltd.
Bayesian logistic regression analysis
Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.
2012-01-01
In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuisance parameters, the Jacobian transformation is an
and Multinomial Logistic Regression
African Journals Online (AJOL)
This work presented the results of an experimental comparison of two models: Multinomial Logistic Regression (MLR) and Artificial Neural Network (ANN) for classifying students based on their academic performance. The predictive accuracy for each model was measured by their average Classification Correct Rate (CCR).
Hilbe, Joseph M
2009-01-01
This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect: great clarity. The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … .-Annette J. Dobson, Biometric...
Steganalysis using logistic regression
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-the-art 686-dimensional SPAM feature set, in three image sets.
SEPARATION PHENOMENA LOGISTIC REGRESSION
Directory of Open Access Journals (Sweden)
Ikaro Daniel de Carvalho Barreto
2014-03-01
This paper proposes an application of concepts about the maximum likelihood estimation of the binomial logistic regression model to the separation phenomena. Separation generates bias in the estimation, leads to different interpretations of the estimates under the different statistical tests (Wald, Likelihood Ratio and Score) and yields different estimates under the different iterative methods (Newton-Raphson and Fisher Scoring). The paper also presents an example that demonstrates the direct implications for the validation of the model and validation of variables, and the implications for estimates of odds ratios and confidence intervals generated from the Wald statistics. Furthermore, we briefly present the Firth correction to circumvent the phenomena of separation.
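Why separation breaks maximum likelihood can be seen from a short numerical sketch. The four-point data set below is hypothetical and completely separated at x = 0. Scaling the slope upward keeps increasing the log-likelihood toward its supremum of zero, so no finite maximizer exists.

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Binomial log-likelihood of a logistic model, computed stably."""
    eta = X @ beta
    # log(1 + exp(eta)) via logaddexp avoids overflow for large |eta|
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

x = np.array([-2.0, -1.0, 1.0, 2.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.0, 0.0, 1.0, 1.0])  # completely separated at x = 0

scales = [1.0, 5.0, 25.0, 125.0]
values = [log_likelihood(np.array([0.0, s]), X, y) for s in scales]
for s, v in zip(scales, values):
    print(f"slope={s:6.1f}  log-likelihood={v:.6g}")
```

Iterative fitting routines therefore keep enlarging the slope until a stopping rule fires, which is consistent with different algorithms reporting different "estimates" on the same separated data.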
Multiple Logistic Regression Analysis of Cigarette Use among High School Students
Adwere-Boamah, Joseph
2011-01-01
A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…
Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul
2011-01-01
We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…
International Nuclear Information System (INIS)
Bhowmik, K.R.; Islam, S.
2016-01-01
Logistic regression (LR) analysis is the most common statistical methodology for identifying the determinants of childhood mortality. However, the significant predictors cannot be ranked according to their influence on the response variable. Multiple classification (MC) analysis can be applied to identify the significant predictors with a priority index, which helps to rank the predictors. The main objective of the study is to find the socio-demographic determinants of childhood mortality in the neonatal, post-neonatal, and post-infant periods by fitting an LR model, and to rank them through MC analysis. The study uses data from the Bangladesh Demographic and Health Survey 2007, in which birth and death information on children was collected from their mothers. Three dichotomous response variables are constructed from children's age at death to fit the LR and MC models. Socio-economic and demographic variables that are each significantly associated with the response variables are considered in the LR and MC analyses. Both the LR and MC models identified the same significant predictors for each specific type of childhood mortality. For both neonatal and child mortality, biological factors of children, regional settings, and parents' socio-economic status are found to be the 1st, 2nd, and 3rd most significant groups of predictors, respectively. Mother's education and household environment are detected as major significant predictors of post-neonatal mortality. This study shows that MC analysis, with or without LR analysis, can be applied to detect determinants with a rank, which helps policy makers take initiatives on a priority basis. (author)
Eekhout, I.; Wiel, M.A. van de; Heymans, M.W.
2017-01-01
Background. Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels
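For a single parameter, the pooling step of Rubin's Rules is short enough to sketch directly. The estimates and variances below are hypothetical, and `pool_rubin` is an illustrative helper. The harder case this abstract raises, a joint test for a categorical covariate with more than two levels, is not covered by this sketch.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets with Rubin's Rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    pooled = estimates.mean()
    within = variances.mean()        # average squared standard error
    between = estimates.var(ddof=1)  # variance of the estimates across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, np.sqrt(total)

# Hypothetical log-odds estimates and squared standard errors from m = 3 imputations
est, se = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(est, se)
```

The pooled standard error exceeds the average within-imputation standard error because the between-imputation spread carries the extra uncertainty due to the missing data.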
Le, Huy; Marcus, Justin
2012-01-01
This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…
Inverse estimation of multiple muscle activations based on linear logistic regression.
Sekiya, Masashi; Tsuji, Toshiaki
2017-07-01
This study deals with a technology for estimating muscle activity from movement data using a statistical model. Linear regression (LR) models and artificial neural networks (ANN) are known statistical models for this purpose. Although ANN has a high estimation capability, in clinical applications a lack of data often degrades its performance. On the other hand, the LR model is limited in its generalization performance. We therefore propose a muscle activity estimation method that improves generalization performance through the use of a linear logistic regression model. The proposed method was compared with the LR model and ANN in a verification experiment with 7 participants. The proposed method showed better generalization performance than the conventional methods in various tasks.
Multiple logistic regression model of signalling practices of drivers on urban highways
Puan, Othman Che; Ibrahim, Muttaka Na'iya; Zakaria, Rozana
2015-05-01
Signalling is a way of informing other road users, especially drivers on a conflicting path, of a driver's intention to change course. Other users are exposed to hazardous situations and the risk of accident if a driver who changes course fails to signal as required. This paper describes the application of a logistic regression model to the analysis of drivers' signalling practices on multilane highways, based on possible factors affecting the driver's decision such as the driver's gender, vehicle type, vehicle speed and traffic flow intensity. Data for the analysis of these factors were collected manually. More than 2000 drivers who performed a lane-changing manoeuvre while driving on two sections of multilane highway were observed. The study finds that a relatively large proportion of drivers failed to give any signal when changing lane. The analysis indicates that although the proportion of drivers who failed to signal prior to a lane-changing manoeuvre is high, compliance among female drivers is better than among male drivers. A binary logistic model was developed to represent the probability that a driver signals prior to a lane-changing manoeuvre. The model indicates that the driver's gender, type of vehicle driven, vehicle speed and traffic volume influence the driver's decision to signal prior to a lane-changing manoeuvre on a multilane urban highway. In terms of the types of vehicles driven, about 97% of motorcyclists failed to comply with the signalling requirement. The proportion of non-compliant drivers under stable traffic flow conditions is much higher than when the flow is relatively heavy. This is consistent with the data, which indicate a high degree of non-compliance when the average speed of the traffic stream is relatively high.
Logistic Regression: Concept and Application
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.
Bersabé, Rosa; Rivas, Teresa
2010-05-01
The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.
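The equal-likelihood condition described above can be written out for a multinomial logit with a reference category. The sketch below is an illustration under that assumed parameterization, log(P_j / P_ref) = a_j + b_j * x, and is not necessarily the authors' general equation. The coefficients are hypothetical.

```python
def cutoff_between(a_j, b_j, a_next, b_next):
    """Score x at which categories j and j+1 are equally likely.

    Assumes a multinomial logit with log(P_j / P_ref) = a_j + b_j * x,
    so P_j = P_{j+1} when a_j + b_j * x = a_next + b_next * x.
    """
    if b_next == b_j:
        raise ValueError("equal slopes: the two categories never cross")
    return (a_j - a_next) / (b_next - b_j)

# Hypothetical coefficients; category 0 is the reference (intercept 0, slope 0)
intercepts = [0.0, -4.0, -10.0]
slopes = [0.0, 0.2, 0.4]
cutoffs = [
    cutoff_between(intercepts[j], slopes[j], intercepts[j + 1], slopes[j + 1])
    for j in range(2)
]
print(cutoffs)
```

With three ordinal categories this yields two cut-off scores, mirroring the asymptomatic / symptomatic / eating disorder example in the abstract.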
Logistic regression for dichotomized counts.
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
2016-12-01
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
Standards for Standardized Logistic Regression Coefficients
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Dai, Huanping; Micheyl, Christophe
2012-11-01
Psychophysical "reverse-correlation" methods allow researchers to gain insight into the perceptual representations and decision weighting strategies of individual subjects in perceptual tasks. Although these methods have gained momentum, until recently their development was limited to experiments involving only two response categories. Recently, two approaches for estimating decision weights in m-alternative experiments have been put forward. One approach extends the two-category correlation method to m > 2 alternatives; the second uses multinomial logistic regression (MLR). In this article, the relative merits of the two methods are discussed, and the issues of convergence and statistical efficiency of the methods are evaluated quantitatively using Monte Carlo simulations. The results indicate that, for a range of values of the number of trials, the estimated weighting patterns are closer to their asymptotic values for the correlation method than for the MLR method. Moreover, for the MLR method, weight estimates for different stimulus components can exhibit strong correlations, making the analysis and interpretation of measured weighting patterns less straightforward than for the correlation method. These and other advantages of the correlation method, which include computational simplicity and a close relationship to other well-established psychophysical reverse-correlation methods, make it an attractive tool to uncover decision strategies in m-alternative experiments.
Should metacognition be measured by logistic regression?
Rausch, Manuel; Zehetleitner, Michael
2017-03-01
Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.
Directory of Open Access Journals (Sweden)
Shelley M. ALEXANDER
2009-02-01
We compared probability surfaces derived using one set of environmental variables in three Geographic Information Systems (GIS)-based approaches: logistic regression and Akaike’s Information Criterion (AIC), Multiple Criteria Evaluation (MCE), and Bayesian Analysis (specifically Dempster-Shafer theory). We used lynx Lynx canadensis as our focal species, and developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during winters from 1997 to 2000. The accuracy of the three spatial models was compared using a contingency table method. We determined the percentage of cases in which both presence and absence points were correctly classified (overall accuracy), the failure to predict a species where it occurred (omission error) and the prediction of presence where there was absence (commission error). Our overall accuracy showed the logistic regression approach was the most accurate (74.51%). The multiple criteria evaluation was intermediate (39.22%), while the Dempster-Shafer (D-S) theory model was the poorest (29.90%). However, omission and commission error tell us a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence in their model. We suggest that for our study area at least, the logistic regression model is optimal. However, where sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g. Dempster-Shafer) that would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans [Current Zoology 55(1): 28–40, 2009].
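The three error measures named in this abstract can be illustrated with a small sketch. The function name and the counts are hypothetical, and the denominators (omission relative to observed presences, commission relative to observed absences) are an assumption; the abstract does not state how the authors normalized the errors.

```python
def error_measures(tp, fn, fp, tn):
    """Return (overall accuracy, omission error, commission error).

    tp: presence points predicted present
    fn: presence points predicted absent (omission)
    fp: absence points predicted present (commission)
    tn: absence points predicted absent
    Assumed denominators: observed presences for omission,
    observed absences for commission.
    """
    total = tp + fn + fp + tn
    overall_accuracy = (tp + tn) / total
    omission_error = fn / (tp + fn)
    commission_error = fp / (fp + tn)
    return overall_accuracy, omission_error, commission_error


# Hypothetical contingency table, not data from the study.
acc, omission, commission = error_measures(tp=30, fn=10, fp=20, tn=40)
print(acc, omission, commission)
```

A model that over-predicts presence lowers the omission error at the cost of a higher commission error, which is the trade-off the abstract describes for the more cautious approach.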
2017-03-23
Logistic Regression to Estimate the Median Will-Cost and Probability of Cost and Schedule Overrun for Program Managers Ryan C. Trudelle, B.S...not the other. We are able to give logistic regression models to program managers that identify several program characteristics for either...considered acceptable. We recommend the use of our logistic models as a tool to manage a portfolio of programs in order to gain potential elusive
Satellite rainfall retrieval by logistic regression
Chiu, Long S.
1986-01-01
The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors. The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistical model is the probability that the rainrate of a satellite pixel is above a certain threshold. By varying the thresholds, a rainrate histogram can be obtained, from which the mean and the variance can be estimated. A logistical model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement which is deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistical model, simulated rain fields generated by rainfield models with prescribed parameters are needed. A stringent test of the logistical model is its ability to recover the prescribed parameters of simulated rain fields. A rain field simulation model which preserves the fractional rain area and lognormality of rainrates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.
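The step from threshold exceedance probabilities to a rainrate histogram, and from there to a mean and variance, can be sketched as follows. This is only an illustration under stated assumptions: the thresholds and probabilities are invented, the function names are hypothetical, and each bin is represented by its midpoint, which the abstract does not specify.

```python
def histogram_from_exceedance(thresholds, exceed_probs):
    """Turn P(R > t_k) at increasing thresholds into bin probabilities.

    The bin [t_k, t_{k+1}) receives P(R > t_k) - P(R > t_{k+1}).
    """
    bins = []
    for k in range(len(thresholds) - 1):
        prob = exceed_probs[k] - exceed_probs[k + 1]
        bins.append((thresholds[k], thresholds[k + 1], prob))
    return bins


def mean_and_variance(bins):
    """Estimate mean and variance, placing each bin's mass at its midpoint."""
    total = sum(prob for _, _, prob in bins)
    mids = [(lo + hi) / 2.0 for lo, hi, _ in bins]
    mean = sum(m * b[2] for m, b in zip(mids, bins)) / total
    var = sum(b[2] * (m - mean) ** 2 for m, b in zip(mids, bins)) / total
    return mean, var


# Hypothetical thresholds (mm/h) and exceedance probabilities for one pixel.
thresholds = [0.0, 1.0, 2.0, 4.0]
exceed_probs = [1.0, 0.5, 0.2, 0.0]
bins = histogram_from_exceedance(thresholds, exceed_probs)
mean, var = mean_and_variance(bins)
print(bins, mean, var)
```

In practice each exceedance probability would come from a separate logistic fit at that threshold; here they are simply given.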
Targeting: Logistic Regression, Special Cases and Extensions
Directory of Open Access Journals (Sweden)
Helmut Schaeben
2014-12-01
Logistic regression is a classical linear model for logit-transformed conditional probabilities of a binary target variable. It recovers the true conditional probabilities if the joint distribution of predictors and the target is of log-linear form. Weights-of-evidence is an ordinary logistic regression with parameters equal to the differences of the weights of evidence if all predictor variables are discrete and conditionally independent given the target variable. The hypothesis of conditional independence can be tested in terms of log-linear models. If the assumption of conditional independence is violated, the application of weights-of-evidence corrupts not only the predicted conditional probabilities, but also their rank transform. Logistic regression models that include interaction terms can account for the lack of conditional independence; appropriate interaction terms compensate exactly for violations of conditional independence. Multilayer artificial neural nets may be seen as nested regression-like models, with some sigmoidal activation function. Most often, the logistic function is used as the activation function. If the net topology, i.e., its control, is sufficiently versatile to mimic interaction terms, artificial neural nets are able to account for violations of conditional independence and yield very similar results. Weights-of-evidence cannot reasonably include interaction terms; subsequent modifications of the weights, as often suggested, cannot emulate the effect of interaction terms.
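The stated equivalence between weights-of-evidence and logistic regression under conditional independence can be checked numerically. The sketch below assumes binary predictors and uses made-up probabilities; all function names are hypothetical. It shows that the logistic coefficients equal the contrasts W+ minus W-, with the intercept absorbing the prior log odds and the sum of the W- terms.

```python
import math
from itertools import product


def weights(p_b_given_t, p_b_given_not_t):
    """Weights of evidence for one binary predictor B and target T."""
    w_plus = math.log(p_b_given_t / p_b_given_not_t)
    w_minus = math.log((1.0 - p_b_given_t) / (1.0 - p_b_given_not_t))
    return w_plus, w_minus


def wofe_logit(prior, evidence, x):
    """Posterior log odds: prior log odds plus W+ or W- per predictor."""
    logit = math.log(prior / (1.0 - prior))
    for (pt, pn), xi in zip(evidence, x):
        w_plus, w_minus = weights(pt, pn)
        logit += w_plus if xi else w_minus
    return logit


def logistic_form(prior, evidence):
    """Equivalent logistic regression: intercept and contrast coefficients."""
    intercept = math.log(prior / (1.0 - prior))
    coefs = []
    for pt, pn in evidence:
        w_plus, w_minus = weights(pt, pn)
        intercept += w_minus
        coefs.append(w_plus - w_minus)
    return intercept, coefs


# Hypothetical prior P(T) and pairs (P(B=1 | T), P(B=1 | not T)).
prior = 0.1
evidence = [(0.8, 0.3), (0.6, 0.2)]
intercept, coefs = logistic_form(prior, evidence)
for x in product([0, 1], repeat=len(evidence)):
    linear = intercept + sum(c * xi for c, xi in zip(coefs, x))
    print(x, round(wofe_logit(prior, evidence, x), 6), round(linear, 6))
```

The two printed columns agree for every predictor pattern; the agreement relies entirely on the conditional independence built into the example.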
Predicting Social Trust with Binary Logistic Regression
Adwere-Boamah, Joseph; Hufstedler, Shirley
2015-01-01
This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…
Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun
2014-12-01
Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw, and thereby, judge the suitability of a type of land use in any given pixel in a case study area of the Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.
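How a multinomial logistic model scores the land use types of a single pixel can be sketched as below. The class names, coefficients and driving-factor values are hypothetical, and the reference-category parameterization is an assumption; the abstract does not give the fitted model.

```python
import math


def class_probabilities(x, coefficients, reference="built-up"):
    """Multinomial logit probabilities with one reference class.

    coefficients maps class name -> (intercept, list of slopes);
    the reference class has linear predictor 0 by construction.
    """
    scores = {reference: 0.0}
    for name, (intercept, slopes) in coefficients.items():
        scores[name] = intercept + sum(b * xi for b, xi in zip(slopes, x))
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    expo = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(expo.values())
    return {name: e / total for name, e in expo.items()}


def allocate(x, coefficients):
    """Assign the pixel to its most probable land use type."""
    probs = class_probabilities(x, coefficients)
    return max(probs, key=probs.get)


# Hypothetical coefficients for two driving factors of one pixel.
coefficients = {
    "cropland": (0.5, [1.0, -0.5]),
    "forest": (-1.0, [0.2, 0.8]),
}
pixel = [1.0, 2.0]
print(class_probabilities(pixel, coefficients))
print(allocate(pixel, coefficients))
```

All land use types are scored in one model, so their probabilities sum to one for each pixel, whereas separate binary logistic fits give no such guarantee.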
Logistic regression: a self-learning text
Kleinbaum, David G
1994-01-01
This textbook provides students and professionals in the health sciences with a presentation of the use of logistic regression in research. The text is self-contained, and designed to be used either in class or as a tool for self-study. It arises from the author's many years of experience teaching this material, and the notes on which it is based have been extensively used throughout the world.
On logistic regression analysis of dichotomized responses.
Lu, Kaifeng
2017-01-01
We study the properties of treatment effect estimate in terms of odds ratio at the study end point from logistic regression model adjusting for the baseline value when the underlying continuous repeated measurements follow a multivariate normal distribution. Compared with the analysis that does not adjust for the baseline value, the adjusted analysis produces a larger treatment effect as well as a larger standard error. However, the increase in standard error is more than offset by the increase in treatment effect so that the adjusted analysis is more powerful than the unadjusted analysis for detecting the treatment effect. On the other hand, the true adjusted odds ratio implied by the normal distribution of the underlying continuous variable is a function of the baseline value and hence is unlikely to be able to be adequately represented by a single value of adjusted odds ratio from the logistic regression model. In contrast, the risk difference function derived from the logistic regression model provides a reasonable approximation to the true risk difference function implied by the normal distribution of the underlying continuous variable over the range of the baseline distribution. We show that different metrics of treatment effect have similar statistical power when evaluated at the baseline mean. Copyright © 2016 John Wiley & Sons, Ltd.
Interpreting parameters in the logistic regression model with random effects
DEFF Research Database (Denmark)
Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben
2000-01-01
interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects
Multinomial logistic regression in workers' health
Grilo, Luís M.; Grilo, Helena L.; Gonçalves, Sónia P.; Junça, Ana
2017-11-01
In European countries, namely in Portugal, it is common to hear people mention that they are exposed to excessive and continuous psychosocial stressors at work. This is increasing in diverse activity sectors, such as the Services sector. A representative sample was collected from a Portuguese Services organization by applying an internationally validated survey whose variables were measured in five ordered categories on a Likert-type scale. A multinomial logistic regression model is used to estimate the probability of each category of the dependent variable, general health perception, where, among other independent variables, burnout appears as statistically significant.
Estimating the exceedance probability of rain rate by logistic regression
Chiu, Long S.; Kedem, Benjamin
1990-01-01
Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.
International Nuclear Information System (INIS)
Hung, J.; Chaitman, B.R.; Lam, J.; Lesperance, J.; Dupras, G.; Fines, P.; Cherkaoui, O.; Robert, P.; Bourassa, M.G.
1985-01-01
The incremental diagnostic yield of clinical data, exercise ECG, stress thallium scintigraphy, and cardiac fluoroscopy to predict coronary and multivessel disease was assessed in 171 symptomatic men by means of multiple logistic regression analyses. When clinical variables alone were analyzed, chest pain type and age were predictive of coronary disease, whereas chest pain type, age, a family history of premature coronary disease before age 55 years, and abnormal ST-T wave changes on the rest ECG were predictive of multivessel disease. The percentage of patients correctly classified by cardiac fluoroscopy (presence or absence of coronary artery calcification), exercise ECG, and thallium scintigraphy was 9%, 25%, and 50%, respectively, greater than for clinical variables, when the presence or absence of coronary disease was the outcome, and 13%, 25%, and 29%, respectively, when multivessel disease was studied; 5% of patients were misclassified. When the 37 clinical and noninvasive test variables were analyzed jointly, the most significant variable predictive of coronary disease was an abnormal thallium scan and for multivessel disease, the amount of exercise performed. The data from this study provide a quantitative model and confirm previous reports that optimal diagnostic efficacy is obtained when noninvasive tests are ordered sequentially. In symptomatic men, cardiac fluoroscopy is a relatively ineffective test when compared to exercise ECG and thallium scintigraphy
Supporting Regularized Logistic Regression Privately and Efficiently
Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei
2016-01-01
As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely used statistical model that has nonetheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc. PMID:27271738
Directory of Open Access Journals (Sweden)
Samuel Ribeiro Figueiredo
2008-12-01
hydrographic variables (distance to rivers, flow length, topographical wetness index, and stream power index). Multiple logistic regressions were established between the soil classes mapped on the basis of a traditional survey at a scale of 1:80,000 and the land variables calculated using the DEM. The regressions were used to calculate the probability of occurrence of each soil class. The final estimated soil map was drawn by assigning the soil class with the highest probability of occurrence to each cell. The general accuracy was evaluated at 58% and the Kappa coefficient at 38% in a comparison of the original soil map with the map estimated at the original scale. Simplifying the legend did little to increase the general accuracy of the map (general accuracy of 61% and Kappa coefficient of 39%). It was concluded that multiple logistic regressions have predictive potential as a tool for supervised soil mapping.
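The general accuracy and Kappa coefficient reported in this abstract are computed from a confusion matrix of mapped versus estimated classes. A minimal sketch follows, using an invented two-class matrix rather than the study's data; the function name is hypothetical.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts cells of reference class i assigned to class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n)) / total
    row_sums = [sum(confusion[i]) for i in range(n)]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical confusion matrix for two soil classes.
confusion = [[20, 5],
             [10, 15]]
accuracy, kappa = accuracy_and_kappa(confusion)
print(accuracy, kappa)
```

Kappa discounts the agreement that would arise by chance from the class frequencies alone, which is why it sits below the general accuracy in the abstract's figures.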
Logistic regression applied to natural hazards: rare event logistic regression with replications
Guns, M.; Vanacker, V.
2012-06-01
Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
BANK FAILURE PREDICTION WITH LOGISTIC REGRESSION
Directory of Open Access Journals (Sweden)
Taha Zaghdoudi
2013-04-01
In recent years the economic and financial world has been shaken by a wave of financial crises that inflicted heavy losses on banks. Several authors have studied these crises in order to develop an early warning model, and our work follows the same path. We have tried to develop a predictive model of Tunisian bank failures using the binary logistic regression method. The specificity of our prediction model is that it takes into account microeconomic indicators of bank failures. The results obtained using our provisional model show that a bank's ability to repay its debt, the coefficient of banking operations, bank profitability per employee and the financial leverage ratio have a negative impact on the probability of failure.
Logistic regression against a divergent Bayesian network
Directory of Open Access Journals (Sweden)
Noel Antonio Sánchez Trujillo
2015-01-01
This article is a discussion about two statistical tools used for prediction and causality assessment: logistic regression and Bayesian networks. Using data of a simulated example from a study assessing factors that might predict pulmonary emphysema (where fingertip pigmentation and smoking are considered), we posed the following questions. Is pigmentation a confounding, causal or predictive factor? Is there perhaps another factor, like smoking, that confounds? Is there a synergy between pigmentation and smoking? The results, in terms of prediction, are similar with the two techniques; regarding causation, differences arise. We conclude that, in decision-making, the combination of a statistical tool used with common sense and previous evidence, which takes years or even centuries to develop, is better than the automatic and exclusive use of statistical resources.
Directory of Open Access Journals (Sweden)
Elvio Giasson
2006-06-01
Soil surveys are necessary sources of information for land use planning, but they are not always available. This study proposes the use of multiple logistic regressions to predict the occurrence of soil types based on reference areas. From a digitized soil map and terrain parameters derived from the digital elevation model in the ArcView environment, several sets of multiple logistic regressions were defined using the statistical software Minitab, establishing relationships between explanatory terrain variables and soil types, using either the original legend or a simplified legend, and with or without stratification of the study area by drainage classes. Terrain parameters, such as elevation, distance to stream, flow accumulation, and topographic wetness index, were the variables that best explained soil distribution. Stratification by drainage classes did not have a significant effect. Simplification of the original legend increased the accuracy of the method in predicting soil distribution.
Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.
2014-12-01
This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams, both stretching for a few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on the 1st October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of the debris flow and debris avalanche types, involving the weathered layer of a low to high grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; besides, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-fold tests so as to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT-models reached a higher prediction performance with respect to BLR-models for RP-based modelling, whilst for the SP-based models the difference in predictive skills between the two methods dropped drastically, converging to an analogous excellent performance. However, when looking at the precision of the probability estimates, BLR demonstrated to produce more robust
Logistic Regression Modeling of Diminishing Manufacturing Sources for Integrated Circuits
National Research Council Canada - National Science Library
Gravier, Michael
1999-01-01
.... The research identified logistic regression as a powerful tool for analysis of DMSMS and further developed twenty models attempting to identify the "best" way to model and predict DMSMS using logistic regression...
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R² analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Sample size determination for logistic regression on a logit-normal distribution.
Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance
2017-06-01
Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination (R²) of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for R² for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.
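The logit-normal property this abstract relies on can be illustrated by simulation: applying the logistic function to normal draws gives values in (0, 1), and the logit transform maps them back to the normal scale. The sketch below is not the authors' sample size method; the function names, parameter values and seed are arbitrary assumptions.

```python
import math
import random


def sigmoid(z):
    """Logistic transformation from the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Inverse of the logistic transformation."""
    return math.log(p / (1.0 - p))


def sample_logit_normal(mu, sigma, n, seed=0):
    """Draw n logit-normal values by transforming N(mu, sigma^2) draws."""
    rng = random.Random(seed)
    return [sigmoid(rng.gauss(mu, sigma)) for _ in range(n)]


# Hypothetical parameters on the normal (log odds) scale.
samples = sample_logit_normal(mu=0.5, sigma=1.0, n=10000)
back_on_normal_scale = [logit(p) for p in samples]
mean_logit = sum(back_on_normal_scale) / len(back_on_normal_scale)
median_prob = sorted(samples)[len(samples) // 2]
print(mean_logit, median_prob, sigmoid(0.5))
```

Because the logistic function is monotone, the median of the simulated probabilities is close to the logistic transform of mu, while their mean generally is not.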
Multiple linear regression analysis
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
A logistic regression estimating function for spatial Gibbs point processes
DEFF Research Database (Denmark)
Baddeley, Adrian; Coeurjolly, Jean-François; Rubak, Ege
We propose a computationally efficient logistic regression estimating function for spatial Gibbs point processes. The sample points for the logistic regression consist of the observed point pattern together with a random pattern of dummy points. The estimating function is closely related to the p...
Purposeful selection of variables in logistic regression
Directory of Open Access Journals (Sweden)
Williams David Keith
2008-12-01
Background: The main problem in many model-building situations is to choose from a large set of covariates those that should be included in the "best" model. A decision to keep a variable in the model might be based on clinical or statistical significance. There are several variable selection algorithms in existence. Those methods are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates within which an analyst makes a variable selection decision at each step of the modeling process. Methods: In this paper we introduce an algorithm which automates that process. We conduct a simulation study to compare the performance of this algorithm with three well-documented variable selection procedures in SAS PROC LOGISTIC: FORWARD, BACKWARD, and STEPWISE. Results: We show that the advantage of this approach arises when the analyst is interested in risk factor modeling and not just prediction. In addition to significant covariates, this variable selection procedure has the capability of retaining important confounding variables, resulting potentially in a slightly richer model. Application of the macro is further illustrated with the Hosmer and Lemeshow Worcester Heart Attack Study (WHAS) data. Conclusion: Analysts who need an algorithm that helps guide the retention of significant covariates as well as confounding ones should consider this macro as an alternative tool.
Gaussian Process Regression Model in Spatial Logistic Regression
Sofro, A.; Oktaviarina, A.
2018-01-01
Spatial analysis has developed very quickly in the last decade. One of the favorite approaches is based on the neighbourhood of the region. Unfortunately, there are some limitations such as difficulty in prediction. Therefore, we offer Gaussian process regression (GPR) to accommodate the issue. In this paper, we will focus on spatial modeling with GPR for binomial data with logit link function. The performance of the model will be investigated. We will discuss the inference of how to estimate the parameters and hyper-parameters and to predict as well. Furthermore, simulation studies will be explained in the last section.
A Methodology for Generating Placement Rules that Utilizes Logistic Regression
Wurtz, Keith
2008-01-01
The purpose of this article is to provide the necessary tools for institutional researchers to conduct a logistic regression analysis and interpret the results. Aspects of the logistic regression procedure that are necessary to evaluate models are presented and discussed with an emphasis on cutoff values and choosing the appropriate number of…
Spatial correlation in Bayesian logistic regression with misclassification
DEFF Research Database (Denmark)
Bihrmann, Kristine; Toft, Nils; Nielsen, Søren Saxmose
2014-01-01
Standard logistic regression assumes that the outcome is measured perfectly. In practice, this is often not the case, which could lead to biased estimates if not accounted for. This study presents Bayesian logistic regression with adjustment for misclassification of the outcome applied to data...
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to calculate odds; when its categories are ordered levels, the logistic regression model is ordinal. The GWOLR model is an ordinal logistic regression model influenced by the geographical location of the observation site. Parameter estimation in the model is needed to determine population values based on a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The research yields a local GWOLR model for each village and the probabilities of the dengue fever patient-count categories.
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Robust mislabel logistic regression without modeling mislabel probabilities.
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
2018-03-01
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
Tripepi, Giovanni; Jager, Kitty J.; Stel, Vianda S.; Dekker, Friedo W.; Zoccali, Carmine
2011-01-01
Because of some limitations of stratification methods, epidemiologists frequently use multiple linear and logistic regression analyses to address specific epidemiological questions. If the dependent variable is a continuous one (for example, systolic pressure and serum creatinine), the researcher
Logistic regression for risk factor modelling in stuttering research.
Reed, Phil; Wu, Yaqionq
2013-06-01
To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article, the reader will be able to: (a) summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) follow the steps in performing a logistic regression analysis; (c) describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
Advanced colorectal neoplasia risk stratification by penalized logistic regression.
Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F
2016-08-01
Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the [Formula: see text]-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interaction terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance. © The Author(s) 2013.
An Original Stepwise Multilevel Logistic Regression Analysis of Discriminatory Accuracy
DEFF Research Database (Denmark)
Merlo, Juan; Wagner, Philippe; Ghith, Nermin
2016-01-01
BACKGROUND AND AIM: Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of associations (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach that disting...
Two-factor logistic regression in pediatric liver transplantation
Uzunova, Yordanka; Prodanova, Krasimira; Spasov, Lyubomir
2017-12-01
Using a two-factor logistic regression analysis an estimate is derived for the probability of absence of infections in the early postoperative period after pediatric liver transplantation. The influence of both the bilirubin level and the international normalized ratio of prothrombin time of blood coagulation at the 5th postoperative day is studied.
Score Normalization using Logistic Regression with Expected Parameters
Aly, Robin
State-of-the-art score normalization methods use generative models that rely on sometimes unrealistic assumptions. We propose a novel parameter estimation method for score normalization based on logistic regression. Experiments on the Gov2 and CluewebA collection indicate that our method is
A binary logistic regression model with complex sampling design of ...
African Journals Online (AJOL)
2017-09-03
Bi-variable and multi-variable binary logistic regression model with complex sampling design was fitted. ... Data was entered into STATA-12 and analyzed using SPSS-21. ... lack of access/too far or costs too much.
Geographically Weighted Logistic Regression Applied to Credit Scoring Models
Directory of Open Access Journals (Sweden)
Pedro Henrique Melo Albuquerque
This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC), granted to clients residing in the Distrito Federal (DF), to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR) techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower's geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters), leading to the conclusion that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.
Intermediate and advanced topics in multilevel logistic regression analysis.
Austin, Peter C; Merlo, Juan
2017-09-10
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio, which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
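The abstract names the variance partition coefficient and the median odds ratio without giving formulas. A minimal sketch follows, assuming the commonly used latent-variable formulation for a random-intercept logistic model, in which the level-1 residual variance is π²/3 and the median odds ratio is exp(√(2σ²)·Φ⁻¹(0.75)). The function names and the `cluster_variance` argument are choices made for this sketch and may differ from the article's exact presentation.

```python
import math
from statistics import NormalDist

# Variance of the standard logistic distribution (assumed level-1 residual variance).
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def variance_partition_coefficient(cluster_variance):
    """Share of latent-scale outcome variance attributable to clusters."""
    return cluster_variance / (cluster_variance + LOGISTIC_RESIDUAL_VARIANCE)


def median_odds_ratio(cluster_variance):
    """Median odds ratio between two randomly chosen clusters."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return math.exp(math.sqrt(2.0 * cluster_variance) * z75)
```

A cluster variance of zero gives a variance partition coefficient of 0 and a median odds ratio of 1, meaning no between-cluster heterogeneity.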
Parameter Estimation for Improving Association Indicators in Binary Logistic Regression
Directory of Open Access Journals (Sweden)
Mahdi Bashiri
2012-02-01
The aim of this paper is to estimate binary logistic regression parameters by maximizing the log-likelihood function with improved association indicators. The parameter estimation steps are explained, and then measures of association are introduced and their calculation is analyzed. Moreover, new related indicators based on membership degree level are presented. Association measures reflect the number of success responses relative to failures in a given number of independent Bernoulli experiments. In parameter estimation, the values of existing indicators are not sensitive to the parameter values, whereas the proposed indicators are sensitive to the estimated parameters during the iterative procedure. The contribution of this study is therefore a new association indicator for binary logistic regression that is more sensitive to the estimated parameters when maximizing the log-likelihood in the iterative procedure.
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Model building strategy for logistic regression: purposeful selection.
Zhang, Zhongheng
2016-03-01
Logistic regression is one of the most commonly used models to account for confounders in the medical literature. The article introduces how to perform the purposeful selection model-building strategy with R. I stress the use of the likelihood ratio test to see whether deleting a variable will have a significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of the remaining covariates. Interactions should be checked to disentangle complex relationships between covariates and their synergistic effect on the response variable. The model should be checked for goodness-of-fit (GOF), in other words, how well the fitted model reflects the real data. The Hosmer-Lemeshow GOF test is the most widely used for logistic regression models.
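The likelihood ratio test mentioned in this abstract can be sketched as follows for the case of deleting a single coefficient (one degree of freedom). The article itself works in R; this Python sketch is an illustration only. The function name is hypothetical, and the two log-likelihoods are assumed to come from nested models fitted to the same data by whatever software is in use.

```python
import math


def likelihood_ratio_test_1df(loglik_full, loglik_reduced):
    """Likelihood ratio test for dropping ONE coefficient from a nested model.

    Returns (statistic, p_value). For 1 degree of freedom the chi-square
    survival function equals erfc(sqrt(x / 2)), so only the stdlib is needed.
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, chosen only to show the call.
stat, p = likelihood_ratio_test_1df(loglik_full=-100.0, loglik_reduced=-103.0)
```

A small p-value suggests that deleting the variable noticeably worsens model fit. For tests on more than one coefficient, a general chi-square survival function would be needed instead of the one-degree-of-freedom shortcut used here.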
Computing group cardinality constraint solutions for logistic regression problems.
Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M
2017-01-01
We derive an algorithm to directly solve logistic regression based on cardinality constraint, group sparsity and use it to classify intra-subject MRI sequences (e.g. cine MRIs) of healthy from diseased subjects. Group cardinality constraint models are often applied to medical images in order to avoid overfitting of the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. Avoiding weighing features, we propose to directly solve the group cardinality constraint logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from enforcing the group sparsity constraint. The minimum to the smooth and convex logistic regression problem is determined via gradient descent while we derive a closed form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients that received reconstructive surgery of Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significant higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints. Copyright © 2016 Elsevier B.V. All rights reserved.
Model performance analysis and model validation in logistic regression
Directory of Open Access Journals (Sweden)
Rosa Arboretti Giancristofaro
2007-10-01
In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
On-line mixture-based alternative to logistic regression
Czech Academy of Sciences Publication Activity Database
Nagy, Ivan; Suzdaleva, Evgenia
2016-01-01
Vol. 26, No. 5 (2016), pp. 417-437. ISSN 1210-0552. R&D Projects: GA ČR GA15-03564S. Institutional support: RVO:67985556. Keywords: on-line modeling; on-line logistic regression; recursive mixture estimation; data dependent pointer. Subject RIV: BB - Applied Statistics, Operational Research. Impact factor: 0.394, year: 2016. http://library.utia.cas.cz/separaty/2016/ZS/suzdaleva-0464463.pdf
Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M
2007-09-01
Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
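The remission rule stated in this abstract can be written as a small predicate. This is a sketch, not the authors' code. It assumes that "a reduction of seven points" means a drop of at least seven points from baseline; the function and parameter names are hypothetical.

```python
def is_remission(baseline_score, current_score, threshold=13, min_reduction=7):
    """Remission per the abstract: Y-BOCS below 13 AND a drop from baseline.

    Assumption: the required drop is read as "at least" min_reduction points.
    """
    below_threshold = current_score < threshold
    enough_reduction = (baseline_score - current_score) >= min_reduction
    return below_threshold and enough_reduction
```

Applying the predicate at each measurement occasion yields the binary outcome used by the logistic models, or the sequence of events used by a recurrent events model.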
Predicting company growth using logistic regression and neural networks
Directory of Open Access Journals (Sweden)
Marijana Zekić-Sušac
2016-12-01
The paper aims to establish an efficient model for predicting company growth by leveraging the strengths of logistic regression and neural networks. A real dataset of Croatian companies was used which described the relevant industry sector, financial ratios, income, and assets in the input space, with a dependent binomial variable indicating whether a company had high growth, defined as annualized growth in assets of more than 20% a year over a three-year period. Due to a large number of input variables, factor analysis was performed in the pre-processing stage in order to extract the most important input components. Building an efficient model with a high classification rate and explanatory ability required application of two data mining methods: logistic regression as a parametric and neural networks as a non-parametric method. The methods were tested on the models with and without variable reduction. The classification accuracy of the models was compared using statistical tests and ROC curves. The results showed that neural networks produce a significantly higher classification accuracy in the model when incorporating all available variables. The paper further discusses the advantages and disadvantages of both approaches, i.e. logistic regression and neural networks in modelling company growth. The suggested model is potentially of benefit to investors and economic policy makers as it provides support for recognizing companies with growth potential, especially during times of economic downturn.
Classifying machinery condition using oil samples and binary logistic regression
Phillips, J.; Cripps, E.; Lau, John W.; Hodkiewicz, M. R.
2015-08-01
The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time-consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding of, and hence some trust in, the classification scheme for those who use the analysis to initiate maintenance tasks. Typically, "black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) can be difficult to interpret. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight into the drivers of the human classification process and the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on predictive performance of logistic regression, ANN and SVM. A real-world oil analysis data set from engines on mining trucks is presented, and using cross-validation we demonstrate that logistic regression out-performs the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.
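The comparison indicators listed in this abstract (sensitivity, specificity, percentage of correct predictions, and area under the ROC curve) can be computed from predicted probabilities as sketched below. The study itself used SPSS; the function names, the 0.5 cutoff, and the toy data are assumptions made for illustration and are not the study's data.

```python
def classification_summary(y_true, y_score, cutoff=0.5):
    """Sensitivity, specificity, and proportion correct at a probability cutoff."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in pairs if y == 1 and s < cutoff)
    tn = sum(1 for y, s in pairs if y == 0 and s < cutoff)
    fp = sum(1 for y, s in pairs if y == 0 and s >= cutoff)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def roc_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (hypothetical).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7]
summary = classification_summary(labels, scores)
auc = roc_auc(labels, scores)
```

The pairwise AUC is quadratic in sample size, which is fine for a sketch; a rank-based formula would be preferable for large data sets.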
Franco Monsreal, José; Tun Cobos, Miriam Del Ruby; Hernández Gómez, José Ricardo; Serralta Peraza, Lidia Esther Del Socorro
2018-01-17
Low birth weight has been an enigma for science over time. There has been much research on its causes and effects. Low birth weight is an indicator that predicts the probability of a child surviving. In fact, there is an exponential relationship between weight deficit, gestational age, and perinatal mortality. Multiple logistic regression is one of the most expressive and versatile statistical instruments available for the analysis of data in both clinical and epidemiological settings, as well as in public health. To assess in a multivariate fashion the importance of 17 independent variables in low birth weight (dependent variable) of children born in the Mayan municipality of José María Morelos, Quintana Roo, Mexico. Analytical observational epidemiological cohort study with retrospective temporality. Births that met the inclusion criteria occurred in the "Hospital Integral Jose Maria Morelos" of the Ministry of Health corresponding to the Maya municipality of Jose Maria Morelos during the period from August 1, 2014 to July 31, 2015. The total number of newborns recorded was 1,147, of which 84 (7.32%) had low birth weight. To estimate the independent association between the explanatory variables (potential risk factors) and the response variable, a multiple logistic regression analysis was performed using the IBM SPSS Statistics 22 software. In ascending numerical order, values of odds ratio > 1 indicated the positive contribution of explanatory variables or possible risk factors: "unmarried" marital status (1.076, 95% confidence interval: 0.550 to 2.104); age at menarche ≤ 12 years (1.08, 95% confidence interval: 0.64 to 1.84); history of abortion(s) (1.14, 95% confidence interval: 0.44 to 2.93); maternal weight < 50 kg (1.51, 95% confidence interval: 0.83 to 2.76); number of prenatal consultations ≤ 5 (1.86, 95% confidence interval: 0.94 to 3.66); maternal age ≥ 36 years (3.5, 95% confidence interval: 0.40 to 30.47); maternal age ≤ 19 years (3
Effect of folic acid on appetite in children: ordinal logistic and fuzzy logistic regressions.
Namdari, Mahshid; Abadi, Alireza; Taheri, S Mahmoud; Rezaei, Mansour; Kalantari, Naser; Omidvar, Nasrin
2014-03-01
Reduced appetite and low food intake are often a concern in preschool children, since it can lead to malnutrition, a leading cause of impaired growth and mortality in childhood. It is occasionally considered that folic acid has a positive effect on appetite enhancement and consequently growth in children. The aim of this study was to assess the effect of folic acid on the appetite of preschool children 3 to 6 y old. The study sample included 127 children ages 3 to 6 who were randomly selected from 20 preschools in the city of Tehran in 2011. Since appetite was measured by linguistic terms, a fuzzy logistic regression was applied for modeling. The obtained results were compared with a statistical ordinal logistic model. After controlling for the potential confounders, in a statistical ordinal logistic model, serum folate showed a significantly positive effect on appetite. A small but positive effect of folate was detected by fuzzy logistic regression. Based on fuzzy regression, the risk for poor appetite in preschool children was related to the employment status of their mothers. In this study, a positive association was detected between the levels of serum folate and improved appetite. For further investigation, a randomized controlled, double-blind clinical trial could be helpful to address causality. Copyright © 2014 Elsevier Inc. All rights reserved.
Detecting nonsense for Chinese comments based on logistic regression
Zhuolin, Ren; Guang, Chen; Shu, Chen
2016-07-01
To understand cyber citizens' opinions accurately from Chinese news comments, a clear definition of nonsense is presented, and a detection model based on logistic regression (LR) is proposed. The detection of nonsense can be treated as a binary classification problem. Besides traditional lexical features, we propose three kinds of features in terms of emotion, structure and relevance. Using these features, we train an LR model and demonstrate its effect in understanding Chinese news comments. We find that each of the proposed features significantly improves the result. In our experiments, we achieve a prediction accuracy of 84.3%, which improves on the baseline of 77.3% by 7 percentage points.
Determining the Probability of Study Program Graduate Quality Using Logistic Regression
Directory of Open Access Journals (Sweden)
Maxsi Ary
2016-03-01
Abstract – Human resources (HR) are one of the success factors in the economic field, namely how to create qualified human resources with skills that are highly competitive in global competition. The educational level of the labor force is still relatively low. The educational structure of the Indonesian workforce is still dominated by basic education, at about 63.2%. The issue raised is to determine the probability of a study program being good or not, based on the ratio of the number of graduates to the number of students per cohort and the size of the class quota (large or small), using logistic regression models. Data were obtained from a search of study program student and graduate data in 2010. Data were processed using SPSS. The analysis assesses model fit, and results are given for each model-fit assessment, starting with the hypothesis for assessing model fit, the -2LogL statistic, Cox and Snell's R Square, Hosmer and Lemeshow's Goodness of Fit Test, and the classification table. The results of the analysis, using SPSS as a tool, are aimed at measuring the quality of graduates of study programs at a university, college, or academy (good or not) based on the ratio of the number of graduates and class quotas. Keywords: Quota Class, Probability, Logistic Regression
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
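The equivalent two-sample idea described in this abstract can be sketched roughly as follows for logistic regression. This is only one reading of the abstract, not the authors' published formulas: the function names, the bisection search, and the use of the standard normal-approximation power formula for two proportions are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def equivalent_two_sample(p_bar, slope, sd_x):
    """Find (p1, p2) whose log odds differ by 2 * slope * sd_x
    while the average response probability stays at p_bar."""
    delta = 2.0 * slope * sd_x
    lo, hi = -30.0, 30.0
    # The average of the two probabilities increases with the baseline
    # log odds, so a plain bisection is enough.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if 0.5 * (expit(mid) + expit(mid + delta)) < p_bar:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2.0
    return expit(a), expit(a + delta)


def two_sample_power(p1, p2, n_total, alpha=0.05):
    """Normal-approximation power for comparing two proportions,
    with n_total / 2 observations in each group (assumed formula)."""
    n = n_total / 2.0
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    p_avg = (p1 + p2) / 2.0
    se_null = math.sqrt(2.0 * p_avg * (1.0 - p_avg) / n)
    se_alt = math.sqrt(p1 * (1.0 - p1) / n + p2 * (1.0 - p2) / n)
    return NormalDist().cdf((abs(p2 - p1) - z * se_null) / se_alt)


# Hypothetical inputs: overall response probability 0.3, slope 0.5 per unit
# of a covariate with standard deviation 1, and 200 subjects in total.
p1, p2 = equivalent_two_sample(0.3, 0.5, 1.0)
power = two_sample_power(p1, p2, 200)
```

The two constraints (log-odds gap and unchanged average event probability) come from the abstract; everything downstream of them is a conventional two-proportion calculation swapped in here for convenience.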
Classification of mislabelled microarrays using robust sparse logistic regression.
Bootkrajang, Jakramate; Kabán, Ata
2013-04-01
Previous studies reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet, few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance crucially depends on some tuning parameters. In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is automatically set using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach is able to counter the bad effects of labelling errors in terms of predictive performance, that it is effective at identifying marker genes, and that it simultaneously detects mislabelled arrays with high accuracy. The code is available from http://cs.bham.ac.uk/~jxb008. Supplementary data are available at Bioinformatics online.
Landslide Hazard Mapping in Rwanda Using Logistic Regression
Piller, A.; Anderson, E.; Ballard, H.
2015-12-01
Landslides in the United States cause more than $1 billion in damages and 50 deaths per year (USGS 2014). Globally, figures are much more grave, yet monitoring, mapping and forecasting of these hazards are less than adequate. Seventy-five percent of the population of Rwanda earns a living from farming, mostly subsistence. Loss of farmland, housing, or life, to landslides is a very real hazard. Landslides in Rwanda have an impact at the economic, social, and environmental level. In a developing nation that faces challenges in tracking, cataloging, and predicting the numerous landslides that occur each year, satellite imagery and spatial analysis allow for remote study. We have focused on the development of a landslide inventory and a statistical methodology for assessing landslide hazards. Using logistic regression on approximately 30 test variables (i.e. slope, soil type, land cover, etc.) and a sample of over 200 landslides, we determine which variables are statistically most relevant to landslide occurrence in Rwanda. A preliminary predictive hazard map for Rwanda has been produced, using the variables selected from the logistic regression analysis.
The intermediate endpoint effect in logistic and probit regression
MacKinnon, DP; Lockwood, CM; Brown, CH; Wang, W; Hoffman, JM
2010-01-01
Background: An intermediate endpoint is hypothesized to be in the middle of the causal sequence relating an independent variable to a dependent variable. The intermediate variable is also called a surrogate or mediating variable and the corresponding effect is called the mediated, surrogate endpoint, or intermediate endpoint effect. Clinical studies are often designed to change an intermediate or surrogate endpoint and through this intermediate change influence the ultimate endpoint. In many intermediate endpoint clinical studies the dependent variable is binary, and logistic or probit regression is used. Purpose: The purpose of this study is to describe a limitation of a widely used approach to assessing intermediate endpoint effects and to propose an alternative method, based on products of coefficients, that yields more accurate results. Methods: The intermediate endpoint model for a binary outcome is described for a true binary outcome and for a dichotomization of a latent continuous outcome. Plots of true values and a simulation study are used to evaluate the different methods. Results: Distorted estimates of the intermediate endpoint effect and incorrect conclusions can result from the application of widely used methods to assess the intermediate endpoint effect. The same problem occurs for the proportion of an effect explained by an intermediate endpoint, which has been suggested as a useful measure for identifying intermediate endpoints. A solution to this problem is given based on the relationship between latent variable modeling and logistic or probit regression. Limitations: More complicated intermediate variable models are not addressed in the study, although the methods described in the article can be extended to these more complicated models. Conclusions: Researchers are encouraged to use an intermediate endpoint method based on the product of regression coefficients. A common method based on difference in coefficient methods can lead to distorted
DEFF Research Database (Denmark)
Tan, Qihua; Bathum, L; Christiansen, L
2003-01-01
In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...... situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing minimized. A binomial logistic regression model with fractional polynomials is used to capture...... the age-dependent or antagonistic pleiotropic effects. The models are applied to HFE genotype data to assess the effects on human longevity by different alleles and to detect if an age-dependent effect exists. Application has shown that these methods can serve as useful tools in searching for important...
Performance of a New Restricted Biased Estimator in Logistic Regression
Directory of Open Access Journals (Sweden)
Yasin ASAR
2017-12-01
It is known that the variance of the maximum likelihood estimator (MLE) inflates when the explanatory variables are correlated. This situation is called the multicollinearity problem. As a result, the estimates of the model may not be trustworthy. Therefore, this paper introduces a new restricted estimator (RLTE) that may be applied to mitigate multicollinearity when the parameters lie in some linear subspace in logistic regression. The mean squared errors (MSE) and the matrix mean squared errors (MMSE) of the estimators considered in this paper are given. A Monte Carlo experiment is designed to evaluate the performance of the proposed estimator, the restricted MLE (RMLE), the MLE and the Liu-type estimator (LTE). The criterion of performance is chosen to be MSE. Moreover, a real data example is presented. According to the results, the proposed estimator performs better than the MLE, RMLE and LTE.
Forecast Model of Urban Stagnant Water Based on Logistic Regression
Directory of Open Access Journals (Sweden)
Liu Pan
2017-01-01
With the development of information technology, the construction of water resource systems has gradually been carried out. Against the background of big data, water information work needs to move from quantitative to qualitative change. Analyzing the correlation of data and exploring the deep value of data are the keys to water information research. On the basis of research on water big data and the traditional data warehouse architecture, we try to find the connections between different data sources. According to the temporal and spatial correlation of stagnant water and rainfall, we use spatial interpolation to integrate stagnant water and rainfall data from different data sources and different sensors, and then use logistic regression to find the relationship between them.
Logistic Regression in the Identification of Hazards in Construction
Drozd, Wojciech
2017-10-01
The construction site and its elements create circumstances that are conducive to the formation of risks to safety during the execution of works. Analysis indicates the critical importance of these factors in the set of characteristics that describe the causes of accidents in the construction industry. This article attempts to analyse the characteristics related to the construction site, in order to indicate their importance in defining the circumstances of accidents at work. The study includes sites inspected in 2014 - 2016 by the employees of the District Labour Inspectorate in Krakow (Poland). The analysed set of detailed (disaggregated) data includes both quantitative and qualitative characteristics. The substantive task focused on classification modelling in the identification of hazards in construction and identifying those of the analysed characteristics that are important in an accident. In terms of methodology, resource data analysis using statistical classifiers, in the form of logistic regression, was the method used.
Parental Vaccine Acceptance: A Logistic Regression Model Using Previsit Decisions.
Lee, Sara; Riley-Behringer, Maureen; Rose, Jeanmarie C; Meropol, Sharon B; Lazebnik, Rina
2017-07-01
This study explores how parents' intentions regarding vaccination prior to their children's visit were associated with actual vaccine acceptance. A convenience sample of parents accompanying 6-week-old to 17-year-old children completed a written survey at 2 pediatric practices. Using hierarchical logistic regression, for hospital-based participants (n = 216), vaccine refusal history (P < .01) and vaccine decision made before the visit (P < .05) explained 87% of vaccine refusals. In community-based participants (n = 100), vaccine refusal history (P < .01) explained 81% of refusals. Over 1 in 5 parents changed their minds about vaccination during the visit. Thirty parents who were previous vaccine refusers accepted current vaccines, and 37 who had intended not to vaccinate chose vaccination. Twenty-nine parents without a refusal history declined vaccines, and 32 who did not intend to refuse before the visit declined vaccination. Future research should identify key factors to nudge parent decision making in favor of vaccination.
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
Sampson, Maureen L; Gounden, Verena; van Deventer, Hendrik E; Remaley, Alan T
2016-02-01
The main drawback of the periodic analysis of quality control (QC) material is that test performance is not monitored in time periods between QC analyses, potentially leading to the reporting of faulty test results. The objective of this study was to develop a patient based QC procedure for the more timely detection of test errors. Results from a Chem-14 panel measured on the Beckman LX20 analyzer were used to develop the model. Each test result was predicted from the other 13 members of the panel by multiple regression, which resulted in correlation coefficients between the predicted and measured result of >0.7 for 8 of the 14 tests. A logistic regression model, which utilized the measured test result, the predicted test result, the day of the week and time of day, was then developed for predicting test errors. The output of the logistic regression was tallied by a daily CUSUM approach and used to predict test errors, with a fixed specificity of 90%. The mean average run length (ARL) before error detection by CUSUM-Logistic Regression (CSLR) was 20 with a mean sensitivity of 97%, which was considerably shorter than the mean ARL of 53 (sensitivity 87.5%) for a simple prediction model that only used the measured result for error detection. A CUSUM-Logistic Regression analysis of patient laboratory data can be an effective approach for the rapid and sensitive detection of clinical laboratory errors. Published by Elsevier Inc.
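The tallying step described in this abstract can be illustrated with a generic one-sided CUSUM over per-sample error probabilities. This is a minimal sketch under stated assumptions: the abstract does not give the study's actual tally rule, reference value, or alarm threshold, so `reference` and `threshold` below are hypothetical parameters, and the probabilities are made-up inputs standing in for logistic regression output.

```python
def cusum_alarm(probabilities, reference=0.5, threshold=1.0):
    """Return the index of the first sample at which the one-sided CUSUM
    exceeds threshold, or None if it never does.

    probabilities: per-sample error probabilities (e.g. model output).
    reference:     hypothetical in-control level subtracted from each value.
    threshold:     hypothetical alarm limit for the cumulative sum.
    """
    s = 0.0
    for i, p in enumerate(probabilities):
        # Accumulate only the excess over the reference; never go below zero.
        s = max(0.0, s + (p - reference))
        if s > threshold:
            return i
    return None


# Hypothetical stream: ten in-control samples, then a shift to high
# predicted error probability.
in_control = [0.1] * 10
shifted = in_control + [0.9] * 10
first_alarm = cusum_alarm(shifted)
```

The number of samples between the shift and `first_alarm` plays the role of a run length; a lower threshold shortens it at the cost of more false alarms.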
Drought Patterns Forecasting using an Auto-Regressive Logistic Model
del Jesus, M.; Sheffield, J.; Méndez Incera, F. J.; Losada, I. J.; Espejo, A.
2014-12-01
Drought is characterized by a water deficit that may manifest across a large range of spatial and temporal scales. Drought may create important socio-economic consequences, often of catastrophic dimensions. A quantifiable definition of drought is elusive because, depending on its impacts, consequences and generation mechanism, different water deficit periods may be identified as a drought by virtue of some definitions but not by others. Droughts are linked to the water cycle and, although a climate change signal may not have emerged yet, they are also intimately linked to climate. In this work we develop an auto-regressive logistic model for drought prediction at different temporal scales that makes use of a spatially explicit framework. Our model allows covariates, continuous or categorical, to be included to improve the performance of the auto-regressive component. Our approach makes use of dimensionality reduction (principal component analysis) and classification techniques (K-Means and maximum dissimilarity) to simplify the representation of complex climatic patterns, such as sea surface temperature (SST) and sea level pressure (SLP), while including information on their spatial structure, i.e. considering their spatial patterns. This procedure allows us to include in the analysis multivariate representations of complex climatic phenomena, such as the El Niño-Southern Oscillation. We also explore the impact of other climate-related variables such as sun spots. The model allows the uncertainty of the forecasts to be quantified and can be easily adapted to make predictions under future climatic scenarios. The framework presented herein may be extended to other applications such as flash flood analysis, or risk assessment of natural hazards.
Bayesian logistic regression approaches to predict incorrect DRG assignment.
Suleiman, Mani; Demirhan, Haydar; Boyd, Leanne; Girosi, Federico; Aksakalli, Vural
2018-05-07
Episodes of care involving similar diagnoses and treatments and requiring similar levels of resource utilisation are grouped to the same Diagnosis-Related Group (DRG). In jurisdictions which implement DRG based payment systems, DRGs are a major determinant of funding for inpatient care. Hence, service providers often dedicate auditing staff to the task of checking that episodes have been coded to the correct DRG. The use of statistical models to estimate an episode's probability of DRG error can significantly improve the efficiency of clinical coding audits. This study implements Bayesian logistic regression models with weakly informative prior distributions to estimate the likelihood that episodes require a DRG revision, comparing these models with each other and to classical maximum likelihood estimates. All Bayesian approaches had more stable model parameters than maximum likelihood. The best performing Bayesian model improved overall classification performance by 6% compared to maximum likelihood, with a 34% gain compared to random classification, respectively. We found that the original DRG, coder and the day of coding all have a significant effect on the likelihood of DRG error. Use of Bayesian approaches has improved model parameter stability and classification accuracy. This method has already led to improved audit efficiency in an operational capacity.
Analyzing thresholds and efficiency with hierarchical Bayesian logistic regression.
Houpt, Joseph W; Bittner, Jennifer L
2018-05-10
Ideal observer analysis is a fundamental tool used widely in vision science for analyzing the efficiency with which a cognitive or perceptual system uses available information. The performance of an ideal observer provides a formal measure of the amount of information in a given experiment. The ratio of human to ideal performance is then used to compute efficiency, a construct that can be directly compared across experimental conditions while controlling for the differences due to the stimuli and/or task specific demands. In previous research using ideal observer analysis, the effects of varying experimental conditions on efficiency have been tested using ANOVAs and pairwise comparisons. In this work, we present a model that combines Bayesian estimates of psychometric functions with hierarchical logistic regression for inference about both unadjusted human performance metrics and efficiencies. Our approach improves upon the existing methods by constraining the statistical analysis using a standard model connecting stimulus intensity to human observer accuracy and by accounting for variability in the estimates of human and ideal observer performance scores. This allows for both individual and group level inferences. Copyright © 2018 Elsevier Ltd. All rights reserved.
Efficient logistic regression designs under an imperfect population identifier.
Albert, Paul S; Liu, Aiyi; Nansel, Tonja
2014-03-01
Motivated by actual study designs, this article considers efficient logistic regression designs where the population is identified with a binary test that is subject to diagnostic error. We consider the case where the imperfect test is obtained on all participants, while the gold standard test is measured on a small chosen subsample. Under maximum-likelihood estimation, we evaluate the optimal design in terms of sample selection as well as verification. We show that there may be substantial efficiency gains by choosing a small percentage of individuals who test negative on the imperfect test for inclusion in the sample (e.g., verifying 90% test-positive cases). We also show that a two-stage design may be a good practical alternative to a fixed design in some situations. Under optimal and nearly optimal designs, we compare maximum-likelihood and semi-parametric efficient estimators under correct and misspecified models with simulations. The methodology is illustrated with an analysis from a diabetes behavioral intervention trial. © 2013, The International Biometric Society.
Logistic regression model for detecting radon prone areas in Ireland.
Elío, J; Crowley, Q; Scanlon, R; Hodgson, J; Long, S
2017-12-01
A new high spatial resolution radon risk map of Ireland has been developed, based on a combination of indoor radon measurements (n=31,910) and relevant geological information (i.e. Bedrock Geology, Quaternary Geology, soil permeability and aquifer type). Logistic regression was used to predict the probability of having an indoor radon concentration above the national reference level of 200 Bq m⁻³ in Ireland. The four geological datasets evaluated were found to be statistically significant, and, based on combinations of these four variables, the predicted probabilities ranged from 0.57% to 75.5%. Results show that the Republic of Ireland may be divided into three main radon risk categories: High (HR), Medium (MR) and Low (LR). The probability of having an indoor radon concentration above 200 Bq m⁻³ in each area was found to be 19%, 8% and 3%, respectively. In the Republic of Ireland, the population affected by radon concentrations above 200 Bq m⁻³ is estimated at ca. 460k (about 10% of the total population). Of these, 57% (265k), 35% (160k) and 8% (35k) are in High, Medium and Low Risk Areas, respectively. Our results provide a high spatial resolution utility that permits customised radon-awareness information to be targeted at specific geographic areas. Copyright © 2017 Elsevier B.V. All rights reserved.
Bayesian logistic regression in detection of gene-steroid interaction for cancer at PDLIM5 locus.
Wang, Ke-Sheng; Owusu, Daniel; Pan, Yue; Xie, Changchun
2016-06-01
The PDZ and LIM domain 5 (PDLIM5) gene may play a role in cancer, bipolar disorder, major depression, alcohol dependence and schizophrenia; however, little is known about the interaction effect of steroid and PDLIM5 gene on cancer. This study examined 47 single-nucleotide polymorphisms (SNPs) within the PDLIM5 gene in the Marshfield sample with 716 cancer patients (any diagnosed cancer, excluding minor skin cancer) and 2848 noncancer controls. A multiple logistic regression model in PLINK software was used to examine the association of each SNP with cancer. Bayesian logistic regression in PROC GENMOD in SAS statistical software, ver. 9.4, was used to detect gene-steroid interactions influencing cancer. Single marker analysis using PLINK identified 12 SNPs associated with cancer (Plogistic regression in PROC GENMOD showed that both rs6532496 and rs951613 revealed strong gene-steroid interaction effects (OR=2.18, 95% CI=1.31-3.63 with P = 2.9 × 10⁻³ for rs6532496 and OR=2.07, 95% CI=1.24-3.45 with P = 5.43 × 10⁻³ for rs951613, respectively). Results from Bayesian logistic regression showed stronger interaction effects (OR=2.26, 95% CI=1.2-3.38 for rs6532496 and OR=2.14, 95% CI=1.14-3.2 for rs951613, respectively). All the 12 SNPs associated with cancer revealed significant gene-steroid interaction effects (P logistic regression and OR=2.59, 95% CI=1.4-3.97 from Bayesian logistic regression; respectively). This study provides evidence of common genetic variants within the PDLIM5 gene and interactions between PDLIM5 gene polymorphisms and steroid use influencing cancer.
RAWS II: A MULTIPLE REGRESSION ANALYSIS PROGRAM,
This memorandum gives instructions for the use and operation of a revised version of RAWS, a multiple regression analysis program. The program...of preprocessed data, the directed retention of variable, listing of the matrix of the normal equations and its inverse, and the bypassing of the regression analysis to provide the input variable statistics only. (Author)
Ariffin, Syaiba Balqish; Midi, Habshah
2014-06-01
This article is concerned with the performance of the logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity exists among predictors and in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity, which causes regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach for handling multicollinearity. The effect of high leverage points on the performance of the logistic ridge regression estimator is then investigated through a real data set and a simulation study. The findings signify that the logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.
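A ridge penalty in logistic regression can be sketched as below. This is an illustrative toy, not the estimator studied in the article: the penalty form (mean negative log-likelihood plus ridge/2 times the squared coefficient norm, intercept unpenalized), the plain gradient-descent fitting, the parameter names, and the small data set with two nearly collinear predictors are all assumptions.

```python
import math


def fit_logistic_ridge(X, y, ridge=0.0, lr=0.1, n_iter=5000):
    """Gradient descent on the mean negative log-likelihood plus
    (ridge / 2) * ||beta||^2. The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            eta = b0 + sum(b * x for b, x in zip(beta, xi))
            resid = 1.0 / (1.0 + math.exp(-eta)) - yi  # fitted minus observed
            g0 += resid / n
            for j in range(p):
                g[j] += resid * xi[j] / n
        b0 -= lr * g0
        # The ridge term adds ridge * beta_j to each slope gradient.
        beta = [b - lr * (gj + ridge * b) for b, gj in zip(beta, g)]
    return b0, beta


def norm(v):
    return math.sqrt(sum(c * c for c in v))


# Made-up data: the second predictor is the first plus or minus 0.1,
# so the two columns are almost perfectly correlated.
X = [(-2.0, -1.9), (-1.5, -1.6), (-1.0, -0.9), (-0.5, -0.6), (0.0, 0.1),
     (0.5, 0.4), (1.0, 1.1), (1.5, 1.4), (2.0, 2.1), (2.5, 2.4)]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

_, beta_ml = fit_logistic_ridge(X, y, ridge=0.0)
_, beta_ridge = fit_logistic_ridge(X, y, ridge=1.0)
```

Comparing `norm(beta_ridge)` with `norm(beta_ml)` shows the shrinkage that stabilizes the estimates under collinearity; the sketch says nothing about high leverage points, which are the article's second concern.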
A Comparative Study of Cox Regression vs. Log-Logistic ...
African Journals Online (AJOL)
Colorectal cancer is a common and lethal disease with different incidence rates in different parts of the world, and it is regarded as the third cause of cancer-related deaths. In the present study, using the non-parametric Cox model and the parametric Log-logistic model, factors influencing survival of patients with colorectal ...
On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis
Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas
2011-01-01
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…
The crux of the method: assumptions in ordinary least squares and logistic regression.
Long, Rebecca G
2008-10-01
Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
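One commonly cited reason for preferring logistic regression with a binary outcome can be shown in a few lines: a straight line fitted by ordinary least squares can return "probabilities" below 0 or above 1, whereas the logistic function always stays strictly between them. The data and function names below are made up for illustration, and the abstract does not say which assumptions the paper itself emphasizes.

```python
import math


def ols_fit(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def logistic(eta):
    """Map any real-valued linear predictor into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Made-up binary outcome that becomes more likely as x grows.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = ols_fit(x, y)
linear_at_min = a + b * min(x)   # linear "probability" at the smallest x
linear_at_max = a + b * max(x)   # linear "probability" at the largest x
```

Here the linear predictions at both ends of the observed range already fall outside the unit interval, while `logistic(eta)` cannot do so for any moderate value of `eta`.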
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Cao, Faxian; Yang, Zhijing; Ren, Jinchang; Ling, Wing-Kuen; Zhao, Huimin; Marshall, Stephen
2017-12-01
Although the sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it suffers from inefficacy in dealing with high dimensional features and manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR via minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, the logistic regression via the variable splitting and the augmented Lagrangian (LORSAL) is adopted in the proposed framework for reducing the computational time. Experiments are conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, which have shown the fast and robust performance of the proposed ESMLR framework.
Computing multiple-output regression quantile regions
Czech Academy of Sciences Publication Activity Database
Paindaveine, D.; Šiman, Miroslav
2012-01-01
Vol. 56, No. 4 (2012), pp. 840-853 ISSN 0167-9473 R&D Projects: GA MŠk(CZ) 1M06047 Institutional research plan: CEZ:AV0Z10750506 Keywords: halfspace depth * multiple-output regression * parametric linear programming * quantile regression Subject RIV: BA - General Mathematics Impact factor: 1.304, year: 2012 http://library.utia.cas.cz/separaty/2012/SI/siman-0376413.pdf
The M Word: Multicollinearity in Multiple Regression.
Morrow-Howell, Nancy
1994-01-01
Notes that existence of substantial correlation between two or more independent variables creates problems of multicollinearity in multiple regression. Discusses multicollinearity problem in social work research in which independent variables are usually intercorrelated. Clarifies problems created by multicollinearity, explains detection of…
Multiple Linear Regression: A Realistic Reflector.
Nutt, A. T.; Batsell, R. R.
Examples of the use of Multiple Linear Regression (MLR) techniques are presented. This is done to show how MLR aids data processing and decision-making by providing the decision-maker with freedom in phrasing questions and by accurately reflecting the data on hand. A brief overview of the rationale underlying MLR is given, some basic definitions…
A secure distributed logistic regression protocol for the detection of rare adverse drug events.
El Emam, Khaled; Samet, Saeed; Arbuckle, Luk; Tamblyn, Robyn; Earle, Craig; Kantarcioglu, Murat
2013-05-01
There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. Our objective was to develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs. We extended the secure protocol to account for correlations among patients within sites through
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression
Weiss, Brandi A.; Dardick, William
2016-01-01
This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify…
What Are the Odds of that? A Primer on Understanding Logistic Regression
Huang, Francis L.; Moon, Tonya R.
2013-01-01
The purpose of this Methodological Brief is to present a brief primer on logistic regression, a commonly used technique when modeling dichotomous outcomes. Using data from the National Education Longitudinal Study of 1988 (NELS:88), logistic regression techniques were used to investigate student-level variables in eighth grade (i.e., enrolled in a…
John Hogland; Nedret Billor; Nathaniel Anderson
2013-01-01
Discriminant analysis, referred to as maximum likelihood classification within popular remote sensing software packages, is a common supervised technique used by analysts. Polytomous logistic regression (PLR), also referred to as multinomial logistic regression, is an alternative classification approach that is less restrictive, more flexible, and easy to interpret. To...
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
On directional multiple-output quantile regression
Czech Academy of Sciences Publication Activity Database
Paindaveine, D.; Šiman, Miroslav
2011-01-01
Vol. 102, No. 2 (2011), pp. 193-212 ISSN 0047-259X R&D Projects: GA MŠk(CZ) 1M06047 Grant - others: Commission EC(BE) Fonds National de la Recherche Scientifique Institutional research plan: CEZ:AV0Z10750506 Keywords: multivariate quantile * quantile regression * multiple-output regression * halfspace depth * portfolio optimization * value-at-risk Subject RIV: BA - General Mathematics Impact factor: 0.879, year: 2011 http://library.utia.cas.cz/separaty/2011/SI/siman-0364128.pdf
Variable selection in Logistic regression model with genetic algorithm.
Zhang, Zhongheng; Trevino, Victor; Hoseini, Sayed Shahabuddin; Belciug, Smaranda; Boopathi, Arumugam Manivanna; Zhang, Ping; Gorunescu, Florin; Subha, Velappan; Dai, Songshi
2018-02-01
Variable or feature selection is one of the most important steps in model specification. Especially in medical decision making, the direct use of a medical database, without a previous analysis and preprocessing step, is often counterproductive. Variable selection is therefore the method of choosing the most relevant attributes from the database in order to build robust learning models and, thus, to improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables, while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none of them is without limitations. For example, the widely used stepwise approach adds the best variable in each cycle, generally producing an acceptable set of variables. Nevertheless, it is limited by the fact that it is commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large with tens to hundreds of variables, as is the case in present-day clinical data. Genetic algorithms (GA) are heuristic optimization approaches and can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.
A logistic regression model for Ghana National Health Insurance claims
Directory of Open Access Journals (Sweden)
Samuel Antwi
2013-07-01
In August 2003, the Ghanaian Government made history by implementing the first National Health Insurance System (NHIS) in Sub-Saharan Africa. Within three years, over half of the country’s population had voluntarily enrolled into the National Health Insurance Scheme. This study had three objectives: (1) to estimate the risk factors that influence the Ghana national health insurance claims; (2) to estimate the magnitude of each of the risk factors in relation to the Ghana national health insurance claims. In this work, data were collected from the policyholders of the Ghana National Health Insurance Scheme with the help of the National Health Insurance database and the patients’ attendance register of the Koforidua Regional Hospital, from 1st January to 31st December 2011. Quantitative analysis was done using generalized linear regression (GLR) models. The results indicate that risk factors such as sex, age, marital status, distance and length of stay at the hospital were important predictors of health insurance claims. However, it was found that the risk factors health status, billed charges and income level are not good predictors of national health insurance claims. The outcome of the study shows that sex, age, marital status, distance and length of stay at the hospital are statistically significant in the determination of the Ghana National health insurance premiums since they considerably influence claims. We recommended, among other things, that the National Health Insurance Authority should facilitate the institutionalization of the collection of appropriate data on a continuous basis to help in the determination of future premiums.
Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E
2013-06-01
Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. © 2013, The International Biometric
Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A
2014-09-01
The logistic regression model is widely used in health research for descriptive and predictive purposes. Unfortunately, researchers are sometimes not aware that the underlying principles of the technique have failed when the maximum likelihood algorithm does not converge. Young researchers, particularly postgraduate students, may not know why the separation problem, whether quasi or complete, occurs, how to identify it, and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in the African Journal of Medicine and Medical Sciences between 2004 and 2013. Problems of quasi or complete separation were described and illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles were reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of the logistic regression model in the methodology, while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 that used the logistic regression model, the problem of convergence occurred in 6 (15.0%) of the articles. Logistic regression tends to be poorly reported in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers, since very few described the process in their reports and they may be totally unaware of the problem of convergence or how to deal with it.
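The separation problem described in this abstract can be illustrated for the simplest case of one numeric predictor and a binary outcome. The following is a minimal sketch, not the authors' procedure: the function name `separation_type` and the toy data are hypothetical, and with several predictors separation can occur along a linear combination that a per-variable check like this one would miss.

```python
def separation_type(x, y):
    """Classify separation for ONE numeric predictor x and a 0/1 outcome y.

    Illustrative simplification: real data sets with several predictors
    can be separated by a linear combination of them, which this
    single-variable check does not detect.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome classes must be present")
    # Complete separation: a threshold splits the two classes perfectly,
    # so the maximum likelihood estimate of the slope does not exist.
    if max(x0) < min(x1) or max(x1) < min(x0):
        return "complete"
    # Quasi-complete separation: the classes overlap only in ties at the
    # boundary value.
    if max(x0) == min(x1) or max(x1) == min(x0):
        return "quasi-complete"
    return "none"


# Hypothetical toy data, for illustration only.
print(separation_type([1, 2, 3, 4], [0, 0, 1, 1]))
print(separation_type([1, 2, 2, 3], [0, 0, 1, 1]))
print(separation_type([1, 3, 2, 4], [0, 0, 1, 1]))
```

When a check of this kind reports separation, a non-converging fit is expected rather than surprising, which is the situation the abstract says often goes unrecognized.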
Mapping of the DLQI scores to EQ-5D utility values using ordinal logistic regression.
Ali, Faraz Mahmood; Kay, Richard; Finlay, Andrew Y; Piguet, Vincent; Kupfer, Joerg; Dalgard, Florence; Salek, M Sam
2017-11-01
The Dermatology Life Quality Index (DLQI) and the European Quality of Life-5 Dimension (EQ-5D) are separate measures that may be used to gather health-related quality of life (HRQoL) information from patients. The EQ-5D is a generic measure from which health utility estimates can be derived, whereas the DLQI is a specialty-specific measure to assess HRQoL. To reduce the burden of multiple measures being administered and to enable a more disease-specific calculation of health utility estimates, we explored an established mathematical technique known as ordinal logistic regression (OLR) to develop an appropriate model to map DLQI data to EQ-5D-based health utility estimates. Retrospective data from 4010 patients were randomly divided five times into two groups for the derivation and testing of the mapping model. Split-half cross-validation was utilized resulting in a total of ten ordinal logistic regression models for each of the five EQ-5D dimensions against age, sex, and all ten items of the DLQI. Using Monte Carlo simulation, predicted health utility estimates were derived and compared against those observed. This method was repeated for both OLR and a previously tested mapping methodology based on linear regression. The model was shown to be highly predictive and its repeated fitting demonstrated a stable model using OLR as well as linear regression. The mean differences between OLR-predicted health utility estimates and observed health utility estimates ranged from 0.0024 to 0.0239 across the ten modeling exercises, with an average overall difference of 0.0120 (a 1.6% underestimate, not of clinical importance). This modeling framework developed in this study will enable researchers to calculate EQ-5D health utility estimates from a specialty-specific study population, reducing patient and economic burden.
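The resampling scheme in this abstract (five random divisions into two groups, each half used once for derivation and once for testing, giving ten model fits) can be sketched as follows. This is an assumption-laden illustration: the function name `repeated_split_half`, the seed, and the equal-halves split are hypothetical choices, and the ordinal logistic regression fit itself is left out.

```python
import random


def repeated_split_half(n_records, n_repeats=5, seed=0):
    """Return (derivation_indices, test_indices) pairs.

    Each repeat shuffles the record indices, cuts them into two halves,
    and uses each half once for derivation and once for testing, so
    n_repeats random divisions yield 2 * n_repeats model fits.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility
    fits = []
    for _ in range(n_repeats):
        indices = list(range(n_records))
        rng.shuffle(indices)
        half = n_records // 2
        first, second = indices[:half], indices[half:]
        fits.append((first, second))  # derive on first, test on second
        fits.append((second, first))  # then swap the roles
    return fits


fits = repeated_split_half(4010)
print(len(fits))  # 10 derivation/test pairs from 5 random divisions
```

Each pair would then be passed to whatever model-fitting routine is in use; the sketch only fixes how the 4010 records are partitioned.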
Covariate Imbalance and Adjustment for Logistic Regression Analysis of Clinical Trial Data
Ciolino, Jody D.; Martin, Reneé H.; Zhao, Wenle; Jauch, Edward C.; Hill, Michael D.; Palesch, Yuko Y.
2014-01-01
In logistic regression analysis for binary clinical trial data, adjusted treatment effect estimates are often not equivalent to unadjusted estimates in the presence of influential covariates. This paper uses simulation to quantify the benefit of covariate adjustment in logistic regression. However, International Conference on Harmonization guidelines suggest that covariate adjustment be pre-specified. Unplanned adjusted analyses should be considered secondary. Results suggest that if adjustment is not possible or unplanned in a logistic setting, balance in continuous covariates can alleviate some (but never all) of the shortcomings of unadjusted analyses. The case of log binomial regression is also explored. PMID:24138438
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
2017-06-01
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB (p […] logistic regression model for the classification of risk groups for PTB.
National Research Council Canada - National Science Library
Pfleiderer, Elaine M; Scroggins, Cheryl L; Manning, Carol A
2009-01-01
Two separate logistic regression analyses were conducted for low- and high-altitude sectors to determine whether a set of dynamic sector characteristics variables could reliably discriminate between operational error (OE...
Modeling data for pancreatitis in presence of a duodenal diverticula using logistic regression
Dineva, S.; Prodanova, K.; Mlachkova, D.
2013-12-01
The presence of a periampullary duodenal diverticulum (PDD) is often observed during upper digestive tract barium meal studies and endoscopic retrograde cholangiopancreatography (ERCP). A few papers have reported that the diverticulum is related to the incidence of pancreatitis. The aim of this study is to investigate whether the presence of duodenal diverticula predisposes to the development of pancreatic disease. A total of 3966 patients who had undergone ERCP were studied retrospectively. They were divided into two groups, with and without PDD. Patients with duodenal diverticula had a higher rate of acute pancreatitis. Duodenal diverticula are a risk factor for acute idiopathic pancreatitis. A multiple logistic regression was performed to obtain adjusted estimates of odds and to identify whether a PDD is a predictor of acute or chronic pancreatitis. The software package STATISTICA 10.0 was used to analyze the real data.
Logistic regression models of factors influencing the location of bioenergy and biofuels plants
T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu
2011-01-01
Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7 m²/ha...
Bergtold, Jason S.; Yeager, Elizabeth A.; Featherstone, Allen M.
2011-01-01
The logistic regression model has been widely used in the social and natural sciences and results from studies using this model can have significant impact. Thus, confidence in the reliability of inferences drawn from these models is essential. The robustness of such inferences is dependent on sample size. The purpose of this study is to examine the impact of sample size on the mean estimated bias and efficiency of parameter estimation and inference for the logistic regression model. A numbe...
Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W
2015-08-01
Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
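Several of the evaluation measures named in this abstract (sensitivity, specificity, positive predictive value and Youden's index) follow directly from a 2x2 confusion matrix. The sketch below shows the standard definitions; the function name `diagnostic_summary` and the counts are hypothetical and are not taken from the burn registry.

```python
def diagnostic_summary(tp, fp, tn, fn):
    """Compute basic classification measures from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives at one chosen probability cutoff.
    """
    sensitivity = tp / (tp + fn)   # share of actual events detected
    specificity = tn / (tn + fp)   # share of non-events correctly cleared
    ppv = tp / (tp + fp)           # share of predicted events that are real
    youden = sensitivity + specificity - 1  # Youden's index J
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "youden": youden,
    }


# Hypothetical counts, for illustration only.
summary = diagnostic_summary(tp=80, fp=10, tn=90, fn=20)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The area under the ROC curve, also used in the study, needs the predicted probabilities rather than a single cutoff, so it is not covered by this count-based sketch.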
Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul
2015-11-04
Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.
Binary logistic regression-Instrument for assessing museum indoor air impact on exhibits.
Bucur, Elena; Danet, Andrei Florin; Lehr, Carol Blaziu; Lehr, Elena; Nita-Lazar, Mihai
2017-04-01
This paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The impact on the exhibits during certain pollution scenarios (environmental impact) was predicted by a mathematical model based on binary logistic regression; the model identifies, from a multitude of possible environmental parameters, those with a significant impact on exhibits and ranks them according to the severity of their effect. Air quality (NO2, SO2, O3 and PM2.5) and microclimate (temperature, humidity) monitoring data from a case study conducted within exhibition and storage spaces of the Romanian National Aviation Museum Bucharest were used to develop and validate the binary logistic regression method and the mathematical model. The logistic regression analysis was run on 794 data combinations (715 to develop the model and 79 to validate it) using the Statistical Package for Social Sciences (SPSS 20.0). The results demonstrated that, of the six parameters considered, four had a significant effect on exhibits, in the following order: O3 > PM2.5 > NO2 > humidity, followed at a significant distance by the effects of SO2 and temperature. The mathematical model developed in this study correctly predicted 95.1% of the cumulated effect of the environmental parameters upon the exhibits. Moreover, this model could also be used in the decisional process regarding the preventive preservation measures that should be implemented within the exhibition space. The paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The mathematical model developed on the environmental parameters analyzed by the binary logistic regression method could be useful in a decision-making process establishing the best measures for pollution reduction and preventive
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.
2003-01-01
Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity
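The probability that such a multiple logistic regression model produces for one basin can be sketched with the standard logistic function. This is a generic illustration only: the function name `event_probability`, the intercept and the coefficient values are hypothetical placeholders, not the fitted model from the report.

```python
import math


def event_probability(intercept, coefficients, predictors):
    """Return P(event) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    # Evaluate the logistic function in a numerically stable way so that
    # large negative linear predictors do not overflow math.exp.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


# Hypothetical coefficients for two predictors, e.g. percent of basin with
# slope above 30 percent and percent burned at medium or high severity.
p = event_probability(-2.0, [0.03, 0.02], [40.0, 55.0])
print(f"estimated probability: {p:.2f}")
```

Applying the same function to every basin, with the fitted coefficients in place of the placeholders, is the kind of calculation that would underlie a probability map.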
Optimization of Game Formats in U-10 Soccer Using Logistic Regression Analysis
Directory of Open Access Journals (Sweden)
Amatria Mario
2016-12-01
Small-sided games provide young soccer players with better opportunities to develop their skills and progress as individual and team players. There is, however, little evidence on the effectiveness of different game formats in different age groups, and furthermore, these formats can vary between and even within countries. The Royal Spanish Soccer Association replaced the traditional grassroots 7-a-side format (F-7) with the 8-a-side format (F-8) in the 2011-12 season and the country’s regional federations gradually followed suit. The aim of this observational methodology study was to investigate which of these formats best suited the learning needs of U-10 players transitioning from 5-a-side futsal. We built a multiple logistic regression model to predict the success of offensive moves depending on the game format and the area of the pitch in which the move was initiated. Success was defined as a shot at the goal. We also built two simple logistic regression models to evaluate how the game format influenced the acquisition of technical-tactical skills. It was found that the probability of a shot at the goal was higher in F-7 than in F-8 for moves initiated in the Creation Sector-Own Half (0.08 vs 0.07) and the Creation Sector-Opponent's Half (0.18 vs 0.16). The probability was the same (0.04) in the Safety Sector. Children also had more opportunities to control the ball and pass or take a shot in the F-7 format (0.24 vs 0.20), and these were also more likely to be successful in this format (0.28 vs 0.19).
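The predicted probabilities reported in this abstract can be put on the odds scale that logistic regression works with. The sketch below is only a back-of-the-envelope conversion of the rounded probabilities quoted above; the function names are hypothetical, and the resulting ratios are not the odds ratios estimated in the study.

```python
def odds(p):
    """Convert a probability into odds p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """Odds of the event under condition A relative to condition B."""
    return odds(p_a) / odds(p_b)


# Rounded shot probabilities quoted in the abstract (F-7 vs F-8).
sectors = {
    "Creation Sector-Own Half": (0.08, 0.07),
    "Creation Sector-Opponent's Half": (0.18, 0.16),
    "Safety Sector": (0.04, 0.04),
}
for sector, (p_f7, p_f8) in sectors.items():
    print(f"{sector}: odds ratio F-7 vs F-8 = {odds_ratio(p_f7, p_f8):.2f}")
```

Because the inputs are rounded to two decimals, the ratios are approximate; equal probabilities, as in the Safety Sector, give a ratio of 1.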
Determining factors influencing survival of breast cancer by fuzzy logistic regression model.
Nikbakht, Roya; Bahrampour, Abbas
2017-01-01
The fuzzy logistic regression model can be used to determine the influential factors of a disease. This study explores the important predictive factors for survival of patients with breast cancer. We used breast cancer data collected by the cancer registry of Kerman University of Medical Sciences during 2000-2007. Variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were included in the fuzzy logistic regression model. Model performance was assessed in terms of the mean degree of membership (MDM). The results showed that almost 41% of patients were in the neoplasm and malignant group and more than two-thirds of them were still alive after 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criterion shows that the fuzzy logistic regression fits the data well (MDM = 0.86). The fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy in the survival of patients with breast cancer. In addition, the model can calculate possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research. Few studies have applied fuzzy logistic models, and we recommend using this model in various research areas.
Yang, Lixue; Chen, Kean
2015-11-01
To improve the design of underwater target recognition systems based on auditory perception, this study compared human listeners with automatic classifiers. Performance measures and strategies in three discrimination experiments, including discriminations between man-made and natural targets, between ships and submarines, and among three types of ships, were used. In the experiments, the subjects were asked to assign a score to each sound based on how confident they were about the category to which it belonged, and logistic regression, which represents linear discriminative models, also completed three similar tasks by utilizing many auditory features. The results indicated that the performance of logistic regression improved as the ratio between inter- and intra-class differences became larger, whereas the performance of the human subjects was limited by their unfamiliarity with the targets. Logistic regression performed better than the human subjects in all tasks but the discrimination between man-made and natural targets, and the strategies employed by excellent human subjects were similar to those of logistic regression. Logistic regression and several human subjects demonstrated similar performance when discriminating man-made and natural targets, but in this case, their strategies were not similar. An appropriate fusion of their strategies led to further improvement in recognition accuracy.
Directory of Open Access Journals (Sweden)
Nataša Šarlija
2017-01-01
This study sheds light on the most common issues related to applying logistic regression in prediction models for company growth. The purpose of the paper is (1) to provide a detailed demonstration of the steps in developing a growth prediction model based on logistic regression analysis, (2) to discuss common pitfalls and methodological errors in developing a model, and (3) to provide solutions and possible ways of overcoming these issues. Special attention is devoted to the question of satisfying logistic regression assumptions, selecting and defining dependent and independent variables, using classification tables and ROC curves for reporting model strength, interpreting odds ratios as effect measures and evaluating performance of the prediction model. Development of a logistic regression model in this paper focuses on a prediction model of company growth. The analysis is based on predominantly financial data from a sample of 1471 small and medium-sized Croatian companies active between 2009 and 2014. The financial data are presented in the form of financial ratios divided into nine main groups depicting the following areas of business: liquidity, leverage, activity, profitability, research and development, investing and export. The growth prediction model indicates aspects of a business critical for achieving high growth. In that respect, the contribution of this paper is twofold: first, methodological, in terms of pointing out pitfalls and potential solutions in logistic regression modelling, and second, theoretical, in terms of identifying factors responsible for high growth of small and medium-sized companies.
The study of logistic regression of risk factor on the death cause of uranium miners
International Nuclear Information System (INIS)
Wen Jinai; Yuan Liyun; Jiang Ruyi
1999-01-01
The logistic regression model has been widely used in medicine. Software implementing the model is popular, but how to use the model correctly is worth discussing. Using SPSS (Statistical Package for the Social Sciences) software, the unconditional logistic regression method was adopted to carry out multi-factor analyses of the causes of total death, cancer death and lung cancer death of uranium miners. The data are from the radioepidemiological database of one uranium mine. The results show that attained age is a risk factor in the logistic regression analyses of total death, cancer death and lung cancer death. In the logistic regression analysis of cancer death, there is a negative correlation between the age at exposure and cancer death. This shows that the younger the age at exposure, the greater the risk of cancer death. In the logistic regression analysis of lung cancer death, there is a positive correlation between cumulated exposure and lung cancer death; this shows that cumulated exposure is a most important risk factor for lung cancer death in uranium miners. Many foreign reports have documented that the lung cancer death rate is higher in uranium miners
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
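The logit relationship described in the abstract above can be illustrated with a minimal Python sketch. The intercept, coefficients and predictor values below are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math

def log_odds(intercept, coefs, x):
    """Linear predictor: the log odds as a linear combination of predictors."""
    return intercept + sum(b * v for b, v in zip(coefs, x))

def probability(logit):
    """Inverse logit: maps log odds back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def odds_ratio(coef):
    """Odds ratio for a one-unit increase in a predictor."""
    return math.exp(coef)

# Hypothetical values (assumptions, not results from the abstract).
intercept = -1.0
coefs = [0.8, -0.5]
x = [1, 2]

eta = log_odds(intercept, coefs, x)   # -1.0 + 0.8*1 - 0.5*2
p = probability(eta)
print(f"log odds = {eta:.3f}, probability = {p:.3f}")
print("odds ratios:", [round(odds_ratio(b), 3) for b in coefs])
```

A coefficient of zero gives an odds ratio of 1, meaning the predictor leaves the odds unchanged.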
Harrell , Jr , Frank E
2015-01-01
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression
Khikmah, L.; Wijayanto, H.; Syafitri, U. D.
2017-04-01
A problem often encountered in logistic regression modeling is multicollinearity. Multicollinearity among the explanatory variables biases the parameter estimates and also causes classification errors. Stepwise regression is commonly used to overcome multicollinearity in regression. Another method, which involves all variables in the prediction, is Principal Component Analysis (PCA). However, classical PCA applies only to numeric data. When the data are categorical, one method to solve the problem is Categorical Principal Component Analysis (CATPCA). The data used in this research were part of the 2012 Indonesia Demographic and Health Survey (IDHS). This research focuses on the characteristics of women using contraceptive methods. Classification results were evaluated using Area Under Curve (AUC) values; the higher the AUC value, the better. Based on AUC values, the classification of the contraceptive method using the stepwise method (58.66%) is better than the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the results using sensitivity shows the opposite: the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and stepwise (92.05%). Because this study focuses on classification of the major class (using a contraceptive method), the selected model is CATPCA, since it raises the accuracy for the major class.
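The two evaluation measures named in the abstract above, AUC and sensitivity, can be sketched in plain Python. This is an illustrative sketch only: the scores, labels and the 0.5 threshold are hypothetical, and the function names are not from the study.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def sensitivity(scores, labels, threshold=0.5):
    """True positive rate at a given classification threshold."""
    predicted_pos = [s >= threshold for s, y in zip(scores, labels) if y == 1]
    return sum(predicted_pos) / len(predicted_pos)

# Hypothetical predicted probabilities and true classes.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels), sensitivity(scores, labels))
```

AUC does not depend on a threshold while sensitivity does, which is one reason the two measures can rank models differently, as they do in the abstract.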
Fuzzy multinomial logistic regression analysis: A multi-objective programming approach
Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan
2017-05-01
Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, especially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate parameters of the multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus the ML approach. Results show that the newly proposed model outperforms ML in cases of small datasets.
Using the Logistic Regression model in supporting decisions of establishing marketing strategies
Directory of Open Access Journals (Sweden)
Cristinel CONSTANTIN
2015-12-01
This paper presents instrumental research on the use of the Logistic Regression model for data analysis in marketing research. Decision makers inside different organisations need relevant information to support their decisions regarding marketing strategies. The data provided by marketing research can be processed in various ways, but multivariate data analysis models can enhance the utility of the information. Among these models is the Logistic Regression model, which is used for dichotomous variables. Our research explains the utility of this model and the interpretation of the resulting information in order to help practitioners and researchers use it in their future investigations
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis
Directory of Open Access Journals (Sweden)
Maarten van Smeden
2016-11-01
Background: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. Methods: The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. Results: The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. Conclusions: The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.
van Smeden, Maarten; de Groot, Joris A H; Moons, Karel G M; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus J C; Reitsma, Johannes B
2016-11-24
Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
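To make the link between separation and Firth's correction concrete, here is a minimal Python sketch of Firth-penalized logistic regression for the simplest case of an intercept and one covariate. It is an illustration under stated assumptions, not the authors' simulation code: the function name `firth_logistic_1d`, the toy data and the plain Fisher-scoring loop without step-halving are all choices made here. The update solves the modified score equations sum_i (y_i - p_i + h_i(1/2 - p_i)) x_i = 0, where h_i are the diagonal elements of the weighted hat matrix.

```python
import math

def firth_logistic_1d(x, y, max_iter=100, tol=1e-10):
    """Firth-penalized logistic regression: intercept plus one covariate.

    Sketch only: uses Fisher scoring on the modified score, no step-halving.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the design matrix with columns [1, x].
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        # Hat-matrix diagonals: h_i = w_i * (1, x_i) (X'WX)^-1 (1, x_i)'.
        h = [wi * (d - 2.0 * b * xi + a * xi * xi) / det for wi, xi in zip(w, x)]
        # Modified (penalized) score contributions.
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Scoring step: (X'WX)^-1 times the modified score.
        s0 = (d * u0 - b * u1) / det
        s1 = (-b * u0 + a * u1) / det
        b0 += s0
        b1 += s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1

# Toy data with complete separation: x predicts y perfectly, so the
# ordinary maximum likelihood estimate of the slope does not exist.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b1 = firth_logistic_1d(x, y)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

For a single binary covariate, the penalized estimate coincides with the maximum likelihood estimate obtained after adding 0.5 to each cell of the 2x2 table, which gives a closed-form check: fitted probabilities of 0.5/5 and 4.5/5 in the two groups of this toy example.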
Keith, Timothy Z
2014-01-01
Multiple Regression and Beyond offers a conceptually oriented introduction to multiple regression (MR) analysis and structural equation modeling (SEM), along with analyses that flow naturally from those methods. By focusing on the concepts and purposes of MR and related methods, rather than the derivation and calculation of formulae, this book introduces material to students more clearly, and in a less threatening way. In addition to illuminating content necessary for coursework, the accessibility of this approach means students are more likely to be able to conduct research using MR or SEM, and more likely to use the methods wisely. Covers both MR and SEM, while explaining their relevance to one another. Also includes path analysis, confirmatory factor analysis, and latent growth modeling. Figures and tables throughout provide examples and illustrate key concepts and techniques. For additional resources, please visit: http://tzkeith.com/.
Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.
Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H
2016-01-01
Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias and confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias and coverage still deviated from nominal.
Suppression Situations in Multiple Linear Regression
Shieh, Gwowen
2006-01-01
This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…
Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures
Atar, Burcu; Kamata, Akihito
2011-01-01
The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…
A Predictive Logistic Regression Model of World Conflict Using Open Source Data
2015-03-26
No correlation between the error terms and the independent variables 9. Absence of perfect multicollinearity (Menard, 2001) When assumptions are...some of the variables before initial model building. Multicollinearity, or near-linear dependence among the variables, will cause problems in the...model. High multicollinearity tends to produce unreasonably high logistic regression coefficients and can result in coefficients that are not
Sample size calculation to externally validate scoring systems based on logistic regression models.
Directory of Open Access Journals (Sweden)
Antonio Palazón-Bru
A sample size containing at least 100 events and 100 non-events has been suggested to validate a predictive model, regardless of the model being validated and despite the fact that certain factors can influence calibration of the predictive model (discrimination, parameterization and incidence). Scoring systems based on binary logistic regression models are a specific type of predictive model. The aim of this study was to develop an algorithm to determine the sample size for validating a scoring system based on a binary logistic regression model and to apply it to a case study. The algorithm was based on bootstrap samples in which the area under the ROC curve, the observed event probabilities through smooth curves, and a measure to determine the lack of calibration (estimated calibration index) were calculated. To illustrate its use for interested researchers, the algorithm was applied to a scoring system, based on a binary logistic regression model, to determine mortality in intensive care units. In the case study provided, the algorithm obtained a sample size with 69 events, which is lower than the value suggested in the literature. An algorithm is provided for finding the appropriate sample size to validate scoring systems based on binary logistic regression models. This could be applied to determine the sample size in other similar cases.
de Vries, S O; Fidler, Vaclav; Kuipers, Wietze D; Hunink, Maria G M
1998-01-01
The purpose of this study was to develop a model that predicts the outcome of supervised exercise for intermittent claudication. The authors present an example of the use of autoregressive logistic regression for modeling observed longitudinal data. Data were collected from 329 participants in a
Construction of risk prediction model of type 2 diabetes mellitus based on logistic regression
Directory of Open Access Journals (Sweden)
Li Jian
2017-01-01
Objective: to construct a multi-factor prediction model for the individual risk of T2DM, and to explore new ideas for early warning, prevention and personalized health services for T2DM. Methods: logistic regression techniques were used to screen the risk factors for T2DM and construct the risk prediction model of T2DM. Results: The logistic regression equation of the risk prediction model for males was logit(P) = BMI × 0.735 + vegetables × (−0.671) + age × 0.838 + diastolic pressure × 0.296 + physical activity × (−2.287) + sleep × (−0.009) + smoking × 0.214. The equation for females was logit(P) = BMI × 1.979 + vegetables × (−0.292) + age × 1.355 + diastolic pressure × 0.522 + physical activity × (−2.287) + sleep × (−0.010). For males, the area under the ROC curve was 0.83, the sensitivity was 0.72 and the specificity was 0.86; for females, the area under the ROC curve was 0.84, the sensitivity was 0.75 and the specificity was 0.90. Conclusion: The data for this model come from a nested case-control study; the risk prediction model was established using mature logistic regression techniques, and the model has high predictive sensitivity, specificity and stability.
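As an illustration of how such a published equation is applied, the following Python sketch evaluates the male model's linear predictor and converts it to a probability. The coefficients are those reported in the abstract above; everything else is an assumption. The abstract does not state how the predictors are coded and reports no intercept, so the input values are hypothetical and the resulting probability is not a calibrated risk.

```python
import math

# Coefficients of the male model as reported in the abstract.
MALE_COEFS = {
    "bmi": 0.735,
    "vegetables": -0.671,
    "age": 0.838,
    "diastolic_pressure": 0.296,
    "physical_activity": -2.287,
    "sleep": -0.009,
    "smoking": 0.214,
}

def logit_score(coefs, values):
    """Sum of coefficient * coded predictor value (no intercept reported)."""
    return sum(coefs[name] * values[name] for name in coefs)

def to_probability(logit):
    """Inverse logit transform."""
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical coding: every predictor set to category 1 (an assumption).
person = {name: 1 for name in MALE_COEFS}
score = logit_score(MALE_COEFS, person)
print(f"logit = {score:.3f}, P = {to_probability(score):.3f}")
```

Negative coefficients (vegetables, physical activity, sleep) lower the score, which matches their reading as protective factors in the reported equation.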
Susan L. King
2003-01-01
The performance of two classifiers, logistic regression and neural networks, are compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
A Note on Three Statistical Tests in the Logistic Regression DIF Procedure
Paek, Insu
2012-01-01
Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
Courtney, Jon R.; Prophet, Retta
2011-01-01
Placement instability is often associated with a number of negative outcomes for children. To gain state level contextual knowledge of factors associated with placement stability/instability, logistic regression was applied to selected variables from the New Mexico Adoption and Foster Care Administrative Reporting System dataset. Predictors…
Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald
2006-11-01
We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
The use of logistic regression in modelling the distributions of bird ...
African Journals Online (AJOL)
The method of logistic regression was used to model the observed geographical distribution patterns of bird species in Swaziland in relation to a set of environmental variables. Reporting rates derived from bird atlas data are used as an index of population densities. This is justified in part by the success of the modelling ...
Fan, Xitao; Wang, Lin
The Monte Carlo study compared the performance of predictive discriminant analysis (PDA) and that of logistic regression (LR) for the two-group classification problem. Prior probabilities were used for classification, but the cost of misclassification was assumed to be equal. The study used a fully crossed three-factor experimental design (with…
Czech Academy of Sciences Publication Activity Database
Valenta, Zdeněk; Pitha, J.; Poledne, R.
2006-01-01
Vol. 25, No. 24 (2006), pp. 4227-4234 ISSN 0277-6715 R&D Projects: GA MZd NA7512 Institutional research plan: CEZ:AV0Z10300504 Keywords: proportional odds logistic regression * dichotomized outcomes * uncertainty Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 1.737, year: 2006
International Nuclear Information System (INIS)
Arana, E.; Marti-Bonmati, L.; Bautista, D.; Paredes, R.
1998-01-01
To study the utility of logistic regression and the neural network in the diagnosis of cranial hemangiomas. Fifteen patients presenting hemangiomas were selected from a total of 167 patients with cranial lesions. All were evaluated by plain radiography and computed tomography (CT). Nineteen variables in their medical records were reviewed. Logistic regression and neural network models were constructed and validated by the jackknife (leave-one-out) approach. The yields of the two models were compared by means of ROC curves, using the area under the curve as parameter. Seven men and 8 women presented hemangiomas. The mean age of these patients was 38.4 ± 15.4 years (mean ± standard deviation). Logistic regression identified as significant variables the shape, soft tissue mass and periosteal reaction. The neural network lent more importance to the existence of ossified matrix, ruptured cortical vein and the mixed calcified-blastic (trabeculated) pattern. The neural network showed a greater yield than logistic regression (Az, 0.9409 ± 0.004 versus 0.7211 ± 0.075; p<0.001). The neural network discloses hidden interactions among the variables, providing a higher yield in the characterization of cranial hemangiomas and constituting a medical diagnostic aid. (Author) 29 refs
Propensity Score Estimation with Data Mining Techniques: Alternatives to Logistic Regression
Keller, Bryan S. B.; Kim, Jee-Seon; Steiner, Peter M.
2013-01-01
Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has…
Rudner, Lawrence
2016-01-01
In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…
Detecting DIF in Polytomous Items Using MACS, IRT and Ordinal Logistic Regression
Elosua, Paula; Wells, Craig
2013-01-01
The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…
DEFF Research Database (Denmark)
Petersen, Jørgen Holm
2016-01-01
This paper describes a new approach to the estimation in a logistic regression model with two crossed random effects where special interest is in estimating the variance of one of the effects while not making distributional assumptions about the other effect. A composite likelihood is studied...
The cross-validated AUC for MCP-logistic regression with high-dimensional data.
Jiang, Dingfeng; Huang, Jian; Zhang, Ying
2013-10-01
We propose a cross-validated area under the receiving operator characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with the existing methods including the Akaike information criterion (AIC), Bayesian information criterion (BIC) or Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from the studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
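A small Python sketch can illustrate two ingredients of the approach described above: the minimax concave penalty itself and the rule of picking the tuning parameter with the largest cross-validated AUC. The MCP formula is the standard one, with regularization parameter `lam` and concavity parameter `gamma` > 1; the dictionary of CV-AUC values is invented for illustration, and the coordinate descent fit is not reproduced here.

```python
def mcp_penalty(beta, lam, gamma):
    """Minimax concave penalty for a single coefficient.

    Equals lam*|b| - b^2/(2*gamma) up to |b| = gamma*lam, then stays
    constant at gamma*lam^2/2, so large coefficients are not shrunk further.
    """
    t = abs(beta)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return 0.5 * gamma * lam * lam

def select_lambda(cv_auc_by_lambda):
    """Return the tuning parameter whose cross-validated AUC is largest."""
    return max(cv_auc_by_lambda, key=cv_auc_by_lambda.get)

# Hypothetical CV-AUC values for a grid of lambda values (assumption).
cv_auc = {0.1: 0.70, 0.2: 0.80, 0.4: 0.75}
print("selected lambda:", select_lambda(cv_auc))
print("penalty at beta=1:", mcp_penalty(1.0, lam=1.0, gamma=3.0))
```

Unlike the lasso penalty, which grows linearly without bound, the MCP flattens out, which is what reduces bias for large coefficients.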
Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard
2010-01-01
The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
Directory of Open Access Journals (Sweden)
Yuanyuan Yu
2017-12-01
Background: Confounders can produce spurious associations between exposure and outcome in observational studies. For the majority of epidemiologists, adjusting for confounders using a logistic regression model is the habitual method, though it has some problems in accuracy and precision. It is, therefore, important to highlight the problems of logistic regression and to search for an alternative method. Methods: Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential and then to select the optimum adjusting strategy, in which the logistic regression model and the inverse probability weighting based marginal structural model (IPW-based-MSM) were compared. The “do-calculus” was used to calculate the true causal effect of exposure on outcome, then the bias and standard error were used to evaluate the performances of different strategies. Results: Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets satisfying G-admissibility, adjusting for the set including all the confounders reduced the bias to a level equivalent to that of the set containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimations through logistic regression were biased, although the estimation after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM. Compared with logistic regression, the IPW-based-MSM could obtain unbiased causal effect estimation when the adjusted confounders satisfied G-admissibility and the optimal
Yu, Yuanyuan; Li, Hongkai; Sun, Xiaoru; Su, Ping; Wang, Tingting; Liu, Yi; Yuan, Zhongshang; Liu, Yanxun; Xue, Fuzhong
2017-12-28
Confounders can produce spurious associations between exposure and outcome in observational studies. For the majority of epidemiologists, adjusting for confounders using a logistic regression model is the habitual method, though it has some problems in accuracy and precision. It is, therefore, important to highlight the problems of logistic regression and to search for an alternative method. Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential and then to select the optimum adjusting strategy, in which the logistic regression model and the inverse probability weighting based marginal structural model (IPW-based-MSM) were compared. The "do-calculus" was used to calculate the true causal effect of exposure on outcome, then the bias and standard error were used to evaluate the performances of different strategies. Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets satisfying G-admissibility, adjusting for the set including all the confounders reduced the bias to a level equivalent to that of the set containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimations through logistic regression were biased, although the estimation after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM. Compared with logistic regression, the IPW-based-MSM could obtain unbiased causal effect estimation when the adjusted confounders satisfied G-admissibility and the optimal strategy was to adjust for the parent nodes of outcome, which
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. The objective is to explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and a decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of
Use of Logistic Regression for Forecasting Short-Term Volcanic Activity
Directory of Open Access Journals (Sweden)
Mark T. Woods
2012-08-01
An algorithm that forecasts volcanic activity using an event tree decision making framework and logistic regression has been developed, characterized, and validated. The suite of empirical models that drive the system were derived from a sparse and geographically diverse dataset comprised of source modeling results, volcano monitoring data, and historic information from analog volcanoes. Bootstrapping techniques were applied to the training dataset to allow for the estimation of robust logistic model coefficients. Probabilities generated from the logistic models increase with positive modeling results, escalating seismicity, and rising eruption frequency. Cross validation yielded a series of receiver operating characteristic curves with areas ranging between 0.78 and 0.81, indicating that the algorithm has good forecasting capabilities. Our results suggest that the logistic models are highly transportable and can compete with, and in some cases outperform, non-transportable empirical models trained with site specific information.
Kesselmeier, Miriam; Lorenzo Bermejo, Justo
2017-11-01
Logistic regression is the most common technique used for genetic case-control association studies. A disadvantage of standard maximum likelihood estimators of the genotype relative risk (GRR) is their strong dependence on outlier subjects, for example, patients diagnosed at unusually young age. Robust methods are available to constrain outlier influence, but they are scarcely used in genetic studies. This article provides a non-intimidating introduction to robust logistic regression, and investigates its benefits and limitations in genetic association studies. We applied the bounded Huber and extended the R package 'robustbase' with the re-descending Hampel functions to down-weight outlier influence. Computer simulations were carried out to assess the type I error rate, mean squared error (MSE) and statistical power according to major characteristics of the genetic study and investigated markers. Simulations were complemented with the analysis of real data. Both standard and robust estimation controlled type I error rates. Standard logistic regression showed the highest power but standard GRR estimates also showed the largest bias and MSE, in particular for associated rare and recessive variants. For illustration, a recessive variant with a true GRR=6.32 and a minor allele frequency=0.05 investigated in a 1000 case/1000 control study by standard logistic regression resulted in power=0.60 and MSE=16.5. The corresponding figures for Huber-based estimation were power=0.51 and MSE=0.53. Overall, Hampel- and Huber-based GRR estimates did not differ much. Robust logistic regression may represent a valuable alternative to standard maximum likelihood estimation when the focus lies on risk prediction rather than identification of susceptibility variants. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Fuzzy multiple linear regression: A computational approach
Juang, C. H.; Huang, X. H.; Fleming, J. W.
1992-01-01
This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.
Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As fraudulent financial statements become an increasingly serious problem, establishing a valid model for forecasting an enterprise's fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models for comparison. The study adopts financial and nonfinancial variables to help establish the forecasting model. The research objects are companies that issued fraudulent or nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification effect, 85.71%.
Guo, Huey-Ming; Shyu, Yea-Ing Lotus; Chang, Her-Kun
2006-01-01
In this article, the authors provide an overview of a research method for predicting quality of care in a home health nursing data set. The results of this study can be visualized through classification and regression tree (CART) graphs. The analysis was more effective, and the results were more informative, because the home health nursing dataset was analyzed with a combination of logistic regression and CART; these two techniques complement each other. The results were also more informative in that more patient characteristics were found to be related to quality of care in home care. The results help home health nurses predict patient outcomes in case management. Improved prediction is needed for interventions to be appropriately targeted for improved patient outcome and quality of care.
Rank-Optimized Logistic Matrix Regression toward Improved Matrix Data Classification.
Zhang, Jianguang; Jiang, Jianmin
2018-02-01
While existing logistic regression suffers from overfitting and often fails in considering structural information, we propose a novel matrix-based logistic regression to overcome the weakness. In the proposed method, 2D matrices are directly used to learn two groups of parameter vectors along each dimension without vectorization, which allows the proposed method to fully exploit the underlying structural information embedded inside the 2D matrices. Further, we add a joint [Formula: see text]-norm on two parameter matrices, which are organized by aligning each group of parameter vectors in columns. This added co-regularization term has two roles-enhancing the effect of regularization and optimizing the rank during the learning process. With our proposed fast iterative solution, we carried out extensive experiments. The results show that in comparison to both the traditional tensor-based methods and the vector-based regression methods, our proposed solution achieves better performance for matrix data classifications.
Use of generalized ordered logistic regression for the analysis of multidrug resistance data.
Agga, Getahun E; Scott, H Morgan
2015-10-01
Statistical analysis of antimicrobial resistance data largely focuses on individual antimicrobial's binary outcome (susceptible or resistant). However, bacteria are becoming increasingly multidrug resistant (MDR). Statistical analysis of MDR data is mostly descriptive often with tabular or graphical presentations. Here we report the applicability of generalized ordinal logistic regression model for the analysis of MDR data. A total of 1,152 Escherichia coli, isolated from the feces of weaned pigs experimentally supplemented with chlortetracycline (CTC) and copper, were tested for susceptibilities against 15 antimicrobials and were binary classified into resistant or susceptible. The 15 antimicrobial agents tested were grouped into eight different antimicrobial classes. We defined MDR as the number of antimicrobial classes to which E. coli isolates were resistant ranging from 0 to 8. Proportionality of the odds assumption of the ordinal logistic regression model was violated only for the effect of treatment period (pre-treatment, during-treatment and post-treatment); but not for the effect of CTC or copper supplementation. Subsequently, a partially constrained generalized ordinal logistic model was built that allows for the effect of treatment period to vary while constraining the effects of treatment (CTC and copper supplementation) to be constant across the levels of MDR classes. Copper (Proportional Odds Ratio [Prop OR]=1.03; 95% CI=0.73-1.47) and CTC (Prop OR=1.1; 95% CI=0.78-1.56) supplementation were not significantly associated with the level of MDR adjusted for the effect of treatment period. MDR generally declined over the trial period. In conclusion, generalized ordered logistic regression can be used for the analysis of ordinal data such as MDR data when the proportionality assumptions for ordered logistic regression are violated. Published by Elsevier B.V.
A Novel Imbalanced Data Classification Approach Based on Logistic Regression and Fisher Discriminant
Directory of Open Access Journals (Sweden)
Baofeng Shi
2015-01-01
We introduce an imbalanced data classification approach based on logistic regression significant discriminant and Fisher discriminant. First, a key indicators extraction model based on logistic regression significant discriminant and correlation analysis is derived to extract features for customer classification. Second, a customer scoring model is established on the basis of linear weighting using the Fisher discriminant. Then, a customer rating model is constructed in which the number of customers across all ratings follows a normal distribution. The performance of the proposed model and the classical SVM classification method are evaluated in terms of their ability to correctly classify consumers as default or nondefault customers. Empirical results using the data of 2157 customers in financial engineering suggest that the proposed approach performs better than the SVM model in dealing with imbalanced data classification. Moreover, our approach contributes to locating the qualified customers for the banks and the bond investors.
The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...
Duman, T. Y.; Can, T.; Gokceoglu, C.; Nefeslioglu, H. A.; Sonmez, H.
2006-11-01
As a result of industrialization, cities throughout the world have been growing rapidly for the last century. One typical example of these growing cities is Istanbul, the population of which is over 10 million. Due to rapid urbanization, new areas suitable for settlement and engineering structures are necessary. The Cekmece area, located west of the Istanbul metropolitan area, is studied because landslide activity is extensive there. The purpose of this study is to develop a model that can be used to characterize landslide susceptibility in map form using logistic regression analysis of an extensive landslide database. A database of landslide activity was constructed using both aerial photography and field studies. About 19.2% of the selected study area is covered by deep-seated landslides. The landslides that occur in the area are primarily located in sandstones with interbedded permeable and impermeable layers such as claystone, siltstone and mudstone. About 31.95% of the total landslide area is located in this unit. To apply logistic regression analyses, a data matrix including 37 variables was constructed. The variables used in the forward stepwise analyses are different measures of slope, aspect, elevation, stream power index (SPI), plan curvature, profile curvature, geology, geomorphology and relative permeability of lithological units. A total of 25 variables were identified as exerting strong influence on landslide occurrence and were included in the logistic regression equation. Wald statistics values indicate that lithology, SPI and slope are more important than the other parameters in the equation. Beta coefficients of the 25 variables included in the logistic regression equation provide a model for landslide susceptibility in the Cekmece area. This model is used to generate a landslide susceptibility map that correctly classified 83.8% of the landslide-prone areas.
Predicting Student Success on the Texas Chemistry STAAR Test: A Logistic Regression Analysis
Johnson, William L.; Johnson, Annabel M.; Johnson, Jared
2012-01-01
Background: The context is the new Texas STAAR end-of-course testing program. Purpose: The authors developed a logistic regression model to predict who would pass-or-fail the new Texas chemistry STAAR end-of-course exam. Setting: Robert E. Lee High School (5A) with an enrollment of 2700 students, Tyler, Texas. Date of the study was the 2011-2012…
Geroukis, Asterios; Brorson, Erik
2014-01-01
In this study, we compare the two statistical techniques logistic regression and discriminant analysis to see how well they classify companies based on clusters – made from the solvency ratio – using principal components as independent variables. The principal components are made with different financial ratios. We use cluster analysis to find groups with low, medium and high solvency ratio of 1200 different companies found on the NASDAQ stock market and use this as an a priori definition of ...
DEFF Research Database (Denmark)
Scott, Neil W; Fayers, Peter M; Aaronson, Neil K
2010-01-01
Differential item functioning (DIF) methods can be used to determine whether different subgroups respond differently to particular items within a health-related quality of life (HRQoL) subscale, after allowing for overall subgroup differences in that scale. This article reviews issues that arise ...... when testing for DIF in HRQoL instruments. We focus on logistic regression methods, which are often used because of their efficiency, simplicity and ease of application....
Nobuoki, Eshima; Minoru, Tabata; Geng, Zhi; Department of Medical Information Analysis, Faculty of Medicine, Oita Medical University; Department of Applied Mathematics, Faculty of Engineering, Kobe University; Department of Probability and Statistics, Peking University
2001-01-01
This paper discusses path analysis of categorical variables with logistic regression models. The total, direct and indirect effects in fully recursive causal systems are considered by using model parameters. These effects can be explained in terms of log odds ratios, uncertainty differences, and an inner product of explanatory variables and a response variable. A study on food choice of alligators is reanalysed as a numerical example to illustrate the present approach.
Assessing the performance of variational methods for mixed logistic regression models
Czech Academy of Sciences Publication Activity Database
Rijmen, F.; Vomlel, Jiří
2008-01-01
Vol. 78, No. 8 (2008), pp. 765-779 ISSN 0094-9655 R&D Projects: GA MŠk 1M0572 Grant - others: GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords: Mixed models * Logistic regression * Variational methods * Lower bound approximation Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.353, year: 2008
Length bias correction in gene ontology enrichment analysis using logistic regression.
Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H
2012-01-01
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
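The adjustment described above can be illustrated with a small simulation: when transcript length drives both category membership and differential-expression calls, the unadjusted log odds ratio between the two is inflated, and adding length as a covariate pulls it back toward zero. This is a sketch under stated assumptions rather than the authors' code: the simulated data, the choice of category membership as the response, and the names `fit_logistic`, `log_len`, `in_cat` and `is_de` are all hypothetical, and the fit uses plain gradient descent.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, steps=500, lr=1.0):
    """Unpenalized logistic regression by gradient descent.

    Each row of X is expected to start with a constant 1.0 for the intercept.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(sum(b * x for b, x in zip(beta, xi))) - yi
            for j in range(p):
                grad[j] += r * xi[j]
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical simulation: length influences BOTH variables, which are
# otherwise independent, so any marginal association is pure length bias.
rng = random.Random(1)
n = 1000
log_len = [rng.gauss(0, 1) for _ in range(n)]  # standardized log transcript length
in_cat = [1 if rng.random() < sigmoid(1.5 * v - 1) else 0 for v in log_len]
is_de = [1 if rng.random() < sigmoid(1.5 * v - 1) else 0 for v in log_len]

# Response: category membership. Predictor of interest: the DE indicator.
unadj = fit_logistic([[1.0, d] for d in is_de], in_cat)
adj = fit_logistic([[1.0, d, v] for d, v in zip(is_de, log_len)], in_cat)

print("log odds ratio for DE, unadjusted:     ", round(unadj[1], 2))
print("log odds ratio for DE, length-adjusted:", round(adj[1], 2))
```

In this toy setup the true conditional association is zero by construction, so whatever remains in the adjusted coefficient is sampling noise.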
Comparison of cranial sex determination by discriminant analysis and logistic regression.
Amores-Ampuero, Anabel; Alemán, Inmaculada
2016-04-05
Various methods have been proposed for estimating dimorphism. The objective of this study was to compare sex determination results from cranial measurements using discriminant analysis or logistic regression. The study sample comprised 130 individuals (70 males) of known sex, age, and cause of death from San José cemetery in Granada (Spain). Measurements of 19 neurocranial dimensions and 11 splanchnocranial dimensions were subjected to discriminant analysis and logistic regression, and the percentages of correct classification were compared between the sex functions obtained with each method. The discriminant capacity of the selected variables was evaluated with a cross-validation procedure. The percentage accuracy with discriminant analysis was 78.2% for the neurocranium (82.4% in females and 74.6% in males) and 73.7% for the splanchnocranium (79.6% in females and 68.8% in males). These percentages were higher with logistic regression analysis: 85.7% for the neurocranium (in both sexes) and 94.1% for the splanchnocranium (100% in females and 91.7% in males).
Ertas, Gokhan
2018-07-01
To assess the value of joint evaluation of diffusion tensor imaging (DTI) measures by using logistic regression modelling to detect high GS risk group prostate tumors. Fifty tumors imaged using DTI on a 3 T MRI device were analyzed. Regions of interests focusing on the center of tumor foci and noncancerous tissue on the maps of mean diffusivity (MD) and fractional anisotropy (FA) were used to extract the minimum, the maximum and the mean measures. Measure ratio was computed by dividing tumor measure by noncancerous tissue measure. Logistic regression models were fitted for all possible pair combinations of the measures using 5-fold cross validation. Systematic differences are present for all MD measures and also for all FA measures in distinguishing the high risk tumors [GS ≥ 7(4 + 3)] from the low risk tumors [GS ≤ 7(3 + 4)] (P Logistic regression modelling provides a favorable solution for the joint evaluations easily adoptable in clinical practice. Copyright © 2018 Elsevier Inc. All rights reserved.
International Nuclear Information System (INIS)
Ping, G.
2007-01-01
Objective: To assess the diagnostic value of CEA, CA199 and CA50 for colorectal neoplasm by logistic regression and ROC curve. Methods: The subjects include 75 patients with colorectal cancer, 35 patients with benign intestinal disease and 49 healthy controls. CEA, CA199 and CA50 are measured by CLIA, ECLIA and IRMA, respectively. The areas under the curve (AUC) of CEA, CA199 and CA50 and the logistic regression results are compared. Results: In the cancer-benign group, the AUC of CA50 is larger than the AUC of CA199. Compared with the AUC of the combination of CEA, CA199 and CA50 (0.604), the AUC of the combination of CEA and CA50 (0.875) is larger, and it is also larger than the AUC of CEA, CA199 or CA50 alone. In the cancer-health group, the AUC of the combination of CEA, CA199 and CA50 is larger than the AUC of CEA, CA199 or CA50 alone. In both the cancer-benign group and the cancer-health group, the AUC of CEA is larger than the AUC of CA199 or CA50. Conclusion: CEA is useful in the diagnosis of colorectal cancer. In the process of differential diagnosis, the combination of CEA and CA50 can give more information, while the combination of three tumor markers does not perform well. Furthermore, as a statistical method, logistic regression can improve the diagnostic sensitivity and specificity. (author)
Directory of Open Access Journals (Sweden)
Ebrahim Karimi Sangchini
2015-01-01
Landslides are amongst the most damaging natural hazards in mountainous regions. Every year, hundreds of people all over the world lose their lives in landslides; furthermore, there are large impacts on the local and global economy from these events. In this study, landslide hazard zonation in the Babaheydar watershed was conducted using logistic regression to determine landslide hazard areas. First, the landslide inventory map was prepared using aerial photograph interpretations and field surveys. In the next step, ten landslide conditioning factors (altitude, slope percentage, slope aspect, lithology, distance from faults, rivers, settlements and roads, land use, and precipitation) were chosen as effective factors on landsliding in the study area. Subsequently, the landslide susceptibility map was constructed using the logistic regression model in a Geographic Information System (GIS). The ROC and Pseudo-R2 indexes were used for model assessment. Results showed that the logistic regression model provided fairly high prediction accuracy for landslide susceptibility maps in the Babaheydar watershed, with ROC equal to 0.876. Furthermore, the results revealed that about 44% of the watershed area was located in the high and very high hazard classes. The resultant landslide susceptibility maps can be useful in appropriate watershed management practices and for sustainable development in the region.
A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.
López Puga, Jorge; García García, Juan
2012-11-01
Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research.
Landslide susceptibility mapping on a global scale using the method of logistic regression
Directory of Open Access Journals (Sweden)
L. Lin
2017-08-01
This paper proposes a statistical model for mapping global landslide susceptibility based on logistic regression. After investigating explanatory factors for landslides in the existing literature, five factors were selected to model landslide susceptibility: relative relief, extreme precipitation, lithology, ground motion and soil moisture. When building the model, 70 % of landslide and nonlandslide points were randomly selected for logistic regression, and the others were used for model validation. To evaluate the accuracy of predictive models, this paper adopts several criteria including a receiver operating characteristic (ROC) curve method. Logistic regression experiments found all five factors to be significant in explaining landslide occurrence on a global scale. During the modeling process, the percentage correct in the confusion matrix of landslide classification was approximately 80 % and the area under the curve (AUC) was nearly 0.87. During the validation process, the above statistics were about 81 % and 0.88, respectively. Such a result indicates that the model has strong robustness and stable performance. This model found that at a global scale, soil moisture can be dominant in the occurrence of landslides and the topographic factor may be secondary.
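The evaluation protocol described above (a random 70/30 split, then percentage correct and AUC reported for both the modeling and validation sets) can be sketched as follows. The data are simulated and purely illustrative: the two standardized predictors, the 0.5 classification cutoff, and the helper names (`fit_logistic`, `percent_correct`, `auc`) are assumptions, not details taken from the paper.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, steps=300, lr=0.5):
    """Logistic regression by gradient descent; returns (weights, intercept)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += r
            for j in range(p):
                gw[j] += r * xi[j]
        b -= lr * gb / n
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
    return w, b


def predict(model, X):
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def percent_correct(probs, y, cutoff=0.5):
    """Share of points on the diagonal of the 2x2 confusion matrix, in percent."""
    hits = sum((p >= cutoff) == (yi == 1) for p, yi in zip(probs, y))
    return 100.0 * hits / len(y)


def auc(probs, y):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [p for p, yi in zip(probs, y) if yi == 1]
    neg = [p for p, yi in zip(probs, y) if yi == 0]
    wins = sum((a > c) + 0.5 * (a == c) for a in pos for c in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: two standardized explanatory factors per sample point.
rng = random.Random(42)
n = 600
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
y = [1 if rng.random() < sigmoid(1.5 * a + 1.0 * c) else 0 for a, c in X]

# Random 70/30 split into modeling and validation sets.
idx = list(range(n))
rng.shuffle(idx)
cut = n * 7 // 10
train, valid = idx[:cut], idx[cut:]

model = fit_logistic([X[i] for i in train], [y[i] for i in train])
for name, part in (("modeling", train), ("validation", valid)):
    probs = predict(model, [X[i] for i in part])
    labels = [y[i] for i in part]
    print(name, round(percent_correct(probs, labels), 1), round(auc(probs, labels), 3))

valid_auc = auc(predict(model, [X[i] for i in valid]), [y[i] for i in valid])
```

Similar figures on the two sets, as the abstract reports for its own data, are what one would read as a sign of stable performance.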
Entrepreneurial intention modeling using hierarchical multiple regression
Directory of Open Access Journals (Sweden)
Marina Jeger
2014-12-01
The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris
2016-09-01
Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have
Buonaccorsi, John P; Romeo, Giovanni; Thoresen, Magne
2018-03-01
When fitting regression models, measurement error in any of the predictors typically leads to biased coefficients and incorrect inferences. A plethora of methods have been proposed to correct for this. Obtaining standard errors and confidence intervals using the corrected estimators can be challenging and, in addition, there is concern about remaining bias in the corrected estimators. The bootstrap, which is one option to address these problems, has received limited attention in this context. It has usually been employed by simply resampling observations, which, while suitable in some situations, is not always formally justified. In addition, the simple bootstrap does not allow for estimating bias in non-linear models, including logistic regression. Model-based bootstrapping, which can potentially estimate bias in addition to being robust to the original sampling or whether the measurement error variance is constant or not, has received limited attention. However, it faces challenges that are not present in handling regression models with no measurement error. This article develops new methods for model-based bootstrapping when correcting for measurement error in logistic regression with replicate measures. The methodology is illustrated using two examples, and a series of simulations are carried out to assess and compare the simple and model-based bootstrap methods, as well as other standard methods. While not always perfect, the model-based approaches offer some distinct improvements over the other methods. © 2017, The International Biometric Society.
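The "simple bootstrap" mentioned in this abstract, which resamples whole observations with replacement, can be sketched as follows. This is a generic illustration under stated assumptions, not the model-based method the article develops: the function name, the number of replicates and the seed are hypothetical, and `statistic` stands for any estimator (for example a corrected logistic regression coefficient).

```python
import random
import statistics

def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Bootstrap standard error of `statistic` by resampling observations.

    data: list of observations (each item is resampled as a unit).
    statistic: function mapping a list of observations to a number.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(sample))
    # The spread of the replicates approximates the sampling variability.
    return statistics.stdev(replicates)
```

A model-based bootstrap would instead generate new responses (and replicate measures) from the fitted model, which is the direction the abstract describes.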
LOGISTIC REGRESSION AS A TOOL FOR DETERMINATION OF THE PROBABILITY OF DEFAULT FOR ENTERPRISES
Directory of Open Access Journals (Sweden)
Erika SPUCHLAKOVA
2017-12-01
In a rapidly changing world it is necessary to adapt to new conditions, and approaches can vary from day to day. For the proper management of a company it is essential to know its financial situation. Assessment of the company's financial health can be carried out by financial analysis, which provides a number of methods for evaluating it. Analysis indicators are often included in the company assessment, in obtaining bank loans and other financial resources to ensure the functioning of the company. As a company focuses on the future and its planning, it is essential to forecast the future financial situation. According to the results of the financial health prediction, the company decides on the extension or limitation of its business. It depends mainly on the capabilities of the company's management how they will use information obtained from financial analysis in practice. The findings of logistic regression methods were first published in the 1960s, as an alternative to the least squares method. The essence of logistic regression is to determine the relationship between the explained (dependent) variable and the explanatory (independent) variables. The basic principle of this statistical method is based on regression analysis, but unlike linear regression, it can predict the probability of whether a phenomenon has occurred or not. The aim of this paper is to determine the probability of bankruptcy of enterprises.
Directory of Open Access Journals (Sweden)
Soyoung Park
2017-07-01
This study mapped and analyzed groundwater potential using two different models, logistic regression (LR) and multivariate adaptive regression splines (MARS), and compared the results. A spatial database was constructed for groundwater well data and groundwater influence factors. Groundwater well data with a high potential yield of ≥70 m³/d were extracted, and 859 locations (70%) were used for model training, whereas the other 365 locations (30%) were used for model validation. We analyzed 16 groundwater influence factors including altitude, slope degree, slope aspect, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport index, distance from drainage, drainage density, lithology, distance from fault, fault density, distance from lineament, lineament density, and land cover. Groundwater potential maps (GPMs) were constructed using LR and MARS models and tested using a receiver operating characteristic curve. Based on this analysis, the area under the curve (AUC) for the success rate curve of GPMs created using the MARS and LR models was 0.867 and 0.838, and the AUC for the prediction rate curve was 0.836 and 0.801, respectively. This implies that the MARS model is useful and effective for groundwater potential analysis in the study area.
A test for the parameters of multiple linear regression models ...
African Journals Online (AJOL)
A test for the parameters of multiple linear regression models is developed for conducting tests simultaneously on all the parameters of multiple linear regression models. The test is robust relative to the assumptions of homogeneity of variances and absence of serial correlation of the classical F-test. Under certain null and ...
Analysis of sparse data in logistic regression in medical research: A newer approach
Directory of Open Access Journals (Sweden)
S Devika
2016-01-01
Background and Objective: In the analysis of a dichotomous response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs) with very wide 95% confidence intervals (CIs) (OR: >999.999, 95% CI: 999.999). In this paper, we addressed this issue by using the penalized logistic regression (PLR) method. Materials and Methods: Data from a case-control study on hyponatremia and hiccups conducted in Christian Medical College, Vellore, Tamil Nadu, India were used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. A simulation dataset was created with different sample sizes and with a different number of covariates. Results: A total of 23 cases and 50 controls were used for the analysis of ordinary and PLR methods. The main exposure variable hyponatremia was present in nine (39.13%) of the cases and in four (8.0%) of the controls. Of the 23 hiccup cases, all were males and among the controls, 46 (92.0%) were males. Thus, the complete separation between gender and the disease group led to an infinite OR with 95% CI (OR: >999.999, 95% CI: 999.999), whereas there was a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48) using PLR. After adjusting for all the confounding variables, hyponatremia entailed 7.9 (95% CI: 2.06, 38.86) times higher risk for the development of hiccups as was found using PLR, whereas there was an overestimation of risk, OR: 10.76 (95% CI: 2.17, 53.41), using the conventional method. The simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates. Conclusions: PLR is almost equal to the ordinary logistic regression when the sample size is large and is superior in small cell
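The zero-cell problem in this abstract can be reproduced from the reported gender counts alone (23 male and 0 female cases; 46 male and 4 female controls). The sketch below is an illustration, not the authors' PLR fit: it computes the crude odds ratio, and then the value obtained by adding 0.5 to every cell, which for a model with one binary covariate is the same estimate a Firth-type (Jeffreys prior) penalty gives. The function name and the `add` parameter are assumptions, and because the result is unadjusted it is not expected to match the reported OR of 5.35.

```python
import math

def odds_ratio(a, b, c, d, add=0.0):
    """Odds ratio for a 2x2 table.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    add: constant added to every cell (0.5 mimics a Firth-type penalty
    in the single binary covariate case).
    """
    a, b, c, d = (x + add for x in (a, b, c, d))
    if b == 0 or c == 0:
        return math.inf   # a zero cell in the denominator: estimate does not exist
    return (a * d) / (b * c)

# Gender counts from the abstract: males are the "exposed" group here.
crude = odds_ratio(23, 0, 46, 4)                # infinite because no female cases
penalized = odds_ratio(23, 0, 46, 4, add=0.5)   # finite after the correction
```

The crude estimate is infinite, which is what software reports as ">999.999", while the corrected estimate is finite.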
Directory of Open Access Journals (Sweden)
Kritski Afrânio
2006-02-01
Background: Smear negative pulmonary tuberculosis (SNPT) accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods: The study enrolled 551 patients with clinical-radiological suspicion of SNPT, in Rio de Janeiro, Brazil. The original data was divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used for constructing logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operator characteristic curve, sensitivity, and specificity were used to evaluate the model's performance in both generation and validation samples. Results: It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion: The results suggest that those models might be useful as screening tools for estimating the risk of SNPT, optimizing the utilization of more expensive tests, and avoiding costs of unnecessary anti-tuberculosis treatment. Those models might be cost-effective tools in a health care network with hierarchical distribution of scarce resources.
[Calculating Pearson residual in logistic regressions: a comparison between SPSS and SAS].
Xu, Hao; Zhang, Tao; Li, Xiao-song; Liu, Yuan-yuan
2015-01-01
To compare the results of Pearson residual calculations in logistic regression models using SPSS and SAS. We reviewed Pearson residual calculation methods, and used two sets of data to test logistic models constructed by SPSS and SAS. One model contained a small number of covariates compared to the number of observations. The other contained a number of covariates similar to the number of observations. The two software packages produced similar Pearson residual estimates when the number of covariates was similar to the number of observations, but the results differed when the number of observations was much greater than the number of covariates. The two software packages produce different Pearson residuals, especially when the models contain a small number of covariates. Further studies are warranted.
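For orientation, the Pearson residual of a binomial observation with y events out of n trials and fitted probability p is (y - n·p) / sqrt(n·p·(1 - p)). The sketch below shows that the value depends on whether residuals are computed per case (n = 1) or per covariate pattern (cases sharing the same covariates pooled). Whether this is the source of the discrepancy reported in the abstract is an assumption made here for illustration; the abstract does not say, and the function name is hypothetical.

```python
import math

def pearson_residual(y, n, p):
    """Pearson residual for y events in n trials with fitted probability p."""
    return (y - n * p) / math.sqrt(n * p * (1.0 - p))

# Four cases sharing one covariate pattern, fitted probability 0.5,
# observed outcomes 1, 1, 1, 0.
outcomes = [1, 1, 1, 0]
casewise = [pearson_residual(y, 1, 0.5) for y in outcomes]   # one residual per case
grouped = pearson_residual(sum(outcomes), len(outcomes), 0.5)  # one per pattern
```

With many observations and few covariates, many cases share a pattern, so the two conventions yield different sets of residuals.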
Estimating traffic volume on Wyoming low volume roads using linear and logistic regression methods
Directory of Open Access Journals (Sweden)
Dick Apronti
2016-12-01
Traffic volume is an important parameter in most transportation planning applications. Low volume roads make up about 69% of road miles in the United States. Estimating traffic on the low volume roads is a cost-effective alternative to taking traffic counts. This is because traditional traffic counts are expensive and impractical for low priority roads. The purpose of this paper is to present the development of two alternative means of cost-effectively estimating traffic volumes for low volume roads in Wyoming and to make recommendations for their implementation. The study methodology involves reviewing existing studies, identifying data sources, and carrying out the model development. The utility of the models developed was then verified by comparing actual traffic volumes to those predicted by the model. The study resulted in two regression models that are inexpensive and easy to implement. The first regression model was a linear regression model that utilized pavement type, access to highways, predominant land use types, and population to estimate traffic volume. In verifying the model, an R² value of 0.64 and a root mean square error of 73.4% were obtained. The second model was a logistic regression model that identified the level of traffic on roads using five thresholds or levels. The logistic regression model was verified by estimating traffic volume thresholds and determining the percentage of roads that were accurately classified as belonging to the given thresholds. For the five thresholds, the percentage of roads classified correctly ranged from 79% to 88%. In conclusion, the verification of the models indicated both model types to be useful for accurate and cost-effective estimation of traffic volumes for low volume Wyoming roads. The models developed were recommended for use in traffic volume estimations for low volume roads in pavement management and environmental impact assessment studies.
Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen Fitzgerald
2012-01-01
Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed burns and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
Nong, Yu; Du, Qingyun; Wang, Kun; Miao, Lei; Zhang, Weiwei
2008-10-01
Urban growth modeling, one of the most important aspects of land use and land cover change study, has attracted substantial attention because it helps to comprehend the mechanisms of land use change and thus helps in making relevant policies. This study applied multinomial logistic regression to model urban growth in Jiayu county of Hubei province, China, to discover the relationship between urban growth and its driving forces, with biophysical and socio-economic factors selected as independent variables. This type of regression is similar to binary logistic regression, but it is more general because the dependent variable is not restricted to two categories, as it was in previous studies. The multinomial model can simulate the process of multiple land use competition between urban land, bare land, cultivated land and orchard land. Taking the urban land use type as the reference category, parameters could be estimated as odds ratios. A probability map is generated from the model to predict where urban growth will occur as a result of the computation.
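The reference-category setup described here can be sketched as a baseline-category multinomial logit: each non-reference land use type gets its own linear predictor, and the reference type (urban) is fixed at zero. This is a minimal illustration under stated assumptions, not the study's model: the function name is hypothetical and the linear predictors would come from fitted coefficients that the abstract does not report.

```python
import math

def multinomial_probabilities(scores, reference):
    """Class probabilities for a baseline-category multinomial logit.

    scores: dict mapping each non-reference category to its linear
    predictor; the reference category implicitly has predictor 0.
    """
    exp_scores = {k: math.exp(v) for k, v in scores.items()}
    exp_scores[reference] = 1.0   # exp(0) for the reference category
    total = sum(exp_scores.values())
    return {k: v / total for k, v in exp_scores.items()}
```

Under this parameterization, the ratio of a category's probability to the reference probability equals the exponential of its linear predictor, which is why coefficients are read as odds ratios relative to urban land.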
Gong, Xu; Cui, Jianli; Jiang, Ziping; Lu, Laijin; Li, Xiucun
2018-03-01
Few clinical retrospective studies have reported the risk factors of pedicled flap necrosis in hand soft tissue reconstruction. The aim of this study was to identify non-technical risk factors associated with pedicled flap perioperative necrosis in hand soft tissue reconstruction via a multivariate logistic regression analysis. For patients with hand soft tissue reconstruction, we carefully reviewed hospital records and identified 163 patients who met the inclusion criteria. The characteristics of these patients, flap transfer procedures and postoperative complications were recorded. Eleven predictors were identified. The correlations between pedicled flap necrosis and risk factors were analysed using a logistic regression model. Of 163 skin flaps, 125 flaps survived completely without any complications. The pedicled flap necrosis rate in hands was 11.04%, which included partial flap necrosis (7.36%) and total flap necrosis (3.68%). Soft tissue defects in fingers were noted in 68.10% of all cases. The logistic regression analysis indicated that the soft tissue defect site (P = 0.046, odds ratio (OR) = 0.079, confidence interval (CI) (0.006, 0.959)), flap size (P = 0.020, OR = 1.024, CI (1.004, 1.045)) and postoperative wound infection (P < 0.001, OR = 17.407, CI (3.821, 79.303)) were statistically significant risk factors for pedicled flap necrosis of the hand. Soft tissue defect site, flap size and postoperative wound infection were risk factors associated with pedicled flap necrosis in hand soft tissue defect reconstruction. © 2017 Royal Australasian College of Surgeons.
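Odds ratios and confidence intervals like those reported in this abstract are usually obtained by exponentiating a logistic regression coefficient and its interval limits. The sketch below assumes a Wald-type 95% interval (z = 1.96), which the abstract does not state; the function names are hypothetical, and the back-calculation only approximates the underlying coefficient because the published values are rounded.

```python
import math

def or_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald interval from a coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coef_from_or(odds_ratio, lower, upper, z=1.96):
    """Approximate coefficient and standard error from a reported OR and CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se

# Flap size as reported: OR = 1.024, CI (1.004, 1.045).
beta, se = coef_from_or(1.024, 1.004, 1.045)
```

Read this way, an OR of 1.024 for flap size corresponds to roughly a 2.4% increase in the odds of necrosis per unit of flap size, in whatever unit the study recorded it.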
Integrating classification trees with local logistic regression in Intensive Care prognosis.
Abu-Hanna, Ameen; de Keizer, Nicolette
2003-01-01
Health care effectiveness and efficiency are under constant scrutiny especially when treatment is quite costly as in the Intensive Care (IC). Currently there are various international quality of care programs for the evaluation of IC. At the heart of such quality of care programs lie prognostic models whose prediction of patient mortality can be used as a norm to which actual mortality is compared. The current generation of prognostic models in IC are statistical parametric models based on logistic regression. Given a description of a patient at admission, these models predict the probability of his or her survival. Typically, this patient description relies on an aggregate variable, called a score, that quantifies the severity of illness of the patient. The use of a parametric model and an aggregate score form adequate means to develop models when data is relatively scarce but it introduces the risk of bias. This paper motivates and suggests a method for studying and improving the performance behavior of current state-of-the-art IC prognostic models. Our method is based on machine learning and statistical ideas and relies on exploiting information that underlies a score variable. In particular, this underlying information is used to construct a classification tree whose nodes denote patient sub-populations. For these sub-populations, local models, most notably logistic regression ones, are developed using only the total score variable. We compare the performance of this hybrid model to that of a traditional global logistic regression model. We show that the hybrid model not only provides more insight into the data but also has a better performance. We pay special attention to the precision aspect of model performance and argue why precision is more important than discrimination ability.
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis
Camilleri, Liberato; Cefai, Carmel
2013-01-01
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
Estimating the causes of traffic accidents using logistic regression and discriminant analysis.
Karacasu, Murat; Ergül, Barış; Altin Yavuz, Arzu
2014-01-01
Factors that affect traffic accidents have been analysed in various ways. In this study, we use the methods of logistic regression and discriminant analysis to determine the damages due to injury and non-injury accidents in the Eskisehir Province. Data were obtained from the accident reports of the General Directorate of Security in Eskisehir; 2552 traffic accidents between January and December 2009 were investigated regarding whether they resulted in injury. According to the results, the effects of traffic accidents were reflected in the variables. These results provide a wealth of information that may aid future measures toward the prevention of undesired results.
DEFF Research Database (Denmark)
Jensen, Signe Marie; Hauger, Hanne; Ritz, Christian
2018-01-01
Mediation analysis is often based on fitting two models, one including and another excluding a potential mediator, and subsequently quantify the mediated effects by combining parameter estimates from these two models. Standard errors of such derived parameters may be approximated using the delta...... method. For a study evaluating a treatment effect on visual acuity, a binary outcome, we demonstrate how mediation analysis may conveniently be carried out by means of marginally fitted logistic regression models in combination with the delta method. Several metrics of mediation are estimated and results...
DEFF Research Database (Denmark)
Pedersen, Bjørn Panella; Ifrim, Georgiana; Liboriussen, Poul
2014-01-01
Abstract Background Structured Logistic Regression (SLR) is a newly developed machine learning tool first proposed in the context of text categorization. Current availability of extensive protein sequence databases calls for an automated method to reliably classify sequences and SLR seems well...... problem. Results Using SLR, we have built classifiers to identify and automatically categorize P-type ATPases into one of 11 pre-defined classes. The SLR-classifiers are compared to a Hidden Markov Model approach and shown to be highly accurate and scalable. Representing the bulk of currently known...... for further biochemical characterization and structural analysis....
Lin, Y.P.; Chu, H.J.; Wu, C.F.; Verburg, P.H.
2011-01-01
The objective of this study is to compare the abilities of logistic, auto-logistic and artificial neural network (ANN) models for quantifying the relationships between land uses and their drivers. In addition, the application of the results obtained by the three techniques is tested in a dynamic
Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M
2014-12-01
Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.
Classification of Effective Soil Depth by Using Multinomial Logistic Regression Analysis
Chang, C. H.; Chan, H. C.; Chen, B. A.
2016-12-01
Classification of effective soil depth is a task in determining the slopeland utilizable limitation in Taiwan. The "Slopeland Conservation and Utilization Act" categorizes slopeland into agriculture and husbandry land, land suitable for forestry and land for enhanced conservation according to factors including average slope, effective soil depth, soil erosion and parental rock. However, site investigation of the effective soil depth requires cost-effective field work. This research aimed to classify the effective soil depth by using multinomial logistic regression with the environmental factors. The Wen-Shui Watershed, located in central Taiwan, was selected as the study area. The multinomial logistic regression analysis was performed with the assistance of a Geographic Information System (GIS). The effective soil depth was categorized into four levels: deeper, deep, shallow and shallower. The environmental factors of slope, aspect, digital elevation model (DEM), curvature and normalized difference vegetation index (NDVI) were selected for classifying the soil depth. An error matrix was then used to assess the model accuracy. The results showed an overall accuracy of 75%. Finally, a map of effective soil depth was produced to help planners and decision makers in determining the slopeland utilizable limitation in the study area.
Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days
Energy Technology Data Exchange (ETDEWEB)
Bramer, L. M.; Rounds, J.; Burleyson, C. D.; Fortin, D.; Hathaway, J.; Rice, J.; Kraucunas, I.
2017-11-01
Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and its relationship with weather conditions are examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and datasets were examined. A penalized logistic regression model fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at different time scales was examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. The methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
Tse, Samson; Davidson, Larry; Chung, Ka-Fai; Yu, Chong Ho; Ng, King Lam; Tsoi, Emily
2015-02-01
More mental health services are adopting the recovery paradigm. This study adds to prior research by (a) using measures of stages of recovery and elements of recovery that were designed and validated in a non-Western, Chinese culture and (b) testing which demographic factors predict advanced recovery and whether placing importance on certain elements predicts advanced recovery. We examined recovery and factors associated with recovery among 75 Hong Kong adults who were diagnosed with schizophrenia and assessed to be in clinical remission. Data were collected on socio-demographic factors, recovery stages and elements associated with recovery. Logistic regression analysis was used to identify variables that could best predict stages of recovery. Receiver operating characteristic curves were used to detect the classification accuracy of the model (i.e. rates of correct classification of stages of recovery). Logistic regression results indicated that stages of recovery could be distinguished with reasonable accuracy for Stage 3 ('living with disability', classification accuracy = 75.45%) and Stage 4 ('living beyond disability', classification accuracy = 75.50%). However, there was no sufficient information to predict Combined Stages 1 and 2 ('overwhelmed by disability' and 'struggling with disability'). It was found that having a meaningful role and age were the most important differentiators of recovery stage. Preliminary findings suggest that adopting salient life roles personally is important to recovery and that this component should be incorporated into mental health services. © The Author(s) 2014.
Hill, Benjamin David; Womble, Melissa N; Rohling, Martin L
2015-01-01
This study utilized logistic regression to determine whether performance patterns on Concussion Vital Signs (CVS) could differentiate known groups with either genuine or feigned performance. For the embedded measure development group (n = 174), clinical patients and undergraduate students categorized as feigning obtained significantly lower scores on the overall test battery mean for the CVS, Shipley-2 composite score, and California Verbal Learning Test-Second Edition subtests than did genuinely performing individuals. The final full model of 3 predictor variables (Verbal Memory immediate hits, Verbal Memory immediate correct passes, and Stroop Test complex reaction time correct) was significant and correctly classified individuals in their known group 83% of the time (sensitivity = .65; specificity = .97) in a mixed sample of young-adult clinical cases and simulators. The CVS logistic regression function was applied to a separate undergraduate college group (n = 378) that was asked to perform genuinely and identified 5% as having possibly feigned performance indicating a low false-positive rate. The failure rate was 11% and 16% at baseline cognitive testing in samples of high school and college athletes, respectively. These findings have particular relevance given the increasing use of computerized test batteries for baseline cognitive testing and return-to-play decisions after concussion.
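The sensitivity, specificity and overall classification rate quoted in this abstract all derive from the same four confusion-matrix counts. The sketch below shows the arithmetic; the function name is hypothetical and the counts in the usage line are invented for illustration, since the abstract does not report the underlying table.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp, fn: feigning cases flagged and missed; tn, fp: genuine cases
    correctly passed and wrongly flagged.
    """
    sensitivity = tp / (tp + fn)          # share of true positives detected
    specificity = tn / (tn + fp)          # share of true negatives passed
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts, chosen only to show the calculation.
sens, spec, acc = classification_metrics(tp=13, fn=7, tn=97, fp=3)
```

Because accuracy weights the two groups by their sizes, the same sensitivity and specificity can give different overall rates in samples with a different mix of feigned and genuine performers.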
International Nuclear Information System (INIS)
Yamashita, Y.; Hatanaka, Y.; Torashima, M.; Takahashi, M.; Miyazaki, K.; Okamura, H.
1997-01-01
Purpose: The goal of this study was to maximize the discrimination between benign and malignant masses in patients with sonographically indeterminate ovarian lesions by means of unenhanced and contrast-enhanced MR imaging, and to develop a computer-assisted diagnosis system. Material and Methods: Findings in precontrast and Gd-DTPA contrast-enhanced MR images of 104 patients with 115 sonographically indeterminate ovarian masses were analyzed, and the results were correlated with histopathological findings. Of 115 lesions, 65 were benign (23 cystadenomas, 13 complex cysts, 11 teratomas, 6 fibrothecomas, 12 others) and 50 were malignant (32 ovarian carcinomas, 7 metastatic tumors of the ovary, 4 carcinomas of the fallopian tubes, 7 others). A logistic regression analysis was performed to discriminate between benign and malignant lesions, and a model of a computer-assisted diagnosis was developed. This model was prospectively tested in 75 cases of ovarian tumors found at other institutions. Results: From the univariate analysis, the following parameters were selected as significant for predicting malignancy (p≤0.05): A solid or cystic mass with a large solid component or wall thickness greater than 3 mm; complex internal architecture; ascites; and bilaterality. Based on these parameters, a model of a computer-assisted diagnosis system was developed with the logistic regression analysis. To distinguish benign from malignant lesions, the maximum cut-off point was obtained between 0.47 and 0.51. In a prospective application of this model, 87% of the lesions were accurately identified as benign or malignant. (orig.)
Logistic Regression and Path Analysis Method to Analyze Factors influencing Students’ Achievement
Noeryanti, N.; Suryowati, K.; Setyawan, Y.; Aulia, R. R.
2018-04-01
Students' academic achievement is shaped by two kinds of factors, internal and external. The internal factors consist of intelligence (X1), health (X2), interest (X3), and motivation (X4). The external factors consist of family environment (X5), school environment (X6), and society environment (X7). The subjects of this research are eighth-grade students of the 2016/2017 school year at SMPN 1 Jiwan Madiun, selected by simple random sampling. Primary data were obtained by distributing questionnaires. Binary logistic regression analysis was used to identify the internal and external factors that affect students' achievement and the direction of their effects. Path analysis was used to determine which factors influence students' achievement directly, indirectly, or in total. According to the binary logistic regression, the variables that affect students' achievement are interest and motivation. According to the path analysis, the factors with a direct impact on students' achievement are students' interest (59%) and students' motivation (27%), while the factors with an indirect influence are family environment (97%) and school environment (37).
Directory of Open Access Journals (Sweden)
SASSAN MOHAMMADY
2013-01-01
Cities have grown remarkably in the past few decades because of their attractiveness and the centralization of economic, social, and service facilities. Population growth and urban expansion, especially in developing countries, have led to resource shortages, conversion of suitable agricultural land to urban land use, and marginalization. Under these circumstances, land use activity is a major issue and challenge for town and country planners. Different approaches have been attempted in urban expansion modelling. Artificial neural network (ANN) models are among the knowledge-based models that have been used for urban growth modelling. ANNs are powerful tools that use a machine learning approach to quantify and model complex behaviour and patterns. In this research, ANN and logistic regression were employed for urban growth modelling. Our case study is Sanandaj city, and we used Landsat TM and ETM+ imagery acquired in 2000 and 2006. The dataset includes distance to main roads, distance to the residence region, elevation, slope, and distance to green space. The Percent Area Match (PAM) obtained from modelling these changes with ANN is 90.47%, and the accuracy achieved for urban growth modelling with logistic regression (LR) is 88.91%. Percent Correct Match (PCM) and Figure of Merit were 91.33% and 59.07% for the ANN method and 90.84% and 57.07% for LR, respectively.
Directory of Open Access Journals (Sweden)
Bita Najafian
2015-02-01
Background: Respiratory distress syndrome (RDS) is the most common respiratory disease in premature neonates and the most important cause of death among them. We aimed to investigate factors that predict the success or failure of the INSURE method as a treatment for RDS. Methods: In a cohort study, 45 neonates with diagnosed RDS and birth weight lower than 1500 g were included, and they underwent INSURE followed by NCPAP (nasal continuous positive airway pressure). The patients were divided into failure and success groups, and factors that can predict the success of INSURE were investigated by logistic regression in SPSS version 16. Results: 29 and 16 neonates were observed in the success and failure groups, respectively. Birth weight was the only variable with a significant difference between the two groups (P=0.002). Finally, logistic regression showed that birth weight is the only predicting factor for success (P: 0.001, EXP[β]: 0.009, CI [95%]: 1.003-0.014) and mortality (P: 0.029, EXP[β]: 0.993, CI [95%]: 0.987-0.999) of neonates treated with the INSURE method. Conclusion: Knowing the factors that affect the success rate of INSURE can be useful for treating neonates with RDS and reducing the cost of their care; birth weight was one of the factors affecting INSURE success in this study.
Wanvarie, Samkaew; Sathapatayavongs, Boonmee
2007-09-01
The aim of this paper was to assess factors that predict students' performance in the Medical Licensing Examination of Thailand (MLET) Step1 examination. The hypothesis was that demographic factors and academic records would predict the students' performance in the Step1 Licensing Examination. A logistic regression analysis of demographic factors (age, sex and residence) and academic records [high school grade point average (GPA), National University Entrance Examination Score and GPAs of the pre-clinical years] with the MLET Step1 outcome was carried out using the data of 117 third-year Ramathibodi medical students. Twenty-three (19.7%) students failed the MLET Step1 examination. Stepwise logistic regression analysis showed that the significant predictors of MLET Step1 success/failure were residence background and GPAs of the second and third preclinical years. For each 1-point increase in the second-year and third-year GPAs, the odds of passing the MLET Step1 examination increased by a factor of 16.3 and 12.8, respectively. The minimum GPAs for students from urban and rural backgrounds to pass the examination were estimated from the equation (2.35 vs 2.65 on a 4.00 scale). Students from rural backgrounds and/or with low grade point averages in their second and third preclinical years of medical school are at risk of failing the MLET Step1 examination. They should be given intensive tutorials during the second and third pre-clinical years.
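An odds ratio such as the factor of 16.3 reported above multiplies the odds of passing, not the probability. As a rough illustration, the sketch below converts an odds ratio to its logistic regression coefficient and applies it to a baseline pass probability. The baseline of 0.50 is an assumption made for the example, not a figure from the study.

```python
import math


def odds_ratio_to_coefficient(odds_ratio):
    """Logistic regression coefficient (log-odds change) for an odds ratio."""
    return math.log(odds_ratio)


def updated_probability(p, odds_ratio):
    """Probability after multiplying the odds implied by p by odds_ratio."""
    odds = p / (1.0 - p)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


beta = odds_ratio_to_coefficient(16.3)      # odds ratio quoted in the abstract
p_new = updated_probability(0.50, 16.3)     # 0.50 is a hypothetical baseline
print(f"coefficient={beta:.3f}, probability after +1 GPA point={p_new:.3f}")
```

The same odds ratio produces a much smaller change in probability when the baseline is already close to 0 or 1, which is why odds ratios should not be read as probability multipliers.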
GIS-based rare events logistic regression for mineral prospectivity mapping
Xiong, Yihui; Zuo, Renguang
2018-02-01
Mineralization is a special type of singularity event, and can be considered a rare event, because within a specific study area the number of prospective locations (1s) is considerably smaller than the number of non-prospective locations (0s). In this study, GIS-based rare events logistic regression (RELR) was used to map the mineral prospectivity in the southwestern Fujian Province, China. An odds ratio was used to measure the relative importance of the evidence variables with respect to mineralization. The results suggest that formations, granites, and skarn alterations, followed by faults and aeromagnetic anomaly, are the most important indicators for the formation of Fe-related mineralization in the study area. The prediction rate and the area under the curve (AUC) values show that areas with higher probability have a strong spatial relationship with the known mineral deposits. Comparing the results with original logistic regression (OLR) demonstrates that the GIS-based RELR performs better than OLR. The prospectivity map obtained in this study benefits the search for skarn Fe-related mineralization in the study area.
Directory of Open Access Journals (Sweden)
M. Saki
2013-03-01
The relationship between plant species and environmental factors has always been a central issue in plant ecology. With the rising power of statistical techniques, geostatistics, and geographic information systems (GIS), the development of predictive habitat distribution models of organisms has rapidly increased in ecology. This study aimed to evaluate the ability of the Logistic Regression Tree (LRT) model to create a potential habitat map of Astragalus verus. This species produces tragacanth and has economic value. Stratified random sampling was applied to 100 sites (50 presence, 50 absence of the species), and maps of environmental and edaphic factors were produced for the whole study area using Kriging and Inverse Distance Weighting methods in ArcGIS. Relationships between species occurrence and environmental factors were determined by the Logistic Regression Tree model and extended to the whole study area. The results indicated that species occurrence is strongly correlated with environmental factors such as mean daily temperature and the clay, EC, and organic carbon content of the soil. Species occurrence showed a direct relationship with mean daily temperature, clay, and organic carbon, and an inverse relationship with EC. Model accuracy was evaluated both by Cohen’s kappa statistic (κ) and by the area under the receiver operating characteristic (ROC) curve, based on an independent test data set. Their values (kappa = 0.9, AUC of ROC = 0.96) indicated the high power of LRT to create potential habitat maps on local scales. This model, therefore, can be applied to identify potential sites for rangeland reclamation projects.
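Cohen's kappa, used above to judge the habitat model, compares observed agreement with the agreement expected by chance. The sketch below computes it for a 2x2 presence/absence table. The counts are hypothetical and are not the study's test data.

```python
def cohens_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 table of observed versus predicted classes."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n                       # observed agreement
    # Chance agreement from the row and column marginals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical test set: 25 presence sites and 25 absence sites.
kappa = cohens_kappa(tp=24, fn=1, fp=1, tn=24)
print(f"kappa={kappa:.2f}")
```

Because kappa discounts chance agreement, it is lower than raw accuracy whenever the classes could be matched partly by guessing.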
Wulandari, S. P.; Salamah, M.; Rositawati, A. F. D.
2018-04-01
Food security is the condition in which food needs are met at every level, from the country down to the individual. Indonesia is one of the countries committed to making food security a main priority. However, food needs are often met without regard to nutritional standards or the health condition of family members, so meeting food needs must also take into account diseases suffered by family members, one of which is pulmonary tuberculosis. For these reasons, this research was conducted to identify the factors that influence the food security status of households with pulmonary tuberculosis sufferers in the coastal area of Surabaya, using binary logistic regression. The analysis shows that the wife's latest education, house density, and house ventilation area significantly affect the food security status of these households. A household in which the wife's education level is university or equivalent, the house density is eligible (8 m2/person), and the ventilation area is 10% of the floor area has a probability of 0.911089 of being food secure and a probability of 0.088911 of being food insecure. The model of food security status for households with pulmonary tuberculosis sufferers in the coastal area of Surabaya fits the data, and the overall classification accuracy is 71.8%.
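The two probabilities in this abstract are complements, as a binary logistic model requires. The sketch below shows the link between a linear predictor and these probabilities. The abstract does not report the fitted coefficients, so the linear predictor is recovered here from the reported probability rather than computed from the model.

```python
import math


def logistic(eta):
    """Map a linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Map a probability to log-odds (inverse of logistic)."""
    return math.log(p / (1.0 - p))


p_secure = 0.911089                  # probability reported in the abstract
eta = logit(p_secure)                # implied linear predictor (log-odds)
p_insecure = 1.0 - logistic(eta)     # complement: food insecure
print(f"log-odds={eta:.3f}, P(insecure)={p_insecure:.6f}")
```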
Directory of Open Access Journals (Sweden)
Soldić-Aleksić Jasna
2009-01-01
Market segmentation is one of the key concepts of modern marketing. The main goal of market segmentation is to create groups (segments) of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of a concrete product or service. Companies can create a specific marketing plan for each of these segments and thereby gain a short- or long-term competitive advantage in the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of a logistic regression model and CHAID analysis. The logistic regression model was used to select the variables (from the initial pool of eleven variables) which are statistically significant for explaining the dependent variable. The selected variables were afterwards included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented on a concrete empirical example in the following form: summary model results, CHAID tree, Gain chart, Index chart, risk and classification tables.
Fitzpatrick, Cole D; Rakasi, Saritha; Knodler, Michael A
2017-01-01
Speed is one of the most important factors in traffic safety as higher speeds are linked to increased crash risk and higher injury severities. Nearly a third of fatal crashes in the United States are designated as "speeding-related", which is defined as either "the driver behavior of exceeding the posted speed limit or driving too fast for conditions." While many studies have utilized the speeding-related designation in safety analyses, no studies have examined the underlying accuracy of this designation. Herein, we investigate the speeding-related crash designation through the development of a series of logistic regression models that were derived from the established speeding-related crash typologies and validated using a blind review, by multiple researchers, of 604 crash narratives. The developed logistic regression model accurately identified crashes which were not originally designated as speeding-related but had crash narratives that suggested speeding as a causative factor. Only 53.4% of crashes designated as speeding-related contained narratives which described speeding as a causative factor. Further investigation of these crashes revealed that the driver contributing code (DCC) of "driving too fast for conditions" was being used in three separate situations. Additionally, this DCC was also incorrectly used when "exceeding the posted speed limit" would likely have been a more appropriate designation. Finally, it was determined that the responding officer only utilized one DCC in 82% of crashes not designated as speeding-related but contained a narrative indicating speed as a contributing causal factor. The use of logistic regression models based upon speeding-related crash typologies offers a promising method by which all possible speeding-related crashes could be identified. Published by Elsevier Ltd.
A multiple regression method for genomewide association studies ...
Indian Academy of Sciences (India)
Bujun Mei
2018-06-07
Jun 7, 2018 ... Similar to the typical genomewide association tests using LD ... new approach performed validly when the multiple regression based on linkage method was employed. .... the model, two groups of scenarios were simulated.
231 Using Multiple Regression Analysis in Modelling the Role of ...
African Journals Online (AJOL)
User
of Internal Revenue, Tourism Bureau and hotel records. The multiple regression .... additional guest facilities such as restaurant, a swimming pool or child care and social function ... and provide good quality service to the public. Conclusion.
A Two-Stage Penalized Logistic Regression Approach to Case-Control Genome-Wide Association Studies
Directory of Open Access Journals (Sweden)
Jingyuan Zhao
2012-01-01
We propose a two-stage penalized logistic regression approach to case-control genome-wide association studies. This approach consists of a screening stage and a selection stage. In the screening stage, main-effect and interaction-effect features are screened by using L1-penalized logistic likelihoods. In the selection stage, the retained features are ranked by the logistic likelihood with the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) and Jeffrey’s Prior penalty (Firth, 1993), a sequence of nested candidate models is formed, and the models are assessed by a family of extended Bayesian information criteria (J. Chen and Z. Chen, 2008). The proposed approach is applied to the analysis of the prostate cancer data of the Cancer Genetic Markers of Susceptibility (CGEMS) project in the National Cancer Institute, USA. Simulation studies are carried out to compare the approach with the pair-wise multiple testing approach (Marchini et al. 2005) and the LASSO-patternsearch algorithm (Shi et al. 2007).
General Nature of Multicollinearity in Multiple Regression Analysis.
Liu, Richard
1981-01-01
Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)
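One common way to quantify the multicollinearity discussed above is the variance inflation factor (VIF). The abstract does not name a specific diagnostic, so the VIF is brought in here only as an illustration. With exactly two predictors it reduces to 1 / (1 - r^2), where r is their Pearson correlation. The data below are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor values (assumption, for illustration only).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(f"r={pearson_r(x1, x2):.2f}, VIF={vif_two_predictors(x1, x2):.2f}")
```

A VIF near 1 indicates little shared variance between predictors; the value grows without bound as the correlation approaches 1 in absolute value.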
Logistic regression model for diagnosis of transition zone prostate cancer on multi-parametric MRI.
Dikaios, Nikolaos; Alkalbani, Jokha; Sidhu, Harbir Singh; Fujiwara, Taiki; Abd-Alazeez, Mohamed; Kirkham, Alex; Allen, Clare; Ahmed, Hashim; Emberton, Mark; Freeman, Alex; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit
2015-02-01
We aimed to develop logistic regression (LR) models for classifying prostate cancer within the transition zone on multi-parametric magnetic resonance imaging (mp-MRI). One hundred and fifty-five patients (training cohort, 70 patients; temporal validation cohort, 85 patients) underwent mp-MRI and transperineal-template-prostate-mapping (TPM) biopsy. Positive cores were classified by cancer definitions: (1) any-cancer; (2) definition-1 [≥Gleason 4 + 3 or ≥ 6 mm cancer core length (CCL)] [high risk significant]; and (3) definition-2 (≥Gleason 3 + 4 or ≥ 4 mm CCL) cancer [intermediate-high risk significant]. For each, logistic-regression mp-MRI models were derived from the training cohort and validated internally and with the temporal cohort. Sensitivity/specificity and the area under the receiver operating characteristic (ROC-AUC) curve were calculated. LR model performance was compared to radiologists' performance. Twenty-eight of 70 patients from the training cohort, and 25/85 patients from the temporal validation cohort had significant cancer on TPM. The ROC-AUC of the LR model for classification of cancer was 0.73/0.67 at internal/temporal validation. The radiologist A/B ROC-AUC was 0.65/0.74 (temporal cohort). For patients scored by radiologists as Prostate Imaging Reporting and Data System (Pi-RADS) score 3, sensitivity/specificity of radiologist A 'best guess' and LR model was 0.14/0.54 and 0.71/0.61, respectively; and radiologist B 'best guess' and LR model was 0.40/0.34 and 0.50/0.76, respectively. LR models can improve classification of Pi-RADS score 3 lesions similar to experienced radiologists. • MRI helps find prostate cancer in the anterior of the gland • Logistic regression models based on mp-MRI can classify prostate cancer • Computers can help confirm cancer in areas doctors are uncertain about.
Directory of Open Access Journals (Sweden)
Yoojeong Seo
2018-01-01
Detecting objects resting on the sea floor is important in various fields, both civilian and military. The objective of this study is to investigate a logistic regression model for discriminating the target from the clutter, and to verify whether a model trained on simulated data generated by a mathematical model can be applied to real experimental data, because it is not easy to obtain sufficient data in the underwater field. In the first stage of this study, when the clutter signal energy is so strong that the detection of a target is difficult, the logistic regression model is employed to distinguish the strong clutter signal from the target signal. Previous studies have found that when the clutter energy is larger, false detection occurs even with the various existing detection schemes. For this reason, the discrete Fourier transform (DFT) magnitude spectrum of acoustic signals received by active sonar is used to train the model to distinguish whether the received signal contains a target signal or not. The goodness of fit of the model is verified in terms of the receiver operating characteristic (ROC), the area under the ROC curve (AUC), and the classification table. The detection performance of the proposed model is evaluated in terms of detection rate according to target to clutter ratio (TCR). Furthermore, real experimental data are employed to test the proposed approach. When using the experimental data to test the model, the logistic regression model is trained on simulated data that are generated from the mathematical model for the backscattering of the cylindrical object. The mathematical model is developed according to the size of the cylinder used in the experiment. Since information on the experimental environment, including the sound speed, the sediment type and so on, is not available, once simulated data are generated under various conditions, valid simulated data are selected using 70% of the
Novikov, I; Fund, N; Freedman, L S
2010-01-15
Different methods for the calculation of sample size for simple logistic regression (LR) with one normally distributed continuous covariate give different results. Sometimes the difference can be large. Furthermore, some methods require the user to specify the prevalence of cases when the covariate equals its population mean, rather than the more natural population prevalence. We focus on two commonly used methods and show through simulations that the power for a given sample size may differ substantially from the nominal value for one method, especially when the covariate effect is large, while the other method performs poorly if the user provides the population prevalence instead of the required parameter. We propose a modification of the method of Hsieh et al. that requires specification of the population prevalence and that employs Schouten's sample size formula for a t-test with unequal variances and group sizes. This approach appears to increase the accuracy of the sample size estimates for LR with one continuous covariate.
Non-proportional odds multivariate logistic regression of ordinal family data.
Zaloumis, Sophie G; Scurrah, Katrina J; Harrap, Stephen B; Ellis, Justine A; Gurrin, Lyle C
2015-03-01
Methods to examine whether genetic and/or environmental sources can account for the residual variation in ordinal family data usually assume proportional odds. However, standard software to fit the non-proportional odds model to ordinal family data is limited because the correlation structure of family data is more complex than for other types of clustered data. To perform these analyses we propose the non-proportional odds multivariate logistic regression model and take a simulation-based approach to model fitting using Markov chain Monte Carlo methods, such as partially collapsed Gibbs sampling and the Metropolis algorithm. We applied the proposed methodology to male pattern baldness data from the Victorian Family Heart Study. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
AN APPLICATION OF THE LOGISTIC REGRESSION MODEL IN THE EXPERIMENTAL PHYSICAL CHEMISTRY
Directory of Open Access Journals (Sweden)
Elpidio Corral-López
2015-06-01
Calculating the molar volumes (an intensive property) of ethanol-water mixtures from experimental densities by the tangent method in the Physical Chemistry Laboratory requires students to plot the curve of molar volume versus mole fraction and trace the tangent line by hand. Using a statistical model, logistic regression, on a Texas VOYAGE graphing calculator allowed the curve and the tangents to be traced in situ and the students' work to be evaluated during the experimental session. The percentage error between the molar volumes calculated from literature data and those obtained with the statistical method is minimal, which validates the model. The calculator with this application is advantageous as a teaching support tool, reducing the evaluation time from 3 weeks to 3 hours.
Eke, Gemma; Holttum, Sue; Hayward, Mark
2012-03-01
Previous research highlights barriers to clinical psychologists conducting research, but has rarely examined U.K. clinical psychologists. The study investigated U.K. clinical psychologists' self-reported research output and tested part of a theoretical model of factors influencing their intention to conduct research. Questionnaires were mailed to 1,300 U.K. clinical psychologists. Three hundred and seventy-four questionnaires were returned (29% response-rate). This study replicated in a U.K. sample the finding that the modal number of publications was zero, highlighted in a number of U.K. and U.S. studies. Research intention was bimodally distributed, and logistic regression classified 78% of cases successfully. Outcome expectations, perceived behavioral control and normative beliefs mediated between research training environment and intention. Further research should explore how research is negotiated in clinical roles, and this issue should be incorporated into prequalification training. © 2012 Wiley Periodicals, Inc.
Binary logistic regression modelling: Measuring the probability of relapse cases among drug addict
Ismail, Mohd Tahir; Alias, Siti Nor Shadila
2014-07-01
For many years Malaysia has faced drug addiction issues. The most serious is the relapse phenomenon among treated drug addicts (drug addicts who have undergone the rehabilitation programme at the Narcotic Addiction Rehabilitation Centre, PUSPEN). Thus, the main objective of this study is to find the most significant factors that contribute to relapse. Binary logistic regression analysis was employed to model the relationship between the independent variables (predictors) and the dependent variable. The dependent variable is the status of the drug addict: relapse (Yes, coded as 1) or no relapse (No, coded as 0). The predictors are age, age at first taking drugs, family history, education level, family crisis, community support, and self-motivation. The sample comprises 200 cases, with data provided by AADK (National Antidrug Agency). The study found that age and self-motivation are statistically significant predictors of relapse.
Hayes, Andrew F; Matthes, Jörg
2009-08-01
Researchers often hypothesize moderated effects, in which the effect of an independent variable on an outcome variable depends on the value of a moderator variable. Such an effect reveals itself statistically as an interaction between the independent and moderator variables in a model of the outcome variable. When an interaction is found, it is important to probe the interaction, for theories and hypotheses often predict not just interaction but a specific pattern of effects of the focal independent variable as a function of the moderator. This article describes the familiar pick-a-point approach and the much less familiar Johnson-Neyman technique for probing interactions in linear models and introduces macros for SPSS and SAS to simplify the computations and facilitate the probing of interactions in ordinary least squares and logistic regression. A script version of the SPSS macro is also available for users who prefer a point-and-click user interface rather than command syntax.
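The pick-a-point approach mentioned above evaluates the effect of the focal predictor at chosen moderator values. For a model with terms b1*X + b2*M + b3*X*M, the conditional effect of X at M = m is b1 + b3*m, with a standard error built from the coefficient variances and covariance. The sketch below shows only this arithmetic and is not the SPSS or SAS macro described in the abstract. All coefficient values are hypothetical.

```python
import math


def conditional_effect(b1, b3, m):
    """Effect of the focal predictor X when the moderator equals m."""
    return b1 + b3 * m


def conditional_se(var_b1, var_b3, cov_b1_b3, m):
    """Standard error of the conditional effect b1 + b3 * m."""
    return math.sqrt(var_b1 + 2.0 * m * cov_b1_b3 + m * m * var_b3)


# Hypothetical estimates (assumptions, not from any fitted model).
b1, b3 = 0.5, 0.2
var_b1, var_b3, cov_b1_b3 = 0.04, 0.01, -0.005

for m in (-1.0, 0.0, 2.0):                   # chosen moderator values
    effect = conditional_effect(b1, b3, m)
    se = conditional_se(var_b1, var_b3, cov_b1_b3, m)
    print(f"m={m:+.1f} effect={effect:.3f} se={se:.3f} ratio={effect / se:.2f}")
```

The Johnson-Neyman technique instead solves for the moderator values at which the ratio of effect to standard error crosses the critical value, which avoids the arbitrary choice of points.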
Sánchez, Clara I.; Hornero, Roberto; Mayo, Agustín; García, María
2009-02-01
Diabetic Retinopathy is one of the leading causes of blindness and vision defects in developed countries. An early detection and diagnosis is crucial to avoid visual complication. Microaneurysms are the first ocular signs of the presence of this ocular disease. Their detection is of paramount importance for the development of a computer-aided diagnosis technique which permits a prompt diagnosis of the disease. However, the detection of microaneurysms in retinal images is a difficult task due to the wide variability that these images usually present in screening programs. We propose a statistical approach based on mixture model-based clustering and logistic regression which is robust to the changes in the appearance of retinal fundus images. The method is evaluated on the public database proposed by the Retinal Online Challenge in order to obtain an objective performance measure and to allow a comparative study with other proposed algorithms.
ENHANCED PREDICTION OF STUDENT DROPOUTS USING FUZZY INFERENCE SYSTEM AND LOGISTIC REGRESSION
Directory of Open Access Journals (Sweden)
A. Saranya
2016-01-01
Predicting college and school dropouts is a major problem in the educational system and a complicated challenge because of data imbalance and multidimensionality, which can affect the low performance of students. In this paper, we collected databases from various colleges; from these, the 500 best real attributes were identified, using a neural-based classification algorithm, in order to find the factors affecting student dropout, and different mining techniques were implemented for data processing. We also propose a Dropout Prediction Algorithm (DPA) using fuzzy logic and a logistic regression based inference system, because the weighted average improves the performance of the whole system. We compared the proposed work experimentally with other classification systems, and it produced the best outcomes. The aggregated data are given to decision trees for better dropout prediction. The overall system accuracy is 98.6%, which indicates that the proposed work predicts dropouts efficiently.
García-Rodríguez, M. J.; Malpica, J. A.; Benito, B.; Díaz, M.
2008-03-01
This work has evaluated the probability of earthquake-triggered landslide occurrence in the whole of El Salvador, with a Geographic Information System (GIS) and a logistic regression model. Slope gradient, elevation, aspect, mean annual precipitation, lithology, land use, and terrain roughness are the predictor variables used to determine the dependent variable of occurrence or non-occurrence of landslides within an individual grid cell. The results illustrate the importance of terrain roughness and soil type as key factors within the model — using only these two variables the analysis returned a significance level of 89.4%. The results obtained from the model within the GIS were then used to produce a map of relative landslide susceptibility.
Logistic Regression Analysis on Factors Affecting Adoption of Rice-Fish Farming in North Iran
Directory of Open Access Journals (Sweden)
Seyyed Ali NOORHOSSEINI-NIYAKI
2012-06-01
We evaluated the factors influencing the adoption of rice-fish farming in the Tavalesh region near the Caspian Sea in northern Iran. We conducted a survey with open-ended questions. Data were collected from 184 respondents (61 adopters and 123 non-adopters randomly sampled from selected villages and analyzed using logistic regression and multi-response analysis. Family size, number of contacts with an extension agent, participation in extension-education activities, membership in social institutions and the presence of farm workers were the most important socio-economic factors for the adoption of rice-fish farming system. In addition, economic problems were the most common issue reported by adopters. Other issues such as lack of access to appropriate fish food, losses of fish, lack of access to high quality fish fingerlings and dehydration and poor water quality were also important to a number of farmers.
Directory of Open Access Journals (Sweden)
Farid Djeddaoui
2017-10-01
The main goal of this work was to identify the areas that are most susceptible to desertification in a part of the Algerian steppe, and to quantitatively assess the key factors that contribute to this desertification. In total, 139 desertified zones were mapped using field surveys and photo-interpretation. We selected 16 spectral and geomorphic predictive factors, which a priori play a significant role in desertification. They were mainly derived from Landsat 8 imagery and the Shuttle Radar Topographic Mission digital elevation model (SRTM DEM). Some factors, such as the topographic position index (TPI) and curvature, were used for the first time in this kind of study. For this purpose, we adapted the logistic regression algorithm for desertification susceptibility mapping, which has been widely used for landslide susceptibility mapping. The logistic model was evaluated using the area under the receiver operating characteristic (ROC) curve. The model accuracy was 87.8%. We estimated the model uncertainties using a bootstrap method. Our analysis suggests that the predictive model is robust and stable. Our results indicate that land cover factors, including normalized difference vegetation index (NDVI) and rangeland classes, play a major role in determining desertification occurrence, while geomorphological factors have a limited impact. The predictive map shows that 44.57% of the area is classified as highly to very highly susceptible to desertification. The developed approach can be used to assess desertification in areas with similar characteristics and to guide possible actions to combat desertification.
Effective factors contraceptive use by logistic regression model in Tehran, 1996
Directory of Open Access Journals (Sweden)
Ramezani F
1999-07-01
Despite having no fertility intention, about 30% of couples do not use any kind of contraception, which leads to unwanted pregnancy. In this clinical trial study, 4177 subjects who had at least one living child and delivered in one of the 12 university hospitals in Tehran were recruited. The study was conducted in 1996. The questionnaire included questions about contraceptive use and the subjects' attitudes about whether their current pregnancies were wanted or unwanted. Data were analysed using a logistic regression model. Results showed that 20.3% of those who had no fertility intention did not use any contraception method, and 41.1% of the subjects who were using a contraception method before pregnancy had become pregnant unintentionally. Based on the logistic regression model, age, education, women's previous familiarity with contraception methods and husband's education were the most significant factors in contraceptive use. Subjects aged 20 years or younger or 35 years or older, and illiterate subjects, were at higher risk of not using contraception methods. This risk was not related to the gender of their children, which suggests a positive change in their perspectives towards sex and the number of children. It is suggested that health policymakers choose an appropriate model to enhance literacy, education and counseling for the correct use of contraceptives and the prevention of unwanted pregnancy.
Logistic regression model for diagnosis of transition zone prostate cancer on multi-parametric MRI
Energy Technology Data Exchange (ETDEWEB)
Dikaios, Nikolaos; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit [University College London, Centre for Medical Imaging, London (United Kingdom); University College London Hospital, Departments of Radiology, London (United Kingdom); Alkalbani, Jokha; Sidhu, Harbir Singh; Fujiwara, Taiki [University College London, Centre for Medical Imaging, London (United Kingdom); Abd-Alazeez, Mohamed; Ahmed, Hashim; Emberton, Mark [University College London, Research Department of Urology, London (United Kingdom); Kirkham, Alex; Allen, Clare [University College London Hospital, Departments of Radiology, London (United Kingdom); Freeman, Alex [University College London Hospital, Department of Histopathology, London (United Kingdom)
2014-09-17
We aimed to develop logistic regression (LR) models for classifying prostate cancer within the transition zone on multi-parametric magnetic resonance imaging (mp-MRI). One hundred and fifty-five patients (training cohort, 70 patients; temporal validation cohort, 85 patients) underwent mp-MRI and transperineal-template-prostate-mapping (TPM) biopsy. Positive cores were classified by cancer definitions: (1) any-cancer; (2) definition-1 [≥Gleason 4 + 3 or ≥ 6 mm cancer core length (CCL)] [high risk significant]; and (3) definition-2 (≥Gleason 3 + 4 or ≥ 4 mm CCL) cancer [intermediate-high risk significant]. For each, logistic-regression mp-MRI models were derived from the training cohort and validated internally and with the temporal cohort. Sensitivity/specificity and the area under the receiver operating characteristic (ROC-AUC) curve were calculated. LR model performance was compared to radiologists' performance. Twenty-eight of 70 patients from the training cohort, and 25/85 patients from the temporal validation cohort had significant cancer on TPM. The ROC-AUC of the LR model for classification of cancer was 0.73/0.67 at internal/temporal validation. The radiologist A/B ROC-AUC was 0.65/0.74 (temporal cohort). For patients scored by radiologists as Prostate Imaging Reporting and Data System (Pi-RADS) score 3, sensitivity/specificity of radiologist A 'best guess' and LR model was 0.14/0.54 and 0.71/0.61, respectively; and radiologist B 'best guess' and LR model was 0.40/0.34 and 0.50/0.76, respectively. LR models can improve classification of Pi-RADS score 3 lesions similar to experienced radiologists. (orig.)
Logistic regression model for diagnosis of transition zone prostate cancer on multi-parametric MRI
International Nuclear Information System (INIS)
Dikaios, Nikolaos; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit; Alkalbani, Jokha; Sidhu, Harbir Singh; Fujiwara, Taiki; Abd-Alazeez, Mohamed; Ahmed, Hashim; Emberton, Mark; Kirkham, Alex; Allen, Clare; Freeman, Alex
2015-01-01
We aimed to develop logistic regression (LR) models for classifying prostate cancer within the transition zone on multi-parametric magnetic resonance imaging (mp-MRI). One hundred and fifty-five patients (training cohort, 70 patients; temporal validation cohort, 85 patients) underwent mp-MRI and transperineal-template-prostate-mapping (TPM) biopsy. Positive cores were classified by cancer definitions: (1) any-cancer; (2) definition-1 [≥Gleason 4 + 3 or ≥ 6 mm cancer core length (CCL)] [high risk significant]; and (3) definition-2 (≥Gleason 3 + 4 or ≥ 4 mm CCL) cancer [intermediate-high risk significant]. For each, logistic-regression mp-MRI models were derived from the training cohort and validated internally and with the temporal cohort. Sensitivity/specificity and the area under the receiver operating characteristic (ROC-AUC) curve were calculated. LR model performance was compared to radiologists' performance. Twenty-eight of 70 patients from the training cohort, and 25/85 patients from the temporal validation cohort had significant cancer on TPM. The ROC-AUC of the LR model for classification of cancer was 0.73/0.67 at internal/temporal validation. The radiologist A/B ROC-AUC was 0.65/0.74 (temporal cohort). For patients scored by radiologists as Prostate Imaging Reporting and Data System (Pi-RADS) score 3, sensitivity/specificity of radiologist A 'best guess' and LR model was 0.14/0.54 and 0.71/0.61, respectively; and radiologist B 'best guess' and LR model was 0.40/0.34 and 0.50/0.76, respectively. LR models can improve classification of Pi-RADS score 3 lesions similar to experienced radiologists. (orig.)
Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X
2016-09-01
The paper highlights the use of the logistic regression (LR) method in the construction of acceptable, statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. The essentials of a reliable model were all considered carefully. The model predictors were selected by stepwise forward discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were assessed through the influence plot, which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed.
Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.
Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H
2006-01-01
Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
Directory of Open Access Journals (Sweden)
Land Walker H
2011-01-01
Background: When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL) techniques with kernels are capable of addressing nonlinear problems without making parametric assumptions. However, these techniques do not produce findings relevant for epidemiologic interpretations. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with logistic regression (LR) modeling representing a parametric approach. The SL technique comprised a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison. Results: The SL approach is capable of generating odds ratios for main effects and risk factor interactions that better capture nonlinear relationships between exposure variables and outcome in comparison with LR. Conclusions: The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.
Research and analyze of physical health using multiple regression analysis
Directory of Open Access Journals (Sweden)
T. S. Kyi
2014-01-01
This paper presents research that attempts to create a mathematical model of "healthy people" using regression analysis. The factors are physical parameters of the person (such as heart rate, lung capacity, blood pressure, breath holding, weight-height coefficient, flexibility of the spine, muscles of the shoulder belt, abdominal muscles, squatting, etc.), and the response variable is an indicator of physical working capacity. Multiple regression analysis yielded useful multiple regression models that can predict the physical performance of boys aged fourteen to seventeen years. This paper presents the development of the regression model for sixteen-year-old boys and analyzes the results.
International Nuclear Information System (INIS)
Abdolmaleki, P.; Yarmohammadi, M.; Gity, M.
2004-01-01
Background: We designed an algorithmic model based on regression analysis and a non-algorithmic model based on an artificial neural network. Materials and methods: The ability of these models to differentiate malignant from benign breast tumors was compared in clinical application in a study group of 161 patient records. Each patient's record consisted of 6 subjective features extracted from MRI appearance. These findings were used as input features for an artificial neural network as well as a logistic regression model to predict biopsy outcome. After both models had been trained perfectly on the training samples (n=100), the validation samples (n=61) were presented to the trained network as well as the established logistic regression models. Finally, the diagnostic performance of the models was compared to that of the radiologist in terms of sensitivity, specificity and accuracy, using receiver operating characteristic curve analysis. Results: The average output of the artificial neural network yielded a near-perfect sensitivity (98%) and high accuracy (90%), similar to those of an expert radiologist (96% and 92%), while its specificity was lower (67% versus 80%). The output of the logistic regression model using significant features showed an improvement in specificity from 60% for the logistic regression model using all features to 93% for the reduced logistic regression model, keeping the accuracy around 90%. Conclusion: Results show that the artificial neural network and the logistic regression model confirm the relationship between extracted morphological features and biopsy results. Using statistically significant variables, the reduced logistic regression model outperformed the artificial neural network, achieving remarkable specificity while keeping high sensitivity.
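The three measures compared in this abstract follow directly from a confusion matrix. The sketch below is illustrative only: the function name and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of malignant cases correctly flagged
    specificity = tn / (tn + fp)   # share of benign cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical validation set of 61 cases (counts invented for illustration)
sens, spec, acc = diagnostic_metrics(tp=40, fn=2, tn=15, fp=4)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

A model can keep accuracy roughly constant while trading sensitivity for specificity, which is why the abstract reports all three.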
Elliptical multiple-output quantile regression and convex optimization
Czech Academy of Sciences Publication Activity Database
Hallin, M.; Šiman, Miroslav
2016-01-01
Vol. 109, No. 1 (2016), pp. 232-237 ISSN 0167-7152 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords: quantile regression * elliptical quantile * multivariate quantile * multiple-output regression Subject RIV: BA - General Mathematics Impact factor: 0.540, year: 2016 http://library.utia.cas.cz/separaty/2016/SI/siman-0458243.pdf
Wulifan, Joseph K; Jahn, Albrecht; Hien, Hervé; Ilboudo, Patrick Christian; Meda, Nicolas; Robyn, Paul Jacob; Saidou Hamadou, T; Haidara, Ousmane; De Allegri, Manuela
2017-12-19
Unmet need for family planning has implications for women and their families, such as unsafe abortion, physical abuse, and poor maternal health. Contraceptive knowledge has increased across low-income settings, yet unmet need remains high with little information on the factors explaining it. This study assessed factors associated with unmet need among pregnant women in rural Burkina Faso. We collected data on pregnant women through a population-based survey conducted in 24 rural districts between October 2013 and March 2014. Multivariate multilevel logistic regression was used to assess the association between unmet need for family planning and a selection of relevant demand- and supply-side factors. Of the 1309 pregnant women covered in the survey, 239 (18.26%) reported experiencing unmet need for family planning. Pregnant women with more than three living children [OR = 1.80; 95% CI (1.11-2.91)], those with a child younger than 1 year [OR = 1.75; 95% CI (1.04-2.97)], pregnant women whose partners disapprove of contraceptive use [OR = 1.51; 95% CI (1.03-2.21)] and women who desired fewer children compared to their partners' preferred number of children [OR = 1.907; 95% CI (1.361-2.672)] were significantly more likely to experience unmet need for family planning, while health staff training in family planning logistics management [OR = 0.46; 95% CI (0.24-0.73)] was associated with a lower probability of experiencing unmet need for family planning. Findings suggest the need to strengthen family planning interventions in Burkina Faso to ensure greater uptake of contraceptive use and thus reduce unmet need for family planning.
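Odds ratios and their intervals, as reported above, are usually obtained by exponentiating a logistic coefficient and its confidence limits. The sketch below assumes the reported intervals are Wald intervals (the abstract does not say so) and uses hypothetical helper names; it recovers the implied standard error from the first reported interval and rebuilds the interval from it.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=1.96):
    """Recover the standard error of log(OR) from a reported Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Reported above: OR = 1.80, 95% CI (1.11-2.91) for more than three living children
beta = math.log(1.80)
se = se_from_ci(1.11, 2.91)
or_hat, ci_low, ci_high = odds_ratio_ci(beta, se)
print(f"OR={or_hat:.2f}, 95% CI ({ci_low:.2f}-{ci_high:.2f}), SE(log OR)={se:.3f}")
```

The interval is symmetric on the log scale, not on the odds-ratio scale, which is why 1.80 sits closer to the lower limit than to the upper one.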
DEFF Research Database (Denmark)
Larsen, Klaus; Merlo, Juan
2005-01-01
The logistic regression model is frequently used in epidemiologic studies, yielding odds ratio or relative risk interpretations. Inspired by the theory of linear normal models, the logistic regression model has been extended to allow for correlated responses by introducing random effects. However, the model does not inherit the interpretational features of the normal model. In this paper, the authors argue that the existing measures are unsatisfactory (and some of them are even improper) when quantifying results from multilevel logistic regression analyses. The authors suggest a measure of heterogeneity, the median odds ratio, that quantifies cluster heterogeneity and facilitates a direct comparison between covariate effects and the magnitude of heterogeneity in terms of well-known odds ratios. Quantifying cluster-level covariates in a meaningful way is a challenge in multilevel logistic...
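For a random-intercept logistic model with normally distributed cluster effects of variance sigma^2, the median odds ratio is commonly computed as MOR = exp(sqrt(2 * sigma^2) * Phi^-1(0.75)). A minimal sketch of that formula follows; the function name and the example variance are illustrative and not taken from the paper.

```python
import math
from statistics import NormalDist


def median_odds_ratio(cluster_variance):
    """Median odds ratio for a random-intercept logistic model.

    cluster_variance is the variance of the normally distributed
    cluster-level random intercept on the log-odds scale.
    """
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return math.exp(math.sqrt(2.0 * cluster_variance) * z75)


# Hypothetical cluster variance of 1.0 on the log-odds scale
mor = median_odds_ratio(1.0)
print(f"MOR = {mor:.3f}")
```

Because the result is on the odds-ratio scale, it can be set directly beside the odds ratios of individual covariates; a value of 1 corresponds to no between-cluster variation.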
Energy Technology Data Exchange (ETDEWEB)
Huang, W; Tu, S [Chang Gung University, Kwei-shan, Tao-Yuan, Taiwan (China)
2016-06-15
Purpose: We conducted a retrospective study of Radiomics research for classifying malignancy of small pulmonary nodules. A machine learning algorithm of logistic regression and open research platform of Radiomics, IBEX (Imaging Biomarker Explorer), were used to evaluate the classification accuracy. Methods: The training set included 100 CT image series from cancer patients with small pulmonary nodules where the average diameter is 1.10 cm. These patients registered at Chang Gung Memorial Hospital and received a CT-guided operation of lung cancer lobectomy. The specimens were classified by experienced pathologists with a B (benign) or M (malignant). CT images with slice thickness of 0.625 mm were acquired from a GE BrightSpeed 16 scanner. The study was formally approved by our institutional internal review board. Nodules were delineated and 374 feature parameters were extracted from IBEX. We first used the t-test and p-value criteria to study which feature can differentiate between group B and M. Then we implemented a logistic regression algorithm to perform nodule malignancy classification. 10-fold cross-validation and the receiver operating characteristic curve (ROC) were used to evaluate the classification accuracy. Finally hierarchical clustering analysis, Spearman rank correlation coefficient, and clustering heat map were used to further study correlation characteristics among different features. Results: 238 features were found differentiable between group B and M based on whether their statistical p-values were less than 0.05. A forward search algorithm was used to select an optimal combination of features for the best classification and 9 features were identified. Our study found the best accuracy of classifying malignancy was 0.79±0.01 with the 10-fold cross-validation. The area under the ROC curve was 0.81±0.02. Conclusion: Benign nodules may be treated as a malignant tumor in low-dose CT and patients may undergo unnecessary surgeries or treatments. Our
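The evaluation described here combines k-fold cross-validation with the area under the ROC curve. The sketch below shows both pieces in plain Python, assuming contiguous unshuffled folds and using invented labels and scores; it does not reproduce the IBEX workflow or the study's classifier.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous folds of n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


# Invented labels (1 = malignant) and classifier scores
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
folds = list(k_fold_indices(100, k=10))
print(f"AUC = {auc:.3f}, folds = {len(folds)}")
```

In practice the samples would be shuffled (and usually stratified by class) before folding, and the model would be refitted on each training split.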
Multiple Response Regression for Gaussian Mixture Models with Known Labels.
Lee, Wonyul; Du, Ying; Sun, Wei; Hayes, D Neil; Liu, Yufeng
2012-12-01
Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes.
Bakhtiyari, Mahmood; Mehmandar, Mohammad Reza; Mirbagheri, Babak; Hariri, Gholam Reza; Delpisheh, Ali; Soori, Hamid
2014-01-01
Risk factors of human-related traffic crashes are the most important and preventable challenges for community health because of their noteworthy burden, in developing countries in particular. The present study aims to investigate the role of human risk factors in road traffic crashes in Iran. In a cross-sectional study using the COM 114 data collection forms, the police records of almost 600,000 crashes that occurred in 2010 are investigated. Binary logistic regression and proportional odds regression models are used. The odds ratio for each risk factor is calculated. These models are adjusted for known confounding factors including age, sex and driving time. The traffic crash reports of 537,688 men (90.8%) and 54,480 women (9.2%) are analysed. The mean age is 34.1 ± 14 years. Not maintaining eyes on the road (53.7%) and losing control of the vehicle (21.4%) are the main causes of drivers' deaths in traffic crashes within cities. Not maintaining eyes on the road is also the most frequent human risk factor for road traffic crashes outside cities. Sudden lane excursion (OR = 9.9, 95% CI: 8.2-11.9), seat belt non-compliance (OR = 8.7, CI: 6.7-10.1), exceeding authorised speed (OR = 17.9, CI: 12.7-25.1) and exceeding safe speed (OR = 9.7, CI: 7.2-13.2) are the most significant human risk factors for traffic crashes in Iran. The high mortality rate of 39 people per 100,000 population emphasises the importance of traffic crashes in Iran. Considering the important role of human risk factors in traffic crashes, sustained efforts are required to control dangerous driving behaviours such as exceeding speed, illegal overtaking and not maintaining eyes on the road.
Risk of Recurrence in Operated Parasagittal Meningiomas: A Logistic Binary Regression Model.
Escribano Mesa, José Alberto; Alonso Morillejo, Enrique; Parrón Carreño, Tesifón; Huete Allut, Antonio; Narro Donate, José María; Méndez Román, Paddy; Contreras Jiménez, Ascensión; Pedrero García, Francisco; Masegosa González, José
2018-02-01
Parasagittal meningiomas arise from the arachnoid cells of the angle formed between the superior sagittal sinus (SSS) and the brain convexity. In this retrospective study, we focused on factors that predict early recurrence and recurrence times. We reviewed 125 patients with parasagittal meningiomas operated on from 1985 to 2014. We studied the following variables: age, sex, location, laterality, histology, surgeons, invasion of the SSS, Simpson removal grade, follow-up time, angiography, embolization, radiotherapy, recurrence and recurrence time, reoperation, neurologic deficit, degree of dependency, and patient status at the end of follow-up. Patients ranged in age from 26 to 81 years (mean 57.86 years; median 60 years). There were 44 men (35.2%) and 81 women (64.8%). There were 57 patients with neurologic deficits (45.2%). The most common presenting symptom was motor deficit. World Health Organization grade I tumors were identified in 104 patients (84.6%), and the majority were the meningothelial type. Recurrence was detected in 34 cases. Time of recurrence was 9 to 336 months (mean: 84.4 months; median: 79.5 months). Male sex was identified as an independent risk factor for recurrence with relative risk 2.7 (95% confidence interval 1.21-6.15), P = 0.014. Kaplan-Meier curves for recurrence had statistically significant differences depending on sex, age, histologic type, and World Health Organization histologic grade. A binary logistic regression model was fitted, with a Hosmer-Lemeshow test P > 0.05; sex, tumor size, and histologic type were used in this model. Male sex is an independent risk factor for recurrence that, associated with other factors such as tumor size and histologic type, explains 74.5% of all cases in a binary regression model. Copyright © 2017 Elsevier Inc. All rights reserved.
DETERMINATION OF FACTORS AFFECTING LENGTH OF STAY WITH MULTINOMIAL LOGISTIC REGRESSION IN TURKEY
Directory of Open Access Journals (Sweden)
Öğr. Gör. Rukiye NUMAN TEKİN
2016-08-01
Length of stay (LOS) has important implications in various aspects of health services and can vary according to a wide range of factors. LOS has mostly been neglected in both theoretical studies and the practice of health care management in Turkey. The main purpose of this study is to identify factors related to LOS in Turkey. A retrospective analysis was conducted of 2,255,836 patients hospitalized in private, university, foundation university and other (municipality, association and foreigners/minority) hospitals that have an agreement with the Social Security Institution (SSI) in Turkey, from January 1, 2010, until December 31, 2010. Patient data were taken from MEDULA (National Electronic Invoice System), and SPSS 18.0 was used to perform the statistical analysis. In this study, the t-test, one-way ANOVA and multinomial logistic regression are used to determine variables that may affect LOS. The average LOS of patients was 3.93 days (SD = 5.882). LOS showed a statistically significant difference according to all independent variables used in the study (age, gender, disease class, type of hospitalization, presence of comorbidity, type and number of surgery, season of hospitalization, hospital ownership/bed capacity/geographical region/residential area/type of service). According to the results of the multinomial logistic regression analysis, LOS was negatively affected by gender, presence of comorbidity and geographical region of hospital, and was positively affected by age, season of hospitalization and hospital bed capacity/ownership/type of service/residential area.
Modelling landscape change in paddy fields using logistic regression and GIS
Franjaya, E. E.; Syartinilia; Setiawan, Y.
2018-05-01
Paddy fields in Karawang District, an important agricultural area in West Java, have been decreasing since 1994. A previous study showed that paddy fields were predominantly converted into built-up area. The changes occurred mostly in the middle of the district, where roadways, industries, settlements, and commercial buildings exist. These were presumed to be the driving forces, but this still needed to be demonstrated. This study aimed to construct a probability model of paddy field change, from which the driving forces could then be identified. GIS combined with logistic regression using environmental variables was the main method in this study. The ten environmental variables were elevation 0–500 m, elevation >500 m, slope <8%, slope >8%, CBD, built-up area, river, irrigation, toll and national roadway, and collector and local roadway. The results indicated that four variables acted significantly as driving forces (slope >8%, CBD area, built-up area, and collector and local roadway). Paddy fields with high, medium, and low probability of change covered about 27.8%, 7.8%, and 64.4% of the area in Karawang, respectively. Based on landscape ecology, the recommended response to this landscape change is adaptive management.
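A change-probability map of this kind is typically produced by evaluating the fitted logistic function per grid cell and binning the result. The sketch below is purely illustrative: the coefficients, their signs, the binary encoding of the four significant variables and the class cut-offs are all assumptions, since the abstract reports none of them.

```python
import math


def change_probability(intercept, coefs, x):
    """Logistic model: probability that a grid cell changes land use."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def probability_class(p, low=1 / 3, high=2 / 3):
    """Bin a probability into low / medium / high (cut-offs are assumptions)."""
    if p < low:
        return "low"
    if p < high:
        return "medium"
    return "high"


# Hypothetical coefficients for four binary indicators:
# slope >8%, CBD area, built-up area, collector/local roadway
intercept = -2.0
coefs = [-0.8, 1.2, 1.5, 0.9]
cell = [0, 1, 1, 1]  # a flat cell near the CBD, built-up area and local roads
p = change_probability(intercept, coefs, cell)
label = probability_class(p)
print(f"p = {p:.3f} -> {label}")
```

Applying the two functions to every cell of a raster gives the kind of three-class map summarised by the area percentages in the abstract.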
Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choe, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B; Gupta, Neha; Kohane, Isaac S; Green, Robert C; Kong, Sek Won
2014-08-01
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and an ensemble genotyping would be essential to minimize false-positive DNM candidates. © 2014 WILEY PERIODICALS, INC.
Prediction of cannabis and cocaine use in adolescence using decision trees and logistic regression
Directory of Open Access Journals (Sweden)
Alfonso L. Palmer
2010-01-01
Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls) whose mean age was 15.59 years. Logistic regression and decision trees were carried out in order to model the consumption of cannabis and cocaine. The results show the use of legal substances and committing fraudulence or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraudulence or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraudulence or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.
Demand analysis of flood insurance by using logistic regression model and genetic algorithm
Sidi, P.; Mamat, M. B.; Sukono; Supian, S.; Putra, A. S.
2018-03-01
The Citarum River floods the area of South Bandung, Indonesia, often resulting in damage to buildings belonging to the people living in the vicinity. One way to alleviate the risk of building damage is to have flood insurance. The main obstacle is that not all people in the Citarum basin decide to buy flood insurance. In this paper, we analyse the decision to buy flood insurance. It is assumed that eight variables influence the decision to purchase flood insurance: income level, education level, distance of the house from the river, building elevation relative to the road, experienced flood frequency, flood prediction, perception of the insurance company, and perception of government efforts in handling floods. The analysis was done using a logistic regression model, with the model parameters estimated by a genetic algorithm. The results of the analysis show that all eight variables significantly influence the demand for flood insurance. These results are expected to be considered by insurance companies seeking to influence the community's willingness to buy flood insurance.
A Logistic Regression Based Auto Insurance Rate-Making Model Designed for the Insurance Rate Reform
Directory of Open Access Journals (Sweden)
Zhengmin Duan
2018-02-01
Using a generalized linear model to determine the claim frequency of auto insurance is a key ingredient in non-life insurance research. Among auto insurance rate-making models, very few consider auto types. Therefore, in this paper we propose a model that takes auto types into account by making innovative use of the auto burden index. Based on this model and data from a Chinese insurance company, we built a clustering model that classifies auto insurance rates into three risk levels. The claim frequency and the claim costs are fitted to select a better loss distribution. Then the logistic regression model is employed to fit the claim frequency, with the auto burden index considered. Three key findings can be drawn from our study. First, more than 80% of the autos with an auto burden index of 20 or higher belong to the highest risk level. Second, the claim frequency is better fitted using the Poisson distribution, whereas the claim cost is better fitted using the Gamma distribution. Lastly, based on the AIC criterion, the claim frequency is more adequately represented by models that consider the auto burden index than by those that do not. It is believed that insurance policy recommendations based on generalized linear models (GLM) can benefit from our findings.
Ulkhaq, M. M.; Widodo, A. K.; Yulianto, M. F. A.; Widhiyaningrum; Mustikasari, A.; Akshinta, P. Y.
2018-03-01
The implementation of renewable energy in this globalization era is inevitable since non-renewable energy leads to climate change and global warming; hence, it harms the environment and human life. However, in developing countries such as Indonesia, the implementation of renewable energy sources faces technical and social problems. Regarding the latter, implementing renewable energy sources is only effective if the public is aware of their benefits. This research tried to identify the determinants that influence consumers’ intention to adopt renewable energy sources. In addition, this research tried to predict which consumers are willing to use renewable energy sources in their houses using a logistic regression approach. A case study was conducted in Semarang, Indonesia. The results showed that only eight of the fifteen variables are statistically significant, i.e., educational background, employment status, income per month, average electricity cost per month, certainty about the efficiency of renewable energy projects, relatives’ influence to adopt renewable energy sources, energy tax deduction, and the price of non-renewable energy sources. The findings of this study could be used as a basis for the government to set up a policy for implementing renewable energy sources.
Logistic regression analysis of financial literacy implications for retirement planning in Croatia
Directory of Open Access Journals (Sweden)
Dajana Barbić
2016-12-01
The relationship between financial literacy and financial behavior is important, as individuals are increasingly being asked to take responsibility for their financial wellbeing, especially their retirement. Analyzing individual savings and attitudes towards retirement planning is important, as these types of investments are a way of preserving security during years of financial vulnerability. Research indicates that individuals who do not save adequately for their retirement generally have a relatively low level of financial literacy. This research investigates the relationship between financial literacy and retirement planning in Croatia. To analyze the relationship between financial literacy and planning for retirement, maximum likelihood logistic regression analysis was used. The paper shows that those who answer financial literacy questions correctly are more likely to have a positive attitude towards retirement planning and are more likely to save for retirement, ensuring them higher levels of financial security in retirement. The goodness-of-fit evaluation for the estimated logit model was performed using the Andrews and Hosmer-Lemeshow tests.
Propensity score matching of the gymnastics for diabetes mellitus using logistic regression
Otok, Bambang Widjanarko; Aisyah, Amalia; Purhadi; Andari, Shofi
2017-12-01
Diabetes Mellitus (DM) is a group of metabolic diseases characterized by abnormal blood glucose levels resulting from pancreatic insulin deficiency, decreased insulin effectiveness, or both. A report from the ministry of health shows that the prevalence of DM in East Java province is 2.1%, while the prevalence in Indonesia as a whole is only 1.5%. Given the high number of DM cases in East Java, preventive action is needed to control the factors causing complications of DM. This study aims to determine the combination of factors causing complications of DM while reducing the bias from confounding variables, using Propensity Score Matching (PSM) with binary logistic regression as the propensity score estimation method. The data used in this study are medical records from the As-Shafa clinic, consisting of 6 covariates and health complication as the response variable. The PSM analysis showed that 22 of the 126 DM patients who attended gymnastics were paired with patients who did not attend diabetes gymnastics. The Average Treatment effect on the Treated (ATT) estimate showed that patients who did not attend gymnastics were more likely to be at risk of DM complications.
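The matching procedure described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are simulated (the clinic records are not available here), the helper names `fit_logistic`, `match_nearest`, and `att` are hypothetical, and greedy 1:1 nearest-neighbour matching with a caliper is only one of several matching schemes; the abstract does not say which one the authors used.

```python
import numpy as np

def fit_logistic(X, t, n_iter=50, ridge=1e-8):
    """Fit a binary logistic regression by Newton-Raphson; returns (coefficients, fitted scores)."""
    t = np.asarray(t, dtype=float)
    Xd = np.column_stack([np.ones(len(X)), X])        # prepend an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        w = p * (1.0 - p)
        hessian = Xd.T @ (Xd * w[:, None]) + ridge * np.eye(Xd.shape[1])
        step = np.linalg.solve(hessian, Xd.T @ (t - p))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta, 1.0 / (1.0 + np.exp(-Xd @ beta))

def match_nearest(score, treated, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching on the propensity score, without replacement."""
    controls = [i for i in range(len(score)) if not treated[i]]
    pairs = []
    for i in np.flatnonzero(treated):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(score[c] - score[i]))
        if abs(score[j] - score[i]) <= caliper:       # discard poor matches
            pairs.append((int(i), j))
            controls.remove(j)
    return pairs

def att(outcome, pairs):
    """Average treatment effect on the treated: mean outcome difference over matched pairs."""
    return float(np.mean([outcome[i] - outcome[j] for i, j in pairs]))

# Simulated illustration (not the study data): two covariates, a binary treatment, a binary outcome.
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=(n, 2))
treated = rng.random(n) < 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5)))
outcome = (rng.random(n) < 1 / (1 + np.exp(-(0.5 * x[:, 0] - 0.7 * treated)))).astype(float)

beta, score = fit_logistic(x, treated)
pairs = match_nearest(score, treated)
att_hat = att(outcome, pairs)
print(len(pairs), "matched pairs; ATT estimate:", round(att_hat, 3))
```

The caliper trades sample size for match quality: a tighter caliper drops more treated units but keeps the matched groups closer on the estimated score.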
Shayan, Zahra; Mohammad Gholi Mezerji, Naser; Shayan, Leila; Naseri, Parisa
2015-11-03
Logistic regression (LR) and linear discriminant analysis (LDA) are two popular statistical models for prediction of group membership. Although they are very similar, the LDA makes more assumptions about the data. When categorical and continuous variables are used simultaneously, the optimal choice between the two models is questionable. In most studies, classification error (CE) is used to discriminate between subjects in several groups, but this index is not suitable to predict the accuracy of the outcome. The present study compared LR and LDA models using classification indices. This cross-sectional study selected 243 cancer patients. Sample sets of different sizes (n = 50, 100, 150, 200, 220) were randomly selected and the CE, B, and Q classification indices were calculated by the LR and LDA models. CE revealed a lack of superiority for one model over the other, but the results showed that LR performed better than LDA for the B and Q indices in all situations. No significant effect for sample size on CE was noted for selection of an optimal model. Assessment of the accuracy of prediction of real data indicated that the B and Q indices are appropriate for selection of an optimal model. The results of this study showed that LR performs better in some cases and LDA in others when based on CE. The CE index is not appropriate for classification, although the B and Q indices performed better and offered more efficient criteria for comparison and discrimination between groups.
Use of multilevel logistic regression to identify the causes of differential item functioning.
Balluerka, Nekane; Gorostiaga, Arantxa; Gómez-Benito, Juana; Hidalgo, María Dolores
2010-11-01
Given that a key function of tests is to serve as evaluation instruments and for decision making in the fields of psychology and education, the possibility that some of their items may show differential behaviour is a major concern for psychometricians. In recent decades, important progress has been made as regards the efficacy of techniques designed to detect this differential item functioning (DIF). However, the findings are scant when it comes to explaining its causes. The present study addresses this problem from the perspective of multilevel analysis. Starting from a case study in the area of transcultural comparisons, multilevel logistic regression is used: 1) to identify the item characteristics associated with the presence of DIF; 2) to estimate the proportion of variation in the DIF coefficients that is explained by these characteristics; and 3) to evaluate alternative explanations of the DIF by comparing the explanatory power or fit of different sequential models. The comparison of these models confirmed one of the two alternatives (familiarity with the stimulus) and rejected the other (the topic area) as being a cause of differential functioning with respect to the compared groups.
Lewis, Kristin Nicole; Heckman, Bernadette Davantes; Himawan, Lina
2011-08-01
Growth mixture modeling (GMM) identified latent groups based on treatment outcome trajectories of headache disability measures in patients in headache subspecialty treatment clinics. Using a longitudinal design, 219 patients in headache subspecialty clinics in 4 large cities throughout Ohio provided data on their headache disability at pretreatment and 3 follow-up assessments. GMM identified 3 treatment outcome trajectory groups: (1) patients who initiated treatment with elevated disability levels and who reported statistically significant reductions in headache disability (high-disability improvers; 11%); (2) patients who initiated treatment with elevated disability but who reported no reductions in disability (high-disability nonimprovers; 34%); and (3) patients who initiated treatment with moderate disability and who reported statistically significant reductions in headache disability (moderate-disability improvers; 55%). Based on the final multinomial logistic regression model, a dichotomized treatment appointment attendance variable was a statistically significant predictor for differentiating high-disability improvers from high-disability nonimprovers. Three-fourths of patients who initiated treatment with elevated disability levels did not report reductions in disability after 5 months of treatment with new preventive pharmacotherapies. Preventive headache agents may be most efficacious for patients with moderate levels of disability and for patients with high disability levels who attend all treatment appointments. Copyright © 2011 International Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
Sze, N N; Wong, S C; Lee, C Y
2014-12-01
In the past several decades, many countries have set quantified road safety targets to motivate transport authorities to develop systematic road safety strategies and measures and facilitate the achievement of continuous road safety improvement. Studies have been conducted to evaluate the association between the setting of quantified road safety targets and road fatality reduction, in both the short and long run, by comparing road fatalities before and after the implementation of a quantified road safety target. However, not much work has been done to evaluate whether the quantified road safety targets are actually achieved. In this study, we used a binary logistic regression model to examine the factors - including vehicle ownership, fatality rate, and national income, in addition to level of ambition and duration of target - that contribute to a target's success. We analyzed 55 quantified road safety targets set by 29 countries from 1981 to 2009, and the results indicate that targets that are in progress and with lower levels of ambition had a higher likelihood of eventually being achieved. Moreover, possible interaction effects on the association between level of ambition and the likelihood of success are also revealed. Copyright © 2014 Elsevier Ltd. All rights reserved.
Directory of Open Access Journals (Sweden)
Ozgun Akcay
2015-10-01
Unmanned Aerial Systems (UAS) are now capable of gathering high-resolution data; therefore, landslides can be explored in detail at larger scales. In this research, 132 aerial photographs were captured, and 85,456 features were detected and matched automatically using UAS photogrammetry. The root mean square (RMS) values of the image coordinates of the Ground Control Points (GCPs) varied from 0.521 to 2.293 pixels, whereas the maximum RMS value of automatically matched features was calculated as 2.921 pixels. Using the 3D point cloud, which was acquired by aerial photogrammetry, the raster datasets of the aspect, slope, and maximally stable extremal regions (MSER) detecting visual uniformity were defined as three variables in order to reason about fissure structures on the landslide surface. In this research, an Adaptive Neuro Fuzzy Inference System (ANFIS) and a Logistic Regression (LR) were implemented using training datasets to infer fissure data appropriately. The accuracy of the predictive models was evaluated by drawing receiver operating characteristic (ROC) curves and by calculating the area under the ROC curve (AUC). The experiments showed that high-resolution imagery is an indispensable data source to model and validate landslide fissures appropriately.
A joint logistic regression and covariate-adjusted continuous-time Markov chain model.
Rubin, Maria Laura; Chan, Wenyaw; Yamal, Jose-Miguel; Robertson, Claudia Sue
2017-12-10
The use of longitudinal measurements to predict a categorical outcome is an increasingly common goal in research studies. Joint models are commonly used to describe two or more models simultaneously by considering the correlated nature of their outcomes and the random error present in the longitudinal measurements. However, there is limited research on joint models with longitudinal predictors and categorical cross-sectional outcomes. Perhaps the most challenging task is how to model the longitudinal predictor process such that it represents the true biological mechanism that dictates the association with the categorical response. We propose a joint logistic regression and Markov chain model to describe a binary cross-sectional response, where the unobserved transition rates of a two-state continuous-time Markov chain are included as covariates. We use the method of maximum likelihood to estimate the parameters of our model. In a simulation study, coverage probabilities of about 95%, standard deviations close to standard errors, and low biases for the parameter values show that our estimation method is adequate. We apply the proposed joint model to a dataset of patients with traumatic brain injury to describe and predict a 6-month outcome based on physiological data collected post-injury and admission characteristics. Our analysis indicates that the information provided by physiological changes over time may help improve prediction of long-term functional status of these severely ill subjects. Copyright © 2017 John Wiley & Sons, Ltd.
Zeng, Fangfang; Li, Zhongtao; Yu, Xiaoling; Zhou, Linuo
2013-01-01
Background: This study aimed to develop artificial neural network (ANN) and multivariable logistic regression (LR) analyses for prediction modeling of cardiovascular autonomic (CA) dysfunction in the general population, and to compare the prediction models from the two approaches. Methods and Materials: We analyzed a previous dataset based on a Chinese population sample consisting of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN and LR analysis, and were tested in the validation set. Performances of these prediction models were then compared. Results: Univariate analysis indicated that 14 risk factors showed statistically significant association with the prevalence of CA dysfunction (P<0.05). The mean area under the receiver-operating curve was 0.758 (95% CI 0.724–0.793) for LR and 0.762 (95% CI 0.732–0.793) for ANN analysis, and a noninferiority result was found (P<0.001). Similar results were found in comparisons of sensitivity, specificity, and predictive values in the prediction models between the LR and ANN analyses. Conclusion: The prediction models for CA dysfunction were developed using ANN and LR. ANN and LR are two effective tools for developing prediction models based on our dataset. PMID:23940593
Characterization of breast masses by dynamic enhanced MR imaging. A logistic regression analysis
International Nuclear Information System (INIS)
Ikeda, O.; Morishita, S.; Kido, T.; Kitajima, M.; Yamashita, Y.; Takahashi, M.; Okamura, K.; Fukuda, S.
1999-01-01
Purpose: To identify features useful for differentiation between malignant and benign breast neoplasms using multivariate analysis of findings by MR imaging. Material and Methods: In a retrospective analysis, 61 patients with 64 breast masses underwent MR imaging and the time-signal intensity curves for precontrast dynamic postcontrast images were quantitatively analyzed. Statistical analysis was performed using a logistic regression model, which was prospectively tested in another 34 patients with suspected breast masses. Results: Univariate analysis revealed that the reliable indicators for malignancy were first the appearance of the tumor border, followed by the washout ratio, internal architecture after contrast enhancement, and peak time. The factors significantly associated with malignancy were irregular tumor border, followed by washout ratio, internal architecture, and peak time. For differentiation between benignity and malignancy, the maximum cut-off point was found to lie between 0.47 and 0.51. In a prospective application of this model, 91% of the lesions were accurately discriminated as benign or malignant lesions. Conclusion: Combination of contrast-enhanced dynamic and postcontrast-enhanced MR imaging provided accurate data for the diagnosis of malignant neoplasms of the breast. The model had an accuracy of 91% (sensitivity 90%, specificity 93%). (orig.)
Energy Technology Data Exchange (ETDEWEB)
Gomes, Daniel de Souza; Baptista Filho, Benedito; Oliveira, Fabio Branco de, E-mail: dsgomes@ipen.br, E-mail: bdbfilho@ipen.br, E-mail: fabio@ipen.br [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil); Giovedi, Claudia, E-mail: claudia.giovedi@labrisco.usp.br [Universidade de Sao Paulo (POLI/USP), Sao Paulo, SP (Brazil). Lab. de Analise, Avaliacao e Gerenciamento de Risco
2015-07-01
A reactivity-initiated accident (RIA) is a disastrous failure that occurs because of an unexpected rise in the fission rate and reactor power. This sudden increase in the reactor power may activate processes that might lead to the failure of fuel cladding. In severe accidents, a disruption of fuel and core melting can occur. The purpose of the present research is to study the patterns of such accidents using exploratory data analysis techniques. A study based on applied statistics was used for simulations. We chose peak enthalpy, pulse width, burnup, fission gas release, and the oxidation of zirconium as input parameters and set the safety boundary conditions. This new approach includes logistic regression. With this, the present research also aims to develop the ability to identify the conditions and the probability of failures. Zirconium-based alloys with 1% niobium, used to fabricate the cladding of the fuel rod elements, were analyzed for high burnup limits at 65 MWd/kgU. The data are based on six decades of investigations from experimental programs: tests performed in American reactors such as the Transient Reactor Test facility (TREAT) and the Power Burst Facility (PBF), experiments carried out in the Japanese program at the Nuclear Safety Research Reactor (NSRR), and experiments in Kazakhstan at the Impulse Graphite Reactor (IGR). The database obtained from these tests served as support for our study. (author)
Predicting the "graduate on time (GOT)" of PhD students using binary logistics regression model
Shariff, S. Sarifah Radiah; Rodzi, Nur Atiqah Mohd; Rahman, Kahartini Abdul; Zahari, Siti Meriam; Deni, Sayang Mohd
2016-10-01
The Malaysian government has recently set a new goal to produce 60,000 Malaysian PhD holders by the year 2023. As Malaysia's largest institution of higher learning in terms of size and population, offering more than 500 academic programmes in a conducive and vibrant environment, UiTM has taken several initiatives to fill the gap. Increasing the number of PhD graduates is a challenging process. On many occasions, it has been observed that reaching the target is even more daunting than expected and that implementation is far from ideal. Progress has slowed further as the attrition rate increases. This study aims to apply a proposed model that incorporates several factors to predict the number of PhD students who will complete their PhD studies on time. A binary logistic regression model is proposed and applied to the data set to determine this number. The results show that only 6.8% of the 2014 PhD students are predicted to graduate on time, and the results are compared with the actual number for validation purposes.
Snedden, Gregg A.; Steyer, Gregory D.
2013-01-01
Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007–Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.
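The agreement measure reported in this abstract, a weighted kappa computed from a confusion matrix, can be sketched as follows. This is an illustrative sketch only: the function name `weighted_kappa` is hypothetical, the small matrices are made up, and the choice between linear and quadratic disagreement weights is an assumption because the abstract does not state which weighting was used. Weighted kappa presumes the categories have a meaningful order.

```python
import numpy as np

def weighted_kappa(confusion, weights="linear"):
    """Weighted kappa from a square confusion matrix (rows: observed, columns: predicted)."""
    c = np.asarray(confusion, dtype=float)
    k = c.shape[0]
    observed = c / c.sum()                                   # joint proportions
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))  # chance agreement
    i, j = np.indices((k, k))
    d = np.abs(i - j) / (k - 1)                              # normalized category distance
    w = d if weights == "linear" else d ** 2                 # disagreement weights
    return 1.0 - (w * observed).sum() / (w * expected).sum()

# Made-up examples: perfect agreement, and a 2x2 table where weighted kappa
# reduces to the ordinary (unweighted) Cohen's kappa.
perfect = weighted_kappa(np.diag([10, 20, 30]))
two_by_two = weighted_kappa([[20, 5], [10, 15]])
print(perfect, round(two_by_two, 3))
```

With only two categories every disagreement has the same weight, so the second value is the ordinary Cohen's kappa for that table.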
MULGRES: a computer program for stepwise multiple regression analysis
A. Jeff Martin
1971-01-01
MULGRES is a computer program source deck that is designed for multiple regression analysis employing the technique of stepwise deletion in the search for most significant variables. The features of the program, along with inputs and outputs, are briefly described, with a note on machine compatibility.
Using multiple linear regression techniques to quantify carbon ...
African Journals Online (AJOL)
Fallow ecosystems provide a significant carbon stock that can be quantified for inclusion in the accounts of global carbon budgets. Process and statistical models of productivity, though useful, are often technically rigid as the conditions for their application are not easy to satisfy. Multiple regression techniques have been ...
Interpreting Multiple Linear Regression: A Guidebook of Variable Importance
Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim
2012-01-01
Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…
Direction of Effects in Multiple Linear Regression Models.
Wiedermann, Wolfgang; von Eye, Alexander
2015-01-01
Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
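The core idea of this abstract, comparing the third moments of residuals from competing regression models, can be illustrated with a small simulation. The sketch below is not the authors' procedure: the data-generating model, the variable names, and the helper functions `skewness` and `ols_residuals` are all assumptions chosen for illustration, and no formal test (D'Agostino, skewness difference, or bootstrap) is carried out. It assumes a skewed true predictor and normally distributed errors, which is one setting in which the residual asymmetry can be expected.

```python
import numpy as np

def skewness(v):
    """Standardized third central moment of a sample."""
    d = v - v.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)

def ols_residuals(y, X):
    """Residuals of an ordinary least squares fit of y on X (intercept added)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return y - Xd @ coef

# Simulated data under the assumed "true" direction x -> y, with covariate z.
rng = np.random.default_rng(0)
n = 5000
x = rng.exponential(1.0, n)          # skewed predictor
z = rng.normal(size=n)               # additional covariate
y = 0.8 * x + 0.5 * z + rng.normal(size=n)

res_true = ols_residuals(y, np.column_stack([x, z]))     # model y ~ x + z
res_reverse = ols_residuals(x, np.column_stack([y, z]))  # model x ~ y + z

print("residual skewness, y ~ x + z:", round(skewness(res_true), 3))
print("residual skewness, x ~ y + z:", round(skewness(res_reverse), 3))
```

In this setup the residuals of the correctly specified model are close to symmetric, while the mis-specified model inherits part of the predictor's skewness in its residuals.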
Multiple regression for physiological data analysis: the problem of multicollinearity.
Slinker, B K; Glantz, S A
1985-07-01
Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
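One common way to detect the multicollinearity this abstract describes is the variance inflation factor (VIF), which equals 1 / (1 - R_j^2) where R_j^2 comes from regressing predictor j on the other predictors. The sketch below is an illustration on simulated data, not a method taken from the paper; the function name `variance_inflation_factors` is hypothetical, and it uses the identity that the VIFs are the diagonal of the inverse correlation matrix of the predictors.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column of X, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Simulated predictors: x2 nearly duplicates x1, x3 is independent of both.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)

vif = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(np.round(vif, 1))
```

A frequently quoted rule of thumb treats values above roughly 10 as a sign of problematic collinearity, although any such threshold is a convention rather than a test.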
Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L
2017-02-06
Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
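Firth's logistic regression, which this abstract evaluates, replaces the usual score equations with a bias-reduced version, U*(β) = Xᵀ(y − p + h(½ − p)), where h holds the diagonal of the weighted hat matrix. The sketch below is a minimal illustration under stated assumptions: the function name `firth_logistic` is hypothetical, the tiny data sets are made up, and plain Fisher-scoring steps are used without the step-halving safeguards that production implementations typically add.

```python
import numpy as np

def firth_logistic(X, y, n_iter=100, tol=1e-10):
    """Firth-penalized logistic regression via Fisher scoring; X must include any intercept column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))    # inverse Fisher information
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))               # Firth-adjusted score
        step = info_inv @ score
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Made-up, perfectly separated data: ordinary maximum likelihood has no finite slope here.
X_sep = np.column_stack([np.ones(4), [-2.0, -1.0, 1.0, 2.0]])
y_sep = np.array([0.0, 0.0, 1.0, 1.0])
beta_sep = firth_logistic(X_sep, y_sep)

# Intercept-only check: with k events out of n, the Firth estimate of p is (k + 0.5) / (n + 1).
beta_0 = firth_logistic(np.ones((4, 1)), np.zeros(4))
print(beta_sep, beta_0)
```

The intercept-only case gives a convenient sanity check, since the adjusted score equation can be solved by hand there.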
Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H
2012-01-01
The purpose of this investigation was to compare empirically the predictive ability of an artificial neural network with that of logistic regression in the prediction of low back pain. Data from the second national health survey were considered in this investigation. These data include information on low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 records and validated in a test set of 17295 records. The Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 input, 3 hidden and 1 output neurons was employed. The efficiency of the two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. The area under the ROC curve (SE), root mean square and -2 Loglikelihood of the logistic regression were 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2 Loglikelihood of the artificial neural network were 0.754 (0.004), 0.3770 and 14757.6, respectively. Based on these three criteria, the artificial neural network performs better than logistic regression. Although the difference is statistically significant, it does not seem to be clinically significant.
A Powerful Test for Comparing Multiple Regression Functions.
Maity, Arnab
2012-09-01
In this article, we address the important problem of comparing two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models: Y(ij) = θ(j)(Z(ij)) + σ(j)(Z(ij))∊(ij), based on empirical distributions of the errors in each population j = 1, … , J. In this paper, we propose a test for equality of the θ(j)(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test to other nonparametric regression setups, e.g., nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).
Nowicki, M. A.; Hearne, M.; Thompson, E.; Wald, D. J.
2012-12-01
Seismically induced landslides present a costly and often fatal threat in many mountainous regions. Substantial effort has been invested to understand where seismically induced landslides may occur in the future. Both slope-stability methods and, more recently, statistical approaches to the problem are described throughout the literature. Though some regional efforts have succeeded, no uniformly agreed-upon method is available for predicting the likelihood and spatial extent of seismically induced landslides. For use in the U.S. Geological Survey (USGS) Prompt Assessment of Global Earthquakes for Response (PAGER) system, we would like to routinely make such estimates, in near-real time, around the globe. Here we use the recently produced USGS ShakeMap Atlas of historic earthquakes to develop an empirical landslide probability model. We focus on recent events, yet include any digitally-mapped landslide inventories for which well-constrained ShakeMaps are also available. We combine these uniform estimates of the input shaking (e.g., peak acceleration and velocity) with broadly available susceptibility proxies, such as topographic slope and surface geology. The resulting database is used to build a predictive model of the probability of landslide occurrence with logistic regression. The landslide database includes observations from the Northridge, California (1994); Wenchuan, China (2008); ChiChi, Taiwan (1999); and Chuetsu, Japan (2004) earthquakes; we also provide ShakeMaps for moderate-sized events without landslides for proper model testing and training. The performance of the regression model is assessed with both statistical goodness-of-fit metrics and a qualitative review of whether or not the model is able to capture the spatial extent of landslides for each event. Part of our goal is to determine which variables can be employed based on globally-available data or proxies, and whether or not modeling results from one region are transferrable to
Simple and multiple linear regression: sample size considerations.
Hanley, James A
2016-11-01
The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.
Reporting quality of multivariable logistic regression in selected Indian medical journals.
Kumar, R; Indrayan, A; Chhabra, P
2012-01-01
Use of multivariable logistic regression (MLR) modeling has steeply increased in the medical literature over the past few years. Testing of model assumptions and adequate reporting of MLR allow the reader to interpret results more accurately. To review the fulfillment of assumptions and reporting quality of MLR in selected Indian medical journals using established criteria. Analysis of published literature. Medknow.com publishes 68 Indian medical journals with open access. Eight of these journals had at least five articles using MLR between 1994 and 2008. Articles from each of these journals were evaluated according to the previously established 10-point quality criteria for reporting and for testing the MLR model assumptions. SPSS 17 software and non-parametric tests (Kruskal-Wallis H, Mann-Whitney U, Spearman correlation). One hundred and nine articles using MLR to analyze the data were finally found in the selected eight journals. The number of such articles gradually increased after 2003, but the quality score remained almost unchanged over time. P value, odds ratio, and 95% confidence interval for coefficients in MLR were reported in 75.2% of the articles, and sufficient cases (>10) per covariate of limiting sample size were reported in 58.7%. No article reported the test for conformity of linear gradient for continuous covariates. Total score was not significantly different across the journals. However, involvement of a statistician or epidemiologist as a co-author improved the average quality score significantly (P=0.014). Reporting of MLR in many Indian journals is incomplete. Only one of the 109 articles under review managed to score 8 out of 10. All others scored less. Appropriate guidelines in instructions to authors, and pre-publication review of articles using MLR by a qualified statistician, may improve the quality of reporting.
International Nuclear Information System (INIS)
Nakasone, Yutaka; Ikeda, Osamu; Yamashita, Yasuyuki; Kudoh, Kouichi; Shigematsu, Yoshinori; Harada, Kazunori
2007-01-01
We applied multivariate analysis to the clinical findings in patients with acute gastrointestinal (GI) hemorrhage and compared the relationship between these findings and angiographic evidence of extravasation. Our study population consisted of 46 patients with acute GI bleeding. They were divided into two groups. In group 1 we retrospectively analyzed 41 angiograms obtained in 29 patients (age range, 25-91 years; average, 71 years). Their clinical findings, including the shock index (SI), diastolic blood pressure, hemoglobin, platelet counts, and age, were quantitatively analyzed. In group 2, consisting of 17 patients (age range, 21-78 years; average, 60 years), we prospectively applied statistical analysis by a logistic regression model to their clinical findings and then assessed 21 angiograms obtained in these patients to determine whether our model was useful for predicting the presence of angiographic evidence of extravasation. On 18 of 41 (43.9%) angiograms in group 1 there was evidence of extravasation; in 3 patients it was demonstrated only by selective angiography. Factors significantly associated with angiographic visualization of extravasation were the SI and patient age. For differentiation between cases with and cases without angiographic evidence of extravasation, the maximum cutoff point was between 0.51 and 0.53. Of the 21 angiograms obtained in group 2, 13 (61.9%) showed evidence of extravasation; in 1 patient it was demonstrated only on selective angiograms. We found that in 90% of the cases, the prospective application of our model correctly predicted the angiographically confirmed presence or absence of extravasation. We conclude that in patients with GI hemorrhage, angiographic visualization of extravasation is associated with the pre-embolization SI. Patients with a high SI value should undergo study to facilitate optimal treatment planning.
Predictors of work injury in underground mines - an application of a logistic regression model
Energy Technology Data Exchange (ETDEWEB)
P.S. Paul [Indian School of Mines University, Dhanbad (India). Department of Mining Engineering
2009-05-15
Mine accidents and injuries are complex and generally characterized by several factors starting from personal to technical, and technical to social characteristics. In this study, an attempt has been made to identify the various factors responsible for work related injuries in mines and to estimate the risk of work injury to mine workers. The prediction of work injury in mines was done by a step-by-step multivariate logistic regression modeling with an application to case study mines in India. In total, 18 variables were considered in this study. Most of the variables are not directly quantifiable. Instruments were developed to quantify them through a questionnaire type survey. Underground mine workers were randomly selected for the survey. Responses from 300 participants were used for the analysis. Four variables, age, negative affectivity, job dissatisfaction, and physical hazards bear significant discriminating power for risk of injury to the workers, comparing between cases and controls in a multivariate situation while controlling all the personal and socio-technical variables. The analysis reveals that negatively affected workers are 2.54 times more prone to injuries than the less negatively affected workers and this factor is a more important risk factor for the case-study mines. Long term planning through identification of the negative individuals, proper counseling regarding the adverse effects of negative behaviors and special training is urgently required. Care should be taken for the aged and experienced workers in terms of their job responsibility and training requirements. Management should provide a friendly atmosphere during work to increase the confidence of the injury prone miners. 44 refs., 4 tabs.
Energy Technology Data Exchange (ETDEWEB)
Dikaios, Nikolaos; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit [University College London, Centre for Medical Imaging, London (United Kingdom); University College London Hospital, Departments of Radiology, London (United Kingdom); Alkalbani, Jokha; Sidhu, Harbir Singh [University College London, Centre for Medical Imaging, London (United Kingdom); Abd-Alazeez, Mohamed; Ahmed, Hashim U.; Emberton, Mark [University College London, Research Department of Urology, Division of Surgery and Interventional Science, London (United Kingdom); Kirkham, Alex [University College London Hospital, Departments of Radiology, London (United Kingdom); Freeman, Alex [University College London Hospital, Department of Histopathology, London (United Kingdom)
2015-09-15
To assess the interchangeability of zone-specific (peripheral-zone (PZ) and transition-zone (TZ)) multiparametric-MRI (mp-MRI) logistic-regression (LR) models for classification of prostate cancer. Two hundred and thirty-one patients (70 TZ training-cohort; 76 PZ training-cohort; 85 TZ temporal validation-cohort) underwent mp-MRI and transperineal-template-prostate-mapping biopsy. PZ and TZ uni/multi-variate mp-MRI LR-models for classification of significant cancer (any cancer-core-length (CCL) with Gleason > 3 + 3 or any grade with CCL ≥ 4 mm) were derived from the respective cohorts and validated within the same zone by leave-one-out analysis. Inter-zonal performance was tested by applying TZ models to the PZ training-cohort and vice-versa. Classification performance of TZ models for TZ cancer was further assessed in the TZ validation-cohort. ROC area-under-curve (ROC-AUC) analysis was used to compare models. The univariate parameters with the best classification performance were the normalised T2 signal (T2nSI) within the TZ (ROC-AUC = 0.77) and normalized early contrast-enhanced T1 signal (DCE-nSI) within the PZ (ROC-AUC = 0.79). Performance was not significantly improved by bi-variate/tri-variate modelling. PZ models that contained DCE-nSI performed poorly in classification of TZ cancer. The TZ model based solely on maximum-enhancement poorly classified PZ cancer. LR-models dependent on DCE-MRI parameters alone are not interchangeable between prostatic zones; however, models based exclusively on T2 and/or ADC are more robust for inter-zonal application. (orig.)
LOGISTIC NETWORK REGRESSION FOR SCALABLE ANALYSIS OF NETWORKS WITH JOINT EDGE/VERTEX DYNAMICS.
Almquist, Zack W; Butts, Carter T
2014-08-01
Change in group size and composition has long been an important area of research in the social sciences. Similarly, interest in interaction dynamics has a long history in sociology and social psychology. However, the effects of endogenous group change on interaction dynamics are a surprisingly understudied area. One way to explore these relationships is through social network models. Network dynamics may be viewed as a process of change in the edge structure of a network, in the vertex set on which edges are defined, or in both simultaneously. Although early studies of such processes were primarily descriptive, recent work on this topic has increasingly turned to formal statistical models. Although showing great promise, many of these modern dynamic models are computationally intensive and scale very poorly in the size of the network under study and/or the number of time points considered. Likewise, currently used models focus on edge dynamics, with little support for endogenously changing vertex sets. Here, the authors show how an existing approach based on logistic network regression can be extended to serve as a highly scalable framework for modeling large networks with dynamic vertex sets. The authors place this approach within a general dynamic exponential family (exponential-family random graph modeling) context, clarifying the assumptions underlying the framework (and providing a clear path for extensions), and they show how model assessment methods for cross-sectional networks can be extended to the dynamic case. Finally, the authors illustrate this approach on a classic data set involving interactions among windsurfers on a California beach.
THE ROLE AND PLACE OF LOGISTIC REGRESSION AND ROC ANALYSIS IN SOLVING MEDICAL DIAGNOSTIC TASK
Directory of Open Access Journals (Sweden)
S. G. Grigoryev
2016-01-01
Diagnostics, equally with prevention and treatment, is a basis of medical science and practice. Over its history, medicine has accumulated a great variety of diagnostic methods for different diseases and pathologic conditions. Nevertheless, new tests, methods and tools are still being developed and recommended for application. Indicators such as sensitivity and specificity, which are defined on the basis of fourfold contingency tables or by ROC analysis with ROC curve modelling (receiver operating characteristic), are used to estimate diagnostic capability. A fourfold table is used to evaluate a method that confirms or denies the diagnosis, i.e. a quality indicator. The ROC curve, being a graph, allows model quality to be estimated for the separation of two classes by identifying the cut-off point of a continuous or discrete quantitative attribute. Logistic regression is introduced as a tool for developing a mathematical-statistical model that forecasts the probability of an event of interest to the researcher when there are two possible outcomes. ROC analysis is chosen and described in detail as a tool for estimating model quality. The capabilities of these methods are demonstrated with a real example: the creation and efficiency estimation (sensitivity and specificity) of a model forecasting the probability of pyodermatitis developing as a complication in children with atopic dermatitis.
Directory of Open Access Journals (Sweden)
Bjørn P Pedersen
BACKGROUND: Structured Logistic Regression (SLR) is a newly developed machine learning tool first proposed in the context of text categorization. Current availability of extensive protein sequence databases calls for an automated method to reliably classify sequences and SLR seems well-suited for this task. The classification of P-type ATPases, a large family of ATP-driven membrane pumps transporting essential cations, was selected as a test-case that would generate important biological information as well as provide a proof-of-concept for the application of SLR to a large scale bioinformatics problem. RESULTS: Using SLR, we have built classifiers to identify and automatically categorize P-type ATPases into one of 11 pre-defined classes. The SLR-classifiers are compared to a Hidden Markov Model approach and shown to be highly accurate and scalable. Representing the bulk of currently known sequences, we analysed 9.3 million sequences in the UniProtKB and attempted to classify a large number of P-type ATPases. To examine the distribution of pumps on organisms, we also applied SLR to 1,123 complete genomes from the Entrez genome database. Finally, we analysed the predicted membrane topology of the identified P-type ATPases. CONCLUSIONS: Using the SLR-based classification tool we are able to run a large scale study of P-type ATPases. This study provides proof-of-concept for the application of SLR to a bioinformatics problem and the analysis of P-type ATPases pinpoints new and interesting targets for further biochemical characterization and structural analysis.
Almquist, Zack W.; Butts, Carter T.
2013-01-01
Methods for analysis of network dynamics have seen great progress in the past decade. This article shows how Dynamic Network Logistic Regression techniques (a special case of the Temporal Exponential Random Graph Models) can be used to implement decision theoretic models for network dynamics in a panel data context. We also provide practical heuristics for model building and assessment. We illustrate the power of these techniques by applying them to a dynamic blog network sampled during the 2...
Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo
2015-05-12
To explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and establish a Logistic regression model of noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG while the independent variables were screened by binary Logistic analysis. Multivariate Logistic regression was used for further analysis of significant noninvasive independent variables. Logistic regression model was established and odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of model were evaluated by the curve of receiver operating characteristic (ROC). According to univariate Logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major occurring factors for PHG. The established regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4. The accuracy of model for PHG was 79.1% with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG and the noninvasive predicted Logistic regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4.
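The regression equation reported in this abstract gives a log-odds value; a predicted probability follows from the inverse logit, P = 1 / (1 + exp(-Logit P)). The following is a minimal sketch of that conversion using the published coefficients. The input values and the function name `phg_probability` are hypothetical: the abstract does not state how X1 to X4 are coded or scaled, so the numbers below only illustrate the arithmetic and are not a clinical calculation.

```python
import math

# Coefficients as printed in the abstract:
# Logit P = -2.667 + 2.186*X1 - 2.167*X2 + 0.725*X3 + 0.976*X4
INTERCEPT = -2.667
COEFS = (2.186, -2.167, 0.725, 0.976)


def phg_probability(x1, x2, x3, x4):
    """Map the linear predictor (log-odds) to a probability via the inverse logit."""
    logit = INTERCEPT + sum(b * x for b, x in zip(COEFS, (x1, x2, x3, x4)))
    return 1.0 / (1.0 + math.exp(-logit))


# Odds ratio per unit increase of each predictor is exp(coefficient).
odds_ratios = [math.exp(b) for b in COEFS]

# Hypothetical coded inputs (assumption: the abstract gives no coding or units).
p = phg_probability(1, 0, 1, 1)
print(f"predicted probability: {p:.3f}")
```

With all predictors at zero the probability reduces to the inverse logit of the intercept, which is a quick sanity check on any implementation of such an equation.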
International Nuclear Information System (INIS)
Dang Yaping; Hu Guoying; Meng Xianwen
1994-01-01
There are many opinions on the cause of hypothyroidism after 131 I treatment of hyperthyroidism, but few scientific analyses and reports on the subject. Unconditional logistic regression addressed this problem successfully. It has higher scientific value and confidence in risk factor analysis. Data from 748 follow-up patients were analysed by unconditional logistic regression. The results showed that the half-life and the 131 I dose were the main causes of the incidence of hypothyroidism. The degree of confidence is 92.4%.
Al-Mudhafar, W. J.
2013-12-01
Precise prediction of rock facies leads to adequate reservoir characterization by improving the porosity-permeability relationships used to estimate properties in non-cored intervals. It also helps to identify the spatial facies distribution accurately, so that an accurate reservoir model can be built for optimal future reservoir performance. In this paper, facies estimation was done through multinomial logistic regression (MLR) using the well logs and core data from a well in the upper sandstone formation of the South Rumaila oil field. The independent variables are gamma ray, formation density, water saturation, shale volume, log porosity, core porosity, and core permeability. First, the Robust Sequential Imputation Algorithm was used to impute the missing data. This algorithm starts from a complete subset of the dataset and estimates sequentially the missing values in an incomplete observation by minimizing the determinant of the covariance of the augmented data matrix. Then, the observation is added to the complete data matrix and the algorithm continues with the next observation with missing values. The MLR was chosen to obtain maximum likelihood estimates and minimize the standard error for the nonlinear relationships between facies and the core and log data. The MLR is used to predict the probabilities of the different possible facies given each independent variable by constructing a linear predictor function having a set of weights that are linearly combined with the independent variables by using a dot product. A beta distribution of facies was taken as prior knowledge, and the predicted (posterior) probability was estimated from MLR based on Bayes' theorem, which relates the posterior probability to the conditional probability and the prior knowledge. To assess the statistical accuracy of the model, the bootstrap should be carried out to estimate extra-sample prediction error by randomly
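The linear predictor described in this abstract (one weight vector per facies, combined with the inputs by a dot product) can be sketched as a softmax over class scores. This is an illustrative sketch only: the weights, biases, the two-predictor input and the function name `softmax_probs` are hypothetical and are not taken from the paper, and the prior/posterior step mentioned in the abstract is not modelled here.

```python
import math


def softmax_probs(weights, biases, x):
    """Return class probabilities from one linear predictor (dot product) per class."""
    scores = [b + sum(w_i * x_i for w_i, x_i in zip(w, x))
              for w, b in zip(weights, biases)]
    top = max(scores)  # subtract the largest score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights for 3 facies classes and 2 predictors (assumption).
weights = [[0.0, 0.0], [0.8, -0.5], [-0.3, 1.2]]
biases = [0.0, 0.1, -0.2]

probs = softmax_probs(weights, biases, [1.0, 2.0])
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

Fixing the first class's weights at zero makes it the reference category, which is one common way to keep the model identifiable.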
Education-Based Gaps in eHealth: A Weighted Logistic Regression Approach.
Amo, Laura
2016-10-12
Persons with a college degree are more likely to engage in eHealth behaviors than persons without a college degree, compounding the health disadvantages of undereducated groups in the United States. However, the extent to which quality of recent eHealth experience reduces the education-based eHealth gap is unexplored. The goal of this study was to examine how eHealth information search experience moderates the relationship between college education and eHealth behaviors. Based on a nationally representative sample of adults who reported using the Internet to conduct the most recent health information search (n=1458), I evaluated eHealth search experience in relation to the likelihood of engaging in different eHealth behaviors. I examined whether Internet health information search experience reduces the eHealth behavior gaps among college-educated and noncollege-educated adults. Weighted logistic regression models were used to estimate the probability of different eHealth behaviors. College education was significantly positively related to the likelihood of 4 eHealth behaviors. In general, eHealth search experience was negatively associated with health care behaviors, health information-seeking behaviors, and user-generated or content sharing behaviors after accounting for other covariates. Whereas Internet health information search experience has narrowed the education gap in terms of likelihood of using email or Internet to communicate with a doctor or health care provider and likelihood of using a website to manage diet, weight, or health, it has widened the education gap in the instances of searching for health information for oneself, searching for health information for someone else, and downloading health information on a mobile device. The relationship between college education and eHealth behaviors is moderated by Internet health information search experience in different ways depending on the type of eHealth behavior. After controlling for college
Observed to expected or logistic regression to identify hospitals with high or low 30-day mortality?
Helgeland, Jon; Clench-Aas, Jocelyne; Laake, Petter; Veierød, Marit B.
2018-01-01
Introduction A common quality indicator for monitoring and comparing hospitals is based on death within 30 days of admission. An important use is to determine whether a hospital has higher or lower mortality than other hospitals. Thus, the ability to identify such outliers correctly is essential. Two approaches for detection are: 1) calculating the ratio of observed to expected number of deaths (OE) per hospital and 2) including all hospitals in a logistic regression (LR) comparing each hospital to a form of average over all hospitals. The aim of this study was to compare OE and LR with respect to correctly identifying 30-day mortality outliers. Modifications of the methods, i.e., variance corrected approach of OE (OE-Faris), bias corrected LR (LR-Firth), and trimmed mean variants of LR and LR-Firth were also studied. Materials and methods To study the properties of OE and LR and their variants, we performed a simulation study by generating patient data from hospitals with known outlier status (low mortality, high mortality, non-outlier). Data from simulated scenarios with varying number of hospitals, hospital volume, and mortality outlier status, were analysed by the different methods and compared by level of significance (ability to falsely claim an outlier) and power (ability to reveal an outlier). Moreover, administrative data for patients with acute myocardial infarction (AMI), stroke, and hip fracture from Norwegian hospitals for 2012–2014 were analysed. Results None of the methods achieved the nominal (test) level of significance for both low and high mortality outliers. For low mortality outliers, the levels of significance were increased four- to fivefold for OE and OE-Faris. For high mortality outliers, OE and OE-Faris, LR 25% trimmed and LR-Firth 10% and 25% trimmed maintained approximately the nominal level. The methods agreed with respect to outlier status for 94.1% of the AMI hospitals, 98.0% of the stroke, and 97.8% of the hip fracture hospitals
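As a rough illustration of the first approach named in this abstract, the observed-to-expected (OE) ratio for one hospital divides the number of observed deaths by the number expected. The sketch below assumes, as is common but not stated in the abstract, that the expected count is the sum of patient-level predicted death probabilities from some case-mix model. The data and the function name `observed_expected_ratio` are hypothetical, and the variance correction (OE-Faris) and significance testing studied in the paper are not shown.

```python
def observed_expected_ratio(deaths, predicted_risks):
    """OE ratio: observed deaths divided by the sum of predicted death probabilities."""
    if len(deaths) != len(predicted_risks):
        raise ValueError("one predicted risk is needed per patient")
    observed = sum(deaths)
    expected = sum(predicted_risks)
    return observed / expected


# Hypothetical hospital with four patients (1 = died within 30 days).
deaths = [1, 0, 0, 1]
predicted_risks = [0.5, 0.25, 0.25, 0.5]  # assumed output of a case-mix model

ratio = observed_expected_ratio(deaths, predicted_risks)
print(f"OE ratio: {ratio:.2f}")
```

A ratio above 1 indicates more deaths than the case-mix model predicts; whether that difference marks the hospital as an outlier depends on the statistical test applied, which is the question the study addresses.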
Smith, Paul F; Ganesh, Siva; Liu, Ping
2013-10-30
Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR), in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R² values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R² values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.
Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz
2012-01-01
From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was used, for the first time, to select significant sequence parameters for the identification of β-turns by means of a re-substitution test procedure. The sequence parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in the sequence. Among these parameters, the most significant ones selected by the binary logistic regression model were the percentages of Gly and Ser and the occurrence of Asn in position i+2, in that order. These significant parameters have the highest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed with the parameters selected by binary logistic regression to build a hybrid predictor. The networks were trained and tested on a non-homologous dataset of 565 protein chains. With a nine-fold cross-validation test on the dataset, the network reached an overall accuracy (Qtotal) of 74, which is comparable with the results of other β-turn prediction methods. In conclusion, this study proves that the parameter selection ability of binary logistic regression, together with the prediction capability of neural networks, leads to the development of more precise models for identifying β-turns in proteins.
Energy Technology Data Exchange (ETDEWEB)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam [Pusat Pengajian Sains Matematik, Universiti Sains Malaysia, 11800 USM, Pulau Pinang, Malaysia amirul@unisel.edu.my, zalila@cs.usm.my, norlida@usm.my, adam@usm.my (Malaysia)
2015-10-22
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significance test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratios. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles, and diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also the interaction between breakfast intake in a week and sleep duration, and the interaction between gender and protein intake.
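As a minimal sketch of the "q-1 logit equations" described above, the following assumes a baseline-category parameterization in which the last category is the reference. The function name, the coefficient values, and the three-category example are hypothetical and are not taken from the study.

```python
import math

def multinomial_probs(x, coefs):
    """Category probabilities from q-1 baseline-category logit equations.

    x     : list of predictor values (without the intercept)
    coefs : list of q-1 coefficient lists, each [intercept, b1, b2, ...];
            the omitted q-th category is the reference (an assumption here).
    """
    # Log-odds of each non-reference category versus the reference category
    etas = [c[0] + sum(b * xi for b, xi in zip(c[1:], x)) for c in coefs]
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)  # probability of the reference category
    return probs

# Hypothetical 3-category outcome, so q - 1 = 2 logit equations
example = multinomial_probs([1.0, 0.5], [[-1.0, 0.8, 0.2], [-2.0, 1.1, -0.4]])
```

The q probabilities always sum to one, which is why only q-1 equations need to be estimated.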
Carolyn B. Meyer; Sherri L. Miller; C. John Ralph
2004-01-01
The scale at which habitat variables are measured affects the accuracy of resource selection functions in predicting animal use of sites. We used logistic regression models for a wide-ranging species, the marbled murrelet (Brachyramphus marmoratus), in a large region in California to address how much changing the spatial or temporal scale of...
Mielniczuk, Jan; Teisseyre, Paweł
2018-03-01
Detection of gene-gene interactions is one of the most important challenges in genome-wide case-control studies. Besides traditional logistic regression analysis, entropy-based methods have recently attracted significant attention. Among entropy-based methods, interaction information is one of the most promising measures, having many desirable properties. Although both logistic regression and interaction information have been used in several genome-wide association studies, the relationship between them has not been thoroughly investigated theoretically. The present paper attempts to fill this gap. We show that although certain connections between the two methods exist, in general they refer to two different concepts of dependence, and looking for interactions in those two senses leads to different approaches to interaction detection. We introduce an ordering between interaction measures and specify conditions for independent and dependent genes under which interaction information is a more discriminative measure than logistic regression. Moreover, we show that for so-called perfect distributions those measures are equivalent. The numerical experiments illustrate the theoretical findings, indicating that interaction information and its modified version are more universal tools for detecting various types of interaction than logistic regression and linkage disequilibrium measures. © 2017 WILEY PERIODICALS, INC.
International Nuclear Information System (INIS)
Staskiewicz, Grzegorz; Czekajska-Chehab, Elżbieta; Uhlig, Sebastian; Przegalinski, Jerzy; Maciejewski, Ryszard; Drop, Andrzej
2013-01-01
Purpose: Diagnosis of right ventricular dysfunction in patients with acute pulmonary embolism (PE) is known to be associated with increased risk of mortality. The aim of the study was to calculate a logistic regression model for reliable identification of right ventricular dysfunction (RVD) in patients diagnosed with computed tomography pulmonary angiography. Material and methods: Ninety-seven consecutive patients with acute pulmonary embolism were divided into groups with and without RVD based on echocardiographic measurement of pulmonary artery systolic pressure (PASP). PE severity was graded with the pulmonary obstruction score. CT measurements of heart chambers and mediastinal vessels were performed; the position of the interventricular septum and the presence of contrast reflux into the inferior vena cava were also recorded. The logistic regression model was prepared by means of stepwise logistic regression. Results: Of the parameters considered, the final model included the pulmonary obstruction score, the short axis diameter of the right ventricle and the diameter of the inferior vena cava. The calculated model is characterized by 79% sensitivity and 81% specificity, and its performance was significantly better than that of single CT-based measurements. Conclusion: The logistic regression model identifies RVD significantly better than single CT-based measurements.
Kamphuis, C.; Frank, E.; Burke, J.; Verkerk, G.A.; Jago, J.
2013-01-01
The hypothesis was that sensors currently available on farm that monitor behavioral and physiological characteristics have potential for the detection of lameness in dairy cows. This was tested by applying additive logistic regression to variables derived from sensor data. Data were collected
Directory of Open Access Journals (Sweden)
Qiutong Jin
2016-06-01
Estimating the spatial distribution of precipitation is an important and challenging task in hydrology, climatology, ecology, and environmental science. In order to generate a highly accurate distribution map of average annual precipitation for the Loess Plateau in China, multiple linear regression Kriging (MLRK) and geographically weighted regression Kriging (GWRK) methods were employed using precipitation data from the period 1980–2010 from 435 meteorological stations. The predictors in regression Kriging were selected by stepwise regression analysis from many auxiliary environmental factors, such as elevation (DEM), normalized difference vegetation index (NDVI), solar radiation, slope, and aspect. All predictor distribution maps had a 500 m spatial resolution. Validation precipitation data from 130 hydrometeorological stations were used to assess the prediction accuracies of the MLRK and GWRK approaches. Results showed that both prediction maps with a 500 m spatial resolution interpolated by MLRK and GWRK had a high accuracy and captured detailed spatial distribution data; however, MLRK produced a lower prediction error and a higher variance explanation than GWRK, although the differences were small, in contrast to conclusions from similar studies.
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.; Michael, John A.; Helsel, Dennis R.
2008-01-01
Logistic regression was used to develop statistical models that can be used to predict the probability of debris flows in areas recently burned by wildfires by using data from 14 wildfires that burned in southern California during 2003-2006. Twenty-eight independent variables describing the basin morphology, burn severity, rainfall, and soil properties of 306 drainage basins located within those burned areas were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows soon after the 2003 to 2006 fires were delineated from data in the National Elevation Dataset using a geographic information system; (2) Data describing the basin morphology, burn severity, rainfall, and soil properties were compiled for each basin. These data were then input to a statistics software package for analysis using logistic regression; and (3) Relations between the occurrence or absence of debris flows and the basin morphology, burn severity, rainfall, and soil properties were evaluated, and five multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combinations produced the most effective models, and the multivariate models that best predicted the occurrence of debris flows were identified. Percentage of high burn severity and 3-hour peak rainfall intensity were significant variables in all models. Soil organic matter content and soil clay content were significant variables in all models except Model 5. Soil slope was a significant variable in all models except Model 4. The most suitable model can be selected from these five models on the basis of the availability of independent variables in the particular area of interest and field checking of probability maps. The multivariate logistic regression models can be entered into a geographic information system, and maps showing the probability of debris flows can be constructed in recently burned areas of
Alavi, Seyyed Salman; Mohammadi, Mohammad Reza; Souri, Hamid; Mohammadi Kalhori, Soroush; Jannatifard, Fereshteh; Sepahbodi, Ghazal
2017-01-01
Background: The aim of this study was to evaluate the effect of variables such as personality traits, driving behavior and mental illness on road traffic accidents among drivers with accidents and those without a road crash. Methods: In this cohort study, 800 bus and truck drivers were recruited. Participants were selected among drivers who were referred to Imam Sajjad Hospital (Tehran, Iran) during 2013-2015. The Manchester driving behavior questionnaire (MDBQ), big five personality test (NEO personality inventory) and semi-structured interview (schizophrenia and affective disorders scale) were used. After two years, we surveyed all accidents due to human factors that involved the recruited drivers. The data were analyzed using the SPSS software by performing descriptive statistics, t-tests, and multiple logistic regression analysis. P values less than 0.05 were considered statistically significant. Results: After controlling for the effective and demographic variables, the findings revealed significant differences between the two groups of drivers that were and were not involved in road accidents. In addition, it was found that depression and anxiety could increase the odds of road accidents (odds ratio, OR) 2.4- and 2.7-fold, respectively (P=0.04, P=0.004). It is noteworthy that neuroticism alone can increase the odds of road accidents 1.1-fold (P=0.009), but other personality factors did not have a significant effect on the equation. Conclusion: The results revealed that some mental disorders affect the incidence of road collisions. Considering the importance and sensitivity of driving behavior, it is necessary to evaluate multiple psychological factors influencing drivers before and after receiving or renewing their driver’s license. PMID:28293047
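To illustrate how an odds ratio such as the 2.4 reported above relates to a logistic regression coefficient and to a change in probability, here is a small sketch. The 10% baseline accident probability is a hypothetical value chosen only for the arithmetic; the abstract does not report one.

```python
import math

def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)

def apply_odds_ratio(baseline_prob, or_value):
    """Probability after multiplying the baseline odds by an odds ratio."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * or_value
    return new_odds / (1.0 + new_odds)

# Coefficient that corresponds to the reported OR of 2.4 (ln 2.4)
beta_depression = math.log(2.4)

# Hypothetical baseline probability of 10%; not a figure from the study
p_new = apply_odds_ratio(0.10, odds_ratio(beta_depression))
```

Under that assumed baseline, the probability rises from 0.10 to about 0.21, which shows that a 2.4-fold change in odds is not a 2.4-fold change in probability.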
Two SPSS programs for interpreting multiple regression results.
Lorenzo-Seva, Urbano; Ferrando, Pere J; Chico, Eliseo
2010-02-01
When multiple regression is used in explanation-oriented designs, it is very important to determine both the usefulness of the predictor variables and their relative importance. Standardized regression coefficients are routinely provided by commercial programs. However, they generally function rather poorly as indicators of relative importance, especially in the presence of substantially correlated predictors. We provide two user-friendly SPSS programs that implement currently recommended techniques and recent developments for assessing the relevance of the predictors. The programs also allow the user to take into account the effects of measurement error. The first program, MIMR-Corr.sps, uses a correlation matrix as input, whereas the second program, MIMR-Raw.sps, uses the raw data and computes bootstrap confidence intervals of different statistics. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from http://brm.psychonomic-journals.org/content/supplemental.
Interpret with caution: multicollinearity in multiple regression of cognitive data.
Morrison, Catriona M
2003-08-01
Shibihara and Kondo in 2002 reported a reanalysis of the 1997 Kanji picture-naming data of Yamazaki, Ellis, Morrison, and Lambon-Ralph in which independent variables were highly correlated. Their addition of the variable visual familiarity altered the previously reported pattern of results, indicating that visual familiarity, but not age of acquisition, was important in predicting Kanji naming speed. The present paper argues that caution should be taken when drawing conclusions from multiple regression analyses in which the independent variables are so highly correlated, as such multicollinearity can lead to unreliable output.
Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M
2017-06-01
Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies. First to analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold, and second to fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.
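The two-step approach proposed above can be sketched as a generic screen-then-refit loop. This is only an outline under stated assumptions: `logistic_pvalue` and `cox_fit` are hypothetical callables supplied by the caller (for example, wrappers around a statistics package), and the default threshold of 5e-8 is a conventional genome-wide value, not one stated in the abstract.

```python
def two_step_gwas(snp_ids, logistic_pvalue, cox_fit, threshold=5e-8):
    """Screen every SNP with a cheap test, then refit only the hits.

    snp_ids         : iterable of SNP identifiers
    logistic_pvalue : hypothetical callable, SNP id -> P-value from logistic regression
    cox_fit         : hypothetical callable, SNP id -> Cox regression result
                      (weighted appropriately in case-cohort designs)
    threshold       : pre-defined P-value cut-off (assumed default)
    """
    # Step 1: fast logistic regression screen across all SNPs
    hits = [s for s in snp_ids if logistic_pvalue(s) < threshold]
    # Step 2: slower Cox regression only for SNPs that passed the screen
    return {s: cox_fit(s) for s in hits}
```

The design keeps the expensive model out of the inner loop, so its cost scales with the number of screened hits rather than the number of SNPs.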
Lin, Chao-Cheng; Bai, Ya-Mei; Chen, Jen-Yeu; Hwang, Tzung-Jeng; Chen, Tzu-Ting; Chiu, Hung-Wen; Li, Yu-Chuan
2010-03-01
Metabolic syndrome (MetS) is an important side effect of second-generation antipsychotics (SGAs). However, many SGA-treated patients with MetS remain undetected. In this study, we trained and validated artificial neural network (ANN) and multiple logistic regression models without biochemical parameters to rapidly identify MetS in patients with SGA treatment. A total of 383 patients with a diagnosis of schizophrenia or schizoaffective disorder (DSM-IV criteria) with SGA treatment for more than 6 months were investigated to determine whether they met the MetS criteria according to the International Diabetes Federation. The data for these patients were collected between March 2005 and September 2005. The input variables of ANN and logistic regression were limited to demographic and anthropometric data only. All models were trained by randomly selecting two-thirds of the patient data and were internally validated with the remaining one-third of the data. The models were then externally validated with data from 69 patients from another hospital, collected between March 2008 and June 2008. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of all models. Both the final ANN and logistic regression models had high accuracy (88.3% vs 83.6%), sensitivity (93.1% vs 86.2%), and specificity (86.9% vs 83.8%) to identify MetS in the internal validation set. The mean ± SD AUC was high for both the ANN and logistic regression models (0.934 ± 0.033 vs 0.922 ± 0.035, P = .63). During external validation, high AUC was still obtained for both models. Waist circumference and diastolic blood pressure were the common variables that were left in the final ANN and logistic regression models. Our study developed accurate ANN and logistic regression models to detect MetS in patients with SGA treatment. The models are likely to provide a noninvasive tool for large-scale screening of MetS in this group of patients. (c) 2010 Physicians
General Dimensional Multiple-Output Support Vector Regressions and Their Multiple Kernel Learning.
Chung, Wooyong; Kim, Jisu; Lee, Heejin; Kim, Euntai
2015-11-01
Support vector regression has been considered as one of the most important regression or function approximation methodologies in a variety of fields. In this paper, two new general dimensional multiple output support vector regressions (MSVRs) named SOCPL1 and SOCPL2 are proposed. The proposed methods are formulated in the dual space and their relationship with the previous works is clearly investigated. Further, the proposed MSVRs are extended into the multiple kernel learning and their training is implemented by the off-the-shelf convex optimization tools. The proposed MSVRs are applied to benchmark problems and their performances are compared with those of the previous methods in the experimental section.
Cakir, Ebru; Kucuk, Ulku; Pala, Emel Ebru; Sezer, Ozlem; Ekin, Rahmi Gokhan; Cakmak, Ozgur
2017-05-01
Conventional cytomorphologic assessment is the first step to establish an accurate diagnosis in urinary cytology. In cytologic preparations, the separation of low-grade urothelial carcinoma (LGUC) from reactive urothelial proliferation (RUP) can be exceedingly difficult. The bladder washing cytologies of 32 LGUC and 29 RUP were reviewed. The cytologic slides were examined for the presence or absence of the 28 cytologic features. The cytologic criteria showing statistical significance in LGUC were increased numbers of monotonous single (non-umbrella) cells, three-dimensional cellular papillary clusters without fibrovascular cores, irregular bordered clusters, atypical single cells, irregular nuclear overlap, cytoplasmic homogeneity, increased N/C ratio, pleomorphism, nuclear border irregularity, nuclear eccentricity, elongated nuclei, and hyperchromasia (p < 0.05), and the cytologic criteria showing statistical significance in RUP were inflammatory background, mixture of small and large urothelial cells, loose monolayer aggregates, and vacuolated cytoplasm (p < 0.05). When these variables were subjected to a stepwise logistic regression analysis, four features were selected to distinguish LGUC from RUP: increased numbers of monotonous single (non-umbrella) cells, increased nuclear cytoplasmic ratio, hyperchromasia, and presence of small and large urothelial cells (p = 0.0001). By this logistic model of the 32 cases with proven LGUC, the stepwise logistic regression analysis correctly predicted 31 (96.9%) patients with this diagnosis, and of the 29 patients with RUP, the logistic model correctly predicted 26 (89.7%) patients as having this disease. There are several cytologic features to separate LGUC from RUP. Stepwise logistic regression analysis is a valuable tool for determining the most useful cytologic criteria to distinguish these entities. © 2017 APMIS. Published by John Wiley & Sons Ltd.
McLaren, Christine E.; Chen, Wen-Pin; Nie, Ke; Su, Min-Ying
2009-01-01
Rationale and Objectives: Dynamic contrast enhanced MRI (DCE-MRI) is a clinical imaging modality for detection and diagnosis of breast lesions. Analytical methods were compared for diagnostic feature selection and performance of lesion classification to differentiate between malignant and benign lesions in patients. Materials and Methods: The study included 43 malignant and 28 benign histologically-proven lesions. Eight morphological parameters, ten gray level co-occurrence matrices (GLCM) texture features, and fourteen Laws’ texture features were obtained using automated lesion segmentation and quantitative feature extraction. Artificial neural network (ANN) and logistic regression analysis were compared for selection of the best predictors of malignant lesions among the normalized features. Results: Using ANN, the final four selected features were compactness, energy, homogeneity, and Law_LS, with area under the receiver operating characteristic curve (AUC) = 0.82, and accuracy = 0.76. The diagnostic performance of these four features computed on the basis of logistic regression yielded AUC = 0.80 (95% CI, 0.688 to 0.905), similar to that of ANN. The analysis also shows that the odds of a malignant lesion decreased by 48% (95% CI, 25% to 92%) for every increase of 1 SD in the Law_LS feature, adjusted for differences in compactness, energy, and homogeneity. Using logistic regression with z-score transformation, a model comprised of compactness, NRL entropy, and gray level sum average was selected, and it had the highest overall accuracy of 0.75 among all models, with AUC = 0.77 (95% CI, 0.660 to 0.880). When logistic modeling of transformations using the Box-Cox method was performed, the most parsimonious model with predictors, compactness and Law_LS, had an AUC of 0.79 (95% CI, 0.672 to 0.898). Conclusion: The diagnostic performance of models selected by ANN and logistic regression was similar. The analytic methods were found to be roughly equivalent in terms of
Chen, Chau-Kuang; Bruce, Michelle; Tyler, Lauren; Brown, Claudine; Garrett, Angelica; Goggins, Susan; Lewis-Polite, Brandy; Weriwoh, Mirabel L; Juarez, Paul D; Hood, Darryl B; Skelton, Tyler
2013-02-01
The goal of this study was to analyze a 54-item instrument for assessment of perception of exposure to environmental contaminants within the context of the built environment, or exposome. This exposome was defined in five domains to include 1) home and hobby, 2) school, 3) community, 4) occupation, and 5) exposure history. Interviews were conducted with child-bearing-age minority women at Metro Nashville General Hospital at Meharry Medical College. Data were analyzed utilizing DTReg software for Support Vector Machine (SVM) modeling followed by an SPSS package for a logistic regression model. The target (outcome) variable of interest was respondent's residence by ZIP code. The results demonstrate that the rank order of important variables with respect to SVM modeling versus traditional logistic regression models is almost identical. This is the first study documenting that SVM analysis has discriminative power for determination of higher-ordered spatial relationships on an environmental exposure history questionnaire.
Staley, Dennis M.; Negri, Jacquelyn A.; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.
2016-06-30
Wildfire can significantly alter the hydrologic response of a watershed to the extent that even modest rainstorms can generate dangerous flash floods and debris flows. To reduce public exposure to hazard, the U.S. Geological Survey produces post-fire debris-flow hazard assessments for select fires in the western United States. We use publicly available geospatial data describing basin morphology, burn severity, soil properties, and rainfall characteristics to estimate the statistical likelihood that debris flows will occur in response to a storm of a given rainfall intensity. Using an empirical database and refined geospatial analysis methods, we defined new equations for the prediction of debris-flow likelihood using logistic regression methods. We showed that the new logistic regression model outperformed previous models used to predict debris-flow likelihood.
Saro, Lee; Woo, Jeon Seong; Kwan-Young, Oh; Moung-Jin, Lee
2016-02-01
The aim of this study is to predict landslide susceptibility through spatial analysis, applying a statistical methodology based on GIS. Logistic regression models along with an artificial neural network were applied and validated to analyze landslide susceptibility in Inje, Korea. Landslide occurrence areas in the study were identified based on interpretations of optical remote sensing data (aerial photographs) followed by field surveys. A spatial database considering forest, geophysical, soil and topographic data was built for the study area using the Geographical Information System (GIS). These factors were analysed using artificial neural network (ANN) and logistic regression models to generate a landslide susceptibility map. The study validates the landslide susceptibility maps by comparing them with landslide occurrence areas. The locations of landslide occurrence were divided randomly into a training set (50%) and a test set (50%). The training set was used to produce the landslide susceptibility map with the artificial neural network and logistic regression models, and the test set was retained to validate the prediction map. The validation results revealed that the artificial neural network model (with an accuracy of 80.10%) was better at predicting landslides than the logistic regression model (with an accuracy of 77.05%). Of the weights used in the artificial neural network model, 'slope' yielded the highest weight value (1.330), and 'aspect' yielded the lowest value (1.000). This research applied two statistical analysis methods in a GIS and compared their results. Based on the findings, we were able to derive a more effective method for analyzing landslide susceptibility.
Ren, Yilong; Wang, Yunpeng; Wu, Xinkai; Yu, Guizhen; Ding, Chuan
2016-10-01
Red light running (RLR) has become a major safety concern at signalized intersections. To prevent RLR-related crashes, it is critical to identify the factors that significantly affect drivers' RLR behavior and to predict potential RLR in real time. In this research, nine months of RLR events extracted from high-resolution traffic data collected by loop detectors at three signalized intersections were used to identify the factors that significantly affect RLR behavior. The data analysis indicated that occupancy time, time gap, used yellow time, time left to yellow start, whether the preceding vehicle runs through the intersection during yellow, and whether there is a vehicle passing through the intersection on the adjacent lane were significant factors for RLR behavior. Furthermore, because RLR events are rare, a modified rare events logistic regression model was developed for RLR prediction. The rare events logistic regression method has been applied in many fields for rare events studies and shows impressive performance, but so far no previous research has applied this method to study RLR. The results showed that the rare events logistic regression model performed significantly better than the standard logistic regression model. More importantly, the proposed RLR prediction method is based purely on data collected from a single advance loop detector located 400 feet from the stop bar. This brings great potential for future field applications of the proposed method, since loops have been widely installed at many intersections and can collect data in real time. This research is expected to contribute significantly to the improvement of intersection safety. Copyright © 2016 Elsevier Ltd. All rights reserved.
Directory of Open Access Journals (Sweden)
Saro Lee
2016-02-01
The aim of this study is to predict landslide susceptibility through spatial analysis, applying a GIS-based statistical methodology. Logistic regression models along with an artificial neural network were applied and validated to analyze landslide susceptibility in Inje, Korea. Landslide occurrence areas in the study were identified based on interpretations of optical remote sensing data (aerial photographs) followed by field surveys. A spatial database considering forest, geophysical, soil and topographic data was built for the study area using the Geographical Information System (GIS). These factors were analysed using artificial neural network (ANN) and logistic regression models to generate a landslide susceptibility map. The study validates the landslide susceptibility maps by comparing them with landslide occurrence areas. The locations of landslide occurrence were divided randomly into a training set (50%) and a test set (50%). The training set was used to produce the landslide susceptibility map with the artificial neural network and logistic regression models, and the test set was retained to validate the prediction map. The validation results revealed that the artificial neural network model (with an accuracy of 80.10%) was better at predicting landslides than the logistic regression model (with an accuracy of 77.05%). Of the weights used in the artificial neural network model, ‘slope’ yielded the highest weight value (1.330), and ‘aspect’ yielded the lowest value (1.000). This research applied two statistical analysis methods in a GIS and compared their results. Based on the findings, we were able to derive a more effective method for analyzing landslide susceptibility.
Kononen, Douglas W; Flannagan, Carol A C; Wang, Stewart C
2011-01-01
A multivariate logistic regression model, based upon National Automotive Sampling System Crashworthiness Data System (NASS-CDS) data for calendar years 1999-2008, was developed to predict the probability that a crash-involved vehicle will contain one or more occupants with serious or incapacitating injuries. These vehicles were defined as containing at least one occupant coded with an Injury Severity Score (ISS) of greater than or equal to 15, in planar, non-rollover crash events involving Model Year 2000 and newer cars, light trucks, and vans. The target injury outcome measure was developed by the Centers for Disease Control and Prevention (CDC)-led National Expert Panel on Field Triage in their recent revision of the Field Triage Decision Scheme (American College of Surgeons, 2006). The parameters to be used for crash injury prediction were subsequently specified by the National Expert Panel. Model input parameters included: crash direction (front, left, right, and rear), change in velocity (delta-V), multiple vs. single impacts, belt use, presence of at least one older occupant (≥ 55 years old), presence of at least one female in the vehicle, and vehicle type (car, pickup truck, van, and sport utility). The model was developed using predictor variables that may be readily available, post-crash, from OnStar-like telematics systems. Model sensitivity and specificity were 40% and 98%, respectively, using a probability cutpoint of 0.20. The area under the receiver operator characteristic (ROC) curve for the final model was 0.84. Delta-V (mph), seat belt use and crash direction were the most important predictors of serious injury. Due to the complexity of factors associated with rollover-related injuries, a separate screening algorithm is needed to model injuries associated with this crash mode. Copyright © 2010 Elsevier Ltd. All rights reserved.
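The sensitivity and specificity quoted above depend on the chosen probability cutpoint. As a minimal sketch of how such figures are computed from predicted probabilities, the following uses invented toy data and a hypothetical helper name (`sensitivity_specificity`); it does not reproduce the NASS-CDS model or its results.

```python
def sensitivity_specificity(probs, labels, cutpoint=0.20):
    """Classify as positive when the predicted probability is >= cutpoint,
    then return (sensitivity, specificity) against the true 0/1 labels."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= cutpoint
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy predicted probabilities and outcomes (illustrative only, not study data).
probs = [0.05, 0.10, 0.25, 0.60, 0.15, 0.30]
labels = [0, 0, 0, 1, 1, 1]
sens, spec = sensitivity_specificity(probs, labels, cutpoint=0.20)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Raising the cutpoint generally trades sensitivity for specificity, which is why a cutpoint is reported alongside both figures.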
Comparison of ν-support vector regression and logistic equation for ...
African Journals Online (AJOL)
Due to the complexity and high non-linearity of bioprocess, most simple mathematical models fail to describe the exact behavior of biochemistry systems. As a novel type of learning method, support vector regression (SVR) owns the powerful capability to characterize problems via small sample, nonlinearity, high dimension ...
Jiménez-Huete, Adolfo; Riva, Elena; Toledano, Rafael; Campo, Pablo; Esteban, Jesús; Barrio, Antonio Del; Franch, Oriol
2014-12-01
The validity of neuropsychological tests for the differential diagnosis of degenerative dementias may depend on the clinical context. We constructed a series of logistic models taking into account this factor. We retrospectively analyzed the demographic and neuropsychological data of 301 patients with probable Alzheimer's disease (AD), frontotemporal degeneration (FTLD), or dementia with Lewy bodies (DLB). Nine models were constructed taking into account the diagnostic question (eg, AD vs DLB) and subpopulation (incident vs prevalent). The AD versus DLB model for all patients, including memory recovery and phonological fluency, was highly accurate (area under the curve = 0.919, sensitivity = 90%, and specificity = 80%). The results were comparable in incident and prevalent cases. The FTLD versus AD and DLB versus FTLD models were both inaccurate. The models constructed from basic neuropsychological variables allowed an accurate differential diagnosis of AD versus DLB but not of FTLD versus AD or DLB. © The Author(s) 2014.
Cade, Brian S.; Noon, Barry R.; Scherer, Rick D.; Keane, John J.
2017-01-01
Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical conditional distribution of a bounded discrete random variable. The logistic quantile regression model requires that counts are randomly jittered to a continuous random variable, logit transformed to bound them between specified lower and upper values, then estimated in conventional linear quantile regression, repeating the 3 steps and averaging estimates. Back-transformation to the original discrete scale relies on the fact that quantiles are equivariant to monotonic transformations. We demonstrate this statistical procedure by modeling 20 years of California Spotted Owl fledgling production (0−3 per territory) on the Lassen National Forest, California, USA, as related to climate, demographic, and landscape habitat characteristics at territories. Spotted Owl fledgling counts increased nonlinearly with decreasing precipitation in the early nesting period, in the winter prior to nesting, and in the prior growing season; with increasing minimum temperatures in the early nesting period; with adult compared to subadult parents; when there was no fledgling production in the prior year; and when percentage of the landscape surrounding nesting sites (202 ha) with trees ≥25 m height increased. Changes in production were primarily driven by changes in the proportion of territories with 2 or 3 fledglings. Average variances of the discrete cumulative distributions of the estimated fledgling counts indicated that temporal changes in climate and parent age class explained 18% of the annual variance in owl fledgling production, which was 34% of the total variance. Prior fledgling production explained as much of
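The jitter, logit-transform, and back-transform steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the bounds `LOWER` and `UPPER`, the helper names, and the toy counts are hypothetical, and an unconditional sample median stands in for the linear quantile regression fit. In the procedure described in the abstract, the steps would be repeated over several jittered samples and the estimates averaged.

```python
import math
import random


def logit_bounded(z, lower, upper):
    """Logit transform of z, which must lie strictly between lower and upper."""
    return math.log((z - lower) / (upper - z))


def inv_logit_bounded(h, lower, upper):
    """Inverse of logit_bounded: maps any real h back into (lower, upper)."""
    return (lower + upper * math.exp(h)) / (1.0 + math.exp(h))


def jitter(counts, rng):
    """Turn integer counts into continuous values by adding U[0, 1) noise."""
    return [y + rng.random() for y in counts]


def sample_quantile(values, tau):
    """Simple order-statistic quantile (no interpolation)."""
    ordered = sorted(values)
    k = min(len(ordered) - 1, int(tau * len(ordered)))
    return ordered[k]


# Hypothetical bounds slightly outside the jittered range [0, 4) for counts 0..3.
LOWER, UPPER = -0.001, 4.001

rng = random.Random(0)
counts = [0, 1, 1, 2, 2, 2, 3, 3, 0, 1]  # toy fledgling counts, not study data
z = jitter(counts, rng)
h = [logit_bounded(v, LOWER, UPPER) for v in z]

# Quantiles are equivariant to monotonic transformations: the median taken on
# the logit scale maps back to the median on the jittered scale.
q_h = sample_quantile(h, 0.5)
q_z = inv_logit_bounded(q_h, LOWER, UPPER)
print(q_z, sample_quantile(z, 0.5))
```

Because the transform is strictly increasing, estimates obtained on the logit scale can be mapped back without distorting the quantile being estimated, which is the property the back-transformation relies on.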
Ardoino, Ilaria; Lanzoni, Monica; Marano, Giuseppe; Boracchi, Patrizia; Sagrini, Elisabetta; Gianstefani, Alice; Piscaglia, Fabio; Biganzoli, Elia M
2017-04-01
The interpretation of regression models results can often benefit from the generation of nomograms, 'user friendly' graphical devices especially useful for assisting the decision-making processes. However, in the case of multinomial regression models, whenever categorical responses with more than two classes are involved, nomograms cannot be drawn in the conventional way. Such a difficulty in managing and interpreting the outcome could often result in a limitation of the use of multinomial regression in decision-making support. In the present paper, we illustrate the derivation of a non-conventional nomogram for multinomial regression models, intended to overcome this issue. Although it may appear less straightforward at first sight, the proposed methodology allows an easy interpretation of the results of multinomial regression models and makes them more accessible for clinicians and general practitioners too. Development of prediction model based on multinomial logistic regression and of the pertinent graphical tool is illustrated by means of an example involving the prediction of the extent of liver fibrosis in hepatitis C patients by routinely available markers.
Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
Xu, Jun-Fang; Xu, Jing; Li, Shi-Zhu; Jia, Tia-Wu; Huang, Xi-Bao; Zhang, Hua-Ming; Chen, Mei; Yang, Guo-Jing; Gao, Shu-Jing; Wang, Qing-Yun; Zhou, Xiao-Nong
2013-01-01
Background: The transmission of schistosomiasis japonica in a local setting is still poorly understood in the lake regions of the People's Republic of China (P. R. China), and its transmission patterns are closely related to human, social and economic factors. Methodology/Principal Findings: We aimed to apply the integrated approach of artificial neural network (ANN) and logistic regression model in assessment of transmission risks of Schistosoma japonicum with epidemiological data collected from 2339 villagers from 1247 households in six villages of Jiangling County, P.R. China. By using the back-propagation (BP) of the ANN model, 16 factors out of 27 factors were screened, and the top five factors ranked by the absolute value of mean impact value (MIV) were mainly related to human behavior, i.e. integration of water contact history and infection history, family with past infection, history of water contact, infection history, and infection times. The top five factors screened by the logistic regression model were mainly related to social economics, i.e. village level, economic conditions of family, age group, education level, and infection times. The risk of human infection with S. japonicum is higher in people aged 15 or younger, with lower education, living in villages with a higher infection rate, from poor families, or infected more than once. Conclusion/Significance: Both the BP artificial neural network and the logistic regression model, established at a small scale, suggested that individual behavior and socioeconomic status are the most important risk factors in the transmission of schistosomiasis japonica. The young population (≤15) in higher-risk areas was identified as the main target for intervention in disease transmission control. PMID:23556015
Smith, Kelly; Gay, Robert; Stachowiak, Susan
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles
Verachtert, E.; Den Eeckhaut, M. Van; Poesen, J.; Govers, G.; Deckers, J.
2011-07-01
Soil piping (tunnel erosion) has been recognised as an important erosion process in collapsible loess-derived soils of temperate humid climates, which can cause collapse of the topsoil and formation of discontinuous gullies. Information about the spatial patterns of collapsed pipes and regional models describing these patterns is still limited. Therefore, this study aims at better understanding the factors controlling the spatial distribution and predicting pipe collapse. A dataset with parcels suffering from collapsed pipes (n = 560) and parcels without collapsed pipes was obtained through a regional survey in a 236 km² study area in the Flemish Ardennes (Belgium). Logistic regression was applied to find the best model describing the relationship between the presence/absence of a collapsed pipe and a set of independent explanatory variables (i.e. slope gradient, drainage area, distance-to-thalweg, curvature, aspect, soil type and lithology). Special attention was paid to the selection procedure of the grid cells without collapsed pipes. Apart from the first piping susceptibility map created by logistic regression modelling, a second map was made based on topographical thresholds of slope gradient and upslope drainage area. The logistic regression model allowed identification of the most important factors controlling pipe collapse. Pipes are much more likely to occur when a topographical threshold depending on both slope gradient and upslope area is exceeded in zones with a sufficient water supply (due to topographical convergence and/or the presence of a clay-rich lithology). On the other hand, the use of slope-area thresholds only results in reasonable predictions of piping susceptibility, with minimum information.
Overcoming multicollinearity in multiple regression using correlation coefficient
Zainodin, H. J.; Yap, S. J.
2013-09-01
Multicollinearity happens when there are high correlations among independent variables. In this case, it would be difficult to distinguish between the contributions of these independent variables to that of the dependent variable as they may compete to explain much of the similar variance. Besides, the problem of multicollinearity also violates the assumption of multiple regression: that there is no collinearity among the possible independent variables. Thus, an alternative approach is introduced in overcoming the multicollinearity problem in achieving a well represented model eventually. This approach is accomplished by removing the multicollinearity source variables on the basis of the correlation coefficient values based on full correlation matrix. Using the full correlation matrix can facilitate the implementation of Excel function in removing the multicollinearity source variables. It is found that this procedure is easier and time-saving especially when dealing with greater number of independent variables in a model and a large number of all possible models. Hence, in this paper detailed insight of the procedure is shown, compared and implemented.
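The correlation-based removal described above can be sketched as follows. The abstract does not state the exact removal rule, so this is one plausible reading offered as an assumption: when two predictors correlate above a threshold, drop the one that correlates less strongly with the dependent variable. The function names, the 0.95 threshold, and the data are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def drop_collinear(predictors, response, threshold=0.95):
    """Repeatedly find a pair of predictors with |r| > threshold and remove
    the one less correlated with the response (assumed rule, see lead-in)."""
    kept = dict(predictors)
    changed = True
    while changed:
        changed = False
        names = list(kept)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if abs(pearson(kept[a], kept[b])) > threshold:
                    r_a = abs(pearson(kept[a], response))
                    r_b = abs(pearson(kept[b], response))
                    del kept[a if r_a < r_b else b]
                    changed = True
                    break
            if changed:
                break
    return list(kept)


# Toy data: x2 is nearly 2 * x1, x3 is unrelated, y is a linear function of x1.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],
    "x3": [5, 1, 4, 2, 6, 3],
}
y = [4, 7, 10, 13, 16, 19]
selected = drop_collinear(predictors, y)
print(selected)
```

Restarting the scan after each removal keeps the logic simple at the cost of recomputing correlations; for a large number of predictors the full correlation matrix would normally be computed once and reused.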
Time-localized wavelet multiple regression and correlation
Fernández-Macho, Javier
2018-02-01
This paper extends wavelet methodology to handle comovement dynamics of multivariate time series via moving weighted regression on wavelet coefficients. The concept of wavelet local multiple correlation is used to produce one single set of multiscale correlations along time, in contrast with the large number of wavelet correlation maps that need to be compared when using standard pairwise wavelet correlations with rolling windows. Also, the spectral properties of weight functions are investigated and it is argued that some common time windows, such as the usual rectangular rolling window, are not satisfactory on these grounds. The method is illustrated with a multiscale analysis of the comovements of Eurozone stock markets during this century. It is shown how the evolution of the correlation structure in these markets has been far from homogeneous both along time and across timescales featuring an acute divide across timescales at about the quarterly scale. At longer scales, evidence from the long-term correlation structure can be interpreted as stable perfect integration among Euro stock markets. On the other hand, at intramonth and intraweek scales, the short-term correlation structure has been clearly evolving along time, experiencing a sharp increase during financial crises which may be interpreted as evidence of financial 'contagion'.
Modeling Pan Evaporation for Kuwait by Multiple Linear Regression
Almedeij, Jaber
2012-01-01
Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values. PMID:23226984
An Additive-Multiplicative Cox-Aalen Regression Model
DEFF Research Database (Denmark)
Scheike, Thomas H.; Zhang, Mei-Jie
2002-01-01
Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects
Kanbayashi, Yuko; Ishikawa, Takeshi; Kanazawa, Motohiro; Nakajima, Yuki; Kawano, Rumi; Tabuchi, Yusuke; Yoshioka, Tomoko; Ihara, Norihiko; Hosokawa, Toyoshi; Takayama, Koichi; Shikata, Keisuke; Taguchi, Tetsuya
2018-03-16
Although pegfilgrastim prophylaxis is expected to maintain the relative dose intensity (RDI) of chemotherapy and improve safety, information is limited. However, the optimal selection of patients eligible for pegfilgrastim prophylaxis is an important issue from a medical economics viewpoint. Therefore, this retrospective study identified factors that could predict these eligible patients to maintain the RDI. The participants included 166 cancer patients undergoing pegfilgrastim prophylaxis combined with chemotherapy in our outpatient chemotherapy center between March 2015 and April 2017. Variables were extracted from clinical records for regression analysis of factors related to maintenance of the RDI. RDI was classified into four categories: 100% = 0, 85% or predictive factors in patients eligible for pegfilgrastim prophylaxis to maintain the RDI. Threshold measures were examined using a receiver operating characteristic (ROC) analysis curve. Age [odds ratio (OR) 1.07, 95% confidence interval (CI) 1.04-1.11; P maintenance. ROC curve analysis of the group that failed to maintain the RDI indicated that the threshold for age was 70 years and above, with a sensitivity of 60.0% and specificity of 80.2% (area under the curve: 0.74). In conclusion, younger age, anemia (less), and administration of pegfilgrastim 24-72 h after chemotherapy were significant factors for RDI maintenance.
Directory of Open Access Journals (Sweden)
Pape Sarah A
2009-02-01
Background: Laser-Doppler imaging (LDI) of cutaneous blood flow is beginning to be used by burn surgeons to predict the healing time of burn wounds; predicted healing time is used to determine wound treatment as either dressings or surgery. In this paper, we present a statistical analysis of the performance of the technique. Methods: We used data from a study carried out by five burn centers: LDI was done once between days 2 and 5 post burn, and healing was assessed at both 14 days and 21 days post burn. Random-effects ordinal logistic regression and other models such as the continuation ratio model were used to model healing time as a function of the LDI data, and of demographic and wound history variables. Statistical methods were also used to study the false-color palette, which enables the laser-Doppler imager to be used by clinicians as a decision-support tool. Results: Overall, diagnoses are over 90% correct. Related questions addressed were which blood flow summary statistic was best and whether, given the blood flow measurements, demographic and observational variables had any additional predictive power (age, sex, race, % total body surface area burned (%TBSA), site and cause of burn, day of LDI scan, burn center). It was found that mean laser-Doppler flux over a wound area was the best statistic, and that, given the same mean flux, women recover slightly more slowly than men. Further, the likely degradation in predictive performance on moving to a patient group with larger %TBSA than those in the data sample was studied, and shown to be small. Conclusion: Modeling healing time is a complex statistical problem, with random effects due to multiple burn areas per individual, and censoring caused by patients missing hospital visits and undergoing surgery. This analysis applies state-of-the-art statistical methods such as the bootstrap and permutation tests to a medical problem of topical interest. New medical findings are
Ettinger, Susanne; Mounaud, Loïc; Magill, Christina; Yao-Lafourcade, Anne-Françoise; Thouret, Jean-Claude; Manville, Vern; Negulescu, Caterina; Zuccaro, Giulio; De Gregorio, Daniela; Nardone, Stefano; Uchuchoque, Juan Alexis Luque; Arguedas, Anita; Macedo, Luisa; Manrique Llerena, Nélida
2016-10-01
bivariate analyses were applied to better characterize each vulnerability parameter. Multiple corresponding analyses revealed strong relationships between the "Distance to channel or bridges", "Structural building type", "Building footprint" and the observed damage. Logistic regression enabled quantification of the contribution of each explanatory parameter to potential damage, and determination of the significant parameters that express the damage susceptibility of a building. The model was applied 200 times on different calibration and validation data sets in order to examine performance. Results show that 90% of these tests have a success rate of more than 67%. Probabilities (at building scale) of experiencing different damage levels during a future event similar to the 8 February 2013 flash flood are the major outcomes of this study.
He, Dan; Kuhn, David; Parida, Laxmi
2016-06-15
Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least square error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. The programs we used are either public or directly from the referred authors, such as MALSAR (http://www.public.asu.edu/~jye02/Software/MALSAR/) package. The Avocado data set has not been published yet and is available upon request. dhe@us.ibm.com. © The Author 2016. Published by Oxford University Press.
Weibull and lognormal Taguchi analysis using multiple linear regression
International Nuclear Information System (INIS)
Piña-Monarrez, Manuel R.; Ortiz-Yañez, Jesús F.
2015-01-01
The paper provides reliability practitioners with a method (1) to estimate the robust Weibull family when the Taguchi method (TM) is applied, (2) to estimate the normal operational Weibull family in an accelerated life testing (ALT) analysis to give confidence to the extrapolation and (3) to perform the ANOVA analysis for both the robust and the normal operational Weibull family. Because the Weibull distribution neither has the normal additive property nor has a direct relationship with the normal parameters (µ, σ), in this paper, the issues of estimating a Weibull family by using a design of experiment (DOE) are first addressed by using an L_9(3^4) orthogonal array (OA) in both the TM and in the Weibull proportional hazard model approach (WPHM). Then, by using the Weibull/Gumbel and the lognormal/normal relationships and multiple linear regression, the direct relationships between the Weibull and the lifetime parameters are derived and used to formulate the proposed method. Moreover, since the derived direct relationships always hold, the method is generalized to the lognormal and ALT analysis. Finally, the method’s efficiency is shown through its application to the used OA and to a set of ALT data. - Highlights: • It gives the statistical relations and steps to use the Taguchi Method (TM) to analyze Weibull data. • It gives the steps to determine the unknown Weibull family for both the robust TM setting and the normal ALT level. • It gives a method to determine the expected lifetimes and to perform the ANOVA analysis in TM and ALT analysis. • It gives a method to give confidence to the extrapolation in an ALT analysis by using the Weibull family of the normal level.
Yilmaz, Isik; Keskin, Inan; Marschalko, Marian; Bednarik, Martin
2010-05-01
This study compares GIS-based collapse susceptibility mapping methods, namely conditional probability (CP), logistic regression (LR) and artificial neural networks (ANN), applied in gypsum rock masses in the Sivas basin (Turkey). A Digital Elevation Model (DEM) was first constructed using GIS software. Collapse-related factors, directly or indirectly related to the causes of collapse occurrence, such as distance from faults, slope angle and aspect, topographical elevation, distance from drainage, topographic wetness index (TWI), stream power index (SPI), Normalized Difference Vegetation Index (NDVI) as a measure of vegetation cover, and distance from roads and settlements, were used in the collapse susceptibility analyses. In the last stage of the analyses, collapse susceptibility maps were produced from the CP, LR and ANN models, and they were then compared by means of their validations. Area Under Curve (AUC) values obtained from all three methodologies showed that the map obtained from the ANN model appears more accurate than those from the other models, and the results also showed that artificial neural networks are a useful tool in the preparation of collapse susceptibility maps and highly compatible with GIS operating features. Key words: Collapse; doline; susceptibility map; gypsum; GIS; conditional probability; logistic regression; artificial neural networks.
International Nuclear Information System (INIS)
Papritz, A.; Reichard, P.U.
2009-01-01
Soils of allotments are often contaminated by heavy metals and persistent organic pollutants. In particular, lead (Pb) and polycyclic aromatic hydrocarbons (PAHs) frequently exceed legal intervention values (IVs). Allotments are popular in European countries; cities may own and let several thousand allotment plots. Assessing soil contamination for all the plots would be very costly. Soil contamination in allotments is often linked to gardening practice and historic land use. Hence, we predict the risk of IV exceedance from attributes that characterize the history and management of allotment areas (age, nearby presence of pollutant sources, prior land use). Robust logistic regression analyses of data of Swiss allotments demonstrate that the risk of IV exceedance can be predicted quite precisely without costly soil analyses. Thus, the new method allows screening many allotments at small costs, and it helps to deploy the resources available for soil contamination surveying more efficiently. - The contamination of allotment soils, expressed as frequency of intervention value exceedance, depends on the age and further attributes of the allotments and can be predicted by logistic regression.
Directory of Open Access Journals (Sweden)
Chong Wei
2015-01-01
Logistic regression models have been widely used in previous studies to analyze public transport utilization. These studies have shown travel time to be an indispensable variable for such analysis and usually consider it to be a deterministic variable. This formulation does not allow us to capture travelers’ perception error regarding travel time, and recent studies have indicated that this error can have a significant effect on modal choice behavior. In this study, we propose a logistic regression model with a hierarchical random error term. The proposed model adds a new random error term for the travel time variable. This term structure enables us to investigate travelers’ perception error regarding travel time from a given choice behavior dataset. We also propose an extended model that allows constraining the sign of this error in the model. We develop two Gibbs samplers to estimate the basic hierarchical model and the extended model. The performance of the proposed models is examined using a well-known dataset.
Li, Saijiao; He, Aiyan; Yang, Jing; Yin, TaiLang; Xu, Wangming
2011-01-01
To investigate factors that can affect compliance with treatment of polycystic ovary syndrome (PCOS) in infertile patients and to provide a basis for clinical treatment, specialist consultation and health education. Patient compliance was assessed via a questionnaire based on the Morisky-Green test and the treatment principles of PCOS. Then interviews were conducted with 99 infertile patients diagnosed with PCOS at Renmin Hospital of Wuhan University in China, from March to September 2009. Finally, these data were analyzed using logistic regression analysis. Logistic regression analysis revealed that a total of 23 (25.6%) of the participants showed good compliance. Factors that significantly (p < 0.05) affected compliance with treatment were the patient's body mass index, convenience of medical treatment and concerns about adverse drug reactions. Patients who are obese, experience inconvenient medical treatment or are concerned about adverse drug reactions are more likely to exhibit noncompliance. Treatment education and intervention aimed at these patients should be strengthened in the clinic to improve treatment compliance. Further research is needed to better elucidate the compliance behavior of patients with PCOS.
Guo, L W; Liu, S Z; Zhang, M; Chen, Q; Zhang, S K; Sun, X B
2017-12-10
Objective: To investigate the effect of fried food intake on the pathogenesis of esophageal cancer and precancerous lesions. Methods: From 2005 to 2013, all the residents aged 40-69 years from 11 counties (cities) where cancer screening of upper gastrointestinal cancer had been conducted in rural areas of Henan province, were recruited as the subjects of study. Information on demography and lifestyle was collected. The residents under study were screened with iodine staining endoscopic examination and biopsy samples were diagnosed pathologically, under standardized criteria. Subjects with high risk were divided into the groups based on their different pathological degrees. Multivariate ordinal logistic regression analysis was used to analyze the relationship between the frequency of fried food intake and esophageal cancer and precancerous lesions. Results: A total number of 8 792 cases with normal esophagus, 3 680 with mild hyperplasia, 972 with moderate hyperplasia, 413 with severe hyperplasia carcinoma in situ, and 336 cases of esophageal cancer were recruited. Results from multivariate logistic regression analysis showed that, when compared with those who did not eat fried food, the intake of fried food (food appeared a risk factor for both esophageal cancer and precancerous lesions.
Directory of Open Access Journals (Sweden)
Danilo A. López-Sarmiento
2013-11-01
This paper compares the performance of a multi-class least squares support vector machine (LS-SVM mc) and a multi-class logistic regression classifier on the problem of recognizing handwritten numeric digits (0-9). The comparison used a data set of 5000 images of handwritten digits (500 images for each digit from 0 to 9), each image measuring 20 x 20 pixels. The inputs to each system were 400-dimensional vectors corresponding to each image (no feature extraction was performed). Both classifiers used a OneVsAll strategy to enable multi-class classification and a random cross-validation function for minimizing the cost function. The comparison metrics were precision and training time under the same computational conditions. Both techniques showed a precision above 95%, with LS-SVM slightly more accurate. In computational cost, however, there was a marked difference: LS-SVM training required 16.42% less time than the logistic regression model under the same low computational conditions.
Directory of Open Access Journals (Sweden)
Sheng-Chuan Chen
2013-01-01
This study develops a model for evaluating the hazard level of landslides at Alishan Forestry Railway, Taiwan, by using logistic regression with the assistance of a geographical information system (GIS). A typhoon event-induced landslide inventory, independent variables, and a triggering factor were used to build the model. The environmental factors such as bedrock lithology from the geology database; topographic aspect, terrain roughness, profile curvature, and distance to river, from the topographic database; and the vegetation index value from SPOT 4 satellite images were used as variables that influence landslide occurrence. The area under the curve (AUC) of a receiver operator characteristic (ROC) curve was used to validate the model. Effects of parameters on landslide occurrence were assessed from the corresponding coefficient that appears in the logistic regression function. Thereafter, the model was applied to predict the probability of landslides for rainfall data of different return periods. Using a predicted map of probability, the study area was classified into four ranks of landslide susceptibility: low, medium, high, and very high. As a result, most high susceptibility areas are located on the western portion of the study area. Several train stations and railways are located on sites with a high susceptibility ranking.
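The AUC validation mentioned in this abstract can be illustrated with a short sketch. It assumes the usual rank interpretation of the AUC (the probability that a randomly chosen landslide cell receives a higher predicted probability than a randomly chosen stable cell, with ties counted as one half). The function name `auc` and the toy scores are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores: predicted probabilities from any susceptibility model
    labels: 1 for an observed landslide cell, 0 for a stable cell
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    # Count a win when the positive outranks the negative, half a win for ties.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for four map cells.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
print(auc(toy_scores, toy_labels))
```

The pairwise form is quadratic in the number of cells, so it suits small validation samples; a sort-based rank formula would be the natural choice for full raster maps.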
Local bilinear multiple-output quantile/depth regression
Czech Academy of Sciences Publication Activity Database
Hallin, M.; Lu, Z.; Paindaveine, D.; Šiman, Miroslav
2015-01-01
Vol. 21, No. 3 (2015), pp. 1435-1466 ISSN 1350-7265 R&D Projects: GA MŠk(CZ) 1M06047 Institutional support: RVO:67985556 Keywords: conditional depth * growth chart * halfspace depth * local bilinear regression * multivariate quantile * quantile regression * regression depth Subject RIV: BA - General Mathematics Impact factor: 1.372, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/siman-0446857.pdf
2013-01-01
Methods for analysis of network dynamics have seen great progress in the past decade. This article shows how Dynamic Network Logistic Regression techniques (a special case of the Temporal Exponential Random Graph Models) can be used to implement decision theoretic models for network dynamics in a panel data context. We also provide practical heuristics for model building and assessment. We illustrate the power of these techniques by applying them to a dynamic blog network sampled during the 2004 US presidential election cycle. This is a particularly interesting case because it marks the debut of Internet-based media such as blogs and social networking web sites as institutionally recognized features of the American political landscape. Using a longitudinal sample of all Democratic National Convention/Republican National Convention–designated blog citation networks, we are able to test the influence of various strategic, institutional, and balance-theoretic mechanisms as well as exogenous factors such as seasonality and political events on the propensity of blogs to cite one another over time. Using a combination of deviance-based model selection criteria and simulation-based model adequacy tests, we identify the combination of processes that best characterizes the choice behavior of the contending blogs. PMID:24143060
Directory of Open Access Journals (Sweden)
Mirjam J Knol
BACKGROUND: In randomized controlled trials (RCTs), the odds ratio (OR) can substantially overestimate the risk ratio (RR) if the incidence of the outcome is over 10%. This study determined the frequency of use of ORs, the frequency of overestimation of the OR as compared with its accompanying RR in published RCTs, and we assessed how often regression models that calculate RRs were used. METHODS: We included 288 RCTs published in 2008 in five major general medical journals (Annals of Internal Medicine, British Medical Journal, Journal of the American Medical Association, Lancet, New England Journal of Medicine). If an OR was reported, we calculated the corresponding RR, and we calculated the percentage of overestimation by using the formula . RESULTS: Of 193 RCTs with a dichotomous primary outcome, 24 (12.4%) presented a crude and/or adjusted OR for the primary outcome. In five RCTs (2.6%), the OR differed more than 100% from its accompanying RR on the log scale. Forty-one of all included RCTs (n = 288; 14.2%) presented ORs for other outcomes, or for subgroup analyses. Nineteen of these RCTs (6.6%) had at least one OR that deviated more than 100% from its accompanying RR on the log scale. Of 53 RCTs that adjusted for baseline variables, 15 used logistic regression. Alternative methods to estimate RRs were only used in four RCTs. CONCLUSION: ORs and logistic regression are often used in RCTs and in many articles the OR did not approximate the RR. Although the authors did not explicitly misinterpret these ORs as RRs, misinterpretation by readers can seriously affect treatment decisions and policy making.
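The gap between an OR and its RR can be made concrete with a small sketch. The formula used in the study is not preserved in this record, so the code below relies on two assumptions: the standard conversion RR = OR / (1 - p0 + p0 * OR), where p0 is the outcome risk in the reference group, and a log-scale overestimation defined as (ln OR - ln RR) / ln RR. Both function names are hypothetical.

```python
import math


def or_to_rr(odds_ratio, p0):
    """Convert an odds ratio to a risk ratio.

    p0 is the outcome risk in the reference (control) group. The identity is
    exact for a crude 2x2 table and only approximate for adjusted ORs.
    """
    if not 0.0 <= p0 < 1.0:
        raise ValueError("p0 must be in [0, 1)")
    return odds_ratio / (1.0 - p0 + p0 * odds_ratio)


def log_scale_overestimation_pct(odds_ratio, risk_ratio):
    """Percentage by which ln(OR) exceeds ln(RR); an assumed definition."""
    return 100.0 * (math.log(odds_ratio) - math.log(risk_ratio)) / math.log(risk_ratio)


# Rare outcome (1% baseline risk): the OR is close to the RR.
print(or_to_rr(2.0, 0.01))

# Common outcome (50% baseline risk): an OR of 3 corresponds to an RR of 1.5.
rr = or_to_rr(3.0, 0.5)
print(rr, log_scale_overestimation_pct(3.0, rr))
```

With a 50% baseline risk the OR of 3 overstates the RR of 1.5 by well over 100% on the log scale, which is the kind of deviation the abstract counts.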
Chandler, T L; Pralle, R S; Dórea, J R R; Poock, S E; Oetzel, G R; Fourdraine, R H; White, H M
2018-03-01
Although cowside testing strategies for diagnosing hyperketonemia (HYK) are available, many are labor intensive and costly, and some lack sufficient accuracy. Predicting milk ketone bodies by Fourier transform infrared spectrometry during routine milk sampling may offer a more practical monitoring strategy. The objectives of this study were to (1) develop linear and logistic regression models using all available test-day milk and performance variables for predicting HYK and (2) compare prediction methods (Fourier transform infrared milk ketone bodies, linear regression models, and logistic regression models) to determine which is the most predictive of HYK. Given the data available, a secondary objective was to evaluate differences in test-day milk and performance variables (continuous measurements) between Holsteins and Jerseys and between cows with or without HYK within breed. Blood samples were collected on the same day as milk sampling from 658 Holstein and 468 Jersey cows between 5 and 20 d in milk (DIM). Diagnosis of HYK was at a serum β-hydroxybutyrate (BHB) concentration ≥1.2 mmol/L. Concentrations of milk BHB and acetone were predicted by Fourier transform infrared spectrometry (Foss Analytical, Hillerød, Denmark). Thresholds of milk BHB and acetone were tested for diagnostic accuracy, and logistic models were built from continuous variables to predict HYK in primiparous and multiparous cows within breed. Linear models were constructed from continuous variables for primiparous and multiparous cows within breed that were 5 to 11 DIM or 12 to 20 DIM. Milk ketone body thresholds diagnosed HYK with 64.0 to 92.9% accuracy in Holsteins and 59.1 to 86.6% accuracy in Jerseys. Logistic models predicted HYK with 82.6 to 97.3% accuracy. Internally cross-validated multiple linear regression models diagnosed HYK of Holstein cows with 97.8% accuracy for primiparous and 83.3% accuracy for multiparous cows. Accuracy of Jersey models was 81.3% in primiparous and 83
A comparative study of multiple regression analysis and back ...
Indian Academy of Sciences (India)
Abhijit Sarkar
artificial neural network (ANN) models to predict weld bead geometry and HAZ width in submerged arc welding ... Keywords. Submerged arc welding (SAW); multi-regression analysis (MRA); artificial neural network ..... Degree of freedom.
Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients
Gorgees, HazimMansoor; Mahdi, FatimahAssim
2018-05-01
This article compares the performance of different types of ordinary ridge regression estimators that have been proposed to estimate the regression parameters when near-exact linear relationships exist among the explanatory variables. For this situation we employ data obtained from the Tagi gas filling company for the period 2008-2010. The main result is that the method based on the condition number performs better than the other methods, since it has a smaller mean square error (MSE).
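As a rough illustration of the setting, the sketch below computes the ordinary ridge estimator (X'X + kI)^(-1) X'y for two predictors, together with the condition number of X'X as a multicollinearity diagnostic. It assumes the predictors and response are already centred (no intercept), and it takes the ridge parameter k as a user-supplied value; the article's specific rules for choosing k are not given in the abstract and are not reproduced. All function names and data are hypothetical.

```python
import math


def gram_2x2(x1, x2):
    """Entries (a, b, d) of the symmetric matrix X'X = [[a, b], [b, d]]."""
    a = sum(u * u for u in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    return a, b, d


def condition_number(x1, x2):
    """Ratio of the largest to the smallest eigenvalue of X'X."""
    a, b, d = gram_2x2(x1, x2)
    mean = (a + d) / 2.0
    spread = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return (mean + spread) / (mean - spread)


def ridge_coefficients(x1, x2, y, k):
    """Solve (X'X + kI) beta = X'y for two predictors; k = 0 gives least squares."""
    a, b, d = gram_2x2(x1, x2)
    c1 = sum(u * t for u, t in zip(x1, y))
    c2 = sum(v * t for v, t in zip(x2, y))
    a_k, d_k = a + k, d + k
    det = a_k * d_k - b * b
    return (d_k * c1 - b * c2) / det, (a_k * c2 - b * c1) / det


# Hypothetical, nearly collinear predictors: the condition number is very large.
x1 = [1.0, 2.0, 3.0]
x2 = [1.0, 2.0, 3.1]
y = [2.0, 4.1, 6.0]
print(condition_number(x1, x2))
print(ridge_coefficients(x1, x2, y, 0.0))  # unstable least-squares solution
print(ridge_coefficients(x1, x2, y, 0.5))  # shrunken, more stable solution
```

A large condition number signals that small changes in the data can move the least-squares coefficients substantially, which is the situation ridge estimators are meant to stabilise.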
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficients in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficients in the other two areas, the case of Selangor based on the logistic coefficients of Cameron showed the highest prediction accuracy (90%), whereas the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross
Schlögel, R.; Marchesini, I.; Alvioli, M.; Reichenbach, P.; Rossi, M.; Malet, J.-P.
2018-01-01
We perform landslide susceptibility zonation with slope units using three digital elevation models (DEMs) of varying spatial resolution of the Ubaye Valley (South French Alps). In so doing, we applied a recently developed algorithm automating slope unit delineation, given a number of parameters, in order to optimize simultaneously the partitioning of the terrain and the performance of a logistic regression susceptibility model. The method allowed us to obtain optimal slope units for each available DEM spatial resolution. For each resolution, we studied the susceptibility model performance by analyzing in detail the relevance of the conditioning variables. The analysis is based on landslide morphology data, considering either the whole landslide or only the source area outline as inputs. The procedure allowed us to select the most useful information, in terms of DEM spatial resolution, thematic variables and landslide inventory, in order to obtain the most reliable slope unit-based landslide susceptibility assessment.
Ali, Asad; Zaidi, Farrah; Fatima, Syeda Hira; Adnan, Muhammad; Ullah, Saleem
2018-03-24
In this study, we propose to develop a geostatistical computational framework to model the distribution of rat bite infestation of epidemic proportion in Peshawar valley, Pakistan. Two species Rattus norvegicus and Rattus rattus are suspected to spread the infestation. The framework combines strengths of maximum entropy algorithm and binomial kriging with logistic regression to spatially model the distribution of infestation and to determine the individual role of environmental predictors in modeling the distribution trends. Our results demonstrate the significance of a number of social and environmental factors in rat infestations such as (I) high human population density; (II) greater dispersal ability of rodents due to the availability of better connectivity routes such as roads, and (III) temperature and precipitation influencing rodent fecundity and life cycle.
Directory of Open Access Journals (Sweden)
Wedagama D.M.P.
2010-01-01
In Denpasar, the capital of Bali Province, motorcycle accidents contribute about 80% of total road accidents. Of those motorcycle accidents, 32% are fatal. This study investigates the influence of accident-related factors on fatal motorcycle accidents in the city of Denpasar during the period 2006-2008 using a logistic regression model. The study found that the fatality of collisions with pedestrians and of right-angle accidents was respectively about 0.44 and 0.40 times lower than that of collisions with other vehicles and accidents due to other factors. In contrast, the odds that a motorcycle accident will be fatal due to collision with heavy and light vehicles were 1.67 times higher than with other motorcycles. Collisions with pedestrians, right-angle accidents, and collisions with heavy and light vehicles respectively accounted for 31%, 29%, and 63% of fatal motorcycle accidents.
Directory of Open Access Journals (Sweden)
Glauco H.S. Mendes
2013-09-01
Critical success factors in new product development (NPD) in Brazilian small and medium enterprises (SMEs) are identified and analyzed. Critical success factors are best practices that can be used to improve NPD management and performance in a company. However, the traditional way of identifying these factors is through surveys. Subsequently, the collected data are reduced through traditional multivariate analysis. The objective of this work is to develop a logistic regression model for predicting the success or failure of new product development. This model allows for an evaluation and prioritization of resource commitments. The results will be helpful for guiding management actions, as one way to improve NPD performance in those industries.
Huang, Jinxi; Wang, Chenghu; Yuan, Weiwei; Zhang, Zhandong; Chen, Beibei; Zhang, Xiefu
2017-01-01
Background: This study was conducted to investigate the risk factors of anastomotic fistula after the radical resection of esophageal‐cardiac cancer. Methods: Five hundred and forty‐four esophageal‐cardiac cancer patients who underwent surgery and had complete clinical data were included in the study. Fifty patients diagnosed with postoperative anastomotic fistula were considered the case group and the remaining 494 subjects who did not develop postoperative anastomotic fistula were considered the control. The potential risk factors for anastomotic fistula, such as age, gender, diabetes history, smoking history, were collected and compared between the groups. Statistically significant variables were substituted into logistic regression to further evaluate the independent risk factors for postoperative anastomotic fistulas in esophageal‐cardiac cancer. Results: The incidence of anastomotic fistulas was 9.2% (50/544). Logistic regression analysis revealed that female gender (P < 0.05), laparoscopic surgery (P < 0.05), decreased postoperative albumin (P < 0.05), and postoperative renal dysfunction (P < 0.05) were independent risk factors for anastomotic fistulas in patients who received surgery for esophageal‐cardiac cancer. Of the 50 anastomotic fistulas, 16 cases were small fistulas, which were only discovered by conventional imaging examination and not presenting clinical symptoms. All of the anastomotic fistulas occurred within seven days after surgery. Five of the patients with anastomotic fistulas underwent a second surgery and three died. Conclusion: Female patients with esophageal‐cardiac cancer treated with endoscopic surgery and suffering from postoperative hypoproteinemia and renal dysfunction were susceptible to postoperative anastomotic fistula. PMID:28940985
Directory of Open Access Journals (Sweden)
CUI Yanping
2014-10-01
Objective: To analyze the prognostic factors in acute-on-chronic liver failure (ACLF) patients with hepatic encephalopathy (HE) and to explore the risk factors for prognosis. Methods: A retrospective analysis was performed on 106 ACLF patients with HE who were hospitalized in our hospital from January 2010 to July 2013. The patients were divided into an improved group and a deteriorated group. The univariate indicators including age, sex, laboratory indicators [total bilirubin (TBil), albumin (Alb), alanine aminotransferase (ALT), aspartate aminotransferase (AST), and prothrombin time activity (PTA)], the stage of HE, complications [persistent hyponatremia, digestive tract bleeding, hepatorenal syndrome (HRS), ascites, infection, and spontaneous bacterial peritonitis (SBP)], and plasma exchange were analyzed by chi-square test or t-test. Indicators with statistical significance were subsequently analyzed by binary logistic regression. Results: Univariate analysis showed that ALT (P=0.009), PTA (P=0.043), the stage of HE (P=0.000), and HRS (P=0.003) were significantly different between the two groups, whereas differences in age, sex, TBil, Alb, AST, persistent hyponatremia, digestive tract bleeding, ascites, infection, SBP, and plasma exchange were not statistically significant (P>0.05). Binary logistic regression demonstrated that PTA (b=-0.097, P=0.025, OR=0.908), HRS (b=2.279, P=0.007, OR=9.764), and the stage of HE (b=1.873, P=0.000, OR=6.510) were prognostic factors in ACLF patients with HE. Conclusion: The stage of HE, HRS, and PTA are independent influential factors for the prognosis in ACLF patients with HE. Reduced PTA, advanced HE stage, and the presence of HRS indicate worse prognosis.
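In binary logistic regression the odds ratio for a predictor is the exponential of its coefficient, OR = exp(b). The sketch below checks the reported pairs against that identity, taking the coefficients as b = -0.097, 2.279 and 1.873 (that is, assuming a decimal point after the first digit wherever the record prints none). The dictionary name and the helper are hypothetical.

```python
import math

# (coefficient b, reported odds ratio) as read from the abstract;
# the decimal placement in two of the coefficients is an assumption.
reported = {
    "PTA": (-0.097, 0.908),
    "HRS": (2.279, 9.764),
    "HE stage": (1.873, 6.510),
}


def odds_ratio(b):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(b)


for name, (b, reported_or) in reported.items():
    implied = odds_ratio(b)
    # Small differences are expected because b is rounded to three decimals.
    print(f"{name}: exp({b}) = {implied:.3f}, reported OR = {reported_or}")
```

A coefficient below zero (PTA) gives an OR below 1, so higher PTA lowers the odds of deterioration, whereas the positive coefficients for HRS and HE stage raise them.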
Demir, Gökhan; aytekin, mustafa; banu ikizler, sabriye; angın, zekai
2013-04-01
The North Anatolian Fault is known as one of the most active and destructive fault zones, having produced many high-magnitude earthquakes. Along this fault zone, the morphology and lithological features are prone to landsliding. Many earthquake-induced landslides have been recorded by several studies along this fault zone, and these landslides caused both injuries and loss of life. Therefore, a detailed landslide susceptibility assessment for this area is indispensable. In this context, this study carried out a landslide susceptibility assessment for a 1445 km2 area in the Kelkit River valley, part of the North Anatolian Fault zone (Eastern Black Sea region of Turkey); the results are summarized here. For this purpose, a geographical information system (GIS) and a bivariate statistical model were used. Initially, landslide inventory maps were prepared using landslide data determined by field surveys and landslide data taken from the General Directorate of Mineral Research and Exploration. The landslide conditioning factors considered were lithology, slope gradient, slope aspect, topographical elevation, distance to streams, distance to roads, distance to faults, drainage density and fault density. The ArcGIS package was used to manipulate and analyze all the collected data. The logistic regression method was applied to create a landslide susceptibility map. The landslide susceptibility map was divided into five susceptibility classes: very low, low, moderate, high and very high. The result of the analysis was verified using the inventoried landslide locations and compared with the produced probability model. For this purpose, the area under the curve (AUC) approach was applied, and an AUC value was obtained. Based on this AUC value, the obtained landslide susceptibility map was judged satisfactory. Keywords: North Anatolian Fault Zone, Landslide susceptibility map, Geographical Information Systems, Logistic Regression Analysis.
Alishiri, Gholam Hossein; Bayat, Noushin; Fathi Ashtiani, Ali; Tavallaii, Seyed Abbas; Assari, Shervin; Moharamzad, Yashar
2008-01-01
The aim of this work was to develop two logistic regression models capable of predicting physical and mental health related quality of life (HRQOL) among rheumatoid arthritis (RA) patients. In this cross-sectional study which was conducted during 2006 in the outpatient rheumatology clinic of our university hospital, Short Form 36 (SF-36) was used for HRQOL measurements in 411 RA patients. A cutoff point to define poor versus good HRQOL was calculated using the first quartiles of SF-36 physical and mental component scores (33.4 and 36.8, respectively). Two distinct logistic regression models were used to derive predictive variables including demographic, clinical, and psychological factors. The sensitivity, specificity, and accuracy of each model were calculated. Poor physical HRQOL was positively associated with pain score, disease duration, monthly family income below 300 US$, comorbidity, patient global assessment of disease activity or PGA, and depression (odds ratios: 1.1; 1.004; 15.5; 1.1; 1.02; 2.08, respectively). The variables that entered into the poor mental HRQOL prediction model were monthly family income below 300 US$, comorbidity, PGA, and bodily pain (odds ratios: 6.7; 1.1; 1.01; 1.01, respectively). Optimal sensitivity and specificity were achieved at a cutoff point of 0.39 for the estimated probability of poor physical HRQOL and 0.18 for mental HRQOL. Sensitivity, specificity, and accuracy of the physical and mental models were 73.8, 87, 83.7% and 90.38, 70.36, 75.43%, respectively. The results show that the suggested models can be used to predict poor physical and mental HRQOL separately among RA patients using simple variables with acceptable accuracy. These models can be of use in the clinical decision-making of RA patients and to recognize patients with poor physical or mental HRQOL in advance, for better management.
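The way a probability cutoff turns model output into sensitivity, specificity and accuracy can be sketched as follows. The cutoff of 0.39 is the one reported for the physical HRQOL model; the predicted probabilities and labels are invented toy values, and the function name `metrics_at_cutoff` is hypothetical.

```python
def metrics_at_cutoff(probabilities, labels, cutoff):
    """Return (sensitivity, specificity, accuracy) for a given probability cutoff.

    labels: 1 = poor HRQOL (the event being predicted), 0 = good HRQOL.
    A case is predicted positive when its estimated probability >= cutoff.
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, labels):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical estimated probabilities for five patients.
toy_probabilities = [0.1, 0.2, 0.4, 0.5, 0.8]
toy_labels = [0, 0, 1, 0, 1]
print(metrics_at_cutoff(toy_probabilities, toy_labels, 0.39))
```

Lowering the cutoff raises sensitivity at the expense of specificity, which is consistent with the mental HRQOL model (cutoff 0.18) reporting higher sensitivity and lower specificity than the physical model.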
Bejaei, M; Wiseman, K; Cheng, K M
2015-01-01
Consumers' interest in specialty eggs appears to be growing in Europe and North America. The objective of this research was to develop logistic regression models that utilise purchaser attributes and demographics to predict the probability of a consumer purchasing a specific type of table egg including regular (white and brown), non-caged (free-run, free-range and organic) or nutrient-enhanced eggs. These purchase prediction models, together with the purchasers' attributes, can be used to assess market opportunities of different egg types specifically in British Columbia (BC). An online survey was used to gather data for the models. A total of 702 completed questionnaires were submitted by BC residents. Selected independent variables were included in the logistic regression to develop models for different egg types to predict the probability of a consumer purchasing a specific type of table egg. The variables used in the model accounted for 54% and 49% of variances in the purchase of regular and non-caged eggs, respectively. Research results indicate that consumers of different egg types exhibit a set of unique and statistically significant characteristics and/or demographics. For example, consumers of regular eggs were less educated, older, price sensitive, major chain store buyers, and store flyer users, and had lower awareness about different types of eggs and less concern regarding animal welfare issues. However, most of the non-caged egg consumers were less concerned about price, had higher awareness about different types of table eggs, purchased their eggs from local/organic grocery stores, farm gates or farmers markets, and they were more concerned about care and feeding of hens compared to consumers of other egg types.
DEFF Research Database (Denmark)
Merlo, J; Chaix, B; Ohlsson, H
2006-01-01
STUDY OBJECTIVE: In social epidemiology, it is easy to compute and interpret measures of variation in multilevel linear regression, but technical difficulties exist in the case of logistic regression. The aim of this study was to present measures of variation appropriate for the logistic case...... in a didactic rather than a mathematical way. Design and PARTICIPANTS: Data were used from the health survey conducted in 2000 in the county of Scania, Sweden, that comprised 10 723 persons aged 18-80 years living in 60 areas. Conducting multilevel logistic regression different techniques were applied...... propensity areas with the area educational level. The sorting out index was equal to 82%. CONCLUSION: Measures of variation in logistic regression should be promoted in social epidemiological and public health research as efficient means of quantifying the importance of the context of residence...
National Research Council Canada - National Science Library
Ramakrishnan, Viswanathan
2003-01-01
.... A generalized estimation equations (GEE) logistic regression model was used for the modeling. A shared trait is defined for two discrete traits based upon explicit patterns of trait concordance and discordance within twin pairs...
Institute of Scientific and Technical Information of China (English)
高鸿云; 冯金英; 徐俊冕; 郑士俊
2001-01-01
Objective: To identify the psychosocial risk factors related to emotional disorders in children. Methods: A case-control approach was used, in which diagnosis was made by clinical interview according to ICD-10 criteria. Eighty-eight cases and controls separately filled out a general condition inventory. The results were entered into a logistic regression model for analysis. Results: Children with a timid personality, without kindergarten education, or with parents who were administrative or technical personnel were prone to emotional disorders. Children who were usually counseled by their mothers had fewer emotional disorders than those who were beaten. Conclusion: Emotional disorders resulted from multiple factors. Prevention of children's emotional disorders should focus on the children's personality and family education.
Kim, Sun Mi; Kim, Yongdai; Jeong, Kuhwan; Jeong, Heeyeong; Kim, Jiyoung
2018-01-01
The aim of this study was to compare the performance of image analysis for predicting breast cancer using two distinct regression models and to evaluate the usefulness of incorporating clinical and demographic data (CDD) into the image analysis in order to improve the diagnosis of breast cancer. This study included 139 solid masses from 139 patients who underwent a ultrasonography-guided core biopsy and had available CDD between June 2009 and April 2010. Three breast radiologists retrospectively reviewed 139 breast masses and described each lesion using the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We applied and compared two regression methods-stepwise logistic (SL) regression and logistic least absolute shrinkage and selection operator (LASSO) regression-in which the BI-RADS descriptors and CDD were used as covariates. We investigated the performances of these regression methods and the agreement of radiologists in terms of test misclassification error and the area under the curve (AUC) of the tests. Logistic LASSO regression was superior (Pcomparable to the AUC with CDD (0.873 vs. 0.880, P=0.141). Logistic LASSO regression based on BI-RADS descriptors and CDD showed better performance than SL in predicting the presence of breast cancer. The use of CDD as a supplement to the BI-RADS descriptors significantly improved the prediction of breast cancer using logistic LASSO regression.
Miguel-Hurtado, Oscar; Guest, Richard; Stevenage, Sarah V; Neil, Greg J; Black, Sue
2016-01-01
Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications.
Conoscenti, Christian; Ciaccio, Marilena; Caraballo-Arias, Nathalie Almaru; Gómez-Gutiérrez, Álvaro; Rotigliano, Edoardo; Agnesi, Valerio
2015-08-01
In this paper, terrain susceptibility to earth-flow occurrence was evaluated by using geographic information systems (GIS) and two statistical methods: Logistic regression (LR) and multivariate adaptive regression splines (MARS). LR has been already demonstrated to provide reliable predictions of earth-flow occurrence, whereas MARS, as far as we know, has never been used to generate earth-flow susceptibility models. The experiment was carried out in a basin of western Sicily (Italy), which extends for 51 km2 and is severely affected by earth-flows. In total, we mapped 1376 earth-flows, covering an area of 4.59 km2. To explore the effect of pre-failure topography on earth-flow spatial distribution, we performed a reconstruction of topography before the landslide occurrence. This was achieved by preparing a digital terrain model (DTM) where altitude of areas hosting landslides was interpolated from the adjacent undisturbed land surface by using the algorithm topo-to-raster. This DTM was exploited to extract 15 morphological and hydrological variables that, in addition to outcropping lithology, were employed as explanatory variables of earth-flow spatial distribution. The predictive skill of the earth-flow susceptibility models and the robustness of the procedure were tested by preparing five datasets, each including a different subset of landslides and stable areas. The accuracy of the predictive models was evaluated by drawing receiver operating characteristic (ROC) curves and by calculating the area under the ROC curve (AUC). The results demonstrate that the overall accuracy of LR and MARS earth-flow susceptibility models is from excellent to outstanding. However, AUC values of the validation datasets attest to a higher predictive power of MARS-models (AUC between 0.881 and 0.912) with respect to LR-models (AUC between 0.823 and 0.870). The adopted procedure proved to be resistant to overfitting and stable when changes of the learning and validation samples are
Zhang, Shanyong; Yang, Lili; Peng, Chuangang; Wu, Minfei
2018-02-01
The aim of the present study was to investigate the risk factors for postoperative recurrence of spinal tumors by logistic regression analysis and analysis of prognostic factors. In total, 77 male and 48 female patients with spinal tumor were selected in our hospital from January, 2010 to December, 2015 and divided into the benign (n=76) and malignant groups (n=49). All the patients underwent microsurgical resection of spinal tumors and were reviewed regularly 3 months after operation. The McCormick grading system was used to evaluate the postoperative spinal cord function. Data were subjected to statistical analysis. Of the 125 cases, 63 cases showed improvement after operation, 50 cases were stable, and deterioration was found in 12 cases. The improvement rate of patients with cervical spine tumor, which reached 56.3%, was the highest. Fifty-two cases of sensory disturbance, 34 cases of pain, 30 cases of inability to exercise, 26 cases of ataxia, and 12 cases of sphincter disorders were found after operation. Seventy-two cases (57.6%) underwent total resection, 18 cases (14.4%) received subtotal resection, 23 cases (18.4%) received partial resection, and 12 cases (9.6%) were only treated with biopsy/decompression. Postoperative recurrence was found in 57 cases (45.6%). The mean recurrence time of patients in the malignant group was 27.49±6.09 months, and the mean recurrence time of patients in the benign group was 40.62±4.34. The results were significantly different (Pregression analysis of total resection-related factors showed that total resection should be the preferred treatment for patients with benign tumors, thoracic and lumbosacral tumors, and lower McCormick grade, as well as patients without syringomyelia and intramedullary tumors. Logistic regression analysis of recurrence-related factors revealed that the recurrence rate was relatively higher in patients with malignant, cervical, thoracic and lumbosacral, intramedullary tumors, and higher Mc
Pian, Wenjing; Khoo, Christopher Sg; Chi, Jianxing
2017-12-21
Users searching for health information on the Internet may be searching for their own health issue, searching for someone else's health issue, or browsing with no particular health issue in mind. Previous research has found that these three categories of users focus on different types of health information. However, most health information websites provide static content for all users. If the three types of user health information need contexts can be identified by the Web application, the search results or information offered to the user can be customized to increase its relevance or usefulness to the user. The aim of this study was to investigate the possibility of identifying the three user health information contexts (searching for self, searching for others, or browsing with no particular health issue in mind) using just hyperlink clicking behavior; using eye-tracking information; and using a combination of eye-tracking, demographic, and urgency information. Predictive models are developed using multinomial logistic regression. A total of 74 participants (39 females and 35 males) who were mainly staff and students of a university were asked to browse a health discussion forum, Healthboards.com. An eye tracker recorded their examining (eye fixation) and skimming (quick eye movement) behaviors on 2 types of screens: summary result screen displaying a list of post headers, and detailed post screen. The following three types of predictive models were developed using logistic regression analysis: model 1 used only the time spent in scanning the summary result screen and reading the detailed post screen, which can be determined from the user's mouse clicks; model 2 used the examining and skimming durations on each screen, recorded by an eye tracker; and model 3 added user demographic and urgency information to model 2. An analysis of variance (ANOVA) analysis found that users' browsing durations were significantly different for the three health information contexts
Osborne, Jason W.
2012-01-01
Logistic regression is slowly gaining acceptance in the social sciences, and fills an important niche in the researcher's toolkit: being able to predict important outcomes that are not continuous in nature. While OLS regression is a valuable tool, it cannot routinely be used to predict outcomes that are binary or categorical in nature. These…
Chiu, Yu-Jen; Liao, Wen-Chieh; Wang, Tien-Hsiang; Shih, Yu-Chung; Ma, Hsu; Lin, Chih-Hsun; Wu, Szu-Hsien; Perng, Cherng-Kang
2017-08-01
Despite significant advances in medical care and surgical techniques, pressure sore reconstruction is still prone to elevated rates of complication and recurrence. We conducted a retrospective study to investigate not only complication and recurrence rates following pressure sore reconstruction but also preoperative risk stratification. This study included 181 ulcers that underwent flap operations between January 2002 and December 2013. We fitted a multivariable logistic regression model, which offers a regression-based method accounting for the within-patient correlation of the success or failure of each flap. The overall complication and recurrence rates for all flaps were 46.4% and 16.0%, respectively, with a mean follow-up period of 55.4 ± 38.0 months. No statistically significant differences in complication and recurrence rates were observed among the three different reconstruction methods. In subsequent analysis, albumin ≤3.0 g/dl and paraplegia were significantly associated with higher postoperative complication rates. The anatomic factor, ischial wound location, significantly trended toward the development of ulcer recurrence. In the fasciocutaneous group, paraplegia was significantly correlated with higher complication and recurrence rates. In the musculocutaneous flap group, no variables were significantly correlated with complication and recurrence rates. In the free-style perforator group, ischial wound location and malnourished status correlated with significantly higher complication rates; ischial wound location also correlated with a significantly higher recurrence rate. Ultimately, our review of a noteworthy cohort with lengthy follow-up helped identify and confirm certain risk factors that can facilitate a more informed and thoughtful pre- and postoperative decision-making process for patients with pressure ulcers. Copyright © 2017 British Association of Plastic, Reconstructive and Aesthetic Surgeons. Published by Elsevier Ltd. All
Wilson, Asa B; Kerr, Bernard J; Bastian, Nathaniel D; Fulton, Lawrence V
2012-01-01
From 1980 to 1999, rural designated hospitals closed at a disproportionally high rate. In response to this emergent threat to healthcare access in rural settings, the Balanced Budget Act of 1997 made provisions for the creation of a new rural hospital--the critical access hospital (CAH). The conversion to CAH and the associated cost-based reimbursement scheme significantly slowed the closure rate of rural hospitals. This work investigates which methods can ensure the long-term viability of small hospitals. This article uses a two-step design to focus on a hypothesized relationship between technical efficiency of CAHs and a recently developed set of financial monitors for these entities. The goal is to identify the financial performance measures associated with efficiency. The first step uses data envelopment analysis (DEA) to differentiate efficient from inefficient facilities within a data set of 183 CAHs. Determining DEA efficiency is an a priori categorization of hospitals in the data set as efficient or inefficient. In the second step, DEA efficiency is the categorical dependent variable (efficient = 0, inefficient = 1) in the subsequent binary logistic regression (LR) model. A set of six financial monitors selected from the array of 20 measures were the LR independent variables. We use a binary LR to test the null hypothesis that recently developed CAH financial indicators had no predictive value for categorizing a CAH as efficient or inefficient, (i.e., there is no relationship between DEA efficiency and fiscal performance).
Sebastian, Tunny; Jeyaseelan, Visalakshi; Jeyaseelan, Lakshmanan; Anandan, Shalini; George, Sebastian; Bangdiwala, Shrikant I
2018-01-01
Hidden Markov models are stochastic models in which the observations are assumed to follow a mixture distribution, but the parameters of the components are governed by a Markov chain which is unobservable. The issues related to the estimation of Poisson-hidden Markov models, in which the observations come from a mixture of Poisson distributions and the parameters of the component Poisson distributions are governed by an m-state Markov chain with an unknown transition probability matrix, are explained here. These methods were applied to data on Vibrio cholerae counts reported every month over an 11-year span at Christian Medical College, Vellore, India. Using the Viterbi algorithm, the best estimate of the state sequence was obtained, and hence the transition probability matrix. The mean passage times between the states were estimated. The 95% confidence interval for the mean passage time was estimated via Monte Carlo simulation. The three hidden states of the estimated Markov chain are labelled 'Low', 'Moderate' and 'High', with mean counts of 1.4, 6.6 and 20.2 and estimated average durations of stay of 3, 3 and 4 months, respectively. Environmental risk factors were studied using Markov ordinal logistic regression analysis. No significant association was found between disease severity levels and climate components.
DEFF Research Database (Denmark)
Koop, Gerrit; Collar, Carol A.; Toft, Nils
2013-01-01
Identification of risk factors for subclinical intramammary infections (IMI) in dairy goats should contribute to improved udder health. Intramammary infection may be diagnosed by bacteriological culture or by somatic cell count (SCC) of a milk sample. Both bacteriological culture and SCC are imperfect tests, particularly lacking sensitivity, which leads to misclassification and thus to biased estimates of odds ratios in risk factor studies. The objective of this study was to evaluate risk factors for the true (latent) IMI status of major pathogens in dairy goats. We used Bayesian logistic regression models that accounted for imperfect measurement of IMI by both culture and SCC. Udder half milk samples were collected from 530 Dutch and 438 California dairy goats in 10 herds on 3 occasions during lactation. Udder halves were classified as positive or negative for isolation of a major pathogen...
Parodi, Stefano; Dosi, Corrado; Zambon, Antonella; Ferrari, Enrico; Muselli, Marco
2017-12-01
Identifying potential risk factors for problem gambling (PG) is of primary importance for planning preventive and therapeutic interventions. We illustrate a new approach based on the combination of standard logistic regression (LR) and an innovative method of supervised data mining (Logic Learning Machine or LLM). Data were taken from a pilot cross-sectional study to identify subjects with PG behaviour, assessed by two internationally validated scales (SOGS and Lie/Bet). Information was obtained from 251 gamblers recruited in six betting establishments. Data on socio-demographic characteristics, lifestyle and cognitive-related factors, and type, place and frequency of preferred gambling were obtained by a self-administered questionnaire. The following variables associated with PG were identified: instant gratification games, alcohol abuse, cognitive distortion, illegal behaviours and having started gambling with a relative or a friend. Furthermore, the combination of LLM and LR indicated the presence of two different types of PG, namely: (a) daily gamblers, more prone to illegal behaviour, with poor money management skills and who started gambling at an early age, and (b) non-daily gamblers, characterised by superstitious beliefs and a higher preference for immediate reward games. Finally, instant gratification games were strongly associated with the number of games usually played. Studies on gamblers habitually frequenting betting shops are rare. The finding of different types of PG among habitual gamblers deserves further analysis in larger studies. Advanced data mining algorithms, like LLM, are powerful tools and potentially useful in identifying risk factors for PG.
Einav, Sharon; Alon, Gady; Kaufman, Nechama; Braunstein, Rony; Carmel, Sara; Varon, Joseph; Hersch, Moshe
2012-09-01
To determine whether variables in physicians' backgrounds influenced their decision to forego resuscitating a patient they did not previously know. Questionnaire survey of a convenience sample of 204 physicians working in the departments of internal medicine, anaesthesiology and cardiology in 11 hospitals in Israel. Twenty per cent of the participants had elected to forego resuscitating a patient they did not previously know without additional consultation. Physicians who had more frequently elected to forego resuscitation had practised medicine for more than 5 years (p=0.013), estimated the number of resuscitations they had performed as being higher (p=0.009), and perceived their experience in resuscitation as sufficient (p=0.001). The variable that predicted the outcome of always performing resuscitation in the logistic regression model was less than 5 years of experience in medicine (OR 0.227, 95% CI 0.065 to 0.793; p=0.02). Physicians' level of experience may affect the probability of a patient's receiving resuscitation, whereas the physicians' personal beliefs and values did not seem to affect this outcome.
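The reported odds ratio, confidence interval and p-value in the abstract above can be cross-checked against each other. The following is a minimal sketch, assuming the interval is a Wald interval (symmetric on the log-odds scale, built with a critical value of 1.96); the abstract does not state how the interval was computed, and the function name is hypothetical.

```python
import math


def wald_from_ci(odds_ratio, ci_lower, ci_upper, z_crit=1.96):
    """Recover the log-scale standard error, Wald z and two-sided p-value.

    Assumes (not stated in the abstract) that the interval is
    exp(log(OR) +/- z_crit * SE), i.e. a standard Wald interval.
    """
    log_or = math.log(odds_ratio)
    # The width of the interval on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    z = log_or / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values quoted in the abstract: OR 0.227, 95% CI 0.065 to 0.793.
se, z, p = wald_from_ci(0.227, 0.065, 0.793)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the recovered p-value is close to the reported p=0.02, which suggests the three figures are mutually consistent.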
Tang, Li-Na; Ye, Xiao-Zhou; Yan, Qiu-Ge; Chang, Hong-Juan; Ma, Yu-Qiao; Liu, De-Bin; Li, Zhi-Gen; Yu, Yi-Zhen
2017-02-01
The risk factors for high trait anger in juvenile offenders were explored through a questionnaire study in a youth correctional facility of Hubei province, China. A total of 1090 juvenile offenders in Hubei province were investigated using a self-compiled social-demographic questionnaire, the Childhood Trauma Questionnaire (CTQ), and the State-Trait Anger Expression Inventory-II (STAXI-II). The risk factors were analyzed by chi-square tests, correlation analysis, and binary logistic regression analysis with SPSS 19.0. A total of 1082 valid questionnaires were collected. The high trait anger group (n=316) was defined as those who scored in the upper 27th percentile of the STAXI-II trait anger scale (TAS), and the rest were defined as the low trait anger group (n=766). The risk factors associated with a high level of trait anger included: childhood emotional abuse, childhood sexual abuse, stepfamily, frequent drug abuse, and frequent internet use (P0.05). It was suggested that traumatic experience in childhood and an unhealthy lifestyle may significantly increase the level of trait anger in adulthood. The risk factors for high trait anger and their effects should be taken into serious consideration.
Caldwell, A R; Terhorst, L; Skidmore, E R; Bendixen, R M
2018-01-23
The present study aimed to examine the associations between frequency of family meals and low fruit and vegetable intake in preschool children. Promoting healthy nutrition early in life is recommended for combating childhood obesity. Frequency of family meals is associated with fruit and vegetable intake in school-age children and adolescents; the relationship in young children is less clear. We completed a secondary analysis using data from the Early Childhood Longitudinal Study-Birth Cohort. Participants included children, born in the year 2001, to mothers who were >15 years old (n = 8 950). Data were extracted from structured parent interviews during the year prior to kindergarten. We used hierarchical logistic regression to describe the relationships between frequency of family meals and low fruit and vegetable intake. Frequency of family meals was associated with low fruit and vegetable intake. The odds of low fruit and vegetable intake were greater for preschoolers who shared less than three evening family meals per week (odds ratio = 1.5, β = 0.376, P meal with family every night. Fruit and vegetable intake is related to frequency of family meals in preschool-age children. Educating parents about the potential benefits of frequent shared meals may lead to a higher fruit and vegetable consumption among preschoolers. Future studies should address other factors that likely contribute to eating patterns during the preschool years. © 2018 The British Dietetic Association Ltd.
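In a logistic regression, the odds ratio for a predictor is the exponential of its coefficient. The short sketch below applies that standard relationship to the coefficient quoted in the abstract above (β = 0.376); the function name is hypothetical and nothing beyond the two quoted numbers comes from the study.

```python
import math


def odds_ratio_from_beta(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


# Coefficient quoted in the abstract.
beta = 0.376
print(f"exp({beta}) = {odds_ratio_from_beta(beta):.3f}")
```

Rounded to one decimal place, the result agrees with the odds ratio of 1.5 reported alongside the coefficient.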
Thompson, E. David; Bowling, Bethany V.; Markle, Ross E.
2018-02-01
Studies over the last 30 years have considered various factors related to student success in introductory biology courses. While much of the available literature suggests that the best predictors of success in a college course are prior college grade point average (GPA) and class attendance, faculty often require a valuable predictor of success in those courses wherein the majority of students are in the first semester and have no previous record of college GPA or attendance. In this study, we evaluated the efficacy of the ACT Mathematics subject exam and Lawson's Classroom Test of Scientific Reasoning in predicting success in a major's introductory biology course. A logistic regression was utilized to determine the effectiveness of a combination of scientific reasoning (SR) scores and ACT math (ACT-M) scores to predict student success. In summary, we found that the model—with both SR and ACT-M as significant predictors—could be an effective predictor of student success and thus could potentially be useful in practical decision making for the course, such as directing students to support services at an early point in the semester.
Liu, Hongjie; Li, Tianhao; Zhan, Sha; Pan, Meilan; Ma, Zhiguo; Li, Chenghua
2016-01-01
Aims. To establish a logistic regression (LR) prediction model for hepatotoxicity of Chinese herbal medicines (HMs) based on traditional Chinese medicine (TCM) theory and to provide a statistical basis for predicting hepatotoxicity of HMs. Methods. The correlations of hepatotoxic and nonhepatotoxic Chinese HMs with four properties, five flavors, and channel tropism were analyzed with chi-square test for two-way unordered categorical data. LR prediction model was established and the accuracy of the prediction by this model was evaluated. Results. The hepatotoxic and nonhepatotoxic Chinese HMs were related with four properties (p 0.05). There were totally 12 variables from four properties and five flavors for the LR. Four variables, warm and neutral of the four properties and pungent and salty of five flavors, were selected to establish the LR prediction model, with the cutoff value being 0.204. Conclusions. Warm and neutral of the four properties and pungent and salty of five flavors were the variables to affect the hepatotoxicity. Based on such results, the established LR prediction model had some predictive power for hepatotoxicity of Chinese HMs. PMID:27656240
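To illustrate how a fitted logistic model with a probability cutoff turns herb properties into a prediction, here is a minimal sketch. Only the four variable names and the cutoff of 0.204 come from the abstract above; the intercept and coefficient values are hypothetical placeholders (the abstract does not report them), and treating a probability equal to the cutoff as positive is also an assumption.

```python
import math

CUTOFF = 0.204  # cutoff value reported in the abstract

# HYPOTHETICAL values for illustration only; the abstract reports neither
# the intercept nor the coefficients (nor their signs).
HYPOTHETICAL_INTERCEPT = -1.5
HYPOTHETICAL_COEFFS = {"warm": 0.8, "neutral": -0.5, "pungent": 0.6, "salty": -0.4}


def logistic(z):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_hepatotoxic(features, coeffs=HYPOTHETICAL_COEFFS,
                        intercept=HYPOTHETICAL_INTERCEPT, cutoff=CUTOFF):
    """Return (probability, flag) for a herb described by 0/1 indicators."""
    z = intercept + sum(coeffs[name] * features.get(name, 0) for name in coeffs)
    p = logistic(z)
    return p, p >= cutoff


p, flag = predict_hepatotoxic({"warm": 1, "pungent": 1})
print(f"probability = {p:.3f}, predicted hepatotoxic = {flag}")
```

A cutoff well below 0.5, as reported here, makes the classifier flag more herbs as potentially hepatotoxic, which trades specificity for sensitivity.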
Analyses of non-fatal accidents in an opencast mine by logistic regression model - a case study.
Onder, Seyhan; Mutlu, Mert
2017-09-01
Accidents cause major damage for both workers and enterprises in the mining industry. To reduce the number of occupational accidents, these incidents should be properly registered and carefully analysed. This study efficiently examines the Aegean Lignite Enterprise (ELI) of Turkish Coal Enterprises (TKI) in Soma between 2006 and 2011, and opencast coal mine occupational accident records were used for statistical analyses. A total of 231 occupational accidents were analysed for this study. The accident records were categorized into seven groups: area, reason, occupation, part of body, age, shift hour and lost days. The SPSS package program was used in this study for logistic regression analyses, which predicted the probability of accidents resulting in greater or less than 3 lost workdays for non-fatal injuries. Social facilities-area of surface installations, workshops and opencast mining areas are the areas with the highest probability for accidents with greater than 3 lost workdays for non-fatal injuries, while the reasons with the highest probability for these types of accidents are transporting and manual handling. Additionally, the model was tested for such reported accidents that occurred in 2012 for the ELI in Soma and estimated the probability of exposure to accidents with lost workdays correctly by 70%.
Credit Scoring Problem Based on Regression Analysis
Khassawneh, Bashar Suhil Jad Allah
2014-01-01
ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....
Energy Technology Data Exchange (ETDEWEB)
Boutilier, J; Chan, T; Lee, T [University of Toronto, Toronto, Ontario (Canada); Craig, T; Sharpe, M [University of Toronto, Toronto, Ontario (Canada); The Princess Margaret Cancer Centre - UHN, Toronto, ON (Canada)
2014-06-15
Purpose: To develop a statistical model that predicts optimization objective function weights from patient geometry for intensity-modulation radiotherapy (IMRT) of prostate cancer. Methods: A previously developed inverse optimization method (IOM) is applied retrospectively to determine optimal weights for 51 treated patients. We use an overlap volume ratio (OVR) of bladder and rectum for different PTV expansions in order to quantify patient geometry in explanatory variables. Using the optimal weights as ground truth, we develop and train a logistic regression (LR) model to predict the rectum weight and thus the bladder weight. Post hoc, we fix the weights of the left femoral head, right femoral head, and an artificial structure that encourages conformity to the population average while normalizing the bladder and rectum weights accordingly. The population average of objective function weights is used for comparison. Results: The OVR at 0.7cm was found to be the most predictive of the rectum weights. The LR model performance is statistically significant when compared to the population average over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and mean voxel dose to the bladder, rectum, CTV, and PTV. On average, the LR model predicted bladder and rectum weights that are both 63% closer to the optimal weights compared to the population average. The treatment plans resulting from the LR weights have, on average, a rectum V70Gy that is 35% closer to the clinical plan and a bladder V70Gy that is 43% closer. Similar results are seen for bladder V54Gy and rectum V54Gy. Conclusion: Statistical modelling from patient anatomy can be used to determine objective function weights in IMRT for prostate cancer. Our method allows the treatment planners to begin the personalization process from an informed starting point, which may lead to more consistent clinical plans and reduce overall planning time.
Chen, Wei; Li, Hui; Hou, Enke; Wang, Shengquan; Wang, Guirong; Panahi, Mahdi; Li, Tao; Peng, Tao; Guo, Chen; Niu, Chao; Xiao, Lele; Wang, Jiale; Xie, Xiaoshen; Ahmad, Baharin Bin
2018-09-01
The aim of the current study was to produce groundwater spring potential maps using novel ensemble weights-of-evidence (WoE) with logistic regression (LR) and functional tree (FT) models. First, a total of 66 springs were identified by field surveys, out of which 70% of the spring locations were used for training the models and 30% of the spring locations were employed for the validation process. Second, a total of 14 affecting factors including aspect, altitude, slope, plan curvature, profile curvature, stream power index (SPI), topographic wetness index (TWI), sediment transport index (STI), lithology, normalized difference vegetation index (NDVI), land use, soil, distance to roads, and distance to streams was used to analyze the spatial relationship between these affecting factors and spring occurrences. Multicollinearity analysis and feature selection of the correlation attribute evaluation (CAE) method were employed to optimize the affecting factors. Subsequently, the novel ensembles of the WoE, LR, and FT models were constructed using the training dataset. Finally, the receiver operating characteristic (ROC) curves, standard error, confidence interval (CI) at 95%, and significance level P were employed to validate and compare the performance of three models. Overall, all three models performed well for groundwater spring potential evaluation. The prediction capability of the FT model, with the highest AUC values, the smallest standard errors, the narrowest CIs, and the smallest P values for the training and validation datasets, is better compared to those of other models. The groundwater spring potential maps can be adopted for the management of water resources and land use by planners and engineers. Copyright © 2018 Elsevier B.V. All rights reserved.
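The validation procedure in the abstract above (a 70/30 split of spring locations, followed by comparison of models by area under the ROC curve) can be sketched as follows. This is an illustrative sketch, not the authors' code: the rounding rule for the split is an assumption, and the AUC is computed with the generic pairwise-ranking definition rather than any specific software the study may have used.

```python
def split_counts(n_total, train_fraction):
    """Number of training and validation samples (rounding is an assumption)."""
    n_train = round(n_total * train_fraction)
    return n_train, n_total - n_train


def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one; ties count as one half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# 66 springs split 70% / 30%, as described in the abstract.
print(split_counts(66, 0.7))

# Toy model scores (hypothetical) for spring and non-spring locations.
print(auc([0.9, 0.4], [0.6, 0.1]))
```

The pairwise form is quadratic in the sample size, which is acceptable for a few dozen locations; for large rasters a rank-based implementation would be the usual choice.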
Directory of Open Access Journals (Sweden)
Sepedeh Gholizadeh
2016-07-01
Background: Obesity and hypertension are among the most important non-communicable diseases, and many studies have examined their prevalence and risk factors univariately in individual geographic regions. Studying the factors that affect obesity and hypertension jointly may play an important role, and this is addressed in the present study. Materials & Methods: This cross-sectional study was conducted on 1000 men aged 20-70 living in Bushehr province. Blood pressure was measured three times and the average was used as one of the response variables. Hypertension was defined as systolic blood pressure ≥140 and/or diastolic blood pressure ≥90, and obesity was defined as body mass index ≥25. Data were analyzed using a multilevel, multivariate logistic regression model in MLwiN software. Results: Intraclass correlations at the cluster level were 33% for high blood pressure and 37% for obesity, so a two-level model was fitted to the data. The prevalence of obesity and hypertension was 43.6% (95% CI: 40.6-46.5) and 29.4% (95% CI: 26.6-32.1), respectively. Age, gender, smoking, hyperlipidemia, diabetes, fruit and vegetable consumption, and physical activity were the factors affecting blood pressure (p≤0.05). Age, gender, hyperlipidemia, diabetes, fruit and vegetable consumption, physical activity, and place of residence affected obesity (p≤0.05). Conclusion: Multilevel models that take the level structure into account provide more precise estimates. Because obesity and hypertension are major risk factors for cardiovascular disease, identifying the high-risk groups allows careful planning for the prevention of non-communicable diseases and the promotion of public health.
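The two binary outcomes in the abstract above are derived from simple thresholds, which the following sketch encodes. The thresholds (systolic ≥140 and/or diastolic ≥90 on the average of three readings, body mass index ≥25) come from the abstract; the function names, the units (mmHg, kg, m) and the example readings are assumptions for illustration.

```python
def mean_blood_pressure(readings):
    """Average a list of (systolic, diastolic) readings."""
    n = len(readings)
    systolic = sum(s for s, _ in readings) / n
    diastolic = sum(d for _, d in readings) / n
    return systolic, diastolic


def is_hypertensive(readings):
    """Systolic >= 140 and/or diastolic >= 90 on the averaged readings."""
    systolic, diastolic = mean_blood_pressure(readings)
    return systolic >= 140 or diastolic >= 90


def is_obese(weight_kg, height_m):
    """Body mass index >= 25, the obesity definition used in the abstract."""
    return weight_kg / height_m ** 2 >= 25


# Hypothetical participant: three readings, 80 kg, 1.75 m.
readings = [(138, 85), (142, 88), (141, 86)]
print(is_hypertensive(readings), is_obese(80, 1.75))
```

Each participant thus contributes a pair of 0/1 responses, which is what allows the two outcomes to be modelled jointly in a multivariate logistic regression.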
Mao, Hui-Fen; Chang, Ling-Hui; Tsai, Athena Yi-Jung; Huang, Wen-Ni; Wang, Jye
2016-01-01
Because resources for long-term care services are limited, timely and appropriate referral for rehabilitation services is critical for optimizing clients' functions and successfully integrating them into the community. We investigated which client characteristics are most relevant in predicting Taiwan's community-based occupational therapy (OT) service referral based on experts' beliefs. Data were collected in face-to-face interviews using the Multidimensional Assessment Instrument (MDAI). Community-dwelling participants (n = 221) ≥ 18 years old who reported disabilities in the previous National Survey of Long-term Care Needs in Taiwan were enrolled. The standard for referral was the judgment and agreement of two experienced occupational therapists who reviewed the results of the MDAI. Logistic regressions and Generalized Additive Models were used for analysis. Two predictive models were proposed, one using basic activities of daily living (BADLs) and one using instrumental ADLs (IADLs). Dementia, psychiatric disorders, cognitive impairment, joint range-of-motion limitations, fear of falling, behavioral or emotional problems, expressive deficits (in the BADL-based model), and limitations in IADLs or BADLs were significantly correlated with the need for referral. Both models showed high area under the curve (AUC) values on receiver operating curve testing (AUC = 0.977 and 0.972, respectively). The probability of being referred for community OT services was calculated using the referral algorithm. The referral protocol facilitated communication between healthcare professionals to make appropriate decisions for OT referrals. The methods and findings should be useful for developing referral protocols for other long-term care services.
Directory of Open Access Journals (Sweden)
Hon-Yi Shi
BACKGROUND: Since most published articles comparing the performance of artificial neural network (ANN) models and logistic regression (LR) models for predicting hepatocellular carcinoma (HCC) outcomes used only a single dataset, the essential issue of internal validity (reproducibility) of the models has not been addressed. This study aimed to validate the use of an ANN model for predicting in-hospital mortality in HCC surgery patients in Taiwan and to compare the predictive accuracy of the ANN model with that of an LR model. METHODOLOGY/PRINCIPAL FINDINGS: Patients who underwent HCC surgery during the period from 1998 to 2009 were included in the study. This study retrospectively compared 1,000 pairs of LR and ANN models based on initial clinical data for 22,926 HCC surgery patients. For each pair of ANN and LR models, the area under the receiver operating characteristic (AUROC) curve, the Hosmer-Lemeshow (H-L) statistic, and the accuracy rate were calculated and compared using paired t-tests. A global sensitivity analysis was also performed to assess the relative significance of input parameters in the system model and the relative importance of variables. Compared to the LR models, the ANN models had a better accuracy rate in 97.28% of cases, a better H-L statistic in 41.18% of cases, and a better AUROC curve in 84.67% of cases. Surgeon volume was the most influential (sensitive) parameter affecting in-hospital mortality, followed by age and length of stay. CONCLUSIONS/SIGNIFICANCE: In comparison with the conventional LR model, the ANN model in the study was more accurate in predicting in-hospital mortality and had higher overall performance indices. Further studies of this model may consider the effect of a more detailed database that includes complications and clinical examination findings as well as more detailed outcome data.
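The AUROC used to compare each ANN/LR pair has a simple rank interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition. The labels and scores are made-up toy values and `model_a` / `model_b` are hypothetical names; none of this reproduces the study's data or models.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = event, 0 = no event; scores are predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
model_b = [0.9, 0.6, 0.2, 0.7, 0.3, 0.1]

print(f"AUROC model_a: {auroc(labels, model_a):.3f}")
print(f"AUROC model_b: {auroc(labels, model_b):.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based implementation would be preferable for a dataset of the size described above.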
Directory of Open Access Journals (Sweden)
Abdelfattah M. Selim
2018-03-01
Aim: The present cross-sectional study was conducted to determine the seroprevalence of, and potential risk factors associated with, bovine viral diarrhea virus (BVDV) disease in cattle and buffaloes in Egypt, to model the potential risk factors associated with the disease using logistic regression (LR) models, and to fit the best predictive model for the current data. Materials and Methods: A total of 740 blood samples were collected between November 2012 and March 2013 from animals aged between 6 months and 3 years. The potential risk factors studied were species, age, sex, and herd location. All serum samples were examined with an indirect ELISA test for antibody detection. Data were analyzed with different statistical approaches, including the Chi-square test, odds ratios (OR), and univariable and multivariable LR models. Results: There was a non-significant association between BVDV seropositivity and all risk factors except animal species. Seroprevalence was 40% for cattle and 23% for buffaloes. The ORs for all categories were close to one; the highest OR, for cattle relative to buffaloes, was 2.237. Likelihood ratio tests showed a significant drop in the -2LL from the univariable LR models to the multivariable LR model. Conclusion: There was evidence of a high seroprevalence of BVDV among cattle compared with buffaloes, with the possibility of infection in different age groups of animals. In addition, the multivariable LR model provided more information for association and prediction purposes than univariable LR models and Chi-square tests when more than one predictor is available.
Perumal, Vanamail
2014-07-01
To assess reproductive risk factors for anaemia among pregnant women in urban and rural areas of India. The International Institute of Population Sciences, India, carried out third National Family Health Survey in 2005-2006 to estimate a key indicator from a sample of ever-married women in the reproductive age group 15-49 years. Data on various dimensions were collected using a structured questionnaire, and anaemia was measured using a portable HemoCue instrument. Anaemia prevalence among pregnant women was compared between rural and urban areas using chi-square test and odds ratio. Multinomial logistic regression analysis was used to determine risk factors. Anaemia prevalence was assessed among 3355 pregnant women from rural areas and 1962 pregnant women from urban areas. Moderate-to-severe anaemia in rural areas (32.4%) is significantly more common than in urban areas (27.3%) with an excess risk of 30%. Gestational age specific prevalence of anaemia significantly increases in rural areas after 6 months. Pregnancy duration is a significant risk factor in both urban and rural areas. In rural areas, increasing age at marriage and mass media exposure are significant protective factors of anaemia. However, more births in the last five years, alcohol consumption and smoking habits are significant risk factors. In rural areas, various reproductive factors and lifestyle characteristics constitute significant risk factors for moderate-to-severe anaemia. Therefore, intensive health education on reproductive practices and the impact of lifestyle characteristics are warranted to reduce anaemia prevalence. © 2014 John Wiley & Sons Ltd.
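The rural versus urban comparison above rests on an odds ratio between two prevalences. As a rough check, the sketch below recomputes it from the rounded percentages in the abstract. The abstract does not state exactly how the 30% excess risk was derived, so the result should only be expected to land in the same neighborhood.

```python
def odds(probability: float) -> float:
    """Convert a probability (0 < p < 1) into odds."""
    return probability / (1.0 - probability)


def odds_ratio(p_exposed: float, p_reference: float) -> float:
    """Odds ratio of the exposed group relative to the reference group."""
    return odds(p_exposed) / odds(p_reference)


# Prevalence of moderate-to-severe anaemia as reported in the abstract.
rural_prevalence = 0.324
urban_prevalence = 0.273

ratio = odds_ratio(rural_prevalence, urban_prevalence)
print(f"odds ratio (rural vs urban): {ratio:.2f}")
print(f"prevalence ratio (rural vs urban): {rural_prevalence / urban_prevalence:.2f}")
```

The odds ratio (about 1.28 from these rounded inputs) is larger than the simple prevalence ratio, which is the usual pattern when the outcome is common.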
Yao, W.; Poleswki, P.; Krzystek, P.
2016-06-01
The recent success of deep convolutional neural networks (CNN) in a large number of applications can be attributed to large amounts of available training data and increasing computing power. In this paper, a semantic pixel labelling scheme for urban areas using multi-resolution CNN and hand-crafted spatial-spectral features of airborne remotely sensed data is presented. Both CNN and hand-crafted features are applied to image/DSM patches to produce per-pixel class probabilities with an L1-norm regularized logistic regression classifier. Evidence theory infers a degree of belief for pixel labelling from the different sources, smoothing regions by handling the conflicts between the two classifiers while reducing the uncertainty. The aerial data used in this study were provided by ISPRS as benchmark datasets for 2D semantic labelling tasks in urban areas and consist of two data sources, LiDAR and a color infrared camera. The test sites are parts of a city in Germany that is assumed to consist of typical object classes including impervious surfaces, trees, buildings, low vegetation, vehicles and clutter. The evaluation is based on the computation of pixel-based confusion matrices by random sampling. The performance of the strategy with respect to scene characteristics and method combination strategies is analyzed and discussed. The competitive classification accuracy can be explained not only by the nature of the input data sources (e.g. the above-ground height in the nDSM highlights the vertical dimension of houses, trees and even cars, and the near-infrared spectrum indicates vegetation), but also by the decision-level fusion of the CNN's texture-based approach with multichannel spatial-spectral hand-crafted features based on evidence combination theory.
Pang, Tiantian; Huang, Leidan; Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Gong, Xuehao; Liu, Weixiang
2017-01-01
The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast-enhanced ultrasound (CEUS). Twelve suspicious sonographic features of those nodules were used to assess them. The significant features for diagnosing thyroid nodules were selected by logistic regression analysis. All variables that were statistically related to diagnosis of thyroid nodules, at a level of p regression analysis model. The significant features in the logistic regression model for diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of the logistic regression analysis, a formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating characteristic (ROC) curve was 0.930, and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79%, respectively.
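The five performance figures reported at the end of this abstract all derive from one 2x2 confusion matrix. The sketch below shows the standard definitions. The counts passed in are hypothetical round numbers chosen for readability, not the study's confusion matrix, and the function name is illustrative.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard 2x2 diagnostic summaries (malignant = positive class)."""
    return {
        "sensitivity": tp / (tp + fn),           # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),           # benign nodules correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),                   # positive predictive value
        "npv": tn / (tn + fn),                   # negative predictive value
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Sensitivity and specificity describe the test itself, whereas PPV and NPV also depend on how common malignancy is in the sample, so the latter two do not transfer directly to populations with a different prevalence.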
García-Rodríguez, M. J.; Malpica, J. A.; Benito, B.
2009-04-01
In recent years, interest in landslide hazard assessment studies has increased substantially. They are appropriate for evaluation and mitigation plan development in landslide-prone areas. There are several techniques available for landslide hazard research at a regional scale. Generally, they can be classified in two groups: qualitative and quantitative methods. Most of qualitative methods tend to be subjective, since they depend on expert opinions and represent hazard levels in descriptive terms. On the other hand, quantitative methods are objective and they are commonly used due to the correlation between the instability factors and the location of the landslides. Within this group, statistical approaches and new heuristic techniques based on artificial intelligence (artificial neural network (ANN), fuzzy logic, etc.) provide rigorous analysis to assess landslide hazard over large regions. However, they depend on qualitative and quantitative data, scale, types of movements and characteristic factors used. We analysed and compared an approach for assessing earthquake-triggered landslides hazard using logistic regression (LR) and artificial neural networks (ANN) with a back-propagation learning algorithm. One application has been developed in El Salvador, a country of Central America where the earthquake-triggered landslides are usual phenomena. In a first phase, we analysed the susceptibility and hazard associated to the seismic scenario of the 2001 January 13th earthquake. We calibrated the models using data from the landslide inventory for this scenario. These analyses require input variables representing physical parameters to contribute to the initiation of slope instability, for example, slope gradient, elevation, aspect, mean annual precipitation, lithology, land use, and terrain roughness, while the occurrence or non-occurrence of landslides is considered as dependent variable. The results of the landslide susceptibility analysis are checked using landslide
Ng, Kar Yong; Awang, Norhashidah
2018-01-06
Frequent haze occurrences in Malaysia have made the management of PM10 (particulate matter with an aerodynamic diameter of less than 10 μm) pollution a critical task. This requires knowledge of the factors associated with PM10 variation and good forecasts of PM10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built: a multiple linear regression (MLR) model with lagged predictor variables (MLR1), an MLR model with lagged predictor variables and lagged PM10 concentrations (MLR2), and a regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM10 variation in Peninsular Malaysia. Comparison among the three models showed that the MLR2 model was on the same level as the RTSE model in terms of forecasting accuracy, while the MLR1 model was the worst.
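The difference between the MLR1 and MLR2 designs comes down to which columns enter the lagged design matrix. The sketch below builds both from a daily series, assuming a lag of exactly one day (the abstract says 1-day-ahead forecasting but does not list the lags used). The function name and the toy numbers are hypothetical.

```python
def one_day_ahead_pairs(pm10, predictors, include_pm10_lag):
    """Pair day-t predictors (optionally plus day-t PM10) with the day t+1 PM10 target."""
    rows, targets = [], []
    for t in range(len(pm10) - 1):
        row = list(predictors[t])
        if include_pm10_lag:
            row.append(pm10[t])  # MLR2-style: yesterday's PM10 as an extra regressor
        rows.append(row)
        targets.append(pm10[t + 1])
    return rows, targets


# Toy daily data: each predictor row is [humidity, wind_speed] (illustrative only).
pm10_series = [50.0, 55.0, 60.0]
predictor_rows = [[80.0, 1.2], [75.0, 1.5], [70.0, 1.1]]

mlr1_rows, mlr1_targets = one_day_ahead_pairs(pm10_series, predictor_rows, False)
mlr2_rows, mlr2_targets = one_day_ahead_pairs(pm10_series, predictor_rows, True)
print(mlr1_rows, mlr1_targets)
print(mlr2_rows, mlr2_targets)
```

Either set of rows and targets can then be passed to any ordinary least squares routine; the last day is dropped because it has no next-day target.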
Brian S. Cade; Barry R. Noon; Rick D. Scherer; John J. Keane
2017-01-01
Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical...
Mumcu Kucuker, Derya; Baskent, Emin Zeki
2015-01-01
Integration of non-wood forest products (NWFPs) into forest management planning has become an increasingly important issue in forestry over the last decade. Among NWFPs, mushrooms are valued due to their medicinal, commercial, high nutritional and recreational importance. Commercial mushroom harvesting also provides important income to local dwellers and contributes to the economic value of regional forests. Sustainable management of these products at the regional scale requires information on their locations in diverse forest settings and the ability to predict and map their spatial distributions over the landscape. This study focuses on modeling the spatial distribution of commercially harvested Lactarius deliciosus and L. salmonicolor mushrooms in the Kızılcasu Forest Planning Unit, Turkey. The best models were developed based on topographic, climatic and stand characteristics, separately through logistic regression analysis using SPSS™. The best topographic model provided better classification success (69.3 %) than the best climatic (65.4 %) and stand (65 %) models. However, the overall best model, with 73 % overall classification success, used a mix of several variables. The best models were integrated into an Arc/Info GIS program to create spatial distribution maps of L. deliciosus and L. salmonicolor in the planning area. Our approach may be useful to predict the occurrence and distribution of other NWFPs and provide a valuable tool for designing silvicultural prescriptions and preparing multiple-use forest management plans.
Analysis of γ spectra in airborne radioactivity measurements using multiple linear regressions
International Nuclear Information System (INIS)
Bao Min; Shi Quanlin; Zhang Jiamei
2004-01-01
This paper describes the calculation of the net peak counts of the nuclide 137Cs at 662 keV in γ spectra from airborne radioactivity measurements using multiple linear regression. A mathematical model is built by analyzing every factor that contributes to the Cs peak counts in the spectra, and a multiple linear regression function is established. The calculation uses stepwise regression, and insignificant factors are eliminated by the F test. The regression results and their uncertainty are calculated by least squares estimation, from which the Cs peak net counts and their uncertainty are obtained. The analysis results for an experimental spectrum are presented. The influence of energy shift and energy resolution on the results is discussed. In comparison with the spectrum stripping method, the multiple linear regression method does not need stripping ratios, the result depends only on the counts in the Cs peak, and the calculated uncertainty is reduced. (authors)
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression
Abdul Jameel, Abdul Gani; Naser, Nimal; Emwas, Abdul-Hamid M.; Dooley, Stephen; Sarathy, Mani
2016-01-01
An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN
Das, Iswar; Sahoo, Sashikant; van Westen, Cees; Stein, Alfred; Hack, Robert
2010-02-01
Landslide studies are commonly guided by ground knowledge and field measurements of rock strength and slope failure criteria. With increasing sophistication of GIS-based statistical methods, however, landslide susceptibility studies benefit from the integration of data collected from various sources and methods at different scales. This study presents a logistic regression method for landslide susceptibility mapping and verifies the result by comparing it with the geotechnical-based slope stability probability classification (SSPC) methodology. The study was carried out in a landslide-prone national highway road section in the northern Himalayas, India. Logistic regression model performance was assessed by the receiver operator characteristics (ROC) curve, showing an area under the curve equal to 0.83. Field validation of the SSPC results showed a correspondence of 72% between the high and very high susceptibility classes with present landslide occurrences. A spatial comparison of the two susceptibility maps revealed the significance of the geotechnical-based SSPC method as 90% of the area classified as high and very high susceptible zones by the logistic regression method corresponds to the high and very high class in the SSPC method. On the other hand, only 34% of the area classified as high and very high by the SSPC method falls in the high and very high classes of the logistic regression method. The underestimation by the logistic regression method can be attributed to the generalisation made by the statistical methods, so that a number of slopes existing in critical equilibrium condition might not be classified as high or very high susceptible zones.
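The 90% versus 34% comparison in this abstract is an asymmetric overlap: the share of one map's high-susceptibility cells that the other map also rates high depends on which map is taken as the reference. The sketch below illustrates this on two flattened toy rasters. The grids and the function name are hypothetical and far smaller than any real susceptibility map.

```python
def share_also_high(reference_high, other_high):
    """Fraction of cells flagged high in the reference map that the other map also flags."""
    flagged = [other for ref, other in zip(reference_high, other_high) if ref]
    if not flagged:
        raise ValueError("reference map has no high-susceptibility cells")
    return sum(1 for other in flagged if other) / len(flagged)


# Toy flattened rasters: 1 = high or very high susceptibility, 0 = otherwise.
lr_high = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]     # statistical map flags few cells
sspc_high = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # geotechnical map flags more cells

print(f"LR-high cells also high in SSPC: {share_also_high(lr_high, sspc_high):.0%}")
print(f"SSPC-high cells also high in LR: {share_also_high(sspc_high, lr_high):.0%}")
```

When one map flags a smaller area that sits mostly inside the other's flagged area, the first share is high and the second is low, which is the pattern the abstract attributes to underestimation by the logistic regression method.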
Ai, Zi-Sheng; Gao, You-Shui; Sun, Yuan; Liu, Yue; Zhang, Chang-Qing; Jiang, Cheng-Hua
2013-03-01
Risk factors for femoral neck fracture-induced avascular necrosis of the femoral head have not been elucidated clearly in middle-aged and elderly patients. Moreover, the high incidence of screw removal in China and its effect on the fate of the involved femoral head require statistical methods to reflect their intrinsic relationship. Ninety-nine patients older than 45 years with femoral neck fracture were treated by internal fixation between May 1999 and April 2004. Descriptive analysis, interaction analysis between associated factors, single factor logistic regression, multivariate logistic regression, and detailed interaction analysis were employed to explore potential relationships among associated factors. Avascular necrosis of the femoral head was found in 15 cases (15.2 %). Age × the status of implants (removal vs. maintenance) and gender × the timing of reduction were interactive according to two-factor interactive analysis. Age, the displacement of fractures, the quality of reduction, and the status of implants were found to be significant factors in single factor logistic regression analysis. Age, age × the status of implants, and the quality of reduction were found to be significant factors in multivariate logistic regression analysis. In fine interaction analysis after multivariate logistic regression analysis, implant removal was the most important risk factor for avascular necrosis in 56-to-85-year-old patients, with a risk ratio of 26.00 (95 % CI = 3.076-219.747). The middle-aged and elderly have less incidence of avascular necrosis of the femoral head following femoral neck fractures treated by cannulated screws. The removal of cannulated screws can induce a significantly high incidence of avascular necrosis of the femoral head in elderly patients, while a high-quality reduction is helpful to reduce avascular necrosis.
Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression
Beckstead, Jason W.
2012-01-01
The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…
ANALYSIS OF THE FINANCIAL PERFORMANCES OF THE FIRM, BY USING THE MULTIPLE REGRESSION MODEL
Directory of Open Access Journals (Sweden)
Constantin Anghelache
2011-11-01
The information obtained through simple linear regression is not always sufficient to characterize the evolution of an economic phenomenon or to identify its possible future evolution. To remedy these drawbacks, the specialized literature includes multiple regression models, in which the evolution of the dependent variable is defined in terms of two or more factorial variables.
Tightness of M-estimators for multiple linear regression in time series
DEFF Research Database (Denmark)
Johansen, Søren; Nielsen, Bent
We show tightness of a general M-estimator for multiple linear regression in time series. The positive criterion function for the M-estimator is assumed lower semi-continuous and sufficiently large for large argument: Particular cases are the Huber-skip and quantile regression. Tightness requires...
Fei, Yang; Hu, Jian; Gao, Kun; Tu, Jianfeng; Li, Wei-Qin; Wang, Wei
2017-06-01
To construct a radial basis function (RBF) artificial neural networks (ANNs) model to predict the incidence of acute pancreatitis (AP)-induced portal vein thrombosis (PVT). The analysis included 353 patients with AP who were admitted between January 2011 and December 2015. An RBF ANNs model and a logistic regression model were each constructed based on eleven factors relevant to AP. Statistical indexes were used to evaluate the predictive value of the two models. The sensitivity, specificity, positive predictive value, negative predictive value and accuracy of the RBF ANNs model for predicting PVT were 73.3%, 91.4%, 68.8%, 93.0% and 87.7%, respectively. There were significant differences between the RBF ANNs and logistic regression models in these parameters (Plogistic regression model. D-dimer, AMY, Hct and PT were important prediction factors for AP-induced PVT. Copyright © 2017 Elsevier Inc. All rights reserved.
Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha
2012-05-01
Estimation of stature is an important parameter in identification of human remains in forensic examinations. The present study is aimed to compare the reliability and accuracy of stature estimation and to demonstrate the variability in estimated stature and actual stature using multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements; hand length, hand breadth, foot length and foot breadth taken on the left side in each subject were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for estimation of stature from hand and foot dimensions. Derived multiplication factors and regression formula were applied to the hand and foot measurements in the study sample. The estimated stature from the multiplication factors and regression analysis was compared with the actual stature to find the error in estimated stature. The results indicate that the range of error in estimation of stature from regression analysis method is less than that of multiplication factor method thus, confirming that the regression analysis method is better than multiplication factor analysis in stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
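The two estimation methods compared in this abstract can be sketched side by side. The multiplication factor is taken here as the mean ratio of stature to the body dimension, and the regression is an ordinary least squares line; both are common formulations but are assumptions, as the abstract does not spell out its formulas. The foot lengths and statures are synthetic values in centimetres, constructed to lie exactly on a line, so the regression fits perfectly by design and the output only illustrates the mechanics.

```python
from statistics import mean


def multiplication_factor(dimensions, statures):
    """Mean ratio of stature to the measured dimension."""
    return mean(s / d for d, s in zip(dimensions, statures))


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def max_abs_error(estimates, actual):
    """Largest absolute difference between estimated and actual stature."""
    return max(abs(e - a) for e, a in zip(estimates, actual))


# Synthetic data (cm): stature = 66 + 4 * foot_length exactly.
foot_length = [22.0, 23.0, 24.0, 25.0, 26.0]
stature = [154.0, 158.0, 162.0, 166.0, 170.0]

factor = multiplication_factor(foot_length, stature)
intercept, slope = fit_line(foot_length, stature)

mf_error = max_abs_error([factor * f for f in foot_length], stature)
reg_error = max_abs_error([intercept + slope * f for f in foot_length], stature)
print(f"multiplication factor {factor:.3f}, max error {mf_error:.2f} cm")
print(f"regression {intercept:.1f} + {slope:.1f} * foot, max error {reg_error:.2f} cm")
```

The multiplication factor forces the fitted line through the origin, so it over- and under-estimates at the extremes whenever the true relationship has a non-zero intercept. That is one plausible reason for the wider error range reported for that method.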
Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki
2014-12-01
This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model that combines a multiple linear regression model with the fuzzy c-means method. This research involved the relationship between paddy yield and 20 topsoil variates analyzed prior to planting at standard fertilizer rates. The data came from multi-location rice trials carried out by MARDI at major paddy granaries in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model and a combination of the multiple linear regression model and the fuzzy c-means method. Analyses of normality and multicollinearity indicate that the data are normally distributed without multicollinearity among the independent variables. Fuzzy c-means analysis clusters the paddy yield into two clusters before the multiple linear regression model is applied. The comparison between the two methods indicates that the hybrid of the multiple linear regression model and the fuzzy c-means method outperforms the multiple linear regression model alone, with a lower mean square error.
Directory of Open Access Journals (Sweden)
Sun Mi Kim
2018-01-01
Purpose: The aim of this study was to compare the performance of image analysis for predicting breast cancer using two distinct regression models and to evaluate the usefulness of incorporating clinical and demographic data (CDD) into the image analysis in order to improve the diagnosis of breast cancer. Methods: This study included 139 solid masses from 139 patients who underwent an ultrasonography-guided core biopsy and had available CDD between June 2009 and April 2010. Three breast radiologists retrospectively reviewed 139 breast masses and described each lesion using the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We applied and compared two regression methods, stepwise logistic (SL) regression and logistic least absolute shrinkage and selection operator (LASSO) regression, in which the BI-RADS descriptors and CDD were used as covariates. We investigated the performances of these regression methods and the agreement of radiologists in terms of test misclassification error and the area under the curve (AUC) of the tests. Results: Logistic LASSO regression was superior (P<0.05) to SL regression, regardless of whether CDD was included in the covariates, in terms of test misclassification errors (0.234 vs. 0.253, without CDD; 0.196 vs. 0.258, with CDD) and AUC (0.785 vs. 0.759, without CDD; 0.873 vs. 0.735, with CDD). However, it was inferior (P<0.05) to the agreement of three radiologists in terms of test misclassification errors (0.234 vs. 0.168, without CDD; 0.196 vs. 0.088, with CDD) and the AUC without CDD (0.785 vs. 0.844, P<0.001), but was comparable to the AUC with CDD (0.873 vs. 0.880, P=0.141). Conclusion: Logistic LASSO regression based on BI-RADS descriptors and CDD showed better performance than SL in predicting the presence of breast cancer. The use of CDD as a supplement to the BI-RADS descriptors significantly improved the prediction of breast cancer using logistic LASSO regression.
Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said
2014-09-01
In the presence of multicollinearity and multiple outliers, statistical inference for a linear regression model based on ordinary least squares (OLS) estimators is severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods, which have been reported to be less sensitive to the presence of outliers. In addition, the ridge regression technique has been employed to tackle the multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods is discussed in this study. The superiority of this approach was examined when multicollinearity and multiple outliers occurred simultaneously in multiple linear regression. This study aimed to look at the performance of several well-known estimators, namely M, MM, RIDGE, and the robust ridge regression estimators Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), and Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both the x- and y-directions), the RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observations, level of collinearity, and percentage of outliers used. However, when outliers occurred in only a single direction (y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, producing the least variance. In conclusion, robust ridge regression is the best alternative compared with robust and conventional least squares estimators when dealing with the simultaneous presence of multicollinearity and outliers.
Multiple regression approach to predict turbine-generator output for Chinshan nuclear power plant
International Nuclear Information System (INIS)
Chan, Yea-Kuang; Tsai, Yu-Ching
2017-01-01
The objective of this study is to develop a turbine cycle model using the multiple regression approach to estimate the turbine-generator output for the Chinshan Nuclear Power Plant (NPP). The plant operating data was verified using a linear regression model with a corresponding 95% confidence interval for the operating data. In this study, the key parameters were selected as inputs for the multiple regression based turbine cycle model. The proposed model was used to estimate the turbine-generator output. The effectiveness of the proposed turbine cycle model was demonstrated by using plant operating data obtained from the Chinshan NPP Unit 2. The results show that this multiple regression based turbine cycle model can be used to accurately estimate the turbine-generator output. In addition, this study also provides an alternative approach with simple and easy features to evaluate the thermal performance for nuclear power plants.
Multiple regression approach to predict turbine-generator output for Chinshan nuclear power plant
Energy Technology Data Exchange (ETDEWEB)
Chan, Yea-Kuang; Tsai, Yu-Ching [Institute of Nuclear Energy Research, Taoyuan City, Taiwan (China). Nuclear Engineering Division
2017-03-15
Nishidate, Izumi; Wiswadarma, Aditya; Hase, Yota; Tanaka, Noriyuki; Maeda, Takaaki; Niizeki, Kyuichi; Aizu, Yoshihisa
2011-08-01
In order to visualize melanin and blood concentrations and oxygen saturation in human skin tissue, a simple imaging technique based on multispectral diffuse reflectance images acquired at six wavelengths (500, 520, 540, 560, 580 and 600nm) was developed. The technique utilizes multiple regression analysis aided by Monte Carlo simulation for diffuse reflectance spectra. Using the absorbance spectrum as a response variable and the extinction coefficients of melanin, oxygenated hemoglobin, and deoxygenated hemoglobin as predictor variables, multiple regression analysis provides regression coefficients. Concentrations of melanin and total blood are then determined from the regression coefficients using conversion vectors that are deduced numerically in advance, while oxygen saturation is obtained directly from the regression coefficients. Experiments with a tissue-like agar gel phantom validated the method. In vivo experiments with human skin of the human hand during upper limb occlusion and of the inner forearm exposed to UV irradiation demonstrated the ability of the method to evaluate physiological reactions of human skin tissue.
An improved multiple linear regression and data analysis computer program package
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
Directory of Open Access Journals (Sweden)
Jason W. Osborne
2012-06-01
Logistic regression is slowly gaining acceptance in the social sciences, and fills an important niche in the researcher's toolkit: being able to predict important outcomes that are not continuous in nature. While OLS regression is a valuable tool, it cannot routinely be used to predict outcomes that are binary or categorical in nature. These outcomes represent important social science lines of research: retention in, or dropout from school, using illicit drugs, underage alcohol consumption, antisocial behavior, purchasing decisions, voting patterns, risky behavior, and so on. The goal of this paper is to briefly lead the reader through the surprisingly simple mathematics that underpins logistic regression: probabilities, odds, odds ratios, and logits. Anyone with spreadsheet software or a scientific calculator can follow along, and in turn, this knowledge can be used to make much more interesting, clear, and accurate presentations of results (especially to non-technical audiences). In particular, I will share an example of an interaction in logistic regression, how it was originally graphed, and how the graph was made substantially more user-friendly by converting the original metric (logits) to a more readily interpretable metric (probability) through three simple steps.
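The quantities named in this abstract (probabilities, odds, odds ratios, and logits) are linked by a few one-line formulas. The sketch below is only an illustration of those standard relationships, not the paper's own worked example: the intercept and slope are hypothetical values chosen for the demonstration, and the conversion route shown (logit, then odds, then probability) is one common way to do it.

```python
import math

def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds of an event with probability p."""
    return math.log(odds(p))

def logit_to_probability(z):
    """Convert a logit back to a probability via the odds."""
    event_odds = math.exp(z)                 # logit -> odds
    return event_odds / (1.0 + event_odds)   # odds -> probability

# Hypothetical fitted model: logit(p) = intercept + slope * x
intercept = -1.0
slope = 0.7

for x in (0, 1, 2):
    z = intercept + slope * x
    p = logit_to_probability(z)
    print(f"x={x}: logit={z:.2f}, odds={odds(p):.3f}, probability={p:.3f}")

# The odds ratio for a one-unit increase in x is exp(slope),
# regardless of the starting value of x.
odds_ratio = math.exp(slope)
print(f"odds ratio per unit of x: {odds_ratio:.3f}")
```

Note that the odds ratio is constant across x while the change in probability is not, which is one reason plots on the probability scale can be easier for non-technical audiences to read.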
Laurens, L M L; Wolfrum, E J
2013-12-18
One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.
Cary, Theodore W.; Cwanger, Alyssa; Venkatesh, Santosh S.; Conant, Emily F.; Sehgal, Chandra M.
2012-03-01
This study compares the performance of two proven but very different machine learners, Naïve Bayes and logistic regression, for differentiating malignant and benign breast masses using ultrasound imaging. Ultrasound images of 266 masses were analyzed quantitatively for shape, echogenicity, margin characteristics, and texture features. These features along with patient age, race, and mammographic BI-RADS category were used to train Naïve Bayes and logistic regression classifiers to diagnose lesions as malignant or benign. ROC analysis was performed using all of the features and using only a subset that maximized information gain. Performance was determined by the area under the ROC curve, Az, obtained from leave-one-out cross validation. Naïve Bayes showed significant variation (Az 0.733 +/- 0.035 to 0.840 +/- 0.029, P machine learning models for characterizing solid breast masses on ultrasound.
Yilmaz, Işık
2009-06-01
The purpose of this study is to compare the landslide susceptibility mapping methods of frequency ratio (FR), logistic regression and artificial neural networks (ANN) applied in the Kat County (Tokat, Turkey). A digital elevation model (DEM) was first constructed using GIS software. Landslide-related factors such as geology, faults, drainage system, topographical elevation, slope angle, slope aspect, topographic wetness index (TWI) and stream power index (SPI) were used in the landslide susceptibility analyses. Landslide susceptibility maps were produced from the frequency ratio, logistic regression and neural network models, and they were then compared by means of their validations. Higher accuracies of the susceptibility maps for all three models were obtained from the comparison of the landslide susceptibility maps with the known landslide locations. The respective area under curve (AUC) values of 0.826, 0.842 and 0.852 for frequency ratio, logistic regression and artificial neural networks showed that the map obtained from the ANN model is more accurate than those from the other models; however, the accuracies of all models can be considered relatively similar. The results obtained in this study also showed that the frequency ratio model can be used as a simple tool in the assessment of landslide susceptibility when a sufficient amount of data is available. The input process, calculations and output process are very simple and can be readily understood in the frequency ratio model, whereas logistic regression and neural networks require the conversion of data to ASCII or other formats. Moreover, it is also very hard to process the large amount of data in the statistical package.
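The abstract does not spell out the frequency ratio calculation, so the following is a minimal sketch under the commonly used definition: for each class of a factor, the share of landslide cells falling in that class divided by the share of all cells in that class. The slope-angle classes and cell counts are made-up illustration values, not data from the study.

```python
def frequency_ratio(landslide_cells, class_cells):
    """Frequency ratio per class: (landslide share in class) / (area share of class).

    Both arguments map a class label to a cell count. Values above 1 mean
    landslides are over-represented in that class relative to its area.
    """
    total_landslide = sum(landslide_cells.values())
    total_cells = sum(class_cells.values())
    return {
        label: (landslide_cells[label] / total_landslide)
        / (class_cells[label] / total_cells)
        for label in class_cells
    }

# Hypothetical slope-angle classes (degrees) and cell counts, for illustration only.
class_cells = {"0-10": 5000, "10-20": 3000, "20-30": 1500, ">30": 500}
landslide_cells = {"0-10": 10, "10-20": 30, "20-30": 40, ">30": 20}

fr = frequency_ratio(landslide_cells, class_cells)
for label, value in fr.items():
    print(f"slope {label}: FR = {value:.2f}")
```

In a typical workflow of this kind, the ratios for each factor would then be summed per map cell to give a susceptibility index; that step is omitted here.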
Botha, J.; De Ridder, J.H.; Potgieter, J.C.; Steyn, H.S.; Malan, L.
2013-01-01
A recently proposed model for waist circumference cut points (RPWC), driven by increased blood pressure, was demonstrated in an African population. We therefore aimed to validate the RPWC by comparing the RPWC and the Joint Statement Consensus (JSC) models via Logistic Regression (LR) and Neural Networks (NN) analyses. Urban African gender groups (N=171) were stratified according to the JSC and RPWC cut point models. Ultrasound carotid intima media thickness (CIMT), blood pressure (BP) and fa...
Sargolzaie, Narjes; Miri-Moghaddam, Ebrahim
2014-01-01
The most common differential diagnosis of β-thalassemia (β-thal) trait is iron deficiency anemia. Several red blood cell equations have been introduced in different studies for the differential diagnosis between β-thal trait and iron deficiency anemia. Due to genetic variations in different regions, these equations cannot be useful in all populations. The aim of this study was to determine a native equation with high accuracy for the differential diagnosis of β-thal trait and iron deficiency anemia for the Sistan and Baluchestan population by logistic regression analysis. We selected 77 iron deficiency anemia and 100 β-thal trait cases. We used binary logistic regression analysis and determined the best equations for predicting the probability of β-thal trait against iron deficiency anemia in our population. We compared the diagnostic values and receiver operating characteristic (ROC) curves of this equation and another 10 published equations in discriminating β-thal trait and iron deficiency anemia. The binary logistic regression analysis determined the best equation for predicting the probability of β-thal trait against iron deficiency anemia, with an area under the curve (AUC) of 0.998. Based on ROC curves and AUC, the Green & King, England & Frazer, and then Sirdah indices, respectively, had the highest accuracy after our equation. We suggest that to get the best equation and cut-off in each region, one needs to evaluate specific information of each region, specifically in areas where populations are homogeneous, to provide a specific formula for differentiating between β-thal trait and iron deficiency anemia.
International Nuclear Information System (INIS)
Gu Ping; Huang Gang; Han Yuan
2007-01-01
Objective: To assess the diagnostic value of CEA, CA199 and CA50 for colorectal neoplasm by logistic regression and ROC curve. Methods: Serum CEA (with CLIA), CA199 (with ECLIA) and CA50 (with IRMA) levels were measured in 75 patients with colorectal cancer, 35 patients with benign colorectal disorders and 49 controls. The area under the ROC curve (AUC)s of CEA, CA199, CA50 from logistic regression results were compared. Results: In the cancer-benign disorder group, the AUC of CA50 was larger than the AUC of CA199. AUC of combined CEA, CA50 was largest: not only larger than any AUC of CEA, CA50, CA199 alone but also larger than the AUC of the combined three markers (0.875 vs 0.604). In cancer-control group, the AUC of combination of CEA, CA199 and CA50 was larger than any AUC of CEA, CA199 or CA50 alone. Both in the cancer-benign disorder group or cancer-control group, the AUC of CEA was larger than the AUC of CA199 or CA50. Conclusion: CEA is of definite value in the diagnosis of colorectal cancer. For differential diagnosis, the combination of CEA and CA50 can give more information, while the combination of three tumor markers is less helpful. As an advanced statistical method, logistic regression can improve the diagnostic sensitivity and specificity. (authors)
Szekér, Szabolcs; Vathy-Fogarassy, Ágnes
2018-01-01
Logistic regression based propensity score matching is a widely used method in case-control studies to select the individuals of the control group. This method creates a suitable control group if all factors affecting the output variable are known. However, if relevant latent variables exist as well, which are not taken into account during the calculations, the quality of the control group is uncertain. In this paper, we present a statistics-based study in which we try to determine the relationship between the accuracy of the logistic regression model and the uncertainty of the dependent variable of the control group defined by propensity score matching. Our analyses show that there is a linear correlation between the fit of the logistic regression model and the uncertainty of the output variable. In certain cases, a latent binary explanatory variable can result in a relative error of up to 70% in the prediction of the outcome variable. The observed phenomenon draws the attention of analysts to an important point, which must be taken into account when drawing conclusions.
Tools to support interpreting multiple regression in the face of multicollinearity.
Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K
2012-01-01
While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.
Khalil, Mohamed H; Shebl, Mostafa K; Kosba, Mohamed A; El-Sabrout, Karim; Zaki, Nesma
2016-08-01
This research was conducted to determine the parameters that most affect the hatchability of eggs from indigenous and improved local chickens. Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) on four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine which one most influences hatchability. The results showed significant differences in commercial and scientific hatchability among strains. The Alexandria strain had the significantly highest commercial hatchability (80.70%). Highly significant differences in hatching chick weight among the studied strains were also observed. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. A prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens.
A multiple regression analysis for accurate background subtraction in 99Tcm-DTPA renography
International Nuclear Information System (INIS)
Middleton, G.W.; Thomson, W.H.; Davies, I.H.; Morgan, A.
1989-01-01
A technique for accurate background subtraction in 99Tcm-DTPA renography is described. The technique is based on a multiple regression analysis of the renal curves and separate heart and soft tissue curves which together represent background activity. It is compared, in over 100 renograms, with a previously described linear regression technique. Results show that the method provides accurate background subtraction, even in very poorly functioning kidneys, thus enabling relative renal filtration and excretion to be accurately estimated. (author)
Mean centering, multicollinearity, and moderators in multiple regression: The reconciliation redux.
Iacobucci, Dawn; Schneider, Matthew J; Popovich, Deidre L; Bakamitsos, Georgios A
2017-02-01
In this article, we attempt to clarify our statements regarding the effects of mean centering. In a multiple regression with predictors A, B, and A × B (where A × B serves as an interaction term), mean centering A and B prior to computing the product term can clarify the regression coefficients (which is good) and the overall model fit R² will remain undisturbed (which is also good).
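The claim about mean centering can be checked numerically. The sketch below is an illustration on synthetic data, not the authors' analysis: it fits the model with A, B, and A × B using raw and mean-centered predictors and compares R² and the coefficients. The simulated means, the true coefficients, and the small pure-Python least-squares helpers (`solve`, `ols`, `center`) are all assumptions made for this example.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares fit of y on an intercept plus the given columns; returns (coefficients, R^2)."""
    X = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot

def center(values):
    """Subtract the sample mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Synthetic data with a true interaction (all numbers are arbitrary).
random.seed(0)
n = 200
A = [random.gauss(5.0, 1.0) for _ in range(n)]
B = [random.gauss(3.0, 1.0) for _ in range(n)]
y = [1.0 + 0.5 * a + 0.8 * b + 0.3 * a * b + random.gauss(0.0, 1.0) for a, b in zip(A, B)]

raw_beta, raw_r2 = ols([A, B, [a * b for a, b in zip(A, B)]], y)
Ac, Bc = center(A), center(B)
cen_beta, cen_r2 = ols([Ac, Bc, [a * b for a, b in zip(Ac, Bc)]], y)

print("R^2 raw vs centered:", raw_r2, cen_r2)
print("interaction coefficient raw vs centered:", raw_beta[3], cen_beta[3])
print("coefficient on A raw vs centered:", raw_beta[1], cen_beta[1])
```

Both fits span the same column space, so R² and the interaction coefficient agree up to rounding. The coefficient on A changes because, after centering, it describes the slope of A at the mean of B rather than at B = 0.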
Lei, Yang; Nollen, Nikki; Ahluwahlia, Jasjit S; Yu, Qing; Mayo, Matthew S
2015-04-09
Other forms of tobacco use are increasing in prevalence, yet most tobacco control efforts are aimed at cigarettes. In light of this, it is important to identify individuals who are using both cigarettes and alternative tobacco products (ATPs). Most previous studies have used regression models. We fitted a traditional logistic regression model and a classification and regression tree (CART) model to illustrate and discuss the added advantages of using CART in the setting of identifying high-risk subgroups of ATP users among cigarette smokers. The data were collected from an online cross-sectional survey administered by Survey Sampling International between July 5, 2012 and August 15, 2012. Eligible participants self-identified as current smokers, African American, White, or Latino (of any race), were English-speaking, and were at least 25 years old. The study sample included 2,376 participants and was divided into independent training and validation samples for a hold-out validation. Logistic regression and CART models were used to examine the important predictors of cigarettes + ATP users. The logistic regression model identified nine important factors: gender, age, race, nicotine dependence, buying cigarettes or borrowing, whether the price of cigarettes influences the brand purchased, whether the participants set limits on cigarettes per day, alcohol use scores, and discrimination frequencies. The c-index of the logistic regression model was 0.74, indicating good discriminatory capability. The model also performed well in the validation cohort, with good discrimination (c-index = 0.73) and excellent calibration (R-square = 0.96 in the calibration regression). The parsimonious CART model identified gender, age, alcohol use score, race, and discrimination frequencies to be the most important factors. It also revealed interesting partial interactions. The c-index is 0.70 for the training sample and 0.69 for the validation sample. The misclassification
Gregory, T.; Sewando, P.
2013-01-01
Adoption of technology is an important factor in economic development. The thrust of this study was to establish factors affecting adoption of QPM technology in Northern zone of Tanzania. Primary data was collected from a random sample of 120 smallholder maize farmers in four villages. Data collected were analysed using descriptive and quantitative methods. Logit model was used to determine factors that influence adoption of QPM technology. The regression results indicated that education of t...
Logistics management for storing multiple cask plug and remote handling systems in ITER
International Nuclear Information System (INIS)
Ventura, Rodrigo; Ferreira, João; Filip, Iulian; Vale, Alberto
2013-01-01
Highlights: ► We model the logistics management problem in ITER, taking into account casks of multiple typologies. ► We propose a method to determine the best position of the casks inside a given storage area. ► Our method obtains the sequence of operations required to retrieve or store an arbitrary cask, given its storage place. ► We illustrate our method with simulation results in an example scenario. -- Abstract: During operation, maintenance inside the reactor building at ITER (International Thermonuclear Experimental Reactor) has to be performed by remote handling, due to the presence of activated materials. Maintenance operations involve the transportation and storage of large, heavyweight casks from and to the tokamak building. The transportation is carried out by autonomous vehicles that lift and move beneath these casks. The storage of these casks faces several challenges, since (1) the cask storage area is limited in space, and (2) all casks have to be accessible for transportation by the vehicles. In particular, casks in the storage area may block other casks, so that the former have to be moved to a temporary position to give way to the latter. This paper addresses the challenge of managing the logistics of cask storage, where casks may have different typologies. In particular, we propose an approach to (1) determine the best position of the casks inside the storage area, and to (2) obtain the sequence of operations required to retrieve and store an arbitrary cask from/to a given storage place. A combinatorial optimization approach is used to obtain solutions to both these problems. Simulation results illustrate the application of the proposed method to a simple scenario.
Logistics management for storing multiple cask plug and remote handling systems in ITER
Energy Technology Data Exchange (ETDEWEB)
Ventura, Rodrigo, E-mail: rodrigo.ventura@isr.ist.utl.pt [Laboratório de Robótica e Sistemas em Engenharia e Ciência – Laboratório Associado, Instituto Superior Técnico, Universidade Técnica de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa (Portugal); Ferreira, João, E-mail: jftferreira@ipfn.ist.utl.pt [Instituto de Plasmas e Fusão Nuclear – Laboratório Associado, Instituto Superior Técnico, Universidade Técnica de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa (Portugal); Filip, Iulian, E-mail: ifilip@gmail.com [Faculty of Mechanical Engineering – Technical University Gheorghe Asachi of Iasi, 61 Dimitrie Mangeron Bldv., Iasi 700050 (Romania); Vale, Alberto, E-mail: avale@ipfn.ist.utl.pt [Instituto de Plasmas e Fusão Nuclear – Laboratório Associado, Instituto Superior Técnico, Universidade Técnica de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa (Portugal)
2013-10-15
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
A Spreadsheet Tool for Learning the Multiple Regression F-Test, T-Tests, and Multicollinearity
Martin, David
2008-01-01
This note presents a spreadsheet tool that allows teachers the opportunity to guide students towards answering on their own questions related to the multiple regression F-test, the t-tests, and multicollinearity. The note demonstrates approaches for using the spreadsheet that might be appropriate for three different levels of statistics classes,…
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.
2006-01-01
Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
Application of range-test in multiple linear regression analysis in ...
African Journals Online (AJOL)
Application of range-test in multiple linear regression analysis in the presence of outliers is studied in this paper. First, the plot of the explanatory variables (i.e. Administration, Social/Commercial, Economic services and Transfer) on the dependent variable (i.e. GDP) was done to identify the statistical trend over the years.
Ling, Ru; Liu, Jiawang
2011-12-01
The aim was to construct a prediction model for the health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan by stratified random sampling using uniform questionnaires, and performed multiple linear regression analysis with 20 indicators selected through literature review. Independent variables in the multiple linear regression model for medical personnel in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model for county hospital beds included the population aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in the county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory power and fit, and may be used for short- and mid-term forecasting.
Calculation of U, Ra, Th and K contents in uranium ore by multiple linear regression method
International Nuclear Information System (INIS)
Lin Chao; Chen Yingqiang; Zhang Qingwen; Tan Fuwen; Peng Guanghui
1991-01-01
A multiple linear regression method was used to compute γ spectra of uranium ore samples and to calculate contents of U, Ra, Th, and K. In comparison with the inverse matrix method, its advantage is that no standard samples of pure U, Ra, Th and K are needed for obtaining response coefficients
Clinical trials: odds ratios and multiple regression models--why and how to assess them
Sobh, Mohamad; Cleophas, Ton J.; Hadj-Chaib, Amel; Zwinderman, Aeilko H.
2008-01-01
Odds ratios (ORs), unlike chi2 tests, provide direct insight into the strength of the relationship between treatment modalities and treatment effects. Multiple regression models can reduce the data spread due to certain patient characteristics and thus improve the precision of the treatment
Li, Spencer D.
2011-01-01
Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…
Directory of Open Access Journals (Sweden)
Helen J Mayfield, PhD
2018-05-01
Summary: Background: Leptospirosis is a globally important zoonotic disease, with complex exposure pathways that depend on interactions between human beings, animals, and the environment. Major drivers of outbreaks include flooding, urbanisation, poverty, and agricultural intensification. The intensity of these drivers and their relative importance vary between geographical areas; however, non-spatial regression methods are incapable of capturing the spatial variations. This study aimed to explore the use of geographically weighted logistic regression (GWLR) to provide insights into the ecoepidemiology of human leptospirosis in Fiji. Methods: We obtained field data from a cross-sectional community survey done in 2013 in the three main islands of Fiji. A blood sample obtained from each participant (aged 1–90 years) was tested for anti-Leptospira antibodies and household locations were recorded using GPS receivers. We used GWLR to quantify the spatial variation in the relative importance of five environmental and sociodemographic covariates (cattle density, distance to river, poverty rate, residential setting [urban or rural], and maximum rainfall in the wettest month) on leptospirosis transmission in Fiji. We developed two models, one using GWLR and one with standard logistic regression; for each model, the dependent variable was the presence or absence of anti-Leptospira antibodies. GWLR results were compared with results obtained with standard logistic regression, and used to produce a predictive risk map and maps showing the spatial variation in odds ratios (OR) for each covariate. Findings: The dataset contained location information for 2046 participants from 1922 households representing 81 communities. The Akaike information criterion value of the GWLR model was 1935·2 compared with 1254·2 for the standard logistic regression model, indicating that the GWLR model was more efficient. Both models produced similar OR for the covariates, but
Hossain, Md Golam; Saw, Aik; Alam, Rashidul; Ohtsuki, Fumio; Kamarul, Tunku
2013-09-01
Cephalic index (CI), the ratio of head breadth to head length, is widely used to categorise human populations. The aim of this study was to assess the impact of anthropometric measurements on the CI of male Japanese university students. This study included 1,215 male university students from Tokyo and Kyoto, selected using convenience sampling. Multiple regression analysis was used to determine the effect of anthropometric measurements on CI. The variance inflation factor (VIF) showed no evidence of a multicollinearity problem among independent variables. The coefficients of the regression line demonstrated a significant positive relationship between CI and minimum frontal breadth (p regression analysis showed a greater likelihood for minimum frontal breadth (p regression analysis revealed bizygomatic breadth, head circumference, minimum frontal breadth, head height and morphological facial height to be the best predictor craniofacial measurements with respect to CI. The results suggest that most of the variables considered in this study appear to influence the CI of adult male Japanese students.
Directory of Open Access Journals (Sweden)
Gregory, T.
2013-06-01
Adoption of technology is an important factor in economic development. The thrust of this study was to establish the factors affecting adoption of QPM technology in the Northern zone of Tanzania. Primary data were collected from a random sample of 120 smallholder maize farmers in four villages. The data were analysed using descriptive and quantitative methods. A logit model was used to determine the factors that influence adoption of QPM technology. The regression results indicated that education of the household head, farmers' participation in demonstration trials, attendance at field days, and the number of livestock owned positively influenced the rate of adoption of the technology. Access to credit and farmers' perception of poor QPM marketing negatively influenced the rate of adoption. The study recommended that the government ensure an efficient input-output linkage for QPM production.
Li, Yanming; Nan, Bin; Zhu, Ji
2015-06-01
We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functional groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study. © 2015, The International Biometric Society.
Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu; Yoshinari, Kouichi; Honda, Hiroshi
2017-03-01
Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment. Copyright © 2017 Elsevier Inc. All rights reserved.
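The logistic ridge regression named in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy data, the function names (`fit_ridge_logistic`, `l2_norm`), the learning rate and the penalty values are all assumptions chosen for illustration. It only shows the general idea that an L2 penalty keeps coefficients finite and small when predictors are correlated and classes are separable.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, n_iter=2000):
    """Gradient descent on mean log-loss + (lam / (2n)) * ||w||^2.

    The intercept b is left unpenalized. All settings are illustrative.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        # Ridge term adds lam * w_j to the gradient of each coefficient.
        w = [wj - lr * (gj + lam * wj) / n for wj, gj in zip(w, grad_w)]
    return w, b

def l2_norm(w):
    return math.sqrt(sum(wj * wj for wj in w))

# Hypothetical toy data: two correlated features, perfectly separable classes.
X_toy = [[0.0, 0.1], [0.2, 0.1], [0.4, 0.5], [0.6, 0.4],
         [1.0, 1.1], [1.2, 1.0], [1.4, 1.5], [1.6, 1.4]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

w_weak, b_weak = fit_ridge_logistic(X_toy, y_toy, lam=0.0)
w_strong, b_strong = fit_ridge_logistic(X_toy, y_toy, lam=5.0)
print("no penalty  :", [round(v, 3) for v in w_weak])
print("ridge lam=5 :", [round(v, 3) for v in w_strong])
```

With the penalty switched on, the coefficient vector is pulled toward zero, which is the stabilizing effect ridge regression is used for when many correlated predictors are present.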
Directory of Open Access Journals (Sweden)
Varga Csaba
2012-10-01
Abstract. Background: Identifying risk factors for Salmonella Enteritidis (SE) infections in Ontario will assist public health authorities to design effective control and prevention programs to reduce the burden of SE infections. Our research objective was to identify risk factors for acquiring SE infections with various phage types (PT) in Ontario, Canada. We hypothesized that certain PTs (e.g., PT8 and PT13a) have specific risk factors for infection. Methods: Our study included endemic SE cases with various PTs whose isolates were submitted to the Public Health Laboratory-Toronto from January 20th to August 12th, 2011. Cases were interviewed using a standardized questionnaire that included questions pertaining to demographics, travel history, clinical symptoms, contact with animals, and food exposures. A multinomial logistic regression method using the Generalized Linear Latent and Mixed Model procedure and a case-case study design were used to identify risk factors for acquiring SE infections with various PTs in Ontario, Canada. In the multinomial logistic regression model, the outcome variable had three categories representing human infections caused by SE PT8, PT13a, and all other SE PTs (i.e., non-PT8/non-PT13a) as a referent category to which the other two categories were compared. Results: In the multivariable model, SE PT8 was positively associated with contact with dogs (OR=2.17, 95% CI 1.01-4.68) and negatively associated with pepper consumption (OR=0.35, 95% CI 0.13-0.94), after adjusting for age categories and gender, and using exposure periods and health regions as random effects to account for clustering. Conclusions: Our study findings offer interesting hypotheses about the role of phage type-specific risk factors. Multinomial logistic regression analysis and the case-case study approach are novel methodologies to evaluate associations among SE infections with different PTs and various risk factors.
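As a worked illustration of how an odds ratio and its interval relate to a logistic regression coefficient, the sketch below assumes the reported interval is a symmetric Wald interval on the log-odds scale (the abstract does not say how the interval was computed, so this is an assumption). The helper names are hypothetical. It backs out the standard error implied by the reported "OR=2.17, 95% CI 1.01-4.68" and then rebuilds the interval from it.

```python
import math

def wald_se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a Wald interval (assumed symmetric on the log scale)."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald interval from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Values reported in the abstract for contact with dogs (SE PT8).
se_dogs = wald_se_from_ci(1.01, 4.68)
beta_dogs = math.log(2.17)
or_dogs, lo_dogs, hi_dogs = odds_ratio_ci(beta_dogs, se_dogs)
print(round(se_dogs, 3), round(or_dogs, 2), round(lo_dogs, 2), round(hi_dogs, 2))
```

The rebuilt bounds agree with the reported ones up to rounding, which is consistent with (but does not prove) a Wald-type interval.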
Directory of Open Access Journals (Sweden)
Dieu Tien Bui
2016-04-01
The Cat Ba National Park area (Vietnam), with its tropical forest, is recognized as part of world biodiversity conservation by the United Nations Educational, Scientific and Cultural Organization (UNESCO) and is a well-known destination for tourists, with around 500,000 travelers per year. This area has been the site of many research projects; however, no project has been carried out for forest fire susceptibility assessment. Thus, protection of the forest, including fire prevention, is one of the main concerns of the local authorities. This work aims to produce a tropical forest fire susceptibility map for the Cat Ba National Park area, which may be helpful for the local authorities in forest fire protection management. To this end, historical forest fires and related factors were first collected from various sources to construct a GIS database. Then, a forest fire susceptibility model was developed using Kernel logistic regression. The quality of the model was assessed using the Receiver Operating Characteristic (ROC) curve, area under the ROC curve (AUC), and five statistical evaluation measures. The usability of the resulting model was further compared with a benchmark model, the support vector machine (SVM). The results show that the Kernel logistic regression model has a high level of performance in both the training and validation datasets, with a prediction capability of 92.2%. Since the Kernel logistic regression model outperforms the benchmark model, we conclude that the proposed model is a promising alternative tool that should also be considered for forest fire susceptibility mapping in other areas. The results of this study are useful for the local authorities in forest planning and management.
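The area under the ROC curve used above as a quality measure can be computed directly from predicted scores. The sketch below uses the rank interpretation of AUC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one); the function name `roc_auc` and the toy labels and scores are assumptions for illustration, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical susceptibility scores for four map cells (1 = fire observed).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

This pairwise form is quadratic in the number of cases; it is written for clarity rather than speed.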
Multiple regression models for energy use in air-conditioned office buildings in different climates
International Nuclear Information System (INIS)
Lam, Joseph C.; Wan, Kevin K.W.; Liu Dalong; Tsang, C.L.
2010-01-01
An attempt was made to develop multiple regression models for office buildings in the five major climates in China: severe cold, cold, hot summer and cold winter, mild, and hot summer and warm winter. A total of 12 key building design variables were identified through parametric and sensitivity analysis, and considered as inputs in the regression models. The coefficient of determination R² varies from 0.89 in Harbin to 0.97 in Kunming, indicating that 89-97% of the variations in annual building energy use can be explained by the changes in the 12 parameters. A pseudo-random number generator based on three simple multiplicative congruential generators was employed to generate random designs for evaluation of the regression models. The differences between regression-predicted and DOE-simulated annual building energy use are largely within 10%. It is envisaged that the regression models developed can be used to estimate the likely energy savings/penalty during the initial design stage when different building schemes and design concepts are being considered.
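The two summary measures quoted in this abstract, the coefficient of determination and the share of predictions within 10% of the simulated value, can be sketched as follows. The function names and the numbers are hypothetical and are not taken from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fraction_within(y_true, y_pred, tol=0.10):
    """Share of predictions whose relative error is at most tol."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) / abs(t) <= tol)
    return hits / len(y_true)

# Hypothetical annual energy use: simulated values versus regression predictions.
simulated = [100.0, 200.0, 300.0, 400.0]
predicted = [105.0, 190.0, 345.0, 390.0]
print(r_squared(simulated, predicted), fraction_within(simulated, predicted))
```

An R² of 0.89 to 0.97, as reported above, means the residual sum of squares is 3% to 11% of the total sum of squares.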
Ozdemir, Adnan
2011-07-01
Summary: The purpose of this study is to produce a groundwater spring potential map of the Sultan Mountains in central Turkey, based on a logistic regression method within a Geographic Information System (GIS) environment. Using field surveys, the locations of the springs (440 springs) were determined in the study area. In this study, 17 spring-related factors were used in the analysis: geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transport capacity index, distance to drainage, distance to fault, drainage density, and fault density map. The coefficients of the predictor variables were estimated using binary logistic regression analysis and were used to calculate the groundwater spring potential for the entire study area. The accuracy of the final spring potential map was evaluated based on the observed springs. The accuracy of the model was evaluated by calculating the relative operating characteristics. The area value of the relative operating characteristic curve model was found to be 0.82. These results indicate that the model is a good estimator of the spring potential in the study area. The spring potential map shows that the areas of very low, low, moderate and high groundwater spring potential classes are 105.586 km² (28.99%), 74.271 km² (19.906%), 101.203 km² (27.14%), and 90.05 km² (24.671%), respectively. The interpretations of the potential map showed that stream power index, relative permeability of lithologies, geology, elevation, aspect, wetness index, plan curvature, and drainage density play major roles in spring occurrence and distribution in the Sultan Mountains. The logistic regression approach has not yet been used to delineate groundwater potential zones. In this study, the logistic regression method was used to locate potential zones for groundwater springs in the Sultan Mountains. The evolved model
Directory of Open Access Journals (Sweden)
BUDIMAN
2012-01-01
Budiman, Arisoesilaningsih E. 2012. Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis. Biodiversitas 13: 18-22. The aim of this research was to determine multiple regression models of vegetative and corm growth of Amorphophallus muelleri Blume across several plant ages and agroforestry habitat conditions in East Java. A descriptive exploratory study was conducted by systematic random sampling at five agroforestries on four plantations in East Java: Saradan, Bojonegoro, Nganjuk and Blitar. In each agroforestry, we observed A. muelleri vegetative and corm growth at four growing ages (1, 2, 3 and 4 years old) as well as environmental variables such as altitude, vegetation, climate and soil conditions. Data were analyzed using descriptive statistics to compare A. muelleri habitat in the five agroforestries. The influence and contribution of each environmental variable to A. muelleri vegetative and corm growth were determined using multiple regression analysis in SPSS 17.0. The multiple regression models of A. muelleri vegetative and corm growth, generated from agroforestry characteristics and plant age, showed high validity with R² = 88-99%. The regression models showed that age, monthly temperatures, percentage of radiation and soil calcium (Ca) content, either simultaneously or partially, determined A. muelleri vegetative and corm growth. Based on these models, the A. muelleri corm reached optimal growth after four years of cultivation, at which point it is ready to be harvested. Additionally, the soil Ca content should reach 25.3 me.hg-1, as in the Sugihwaras agroforestry, with a maximal radiation of 60%.
Sintering equation: determination of its coefficients by experiments - using multiple regression
International Nuclear Information System (INIS)
Windelberg, D.
1999-01-01
Sintering is a method for volume compression (or volume contraction) of powdered or grained material by applying high temperature (below the melting point of the material). Maekipirtti tried to find an equation which describes the process of sintering by its main parameters: sintering time, sintering temperature and volume contraction. Such an equation is called a sintering equation. It also contains some coefficients which characterise the behaviour of the material during the process of sintering. These coefficients have to be determined by experiments. Here we show that some linear regressions will produce wrong coefficients, but multiple regression results in a useful sintering equation. (orig.)
Crawford, John R.; Garthwaite, Paul H.; Denham, Annie K.; Chelune, Gordon J.
2012-01-01
Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because…
On the Relationship Between Confidence Sets and Exchangeable Weights in Multiple Linear Regression.
Pek, Jolynn; Chalmers, R Philip; Monette, Georges
2016-01-01
When statistical models are employed to provide a parsimonious description of empirical relationships, the extent to which strong conclusions can be drawn rests on quantifying the uncertainty in parameter estimates. In multiple linear regression (MLR), regression weights carry two kinds of uncertainty represented by confidence sets (CSs) and exchangeable weights (EWs). Confidence sets quantify uncertainty in estimation whereas the set of EWs quantify uncertainty in the substantive interpretation of regression weights. As CSs and EWs share certain commonalities, we clarify the relationship between these two kinds of uncertainty about regression weights. We introduce a general framework describing how CSs and the set of EWs for regression weights are estimated from the likelihood-based and Wald-type approach, and establish the analytical relationship between CSs and sets of EWs. With empirical examples on posttraumatic growth of caregivers (Cadell et al., 2014; Schneider, Steele, Cadell & Hemsworth, 2011) and on graduate grade point average (Kuncel, Hezlett & Ones, 2001), we illustrate the usefulness of CSs and EWs for drawing strong scientific conclusions. We discuss the importance of considering both CSs and EWs as part of the scientific process, and provide an Online Appendix with R code for estimating Wald-type CSs and EWs for k regression weights.
VNM: An R Package for Finding Multiple-Objective Optimal Designs for the 4-Parameter Logistic Model
Hyun, Seung Won; Wong, Weng Kee; Yang, Yarong
2018-01-01
A multiple-objective optimal design is useful for dose-response studies because it can incorporate several objectives at the design stage. Objectives can be of varying interests and a properly constructed multiple-objective optimal design can provide user-specified efficiencies, delivering higher efficiencies for the more important objectives. In this work, we introduce the VNM package written in R for finding 3-objective locally optimal designs for the 4-parameter logistic (4PL) model widely...
Mass estimation of loose parts in nuclear power plant based on multiple regression
International Nuclear Information System (INIS)
He, Yuanfeng; Cao, Yanlong; Yang, Jiangxin; Gan, Chunbiao
2012-01-01
According to the application of the Hilbert–Huang transform to the non-stationary signal and the relation between the mass of loose parts in nuclear power plant and corresponding frequency content, a new method for loose part mass estimation based on the marginal Hilbert–Huang spectrum (MHS) and multiple regression is proposed in this paper. The frequency spectrum of a loose part in a nuclear power plant can be expressed by the MHS. The multiple regression model that is constructed by the MHS feature of the impact signals for mass estimation is used to predict the unknown masses of a loose part. A simulated experiment verified that the method is feasible and the errors of the results are acceptable. (paper)
Dynamic Optimization for IPS2 Resource Allocation Based on Improved Fuzzy Multiple Linear Regression
Directory of Open Access Journals (Sweden)
Maokuan Zheng
2017-01-01
The study mainly focuses on resource allocation optimization for industrial product-service systems (IPS2). The development of IPS2 leads to a sustainable economy by introducing cooperative mechanisms apart from commodity transaction. The randomness and fluctuation of service requests from customers lead to volatility in the IPS2 resource utilization ratio. Three basic rules for resource allocation optimization are put forward to improve system operation efficiency and cut unnecessary costs. An approach based on fuzzy multiple linear regression (FMLR) is developed, which integrates the strength and concision of multiple linear regression in data fitting and factor analysis with the merit of fuzzy theory in dealing with uncertain or vague problems, and which helps reduce the costs caused by unnecessary resource transfer. An iteration mechanism is introduced in the FMLR algorithm to improve forecasting accuracy. A case study of human resource allocation optimization in the construction machinery industry is implemented to test and verify the proposed model.
COLOR IMAGE RETRIEVAL BASED ON FEATURE FUSION THROUGH MULTIPLE LINEAR REGRESSION ANALYSIS
Directory of Open Access Journals (Sweden)
K. Seetharaman
2015-08-01
This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, and the least-square estimation method is employed to estimate the parameters. The given input query image is segmented into various regions according to the structure of the image. The color and texture features are extracted on each region of the query image, and the features are fused together using the multiple linear regression model. The estimated parameters of the model, which is modeled based on the features, are formed as a vector called a feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images. The F-measure is applied to evaluate the performance of the proposed technique. The obtained results show that the proposed technique is comparable to the other existing techniques.
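The comparison and evaluation steps named in this abstract, the Canberra distance between feature vectors and the F-measure of a retrieval result, can be sketched as below. The feature vectors, image names and function names are hypothetical; in the paper the vectors would hold the estimated regression parameters.

```python
def canberra(u, v):
    """Canberra distance: sum of |u_i - v_i| / (|u_i| + |v_i|), skipping 0/0 terms."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total

def f_measure(retrieved, relevant):
    """Harmonic mean of precision and recall for a set of retrieved items."""
    tp = len(set(retrieved) & set(relevant))
    if tp == 0:
        return 0.0
    precision = tp / len(set(retrieved))
    recall = tp / len(set(relevant))
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical feature vectors for a query image and three target images.
query = [1.0, 2.0, 3.0]
targets = {"img_a": [1.0, 2.1, 2.9], "img_b": [1.0, 4.0, 0.0], "img_c": [5.0, 0.5, 9.0]}

# Rank targets by increasing distance to the query (closest first).
ranking = sorted(targets, key=lambda name: canberra(query, targets[name]))
print(ranking)
print(f_measure(retrieved=["a", "b", "c", "d"], relevant=["a", "b", "e"]))
```

Because each term is normalized by the magnitudes of the two components, the Canberra distance is sensitive to differences in small-valued features, which may be one reason it suits vectors of regression coefficients on different scales.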
Single image super-resolution using locally adaptive multiple linear regression.
Yu, Soohwan; Kang, Wonseok; Ko, Seungyong; Paik, Joonki
2015-12-01
This paper presents a regularized superresolution (SR) reconstruction method using locally adaptive multiple linear regression to overcome the limitation of spatial resolution of digital images. In order to make the SR problem better-posed, the proposed method incorporates the locally adaptive multiple linear regression into the regularization process as a local prior. The local regularization prior assumes that the target high-resolution (HR) pixel is generated by a linear combination of similar pixels in differently scaled patches and optimum weight parameters. In addition, we adapt a modified version of the nonlocal means filter as a smoothness prior to utilize the patch redundancy. Experimental results show that the proposed algorithm better restores HR images than existing state-of-the-art methods in the sense of the most objective measures in the literature.
User's Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)
Eng, Ken; Chen, Yin-Yu; Kiang, Julie.E.
2009-01-01
Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce and familiarize the user with the weighted multiple-linear regression (WREG) program, and to also provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages with short records. The regional estimation equation results from a multiple-linear regression that relates the observable basin characteristics, such as drainage area, to streamflow characteristics.
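A weighted multiple linear regression of the kind this guide describes can be sketched with the weighted normal equations. This is not the WREG program or its weighting scheme: the function names, the toy basin characteristics and the weights are assumptions for illustration only.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def weighted_least_squares(X, y, w):
    """Minimize sum_i w_i * (y_i - b0 - x_i . b)^2; returns [b0, b1, ...]."""
    rows = [[1.0] + list(xi) for xi in X]  # prepend intercept column
    k = len(rows[0])
    A = [[sum(wi * r[a] * r[c] for wi, r in zip(w, rows)) for c in range(k)]
         for a in range(k)]
    rhs = [sum(wi * r[a] * yi for wi, r, yi in zip(w, rows, y)) for a in range(k)]
    return solve(A, rhs)

# Hypothetical gaged basins: two characteristics per basin, one flow statistic,
# and weights standing in for the reliability of each gage's record.
basin_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
flow_y = [2.0, 5.0, 1.0, 4.0, 7.0]      # generated as 2 + 3*x1 - 1*x2
weights = [1.0, 2.0, 1.0, 0.5, 3.0]

coef = weighted_least_squares(basin_x, flow_y, weights)
print([round(c, 6) for c in coef])
```

The fitted coefficients can then be applied to the characteristics of an ungaged basin to estimate its streamflow statistic, which is the use case the guide describes.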
Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
2015-03-15
Proteins located in appropriate cellular compartments are of paramount importance to exert their biological functions. Prediction of protein subcellular localization by computational methods is required in the post-genomic era. Recent studies have been focusing on predicting not only single-location proteins but also multi-location proteins. However, most of the existing predictors are far from effective for tackling the challenges of multi-label proteins. This article proposes an efficient multi-label predictor, namely mPLR-Loc, based on penalized logistic regression and adaptive decisions for predicting both single- and multi-location proteins. Specifically, for each query protein, mPLR-Loc exploits the information from the Gene Ontology (GO) database by using its accession number (AC) or the ACs of its homologs obtained via BLAST. The frequencies of GO occurrences are used to construct feature vectors, which are then classified by an adaptive decision-based multi-label penalized logistic regression classifier. Experimental results based on two recent stringent benchmark datasets (virus and plant) show that mPLR-Loc remarkably outperforms existing state-of-the-art multi-label predictors. In addition to being able to rapidly and accurately predict subcellular localization of single- and multi-label proteins, mPLR-Loc can also provide probabilistic confidence scores for the prediction decisions. For readers' convenience, the mPLR-Loc server is available online (http://bioinfo.eie.polyu.edu.hk/mPLRLocServer). Copyright © 2014 Elsevier Inc. All rights reserved.
MULTIPLE LINEAR REGRESSION ANALYSIS FOR PREDICTION OF BOILER LOSSES AND BOILER EFFICIENCY
Chayalakshmi C.L
2018-01-01
Calculation of boiler efficiency is essential if its parameters need to be controlled for either maintaining or enhancing its efficiency. However, determining boiler efficiency using the conventional method is time consuming and very expensive. Hence, it is not recommended to find boiler efficiency frequently. The work presented in this paper deals with establishing the statistical mo...
International Nuclear Information System (INIS)
Mamikonyan, S.V.; Berezkin, V.V.; Lyubimova, S.V.; Svetajlo, Yu.N.; Shchekin, K.I.
1978-01-01
A method to derive multiple regression equations for X-ray radiometric analysis is described. The method is realized in the form of the REGRA program in an algorithmic language. The subprograms included in the program are described. Using the analysis of cement for Mg, Al, Si, Ca and Fe contents as an example, it is shown that obtaining working equations in the course of calculations by the program simplifies the realization of computing devices in instruments for X-ray radiometric analysis.
[Multiple linear regression analysis of X-ray measurement and WOMAC scores of knee osteoarthritis].
Ma, Yu-Feng; Wang, Qing-Fu; Chen, Zhao-Jun; Du, Chun-Lin; Li, Jun-Hai; Huang, Hu; Shi, Zong-Ting; Yin, Yue-Shan; Zhang, Lei; A-Di, Li-Jiang; Dong, Shi-Yu; Wu, Ji
2012-05-01
To perform multiple linear regression analysis of X-ray measurements and WOMAC scores in knee osteoarthritis, and to analyze their relationship with clinical and biomechanical concepts. From March 2011 to July 2011, 140 patients (250 knees) were reviewed, including 132 left knees and 118 right knees; the patients ranged in age from 40 to 71 years, with an average of 54.68 years. The MB-RULER measurement software was applied to measure the femoral angle, tibial angle, femorotibial angle, and joint gap angle on antero-posterior (AP) and lateral X-rays. The WOMAC scores were also collected. Multiple regression equations were then applied for the linear regression analysis of the correlation between the X-ray measurements and WOMAC scores. The regression equation of AP X-ray values and WOMAC scores was statistically significant, whereas the regression equation of lateral X-ray values and WOMAC scores was not (P>0.05). 1) X-ray measurement of the knee joint can reflect the WOMAC scores to a certain extent. 2) It is necessary to measure the X-ray mechanical axis of the knee, which is important for the diagnosis and treatment of osteoarthritis. 3) The correlation between the tibial angle and joint gap angle on antero-posterior X-ray and the WOMAC scores is significant, which can be used to assess the functional recovery of patients before and after treatment.
Directory of Open Access Journals (Sweden)
Yoonsu Shin
2016-01-01
In the 5G era, the operational cost of mobile wireless networks will significantly increase. Further, massive network capacity and zero latency will be needed because everything will be connected to mobile networks. Thus, self-organizing networks (SON) are needed, which expedite automatic operation of mobile wireless networks, but have challenges to satisfy the 5G requirements. Therefore, researchers have proposed a framework to empower SON using big data. The recent framework of a big data-empowered SON analyzes the relationship between key performance indicators (KPIs) and related network parameters (NPs) using machine-learning tools, and it develops regression models using a Gaussian process with those parameters. The problem, however, is that the methods of finding the NPs related to the KPIs differ individually. Moreover, the Gaussian process regression model cannot determine the relationship between a KPI and its various related NPs. In this paper, to solve these problems, we propose multivariate multiple regression models to determine the relationship between various KPIs and NPs. If we assume one KPI and multiple NPs as one set, the proposed models help us process multiple sets at one time. Also, we can find out whether some KPIs are conflicting or not. We implement the proposed models using MapReduce.
Multiple regression analysis of Jominy hardenability data for boron treated steels
International Nuclear Information System (INIS)
Komenda, J.; Sandstroem, R.; Tukiainen, M.
1997-01-01
The relations between the chemical composition and the hardenability of boron treated steels have been investigated using a multiple regression analysis method. A linear model of regression was chosen. The free boron content that is effective for the hardenability was calculated using a model proposed by Jansson. The regression analysis for 1261 steel heats provided equations that were statistically significant at the 95% level. All heats met the specification according to the Nordic countries producers' classification. The variation in chemical composition explained typically 80 to 90% of the variation in the hardenability. In the regression analysis, elements which did not significantly contribute to the calculated hardness according to the F test were eliminated. Carbon, silicon, manganese, phosphorus and chromium were of importance at all Jominy distances, and nickel, vanadium, boron and nitrogen at distances above 6 mm. After the regression analysis it was demonstrated that very few outliers were present in the data set, i.e. data points outside four times the standard deviation. The model has successfully been used in industrial practice, replacing some of the necessary Jominy tests. (orig.)
Lampe, Kerstin; Hofmann, Erik
2014-01-01
The article analyzes the influence of company-, industry- and market-related variables on the cost of capital of logistics service providers, as well as on their systematic risk. Financial information has become more and more important in strategic decision making (especially in the international context); in addition to being a measure of performance, the cost of capital is an important variable for logistics service providers in decisions about investing capital and developing the appropria...
Poullis, Michael
2014-11-01
EuroSCORE II, despite improving on the original EuroSCORE system, has not solved all the calibration and predictability issues. Recursive, non-linear and mixed recursive and non-linear regression analysis were assessed with regard to sensitivity, specificity and predictability of the original EuroSCORE and EuroSCORE II systems. The original logistic EuroSCORE, EuroSCORE II and recursive, non-linear and mixed recursive and non-linear regression analyses of these risk models were assessed via receiver operator characteristic curves (ROC) and Hosmer-Lemeshow statistic analysis with regard to the accuracy of predicting in-hospital mortality. Analysis was performed for isolated coronary artery bypass grafts (CABGs) (n = 2913), aortic valve replacement (AVR) (n = 814), mitral valve surgery (n = 340), combined AVR and CABG (n = 517), aortic (n = 350), miscellaneous cases (n = 642), and combinations of the above cases (n = 5576). The original EuroSCORE had an ROC below 0.7 for isolated AVR and combined AVR and CABG. None of the methods described increased the ROC above 0.7. The EuroSCORE II risk model had an ROC below 0.7 for isolated AVR only. Recursive regression, non-linear regression, and mixed recursive and non-linear regression all increased the ROC above 0.7 for isolated AVR. The original EuroSCORE had a Hosmer-Lemeshow statistic that was above 0.05 for all patients and the subgroups analysed. All of the techniques markedly increased the Hosmer-Lemeshow statistic. The EuroSCORE II risk model had a Hosmer-Lemeshow statistic that was significant for all patients (P linear regression failed to improve on the original Hosmer-Lemeshow statistic. The mixed recursive and non-linear regression using the EuroSCORE II risk model was the only model that produced an ROC of 0.7 or above for all patients and procedures and had a Hosmer-Lemeshow statistic that was highly non-significant. The original EuroSCORE and the EuroSCORE II risk models do not have adequate ROC and Hosmer
hMuLab: A Biomedical Hybrid MUlti-LABel Classifier Based on Multiple Linear Regression.
Wang, Pu; Ge, Ruiquan; Xiao, Xuan; Zhou, Manli; Zhou, Fengfeng
2017-01-01
Many biomedical classification problems are multi-label by nature, e.g., a gene involved in a variety of functions or a patient with multiple diseases. The majority of existing classification algorithms assume that each sample has only one class label, and the multi-label classification problem remains a challenge for biomedical researchers. This study proposes a novel multi-label learning algorithm, hMuLab, which integrates both feature-based and neighbor-based similarity scores. The multiple linear regression modeling techniques make hMuLab capable of producing multiple label assignments for a query sample. The comparison results over six commonly used multi-label performance measurements suggest that hMuLab performs accurately and stably for the biomedical datasets, and may serve as a complement to the existing literature.
Ono, Tomohiro; Nakamura, Mitsuhiro; Hirose, Yoshinori; Kitsuda, Kenji; Ono, Yuka; Ishigaki, Takashi; Hiraoka, Masahiro
2017-09-01
To estimate the lung tumor position from multiple anatomical features on four-dimensional computed tomography (4D-CT) data sets using single regression analysis (SRA) and multiple regression analysis (MRA) approaches, and to evaluate the impact of each approach on the internal target volume (ITV) for stereotactic body radiotherapy (SBRT) of the lung. Eleven consecutive lung cancer patients (12 cases) underwent 4D-CT scanning. The three-dimensional (3D) lung tumor motion exceeded 5 mm. The 3D tumor position and anatomical features, including lung volume, diaphragm, abdominal wall, and chest wall positions, were measured on 4D-CT images. The tumor position was estimated by SRA using each anatomical feature and by MRA using all anatomical features. The difference between the actual and estimated tumor positions was defined as the root-mean-square error (RMSE). A standard partial regression coefficient for the MRA was evaluated. The 3D lung tumor position showed a high correlation with the lung volume (R = 0.92 ± 0.10). Additionally, ITVs derived from the SRA and MRA approaches were compared with the ITV derived from contouring gross tumor volumes on all 10 phases of the 4D-CT (conventional ITV). The RMSE of the SRA was within 3.7 mm in all directions. Also, the RMSE of the MRA was within 1.6 mm in all directions. The standard partial regression coefficient for the lung volume was the largest and had the most influence on the estimated tumor position. Compared with the conventional ITV, the average percentage decreases in ITV were 31.9% and 38.3% using the SRA and MRA approaches, respectively. The estimation accuracy of lung tumor position was improved by the MRA approach, which provided a smaller ITV than the conventional ITV. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
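The error measure used above can be illustrated with a minimal sketch. It assumes the RMSE is reported separately for each direction (the abstract quotes it "in all directions") and that positions are given as (x, y, z) tuples in millimetres; the function name and data layout are hypothetical and not taken from the study.

```python
import math


def rmse_per_axis(actual, estimated):
    """Root-mean-square error along each of the three axes.

    `actual` and `estimated` are equal-length sequences of (x, y, z)
    positions, for example one tuple per 4D-CT respiratory phase.
    """
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    return tuple(
        math.sqrt(sum((a[k] - e[k]) ** 2 for a, e in zip(actual, estimated)) / n)
        for k in range(3)
    )
```

Comparing the tuples returned for an SRA-based estimate and an MRA-based estimate of the same phases would reproduce the kind of per-direction comparison the abstract describes.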
Mayfield, Helen J; Lowry, John H; Watson, Conall H; Kama, Mike; Nilles, Eric J; Lau, Colleen L
2018-05-01
Leptospirosis is a globally important zoonotic disease, with complex exposure pathways that depend on interactions between human beings, animals, and the environment. Major drivers of outbreaks include flooding, urbanisation, poverty, and agricultural intensification. The intensity of these drivers and their relative importance vary between geographical areas; however, non-spatial regression methods are incapable of capturing the spatial variations. This study aimed to explore the use of geographically weighted logistic regression (GWLR) to provide insights into the ecoepidemiology of human leptospirosis in Fiji. We obtained field data from a cross-sectional community survey done in 2013 in the three main islands of Fiji. A blood sample obtained from each participant (aged 1-90 years) was tested for anti-Leptospira antibodies and household locations were recorded using GPS receivers. We used GWLR to quantify the spatial variation in the relative importance of five environmental and sociodemographic covariates (cattle density, distance to river, poverty rate, residential setting [urban or rural], and maximum rainfall in the wettest month) on leptospirosis transmission in Fiji. We developed two models, one using GWLR and one with standard logistic regression; for each model, the dependent variable was the presence or absence of anti-Leptospira antibodies. GWLR results were compared with results obtained with standard logistic regression, and used to produce a predictive risk map and maps showing the spatial variation in odds ratios (OR) for each covariate. The dataset contained location information for 2046 participants from 1922 households representing 81 communities. The Akaike information criterion value of the GWLR model was 1935·2 compared with 1254·2 for the standard logistic regression model, indicating that the GWLR model was more efficient. Both models produced similar OR for the covariates, but GWLR also detected spatial variation in the effect of each
WU, Chunhung
2015-04-01
The research built the original logistic regression landslide susceptibility model (abbreviated as or-LRLSM) and the landslide ratio-based logistic regression landslide susceptibility model (abbreviated as lr-LRLSM), compared their performance, and explained the error sources of the two models. The research assumes that the performance of the logistic regression model can be better if the distributions of the landslide ratio and the weighted value of each variable are similar. The landslide ratio is the ratio of landslide area to total area in a specific area and is a useful index for evaluating the seriousness of landslide disasters in Taiwan. The research adopted the landslide inventory induced by 2009 Typhoon Morakot in the Chishan watershed, which was the most serious disaster event of the last decade in Taiwan. The research adopted the 20 m grid as the basic unit in building the LRLSM, and six variables, including elevation, slope, aspect, geological formation, accumulated rainfall, and bank erosion, were included in the two models. In building the or-LRLSM, the six variables were divided into continuous variables (elevation, slope, and accumulated rainfall) and categorical variables (aspect, geological formation, and bank erosion), while in building the lr-LRLSM all variables were classified based on landslide ratio and treated as categorical variables. Because the number of basic units in the Chishan watershed was too large to process with commercial software, the research used random sampling instead of the whole set of basic units. The research adopted equal proportions of landslide units and non-landslide units in the logistic regression analysis. The research performed random sampling 10 times and selected the group with the best Cox & Snell R2 value and Nagelkerke R2 value as the database for the following analysis. Based on the best result from the 10 random sampling groups, the or-LRLSM (lr-LRLSM) is significant at the 1% level with Cox & Snell R2 = 0.190 (0.196) and Nagelkerke R2
Rossi, M.; Apuani, T.; Felletti, F.
2009-04-01
The aim of this paper is to compare the results of two statistical methods for landslide susceptibility analysis: 1) a univariate probabilistic method based on the landslide susceptibility index, and 2) a multivariate method (logistic regression). The study area is the Febbraro valley, located in the central Italian Alps, where different types of metamorphic rocks crop out. On the eastern part of the studied basin a Quaternary cover, represented by colluvial and, secondarily, glacial deposits, is dominant. In this study 110 earth flows, mainly located toward the NE portion of the catchment, were analyzed. They involve only the colluvial deposits and their extension mainly ranges from 36 to 3173 m2. Both statistical methods require a spatial database to be established, in which each landslide is described by several parameters that are assigned using the values at the central point of its main scarp. The spatial database is constructed using a Geographical Information System (GIS). Based on a bibliographic review, a total of 15 predisposing factors were utilized. The width of the intervals into which the maps of the predisposing factors have to be reclassified has been defined assuming constant intervals for: elevation (100 m), slope (5°), solar radiation (0.1 MJ/cm2/year), profile curvature (1.2 1/m), tangential curvature (2.2 1/m), drainage density (0.5), lineament density (0.00126). For the other parameters, the results of the probability-probability plot analysis and the statistical indexes of the landslide sites were used. In particular slope length (0 ÷ 2, 2 ÷ 5, 5 ÷ 10, 10 ÷ 20, 20 ÷ 35, 35 ÷ 260), accumulation flow (0 ÷ 1, 1 ÷ 2, 2 ÷ 5, 5 ÷ 12, 12 ÷ 60, 60 ÷ 27265), Topographic Wetness Index (0 ÷ 0.74, 0.74 ÷ 1.94, 1.94 ÷ 2.62, 2.62 ÷ 3.48, 3.48 ÷ 6.00, 6.00 ÷ 9.44), Stream Power Index (0 ÷ 0.64, 0.64 ÷ 1.28, 1.28 ÷ 1.81, 1.81 ÷ 4.20, 4.20 ÷ 9
Ozdemir, Adnan; Altural, Tolga
2013-03-01
This study evaluated and compared landslide susceptibility maps produced with three different methods, frequency ratio, weights of evidence, and logistic regression, by using validation datasets. The field surveys performed as part of this investigation mapped the locations of 90 landslides that had been identified in the Sultan Mountains of south-western Turkey. The landslide influence parameters used for this study are geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transportation capacity index, distance to drainage, distance to fault, drainage density, fault density, and spring density maps. The relationships between landslide distributions and these parameters were analysed using the three methods, and the results of these methods were then used to calculate the landslide susceptibility of the entire study area. The accuracy of the final landslide susceptibility maps was evaluated based on the landslides observed during the fieldwork, and the accuracy of the models was evaluated by calculating each model's relative operating characteristic curve. The predictive capability of each model was determined from the area under the relative operating characteristic curve and the areas under the curves obtained using the frequency ratio, logistic regression, and weights of evidence methods are 0.976, 0.952, and 0.937, respectively. These results indicate that the frequency ratio and weights of evidence models are relatively good estimators of landslide susceptibility in the study area. Specifically, the results of the correlation analysis show a high correlation between the frequency ratio and weights of evidence results, and the frequency ratio and logistic regression methods exhibit correlation coefficients of 0.771 and 0.727, respectively. The frequency ratio model is simple, and its input, calculation and output processes are
Linear and logistic regression analysis
Tripepi, G.; Jager, K. J.; Dekker, F. W.; Zoccali, C.
2008-01-01
In previous articles of this series, we focused on relative risks and odds ratios as measures of effect to assess the relationship between exposure to risk factors and clinical outcomes and on control for confounding. In randomized clinical trials, the random allocation of patients is hoped to
Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa
2008-01-01
This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Building the TAT indicator multiple linear regression predictor and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to such model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.
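How the coefficients quoted above enter a linear prediction can be shown with a short sketch. The four coefficient values come from the abstract; everything else is an assumption made for illustration: the dictionary keys, the function names, and the treatment of the coefficients as ordinary slopes (the abstract does not say whether they are raw or standardized, and it reports no intercept, so only changes in the prediction are computed).

```python
# Coefficients as reported in the abstract (scale and units are not stated there).
REPORTED_COEFFICIENTS = {
    "stock_rt": 0.734,      # stock service response time
    "ce_rt": 0.415,         # clinical engineering department response time
    "priority": 0.21,       # priority level
    "service_time": 0.06,   # service time
}


def predicted_tat_change(deltas):
    """Change in predicted TAT for the given predictor changes.

    In a linear model the change is sum(coefficient * delta), with every
    predictor not listed in `deltas` held fixed.
    """
    return sum(REPORTED_COEFFICIENTS[name] * delta for name, delta in deltas.items())


def rank_by_influence():
    """Predictor names ordered by absolute coefficient size, largest first."""
    return sorted(REPORTED_COEFFICIENTS,
                  key=lambda name: abs(REPORTED_COEFFICIENTS[name]),
                  reverse=True)
```

Ranking by absolute coefficient is only meaningful when the predictors share a comparable scale, which is assumed here rather than known.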
Research on the multiple linear regression in non-invasive blood glucose measurement.
Zhu, Jianming; Chen, Zhencheng