Hilbe, Joseph M
This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect-great clarity.The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … .-Annette J. Dobson, Biometric...
The logistic regression originally is intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via the odds and odds ratio, which are presented in introduction of the chapter. The observations are possibly got individually, then we speak of binary logistic regression. When they are grouped, the logistic regression is said binomial. In our presentation we mainly focus on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihoods ratio results and their use to compare nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing global effect, individual effect, selection of variables to build a model, measure of the fitness of the model, prediction of new values… . The methods are demonstrated on data sets using R. Finally we briefly consider the binomial case and the situation where we are interested in several events, that is the polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
.... The research identified logistic regression as a powerful tool for analysis of DMSMS and further developed twenty models attempting to identify the "best" way to model and predict DMSMS using logistic regression...
412TW-PA-15240 Logistic Regression Model on Antenna Control Unit Autotracking Mode DANIEL T. LAIRD AIR FORCE TEST CENTER EDWARDS AFB, CA...OCT 15 4. TITLE AND SUBTITLE Logistic Regression Model on Antenna Control Unit Autotracking Mode 5a. CONTRACT NUMBER 5b. GRANT...alternative-hypothesis. This paper will present an Antenna Auto- tracking model using Logistic Regression modeling. This paper presents an example of
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
A regression model is the representation of relationship between independent variable and dependent variable. The dependent variable has categories used in the logistic regression model to calculate odds on. The logistic regression model for dependent variable has levels in the logistics regression model is ordinal. GWOLR model is an ordinal logistic regression model influenced the geographical location of the observation site. Parameters estimation in the model needed to determine the value of a population based on sample. The purpose of this research is to parameters estimation of GWOLR model using R software. Parameter estimation uses the data amount of dengue fever patients in Semarang City. Observation units used are 144 villages in Semarang City. The results of research get GWOLR model locally for each village and to know probability of number dengue fever patient categories.
Sofro, A.; Oktaviarina, A.
Spatial analysis has developed very quickly in the last decade. One of the favorite approaches is based on the neighbourhood of the region. Unfortunately, there are some limitations such as difficulty in prediction. Therefore, we offer Gaussian process regression (GPR) to accommodate the issue. In this paper, we will focus on spatial modeling with GPR for binomial data with logit link function. The performance of the model will be investigated. We will discuss the inference of how to estimate the parameters and hyper-parameters and to predict as well. Furthermore, simulation studies will be explained in the last section.
Rosa Arboretti Giancristofaro
Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben
interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects......interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects...
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes into consideration of mislabeled responses. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
Reed, Phil; Wu, Yaqionq
To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed are demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
Horssen, P.W. van; Pebesma, E.J.; Schot, P.P.
This paper presents a method to assess the uncertainty of an ecological spatial prediction model which is based on logistic regression models, using data from the interpolation of explanatory predictor variables. The spatial predictions are presented as approximate 95% prediction intervals. The
Pedro Henrique Melo Albuquerque
Full Text Available Abstract This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC, granted to clients residing in the Distrito Federal (DF, to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters, with it being concluded that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.
A binary logistic regression model with complex sampling design of unmet need for family planning among all women aged (15-49) in Ethiopia. ... Conclusion: The key determinants of unmet need family planning in Ethiopia were residence, age, marital-status, education, household members, birth-events and number of ...
Logistic regression is one of the most commonly used models to account for confounders in medical literature. The article introduces how to perform purposeful selection model building strategy with R. I stress on the use of likelihood ratio test to see whether deleting a variable will have significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of remaining covariates. Interaction should be checked to disentangle complex relationship between covariates and their synergistic effect on response variable. Model should be checked for the goodness-of-fit (GOF). In other words, how the fitted model reflects the real data. Hosmer-Lemeshow GOF test is the most widely used for logistic regression model.
Hosmer, David W; Sturdivant, Rodney X
A new edition of the definitive guide to logistic regression modeling for health science and other applications This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-
Qiu, Hong; Yu, Ignatius Tak-Sun; Tse, Lap Ah; Wang, Xiao-rong; Fu, Zhen-ming
Rothman argued that interaction estimated as departure from additivity better reflected the biological interaction. In a logistic regression model, the product term reflects the interaction as departure from multiplicativity. So far, literature on estimating interaction regarding an additive scale using logistic regression was only focusing on two dichotomous factors. The objective of the present report was to provide a method to examine the interaction as departure from additivity between two continuous variables or between one continuous variable and one categorical variable. We used data from a lung cancer case-control study among males in Hong Kong as an example to illustrate the bootstrap re-sampling method for calculating the corresponding confidence intervals. Free software R (Version 2.8.1) was used to estimate interaction on the additive scale.
A mixed-effects multinomial logistic regression model is described for analysis of clustered or longitudinal nominal or ordinal response data. The model is parameterized to allow flexibility in the choice of contrasts used to represent comparisons across the response categories. Estimation is achieved using a maximum marginal likelihood (MML) solution that uses quadrature to numerically integrate over the distribution of random effects. An analysis of a psychiatric data set, in which homeless adults with serious mental illness are repeatedly classified in terms of their living arrangement, is used to illustrate features of the model. Copyright 2003 by John Wiley & Sons, Ltd.
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Vaeth, Michael; Skovlund, Eva
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
Sep 3, 2017 ... SPSS-21. Binary logistic regression with complex sam- pling design was fitted for the unmet need outcomes. Married women are disaggregated by various background characteristics to have an insight of their characteristics. All background characteristics of women used in this study were categorical ...
Lee, Sara; Riley-Behringer, Maureen; Rose, Jeanmarie C; Meropol, Sharon B; Lazebnik, Rina
This study explores how parents' intentions regarding vaccination prior to their children's visit were associated with actual vaccine acceptance. A convenience sample of parents accompanying 6-week-old to 17-year-old children completed a written survey at 2 pediatric practices. Using hierarchical logistic regression, for hospital-based participants (n = 216), vaccine refusal history ( P < .01) and vaccine decision made before the visit ( P < .05) explained 87% of vaccine refusals. In community-based participants (n = 100), vaccine refusal history ( P < .01) explained 81% of refusals. Over 1 in 5 parents changed their minds about vaccination during the visit. Thirty parents who were previous vaccine refusers accepted current vaccines, and 37 who had intended not to vaccinate choose vaccination. Twenty-nine parents without a refusal history declined vaccines, and 32 who did not intend to refuse before the visit declined vaccination. Future research should identify key factors to nudge parent decision making in favor of vaccination.
Full Text Available With the development of information technology, the construction of water resource system has been gradually carried out. In the background of big data, the work of water information needs to carry out the process of quantitative to qualitative change. Analyzing the correlation of data and exploring the deep value of data which are the key of water information’s research. On the basis of the research on the water big data and the traditional data warehouse architecture, we try to find out the connection of different data source. According to the temporal and spatial correlation of stagnant water and rainfall, we use spatial interpolation to integrate data of stagnant water and rainfall which are from different data source and different sensors, then use logistic regression to find out the relationship between them.
Elío, J; Crowley, Q; Scanlon, R; Hodgson, J; Long, S
A new high spatial resolution radon risk map of Ireland has been developed, based on a combination of indoor radon measurements (n=31,910) and relevant geological information (i.e. Bedrock Geology, Quaternary Geology, soil permeability and aquifer type). Logistic regression was used to predict the probability of having an indoor radon concentration above the national reference level of 200Bqm -3 in Ireland. The four geological datasets evaluated were found to be statistically significant, and, based on combinations of these four variables, the predicted probabilities ranged from 0.57% to 75.5%. Results show that the Republic of Ireland may be divided in three main radon risk categories: High (HR), Medium (MR) and Low (LR). The probability of having an indoor radon concentration above 200Bqm -3 in each area was found to be 19%, 8% and 3%; respectively. In the Republic of Ireland, the population affected by radon concentrations above 200Bqm -3 is estimated at ca. 460k (about 10% of the total population). Of these, 57% (265k), 35% (160k) and 8% (35k) are in High, Medium and Low Risk Areas, respectively. Our results provide a high spatial resolution utility which permit customised radon-awareness information to be targeted at specific geographic areas. Copyright © 2017 Elsevier B.V. All rights reserved.
McCormick, Tyler H; Raftery, Adrian E; Madigan, David; Burd, Randall S
We propose an online binary classification procedure for cases when there is uncertainty about the model to use and parameters within a model change over time. We account for model uncertainty through dynamic model averaging, a dynamic extension of Bayesian model averaging in which posterior model probabilities may also change with time. We apply a state-space model to the parameters of each model and we allow the data-generating model to change over time according to a Markov chain. Calibrating a "forgetting" factor accommodates different levels of change in the data-generating mechanism. We propose an algorithm that adjusts the level of forgetting in an online fashion using the posterior predictive distribution, and so accommodates various levels of change at different times. We apply our method to data from children with appendicitis who receive either a traditional (open) appendectomy or a laparoscopic procedure. Factors associated with which children receive a particular type of procedure changed substantially over the 7 years of data collection, a feature that is not captured using standard regression modeling. Because our procedure can be implemented completely online, future data collection for similar studies would require storing sensitive patient information only temporarily, reducing the risk of a breach of confidentiality. © 2011, The International Biometric Society.
Lee, Kyewon; Ahn, Hongshik; Moon, Hojin; Kodell, Ralph L; Chen, James J
This article proposes a method for multiclass classification problems using ensembles of multinomial logistic regression models. A multinomial logit model is used as a base classifier in ensembles from random partitions of predictors. The multinomial logit model can be applied to each mutually exclusive subset of the feature space without variable selection. By combining multiple models the proposed method can handle a huge database without a constraint needed for analyzing high-dimensional data, and the random partition can improve the prediction accuracy by reducing the correlation among base classifiers. The proposed method is implemented using R, and the performance including overall prediction accuracy, sensitivity, and specificity for each category is evaluated on two real data sets and simulation data sets. To investigate the quality of prediction in terms of sensitivity and specificity, the area under the receiver operating characteristic (ROC) curve (AUC) is also examined. The performance of the proposed model is compared to a single multinomial logit model and it shows a substantial improvement in overall prediction accuracy. The proposed method is also compared with other classification methods such as the random forest, support vector machines, and random multinomial logit model.
Zhang, Zhongheng; Trevino, Victor; Hoseini, Sayed Shahabuddin; Belciug, Smaranda; Boopathi, Arumugam Manivanna; Zhang, Ping; Gorunescu, Florin; Subha, Velappan; Dai, Songshi
Variable or feature selection is one of the most important steps in model specification. Especially in the case of medical-decision making, the direct use of a medical database, without a previous analysis and preprocessing step, is often counterproductive. In this way, the variable selection represents the method of choosing the most relevant attributes from the database in order to build a robust learning models and, thus, to improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables, while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none of them is without limitations. For example, the stepwise approach, which is highly used, adds the best variable in each cycle generally producing an acceptable set of variables. Nevertheless, it is limited by the fact that it commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large with tens to hundreds of variables, which is the case in nowadays clinical data. Genetic algorithms (GA) are heuristic optimization approaches and can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.
Clausel, M.; Grégoire, G.
An exercise is proposed to illustrate the logistic regression. One investigates the different risk factors in the apparition of coronary heart disease. It has been proposed in Chapter 5 of the book of D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt coming from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.
In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuissance parameters, the Jacobian transformation is an
Full Text Available In August 2003, the Ghanaian Government made history by implementing the first National Health Insurance System (NHIS in Sub-Saharan Africa. Within three years, over half of the country’s population had voluntarily enrolled into the National Health Insurance Scheme. This study had three objectives: 1 To estimate the risk factors that influences the Ghana national health insurance claims. 2 To estimate the magnitude of each of the risk factors in relation to the Ghana national health insurance claims. In this work, data was collected from the policyholders of the Ghana National Health Insurance Scheme with the help of the National Health Insurance database and the patients’ attendance register of the Koforidua Regional Hospital, from 1st January to 31st December 2011. Quantitative analysis was done using the generalized linear regression (GLR models. The results indicate that risk factors such as sex, age, marital status, distance and length of stay at the hospital were important predictors of health insurance claims. However, it was found that the risk factors; health status, billed charges and income level are not good predictors of national health insurance claim. The outcome of the study shows that sex, age, marital status, distance and length of stay at the hospital are statistically significant in the determination of the Ghana National health insurance premiums since they considerably influence claims. We recommended, among other things that, the National Health Insurance Authority should facilitate the institutionalization of the collection of appropriate data on a continuous basis to help in the determination of future premiums.
Full Text Available This study sheds light on the most common issues related to applying logistic regression in prediction models for company growth. The purpose of the paper is 1 to provide a detailed demonstration of the steps in developing a growth prediction model based on logistic regression analysis, 2 to discuss common pitfalls and methodological errors in developing a model, and 3 to provide solutions and possible ways of overcoming these issues. Special attention is devoted to the question of satisfying logistic regression assumptions, selecting and defining dependent and independent variables, using classification tables and ROC curves, for reporting model strength, interpreting odds ratios as effect measures and evaluating performance of the prediction model. Development of a logistic regression model in this paper focuses on a prediction model of company growth. The analysis is based on predominantly financial data from a sample of 1471 small and medium-sized Croatian companies active between 2009 and 2014. The financial data is presented in the form of financial ratios divided into nine main groups depicting following areas of business: liquidity, leverage, activity, profitability, research and development, investing and export. The growth prediction model indicates aspects of a business critical for achieving high growth. In that respect, the contribution of this paper is twofold. First, methodological, in terms of pointing out pitfalls and potential solutions in logistic regression modelling, and secondly, theoretical, in terms of identifying factors responsible for high growth of small and medium-sized companies.
Logistic regression techniques can be used to restrict the conditional probabilities of a Bayesian network for discrete variables. More specifically, each variable of the network can be modeled through a logistic regression model, in which the parents of the variable define the covariates. When all
Jones, Jeff A; Waller, Niels G
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Tan, Qihua; Bathum, L; Christiansen, L
In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...... situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing minimized. A binomial logistic regression model with fractional polynomials is used to capture...
Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A
Logistic regression model is widely used in health research for description and predictive purposes. Unfortunately, most researchers are sometimes not aware that the underlying principles of the techniques have failed when the algorithm for maximum likelihood does not converge. Young researchers particularly postgraduate students may not know why separation problem whether quasi or complete occurs, how to identify it and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in an African Journal of Medicine and medical sciences between 2004 and 2013. Problems of quasi or complete separation were described and were illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles was reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of logistic regression model in the methodology while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 that used the logistic regression model, the problem of convergence occurred in 6 (15.0%) of the articles. Logistic regression tends to be poorly reported in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers since very few described the process in their reports and may be totally unaware of the problem of convergence or how to deal with it.
By, Kunthel; Qaqish, Bahjat F; Preisser, John S; Perin, Jamie; Zink, Richard C
This article describes a new software for modeling correlated binary data based on orthogonalized residuals, a recently developed estimating equations approach that includes, as a special case, alternating logistic regressions. The software is flexible with respect to fitting in that the user can choose estimating equations for association models based on alternating logistic regressions or orthogonalized residuals, the latter choice providing a non-diagonal working covariance matrix for second moment parameters providing potentially greater efficiency. Regression diagnostics based on this method are also implemented in the software. The mathematical background is briefly reviewed and the software is applied to medical data sets. Published by Elsevier Ireland Ltd.
T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu
Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7m2/ha...
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
Rijmen, F.P.J.; Vomlel, J.
We present a variational estimation method for the mixed logistic regression model. The method is based on a lower bound approximation of the logistic function [Jaakkola, J.S. and Jordan, M.I., 2000, Bayesian parameter estimation via variational methods. Statistics Computing, 10, 25-37.]. Based on
Harrell , Jr , Frank E
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Nikbakht, Roya; Bahrampour, Abbas
Fuzzy logistic regression model can be used for determining influential factors of disease. This study explores the important factors of actual predictive survival factors of breast cancer's patients. We used breast cancer data which collected by cancer registry of Kerman University of Medical Sciences during the period of 2000-2007. The variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were applied in the fuzzy logistic regression model. Performance of model was determined in terms of mean degree of membership (MDM). The study results showed that almost 41% of patients were in neoplasm and malignant group and more than two-third of them were still alive after 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criteria show that the fuzzy logistic regression have a good fit on the data (MDM = 0.86). Fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy in survival of patients with breast cancer. In addition, another ability of this model is calculating possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research. Furthermore, there are few studies which applied the fuzzy logistic models. Furthermore, we recommend using this model in various research areas.
de Vries, S O; Fidler, Vaclav; Kuipers, Wietze D; Hunink, Maria G M
The purpose of this study was to develop a model that predicts the outcome of supervised exercise for intermittent claudication. The authors present an example of the use of autoregressive logistic regression for modeling observed longitudinal data. Data were collected from 329 participants in a
The method of logistic regression was used to model the observed geographical distribution patterns of bird species in Swaziland in relation to a set of environmental variables. Reporting rates derived from bird atlas data are used as an index of population densities. This is justified in part by the success of the modelling ...
The method of logistic regression was used to model the observed geographical distribution patterns of bird species in Swaziland in relation to a set of environmental variables. Reporting rates derived from brrd atlas data are used as an index of population densities. This is justified in part by the success of the modelling ...
Full Text Available The assessment of medical outcomes is important in the effort to contain costs, streamline patient management, and codify medical practices. As such, it is necessary to develop predictive models that will make accurate predictions of these outcomes. The neural network methodology has often been shown to perform as well, if not better, than the logistic regression methodology in terms of sample predictive performance. However, the logistic regression method is capable of providing an explanation regarding the relationship(s between variables. This explanation is often crucial to understanding the clinical underpinnings of the disease process. Given the respective strengths of the methodologies in question, the combined use of a statistical (i.e., logistic regression and machine learning (i.e., neural network technology in the classification of medical outcomes is warranted under appropriate conditions. The study discusses these conditions and describes an approach for combining the strengths of the models.
Full Text Available Objective: to construct multi factor prediction model for the individual risk of T2DM, and to explore new ideas for early warning, prevention and personalized health services for T2DM. Methods: using logistic regression techniques to screen the risk factors for T2DM and construct the risk prediction model of T2DM. Results: Male’s risk prediction model logistic regression equation: logit(P=BMI × 0.735+ vegetables × (−0.671 + age × 0.838+ diastolic pressure × 0.296+ physical activity× (−2.287 + sleep ×(−0.009 +smoking ×0.214; Female’s risk prediction model logistic regression equation: logit(P=BMI ×1.979+ vegetables× (−0.292 + age × 1.355+ diastolic pressure× 0.522+ physical activity × (−2.287 + sleep × (−0.010.The area under the ROC curve of male was 0.83, the sensitivity was 0.72, the specificity was 0.86, the area under the ROC curve of female was 0.84, the sensitivity was 0.75, the specificity was 0.90. Conclusion: This study model data is from a compared study of nested case, the risk prediction model has been established by using the more mature logistic regression techniques, and the model is higher predictive sensitivity, specificity and stability.
Full Text Available Objective To analyze the risk factors for prognosis in intracerebral hemorrhage using decision tree (classification and regression tree, CART model and logistic regression model. Methods CART model and logistic regression model were established according to the risk factors for prognosis of patients with cerebral hemorrhage. The differences in the results were compared between the two methods. Results Logistic regression analyses showed that hematoma volume (OR-value 0.953, initial Glasgow Coma Scale (GCS score (OR-value 1.210, pulmonary infection (OR-value 0.295, and basal ganglia hemorrhage (OR-value 0.336 were the risk factors for the prognosis of cerebral hemorrhage. The results of CART analysis showed that volume of hematoma and initial GCS score were the main factors affecting the prognosis of cerebral hemorrhage. The effects of two models on the prognosis of cerebral hemorrhage were similar (Z-value 0.402, P=0.688. Conclusions CART model has a similar value to that of logistic model in judging the prognosis of cerebral hemorrhage, and it is characterized by using transactional analysis between the risk factors, and it is more intuitive. DOI: 10.11855/j.issn.0577-7402.2015.12.13
Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H
Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias, and coverage still deviated from nominal.
Full Text Available Market segmentation presents one of the key concepts of the modern marketing. The main goal of market segmentation is focused on creating groups (segments of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of concrete product/service. Companies can create specific marketing plan for each of these segments and therefore gain short or long term competitive advantage on the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of logistic regression model and CHAID analysis. The logistic regression model was used for the purpose of variables selection (from the initial pool of eleven variables which are statistically significant for explaining the dependent variable. Selected variables were afterwards included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented on the concrete empirical example in the following form: summary model results, CHAID tree, Gain chart, Index chart, risk and classification tables.
Parameshwar V. Pandit
Full Text Available Purpose: To analysis the dependence of oral health diseases i.e. dental caries and periodontal disease on considering the number of risk factors through the applications of logistic regression model. Method: The cross sectional study involves a systematic random sample of 1760 permanent dentition aged between 18-40 years in Dharwad, Karnataka, India. Dharwad is situated in North Karnataka. The mean age was 34.26±7.28. The risk factors of dental caries and periodontal disease were established by multiple logistic regression model using SPSS statistical software. Results: The factors like frequency of brushing, timings of cleaning teeth and type of toothpastes are significant persistent predictors of dental caries and periodontal disease. The log likelihood value of full model is –1013.1364 and Akaike’s Information Criterion (AIC is 1.1752 as compared to reduced regression model are -1019.8106 and 1.1748 respectively for dental caries. But, the log likelihood value of full model is –1085.7876 and AIC is 1.2577 followed by reduced regression model are -1019.8106 and 1.1748 respectively for periodontal disease. The area under Receiver Operating Characteristic (ROC curve for the dental caries is 0.7509 (full model and 0.7447 (reduced model; the ROC for the periodontal disease is 0.6128 (full model and 0.5821 (reduced model. Conclusions: The frequency of brushing, timings of cleaning teeth and type of toothpastes are main signifi cant risk factors of dental caries and periodontal disease. The fitting performance of reduced logistic regression model is slightly a better fit as compared to full logistic regression model in identifying the these risk factors for both dichotomous dental caries and periodontal disease.
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict what patients are at risk to be readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex to understand among the hospital practitioners. Explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaning decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of
Susan L. King
The performance of two classifiers, logistic regression and neural networks, are compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
Petersen, Jørgen Holm
This paper describes a new approach to the estimation in a logistic regression model with two crossed random effects where special interest is in estimating the variance of one of the effects while not making distributional assumptions about the other effect. A composite likelihood is studied...
Al-Daffaie, Kadhem; Khan, Shahjahan
This paper considers the relationship between a binary response and a circular predictor. It develops the logistic regression model by employing the linear-circular regression approach. The maximum likelihood method is used to estimate the parameters. The Newton-Raphson numerical method is used to find the estimated values of the parameters. A data set from weather records of Toowoomba city is analysed by the proposed methods. Moreover, a simulation study is considered. The R software is used for all computations and simulations.
Yu, Yuanyuan; Li, Hongkai; Sun, Xiaoru; Su, Ping; Wang, Tingting; Liu, Yi; Yuan, Zhongshang; Liu, Yanxun; Xue, Fuzhong
Confounders can produce spurious associations between exposure and outcome in observational studies. For majority of epidemiologists, adjusting for confounders using logistic regression model is their habitual method, though it has some problems in accuracy and precision. It is, therefore, important to highlight the problems of logistic regression and search the alternative method. Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential and then to select the optimum adjusting strategy, in which logistic regression model and inverse probability weighting based marginal structural model (IPW-based-MSM) were compared. The "do-calculus" was used to calculate the true causal effect of exposure on outcome, then the bias and standard error were used to evaluate the performances of different strategies. Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets satisfied G-admissibility, adjusting for the set including all the confounders reduced the equivalent bias to the one containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimations through logistic regression were biased, although the estimation after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM. Compared with logistic regression, the IPW-based-MSM could obtain unbiased causal effect estimation when the adjusted confounders satisfied G-admissibility and the optimal strategy was to adjust for the parent nodes of outcome, which
Full Text Available Abstract Background Confounders can produce spurious associations between exposure and outcome in observational studies. For majority of epidemiologists, adjusting for confounders using logistic regression model is their habitual method, though it has some problems in accuracy and precision. It is, therefore, important to highlight the problems of logistic regression and search the alternative method. Methods Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential and then to select the optimum adjusting strategy, in which logistic regression model and inverse probability weighting based marginal structural model (IPW-based-MSM were compared. The “do-calculus” was used to calculate the true causal effect of exposure on outcome, then the bias and standard error were used to evaluate the performances of different strategies. Results Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets satisfied G-admissibility, adjusting for the set including all the confounders reduced the equivalent bias to the one containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimations through logistic regression were biased, although the estimation after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM. Compared with logistic regression, the IPW-based-MSM could obtain unbiased causal effect estimation when the adjusted confounders satisfied G-admissibility and the optimal
Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris
Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
Buonaccorsi, John P; Romeo, Giovanni; Thoresen, Magne
When fitting regression models, measurement error in any of the predictors typically leads to biased coefficients and incorrect inferences. A plethora of methods have been proposed to correct for this. Obtaining standard errors and confidence intervals using the corrected estimators can be challenging and, in addition, there is concern about remaining bias in the corrected estimators. The bootstrap, which is one option to address these problems, has received limited attention in this context. It has usually been employed by simply resampling observations, which, while suitable in some situations, is not always formally justified. In addition, the simple bootstrap does not allow for estimating bias in non-linear models, including logistic regression. Model-based bootstrapping, which can potentially estimate bias in addition to being robust to the original sampling or whether the measurement error variance is constant or not, has received limited attention. However, it faces challenges that are not present in handling regression models with no measurement error. This article develops new methods for model-based bootstrapping when correcting for measurement error in logistic regression with replicate measures. The methodology is illustrated using two examples, and a series of simulations are carried out to assess and compare the simple and model-based bootstrap methods, as well as other standard methods. While not always perfect, the model-based approaches offer some distinct improvements over the other methods. © 2017, The International Biometric Society.
Rijmen, F.; Vomlel, Jiří
Roč. 78, č. 8 (2008), s. 765-779 ISSN 0094-9655 R&D Projects: GA MŠk 1M0572 Grant - others:GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords : Mixed models * Logistic regression * Variational methods * Lower bound approximation Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.353, year: 2008
Lin, Y.P.; Chu, H.J.; Wu, C.F.; Verburg, P.H.
The objective of this study is to compare the abilities of logistic, auto-logistic and artificial neural network (ANN) models for quantifying the relationships between land uses and their drivers. In addition, the application of the results obtained by the three techniques is tested in a dynamic
Kuo, Chia-Ling; Duan, Yinghui; Grady, James
Matching on demographic variables is commonly used in case-control studies to adjust for confounding at the design stage. There is a presumption that matched data need to be analyzed by matched methods. Conditional logistic regression has become a standard for matched case-control data to tackle the sparse data problem. The sparse data problem, however, may not be a concern for loose-matching data when the matching between cases and controls is not unique, and one case can be matched to other controls without substantially changing the association. Data matched on a few demographic variables are clearly loose-matching data, and we hypothesize that unconditional logistic regression is a proper method to perform. To address the hypothesis, we compare unconditional and conditional logistic regression models by precision in estimates and hypothesis testing using simulated matched case-control data. Our results support our hypothesis; however, the unconditional model is not as robust as the conditional model to the matching distortion that the matching process not only makes cases and controls similar for matching variables but also for the exposure status. When the study design involves other complex features or the computational burden is high, matching in loose-matching data can be ignored for negligible loss in testing and estimation if the distributions of matching variables are not extremely different between cases and controls.
Lin, Y.P.; Chu, H.J.; Wu, C.F.; Verburg, P.H.
The objective of this study is to compare the abilities of logistic, auto-logistic and artificial neural network (ANN) models for quantifying the relationships between land uses and their drivers. In addition, the application of the results obtained by the three techniques is tested in a dynamic land-use change model (CLUE-s) for the Paochiao watershed region in Taiwan. Relative operating characteristic curves (ROCs), kappa statistics, multiple resolution validation and landscape metrics were...
Bonellie, Sandra R
To illustrate the use of regression and logistic regression models to investigate changes over time in size of babies particularly in relation to social deprivation, age of the mother and smoking. Mean birthweight has been found to be increasing in many countries in recent years, but there are still a group of babies who are born with low birthweights. Population-based retrospective cohort study. Multiple linear regression and logistic regression models are used to analyse data on term 'singleton births' from Scottish hospitals between 1994-2003. Mothers who smoke are shown to give birth to lighter babies on average, a difference of approximately 0.57 Standard deviations lower (95% confidence interval. 0.55-0.58) when adjusted for sex and parity. These mothers are also more likely to have babies that are low birthweight (odds ratio 3.46, 95% confidence interval 3.30-3.63) compared with non-smokers. Low birthweight is 30% more likely where the mother lives in the most deprived areas compared with the least deprived, (odds ratio 1.30, 95% confidence interval 1.21-1.40). Smoking during pregnancy is shown to have a detrimental effect on the size of infants at birth. This effect explains some, though not all, of the observed socioeconomic birthweight. It also explains much of the observed birthweight differences by the age of the mother. Identifying mothers at greater risk of having a low birthweight baby as important implications for the care and advice this group receives. © 2012 Blackwell Publishing Ltd.
López Puga, Jorge; García García, Juan
Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research.
Full Text Available This study mapped and analyzed groundwater potential using two different models, logistic regression (LR and multivariate adaptive regression splines (MARS, and compared the results. A spatial database was constructed for groundwater well data and groundwater influence factors. Groundwater well data with a high potential yield of ≥70 m3/d were extracted, and 859 locations (70% were used for model training, whereas the other 365 locations (30% were used for model validation. We analyzed 16 groundwater influence factors including altitude, slope degree, slope aspect, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport index, distance from drainage, drainage density, lithology, distance from fault, fault density, distance from lineament, lineament density, and land cover. Groundwater potential maps (GPMs were constructed using LR and MARS models and tested using a receiver operating characteristics curve. Based on this analysis, the area under the curve (AUC for the success rate curve of GPMs created using the MARS and LR models was 0.867 and 0.838, and the AUC for the prediction rate curve was 0.836 and 0.801, respectively. This implies that the MARS model is useful and effective for groundwater potential analysis in the study area.
Koon, Sharon; Petscher, Yaacov
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen. Fitzgerald
Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed bums and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
Full Text Available The relationship between plant species and environmental factors has always been a central issue in plant ecology. With rising power of statistical techniques, geo-statistics and geographic information systems (GIS, the development of predictive habitat distribution models of organisms has rapidly increased in ecology. This study aimed to evaluate the ability of Logistic Regression Tree model to create potential habitat map of Astragalus verus. This species produces Tragacanth and has economic value. A stratified- random sampling was applied to 100 sites (50 presence- 50 absence of given species, and produced environmental and edaphic factors maps by using Kriging and Inverse Distance Weighting methods in the ArcGIS software for the whole study area. Relationships between species occurrence and environmental factors were determined by Logistic Regression Tree model and extended to the whole study area. The results indicated species occurrence has strong correlation with environmental factors such as mean daily temperature and clay, EC and organic carbon content of the soil. Species occurrence showed direct relationship with mean daily temperature and clay and organic carbon, and inverse relationship with EC. Model accuracy was evaluated both by Cohen’s kappa statistics (κ and by area under Receiver Operating Characteristics curve based on independent test data set. Their values (kappa=0.9, Auc of ROC=0.96 indicated the high power of LRT to create potential habitat map on local scales. This model, therefore, can be applied to recognize potential sites for rangeland reclamation projects.
Rausch, Manuel; Zehetleitner, Michael
Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.
Full Text Available The issue of detecting objects bottoming on the sea floor is significant in various fields including civilian and military areas. The objective of this study is to investigate the logistic regression model to discriminate the target from the clutter and to verify the possibility of applying the model trained by the simulated data generated by the mathematical model to the real experimental data because it is not easy to obtain sufficient data in the underwater field. In the first stage of this study, when the clutter signal energy is so strong that the detection of a target is difficult, the logistic regression model is employed to distinguish the strong clutter signal and the target signal. Previous studies have found that if the clutter energy is larger, false detection occurs even for the various existing detection schemes. For this reason, the discrete Fourier transform (DFT magnitude spectrum of acoustic signals received by active sonar is applied to train the model to distinguish whether the received signal contains a target signal or not. The goodness of fit of the model is verified in terms of receiver operation characteristic (ROC, area under ROC curve (AUC, and classification table. The detection performance of the proposed model is evaluated in terms of detection rate according to target to clutter ratio (TCR. Furthermore, the real experimental data are employed to test the proposed approach. When using the experimental data to test the model, the logistic regression model is trained by the simulated data that are generated based on the mathematical model for the backscattering of the cylindrical object. The mathematical model is developed according to the size of the cylinder used in the experiment. Since the information on the experimental environment including the sound speed, the sediment type and such is not available, once simulated data are generated under various conditions, valid simulated data are selected using 70% of the
Zhang, Sheng; Wang, Bo; Wan, Lin; Li, Lei M
Phred quality scores are essential for downstream DNA analysis such as SNP detection and DNA assembly. Thus a valid model to define them is indispensable for any base-calling software. Recently, we developed the base-caller 3Dec for Illumina sequencing platforms, which reduces base-calling errors by 44-69% compared to the existing ones. However, the model to predict its quality scores has not been fully investigated yet. In this study, we used logistic regression models to evaluate quality scores from predictive features, which include different aspects of the sequencing signals as well as local DNA contents. Sparse models were further obtained by three methods: the backward deletion with either AIC or BIC and the L 1 regularization learning method. The L 1 -regularized one was then compared with the Illumina scoring method. The L 1 -regularized logistic regression improves the empirical discrimination power by as large as 14 and 25% respectively for two kinds of preprocessed sequencing signals, compared to the Illumina scoring method. Namely, the L 1 method identifies more base calls of high fidelity. Computationally, the L 1 method can handle large dataset and is efficient enough for daily sequencing. Meanwhile, the logistic model resulted from BIC is more interpretable. The modeling suggested that the most prominent quenching pattern in the current chemistry of Illumina occurred at the dinucleotide "GT". Besides, nucleotides were more likely to be miscalled as the previous bases if the preceding ones were not "G". It suggested that the phasing effect of bases after "G" was somewhat different from those after other nucleotide types.
Full Text Available This is a cross-sectional retrospective study of 308 male patients, who were presented first time for coronary angiography at the Punjab Institute of Cardiology. The mean age was 50.97 + 9.9 among male patients. As the response variable coronary artery disease (CAD was a binary variable, logistic regression model was fitted to predict the Coronary Artery Disease with the help of significant risk factors. Age, Chest pain, Diabetes Mellitus, Smoking and Lipids are resulted as significant risk factors associated with CAD among male population.
Gelman, Andrew; Jakulin, Aleks; Pittau, Maria Grazia; Su, Yu-Sung
We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-$t$ prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-ha...
Bramer, L. M.; Rounds, J.; Burleyson, C. D.; Fortin, D.; Hathaway, J.; Rice, J.; Kraucunas, I.
Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and the relationship with weather conditions is examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and datasets were examined. A penalized logistic regression model fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at different time scales were examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. The methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
Dikaios, Nikolaos; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit [University College London, Centre for Medical Imaging, London (United Kingdom); University College London Hospital, Departments of Radiology, London (United Kingdom); Alkalbani, Jokha; Sidhu, Harbir Singh; Fujiwara, Taiki [University College London, Centre for Medical Imaging, London (United Kingdom); Abd-Alazeez, Mohamed; Ahmed, Hashim; Emberton, Mark [University College London, Research Department of Urology, London (United Kingdom); Kirkham, Alex; Allen, Clare [University College London Hospital, Departments of Radiology, London (United Kingdom); Freeman, Alex [University College London Hospital, Department of Histopathology, London (United Kingdom)
We aimed to develop logistic regression (LR) models for classifying prostate cancer within the transition zone on multi-parametric magnetic resonance imaging (mp-MRI). One hundred and fifty-five patients (training cohort, 70 patients; temporal validation cohort, 85 patients) underwent mp-MRI and transperineal-template-prostate-mapping (TPM) biopsy. Positive cores were classified by cancer definitions: (1) any-cancer; (2) definition-1 [≥Gleason 4 + 3 or ≥ 6 mm cancer core length (CCL)] [high risk significant]; and (3) definition-2 (≥Gleason 3 + 4 or ≥ 4 mm CCL) cancer [intermediate-high risk significant]. For each, logistic-regression mp-MRI models were derived from the training cohort and validated internally and with the temporal cohort. Sensitivity/specificity and the area under the receiver operating characteristic (ROC-AUC) curve were calculated. LR model performance was compared to radiologists' performance. Twenty-eight of 70 patients from the training cohort, and 25/85 patients from the temporal validation cohort had significant cancer on TPM. The ROC-AUC of the LR model for classification of cancer was 0.73/0.67 at internal/temporal validation. The radiologist A/B ROC-AUC was 0.65/0.74 (temporal cohort). For patients scored by radiologists as Prostate Imaging Reporting and Data System (Pi-RADS) score 3, sensitivity/specificity of radiologist A 'best guess' and LR model was 0.14/0.54 and 0.71/0.61, respectively; and radiologist B 'best guess' and LR model was 0.40/0.34 and 0.50/0.76, respectively. LR models can improve classification of Pi-RADS score 3 lesions similar to experienced radiologists. (orig.)
Full Text Available Johnson's scalar stress theory, describing the mechanics of (and the remedies to the increase in in-group conflictuality that parallels the increase in groups' size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout. Due to its relevance in archaeology and anthropology, the article aims at proposing a predictive model of critical level of scalar stress on the basis of community size. Drawing upon Johnson's theory and on Dunbar's findings on the cognitive constrains to human group size, a model is built by means of Logistic Regression on the basis of the data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is considered expression of not critical vs. critical level of scalar stress for the sake of the model building. The model, which is also tested against a sample of archaeological and ethnographic cases: a confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; b allows calculating the intercept and slope of the logistic regression model, which can be used in any time to estimate the probability that a community experienced a critical level of scalar stress; c allows locating a critical scalar stress threshold at community size 127 (95% CI: 122-132, while the maximum probability of critical scale stress is predicted at size 158 (95% CI: 147-170. The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion.
Wulandari, S. P.; Salamah, M.; Rositawati, A. F. D.
Food security is the condition where the food fulfilment is managed well for the country till the individual. Indonesia is one of the country which has the commitment to create the food security becomes main priority. However, the food necessity becomes common thing means that it doesn’t care about nutrient standard and the health condition of family member, so in the fulfilment of food necessity also has to consider the disease suffered by the family member, one of them is pulmonary tuberculosa. From that reasons, this research is conducted to know the factors which influence on household food security status which suffered from pulmonary tuberculosis in the coastal area of Surabaya by using binary logistic regression method. The analysis result by using binary logistic regression shows that the variables wife latest education, house density and spacious house ventilation significantly affect on household food security status which suffered from pulmonary tuberculosis in the coastal area of Surabaya, where the wife education level is University/equivalent, the house density is eligible or 8 m2/person and spacious house ventilation 10% of the floor area has the opportunity to become food secure households amounted to 0.911089. While the chance of becoming food insecure households amounted to 0.088911. The model household food security status which suffered from pulmonary tuberculosis in the coastal area of Surabaya has been conformable, and the overall percentages of those classifications are at 71.8%.
Full Text Available Using a generalized linear model to determine the claim frequency of auto insurance is a key ingredient in non-life insurance research. Among auto insurance rate-making models, there are very few considering auto types. Therefore, in this paper we are proposing a model that takes auto types into account by making an innovative use of the auto burden index. Based on this model and data from a Chinese insurance company, we built a clustering model that classifies auto insurance rates into three risk levels. The claim frequency and the claim costs are fitted to select a better loss distribution. Then the Logistic Regression model is employed to fit the claim frequency, with the auto burden index considered. Three key findings can be concluded from our study. First, more than 80% of the autos with an auto burden index of 20 or higher belong to the highest risk level. Secondly, the claim frequency is better fitted using the Poisson distribution, however the claim cost is better fitted using the Gamma distribution. Lastly, based on the AIC criterion, the claim frequency is more adequately represented by models that consider the auto burden index than those do not. It is believed that insurance policy recommendations that are based on Generalized linear models (GLM can benefit from our findings.
Eke, Gemma; Holttum, Sue; Hayward, Mark
Previous research highlights barriers to clinical psychologists conducting research, but has rarely examined U.K. clinical psychologists. The study investigated U.K. clinical psychologists' self-reported research output and tested part of a theoretical model of factors influencing their intention to conduct research. Questionnaires were mailed to 1,300 U.K. clinical psychologists. Three hundred and seventy-four questionnaires were returned (29% response-rate). This study replicated in a U.K. sample the finding that the modal number of publications was zero, highlighted in a number of U.K. and U.S. studies. Research intention was bimodally distributed, and logistic regression classified 78% of cases successfully. Outcome expectations, perceived behavioral control and normative beliefs mediated between research training environment and intention. Further research should explore how research is negotiated in clinical roles, and this issue should be incorporated into prequalification training. © 2012 Wiley Periodicals, Inc.
Ismail, Mohd Tahir; Alias, Siti Nor Shadila
For many years Malaysia faced the drug addiction issues. The most serious case is relapse phenomenon among treated drug addict (drug addict who have under gone the rehabilitation programme at Narcotic Addiction Rehabilitation Centre, PUSPEN). Thus, the main objective of this study is to find the most significant factor that contributes to relapse to happen. The binary logistic regression analysis was employed to model the relationship between independent variables (predictors) and dependent variable. The dependent variable is the status of the drug addict either relapse, (Yes coded as 1) or not, (No coded as 0). Meanwhile the predictors involved are age, age at first taking drug, family history, education level, family crisis, community support and self motivation. The total of the sample is 200 which the data are provided by AADK (National Antidrug Agency). The finding of the study revealed that age and self motivation are statistically significant towards the relapse cases..
Full Text Available Abstract Background Many studies conducted in health and social sciences collect individual level data as outcome measures. Usually, such data have a hierarchical structure, with patients clustered within physicians, and physicians clustered within practices. Large survey data, including national surveys, have a hierarchical or clustered structure; respondents are naturally clustered in geographical units (e.g., health regions and may be grouped into smaller units. Outcomes of interest in many fields not only reflect continuous measures, but also binary outcomes such as depression, presence or absence of a disease, and self-reported general health. In the framework of multilevel studies an important problem is calculating an adequate sample size that generates unbiased and accurate estimates. Methods In this paper simulation studies are used to assess the effect of varying sample size at both the individual and group level on the accuracy of the estimates of the parameters and variance components of multilevel logistic regression models. In addition, the influence of prevalence of the outcome and the intra-class correlation coefficient (ICC is examined. Results The results show that the estimates of the fixed effect parameters are unbiased for 100 groups with group size of 50 or higher. The estimates of the variance covariance components are slightly biased even with 100 groups and group size of 50. The biases for both fixed and random effects are severe for group size of 5. The standard errors for fixed effect parameters are unbiased while for variance covariance components are underestimated. Results suggest that low prevalent events require larger sample sizes with at least a minimum of 100 groups and 50 individuals per group. Conclusion We recommend using a minimum group size of 50 with at least 50 groups to produce valid estimates for multi-level logistic regression models. Group size should be adjusted under conditions where the prevalence
Wang, Bin; Wang, Xiao-Feng; Howell, Paul; Qian, Xuemin; Huang, Kun; Riker, Adam I; Ju, Jingfang; Xi, Yaguang
MicroRNA (miRNA) is a set of newly discovered non-coding small RNA molecules. Its significant effects have contributed to a number of critical biological events including cell proliferation, apoptosis development, as well as tumorigenesis. High-dimensional genomic discovery platforms (e.g. microarray) have been employed to evaluate the important roles of miRNAs by analyzing their expression profiling. However, because of the small total number of miRNAs and the absence of well-known endogenous controls, the traditional normalization methods for messenger RNA (mRNA) profiling analysis could not offer a suitable solution for miRNA analysis. The need for the establishment of new adaptive methods has come to the forefront. Locked nucleic acid (LNA)-based miRNA array was employed to profile miRNAs using colorectal cancer cell lines under different treatments. The expression pattern of overall miRNA profiling was pre-evaluated by a panel of miRNAs using Taqman-based quantitative real-time polymerase chain reaction (qRT-PCR) miRNA assays. A logistic regression model was built based on qRT-PCR results and then applied to the normalization of miRNA array data. The expression levels of 20 additional miRNAs selected from the normalized list were post-validated. Compared with other popularly used normalization methods, the logistic regression model efficiently calibrates the variance across arrays and improves miRNA microarray discovery accuracy. Datasets and R package are available at http://gauss.usouthal.edu/publ/logit/.
Rubin, Maria Laura; Chan, Wenyaw; Yamal, Jose-Miguel; Robertson, Claudia Sue
The use of longitudinal measurements to predict a categorical outcome is an increasingly common goal in research studies. Joint models are commonly used to describe two or more models simultaneously by considering the correlated nature of their outcomes and the random error present in the longitudinal measurements. However, there is limited research on joint models with longitudinal predictors and categorical cross-sectional outcomes. Perhaps the most challenging task is how to model the longitudinal predictor process such that it represents the true biological mechanism that dictates the association with the categorical response. We propose a joint logistic regression and Markov chain model to describe a binary cross-sectional response, where the unobserved transition rates of a two-state continuous-time Markov chain are included as covariates. We use the method of maximum likelihood to estimate the parameters of our model. In a simulation study, coverage probabilities of about 95%, standard deviations close to standard errors, and low biases for the parameter values show that our estimation method is adequate. We apply the proposed joint model to a dataset of patients with traumatic brain injury to describe and predict a 6-month outcome based on physiological data collected post-injury and admission characteristics. Our analysis indicates that the information provided by physiological changes over time may help improve prediction of long-term functional status of these severely ill subjects. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Li, Baoyue; Lingsma, Hester F; Steyerberg, Ewout W; Lesaffre, Emmanuel
Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR, Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC.Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model
Steyerberg Ewout W
Full Text Available Abstract Background Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. Methods We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI enrolled in eight Randomized Controlled Trials (RCTs and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4, Stata (GLLAMM, SAS (GLIMMIX and NLMIXED, MLwiN ([R]IGLS and MIXOR, Bayesian approaches included WinBUGS, MLwiN (MCMC, R package MCMCglmm and SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. Results The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal models for the main study and when based on a relatively large number of level-1 (patient level data compared to the number of level-2 (hospital level data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in
Abdolmaleki, P.; Yarmohammadi, M.; Gity, M.
Background: We designed an algorithmic model based on regression analysis and a non-algorithmic model based on the Artificial Neural Network. Materials and methods: The ability of these models was compared together in clinical application to differentiate malignant from benign breast tumors in a study group of 161 patient's records. Each patient's record consisted of 6 subjective features extracted from MRI appearance. These findings were enclosed as features extracted for an Artificial Neural Network as well as a logistic regression model to predict biopsy outcome. After both models had been trained perfectly on samples (n=100), the validation samples (n=61) were presented to the trained network as well as the established logistic regression models. Finally, the diagnostic performance of models were compared to the that of the radiologist in terms of sensitivity, specificity and accuracy, using receiver operating characteristic curve analysis. Results: The average out put of the Artificial Neural Network yielded a perfect sensitivity (98%) and high accuracy (90%) similar to that one of an expert radiologist (96% and 92%) while specificity was smaller than that (67%) verses 80%). The output of the logistic regression model using significant features showed improvement in specificity from 60% for the logistic regression model using all features to 93% for the reduced logistic regression model, keeping the accuracy around 90%. Conclusion: Results show that Artificial Neural Network and logistic regression model prove the relationship between extracted morphological features and biopsy results. Using statistically significant variables reduced logistic regression model outperformed of Artificial Neural Network with remarkable specificity while keeping high sensitivity is achieved
Widyaningsih, Purnami; Retno Sari Saputro, Dewi; Nugrahani Putri, Aulia
GWOLR model combines geographically weighted regression (GWR) and (ordinal logistic reression) OLR models. Its parameter estimation employs maximum likelihood estimation. Such parameter estimation, however, yields difficult-to-solve system of nonlinear equations, and therefore numerical approximation approach is required. The iterative approximation approach, in general, uses Newton-Raphson (NR) method. The NR method has a disadvantage—its Hessian matrix is always the second derivatives of each iteration so it does not always produce converging results. With regard to this matter, NR model is modified by substituting its Hessian matrix into Fisher information matrix, which is termed Fisher scoring (FS). The present research seeks to determine GWOLR model parameter estimation using Fisher scoring method and apply the estimation on data of the level of vulnerability to Dengue Hemorrhagic Fever (DHF) in Semarang. The research concludes that health facilities give the greatest contribution to the probability of the number of DHF sufferers in both villages. Based on the number of the sufferers, IR category of DHF in both villages can be determined.
Dikaios, Nikolaos; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit [University College London, Centre for Medical Imaging, London (United Kingdom); University College London Hospital, Departments of Radiology, London (United Kingdom); Alkalbani, Jokha; Sidhu, Harbir Singh [University College London, Centre for Medical Imaging, London (United Kingdom); Abd-Alazeez, Mohamed; Ahmed, Hashim U.; Emberton, Mark [University College London, Research Department of Urology, Division of Surgery and Interventional Science, London (United Kingdom); Kirkham, Alex [University College London Hospital, Departments of Radiology, London (United Kingdom); Freeman, Alex [University College London Hospital, Department of Histopathology, London (United Kingdom)
To assess the interchangeability of zone-specific (peripheral-zone (PZ) and transition-zone (TZ)) multiparametric-MRI (mp-MRI) logistic-regression (LR) models for classification of prostate cancer. Two hundred and thirty-one patients (70 TZ training-cohort; 76 PZ training-cohort; 85 TZ temporal validation-cohort) underwent mp-MRI and transperineal-template-prostate-mapping biopsy. PZ and TZ uni/multi-variate mp-MRI LR-models for classification of significant cancer (any cancer-core-length (CCL) with Gleason > 3 + 3 or any grade with CCL ≥ 4 mm) were derived from the respective cohorts and validated within the same zone by leave-one-out analysis. Inter-zonal performance was tested by applying TZ models to the PZ training-cohort and vice-versa. Classification performance of TZ models for TZ cancer was further assessed in the TZ validation-cohort. ROC area-under-curve (ROC-AUC) analysis was used to compare models. The univariate parameters with the best classification performance were the normalised T2 signal (T2nSI) within the TZ (ROC-AUC = 0.77) and normalized early contrast-enhanced T1 signal (DCE-nSI) within the PZ (ROC-AUC = 0.79). Performance was not significantly improved by bi-variate/tri-variate modelling. PZ models that contained DCE-nSI performed poorly in classification of TZ cancer. The TZ model based solely on maximum-enhancement poorly classified PZ cancer. LR-models dependent on DCE-MRI parameters alone are not interchangeable between prostatic zones; however, models based exclusively on T2 and/or ADC are more robust for inter-zonal application. (orig.)
Lalonde, Trent L; Wilson, Jeffrey R; Yin, Jianqiong
When analyzing longitudinal data, it is essential to account both for the correlation inherent from the repeated measures of the responses as well as the correlation realized on account of the feedback created between the responses at a particular time and the predictors at other times. As such one can analyze these data using generalized estimating equation with the independent working correlation. However, because it is essential to include all the appropriate moment conditions as you solve for the regression coefficients, we explore an alternative approach using a generalized method of moments for estimating the coefficients in such data. We develop an approach that makes use of all the valid moment conditions necessary with each time-dependent and time-independent covariate. This approach does not assume that feedback is always present over time, or if present occur at the same degree. Further, we make use of continuously updating generalized method of moments in obtaining estimates. We fit the generalized method of moments logistic regression model with time-dependent covariates using SAS PROC IML and also in R. We used p-values adjusted for multiple correlated tests to determine the appropriate moment conditions for determining the regression coefficients. We examined two datasets for illustrative purposes. We looked at re-hospitalization taken from a Medicare database. We also revisited data regarding the relationship between the body mass index and future morbidity among children in the Philippines. We conducted a simulated study to compare the performances of extended classifications. Copyright © 2014 John Wiley & Sons, Ltd.
Shariff, S. Sarifah Radiah; Rodzi, Nur Atiqah Mohd; Rahman, Kahartini Abdul; Zahari, Siti Meriam; Deni, Sayang Mohd
Malaysian government has recently set a new goal to produce 60,000 Malaysian PhD holders by the year 2023. As a Malaysia's largest institution of higher learning in terms of size and population which offers more than 500 academic programmes in a conducive and vibrant environment, UiTM has taken several initiatives to fill up the gap. Strategies to increase the numbers of graduates with PhD are a process that is challenging. In many occasions, many have already identified that the struggle to get into the target set is even more daunting, and that implementation is far too ideal. This has further being progressing slowly as the attrition rate increases. This study aims to apply the proposed models that incorporates several factors in predicting the number PhD students that will complete their PhD studies on time. Binary Logistic Regression model is proposed and used on the set of data to determine the number. The results show that only 6.8% of the 2014 PhD students are predicted to graduate on time and the results are compared wih the actual number for validation purpose.
Sidi, P.; Mamat, M. B.; Sukono; Supian, S.; Putra, A. S.
Citarum River floods in the area of South Bandung Indonesia, often resulting damage to some buildings belonging to the people living in the vicinity. One effort to alleviate the risk of building damage is to have flood insurance. The main obstacle is not all people in the Citarum basin decide to buy flood insurance. In this paper, we intend to analyse the decision to buy flood insurance. It is assumed that there are eight variables that influence the decision of purchasing flood assurance, include: income level, education level, house distance with river, building election with road, flood frequency experience, flood prediction, perception on insurance company, and perception towards government effort in handling flood. The analysis was done by using logistic regression model, and to estimate model parameters, it is done with genetic algorithm. The results of the analysis shows that eight variables analysed significantly influence the demand of flood insurance. These results are expected to be considered for insurance companies, to influence the decision of the community to be willing to buy flood insurance.
Bakhtiyari, Mahmood; Mehmandar, Mohammad Reza; Mirbagheri, Babak; Hariri, Gholam Reza; Delpisheh, Ali; Soori, Hamid
Risk factors of human-related traffic crashes are the most important and preventable challenges for community health due to their noteworthy burden in developing countries in particular. The present study aims to investigate the role of human risk factors of road traffic crashes in Iran. Through a cross-sectional study using the COM 114 data collection forms, the police records of almost 600,000 crashes occurred in 2010 are investigated. The binary logistic regression and proportional odds regression models are used. The odds ratio for each risk factor is calculated. These models are adjusted for known confounding factors including age, sex and driving time. The traffic crash reports of 537,688 men (90.8%) and 54,480 women (9.2%) are analysed. The mean age is 34.1 ± 14 years. Not maintaining eyes on the road (53.7%) and losing control of the vehicle (21.4%) are the main causes of drivers' deaths in traffic crashes within cities. Not maintaining eyes on the road is also the most frequent human risk factor for road traffic crashes out of cities. Sudden lane excursion (OR = 9.9, 95% CI: 8.2-11.9) and seat belt non-compliance (OR = 8.7, CI: 6.7-10.1), exceeding authorised speed (OR = 17.9, CI: 12.7-25.1) and exceeding safe speed (OR = 9.7, CI: 7.2-13.2) are the most significant human risk factors for traffic crashes in Iran. The high mortality rate of 39 people for every 100,000 population emphasises on the importance of traffic crashes in Iran. Considering the important role of human risk factors in traffic crashes, struggling efforts are required to control dangerous driving behaviours such as exceeding speed, illegal overtaking and not maintaining eyes on the road.
Escribano Mesa, José Alberto; Alonso Morillejo, Enrique; Parrón Carreño, Tesifón; Huete Allut, Antonio; Narro Donate, José María; Méndez Román, Paddy; Contreras Jiménez, Ascensión; Pedrero García, Francisco; Masegosa González, José
Parasagittal meningiomas arise from the arachnoid cells of the angle formed between the superior sagittal sinus (SSS) and the brain convexity. In this retrospective study, we focused on factors that predict early recurrence and recurrence times. We reviewed 125 patients with parasagittal meningiomas operated from 1985 to 2014. We studied the following variables: age, sex, location, laterality, histology, surgeons, invasion of the SSS, Simpson removal grade, follow-up time, angiography, embolization, radiotherapy, recurrence and recurrence time, reoperation, neurologic deficit, degree of dependency, and patient status at the end of follow-up. Patients ranged in age from 26 to 81 years (mean 57.86 years; median 60 years). There were 44 men (35.2%) and 81 women (64.8%). There were 57 patients with neurologic deficits (45.2%). The most common presenting symptom was motor deficit. World Health Organization grade I tumors were identified in 104 patients (84.6%), and the majority were the meningothelial type. Recurrence was detected in 34 cases. Time of recurrence was 9 to 336 months (mean: 84.4 months; median: 79.5 months). Male sex was identified as an independent risk for recurrence with relative risk 2.7 (95% confidence interval 1.21-6.15), P = 0.014. Kaplan-Meier curves for recurrence had statistically significant differences depending on sex, age, histologic type, and World Health Organization histologic grade. A binary logistic regression was made with the Hosmer-Lemeshow test with P > 0.05; sex, tumor size, and histologic type were used in this model. Male sex is an independent risk factor for recurrence that, associated with other factors such tumor size and histologic type, explains 74.5% of all cases in a binary regression model. Copyright © 2017 Elsevier Inc. All rights reserved.
Abreu, Mery Natali Silva; Siqueira, Arminda Lucia; Caiaffa, Waleska Teixeira
Ordinal logistic regression models have been developed for analysis of epidemiological studies. However, the adequacy of such models for adjustment has so far received little attention. In this article, we reviewed the most important ordinal regression models and common approaches used to verify goodness-of-fit, using R or Stata programs. We performed formal and graphical analyses to compare ordinal models using data sets on health conditions from the National Health and Nutrition Examination Survey (NHANES II).
Sze, N N; Wong, S C; Lee, C Y
In past several decades, many countries have set quantified road safety targets to motivate transport authorities to develop systematic road safety strategies and measures and facilitate the achievement of continuous road safety improvement. Studies have been conducted to evaluate the association between the setting of quantified road safety targets and road fatality reduction, in both the short and long run, by comparing road fatalities before and after the implementation of a quantified road safety target. However, not much work has been done to evaluate whether the quantified road safety targets are actually achieved. In this study, we used a binary logistic regression model to examine the factors - including vehicle ownership, fatality rate, and national income, in addition to level of ambition and duration of target - that contribute to a target's success. We analyzed 55 quantified road safety targets set by 29 countries from 1981 to 2009, and the results indicate that targets that are in progress and with lower level of ambitions had a higher likelihood of eventually being achieved. Moreover, possible interaction effects on the association between level of ambition and the likelihood of success are also revealed. Copyright © 2014 Elsevier Ltd. All rights reserved.
Chiu, Long S.
The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors.The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistical model is the probability that the rainrate of a satellite pixel is above a certain threshold. By varying the thresholds, a rainrate histogram can be obtained, from which the mean and the variant can be estimated. A logistical model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement which is deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistical model, simulated rain fields generated by rainfield models with prescribed parameters are needed. A stringent test of the logistical model is its ability to recover the prescribed parameters of simulated rain fields. A rain field simulation model which preserves the fractional rain area and lognormality of rainrates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.
Dikaios, Nikolaos; Alkalbani, Jokha; Abd-Alazeez, Mohamed; Sidhu, Harbir Singh; Kirkham, Alex; Ahmed, Hashim U; Emberton, Mark; Freeman, Alex; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit
To assess the interchangeability of zone-specific (peripheral-zone (PZ) and transition-zone (TZ)) multiparametric-MRI (mp-MRI) logistic-regression (LR) models for classification of prostate cancer. Two hundred and thirty-one patients (70 TZ training-cohort; 76 PZ training-cohort; 85 TZ temporal validation-cohort) underwent mp-MRI and transperineal-template-prostate-mapping biopsy. PZ and TZ uni/multi-variate mp-MRI LR-models for classification of significant cancer (any cancer-core-length (CCL) with Gleason > 3 + 3 or any grade with CCL ≥ 4 mm) were derived from the respective cohorts and validated within the same zone by leave-one-out analysis. Inter-zonal performance was tested by applying TZ models to the PZ training-cohort and vice-versa. Classification performance of TZ models for TZ cancer was further assessed in the TZ validation-cohort. ROC area-under-curve (ROC-AUC) analysis was used to compare models. The univariate parameters with the best classification performance were the normalised T2 signal (T2nSI) within the TZ (ROC-AUC = 0.77) and normalized early contrast-enhanced T1 signal (DCE-nSI) within the PZ (ROC-AUC = 0.79). Performance was not significantly improved by bi-variate/tri-variate modelling. PZ models that contained DCE-nSI performed poorly in classification of TZ cancer. The TZ model based solely on maximum-enhancement poorly classified PZ cancer. LR-models dependent on DCE-MRI parameters alone are not interchangable between prostatic zones; however, models based exclusively on T2 and/or ADC are more robust for inter-zonal application. • The ADC and T2-nSI of benign/cancer PZ are higher than benign/cancer TZ. • DCE parameters are significantly different between benign PZ and TZ, but not between cancerous PZ and TZ. • Diagnostic models containing contrast enhancement parameters have reduced performance when applied across zones.
B. Li (Bayoue); B. Roozenbeek (Bob); E.W. Steyerberg (Ewout); E.M.E.H. Lesaffre (Emmanuel)
textabstractBackground: Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. Methods. We used individual patient data from 8509
P.S. Paul [Indian School of Mines University, Dhanbad (India). Department of Mining Engineering
Mine accidents and injuries are complex and generally characterized by several factors starting from personal to technical, and technical to social characteristics. In this study, an attempt has been made to identify the various factors responsible for work related injuries in mines and to estimate the risk of work injury to mine workers. The prediction of work injury in mines was done by a step-by-step multivariate logistic regression modeling with an application to case study mines in India. In total, 18 variables were considered in this study. Most of the variables are not directly quantifiable. Instruments were developed to quantify them through a questionnaire type survey. Underground mine workers were randomly selected for the survey. Responses from 300 participants were used for the analysis. Four variables, age, negative affectivity, job dissatisfaction, and physical hazards bear significant discriminating power for risk of injury to the workers, comparing between cases and controls in a multivariate situation while controlling all the personal and socio-technical variables. The analysis reveals that negatively affected workers are 2.54 times more prone to injuries than the less negatively affected workers and this factor is a more important risk factor for the case-study mines. Long term planning through identification of the negative individuals, proper counseling regarding the adverse effects of negative behaviors and special training is urgently required. Care should be taken for the aged and experienced workers in terms of their job responsibility and training requirements. Management should provide a friendly atmosphere during work to increase the confidence of the injury prone miners. 44 refs., 4 tabs.
Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo
To explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and establish a Logistic regression model of noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG while the independent variables were screened by binary Logistic analysis. Multivariate Logistic regression was used for further analysis of significant noninvasive independent variables. Logistic regression model was established and odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of model were evaluated by the curve of receiver operating characteristic (ROC). According to univariate Logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major occurring factors for PHG. The established regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4. The accuracy of model for PHG was 79.1% with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG and the noninvasive predicted Logistic regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4.
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam [Pusat Pengajian Sains Matematik, Universiti Sains Malaysia, 11800 USM, Pulau Pinang, Malaysia email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org (Malaysia)
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L
Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
Staskiewicz, Grzegorz; Czekajska-Chehab, Elżbieta; Uhlig, Sebastian; Przegalinski, Jerzy; Maciejewski, Ryszard; Drop, Andrzej
Purpose: Diagnosis of right ventricular dysfunction in patients with acute pulmonary embolism (PE) is known to be associated with increased risk of mortality. The aim of the study was to calculate a logistic regression model for reliable identification of right ventricular dysfunction (RVD) in patients diagnosed with computed tomography pulmonary angiography. Material and methods: Ninety-seven consecutive patients with acute pulmonary embolism were divided into groups with and without RVD basing upon echocardiographic measurement of pulmonary artery systolic pressure (PASP). PE severity was graded with the pulmonary obstruction score. CT measurements of heart chambers and mediastinal vessels were performed; position of interventricular septum and presence of contrast reflux into the inferior vena cava were also recorded. The logistic regression model was prepared by means of stepwise logistic regression. Results: Among the used parameters, the final model consisted of pulmonary obstruction score, short axis diameter of right ventricle and diameter of inferior vena cava. The calculated model is characterized by 79% sensitivity and 81% specificity, and its performance was significantly better than single CT-based measurements. Conclusion: Logistic regression model identifies RVD significantly better, than single CT-based measurements
LESAFFRE, Emmanuel; Mwalili, Samuel M.; Declerck, Dominique
We present an approach for correcting for interobserver measurement error in an ordinal logistic regression model taking into account also the variability of the estimated correction terms. The different scoring behaviour of the 16 examiners complicated the identification of a geographical trend in a recent study on caries experience in Flemish children (Belgium) who were 7 years old. Since the measurement error is on the response the factor 'examiner' could be included in the regression mode...
Dineva, S.; Prodanova, K.; Mlachkova, D.
The presence of a periampullary duodenal diverticulum (PDD) is often observed during upper digestive tract barium meal studies and endoscopic retrograde cholangiopancreatography (ERCP). A few papers reported that the diverticulum had something to do with the incidence of pancreatitis. The aim of this study is to investigate if the presence of duodenal diverticula predisposes to the development of a pancreatic disease. A total 3966 patients who had undergone ERCP were studied retrospectively. They were divided into 2 groups-with and without PDD. Patients with a duodenal diverticula had a higher rate of acute pancreatitis. The duodenal diverticula is a risk factor for acute idiopathic pancreatitis. A multiple logistic regression to obtain adjusted estimate of odds and to identify if a PDD is a predictor of acute or chronic pancreatitis was performed. The software package STATISTICA 10.0 was used for analyzing the real data.
Full Text Available A knowledge gap exists in the role of restaurant type on the prediction of attaining the highest grade possible from the local health inspection agency. This study identified disparities using logistic regression between the issuance of a Grade A and restaurant type and location. This study tested the eight most inspected types of restaurants within the City of New York and calculated the odds ratios of their receiving the highest inspection grade by the New York City Department of Health and Mental Hygiene. A fitted equation has been proposed for the prediction of receiving the highest inspection grade based upon the citywide results of these eight restaurant types from calendar years 2011 and 2012. The results suggest that certain styles of restaurants have lower odds of receiving the highest grade in comparison to American-style restaurants.
Zhou, Lim Yi; Shan, Fam Pei; Shimizu, Kunio; Imoto, Tomoaki; Lateh, Habibah; Peng, Koay Swee
A comparative study of logistic regression, support vector machine (SVM) and least square support vector machine (LSSVM) models has been done to predict the slope failure (landslide) along East-West Highway (Gerik-Jeli). The effects of two monsoon seasons (southwest and northeast) that occur in Malaysia are considered in this study. Two related factors of occurrence of slope failure are included in this study: rainfall and underground water. For each method, two predictive models are constructed, namely SOUTHWEST and NORTHEAST models. Based on the results obtained from logistic regression models, two factors (rainfall and underground water level) contribute to the occurrence of slope failure. The accuracies of the three statistical models for two monsoon seasons are verified by using Relative Operating Characteristics curves. The validation results showed that all models produced prediction of high accuracy. For the results of SVM and LSSVM, the models using RBF kernel showed better prediction compared to the models using linear kernel. The comparative results showed that, for SOUTHWEST models, three statistical models have relatively similar performance. For NORTHEAST models, logistic regression has the best predictive efficiency whereas the SVM model has the second best predictive efficiency.
Full Text Available We describe a supervised prediction method for diagnosis of acute myeloid leukemia (AML from patient samples based on flow cytometry measurements. We use a data driven approach with machine learning methods to train a computational model that takes in flow cytometry measurements from a single patient and gives a confidence score of the patient being AML-positive. Our solution is based on an [Formula: see text] regularized logistic regression model that aggregates AML test statistics calculated from individual test tubes with different cell populations and fluorescent markers. The model construction is entirely data driven and no prior biological knowledge is used. The described solution scored a 100% classification accuracy in the DREAM6/FlowCAP2 Molecular Classification of Acute Myeloid Leukaemia Challenge against a golden standard consisting of 20 AML-positive and 160 healthy patients. Here we perform a more extensive validation of the prediction model performance and further improve and simplify our original method showing that statistically equal results can be obtained by using simple average marker intensities as features in the logistic regression model. In addition to the logistic regression based model, we also present other classification models and compare their performance quantitatively. The key benefit in our prediction method compared to other solutions with similar performance is that our model only uses a small fraction of the flow cytometry measurements making our solution highly economical.
Bastos, Leonardo Soares; Oliveira, Raquel de Vasconcellos Carvalhaes de; Velasque, Luciane de Souza
In the last decades, the use of the epidemiological prevalence ratio (PR) instead of the odds ratio has been debated as a measure of association in cross-sectional studies. This article addresses the main difficulties in the use of statistical models for the calculation of PR: convergence problems, availability of tools and inappropriate assumptions. We implement the direct approach to estimate the PR from binary regression models based on two methods proposed by Wilcosky & Chambless and compare with different methods. We used three examples and compared the crude and adjusted estimate of PR, with the estimates obtained by use of log-binomial, Poisson regression and the prevalence odds ratio (POR). PRs obtained from the direct approach resulted in values close enough to those obtained by log-binomial and Poisson, while the POR overestimated the PR. The model implemented here showed the following advantages: no numerical instability; assumes adequate probability distribution and, is available through the R statistical package.
Ulkhaq, M. M.; Widodo, A. K.; Yulianto, M. F. A.; Widhiyaningrum; Mustikasari, A.; Akshinta, P. Y.
The implementation of renewable energy in this globalization era is inevitable since the non-renewable energy leads to climate change and global warming; hence, it does harm the environment and human life. However, in the developing countries, such as Indonesia, the implementation of the renewable energy sources does face technical and social problems. For the latter, renewable energy sources implementation is only effective if the public is aware of its benefits. This research tried to identify the determinants that influence consumers’ intention in adopting renewable energy sources. In addition, this research also tried to predict the consumers who are willing to apply the renewable energy sources in their houses using a logistic regression approach. A case study was conducted in Semarang, Indonesia. The result showed that only eight variables (from fifteen) that are significant statistically, i.e., educational background, employment status, income per month, average electricity cost per month, certainty about the efficiency of renewable energy project, relatives’ influence to adopt the renewable energy sources, energy tax deduction, and the condition of the price of the non-renewable energy sources. The finding of this study could be used as a basis for the government to set up a policy towards an implementation of the renewable energy sources.
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Wu, Yazhou; Zhang, Ling; Liu, Ling; Zhang, Yanqi; Zhao, Zengwei; Liu, Xiaoyu; Yi, Dong
Analysis of interactions between genes and the environment with complex multifactorial human disease faces important challenges. Limitations of parametric-statistical methods for detection of gene effects that are dependent solely or partially on interactions with other genes or environmental exposures are key problems. The aim of the study was to investigate the use of multifactor dimensionality reduction (MDR) and logistic regression models to analyze the effects of interactions between complex disease genes with other genes and with environmental factors and to compare the results of these two methods in interaction analysis. In this case-control study, the two methods were applied to analog data of samples from 486 cancer patients and 514 control individuals by computer simulation, including 4 environment factors (E1~E4) and 8 gene polymorphism factors (G1~G8). Non-conditional logistic regression was used to analyze risk factors for cancer, and MDR and logistic regression were employed to analyze interactions under various conditions. MDR could find high-level interactions between genes and the environment (E3*G1*G7), but it could not find a main effect; conversely, logistic regression better analyzed the main effects (E3, G1, and G4) but was limited in its analysis of high-level interactions (E3*G1*G7). The results of these two methods with analog data show that the gene G1 site, the G4 site, E3, and the E3*G1*G7 interaction may be risk factors for occurrence of cancer. MDR and logistic regression, which are the two complementary methods, can be combined to analyze gene-gene (gene-environment) interactions with good results. This approach should help to determine the causes of diseases, such as chronic non-transmittable diseases like cancer.
Saro, Lee; Woo, Jeon Seong; Kwan-Young, Oh; Moung-Jin, Lee
The aim of this study is to predict landslide susceptibility caused using the spatial analysis by the application of a statistical methodology based on the GIS. Logistic regression models along with artificial neutral network were applied and validated to analyze landslide susceptibility in Inje, Korea. Landslide occurrence area in the study were identified based on interpretations of optical remote sensing data (Aerial photographs) followed by field surveys. A spatial database considering forest, geophysical, soil and topographic data, was built on the study area using the Geographical Information System (GIS). These factors were analysed using artificial neural network (ANN) and logistic regression models to generate a landslide susceptibility map. The study validates the landslide susceptibility map by comparing them with landslide occurrence areas. The locations of landslide occurrence were divided randomly into a training set (50%) and a test set (50%). A training set analyse the landslide susceptibility map using the artificial network along with logistic regression models, and a test set was retained to validate the prediction map. The validation results revealed that the artificial neural network model (with an accuracy of 80.10%) was better at predicting landslides than the logistic regression model (with an accuracy of 77.05%). Of the weights used in the artificial neural network model, `slope' yielded the highest weight value (1.330), and `aspect' yielded the lowest value (1.000). This research applied two statistical analysis methods in a GIS and compared their results. Based on the findings, we were able to derive a more effective method for analyzing landslide susceptibility.
Austin, Peter C
Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.
Azen, Razia; Traxel, Nicole
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
In this article, we modeled Higher Education Loans Board (HELB) loan application data from three public universities to determine whether the loan was ... It is expected that proper determination of the most accurate model will go a long way in minimizing the number of mis-classifications when awarding HELB loan.
We study the properties of treatment effect estimate in terms of odds ratio at the study end point from logistic regression model adjusting for the baseline value when the underlying continuous repeated measurements follow a multivariate normal distribution. Compared with the analysis that does not adjust for the baseline value, the adjusted analysis produces a larger treatment effect as well as a larger standard error. However, the increase in standard error is more than offset by the increase in treatment effect so that the adjusted analysis is more powerful than the unadjusted analysis for detecting the treatment effect. On the other hand, the true adjusted odds ratio implied by the normal distribution of the underlying continuous variable is a function of the baseline value and hence is unlikely to be able to be adequately represented by a single value of adjusted odds ratio from the logistic regression model. In contrast, the risk difference function derived from the logistic regression model provides a reasonable approximation to the true risk difference function implied by the normal distribution of the underlying continuous variable over the range of the baseline distribution. We show that different metrics of treatment effect have similar statistical power when evaluated at the baseline mean. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Ren, Yilong; Wang, Yunpeng; Wu, Xinkai; Yu, Guizhen; Ding, Chuan
Red light running (RLR) has become a major safety concern at signalized intersection. To prevent RLR related crashes, it is critical to identify the factors that significantly impact the drivers' behaviors of RLR, and to predict potential RLR in real time. In this research, 9-month's RLR events extracted from high-resolution traffic data collected by loop detectors from three signalized intersections were applied to identify the factors that significantly affect RLR behaviors. The data analysis indicated that occupancy time, time gap, used yellow time, time left to yellow start, whether the preceding vehicle runs through the intersection during yellow, and whether there is a vehicle passing through the intersection on the adjacent lane were significantly factors for RLR behaviors. Furthermore, due to the rare events nature of RLR, a modified rare events logistic regression model was developed for RLR prediction. The rare events logistic regression method has been applied in many fields for rare events studies and shows impressive performance, but so far none of previous research has applied this method to study RLR. The results showed that the rare events logistic regression model performed significantly better than the standard logistic regression model. More importantly, the proposed RLR prediction method is purely based on loop detector data collected from a single advance loop detector located 400 feet away from stop-bar. This brings great potential for future field applications of the proposed method since loops have been widely implemented in many intersections and can collect data in real time. This research is expected to contribute to the improvement of intersection safety significantly. Copyright © 2016 Elsevier Ltd. All rights reserved.
Lipsitz, Stuart R; Fitzmaurice, Garrett M; Regenbogen, Scott E; Sinha, Debajyoti; Ibrahim, Joseph G; Gawande, Atul A
The proportional odds logistic regression model is widely used for relating an ordinal outcome to a set of covariates. When the number of outcome categories is relatively large, the sample size is relatively small, and/or certain outcome categories are rare, maximum likelihood can yield biased estimates of the regression parameters. Firth (1993) and Kosmidis and Firth (2009) proposed a procedure to remove the leading term in the asymptotic bias of the maximum likelihood estimator. Their approach is most easily implemented for univariate outcomes. In this paper, we derive a bias correction that exploits the proportionality between Poisson and multinomial likelihoods for multinomial regression models. Specifically, we describe a bias correction for the proportional odds logistic regression model, based on the likelihood from a collection of independent Poisson random variables whose means are constrained to sum to 1, that is straightforward to implement. The proposed method is motivated by a study of predictors of post-operative complications in patients undergoing colon or rectal surgery (Gawande et al., 2007).
Huang, Ying; Pepe, Margaret S; Feng, Ziding
Two different approaches to analysis of data from diagnostic biomarker studies are commonly employed. Logistic regression is used to fit models for probability of disease given marker values while ROC curves and risk distributions are used to evaluate classification performance. In this paper we present a method that simultaneously accomplishes both tasks. The key step is to standardize markers relative to the non-diseased population before including them in the logistic regression model. Among the advantages of this method are: (i) ensuring that results from regression and performance assessments are consistent with each other; (ii) allowing covariate adjustment and covariate effects on ROC curves to be handled in a familiar way, and (iii) providing a mechanism to incorporate important assumptions about structure in the ROC curve into the fitted risk model. We develop the method in detail for the problem of combining biomarker datasets derived from multiple studies, populations or biomarker measurement platforms, when ROC curves are similar across data sources. The methods are applicable to both cohort and case-control sampling designs. The dataset motivating this application concerns Prostate Cancer Antigen 3 (PCA3) for diagnosis of prostate cancer in patients with or without previous negative biopsy where the ROC curves for PCA3 are found to be the same in the two populations. Estimated constrained maximum likelihood and empirical likelihood estimators are derived. The estimators are compared in simulation studies and the methods are illustrated with the PCA3 dataset.
Nong, Yu; Du, Qingyun; Wang, Kun; Miao, Lei; Zhang, Weiwei
Urban growth modeling, one of the most important aspects of land use and land cover change study, has attracted substantial attention because it helps to comprehend the mechanisms of land use change thus helps relevant policies made. This study applied multinomial logistic regression to model urban growth in the Jiayu county of Hubei province, China to discover the relationship between urban growth and the driving forces of which biophysical and social-economic factors are selected as independent variables. This type of regression is similar to binary logistic regression, but it is more general because the dependent variable is not restricted to two categories, as those previous studies did. The multinomial one can simulate the process of multiple land use competition between urban land, bare land, cultivated land and orchard land. Taking the land use type of Urban as reference category, parameters could be estimated with odds ratio. A probability map is generated from the model to predict where urban growth will occur as a result of the computation.
Xu, Jun-Fang; Xu, Jing; Li, Shi-Zhu; Jia, Tia-Wu; Huang, Xi-Bao; Zhang, Hua-Ming; Chen, Mei; Yang, Guo-Jing; Gao, Shu-Jing; Wang, Qing-Yun; Zhou, Xiao-Nong
Background The transmission of schistosomiasis japonica in a local setting is still poorly understood in the lake regions of the People's Republic of China (P. R. China), and its transmission patterns are closely related to human, social and economic factors. Methodology/Principal Findings We aimed to apply the integrated approach of artificial neural network (ANN) and logistic regression model in assessment of transmission risks of Schistosoma japonicum with epidemiological data collected from 2339 villagers from 1247 households in six villages of Jiangling County, P.R. China. By using the back-propagation (BP) of the ANN model, 16 factors out of 27 factors were screened, and the top five factors ranked by the absolute value of mean impact value (MIV) were mainly related to human behavior, i.e. integration of water contact history and infection history, family with past infection, history of water contact, infection history, and infection times. The top five factors screened by the logistic regression model were mainly related to the social economics, i.e. village level, economic conditions of family, age group, education level, and infection times. The risk of human infection with S. japonicum is higher in the population who are at age 15 or younger, or with lower education, or with the higher infection rate of the village, or with poor family, and in the population with more than one time to be infected. Conclusion/Significance Both BP artificial neural network and logistic regression model established in a small scale suggested that individual behavior and socioeconomic status are the most important risk factors in the transmission of schistosomiasis japonica. It was reviewed that the young population (≤15) in higher-risk areas was the main target to be intervened for the disease transmission control. PMID:23556015
Xu, Jun-Fang; Xu, Jing; Li, Shi-Zhu; Jia, Tia-Wu; Huang, Xi-Bao; Zhang, Hua-Ming; Chen, Mei; Yang, Guo-Jing; Gao, Shu-Jing; Wang, Qing-Yun; Zhou, Xiao-Nong
The transmission of schistosomiasis japonica in a local setting is still poorly understood in the lake regions of the People's Republic of China (P. R. China), and its transmission patterns are closely related to human, social and economic factors. We aimed to apply the integrated approach of artificial neural network (ANN) and logistic regression model in assessment of transmission risks of Schistosoma japonicum with epidemiological data collected from 2339 villagers from 1247 households in six villages of Jiangling County, P.R. China. By using the back-propagation (BP) of the ANN model, 16 factors out of 27 factors were screened, and the top five factors ranked by the absolute value of mean impact value (MIV) were mainly related to human behavior, i.e. integration of water contact history and infection history, family with past infection, history of water contact, infection history, and infection times. The top five factors screened by the logistic regression model were mainly related to the social economics, i.e. village level, economic conditions of family, age group, education level, and infection times. The risk of human infection with S. japonicum is higher in the population who are at age 15 or younger, or with lower education, or with the higher infection rate of the village, or with poor family, and in the population with more than one time to be infected. Both BP artificial neural network and logistic regression model established in a small scale suggested that individual behavior and socioeconomic status are the most important risk factors in the transmission of schistosomiasis japonica. It was reviewed that the young population (≤15) in higher-risk areas was the main target to be intervened for the disease transmission control.
Grilo, Luís M.; Grilo, Helena L.; Gonçalves, Sónia P.; Junça, Ana
In European countries, namely in Portugal, it is common to hear some people mentioning that they are exposed to excessive and continuous psychosocial stressors at work. This is increasing in diverse activity sectors, such as, the Services sector. A representative sample was collected from a Portuguese Services' organization, by applying a survey (internationally validated), which variables were measured in five ordered categories in Likert-type scale. A multinomial logistic regression model is used to estimate the probability of each category of the dependent variable general health perception where, among other independent variables, burnout appear as statistically significant.
Full Text Available This study develops a model for evaluating the hazard level of landslides at Alishan Forestry Railway, Taiwan, by using logistic regression with the assistance of a geographical information system (GIS. A typhoon event-induced landslide inventory, independent variables, and a triggering factor were used to build the model. The environmental factors such as bedrock lithology from the geology database; topographic aspect, terrain roughness, profile curvature, and distance to river, from the topographic database; and the vegetation index value from SPOT 4 satellite images were used as variables that influence landslide occurrence. The area under curve (AUC of a receiver operator characteristic (ROC curve was used to validate the model. Effects of parameters on landslide occurrence were assessed from the corresponding coefficient that appears in the logistic regression function. Thereafter, the model was applied to predict the probability of landslides for rainfall data of different return periods. Using a predicted map of probability, the study area was classified into four ranks of landslide susceptibility: low, medium, high, and very high. As a result, most high susceptibility areas are located on the western portion of the study area. Several train stations and railways are located on sites with a high susceptibility ranking.
Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei
As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely-used statistical model while at the same time has not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributing computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc.
Conclusion: Differentiation between borderline and invasive ovarian tumors can be achieved using a model based on the following criteria: menopausal status; cancer antigen 125 level; and ultrasound parameters. The model is helpful to oncologists and patients in the initial evaluation phase of ovarian tumors.
Gupta, Mithun Das; Huang, Thomas S.
In this work we investigate the relationship between Bregman distances and regularized Logistic Regression model. We present a detailed study of Bregman Distance minimization, a family of generalized entropy measures associated with convex functions. We convert the L1-regularized logistic regression into this more general framework and propose a primal-dual method based algorithm for learning the parameters. We pose L1-regularized logistic regression into Bregman distance minimization and the...
Full Text Available For most of the time, biomedical researchers have been dealing with ordinal outcome variable in multilevel models where patients are nested in doctors. We can justifiably apply multilevel cumulative logit model, where the outcome variable represents the mild, severe, and extremely severe intensity of diseases like malaria and typhoid in the form of ordered categories. Based on our simulation conditions, Maximum Likelihood (ML method is better than Penalized Quasilikelihood (PQL method in three-category ordinal outcome variable. PQL method, however, performs equally well as ML method where five-category ordinal outcome variable is used. Further, to achieve power more than 0.80, at least 50 groups are required for both ML and PQL methods of estimation. It may be pointed out that, for five-category ordinal response variable model, the power of PQL method is slightly higher than the power of ML method.
Ali, Sabz; Ali, Amjad; Khan, Sajjad Ahmad; Hussain, Sundas
For most of the time, biomedical researchers have been dealing with ordinal outcome variable in multilevel models where patients are nested in doctors. We can justifiably apply multilevel cumulative logit model, where the outcome variable represents the mild, severe, and extremely severe intensity of diseases like malaria and typhoid in the form of ordered categories. Based on our simulation conditions, Maximum Likelihood (ML) method is better than Penalized Quasilikelihood (PQL) method in three-category ordinal outcome variable. PQL method, however, performs equally well as ML method where five-category ordinal outcome variable is used. Further, to achieve power more than 0.80, at least 50 groups are required for both ML and PQL methods of estimation. It may be pointed out that, for five-category ordinal response variable model, the power of PQL method is slightly higher than the power of ML method.
Alishiri, Gholam Hossein; Bayat, Noushin; Fathi Ashtiani, Ali; Tavallaii, Seyed Abbas; Assari, Shervin; Moharamzad, Yashar
The aim of this work was to develop two logistic regression models capable of predicting physical and mental health related quality of life (HRQOL) among rheumatoid arthritis (RA) patients. In this cross-sectional study which was conducted during 2006 in the outpatient rheumatology clinic of our university hospital, Short Form 36 (SF-36) was used for HRQOL measurements in 411 RA patients. A cutoff point to define poor versus good HRQOL was calculated using the first quartiles of SF-36 physical and mental component scores (33.4 and 36.8, respectively). Two distinct logistic regression models were used to derive predictive variables including demographic, clinical, and psychological factors. The sensitivity, specificity, and accuracy of each model were calculated. Poor physical HRQOL was positively associated with pain score, disease duration, monthly family income below 300 US$, comorbidity, patient global assessment of disease activity or PGA, and depression (odds ratios: 1.1; 1.004; 15.5; 1.1; 1.02; 2.08, respectively). The variables that entered into the poor mental HRQOL prediction model were monthly family income below 300 US$, comorbidity, PGA, and bodily pain (odds ratios: 6.7; 1.1; 1.01; 1.01, respectively). Optimal sensitivity and specificity were achieved at a cutoff point of 0.39 for the estimated probability of poor physical HRQOL and 0.18 for mental HRQOL. Sensitivity, specificity, and accuracy of the physical and mental models were 73.8, 87, 83.7% and 90.38, 70.36, 75.43%, respectively. The results show that the suggested models can be used to predict poor physical and mental HRQOL separately among RA patients using simple variables with acceptable accuracy. These models can be of use in the clinical decision-making of RA patients and to recognize patients with poor physical or mental HRQOL in advance, for better management.
Ali, Asad; Zaidi, Farrah; Fatima, Syeda Hira; Adnan, Muhammad; Ullah, Saleem
In this study, we propose to develop a geostatistical computational framework to model the distribution of rat bite infestation of epidemic proportion in Peshawar valley, Pakistan. Two species Rattus norvegicus and Rattus rattus are suspected to spread the infestation. The framework combines strengths of maximum entropy algorithm and binomial kriging with logistic regression to spatially model the distribution of infestation and to determine the individual role of environmental predictors in modeling the distribution trends. Our results demonstrate the significance of a number of social and environmental factors in rat infestations such as (I) high human population density; (II) greater dispersal ability of rodents due to the availability of better connectivity routes such as roads, and (III) temperature and precipitation influencing rodent fecundity and life cycle.
Schlögel, R.; Marchesini, I.; Alvioli, M.; Reichenbach, P.; Rossi, M.; Malet, J.-P.
We perform landslide susceptibility zonation with slope units using three digital elevation models (DEMs) of varying spatial resolution of the Ubaye Valley (South French Alps). In so doing, we applied a recently developed algorithm automating slope unit delineation, given a number of parameters, in order to optimize simultaneously the partitioning of the terrain and the performance of a logistic regression susceptibility model. The method allowed us to obtain optimal slope units for each available DEM spatial resolution. For each resolution, we studied the susceptibility model performance by analyzing in detail the relevance of the conditioning variables. The analysis is based on landslide morphology data, considering either the whole landslide or only the source area outline as inputs. The procedure allowed us to select the most useful information, in terms of DEM spatial resolution, thematic variables and landslide inventory, in order to obtain the most reliable slope unit-based landslide susceptibility assessment.
Notario del Pino, Jesús S.; Ruiz-Gallardo, José-Reyes
Treatments that minimize soil erosion after large wildfires depend, among other factors, on fire severity and landscape configuration so that, in practice, most of them are applied according to emergency criteria. Therefore, simple tools to predict soil erosion risk help to decide where the available resources should be used first. In this study, a predictive model for soil erosion degree, based on ordinal logistic regression, has been developed and evaluated using data from three large forest fires in South-eastern Spain. The field data were successfully fit to the model in 60% of cases after 50 runs (i.e., agreement between observed and predicted soil erosion degrees), using slope steepness, slope aspect, and fire severity as predictors. North-facing slopes were shown to be less prone to soil erosion than the rest.
Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I
Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques, showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes to the entire country. The GIS query map, based on the range of isothermality, minimum temperature of coldest month, precipitation of warmest quarter of the year, altitude and the average dailyland surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.
Full Text Available Background: Spontaneous Reporting Systems (SRSs are passive systems composed of reports of suspected Adverse Drug Events (ADEs, and are used for Pharmacovigilance (PhV, namely, drug safety surveillance. Exploration of analytical methodologies to enhance SRS-based discovery will contribute to more effective PhV. In this study, we proposed a statistical modeling approach for SRS data to address heterogeneity by a reporting time point. Furthermore, we applied this approach to analyze ADEs of incretin-based drugs such as DPP-4 inhibitors and GLP-1 receptor agonists, which are widely used to treat type 2 diabetes. Methods: SRS data were obtained from the Japanese Adverse Drug Event Report (JADER database. Reported adverse events were classified according to the MedDRA High Level Terms (HLTs. A mixed effects logistic regression model was used to analyze the occurrence of each HLT. The model treated DPP-4 inhibitors, GLP-1 receptor agonists, hypoglycemic drugs, concomitant suspected drugs, age, and sex as fixed effects, while the quarterly period of reporting was treated as a random effect. Before application of the model, Fisher’s exact tests were performed for all drug-HLT combinations. Mixed effects logistic regressions were performed for the HLTs that were found to be associated with incretin-based drugs. Statistical significance was determined by a two-sided p-value <0.01 or a 99% two-sided confidence interval. Finally, the models with and without the random effect were compared based on Akaike’s Information Criteria (AIC, in which a model with a smaller AIC was considered satisfactory. Results: The analysis included 187,181 cases reported from January 2010 to March 2015. It showed that 33 HLTs, including pancreatic, gastrointestinal, and cholecystic events, were significantly associated with DPP-4 inhibitors or GLP-1 receptor agonists. In the AIC comparison, half of the HLTs reported with incretin-based drugs favored the random effect
The purpose of this article is to provide the necessary tools for institutional researchers to conduct a logistic regression analysis and interpret the results. Aspects of the logistic regression procedure that are necessary to evaluate models are presented and discussed with an emphasis on cutoff values and choosing the appropriate number of…
Full Text Available Based on logistic regression (LR and artificial neural network (ANN methods, we construct an LR model, an ANN model and three types of a two-stage hybrid model. The two-stage hybrid model is integrated by the LR and ANN approaches. We predict the credit risk of China’s small and medium-sized enterprises (SMEs for financial institutions (FIs in the supply chain financing (SCF by applying the above models. In the empirical analysis, the quarterly financial and non-financial data of 77 listed SMEs and 11 listed core enterprises (CEs in the period of 2012–2013 are chosen as the samples. The empirical results show that: (i the “negative signal” prediction accuracy ratio of the ANN model is better than that of LR model; (ii the two-stage hybrid model type I has a better performance of predicting “positive signals” than that of the ANN model; (iii the two-stage hybrid model type II has a stronger ability both in aspects of predicting “positive signals” and “negative signals” than that of the two-stage hybrid model type I; and (iv “negative signal” predictive power of the two-stage hybrid model type III is stronger than that of the two-stage hybrid model type II. In summary, the two-stage hybrid model III has the best classification capability to forecast SMEs credit risk in SCF, which can be a useful prediction tool for China’s FIs.
Adwere-Boamah, Joseph; Hufstedler, Shirley
This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…
Liu, Hongjie; Li, Tianhao; Zhan, Sha; Pan, Meilan; Ma, Zhiguo; Li, Chenghua
Aims. To establish a logistic regression (LR) prediction model for hepatotoxicity of Chinese herbal medicines (HMs) based on traditional Chinese medicine (TCM) theory and to provide a statistical basis for predicting hepatotoxicity of HMs. Methods. The correlations of hepatotoxic and nonhepatotoxic Chinese HMs with four properties, five flavors, and channel tropism were analyzed with chi-square test for two-way unordered categorical data. LR prediction model was established and the accuracy of the prediction by this model was evaluated. Results. The hepatotoxic and nonhepatotoxic Chinese HMs were related with four properties (p 0.05). There were totally 12 variables from four properties and five flavors for the LR. Four variables, warm and neutral of the four properties and pungent and salty of five flavors, were selected to establish the LR prediction model, with the cutoff value being 0.204. Conclusions. Warm and neutral of the four properties and pungent and salty of five flavors were the variables to affect the hepatotoxicity. Based on such results, the established LR prediction model had some predictive power for hepatotoxicity of Chinese HMs. PMID:27656240
Nematollahi, M; Akbari, R; Nikeghbalian, S; Salehnasab, C
Kidney transplantation is the treatment of choice for patients with end-stage renal disease (ESRD). Prediction of the transplant survival is of paramount importance. The objective of this study was to develop a model for predicting survival in kidney transplant recipients. In a cross-sectional study, 717 patients with ESRD admitted to Nemazee Hospital during 2008-2012 for renal transplantation were studied and the transplant survival was predicted for 5 years. The multilayer perceptron of artificial neural networks (MLP-ANN), logistic regression (LR), Support Vector Machine (SVM), and evaluation tools were used to verify the determinant models of the predictions and determine the independent predictors. The accuracy, area under curve (AUC), sensitivity, and specificity of SVM, MLP-ANN, and LR models were 90.4%, 86.5%, 98.2%, and 49.6%; 85.9%, 76.9%, 97.3%, and 26.1%; and 84.7%, 77.4%, 97.5%, and 17.4%, respectively. Meanwhile, the independent predictors were discharge time creatinine level, recipient age, donor age, donor blood group, cause of ESRD, recipient hypertension after transplantation, and duration of dialysis before transplantation. SVM and MLP-ANN models could efficiently be used for determining survival prediction in kidney transplant recipients.
Duwe, Grant; Freske, Pamela J
This study presents the results from efforts to revise the Minnesota Sex Offender Screening Tool-Revised (MnSOST-R), one of the most widely used sex offender risk-assessment tools. The updated instrument, the MnSOST-3, contains nine individual items, six of which are new. The population for this study consisted of the cross-validation sample for the MnSOST-R (N = 220) and a contemporary sample of 2,315 sex offenders released from Minnesota prisons between 2003 and 2006. To score and select items for the MnSOST-3, we used predicted probabilities generated from a multiple logistic regression model. We used bootstrap resampling to not only refine our selection of predictors but also internally validate the model. The results indicate the MnSOST-3 has a relatively high level of predictive discrimination, as evidenced by an apparent AUC of .821 and an optimism-corrected AUC of .796. The findings show the MnSOST-3 is well calibrated with actual recidivism rates for all but the highest risk offenders. Although estimating a penalized maximum likelihood model did not improve the overall calibration, the results suggest the MnSOST-3 may still be useful in helping identify high-risk offenders whose sexual recidivism risk exceeds 50%. Results from an interrater reliability assessment indicate the instrument, which is scored in a Microsoft Excel application, has an adequate degree of consistency across raters (ICC = .83 for both consistency and absolute agreement).
Eekhout, Iris; van de Wiel, Mark A; Heymans, Martijn W
Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin's Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels significantly contributes to the model, different methods are available. For example pooling chi-square tests with multiple degrees of freedom, pooling likelihood ratio test statistics, and pooling based on the covariance matrix of the regression model. These methods are more complex than RR and are not available in all mainstream statistical software packages. In addition, they do not always obtain optimal power levels. We argue that the median of the p-values from the overall significance tests from the analyses on the imputed datasets can be used as an alternative pooling rule for categorical variables. The aim of the current study is to compare different methods to test a categorical variable for significance after multiple imputation on applicability and power. In a large simulation study, we demonstrated the control of the type I error and power levels of different pooling methods for categorical variables. This simulation study showed that for non-significant categorical covariates the type I error is controlled and the statistical power of the median pooling rule was at least equal to current multiple parameter tests. An empirical data example showed similar results. It can therefore be concluded that using the median of the p-values from the imputed data analyses is an attractive and easy to use alternative method for significance testing of categorical variables.
Abdelfattah M. Selim
Full Text Available Aim: The present cross-sectional study was conducted to determine the seroprevalence and potential risk factors associated with Bovine viral diarrhea virus (BVDV disease in cattle and buffaloes in Egypt, to model the potential risk factors associated with the disease using logistic regression (LR models, and to fit the best predictive model for the current data. Materials and Methods: A total of 740 blood samples were collected within November 2012-March 2013 from animals aged between 6 months and 3 years. The potential risk factors studied were species, age, sex, and herd location. All serum samples were examined with indirect ELIZA test for antibody detection. Data were analyzed with different statistical approaches such as Chi-square test, odds ratios (OR, univariable, and multivariable LR models. Results: Results revealed a non-significant association between being seropositive with BVDV and all risk factors, except for species of animal. Seroprevalence percentages were 40% and 23% for cattle and buffaloes, respectively. OR for all categories were close to one with the highest OR for cattle relative to buffaloes, which was 2.237. Likelihood ratio tests showed a significant drop of the -2LL from univariable LR to multivariable LR models. Conclusion: There was an evidence of high seroprevalence of BVDV among cattle as compared with buffaloes with the possibility of infection in different age groups of animals. In addition, multivariable LR model was proved to provide more information for association and prediction purposes relative to univariable LR models and Chi-square tests if we have more than one predictor.
Full Text Available Severe fever with thrombocytopenia syndrome (SFTS is caused by severe fever with thrombocytopenia syndrome virus (SFTSV, which has had a serious impact on public health in parts of Asia. There is no specific antiviral drug or vaccine for SFTSV and, therefore, it is important to determine the factors that influence the occurrence of SFTSV infections. This study aimed to explore the spatial associations between SFTSV infections and several potential determinants, and to predict the high-risk areas in mainland China. The analysis was carried out at the level of provinces in mainland China. The potential explanatory variables that were investigated consisted of meteorological factors (average temperature, average monthly precipitation and average relative humidity, the average proportion of rural population and the average proportion of primary industries over three years (2010–2012. We constructed a geographically weighted logistic regression (GWLR model in order to explore the associations between the selected variables and confirmed cases of SFTSV. The study showed that: (1 meteorological factors have a strong influence on the SFTSV cover; (2 a GWLR model is suitable for exploring SFTSV cover in mainland China; (3 our findings can be used for predicting high-risk areas and highlighting when meteorological factors pose a risk in order to aid in the implementation of public health strategies.
Kleinbaum, David G
This textbook provides students and professionals in the health sciences with a presentation of the use of logistic regression in research. The text is self-contained, and designed to be used both in class or as a tool for self-study. It arises from the author's many years of experience teaching this material and the notes on which it is based have been extensively used throughout the world.
Wilson, Asa B; Kerr, Bernard J; Bastian, Nathaniel D; Fulton, Lawrence V
From 1980 to 1999, rural designated hospitals closed at a disproportionally high rate. In response to this emergent threat to healthcare access in rural settings, the Balanced Budget Act of 1997 made provisions for the creation of a new rural hospital--the critical access hospital (CAH). The conversion to CAH and the associated cost-based reimbursement scheme significantly slowed the closure rate of rural hospitals. This work investigates which methods can ensure the long-term viability of small hospitals. This article uses a two-step design to focus on a hypothesized relationship between technical efficiency of CAHs and a recently developed set of financial monitors for these entities. The goal is to identify the financial performance measures associated with efficiency. The first step uses data envelopment analysis (DEA) to differentiate efficient from inefficient facilities within a data set of 183 CAHs. Determining DEA efficiency is an a priori categorization of hospitals in the data set as efficient or inefficient. In the second step, DEA efficiency is the categorical dependent variable (efficient = 0, inefficient = 1) in the subsequent binary logistic regression (LR) model. A set of six financial monitors selected from the array of 20 measures were the LR independent variables. We use a binary LR to test the null hypothesis that recently developed CAH financial indicators had no predictive value for categorizing a CAH as efficient or inefficient, (i.e., there is no relationship between DEA efficiency and fiscal performance).
Full Text Available Background:Obesity and hypertension are the most important non-communicable diseases thatin many studies, the prevalence and their risk factors have been performedin each geographic region univariately.Study of factors affecting both obesity and hypertension may have an important role which to be adrressed in this study. Materials &Methods:This cross-sectional study was conducted on 1000 men aged 20-70 living in Bushehr province. Blood pressure was measured three times and the average of them was considered as one of the response variables. Hypertension was defined as systolic blood pressure ≥140 (and-or diastolic blood pressure ≥90 and obesity was defined as body mass index ≥25. Data was analyzed by using multilevel, multivariate logistic regression model by MlwiNsoftware. Results:Intra class correlations in cluster level obtained 33% for high blood pressure and 37% for obesity, so two level model was fitted to data. The prevalence of obesity and hypertension obtained 43.6% (0.95%CI; 40.6-46.5, 29.4% (0.95%CI; 26.6-32.1 respectively. Age, gender, smoking, hyperlipidemia, diabetes, fruit and vegetable consumption and physical activity were the factors affecting blood pressure (p≤0.05. Age, gender, hyperlipidemia, diabetes, fruit and vegetable consumption, physical activity and place of residence are effective on obesity (p≤0.05. Conclusion: The multilevel models with considering levels distribution provide more precise estimates. As regards obesity and hypertension are the major risk factors for cardiovascular disease, by knowing the high-risk groups we can d careful planning to prevention of non-communicable diseases and promotion of society health.
Fei, Yang; Hu, Jian; Gao, Kun; Tu, Jianfeng; Li, Wei-Qin; Wang, Wei
To construct a radical basis function (RBF) artificial neural networks (ANNs) model to predict the incidence of acute pancreatitis (AP)-induced portal vein thrombosis. The analysis included 353 patients with AP who had admitted between January 2011 and December 2015. RBF ANNs model and logistic regression model were constructed based on eleven factors relevant to AP respectively. Statistical indexes were used to evaluate the value of the prediction in two models. The predict sensitivity, specificity, positive predictive value, negative predictive value and accuracy by RBF ANNs model for PVT were 73.3%, 91.4%, 68.8%, 93.0% and 87.7%, respectively. There were significant differences between the RBF ANNs and logistic regression models in these parameters (Plogistic regression model. D-dimer, AMY, Hct and PT were important prediction factors of approval for AP-induced PVT. Copyright © 2017 Elsevier Inc. All rights reserved.
Noel Antonio Sánchez Trujillo
Full Text Available This article is a discussion about two statistical tools used for prediction and causality assessment: logistic regression and Bayesian networks. Using data of a simulated example from a study assessing factors that might predict pulmonary emphysema (where fingertip pigmentation and smoking are considered; we posed the following questions. Is pigmentation a confounding, causal or predictive factor? Is there perhaps another factor, like smoking, that confounds? Is there a synergy between pigmentation and smoking? The results, in terms of prediction, are similar with the two techniques; regarding causation, differences arise. We conclude that, in decision-making, the sum of both: a statistical tool, used with common sense, and previous evidence, taking years or even centuries to develop; is better than the automatic and exclusive use of statistical resources.
Goeman, Jelle J
This paper presents autocorrelated logistic ridge regression, an extension of logistic ridge regression for ordered covariates that is based on the assumption that adjacent covariates have similar regression coefficients. The method is applied to the analysis of proteomics mass spectra.
Hu, Bo; Palta, Mari; Shao, Jun
Various R(2) statistics have been proposed for logistic regression to quantify the extent to which the binary response can be predicted by a given logistic regression model and covariates. We study the asymptotic properties of three popular variance-based R(2) statistics. We find that two variance-based R(2) statistics, the sum of squares and the squared Pearson correlation, have identical asymptotic distribution whereas the third one, Gini's concentration measure, has a different asymptotic behaviour and may overstate the predictivity of the model and covariates when the model is mis-specified. Our result not only provides a theoretical basis for the findings in previous empirical and numerical work, but also leads to asymptotic confidence intervals. Statistical variability can then be taken into account when assessing the predictive value of a logistic regression model.
So, Tak-Shing Harry; Peng, Chao-Ying Joanne
This study compared the accuracy of predicting two-group membership obtained from K-means clustering with those derived from linear probability modeling, linear discriminant function, and logistic regression under various data properties. Multivariate normally distributed populations were simulated based on combinations of population proportions,…
P.C. Austin (Peter); E.W. Steyerberg (Ewout)
textabstractBackground: When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. Methods. An analytical expression was derived under the assumption that a
Chiu, Long S.; Kedem, Benjamin
Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.
Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing
Reporting of surgical complications is common, but few provide information about the severity and estimate risk factors of complications. If have, but lack of specificity. We retrospectively analyzed data on 2795 gastric cancer patients underwent surgical procedure at the Affiliated Hospital of Qingdao University between June 2007 and June 2012, established multivariate logistic regression model to predictive risk factors related to the postoperative complications according to the Clavien-Dindo classification system. Twenty-four out of 86 variables were identified statistically significant in univariate logistic regression analysis, 11 significant variables entered multivariate analysis were employed to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, introperative transfusion, Billroth II anastomosis of reconstruction, malnutrition, surgical volume of surgeons, operating time and age were independent risk factors for postoperative complications after gastrectomy. Based on logistic regression equation, p=Exp∑BiXi / (1+Exp∑BiXi), multivariate logistic regression predictive model that calculated the risk of postoperative morbidity was developed, p = 1/(1 + e((4.810-1.287X1-0.504X2-0.500X3-0.474X4-0.405X5-0.318X6-0.316X7-0.305X8-0.278X9-0.255X10-0.138X11))). The accuracy, sensitivity and specificity of the model to predict the postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model based on Clavien-Dindo grading severity of complications system and logistic regression analysis can predict severe morbidity specific to an individual patient's risk factors, estimate patients' risks and benefits of gastric surgery as an accurate decision-making tool and may serve as a template for the development of risk models for other surgical groups.
Slats, P.A.; Bhola, B.; Evers, J.J.M.; Dijkhuizen, G.
Logistic chain modelling is very important in improving the overall performance of the total logistic chain. Logistic models provide support for a large range of applications, such as analysing bottlenecks, improving customer service, configuring new logistic chains and adapting existing chains to
Lin, Chao-Cheng; Bai, Ya-Mei; Chen, Jen-Yeu; Hwang, Tzung-Jeng; Chen, Tzu-Ting; Chiu, Hung-Wen; Li, Yu-Chuan
Metabolic syndrome (MetS) is an important side effect of second-generation antipsychotics (SGAs). However, many SGA-treated patients with MetS remain undetected. In this study, we trained and validated artificial neural network (ANN) and multiple logistic regression models without biochemical parameters to rapidly identify MetS in patients with SGA treatment. A total of 383 patients with a diagnosis of schizophrenia or schizoaffective disorder (DSM-IV criteria) with SGA treatment for more than 6 months were investigated to determine whether they met the MetS criteria according to the International Diabetes Federation. The data for these patients were collected between March 2005 and September 2005. The input variables of ANN and logistic regression were limited to demographic and anthropometric data only. All models were trained by randomly selecting two-thirds of the patient data and were internally validated with the remaining one-third of the data. The models were then externally validated with data from 69 patients from another hospital, collected between March 2008 and June 2008. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of all models. Both the final ANN and logistic regression models had high accuracy (88.3% vs 83.6%), sensitivity (93.1% vs 86.2%), and specificity (86.9% vs 83.8%) to identify MetS in the internal validation set. The mean +/- SD AUC was high for both the ANN and logistic regression models (0.934 +/- 0.033 vs 0.922 +/- 0.035, P = .63). During external validation, high AUC was still obtained for both models. Waist circumference and diastolic blood pressure were the common variables that were left in the final ANN and logistic regression models. Our study developed accurate ANN and logistic regression models to detect MetS in patients with SGA treatment. The models are likely to provide a noninvasive tool for large-scale screening of MetS in this group of patients. (c) 2010 Physicians
Williams David Keith
Full Text Available Abstract Background The main problem in many model-building situations is to choose from a large set of covariates those that should be included in the "best" model. A decision to keep a variable in the model might be based on the clinical or statistical significance. There are several variable selection algorithms in existence. Those methods are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates within which an analyst makes a variable selection decision at each step of the modeling process. Methods In this paper we introduce an algorithm which automates that process. We conduct a simulation study to compare the performance of this algorithm with three well documented variable selection procedures in SAS PROC LOGISTIC: FORWARD, BACKWARD, and STEPWISE. Results We show that the advantage of this approach is when the analyst is interested in risk factor modeling and not just prediction. In addition to significant covariates, this variable selection procedure has the capability of retaining important confounding variables, resulting potentially in a slightly richer model. Application of the macro is further illustrated with the Hosmer and Lemeshow Worchester Heart Attack Study (WHAS data. Conclusion If an analyst is in need of an algorithm that will help guide the retention of significant covariates as well as confounding ones they should consider this macro as an alternative tool.
Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F
Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the [Formula: see text]-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interactions terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance. © The Author(s) 2013.
Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M
Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
van der Net, Jeroen B.; Janssens, A. Cecile J. W.; Eijkemans, Marinus J. C.; Kastelein, John J. P.; Sijbrands, Eric J. G.; Steyerberg, Ewout W.
Cross-sectional genetic association studies can be analyzed using Cox proportional hazards models with age as time scale, if age at onset of disease is known for the cases and age at data collection is known for the controls. We assessed to what degree and under what conditions Cox proportional
Shelley M. ALEXANDER
Full Text Available We compared probability surfaces derived using one set of environmental variables in three Geographic Information Systems (GIS-based approaches: logistic regression and Akaike’s Information Criterion (AIC, Multiple Criteria Evaluation (MCE, and Bayesian Analysis (specifically Dempster-Shafer theory. We used lynx Lynx canadensis as our focal species, and developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during winters from 1997 to 2000. The accuracy of the three spatial models were compared using a contingency table method. We determined the percentage of cases in which both presence and absence points were correctly classified (overall accuracy, the failure to predict a species where it occurred (omission error and the prediction of presence where there was absence (commission error. Our overall accuracy showed the logistic regression approach was the most accurate (74.51%. The multiple criteria evaluation was intermediate (39.22%, while the Dempster-Shafer (D-S theory model was the poorest (29.90%. However, omission and commission error tell us a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence in their model. We suggest that for our study area at least, the logistic regression model is optimal. However, where sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g. Dempster-Shafer that would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans[Current Zoology 55(1: 28 – 40, 2009].
State-of-the-art score normalization methods use generative models that rely on sometimes unrealistic assumptions. We propose a novel parameter estimation method for score normalization based on logistic regression. Experiments on the Gov2 and CluewebA collection indicate that our method is
Leng, Xiaoling; Huang, Guofu; Yao, Lanhui; Ma, Fucheng
This study is to investigate the diagnostic role of multi-mode ultrasound in level 4 BI-RADS breast lesions and to establish a Logistic regression model. Totally 179 patients with 182 sites of breast lesions were enrolled in this study. Preoperatively, the examinations of routine ultrasonography, elastography, contrast-enhanced ultrasonography and three-dimensional color Doppler were performed. Postoperatively, the breast lesions were diagnosed as benign and malignant lesions according to pathological results. Diagnostic indicators of each ultrasound analysis were determined and compared. The relationship between these diagnostic indicators and the benign and malignant features of breast lesions was analyzed by single factor analysis. Logistic regression model was established. The diagnostic indicators with high sensitivity and specificity were tumor edge, enhanced range and score of elastography. Four factors of tumor edge, enhanced order, contrast mode and score of elastography were related with the benign and malignant features of breast lesions. The prediction model was Logit (P) = 0.636 + 4.471X1 + 4.337X2 + 3.753X3 + 3.014X4 + 2.525X5 + 2.105X6. Likelihood ratio test showed that the model was statistically significant (χ(2) = 161.876, P R(2) = 0.813, prediction accuracy 92.3%). The differences in sensitivity and specificity between multi-mode ultrasound diagnosis and routine ultrasound diagnosis were statistically significant (P Logistic regression model and multi-mode ultrasound diagnosis. Multi-mode ultrasound and Logistic regression model are more effective in diagnosing level 4 BI-RADS breast lesions.
To assess reproductive risk factors for anaemia among pregnant women in urban and rural areas of India. The International Institute of Population Sciences, India, carried out third National Family Health Survey in 2005-2006 to estimate a key indicator from a sample of ever-married women in the reproductive age group 15-49 years. Data on various dimensions were collected using a structured questionnaire, and anaemia was measured using a portable HemoCue instrument. Anaemia prevalence among pregnant women was compared between rural and urban areas using chi-square test and odds ratio. Multinomial logistic regression analysis was used to determine risk factors. Anaemia prevalence was assessed among 3355 pregnant women from rural areas and 1962 pregnant women from urban areas. Moderate-to-severe anaemia in rural areas (32.4%) is significantly more common than in urban areas (27.3%) with an excess risk of 30%. Gestational age specific prevalence of anaemia significantly increases in rural areas after 6 months. Pregnancy duration is a significant risk factor in both urban and rural areas. In rural areas, increasing age at marriage and mass media exposure are significant protective factors of anaemia. However, more births in the last five years, alcohol consumption and smoking habits are significant risk factors. In rural areas, various reproductive factors and lifestyle characteristics constitute significant risk factors for moderate-to-severe anaemia. Therefore, intensive health education on reproductive practices and the impact of lifestyle characteristics are warranted to reduce anaemia prevalence. © 2014 John Wiley & Sons Ltd.
Austin, Peter C; Merlo, Juan
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R 2 measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.
Peng, Chao-Ying Joanne; Lee, Kuk Lida; Ingersoll, Gary M.
Provides guidelines for what to expect in an article using logistic regression techniques, discussing tables, figures, and charts to be included to comprehensively assess results and assumptions to be verified; demonstrating the preferred pattern for applying logistic methods, with an illustration of logistic regression applied to a data set; and…
Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos
Riverbank erosion is a natural geomorphologic process that affects the fluvial environment. The most important issue concerning riverbank erosion is the identification of the vulnerable locations. An alternative to the usual hydrodynamic models to predict vulnerable locations is to quantify the probability of erosion occurrence. This can be achieved by identifying the underlying relations between riverbank erosion and the geomorphological or hydrological variables that prevent or stimulate erosion. Thus, riverbank erosion can be determined by a regression model using independent variables that are considered to affect the erosion process. The impact of such variables may vary spatially, therefore, a non-stationary regression model is preferred instead of a stationary equivalent. Locally Weighted Regression (LWR) is proposed as a suitable choice. This method can be extended to predict the binary presence or absence of erosion based on a series of independent local variables by using the logistic regression model. It is referred to as Locally Weighted Logistic Regression (LWLR). Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable (e.g. binary response) based on one or more predictor variables. The method can be combined with LWR to assign weights to local independent variables of the dependent one. LWR allows model parameters to vary over space in order to reflect spatial heterogeneity. The probabilities of the possible outcomes are modelled as a function of the independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. erosion presence or absence) for any value of the independent variables. The
Discriminating parathyroid adenoma from local mimics by using inherent tissue attenuation and vascular information obtained with four-dimensional CT: formulation of a multinomial logistic regression model.
Hunter, George J; Ginat, Daniel T; Kelly, Hillary R; Halpern, Elkan F; Hamberg, Leena M
To identify a set of parameters, which are based on tissue enhancement and native iodine content obtained from a standardized triple-phase four-dimensional (4D) computed tomographic (CT) scan, that define a multinomial logistic regression model that discriminates between parathyroid adenoma (PTA) and thyroid nodules or lymph nodes. Informed consent was waived by the institutional review board for this retrospective HIPAA-compliant study. Electronic medical records were reviewed for 102 patients with hyperparathyroidism who underwent triple-phase 4D CT and parathyroid surgery resulting in pathologically proved removal of adenoma from July 2010 through December 2011. Hounsfield units were measured in PTA, thyroid, lymph nodes, and aorta and were used to determine seven parameters characterizing tissue contrast enhancement. These were used as covariates in 10 multinomial logistic regression models. Three models with one covariate, four models with two covariates, and three models with three covariates were investigated. Receiver operating characteristic (ROC) analysis was performed to determine how well each model discriminated between adenoma and nonadenomatous tissues. Statistical differences between the areas under the ROC curves (AUCs) for each model pair were calculated, as well as sensitivity, specificity, accuracy, negative predictive value, and positive predictive value. A total of 120 lesions were found; 112 (93.3%) lesions were weighed, and mean and median weights were 589 and 335 mg, respectively. The three-covariate models were significantly identical (P > .65), with largest AUC of 0.9913 ± 0.0037 (standard error), accuracy of 96.9%, and sensitivity, specificity, negative predictive value, and positive predictive value of 94.3%, 98.3%, 97.1%, and 96.7%, respectively. The one- and two-covariate models were significantly less accurate (P logistic model derived from a triple-phase 4D CT scan can accurately provide the probability that tissue is PTA and
Iyama, Yuji; Nakaura, Takeshi; Katahira, Kazuhiro; Iyama, Ayumi; Nagayama, Yasunori; Oda, Seitaro; Utsunomiya, Daisuke; Yamashita, Yasuyuki
To develop a prediction model to distinguish between transition zone (TZ) cancers and benign prostatic hyperplasia (BPH) on multi-parametric prostate magnetic resonance imaging (mp-MRI). This retrospective study enrolled 60 patients with either BPH or TZ cancer, who had undergone 3 T-MRI. We generated ten parameters for T2-weighted images (T2WI), diffusion-weighted images (DWI) and dynamic MRI. Using a t-test and multivariate logistic regression (LR) analysis to evaluate the parameters' accuracy, we developed LR models. We calculated the area under the receiver operating characteristic curve (ROC) of LR models by a leave-one-out cross-validation procedure, and the LR model's performance was compared with radiologists' performance with their opinion and with the Prostate Imaging Reporting and Data System (Pi-RADS v2) score. Multivariate LR analysis showed that only standardized T2WI signal and mean apparent diffusion coefficient (ADC) maintained their independent values (P < 0.001). The validation analysis showed that the AUC of the final LR model was comparable to that of board-certified radiologists, and superior to that of Pi-RADS scores. A standardized T2WI and mean ADC were independent factors for distinguishing between BPH and TZ cancer. The performance of the LR model was comparable to that of experienced radiologists. • It is difficult to diagnose transition zone (TZ) cancer. • We performed quantitative image analysis in multi-parametric MRI. • Standardized-T2WI and mean-ADC were independent factors for diagnosing TZ cancer. • We developed logistic-regression analysis to diagnose TZ cancer accurately. • The performance of the logistic-regression analysis was higher than PIRADSv2.
Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M
We derive an algorithm to directly solve logistic regression based on cardinality constraint, group sparsity and use it to classify intra-subject MRI sequences (e.g. cine MRIs) of healthy from diseased subjects. Group cardinality constraint models are often applied to medical images in order to avoid overfitting of the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. Avoiding weighing features, we propose to directly solve the group cardinality constraint logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from enforcing the group sparsity constraint. The minimum to the smooth and convex logistic regression problem is determined via gradient descent while we derive a closed form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients that received reconstructive surgery of Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significant higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints. Copyright © 2016 Elsevier B.V. All rights reserved.
Chen, Chen-Hsin; Tsay, Yuh-Chyuan; Wu, Ya-Chi; Horng, Cheng-Fang
In conventional survival analysis there is an underlying assumption that all study subjects are susceptible to the event. In general, this assumption does not adequately hold when investigating the time to an event other than death. Owing to genetic and/or environmental etiology, study subjects may not be susceptible to the disease. Analyzing nonsusceptibility has become an important topic in biomedical, epidemiological, and sociological research, with recent statistical studies proposing several mixture models for right-censored data in regression analysis. In longitudinal studies, we often encounter left, interval, and right-censored data because of incomplete observations of the time endpoint, as well as possibly left-truncated data arising from the dissimilar entry ages of recruited healthy subjects. To analyze these kinds of incomplete data while accounting for nonsusceptibility and possible crossing hazards in the framework of mixture regression models, we utilize a logistic regression model to specify the probability of susceptibility, and a generalized gamma distribution, or a log-logistic distribution, in the accelerated failure time location-scale regression model to formulate the time to the event. Relative times of the conditional event time distribution for susceptible subjects are extended in the accelerated failure time location-scale submodel. We also construct graphical goodness-of-fit procedures on the basis of the Turnbull-Frydman estimator and newly proposed residuals. Simulation studies were conducted to demonstrate the validity of the proposed estimation procedure. The mixture regression models are illustrated with alcohol abuse data from the Taiwan Aboriginal Study Project and hypertriglyceridemia data from the Cardiovascular Disease Risk Factor Two-township Study in Taiwan. Copyright © 2013 John Wiley & Sons, Ltd.
Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu; Yoshinari, Kouichi; Honda, Hiroshi
Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment. Copyright © 2017 Elsevier Inc. All rights reserved.
Nagy, Ivan; Suzdaleva, Evgenia
Roč. 26, č. 5 (2016), s. 417-437 ISSN 1210-0552 R&D Projects: GA ČR GA15-03564S Institutional support: RVO:67985556 Keywords : on-line modeling * on-line logistic regression * recursive mixture estimation * data dependent pointer Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.394, year: 2016 http://library.utia.cas.cz/separaty/2016/ZS/suzdaleva-0464463.pdf
Austin, Peter C; Steyerberg, Ewout W
The change in c-statistic is frequently used to summarize the change in predictive accuracy when a novel risk factor is added to an existing logistic regression model. We explored the relationship between the absolute change in the c-statistic, Brier score, generalized R(2) , and the discrimination slope when a risk factor was added to an existing model in an extensive set of Monte Carlo simulations. The increase in model accuracy due to the inclusion of a novel marker was proportional to both the prevalence of the marker and to the odds ratio relating the marker to the outcome but inversely proportional to the accuracy of the logistic regression model with the marker omitted. We observed greater improvements in model accuracy when the novel risk factor or marker was uncorrelated with the existing predictor variable compared with when the risk factor has a positive correlation with the existing predictor variable. We illustrated these findings by using a study on mortality prediction in patients hospitalized with heart failure. In conclusion, the increase in predictive accuracy by adding a marker should be considered in the context of the accuracy of the initial model. Copyright © 2012 John Wiley & Sons, Ltd.
Nahib, Irmadi; Suryanta, Jaka
Forest destruction, climate change and global warming could reduce an indirect forest benefit because forest is the largest carbon sink and it plays a very important role in global carbon cycle. To support Reducing Emissions from Deforestation and Forest Degradation (REDD +) program, people pay attention of forest cover changes as the basis for calculating carbon stock changes. This study try to explore the forest cover dynamics as well as the prediction model of forest cover in Indragiri Hulu Regency, Riau Province Indonesia. The study aims to analyse some various explanatory variables associated with forest conversion processes and predict forest cover change using logistic regression model (LRM). The main data used in this study is Land use/cover map (1990 - 2011). Performance of developed model was assessed through a comparison of the predicted model of forest cover change and the actual forest cover in 2011. The analysis result showed that forest cover has decreased continuously between 1990 and 2011, up to the loss of 165,284.82 ha (35.19 %) of forest area. The LRM successfully predicted the forest cover for the period 2010 with reasonably high accuracy (ROC = 92.97 % and 70.26 %).
Tsui, Jenkin; Dasylva, Abel; Chu, Kenneth
We propose an optimal estimating equation for logistic regression with linked data while accounting for false positives. It builds on a previous solution but estimates the regression coefficients with a smaller variance, in large samples.
Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…
Full Text Available Purpose: Application of multinomial logistic regression for diagnostic support of acute abdominal pain, a diagnostic problem with many differential diagnoses. Methods: The analysis is based on a prospective data base with 2280 patients with acute abdominal pain, characterized by 87 variables from history and clinical examination and 12 differential diagnoses. Associations between single variables from history and clinical examination and the final diagnoses were investigated with multinomial logistic regression. Results: Exemplarily, the results are presented for the variable rigidity. A statistical significant association was observed for generalized rigidity and the diagnoses appendicitis, bowel obstruction, pancreatitis, perforated ulcer, multiple and other diagnoses and for localized rigidity and appendicitis, diverticulitis, biliary disease and perforated ulcer. Diagnostic profiles were generated by summarizing the statistical significant associations. As an example the diagnostic profile of acute appendicitis is presented. Conclusions: Compared to alternative approaches (e.g. independent Bayes, loglinear model there are advantages for multinomial logistic regression to support complex differential diagnostic problems, provided potential traps are avoided (e.g. α-error, interpretation of odds ratio.
Baddeley, Adrian; Coeurjolly, Jean-François; Rubak, Ege
We propose a computationally efficient logistic regression estimating function for spatial Gibbs point processes. The sample points for the logistic regression consist of the observed point pattern together with a random pattern of dummy points. The estimating function is closely related...
Bihrmann, Kristine; Toft, Nils; Nielsen, Søren Saxmose
Standard logistic regression assumes that the outcome is measured perfectly. In practice, this is often not the case, which could lead to biased estimates if not accounted for. This study presents Bayesian logistic regression with adjustment for misclassification of the outcome applied to data...
Full Text Available Abstract – Human resources (HR is one of the success factors in the economic field, namely how to create a human resources (HR qualified and have the skills and highly competitive in the global competition. Educational level of the labor force that is still relatively low. The structure of education of the workforce is still dominated Indonesian basic education which is about 63.2%. The issue raised is to determine the probability of a program of study (whether or not to see some of the ratio of the number of graduates by the number of students per class, the amount of quota size class (large or small using logistic regression models. Data were obtained from a search result based on the amount of data the study program students and graduates in 2010 Data processing using SPSS. The results of the analysis by assessing model fit and the results will be given for each model fit. Starting with the hypothesis for assessing model fit, statistical -2LogL, Cox and Snell's R Square, Hosmer and Lemeshow's Goodness of Fit Test, and the classification table. The results of the analysis using SPSS as a tool aimed at measuring quality of graduate courses at a university, college, or academy, whether or not based on the ratio of the number of graduates and class quotas. Keywords: Quota Class, Probability, Logistic Regression Abstrak – Sumberdaya manusia (SDM adalah salah satu faktor kesuksesan dalam bidang ekonomi, yaitu bagaimana menciptakan sumber daya manusia (SDM yang berkualitas dan memiliki keterampilan serta berdaya saing tinggi dalam persaingan global. Tingkat pendidikan angkatan kerja yang ada masih relatif rendah. Struktur pendidikan angkatan kerja Indonesia masih didominasi pendidikan dasar yaitu sekitar 63,2%. Persoalan yang dikemukakan adalah menentukan probabilitas sebuah program studi (baik atau tidak dengan melihat beberapa rasio jumlah lulusan dengan jumlah mahasiswa per angkatan, ukuran besarnya kuota kelas (besar atau kecil menggunakan
Zuhdi, Shaifudin; Saputro, Dewi Retno Sari
GWOLR model used for represent relationship between dependent variable has categories and scale of category is ordinal with independent variable influenced the geographical location of the observation site. Parameters estimation of GWOLR model use maximum likelihood provide system of nonlinear equations and hard to be found the result in analytic resolution. By finishing it, it means determine the maximum completion, this thing associated with optimizing problem. The completion nonlinear system of equations optimize use numerical approximation, which one is Newton Raphson method. The purpose of this research is to make iteration algorithm Newton Raphson and program using R software to estimate GWOLR model. Based on the research obtained that program in R can be used to estimate the parameters of GWOLR model by forming a syntax program with command "while".
Snyder, Marcia; Freeman, Mary C.; Purucker, S. Thomas; Pringle, Catherine M.
Freshwater shrimps are an important biotic component of tropical ecosystems. However, they can have a low probability of detection when abundances are low. We sampled 3 of the most common freshwater shrimp species, Macrobrachium olfersii, Macrobrachium carcinus, and Macrobrachium heterochirus, and used occupancy modeling and logistic regression models to improve our limited knowledge of distribution of these cryptic species by investigating both local- and landscape-scale effects at La Selva Biological Station in Costa Rica. Local-scale factors included substrate type and stream size, and landscape-scale factors included presence or absence of regional groundwater inputs. Capture rates for 2 of the sampled species (M. olfersii and M. carcinus) were sufficient to compare the fit of occupancy models. Occupancy models did not converge for M. heterochirus, but M. heterochirus had high enough occupancy rates that logistic regression could be used to model the relationship between occupancy rates and predictors. The best-supported models for M. olfersii and M. carcinus included conductivity, discharge, and substrate parameters. Stream size was positively correlated with occupancy rates of all 3 species. High stream conductivity, which reflects the quantity of regional groundwater input into the stream, was positively correlated with M. olfersii occupancy rates. Boulder substrates increased occupancy rate of M. carcinus and decreased the detection probability of M. olfersii. Our models suggest that shrimp distribution is driven by factors that function at local (substrate and discharge) and landscape (conductivity) scales.
Alexandrescu, Roxana; Bottle, Alex; Jarman, Brian; Aylin, Paul
The use of hierarchical logistic regression for provider profiling has been recommended due to the clustering of patients within hospitals, but has some associated difficulties. We assess changes in hospital outlier status based on standard logistic versus hierarchical logistic modelling of mortality. The study population consisted of all patients admitted to acute, non-specialist hospitals in England between 2007 and 2011 with a primary diagnosis of acute myocardial infarction, acute cerebrovascular disease or fracture of neck of femur or a primary procedure of coronary artery bypass graft or repair of abdominal aortic aneurysm. We compared standardised mortality ratios (SMRs) from non-hierarchical models with SMRs from hierarchical models, without and with shrinkage estimates of the predicted probabilities (Model 1 and Model 2). The SMRs from standard logistic and hierarchical models were highly statistically significantly correlated (r > 0.91, p = 0.01). More outliers were recorded in the standard logistic regression than hierarchical modelling only when using shrinkage estimates (Model 2): 21 hospitals (out of a cumulative number of 565 pairs of hospitals under study) changed from a low outlier and 8 hospitals changed from a high outlier based on the logistic regression to a not-an-outlier based on shrinkage estimates. Both standard logistic and hierarchical modelling have identified nearly the same hospitals as mortality outliers. The choice of methodological approach should, however, also consider whether the modelling aim is judgment or improvement, as shrinkage may be more appropriate for the former than the latter.
Full Text Available The aim of this study was to compare multilayer perceptron neural networks (NNs with standard logistic regression (LR to identify key covariates impacting on mortality from cancer causes, disease-free survival (DFS, and disease recurrence using Area Under Receiver-Operating Characteristics (AUROC in breast cancer patients. From 1996 to 2004, 2,535 patients diagnosed with primary breast cancer entered into the study at a single French centre, where they received standard treatment. For specific mortality as well as DFS analysis, the ROC curves were greater with the NN models compared to LR model with better sensitivity and specificity. Four predictive factors were retained by both approaches for mortality: clinical size stage, Scarff Bloom Richardson grade, number of invaded nodes, and progesterone receptor. The results enhanced the relevance of the use of NN models in predictive analysis in oncology, which appeared to be more accurate in prediction in this French breast cancer cohort.
Guanche, Yanira; Mínguez, Roberto; Méndez, Fernando J.
The study of atmospheric patterns, weather types or circulation patterns, is a topic deeply studied by climatologists, and it is widely accepted to disaggregate the atmospheric conditions over regions in a certain number of representative states. This consensus allows simplifying the study of climate conditions to improve weather predictions and a better knowledge of the influence produced by anthropogenic activities on the climate system. Once the atmospheric conditions have been reduced to a catalogue of representative states, it is desirable to dispose of numerical models to improve our understanding about weather dynamics, i.e. i) to analyze climate change studying trends in the probability of occurrence of weather types, ii) to study seasonality and iii) to analyze the possible influence of previous states (Autoregressive terms or Markov Chains). This work introduces the mathematical framework to analyze those effects from a qualitative point of view. In particular, an autoregressive logistic regression model, which has been successfully applied in medical and pharmacological research fields, is presented. The main advantages of autoregressive logistic regression are that i) it can be used to model polytomous outcome variables, such as circulation types, and ii) standard statistical software can be used for fitting purposes. To show the potential of these kind of models for analyzing atmospheric conditions, a case of study located in the Northeastern Atlantic is described. Results obtained show how the model is capable of dealing simultaneously with predictors related to different time scales, which can be used to simulate the behaviour of circulation patterns.
Romo, David Ricardo
Foreign Object Debris/Damage (FOD) has been an issue for military and commercial aircraft manufacturers since the early ages of aviation and aerospace. Currently, aerospace is growing rapidly and the chances of FOD presence are growing as well. One of the principal causes in manufacturing is the human error. The cost associated with human error in commercial and military aircrafts is approximately accountable for 4 billion dollars per year. This problem is currently addressed with prevention programs, elimination techniques, and designation of FOD areas, controlled access, restrictions of personal items entering designated areas, tool accountability, and the use of technology such as Radio Frequency Identification (RFID) tags, etc. All of the efforts mentioned before, have not show a significant occurrence reduction in terms of manufacturing processes. On the contrary, a repetitive path of occurrence is present, and the cost associated has not declined in a significant manner. In order to address the problem, this thesis proposes a new approach using statistical analysis. The effort of this thesis is to create a predictive model using historical categorical data from an aircraft manufacturer only focusing in human error causes. The use of contingency tables, natural logarithm of the odds and probability transformation is used in order to provide the predicted probabilities of each aircraft. A case of study is shown in this thesis in order to show the applied methodology. As a result, this approach is able to predict the possible outcomes of FOD by the workstation/area needed, and monthly predictions per workstation. This thesis is intended to be the starting point of statistical data analysis regarding FOD in human factors. The purpose of this thesis is to identify the areas where human error is the primary cause of FOD occurrence in order to design and implement accurate solutions. The advantages of the proposed methodology can go from the reduction of cost
Lachin, John M
The conditional logistic regression model (Biometrics 1982; 38:661-672) provides a convenient method for the assessment of qualitative or quantitative covariate effects on risk in a study with matched sets, each containing a possibly different number of cases and controls. The conditional logistic likelihood is identical to the stratified Cox proportional hazards model likelihood, with an adjustment for ties (J. R. Stat. Soc. B 1972; 34:187-220). This likelihood also applies to a nested case-control study with multiply matched cases and controls, selected from those at risk at selected event times. Herein the distribution of the score test for the effect of a covariate in the model is used to derive simple equations to describe the power of the test to detect a coefficient theta (log odds ratio or log hazard ratio) or the number of cases (or matched sets) and controls required to provide a desired level of power. Additional expressions are derived for a quantitative covariate as a function of the difference in the assumed mean covariate values among cases and controls and for a qualitative covariate in terms of the difference in the probabilities of exposure for cases and controls. Examples are presented for a nested case-control study and a multiply matched case-control study.
Weiss, Brandi A.; Dardick, William
This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify…
Flantua, Suzette G.A.; Boxel, John H. van; Hooghiemstra, Henry; Smaalen, John van [University of Amsterdam, Faculty of Science, Institute for Biodiversity and Ecosystem Dynamics, Amsterdam (Netherlands)
Climate changes affect the abundance, geographic extent, and floral composition of vegetation, which are reflected in the pollen rain. Sediment cores taken from lakes and peat bogs can be analysed for their pollen content. The fossil pollen records provide information on the temporal changes in climate and palaeo-environments. Although the complexity of the variables influencing vegetation distribution requires a multi-dimensional approach, only a few research projects have used GIS to analyse pollen data. This paper presents a new approach to palynological data analysis by combining GIS and spatial modelling. Eastern Colombia was chosen as a study area owing to the migration of the forest-savanna boundary since the last glacial maximum, and the availability of pollen records. Logistic regression has been used to identify the climatic variables that determine the distribution of savanna and forest in eastern Colombia. These variables were used to create a predictive land-cover model, which was subsequently implemented into a GIS to perform spatial analysis on the results. The palynological data from the study area were incorporated into the GIS. Reconstructed maps of past vegetation distribution by interpolation showed a new approach of regional multi-site data synthesis related to climatic parameters. The logistic regression model resulted in a map with 85.7% predictive accuracy, which is considered useful for the reconstruction of future and past land-cover distributions. The suitability of palynological GIS application depends on the number of pollen sites, the distribution of the pollen sites over the area of interest, and the degree of overlap of the age ranges of the pollen records. (orig.)
Passmore, David L.; Mohamed, Dominic A.
Describes the workings of a simple two-way table of employment status by sex and extends this table to include school enrollment status by sex, race, and high school graduation status using logistic regression techniques. (JOW)
Full Text Available Jaroslava WendlováDerer’s University Hospital and Policlinic, Osteological Unit, Bratislava, SlovakiaAbstract: The latest methods in estimating the probability (absolute risk of osteoporotic fractures include several logistic regression models, based on qualitative risk factors plus bone mineral density (BMD, and the probability estimate of fracture in the future. The Slovak logistic regression model, in contrast to other models, is created from quantitative variables of the proximal femur (in International System of Units and estimates the probability of fracture by fall.Objectives: The first objective of this study was to order selected independent variables according to the intensity of their influence (statistical significance upon the occurrence of values of the dependent variable: femur strength index (FSI. The second objective was to determine, using logistic regression, whether the odds of FSI acquiring a pathological value (femoral neck fracture by fall increased or declined if the value of the variables (T–score total hip, BMI, alpha angle, theta angle and HAL were raised by one unit.Patients and methods: Bone densitometer measurements using dual energy X–ray absorptiometry (DXA, (Prodigy, Primo, GE, USA of the left proximal femur were obtained from 3 216 East Slovak women with primary or secondary osteoporosis or osteopenia, aged 20–89 years (mean age 58.9; 95% CI: −58.42; 59.38. The following variables were measured: FSI, T-score total hip BMD, body mass index (BMI, as were the geometrical variables of proximal femur alpha angle (α angle, theta angle (θ angle, and hip axis length (HAL.Statistical analysis: Logistic regression was used to measure the influence of the independent variables (T-score total hip, alpha angle, theta angle, HAL, BMI upon the dependent variable (FSI.Results: The order of independent variables according to the intensity of their influence (greatest to least upon the occurrence of values of the
MacKinnon, DP; Lockwood, CM; Brown, CH; Wang, W; Hoffman, JM
Background An intermediate endpoint is hypothesized to be in the middle of the causal sequence relating an independent variable to a dependent variable. The intermediate variable is also called a surrogate or mediating variable and the corresponding effect is called the mediated, surrogate endpoint, or intermediate endpoint effect. Clinical studies are often designed to change an intermediate or surrogate endpoint and through this intermediate change influence the ultimate endpoint. In many intermediate endpoint clinical studies the dependent variable is binary, and logistic or probit regression is used. Purpose The purpose of this study is to describe a limitation of a widely used approach to assessing intermediate endpoint effects and to propose an alternative method, based on products of coefficients, that yields more accurate results. Methods The intermediate endpoint model for a binary outcome is described for a true binary outcome and for a dichotomization of a latent continuous outcome. Plots of true values and a simulation study are used to evaluate the different methods. Results Distorted estimates of the intermediate endpoint effect and incorrect conclusions can result from the application of widely used methods to assess the intermediate endpoint effect. The same problem occurs for the proportion of an effect explained by an intermediate endpoint, which has been suggested as a useful measure for identifying intermediate endpoints. A solution to this problem is given based on the relationship between latent variable modeling and logistic or probit regression. Limitations More complicated intermediate variable models are not addressed in the study, although the methods described in the article can be extended to these more complicated models. Conclusions Researchers are encouraged to use an intermediate endpoint method based on the product of regression coefficients. A common method based on difference in coefficient methods can lead to distorted
Hernandez, David J; Han, Misop; Humphreys, Elizabeth B; Mangold, Leslie A; Taneja, Samir S; Childs, Stacy J; Bartsch, Georg; Partin, Alan W
To develop a logistic regression-based model to predict prostate cancer biopsy at, and compare its performance to the risk calculator developed by the Prostate Cancer Prevention Trial (PCPT), which was based on age, race, prostate-specific antigen (PSA) level, a digital rectal examination (DRE), family history, and history of a previous negative biopsy, and to PSA level alone. We retrospectively analysed the data of 1280 men who had a biopsy while enrolled in a prospective, multicentre clinical trial. Of these, 1108 had all relevant clinical and pathological data available, and no previous diagnosis of prostate cancer. Using the PCPT risk calculator, we calculated the risks of prostate cancer and of high-grade disease (Gleason score > or =7) for each man. Receiver operating characteristic (ROC) curves for the risk calculator, PSA level and the novel regression-based model were compared. Prostate cancer was detected in 394 (35.6%) men, and 155 (14.0%) had Gleason > or =7 disease. For cancer prediction, the area under the ROC curve (AUC) for the risk calculator was 66.7%, statistically greater than the AUC for PSA level of 61.9% (P calculator and PSA level, respectively (P = 0.024). The AUCs increased to 71.2% (P calculator modestly improves the performance of PSA level alone in predicting an individual's risk of prostate cancer or high-grade disease on biopsy. This predictive tool might be enhanced by including percentage free PSA and the number of biopsy cores.
Iyama, Yuji [Kumamoto Chuo Hospital, Department of Diagnostic Radiology, Kumamoto, Kumamoto (Japan); Kumamoto University, Department of Diagnostic Radiology, Graduate School of Medical Sciences, Kumamoto, Kumamoto (Japan); Nakaura, Takeshi; Nagayama, Yasunori; Utsunomiya, Daisuke; Yamashita, Yasuyuki [Kumamoto University, Department of Diagnostic Radiology, Graduate School of Medical Sciences, Kumamoto, Kumamoto (Japan); Katahira, Kazuhiro; Oda, Seitaro [Kumamoto Chuo Hospital, Department of Diagnostic Radiology, Kumamoto, Kumamoto (Japan); Iyama, Ayumi [National Hospital Organization Kumamoto Medical Center, Department of Diagnostic Radiology, Kumamoto, Kumamoto (Japan)
To develop a prediction model to distinguish between transition zone (TZ) cancers and benign prostatic hyperplasia (BPH) on multi-parametric prostate magnetic resonance imaging (mp-MRI). This retrospective study enrolled 60 patients with either BPH or TZ cancer, who had undergone 3 T-MRI. We generated ten parameters for T2-weighted images (T2WI), diffusion-weighted images (DWI) and dynamic MRI. Using a t-test and multivariate logistic regression (LR) analysis to evaluate the parameters' accuracy, we developed LR models. We calculated the area under the receiver operating characteristic curve (ROC) of LR models by a leave-one-out cross-validation procedure, and the LR model's performance was compared with radiologists' performance with their opinion and with the Prostate Imaging Reporting and Data System (Pi-RADS v2) score. Multivariate LR analysis showed that only standardized T2WI signal and mean apparent diffusion coefficient (ADC) maintained their independent values (P < 0.001). The validation analysis showed that the AUC of the final LR model was comparable to that of board-certified radiologists, and superior to that of Pi-RADS scores. A standardized T2WI and mean ADC were independent factors for distinguishing between BPH and TZ cancer. The performance of the LR model was comparable to that of experienced radiologists. (orig.)
Mumcu Kucuker, Derya; Baskent, Emin Zeki
Integration of non-wood forest products (NWFPs) into forest management planning has become an increasingly important issue in forestry over the last decade. Among NWFPs, mushrooms are valued due to their medicinal, commercial, high nutritional and recreational importance. Commercial mushroom harvesting also provides important income to local dwellers and contributes to the economic value of regional forests. Sustainable management of these products at the regional scale requires information on their locations in diverse forest settings and the ability to predict and map their spatial distributions over the landscape. This study focuses on modeling the spatial distribution of commercially harvested Lactarius deliciosus and L. salmonicolor mushrooms in the Kızılcasu Forest Planning Unit, Turkey. The best models were developed based on topographic, climatic and stand characteristics, separately through logistic regression analysis using SPSS™. The best topographic model provided better classification success (69.3 %) than the best climatic (65.4 %) and stand (65 %) models. However, the overall best model, with 73 % overall classification success, used a mix of several variables. The best models were integrated into an Arc/Info GIS program to create spatial distribution maps of L. deliciosus and L. salmonicolor in the planning area. Our approach may be useful to predict the occurrence and distribution of other NWFPs and provide a valuable tool for designing silvicultural prescriptions and preparing multiple-use forest management plans.
Springer, David B; Tarassenko, Lionel; Clifford, Gari D
The identification of the exact positions of the first and second heart sounds within a phonocardiogram (PCG), or heart sound segmentation, is an essential step in the automatic analysis of heart sound recordings, allowing for the classification of pathological events. While threshold-based segmentation methods have shown modest success, probabilistic models, such as hidden Markov models, have recently been shown to surpass the capabilities of previous methods. Segmentation performance is further improved when a priori information about the expected duration of the states is incorporated into the model, such as in a hidden semi-Markov model (HSMM). This paper addresses the problem of the accurate segmentation of the first and second heart sound within noisy real-world PCG recordings using an HSMM, extended with the use of logistic regression for emission probability estimation. In addition, we implement a modified Viterbi algorithm for decoding the most likely sequence of states, and evaluated this method on a large dataset of 10,172 s of PCG recorded from 112 patients (including 12,181 first and 11,627 second heart sounds). The proposed method achieved an average F1 score of 95.63 ± 0.85%, while the current state of the art achieved 86.28 ± 1.55% when evaluated on unseen test recordings. The greater discrimination between states afforded using logistic regression as opposed to the previous Gaussian distribution-based emission probability estimation as well as the use of an extended Viterbi algorithm allows this method to significantly outperform the current state-of-the-art method based on a two-sided paired t-test.
Full Text Available In this paper factors affecting the types of domestic violence against women was determined by multinomial logistic regression model. In this context we used the data of Research on Domestic Violence against Women in Turkey that was applied by Turkish Statistamp305cal Institute in 2008. In the study the variable of the types of domestic violence against women was used as dependent variable that has four levels. In addition twelve independent variables were used removing irrelevant variables from the data set via chi-square test of independence. After that the maximum likelihood estimates and the odds ratios of the variables of the model were obtained. Besides the validity of the model was tested by likelihood ratio test. At last comparisons were made for three categories depending on the odds ratio according to the selected reference category. In terms of odds ratios the variables of education level of woman and husbands work sector were statistically significant in only comparison one the variables of agnation with husband education level of husband frequency of seeing drunk husband and frequency of gambling of husband were statistically significant in both comparison one and three the variables of region deceived by husband common-law female for husband were statistically significant in all comparisons.
Coffey Christopher S
Full Text Available Abstract Background To examine interactions among the angiotensin converting enzyme (ACE insertion/deletion, plasminogen activator inhibitor-1 (PAI-1 4G/5G, and tissue plasminogen activator (t-PA insertion/deletion gene polymorphisms on risk of myocardial infarction using data from 343 matched case-control pairs from the Physicians Health Study. We examined the data using both conditional logistic regression and the multifactor dimensionality reduction (MDR method. One advantage of the MDR method is that it provides an internal prediction error for validation. We summarize our use of this internal prediction error for model validation. Results The overall results for the two methods were consistent, with both suggesting an interaction between the ACE I/D and PAI-1 4G/5G polymorphisms. However, using ten-fold cross validation, the 46% prediction error for the final MDR model was not significantly lower than that expected by chance. Conclusions The significant interaction initially observed does not validate and may represent a type I error. As data-driven analytic methods continue to be developed and used to examine complex genetic interactions, it will become increasingly important to stress model validation in order to ensure that significant effects represent true relationships rather than chance findings.
Coffey, Christopher S; Hebert, Patricia R; Ritchie, Marylyn D; Krumholz, Harlan M; Gaziano, J Michael; Ridker, Paul M; Brown, Nancy J; Vaughan, Douglas E; Moore, Jason H
To examine interactions among the angiotensin converting enzyme (ACE) insertion/deletion, plasminogen activator inhibitor-1 (PAI-1) 4G/5G, and tissue plasminogen activator (t-PA) insertion/deletion gene polymorphisms on risk of myocardial infarction using data from 343 matched case-control pairs from the Physicians Health Study. We examined the data using both conditional logistic regression and the multifactor dimensionality reduction (MDR) method. One advantage of the MDR method is that it provides an internal prediction error for validation. We summarize our use of this internal prediction error for model validation. The overall results for the two methods were consistent, with both suggesting an interaction between the ACE I/D and PAI-1 4G/5G polymorphisms. However, using ten-fold cross validation, the 46% prediction error for the final MDR model was not significantly lower than that expected by chance. The significant interaction initially observed does not validate and may represent a type I error. As data-driven analytic methods continue to be developed and used to examine complex genetic interactions, it will become increasingly important to stress model validation in order to ensure that significant effects represent true relationships rather than chance findings.
The construction site and its elements create circumstances that are conducive to the formation of risks to safety during the execution of works. Analysis indicates the critical importance of these factors in the set of characteristics that describe the causes of accidents in the construction industry. This article attempts to analyse the characteristics related to the construction site, in order to indicate their importance in defining the circumstances of accidents at work. The study includes sites inspected in 2014 - 2016 by the employees of the District Labour Inspectorate in Krakow (Poland). The analysed set of detailed (disaggregated) data includes both quantitative and qualitative characteristics. The substantive task focused on classification modelling in the identification of hazards in construction and identifying those of the analysed characteristics that are important in an accident. In terms of methodology, resource data analysis using statistical classifiers, in the form of logistic regression, was the method used.
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
Huang, Francis L.; Moon, Tonya R.
The purpose of this Methodological Brief is to present a brief primer on logistic regression, a commonly used technique when modeling dichotomous outcomes. Using data from the National Education Longitudinal Study of 1988 (NELS:88), logistic regression techniques were used to investigate student-level variables in eighth grade (i.e., enrolled in a…
Guanche, Y.; Mínguez, R.; Méndez, F. J.
Autoregressive logistic regression models have been successfully applied in medical and pharmacology research fields, and in simple models to analyze weather types. The main purpose of this paper is to introduce a general framework to study atmospheric circulation patterns capable of dealing simultaneously with: seasonality, interannual variability, long-term trends, and autocorrelation of different orders. To show its effectiveness on modeling performance, daily atmospheric circulation patterns identified from observed sea level pressure fields over the Northeastern Atlantic, have been analyzed using this framework. Model predictions are compared with probabilities from the historical database, showing very good fitting diagnostics. In addition, the fitted model is used to simulate the evolution over time of atmospheric circulation patterns using Monte Carlo method. Simulation results are statistically consistent with respect to the historical sequence in terms of (1) probability of occurrence of the different weather types, (2) transition probabilities and (3) persistence. The proposed model constitutes an easy-to-use and powerful tool for a better understanding of the climate system.
Didarloo, Alireza; Nabilou, Bahram; Khalkhali, Hamid Reza
Breast cancer is a life-threatening condition affecting women around the world. The early detection of breast lumps using a breast self-examination (BSE) is important for the prevention and control of this disease. The aim of this study was to examine BSE behavior and its predictive factors among female university students using the Health Belief Model (HBM). This investigation was a cross-sectional survey carried out with 334 female students at Urmia University of Medical Sciences in the northwest of Iran. To collect the necessary data, researchers applied a valid and reliable three-part questionnaire. The data were analyzed using descriptive statistics and a chi-square test, in addition to multivariate logistic regression statistics in SPSS software version 16.0 (SPSS Inc., Chicago, IL, USA). The results indicated that 82 of the 334 participants (24.6%) reported practicing BSEs. Multivariate logistic regression analyses showed that high perceived severity [OR = 2.38, 95% CI = (1.02-5.54)], high perceived benefits [OR = 1.94, 95% CI = (1.09-3.46)], and high perceived self-efficacy [OR = 13.15, 95% CI = (3.64-47.51)] were better predictors of BSE behavior (P < 0.05) than low perceived severity, benefits, and self-efficacy. The findings also showed that a high level of knowledge compared to a low level of knowledge [OR = 5.51, 95% CI = (1.79-16.86)] and academic undergraduate and graduate degrees compared to doctoral degrees [OR = 2.90, 95% CI = (1.42-5.92)] of the participants were predictors of BSE performance (P < 0.05). The study revealed that the HBM constructs are able to predict BSE behavior. Among these constructs, self-efficacy was the most important predictor of the behavior. Interventions based on the constructs of perceived self-efficacy, benefits, and severity are recommended for increasing women's regular screening for breast cancer.
Coffey, Christopher S; Hebert, Patricia R; Ritchie, Marylyn D; Krumholz, Harlan M; Gaziano, J Michael; Ridker, Paul M; Brown, Nancy J; Vaughan, Douglas E; Moore, Jason H
Abstract Background To examine interactions among the angiotensin converting enzyme (ACE) insertion/deletion, plasminogen activator inhibitor-1 (PAI-1) 4G/5G, and tissue plasminogen activator (t-PA) insertion/deletion gene polymorphisms on risk of myocardial infarction using data from 343 matched case-control pairs from the Physicians Health Study. We examined the data using both conditional logistic regression and the multifactor dimensionality reduction (MDR) method. One advantage of the MD...
Workie, Demeke Lakew; Zike, Dereje Tesfaye; Fenta, Haile Mekonnen; Mekonnen, Mulusew Admasu
Unintended pregnancy related to unmet need is a worldwide problem that affects societies. The main objective of this study was to identify the prevalence and determinants of unmet need for family planning among women aged (15-49) in Ethiopia. The Performance Monitoring and Accountability2020/Ethiopia was conducted in April 2016 at round-4 from 7494 women with two-stage-stratified sampling. Bi-variable and multi-variable binary logistic regression model with complex sampling design was fitted. The prevalence of unmet-need for family planning was 16.2% in Ethiopia. Women between the age range of 15-24 years were 2.266 times more likely to have unmet need family planning compared to above 35 years. Women who were currently married were about 8 times more likely to have unmet need family planning compared to never married women. Women who had no under-five child were 0.125 times less likely to have unmet need family planning compared to those who had more than two-under-5. The key determinants of unmet need family planning in Ethiopia were residence, age, marital-status, education, household members, birth-events and number of under-5 children. Thus the Government of Ethiopia would take immediate steps to address the causes of high unmet need for family planning among women.
Full Text Available Deforestation is one of the major effects posed by the smallholder tobacco farming as the farmers heavily depend on firewood sourced from natural forest for curing tobacco. The research aims at assessing the factors that influence the harvesting of natural forest in the production of tobacco. Data is collected through the structured questionnaire from 60 randomly selected farmers. Binary logistic regression model is used to explain the significance of factors influencing natural forest harvesting. Results show that farmer experience, tobacco selling price, and agricultural training level negatively affect the harvesting of natural forests (to obtain firewood for curing tobacco significantly (p0.05 in influencing natural forest harvesting. Though farmers are exploiting the environment and at the same time increasing foreign currency earning through tobacco production, there is therefore a need to put in place policies that encourage sustainable forest product utilization such as gum plantations, subsidizing price of coal, and introducing fees, as well as penalties or taxes to the offenders.
Morgul, Mehmet Haluk; Klunk, Sergej; Anastasiadou, Zografia; Gauger, Ulrich; Dietel, Corinna; Reutzel-Selke, Anja; Felgendref, Philipp; Hau, Hans-Michael; Tautenhahn, Hans-Michael; Schmuck, Rosa Bianca; Raschzok, Nathanael; Sauer, Igor Maximillian; Bartels, Michael
The presence of hepatocellular carcinoma (HCC) is a significant complication of cirrhosis because it changes the prognosis and the treatment of the patients. By now, contrast-enhanced CT and MR scans are the most reliable tools for the diagnosis of HCC; however, in some cases, a biopsy of the tumor is necessary for the final diagnosis. The aim of the study was to develop a diagnostic tool using the microRNA (miRNA) profiles of the tissue surrounding the HCC tumor combined with clinical parameters in statistical models. At a transplantation setting, 32 patients with HCC and cirrhosis (B) were compared to 22 patients suffering from cirrhosis only (A). The diagnosis and exclusion of HCC was confirmed following the histopathological examination of the explanted liver. The HCC patients were significantly older than the patients with cirrhosis only (B: 60.6 and A: 49.9, pHCC and cirrhosis from those with cirrhosis only with an accuracy of 96.3%. This is the first report about the use of stepwise penalized logistic regression and decision tree analyses of miRNA expressions in the tumor-surrounding tissue combined with clinical parameters for the diagnosis of HCC. Copyright © 2016 Elsevier Inc. All rights reserved.
French, Brian F.; Finch, W. Holmes
The purpose of this study was to examine the performance of differential item functioning (DIF) assessment in the presence of a multilevel structure that often underlies data from large-scale testing programs. Analyses were conducted using logistic regression (LR), a popular, flexible, and effective tool for DIF detection. Data were simulated…
cancer. Bayesian logistic regression in PROC GENMOD in SAS statistical software, ver. 9.4 was used to detect gene– steroid interactions influencing cancer. Single marker analysis using PLINK identified 12 SNPs associated with cancer. (P < 0.05); especially, SNP rs6532496 revealed the strongest association with cancer ...
Uzunova, Yordanka; Prodanova, Krasimira; Spasov, Lyubomir
Using a two-factor logistic regression analysis an estimate is derived for the probability of absence of infections in the early postoperative period after pediatric liver transplantation. The influence of both the bilirubin level and the international normalized ratio of prothrombin time of blood coagulation at the 5th postoperative day is studied.
Zhang, Xiu-mei; Shao, Yu-hong; Xiong, Xia; Wan, Yuan-lian
To create a breast nodule estimation model based on grayscale and color Doppler ultrasonography using Logistic regression that can screen out the specific features for distinguishing breast malignancy from benignancy. From July, 2009 to May, 2010, 217 patients were enrolled in the study in peking university first hospital. Clinical data and ultrasonic features were evaluated in 219 breast nodules of 217 patients confirmed by surgical pathology. Logistic regression model was established to screen out significant ultrasonic indexes for differentiating breast malignancy from benignancy. A receiver operating characteristics curve was made to assess diagnostic value of the Logistic regression model. Correlation was analyzed between the Logistic regression model and surgical pathology. Logistic regression model: Logit(p) = -16.884 + 0.037 × age + 3.228 × longitudinal-transverse axis ratio + 1.412 × border + 2.663 × halo + 1.813 × microcalcium + 1.157 × resistance index + 2.204 × enlarged axillary lymph node (χ(2) = 167.107, P = 000). The areas of ROC curve for probability and identification of breast malignant and benign nodule were 0.948 and 0.882 respectively. Diagnostic sensitivity, specificity and accuracy were 91.6%, 84.9% and 88.9%. Logistic regression model positively correlated with surgical pathology (r = 0.768, P = 0.000). Our Logistic regression model can effectively differentiate malignant breast nodules from benign and can identify the ultrasonic features associated with breast cancer.
Bergtold, Jason S.; Yeager, Elizabeth A.; Featherstone, Allen M.
The logistic regression models has been widely used in the social and natural sciences and results from studies using this model can have significant impact. Thus, confidence in the reliability of inferences drawn from these models is essential. The robustness of such inferences is dependent on sample size. The purpose of this study is to examine the impact of sample size on the mean estimated bias and efficiency of parameter estimation and inference for the logistic regression model. A numbe...
Dixit, A.; Singh, V. K.
Recent studies conducted by World Health Organisation (WHO) estimated that 92 % of the total world population are living in places where the air quality level has exceeded the WHO standard limit for air quality. This is due to the change in Land Use Land Cover (LULC) pattern, socio economic drivers and anthropogenic heat emission caused by manmade activity. Thereby, many prevalent human respiratory diseases such as lung cancer, chronic obstructive pulmonary disease and emphysema have increased in recent times. In this study, a quantitative relationship is developed between land use (built-up land, water bodies, and vegetation), socio economic drivers and air quality parameters using logistic based regression model over 7 different cities of India for the winter season of 2012 to 2016. Different LULC, socio economic, industrial emission sources, meteorological condition and air quality level from the monitoring stations are taken to estimate the influence on morbidity of each city. Results of correlation are analyzed between land use variables and monthly concentration of pollutants. These values range from 0.63 to 0.76. Similarly, the correlation value between land use variable with socio economic and morbidity ranges from 0.57 to 0.73. The performance of model is improved from 67 % to 79 % in estimating morbidity for the year 2015 and 2016 due to the better availability of observed data.The study highlights the growing importance of incorporating socio-economic drivers with air quality data for evaluating morbidity rate for each city in comparison to just change in quantitative analysis of air quality.
Duda, David P.; Minnis, Patrick
Previous studies have shown that probabilistic forecasting may be a useful method for predicting persistent contrail formation. A probabilistic forecast to accurately predict contrail formation over the contiguous United States (CONUS) is created by using meteorological data based on hourly meteorological analyses from the Advanced Regional Prediction System (ARPS) and from the Rapid Update Cycle (RUC) as well as GOES water vapor channel measurements, combined with surface and satellite observations of contrails. Two groups of logistic models were created. The first group of models (SURFACE models) is based on surface-based contrail observations supplemented with satellite observations of contrail occurrence. The second group of models (OUTBREAK models) is derived from a selected subgroup of satellite-based observations of widespread persistent contrails. The mean accuracies for both the SURFACE and OUTBREAK models typically exceeded 75 percent when based on the RUC or ARPS analysis data, but decreased when the logistic models were derived from ARPS forecast data.
Full Text Available Abstract Background In genome-wide association studies, it is widely accepted that multilocus methods are more powerful than testing single-nucleotide polymorphisms (SNPs one at a time. Among statistical approaches considering many predictors simultaneously, scan statistics are an effective tool for detecting susceptibility genomic regions and mapping disease genes. In this study, inspired by the idea of scan statistics, we propose a novel sliding window-based method for identifying a parsimonious subset of contiguous SNPs that best predict disease status. Results Within each sliding window, we apply a forward model selection procedure using generalized ridge logistic regression for model fitness in each step. In power simulations, we compare the performance of our method with that of five other methods in current use. Averaging power over all the conditions considered, our method dominates the others. We also present two published datasets where our method is useful in causal SNP identification. Conclusions Our method can automatically combine genetic information in local genomic regions and allow for linkage disequilibrium between SNPs. It can overcome some defects of the scan statistics approach and will be very promising in genome-wide case-control association studies.
Liu, Zhe; Shen, Yuanyuan; Ott, Jurg
In genome-wide association studies, it is widely accepted that multilocus methods are more powerful than testing single-nucleotide polymorphisms (SNPs) one at a time. Among statistical approaches considering many predictors simultaneously, scan statistics are an effective tool for detecting susceptibility genomic regions and mapping disease genes. In this study, inspired by the idea of scan statistics, we propose a novel sliding window-based method for identifying a parsimonious subset of contiguous SNPs that best predict disease status. Within each sliding window, we apply a forward model selection procedure using generalized ridge logistic regression for model fitness in each step. In power simulations, we compare the performance of our method with that of five other methods in current use. Averaging power over all the conditions considered, our method dominates the others. We also present two published datasets where our method is useful in causal SNP identification. Our method can automatically combine genetic information in local genomic regions and allow for linkage disequilibrium between SNPs. It can overcome some defects of the scan statistics approach and will be very promising in genome-wide case-control association studies.
Diomede, Tommaso; Marsigli, Chiara; Stefania Tesini, Maria
A method based on logistic regression is proposed for the prediction of river level threshold exceedance at different lead times (from +6h up to +42h). The aim of the study is to provide a valuable tool for the issue of warnings by the authority responsible of public safety in case of flood. The role of different precipitation periods as predictors for the exceedance of a fixed river level has been investigated, in order to derive significant information for flood forecasting. Based on catchment-averaged values, a separation of "antecedent" and "peak-triggering" rainfall amounts as independent variables is attempted. In particular, the following flood-related precipitation periods have been considered: (i) the period from 1 to n days before the forecast issue time, which may be relevant for the soil saturation ("state of the catchment"), (ii) the last 24 hours, which may be relevant for the current water level in the river ("state of the river"), and (iii) the period from 0 to x hours in advance with respect to the forecast issue time, when the flood-triggering precipitation generally occurs ("state of the atmosphere"). Several combinations and values of these predictors have been tested to optimise the method implementation. In particular, the period for the precursor antecedent precipitation ranges between 5 and 45 days; the current "state of the river" can be represented by the last 24-h precipitation or, as alternative, by the current river level. The flood-triggering precipitation has been cumulated over the next 18-42 hours, or the previous 6-12h, according to the forecast lead time. The proposed approach requires a specific implementation of logistic regression for each river section and warning threshold. The method performance has been evaluated over several catchments in the Emilia-Romagna Region, northern Italy, which dimensions range from 100 to 1000 km2. A statistical analysis in terms of false alarms, misses and related scores was carried out by using
Full Text Available The aim of this paper is estimation of Binary logistic regression parameters for maximizing the log-likelihood function with improved association indicators. In this paper the parameter estimation steps have been explained and then measures of association have been introduced and their calculations have been analyzed. Moreover a new related indicators based on membership degree level have been expressed. Indeed association measures demonstrate the number of success responses occurred in front of failure in certain number of Bernoulli independent experiments. In parameter estimation, existing indicators values is not sensitive to the parameter values, whereas the proposed indicators are sensitive to the estimated parameters during the iterative procedure. Therefore, proposing a new association indicator of binary logistic regression with more sensitivity to the estimated parameters in maximizing the log- likelihood in iterative procedure is innovation of this study.
Albert, Paul S; Liu, Aiyi; Nansel, Tonja
Motivated by actual study designs, this article considers efficient logistic regression designs where the population is identified with a binary test that is subject to diagnostic error. We consider the case where the imperfect test is obtained on all participants, while the gold standard test is measured on a small chosen subsample. Under maximum-likelihood estimation, we evaluate the optimal design in terms of sample selection as well as verification. We show that there may be substantial efficiency gains by choosing a small percentage of individuals who test negative on the imperfect test for inclusion in the sample (e.g., verifying 90% test-positive cases). We also show that a two-stage design may be a good practical alternative to a fixed design in some situations. Under optimal and nearly optimal designs, we compare maximum-likelihood and semi-parametric efficient estimators under correct and misspecified models with simulations. The methodology is illustrated with an analysis from a diabetes behavioral intervention trial. © 2013, The International Biometric Society.
Phillips, J.; Cripps, E.; Lau, John W.; Hodkiewicz, M. R.
The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding and hence some trust in the classification scheme for those who use the analysis to initiate maintenance tasks. Typically "black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) can be difficult to provide ease of interpretability. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight to the drivers of the human classification process and to the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on predictive performance of logistic regression, ANN and SVM. A real world oil analysis data set from engines on mining trucks is presented and using cross-validation we demonstrate that logistic regression out-performs the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.
Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E
Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. © 2013, The International Biometric
Kiezun, Adam; Lee, I-Ting Angelina; Shomron, Noam
Logistic regression is often used to help make medical decisions with binary outcomes. Here we evaluate the use of several methods for selection of variables in logistic regression. We use a large dataset to predict the diagnosis of myocardial infarction in patients reporting to an emergency room with chest pain. Our results indicate that some of the examined methods are well suited for variable selection in logistic regression and that our model, and our myocardial infarction risk calculator, can be an additional tool to aid physicians in myocardial infarction diagnosis.
Colorectal cancer is common and lethal disease with different incidence rate in different parts of the world which is taken into account as the third cause of cancer-related deaths. In the present study, using non-parametric Cox model and parametric Log-logistic model, factors influencing survival of patients with colorectal ...
Davidson, J. Cody
Mathematics is the most common subject area of remedial need and the majority of remedial math students never pass a college-level credit-bearing math class. The majorities of studies that investigate this phenomenon are conducted at community colleges and use some type of regression model; however, none have used a continuation ratio model. The…
Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben
Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varyi...
Jul 4, 2011 ... Due to the complexity and high non-linearity of bioprocess, most simple mathematical models fail to describe the exact behavior of ... Key words: Support vector regression, genetic algorithm, logistic model, prediction of biomass. .... method for solving non-linear regression problem, which depends on the ...
Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul
Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.
Bucur, Elena; Danet, Andrei Florin; Lehr, Carol Blaziu; Lehr, Elena; Nita-Lazar, Mihai
This paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The prediction of the impact on the exhibits during certain pollution scenarios (environmental impact) was calculated by a mathematical model based on the binary logistic regression; it allows the identification of those environmental parameters from a multitude of possible parameters with a significant impact on exhibitions and ranks them according to their severity effect. Air quality (NO 2 , SO 2 , O 3 and PM 2.5 ) and microclimate parameters (temperature, humidity) monitoring data from a case study conducted within exhibition and storage spaces of the Romanian National Aviation Museum Bucharest have been used for developing and validating the binary logistic regression method and the mathematical model. The logistic regression analysis was used on 794 data combinations (715 to develop of the model and 79 to validate it) by a Statistical Package for Social Sciences (SPSS 20.0). The results from the binary logistic regression analysis demonstrated that from six parameters taken into consideration, four of them present a significant effect upon exhibits in the following order: O 3 >PM 2.5 >NO 2 >humidity followed at a significant distance by the effects of SO 2 and temperature. The mathematical model, developed in this study, correctly predicted 95.1 % of the cumulated effect of the environmental parameters upon the exhibits. Moreover, this model could also be used in the decisional process regarding the preventive preservation measures that should be implemented within the exhibition space. The paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The mathematical model developed on the environmental parameters analyzed by the binary logistic regression method could be useful in a decision-making process establishing the best measures for pollution reduction and preventive
Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W
Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
Sidi, P.; Mamat, M.; Sukono; Supian, S.
Floods have always occurred in the Citarum river basin. The adverse effects caused by floods can cover all their property, including the destruction of houses. The impact due to damage to residential buildings is usually not small. Indeed, each of flooding, the government and several social organizations providing funds to repair the building. But the donations are given very limited, so it cannot cover the entire cost of repair was necessary. The presence of insurance products for property damage caused by the floods is considered very important. However, if its presence is also considered necessary by the public or not? In this paper, the factors that affect the supply and demand of insurance product for damaged building due to floods are analyzed. The method used in this analysis is the ordinal logistic regression. Based on the analysis that the factors that affect the supply and demand of insurance product for damaged building due to floods, it is included: age, economic circumstances, family situations, insurance motivations, and lifestyle. Simultaneously that the factors affecting supply and demand of insurance product for damaged building due to floods mounted to 65.7%.
Full Text Available Identifying cases in which road crashes result in fatality or injury of drivers may help improve their safety. In this study, datasets of crashes happened in TehranQom freeway, Iran, were examined by three models (multiple logistic regression, Bayesian logistic and classification tree to analyse the contribution of several variables to fatal accidents. For multiple logistic regression and Bayesian logistic models, the odds ratio was calculated for each variable. The model which best suited the identification of accident severity was determined based on AIC and DIC criteria. Based on the results of these two models, rollover crashes (OR = 14.58, %95 CI: 6.8-28.6, not using of seat belt (OR = 5.79, %95 CI: 3.1-9.9, exceeding speed limits (OR = 4.02, %95 CI: 1.8-7.9 and being female (OR = 2.91, %95 CI: 1.1-6.1 were the most important factors in fatalities of drivers. In addition, the results of the classification tree model have verified the findings of the other models.
Tapan Kumar Bhowmik
Full Text Available This article presents the theoretical derivation as well as practical steps for implementing Naive Bayes (NB and Logistic Regression (LR classifiers. A generative learning under Gaussian Naive Bayes assumption and two discriminative learning techniques based on gradient ascent and Newton-Raphson methods are described to estimate the parameters of LR. Some limitation of learning techniques and implementation issues are discussed as well. A set of experiments are performed for both the classifiers under different learning circumstances and their performances are compared. From the experiments, it is observed that LR learning with gradient ascent technique outperforms general NB classifier. However, under Gaussian Naive Bayes assumption, both classifiers NB and LR perform similar.
Factors associated with long-stay nursing home admissions among the U.S. elderly population: comparison of logistic regression and the Cox proportional hazards model with policy implications for social work.
Cai, Qian; Salmon, J Warren; Rodgers, Mark E
Two statistical methods were compared to identify key factors associated with long-stay nursing home (LSNH) admission among the U.S. elderly population. Social Work's interest in services to the elderly makes this research critical to the profession. Effectively transitioning the "baby boomer" population into appropriate long-term care will be a great societal challenge. It remains a challenge paramount to the practice of social work. Secondary data analyses using four waves (1995, 1998, 2000, and 2002) of the Health Retirement Study (HRS) coupled with the Assets and Health Dynamics among the Oldest Old (AHEAD) surveys were conducted. Multivariable logistic regression and Cox proportional hazards model were performed and compared. Older age, lower self-perceived health, worse instrumental activities of daily living (IADL), psychiatric problems, and living alone were found significantly associated with increased risk of LSNH admission. In contrast, being female, African American, or Hispanic; owning a home; and having lower level of cognitive impairment reduced the admission risk. Home ownership showed a significant effect in logistic regression, but a marginal effect in the Cox model. The Cox model generally provided more precise parameter estimates than logistic regression. Logistic regression, used frequently in analyses, can provide a good approximation to the Cox model in identifying factors of LSNH admission. However, the Cox model gives more information on how soon the LSNH admission may happen. Our analyses, based on two models, dually identified the factors associated with LSNH admission; therefore, results discussed confidently provide implications for both public and private long-term care policies, as well as improving the assessment capabilities of social work practitioners for development of screening programs among at-risk elderly. Given the predicted surge in this population, significant factors found from this study can be utilized in a strengths
Johnson, Kelly S; Rankin, Ed; Bowman, Jen; Deeds, Jessica; Kruse, Natalie
Mayflies (Order Ephemeroptera) require high quality water and habitat in streams to thrive, so their appearance after restoration is an indicator of ecological recovery. To better understand the importance of restoring in-stream habitat versus water chemistry for macroinvertebrate communities, we developed taxon-specific models of occurrence for five mayfly genera (Caenis, Isonychia, Stenonema, Stenacron, and Baetis) inhabiting streams in the Appalachian Mountains, USA. Presence/absence records from past decades were used to develop single and multiple logistic predictive models based on catchment characteristics (drainage area, gradient), in-stream habitat variables (e.g., substrate, channel morphology, pool and riffle quality), and water chemistry. Model performance was evaluated using (a) classification rates and Hosmer-Lemeshow values for test sets of data withheld from the original model-building dataset and (b) a field comparison of predicted versus observed mayfly occurrences at 53 sites in acid mine drainage-impaired watersheds in 2012. The classification accuracies of final models for Caenis, Stenacron, and Baetis ranged from 50 to 75%. In-stream habitat features were not significant predictor variables for these three taxa, only water chemistry. Models for Isonychia and Stenonema had higher classification rates (81%) and included both habitat and chemical variables. However, actual occurrences of Isonychia and Stenonema at study sites in 2012 were low, consistent with the calculated probability of occurrence (P o ) 0.40. Stenacron showed the greatest consistency of actual versus predicted occurrences, occurring at 56% of sites when the P o (based on pH and conductivity) was > 0.50 and only at 1 site when P o water chemistry during stream remediation.
Wen Jinai; Yuan Liyun; Jiang Ruyi
Logistic regression model has widely been used in the field of medicine. The computer software on this model is popular, but it is worth to discuss how to use this model correctly. Using SPSS (Statistical Package for the Social Science) software, unconditional logistic regression method was adopted to carry out multi-factor analyses on the cause of total death, cancer death and lung cancer death of uranium miners. The data is from radioepidemiological database of one uranium mine. The result show that attained age is a risk factor in the logistic regression analyses of total death, cancer death and lung cancer death. In the logistic regression analysis of cancer death, there is a negative correlation between the age of exposure and cancer death. This shows that the younger the age at exposure, the bigger the risk of cancer death. In the logistic regression analysis of lung cancer death, there is a positive correlation between the cumulated exposure and lung cancer death, this show that cumulated exposure is a most important risk factor of lung cancer death on uranium miners. It has been documented by many foreign reports that the lung cancer death rate is higher in uranium miners
Elosua, Paula; Wells, Craig
The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…
Yang, Lixue; Chen, Kean
To improve the design of underwater target recognition systems based on auditory perception, this study compared human listeners with automatic classifiers. Performances measures and strategies in three discrimination experiments, including discriminations between man-made and natural targets, between ships and submarines, and among three types of ships, were used. In the experiments, the subjects were asked to assign a score to each sound based on how confident they were about the category to which it belonged, and logistic regression, which represents linear discriminative models, also completed three similar tasks by utilizing many auditory features. The results indicated that the performances of logistic regression improved as the ratio between inter- and intra-class differences became larger, whereas the performances of the human subjects were limited by their unfamiliarity with the targets. Logistic regression performed better than the human subjects in all tasks but the discrimination between man-made and natural targets, and the strategies employed by excellent human subjects were similar to that of logistic regression. Logistic regression and several human subjects demonstrated similar performances when discriminating man-made and natural targets, but in this case, their strategies were not similar. An appropriate fusion of their strategies led to further improvement in recognition accuracy.
Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald
We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
Huang, Anhui; Xu, Shizhong; Cai, Xiaodong
Complex binary traits are influenced by many factors including the main effects of many quantitative trait loci (QTLs), the epistatic effects involving more than one QTLs, environmental effects and the effects of gene-environment interactions. Although a number of QTL mapping methods for binary traits have been developed, there still lacks an efficient and powerful method that can handle both main and epistatic effects of a relatively large number of possible QTLs. In this paper, we use a Bayesian logistic regression model as the QTL model for binary traits that includes both main and epistatic effects. Our logistic regression model employs hierarchical priors for regression coefficients similar to the ones used in the Bayesian LASSO linear model for multiple QTL mapping for continuous traits. We develop efficient empirical Bayesian algorithms to infer the logistic regression model. Our simulation study shows that our algorithms can easily handle a QTL model with a large number of main and epistatic effects on a personal computer, and outperform five other methods examined including the LASSO, HyperLasso, BhGLM, RVM and the single-QTL mapping method based on logistic regression in terms of power of detection and false positive rate. The utility of our algorithms is also demonstrated through analysis of a real data set. A software package implementing the empirical Bayesian algorithms in this paper is freely available upon request. The EBLASSO logistic regression method can handle a large number of effects possibly including the main and epistatic QTL effects, environmental effects and the effects of gene-environment interactions. It will be a very useful tool for multiple QTLs mapping for complex binary traits.
Sampson, Maureen L; Gounden, Verena; van Deventer, Hendrik E; Remaley, Alan T
The main drawback of the periodic analysis of quality control (QC) material is that test performance is not monitored in time periods between QC analyses, potentially leading to the reporting of faulty test results. The objective of this study was to develop a patient based QC procedure for the more timely detection of test errors. Results from a Chem-14 panel measured on the Beckman LX20 analyzer were used to develop the model. Each test result was predicted from the other 13 members of the panel by multiple regression, which resulted in correlation coefficients between the predicted and measured result of >0.7 for 8 of the 14 tests. A logistic regression model, which utilized the measured test result, the predicted test result, the day of the week and time of day, was then developed for predicting test errors. The output of the logistic regression was tallied by a daily CUSUM approach and used to predict test errors, with a fixed specificity of 90%. The mean average run length (ARL) before error detection by CUSUM-Logistic Regression (CSLR) was 20 with a mean sensitivity of 97%, which was considerably shorter than the mean ARL of 53 (sensitivity 87.5%) for a simple prediction model that only used the measured result for error detection. A CUSUM-Logistic Regression analysis of patient laboratory data can be an effective approach for the rapid and sensitive detection of clinical laboratory errors. Published by Elsevier Inc.
Zeng, Yaohui; Breheny, Patrick
Discovering important genes that account for the phenotype of interest has long been a challenge in genome-wide expression analysis. Analyses such as gene set enrichment analysis (GSEA) that incorporate pathway information have become widespread in hypothesis testing, but pathway-based approaches have been largely absent from regression methods due to the challenges of dealing with overlapping pathways and the resulting lack of available software. The R package grpreg is widely used to fit group lasso and other group-penalized regression models; in this study, we develop an extension, grpregOverlap, to allow for overlapping group structure using a latent variable approach. We compare this approach to the ordinary lasso and to GSEA using both simulated and real data. We find that incorporation of prior pathway information can substantially improve the accuracy of gene expression classifiers, and we shed light on several ways in which hypothesis-testing approaches such as GSEA differ from regression approaches with respect to the analysis of pathway data.
Keating, Kim A.; Cherry, Steve
Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case-control, and use-availability. Logistic regression is appropriate for habitat use-nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case-control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case-control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use-availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use-availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed. We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in
A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…
Keller, Bryan S. B.; Kim, Jee-Seon; Steiner, Peter M.
Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has…
Ysebaert, T.; Meire, P.; Herman, P.M.J.; Verbeek, H.
This study aims at contributing to the development of statistical models to predict macrobenthic species response to environmental conditions in estuarine ecosystems. Ecological response surfaces are derived for 10 estuarine macrobenthic species. Logistic regression is applied on a large data set,
Jensen, Signe Marie; Hauger, Hanne; Ritz, Christian
method. For a study evaluating a treatment effect on visual acuity, a binary outcome, we demonstrate how mediation analysis may conveniently be carried out by means of marginally fitted logistic regression models in combination with the delta method. Several metrics of mediation are estimated and results...
van Smeden, Maarten; de Groot, Joris A H; Moons, Karel G M; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus J C; Reitsma, Johannes B
Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation leads to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
Maarten van Smeden
Full Text Available Abstract Background Ten events per variable (EPV is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. Methods The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. Results The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’. We reveal that different approaches for identifying and handling separation leads to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. Conclusions The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
Full Text Available Abstract Background The study attempts to develop an ordinal logistic regression (OLR model to identify the determinants of child malnutrition instead of developing traditional binary logistic regression (BLR model using the data of Bangladesh Demographic and Health Survey 2004. Methods Based on weight-for-age anthropometric index (Z-score child nutrition status is categorized into three groups-severely undernourished ( Results All the models determine that age of child, birth interval, mothers' education, maternal nutrition, household wealth status, child feeding index, and incidence of fever, ARI & diarrhoea were the significant predictors of child malnutrition; however, results of PPOM were more precise than those of other models. Conclusion These findings clearly justify that OLR models (POM and PPOM are appropriate to find predictors of malnutrition instead of BLR models.
Arana, E.; Marti-Bonmati, L.; Bautista, D.; Paredes, R.
To study the utility of logistic regression and the neuronal network in the diagnosis of cranial hemangiomas. Fifteen patients presenting hemangiomas were selected form a total of 167 patients with cranial lesions. All were evaluated by plain radiography and computed tomography (CT). Nineteen variables in their medical records were reviewed. Logistic regression and neuronal network models were constructed and validated by the jackknife (leave-one-out) approach. The yields of the two models were compared by means of ROC curves, using the area under the curve as parameter. Seven men and 8 women presented hemangiomas. The mean age of these patients was 38.4 (15.4 years (mea ± standard deviation). Logistic regression identified as significant variables the shape, soft tissue mass and periosteal reaction. The neuronal network lent more importance to the existence of ossified matrix, ruptured cortical vein and the mixed calcified-blastic (trabeculated) pattern. The neuronal network showed a greater yield than logistic regression (Az, 0.9409) (0.004 versus 0.7211± 0.075; p<0.001). The neuronal network discloses hidden interactions among the variables, providing a higher yield in the characterization of cranial hemangiomas and constituting a medical diagnostic acid. (Author)29 refs
Full Text Available We introduce an imbalanced data classification approach based on logistic regression significant discriminant and Fisher discriminant. First of all, a key indicators extraction model based on logistic regression significant discriminant and correlation analysis is derived to extract features for customer classification. Secondly, on the basis of the linear weighted utilizing Fisher discriminant, a customer scoring model is established. And then, a customer rating model where the customer number of all ratings follows normal distribution is constructed. The performance of the proposed model and the classical SVM classification method are evaluated in terms of their ability to correctly classify consumers as default customer or nondefault customer. Empirical results using the data of 2157 customers in financial engineering suggest that the proposed approach better performance than the SVM model in dealing with imbalanced data classification. Moreover, our approach contributes to locating the qualified customers for the banks and the bond investors.
Lin, Le; Lin, Qigen; Wang, Ying
This paper proposes a statistical model for mapping global landslide susceptibility based on logistic regression. After investigating explanatory factors for landslides in the existing literature, five factors were selected for model landslide susceptibility: relative relief, extreme precipitation, lithology, ground motion and soil moisture. When building the model, 70 % of landslide and nonlandslide points were randomly selected for logistic regression, and the others were used for model validation. To evaluate the accuracy of predictive models, this paper adopts several criteria including a receiver operating characteristic (ROC) curve method. Logistic regression experiments found all five factors to be significant in explaining landslide occurrence on a global scale. During the modeling process, percentage correct in confusion matrix of landslide classification was approximately 80 % and the area under the curve (AUC) was nearly 0.87. During the validation process, the above statistics were about 81 % and 0.88, respectively. Such a result indicates that the model has strong robustness and stable performance. This model found that at a global scale, soil moisture can be dominant in the occurrence of landslides and topographic factor may be secondary.
Wang, Ke-Sheng; Owusu, Daniel; Pan, Yue; Xie, Changchun
The PDZ and LIM domain 5 (PDLIM5) gene may play a role in cancer, bipolar disorder, major depression, alcohol dependence and schizophrenia; however, little is known about the interaction effect of steroid and PDLIM5 gene on cancer. This study examined 47 single-nucleotide polymorphisms (SNPs) within the PDLIM5 gene in the Marshfield sample with 716 cancer patients (any diagnosed cancer, excluding minor skin cancer) and 2848 noncancer controls. Multiple logistic regression model in PLINK software was used to examine the association of each SNP with cancer. Bayesian logistic regression in PROC GENMOD in SAS statistical software, ver. 9.4 was used to detect gene- steroid interactions influencing cancer. Single marker analysis using PLINK identified 12 SNPs associated with cancer (Plogistic regression in PROC GENMOD showed that both rs6532496 and rs951613 revealed strong gene-steroid interaction effects (OR=2.18, 95% CI=1.31-3.63 with P = 2.9 × 10⁻³ for rs6532496 and OR=2.07, 95% CI=1.24-3.45 with P = 5.43 × 10⁻³ for rs951613, respectively). Results from Bayesian logistic regression showed stronger interaction effects (OR=2.26, 95% CI=1.2-3.38 for rs6532496 and OR=2.14, 95% CI=1.14-3.2 for rs951613, respectively). All the 12 SNPs associated with cancer revealed significant gene-steroid interaction effects (P logistic regression and OR=2.59, 95% CI=1.4-3.97 from Bayesian logistic regression; respectively). This study provides evidence of common genetic variants within the PDLIM5 gene and interactions between PLDIM5 gene polymorphisms and steroid use influencing cancer.
Tu, J V
Artificial neural networks are algorithms that can be used to perform nonlinear statistical modeling and provide a new alternative to logistic regression, the most commonly used method for developing predictive models for dichotomous outcomes in medicine. Neural networks offer a number of advantages, including requiring less formal statistical training, ability to implicitly detect complex nonlinear relationships between dependent and independent variables, ability to detect all possible interactions between predictor variables, and the availability of multiple training algorithms. Disadvantages include its "black box" nature, greater computational burden, proneness to overfitting, and the empirical nature of model development. An overview of the features of neural networks and logistic regression is presented, and the advantages and disadvantages of using this modeling technique are discussed.
Lin, L.; Lin, Q.; Wang, Y.
This paper proposes a quantitative model for mapping global landslide susceptibility based on logistic regression. After investigating explanatory factors for landslides in the existing literatures, five factors were selected to model landslide susceptibility: relative relief, extreme precipitation, lithology, ground motion and soil moisture. When building model, 70% of landslide and non-landslide points were randomly selected for logistic regression, and the others were used for model validation. For evaluating the accuracy of predictive models, this paper adopts several criteria including receiver operating characteristic (ROC) curve method. Logistic regression experiments found all five factors to be significant in explaining landslide occurrence on global scale. During the modeling process, percentage correct in confusion matrix of landslide classification was approximately 80% and the area under the curve (AUC) was nearly 0.87. During the validation process, the above statistics were about 81% and 0.88, respectively. Such result indicates that the model has strong robustness and stable performance. Existing studies of global landslide susceptibility mapping have generally used qualitative methods based on expert knowledge. The accumulation of global landslide data makes it practical to mapping global landslide susceptibility quantitatively. This quantitative assessment found that at a global scale, soil moisture dominates the occurrence of landslides and topographic factor is secondary.
Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance
Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination ([Formula: see text]) of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for [Formula: see text] for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.
Full Text Available As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%.
Agga, Getahun E; Scott, H Morgan
Statistical analysis of antimicrobial resistance data largely focuses on individual antimicrobial's binary outcome (susceptible or resistant). However, bacteria are becoming increasingly multidrug resistant (MDR). Statistical analysis of MDR data is mostly descriptive often with tabular or graphical presentations. Here we report the applicability of generalized ordinal logistic regression model for the analysis of MDR data. A total of 1,152 Escherichia coli, isolated from the feces of weaned pigs experimentally supplemented with chlortetracycline (CTC) and copper, were tested for susceptibilities against 15 antimicrobials and were binary classified into resistant or susceptible. The 15 antimicrobial agents tested were grouped into eight different antimicrobial classes. We defined MDR as the number of antimicrobial classes to which E. coli isolates were resistant ranging from 0 to 8. Proportionality of the odds assumption of the ordinal logistic regression model was violated only for the effect of treatment period (pre-treatment, during-treatment and post-treatment); but not for the effect of CTC or copper supplementation. Subsequently, a partially constrained generalized ordinal logistic model was built that allows for the effect of treatment period to vary while constraining the effects of treatment (CTC and copper supplementation) to be constant across the levels of MDR classes. Copper (Proportional Odds Ratio [Prop OR]=1.03; 95% CI=0.73-1.47) and CTC (Prop OR=1.1; 95% CI=0.78-1.56) supplementation were not significantly associated with the level of MDR adjusted for the effect of treatment period. MDR generally declined over the trial period. In conclusion, generalized ordered logistic regression can be used for the analysis of ordinal data such as MDR data when the proportionality assumptions for ordered logistic regression are violated. Published by Elsevier B.V.
Henrard, S; Speybroeck, N; Hermans, C
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Long, Rebecca G
Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
Brabec, Marek; Badescu, V.; Paulescu, M.
Roč. 120, č. 1-2 (2013), s. 61-71 ISSN 0177-7971 R&D Projects: GA MŠk LD12009 Grant - others:European Cooperation in Science and Technology(XE) COST ES1002 Institutional research plan: CEZ:AV0Z1030915 Keywords : logistic regression * Markov model * sunshine number Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 1.245, year: 2013
Brabec, Marek; Badescu, V.; Paulescu, M.
Roč. 41, č. 6 (2014), s. 1174-1188 ISSN 0266-4763 R&D Projects: GA MŠk LD12009 Grant - others:European Cooperation in Science and Technology(XE) COST ES1002 Institutional support: RVO:67985807 Keywords : clouds * random process * sunshine number * Markovian logistic regression model Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.417, year: 2014
Faggion, Clovis Mariano; Chambrone, Leandro; Tu, Yu-Kang
To evaluate the quality of reporting of logistic regression models used to assess risk factors for tooth loss in patients who have received periodontal treatment. The PubMed, EMBASE, BIOSIS Citation Index, CINAHL, Web of Science, and LILACS electronic databases were searched up to 01 March 2014 to identify interventional longitudinal studies assessing risk factors for tooth loss after periodontal treatment. The reference lists of included studies were searched manually. No language restriction was applied to the search. Quality of reporting of logistic regression models was assessed using analytical and documentation criteria with a 15-item checklist. Criteria were judged as met (adequately reported) or not met (not reported). All searches, selection, data extraction, and quality assessment were performed independently and in duplicate. Of 621 records initially retrieved, 24 articles were included in the analysis. Less than 30% of all 360 datapoints were met. "Coding of independent variables" was reported most frequently [n = 22 (83%) articles]. Criteria such as "internal and external validation of the model" were not met in any study assessed. The reporting of logistic regression models in studies assessing risk factors for tooth loss in patients who have received periodontal treatment is not optimal. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Castro, Wagner Barbosa de Mello
Logistic regression models were generated starting from lithofacies described in cores and in well logs for two wells of Campos Basin. The main objective was verify the applicability of the technique in reservoir geology. The models were used to estimate the occurrence of reservoir facies in the wells. Results obtained were compared to the results of a previous discriminant analysis with the objective of determinate the accuracy of the two techniques as tools to estimate reservoir facies. Although discriminant analysis resulted more accurate in the estimate of reservoir facies, the use of logistic regression should not be discarded. Its independence of the normal distribution hypothesis make this technique, at least in theory, more robust than the discriminant analysis. (author)
Johnson, William L.; Johnson, Annabel M.; Johnson, Jared
Background: The context is the new Texas STAAR end-of-course testing program. Purpose: The authors developed a logistic regression model to predict who would pass-or-fail the new Texas chemistry STAAR end-of-course exam. Setting: Robert E. Lee High School (5A) with an enrollment of 2700 students, Tyler, Texas. Date of the study was the 2011-2012…
Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,
Borin-Crivellenti, Sofia; Garabed, Rebecca B; Moreno-Torres, Karla I; Wellman, Maxey L; Gilor, Chen
OBJECTIVE To assess the discriminatory value for corticosteroid-induced alkaline phosphatase (CiALP) activity and other variables that can be measured routinely on a CBC and biochemical analysis for the diagnosis of hypoadrenocorticism in dogs. SAMPLE Medical records of 57 dogs with confirmed hypoadrenocorticism and 57 control dogs in which hypoadrenocorticism was suspected but ruled out. PROCEDURES A retrospective case-control study was conducted. Dogs were included if a CBC and complete biochemical analysis had been performed. Dogs with iatrogenic hypoadrenocorticism and dogs treated previously with glucocorticoids were excluded. Cortisol concentration for dogs with hypoadrenocorticism was ≤ 2 μg/dL both before and after ACTH administration. Cortisol concentration for control dogs was > 4 μg/dL before or after ACTH administration. RESULTS Area under the receiver operating characteristic (ROC) curve for CiALP activity was low (0.646; 95% confidence interval, 0.494 to 0.798). Area under the ROC curve for a model that combined the CiALP activity, Na-to-K ratio, eosinophil count, activity of creatine kinase, and concentrations of SUN and albumin was high (0.994; 95% confidence interval, 0.982 to 1.000). Results for this model could be used to correctly classify all dogs, except for 1 dog with hypoadrenocorticism and no electrolyte abnormalities. CONCLUSIONS AND CLINICAL RELEVANCE CiALP activity alone cannot be used as a reliable diagnostic test for hypoadrenocorticism in dogs. Combined results for CiALP activity, Na-to-K ratio, eosinophil count, creatine kinase activity, and concentrations of SUN and albumin provided an excellent means to discriminate between hypoadrenocorticism and diseases that mimic hypoadrenocorticism.
Saputro, Dewi Retno Sari; Widyaningsih, Purnami
In general, the parameter estimation of GWOLR model uses maximum likelihood method, but it constructs a system of nonlinear equations, making it difficult to find the solution. Therefore, an approximate solution is needed. There are two popular numerical methods: the methods of Newton and Quasi-Newton (QN). Newton's method requires large-scale time in executing the computation program since it contains Jacobian matrix (derivative). QN method overcomes the drawback of Newton's method by substituting derivative computation into a function of direct computation. The QN method uses Hessian matrix approach which contains Davidon-Fletcher-Powell (DFP) formula. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is categorized as the QN method which has the DFP formula attribute of having positive definite Hessian matrix. The BFGS method requires large memory in executing the program so another algorithm to decrease memory usage is needed, namely Low Memory BFGS (LBFGS). The purpose of this research is to compute the efficiency of the LBFGS method in the iterative and recursive computation of Hessian matrix and its inverse for the GWOLR parameter estimation. In reference to the research findings, we found out that the BFGS and LBFGS methods have arithmetic operation schemes, including O(n2) and O(nm).
Ebrahim Karimi Sangchini
Full Text Available Landslides are amongst the most damaging natural hazards in mountainous regions. Every year, hundreds of people all over the world lose their lives in landslides; furthermore, there are large impacts on the local and global economy from these events. In this study, landslide hazard zonation in Babaheydar watershed using logistic regression was conducted to determine landslide hazard areas. At first, the landslide inventory map was prepared using aerial photograph interpretations and field surveys. The next step, ten landslide conditioning factors such as altitude, slope percentage, slope aspect, lithology, distance from faults, rivers, settlement and roads, land use, and precipitation were chosen as effective factors on landsliding in the study area. Subsequently, landslide susceptibility map was constructed using the logistic regression model in Geographic Information System (GIS. The ROC and Pseudo-R2 indexes were used for model assessment. Results showed that the logistic regression model provided slightly high prediction accuracy of landslide susceptibility maps in the Babaheydar Watershed with ROC equal to 0.876. Furthermore, the results revealed that about 44% of the watershed areas were located in high and very high hazard classes. The resultant landslide susceptibility maps can be useful in appropriate watershed management practices and for sustainable development in the region.
Abu-Hanna, Ameen; de Keizer, Nicolette
Health care effectiveness and efficiency are under constant scrutiny especially when treatment is quite costly as in the Intensive Care (IC). Currently there are various international quality of care programs for the evaluation of IC. At the heart of such quality of care programs lie prognostic models whose prediction of patient mortality can be used as a norm to which actual mortality is compared. The current generation of prognostic models in IC are statistical parametric models based on logistic regression. Given a description of a patient at admission, these models predict the probability of his or her survival. Typically, this patient description relies on an aggregate variable, called a score, that quantifies the severity of illness of the patient. The use of a parametric model and an aggregate score form adequate means to develop models when data is relatively scarce but it introduces the risk of bias. This paper motivates and suggests a method for studying and improving the performance behavior of current state-of-the-art IC prognostic models. Our method is based on machine learning and statistical ideas and relies on exploiting information that underlies a score variable. In particular, this underlying information is used to construct a classification tree whose nodes denote patient sub-populations. For these sub-populations, local models, most notably logistic regression ones, are developed using only the total score variable. We compare the performance of this hybrid model to that of a traditional global logistic regression model. We show that the hybrid model not only provides more insight into the data but also has a better performance. We pay special attention to the precision aspect of model performance and argue why precision is more important than discrimination ability.
Xu, Hao; Zhang, Tao; Li, Xiao-song; Liu, Yuan-yuan
To compare the results of Pearson residual calculations in logistic regression models using SPSS and SAS. We reviewed Pearson residual calculation methods, and used two sets of data to test logistic models constructed by SPSS and STATA. One model contained a small number of covariates compared to the number of observed. The other contained a similar number of covariates as the number of observed. The two software packages produced similar Pearson residual estimates when the models contained a similar number of covariates as the number of observed, but the results differed when the number of observed was much greater than the number of covariates. The two software packages produce different results of Pearson residuals, especially when the models contain a small number of covariates. Further studies are warranted.
Eken, Cenker; Bilge, Ugur; Kartal, Mutlu; Eray, Oktay
Logistic regression is the most common statistical model for processing multivariate data in the medical literature. Artificial intelligence models like an artificial neural network (ANN) and genetic algorithm (GA) may also be useful to interpret medical data. The purpose of this study was to perform artificial intelligence models on a medical data sheet and compare to logistic regression. ANN, GA, and logistic regression analysis were carried out on a data sheet of a previously published article regarding patients presenting to an emergency department with flank pain suspicious for renal colic. The study population was composed of 227 patients: 176 patients had a diagnosis of urinary stone, while 51 ultimately had no calculus. The GA found two decision rules in predicting urinary stones. Rule 1 consisted of being male, pain not spreading to back, and no fever. In rule 2, pelvicaliceal dilatation on bedside ultrasonography replaced no fever. ANN, GA rule 1, GA rule 2, and logistic regression had a sensitivity of 94.9, 67.6, 56.8, and 95.5%, a specificity of 78.4, 76.47, 86.3, and 47.1%, a positive likelihood ratio of 4.4, 2.9, 4.1, and 1.8, and a negative likelihood ratio of 0.06, 0.42, 0.5, and 0.09, respectively. The area under the curve was found to be 0.867, 0.720, 0.715, and 0.713 for all applications, respectively. Data mining techniques such as ANN and GA can be used for predicting renal colic in emergency settings and to constitute clinical decision rules. They may be an alternative to conventional multivariate analysis applications used in biostatistics.
Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A
Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.
Full Text Available Abstract Background Smear negative pulmonary tuberculosis (SNPT accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods The study enrolled 551 patients with clinical-radiological suspicion of SNPT, in Rio de Janeiro, Brazil. The original data was divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used for constructing logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operator characteristic curve, sensitivity, and specificity were used to evaluate the model's performance in both generation and validation samples. Results It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion The results suggest that those models might be useful as screening tools for estimating the risk of SNPT, optimizing the utilization of more expensive tests, and avoiding costs of unnecessary anti-tuberculosis treatment. Those models might be cost-effective tools in a health care network with hierarchical distribution of scarce resources.
Tvedebrink, Torben; Eriksen, Poul Svante; Asplund, Maria
We discuss the model for estimating drop-out probabilities presented by Tvedebrink et al.  and the concerns, that have been raised. The criticism of the model has demonstrated that the model is not perfect. However, the model is very useful for advanced forensic genetic work, where allelic dro...
Sekiya, Masashi; Tsuji, Toshiaki
This study deals with a technology to estimate the muscle activity from the movement data using a statistical model. A linear regression (LR) model and artificial neural networks (ANN) have been known as statistical models for such use. Although ANN has a high estimation capability, it is often in the clinical application that the lack of data amount leads to performance deterioration. On the other hand, the LR model has a limitation in generalization performance. We therefore propose a muscle activity estimation method to improve the generalization performance through the use of linear logistic regression model. The proposed method was compared with the LR model and ANN in the verification experiment with 7 participants. As a result, the proposed method showed better generalization performance than the conventional methods in various tasks.
.... Previous research has demonstrated the use of a two-step logistic and multiple regression methodology to predicting cost growth produces desirable results versus traditional single-step regression...
Methods for fitting and predicting logistic regression classifiers (LRC) with an arbitrary loss function using elastic net or best subsets. This package adds additional model fitting features to the existing glmnet and bestglm R packages. This package was created to perform the analyses described in Amidan BG, Orton DJ, LaMarche BL, et al. 2014. Signatures for Mass Spectrometry Data Quality. Journal of Proteome Research. 13(4), 2215-2222. It makes the model fitting available in the glmnet and bestglm packages more general by identifying optimal model parameters via cross validation with an customizable loss function. It also identifies the optimal threshold for binary classification.
Jensen, Signe M; Hauger, Hanne; Ritz, Christian
Mediation analysis is often based on fitting two models, one including and another excluding a potential mediator, and subsequently quantify the mediated effects by combining parameter estimates from these two models. Standard errors of such derived parameters may be approximated using the delta method. For a study evaluating a treatment effect on visual acuity, a binary outcome, we demonstrate how mediation analysis may conveniently be carried out by means of marginally fitted logistic regression models in combination with the delta method. Several metrics of mediation are estimated and results are compared to findings using existing methods.
Cao, Faxian; Yang, Zhijing; Ren, Jinchang; Ling, Wing-Kuen; Zhao, Huimin; Marshall, Stephen
Although the sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it suffers from inefficacy in dealing with high dimensional features and manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR via minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, the logistic regression via the variable splitting and the augmented Lagrangian (LORSAL) is adopted in the proposed framework for reducing the computational time. Experiments are conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, which have shown the fast and robust performance of the proposed ESMLR framework.
John Hogland; Nedret Billor; Nathaniel Anderson
Discriminant analysis, referred to as maximum likelihood classification within popular remote sensing software packages, is a common supervised technique used by analysts. Polytomous logistic regression (PLR), also referred to as multinomial logistic regression, is an alternative classification approach that is less restrictive, more flexible, and easy to interpret. To...
Larsen, Klaus; Merlo, Juan
The logistic regression model is frequently used in epidemiologic studies, yielding odds ratio or relative risk interpretations. Inspired by the theory of linear normal models, the logistic regression model has been extended to allow for correlated responses by introducing random effects. However......, the model does not inherit the interpretational features of the normal model. In this paper, the authors argue that the existing measures are unsatisfactory (and some of them are even improper) when quantifying results from multilevel logistic regression analyses. The authors suggest a measure...... regression. For this purpose, the authors propose an odds ratio measure, the interval odds ratio, that takes these difficulties into account. The authors demonstrate the two measures by investigating heterogeneity between neighborhoods and effects of neighborhood-level covariates in two examples...
Ali, Faraz Mahmood; Kay, Richard; Finlay, Andrew Y; Piguet, Vincent; Kupfer, Joerg; Dalgard, Florence; Salek, M Sam
The Dermatology Life Quality Index (DLQI) and the European Quality of Life-5 Dimension (EQ-5D) are separate measures that may be used to gather health-related quality of life (HRQoL) information from patients. The EQ-5D is a generic measure from which health utility estimates can be derived, whereas the DLQI is a specialty-specific measure to assess HRQoL. To reduce the burden of multiple measures being administered and to enable a more disease-specific calculation of health utility estimates, we explored an established mathematical technique known as ordinal logistic regression (OLR) to develop an appropriate model to map DLQI data to EQ-5D-based health utility estimates. Retrospective data from 4010 patients were randomly divided five times into two groups for the derivation and testing of the mapping model. Split-half cross-validation was utilized resulting in a total of ten ordinal logistic regression models for each of the five EQ-5D dimensions against age, sex, and all ten items of the DLQI. Using Monte Carlo simulation, predicted health utility estimates were derived and compared against those observed. This method was repeated for both OLR and a previously tested mapping methodology based on linear regression. The model was shown to be highly predictive and its repeated fitting demonstrated a stable model using OLR as well as linear regression. The mean differences between OLR-predicted health utility estimates and observed health utility estimates ranged from 0.0024 to 0.0239 across the ten modeling exercises, with an average overall difference of 0.0120 (a 1.6% underestimate, not of clinical importance). This modeling framework developed in this study will enable researchers to calculate EQ-5D health utility estimates from a specialty-specific study population, reducing patient and economic burden.
Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M
Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.
Kerns, Lucy X
In risk assessment, it is often desired to make inferences on the low dose levels at which a specific benchmark risk is attained. Applications of simultaneous hyperbolic confidence bands for low-dose risk estimation with quantal data under different dose-response models (multistage, Abbott-adjusted Weibull, and Abbott-adjusted log-logistic models) have appeared in the literature. The use of simultaneous three-segment bands under the multistage model has also been proposed recently. In this article, we present explicit formulas for constructing asymptotic one-sided simultaneous hyperbolic and three-segment bands for the simple log-logistic regression model. We use the simultaneous construction to estimate upper hyperbolic and three-segment confidence bands on extra risk and to obtain lower limits on the benchmark dose by inverting the upper bands on risk under the Abbott-adjusted log-logistic model. Monte Carlo simulations evaluate the characteristics of the simultaneous limits. An example is given to illustrate the use of the proposed methods and to compare the two types of simultaneous limits at very low dose levels. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Gordon, Sheldon P.
The solutions of the discrete logistic growth model based on a difference equation and the continuous logistic growth model based on a differential equation are compared and contrasted. The investigation is conducted using a dynamic interactive spreadsheet. (Contains 5 figures.)
Suliman, Noor Azizah; Abidin, Basir; Manan, Norhafizah Abdul; Razali, Ahmad Mahir
The study is aimed to find the most suitable model that could predict the students' success at the medical pre-university studies, Centre for Foundation in Science, Languages and General Studies of Cyberjaya University College of Medical Sciences (CUCMS). The predictors under investigation were the national high school exit examination-Sijil Pelajaran Malaysia (SPM) achievements such as Biology, Chemistry, Physics, Additional Mathematics, Mathematics, English and Bahasa Malaysia results as well as gender and high school background factors. The outcomes showed that there is a significant difference in the final CGPA, Biology and Mathematics subjects at pre-university by gender factor, while by high school background also for Mathematics subject. In general, the correlation between the academic achievements at the high school and medical pre-university is moderately significant at α-level of 0.05, except for languages subjects. It was found also that logistic regression techniques gave better prediction models than the multiple linear regression technique for this data set. The developed logistic models were able to give the probability that is almost accurate with the real case. Hence, it could be used to identify successful students who are qualified to enter the CUCMS medical faculty before accepting any students to its foundation program.
Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q
Conditional logistic regression analysis and unconditional logistic regression analysis are commonly used in case control study, but Cox proportional hazard model is often used in survival data analysis. Most literature only refer to main effect model, however, generalized linear model differs from general linear model, and the interaction was composed of multiplicative interaction and additive interaction. The former is only statistical significant, but the latter has biological significance. In this paper, macros was written by using SAS 9.4 and the contrast ratio, attributable proportion due to interaction and synergy index were calculated while calculating the items of logistic and Cox regression interactions, and the confidence intervals of Wald, delta and profile likelihood were used to evaluate additive interaction for the reference in big data analysis in clinical epidemiology and in analysis of genetic multiplicative and additive interactions.
Logistic and linear regression model documentation for statistical relations between continuous real-time and discrete water-quality constituents in the Kansas River, Kansas, July 2012 through June 2015
Foster, Guy M.; Graham, Jennifer L.
The Kansas River is a primary source of drinking water for about 800,000 people in northeastern Kansas. Source-water supplies are treated by a combination of chemical and physical processes to remove contaminants before distribution. Advanced notification of changing water-quality conditions and cyanobacteria and associated toxin and taste-and-odor compounds provides drinking-water treatment facilities time to develop and implement adequate treatment strategies. The U.S. Geological Survey (USGS), in cooperation with the Kansas Water Office (funded in part through the Kansas State Water Plan Fund), and the City of Lawrence, the City of Topeka, the City of Olathe, and Johnson County Water One, began a study in July 2012 to develop statistical models at two Kansas River sites located upstream from drinking-water intakes. Continuous water-quality monitors have been operated and discrete-water quality samples have been collected on the Kansas River at Wamego (USGS site number 06887500) and De Soto (USGS site number 06892350) since July 2012. Continuous and discrete water-quality data collected during July 2012 through June 2015 were used to develop statistical models for constituents of interest at the Wamego and De Soto sites. Logistic models to continuously estimate the probability of occurrence above selected thresholds were developed for cyanobacteria, microcystin, and geosmin. Linear regression models to continuously estimate constituent concentrations were developed for major ions, dissolved solids, alkalinity, nutrients (nitrogen and phosphorus species), suspended sediment, indicator bacteria (Escherichia coli, fecal coliform, and enterococci), and actinomycetes bacteria. These models will be used to provide real-time estimates of the probability that cyanobacteria and associated compounds exceed thresholds and of the concentrations of other water-quality constituents in the Kansas River. The models documented in this report are useful for characterizing changes
Zhu, Yanke; Fang, Jiqian
The classification tree is a valuable methodology for predictive modeling and data mining. However, the current existing classification trees ignore the fact that there might be a subset of individuals who cannot be well classified based on the information of the given set of predictor variables and who might be classified with a higher error rate; most of the current existing classification trees do not use the combination of variables in each step. An algorithm of a logistic regression-based trichotomous classification tree (LRTCT) is proposed that employs the trichotomous tree structure and the linear combination of predictor variables in the recursive partitioning process. Compared with the widely used classification and regression tree through the applications on a series of simulated data and 2 real data sets, the LRTCT performed better in several aspects and does not require excessive complicated calculations. © The Author(s) 2016.
Tripepi, Giovanni; Jager, Kitty J.; Stel, Vianda S.; Dekker, Friedo W.; Zoccali, Carmine
Because of some limitations of stratification methods, epidemiologists frequently use multiple linear and logistic regression analyses to address specific epidemiological questions. If the dependent variable is a continuous one (for example, systolic pressure and serum creatinine), the researcher
Pfleiderer, Elaine M; Scroggins, Cheryl L; Manning, Carol A
Two separate logistic regression analyses were conducted for low- and high-altitude sectors to determine whether a set of dynamic sector characteristics variables could reliably discriminate between operational error (OE...
Gong, Xu; Cui, Jianli; Jiang, Ziping; Lu, Laijin; Li, Xiucun
Few clinical retrospective studies have reported the risk factors of pedicled flap necrosis in hand soft tissue reconstruction. The aim of this study was to identify non-technical risk factors associated with pedicled flap perioperative necrosis in hand soft tissue reconstruction via a multivariate logistic regression analysis. For patients with hand soft tissue reconstruction, we carefully reviewed hospital records and identified 163 patients who met the inclusion criteria. The characteristics of these patients, flap transfer procedures and postoperative complications were recorded. Eleven predictors were identified. The correlations between pedicled flap necrosis and risk factors were analysed using a logistic regression model. Of 163 skin flaps, 125 flaps survived completely without any complications. The pedicled flap necrosis rate in hands was 11.04%, which included partial flap necrosis (7.36%) and total flap necrosis (3.68%). Soft tissue defects in fingers were noted in 68.10% of all cases. The logistic regression analysis indicated that the soft tissue defect site (P = 0.046, odds ratio (OR) = 0.079, confidence interval (CI) (0.006, 0.959)), flap size (P = 0.020, OR = 1.024, CI (1.004, 1.045)) and postoperative wound infection (P < 0.001, OR = 17.407, CI (3.821, 79.303)) were statistically significant risk factors for pedicled flap necrosis of the hand. Soft tissue defect site, flap size and postoperative wound infection were risk factors associated with pedicled flap necrosis in hand soft tissue defect reconstruction. © 2017 Royal Australasian College of Surgeons.
Franco Monsreal, José; Tun Cobos, Miriam Del Ruby; Hernández Gómez, José Ricardo; Serralta Peraza, Lidia Esther Del Socorro
interval: 0.43 to 29.87); number of deliveries = 1 (3.86, 95% confidence interval: 0.33 to 44.85); personal pathological history (4.78, 95% confidence interval: 2.16 to 10.59); pathological obstetric history (5.01, 95% confidence interval: 1.66 to 15.18); maternal height confidence interval: 3.08 to 8.65); number of births ≥ 5 (5.99, 95% confidence interval: 0.51 to 69.99); and smoking (15.63, 95% confidence interval: 1.07 to 227.97). Four of the independent variables (personal pathological history, obstetric pathological history, maternal stature birth weight. The use of the logistic regression model in the Mayan municipality of José María Morelos, will allow estimating the probability of low birth weight for each pregnant woman in the future, which will be useful for the health authorities of the region.
Almquist, Zack W.; Butts, Carter T.
Methods for analysis of network dynamics have seen great progress in the past decade. This article shows how Dynamic Network Logistic Regression techniques (a special case of the Temporal Exponential Random Graph Models) can be used to implement decision theoretic models for network dynamics in a panel data context. We also provide practical heuristics for model building and assessment. We illustrate the power of these techniques by applying them to a dynamic blog network sampled during the 2...
Chang, C. H.; Chan, H. C.; Chen, B. A.
Classification of effective soil depth is a task of determining the slopeland utilizable limitation in Taiwan. The "Slopeland Conservation and Utilization Act" categorizes the slopeland into agriculture and husbandry land, land suitable for forestry and land for enhanced conservation according to the factors including average slope, effective soil depth, soil erosion and parental rock. However, sit investigation of the effective soil depth requires a cost-effective field work. This research aimed to classify the effective soil depth by using multinomial logistic regression with the environmental factors. The Wen-Shui Watershed located at the central Taiwan was selected as the study areas. The analysis of multinomial logistic regression is performed by the assistance of a Geographic Information Systems (GIS). The effective soil depth was categorized into four levels including deeper, deep, shallow and shallower. The environmental factors of slope, aspect, digital elevation model (DEM), curvature and normalized difference vegetation index (NDVI) were selected for classifying the soil depth. An Error Matrix was then used to assess the model accuracy. The results showed an overall accuracy of 75%. At the end, a map of effective soil depth was produced to help planners and decision makers in determining the slopeland utilizable limitation in the study areas.
Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H
The purpose of this investigation was to compare empirically predictive ability of an artificial neural network with a logistic regression in prediction of low back pain. Data from the second national health survey were considered in this investigation. This data includes the information of low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 inputs, 3 hidden and 1 output neurons was employed. The efficiency of two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression was 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network was 0.754 (0.004), 0.3770 and 14757.6, respectively. Based on these three criteria, artificial neural network would give better performance than logistic regression. Although, the difference is statistically significant, it does not seem to be clinically significant.
Cizek, P.; Gentle, J.E.; Hardle, W.K.; Mori, Y.
We will study causal relationships of a known form between random variables. Given a model, we distinguish one or more dependent (endogenous) variables Y = (Y1,…,Yl), l ∈ N, which are explained by a model, and independent (exogenous, explanatory) variables X = (X1,…,Xp),p ∈ N, which explain or
Full Text Available The main goal of this work was to identify the areas that are most susceptible to desertification in a part of the Algerian steppe, and to quantitatively assess the key factors that contribute to this desertification. In total, 139 desertified zones were mapped using field surveys and photo-interpretation. We selected 16 spectral and geomorphic predictive factors, which a priori play a significant role in desertification. They were mainly derived from Landsat 8 imagery and Shuttle Radar Topographic Mission digital elevation model (SRTM DEM. Some factors, such as the topographic position index (TPI and curvature, were used for the first time in this kind of study. For this purpose, we adapted the logistic regression algorithm for desertification susceptibility mapping, which has been widely used for landslide susceptibility mapping. The logistic model was evaluated using the area under the receiver operating characteristic (ROC curve. The model accuracy was 87.8%. We estimated the model uncertainties using a bootstrap method. Our analysis suggests that the predictive model is robust and stable. Our results indicate that land cover factors, including normalized difference vegetation index (NDVI and rangeland classes, play a major role in determining desertification occurrence, while geomorphological factors have a limited impact. The predictive map shows that 44.57% of the area is classified as highly to very highly susceptible to desertification. The developed approach can be used to assess desertification in areas with similar characteristics and to guide possible actions to combat desertification.
Lin, Ji; Lyles, Robert H
We explore the 'reassessment' design in a logistic regression setting, where a second wave of sampling is applied to recover a portion of the missing data on a binary exposure and/or outcome variable. We construct a joint likelihood function based on the original model of interest and a model for the missing data mechanism, with emphasis on non-ignorable missingness. The estimation is carried out by numerical maximization of the joint likelihood function with close approximation of the accompanying Hessian matrix, using sharable programs that take advantage of general optimization routines in standard software. We show how likelihood ratio tests can be used for model selection and how they facilitate direct hypothesis testing for whether missingness is at random. Examples and simulations are presented to demonstrate the performance of the proposed method. Copyright © 2015 John Wiley & Sons, Ltd.
M. Fleischmann (Moritz)
markdownabstractEconomic, marketing, and legislative considerations are increasingly leading companies to take back and recover their products after use. From a logistics perspective, these initiatives give rise to new goods flows from the user back to the producer. The management of these goods
Zhang, Chuncheng; Yao, Li; Song, Sutao; Wen, Xiaotong; Zhao, Xiaojie; Long, Zhiying
Multivariate pattern analysis (MVPA) methods have been widely applied to functional magnetic resonance imaging (fMRI) data to decode brain states. Due to the "high features, low samples" in fMRI data, machine learning methods have been widely regularized using various regularizations to avoid overfitting. Both total variation (TV) using the gradients of images and Euler's elastica (EE) using the gradient and the curvature of images are the two popular regulations with spatial structures. In contrast to TV, EE regulation is able to overcome the disadvantage of TV regulation that favored piecewise constant images over piecewise smooth images. In this study, we introduced EE to fMRI-based decoding for the first time and proposed the EE regularized multinomial logistic regression (EELR) algorithm for multi-class classification. We performed experimental tests on both simulated and real fMRI data to investigate the feasibility and robustness of EELR. The performance of EELR was compared with sparse logistic regression (SLR) and TV regularized LR (TVLR). The results showed that EELR was more robustness to noises and showed significantly higher classification performance than TVLR and SLR. Moreover, the forward models and weights patterns revealed that EELR detected larger brain regions that were discriminative to each task and activated by each task than TVLR. The results suggest that EELR not only performs well in brain decoding but also reveals meaningful discriminative and activation patterns. This study demonstrated that EELR showed promising potential in brain decoding and discriminative/activation pattern detection.
Tse, Samson; Davidson, Larry; Chung, Ka-Fai; Yu, Chong Ho; Ng, King Lam; Tsoi, Emily
More mental health services are adopting the recovery paradigm. This study adds to prior research by (a) using measures of stages of recovery and elements of recovery that were designed and validated in a non-Western, Chinese culture and (b) testing which demographic factors predict advanced recovery and whether placing importance on certain elements predicts advanced recovery. We examined recovery and factors associated with recovery among 75 Hong Kong adults who were diagnosed with schizophrenia and assessed to be in clinical remission. Data were collected on socio-demographic factors, recovery stages and elements associated with recovery. Logistic regression analysis was used to identify variables that could best predict stages of recovery. Receiver operating characteristic curves were used to detect the classification accuracy of the model (i.e. rates of correct classification of stages of recovery). Logistic regression results indicated that stages of recovery could be distinguished with reasonable accuracy for Stage 3 ('living with disability', classification accuracy = 75.45%) and Stage 4 ('living beyond disability', classification accuracy = 75.50%). However, there was no sufficient information to predict Combined Stages 1 and 2 ('overwhelmed by disability' and 'struggling with disability'). It was found that having a meaningful role and age were the most important differentiators of recovery stage. Preliminary findings suggest that adopting salient life roles personally is important to recovery and that this component should be incorporated into mental health services. © The Author(s) 2014.
Hill, Benjamin David; Womble, Melissa N; Rohling, Martin L
This study utilized logistic regression to determine whether performance patterns on Concussion Vital Signs (CVS) could differentiate known groups with either genuine or feigned performance. For the embedded measure development group (n = 174), clinical patients and undergraduate students categorized as feigning obtained significantly lower scores on the overall test battery mean for the CVS, Shipley-2 composite score, and California Verbal Learning Test-Second Edition subtests than did genuinely performing individuals. The final full model of 3 predictor variables (Verbal Memory immediate hits, Verbal Memory immediate correct passes, and Stroop Test complex reaction time correct) was significant and correctly classified individuals in their known group 83% of the time (sensitivity = .65; specificity = .97) in a mixed sample of young-adult clinical cases and simulators. The CVS logistic regression function was applied to a separate undergraduate college group (n = 378) that was asked to perform genuinely and identified 5% as having possibly feigned performance indicating a low false-positive rate. The failure rate was 11% and 16% at baseline cognitive testing in samples of high school and college athletes, respectively. These findings have particular relevance given the increasing use of computerized test batteries for baseline cognitive testing and return-to-play decisions after concussion.
Full Text Available The objective of the present study is to find out the quantitative relationship between progression of liver fibrosis and the levels of certain serum markers using mathematic model. We provide the sparse logistic regression by using smoothly clipped absolute deviation (SCAD penalized function to diagnose the liver fibrosis in rats. Not only does it give a sparse solution with high accuracy, it also provides the users with the precise probabilities of classification with the class information. In the simulative case and the experiment case, the proposed method is comparable to the stepwise linear discriminant analysis (SLDA and the sparse logistic regression with least absolute shrinkage and selection operator (LASSO penalty, by using receiver operating characteristic (ROC with bayesian bootstrap estimating area under the curve (AUC diagnostic sensitivity for selected variable. Results show that the new approach provides a good correlation between the serum marker levels and the liver fibrosis induced by thioacetamide (TAA in rats. Meanwhile, this approach might also be used in predicting the development of liver cirrhosis.
Erol, Fatih S; Uysal, Hadi; Ergün, Uçman; Barişçi, Necaattin; Serhathoğlu, Selami; Hardalaç, Firat
In this study it is aimed to assess the posttraumatic cerebral hemodynamia in minor head injured patients. Eighty patients with minor head injury (Group 1) evaluated in the early 8 h of posttraumatic period between July 2003 and February 2004. The control group (Group 2) has composed of 32 healthy people. Bilateral blood flow velocities of middle cerebral arteries (MCA) had measured using transtemporal technique while internal carotid arteries were evaluated by submandibular examination. Two different mathematical models such as the traditional statistical method on the basis of logistic regression and a multi-layer perceptron (MLP) neural network are used to classify the age, sex, velocitiy parameters of MCA, mean velocity of extracranial ICAs and V(MCA)/ V(ICA) ratios. The neural network was trained, cross-validated and tested with subject's transcranial Doppler signals. As a result of these classifications, we found the success rate of logistic regression, the success rate of MLP neural network is 88.2 and 89.1%, respectively. The classification results show that MLP neural network is offering the best results in the case of diagnosis.
Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M
Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies. First to analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold, and second to fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.
Agricultural chemicals (herbicides, insecticides, other pesticides and fertilizers) in surface water may constitute a human health risk. Recent research on unregulated rivers in the midwestern USA documents that elevated concentrations of herbicides occur for 1-4 months following application in spring and early summer. In contrast, nitrate concentrations in unregulated rivers are elevated during the fall, winter and spring. Natural and anthropogenic variables of river drainage basins, such as soil permeability, the amount of agricultural chemicals applied or percentage of land planted in corn, affect agricultural chemical concentrations in rivers. Logistic regression (LGR) models are used to investigate relations between various drainage basin variables and the concentration of selected agricultural chemicals in rivers. The method is successful in contributing to the understanding of agricultural chemical concentration in rivers. Overall accuracies of the best LGR models, defined as the number of correct classifications divided by the number of attempted classifications, averaged about 66%.
Hayes, Andrew F; Matthes, Jörg
Researchers often hypothesize moderated effects, in which the effect of an independent variable on an outcome variable depends on the value of a moderator variable. Such an effect reveals itself statistically as an interaction between the independent and moderator variables in a model of the outcome variable. When an interaction is found, it is important to probe the interaction, for theories and hypotheses often predict not just interaction but a specific pattern of effects of the focal independent variable as a function of the moderator. This article describes the familiar pick-a-point approach and the much less familiar Johnson-Neyman technique for probing interactions in linear models and introduces macros for SPSS and SAS to simplify the computations and facilitate the probing of interactions in ordinary least squares and logistic regression. A script version of the SPSS macro is also available for users who prefer a point-and-click user interface rather than command syntax.
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p logistic regression model for the classification of risk groups for PTB.
Land Walker H
Full Text Available Abstract Background When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL techniques with kernels are capable of addressing nonlinear problems without making parametric assumptions. However, these techniques do not produce findings relevant for epidemiologic interpretations. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with logistic regression (LR modeling representing a parametric approach. The SL technique was comprised of a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison. Results The SL approach is capable of generating odds ratios for main effects and risk factor interactions that better capture nonlinear relationships between exposure variables and outcome in comparison with LR. Conclusions The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.
Heine, John J; Land, Walker H; Egan, Kathleen M
When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL) techniques with kernels are capable of addressing nonlinear problems without making parametric assumptions. However, these techniques do not produce findings relevant for epidemiologic interpretations. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with logistic regression (LR) modeling representing a parametric approach. The SL technique was comprised of a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison. The SL approach is capable of generating odds ratios for main effects and risk factor interactions that better capture nonlinear relationships between exposure variables and outcome in comparison with LR. The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.
K. Sikorska (Karolina); E.M.E.H. Lesaffre (Emmanuel); P.J.F. Groenen (Patrick); P.H.C. Eilers (Paul)
textabstractBackground: Genome-wide association studies have become very popular in identifying genetic contributions to phenotypes. Millions of SNPs are being tested for their association with diseases and traits using linear or logistic regression models. This conceptually simple strategy
Carolyn B. Meyer; Sherri L. Miller; C. John Ralph
The scale at which habitat variables are measured affects the accuracy of resource selection functions in predicting animal use of sites. We used logistic regression models for a wide-ranging species, the marbled murrelet, (Brachyramphus marmoratus) in a large region in California to address how much changing the spatial or temporal scale of...
Shayan, Zahra; Mohammad Gholi Mezerji, Naser; Shayan, Leila; Naseri, Parisa
Logistic regression (LR) and linear discriminant analysis (LDA) are two popular statistical models for prediction of group membership. Although they are very similar, the LDA makes more assumptions about the data. When categorical and continuous variables used simultaneously, the optimal choice between the two models is questionable. In most studies, classification error (CE) is used to discriminate between subjects in several groups, but this index is not suitable to predict the accuracy of the outcome. The present study compared LR and LDA models using classification indices. This cross-sectional study selected 243 cancer patients. Sample sets of different sizes (n = 50, 100, 150, 200, 220) were randomly selected and the CE, B, and Q classification indices were calculated by the LR and LDA models. CE revealed the a lack of superiority for one model over the other, but the results showed that LR performed better than LDA for the B and Q indices in all situations. No significant effect for sample size on CE was noted for selection of an optimal model. Assessment of the accuracy of prediction of real data indicated that the B and Q indices are appropriate for selection of an optimal model. The results of this study showed that LR performs better in some cases and LDA in others when based on CE. The CE index is not appropriate for classification, although the B and Q indices performed better and offered more efficient criteria for comparison and discrimination between groups.
Koop, Gerrit; Collar, Carol A.; Toft, Nils
are imperfect tests, particularly lacking sensitivity, which leads to misclassification and thus to biased estimates of odds ratios in risk factor studies. The objective of this study was to evaluate risk factors for the true (latent) IMI status of major pathogens in dairy goats. We used Bayesian logistic......, caprine arthritis encephalitis-virus infection status, and kidding season), and uncontrollable risk factors (parity, lactation stage, milk yield, pregnancy status, and breed) were measured in the Dutch study, the Californian study or in both studies. Bayesian logistic regression models were constructed...... in which the true (but latent) infection status was linked to the joint test results, as functions of test sensitivity and specificity. The latent IMI status was the dependent variable in the logistic regression model with risk factors as independent variables and with random herd and goat effects...
One of the most widely used tools in statistical forecasting, single equation regression models is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the auto correlation patterns of regression disturbance. It also includes six case studies.
Jespersen, Per Homann; Drewes, Lise
This paper describes how the freight transport sector is influenced by logistical principles of production and distribution. It introduces new ways of understanding freight transport as an integrated part of the changing trends of mobility. By introducing a conceptual model for understanding...... the interaction between logistics and transport, it points at ways to over-come inherent methodological difficulties when studying this relation...
Bendall, Jason C; Simpson, Paul M; Middleton, Paul M
To determine whether vital signs in patients suffering from acute pain in the out-of-hospital setting have any association with pain severity measured using an ordinal pain scale. We conducted a retrospective analysis of over 53 000 adult patients aged between 16 and 100 years, who presented to paramedics complaining of acute pain between 1 January 2004 and 30 November 2006. Simple correlation (Spearman's) and ordinal logistic regression techniques were used to create a proportional odds model to explore the relationship between patient-reported pain score and initial vital signs including respiratory rate, pulse rate and blood pressure. There was a weak but significant correlation between respiratory rate and initial pain score (R=0.15, Plogistic regression. In adults, a respiratory rate of 25 breaths/min or more was the most important predictor of having more severe pain. Tachycardia and systolic hypertension may also be important in younger and older patients, respectively. Simple correlation fails to show clinically important associations between prehospital vital signs and pain severity.
Tate, Richard L.
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are…
Steinbuch, Luc; Brus, Dick J.; Heuvelink, Gerard B.M.
One of the first soil forming processes in marine and fluviatile clay soils is ripening, the irreversible change of physical and chemical soil properties, especially consistency, under influence of air. We used Bayesian binomial logistic regression (BBLR) to update the map showing unripened
Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard
The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
Valenta, Zdeněk; Pitha, J.; Poledne, R.
Roč. 25, č. 24 (2006), s. 4227-4234 ISSN 0277-6715 R&D Projects: GA MZd NA7512 Institutional research plan: CEZ:AV0Z10300504 Keywords : proportional odds logistic regression * dichotomized outcomes * uncertainty Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 1.737, year: 2006
Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul
We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…
Courtney, Jon R.; Prophet, Retta
Placement instability is often associated with a number of negative outcomes for children. To gain state level contextual knowledge of factors associated with placement stability/instability, logistic regression was applied to selected variables from the New Mexico Adoption and Foster Care Administrative Reporting System dataset. Predictors…
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
Atar, Burcu; Kamata, Akihito
The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…
In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…
Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
Scott, Neil W; Fayers, Peter M; Aaronson, Neil K; Bottomley, Andrew; de Graeff, Alexander; Groenvold, Mogens; Gundy, Chad; Koller, Michael; Petersen, Morten A; Sprangers, Mirjam A G
Differential item functioning (DIF) methods can be used to determine whether different subgroups respond differently to particular items within a health-related quality of life (HRQoL) subscale, after allowing for overall subgroup differences in that scale. This article reviews issues that arise when testing for DIF in HRQoL instruments. We focus on logistic regression methods, which are often used because of their efficiency, simplicity and ease of application. A review of logistic regression DIF analyses in HRQoL was undertaken. Methodological articles from other fields and using other DIF methods were also included if considered relevant. There are many competing approaches for the conduct of DIF analyses and many criteria for determining what constitutes significant DIF. DIF in short scales, as commonly found in HRQL instruments, may be more difficult to interpret. Qualitative methods may aid interpretation of such DIF analyses. A number of methodological choices must be made when applying logistic regression for DIF analyses, and many of these affect the results. We provide recommendations based on reviewing the current evidence. Although the focus is on logistic regression, many of our results should be applicable to DIF analyses in general. There is a need for more empirical and theoretical work in this area.
Fan, Xitao; Wang, Lin
The Monte Carlo study compared the performance of predictive discriminant analysis (PDA) and that of logistic regression (LR) for the two-group classification problem. Prior probabilities were used for classification, but the cost of misclassification was assumed to be equal. The study used a fully crossed three-factor experimental design (with…
Pedersen, Bjørn Panella; Ifrim, Georgiana; Liboriussen, Poul
Abstract Background Structured Logistic Regression (SLR) is a newly developed machine learning tool first proposed in the context of text categorization. Current availability of extensive protein sequence databases calls for an automated method to reliably classify sequences and SLR seems well...
.... A generalized estimation equations (GEE) logistic regression model was used for the modeling. A shared trait is defined for two discrete traits based upon explicit patterns of trait concordance and discordance within twin pairs...
Kesselmeier, Miriam; Lorenzo Bermejo, Justo
Logistic regression is the most common technique used for genetic case-control association studies. A disadvantage of standard maximum likelihood estimators of the genotype relative risk (GRR) is their strong dependence on outlier subjects, for example, patients diagnosed at unusually young age. Robust methods are available to constrain outlier influence, but they are scarcely used in genetic studies. This article provides a non-intimidating introduction to robust logistic regression, and investigates its benefits and limitations in genetic association studies. We applied the bounded Huber and extended the R package 'robustbase' with the re-descending Hampel functions to down-weight outlier influence. Computer simulations were carried out to assess the type I error rate, mean squared error (MSE) and statistical power according to major characteristics of the genetic study and investigated markers. Simulations were complemented with the analysis of real data. Both standard and robust estimation controlled type I error rates. Standard logistic regression showed the highest power but standard GRR estimates also showed the largest bias and MSE, in particular for associated rare and recessive variants. For illustration, a recessive variant with a true GRR=6.32 and a minor allele frequency=0.05 investigated in a 1000 case/1000 control study by standard logistic regression resulted in power=0.60 and MSE=16.5. The corresponding figures for Huber-based estimation were power=0.51 and MSE=0.53. Overall, Hampel- and Huber-based GRR estimates did not differ much. Robust logistic regression may represent a valuable alternative to standard maximum likelihood estimation when the focus lies on risk prediction rather than identification of susceptibility variants. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: email@example.com.
Kaengthong, Nattacha; Domthong, Uthumporn
This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).
Chen, Chau-Kuang; Bruce, Michelle; Tyler, Lauren; Brown, Claudine; Garrett, Angelica; Goggins, Susan; Lewis-Polite, Brandy; Weriwoh, Mirabel L; Juarez, Paul D; Hood, Darryl B; Skelton, Tyler
The goal of this study was to analyze a 54-item instrument for assessment of perception of exposure to environmental contaminants within the context of the built environment, or exposome. This exposome was defined in five domains to include 1) home and hobby, 2) school, 3) community, 4) occupation, and 5) exposure history. Interviews were conducted with child-bearing-age minority women at Metro Nashville General Hospital at Meharry Medical College. Data were analyzed utilizing DTReg software for Support Vector Machine (SVM) modeling followed by an SPSS package for a logistic regression model. The target (outcome) variable of interest was respondent's residence by ZIP code. The results demonstrate that the rank order of important variables with respect to SVM modeling versus traditional logistic regression models is almost identical. This is the first study documenting that SVM analysis has discriminate power for determination of higher-ordered spatial relationships on an environmental exposure history questionnaire.
Bhowmik, K.R.; Islam, S.
Logistic regression (LR) analysis is the most common statistical methodology to find out the determinants of childhood mortality. However, the significant predictors cannot be ranked according to their influence on the response variable. Multiple classification (MC) analysis can be applied to identify the significant predictors with a priority index which helps to rank the predictors. The main objective of the study is to find the socio-demographic determinants of childhood mortality at neonatal, post-neonatal, and post-infant period by fitting LR model as well as to rank those through MC analysis. The study is conducted using the data of Bangladesh Demographic and Health Survey 2007 where birth and death information of children were collected from their mothers. Three dichotomous response variables are constructed from children age at death to fit the LR and MC models. Socio-economic and demographic variables significantly associated with the response variables separately are considered in LR and MC analyses. Both the LR and MC models identified the same significant predictors for specific childhood mortality. For both the neonatal and child mortality, biological factors of children, regional settings, and parents socio-economic status are found as 1st, 2nd, and 3rd significant groups of predictors respectively. Mother education and household environment are detected as major significant predictors of post-neonatal mortality. This study shows that MC analysis with or without LR analysis can be applied to detect determinants with rank which help the policy makers taking initiatives on a priority basis. (author)
Cawley, Gavin C; Talbot, Nicola L C
Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffrey's prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar, however the BlogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also demonstrates better estimates of conditional probability than
Singh, Abhyudai; Hespanha, Joao P
..., which we refer to as the moment closure function. In this paper, a systematic procedure for constructing moment closure functions of arbitrary order is presented for the stochastic logistic model...
TROXLER, STEVEN; LALONDE, TRENT; WILSON, JEFFREY R.
The use of logistic models for independent binary data has relied first on asymptotic theory and later on exact distributions for small samples. However, the use of logistic models for dependent analysis based on exact analysis is not as common. Moreover attention is usually given to one-stage clustering. In this paper we extend the exact techniques to address hypothesis testing (estimation is not addressed) for data with second-stage and probably higher levels of clustering. The methods are ...
Almquist, Zack W; Butts, Carter T
Change in group size and composition has long been an important area of research in the social sciences. Similarly, interest in interaction dynamics has a long history in sociology and social psychology. However, the effects of endogenous group change on interaction dynamics are a surprisingly understudied area. One way to explore these relationships is through social network models. Network dynamics may be viewed as a process of change in the edge structure of a network, in the vertex set on which edges are defined, or in both simultaneously. Although early studies of such processes were primarily descriptive, recent work on this topic has increasingly turned to formal statistical models. Although showing great promise, many of these modern dynamic models are computationally intensive and scale very poorly in the size of the network under study and/or the number of time points considered. Likewise, currently used models focus on edge dynamics, with little support for endogenously changing vertex sets. Here, the authors show how an existing approach based on logistic network regression can be extended to serve as a highly scalable framework for modeling large networks with dynamic vertex sets. The authors place this approach within a general dynamic exponential family (exponential-family random graph modeling) context, clarifying the assumptions underlying the framework (and providing a clear path for extensions), and they show how model assessment methods for cross-sectional networks can be extended to the dynamic case. Finally, the authors illustrate this approach on a classic data set involving interactions among windsurfers on a California beach.
Fernández, Leónides; Mediano, Pilar; García, Ricardo; Rodríguez, Juan M; Marín, María
Objectives Lactational mastitis frequently leads to a premature abandonment of breastfeeding; its development has been associated with several risk factors. This study aims to use a decision tree (DT) approach to establish the main risk factors involved in mastitis and to compare its performance for predicting this condition with a stepwise logistic regression (LR) model. Methods Data from 368 cases (breastfeeding women with mastitis) and 148 controls were collected by a questionnaire about risk factors related to medical history of mother and infant, pregnancy, delivery, postpartum, and breastfeeding practices. The performance of the DT and LR analyses was compared using the area under the receiver operating characteristic (ROC) curve. Sensitivity, specificity and accuracy of both models were calculated. Results Cracked nipples, antibiotics and antifungal drugs during breastfeeding, infant age, breast pumps, familial history of mastitis and throat infection were significant risk factors associated with mastitis in both analyses. Bottle-feeding and milk supply were related to mastitis for certain subgroups in the DT model. The areas under the ROC curves were similar for LR and DT models (0.870 and 0.835, respectively). The LR model had better classification accuracy and sensitivity than the DT model, but the last one presented better specificity at the optimal threshold of each curve. Conclusions The DT and LR models constitute useful and complementary analytical tools to assess the risk of lactational infectious mastitis. The DT approach identifies high-risk subpopulations that need specific mastitis prevention programs and, therefore, it could be used to make the most of public health resources.
Zumbo, Bruno D.; Ochieng, Charles O.
Many measures found in educational research are ordered categorical response variables that are empirical realizations of an underlying normally distributed variate. These ordered categorical variables are commonly referred to as Likert or rating scale data. Regression models are commonly fit using these ordered categorical variables as the…
Baghaei, Purya; Kubinger, Klaus D.
The present paper gives a general introduction to the linear logistic test model (Fischer, 1973), an extension of the Rasch model with linear constraints on item parameters, along with eRm (an R package to estimate different types of Rasch models; Mair, Hatzinger, & Mair, 2014) functions to estimate the model and interpret its parameters. The…
Full Text Available The exact calculation of logistics costs has become a real challenge in logistics and supply chain management. It is essential to gain reliable and accurate costing information to attain efficient resource allocation within the logistics service provider companies. Traditional costing approaches, however, may not be sufficient to reach this aim in case of complex and heterogeneous logistics service structures. So this paper intends to explore the ways of improving the cost calculation regimes of logistics service providers and show how to adopt the multi-level full cost allocation technique in logistics practice. After determining the methodological framework, a sample cost calculation scheme is developed and tested by using estimated input data. Based on the theoretical findings and the experiences of the pilot project it can be concluded that the improved costing model contributes to making logistics costing more accurate and transparent. Moreover, the relations between costs and performances also become more visible, which enhances the effectiveness of logistics planning and controlling significantly
Yasuda, Hiroshi; Yoshida, Kazuya; Segawa, Mitsuru; Tokuda, Ryoichi; Tsutsui, Toyoharu; Yasuda, Yuichi; Magara, Shunichi
The objective of this metallomics study is to investigate comprehensively some relationships between cancer risk and minerals, including essential and toxic metals. Twenty-four minerals including essential and toxic metals in scalp hair samples from 124 solid-cancer patients and 86 control subjects were measured with inductively coupled plasma mass spectrometry (ICP-MS), and the association of cancer with minerals was statistically analyzed with multiple logistic regression analysis. Multiple logistic regression analysis demonstrated that several minerals are significantly correlated to cancer, positively or inversely. The most cancer-correlated mineral was iodine (I) with the highest correlation coefficient of r = 0.301, followed by arsenic (As; r = 0.267), zinc (Zn; r = 0.261), and sodium (Na; r = 0.190), with p r = -0.161, p r = -0.128). Multiple linear regression value was highly significantly correlated with probability of cancer (R (2) = 0.437, p logistic regression analysis is a useful tool for estimating cancer risk.
Lewis, Kristin Nicole; Heckman, Bernadette Davantes; Himawan, Lina
Growth mixture modeling (GMM) identified latent groups based on treatment outcome trajectories of headache disability measures in patients in headache subspecialty treatment clinics. Using a longitudinal design, 219 patients in headache subspecialty clinics in 4 large cities throughout Ohio provided data on their headache disability at pretreatment and 3 follow-up assessments. GMM identified 3 treatment outcome trajectory groups: (1) patients who initiated treatment with elevated disability levels and who reported statistically significant reductions in headache disability (high-disability improvers; 11%); (2) patients who initiated treatment with elevated disability but who reported no reductions in disability (high-disability nonimprovers; 34%); and (3) patients who initiated treatment with moderate disability and who reported statistically significant reductions in headache disability (moderate-disability improvers; 55%). Based on the final multinomial logistic regression model, a dichotomized treatment appointment attendance variable was a statistically significant predictor for differentiating high-disability improvers from high-disability nonimprovers. Three-fourths of patients who initiated treatment with elevated disability levels did not report reductions in disability after 5 months of treatment with new preventive pharmacotherapies. Preventive headache agents may be most efficacious for patients with moderate levels of disability and for patients with high disability levels who attend all treatment appointments. Copyright © 2011 International Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
Balluerka, Nekane; Gorostiaga, Arantxa; Gómez-Benito, Juana; Hidalgo, María Dolores
Given that a key function of tests is to serve as evaluation instruments and for decision making in the fields of psychology and education, the possibility that some of their items may show differential behaviour is a major concern for psychometricians. In recent decades, important progress has been made as regards the efficacy of techniques designed to detect this differential item functioning (DIF). However, the findings are scant when it comes to explaining its causes. The present study addresses this problem from the perspective of multilevel analysis. Starting from a case study in the area of transcultural comparisons, multilevel logistic regression is used: 1) to identify the item characteristics associated with the presence of DIF; 2) to estimate the proportion of variation in the DIF coefficients that is explained by these characteristics; and 3) to evaluate alternative explanations of the DIF by comparing the explanatory power or fit of different sequential models. The comparison of these models confirmed one of the two alternatives (familiarity with the stimulus) and rejected the other (the topic area) as being a cause of differential functioning with respect to the compared groups.
Huang, Mian; Li, Runze; Wang, Shaoli
Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We study the identifiability issue of the proposed models, and develop an estimation procedure by employing kernel regression. We further systematically study the sampling properties of the proposed estimators, and establish their asymptotic normality. A modified EM algorithm is proposed to carry out the estimation procedure. We show that our algorithm preserves the ascent property of the EM algorithm in an asymptotic sense. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedure. An empirical analysis of the US house price index data is illustrated for the proposed methodology.
McLaren, Christine E; Chen, Wen-Pin; Nie, Ke; Su, Min-Ying
Dynamic contrast-enhanced magnetic resonance imaging is a clinical imaging modality for the detection and diagnosis of breast lesions. Analytic methods were compared for diagnostic feature selection and the performance of lesion classification to differentiate between malignant and benign lesions in patients. The study included 43 malignant and 28 benign histologically proved lesions. Eight morphologic parameters, 10 gray-level co-occurrence matrix texture features, and 14 Laws texture features were obtained using automated lesion segmentation and quantitative feature extraction. Artificial neural network (ANN) and logistic regression analysis were compared for the selection of the best predictors of malignant lesions among the normalized features. Using ANN, the final four selected features were compactness, energy, homogeneity, and Law_LS, with an area under the receiver-operating characteristic curve (AUC) of 0.82 and accuracy of 0.76. The diagnostic performance of these four features computed on the basis of logistic regression yielded an AUC of 0.80 (95% confidence interval [CI], 0.688-0.905), similar to that of ANN. The analysis also showed that the odds of a malignant lesion decreased by 48% (95% CI, 25%-92%) for every increase of 1 standard deviation in the Law_LS feature, adjusted for differences in compactness, energy, and homogeneity. Using logistic regression with z-score transformation, a model composed of compactness, normalized radial length entropy, and gray-level sum average was selected, and it had the highest overall accuracy, 0.75, among all models, with an AUC of 0.77 (95% CI, 0.660-0.880). When logistic modeling of transformations using the Box-Cox method was performed, the most parsimonious model with predictors compactness and Law_LS had an AUC of 0.79 (95% CI, 0.672-0.898). The diagnostic performance of models selected by ANN and logistic regression was similar. The analytic methods were found to be roughly equivalent in terms of predictive
S. G. Grigoryev
Full Text Available Diagnostics, equally with prevention and treatment, is a basis of medical science and practice. For its history the medicine has accumulated a great variety of diagnostic methods for different diseases and pathologic conditions. Nevertheless, new tests, methods and tools are being developed and recommended to application nowadays. Such indicators as sensitivity and specificity which are defined on the basis of fourfold contingency tables construction or ROC-analysis method with ROC – curve modelling (Receiver operating characteristic are used as the methods to estimate the diagnostic capability. Fourfold table is used with the purpose to estimate the method which confirms or denies the diagnosis, i.e. a quality indicator. ROC-curve, being a graph, allows making the estimation of model quality by subdivision of two classes on the basis of identifying the point of cutting off a continuous or discrete quantitative attribute.The method of logistic regression technique is introduced as a tool to develop some mathematical-statistical forecasting model of probability of the event the researcher is interested in if there are two possible variants of the outcome. The method of ROC-analysis is chosen and described in detail as a tool to estimate the model quality. The capabilities of the named methods are demonstrated by a real example of creation and efficiency estimation (sensitivity and specificity of a forecasting model of probability of complication development in the form of pyodermatitis in children with atopic dermatitis.
Zeng, Cheng; Yu, Lei; Chen, Yu; Bian, Hong-Qiang; Zheng, Kai; Ye, Guo-Gang
To investigate the high-risk factors for neonatal incarcerated hernia with intestinal necrosis by logistic regression analysis. Retrospective analysis was performed on the clinical data of 131 neonates with incarcerated oblique inguinal hernia containing the intestine. Of the 131 cases, 14 suffered from intestinal necrosis. The high risk factors for neonatal incarcerated hernia with intestinal necrosis were determined by logistic regression analysis. Manual reduction after incarceration (>2 times) (χ2 = 69.289, P2 times) (χ2 = 84.731, Pneonatal incarcerated hernia with intestinal necrosis. Intestinal necrosis tends to occur in neonates with incarcerated hernia who have incarceration or received manual reduction more than twice and suffer from mesentery incarceration. Manual reduction is prohibited for these cases, which should be surgically treated immediately.
Roč. 17, č. 4 (2015), s. 963-972 ISSN 1387-5841 Institutional support: RVO:67985556 Keywords : Reliability analysis * Repair models * Regression Subject RIV: BB - Applied Statistics , Operational Research Impact factor: 0.782, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/novak-0450902.pdf
Staley, Dennis M.; Negri, Jacquelyn A.; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.
Wildfire can significantly alter the hydrologic response of a watershed to the extent that even modest rainstorms can generate dangerous flash floods and debris flows. To reduce public exposure to hazard, the U.S. Geological Survey produces post-fire debris-flow hazard assessments for select fires in the western United States. We use publicly available geospatial data describing basin morphology, burn severity, soil properties, and rainfall characteristics to estimate the statistical likelihood that debris flows will occur in response to a storm of a given rainfall intensity. Using an empirical database and refined geospatial analysis methods, we defined new equations for the prediction of debris-flow likelihood using logistic regression methods. We showed that the new logistic regression model outperformed previous models used to predict debris-flow likelihood.
Scott, Neil W; Fayers, Peter M; Aaronson, Neil K
Differential item functioning (DIF) methods can be used to determine whether different subgroups respond differently to particular items within a health-related quality of life (HRQoL) subscale, after allowing for overall subgroup differences in that scale. This article reviews issues that arise...... when testing for DIF in HRQoL instruments. We focus on logistic regression methods, which are often used because of their efficiency, simplicity and ease of application....
Scott, Neil W.; Fayers, Peter M.; Aaronson, Neil K.
Differential item functioning (DIF) methods can be used to determine whether different subgroups respond differently to particular items within a health-related quality of life (HRQoL) subscale, after allowing for overall subgroup differences in that scale. This article reviews issues that arise ...... when testing for DIF in HRQoL instruments. We focus on logistic regression methods, which are often used because of their efficiency, simplicity and ease of application....
Alfonso L. Palmer
Full Text Available Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls whose mean age was 15.59 years. Logistic regression and decision trees were carried out in order to model the consumption of cannabis and cocaine. The results show the use of legal substances and committing fraudulence or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraudulence or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraudulence or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.
Full Text Available The relationship between financial literacy and financial behavior is important, as individuals are increasingly being asked to take responsibility for their financial wellbeing, especially their retirement. Analyzing of individual savings and attitudes towards retirement planning is important, as these types of investments are a way of preserving security during years of financial vulnerability. Research indicates that individuals who do not save adequately for their retirement, generally have a relatively low level of financial literacy. This research investigates the relationship between financial literacy and retirement planning in Croatia. To analyze the relationship between financial literacy and planning for retirement, maximum likelihood logistic regression analysis was used. The paper shows that those who answer financial literacy questions correctly are more likely to have a positive attitude towards retirement planning and are more likely to save for retirement, ensuring them of higher levels of financial security in retirement. The Goodness-of-Fit evaluation for the estimated logit model was performed using the Andrews and Hosmer-Lemeshow Tests.
Algamal, Zakariya Yahya; Lee, Muhammad Hisyam
Cancer classification and gene selection in high-dimensional data have been popular research topics in genetics and molecular biology. Recently, adaptive regularized logistic regression using the elastic net regularization, which is called the adaptive elastic net, has been successfully applied in high-dimensional cancer classification to tackle both estimating the gene coefficients and performing gene selection simultaneously. The adaptive elastic net originally used elastic net estimates as the initial weight, however, using this weight may not be preferable for certain reasons: First, the elastic net estimator is biased in selecting genes. Second, it does not perform well when the pairwise correlations between variables are not high. Adjusted adaptive regularized logistic regression (AAElastic) is proposed to address these issues and encourage grouping effects simultaneously. The real data results indicate that AAElastic is significantly consistent in selecting genes compared to the other three competitor regularization methods. Additionally, the classification performance of AAElastic is comparable to the adaptive elastic net and better than other regularization methods. Thus, we can conclude that AAElastic is a reliable adaptive regularized logistic regression method in the field of high-dimensional cancer classification. Copyright © 2015 Elsevier Ltd. All rights reserved.
Full text: Objective: To assess the diagnostic value of CEA CA199 and CA50 for colorectal neoplasm by logistic regression and ROC curve. Methods: The subjects include 75 patients of colorectal cancer, 35 patients of benign intestinal disease and 49 health controls. CEA CA199 and CA50 are measured by CLIA ECLIA and IRMA respectively. The area under the curve (AUC) of CEA CA 199 CA50 and logistic regression results are compared. [Result] In the cancer-benign group, the AUC of CA50 is larger than the AUC of CA199 Compared with the AUC of combination of CEA CA199 and CA50 (0.604),the AUC of combination of CEA and CA50 (0.875) is larger and it is also larger than any other AUC of CEA CA199 or CA50 alone. In the cancerhealth group, the AUC of combination of CEA CA199 and CA50 is larger than any other AUC of CEA CA199 or CA50 alone. No matter in the cancer-benign group or cancerhealth group. The AUC of CEA is larger than the AUC of CA199 or CA50. Conclusion: CEA is useful in the diagnosis of colorectal cancer. In the process of differential diagnosis, the combination of CEA and CA50 can give more information, while the combination of three tumor markers does not perform well. Furthermore, as a statistical method, logistic regression can improve the diagnostic sensitivity and specificity. (author)
PARAMETRIC AND NON PARAMETRIC (MARS: MULTIVARIATE ADDITIVE REGRESSION SPLINES) LOGISTIC REGRESSIONS FOR PREDICTION OF A DICHOTOMOUS RESPONSE VARIABLE WITH AN EXAMPLE FOR PRESENCE/ABSENCE OF AMPHIBIANS
The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Ardoino, Ilaria; Lanzoni, Monica; Marano, Giuseppe; Boracchi, Patrizia; Sagrini, Elisabetta; Gianstefani, Alice; Piscaglia, Fabio; Biganzoli, Elia M
The interpretation of regression models results can often benefit from the generation of nomograms, 'user friendly' graphical devices especially useful for assisting the decision-making processes. However, in the case of multinomial regression models, whenever categorical responses with more than two classes are involved, nomograms cannot be drawn in the conventional way. Such a difficulty in managing and interpreting the outcome could often result in a limitation of the use of multinomial regression in decision-making support. In the present paper, we illustrate the derivation of a non-conventional nomogram for multinomial regression models, intended to overcome this issue. Although it may appear less straightforward at first sight, the proposed methodology allows an easy interpretation of the results of multinomial regression models and makes them more accessible for clinicians and general practitioners too. Development of prediction model based on multinomial logistic regression and of the pertinent graphical tool is illustrated by means of an example involving the prediction of the extent of liver fibrosis in hepatitis C patients by routinely available markers.
Ekner, Line Elvstrøm; Nejstgaard, Emil
We propose a new and simple parametrization of the so-called speed of transition parameter of the logistic smooth transition autoregressive (LSTAR) model. The new parametrization highlights that a consequence of the well-known identification problem of the speed of transition parameter is that th......We propose a new and simple parametrization of the so-called speed of transition parameter of the logistic smooth transition autoregressive (LSTAR) model. The new parametrization highlights that a consequence of the well-known identification problem of the speed of transition parameter...
Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M
Abstract Background Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 childre...
Full Text Available Background and Objective: In the analysis of dichotomous type response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs with very wide 95% confidence interval (CI (OR: >999.999, 95% CI: 999.999. In this paper, we addressed this issue by using penalized logistic regression (PLR method. Materials and Methods: Data from case-control study on hyponatremia and hiccups conducted in Christian Medical College, Vellore, Tamil Nadu, India was used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. Simulation dataset was created with different sample sizes and with a different number of covariates. Results: A total of 23 cases and 50 controls were used for the analysis of ordinary and PLR methods. The main exposure variable hyponatremia was present in nine (39.13% of the cases and in four (8.0% of the controls. Of the 23 hiccup cases, all were males and among the controls, 46 (92.0% were males. Thus, the complete separation between gender and the disease group led into an infinite OR with 95% CI (OR: >999.999, 95% CI: 999.999 whereas there was a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48 using PLR. After adjusting for all the confounding variables, hyponatremia entailed 7.9 (95% CI: 2.06, 38.86 times higher risk for the development of hiccups as was found using PLR whereas there was an overestimation of risk OR: 10.76 (95% CI: 2.17, 53.41 using the conventional method. Simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates. Conclusions: PLR is almost equal to the ordinary logistic regression when the sample size is large and is superior in small cell
The flow of trade is not equal to transport flows, mainly due to the fact that warehouses and distribution facilities are used as intermediary stops on the way from production locations to the points of consumption or further rework of goods. This thesis proposes a logistics chain model, which
Cade, Brian S.; Noon, Barry R.; Scherer, Rick D.; Keane, John J.
Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical conditional distribution of a bounded discrete random variable. The logistic quantile regression model requires that counts are randomly jittered to a continuous random variable, logit transformed to bound them between specified lower and upper values, then estimated in conventional linear quantile regression, repeating the 3 steps and averaging estimates. Back-transformation to the original discrete scale relies on the fact that quantiles are equivariant to monotonic transformations. We demonstrate this statistical procedure by modeling 20 years of California Spotted Owl fledgling production (0−3 per territory) on the Lassen National Forest, California, USA, as related to climate, demographic, and landscape habitat characteristics at territories. Spotted Owl fledgling counts increased nonlinearly with decreasing precipitation in the early nesting period, in the winter prior to nesting, and in the prior growing season; with increasing minimum temperatures in the early nesting period; with adult compared to subadult parents; when there was no fledgling production in the prior year; and when percentage of the landscape surrounding nesting sites (202 ha) with trees ≥25 m height increased. Changes in production were primarily driven by changes in the proportion of territories with 2 or 3 fledglings. Average variances of the discrete cumulative distributions of the estimated fledgling counts indicated that temporal changes in climate and parent age class explained 18% of the annual variance in owl fledgling production, which was 34% of the total variance. Prior fledgling production explained as much of
Liu, Yang; Sun, Wei; Ma, Xiaojun; Hao, Yuedong; Liu, Gang; Hu, Xiaohui; Shang, Houlai; Wu, Pengfei; Zhao, Zexue; Liu, Weidong
Osteosarcoma (OS) is the most common histological type of primary bone cancer. The present study was designed to identify the key genes and signaling pathways involved in the metastasis of OS. Microarray data of GSE39055 were downloaded from the Gene Expression Omnibus database, which included 19 OS biopsy specimens before metastasis (control group) and 18 OS biopsy specimens after metastasis (case group). After the differentially expressed genes (DEGs) were identified using the Linear Models for Microarray Analysis package, hierarchical clustering analysis and unsupervised clustering analysis were performed separately, using orange software and the self-organization map method. Based upon the Database for Annotation, Visualization and Integrated Discovery tool and Cytoscape software, enrichment analysis and protein-protein interaction (PPI) network analysis were conducted, respectively. After function deviation scores were calculated for the significantly enriched terms, hierarchical clustering analysis was performed using Cluster 3.0 software. Furthermore, logistic regression analysis was used to identify the terms that were significantly different. Those terms that were significantly different were validated using other independent datasets. There were 840 DEGs in the case group. There were various interactions in the PPI network [including intercellular adhesion molecule-1 (ICAM1), transforming growth factor β1 (TGFB1), TGFB1-platelet-derived growth factor subunit B (PDGFB) and PDGFB-platelet‑derived growth factor receptor-β (PDGFRB)]. Regulation of cell migration, nucleotide excision repair, the Wnt signaling pathway and cell migration were identified as the terms that were significantly different. ICAM1, PDGFB, PDGFRB and TGFB1 were identified to be enriched in cell migration and regulation of cell migration. Nucleotide excision repair and the Wnt signaling pathway were the metastasis-associated pathways of OS. In addition, ICAM1, PDGFB, PDGFRB
Bjørn P Pedersen
Full Text Available BACKGROUND: Structured Logistic Regression (SLR is a newly developed machine learning tool first proposed in the context of text categorization. Current availability of extensive protein sequence databases calls for an automated method to reliably classify sequences and SLR seems well-suited for this task. The classification of P-type ATPases, a large family of ATP-driven membrane pumps transporting essential cations, was selected as a test-case that would generate important biological information as well as provide a proof-of-concept for the application of SLR to a large scale bioinformatics problem. RESULTS: Using SLR, we have built classifiers to identify and automatically categorize P-type ATPases into one of 11 pre-defined classes. The SLR-classifiers are compared to a Hidden Markov Model approach and shown to be highly accurate and scalable. Representing the bulk of currently known sequences, we analysed 9.3 million sequences in the UniProtKB and attempted to classify a large number of P-type ATPases. To examine the distribution of pumps on organisms, we also applied SLR to 1,123 complete genomes from the Entrez genome database. Finally, we analysed the predicted membrane topology of the identified P-type ATPases. CONCLUSIONS: Using the SLR-based classification tool we are able to run a large scale study of P-type ATPases. This study provides proof-of-concept for the application of SLR to a bioinformatics problem and the analysis of P-type ATPases pinpoints new and interesting targets for further biochemical characterization and structural analysis.
Yin, Wenjing; Xu, Zhengliang; Sheng, Jiagen; Zhang, Changqing; Zhu, Zhenhong
To evaluate the potential risk factors of the development of femoral head osteonecrosis after healed intertrochanteric fractures. We retrospectively reviewed all patients who were operated upon with closed reduction and internal fixation for intertrochanteric fractures by our medical group from December 1993 to December 2012. Patients with healed fractures were identified. Age, gender, comorbidities favouring osteonecrosis, causes of injuries, fracture patterns, the location of the primary fracture line, time from injury to surgery, fixation methods, and the development of femur head osteonecrosis of these patients were summarised. Univariate and multivariate logistic regression analysis were performed to evaluate the correlation between potential risk factors and the development of femoral head osteonecrosis. A total of 916 patients with healed intertrochanteric fractures were identified. Femoral head osteonecrosis was found in 8 cases (0.87%). According to the results of univariate logistic regression, a more proximal fracture line, fixation with dynamic hip screws and age were found to be statistically significant factors. The results of multivariate logistic regression analysis indicated that the statistically significant predictors of femoral head osteonecrosis were younger age (odds ratio [OR] = 17.103; 95% confidence interval [CI], 1.988-147.111), a more proximal fracture line (OR = 31.439; 95% CI, 3.700-267.119) and applying dynamic hip screw as the internal fixation (OR = 11.114; 95% CI, 2.064-59.854). Regular follow-up is commended in young patients with a proximal fracture line who underwent closed reduction and internal fixation with dynamic hip screw, even though the bone had healed.
Kotrashetti, Vijayalakshmi S; Hollikatti, Kiran; Mallapur, M D; Hallikeremath, Seema R; Kale, Alka D
Palatal rugae patterns are relatively unique to an individual and are well protected by the lips, buccal pad of fat and teeth. They are considered to be stable throughout life following completion of growth, although there is considerable debate on the matter, they can be used successfully in post mortem identification provided an antemortem record exists. Thus the aim of this study was to examine palatal rugae shape among two Indian populations and determine the accuracy in defining the Indian population using logistic regression analysis. The study comprises two groups from geographically different regions of India with basic origin from Maharashtra and Karnataka state. The sample includes 100 plaster cast equally distributed between two populations and genders with age ranging between 18 and 40 years. Impression of maxillary arch was obtained using alginate impression material and plaster cast was made. The rugae was delineated on the cast using a sharp graphite pencil under adequate light and magnification and recorded according to classification given by Kapali et al. and Thomas and Kotze (1983). Chi-Square analysis showed significant difference in wavy, circular and divergent pattern between the two populations. The straight and wavy forms were significant in logistic regression analysis. A predictive value of 71% was obtained in determining the original cases correctly when straight, wavy, curved and circular patterns were assessed. 70% of predictive value was achieved when all rugae patterns were assessed. Mean number of rugae was greater in females compared to males with straight pattern showing statistically significant difference between males and females. Significant difference was recorded among straight, wavy, circular and divergent pattern between two populations. Consequently this study demonstrates moderate accuracy of palatal rugae pattern using logistic regression analysis in identification of Indians. Copyright © 2011 Elsevier Ltd and Faculty
Full Text Available In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows that the conclusion also applies to the probabilities estimated from short subtests of mental abilities and that small samples can yield excellent accuracy. The calculated Bayes probabilities can be used to provide meaningful examinee feedback regardless of whether the test was originally designed to be unidimensional.
Full Text Available In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Na ve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows that the conclusion also applies to the probabilities estimated from short subtests of mental abilities and that small samples can yield excellent accuracy. The calculated Bayes probabilities can be used to provide meaningful examinee feedback regardless of whether the test was originally designed to be unidimensional.
Use of logistic regression with dummy variables for modeling the growth-no growth limits of Saccharomyces cerevisiae IGAL01 as a function of sodium chloride, acid type, and potassium sorbate concentration according to growth media.
Arroyo López, F N; Durán Quintana, M C; Garrido Fernández, A
A global logistic model was used to study the effects of both quantitative variables (NaCl, acid, and potassium sorbate concentrations) and dummy variables (laboratory medium or brine, and citric, lactic, or acetic acids) on growth of Saccharomyces cerevisiae IGAL01. The deduced equations, with the significant coefficients selected by a backward stepwise procedure, allowed estimations of the simultaneous comparison of behaviors of levels of the qualitative variables as a function of the quantitative variables and the development of the growth-no growth limits according to laboratory medium or brine and the different types of acidifying agents. The S. cerevisiae growth region in yeast malt glucose peptone broth was always wider than that in brine, in which this yeast was inhibited by 0.03% potassium sorbate and 6% NaCl, when the acid concentration (regardless of type) was 0.2 to 0.3%. These results demonstrate the applicability of such model designs to include qualitative variables in investigations related to the development of growth-no growth limits.
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
The aim of this work is to define reliable susceptibility models for shallow landslides using Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on October 1st 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructures. Landslides, mainly types such as earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials which cover the underlying metamorphic bedrock. The work has been carried out with the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with observation of high resolution aerial colour orthophoto; ii) identification of landslide source areas; iii) data preparation of landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview on existing relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on results and performance of the models; vii) evaluation of the predictive capabilities of the models using ROC curve, AUC and contingency tables; viii) comparison of model results and obtained susceptibility maps; and ix) analysis of temporal variation of landslide susceptibility related to input parameter changes. Models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. Land use and wildfire
Utilização de estratificação e modelo de regressão logística na análise de dados de estudos caso-controle Using of stratification and the logistic regression model in the analysis of data of case-control studies
Suely Godoy Agostinho Gimeno
Full Text Available Exemplifica-se a aplicação de análise multivariada, por estratificação e com regressão logística, utilizando dados de um estudo caso-controle sobre câncer de esôfago. Oitenta e cinco casos e 292 controles foram classificados segundo sexo, idade e os hábitos de beber e de fumar. As estimativas por ponto dos odds ratios foram semelhantes, sendo as duas técnicas consideradas complementares.Data of a case-control study of esophageal cancer were used as an example of the use of multivariate analysis with stratification and logistic regression. Eighty-five cases and 292 controls were classified according to sex, age and smoking and drinking habits. The point estimates of the odds ratios were similar, and the techniques were considered complementary.
Al-Mudhafar, W. J.
Precisely prediction of rock facies leads to adequate reservoir characterization by improving the porosity-permeability relationships to estimate the properties in non-cored intervals. It also helps to accurately identify the spatial facies distribution to perform an accurate reservoir model for optimal future reservoir performance. In this paper, the facies estimation has been done through Multinomial logistic regression (MLR) with respect to the well logs and core data in a well in upper sandstone formation of South Rumaila oil field. The entire independent variables are gamma rays, formation density, water saturation, shale volume, log porosity, core porosity, and core permeability. Firstly, Robust Sequential Imputation Algorithm has been considered to impute the missing data. This algorithm starts from a complete subset of the dataset and estimates sequentially the missing values in an incomplete observation by minimizing the determinant of the covariance of the augmented data matrix. Then, the observation is added to the complete data matrix and the algorithm continues with the next observation with missing values. The MLR has been chosen to estimate the maximum likelihood and minimize the standard error for the nonlinear relationships between facies & core and log data. The MLR is used to predict the probabilities of the different possible facies given each independent variable by constructing a linear predictor function having a set of weights that are linearly combined with the independent variables by using a dot product. Beta distribution of facies has been considered as prior knowledge and the resulted predicted probability (posterior) has been estimated from MLR based on Baye's theorem that represents the relationship between predicted probability (posterior) with the conditional probability and the prior knowledge. To assess the statistical accuracy of the model, the bootstrap should be carried out to estimate extra-sample prediction error by randomly
Xiong, Yihui; Zuo, Renguang
Mineralization is a special type of singularity event, and can be considered as a rare event, because within a specific study area the number of prospective locations (1s) are considerably fewer than the number of non-prospective locations (0s). In this study, GIS-based rare events logistic regression (RELR) was used to map the mineral prospectivity in the southwestern Fujian Province, China. An odds ratio was used to measure the relative importance of the evidence variables with respect to mineralization. The results suggest that formations, granites, and skarn alterations, followed by faults and aeromagnetic anomaly are the most important indicators for the formation of Fe-related mineralization in the study area. The prediction rate and the area under the curve (AUC) values show that areas with higher probability have a strong spatial relationship with the known mineral deposits. Comparing the results with original logistic regression (OLR) demonstrates that the GIS-based RELR performs better than OLR. The prospectivity map obtained in this study benefits the search for skarn Fe-related mineralization in the study area.
Kayano, Mitsunori; Kataoka, Tomoko
Multiple logistic regression was applied to milk yield and composition data for 632 records of healthy cows and 61 records of ketotic cows in Hokkaido, Japan. The purpose was to diagnose ketosis based on milk yield and composition, simultaneously. The cows were divided into two groups: (1) multiparous, including 314 healthy cows and 45 ketotic cows and (2) primiparous, including 318 healthy cows and 16 ketotic cows, since nutritional status, milk yield and composition are affected by parity. Multiple logistic regression was applied to these groups separately. For multiparous cows, milk yield (kg/day/cow) and protein-to-fat (P/F) ratio in milk were significant factors (Pketosis. For primiparous cows, lactose content (%), solid not fat (SNF) content (%) and milk urea nitrogen (MUN) content (mg/dl) were significantly associated with ketosis (Pketosis, provided the sensitivity, specificity and AUC values of (1) 0.711, 0.726 and 0.781; and (2) 0.678, 0.767 and 0.738, respectively.
Noeryanti, N.; Suryowati, K.; Setyawan, Y.; Aulia, R. R.
Students' academic achievement cannot be separated from the influence of two factors namely internal and external factors. The first factors of the student (internal factors) consist of intelligence (X1), health (X2), interest (X3), and motivation of students (X4). The external factors consist of family environment (X5), school environment (X6), and society environment (X7). The objects of this research are eighth grade students of the school year 2016/2017 at SMPN 1 Jiwan Madiun sampled by using simple random sampling. Primary data are obtained by distributing questionnaires. The method used in this study is binary logistic regression analysis that aims to identify internal and external factors that affect student’s achievement and how the trends of them. Path Analysis was used to determine the factors that influence directly, indirectly or totally on student’s achievement. Based on the results of binary logistic regression, variables that affect student’s achievement are interest and motivation. And based on the results obtained by path analysis, factors that have a direct impact on student’s achievement are students’ interest (59%) and students’ motivation (27%). While the factors that have indirect influences on students’ achievement, are family environment (97%) and school environment (37).
Verachtert, E.; Den Eeckhaut, M. Van; Poesen, J.; Govers, G.; Deckers, J.
Soil piping (tunnel erosion) has been recognised as an important erosion process in collapsible loess-derived soils of temperate humid climates, which can cause collapse of the topsoil and formation of discontinuous gullies. Information about the spatial patterns of collapsed pipes and regional models describing these patterns is still limited. Therefore, this study aims at better understanding the factors controlling the spatial distribution and predicting pipe collapse. A dataset with parcels suffering from collapsed pipes (n = 560) and parcels without collapsed pipes was obtained through a regional survey in a 236 km² study area in the Flemish Ardennes (Belgium). Logistic regression was applied to find the best model describing the relationship between the presence/absence of a collapsed pipe and a set of independent explanatory variables (i.e. slope gradient, drainage area, distance-to-thalweg, curvature, aspect, soil type and lithology). Special attention was paid to the selection procedure of the grid cells without collapsed pipes. Apart from the first piping susceptibility map created by logistic regression modelling, a second map was made based on topographical thresholds of slope gradient and upslope drainage area. The logistic regression model allowed identification of the most important factors controlling pipe collapse. Pipes are much more likely to occur when a topographical threshold depending on both slope gradient and upslope area is exceeded in zones with a sufficient water supply (due to topographical convergence and/or the presence of a clay-rich lithology). On the other hand, the use of slope-area thresholds only results in reasonable predictions of piping susceptibility, with minimum information.
Fitzpatrick, Cole D; Rakasi, Saritha; Knodler, Michael A
Speed is one of the most important factors in traffic safety as higher speeds are linked to increased crash risk and higher injury severities. Nearly a third of fatal crashes in the United States are designated as "speeding-related", which is defined as either "the driver behavior of exceeding the posted speed limit or driving too fast for conditions." While many studies have utilized the speeding-related designation in safety analyses, no studies have examined the underlying accuracy of this designation. Herein, we investigate the speeding-related crash designation through the development of a series of logistic regression models that were derived from the established speeding-related crash typologies and validated using a blind review, by multiple researchers, of 604 crash narratives. The developed logistic regression model accurately identified crashes which were not originally designated as speeding-related but had crash narratives that suggested speeding as a causative factor. Only 53.4% of crashes designated as speeding-related contained narratives which described speeding as a causative factor. Further investigation of these crashes revealed that the driver contributing code (DCC) of "driving too fast for conditions" was being used in three separate situations. Additionally, this DCC was also incorrectly used when "exceeding the posted speed limit" would likely have been a more appropriate designation. Finally, it was determined that the responding officer only utilized one DCC in 82% of crashes not designated as speeding-related but contained a narrative indicating speed as a contributing causal factor. The use of logistic regression models based upon speeding-related crash typologies offers a promising method by which all possible speeding-related crashes could be identified. Published by Elsevier Ltd.
Persons with a college degree are more likely to engage in eHealth behaviors than persons without a college degree, compounding the health disadvantages of undereducated groups in the United States. However, the extent to which quality of recent eHealth experience reduces the education-based eHealth gap is unexplored. The goal of this study was to examine how eHealth information search experience moderates the relationship between college education and eHealth behaviors. Based on a nationally representative sample of adults who reported using the Internet to conduct the most recent health information search (n=1458), I evaluated eHealth search experience in relation to the likelihood of engaging in different eHealth behaviors. I examined whether Internet health information search experience reduces the eHealth behavior gaps among college-educated and noncollege-educated adults. Weighted logistic regression models were used to estimate the probability of different eHealth behaviors. College education was significantly positively related to the likelihood of 4 eHealth behaviors. In general, eHealth search experience was negatively associated with health care behaviors, health information-seeking behaviors, and user-generated or content sharing behaviors after accounting for other covariates. Whereas Internet health information search experience has narrowed the education gap in terms of likelihood of using email or Internet to communicate with a doctor or health care provider and likelihood of using a website to manage diet, weight, or health, it has widened the education gap in the instances of searching for health information for oneself, searching for health information for someone else, and downloading health information on a mobile device. The relationship between college education and eHealth behaviors is moderated by Internet health information search experience in different ways depending on the type of eHealth behavior. After controlling for college
Perez, Ivan; Chavez, Allison K.; Ponce, Dario
Background: The Ricketts' posteroanterior (PA) cephalometry seems to be the most widely used and it has not been tested by multivariate statistics for sex determination. Objective: The objective was to determine the applicability of Ricketts' PA cephalometry for sex determination using the logistic regression analysis. Materials and Methods: The logistic models were estimated at distinct age cutoffs (all ages, 11 years, 13 years, and 15 years) in a database from 1,296 Hispano American Peruvians between 5 years and 44 years of age. Results: The logistic models were composed by six cephalometric measurements; the accuracy achieved by resubstitution varied between 60% and 70% and all the variables, with one exception, exhibited a direct relationship with the probability of being classified as male; the nasal width exhibited an indirect relationship. Conclusion: The maxillary and facial widths were present in all models and may represent a sexual dimorphism indicator. The accuracy found was lower than the literature and the Ricketts' PA cephalometry may not be adequate for sex determination. The indirect relationship of the nasal width in models with data from patients of 12 years of age or less may be a trait related to age or a characteristic in the studied population, which could be better studied and confirmed. PMID:27555732
Perez, Ivan; Chavez, Allison K; Ponce, Dario
The Ricketts' posteroanterior (PA) cephalometry seems to be the most widely used and it has not been tested by multivariate statistics for sex determination. The objective was to determine the applicability of Ricketts' PA cephalometry for sex determination using the logistic regression analysis. The logistic models were estimated at distinct age cutoffs (all ages, 11 years, 13 years, and 15 years) in a database from 1,296 Hispano American Peruvians between 5 years and 44 years of age. The logistic models were composed by six cephalometric measurements; the accuracy achieved by resubstitution varied between 60% and 70% and all the variables, with one exception, exhibited a direct relationship with the probability of being classified as male; the nasal width exhibited an indirect relationship. The maxillary and facial widths were present in all models and may represent a sexual dimorphism indicator. The accuracy found was lower than the literature and the Ricketts' PA cephalometry may not be adequate for sex determination. The indirect relationship of the nasal width in models with data from patients of 12 years of age or less may be a trait related to age or a characteristic in the studied population, which could be better studied and confirmed.
Camilleri, Liberato; Cefai, Carmel
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
Tarasova, Valentina V.; Tarasov, Vasily E.
A generalization of the economic model of logistic growth, which takes into account the effects of memory and crises, is suggested. Memory effect means that the economic factors and parameters at any given time depend not only on their values at that time, but also on their values at previous times. For the mathematical description of the memory effects, we use the theory of derivatives of non-integer order. Crises are considered as sharp splashes (bursts) of the price, which are mathematically described by the delta-functions. Using the equivalence of fractional differential equations and the Volterra integral equations, we obtain discrete maps with memory that are exact discrete analogs of fractional differential equations of economic processes. We derive logistic map with memory, its generalizations, and “economic” discrete maps with memory from the fractional differential equations, which describe the economic natural growth with competition, power-law memory and crises.
Logistics. Reaching NEHU campus, Shillong: Shillong is 125 km away from the Guwahati airport and 100 km from Guwahati railway station. The Local Organizing Committee (LOC) will arrange transportation from, both, the Guwahati airport and the Shillong Airport to NEHU campus, Shillong. Participants arriving at either ...
Raquel Nogueira Avelar Silva
Full Text Available The objective of the present study was to identify the factors related to “peripheral vascular trauma” in children aged six months to 12 years. This prospective cohort study included children with peripheral vein punctured for the first time per side and excluded those with high/complete healing of trauma signs after removing the catheter. Daily clinical evaluations were performed in intervals shorter than 24 hours. Data were treated according to Pearson’s test and the logistic regression method. Among the 14 variables considered intervenient, four were statistically associated to the occurrence of trauma: dirtiness and humidity in the catheter insertion site, catheter caliber, and age. A causal relationship was found between the intervenient variables and the outcome, “peripheral vascular trauma”, thus, contributing to forming the knowledge of the peripheral venous puncture in children aged six months to 12 years. Descriptors: Child; Nursing Diagnosis; Veins; Injuries.
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
Bersabé, Rosa; Rivas, Teresa
The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.
Lin, Qing; Xie, Qiu-You; He, Yan-Bin; Chen, Yan; Ni, Xiao-Xiao; Guo, Ye-Qun; Shen, Yan; Yu, Rong-Hao
To explore the factors that affect the recovery of consciousness in patients with disorders of consciousness following brain trauma. We analyzed the data of 114 patients with disorders of consciousness following brain trauma admitted for rehabilitation. Bilateral logistic regression analysis was used to explore the factors that affected the recovery of the patients' consciousness. A logistic regression model was established and the ROC curve was drawn to obtain the optimal threshold of the prognostic model. Univariate analysis showed that vegetative state duration (PR scores (PLogistic multivariate analysis showed that central fever (OR=3.493, P=0.044), vegetative state duration (OR=1.016, P=0.008), PSH (OR=4.223, P=0.034) and CRS-R scores (OR=0.640, P=0.002) all significantly affected the recovery of consciousness. The χ 2 value of the Hosmer-Lemeshow test was 10.214 (P=0.250), and the goodness of fit of this model indicated an outstanding fitting (c=0.91). The presence of PSH is the one of the most important factor followed by centric fever to affect the outcome of patients with disorders of consciousness. A lower CRS-R score and a longer duration of vegetative state also predict a poor recovery of consciousness in these patients.
Full Text Available This article proposes a new model to address the design problem of a sustainable regional logistics network with uncertainty in future logistics demand. In the proposed model, the future logistics demand is assumed to be a random variable with a given probability distribution. A set of chance constraints with regard to logistics service capacity and environmental impacts is incorporated to consider the sustainability of logistics network design. The proposed model is formulated as a two-stage robust optimization problem. The first-stage problem before the realization of future logistics demand aims to minimize a risk-averse objective by determining the optimal location and size of logistics parks with CO2 emission taxes consideration. The second stage after the uncertain logistics demand has been determined is a scenario-based stochastic logistics service route choices equilibrium problem. A heuristic solution algorithm, which is a combination of penalty function method, genetic algorithm, and Gauss–Seidel decomposition approach, is developed to solve the proposed model. An illustrative example is given to show the application of the proposed model and solution algorithm. The findings show that total social welfare of the logistics system depends very much on the level of uncertainty in future logistics demand, capital budget for logistics parks, and confidence levels of the chance constraints.
Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H
Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
Jiménez-Huete, Adolfo; Riva, Elena; Toledano, Rafael; Campo, Pablo; Esteban, Jesús; Barrio, Antonio Del; Franch, Oriol
The validity of neuropsychological tests for the differential diagnosis of degenerative dementias may depend on the clinical context. We constructed a series of logistic models taking into account this factor. We retrospectively analyzed the demographic and neuropsychological data of 301 patients with probable Alzheimer's disease (AD), frontotemporal degeneration (FTLD), or dementia with Lewy bodies (DLB). Nine models were constructed taking into account the diagnostic question (eg, AD vs DLB) and subpopulation (incident vs prevalent). The AD versus DLB model for all patients, including memory recovery and phonological fluency, was highly accurate (area under the curve = 0.919, sensitivity = 90%, and specificity = 80%). The results were comparable in incident and prevalent cases. The FTLD versus AD and DLB versus FTLD models were both inaccurate. The models constructed from basic neuropsychological variables allowed an accurate differential diagnosis of AD versus DLB but not of FTLD versus AD or DLB. © The Author(s) 2014.
Due to the complexity and high non-linearity of bioprocess, most simple mathematical models fail to describe the exact behavior of biochemistry systems. As a novel type of learning method, support vector regression (SVR) owns the powerful capability to characterize problems via small sample, nonlinearity, high dimension ...
Full Text Available We propose a two-stage penalized logistic regression approach to case-control genome-wide association studies. This approach consists of a screening stage and a selection stage. In the screening stage, main-effect and interaction-effect features are screened by using L1-penalized logistic like-lihoods. In the selection stage, the retained features are ranked by the logistic likelihood with the smoothly clipped absolute deviation (SCAD penalty (Fan and Li, 2001 and Jeffrey’s Prior penalty (Firth, 1993, a sequence of nested candidate models are formed, and the models are assessed by a family of extended Bayesian information criteria (J. Chen and Z. Chen, 2008. The proposed approach is applied to the analysis of the prostate cancer data of the Cancer Genetic Markers of Susceptibility (CGEMS project in the National Cancer Institute, USA. Simulation studies are carried out to compare the approach with the pair-wise multiple testing approach (Marchini et al. 2005 and the LASSO-patternsearch algorithm (Shi et al. 2007.
Logistic regression analysis of conventional ultrasonography, strain elastosonography, and contrast-enhanced ultrasound characteristics for the differentiation of benign and malignant thyroid nodules.
Pang, Tiantian; Huang, Leidan; Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Gong, Xuehao; Liu, Weixiang
The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast -enhanced ultrasound (CEUS). Those nodules' 12 suspicious sonographic features were used to assess thyroid nodules. The significant features of diagnosing thyroid nodules were picked out by logistic regression analysis. All variables that were statistically related to diagnosis of thyroid nodules, at a level of p logistic regression analysis model. The significant features in the logistic regression model of diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of logistic regression analysis, the formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating curve (ROC) was 0.930 and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79% respectively.
Wulifan, Joseph K; Jahn, Albrecht; Hien, Hervé; Ilboudo, Patrick Christian; Meda, Nicolas; Robyn, Paul Jacob; Saidou Hamadou, T; Haidara, Ousmane; De Allegri, Manuela
Unmet need for family planning has implications for women and their families, such as unsafe abortion, physical abuse, and poor maternal health. Contraceptive knowledge has increased across low-income settings, yet unmet need remains high with little information on the factors explaining it. This study assessed factors associated with unmet need among pregnant women in rural Burkina Faso. We collected data on pregnant women through a population-based survey conducted in 24 rural districts between October 2013 and March 2014. Multivariate multilevel logistic regression was used to assess the association between unmet need for family planning and a selection of relevant demand- and supply-side factors. Of the 1309 pregnant women covered in the survey, 239 (18.26%) reported experiencing unmet need for family planning. Pregnant women with more than three living children [OR = 1.80; 95% CI (1.11-2.91)], those with a child younger than 1 year [OR = 1.75; 95% CI (1.04-2.97)], pregnant women whose partners disapproves contraceptive use [OR = 1.51; 95% CI (1.03-2.21)] and women who desired fewer children compared to their partners preferred number of children [OR = 1.907; 95% CI (1.361-2.672)] were significantly more likely to experience unmet need for family planning, while health staff training in family planning logistics management (OR = 0.46; 95% CI (0.24-0.73)] was associated with a lower probability of experiencing unmet need for family planning. Findings suggest the need to strengthen family planning interventions in Burkina Faso to ensure greater uptake of contraceptive use and thus reduce unmet need for family planning.
Huang, W; Tu, S
Purpose: We conducted a retrospective study of Radiomics research for classifying malignancy of small pulmonary nodules. A machine learning algorithm of logistic regression and open research platform of Radiomics, IBEX (Imaging Biomarker Explorer), were used to evaluate the classification accuracy. Methods: The training set included 100 CT image series from cancer patients with small pulmonary nodules where the average diameter is 1.10 cm. These patients registered at Chang Gung Memorial Hospital and received a CT-guided operation of lung cancer lobectomy. The specimens were classified by experienced pathologists with a B (benign) or M (malignant). CT images with slice thickness of 0.625 mm were acquired from a GE BrightSpeed 16 scanner. The study was formally approved by our institutional internal review board. Nodules were delineated and 374 feature parameters were extracted from IBEX. We first used the t-test and p-value criteria to study which feature can differentiate between group B and M. Then we implemented a logistic regression algorithm to perform nodule malignancy classification. 10-fold cross-validation and the receiver operating characteristic curve (ROC) were used to evaluate the classification accuracy. Finally hierarchical clustering analysis, Spearman rank correlation coefficient, and clustering heat map were used to further study correlation characteristics among different features. Results: 238 features were found differentiable between group B and M based on whether their statistical p-values were less than 0.05. A forward search algorithm was used to select an optimal combination of features for the best classification and 9 features were identified. Our study found the best accuracy of classifying malignancy was 0.79±0.01 with the 10-fold cross-validation. The area under the ROC curve was 0.81±0.02. Conclusion: Benign nodules may be treated as a malignant tumor in low-dose CT and patients may undergo unnecessary surgeries or treatments. Our
Huang, W; Tu, S [Chang Gung University, Kwei-shan, Tao-Yuan, Taiwan (China)
Purpose: We conducted a retrospective study of Radiomics research for classifying malignancy of small pulmonary nodules. A machine learning algorithm of logistic regression and open research platform of Radiomics, IBEX (Imaging Biomarker Explorer), were used to evaluate the classification accuracy. Methods: The training set included 100 CT image series from cancer patients with small pulmonary nodules where the average diameter is 1.10 cm. These patients registered at Chang Gung Memorial Hospital and received a CT-guided operation of lung cancer lobectomy. The specimens were classified by experienced pathologists with a B (benign) or M (malignant). CT images with slice thickness of 0.625 mm were acquired from a GE BrightSpeed 16 scanner. The study was formally approved by our institutional internal review board. Nodules were delineated and 374 feature parameters were extracted from IBEX. We first used the t-test and p-value criteria to study which feature can differentiate between group B and M. Then we implemented a logistic regression algorithm to perform nodule malignancy classification. 10-fold cross-validation and the receiver operating characteristic curve (ROC) were used to evaluate the classification accuracy. Finally hierarchical clustering analysis, Spearman rank correlation coefficient, and clustering heat map were used to further study correlation characteristics among different features. Results: 238 features were found differentiable between group B and M based on whether their statistical p-values were less than 0.05. A forward search algorithm was used to select an optimal combination of features for the best classification and 9 features were identified. Our study found the best accuracy of classifying malignancy was 0.79±0.01 with the 10-fold cross-validation. The area under the ROC curve was 0.81±0.02. Conclusion: Benign nodules may be treated as a malignant tumor in low-dose CT and patients may undergo unnecessary surgeries or treatments. Our
DeBari, Vincent A
It has been demonstrated that decision levels (DL) and their confidence intervals (CI) can be estimated from the second derivative, f '' (P), of the logistic regression probability curve (LRPC). Although this method generally provides smooth curves from which DL and CI can be obtained, there are datasets that generate "noisy" curves making these measurements difficult. The purpose of this study was to develop a procedure to obviate this noise, thus allowing the more facile estimation of DL and CI. Data from two clinical studies were examined. Logistic regression analysis was performed and the first derivatives, f ' (P), were fitted to Gaussian models. The derivatives of these surrogate f ' (P) were generated to provide f '' (P) and were compared with data from receiver operating characteristic (ROC) curves. For both sets of data, the surrogate curves demonstrated strong fits to the natural f ' (P) with r(2) = 0.986 for one study and 0.832 for the second. The f '' (P) generated from the surrogate curves demonstrated single maxima (M) and minima (m), compared with the f '' (P) generated from the natural f ' (P) in which multiple M and m were observed. Easily discernible DL and CI were observed for both datasets with differences from ROC-estimated DL of 1.7% for the first study and 4.8% for the second. The use of a surrogate Gaussian simulation of f ' (P) may be a useful alternative to natural f ' (P) when using the f '' (P) of the LRPC to determine DL and CI.
Sikorska, Karolina; Lesaffre, Emmanuel; Groenen, Patrick F J; Eilers, Paul H C
Genome-wide association studies have become very popular in identifying genetic contributions to phenotypes. Millions of SNPs are being tested for their association with diseases and traits using linear or logistic regression models. This conceptually simple strategy encounters the following computational issues: a large number of tests and very large genotype files (many Gigabytes) which cannot be directly loaded into the software memory. One of the solutions applied on a grand scale is cluster computing involving large-scale resources. We show how to speed up the computations using matrix operations in pure R code. We improve speed: computation time from 6 hours is reduced to 10-15 minutes. Our approach can handle essentially an unlimited amount of covariates efficiently, using projections. Data files in GWAS are vast and reading them into computer memory becomes an important issue. However, much improvement can be made if the data is structured beforehand in a way allowing for easy access to blocks of SNPs. We propose several solutions based on the R packages ff and ncdf.We adapted the semi-parallel computations for logistic regression. We show that in a typical GWAS setting, where SNP effects are very small, we do not lose any precision and our computations are few hundreds times faster than standard procedures. We provide very fast algorithms for GWAS written in pure R code. We also show how to rearrange SNP data for fast access.
Danilo A. López-Sarmiento
Full Text Available In this paper is compared the performance of a multi-class least squares support vector machine (LSSVM mc versus a multi-class logistic regression classifier to problem of recognizing the numeric digits (0-9 handwritten. To develop the comparison was used a data set consisting of 5000 images of handwritten numeric digits (500 images for each number from 0-9, each image of 20 x 20 pixels. The inputs to each of the systems were vectors of 400 dimensions corresponding to each image (not done feature extraction. Both classifiers used OneVsAll strategy to enable multi-classification and a random cross-validation function for the process of minimizing the cost function. The metrics of comparison were precision and training time under the same computational conditions. Both techniques evaluated showed a precision above 95 %, with LS-SVM slightly more accurate. However the computational cost if we found a marked difference: LS-SVM training requires time 16.42 % less than that required by the logistic regression model based on the same low computational conditions.
Jin, Ru-Feng; Wan, Jun-Xiang; Gu, Shou-Yong; Sun, Pin; Zhang, Zhong-Bin; Jin, Xi-Peng; Xia, Zhao-Lin
To explore the association of polymorphisms of metabolizing enzyme genes with chronic benzene poisoning (CBP) comprehensively by case-control design. 152 CBP patients and 152 workers occupationally exposed to benzene without poisoning manifestations were investigated. 30 single nucleotide polymorphisms (SNPs) in 13 genes such as CYP2E1 were tested by PCR-RFLP, sequencing approaches. Logistic regression model was used to detect main effects and 2-order interaction effects of gene and/or environment. Multifactor dimensionality reduction (MDR) was used to detect high-order gene-gene or gene-environment interactions. Based on logistic regression, the main effects of GSTP1 rs947894, EPHX1 rs1051740, CYP1A1 rs4646903, CYP2D6 rs1065852 and rs1135840 were found to be significant (P 0.05). The other SNPs did not show any significant associations with CBP. According to MDR, a 3-order interaction with the strongest combined effect was found, i.e. the 3-factor combination of CYP1A1 rs4646903, CYP2D6 rs1065852 and CYP2D6 rs1135840. Gene-gene, gene-environment interactions are important mechanism to genetic susceptibility of CBP.
This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.
Öğr. Gör. Rukiye NUMAN TEKİN
Full Text Available Length of stay (LOS has important implications in various aspects of health services, can vary according to a wide range of factors. It is noticed that LOS has been neglected mostly in both theoratical studies and practice of health care management in Turkey. The main purpose of this study is to identify factors related to LOS in Turkey. A retrospective analysis of 2.255.836 patients hospitalized to private, university, foundation university and other (municipality, association and foreigners/minority hospitals hospitals which have an agreement with Social Security Institution (SSI in Turkey, from January 1, 2010, until the December 31, 2010, was examined. Patient’s data were taken from MEDULA (National Electronic Invoice System and SPSS 18.0 was used to perform statistical analysis. In this study t-test, one way anova and multinomial logistic regression are used to determine variables that may affect to LOS. The average LOS of patients was 3,93 days (SD = 5,882. LOS showed a statistically significant difference according to all independent variables used in the study (age, gender, disease class, type of hospitalization, presence of comorbidity, type and number of surgery, season of hospitalization, hospital ownership/bed capacity/ geographical region/residential area/type of service. According to the results of the multinomial lojistic regression analysis, LOS was negatively affected in terms of gender, presence of comorbidity, geographical region of hospital and was positively affected in terms of age, season of hospitalization, hospital bed capacity/ ownership/type of service/residential area.
Faradmal, Javad; Soltanian, Ali Reza; Roshanaei, Ghodratollah; Khodabakhshi, Reza; Kasaeian, Amir
Breast cancer is the most common cancers in female populations. The exact cause is not known, but is most likely to be a combination of genetic and environmental factors. Log-logistic model (LLM) is applied as a statistical method for predicting survival and it influencing factors. In recent decades, artificial neural network (ANN) models have been increasingly applied to predict survival data. The present research was conducted to compare log-logistic regression and artificial neural network models in prediction of breast cancer (BC) survival. A historical cohort study was established with 104 patients suffering from BC from 1997 to 2005. To compare the ANN and LLM in our setting, we used the estimated areas under the receiver-operating characteristic (ROC) curve (AUC) and integrated AUC (iAUC). The data were analyzed using R statistical software. The AUC for the first, second and third years after diagnosis are 0.918, 0.780 and 0.800 in ANN, and 0.834, 0.733 and 0.616 in LLM, respectively. The mean AUC for ANN was statistically higher than that of the LLM (0.845 vs. 0.744). Hence, this study showed a significant difference between the performance in terms of prediction by ANN and LLM. This study demonstrated that the ability of prediction with ANN was higher than with the LLM model. Thus, the use of ANN method for prediction of survival in field of breast cancer is suggested.
Kim, Sun Mi; Kim, Yongdai; Jeong, Kuhwan; Jeong, Heeyeong; Kim, Jiyoung
The aim of this study was to compare the performance of image analysis for predicting breast cancer using two distinct regression models and to evaluate the usefulness of incorporating clinical and demographic data (CDD) into the image analysis in order to improve the diagnosis of breast cancer. This study included 139 solid masses from 139 patients who underwent a ultrasonography-guided core biopsy and had available CDD between June 2009 and April 2010. Three breast radiologists retrospectively reviewed 139 breast masses and described each lesion using the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We applied and compared two regression methods-stepwise logistic (SL) regression and logistic least absolute shrinkage and selection operator (LASSO) regression-in which the BI-RADS descriptors and CDD were used as covariates. We investigated the performances of these regression methods and the agreement of radiologists in terms of test misclassification error and the area under the curve (AUC) of the tests. Logistic LASSO regression was superior (PBI-RADS descriptors and CDD showed better performance than SL in predicting the presence of breast cancer. The use of CDD as a supplement to the BI-RADS descriptors significantly improved the prediction of breast cancer using logistic LASSO regression.
Otok, Bambang Widjanarko; Aisyah, Amalia; Purhadi, Andari, Shofi
Diabetes Mellitus (DM) is a group of metabolic diseases with characteristics shows an abnormal blood glucose level occurring due to pancreatic insulin deficiency, decreased insulin effectiveness or both. The report from the ministry of health shows that DMs prevalence data of East Java province is 2.1%, while the DMs prevalence of Indonesia is only 1,5%. Given the high cases of DM in East Java, it needs the preventive action to control factors causing the complication of DM. This study aims to determine the combination factors causing the complication of DM to reduce the bias by confounding variables using Propensity Score Matching (PSM) with the method of propensity score estimation is binary logistic regression. The data used in this study is the medical record from As-Shafa clinic consisting of 6 covariates and health complication as response variable. The result of PSM analysis showed that there are 22 of 126 DMs patients attending gymnastics paired with patients who didnt attend to diabetes gymnastics. The Average Treatment of Treated (ATT) estimation results showed that the more patients who didnt attend to gymnastics, the more likely the risk for the patients having DMs complications.
Durif, Ghislain; Modolo, Laurent; Michaelsson, Jakob; Mold, Jeff E; Lambert-Lacroix, Sophie; Picard, Franck
The high dimensionality of genomic data calls for the development of specific classification methodologies, especially to prevent over-optimistic predictions. This challenge can be tackled by compression and variable selection, which combined constitute a powerful framework for classification, as well as data visualization and interpretation. However, current proposed combinations lead to unstable and non convergent methods due to inappropriate computational frameworks. We hereby propose a computationally stable and convergent approach for classification in high dimensional based on sparse Partial Least Squares (sparse PLS). We start by proposing a new solution for the sparse PLS problem that is based on proximal operators for the case of univariate responses. Then we develop an adaptive version of the sparse PLS for classification, called logit-SPLS, which combines iterative optimization of logistic regression and sparse PLS to ensure computational convergence and stability. Our results are confirmed on synthetic and experimental data. In particular, we show how crucial convergence and stability can be when cross-validation is involved for calibration purposes. Using gene expression data, we explore the prediction of breast cancer relapse. We also propose a multicategorial version of our method, used to predict cell-types based on single-cell expression data. Our approach is implemented in the plsgenomics R-package. firstname.lastname@example.org. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: email@example.com
Yao, Kui-Wu; He, Qing-Yong; Teng, Fei; Wang, Jie
To explore the correlation between common syndrome essential factors and the symptoms and signs of unstable angina (UA). Eight hundred and fifteen patients with UA confirmed by coronary angiography were identified from several centers. Common syndrome essential factors were selected on the basis of expert experience. The correlations between common syndrome essential factors and symptoms and signs of UA were analyzed using binary logistic regression analysis. The common syndrome essential factors in unstable angina were blood stasis, qi stagnation, phlegm turbidity, heat stagnancy, qi deficiency, yin deficiency, and yang deficiency. Symptoms such as chest pain, hypochondriac distention, ecchymosis, dark orbits, dark and purplish tongue, and tongue with ecchymosis and petechiae were significant diagnostic features of "blood stasis". Aversion to cold and cool limbs, weakness in the waist and knees, and clear abundant urine were significant diagnostic features of"yang deficiency". These results were in accordance with the understanding of traditional clinical Chinese medical practice. This clinical study analyzed the correlations between common syndrome essential factors and the symptoms and signs of unstable angina. The results provide the basis for establishing diagnostic criteria for syndrome essential factors.
Gomes, Daniel de Souza; Baptista Filho, Benedito; Oliveira, Fabio Branco de, E-mail: firstname.lastname@example.org, E-mail: email@example.com, E-mail: firstname.lastname@example.org [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil); Giovedi, Claudia, E-mail: email@example.com [Universidade de Sao Paulo (POLI/USP), Sao Paulo, SP (Brazil). Lab. de Analise, Avaliacao e Gerenciamento de Risco
A reactivity-initiated Accident (RIA) is a disastrous failure, which occurs because of an unexpected rise in the fission rate and reactor power. This sudden increase in the reactor power may activate processes that might lead to the failure of fuel cladding. In severe accidents, a disruption of fuel and core melting can occur. The purpose of the present research is to study the patterns of such accidents using exploratory data analysis techniques. A study based on applied statistics was used for simulations. Then, we chose peak enthalpy, pulse width, burnup, fission gas release, and the oxidation of zirconium as input parameters and set the safety boundary conditions. This new approach includes the logistic regression. With this, the present research aims also to develop the ability to identify the conditions and the probability of failures. Zirconium-based alloys fabricating the cladding of the fuel rod elements with niobium 1% were analyzed for high burnup limits at 65 MWd/kgU. The data based on six decades of investigations from experimental programs. In test, perform in American reactors such as the transient reactor test (TREAT), and power Burst Facility (PBF). In experiments realized in Japanese program at nuclear in the safety research reactor (NSRR), and in Kazakhstan as impulse graphite reactor (IGR). The database obtained from the tests and served as a support for our study. (author)
Glauco H.S. Mendes
Full Text Available Critical success factors in new product development (NPD in the Brazilian small and medium enterprises (SMEs are identified and analyzed. Critical success factors are best practices that can be used to improve NPD management and performance in a company. However, the traditional method for identifying these factors is survey methods. Subsequently, the collected data are reduced through traditional multivariate analysis. The objective of this work is to develop a logistic regression model for predicting the success or failure of the new product development. This model allows for an evaluation and prioritization of resource commitments. The results will be helpful for guiding management actions, as one way to improve NPD performance in those industries.
Lindsay M. Veazey
Full Text Available Predictive habitat suitability models are powerful tools for cost-effective, statistically robust assessment of the environmental drivers of species distributions. The aim of this study was to develop predictive habitat suitability models for two genera of scleractinian corals (Leptoserisand Montipora found within the mesophotic zone across the main Hawaiian Islands. The mesophotic zone (30–180 m is challenging to reach, and therefore historically understudied, because it falls between the maximum limit of SCUBA divers and the minimum typical working depth of submersible vehicles. Here, we implement a logistic regression with rare events corrections to account for the scarcity of presence observations within the dataset. These corrections reduced the coefficient error and improved overall prediction success (73.6% and 74.3% for both original regression models. The final models included depth, rugosity, slope, mean current velocity, and wave height as the best environmental covariates for predicting the occurrence of the two genera in the mesophotic zone. Using an objectively selected theta (“presence” threshold, the predicted presence probability values (average of 0.051 for Leptoseris and 0.040 for Montipora were translated to spatially-explicit habitat suitability maps of the main Hawaiian Islands at 25 m grid cell resolution. Our maps are the first of their kind to use extant presence and absence data to examine the habitat preferences of these two dominant mesophotic coral genera across Hawai‘i.
Mirjam J Knol
Full Text Available BACKGROUND: In randomized controlled trials (RCTs, the odds ratio (OR can substantially overestimate the risk ratio (RR if the incidence of the outcome is over 10%. This study determined the frequency of use of ORs, the frequency of overestimation of the OR as compared with its accompanying RR in published RCTs, and we assessed how often regression models that calculate RRs were used. METHODS: We included 288 RCTs published in 2008 in five major general medical journals (Annals of Internal Medicine, British Medical Journal, Journal of the American Medical Association, Lancet, New England Journal of Medicine. If an OR was reported, we calculated the corresponding RR, and we calculated the percentage of overestimation by using the formula . RESULTS: Of 193 RCTs with a dichotomous primary outcome, 24 (12.4% presented a crude and/or adjusted OR for the primary outcome. In five RCTs (2.6%, the OR differed more than 100% from its accompanying RR on the log scale. Forty-one of all included RCTs (n = 288; 14.2% presented ORs for other outcomes, or for subgroup analyses. Nineteen of these RCTs (6.6% had at least one OR that deviated more than 100% from its accompanying RR on the log scale. Of 53 RCTs that adjusted for baseline variables, 15 used logistic regression. Alternative methods to estimate RRs were only used in four RCTs. CONCLUSION: ORs and logistic regression are often used in RCTs and in many articles the OR did not approximate the RR. Although the authors did not explicitly misinterpret these ORs as RRs, misinterpretation by readers can seriously affect treatment decisions and policy making.
Shao, Kang; Cheng, Feng
In recent years, the development of electronic commerce in our country to speed up the pace. The role of logistics has been highlighted, more and more electronic commerce enterprise are beginning to realize the importance of logistics in the success or failure of the enterprise. In this paper, the author take Jingdong Mall for example, performing a SWOT analysis of their current situation of self-built logistics system, find out the problems existing in the current Jingdong Mall logistics distribution and give appropriate recommendations.
González, Andrés; Terasvirta, Timo; Dijk, Dick van
models to the panel context. The strategy consists of model specification based on homogeneity tests, parameter estimation, and model evaluation, including tests of parameter constancy and no remaining heterogeneity. The model is applied to describing firms' investment decisions in the presence...
Full Text Available In Denpasar the capital of Bali Province, motorcycle accident contributes to about 80% of total road accidents. Out of those motorcycle accidents, 32% are fatal accidents. This study investigates the influence of accident related factors on motorcycle fatal accidents in the city of Denpasar during period 2006-2008 using a logistic regression model. The study found that the fatality of collision with pedestrians and right angle accidents were respectively about 0.44 and 0.40 times lower than collision with other vehicles and accidents due to other factors. In contrast, the odds that a motorcycle accident will be fatal due to collision with heavy and light vehicles were 1.67 times more likely than with other motorcycles. Collision with pedestrians, right angle accidents, and heavy and light vehicles were respectively accounted for 31%, 29%, and 63% of motorcycle fatal accidents.
Brian S. Cade; Barry R. Noon; Rick D. Scherer; John J. Keane
Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical...
Tabanelli, Giulia; Montanari, Chiara; Patrignani, Francesca; Siroli, Lorenzo; Lanciotti, Rosalba; Gardini, Fausto
The antimicrobial effects of 2 terpenes (citral and linalool) on a Saccharomyces cerevisiae strain isolated from spoiled soft drink have been evaluated, alone or in combination, in relation to pH and aw using in vitro assays. The obtained data were fitted with the logit model to find the growth/no growth boundary regions of the 2 terpenes, focusing the attention on the type of interaction exerted by citral and linalool. In particular, the results showed an increase of citral antimicrobial effect in growth media characterized by low aw value, as well as a higher linalool antimicrobial effect in media at low pH. Moreover, the interactive effects of the 2 terpenes were exploited. The results obtained with the model were validated in an independent experiment. The knowledge of the interactions of essential oil molecules with enhanced antimicrobial activity, in relation to some of the most important chemicophysical variables, can have important industrial applications, since these substances are able to assure the desired antimicrobial effect without negatively modifying the product flavor profile. The effects of the main chemicophysical parameters (such as aw and pH) on the antimicrobial activity of bioactive terpenes are necessary for the definition of an industrially applicable preservation strategy based on the use of essential oils as natural antimicrobials aimed to prolong shelf life of food products. © 2014 Institute of Food Technologists®
Thuret, Gilles; Brouillet, Eric; Gain, Philippe
The French competitive internship examination for medical students was replaced in 2004 by a national-ranking exam. The profile of students successful at the internship exam, although more or less stereotypical, was never methodically analyzed. To determine the factors that influenced success in the internship exam. Records of 242 students from four successive entering classes (1989-1992) at the Saint-Etienne Faculty of Medicine underwent logistical regression analysis. Factors studied were: gender; occupation of the head of household; details about their General Certificate of Education test (age attest, core subject, overall grade); repeating any first-cycle medical studies (PCEM1); PCEM1 grade; number of certificates resat at the end-of-year make-up session or the following year during second-cycle medical studies (DCEM), known as "debts"; and the ranking obtained in the five faculty certificate tests deemed representative of the overall competitive test and on the clinical and therapeutic summary certificate (CSCT). The 95 students (59%) who did not take the internship exam differed from those who did. The former typically: (1) were older when they began studying medicine (p = 0.003); (2) had passed the General Certificate of Education with the lowest or next lowest pass levels (p = 0.023); (3) re-did their PCEM1 studies (p = 0.001); (4) were ranked in the lower two-thirds of their PCEM1 class (p = 0.001); (5) had more PCEM1 certificates to resit during DCEM (p = 0.007); (6) were ranked in the lower two-thirds of their class for the certificates (including the CSCT) (p age (adjusted odds ratio (OR) 7.79, confidence interval (CI) 95% [1.59-38.13]); (2) had at most one "debt" (adjusted OR 16.51 [1.88-144.78); (3) ranked in the top third of their class for certificates (including the CSCT) (adjusted OR 7.84 [2.42-25.41]). Our study objectively confirmed that consistent work and effective learning strategies, acquired early during education and applied throughout
Full Text Available Background: This study evaluated the role of family history of cancer and gynecologic factors in relation to the etiology of ovarian cancer in a low socioeconomic population in Iran. Methods: From 2007-2009 we conducted a screening program on women with insurance coverage provided by the Imam Khomeini Relief Foundation. A total of 26788 women participated in this study of whom 76 cases had ovarian cancer and 26712 were considered as controls. We used rare event logistic (ReLogit regression analysis with a prior correction method that used the Zelig package in R to obtain odds ratio estimates and confidence intervals. Results: Ovarian cancer was more frequent among postmenopausal than premenopausal (odds ratio: 2.30; confidence interval: 1.17-4.49 women. We observed increased risk for this disease in women with histories of hormone replacement therapy compared to those with no history (odds ratio: 2.36; confidence interval: 1.13-4.91. A greater increase in ovarian cancer was observed in women with family histories of breast (odds ratio: 2.88; confidence interval: 1.44-5.77, ovarian (odds ratio: 11.27; confidence interval: 5.63-22.54 and all cancer sites (odds ratio: 2.95; confidence interval: 1.71-5.08. However, the use of oral contraceptive pills was significantly associated with lower risk for ovarian cancer (odds ratio: 0.47; confidence interval: 0.28-0.79. There was no association between ovarian cancer and age, marital status, occupation, education level, age at menarche, age at first pregnancy and number of pregnancies. Conclusion: Ovarian cancer was considered a rare event. Thus we deemed it necessary to explore the associated risk factors using ReLogit with a prior correction method. The risk factors for ovarian cancer were menopause, history of hormone replacement therapy and family history of cancer of the breast, ovaries and other sites. Oral use of contraceptive pills showed a protective effect on risk for ovarian cancer.
Pian, Wenjing; Khoo, Christopher Sg; Chi, Jianxing
Users searching for health information on the Internet may be searching for their own health issue, searching for someone else's health issue, or browsing with no particular health issue in mind. Previous research has found that these three categories of users focus on different types of health information. However, most health information websites provide static content for all users. If the three types of user health information need contexts can be identified by the Web application, the search results or information offered to the user can be customized to increase its relevance or usefulness to the user. The aim of this study was to investigate the possibility of identifying the three user health information contexts (searching for self, searching for others, or browsing with no particular health issue in mind) using just hyperlink clicking behavior; using eye-tracking information; and using a combination of eye-tracking, demographic, and urgency information. Predictive models are developed using multinomial logistic regression. A total of 74 participants (39 females and 35 males) who were mainly staff and students of a university were asked to browse a health discussion forum, Healthboards.com. An eye tracker recorded their examining (eye fixation) and skimming (quick eye movement) behaviors on 2 types of screens: summary result screen displaying a list of post headers, and detailed post screen. The following three types of predictive models were developed using logistic regression analysis: model 1 used only the time spent in scanning the summary result screen and reading the detailed post screen, which can be determined from the user's mouse clicks; model 2 used the examining and skimming durations on each screen, recorded by an eye tracker; and model 3 added user demographic and urgency information to model 2. An analysis of variance (ANOVA) analysis found that users' browsing durations were significantly different for the three health information contexts
Prodanova, K.; Pashkunova, S.
2010 Mathematics Subject Classification: 62P10. A prospective study of the relationship between some clinical parameters, genetic markers and complications of the patients with diabetes is considered. About 200 patients (male and female) have been examined. The patients are classified into five groups subject to the type of the diabetes. Data obtained for each patient are related to the type of the complications -- macro vascular, retina pathology, neuron pathology and nephrite pathology, ...
Lake trophic state classifications are good predictors of ecosystem condition and are indicative of both ecosystem services (e.g., recreation and aesthetics), and disservices (e.g., harmful algal blooms). Methods for classifying trophic state are based off the foundational work o...
Information and Communication Technology (ICT) is fast changing the face and tempo of the banking industry in Nigeria due to the adoption of electronic banking (e-banking). Consequently, most banks, in recent years have committed substantial investment into the development of ICT. This study examined the adoption of ...
Mielniczuk, Jan; Teisseyre, Paweł
Detection of gene-gene interactions is one of the most important challenges in genome-wide case-control studies. Besides traditional logistic regression analysis, recently the entropy-based methods attracted a significant attention. Among entropy-based methods, interaction information is one of the most promising measures having many desirable properties. Although both logistic regression and interaction information have been used in several genome-wide association studies, the relationship between them has not been thoroughly investigated theoretically. The present paper attempts to fill this gap. We show that although certain connections between the two methods exist, in general they refer two different concepts of dependence and looking for interactions in those two senses leads to different approaches to interaction detection. We introduce ordering between interaction measures and specify conditions for independent and dependent genes under which interaction information is more discriminative measure than logistic regression. Moreover, we show that for so-called perfect distributions those measures are equivalent. The numerical experiments illustrate the theoretical findings indicating that interaction information and its modified version are more universal tools for detecting various types of interaction than logistic regression and linkage disequilibrium measures. © 2017 WILEY PERIODICALS, INC.
Le, Huy; Marcus, Justin
This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…
Kvalheim, O.M.; Arneberg, R.; Bleie, O.; Rajalahti, T.; Smilde, A.K.; Westerhuis, J.A.
The quality and practical usefulness of a regression model are a function of both interpretability and prediction performance. This work presents some new graphical tools for improved interpretation of latent variable regression models that can also assist in improved algorithms for variable
Cooley, R.L.; Naff, R.L.
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Harwood, R. Corban
This article addresses the logistics of implementing projects in an undergraduate mathematics class and is intended both for new instructors and for instructors who have had negative experiences implementing projects in the past. Project implementation is given for both lower- and upper-division mathematics courses with an emphasis on mathematical…
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R 2 ) indicates the importance of independent variables in the outcome.
Birch, Kristina; Olsen, Jørgen Kai; Tjur, Tue
On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put on the interpretat......On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put...... on the interpretation of the parameters in relation to models for the total sales based on discrete choice models.Key words and phrases. MCI model, discrete choice model, market-shares, price elasitcity, regression model....
Bolfarine, Heleno; Bazan, Jorge Luis
A Bayesian inference approach using Markov Chain Monte Carlo (MCMC) is developed for the logistic positive exponent (LPE) model proposed by Samejima and for a new skewed Logistic Item Response Theory (IRT) model, named Reflection LPE model. Both models lead to asymmetric item characteristic curves (ICC) and can be appropriate because a symmetric…
Cizek, Pavel; Sadikoglu, Serhan
In this paper, an extension of the indirect inference methodology to semiparametric estimation is explored in the context of censored regression. Motivated by weak small-sample performance of the censored regression quantile estimator proposed by Powell (J Econom 32:143–155, 1986a), two- and
Das, Iswar; Sahoo, Sashikant; van Westen, Cees; Stein, Alfred; Hack, Robert
Landslide studies are commonly guided by ground knowledge and field measurements of rock strength and slope failure criteria. With increasing sophistication of GIS-based statistical methods, however, landslide susceptibility studies benefit from the integration of data collected from various sources and methods at different scales. This study presents a logistic regression method for landslide susceptibility mapping and verifies the result by comparing it with the geotechnical-based slope stability probability classification (SSPC) methodology. The study was carried out in a landslide-prone national highway road section in the northern Himalayas, India. Logistic regression model performance was assessed by the receiver operator characteristics (ROC) curve, showing an area under the curve equal to 0.83. Field validation of the SSPC results showed a correspondence of 72% between the high and very high susceptibility classes with present landslide occurrences. A spatial comparison of the two susceptibility maps revealed the significance of the geotechnical-based SSPC method as 90% of the area classified as high and very high susceptible zones by the logistic regression method corresponds to the high and very high class in the SSPC method. On the other hand, only 34% of the area classified as high and very high by the SSPC method falls in the high and very high classes of the logistic regression method. The underestimation by the logistic regression method can be attributed to the generalisation made by the statistical methods, so that a number of slopes existing in critical equilibrium condition might not be classified as high or very high susceptible zones.
Muhammad Izman Herdiansyah
Full Text Available The role of logistics nowadays is expanding from just providing transportation and warehousing to offering total integrated logistics. To remain competitive in the global market environment, business enterprises need to improve their logistics operations performance. The improvement will be achieved when we can provide a comprehensive analysis and optimize its network performances. In this paper, a mixed integer linier model for optimizing logistics network performance is developed. It provides a single-product multi-period multi-facilities model, as well as the multi-product concept. The problem is modeled in form of a network flow problem with the main objective to minimize total logistics cost. The problem can be solved using commercial linear programming package like CPLEX or LINDO. Even in small case, the solver in Excel may also be used to solve such model.Keywords: logistics network, integrated model, mathematical programming, network optimization
Lei, Yang; Nollen, Nikki; Ahluwahlia, Jasjit S; Yu, Qing; Mayo, Matthew S
Other forms of tobacco use are increasing in prevalence, yet most tobacco control efforts are aimed at cigarettes. In light of this, it is important to identify individuals who are using both cigarettes and alternative tobacco products (ATPs). Most previous studies have used regression models. We conducted a traditional logistic regression model and a classification and regression tree (CART) model to illustrate and discuss the added advantages of using CART in the setting of identifying high-risk subgroups of ATP users among cigarettes smokers. The data were collected from an online cross-sectional survey administered by Survey Sampling International between July 5, 2012 and August 15, 2012. Eligible participants self-identified as current smokers, African American, White, or Latino (of any race), were English-speaking, and were at least 25 years old. The study sample included 2,376 participants and was divided into independent training and validation samples for a hold out validation. Logistic regression and CART models were used to examine the important predictors of cigarettes + ATP users. The logistic regression model identified nine important factors: gender, age, race, nicotine dependence, buying cigarettes or borrowing, whether the price of cigarettes influences the brand purchased, whether the participants set limits on cigarettes per day, alcohol use scores, and discrimination frequencies. The C-index of the logistic regression model was 0.74, indicating good discriminatory capability. The model performed well in the validation cohort also with good discrimination (c-index = 0.73) and excellent calibration (R-square = 0.96 in the calibration regression). The parsimonious CART model identified gender, age, alcohol use score, race, and discrimination frequencies to be the most important factors. It also revealed interesting partial interactions. The c-index is 0.70 for the training sample and 0.69 for the validation sample. The misclassification
MacLellan, Christopher J.; Liu, Ran; Koedinger, Kenneth R.
Additive Factors Model (AFM) and Performance Factors Analysis (PFA) are two popular models of student learning that employ logistic regression to estimate parameters and predict performance. This is in contrast to Bayesian Knowledge Tracing (BKT) which uses a Hidden Markov Model formalism. While all three models tend to make similar predictions,…
Vozinaki, Anthi Eirini K.; Karatzas, George P.; Sibetheros, Ioannis A.; Varouchakis, Emmanouil A.
Damage curves are the most significant component of the flood loss estimation models. Their development is quite complex. Two types of damage curves exist, historical and synthetic curves. Historical curves are developed from historical loss data from actual flood events. However, due to the scarcity of historical data, synthetic damage curves can be alternatively developed. Synthetic curves rely on the analysis of expected damage under certain hypothetical flooding conditions. A synthetic approach was developed and presented in this work for the development of damage curves, which are subsequently used as the basic input to a flood loss estimation model. A questionnaire-based survey took place among practicing and research agronomists, in order to generate rural loss data based on the responders' loss estimates, for several flood condition scenarios. In addition, a similar questionnaire-based survey took place among building experts, i.e. civil engineers and architects, in order to generate loss data for the urban sector. By answering the questionnaire, the experts were in essence expressing their opinion on how damage to various crop types or building types is related to a range of values of flood inundation parameters, such as floodwater depth and velocity. However, the loss data compiled from the completed questionnaires were not sufficient for the construction of workable damage curves; to overcome this problem, a Weighted Monte Carlo method was implemented, in order to generate extra synthetic datasets with statistical properties identical to those of the questionnaire-based data. The data generated by the Weighted Monte Carlo method were processed via Logistic Regression techniques in order to develop accurate logistic damage curves for the rural and the urban sectors. A Python-based code was developed, which combines the Weighted Monte Carlo method and the Logistic Regression analysis into a single code (WMCLR Python code). Each WMCLR code execution
Sun Mi Kim
Full Text Available Purpose The aim of this study was to compare the performance of image analysis for predicting breast cancer using two distinct regression models and to evaluate the usefulness of incorporating clinical and demographic data (CDD into the image analysis in order to improve the diagnosis of breast cancer. Methods This study included 139 solid masses from 139 patients who underwent a ultrasonography-guided core biopsy and had available CDD between June 2009 and April 2010. Three breast radiologists retrospectively reviewed 139 breast masses and described each lesion using the Breast Imaging Reporting and Data System (BI-RADS lexicon. We applied and compared two regression methods-stepwise logistic (SL regression and logistic least absolute shrinkage and selection operator (LASSO regression-in which the BI-RADS descriptors and CDD were used as covariates. We investigated the performances of these regression methods and the agreement of radiologists in terms of test misclassification error and the area under the curve (AUC of the tests. Results Logistic LASSO regression was superior (P<0.05 to SL regression, regardless of whether CDD was included in the covariates, in terms of test misclassification errors (0.234 vs. 0.253, without CDD; 0.196 vs. 0.258, with CDD and AUC (0.785 vs. 0.759, without CDD; 0.873 vs. 0.735, with CDD. However, it was inferior (P<0.05 to the agreement of three radiologists in terms of test misclassification errors (0.234 vs. 0.168, without CDD; 0.196 vs. 0.088, with CDD and the AUC without CDD (0.785 vs. 0.844, P<0.001, but was comparable to the AUC with CDD (0.873 vs. 0.880, P=0.141. Conclusion Logistic LASSO regression based on BI-RADS descriptors and CDD showed better performance than SL in predicting the presence of breast cancer. The use of CDD as a supplement to the BI-RADS descriptors significantly improved the prediction of breast cancer using logistic LASSO regression.
Einav, Sharon; Alon, Gady; Kaufman, Nechama; Braunstein, Rony; Carmel, Sara; Varon, Joseph; Hersch, Moshe
To determine whether variables in physicians' backgrounds influenced their decision to forego resuscitating a patient they did not previously know. Questionnaire survey of a convenience sample of 204 physicians working in the departments of internal medicine, anaesthesiology and cardiology in 11 hospitals in Israel. Twenty per cent of the participants had elected to forego resuscitating a patient they did not previously know without additional consultation. Physicians who had more frequently elected to forego resuscitation had practised medicine for more than 5 years (p=0.013), estimated the number of resuscitations they had performed as being higher (p=0.009), and perceived their experience in resuscitation as sufficient (p=0.001). The variable that predicted the outcome of always performing resuscitation in the logistic regression model was less than 5 years of experience in medicine (OR 0.227, 95% CI 0.065 to 0.793; p=0.02). Physicians' level of experience may affect the probability of a patient's receiving resuscitation, whereas the physicians' personal beliefs and values did not seem to affect this outcome.
Full Text Available Background: The human factors are of great importance, especially Motorcycle Rider BehaviorQuestionnaire (MRBQ and attention deficit hyperactivity disorder (ADHD in motorbike riders in road traffic injuries. This study aimed to predict MRBQ score by ADHD score and the underlying predictors by the logistic quantile regression (LQR, as a new strategy.Methods: In this cross-sectional study, 311 motorbike riders were randomly sampled by a clustering method in Bukan, northwest of Iran. The data were collected by MRBQ and ADHDstandard surveys. To assess the relationship at all levels of MRBQ distribution, LQR in 5th, 25th,50th, 75th and 95th quantiles of MRBQ score was utilized to assess the predictability of ADHDscore and its subscales in addition to the underlying predictors of MRBQ score. To do this, an unadjusted and as well as adjusted 4-step hierarchical modeling was used.Results: Almost in all quantiles of MRBQ scores, direct and significant relationships were observed between MRBQ score and ADHD score and its subscales (coefficients: 0.02 to 0.10, all P < 0.05. Besides, the driving period (coefficients: -0.58 to -0.95, P < 0.05 and hour driving(coefficients: 0.42 to 0.52, P < 0.05 also came to be the significant predictors of MRBQ score.Conclusion: ADHD score and driving parameters can be taken into the consideration when planning actions on the motorcycle rider behaviors at all levels of the MRBQ.
Althuwaynee, Omar F; Pradhan, Biswajeet; Ahmad, Noordin
This article uses methodology based on chi-squared automatic interaction detection (CHAID), as a multivariate method that has an automatic classification capacity to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict rainfall-induced susceptibility map in Kuala Lumpur city and surrounding areas using geographic information system (GIS). The main objective of this article is to use CHi-squared automatic interaction detection (CHAID) method to perform the best classification fit for each conditioning factor, then, combining it with logistic regression (LR). LR model was used to find the corresponding coefficients of best fitting function that assess the optimal terminal nodes. A cluster pattern of landslide locations was extracted in previous study using nearest neighbor index (NNI), which were then used to identify the clustered landslide locations range. Clustered locations were used as model training data with 14 landslide conditioning factors such as; topographic derived parameters, lithology, NDVI, land use and land cover maps. Pearson chi-squared value was used to find the best classification fit between the dependent variable and conditioning factors. Finally the relationship between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. An area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations respectively. This study proved the efficiency and reliability of decision tree (DT) model in landslide susceptibility mapping. Also it provided a valuable scientific basis for spatial decision making in planning and urban management studies
Mao, Hui-Fen; Chang, Ling-Hui; Tsai, Athena Yi-Jung; Huang, Wen-Ni; Wang, Jye
Because resources for long-term care services are limited, timely and appropriate referral for rehabilitation services is critical for optimizing clients' functions and successfully integrating them into the community. We investigated which client characteristics are most relevant in predicting Taiwan's community-based occupational therapy (OT) service referral based on experts' beliefs. Data were collected in face-to-face interviews using the Multidimensional Assessment Instrument (MDAI). Community-dwelling participants (n = 221) ≥ 18 years old who reported disabilities in the previous National Survey of Long-term Care Needs in Taiwan were enrolled. The standard for referral was the judgment and agreement of two experienced occupational therapists who reviewed the results of the MDAI. Logistic regressions and Generalized Additive Models were used for analysis. Two predictive models were proposed, one using basic activities of daily living (BADLs) and one using instrumental ADLs (IADLs). Dementia, psychiatric disorders, cognitive impairment, joint range-of-motion limitations, fear of falling, behavioral or emotional problems, expressive deficits (in the BADL-based model), and limitations in IADLs or BADLs were significantly correlated with the need for referral. Both models showed high area under the curve (AUC) values on receiver operating curve testing (AUC = 0.977 and 0.972, respectively). The probability of being referred for community OT services was calculated using the referral algorithm. The referral protocol facilitated communication between healthcare professionals to make appropriate decisions for OT referrals. The methods and findings should be useful for developing referral protocols for other long-term care services.
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
This paper deals with the derivation of logistic models for cattle, sheep and goats in a commercial ranching system in Machakos District, Kenya, a savannah ecosystem with average annual rainfall of 589.3 ± 159.3mm and an area of 10 117ha. It involves modelling livestock population dynamics as discrete-time logistic ...
Full Text Available With the increasing demand for customized logistics services in the manufacturing industry, the key factor in realizing the competitiveness of a logistics service supply chain (LSSC is whether it can meet specific requirements with the cost of mass service. In this case, in-depth research on the time-scheduling of LSSC is required. Setting the total cost, completion time, and the satisfaction of functional logistics service providers (FLSPs as optimal targets, this paper establishes a time scheduling model of LSSC, which is constrained by the service order time requirement. Numerical analysis is conducted by using Matlab 7.0 software. The effects of the relationship cost coefficient and the time delay coefficient on the comprehensive performance of LSSC are discussed. The results demonstrate that with the time scheduling model in mass-customized logistics services (MCLSs environment, the logistics service integrator (LSI can complete the order earlier or later than scheduled. With the increase of the relationship cost coefficient and the time delay coefficient, the comprehensive performance of LSSC also increases and tends towards stability. In addition, the time delay coefficient has a better effect in increasing the LSSC’s comprehensive performance than the relationship cost coefficient does.
An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculusRegression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a
Durrleman, S; Simon, R
We describe the use of cubic splines in regression models to represent the relationship between the response variable and a vector of covariates. This simple method can help prevent the problems that result from inappropriate linearity assumptions. We compare restricted cubic spline regression to non-parametric procedures for characterizing the relationship between age and survival in the Stanford Heart Transplant data. We also provide an illustrative example in cancer therapeutics.
Cary, Theodore W.; Cwanger, Alyssa; Venkatesh, Santosh S.; Conant, Emily F.; Sehgal, Chandra M.
This study compares the performance of two proven but very different machine learners, Naïve Bayes and logistic regression, for differentiating malignant and benign breast masses using ultrasound imaging. Ultrasound images of 266 masses were analyzed quantitatively for shape, echogenicity, margin characteristics, and texture features. These features along with patient age, race, and mammographic BI-RADS category were used to train Naïve Bayes and logistic regression classifiers to diagnose lesions as malignant or benign. ROC analysis was performed using all of the features and using only a subset that maximized information gain. Performance was determined by the area under the ROC curve, Az, obtained from leave-one-out cross validation. Naïve Bayes showed significant variation (Az 0.733 +/- 0.035 to 0.840 +/- 0.029, P machine learning models for characterizing solid breast masses on ultrasound.
Tvedebrink, Torben; Eriksen, Poul Svante; Asplund, Maria
We discuss the model for estimating drop-out probabilities presented by Tvedebrink et al.  and the concerns, that have been raised. The criticism of the model has demonstrated that the model is not perfect. However, the model is very useful for advanced forensic genetic work, where allelic dro...
Fawzi, Alhussein; Fiot, Jean-Baptiste; Chen, Bei; Sinn, Mathieu; Frossard, Pascal
Additive models are regression methods which model the response variable as the sum of univariate transfer functions of the input variables. Key benefits of additive models are their accuracy and interpretability on many real-world tasks. Additive models are however not adapted to problems involving a large number (e.g., hundreds) of input variables, as they are prone to overfitting in addition to losing interpretability. In this paper, we introduce a novel framework for applying additive ...
Manfredini, Daniele; Perinetti, Giuseppe; Guarda-Nardini, Luca
To assess the association of several dental malocclusion features with temporomandibular joint (TMJ) click sounds in a population of temporomandibular disorder (TMD) patients. Four hundred forty-two TMD patients (72% female; 32.2 ± 5.7 years, range 25-44 years) were divided into a TMJ clicking and a no-TMJ clicking group, based on the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD) assessment. Seven occlusal features were recorded for each patient: (1) posterior crossbite, (2) overbite, (3) open bite, (4) overjet, (5) mediotrusive and (6) laterotrusive interferences and (7) retruded contact position to maximum intercuspation (RCP-MI) slide length. A logistic regression model was created to estimate the association of occlusal features with TMJ clicking. The difference between the groups as for the prevalence of the various occlusal features was generally not statistically significant, with minor exceptions. Mediotrusive interferences (P = .015) and RCP-MI slide ≥2 mm (P = .001) were the two occlusal features that were associated with the probability of having TMJ clicking, even if the adjusted odds ratios for TMJ clicking were low for both variables (1.63 and 1.89, respectively). Moreover, the amount of variance in the prevalence of TMJ clicking that was predicted by the final model was as low as 4.5% (R(2) = 0.045). Findings from the present investigation suggested that in a population of TMD patients, the contribution of dental malocclusion features to predict TMJ click sounds is minimal with no clinical relevance.
Full Text Available Introduction & Objective: Asthma is a chronic inflammatory airway disease. Asthma affects one in 13 school age children and is a leading cause of school absenteeism. It seems that prevalence of asthma is increasing wordwide. Many factors are identified and reported as factors related to asthma. This study was carried out to determine the prevalence of asthma and associated factors in 600 children under six years using logistic regression and probit. Materials & Methods: This cross-sectional study was conducted on 600 children under six years old. Questionnaire was constructed based on ISSAC questionnaire and its reliability was determined with a pilot study and calculated by the Cronbach's alpha equal to 69 percent. Cluster sampling based on household records as clusters was performed. Questionnaires were completed by trained staff under supervision of an expert person and by interviewing parents and children. Results: The prevalence of asthma was estimated to be 3.10 (7.89 to 12.78 percent. Based on fitting models to data, factors such as gender, maternal nutrition, exclusive breast feeding to 6 months, smoking at home by a family member and having a history of respiratory allergy in families were significantly associated with asthma prevalence (p-value ≤ 0.05. The results also demonstrated that the both models are almost identical in evaluating the data. Conclusion: This study showed that estimated asthma prevalence is equal to average prevalence reported in Iran. Protective factors, such as exclusive breast feeding as a strategy can be appropriated in children's health care programs and should be much more considered.
Chen, Hui-Fang; Jin, Kuan-Yu; Wang, Wen-Chung
Extreme response styles (ERS) is prevalent in Likert- or rating-type data but previous research has not well-addressed their impact on differential item functioning (DIF) assessments. This study aimed to fill in the knowledge gap and examined their influence on the performances of logistic regression (LR) approaches in DIF detections, including the ordinal logistic regression (OLR) and the logistic discriminant functional analysis (LDFA). Results indicated that both the standard OLR and LDFA yielded severely inflated false positive rates as the magnitude of the differences in ERS increased between two groups. This study proposed a class of modified LR approaches to eliminating the ERS effect on DIF assessment. These proposed modifications showed satisfactory control of false positive rates when no DIF items existed and yielded a better control of false positive rates and more accurate true positive rates under DIF conditions than the conventional LR approaches did. In conclusion, the proposed modifications are recommended in survey research when there are multiple group or cultural groups.
Ellis, J.L.; Ysebaert, T.; Hume, T.; Norkko, A.; Bult, T.P.; Herman, P.; Thrush, S.; Oldman, J.
There is a growing need to predict ecological responses to long-term habitat change. However, statistical models for marine soft-substratum ecosystems are limited, and consequently there is a need for the development of such models. In order to assess the utility of statistical modelling approaches
Ellis, J.; Ysebaert, T.; Hume, T.; Norkko, A.; Bult, T.; Herman, P.M.J.; Thrush, S.; Oldman, J.
There is a growing need to predict ecological responses to long-term habitat change. However, statistical models for marine soft-substratum ecosystems are limited, and consequently there is a need for the development of such models. In order to assess the utility of statistical modelling approaches
Full Text Available Abstract Background Body mass index (BMI data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs, quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS. We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M
Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
Dieu Tien Bui
Full Text Available The Cat Ba National Park area (Vietnam with its tropical forest is recognized as being part of the world biodiversity conservation by the United Nations Educational, Scientific and Cultural Organization (UNESCO and is a well-known destination for tourists, with around 500,000 travelers per year. This area has been the site for many research projects; however, no project has been carried out for forest fire susceptibility assessment. Thus, protection of the forest including fire prevention is one of the main concerns of the local authorities. This work aims to produce a tropical forest fire susceptibility map for the Cat Ba National Park area, which may be helpful for the local authorities in forest fire protection management. To obtain this purpose, first, historical forest fires and related factors were collected from various sources to construct a GIS database. Then, a forest fire susceptibility model was developed using Kernel logistic regression. The quality of the model was assessed using the Receiver Operating Characteristic (ROC curve, area under the ROC curve (AUC, and five statistical evaluation measures. The usability of the resulting model is further compared with a benchmark model, the support vector machine (SVM. The results show that the Kernel logistic regression model has a high level of performance in both the training and validation dataset, with a prediction capability of 92.2%. Since the Kernel logistic regression model outperforms the benchmark model, we conclude that the proposed model is a promising alternative tool that should also be considered for forest fire susceptibility mapping in other areas. The results of this study are useful for the local authorities in forest planning and management.
Scheike, Thomas Harder; Martinussen, Torben
Dynamic additive regression models provide a flexible class of models for analysis of longitudinal data. The approach suggested in this work is suited for measurements obtained at random time points and aims at estimating time-varying effects. Both fully nonparametric and semiparametric models can...... in special cases. We investigate the finite sample properties of the estimators and conclude that the asymptotic results are valid for even samll samples....
Ugurlu, Celal Teyyar
This study aims to analyze the administration perception of the teachers according to values in line with certain parameters. The model of the research is relational screening model. The population is applied to 470 teachers who work in 25 secondary schools at the center of Sivas with scales. 317 questionnaires which had been returned have been…
Rodríguez-Girondo, Mar; Kneib, Thomas; Cadarso-Suárez, Carmen; Abu-Assi, Emad
Recent developments of statistical methods allow for a very flexible modeling of covariates affecting survival times via the hazard rate, including also the inspection of possible time-dependent associations. Despite their immediate appeal in terms of flexibility, these models typically introduce additional difficulties when a subset of covariates and the corresponding modeling alternatives have to be chosen, that is, for building the most suitable model for given data. This is particularly true when potentially time-varying associations are given. We propose to conduct a piecewise exponential representation of the original survival data to link hazard regression with estimation schemes based on of the Poisson likelihood to make recent advances for model building in exponential family regression accessible also in the nonproportional hazard regression context. A two-stage stepwise selection approach, an approach based on doubly penalized likelihood, and a componentwise functional gradient descent approach are adapted to the piecewise exponential regression problem. These three techniques were compared via an intensive simulation study. An application to prognosis after discharge for patients who suffered a myocardial infarction supplements the simulation to demonstrate the pros and cons of the approaches in real data analyses. Copyright © 2013 John Wiley & Sons, Ltd.
Kanbayashi, Yuko; Ishikawa, Takeshi; Kanazawa, Motohiro; Nakajima, Yuki; Kawano, Rumi; Tabuchi, Yusuke; Yoshioka, Tomoko; Ihara, Norihiko; Hosokawa, Toyoshi; Takayama, Koichi; Shikata, Keisuke; Taguchi, Tetsuya
Although pegfilgrastim prophylaxis is expected to maintain the relative dose intensity (RDI) of chemotherapy and improve safety, information is limited. However, the optimal selection of patients eligible for pegfilgrastim prophylaxis is an important issue from a medical economics viewpoint. Therefore, this retrospective study identified factors that could predict these eligible patients to maintain the RDI. The participants included 166 cancer patients undergoing pegfilgrastim prophylaxis combined with chemotherapy in our outpatient chemotherapy center between March 2015 and April 2017. Variables were extracted from clinical records for regression analysis of factors related to maintenance of the RDI. RDI was classified into four categories: 100% = 0, 85% or predictive factors in patients eligible for pegfilgrastim prophylaxis to maintain the RDI. Threshold measures were examined using a receiver operating characteristic (ROC) analysis curve. Age [odds ratio (OR) 1.07, 95% confidence interval (CI) 1.04-1.11; P maintenance. ROC curve analysis of the group that failed to maintain the RDI indicated that the threshold for age was 70 years and above, with a sensitivity of 60.0% and specificity of 80.2% (area under the curve: 0.74). In conclusion, younger age, anemia (less), and administration of pegfilgrastim 24-72 h after chemotherapy were significant factors for RDI maintenance.
Berlin, Conny; Blanch, Carles; Lewis, David J; Maladorno, Dionigi D; Michel, Christiane; Petrin, Michael; Sarp, Severine; Close, Philippe
The detection of safety signals with medicines is an essential activity to protect public health. Despite widespread acceptance, it is unclear whether recently applied statistical algorithms provide enhanced performance characteristics when compared with traditional systems. Novartis has adopted a novel system for automated signal detection on the basis of disproportionality methods within a safety data mining application (Empirica™ Signal System [ESS]). ESS uses two algorithms for routine analyses: empirical Bayes Multi-item Gamma Poisson Shrinker and logistic regression (LR). A model was developed comprising 14 medicines, categorized as "new" or "established." A standard was prepared on the basis of safety findings selected from traditional sources. ESS results were compared with the standard to calculate the positive predictive value (PPV), specificity, and sensitivity. PPVs of the lower one-sided 5% and 0.05% confidence limits of the Bayes geometric mean (EB05) and of the LR odds ratio (LR0005) almost coincided for all the drug-event combinations studied. There was no obvious difference comparing the PPV of the leading Medical Dictionary for Regulatory Activities (MedDRA) terms to the PPV for all terms. The PPV of narrow MedDRA query searches was higher than that for broad searches. The widely used threshold value of EB05 = 2.0 or LR0005 = 2.0 together with more than three spontaneous reports of the drug-event combination produced balanced results for PPV, sensitivity, and specificity. Consequently, performance characteristics were best for leading terms with narrow MedDRA query searches irrespective of applying Multi-item Gamma Poisson Shrinker or LR at a threshold value of 2.0. This research formed the basis for the configuration of ESS for signal detection at Novartis. Copyright © 2011 John Wiley & Sons, Ltd.
Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
Smith, Kelly; Gay, Robert; Stachowiak, Susan
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles
Heylen, Kris; Geeraerts, Dirk
When data consist of grouped observations or clusters, and there is a risk that measurements within the same group are not independent, group-specific random effects can be added to a regression model in order to account for such within-group associations. Regression models that contain such group-specific random effects are called mixed-effects regression models, or simply mixed models. Mixed models are a versatile tool that can handle both balanced and unbalanced datasets and that can also be applied when several layers of grouping are present in the data; these layers can either be nested or crossed. In linguistics, as in many other fields, the use of mixed models has gained ground rapidly over the last decade. This methodological evolution enables us to build more sophisticated and arguably more realistic models, but, due to its technical complexity, also introduces new challenges. This volume brings together a number of promising new evolutions in the use of mixed models in linguistics, but also addres...
Pajouheshnia, R.; Pestman, W.R.; Teerenstra, S.; Groenwold, R.H.
BACKGROUND: It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in
Liu, Min; Lin, Tsung-I
A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…
The objective is to minimize the processing time and computer memory required .... Survey. 65 time to acquire extra GPR or seismic data for large sites and picking the first arrival time. 66 to provide the needed datasets for the joint inversion are also .... The data utilized for the regression modelling was acquired from ground.
Pape Sarah A
Full Text Available Abstract Background Laser-Doppler imaging (LDI of cutaneous blood flow is beginning to be used by burn surgeons to predict the healing time of burn wounds; predicted healing time is used to determine wound treatment as either dressings or surgery. In this paper, we do a statistical analysis of the performance of the technique. Methods We used data from a study carried out by five burn centers: LDI was done once between days 2 to 5 post burn, and healing was assessed at both 14 days and 21 days post burn. Random-effects ordinal logistic regression and other models such as the continuation ratio model were used to model healing-time as a function of the LDI data, and of demographic and wound history variables. Statistical methods were also used to study the false-color palette, which enables the laser-Doppler imager to be used by clinicians as a decision-support tool. Results Overall performance is that diagnoses are over 90% correct. Related questions addressed were what was the best blood flow summary statistic and whether, given the blood flow measurements, demographic and observational variables had any additional predictive power (age, sex, race, % total body surface area burned (%TBSA, site and cause of burn, day of LDI scan, burn center. It was found that mean laser-Doppler flux over a wound area was the best statistic, and that, given the same mean flux, women recover slightly more slowly than men. Further, the likely degradation in predictive performance on moving to a patient group with larger %TBSA than those in the data sample was studied, and shown to be small. Conclusion Modeling healing time is a complex statistical problem, with random effects due to multiple burn areas per individual, and censoring caused by patients missing hospital visits and undergoing surgery. This analysis applies state-of-the art statistical methods such as the bootstrap and permutation tests to a medical problem of topical interest. New medical findings are
Armstrong, Ben G; Gasparrini, Antonio; Tobias, Aurelio
The time stratified case cross-over approach is a popular alternative to conventional time series regression for analysing associations between time series of environmental exposures (air pollution, weather) and counts of health outcomes. These are almost always analyzed using conditional logistic regression on data expanded to case-control (case crossover) format, but this has some limitations. In particular adjusting for overdispersion and auto-correlation in the counts is not possible. It has been established that a Poisson model for counts with stratum indicators gives identical estimates to those from conditional logistic regression and does not have these limitations, but it is little used, probably because of the overheads in estimating many stratum parameters. The conditional Poisson model avoids estimating stratum parameters by conditioning on the total event count in each stratum, thus simplifying the computing and increasing the number of strata for which fitting is feasible compared with the standard unconditional Poisson model. Unlike the conditional logistic model, the conditional Poisson model does not require expanding the data, and can adjust for overdispersion and auto-correlation. It is available in Stata, R, and other packages. By applying to some real data and using simulations, we demonstrate that conditional Poisson models were simpler to code and shorter to run than are conditional logistic analyses and can be fitted to larger data sets than possible with standard Poisson models. Allowing for overdispersion or autocorrelation was possible with the conditional Poisson model but when not required this model gave identical estimates to those from conditional logistic regression. Conditional Poisson regression models provide an alternative to case crossover analysis of stratified time series data with some advantages. The conditional Poisson model can also be used in other contexts in which primary control for confounding is by fine
Alexandrescu, Roxana; Jen, Min-Hua; Bottle, Alex; Jarman, Brian; Aylin, Paul
Although logistic regression is traditionally used to calculate hospital standardized mortality ratio (HSMR), it ignores the hierarchical structure of the data that can exist within a given database. Hierarchical models allow examination of the effect of data clustering on outcomes. Traditional logistic regression and random intercepts fixed slopes hierarchical models were fitted to a dataset of patients hospitalized between 2005 and 2007 in Massachusetts. We compared the observed to expected (O/E) in-hospital death ratios between the 2 modeling techniques, a restricted HSMR using only those diagnosis models that converged in both methods and a full hybrid HSMR using a combination of the hierarchical diagnosis models when they converge, plus the remaining diagnoses using standard logistic regression models. We restricted the analysis to the 36 diagnoses accounting for 80% of in-hospital deaths nationally, based on 1,043,813 admissions (59 hospitals). A failure of the hierarchical models to converge in 15 of 36 diagnosis groups hindered full HSMR comparisons. A restricted HSMR, derived from a dataset based on the 21 diagnosis groups that converged (552,933 admissions) showed very high correlation (Pearson r = 0.99). Both traditional logistic regression and hierarchical model identified 12 statistical outliers in common, 7 with high O/E values and 5 with low O/E values. In addition, the multilevel analysis identified 5 additional unique high outliers and 1 additional unique low outlier, and the conventional model identified 2 additional unique low outliers. Similar results were obtained from the 2 modeling techniques in terms of O/E ratios. However, because a hierarchical model is associated with convergence problems, traditional logistic regression remains our recommended procedure for computing HSMRs. Copyright © 2011 American College of Surgeons. Published by Elsevier Inc. All rights reserved.
Managan, Julie E
The Compact Muon Solenoid Experiment (CMS) at the Large Hadron Collider (LHC) at CERN has brilliant prospects for uncovering new information about the physical structure of our universe. Soon physicists around the world will participate together in analyzing CMS data in search of new physics phenomena and the Higgs Boson. However, they face a significant problem: with 5 Petabytes of data needing distribution each year, how will physicists get the data they need? How and where will they be able to analyze it? Computing resources and scientists are scattered around the world, while CMS data exists in localized chunks. The CMS computing model only allows analysis of locally stored data, “tethering” analysis to storage. The Vanderbilt CMS team is actively working to solve this problem with the Research and Education Data Depot Network (REDDnet), a program run by Vanderbilt’s Advanced Computing Center for Research and Education (ACCRE). The Compact Muon Solenoid Experiment (CMS) at the Large Hadron Collider ...
JELO is an Excel spreadsheet model of joint expeditionary logistics operations and allows end-to-end analysis of the options for closing forces from CONUS, through the sea base, to objectives ashore...
Chandler, T L; Pralle, R S; Dórea, J R R; Poock, S E; Oetzel, G R; Fourdraine, R H; White, H M
Although cowside testing strategies for diagnosing hyperketonemia (HYK) are available, many are labor intensive and costly, and some lack sufficient accuracy. Predicting milk ketone bodies by Fourier transform infrared spectrometry during routine milk sampling may offer a more practical monitoring strategy. The objectives of this study were to (1) develop linear and logistic regression models using all available test-day milk and performance variables for predicting HYK and (2) compare prediction methods (Fourier transform infrared milk ketone bodies, linear regression models, and logistic regression models) to determine which is the most predictive of HYK. Given the data available, a secondary objective was to evaluate differences in test-day milk and performance variables (continuous measurements) between Holsteins and Jerseys and between cows with or without HYK within breed. Blood samples were collected on the same day as milk sampling from 658 Holstein and 468 Jersey cows between 5 and 20 d in milk (DIM). Diagnosis of HYK was at a serum β-hydroxybutyrate (BHB) concentration ≥1.2 mmol/L. Concentrations of milk BHB and acetone were predicted by Fourier transform infrared spectrometry (Foss Analytical, Hillerød, Denmark). Thresholds of milk BHB and acetone were tested for diagnostic accuracy, and logistic models were built from continuous variables to predict HYK in primiparous and multiparous cows within breed. Linear models were constructed from continuous variables for primiparous and multiparous cows within breed that were 5 to 11 DIM or 12 to 20 DIM. Milk ketone body thresholds diagnosed HYK with 64.0 to 92.9% accuracy in Holsteins and 59.1 to 86.6% accuracy in Jerseys. Logistic models predicted HYK with 82.6 to 97.3% accuracy. Internally cross-validated multiple linear regression models diagnosed HYK of Holstein cows with 97.8% accuracy for primiparous and 83.3% accuracy for multiparous cows. Accuracy of Jersey models was 81.3% in primiparous and 83
Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua
This paper studies the influence diagnostics in meta-regression model including case deletion diagnostic and local influence analysis. We derive the subset deletion formulae for the estimation of regression coefficient and heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are considered, respectively, to derive the results. Internal and external residual and leverage measure are defined. The local influence analysis based on case-weights perturbation scheme, responses perturbation scheme, covariate perturbation scheme, and within-variance perturbation scheme are explored. We introduce a method by simultaneous perturbing responses, covariate, and within-variance to obtain the local influence measure, which has an advantage of capable to compare the influence magnitude of influential studies from different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.
Full Text Available Proposed linear and nonlinear regression models, which take into account the equation of trend and seasonality indices for the analysis and restore the volume of passenger traffic over the past period of time and its prediction for future years, as well as the algorithm of formation of these models based on statistical analysis over the years. The desired model is the first step for the synthesis of more complex models, which will enable forecasting of passenger (income level airline with the highest accuracy and time urgency.
Li, Saijiao; He, Aiyan; Yang, Jing; Yin, TaiLang; Xu, Wangming
To investigate factors that can affect compliance with treatment of polycystic ovary syndrome (PCOS) in infertile patients and to provide a basis for clinical treatment, specialist consultation and health education. Patient compliance was assessed via a questionnaire based on the Morisky-Green test and the treatment principles of PCOS. Then interviews were conducted with 99 infertile patients diagnosed with PCOS at Renmin Hospital of Wuhan University in China, from March to September 2009. Finally, these data were analyzed using logistic regression analysis. Logistic regression analysis revealed that a total of 23 (25.6%) of the participants showed good compliance. Factors that significantly (p < 0.05) affected compliance with treatment were the patient's body mass index, convenience of medical treatment and concerns about adverse drug reactions. Patients who are obese, experience inconvenient medical treatment or are concerned about adverse drug reactions are more likely to exhibit noncompliance. Treatment education and intervention aimed at these patients should be strengthened in the clinic to improve treatment compliance. Further research is needed to better elucidate the compliance behavior of patients with PCOS.
Full Text Available Logistics is an interdisciplinary field of study. Modern logisticians need to integrate business management and administration skills with technology design, IT systems and other engineering fields. However, based on research of university curricula and competence standards in logistics, the engineering aspect is not represented to full potential. There are some treatments of logistician competences which relate to engineering, but not a modernized one with wide-spread recognition. This paper aims to explain the situation from the conceptual development point of view and suggests a competence profile for “logistics system engineer”, which introduces the viewpoint of systems engineering into logistics. For that purpose, the paper analyses requirements of various topical competence models and merges the introductory competences of systems engineering into logistics. In current interpretation, logistics systems engineering view integrates networks, technologies and ICT, process and service design and offers broader interdisciplinary approach. Another term suitable for this field would be intelligent logistics. The practical implication of such a competence profile is to utilize it in curriculum development and also present it as an occupational standard. The academic relevance of such concept is to offer a specific way to differentiate education in logistics.
Full Text Available This paper describes the structure of the logistic maturity model (LMM in detail and shows the possible improvements that can be achieved by using this model in terms of the identification of the most appropriate actions to be taken in order to increase the performance of the logistics processes in industrial companies. The paper also gives an example of the LMM’s application to a famous Italian female fashion firm, which decided to use the model as a guideline for the optimization of its supply chain. Relying on a 5-level maturity staircase, specific achievement indicators as well as key performance indicators and best practices are defined and related to each logistics area/process/sub-process, allowing any user to easily and rapidly understand the more critical logistical issues in terms of process immaturity.
Slamet, I.; Nugroho, N. F. T. A.; Muslich
In this research, we applied geographically weighted regression (GWR) for analyzing the poverty in Central Java. We consider Gaussian Kernel as weighted function. The GWR uses the diagonal matrix resulted from calculating kernel Gaussian function as a weighted function in the regression model. The kernel weights is used to handle spatial effects on the data so that a model can be obtained for each location. The purpose of this paper is to model of poverty percentage data in Central Java province using GWR with Gaussian kernel weighted function and to determine the influencing factors in each regency/city in Central Java province. Based on the research, we obtained geographically weighted regression model with Gaussian kernel weighted function on poverty percentage data in Central Java province. We found that percentage of population working as farmers, population growth rate, percentage of households with regular sanitation, and BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. In this research, we found the determination coefficient R2 are 68.64%. There are two categories of district which are influenced by different of significance factors.
Gakis, Konstantinos; Pardalos, Panos
Focused on the logistics and transportation operations within a supply chain, this book brings together the latest models, algorithms, and optimization possibilities. Logistics and transportation problems are examined within a sustainability perspective to offer a comprehensive assessment of environmental, social, ethical, and economic performance measures. Featured models, techniques, and algorithms may be used to construct policies on alternative transportation modes and technologies, green logistics, and incentives by the incorporation of environmental, economic, and social measures. Researchers, professionals, and graduate students in urban regional planning, logistics, transport systems, optimization, supply chain management, business administration, information science, mathematics, and industrial and systems engineering will find the real life and interdisciplinary issues presented in this book informative and useful.
... Laboratory, Logistics Research Division, Logistics Readiness Branch to propose a research agenda entitled, "Models, Web-based Simulations, and Integrated Analysis Techniques for Improved Logistical Performance...
Knafl, George J
This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible