WorldWideScience

Sample records for models logistic regression

  1. Logistic regression.

    Science.gov (United States)

    Nick, Todd G; Campbell, Kathleen M

    2007-01-01

    The Medical Subject Headings (MeSH) thesaurus used by the National Library of Medicine defines logistic regression models as "statistical models which describe the relationship between a qualitative dependent variable (that is, one which can take only certain discrete values, such as the presence or absence of a disease) and an independent variable." Logistic regression models are used to study the effects of predictor variables on categorical outcomes, and normally the outcome is binary, such as presence or absence of disease (e.g., non-Hodgkin's lymphoma), in which case the model is called a binary logistic model. When there are multiple predictors (e.g., risk factors and treatments) the model is referred to as a multiple or multivariable logistic regression model and is one of the most frequently used statistical models in medical journals. In this chapter, we examine both simple and multiple binary logistic regression models and present related issues, including interaction, categorical predictor variables, continuous predictor variables, and goodness of fit.
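
    The chapter itself is not reproduced here, but a minimal sketch of the simple and multiple binary logistic models it describes can be written with statsmodels; the data, variable names and coefficients below are purely illustrative assumptions, not the chapter's example.

    ```python
    # Minimal sketch: simple and multiple binary logistic regression on synthetic data.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 500
    df = pd.DataFrame({
        "age": rng.normal(60, 10, n),        # continuous predictor
        "treatment": rng.integers(0, 2, n),  # categorical predictor (0/1)
    })
    # hypothetical true model used only to simulate a binary disease outcome
    eta = -10 + 0.15 * df["age"] - 0.8 * df["treatment"]
    df["disease"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))

    simple = smf.logit("disease ~ age", data=df).fit(disp=False)                      # simple model
    multiple = smf.logit("disease ~ age * C(treatment)", data=df).fit(disp=False)     # multiple model with interaction
    print(multiple.summary())                      # coefficients are log-odds
    print("Pseudo R-squared:", multiple.prsquared) # one crude goodness-of-fit summary
    ```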

  2. An Application on Multinomial Logistic Regression Model

    Directory of Open Access Journals (Sweden)

    Abdalla M El-Habil

    2012-03-01

    Full Text Available This study aims to identify an application of the multinomial logistic regression model, which is one of the important methods for categorical data analysis. This model deals with one response variable that has more than two categories, whether nominal or ordinal. The model has been applied in data analysis in many areas, for example health, social, behavioral, and educational research. To illustrate the model in a practical way, we used real data on physical violence against children from the Survey of Youth 2003, which was conducted by the Palestinian Central Bureau of Statistics (PCBS). A segment of the population of children in the age group 10-14 years residing in Gaza governorate, of size 66,935, was selected, and the response variable consisted of four categories. Eighteen explanatory variables were used for building the primary multinomial logistic regression model. The model was tested through a set of statistical tests to ensure its appropriateness for the data. The model was also tested by randomly selecting two observations from the data and predicting the group in which each observation would be classified, given the values of the explanatory variables used. We concluded that, using the multinomial logistic regression model, we are able to define accurately the relationship between the group of explanatory variables and the response variable, identify the effect of each of the variables, and predict the classification of any individual case.
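
    A minimal multinomial logistic regression sketch with scikit-learn is given below; the variables are randomly generated illustrations, not the PCBS survey fields used in the study.

    ```python
    # Multinomial (softmax) logistic regression on a four-category response.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 5))        # five explanatory variables
    y = rng.integers(0, 4, size=300)     # a response with four categories

    # With the default lbfgs solver, a multi-class target is fitted as a multinomial model.
    model = LogisticRegression(max_iter=1000).fit(X, y)
    print(model.coef_.shape)                    # one coefficient vector per category: (4, 5)
    print(model.predict(X[:2]))                 # classify two observations, as in the paper's check
    print(model.predict_proba(X[:2]).round(3))  # predicted category probabilities
    ```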

  3. Logistic Regression Model on Antenna Control Unit Autotracking Mode

    Science.gov (United States)

    2015-10-20

    412TW-PA-15240: Logistic Regression Model on Antenna Control Unit Autotracking Mode. Daniel T. Laird, Air Force Test Center, Edwards AFB, CA (October 2015). This paper presents an antenna autotracking model using logistic regression modeling, with an example of testing an alternative hypothesis.

  4. Combining logistic regression and neural networks to create predictive models.

    OpenAIRE

    Spackman, K. A.

    1992-01-01

    Neural networks are being used widely in medicine and other areas to create predictive models from data. The statistical method that most closely parallels neural networks is logistic regression. This paper outlines some ways in which neural networks and logistic regression are similar, shows how a small modification of logistic regression can be used in the training of neural network models, and illustrates the use of this modification for variable selection and predictive model building wit...

  5. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    Science.gov (United States)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model represents the relationship between independent variables and a dependent variable. In the logistic regression model the dependent variable is categorical and is used to calculate odds. When the dependent variable has ordered levels, the logistic regression model is ordinal. The GWOLR model is an ordinal logistic regression model influenced by the geographical location of the observation sites. Parameter estimation in the model is needed to determine population values on the basis of a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The result is a local GWOLR model for each village, which gives the probability of each category of the number of dengue fever patients.
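
    As a rough sketch of the "geographically weighted" idea, the snippet below fits a separate, distance-weighted binary logistic regression at each location using a Gaussian kernel; the paper's model is ordinal (GWOLR), which would replace the fitter with an ordered-logit model, and the coordinates, bandwidth and data here are invented.

    ```python
    # Simplified geographically weighted logistic regression via kernel sample weights.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    coords = rng.uniform(0, 10, size=(144, 2))   # e.g. one point per village
    X = rng.normal(size=(144, 3))
    y = rng.integers(0, 2, size=144)
    bandwidth = 2.0

    local_coefs = []
    for c in coords:
        d = np.linalg.norm(coords - c, axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)                      # Gaussian spatial kernel weights
        m = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
        local_coefs.append(m.coef_.ravel())                          # locally varying coefficients

    print(np.array(local_coefs).shape)    # (144, 3): one coefficient set per location
    ```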

  6. Model performance analysis and model validation in logistic regression

    Directory of Open Access Journals (Sweden)

    Rosa Arboretti Giancristofaro

    2007-10-01

    Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. First, we give a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.

  7. SMOOTH TRANSITION LOGISTIC REGRESSION MODEL TREE

    OpenAIRE

    RODRIGO PINTO MOREIRA

    2008-01-01

    The main objective of this work is to adapt the STR-Tree model, which is the combination of a Smooth Transition Regression model with Classification and Regression Trees (CART), in order to use it for classification. To this end, some changes were made to its structural form and to the estimation. Because we are classifying binary dependent variables, the techniques employed in logistic regression are required; accordingly, the estimation of the pa...

  8. Interpreting parameters in the logistic regression model with random effects

    DEFF Research Database (Denmark)

    Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben

    2000-01-01

    interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects

  9. Credit Scoring Model Hybridizing Artificial Intelligence with Logistic Regression

    Directory of Open Access Journals (Sweden)

    Han Lu

    2013-01-01

    Full Text Available Today the most commonly used techniques for credit scoring are artificial intelligence and statistics. In this paper, we propose a new way to use these two kinds of models. By using logistic regression to filter out variables with a high degree of correlation, artificial intelligence models reduce complexity and accelerate convergence, while these models, hybridized with logistic regression, have better explanations of statistical significance, thus improving the effectiveness of artificial intelligence models. With experiments on the German data set, we find an interesting phenomenon, defined as 'dimensional interference', with the support vector machine, and cross-validation shows that the new method gives considerable help with credit scoring.
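
    A rough sketch of this hybrid idea is given below: logistic regression is used to screen variables, and an artificial-intelligence model (here an SVM) is trained on the reduced set. The data, the 0.2 p-value screening threshold and the cross-validation setup are placeholders, not the paper's German data experiment.

    ```python
    # Hybrid sketch: logistic-regression variable screening followed by an SVM.
    import numpy as np
    import statsmodels.api as sm
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    X = rng.normal(size=(1000, 20))
    y = rng.integers(0, 2, size=1000)

    logit_fit = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
    keep = np.where(np.asarray(logit_fit.pvalues)[1:] < 0.2)[0]   # keep predictors that look informative
    if len(keep) == 0:
        keep = np.arange(X.shape[1])                              # fall back to all variables
    print("variables kept:", keep)

    scores = cross_val_score(SVC(kernel="rbf"), X[:, keep], y, cv=5)
    print("cross-validated accuracy:", scores.mean().round(3))
    ```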

  10. Logistic regression for risk factor modelling in stuttering research.

    Science.gov (United States)

    Reed, Phil; Wu, Yaqionq

    2013-06-01

    To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.

  11. Geographically Weighted Logistic Regression Applied to Credit Scoring Models

    Directory of Open Access Journals (Sweden)

    Pedro Henrique Melo Albuquerque

    Full Text Available Abstract This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC), granted to clients residing in the Distrito Federal (DF), to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR) techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters), with it being concluded that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.

  12. Applied logistic regression

    CERN Document Server

    Hosmer, David W; Sturdivant, Rodney X

    2013-01-01

     A new edition of the definitive guide to logistic regression modeling for health science and other applications This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-

  13. On modified skew logistic regression model and its applications

    Directory of Open Access Journals (Sweden)

    C. Satheesh Kumar

    2015-12-01

    Full Text Available Here we consider a modified form of the logistic regression model useful for situations where the dependent variable is dichotomous in nature and the explanatory variables exhibit asymmetric and multimodal behaviour. The proposed model has been fitted to a real-life data set by the method of maximum likelihood estimation, and its usefulness is illustrated in certain medical applications.

  14. Sugarcane Land Classification with Satellite Imagery using Logistic Regression Model

    Science.gov (United States)

    Henry, F.; Herwindiati, D. E.; Mulyono, S.; Hendryli, J.

    2017-03-01

    This paper discusses the classification of sugarcane plantation area from Landsat-8 satellite imagery. The classification process uses the binary logistic regression method with time series data of the normalized difference vegetation index as input. The process is divided into two steps: training and classification. The purpose of the training step is to identify the best parameters of the regression model using the gradient descent algorithm. The best fit of the model can then be used to classify sugarcane and non-sugarcane area. The experiment shows high accuracy and successfully maps the sugarcane plantation area, with a best result of Cohen's Kappa value 0.7833 (strong) and 89.167% accuracy.
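
    A minimal NumPy sketch of the training step described above is shown next: fitting a binary logistic regression by gradient descent on NDVI-like time-series features. The data, learning rate and iteration count are placeholders, not the Landsat-8 experiment.

    ```python
    # Binary logistic regression trained by plain gradient descent.
    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(200, 12))          # e.g. 12 NDVI values per pixel time series
    y = rng.integers(0, 2, size=200)        # 1 = sugarcane, 0 = non-sugarcane
    Xb = np.hstack([np.ones((200, 1)), X])  # add an intercept column

    w = np.zeros(Xb.shape[1])
    lr = 0.1
    for _ in range(2000):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))   # logistic (sigmoid) probabilities
        grad = Xb.T @ (p - y) / len(y)      # gradient of the negative log-likelihood
        w -= lr * grad                      # gradient descent update

    pred = (1.0 / (1.0 + np.exp(-Xb @ w)) >= 0.5).astype(int)
    print("training accuracy:", (pred == y).mean())
    ```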

  15. APPLYING LOGISTIC REGRESSION MODEL TO THE EXAMINATION RESULTS DATA

    Directory of Open Access Journals (Sweden)

    Goutam Saha

    2011-01-01

    Full Text Available The binary logistic regression model is used to analyze the school examination results (scores) of 1002 students. The analysis is performed on the basis of the independent variables viz. gender, medium of instruction, type of schools, category of schools, board of examinations and location of schools, where scores or marks are assumed to be dependent variables. The odds ratio analysis compares the scores obtained in two examinations, viz. matriculation and higher secondary.

  16. MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION

    Science.gov (United States)

    Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR), which does not require ...

  17. [Understanding logistic regression].

    Science.gov (United States)

    El Sanharawi, M; Naudet, F

    2013-10-01

    Logistic regression is one of the most common multivariate analysis models utilized in epidemiology. It allows the measurement of the association between the occurrence of an event (qualitative dependent variable) and factors likely to influence it (explanatory variables). The choice of explanatory variables that should be included in the logistic regression model is based on prior knowledge of the disease pathophysiology and the statistical association between the variable and the event, as measured by the odds ratio. The main steps for the procedure, the conditions of application, and the essential tools for its interpretation are discussed concisely. We also discuss the importance of the choice of variables that must be included and retained in the regression model in order to avoid the omission of important confounding factors. Finally, by way of illustration, we provide an example from the literature, which should help the reader test his or her knowledge.

  18. Logistic Regression Models to Forecast Travelling Behaviour in Tripoli City

    Directory of Open Access Journals (Sweden)

    Amiruddin Ismail

    2011-01-01

    Full Text Available Transport modes are very important to residents of Tripoli, Libya, for their daily trips. However, the growing number of private cars and private transport vehicles, namely taxis and microbuses, on the road causes many problems such as traffic congestion, accidents, and air and noise pollution. These problems in turn cause other travel-related phenomena such as trip delays and stress and frustration for motorists, which may affect the productivity and efficiency of both workers and students. Delay may also increase travel cost and trip-making inefficiency compared with public transport users in some Arab cities. Switching to public transport (PT) alternatives such as buses, light rail transit and underground trains could improve travel time and travel costs. A transport study was carried out in Tripoli City Authority areas among car owners who live in areas with inadequate private transport and poor public transportation services. Analyses of the relationships between factors such as travel time, travel cost, trip purpose and parking cost were made to answer the research questions. The logistic regression technique was used to analyse the factors that influence users to switch their trip mode to public transport alternatives.

  19. Practical Session: Logistic Regression

    Science.gov (United States)

    Clausel, M.; Grégoire, G.

    2014-12-01

    An exercise is proposed to illustrate logistic regression. One investigates the different risk factors in the occurrence of coronary heart disease. It has been proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science+Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt coming from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.

  20. Fungible weights in logistic regression.

    Science.gov (United States)

    Jones, Jeff A; Waller, Niels G

    2016-06-01

    In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record

  1. Logistic regression models for polymorphic and antagonistic pleiotropic gene action on human aging and longevity

    DEFF Research Database (Denmark)

    Tan, Qihua; Bathum, L; Christiansen, L

    2003-01-01

    In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing minimized. A binomial logistic regression model with fractional polynomials is used to capture...

  2. Logistic regression models of factors influencing the location of bioenergy and biofuels plants

    Science.gov (United States)

    T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu

    2011-01-01

    Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7m2/ha...

  3. A hybrid model using logistic regression and wavelet transformation to detect traffic incidents

    Directory of Open Access Journals (Sweden)

    Shaurya Agarwal

    2016-07-01

    Full Text Available This research paper investigates a hybrid model using logistic regression with wavelet-based feature extraction for detecting traffic incidents. A logistic regression model is suitable when the outcome can take only a limited number of values; for traffic incident detection, the outcome is limited to two values, the presence or absence of an incident. The logistic regression model used in this study is a generalized linear model (GLM) with a binomial response and a logit link function. This paper presents a framework for using logistic regression and wavelet-based feature extraction for traffic incident detection, and it investigates the effect of preprocessing the data on the performance of incident detection models. Results of this study indicate that logistic regression along with wavelet-based feature extraction can be used effectively for incident detection by balancing the incident detection rate and the false alarm rate according to need. Logistic regression on raw data resulted in a maximum detection rate of 95.4% at the cost of a 14.5% false alarm rate, whereas the hybrid model achieved a maximum detection rate of 98.78% at the expense of a 6.5% false alarm rate. These results indicate that the proposed approach is practical and efficient; with future improvements in the proposed technique, it will make an effective tool for traffic incident detection.
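
    A sketch of such a hybrid pipeline is shown below: wavelet features are extracted from each signal and fed into a logistic regression. The use of PyWavelets with a 'db4' wavelet at level 3, and the summary statistics taken from the coefficients, are assumptions for illustration; the traffic signals are simulated.

    ```python
    # Wavelet-based feature extraction followed by logistic regression.
    import numpy as np
    import pywt
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(5)
    signals = rng.normal(size=(400, 64))   # e.g. 64 time steps of occupancy/speed per sample
    labels = rng.integers(0, 2, size=400)  # 1 = incident, 0 = no incident

    def wavelet_features(sig):
        coeffs = pywt.wavedec(sig, "db4", level=3)   # multi-level discrete wavelet transform
        return np.concatenate([np.array([c.mean(), c.std()]) for c in coeffs])

    X = np.vstack([wavelet_features(s) for s in signals])
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    print("in-sample accuracy:", clf.score(X, labels).round(3))
    ```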

  4. Using the Logistic Regression model in supporting decisions of establishing marketing strategies

    Directory of Open Access Journals (Sweden)

    Cristinel CONSTANTIN

    2015-12-01

    Full Text Available This paper presents instrumental research regarding the use of the logistic regression model for data analysis in marketing research. Decision makers inside different organisations need relevant information to support their decisions regarding marketing strategies. The data provided by marketing research can be processed in various ways, but multivariate data analysis models can enhance the utility of the information. Among these models we find the logistic regression model, which is used for dichotomous variables. Our research is based on explaining the utility of this model and interpreting the resulting information, in order to help practitioners and researchers use it in their future investigations.

  5. Modelling of binary logistic regression for obesity among secondary students in a rural area of Kedah

    Science.gov (United States)

    Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.

    2014-07-01

    Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model, the log odds of the dichotomous outcome are modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship between the variables and the outcome. In conducting logistic regression, selection procedures are used to select important predictor variables, diagnostics are used to check that assumptions are valid (including independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers), and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographic profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in the family and by the interaction between a student's ethnicity and routine meal intake. The odds of a student being overweight or obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals, as compared to a Malay student.
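
    The snippet below is a hedged sketch of a binary logit with a main effect and an interaction term, mirroring the "family obesity" and "ethnicity × routine meals" structure described above; the data frame is simulated and the variable names are invented, not the Kedah survey fields.

    ```python
    # Binary logistic regression with a categorical-by-binary interaction term.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(6)
    n = 400
    df = pd.DataFrame({
        "family_obesity": rng.integers(0, 2, n),
        "ethnicity": rng.choice(["Malay", "NonMalay"], n),
        "routine_meals": rng.integers(0, 2, n),
        "overweight": rng.integers(0, 2, n),
    })
    fit = smf.logit("overweight ~ family_obesity + C(ethnicity) * routine_meals", data=df).fit(disp=False)
    print(np.exp(fit.params).round(2))   # odds ratios, including the interaction term
    ```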

  6. Regression modeling strategies with applications to linear models, logistic and ordinal regression, and survival analysis

    CERN Document Server

    Harrell , Jr , Frank E

    2015-01-01

    This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap.  The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes.  This text realistically...

  7. Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression

    Science.gov (United States)

    Khikmah, L.; Wijayanto, H.; Syafitri, U. D.

    2017-04-01

    A problem often encountered in logistic regression modeling is multicollinearity. Multicollinearity between explanatory variables results in biased parameter estimates, and it also results in classification error. In general, stepwise regression is used to overcome multicollinearity. Another method to overcome multicollinearity, which involves all variables in the prediction, is Principal Component Analysis (PCA). However, classical PCA is only for numeric data; if the data are categorical, one method to solve the problem is Categorical Principal Component Analysis (CATPCA). The data used in this research were part of the data of the Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristics of women using contraceptive methods. Classification results were evaluated using Area Under Curve (AUC) values; the higher the AUC value, the better. Based on AUC values, the classification of the contraceptive method using the stepwise method (58.66%) is better than the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the results using sensitivity shows the opposite: the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and stepwise (92.05%). Therefore, since this study focuses on classifying the major class (using a contraceptive method), the selected model is CATPCA because it raises the accuracy for the major class.

  8. Combining the Performance Strengths of the Logistic Regression and Neural Network Models: A Medical Outcomes Approach

    Directory of Open Access Journals (Sweden)

    Wun Wong

    2003-01-01

    Full Text Available The assessment of medical outcomes is important in the effort to contain costs, streamline patient management, and codify medical practices. As such, it is necessary to develop predictive models that will make accurate predictions of these outcomes. The neural network methodology has often been shown to perform as well, if not better, than the logistic regression methodology in terms of sample predictive performance. However, the logistic regression method is capable of providing an explanation regarding the relationship(s) between variables. This explanation is often crucial to understanding the clinical underpinnings of the disease process. Given the respective strengths of the methodologies in question, the combined use of a statistical (i.e., logistic regression) and machine learning (i.e., neural network) technology in the classification of medical outcomes is warranted under appropriate conditions. The study discusses these conditions and describes an approach for combining the strengths of the models.

  9. Fitting multistate transition models with autoregressive logistic regression : Supervised exercise in intermittent claudication

    NARCIS (Netherlands)

    de Vries, S O; Fidler, Vaclav; Kuipers, Wietze D; Hunink, Maria G M

    1998-01-01

    The purpose of this study was to develop a model that predicts the outcome of supervised exercise for intermittent claudication. The authors present an example of the use of autoregressive logistic regression for modeling observed longitudinal data. Data were collected from 329 participants in a six

  10. Construction of risk prediction model of type 2 diabetes mellitus based on logistic regression

    Directory of Open Access Journals (Sweden)

    Li Jian

    2017-01-01

    Full Text Available Objective: to construct a multi-factor prediction model for the individual risk of T2DM, and to explore new ideas for early warning, prevention and personalized health services for T2DM. Methods: logistic regression techniques were used to screen the risk factors for T2DM and construct the risk prediction model of T2DM. Results: the male risk prediction model's logistic regression equation is logit(P) = BMI × 0.735 + vegetables × (−0.671) + age × 0.838 + diastolic pressure × 0.296 + physical activity × (−2.287) + sleep × (−0.009) + smoking × 0.214; the female risk prediction model's logistic regression equation is logit(P) = BMI × 1.979 + vegetables × (−0.292) + age × 1.355 + diastolic pressure × 0.522 + physical activity × (−2.287) + sleep × (−0.010). For males, the area under the ROC curve was 0.83, the sensitivity 0.72 and the specificity 0.86; for females, the area under the ROC curve was 0.84, the sensitivity 0.75 and the specificity 0.90. Conclusion: the data for this model come from a nested case-control study; the risk prediction model was established using mature logistic regression techniques, and the model shows high predictive sensitivity, specificity and stability.
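
    Purely as a mechanical illustration of how such an equation is used, the snippet below plugs covariate values into the male equation quoted above and converts the logit to a probability. The covariate coding used in the original study is not given in the abstract, so the input values here are invented.

    ```python
    # Worked example: from the male model's logit to a predicted probability.
    import numpy as np

    # hypothetical coded covariate values (coding scheme assumed, not from the study)
    bmi, veg, age, dbp, activity, sleep, smoking = 1, 0, 2, 1, 0, 1, 1
    logit_p = (0.735 * bmi - 0.671 * veg + 0.838 * age + 0.296 * dbp
               - 2.287 * activity - 0.009 * sleep + 0.214 * smoking)
    p = 1.0 / (1.0 + np.exp(-logit_p))   # inverse-logit transform
    print(f"logit = {logit_p:.3f}, predicted risk = {p:.3f}")
    ```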

  11. Risk stratification for prognosis in intracerebral hemorrhage: A decision tree model and logistic regression

    Directory of Open Access Journals (Sweden)

    Gang WU

    2016-01-01

    Full Text Available Objective  To analyze the risk factors for prognosis in intracerebral hemorrhage using a decision tree (classification and regression tree, CART) model and a logistic regression model. Methods  The CART model and logistic regression model were established according to the risk factors for prognosis of patients with cerebral hemorrhage, and the differences in the results were compared between the two methods. Results  Logistic regression analyses showed that hematoma volume (OR-value 0.953), initial Glasgow Coma Scale (GCS) score (OR-value 1.210), pulmonary infection (OR-value 0.295), and basal ganglia hemorrhage (OR-value 0.336) were the risk factors for the prognosis of cerebral hemorrhage. The results of CART analysis showed that volume of hematoma and initial GCS score were the main factors affecting the prognosis of cerebral hemorrhage. The performance of the two models in predicting the prognosis of cerebral hemorrhage was similar (Z-value 0.402, P=0.688). Conclusions  The CART model has a similar value to that of the logistic model in judging the prognosis of cerebral hemorrhage; it is characterized by its use of interactions between the risk factors, and it is more intuitive. DOI: 10.11855/j.issn.0577-7402.2015.12.13

  12. Logistic regression: a brief primer.

    Science.gov (United States)

    Stoltzfus, Jill C

    2011-10-01

    Regression techniques are versatile in their application to medical research because they can measure associations, predict outcomes, and control for confounding variable effects. As one such technique, logistic regression is an efficient and powerful way to analyze the effect of a group of independent variables on a binary outcome by quantifying each independent variable's unique contribution. Using components of linear regression reflected in the logit scale, logistic regression iteratively identifies the strongest linear combination of variables with the greatest probability of detecting the observed outcome. Important considerations when conducting logistic regression include selecting independent variables, ensuring that relevant assumptions are met, and choosing an appropriate model building strategy. For independent variable selection, one should be guided by such factors as accepted theory, previous empirical investigations, clinical considerations, and univariate statistical analyses, with acknowledgement of potential confounding variables that should be accounted for. Basic assumptions that must be met for logistic regression include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers. Additionally, there should be an adequate number of events per independent variable to avoid an overfit model, with commonly recommended minimum "rules of thumb" ranging from 10 to 20 events per covariate. Regarding model building strategies, the three general types are direct/standard, sequential/hierarchical, and stepwise/statistical, with each having a different emphasis and purpose. Before reaching definitive conclusions from the results of any of these methods, one should formally quantify the model's internal validity (i.e., replicability within the same data set) and external validity (i.e., generalizability beyond the current sample). The resulting logistic regression model

  13. MULTIPLE LOGISTIC REGRESSION MODEL TO PREDICT RISK FACTORS OF ORAL HEALTH DISEASES

    Directory of Open Access Journals (Sweden)

    Parameshwar V. Pandit

    2012-06-01

    Full Text Available Purpose: To analyze the dependence of oral health diseases, i.e. dental caries and periodontal disease, on a number of risk factors through the application of the logistic regression model. Method: The cross-sectional study involves a systematic random sample of 1760 subjects with permanent dentition aged between 18 and 40 years in Dharwad, Karnataka, India. Dharwad is situated in North Karnataka. The mean age was 34.26±7.28. The risk factors of dental caries and periodontal disease were established by a multiple logistic regression model using SPSS statistical software. Results: Factors such as frequency of brushing, timing of cleaning teeth and type of toothpaste are significant persistent predictors of dental caries and periodontal disease. For dental caries, the log-likelihood value of the full model is −1013.1364 and Akaike's Information Criterion (AIC) is 1.1752, compared with −1019.8106 and 1.1748, respectively, for the reduced regression model. For periodontal disease, the log-likelihood value of the full model is −1085.7876 and the AIC is 1.2577, compared with −1019.8106 and 1.1748, respectively, for the reduced model. The area under the Receiver Operating Characteristic (ROC) curve for dental caries is 0.7509 (full model) and 0.7447 (reduced model); the ROC for periodontal disease is 0.6128 (full model) and 0.5821 (reduced model). Conclusions: The frequency of brushing, timing of cleaning teeth and type of toothpaste are the main significant risk factors of dental caries and periodontal disease. The reduced logistic regression model fits slightly better than the full logistic regression model in identifying these risk factors for both dichotomous outcomes, dental caries and periodontal disease.

  14. Logistic regression model and its application

    Institute of Scientific and Technical Information of China (English)

    常振海; 刘薇

    2012-01-01

    To improve the forecasting accuracy for a multinomial qualitative dependent variable using the logistic model, a three-category logistic model is established for actual statistical data, based on the binary logistic regression model. The significance of the independent variables is tested using the likelihood ratio test method, and non-significant variables are removed. A linear regression function is determined for each category of the dependent variable, and the models are tested. The analysis results show that the logistic regression model has good predictive accuracy and practical value in regression analysis where the dependent variable is qualitative.

  15. Predictive market segmentation model: An application of logistic regression model and CHAID procedure

    Directory of Open Access Journals (Sweden)

    Soldić-Aleksić Jasna

    2009-01-01

    Full Text Available Market segmentation presents one of the key concepts of modern marketing. The main goal of market segmentation is to create groups (segments) of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of a concrete product/service. Companies can create a specific marketing plan for each of these segments and therefore gain short- or long-term competitive advantage on the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of the logistic regression model and CHAID analysis. The logistic regression model was used for variable selection, identifying (from the initial pool of eleven variables) those which are statistically significant for explaining the dependent variable. Selected variables were afterwards included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented for a concrete empirical example in the following form: summary model results, CHAID tree, Gain chart, Index chart, risk and classification tables.

  16. Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

    Science.gov (United States)

    Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

    2015-01-01

    This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. We explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression to each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of

  17. Using ROC curves to compare neural networks and logistic regression for modeling individual noncatastrophic tree mortality

    Science.gov (United States)

    Susan L. King

    2003-01-01

    The performance of two classifiers, logistic regression and neural networks, are compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...

  18. Strategies for Testing Statistical and Practical Significance in Detecting DIF with Logistic Regression Models

    Science.gov (United States)

    Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza

    2014-01-01

    This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…

  19. Logistic regression for circular data

    Science.gov (United States)

    Al-Daffaie, Kadhem; Khan, Shahjahan

    2017-05-01

    This paper considers the relationship between a binary response and a circular predictor. It develops the logistic regression model by employing the linear-circular regression approach. The maximum likelihood method is used to estimate the parameters. The Newton-Raphson numerical method is used to find the estimated values of the parameters. A data set from weather records of Toowoomba city is analysed by the proposed methods. Moreover, a simulation study is considered. The R software is used for all computations and simulations.
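
    One common way to realise a linear-circular logistic regression is to embed the circular predictor through its sine and cosine before fitting by maximum likelihood; whether this matches the paper's exact parameterisation is an assumption, and the data below are simulated, not the Toowoomba weather records.

    ```python
    # Logistic regression with a circular predictor via a sine/cosine embedding.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    theta = rng.uniform(0, 2 * np.pi, 500)                   # circular predictor (e.g. wind direction)
    eta = -0.5 + 1.2 * np.cos(theta) + 0.8 * np.sin(theta)   # hypothetical true linear predictor
    y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

    X = sm.add_constant(np.column_stack([np.cos(theta), np.sin(theta)]))
    fit = sm.Logit(y, X).fit(disp=False)                     # Newton-type maximum likelihood
    print(fit.params.round(3))
    ```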

  20. Comparison of a Bayesian Network with a Logistic Regression Model to Forecast IgA Nephropathy

    Directory of Open Access Journals (Sweden)

    Michel Ducher

    2013-01-01

    Full Text Available Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n=155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation.

  1. Comparison of a Bayesian network with a logistic regression model to forecast IgA nephropathy.

    Science.gov (United States)

    Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre

    2013-01-01

    Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation.
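
    A sketch of the evaluation step used in this comparison is given below: ROC curves, AUC and the Youden index are computed for two competing classifiers on a held-out validation split. The models and data are stand-ins (a Gaussian naive Bayes classifier substitutes for the Bayesian network, since a full Bayesian network learner is not part of scikit-learn).

    ```python
    # ROC/AUC and Youden-index comparison of two probabilistic classifiers.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import roc_curve, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=310, n_features=8, random_state=0)
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)

    for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                      ("naive Bayes stand-in", GaussianNB())]:
        p = clf.fit(Xtr, ytr).predict_proba(Xte)[:, 1]
        fpr, tpr, _ = roc_curve(yte, p)
        best = (tpr - fpr).argmax()                       # threshold with the highest Youden index
        print(f"{name}: AUC={roc_auc_score(yte, p):.3f}, "
              f"sensitivity={tpr[best]:.2f}, specificity={1 - fpr[best]:.2f}")
    ```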

  2. Logistic Regression: Concept and Application

    Science.gov (United States)

    Cokluk, Omay

    2010-01-01

    The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…

  3. A Logistic Regression Model for Predicting Axillary Lymph Node Metastases in Early Breast Carcinoma Patients

    Directory of Open Access Journals (Sweden)

    Jiaqing Zhang

    2012-07-01

    Full Text Available Nodal staging in breast cancer is a key predictor of prognosis. This paper presents the results of potential clinicopathological predictors of axillary lymph node involvement and develops an efficient prediction model to assist in predicting axillary lymph node metastases. Seventy patients with primary early breast cancer who underwent axillary dissection were evaluated. Univariate and multivariate logistic regression were performed to evaluate the association between clinicopathological factors and lymph node metastatic status. A logistic regression predictive model was built from 50 randomly selected patients; the model was also applied to the remaining 20 patients to assess its validity. Univariate analysis showed a significant relationship between lymph node involvement and absence of nm-23 (p = 0.010) and Kiss-1 (p = 0.001) expression. Absence of Kiss-1 remained significantly associated with positive axillary node status in the multivariate analysis (p = 0.018). Seven clinicopathological factors were involved in the multivariate logistic regression model: menopausal status, tumor size, ER, PR, HER2, nm-23 and Kiss-1. The model was accurate and discriminating, with an area under the receiver operating characteristic curve of 0.702 when applied to the validation group. Moreover, there is a need to discover more specific candidate proteins and molecular biology tools to select more variables, which should improve predictive accuracy.

  4. A general framework for the use of logistic regression models in meta-analysis.

    Science.gov (United States)

    Simmonds, Mark C; Higgins, Julian Pt

    2016-12-01

    Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy.

  5. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    Science.gov (United States)

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  6. Should metacognition be measured by logistic regression?

    Science.gov (United States)

    Rausch, Manuel; Zehetleitner, Michael

    2017-03-01

    Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Comprehensible Predictive Modeling Using Regularized Logistic Regression and Comorbidity Based Features.

    Directory of Open Access Journals (Sweden)

    Gregor Stiglic

    Full Text Available Different studies have demonstrated the importance of comorbidities to better understand the origin and evolution of medical complications. This study focuses on improvement of the predictive model interpretability based on simple logical features representing comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step, where simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve metric from 0.763 (95% CI: 0.755-0.771) to 0.769 (95% CI: 0.761-0.777). Additionally, our results show improvement in comprehensibility of the final predictive model using simple comorbidity based terms for logistic regression.
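
    The snippet below is a minimal sketch of the final step described above: an L1-penalised (lasso) logistic regression that shrinks most coefficients to zero, leaving a compact feature set. The group lasso interaction discovery and the comorbidity feature construction are not reproduced; the data and penalty strength are synthetic placeholders.

    ```python
    # Lasso (L1-penalised) logistic regression for a compact, sparse model.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(8)
    X = rng.integers(0, 2, size=(2000, 50)).astype(float)   # e.g. binary comorbidity indicators
    beta = np.zeros(50)
    beta[:5] = [1.2, -0.9, 0.8, 0.7, -0.6]                  # only a few truly informative features
    y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta - 0.3))))

    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    nonzero = np.flatnonzero(lasso.coef_)
    print(f"{len(nonzero)} of 50 features retained:", nonzero)
    ```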

  8. Urban Growth Modelling with Artificial Neural Network and Logistic Regression. Case Study: Sanandaj City, Iran

    Directory of Open Access Journals (Sweden)

    SASSAN MOHAMMADY

    2013-01-01

    Full Text Available Cities have shown remarkable growth due to attraction and economic, social and facility centralization in the past few decades. Population growth and urban expansion, especially in developing countries, have led to a lack of resources, land use change from suitable agricultural land to urban land use, and marginalization. Under these circumstances, land use activity is a major issue and challenge for town and country planners. Different approaches have been attempted in urban expansion modelling. Artificial Neural Network (ANN) models are among the knowledge-based models which have been used for urban growth modelling. ANNs are powerful tools that use a machine learning approach to quantify and model complex behaviour and patterns. In this research, ANN and logistic regression have been employed for urban growth modelling. Our case study is Sanandaj city, and we used Landsat TM and ETM+ imagery acquired in 2000 and 2006. The dataset used includes distance to main roads, distance to the residential region, elevation, slope, and distance to green space. The Percent Area Match (PAM) obtained from modelling these changes with ANN is equal to 90.47%, and the accuracy achieved for urban growth modelling with Logistic Regression (LR) is equal to 88.91%. Percent Correct Match (PCM) and Figure of Merit for the ANN method were 91.33% and 59.07%, and for LR they were 90.84% and 57.07%, respectively.

  9. Modeling of geogenic radon in Switzerland based on ordered logistic regression.

    Science.gov (United States)

    Kropat, Georg; Bochud, François; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien

    2017-01-01

    The estimation of the radon hazard of a future construction site should ideally be based on the geogenic radon potential (GRP), since this estimate is free of anthropogenic influences and building characteristics. The goal of this study was to evaluate terrestrial gamma dose rate (TGD), geology, fault lines and topsoil permeability as predictors for the creation of a GRP map based on logistic regression. Soil gas radon measurements (SRC) are more suited for the estimation of GRP than indoor radon measurements (IRC), since the former do not depend on ventilation and heating habits or building characteristics. However, SRC have only been measured at a few locations in Switzerland. In former studies a good correlation between spatial aggregates of IRC and SRC has been observed. That is why we used IRC measurements aggregated on a 10 km × 10 km grid to calibrate an ordered logistic regression model for geogenic radon potential (GRP). As predictors we took into account terrestrial gamma dose rate, regrouped geological units, fault line density and the permeability of the soil. The classification success rate of the model is 56% when all four predictor variables are included. Our results suggest that terrestrial gamma dose rate and regrouped geological units are more suited to model GRP than fault line density and soil permeability. Ordered logistic regression is a promising tool for the modeling of GRP maps due to its simplicity and fast computation time. Future studies should account for additional variables to improve the modeling of high radon hazard in the Jura Mountains of Switzerland. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
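
    A hedged sketch of an ordered (proportional-odds) logistic regression such as the GRP classification above is given next, using statsmodels' OrderedModel (available in recent statsmodels releases). The predictors and hazard classes are simulated stand-ins for the gamma dose rate, geology, fault-line and permeability variables.

    ```python
    # Ordered logistic regression with a three-level ordinal outcome.
    import numpy as np
    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    rng = np.random.default_rng(9)
    n = 500
    X = pd.DataFrame({"gamma_dose": rng.normal(size=n), "fault_density": rng.normal(size=n)})
    latent = 1.0 * X["gamma_dose"] + 0.2 * X["fault_density"] + rng.logistic(size=n)
    y = pd.cut(latent, bins=[-np.inf, -1, 1, np.inf], labels=["low", "medium", "high"])  # ordered classes

    fit = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
    probs = np.asarray(fit.model.predict(fit.params, exog=X))   # class probabilities, shape (n, 3)
    pred = probs.argmax(axis=1)                                  # most probable class index
    print("classification success rate:", (pred == y.cat.codes.values).mean().round(2))
    ```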

  10. Application of Logistic Regression Tree Model in Determining Habitat Distribution of Astragalus verus

    Directory of Open Access Journals (Sweden)

    M. Saki

    2013-03-01

    Full Text Available The relationship between plant species and environmental factors has always been a central issue in plant ecology. With the rising power of statistical techniques, geo-statistics and geographic information systems (GIS), the development of predictive habitat distribution models of organisms has rapidly increased in ecology. This study aimed to evaluate the ability of the Logistic Regression Tree model to create a potential habitat map of Astragalus verus. This species produces tragacanth and has economic value. A stratified random sampling was applied to 100 sites (50 presence, 50 absence of the given species), and environmental and edaphic factor maps were produced using Kriging and Inverse Distance Weighting methods in the ArcGIS software for the whole study area. Relationships between species occurrence and environmental factors were determined by the Logistic Regression Tree model and extended to the whole study area. The results indicated that species occurrence has a strong correlation with environmental factors such as mean daily temperature and the clay, EC and organic carbon content of the soil. Species occurrence showed a direct relationship with mean daily temperature, clay and organic carbon, and an inverse relationship with EC. Model accuracy was evaluated both by Cohen's kappa statistic (κ) and by the area under the Receiver Operating Characteristic curve based on an independent test data set. Their values (kappa = 0.9, AUC of ROC = 0.96) indicated the high power of LRT to create potential habitat maps at local scales. This model, therefore, can be applied to recognize potential sites for rangeland reclamation projects.
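
    A small sketch of the validation step above follows: Cohen's kappa and the area under the ROC curve computed on an independent test split with scikit-learn. A plain logistic regression stands in for the LRT model, and the presence/absence data are simulated.

    ```python
    # Kappa and ROC-AUC evaluation of a habitat-style presence/absence classifier.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import cohen_kappa_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=100, n_features=6, random_state=1)  # 50/50 presence-absence style data
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=1)

    clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)   # stand-in for the LRT model
    proba = clf.predict_proba(Xte)[:, 1]
    print("kappa:", cohen_kappa_score(yte, clf.predict(Xte)).round(2))
    print("AUC of ROC:", roc_auc_score(yte, proba).round(2))
    ```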

  11. Semi-parametric estimation of random effects in a logistic regression model using conditional inference

    DEFF Research Database (Denmark)

    Petersen, Jørgen Holm

    2016-01-01

    This paper describes a new approach to the estimation in a logistic regression model with two crossed random effects where special interest is in estimating the variance of one of the effects while not making distributional assumptions about the other effect. A composite likelihood is studied. For each term in the composite likelihood, a conditional likelihood is used that eliminates the influence of the random effects, which results in a composite conditional likelihood consisting of only one-dimensional integrals that may be solved numerically. Good properties of the resulting estimator...

  12. A logistic regression model of Coronary Artery Disease among Male Patients in Punjab

    Directory of Open Access Journals (Sweden)

    Sohail Chand

    2005-07-01

    Full Text Available This is a cross-sectional retrospective study of 308 male patients who presented for the first time for coronary angiography at the Punjab Institute of Cardiology. The mean age of the male patients was 50.97 ± 9.9 years. As the response variable, coronary artery disease (CAD), was binary, a logistic regression model was fitted to predict CAD from the significant risk factors. Age, chest pain, diabetes mellitus, smoking and lipids emerged as significant risk factors associated with CAD in the male population.
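
    A hedged sketch of the general workflow (fit a binary logistic regression for a binary CAD outcome and read off significant predictors) is shown below; the variable names mirror the abstract, but the data are simulated and the coefficients are not those of the study.

```python
# Illustrative binary logistic regression for a CAD outcome using statsmodels;
# all values below are simulated.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 308
df = pd.DataFrame({
    "age":        rng.normal(51, 10, n),
    "chest_pain": rng.integers(0, 2, n),
    "diabetes":   rng.integers(0, 2, n),
    "smoking":    rng.integers(0, 2, n),
    "lipids":     rng.normal(200, 40, n),
})
logit_p = 1 / (1 + np.exp(-(-6 + 0.05 * df["age"] + 0.8 * df["diabetes"]
                            + 0.7 * df["smoking"] + 0.01 * df["lipids"])))
df["cad"] = rng.binomial(1, logit_p)

X = sm.add_constant(df.drop(columns="cad"))
res = sm.Logit(df["cad"], X).fit(disp=False)
print(res.summary())
print("significant at 5%:", list(res.pvalues[res.pvalues < 0.05].index))
```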

  13. Regional Integrated Meteorological Forecasting and Warning Model for Geological Hazards Based on Logistic Regression

    Institute of Scientific and Technical Information of China (English)

    XU Jing; YANG Chi; ZHANG Guoping

    2007-01-01

    An information model is adopted to integrate factors from various geosciences to estimate the susceptibility to geological hazards. Further combining the dynamic rainfall observations, logistic regression is used to model the probabilities of geological hazard occurrences, upon which hierarchical warnings for rainfall-induced geological hazards are produced. The forecasting and warning model takes numerical precipitation forecasts on grid points as its dynamic input, forecasts the probabilities of geological hazard occurrences on the same grid, and translates the results into likelihoods in the form of a 5-level hierarchy. Validation of the model with observational data for the year 2004 shows that 80% of the geological hazards of that year were identified as "likely enough to release warning messages". The model can satisfy the requirements of an operational warning system and is thus an effective way to improve meteorological warnings for geological hazards.
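
    Translating predicted occurrence probabilities into a 5-level warning hierarchy can be done with simple thresholding; the sketch below uses hypothetical cut points, not the paper's calibration.

```python
# Map grid-point occurrence probabilities to warning levels 1..5 by thresholding.
import numpy as np

probs = np.array([0.02, 0.12, 0.28, 0.55, 0.81])   # predicted occurrence probabilities
cuts = [0.05, 0.20, 0.40, 0.60]                     # hypothetical thresholds
warning_level = np.digitize(probs, cuts) + 1        # levels 1 (lowest) .. 5 (highest)
print(warning_level)                                # -> [1 2 3 4 5]
```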

  14. Sensitivity Analysis to Select the Most Influential Risk Factors in a Logistic Regression Model

    Directory of Open Access Journals (Sweden)

    Jassim N. Hussain

    2008-01-01

    Full Text Available Traditional variable selection methods for survival data depend on iterative procedures, and controlling this process requires tuning parameters that are problematic and time consuming, especially if the models are complex and have a large number of risk factors. In this paper, we propose a new method based on global sensitivity analysis (GSA) to select the most influential risk factors. This contributes to simplification of the logistic regression model by excluding the irrelevant risk factors, thus eliminating the need to fit and evaluate a large number of models. Data from medical trials are suggested as a way to test the efficiency and capability of this method and as a way to simplify the model. This leads to construction of an appropriate model. The proposed method ranks the risk factors according to their importance.

  15. elrm: Software Implementing Exact-Like Inference for Logistic Regression Models

    Directory of Open Access Journals (Sweden)

    David Zamar

    2007-09-01

    Full Text Available Exact inference is based on the conditional distribution of the sufficient statistics for the parameters of interest given the observed values for the remaining sufficient statistics. Exact inference for logistic regression can be problematic when data sets are large and the support of the conditional distribution cannot be represented in memory. Additionally, these methods are not widely implemented except in commercial software packages such as LogXact and SAS. Therefore, we have developed elrm, software for R implementing (approximate exact inference for binomial regression models from large data sets. We provide a description of the underlying statistical methods and illustrate the use of elrm with examples. We also evaluate elrm by comparing results with those obtained using other methods.

  16. Logistic regression model for diagnosis of transition zone prostate cancer on multi-parametric MRI

    Energy Technology Data Exchange (ETDEWEB)

    Dikaios, Nikolaos; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit [University College London, Centre for Medical Imaging, London (United Kingdom); University College London Hospital, Departments of Radiology, London (United Kingdom); Alkalbani, Jokha; Sidhu, Harbir Singh; Fujiwara, Taiki [University College London, Centre for Medical Imaging, London (United Kingdom); Abd-Alazeez, Mohamed; Ahmed, Hashim; Emberton, Mark [University College London, Research Department of Urology, London (United Kingdom); Kirkham, Alex; Allen, Clare [University College London Hospital, Departments of Radiology, London (United Kingdom); Freeman, Alex [University College London Hospital, Department of Histopathology, London (United Kingdom)

    2014-09-17

    We aimed to develop logistic regression (LR) models for classifying prostate cancer within the transition zone on multi-parametric magnetic resonance imaging (mp-MRI). One hundred and fifty-five patients (training cohort, 70 patients; temporal validation cohort, 85 patients) underwent mp-MRI and transperineal-template-prostate-mapping (TPM) biopsy. Positive cores were classified by cancer definitions: (1) any-cancer; (2) definition-1 [≥Gleason 4 + 3 or ≥ 6 mm cancer core length (CCL)] [high risk significant]; and (3) definition-2 (≥Gleason 3 + 4 or ≥ 4 mm CCL) cancer [intermediate-high risk significant]. For each, logistic-regression mp-MRI models were derived from the training cohort and validated internally and with the temporal cohort. Sensitivity/specificity and the area under the receiver operating characteristic (ROC-AUC) curve were calculated. LR model performance was compared to radiologists' performance. Twenty-eight of 70 patients from the training cohort, and 25/85 patients from the temporal validation cohort had significant cancer on TPM. The ROC-AUC of the LR model for classification of cancer was 0.73/0.67 at internal/temporal validation. The radiologist A/B ROC-AUC was 0.65/0.74 (temporal cohort). For patients scored by radiologists as Prostate Imaging Reporting and Data System (Pi-RADS) score 3, sensitivity/specificity of radiologist A 'best guess' and LR model was 0.14/0.54 and 0.71/0.61, respectively; and radiologist B 'best guess' and LR model was 0.40/0.34 and 0.50/0.76, respectively. LR models can improve classification of Pi-RADS score 3 lesions similar to experienced radiologists. (orig.)

  17. Spatial modelling of periglacial phenomena in Deception Island (Maritime Antarctic): logistic regression and informative value method.

    Science.gov (United States)

    Melo, Raquel; Vieira, Gonçalo; Caselli, Alberto; Ramos, Miguel

    2010-05-01

    Field surveying during the austral summer of 2007/08 and the analysis of a QuickBird satellite image resulted in the production of a detailed geomorphological map of the Irizar and Crater Lake area in Deception Island (South Shetlands, Maritime Antarctic - 1:10 000) and allowed the analysis and spatial modelling of its geomorphological phenomena. The present study focuses on the analysis of the spatial distribution and characteristics of hummocky terrains, lag surfaces and nivation hollows, complemented by GIS spatial modelling intended to identify relevant controlling geographical factors. Models of the susceptibility of occurrence of these phenomena were created using two statistical methods: logistic regression, as a multivariate method, and the informative value, as a bivariate method. Success and prediction rate curves were used for model validation. The Area Under the Curve (AUC) was used to quantify the level of performance and prediction of the models and to allow the comparison between the two methods. Regarding the logistic regression method, the AUC showed a success rate of 71% for the lag surfaces, 81% for the hummocky terrains and 78% for the nivation hollows. The prediction rate was 72%, 68% and 71%, respectively. Concerning the informative value method, the success rate was 69% for the lag surfaces, 84% for the hummocky terrains and 78% for the nivation hollows, with corresponding prediction rates of 71%, 66% and 69%. The results were of very good quality and demonstrate the potential of the models to predict the influence of the independent variables on the occurrence of the geomorphological phenomena, as well as the reliability of the data. Key-words: present-day geomorphological dynamics, detailed geomorphological mapping, GIS, spatial modelling, Deception Island, Antarctic.

  18. A binary logistic regression model for discriminating real protein-protein interface

    Institute of Scientific and Technical Information of China (English)

    2003-01-01

    The selection and study of descriptive variables of protein-protein complex interfaces is a central question in research on protein-protein recognition. Several variables have been proposed to understand the structural or energetic features of complex interfaces. Here a systematic study of some of these "traditional" variables, as well as a few new ones, is introduced. With the values of these variables extracted from 42 PDB samples with real or false complex interfaces, a binary logistic regression analysis is performed, which results in an effective empirical model for the evaluation of the binding probabilities of protein-protein interfaces. The model is validated with 12 samples, and satisfactory results are obtained for both the training and validation sets. Meanwhile, three potential dimeric interfaces of staphylokinase have been investigated and the one best suited to our model is proposed.

  19. Modeling group size and scalar stress by logistic regression from an archaeological perspective.

    Directory of Open Access Journals (Sweden)

    Gianmarco Alberti

    Full Text Available Johnson's scalar stress theory, describing the mechanics of (and the remedies to) the increase in in-group conflict that parallels the increase in group size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout). Due to its relevance in archaeology and anthropology, this article aims at proposing a predictive model of the critical level of scalar stress on the basis of community size. Drawing upon Johnson's theory and on Dunbar's findings on the cognitive constraints on human group size, a model is built by means of logistic regression on the basis of data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is taken as an expression of a non-critical vs. critical level of scalar stress for the sake of model building. The model, which is also tested against a sample of archaeological and ethnographic cases: (a) confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; (b) allows calculating the intercept and slope of the logistic regression model, which can be used at any time to estimate the probability that a community experienced a critical level of scalar stress; (c) allows locating a critical scalar stress threshold at community size 127 (95% CI: 122-132), while the maximum probability of critical scalar stress is predicted at size 158 (95% CI: 147-170). The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion.
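
    The intercept-and-slope usage described in point (b) amounts to the inverse-logit relation p(x) = 1/(1 + exp(-(b0 + b1 x))), whose 50% crossing sits at x = -b0/b1; the sketch below illustrates this with made-up coefficients chosen only so the crossing lands near the published threshold.

```python
# Recover a community-size threshold from illustrative logistic coefficients.
import numpy as np

b0, b1 = -8.9, 0.07          # hypothetical intercept and slope on community size
threshold = -b0 / b1         # size at which critical scalar stress becomes more likely than not
print(round(threshold))      # ~127 in the published model; here just an illustration

def p_critical(size):
    """Probability of critical scalar stress at a given community size."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * size)))

print(p_critical(np.array([100, 127, 160])))
```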

  20. A two-stage logistic regression-ANN model for the prediction of distress banks: Evidence from 11 emerging countries

    National Research Council Canada - National Science Library

    Shu Ling Lin

    2010-01-01

    This paper proposes a new two-stage hybrid logistic regression-ANN model for the construction of a financial distress warning system for the banking industry in emerging markets during 1998-2006...

  1. AN APPLICATION OF THE LOGISTIC REGRESSION MODEL IN THE EXPERIMENTAL PHYSICAL CHEMISTRY

    Directory of Open Access Journals (Sweden)

    Elpidio Corral-López

    2015-06-01

    Full Text Available The calculation of intensive properties (molar volumes) of ethanol-water mixtures from experimental densities by the tangent method in the physical chemistry laboratory presents the problem of manually drawing the molar volume versus mole fraction curve and tracing the tangent lines. Using a statistical model, the logistic regression, on a Texas VOYAGE graphing calculator allowed the curve and the tangents to be traced in situ, and also allowed the students' work to be evaluated during the experimental session. The percentage error between the molar volumes calculated using literature data and those obtained with the statistical method is minimal, which validates the model. Using the calculator with this application as a teaching support tool is advantageous, reducing the evaluation time from 3 weeks to 3 hours.

  2. Effective factors contraceptive use by logistic regression model in Tehran, 1996

    Directory of Open Access Journals (Sweden)

    Ramezani F

    1999-07-01

    Full Text Available Despite not wanting further children, about 30% of couples do not use any kind of contraception, and this leads to unwanted pregnancy. In this clinical trial study, 4177 subjects who had at least one living child and had delivered in one of the 12 university hospitals in Tehran were recruited. The study was conducted in 1996. The questionnaire included questions about contraceptive use and attitudes about the wantedness or unwantedness of the current pregnancy. Data were analysed using a logistic regression model. Results showed that 20.3% of those who had no fertility intention did not use any kind of contraception method, and 41.1% of the subjects who were using a contraception method before pregnancy had become pregnant unwantedly. Based on the logistic regression model, age, education, previous familiarity of women with contraception methods and husband's education were the most significant factors in contraceptive use. Subjects who were 20 years old or less or 35 years old or more, and illiterate subjects, were at higher risk of non-use of contraception methods. This risk was not related to the gender of their children, which suggests a positive change in their perspectives towards sex and the number of children. It is suggested that health policymakers choose an appropriate model to enhance literacy, education and counseling for the correct usage of contraceptives and the prevention of unwanted pregnancy.

  3. Sample size matters: Investigating the optimal sample size for a logistic regression debris flow susceptibility model

    Science.gov (United States)

    Heckmann, Tobias; Gegg, Katharina; Becht, Michael

    2013-04-01

    Statistical approaches to landslide susceptibility modelling on the catchment and regional scale are used very frequently compared to heuristic and physically based approaches. In the present study, we deal with the problem of the optimal sample size for a logistic regression model. More specifically, a stepwise approach has been chosen in order to select those independent variables (from a number of derivatives of a digital elevation model and landcover data) that explain best the spatial distribution of debris flow initiation zones in two neighbouring central alpine catchments in Austria (used mutually for model calculation and validation). In order to minimise problems arising from spatial autocorrelation, we sample a single raster cell from each debris flow initiation zone within an inventory. In addition, as suggested by previous work using the "rare events logistic regression" approach, we take a sample of the remaining "non-event" raster cells. The recommendations given in the literature on the size of this sample appear to be motivated by practical considerations, e.g. the time and cost of acquiring data for non-event cases, which do not apply to the case of spatial data. In our study, we aim at finding empirically an "optimal" sample size in order to avoid two problems: First, a sample too large will violate the independent sample assumption as the independent variables are spatially autocorrelated; hence, a variogram analysis leads to a sample size threshold above which the average distance between sampled cells falls below the autocorrelation range of the independent variables. Second, if the sample is too small, repeated sampling will lead to very different results, i.e. the independent variables and hence the result of a single model calculation will be extremely dependent on the choice of non-event cells. Using a Monte-Carlo analysis with stepwise logistic regression, 1000 models are calculated for a wide range of sample sizes. For each sample size
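
    The repeated-sampling idea can be sketched as follows: for each candidate non-event sample size, draw many samples, refit the logistic regression, and measure how much the coefficients vary between runs; everything below (raster values, sample sizes) is synthetic and only illustrates the procedure.

```python
# Monte Carlo stability check for the size of the non-event sample in a
# logistic regression susceptibility model; all data are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n_events, n_nonevents_pool = 300, 50_000
X_evt = rng.normal(loc=0.5, size=(n_events, 3))          # sampled debris-flow cells
X_non = rng.normal(loc=0.0, size=(n_nonevents_pool, 3))  # remaining raster cells

def coef_spread(sample_size, n_runs=200):
    coefs = []
    for _ in range(n_runs):
        idx = rng.choice(n_nonevents_pool, size=sample_size, replace=False)
        X = np.vstack([X_evt, X_non[idx]])
        y = np.r_[np.ones(n_events), np.zeros(sample_size)]
        coefs.append(LogisticRegression(max_iter=1000).fit(X, y).coef_[0])
    return np.std(coefs, axis=0)     # run-to-run variability of each coefficient

for m in (300, 1000, 5000):
    print(m, coef_spread(m))
```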

  4. Modeling Anthropogenic Fire Occurrence in the Boreal Forest of China Using Logistic Regression and Random Forests

    Directory of Open Access Journals (Sweden)

    Futao Guo

    2016-10-01

    Full Text Available Frequent and intense anthropogenic fires present meaningful challenges to forest management in the boreal forest of China. Understanding the underlying drivers of human-caused fire occurrence is crucial for making effective and scientifically-based forest fire management plans. In this study, we applied logistic regression (LR) and Random Forests (RF) to identify important biophysical and anthropogenic factors that help to explain the likelihood of anthropogenic fires in the Chinese boreal forest. Results showed that anthropogenic fires were more likely to occur in areas close to railways and were significantly influenced by forest types. In addition, distance to settlement and distance to road were identified as important predictors of anthropogenic fire occurrence. The model comparison indicated that RF had greater ability than LR to predict forest fires caused by human activity in the Chinese boreal forest. High fire risk zones in the study area were identified based on RF, and we recommend increasing the allocation of fire management resources in these zones.
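
    A minimal sketch of the LR-versus-RF comparison on synthetic fire-occurrence data is given below; the feature names echo the abstract, but the data, coefficients and resulting AUCs are illustrative, not the study's.

```python
# Compare logistic regression and random forest on synthetic fire-occurrence data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 2000
X = np.column_stack([
    rng.exponential(5, n),    # distance to railway (km)
    rng.exponential(3, n),    # distance to road (km)
    rng.exponential(8, n),    # distance to settlement (km)
    rng.integers(0, 4, n),    # forest type (coded)
])
# Fires more likely close to railways and roads in this toy data
y = rng.binomial(1, 1 / (1 + np.exp(0.3 * X[:, 0] + 0.2 * X[:, 1] - 1.0)))

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=4)
for name, model in [("LR", LogisticRegression(max_iter=1000)),
                    ("RF", RandomForestClassifier(n_estimators=300, random_state=4))]:
    model.fit(Xtr, ytr)
    auc = roc_auc_score(yte, model.predict_proba(Xte)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```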

  5. A semiparametric Wald statistic for testing logistic regression models based on case-control data

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    We propose a semiparametric Wald statistic to test the validity of logistic regression models based on case-control data. The test statistic is constructed using a semiparametric ROC curve estimator and a nonparametric ROC curve estimator. The statistic has an asymptotic chi-squared distribution and is an alternative to the Kolmogorov-Smirnov-type statistic proposed by Qin and Zhang in 1997, the chi-squared-type statistic proposed by Zhang in 1999 and the information matrix test statistic proposed by Zhang in 2001. The statistic is easy to compute in the sense that it requires none of the following methods: using a bootstrap method to find its critical values, partitioning the sample data or inverting a high-dimensional matrix. We present some results on simulation and on analysis of two real examples. Moreover, we discuss how to extend our statistic to a family of statistics and how to construct its Kolmogorov-Smirnov counterpart.

  6. Binary logistic regression modelling: Measuring the probability of relapse cases among drug addict

    Science.gov (United States)

    Ismail, Mohd Tahir; Alias, Siti Nor Shadila

    2014-07-01

    For many years, Malaysia has faced drug addiction issues. The most serious is the relapse phenomenon among treated drug addicts (addicts who have undergone the rehabilitation programme at the Narcotic Addiction Rehabilitation Centre, PUSPEN). Thus, the main objective of this study is to find the most significant factors that contribute to relapse. Binary logistic regression analysis was employed to model the relationship between the independent variables (predictors) and the dependent variable. The dependent variable is the relapse status of the drug addict (Yes, coded as 1; No, coded as 0). The predictors involved are age, age at first taking drugs, family history, education level, family crisis, community support and self motivation. The total sample size is 200, with data provided by AADK (National Antidrug Agency). The findings of the study revealed that age and self motivation are statistically significant predictors of relapse.

  7. Statistical modelling for thoracic surgery using a nomogram based on logistic regression.

    Science.gov (United States)

    Liu, Run-Zhong; Zhao, Ze-Rui; Ng, Calvin S H

    2016-08-01

    A well-developed clinical nomogram is a popular decision-tool, which can be used to predict the outcome of an individual, bringing benefits to both clinicians and patients. With just a few steps on a user-friendly interface, the approximate clinical outcome of patients can easily be estimated based on their clinical and laboratory characteristics. Therefore, nomograms have recently been developed to predict the different outcomes or even the survival rate at a specific time point for patients with different diseases. However, on the establishment and application of nomograms, there is still a lot of confusion that may mislead researchers. The objective of this paper is to provide a brief introduction on the history, definition, and application of nomograms and then to illustrate simple procedures to develop a nomogram with an example based on a multivariate logistic regression model in thoracic surgery. In addition, validation strategies and common pitfalls have been highlighted.

  8. A simulation study of sample size for multilevel logistic regression models

    Directory of Open Access Journals (Sweden)

    Moineddin Rahim

    2007-07-01

    Full Text Available Abstract Background Many studies conducted in the health and social sciences collect individual level data as outcome measures. Usually, such data have a hierarchical structure, with patients clustered within physicians, and physicians clustered within practices. Large survey data, including national surveys, have a hierarchical or clustered structure; respondents are naturally clustered in geographical units (e.g., health regions) and may be grouped into smaller units. Outcomes of interest in many fields not only reflect continuous measures, but also binary outcomes such as depression, presence or absence of a disease, and self-reported general health. In the framework of multilevel studies an important problem is calculating an adequate sample size that generates unbiased and accurate estimates. Methods In this paper simulation studies are used to assess the effect of varying sample size at both the individual and group level on the accuracy of the estimates of the parameters and variance components of multilevel logistic regression models. In addition, the influence of the prevalence of the outcome and the intra-class correlation coefficient (ICC) is examined. Results The results show that the estimates of the fixed effect parameters are unbiased for 100 groups with a group size of 50 or higher. The estimates of the variance-covariance components are slightly biased even with 100 groups and a group size of 50. The biases for both fixed and random effects are severe for a group size of 5. The standard errors for fixed effect parameters are unbiased, while those for the variance-covariance components are underestimated. Results suggest that low-prevalence events require larger sample sizes, with at least a minimum of 100 groups and 50 individuals per group. Conclusion We recommend using a minimum group size of 50 with at least 50 groups to produce valid estimates for multilevel logistic regression models. Group size should be adjusted under conditions where the prevalence
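
    The simulation design described above can be sketched by generating binary outcomes from a two-level logistic model with a group-level random intercept; the function below uses arbitrary parameter values and is only meant to show how such data sets are produced.

```python
# Generate two-level binary data (random intercept per group) for a chosen
# number of groups and group size; parameters are arbitrary illustrations.
import numpy as np

def simulate_multilevel(n_groups, group_size, beta=(-1.0, 0.5), sigma_u=1.0, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma_u, n_groups)                 # group-level random intercepts
    x = rng.normal(size=(n_groups, group_size))            # individual-level covariate
    eta = beta[0] + beta[1] * x + u[:, None]               # linear predictor
    p = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(1, p)
    return x.ravel(), y.ravel(), np.repeat(np.arange(n_groups), group_size)

x, y, group = simulate_multilevel(n_groups=100, group_size=50)
print("overall prevalence:", y.mean())
```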

  9. Analysing the forward premium anomaly using a Logistic Smooth Transition Regression model.

    OpenAIRE

    Sofiane Amri

    2008-01-01

    Several researchers have suggested that exchange rates may be characterized by nonlinear behaviour. This paper examines these nonlinearities and asymmetries and estimates a Logistic Smooth Transition Regression (LSTR) version of the Fama regression with the risk-adjusted forward premium as the transition variable. The results confirm the existence of nonlinear dynamics in the relationship between the spot exchange rate differential and the forward premium for all the currencies in the sample and for all maturities (three and...

  10. Logistic random effects regression models: a comparison of statistical packages for binary and ordinal outcomes

    Directory of Open Access Journals (Sweden)

    Steyerberg Ewout W

    2011-05-01

    Full Text Available Abstract Background Logistic random effects models are a popular tool to analyze multilevel (also called hierarchical) data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. Methods We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED) and MLwiN ([R]IGLS and MIXOR); Bayesian approaches included WinBUGS, MLwiN (MCMC), the R package MCMCglmm and the SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set a proportional odds model with a random center effect was also fitted. Results The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on a relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in

  11. Multiple logistic regression model of signalling practices of drivers on urban highways

    Science.gov (United States)

    Puan, Othman Che; Ibrahim, Muttaka Na'iya; Zakaria, Rozana

    2015-05-01

    Giving a signal is a way of informing other road users, especially conflicting drivers, of a driver's intention to change his/her course of movement. Other users are exposed to hazardous situations and risk of accidents if a driver who changes course fails to give a signal as required. This paper describes the application of a logistic regression model to the analysis of drivers' signalling practices on multilane highways based on possible factors affecting the driver's decision, such as the driver's gender, vehicle type, vehicle speed and traffic flow intensity. Data pertaining to the analysis of such factors were collected manually. More than 2000 drivers who performed a lane-changing manoeuvre while driving on two sections of multilane highways were observed. Findings from the study show that a relatively large proportion of drivers failed to give any signal when changing lanes. The result of the analysis indicates that although the proportion of drivers who failed to provide a signal prior to a lane-changing manoeuvre is high, the degree of compliance of female drivers is better than that of male drivers. A binary logistic model was developed to represent the probability of a driver providing a signal indication prior to a lane-changing manoeuvre. The model indicates that the driver's gender, type of vehicle driven, speed of vehicle and traffic volume influence the driver's decision to provide a signal indication prior to a lane-changing manoeuvre on a multilane urban highway. In terms of types of vehicles driven, about 97% of motorcyclists failed to comply with the signal indication requirement. The proportion of non-compliant drivers under stable traffic flow conditions is much higher than when the flow is relatively heavy. This is consistent with the data, which indicate a high degree of non-compliance when the average speed of the traffic stream is relatively high.

  12. Fisher Scoring Method for Parameter Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    Science.gov (United States)

    Widyaningsih, Purnami; Retno Sari Saputro, Dewi; Nugrahani Putri, Aulia

    2017-06-01

    The GWOLR model combines geographically weighted regression (GWR) and ordinal logistic regression (OLR) models. Its parameter estimation employs maximum likelihood estimation. Such parameter estimation, however, yields a difficult-to-solve system of nonlinear equations, and therefore a numerical approximation approach is required. The iterative approximation approach, in general, uses the Newton-Raphson (NR) method. The NR method has a disadvantage: its Hessian matrix must be formed from the second derivatives at every iteration, so it does not always produce converging results. To address this, the NR method is modified by replacing its Hessian matrix with the Fisher information matrix, which is termed Fisher scoring (FS). The present research seeks to derive the GWOLR model parameter estimation using the Fisher scoring method and to apply the estimation to data on the level of vulnerability to Dengue Hemorrhagic Fever (DHF) in Semarang. The research concludes that health facilities make the greatest contribution to the probability of the number of DHF sufferers in both villages. Based on the number of sufferers, the IR category of DHF in both villages can be determined.
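
    For plain (non-geographic, binary) logistic regression the Fisher information coincides with the observed Hessian, so Fisher scoring and Newton-Raphson give the same update; the sketch below shows those iterations on simulated data and is not the GWOLR estimator of the paper.

```python
# Illustrative Fisher scoring iterations for an ordinary logistic regression.
import numpy as np

def fisher_scoring_logit(X, y, n_iter=25, tol=1e-8):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                   # diagonal weights
        info = X.T @ (W[:, None] * X)       # Fisher information matrix
        score = X.T @ (y - p)               # score (gradient) vector
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(400), rng.normal(size=(400, 2))])
true_beta = np.array([-0.5, 1.0, -2.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))
print(fisher_scoring_logit(X, y))     # should be close to true_beta
```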

  13. Zone-specific logistic regression models improve classification of prostate cancer on multi-parametric MRI

    Energy Technology Data Exchange (ETDEWEB)

    Dikaios, Nikolaos; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit [University College London, Centre for Medical Imaging, London (United Kingdom); University College London Hospital, Departments of Radiology, London (United Kingdom); Alkalbani, Jokha; Sidhu, Harbir Singh [University College London, Centre for Medical Imaging, London (United Kingdom); Abd-Alazeez, Mohamed; Ahmed, Hashim U.; Emberton, Mark [University College London, Research Department of Urology, Division of Surgery and Interventional Science, London (United Kingdom); Kirkham, Alex [University College London Hospital, Departments of Radiology, London (United Kingdom); Freeman, Alex [University College London Hospital, Department of Histopathology, London (United Kingdom)

    2015-09-15

    To assess the interchangeability of zone-specific (peripheral-zone (PZ) and transition-zone (TZ)) multiparametric-MRI (mp-MRI) logistic-regression (LR) models for classification of prostate cancer. Two hundred and thirty-one patients (70 TZ training-cohort; 76 PZ training-cohort; 85 TZ temporal validation-cohort) underwent mp-MRI and transperineal-template-prostate-mapping biopsy. PZ and TZ uni/multi-variate mp-MRI LR-models for classification of significant cancer (any cancer-core-length (CCL) with Gleason > 3 + 3 or any grade with CCL ≥ 4 mm) were derived from the respective cohorts and validated within the same zone by leave-one-out analysis. Inter-zonal performance was tested by applying TZ models to the PZ training-cohort and vice-versa. Classification performance of TZ models for TZ cancer was further assessed in the TZ validation-cohort. ROC area-under-curve (ROC-AUC) analysis was used to compare models. The univariate parameters with the best classification performance were the normalised T2 signal (T2nSI) within the TZ (ROC-AUC = 0.77) and normalized early contrast-enhanced T1 signal (DCE-nSI) within the PZ (ROC-AUC = 0.79). Performance was not significantly improved by bi-variate/tri-variate modelling. PZ models that contained DCE-nSI performed poorly in classification of TZ cancer. The TZ model based solely on maximum-enhancement poorly classified PZ cancer. LR-models dependent on DCE-MRI parameters alone are not interchangeable between prostatic zones; however, models based exclusively on T2 and/or ADC are more robust for inter-zonal application. (orig.)

  14. Satellite rainfall retrieval by logistic regression

    Science.gov (United States)

    Chiu, Long S.

    1986-01-01

    The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors. The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistic model is the probability that the rain rate of a satellite pixel is above a certain threshold. By varying the thresholds, a rain-rate histogram can be obtained, from which the mean and the variance can be estimated. A logistic model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistic model, simulated rain fields generated by rain-field models with prescribed parameters are needed. A stringent test of the logistic model is its ability to recover the prescribed parameters of simulated rain fields. A rain-field simulation model which preserves the fractional rain area and lognormality of rain rates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.
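
    The threshold-varying idea can be sketched by fitting one logistic model per rain-rate threshold and differencing the resulting exceedance probabilities into bin masses; the covariates and data below are synthetic stand-ins, and because the per-threshold fits are independent, monotonicity of the exceedance curve is only approximate.

```python
# One logistic model per rain-rate threshold: each gives P(rain rate > t | covariates).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 3000
frac_area = rng.uniform(0, 1, n)                 # fractional rain area of a pixel
radiance = rng.normal(0, 1, n)                   # radiance-derived covariate
rainrate = np.exp(1.5 * frac_area + 0.5 * radiance + rng.normal(0, 0.3, n)) - 1
X = np.column_stack([frac_area, radiance])

thresholds = [0.5, 1.0, 2.0, 4.0, 8.0]
x_new = np.array([[0.6, 0.2]])                   # one pixel to evaluate
exceed = [LogisticRegression(max_iter=1000)
          .fit(X, (rainrate > t).astype(int))
          .predict_proba(x_new)[0, 1] for t in thresholds]
hist = -np.diff([1.0] + exceed)                  # probability mass between thresholds
print(dict(zip(thresholds, exceed)))
print("bin probabilities:", hist)
```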

  15. Predictive occurrence models for coastal wetland plant communities: delineating hydrologic response surfaces with multinomial logistic regression

    Science.gov (United States)

    Snedden, Gregg A.; Steyer, Gregory D.

    2013-01-01

    Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007–Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.

  16. Modeling Haze Problems in the North of Thailand using Logistic Regression

    Directory of Open Access Journals (Sweden)

    Busayamas Pimpunchat

    2014-07-01

    Full Text Available At present, air pollution is a major problem in the upper northern region of Thailand. Air pollutants have an effect on human health, the economy and the travel industry. The severity of this problem clearly appears every year during the dry season, from February to April. In particular it becomes very serious in March, especially in Chiang Mai province, where smoke haze is a major issue. This study looked into related data from 2005-2010 covering eight principal parameters: PM10 (particulate matter with a diameter smaller than 10 micrometres), CO (carbon monoxide), NO2 (nitrogen dioxide), SO2 (sulphur dioxide), RH (relative humidity), NO (nitrogen oxide), pressure, and rainfall. Overall haze problem occurrence was calculated from a logistic regression model. Its dependence on the eight parameters stated above was determined for design conditions using the correlation coefficients with PM10. The proposed overall haze problem modeling can be used as a quantitative assessment criterion for supporting decision making to protect human health. The study then used the model to predict haze problem occurrence in 2011. The agreement of the results from the mathematical model with actual measured PM10 concentration data from the Pollution Control Department was quite satisfactory.

  17. [Clinical research XX. From clinical judgment to multiple logistic regression model].

    Science.gov (United States)

    Berea-Baltierra, Ricardo; Rivas-Ruiz, Rodolfo; Pérez-Rodríguez, Marcela; Palacios-Cruz, Lino; Moreno, Jorge; Talavera, Juan O

    2014-01-01

    The complexity of the causality phenomenon in clinical practice implies that the result of a maneuver is not solely caused by the maneuver itself, but by the interaction between the maneuver and other baseline factors or variables occurring during the maneuver. This requires methodological designs that allow the evaluation of these variables. When the outcome is a binary variable, we use the multiple logistic regression model (MLRM). This multivariate model is useful when we want to predict or explain, adjusting for the effect of several risk factors, the effect of a maneuver or exposure on the outcome. In order to perform an MLRM, the outcome or dependent variable must be a binary variable and both categories must be mutually exclusive (e.g., alive/dead, healthy/ill); on the other hand, the independent variables or risk factors may be either qualitative or quantitative. The effect measure obtained from this model is the odds ratio (OR) with 95% confidence intervals (CI), from which we can estimate the proportion of the outcome's variability explained by the risk factors. For these reasons, the MLRM is used in clinical research, since one of the main objectives in clinical practice is the ability to predict or explain an event in which different risk or prognostic factors are taken into account.
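
    The effect measure mentioned above follows directly from the fitted coefficients: OR = exp(beta), with a 95% CI of exp(beta ± 1.96 SE). The sketch below works this through for hypothetical coefficients.

```python
# Odds ratios and 95% confidence intervals from hypothetical log-odds coefficients.
import numpy as np

beta = np.array([0.85, -0.40])        # hypothetical coefficients (log-odds scale)
se   = np.array([0.30, 0.15])         # their standard errors
or_  = np.exp(beta)
ci_lo, ci_hi = np.exp(beta - 1.96 * se), np.exp(beta + 1.96 * se)
for name, o, lo, hi in zip(["exposure", "adjustment factor"], or_, ci_lo, ci_hi):
    print(f"{name}: OR = {o:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```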

  18. Flood susceptible analysis at Kelantan river basin using remote sensing and logistic regression model

    Science.gov (United States)

    Pradhan, Biswajeet

    In 2006 and 2007, heavy monsoon rainfall triggered floods along Malaysia's east coast as well as in the southern state of Johor. The hardest hit areas were along the east coast of peninsular Malaysia in the states of Kelantan, Terengganu and Pahang, and the city of Johor in the south was particularly hard hit. The floods cost nearly a billion ringgit in property and many lives. The extent of the damage could have been reduced or minimized if an early warning system had been in place. This paper deals with flood susceptibility analysis using a logistic regression model. We evaluated the flood susceptibility and the effect of flood-related factors along the Kelantan river basin using a Geographic Information System (GIS) and remote sensing data. Previously flooded areas were extracted from archived Radarsat images using image processing tools. Flood susceptibility mapping was conducted in the study area along the Kelantan River using Radarsat imagery and then enlarged to a 1:25,000 scale. Topographical, hydrological and geological data and satellite images were collected, processed, and constructed into a spatial database using GIS and image processing. The factors chosen that influence flood occurrence were: topographic slope, topographic aspect, topographic curvature, DEM and distance from river drainage, all from the topographic database; flow direction and flow accumulation, extracted from the hydrological database; geology and distance from lineament, taken from the geologic database; land use from SPOT satellite images; soil texture from the soil database; and the vegetation index value from SPOT satellite images. Flood-susceptible areas were analyzed and mapped using the probability-logistic regression model. Results indicate that flood-prone area mapping can be performed at 1:25,000, which is comparable to some conventional flood hazard map scales. The flood-prone areas delineated on these maps correspond to areas that would be inundated by significant flooding

  19. Evaluation of Inference Adequacy in Cumulative Logistic Regression Models: An Empirical Validation of ISWRidge Relationships

    Institute of Scientific and Technical Information of China (English)

    Cheng-Wu CHEN; Hsien-Chueh Peter YANG; Chen-Yuan CHEN; Alex Kung-Hsiung CHANG; Tsung-Hao CHEN

    2008-01-01

    Internal solitary wave propagation over a submarine ridge results in energy dissipation, in which the hydrodynamic interaction between the wave and the ridge affects the marine environment. This study analyzes the effects of ridge height and potential energy during wave-ridge interaction with binary and cumulative logistic regression models. In testing the Global Null Hypothesis, all values are p<0.001 for three statistical methods: Likelihood Ratio, Score, and Wald. Comparing the two kinds of models, the test values obtained by the cumulative logistic regression models are better than those obtained by the binary logistic regression models. Although this study employed the cumulative logistic regression model, three probability functions (1, 2 and 3) are utilized for investigating the weighted influence of factors on wave reflection. Deviance and Pearson tests are applied to check the goodness-of-fit of the proposed model. The analytical results demonstrate that both ridge height (X1) and potential energy (X2) significantly impact (p<0.0001) the amplitude-based reflected rate; the P-values for the deviance and Pearson tests are all >0.05 (0.2839 and 0.3438, respectively). That is, the goodness-of-fit between ridge height (X1) and potential energy (X2) can further predict parameters under the scenario of the most parsimonious model. Investigation of 6 measures of predictive power (R2, Max-rescaled R2, Somers' D, Gamma, Tau-a, and c, respectively) indicates that the predictive estimates of the proposed model have better predictive ability than ridge height alone, and are very similar to the interaction of ridge height and potential energy. It can be concluded that the goodness-of-fit and prediction ability of the cumulative logistic regression model are better than those of the binary logistic regression model.

  20. Modeling susceptibility to deforestation of remaining ecosystems in North Central Mexico with logistic regression

    Institute of Scientific and Technical Information of China (English)

    L. Miranda-Aragón; E.J. Trevi(n)o-Garza; J. Jiménez-Pérez; O.A. Aguirre-Calderón; M.A. González-Tagle; M. Pompa-García; C.A. Aguirre-Salado

    2012-01-01

    Determining the underlying factors that foster deforestation and delineating forest areas by levels of susceptibility are among the main challenges when defining policies for forest management and planning at the regional scale. The susceptibility to deforestation of the remaining forest ecosystems (shrubland, temperate forest and rainforest) was assessed in the state of San Luis Potosi, located in north central Mexico. Spatial analysis techniques were used to detect the deforested areas in the study area during 1993-2007. Logistic regression was used to relate explanatory variables (such as social, investment, forest production, biophysical and proximity factors) to susceptibility to deforestation in order to construct predictive models with two focuses: general and by biogeographical zone. In all models, deforestation has a positive correlation with distance to rainfed agriculture, and a negative correlation with slope, distance to roads and distance to towns. Other variables were significant in some cases, but in others they had dual relationships, which varied in each biogeographical zone. The results show that the remaining rainforest of the Huasteca region is highly susceptible to deforestation. Both approaches show that more than 70% of the current rainforest area has high or very high levels of susceptibility to deforestation. These values represent a serious concern for global warming if tree carbon is released to the atmosphere. However, after some considerations, encouraging forest environmental services appears to be the best alternative to achieve sustainable forest management.

  1. Predicting the "graduate on time (GOT)" of PhD students using binary logistics regression model

    Science.gov (United States)

    Shariff, S. Sarifah Radiah; Rodzi, Nur Atiqah Mohd; Rahman, Kahartini Abdul; Zahari, Siti Meriam; Deni, Sayang Mohd

    2016-10-01

    The Malaysian government has recently set a new goal of producing 60,000 Malaysian PhD holders by the year 2023. As Malaysia's largest institution of higher learning in terms of size and population, offering more than 500 academic programmes in a conducive and vibrant environment, UiTM has taken several initiatives to fill the gap. Increasing the number of PhD graduates is a challenging process. On many occasions it has been observed that the struggle to reach the target is even more daunting, and that implementation falls far short of the ideal. Progress has been further slowed as the attrition rate increases. This study aims to apply the proposed model, which incorporates several factors, to predict the number of PhD students who will complete their PhD studies on time. A binary logistic regression model is proposed and used on the data set to determine this number. The results show that only 6.8% of the 2014 PhD students are predicted to graduate on time, and the results are compared with the actual number for validation purposes.

  2. A semiparametric Wald statistic for testing logistic regression models based on case-control data

    Institute of Scientific and Technical Information of China (English)

    WAN ShuWen

    2008-01-01

    We propose a semiparametric Wald statistic to test the validity of logistic regression models based on case-control data. The test statistic is constructed using a semiparametric ROC curve estimator and a nonparametric ROC curve estimator. The statistic has an asymptotic chi-squared distribution and is an alternative to the Kolmogorov-Smirnov-type statistic proposed by Qin and Zhang in 1997, the chi-squared-type statistic proposed by Zhang in 1999 and the information matrix test statistic proposed by Zhang in 2001. The statistic is easy to compute in the sense that it requires none of the following methods: using a bootstrap method to find its critical values, partitioning the sample data or inverting a high-dimensional matrix. We present some results on simulation and on analysis of two real examples. Moreover, we discuss how to extend our statistic to a family of statistics and how to construct its Kolmogorov-Smirnov counterpart.

  3. The likelihood of achieving quantified road safety targets: a binary logistic regression model for possible factors.

    Science.gov (United States)

    Sze, N N; Wong, S C; Lee, C Y

    2014-12-01

    In the past several decades, many countries have set quantified road safety targets to motivate transport authorities to develop systematic road safety strategies and measures and to facilitate the achievement of continuous road safety improvement. Studies have been conducted to evaluate the association between the setting of quantified road safety targets and road fatality reduction, in both the short and long run, by comparing road fatalities before and after the implementation of a quantified road safety target. However, not much work has been done to evaluate whether the quantified road safety targets are actually achieved. In this study, we used a binary logistic regression model to examine the factors - including vehicle ownership, fatality rate, and national income, in addition to level of ambition and duration of target - that contribute to a target's success. We analyzed 55 quantified road safety targets set by 29 countries from 1981 to 2009, and the results indicate that targets that were in progress and had a lower level of ambition had a higher likelihood of eventually being achieved. Moreover, possible interaction effects on the association between level of ambition and the likelihood of success are also revealed.

  4. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model.

    Science.gov (United States)

    Wang, Liguo; Park, Hyun Jung; Dasari, Surendra; Wang, Shengqin; Kocher, Jean-Pierre; Li, Wei

    2013-04-01

    Thousands of novel transcripts have been identified using deep transcriptome sequencing. This discovery of large and 'hidden' transcriptome rejuvenates the demand for methods that can rapidly distinguish between coding and noncoding RNA. Here, we present a novel alignment-free method, Coding Potential Assessment Tool (CPAT), which rapidly recognizes coding and noncoding transcripts from a large pool of candidates. To this end, CPAT uses a logistic regression model built with four sequence features: open reading frame size, open reading frame coverage, Fickett TESTCODE statistic and hexamer usage bias. CPAT software outperformed (sensitivity: 0.96, specificity: 0.97) other state-of-the-art alignment-based software such as Coding-Potential Calculator (sensitivity: 0.99, specificity: 0.74) and Phylo Codon Substitution Frequencies (sensitivity: 0.90, specificity: 0.63). In addition to high accuracy, CPAT is approximately four orders of magnitude faster than Coding-Potential Calculator and Phylo Codon Substitution Frequencies, enabling its users to process thousands of transcripts within seconds. The software accepts input sequences in either FASTA- or BED-formatted data files. We also developed a web interface for CPAT that allows users to submit sequences and receive the prediction results almost instantly.
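
    A hedged sketch of the underlying modelling idea (a logistic regression on a handful of sequence-derived features) is given below; the four feature columns are random numbers shaped to mimic ORF size, ORF coverage, a Fickett-like score and hexamer bias, not values computed by CPAT.

```python
# Feature-based logistic regression for coding vs noncoding; features are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 4000
coding = rng.binomial(1, 0.5, n)
features = np.column_stack([
    rng.normal(900, 300, n) * coding + rng.normal(200, 100, n) * (1 - coding),  # ORF size
    rng.beta(5, 2, n) * coding + rng.beta(2, 5, n) * (1 - coding),              # ORF coverage
    rng.normal(1.0, 0.2, n) * coding + rng.normal(0.7, 0.2, n) * (1 - coding),  # Fickett-like score
    rng.normal(0.3, 0.2, n) * coding + rng.normal(-0.2, 0.2, n) * (1 - coding), # hexamer bias
])

Xtr, Xte, ytr, yte = train_test_split(features, coding, test_size=0.25, random_state=7)
clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
print("AUC:", roc_auc_score(yte, clf.predict_proba(Xte)[:, 1]))
```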

  5. Common pitfalls in statistical analysis: Logistic regression.

    Science.gov (United States)

    Ranganathan, Priya; Pramesh, C S; Aggarwal, Rakesh

    2017-01-01

    Logistic regression analysis is a statistical technique to evaluate the relationship between various predictor variables (either categorical or continuous) and an outcome which is binary (dichotomous). In this article, we discuss logistic regression analysis and the limitations of this technique.

  6. Multinomial logistic regression modelling of obesity and overweight among primary school students in a rural area of Negeri Sembilan

    Energy Technology Data Exchange (ETDEWEB)

    Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam [Pusat Pengajian Sains Matematik, Universiti Sains Malaysia, 11800 USM, Pulau Pinang, Malaysia amirul@unisel.edu.my, zalila@cs.usm.my, norlida@usm.my, adam@usm.my (Malaysia)

    2015-10-22

    Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significance test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratios. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles, and diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also the interaction between breakfast intake in a week and sleep duration, and the interaction between gender and protein intake.

  7. Multinomial logistic regression modelling of obesity and overweight among primary school students in a rural area of Negeri Sembilan

    Science.gov (United States)

    Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam

    2015-10-01

    Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significance test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratios. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles, and diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also the interaction between breakfast intake in a week and sleep duration, and the interaction between gender and protein intake.
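
    A minimal sketch of a multinomial logit with q = 3 outcome categories, yielding q - 1 = 2 logit equations against the reference category, is shown below using statsmodels MNLogit; the predictors (gender, sleep duration) echo the abstract, but the data are simulated.

```python
# Multinomial (polytomous) logistic regression on simulated 3-category data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 600
X = sm.add_constant(np.column_stack([rng.integers(0, 2, n),      # gender
                                     rng.normal(8, 1.5, n)]))    # sleep duration (hours)
# Simulate a 3-category outcome: 0 = normal, 1 = overweight, 2 = obese
eta1 = -2.0 + 0.5 * X[:, 1] + 0.1 * X[:, 2]
eta2 = -4.0 + 0.8 * X[:, 1] + 0.2 * X[:, 2]
denom = 1 + np.exp(eta1) + np.exp(eta2)
p = np.column_stack([1 / denom, np.exp(eta1) / denom, np.exp(eta2) / denom])
y = np.array([rng.choice(3, p=row) for row in p])

res = sm.MNLogit(y, X).fit(disp=False)
print(res.summary())          # two logit equations, each against category 0
```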

  8. Predictors of work injury in underground mines——an application of a logistic regression model

    Institute of Scientific and Technical Information of China (English)

    E S. Pau

    2009-01-01

    Mine accidents and injuries are complex and generally characterized by several factors, ranging from personal and technical to social characteristics. In this study, an attempt has been made to identify the various factors responsible for work-related injuries in mines and to estimate the risk of work injury to mine workers. The prediction of work injury in mines was done by step-by-step multivariate logistic regression modeling, with an application to case-study mines in India. In total, 18 variables were considered in this study. Most of the variables are not directly quantifiable, so instruments were developed to quantify them through a questionnaire-type survey. Underground mine workers were randomly selected for the survey, and responses from 300 participants were used for the analysis. Four variables, age, negative affectivity, job dissatisfaction, and physical hazards, bear significant discriminating power for risk of injury to the workers, comparing cases and controls in a multivariate situation while controlling for all the personal and socio-technical variables. The analysis reveals that negatively affected workers are 2.54 times more prone to injuries than less negatively affected workers, and this factor is a more important risk factor for the case-study mines. Long-term planning through identification of the negative individuals, proper counseling regarding the adverse effects of negative behaviors, and special training is urgently required. Care should be taken for the aged and experienced workers in terms of their job responsibility and training requirements. Management should provide a friendly atmosphere during work to increase the confidence of injury-prone miners.

  9. Estimation of Logistic Regression Models in Small Samples. A Simulation Study Using a Weakly Informative Default Prior Distribution

    Science.gov (United States)

    Gordovil-Merino, Amalia; Guardia-Olmos, Joan; Pero-Cebollero, Maribel

    2012-01-01

    In this paper, we used simulations to compare the performance of classical and Bayesian estimations in logistic regression models using small samples. In the performed simulations, conditions were varied, including the type of relationship between independent and dependent variable values (i.e., unrelated and related values), the type of variable…

  10. Diagnosis of hepatic fibrosis in hepatitis B patients by logistic regression modeling based on plasma amino acid ratio and age

    Institute of Scientific and Technical Information of China (English)

    张占卿

    2013-01-01

    Objective To explore the efficacy of logistic regression modeling based on plasma amino acid profile and patient age for diagnosing hepatic fibrosis in patients with chronic hepatitis B (CHB). Methods One hundred and forty-eight patients (108 males; mean age: 38.1±11.9 years; range: 16-72 years) histologically…

  11. Using Dominance Analysis to Determine Predictor Importance in Logistic Regression

    Science.gov (United States)

    Azen, Razia; Traxel, Nicole

    2009-01-01

    This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…

  13. A comparative study of slope failure prediction using logistic regression, support vector machine and least square support vector machine models

    Science.gov (United States)

    Zhou, Lim Yi; Shan, Fam Pei; Shimizu, Kunio; Imoto, Tomoaki; Lateh, Habibah; Peng, Koay Swee

    2017-08-01

    A comparative study of logistic regression, support vector machine (SVM) and least square support vector machine (LSSVM) models has been carried out to predict slope failure (landslides) along the East-West Highway (Gerik-Jeli). The effects of the two monsoon seasons (southwest and northeast) that occur in Malaysia are considered in this study. Two factors related to the occurrence of slope failure are included: rainfall and underground water. For each method, two predictive models are constructed, namely SOUTHWEST and NORTHEAST models. Based on the results obtained from the logistic regression models, two factors (rainfall and underground water level) contribute to the occurrence of slope failure. The accuracies of the three statistical models for the two monsoon seasons are verified using Relative Operating Characteristic curves. The validation results showed that all models produced predictions of high accuracy. For the SVM and LSSVM results, the models using the RBF kernel showed better prediction than the models using the linear kernel. The comparative results showed that, for the SOUTHWEST models, the three statistical models have relatively similar performance. For the NORTHEAST models, logistic regression has the best predictive efficiency, whereas the SVM model has the second best predictive efficiency.
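
    A minimal sketch of the comparison workflow described above: logistic regression against SVMs with linear and RBF kernels, scored by ROC AUC. The data are synthetic stand-ins for the rainfall and groundwater factors, and LSSVM is omitted because it is not available in scikit-learn.

    ```python
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Two synthetic predictors stand in for rainfall and underground water level.
    X, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                               n_redundant=0, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    models = {
        "logistic regression": LogisticRegression(),
        "SVM (linear kernel)": SVC(kernel="linear", probability=True),
        "SVM (RBF kernel)": SVC(kernel="rbf", probability=True),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        print(f"{name}: ROC AUC = {auc:.3f}")
    ```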

  14. Leukemia prediction using sparse logistic regression.

    Directory of Open Access Journals (Sweden)

    Tapio Manninen

    Full Text Available We describe a supervised prediction method for diagnosis of acute myeloid leukemia (AML) from patient samples based on flow cytometry measurements. We use a data-driven approach with machine learning methods to train a computational model that takes in flow cytometry measurements from a single patient and gives a confidence score of the patient being AML-positive. Our solution is based on an [Formula: see text] regularized logistic regression model that aggregates AML test statistics calculated from individual test tubes with different cell populations and fluorescent markers. The model construction is entirely data driven and no prior biological knowledge is used. The described solution scored 100% classification accuracy in the DREAM6/FlowCAP2 Molecular Classification of Acute Myeloid Leukaemia Challenge against a gold standard consisting of 20 AML-positive and 160 healthy patients. Here we perform a more extensive validation of the prediction model performance and further improve and simplify our original method, showing that statistically equal results can be obtained by using simple average marker intensities as features in the logistic regression model. In addition to the logistic regression based model, we also present other classification models and compare their performance quantitatively. The key benefit of our prediction method compared to other solutions with similar performance is that our model only uses a small fraction of the flow cytometry measurements, making our solution highly economical.
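
    The record elides the exact regularization norm ("[Formula: see text]"), so the sketch below assumes an l1 (sparsity-inducing) penalty purely for illustration; the features are synthetic stand-ins for the per-tube marker statistics.

    ```python
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Synthetic high-dimensional data; an l1 penalty drives most coefficients to zero.
    X, y = make_classification(n_samples=180, n_features=60, n_informative=5,
                               random_state=1)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    print("non-zero coefficients:", int((clf.coef_ != 0).sum()))
    ```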

  15. Modeling data for pancreatitis in presence of a duodenal diverticula using logistic regression

    Science.gov (United States)

    Dineva, S.; Prodanova, K.; Mlachkova, D.

    2013-12-01

    The presence of a periampullary duodenal diverticulum (PDD) is often observed during upper digestive tract barium meal studies and endoscopic retrograde cholangiopancreatography (ERCP). A few papers have reported that the diverticulum has something to do with the incidence of pancreatitis. The aim of this study is to investigate whether the presence of a duodenal diverticulum predisposes to the development of pancreatic disease. A total of 3,966 patients who had undergone ERCP were studied retrospectively. They were divided into two groups: with and without PDD. Patients with a duodenal diverticulum had a higher rate of acute pancreatitis, and a duodenal diverticulum is a risk factor for acute idiopathic pancreatitis. A multiple logistic regression was performed to obtain adjusted estimates of odds and to identify whether a PDD is a predictor of acute or chronic pancreatitis. The software package STATISTICA 10.0 was used for analyzing the real data.

  16. Using Logistic Regression to Model New York City Restaurant Grades Over a Two-Year Period

    Directory of Open Access Journals (Sweden)

    David Nadler

    2014-07-01

    Full Text Available A knowledge gap exists regarding the role of restaurant type in the prediction of attaining the highest grade possible from the local health inspection agency. This study identified disparities, using logistic regression, between the issuance of a Grade A and restaurant type and location. The study tested the eight most inspected types of restaurants within the City of New York and calculated the odds ratios of their receiving the highest inspection grade from the New York City Department of Health and Mental Hygiene. A fitted equation is proposed for the prediction of receiving the highest inspection grade based upon the citywide results for these eight restaurant types from calendar years 2011 and 2012. The results suggest that certain styles of restaurants have lower odds of receiving the highest grade in comparison to American-style restaurants.

  17. Standards for Standardized Logistic Regression Coefficients

    Science.gov (United States)

    Menard, Scott

    2011-01-01

    Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…

  19. An alternative to evaluate the efficiency of in vitro culture medium using a logistic regression model

    Directory of Open Access Journals (Sweden)

    Daniel Furtado Ferreira

    2003-01-01

    Full Text Available The evaluation of a culture medium for the in vitro culture of a species is performed using its physical and/or chemical properties. However, the analysis of the experimental results makes it possible to evaluate its quality. In this sense, this work presents an alternative using a logistic model to evaluate the culture medium to be used in vitro. The probabilities provided by this model are used as a medium evaluator index. The importance of this index is based on the formalization of a statistical criterion for the selection of an adequate culture medium for in vitro culture without excluding its physical and/or chemical properties. To demonstrate this procedure, an experiment determining the ideal medium for the in vitro culture of primary explants of Ipeca [Psychotria ipecacuanha (Brot.) Stokes] was evaluated. The differentiation of the culture medium was based on the presence and absence of the growth regulator BAP (6-benzylaminopurine). A logistic model was adjusted as a function of the weight of fresh and dry matter. Minimum, medium and maximum probabilities obtained with this model showed that the culture medium containing BAP was the most adequate for explant growth. Due to the high discriminative power of these media, detected by the model, their use is recommended as an alternative for selecting culture media in similar experiments.

  20. The spatial prediction of landslide susceptibility applying artificial neural network and logistic regression models: A case study of Inje, Korea

    Science.gov (United States)

    Saro, Lee; Woo, Jeon Seong; Kwan-Young, Oh; Moung-Jin, Lee

    2016-02-01

    The aim of this study is to predict landslide susceptibility through spatial analysis by applying a statistical methodology based on GIS. Logistic regression models along with an artificial neural network were applied and validated to analyze landslide susceptibility in Inje, Korea. Landslide occurrence areas in the study area were identified based on interpretations of optical remote sensing data (aerial photographs) followed by field surveys. A spatial database considering forest, geophysical, soil and topographic data was built for the study area using a Geographical Information System (GIS). These factors were analysed using artificial neural network (ANN) and logistic regression models to generate a landslide susceptibility map. The study validates the landslide susceptibility map by comparing it with landslide occurrence areas. The locations of landslide occurrence were divided randomly into a training set (50%) and a test set (50%). The training set was used to produce the landslide susceptibility map with the artificial neural network and logistic regression models, and the test set was retained to validate the prediction map. The validation results revealed that the artificial neural network model (with an accuracy of 80.10%) was better at predicting landslides than the logistic regression model (with an accuracy of 77.05%). Of the weights used in the artificial neural network model, `slope' yielded the highest weight value (1.330), and `aspect' yielded the lowest value (1.000). This research applied two statistical analysis methods in a GIS and compared their results. Based on the findings, we were able to derive a more effective method for analyzing landslide susceptibility.

  1. The identification of menstrual blood in forensic samples by logistic regression modeling of miRNA expression.

    Science.gov (United States)

    Hanson, Erin K; Mirza, Mohid; Rekab, Kamel; Ballantyne, Jack

    2014-11-01

    We report the identification of sensitive and specific miRNA biomarkers for menstrual blood, a tissue that might provide probative information in certain specialized instances. We incorporated these biomarkers into qPCR assays and developed a quantitative statistical model using logistic regression that permits the prediction of menstrual blood in a forensic sample with a high, and measurable, degree of accuracy. Using the developed model, we achieved 100% accuracy in determining the body fluid of interest for a set of test samples (i.e. samples not used in model development). The development, and details, of the logistic regression model are described. Testing and evaluation of the finalized logistic regression modeled assay using a small number of samples was carried out to preliminarily estimate the limit of detection (LOD), specificity in admixed samples and expression of the menstrual blood miRNA biomarkers throughout the menstrual cycle (25-28 days). The LOD was estimated, and menstrual blood was identified only during the menses phase of the female reproductive cycle in two donors.

  2. Logistic Regression for Evolving Data Streams Classification

    Institute of Scientific and Technical Information of China (English)

    YIN Zhi-wu; HUANG Shang-teng; XUE Gui-rong

    2007-01-01

    Logistic regression is a fast classifier and can achieve higher accuracy on small training data. Moreover, it can work on both discrete and continuous attributes with nonlinear patterns. Based on these properties of logistic regression, this paper proposed an algorithm, called the evolutionary logistic regression classifier (ELRClass), to solve the classification of evolving data streams. This algorithm applies logistic regression repeatedly to a sliding window of samples in order to update the existing classifier, to keep this classifier if its performance deteriorates only because of bursting noise, or to construct a new classifier if a major concept drift is detected. The intensive experimental results demonstrate the effectiveness of this algorithm.
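
    An illustrative sketch of the sliding-window idea only, not the ELRClass algorithm itself: refit a logistic regression on the most recent labelled samples as a stream arrives. All data and window sizes below are assumptions for demonstration.

    ```python
    from collections import deque

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    window = deque(maxlen=500)   # sliding window of (features, label) pairs
    clf = None

    def update(x, y):
        """Append one labelled sample and refit on the current window."""
        global clf
        window.append((x, y))
        labels = np.array([w[1] for w in window])
        if len(np.unique(labels)) > 1:            # need both classes before fitting
            X = np.array([w[0] for w in window])
            clf = LogisticRegression().fit(X, labels)

    rng = np.random.default_rng(0)
    for t in range(800):
        x = rng.normal(size=3)
        y = int(x[0] + 0.002 * t * x[1] + rng.normal(scale=0.5) > 0)   # drifting concept
        update(x, y)
    print(clf.coef_)   # coefficients reflect the most recent window only
    ```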

  3. An empirical study of statistical properties of variance partition coefficients for multi-level logistic regression models

    Science.gov (United States)

    Li, J.; Gray, B.R.; Bates, D.M.

    2008-01-01

    Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions of variance partition coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived formulae for the multi-level logistic regression model and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of methods for estimating VPCs (by comparing VPCs obtained using Laplace and penalized quasi-likelihood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends immediate support to wider applications of VPCs in scientific data analysis.
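
    For orientation, one commonly cited VPC definition for a two-level logistic model is the latent-response formulation, in which the level-1 residual variance is fixed at pi^2/3 on the logistic scale. The sketch below shows only that single definition, not the paper's full set of comparisons.

    ```python
    import math

    def vpc_latent(sigma2_u):
        """Latent-response VPC: level-2 variance / (level-2 variance + pi^2 / 3)."""
        return sigma2_u / (sigma2_u + math.pi ** 2 / 3)

    # Hypothetical cluster-level variance component, for illustration only.
    print(round(vpc_latent(0.5), 3))
    ```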

  4. Logistic Regression Applied to Seismic Discrimination

    Energy Technology Data Exchange (ETDEWEB)

    BG Amindan; DN Hagedorn

    1998-10-08

    The usefulness of logistic discrimination was examined in an effort to learn how it performs in a regional seismic setting. Logistic discrimination provides an easily understood method, works with user-defined models and few assumptions about the population distributions, and handles both continuous and discrete data. Seismic event measurements from a data set compiled by Los Alamos National Laboratory (LANL) of Chinese events recorded at station WMQ were used in this demonstration study. PNNL applied logistic regression techniques to the data. All possible combinations of the Lg and Pg measurements were tried, and a best-fit logistic model was created. The best combination of Lg and Pg frequencies for predicting the source of a seismic event (earthquake or explosion) used Lg{sub 3.0-6.0} and Pg{sub 3.0-6.0} as the predictor variables. A cross-validation test was run, which showed that this model was able to correctly predict 99.7% of earthquakes and 98.0% of explosions for the given data set. Two other models were identified that used Pg and Lg measurements from the 1.5 to 3.0 Hz frequency range. Although these other models did a good job of correctly predicting the earthquakes, they were not as effective at predicting the explosions. Two possible biases were discovered which affect the predicted probabilities for each outcome. The first bias was due to this being a case-controlled study: the sampling fractions caused a bias in the probabilities that were calculated using the models. The second bias is caused by a change in the proportions for each event: if at a later date the proportions (a priori probabilities) of explosions versus earthquakes change, this would cause a bias in the predicted probability for an event. When using logistic regression, the user needs to be aware of the possible biases and what effect they will have on the predicted probabilities.

  5. The quest for conditional independence in prospectivity modeling: weights-of-evidence, boost weights-of-evidence, and logistic regression

    Science.gov (United States)

    Schaeben, Helmut; Semmler, Georg

    2016-09-01

    The objective of prospectivity modeling is prediction of the conditional probability of the presence (T = 1) or absence (T = 0) of a target T given favorable or prohibitive predictors B, or construction of a two-class {0,1} classification of T. A special case of logistic regression called weights-of-evidence (WofE) is geologists' favorite method of prospectivity modeling due to its apparent simplicity. However, the numerical simplicity is deceiving, as it is implied by the severe mathematical modeling assumption of joint conditional independence of all predictors given the target. General weights of evidence are explicitly introduced which are as simple to estimate as conventional weights, i.e., by counting, but do not require conditional independence. Complementary to the regression view is the classification view on prospectivity modeling. Boosting is the construction of a strong classifier from a set of weak classifiers. From the regression point of view it is closely related to logistic regression. Boost weights-of-evidence (BoostWofE) was introduced into prospectivity modeling to counterbalance violations of the assumption of conditional independence, even though relaxation of modeling assumptions with respect to weak classifiers was not the (initial) purpose of boosting. In the original publication of BoostWofE a fabricated dataset was used to "validate" this approach. Using the same fabricated dataset it is shown that BoostWofE cannot generally compensate for lacking conditional independence, whatever the processing order of the predictors. Thus the alleged features of BoostWofE are disproved by way of counterexamples, while the theoretical finding is confirmed that logistic regression including interaction terms can exactly compensate for violations of joint conditional independence if the predictors are indicators.
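
    The closing claim can be sketched on synthetic data: when the true log-odds of the target contain an interaction between two binary (indicator) predictors, so that the predictors are not conditionally independent given the target, a logistic regression that includes the interaction term can still represent the log-odds exactly. The coefficient values below are invented for the demonstration.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    B1 = rng.integers(0, 2, 2000)
    B2 = rng.integers(0, 2, 2000)
    # True log-odds contain a B1*B2 interaction, i.e. a departure from the
    # conditional-independence assumption behind plain weights-of-evidence.
    true_logit = -1.0 + 0.8 * B1 + 0.6 * B2 + 1.2 * (B1 * B2)
    T = (rng.random(2000) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

    X = sm.add_constant(np.column_stack([B1, B2, B1 * B2]))
    print(sm.Logit(T, X).fit(disp=0).params)   # recovers main effects and interaction
    ```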

  6. Examining asymmetric effects in the South African Phillips curve: Evidence from logistic smooth transition regression (LSTR) models

    OpenAIRE

    Phiri, Andrew

    2015-01-01

    This study contributes to the existing literature by investigating asymmetric behaviour within the South African short-run Phillips curve for three versions of the Phillips curve specification, namely the New Classical Phillips curve, the New Keynesian Phillips curve and the Hybrid New Keynesian Phillips curve. To this end, we apply a logistic smooth transition regression (LSTR) econometric model to each of the aforementioned versions of the Phillips curve specifications for quarterly data ...

  7. Investigating nonlinear speculation in cattle, corn, and hog futures markets using logistic smooth transition regression models

    OpenAIRE

    Röthig, Andreas; Chiarella, Carl

    2006-01-01

    This article explores nonlinearities in the response of speculators' trading activity to price changes in live cattle, corn, and lean hog futures markets. Analyzing weekly data from March 4, 1997 to December 27, 2005, we reject linearity in all of these markets. Using smooth transition regression models, we find a similar structure of nonlinearities with regard to the number of different regimes, the choice of the transition variable, and the value at which the transition occurs.

  8. Influential factors of red-light running at signalized intersection and prediction using a rare events logistic regression model.

    Science.gov (United States)

    Ren, Yilong; Wang, Yunpeng; Wu, Xinkai; Yu, Guizhen; Ding, Chuan

    2016-10-01

    Red light running (RLR) has become a major safety concern at signalized intersections. To prevent RLR-related crashes, it is critical to identify the factors that significantly impact drivers' RLR behavior and to predict potential RLR in real time. In this research, nine months of RLR events extracted from high-resolution traffic data collected by loop detectors at three signalized intersections were used to identify the factors that significantly affect RLR behavior. The data analysis indicated that occupancy time, time gap, used yellow time, time left to yellow start, whether the preceding vehicle runs through the intersection during yellow, and whether there is a vehicle passing through the intersection on the adjacent lane were significant factors for RLR behavior. Furthermore, due to the rare-events nature of RLR, a modified rare events logistic regression model was developed for RLR prediction. The rare events logistic regression method has been applied in many fields for rare-events studies and shows impressive performance, but so far no previous research has applied this method to study RLR. The results showed that the rare events logistic regression model performed significantly better than the standard logistic regression model. More importantly, the proposed RLR prediction method is purely based on loop detector data collected from a single advance loop detector located 400 feet away from the stop bar. This brings great potential for future field applications of the proposed method, since loops have been widely implemented at many intersections and can collect data in real time. This research is expected to contribute significantly to the improvement of intersection safety.
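
    The abstract does not state which rare-events modification the authors used, so as a generic illustration the sketch below shows one well-known adjustment in the spirit of King and Zeng's prior correction: shifting the fitted intercept when events are over-represented in the estimation sample relative to the population.

    ```python
    import numpy as np

    def corrected_intercept(beta0, ybar, tau):
        """Prior-correction sketch (an assumed technique, not necessarily the paper's).
        beta0: intercept fitted on the (event-enriched) sample;
        ybar:  event fraction in that sample;
        tau:   true event fraction in the population."""
        return beta0 - np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

    # Hypothetical numbers: 20% events in the sample vs. 1% in the population.
    print(round(corrected_intercept(beta0=-2.0, ybar=0.20, tau=0.01), 3))
    ```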

  9. Unitary Response Regression Models

    Science.gov (United States)

    Lipovetsky, S.

    2007-01-01

    The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…

  10. Variable Selection in Logistic Regression Model

    Institute of Scientific and Technical Information of China (English)

    ZHANG Shangli; ZHANG Lili; QIU Kuanmin; LU Ying; CAI Baigen

    2015-01-01

    Variable selection is one of the most important problems in pattern recognition. In the linear regression model, there are many methods that can solve this problem, such as the Least absolute shrinkage and selection operator (LASSO) and many improved LASSO methods, but there are few variable selection methods in generalized linear models. We study the variable selection problem in the logistic regression model. We propose a new variable selection method, the logistic elastic net, and prove that it has the grouping effect, which means that strongly correlated predictors tend to be in or out of the model together. The logistic elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the LASSO is not a very satisfactory variable selection method in the case when p is much larger than n. The advantage and effectiveness of this method are demonstrated using real leukemia data and a simulation study.
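
    A minimal sketch of a logistic elastic net in the p >> n regime the abstract describes, using scikit-learn's elastic-net penalty; the mixing parameter and penalty strength are illustrative values, not the paper's.

    ```python
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Many more features than observations, as in the leukemia example.
    X, y = make_classification(n_samples=72, n_features=500, n_informative=10,
                               random_state=0)
    enet = LogisticRegression(penalty="elasticnet", solver="saga",
                              l1_ratio=0.5, C=0.5, max_iter=10000).fit(X, y)
    print("features kept by the penalty:", int((enet.coef_ != 0).sum()))
    ```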

  11. Good Corporate Governance and Predicting Financial Distress Using Logistic and Probit Regression Model

    Directory of Open Access Journals (Sweden)

    Juniarti Juniarti

    2013-01-01

    Full Text Available The study aims to test whether good corporate governance (GCG) is able to predict the probability of companies experiencing financial difficulties. Financial ratios traditionally used for predicting bankruptcy remain in use in this study. Besides, this study also compares logit and probit regression models, which are widely used in research related to accounting bankruptcy prediction. Both models are compared to determine which model is superior. The sample in this study consists of infrastructure, transportation, utilities & trade, services and hotel companies experiencing financial distress in the period 2008-2011. The results show that GCG and three other control variables, i.e., DTA, CR and company category, do not prove significant in predicting the probability of companies experiencing financial difficulties. NPM is the only variable that proves significant in distinguishing healthy firms from distressed ones. In general, the logit and probit models do not lead to different conclusions. Both models confirm the goodness of fit of the models and the results of hypothesis testing. In terms of classification accuracy, the logit model gives more accurate predictions than the probit model.
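
    A small sketch of the logit versus probit comparison on synthetic data; generic features stand in for the financial ratios (DTA, CR, NPM) named above, and the comparison here is by log-likelihood and pseudo R-squared only.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = sm.add_constant(rng.normal(size=(500, 3)))
    true_beta = np.array([0.2, 1.0, -0.5, 0.8])
    y = (rng.random(500) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(int)

    logit_res = sm.Logit(y, X).fit(disp=0)
    probit_res = sm.Probit(y, X).fit(disp=0)
    print("logit  log-likelihood:", round(logit_res.llf, 2),
          " pseudo R2:", round(logit_res.prsquared, 3))
    print("probit log-likelihood:", round(probit_res.llf, 2),
          " pseudo R2:", round(probit_res.prsquared, 3))
    ```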

  12. Adjusting for unmeasured confounding due to either of two crossed factors with a logistic regression model.

    Science.gov (United States)

    Li, Li; Brumback, Babette A; Weppelmann, Thomas A; Morris, J Glenn; Ali, Afsar

    2016-08-15

    Motivated by an investigation of the effect of surface water temperature on the presence of Vibrio cholerae in water samples collected from different fixed surface water monitoring sites in Haiti in different months, we investigated methods to adjust for unmeasured confounding due to either of the two crossed factors site and month. In the process, we extended previous methods that adjust for unmeasured confounding due to one nesting factor (such as site, which nests the water samples from different months) to the case of two crossed factors. First, we developed a conditional pseudolikelihood estimator that eliminates fixed effects for the levels of each of the crossed factors from the estimating equation. Using the theory of U-Statistics for independent but non-identically distributed vectors, we show that our estimator is consistent and asymptotically normal, but that its variance depends on the nuisance parameters and thus cannot be easily estimated. Consequently, we apply our estimator in conjunction with a permutation test, and we investigate use of the pigeonhole bootstrap and the jackknife for constructing confidence intervals. We also incorporate our estimator into a diagnostic test for a logistic mixed model with crossed random effects and no unmeasured confounding. For comparison, we investigate between-within models extended to two crossed factors. These generalized linear mixed models include covariate means for each level of each factor in order to adjust for the unmeasured confounding. We conduct simulation studies, and we apply the methods to the Haitian data. Copyright © 2016 John Wiley & Sons, Ltd.

  13. A Logistic Regression Model with a Hierarchical Random Error Term for Analyzing the Utilization of Public Transport

    Directory of Open Access Journals (Sweden)

    Chong Wei

    2015-01-01

    Full Text Available Logistic regression models have been widely used in previous studies to analyze public transport utilization. These studies have shown travel time to be an indispensable variable for such analysis and usually consider it to be a deterministic variable. This formulation does not allow us to capture travelers’ perception error regarding travel time, and recent studies have indicated that this error can have a significant effect on modal choice behavior. In this study, we propose a logistic regression model with a hierarchical random error term. The proposed model adds a new random error term for the travel time variable. This term structure enables us to investigate travelers’ perception error regarding travel time from a given choice behavior dataset. We also propose an extended model that allows constraining the sign of this error in the model. We develop two Gibbs samplers to estimate the basic hierarchical model and the extended model. The performance of the proposed models is examined using a well-known dataset.

  14. Transmission Risks of Schistosomiasis Japonica: Extraction from Back-propagation Artificial Neural Network and Logistic Regression Model

    Science.gov (United States)

    Xu, Jun-Fang; Xu, Jing; Li, Shi-Zhu; Jia, Tia-Wu; Huang, Xi-Bao; Zhang, Hua-Ming; Chen, Mei; Yang, Guo-Jing; Gao, Shu-Jing; Wang, Qing-Yun; Zhou, Xiao-Nong

    2013-01-01

    Background The transmission of schistosomiasis japonica in a local setting is still poorly understood in the lake regions of the People's Republic of China (P. R. China), and its transmission patterns are closely related to human, social and economic factors. Methodology/Principal Findings We aimed to apply the integrated approach of artificial neural network (ANN) and logistic regression modeling in the assessment of transmission risks of Schistosoma japonicum with epidemiological data collected from 2339 villagers from 1247 households in six villages of Jiangling County, P.R. China. By using the back-propagation (BP) of the ANN model, 16 factors out of 27 were screened, and the top five factors ranked by the absolute value of mean impact value (MIV) were mainly related to human behavior, i.e. integration of water contact history and infection history, family with past infection, history of water contact, infection history, and infection times. The top five factors screened by the logistic regression model were mainly related to social economics, i.e. village level, economic conditions of family, age group, education level, and infection times. The risk of human infection with S. japonicum is higher among people aged 15 years or younger, those with less education, those living in villages with higher infection rates, those from poor families, and those who have been infected more than once. Conclusion/Significance Both the BP artificial neural network and the logistic regression model established at a small scale suggested that individual behavior and socioeconomic status are the most important risk factors in the transmission of schistosomiasis japonica. It was revealed that the young population (≤15 years) in higher-risk areas is the main target of intervention for disease transmission control. PMID:23556015

  15. Estimating the susceptibility of surface water in Texas to nonpoint-source contamination by use of logistic regression modeling

    Science.gov (United States)

    Battaglin, William A.; Ulery, Randy L.; Winterstein, Thomas; Welborn, Toby

    2003-01-01

    In the State of Texas, surface water (streams, canals, and reservoirs) and ground water are used as sources of public water supply. Surface-water sources of public water supply are susceptible to contamination from point and nonpoint sources. To help protect sources of drinking water and to aid water managers in designing protective yet cost-effective and risk-mitigated monitoring strategies, the Texas Commission on Environmental Quality and the U.S. Geological Survey developed procedures to assess the susceptibility of public water-supply source waters in Texas to the occurrence of 227 contaminants. One component of the assessments is the determination of susceptibility of surface-water sources to nonpoint-source contamination. To accomplish this, water-quality data at 323 monitoring sites were matched with geographic information system-derived watershed-characteristic data for the watersheds upstream from the sites. Logistic regression models then were developed to estimate the probability that a particular contaminant will exceed a threshold concentration specified by the Texas Commission on Environmental Quality. Logistic regression models were developed for 63 of the 227 contaminants. Of the remaining contaminants, 106 were not modeled because monitoring data were available at less than 10 percent of the monitoring sites; 29 were not modeled because there were less than 15 percent detections of the contaminant in the monitoring data; 27 were not modeled because of the lack of any monitoring data; and 2 were not modeled because threshold values were not specified.

  16. Modeling Typhoon Event-Induced Landslides Using GIS-Based Logistic Regression: A Case Study of Alishan Forestry Railway, Taiwan

    Directory of Open Access Journals (Sweden)

    Sheng-Chuan Chen

    2013-01-01

    Full Text Available This study develops a model for evaluating the hazard level of landslides at the Alishan Forestry Railway, Taiwan, by using logistic regression with the assistance of a geographical information system (GIS). A typhoon event-induced landslide inventory, independent variables, and a triggering factor were used to build the model. Environmental factors such as bedrock lithology from the geology database; topographic aspect, terrain roughness, profile curvature, and distance to river from the topographic database; and the vegetation index value from SPOT 4 satellite images were used as variables that influence landslide occurrence. The area under the curve (AUC) of a receiver operating characteristic (ROC) curve was used to validate the model. Effects of parameters on landslide occurrence were assessed from the corresponding coefficients in the logistic regression function. Thereafter, the model was applied to predict the probability of landslides for rainfall data of different return periods. Using the predicted probability map, the study area was classified into four ranks of landslide susceptibility: low, medium, high, and very high. As a result, most high-susceptibility areas are located in the western portion of the study area. Several train stations and railways are located on sites with a high susceptibility ranking.

  17. Supporting Regularized Logistic Regression Privately and Efficiently.

    Directory of Open Access Journals (Sweden)

    Wenfa Li

    Full Text Available As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely used statistical model that has nevertheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc.

  20. Food security and vulnerability modeling of East Java Province based on Geographically Weighted Ordinal Logistic Regression Semiparametric (GWOLRS model

    Directory of Open Access Journals (Sweden)

    N.W. Surya Wardhani

    2014-10-01

    Full Text Available Modeling of food security based on the characteristics of an area is affected by geographical location, which means that geographical location affects the region's potential. Therefore, we need a method of statistical modeling that takes the geographical location, or location factor, of the observations into account. In this case, some research variables may be global, meaning that their effect on the response variable does not vary significantly with location; when some of the predictor variables are global and the others are local, Geographically Weighted Ordinal Logistic Regression Semiparametric (GWOLRS) modeling can be used to analyze the data. The data used are the food resilience and insecurity data for 2011 in East Java Province. The results showed that the three predictor variables influenced by location are the percentage of poor (%), rice production per district (tons) and life expectancy (%). These three predictor variables are local because they have a significant influence in some districts/cities but no significant effect in others, while the other two variables, clean water and good-quality road length (km), are assumed to be global because they are not significant factors across all districts/towns in East Java.

  1. A Methodology for Generating Placement Rules that Utilizes Logistic Regression

    Science.gov (United States)

    Wurtz, Keith

    2008-01-01

    The purpose of this article is to provide the necessary tools for institutional researchers to conduct a logistic regression analysis and interpret the results. Aspects of the logistic regression procedure that are necessary to evaluate models are presented and discussed with an emphasis on cutoff values and choosing the appropriate number of…

  2. Fitting Proportional Odds Models to Educational Data in Ordinal Logistic Regression Using Stata, SAS and SPSS

    Science.gov (United States)

    Liu, Xing

    2008-01-01

    The proportional odds (PO) model, which is also called cumulative odds model (Agresti, 1996, 2002 ; Armstrong & Sloan, 1989; Long, 1997, Long & Freese, 2006; McCullagh, 1980; McCullagh & Nelder, 1989; Powers & Xie, 2000; O'Connell, 2006), is one of the most commonly used models for the analysis of ordinal categorical data and comes from the class…

  3. Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia

    Science.gov (United States)

    Pradhan, Biswajeet

    2010-05-01

    This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis in the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (ndvi), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of applying the logistic regression coefficients in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross-application of logistic regression coefficients in the other two areas, the case of Selangor based on the logistic coefficients of Cameron showed the highest (90%) prediction accuracy, whereas the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross…

  4. Prediction of Foreign Object Debris/Damage (FOD) type for elimination in the aeronautics manufacturing environment through logistic regression model

    Science.gov (United States)

    Espino, Natalia V.

    Foreign Object Debris/Damage (FOD) is a costly and high-risk problem that aeronautics companies such as Boeing and Lockheed Martin, among others, face on their production lines every day. They spend an average of $350 thousand per year fixing FOD problems. FOD can put the lives of pilots, passengers and other crews at high risk. FOD refers to any type of foreign object, particle, debris or agent in the manufacturing environment which could contaminate or damage the product or otherwise undermine quality control standards. FOD can take the form of any of the following categories: panstock, manufacturing debris, tools/shop aids, consumables and trash. Although aeronautics companies have put many prevention plans in place, such as housekeeping and "clean as you go" philosophies, training, and the use of RFID for tooling control, none of them has been able to completely eradicate the problem. This research presents a logistic regression statistical modeling approach to predict the probability of FOD type under given specific circumstances such as workstation, month and aircraft/jet being built. FOD Quality Assurance Reports from the last three years were provided by an aeronautics company for this study. By predicting the type of FOD, custom reduction/elimination plans can be put in place and the problem thereby diminished. Different aircraft were analyzed and different models were developed through the same methodology. The results of the study presented are predictions of FOD type for each aircraft and workstation throughout the year, obtained by applying the proposed logistic regression models. This research would help aeronautics companies to address the FOD problem correctly, to identify root causes and to establish actual reduction/elimination plans.

  5. Geothermal Favorability Map Derived From Logistic Regression Models of the Western United States (favorabilitysurface.zip)

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This is a surface showing relative favorability for the presence of geothermal systems in the western United States. It is an average of 12 models that correlates...

  6. Modeling Small Unmanned Aerial System Mishaps Using Logistic Regression and Artificial Neural Networks

    Science.gov (United States)

    2012-03-22

    95% of the country with low reliability requirements (Weibel and Hansman 2005). While the primary calculations of these models are in terms of... quantified. For example, the model for ground impact by Weibel and Hansman (2005) was used to calculate a necessary mean time between failures (MTBF) to... "Success in Independent Trials." Statistica Sinica 3 (1993): 295-312. Weibel, Ronald, and John Hansman. Safety Considerations for Operation of Unmanned…

  7. Modelling the spatial distribution of Fasciola hepatica in bovines using decision tree, logistic regression and GIS query approaches for Brazil.

    Science.gov (United States)

    Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I

    2017-11-01

    Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In both the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on parasite prevalence. The risk maps developed using both techniques showed a predicted higher prevalence mainly in the south of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes for the entire country. The GIS query map, based on the ranges of isothermality, minimum temperature of the coldest month, precipitation of the warmest quarter of the year, altitude and average daily land surface temperature, showed a possible presence of F. hepatica over a very large area. The risk maps produced using these methods can be used to focus the activities of animal and public health programmes, even in non-evaluated F. hepatica areas.

  8. Sufficient Sample Size and Power in Multilevel Ordinal Logistic Regression Models

    Directory of Open Access Journals (Sweden)

    Sabz Ali

    2016-01-01

    Full Text Available For most of the time, biomedical researchers have been dealing with ordinal outcome variables in multilevel models where patients are nested within doctors. We can justifiably apply a multilevel cumulative logit model, where the outcome variable represents the mild, severe, and extremely severe intensity of diseases like malaria and typhoid in the form of ordered categories. Based on our simulation conditions, the Maximum Likelihood (ML) method is better than the Penalized Quasi-likelihood (PQL) method for a three-category ordinal outcome variable. The PQL method, however, performs as well as the ML method when a five-category ordinal outcome variable is used. Further, to achieve power of more than 0.80, at least 50 groups are required for both the ML and PQL methods of estimation. It may be pointed out that, for the five-category ordinal response variable model, the power of the PQL method is slightly higher than the power of the ML method.
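
    A single-level cumulative-logit (proportional odds) sketch on synthetic data; the multilevel random effects and the ML/PQL estimation discussed above require dedicated mixed-model software, so only the fixed-effects part of such a model is shown here.

    ```python
    import numpy as np
    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    rng = np.random.default_rng(0)
    x = rng.normal(size=400)
    latent = 0.9 * x + rng.logistic(size=400)
    severity = pd.Series(pd.cut(latent, bins=[-np.inf, -0.5, 0.8, np.inf],
                                labels=["mild", "severe", "extremely severe"]))

    res = OrderedModel(severity, x.reshape(-1, 1), distr="logit").fit(
        method="bfgs", disp=0)
    print(res.params)   # one slope plus two cut-point (threshold) parameters
    ```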

  10. Binary Logistic Regression Modeling of Idle CO Emissions in Order to Estimate Predictors Influences in Old Vehicle Park

    Directory of Open Access Journals (Sweden)

    Branimir Milosavljević

    2015-01-01

    Full Text Available This paper determines, by experiment, the CO emissions at idle of 1,785 vehicles powered by spark-ignition engines, in order to verify the correctness of emission values on a representative sample of vehicles in Serbia. The permissible emission limits were considered for three (3) fitted binary logistic regression (BLR) models, and the key reason for such an analysis is to find the predictors that have a crucial influence on the accuracy of estimating whether such vehicles have correct emissions or not. Having summarized the research results, we found that vehicles produced in Serbia (hereinafter referred to as "domestic vehicles") cause more pollution than imported cars (hereinafter referred to as "foreign vehicles"), although domestic vehicles are of lower average age and mileage. Another trend was observed: low-power vehicles and vehicles produced before 1992 are potentially more serious polluters.

  11. Adverse events associated with incretin-based drugs in Japanese spontaneous reports: a mixed effects logistic regression model

    Directory of Open Access Journals (Sweden)

    Daichi Narushima

    2016-03-01

    Full Text Available Background: Spontaneous Reporting Systems (SRSs) are passive systems composed of reports of suspected Adverse Drug Events (ADEs), and are used for Pharmacovigilance (PhV), namely, drug safety surveillance. Exploration of analytical methodologies to enhance SRS-based discovery will contribute to more effective PhV. In this study, we proposed a statistical modeling approach for SRS data to address heterogeneity by reporting time point. Furthermore, we applied this approach to analyze ADEs of incretin-based drugs such as DPP-4 inhibitors and GLP-1 receptor agonists, which are widely used to treat type 2 diabetes. Methods: SRS data were obtained from the Japanese Adverse Drug Event Report (JADER) database. Reported adverse events were classified according to the MedDRA High Level Terms (HLTs). A mixed effects logistic regression model was used to analyze the occurrence of each HLT. The model treated DPP-4 inhibitors, GLP-1 receptor agonists, hypoglycemic drugs, concomitant suspected drugs, age, and sex as fixed effects, while the quarterly period of reporting was treated as a random effect. Before application of the model, Fisher's exact tests were performed for all drug-HLT combinations. Mixed effects logistic regressions were performed for the HLTs that were found to be associated with incretin-based drugs. Statistical significance was determined by a two-sided p-value <0.01 or a 99% two-sided confidence interval. Finally, the models with and without the random effect were compared based on Akaike's Information Criterion (AIC), in which a model with a smaller AIC was considered satisfactory. Results: The analysis included 187,181 cases reported from January 2010 to March 2015. It showed that 33 HLTs, including pancreatic, gastrointestinal, and cholecystic events, were significantly associated with DPP-4 inhibitors or GLP-1 receptor agonists. In the AIC comparison, half of the HLTs reported with incretin-based drugs favored the random effect…

  12. A Multi-industry Default Prediction Model using Logistic Regression and Decision Tree

    Directory of Open Access Journals (Sweden)

    Suresh Ramakrishnan

    2015-04-01

    Full Text Available The accurate prediction of corporate bankruptcy for firms in different industries is of great concern to investors and creditors, as it makes possible a reduction of creditors' risk and a considerable amount of saving for an industry's economy. Financial statements vary between industries. Therefore, economic intuition suggests that industry effects should be an important component in bankruptcy prediction. This study attempts to detail the characteristics of each industry using sector indicators. The results show a significant relationship between probability of default and sector indicators. The results of this study may improve the performance of default prediction models and reduce the costs of risk management.

  13. Integration of geographic information systems and logistic multiple regression for aquatic macrophyte modeling

    Energy Technology Data Exchange (ETDEWEB)

    Narumalani, S. [Nebraska Univ., Lincoln, NE (United States). Dept. of Geography]; Jensen, J.R.; Althausen, J.D.; Burkhalter, S. [South Carolina Univ., Columbia, SC (United States). Dept. of Geography]; Mackey, H.E. Jr. [Westinghouse Savannah River Co., Aiken, SC (United States)]

    1994-06-01

    Since aquatic macrophytes have an important influence on the physical and chemical processes of an ecosystem while simultaneously affecting human activity, it is imperative that they be inventoried and managed wisely. However, mapping wetlands can be a major challenge because they are found in diverse geographic areas ranging from small tributary streams, to shrub or scrub and marsh communities, to open-water lacustrine environments. In addition, the type and spatial distribution of wetlands can change dramatically from season to season, especially when nonpersistent species are present. This research focuses on developing a model for predicting the future growth and distribution of aquatic macrophytes. The model uses a geographic information system (GIS) to analyze some of the biophysical variables that affect aquatic macrophyte growth and distribution. The data will provide scientists with information on the future spatial growth and distribution of aquatic macrophytes. This study focuses on the Savannah River Site Par Pond (1,000 ha) and L Lake (400 ha), two cooling ponds that have received thermal effluent from nuclear reactor operations. Par Pond was constructed in 1958, and natural invasion of wetland species has occurred over its 35-year history, with much of the shoreline having developed extensive beds of persistent and non-persistent aquatic macrophytes.

  14. Predicting Social Trust with Binary Logistic Regression

    Science.gov (United States)

    Adwere-Boamah, Joseph; Hufstedler, Shirley

    2015-01-01

    This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…

  15. Predicting China’s SME Credit Risk in Supply Chain Financing by Logistic Regression, Artificial Neural Network and Hybrid Models

    Directory of Open Access Journals (Sweden)

    You Zhu

    2016-05-01

    Full Text Available Based on logistic regression (LR) and artificial neural network (ANN) methods, we construct an LR model, an ANN model and three types of a two-stage hybrid model. The two-stage hybrid model integrates the LR and ANN approaches. We predict the credit risk of China’s small and medium-sized enterprises (SMEs) for financial institutions (FIs) in supply chain financing (SCF) by applying the above models. In the empirical analysis, the quarterly financial and non-financial data of 77 listed SMEs and 11 listed core enterprises (CEs) in the period 2012–2013 are chosen as the samples. The empirical results show that: (i) the “negative signal” prediction accuracy ratio of the ANN model is better than that of the LR model; (ii) the two-stage hybrid model type I performs better in predicting “positive signals” than the ANN model; (iii) the two-stage hybrid model type II has a stronger ability to predict both “positive signals” and “negative signals” than the two-stage hybrid model type I; and (iv) the “negative signal” predictive power of the two-stage hybrid model type III is stronger than that of the two-stage hybrid model type II. In summary, the two-stage hybrid model type III has the best classification capability to forecast SMEs’ credit risk in SCF, and can be a useful prediction tool for China’s FIs.
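
    One plausible reading of such a two-stage hybrid, sketched under stated assumptions rather than as the authors' exact design: stage one fits a logistic regression, stage two feeds its predicted probability, together with the original features, into a small neural network. Data shapes and hyperparameters are hypothetical.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.neural_network import MLPClassifier
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      def fit_two_stage(X, y):
          """Stage 1: logistic regression. Stage 2: a small ANN trained on the original
          features plus the stage-1 predicted default probability."""
          X = np.asarray(X, dtype=float)
          lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
          p1 = lr.predict_proba(X)[:, [1]]
          ann = make_pipeline(StandardScaler(),
                              MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))
          ann.fit(np.hstack([X, p1]), y)
          return lr, ann

      def predict_two_stage(lr, ann, X_new):
          X_new = np.asarray(X_new, dtype=float)
          p1 = lr.predict_proba(X_new)[:, [1]]
          return ann.predict_proba(np.hstack([X_new, p1]))[:, 1]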

  16. Comparison of artificial neural network and logistic regression models for predicting in-hospital mortality after primary liver cancer surgery.

    Directory of Open Access Journals (Sweden)

    Hon-Yi Shi

    Full Text Available BACKGROUND: Since most published articles comparing the performance of artificial neural network (ANN) models and logistic regression (LR) models for predicting hepatocellular carcinoma (HCC) outcomes used only a single dataset, the essential issue of internal validity (reproducibility) of the models has not been addressed. This study aims to validate the use of the ANN model for predicting in-hospital mortality in HCC surgery patients in Taiwan and to compare the predictive accuracy of the ANN with that of the LR model. METHODOLOGY/PRINCIPAL FINDINGS: Patients who underwent HCC surgery during the period from 1998 to 2009 were included in the study. This study retrospectively compared 1,000 pairs of LR and ANN models based on initial clinical data for 22,926 HCC surgery patients. For each pair of ANN and LR models, the area under the receiver operating characteristic curve (AUROC), Hosmer-Lemeshow (H-L) statistic and accuracy rate were calculated and compared using paired t-tests. A global sensitivity analysis was also performed to assess the relative significance of input parameters in the system model and the relative importance of variables. Compared to the LR models, the ANN models had a better accuracy rate in 97.28% of cases, a better H-L statistic in 41.18% of cases, and a better AUROC in 84.67% of cases. Surgeon volume was the most influential (sensitive) parameter affecting in-hospital mortality, followed by age and length of stay. CONCLUSIONS/SIGNIFICANCE: In comparison with the conventional LR model, the ANN model in the study was more accurate in predicting in-hospital mortality and had higher overall performance indices. Further studies of this model may consider the effect of a more detailed database that includes complications and clinical examination findings as well as more detailed outcome data.
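
    A reduced sketch of the paired comparison idea, using synthetic data in place of the HCC registry and 100 rather than 1,000 train/test pairs; it illustrates the procedure, not a reproduction of the study.

      import numpy as np
      from scipy import stats
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split
      from sklearn.neural_network import MLPClassifier

      X, y = make_classification(n_samples=3000, n_features=15, weights=[0.95], random_state=0)
      lr_aucs, ann_aucs = [], []
      for seed in range(100):                                  # the paper used 1,000 paired models
          X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=seed)
          lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
          ann = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=seed).fit(X_tr, y_tr)
          lr_aucs.append(roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1]))
          ann_aucs.append(roc_auc_score(y_te, ann.predict_proba(X_te)[:, 1]))
      t_stat, p_value = stats.ttest_rel(ann_aucs, lr_aucs)     # paired t-test over the AUROC pairs
      print(f"mean AUROC: LR={np.mean(lr_aucs):.3f}, ANN={np.mean(ann_aucs):.3f}, p={p_value:.3g}")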

  17. To Set Up a Logistic Regression Prediction Model for Hepatotoxicity of Chinese Herbal Medicines Based on Traditional Chinese Medicine Theory

    Science.gov (United States)

    Liu, Hongjie; Li, Tianhao; Zhan, Sha; Pan, Meilan; Ma, Zhiguo; Li, Chenghua

    2016-01-01

    Aims. To establish a logistic regression (LR) prediction model for hepatotoxicity of Chinese herbal medicines (HMs) based on traditional Chinese medicine (TCM) theory and to provide a statistical basis for predicting hepatotoxicity of HMs. Methods. The correlations of hepatotoxic and nonhepatotoxic Chinese HMs with the four properties, five flavors, and channel tropism were analyzed with the chi-square test for two-way unordered categorical data. An LR prediction model was established and the accuracy of the prediction by this model was evaluated. Results. The hepatotoxic and nonhepatotoxic Chinese HMs were related with the four properties (p < 0.05) and five flavors (p < 0.05). There were in total 12 variables from the four properties and five flavors for the LR. Four variables, warm and neutral of the four properties and pungent and salty of the five flavors, were selected to establish the LR prediction model, with the cutoff value being 0.204. Conclusions. Warm and neutral of the four properties and pungent and salty of the five flavors were the variables affecting hepatotoxicity. Based on these results, the established LR prediction model had some predictive power for hepatotoxicity of Chinese HMs. PMID:27656240

  18. To Set Up a Logistic Regression Prediction Model for Hepatotoxicity of Chinese Herbal Medicines Based on Traditional Chinese Medicine Theory.

    Science.gov (United States)

    Liu, Hongjie; Li, Tianhao; Chen, Lingxiu; Zhan, Sha; Pan, Meilan; Ma, Zhiguo; Li, Chenghua; Zhang, Zhe

    2016-01-01

    Aims. To establish a logistic regression (LR) prediction model for hepatotoxicity of Chinese herbal medicines (HMs) based on traditional Chinese medicine (TCM) theory and to provide a statistical basis for predicting hepatotoxicity of HMs. Methods. The correlations of hepatotoxic and nonhepatotoxic Chinese HMs with the four properties, five flavors, and channel tropism were analyzed with the chi-square test for two-way unordered categorical data. An LR prediction model was established and the accuracy of the prediction by this model was evaluated. Results. The hepatotoxic and nonhepatotoxic Chinese HMs were related with the four properties (p < 0.05) and five flavors (p < 0.05). There were in total 12 variables from the four properties and five flavors for the LR. Four variables, warm and neutral of the four properties and pungent and salty of the five flavors, were selected to establish the LR prediction model, with the cutoff value being 0.204. Conclusions. Warm and neutral of the four properties and pungent and salty of the five flavors were the variables affecting hepatotoxicity. Based on these results, the established LR prediction model had some predictive power for hepatotoxicity of Chinese HMs.

  19. A Novel Method for Earthquake-triggered Landslides Susceptibility Mapping: Combining the Newmark Displacement Value with Logistic Regression Model

    Science.gov (United States)

    Lin, Q.; Wang, Y.; Song, C.

    2016-12-01

    The Newmark displacement model has been used to predict earthquake-triggered landslides. Logistic regression (LR) is also a common landslide hazard assessment method. We combined the Newmark displacement model and LR and applied them to Wenchuan County and Beichuan County in China, which were affected by the Ms 8.0 Wenchuan earthquake on May 12th, 2008, to develop a mechanism-based landslide occurrence probability model and improve the predictive accuracy. A total of 1904 landslide sites in Wenchuan County and 3800 random non-landslide sites were selected as the training dataset. We applied the Newmark model and obtained the distribution of permanent displacement (Dn) for a 30 × 30 m grid. Four factors (Dn, topographic relief, and distances to drainages and roads) were used as independent variables for LR. Then, a combined model was obtained, with an AUC (area under the curve) value of 0.797 for Wenchuan County. A total of 617 landslide sites and non-landslide sites in Beichuan County were used as a validation dataset, with AUC = 0.753. The proposed method may also be applied to earthquake-induced landslides in other regions.
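
    A minimal sketch of the combined model under stated assumptions: hypothetical CSV files hold one row per grid cell with a 0/1 landslide label, the Newmark displacement Dn and the other three predictors; the file and column names are illustrative only.

      import pandas as pd
      import statsmodels.formula.api as smf
      from sklearn.metrics import roc_auc_score

      # Hypothetical per-grid-cell tables: 0/1 landslide label, Newmark displacement (dn),
      # topographic relief and distances to drainages and roads.
      train = pd.read_csv("wenchuan_cells.csv")
      valid = pd.read_csv("beichuan_cells.csv")

      model = smf.logit("landslide ~ dn + relief + dist_drainage + dist_road", data=train).fit(disp=0)
      print("training AUC:", roc_auc_score(train["landslide"], model.predict(train)))
      print("validation AUC:", roc_auc_score(valid["landslide"], model.predict(valid)))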

  20. Large unbalanced credit scoring using Lasso-logistic regression ensemble.

    Directory of Open Access Journals (Sweden)

    Hong Wang

    Full Text Available Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
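
    The following sketch shows one plausible reading of the balancing-plus-ensemble idea, not the authors' exact algorithm: the majority class is split into clusters, each cluster is paired with a resample of the minority class, and an L1-penalised (Lasso) logistic regression is fitted to each balanced subset.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.linear_model import LogisticRegression
      from sklearn.utils import resample

      def lasso_logit_ensemble(X, y, n_members=10, random_state=0):
          """Cluster the majority class, pair each cluster with a bootstrap of the minority
          class, and fit one L1-penalised logistic regression per balanced subset."""
          rng = np.random.RandomState(random_state)
          X, y = np.asarray(X, dtype=float), np.asarray(y)
          X_min, X_maj = X[y == 1], X[y == 0]
          clusters = KMeans(n_clusters=n_members, n_init=10, random_state=random_state).fit_predict(X_maj)
          members = []
          for m in range(n_members):
              X_sub = X_maj[clusters == m]
              n = min(len(X_sub), len(X_min))
              Xb = np.vstack([resample(X_sub, n_samples=n, random_state=rng),
                              resample(X_min, n_samples=n, random_state=rng)])
              yb = np.hstack([np.zeros(n), np.ones(n)])
              members.append(LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(Xb, yb))
          return members

      def ensemble_proba(members, X_new):
          """Average the member probabilities (majority voting is an equally simple alternative)."""
          return np.mean([m.predict_proba(np.asarray(X_new, dtype=float))[:, 1] for m in members], axis=0)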

  1. Large unbalanced credit scoring using Lasso-logistic regression ensemble.

    Science.gov (United States)

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.

  2. Logistic regression a self-learning text

    CERN Document Server

    Kleinbaum, David G

    1994-01-01

    This textbook provides students and professionals in the health sciences with a presentation of the use of logistic regression in research. The text is self-contained, and designed to be used both in class and as a tool for self-study. It arises from the author's many years of experience teaching this material, and the notes on which it is based have been extensively used throughout the world.

  3. Identification and validation of a logistic regression model for predicting serious injuries associated with motor vehicle crashes.

    Science.gov (United States)

    Kononen, Douglas W; Flannagan, Carol A C; Wang, Stewart C

    2011-01-01

    A multivariate logistic regression model, based upon National Automotive Sampling System Crashworthiness Data System (NASS-CDS) data for calendar years 1999-2008, was developed to predict the probability that a crash-involved vehicle will contain one or more occupants with serious or incapacitating injuries. These vehicles were defined as containing at least one occupant coded with an Injury Severity Score (ISS) of greater than or equal to 15, in planar, non-rollover crash events involving Model Year 2000 and newer cars, light trucks, and vans. The target injury outcome measure was developed by the Centers for Disease Control and Prevention (CDC)-led National Expert Panel on Field Triage in their recent revision of the Field Triage Decision Scheme (American College of Surgeons, 2006). The parameters to be used for crash injury prediction were subsequently specified by the National Expert Panel. Model input parameters included: crash direction (front, left, right, and rear), change in velocity (delta-V), multiple vs. single impacts, belt use, presence of at least one older occupant (≥ 55 years old), presence of at least one female in the vehicle, and vehicle type (car, pickup truck, van, and sport utility). The model was developed using predictor variables that may be readily available, post-crash, from OnStar-like telematics systems. Model sensitivity and specificity were 40% and 98%, respectively, using a probability cutpoint of 0.20. The area under the receiver operator characteristic (ROC) curve for the final model was 0.84. Delta-V (mph), seat belt use and crash direction were the most important predictors of serious injury. Due to the complexity of factors associated with rollover-related injuries, a separate screening algorithm is needed to model injuries associated with this crash mode. Copyright © 2010 Elsevier Ltd. All rights reserved.
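
    A small helper, assuming vectors of true outcomes and model probabilities are already available, shows how the reported sensitivity and specificity follow from applying the 0.20 probability cutpoint mentioned above.

      import numpy as np

      def sens_spec_at_cutpoint(y_true, p_hat, cutpoint=0.20):
          """Flag a vehicle as likely to contain a seriously injured occupant when the
          predicted probability meets the cutpoint, then report sensitivity and specificity."""
          y_true = np.asarray(y_true)
          y_pred = (np.asarray(p_hat) >= cutpoint).astype(int)
          tp = np.sum((y_pred == 1) & (y_true == 1))
          tn = np.sum((y_pred == 0) & (y_true == 0))
          fp = np.sum((y_pred == 1) & (y_true == 0))
          fn = np.sum((y_pred == 0) & (y_true == 1))
          return tp / (tp + fn), tn / (tn + fp)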

  4. Spatial Analysis of Severe Fever with Thrombocytopenia Syndrome Virus in China Using a Geographically Weighted Logistic Regression Model

    Directory of Open Access Journals (Sweden)

    Liang Wu

    2016-11-01

    Full Text Available Severe fever with thrombocytopenia syndrome (SFTS) is caused by severe fever with thrombocytopenia syndrome virus (SFTSV), which has had a serious impact on public health in parts of Asia. There is no specific antiviral drug or vaccine for SFTSV and, therefore, it is important to determine the factors that influence the occurrence of SFTSV infections. This study aimed to explore the spatial associations between SFTSV infections and several potential determinants, and to predict the high-risk areas in mainland China. The analysis was carried out at the level of provinces in mainland China. The potential explanatory variables that were investigated consisted of meteorological factors (average temperature, average monthly precipitation and average relative humidity), the average proportion of rural population and the average proportion of primary industries over three years (2010–2012). We constructed a geographically weighted logistic regression (GWLR) model in order to explore the associations between the selected variables and confirmed cases of SFTSV. The study showed that: (1) meteorological factors have a strong influence on the SFTSV cover; (2) a GWLR model is suitable for exploring SFTSV cover in mainland China; (3) our findings can be used for predicting high-risk areas and highlighting when meteorological factors pose a risk in order to aid in the implementation of public health strategies.
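
    The core of a geographically weighted fit can be sketched as a locally weighted logistic regression at a single target location; the Gaussian kernel and bandwidth below are assumptions, not the calibration used in the study.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      def local_logit(coords, X, y, target_coord, bandwidth=500.0):
          """Fit the logistic model at one target location, weighting each observation by a
          Gaussian kernel of its distance to the target (distance units match the bandwidth)."""
          d = np.linalg.norm(np.asarray(coords) - np.asarray(target_coord), axis=1)
          w = np.exp(-0.5 * (d / bandwidth) ** 2)
          model = LogisticRegression(max_iter=1000)
          model.fit(X, y, sample_weight=w)
          return model  # model.coef_ holds the location-specific coefficients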

  5. Logistic regression when binary predictor variables are highly correlated.

    Science.gov (United States)

    Barker, L; Brown, C

    Standard logistic regression can produce estimates having large mean square error when predictor variables are multicollinear. Ridge regression and principal components regression can reduce the impact of multicollinearity in ordinary least squares regression. Generalizations of these, applicable in the logistic regression framework, are alternatives to standard logistic regression. It is shown that estimates obtained via ridge and principal components logistic regression can have smaller mean square error than estimates obtained through standard logistic regression. Recommendations for choosing among standard, ridge and principal components logistic regression are developed. Published in 2001 by John Wiley & Sons, Ltd.
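
    In scikit-learn terms, the two generalizations read roughly as follows; the penalty strength and number of components are placeholders the analyst would tune, not values from the paper.

      from sklearn.decomposition import PCA
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      # Ridge-style logistic regression: an L2 penalty shrinks the coefficients of
      # highly correlated predictors and stabilises the estimates.
      ridge_logit = make_pipeline(StandardScaler(), LogisticRegression(penalty="l2", C=1.0, max_iter=1000))

      # Principal components logistic regression: regress the outcome on the leading
      # principal components rather than on the correlated predictors themselves.
      pc_logit = make_pipeline(StandardScaler(), PCA(n_components=5), LogisticRegression(max_iter=1000))

      # ridge_logit.fit(X, y); pc_logit.fit(X, y)   # X, y supplied by the analyst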

  6. SU-F-BRD-01: A Logistic Regression Model to Predict Objective Function Weights in Prostate Cancer IMRT

    Energy Technology Data Exchange (ETDEWEB)

    Boutilier, J; Chan, T; Lee, T [University of Toronto, Toronto, Ontario (Canada); Craig, T; Sharpe, M [University of Toronto, Toronto, Ontario (Canada); The Princess Margaret Cancer Centre - UHN, Toronto, ON (Canada)

    2014-06-15

    Purpose: To develop a statistical model that predicts optimization objective function weights from patient geometry for intensity-modulated radiotherapy (IMRT) of prostate cancer. Methods: A previously developed inverse optimization method (IOM) is applied retrospectively to determine optimal weights for 51 treated patients. We use an overlap volume ratio (OVR) of bladder and rectum for different PTV expansions in order to quantify patient geometry in explanatory variables. Using the optimal weights as ground truth, we develop and train a logistic regression (LR) model to predict the rectum weight and thus the bladder weight. Post hoc, we fix the weights of the left femoral head, right femoral head, and an artificial structure that encourages conformity to the population average while normalizing the bladder and rectum weights accordingly. The population average of objective function weights is used for comparison. Results: The OVR at 0.7cm was found to be the most predictive of the rectum weights. The LR model performance is statistically significant when compared to the population average over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and mean voxel dose to the bladder, rectum, CTV, and PTV. On average, the LR model predicted bladder and rectum weights that are both 63% closer to the optimal weights compared to the population average. The treatment plans resulting from the LR weights have, on average, a rectum V70Gy that is 35% closer to the clinical plan and a bladder V70Gy that is 43% closer. Similar results are seen for bladder V54Gy and rectum V54Gy. Conclusion: Statistical modelling from patient anatomy can be used to determine objective function weights in IMRT for prostate cancer. Our method allows treatment planners to begin the personalization process from an informed starting point, which may lead to more consistent clinical plans and reduce overall planning time.

  7. A nesting site suitability model for rock partridge (Alectoris graeca) in the Apennine Mountains using logistic regression

    Directory of Open Access Journals (Sweden)

    Lorenzo Boccia

    2010-01-01

    Full Text Available The rock partridge has undergone a decline throughout its entire distribution area, including the population of the central Italian Apennine Mountains. Areas of suitable habitat for this species have been reduced due to landscape fragmentation and the dynamics of domestic animal and wildlife management. The present study was conducted in the Province of Rieti, Lazio Region. Geographical and land use predictors were evaluated in a GIS environment to identify the most relevant factors influencing the presence of rock partridge during the nesting period. Logistic regression was then implemented to create a model, characterised by a good level of adequacy, for predicting rock partridge nesting site habitat characteristics. Correct predictions of presence and absence were made in 65.2% and 98.6% of cases, respectively. The ROC value was 0.771, which is statistically significant (P<0.001). The results show that, on a local scale, slope (log), distance from forests, and the presence of bare rocks were statistically significant factors. On a landscape scale, the percentage of forests, the presence of sparse vegetation (over 60%), and a negative Mean Shape Index (MSI) were found to be statistically significant.

  8. Financial performance monitoring of the technical efficiency of critical access hospitals: a data envelopment analysis and logistic regression modeling approach.

    Science.gov (United States)

    Wilson, Asa B; Kerr, Bernard J; Bastian, Nathaniel D; Fulton, Lawrence V

    2012-01-01

    From 1980 to 1999, rural designated hospitals closed at a disproportionately high rate. In response to this emergent threat to healthcare access in rural settings, the Balanced Budget Act of 1997 made provisions for the creation of a new rural hospital--the critical access hospital (CAH). The conversion to CAH and the associated cost-based reimbursement scheme significantly slowed the closure rate of rural hospitals. This work investigates which methods can ensure the long-term viability of small hospitals. This article uses a two-step design to focus on a hypothesized relationship between the technical efficiency of CAHs and a recently developed set of financial monitors for these entities. The goal is to identify the financial performance measures associated with efficiency. The first step uses data envelopment analysis (DEA) to differentiate efficient from inefficient facilities within a data set of 183 CAHs. Determining DEA efficiency is an a priori categorization of hospitals in the data set as efficient or inefficient. In the second step, DEA efficiency is the categorical dependent variable (efficient = 0, inefficient = 1) in the subsequent binary logistic regression (LR) model. A set of six financial monitors selected from the array of 20 measures served as the LR independent variables. We use a binary LR to test the null hypothesis that the recently developed CAH financial indicators have no predictive value for categorizing a CAH as efficient or inefficient (i.e., that there is no relationship between DEA efficiency and fiscal performance).

  9. Study of risk factors affecting both hypertension and obesity outcome by using multivariate multilevel logistic regression models

    Directory of Open Access Journals (Sweden)

    Sepedeh Gholizadeh

    2016-07-01

    Full Text Available Background: Obesity and hypertension are the most important non-communicable diseases; in many studies, their prevalence and risk factors have been examined univariately in each geographic region. Studying the factors that affect both obesity and hypertension may therefore play an important role, which is addressed in this study. Materials & Methods: This cross-sectional study was conducted on 1000 men aged 20-70 living in Bushehr province. Blood pressure was measured three times and the average of the measurements was considered as one of the response variables. Hypertension was defined as systolic blood pressure ≥140 and/or diastolic blood pressure ≥90, and obesity was defined as body mass index ≥25. Data were analyzed using a multilevel, multivariate logistic regression model with MLwiN software. Results: Intra-class correlations at the cluster level were 33% for high blood pressure and 37% for obesity, so a two-level model was fitted to the data. The prevalence of obesity and hypertension was 43.6% (95% CI: 40.6-46.5) and 29.4% (95% CI: 26.6-32.1), respectively. Age, gender, smoking, hyperlipidemia, diabetes, fruit and vegetable consumption and physical activity were the factors affecting blood pressure (p≤0.05). Age, gender, hyperlipidemia, diabetes, fruit and vegetable consumption, physical activity and place of residence had an effect on obesity (p≤0.05). Conclusion: Multilevel models that take the level structure into account provide more precise estimates. Since obesity and hypertension are major risk factors for cardiovascular disease, knowing the high-risk groups allows careful planning for the prevention of non-communicable diseases and the promotion of public health.

  10. Logistic regression and artificial neural network models for mapping of regional-scale landslide susceptibility in volcanic mountains of West Java (Indonesia)

    Science.gov (United States)

    Ngadisih; Bhandary, Netra P.; Yatabe, Ryuichi; Dahal, Ranjan K.

    2016-05-01

    West Java Province is the most landslide-prone area in Indonesia owing to extreme geomorphological conditions, climatic conditions and densely populated settlements with immense completed and ongoing development activities. So, a landslide susceptibility map at regional scale in this province is a fundamental tool for risk management and land-use planning. Logistic regression and Artificial Neural Network (ANN) models are the most frequently used tools for landslide susceptibility assessment, mainly because they are capable of handling the nature of landslide data. The main objective of this study is to apply logistic regression and ANN models and compare their performance for landslide susceptibility mapping in volcanic mountains of West Java Province. In addition, the model application is proposed to identify the factors contributing most to landslide events in the study area. The spatial database built in a GIS platform consists of a landslide inventory, four topographical parameters (slope, aspect, relief, distance to river), three geological parameters (distance to volcano crater, distance to thrust and fault, geological formation), and two anthropogenic parameters (distance to road, land use). The logistic regression model in this study revealed that slope, geological formations, distance to road and distance to volcano are the most influential factors for landslide events, while the ANN model revealed that distance to volcano crater, geological formation, distance to road, and land-use are the most important causal factors of landslides in the study area. Moreover, an evaluation of the model showed that the ANN model has a higher accuracy than the logistic regression model.

  11. Jackknife bias reduction for polychotomous logistic regression.

    Science.gov (United States)

    Bull, S B; Greenwood, C M; Hauck, W W

    1997-03-15

    Despite theoretical and empirical evidence that the usual MLEs can be misleading in finite samples and some evidence that bias reduced estimates are less biased and more efficient, they have not seen a wide application in practice. One can obtain bias reduced estimates by jackknife methods, with or without full iteration, or by use of higher order terms in a Taylor series expansion of the log-likelihood to approximate asymptotic bias. We provide details of these methods for polychotomous logistic regression with a nominal categorical response. We conducted a Monte Carlo comparison of the jackknife and Taylor series estimates in moderate sample sizes in a general logistic regression setting, to investigate dichotomous and trichotomous responses and a mixture of correlated and uncorrelated binary and normal covariates. We found an approximate two-step jackknife and the Taylor series methods useful when the ratio of the number of observations to the number of parameters is greater than 15, but we cannot recommend the two-step and the fully iterated jackknife estimates when this ratio is less than 20, especially when there are large effects, binary covariates, or multicollinearity in the covariates.
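
    A compact sketch of the delete-one jackknife correction for a multinomial (polychotomous) logit, assuming numeric arrays X and y; it illustrates the basic bias-correction formula rather than the two-step or Taylor-series variants discussed in the paper.

      import numpy as np
      import statsmodels.api as sm

      def jackknife_mnlogit(X, y):
          """Delete-one jackknife bias correction for a multinomial logit:
          beta_jack = n * beta_full - (n - 1) * mean(beta_(-i)). Practical for moderate n,
          since the model is refit once per observation."""
          X = sm.add_constant(np.asarray(X, dtype=float))
          y = np.asarray(y)
          n = len(y)
          full = np.asarray(sm.MNLogit(y, X).fit(disp=0).params)
          loo = []
          for i in range(n):
              keep = np.arange(n) != i
              loo.append(np.asarray(sm.MNLogit(y[keep], X[keep]).fit(disp=0).params))
          return n * full - (n - 1) * np.mean(loo, axis=0)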

  12. Methodologies for the assessment of earthquake-triggered landslides hazard. A comparison of Logistic Regression and Artificial Neural Network models.

    Science.gov (United States)

    García-Rodríguez, M. J.; Malpica, J. A.; Benito, B.

    2009-04-01

    In recent years, interest in landslide hazard assessment studies has increased substantially. They are appropriate for evaluation and mitigation plan development in landslide-prone areas. There are several techniques available for landslide hazard research at a regional scale. Generally, they can be classified into two groups: qualitative and quantitative methods. Most qualitative methods tend to be subjective, since they depend on expert opinions and represent hazard levels in descriptive terms. On the other hand, quantitative methods are objective and are commonly used due to the correlation between the instability factors and the location of the landslides. Within this group, statistical approaches and new heuristic techniques based on artificial intelligence (artificial neural network (ANN), fuzzy logic, etc.) provide rigorous analysis to assess landslide hazard over large regions. However, they depend on qualitative and quantitative data, scale, types of movements and characteristic factors used. We analysed and compared an approach for assessing earthquake-triggered landslide hazard using logistic regression (LR) and artificial neural networks (ANN) with a back-propagation learning algorithm. One application has been developed in El Salvador, a Central American country where earthquake-triggered landslides are a common phenomenon. In a first phase, we analysed the susceptibility and hazard associated with the seismic scenario of the 2001 January 13th earthquake. We calibrated the models using data from the landslide inventory for this scenario. These analyses require input variables representing physical parameters to contribute to the initiation of slope instability, for example, slope gradient, elevation, aspect, mean annual precipitation, lithology, land use, and terrain roughness, while the occurrence or non-occurrence of landslides is considered as the dependent variable. The results of the landslide susceptibility analysis are checked using landslide

  13. Estimating the exceedance probability of rain rate by logistic regression

    Science.gov (United States)

    Chiu, Long S.; Kedem, Benjamin

    1990-01-01

    Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.

  14. Logistic regression applied to natural hazards: rare event logistic regression with replications

    Directory of Open Access Journals (Sweden)

    M. Guns

    2012-06-01

    Full Text Available Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
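
    One plausible reading of the replication idea, sketched under stated assumptions: each replication pairs all observed events with a fresh random sample of non-events, a logistic regression is fitted to that subset, and the coefficients are summarised across replications.

      import numpy as np
      import statsmodels.api as sm

      def rare_event_lr_replications(X, y, nonevent_ratio=5, n_replications=200, seed=0):
          """Fit one logistic regression per replication on all events plus a random sample of
          non-events, then summarise coefficients (assumes enough non-events to sample
          without replacement)."""
          rng = np.random.default_rng(seed)
          X, y = np.asarray(X, dtype=float), np.asarray(y)
          events, nonevents = np.where(y == 1)[0], np.where(y == 0)[0]
          coefs = []
          for _ in range(n_replications):
              sampled = rng.choice(nonevents, size=nonevent_ratio * len(events), replace=False)
              idx = np.concatenate([events, sampled])
              coefs.append(sm.Logit(y[idx], sm.add_constant(X[idx])).fit(disp=0).params)
          coefs = np.asarray(coefs)
          return coefs.mean(axis=0), np.percentile(coefs, [2.5, 97.5], axis=0)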

  15. Multivariate logistic regression analysis of postoperative complications and risk model establishment of gastrectomy for gastric cancer: A single-center cohort report.

    Science.gov (United States)

    Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing

    2016-01-01

    Reporting of surgical complications is common, but few reports provide information about their severity or estimate the risk factors for complications, and those that do often lack specificity. We retrospectively analyzed data on 2795 gastric cancer patients who underwent a surgical procedure at the Affiliated Hospital of Qingdao University between June 2007 and June 2012, and established a multivariate logistic regression model to identify risk factors for postoperative complications according to the Clavien-Dindo classification system. Twenty-four out of 86 variables were statistically significant in univariate logistic regression analysis; the 11 significant variables that entered the multivariate analysis were employed to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, intraoperative transfusion, Billroth II anastomosis of reconstruction, malnutrition, surgical volume of surgeons, operating time and age were independent risk factors for postoperative complications after gastrectomy. Based on the logistic regression equation p = exp(∑BiXi) / (1 + exp(∑BiXi)), a multivariate logistic regression model that calculates the risk of postoperative morbidity was developed: p = 1/(1 + e^(4.810 - 1.287X1 - 0.504X2 - 0.500X3 - 0.474X4 - 0.405X5 - 0.318X6 - 0.316X7 - 0.305X8 - 0.278X9 - 0.255X10 - 0.138X11)). The accuracy, sensitivity and specificity of the model in predicting postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model, based on the Clavien-Dindo system for grading the severity of complications and on logistic regression analysis, can predict severe morbidity specific to an individual patient's risk factors, estimate patients' risks and benefits of gastric surgery as an accurate decision-making tool, and may serve as a template for the development of risk models for other surgical groups.
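
    The fitted equation can be evaluated directly; the sketch below plugs the reported coefficients into the logistic function, treating X1 to X11 as 0/1 risk indicators. The mapping of individual clinical factors to X1 through X11 is not spelled out in the abstract, so the ordering here is illustrative.

      import math

      # Reported coefficients in the order X1..X11 of the equation above; which clinical
      # factor corresponds to which Xi is an assumption, not stated in the abstract.
      COEFS = (1.287, 0.504, 0.500, 0.474, 0.405, 0.318, 0.316, 0.305, 0.278, 0.255, 0.138)
      INTERCEPT = -4.810

      def predicted_complication_risk(x):
          """Evaluate p = 1 / (1 + exp(-(intercept + sum(b_i * x_i)))) for 0/1 risk indicators."""
          linear = INTERCEPT + sum(b * xi for b, xi in zip(COEFS, x))
          return 1.0 / (1.0 + math.exp(-linear))

      # e.g. no risk factors present: predicted_complication_risk([0] * 11) is about 0.008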

  16. Detecting Differential Item Functioning Using Logistic Regression Procedures.

    Science.gov (United States)

    Swaminathan, Hariharan; Rogers, H. Jane

    1990-01-01

    A logistic regression model for characterizing differential item functioning (DIF) between two groups is presented. A distinction is drawn between uniform and nonuniform DIF in terms of model parameters. A statistic for testing the hypotheses of no DIF is developed, and simulation studies compare it with the Mantel-Haenszel procedure. (Author/TJH)
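
    The usual way to operationalise this is a sequence of nested logit models per item; the sketch below, with hypothetical column names (correct, ability, group), tests uniform DIF by adding a group term and nonuniform DIF by adding the group-by-ability interaction.

      import statsmodels.formula.api as smf
      from scipy.stats import chi2

      def dif_tests(df):
          """Nested logit models for a single item: 'correct' is the 0/1 item response,
          'ability' the matching criterion (e.g. total score), 'group' the focal/reference
          indicator. Likelihood-ratio tests compare the nested models."""
          m0 = smf.logit("correct ~ ability", data=df).fit(disp=0)
          m1 = smf.logit("correct ~ ability + group", data=df).fit(disp=0)
          m2 = smf.logit("correct ~ ability * group", data=df).fit(disp=0)
          lr_uniform = 2 * (m1.llf - m0.llf)       # adds the group main effect
          lr_nonuniform = 2 * (m2.llf - m1.llf)    # adds the group-by-ability interaction
          return {"uniform DIF": (lr_uniform, chi2.sf(lr_uniform, df=1)),
                  "nonuniform DIF": (lr_nonuniform, chi2.sf(lr_nonuniform, df=1))}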

  17. Logistic regression against a divergent Bayesian network

    Directory of Open Access Journals (Sweden)

    Noel Antonio Sánchez Trujillo

    2015-01-01

    Full Text Available This article is a discussion about two statistical tools used for prediction and causality assessment: logistic regression and Bayesian networks. Using data from a simulated example from a study assessing factors that might predict pulmonary emphysema (where fingertip pigmentation and smoking are considered), we posed the following questions. Is pigmentation a confounding, causal or predictive factor? Is there perhaps another factor, like smoking, that confounds? Is there a synergy between pigmentation and smoking? The results, in terms of prediction, are similar with the two techniques; regarding causation, differences arise. We conclude that, in decision-making, the combination of a statistical tool used with common sense and previous evidence, which may take years or even centuries to develop, is better than the automatic and exclusive use of statistical resources.

  18. The adaptive Lasso for Logistic regression models

    Institute of Scientific and Technical Information of China (English)

    王娉; 郭鹏江; 夏志明

    2012-01-01

    Aim: To estimate the parameters in the Logistic model. Methods: Adaptive weights are used in the L1 penalty, which is the adaptive Lasso. Results: The adaptive Lasso selects variables and estimates parameters simultaneously for the Logistic model. Conclusion: Under certain regular conditions, the adaptive Lasso estimator for the Logistic model enjoys the oracle properties.

  19. Bayesian Lasso and multinomial logistic regression on GPU.

    Science.gov (United States)

    Češnovar, Rok; Štrumbelj, Erik

    2017-01-01

    We describe an efficient Bayesian parallel GPU implementation of two classic statistical models, the Lasso and multinomial logistic regression. We focus on parallelizing the key components: matrix multiplication, matrix inversion, and sampling from the full conditionals. Our GPU implementations of Bayesian Lasso and multinomial logistic regression achieve 100-fold speedups on mid-level and high-end GPUs. Substantial speedups of 25-fold can also be achieved on older and lower-end GPUs. Samplers are implemented in OpenCL and can be used on any type of GPU and other types of computational units, thereby being convenient and advantageous in practice compared to related work.

  20. Score normalization using logistic regression with expected parameters

    NARCIS (Netherlands)

    Aly, Robin

    2014-01-01

    State-of-the-art score normalization methods use generative models that rely on sometimes unrealistic assumptions. We propose a novel parameter estimation method for score normalization based on logistic regression. Experiments on the Gov2 and CluewebA collection indicate that our method is consiste

  1. Relative accuracy of spatial predictive models for lynx Lynx canadensis derived using logistic regression-AIC, multiple criteria evaluation and Bayesian approaches

    Institute of Scientific and Technical Information of China (English)

    Hejun KANG; Shelley M.ALEXANDER

    2009-01-01

    We compared probability surfaces derived using one set of environmental variables in three Geographic Information Systems (GIS)-based approaches: logistic regression and Akaike's Information Criterion (AIC), Multiple Criteria Evaluation (MCE), and Bayesian Analysis (specifically Dempster-Shafer theory). We used lynx Lynx canadensis as our focal species, and developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during winters from 1997 to 2000. The accuracy of the three spatial models was compared using a contingency table method. We determined the percentage of cases in which both presence and absence points were correctly classified (overall accuracy), the failure to predict a species where it occurred (omission error) and the prediction of presence where there was absence (commission error). Our overall accuracy showed the logistic regression approach was the most accurate (74.51%). The multiple criteria evaluation was intermediate (39.22%), while the Dempster-Shafer (D-S) theory model was the poorest (29.90%). However, omission and commission error tell us a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence in their model. We suggest that for our study area at least, the logistic regression model is optimal. However, where sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g. Dempster-Shafer) that would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans.

  2. Relative accuracy of spatial predictive models for lynx Lynx canadensis derived using logistic regression-AIC, multiple criteria evaluation and Bayesian approaches

    Directory of Open Access Journals (Sweden)

    Shelley M. ALEXANDER

    2009-02-01

    Full Text Available We compared probability surfaces derived using one set of environmental variables in three Geographic Information Systems (GIS)-based approaches: logistic regression and Akaike’s Information Criterion (AIC), Multiple Criteria Evaluation (MCE), and Bayesian Analysis (specifically Dempster-Shafer theory). We used lynx Lynx canadensis as our focal species, and developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during winters from 1997 to 2000. The accuracy of the three spatial models was compared using a contingency table method. We determined the percentage of cases in which both presence and absence points were correctly classified (overall accuracy), the failure to predict a species where it occurred (omission error) and the prediction of presence where there was absence (commission error). Our overall accuracy showed the logistic regression approach was the most accurate (74.51%). The multiple criteria evaluation was intermediate (39.22%), while the Dempster-Shafer (D-S) theory model was the poorest (29.90%). However, omission and commission error tell us a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence in their model. We suggest that for our study area at least, the logistic regression model is optimal. However, where sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g. Dempster-Shafer) that would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans [Current Zoology 55(1): 28-40, 2009].

  3. Classification of microarray data with penalized logistic regression

    Science.gov (United States)

    Eilers, Paul H. C.; Boer, Judith M.; van Ommen, Gert-Jan; van Houwelingen, Hans C.

    2001-06-01

    Classification of microarray data needs a firm statistical basis. In principle, logistic regression can provide it, modeling the probability of membership of a class with (transforms of) linear combinations of explanatory variables. However, classical logistic regression does not work for microarrays, because generally there will be far more variables than observations. One problem is multicollinearity: the estimating equations become singular and have no unique and stable solution. A second problem is over-fitting: a model may fit a data set well but perform badly when used to classify new data. We propose penalized likelihood as a solution to both problems. The values of the regression coefficients are constrained in a similar way as in ridge regression. All variables play an equal role; there is no ad hoc selection of the most relevant or most expressed genes. The dimension of the resulting system of equations is equal to the number of variables and generally will be too large for most computers, but it can be dramatically reduced with the singular value decomposition of some matrices. The penalty is optimized with AIC (Akaike's Information Criterion), which essentially is a measure of prediction performance. We find that penalized logistic regression performs well on a public data set (the MIT ALL/AML data).
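
    A rough analogue in scikit-learn, on synthetic data with far more variables than observations; cross-validated log-loss stands in for the AIC-based choice of penalty described in the abstract, and the data dimensions are illustrative only.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import GridSearchCV

      # Synthetic stand-in for expression data with far more genes than arrays (p >> n).
      rng = np.random.default_rng(0)
      X = rng.standard_normal((72, 3000))            # e.g. 72 arrays, 3000 genes
      y = rng.integers(0, 2, size=72)                # two tumour classes

      # The L2 (ridge-style) penalty keeps the p >> n problem well posed; the penalty
      # strength is chosen here by cross-validated log-loss.
      search = GridSearchCV(LogisticRegression(penalty="l2", max_iter=5000),
                            param_grid={"C": np.logspace(-3, 2, 11)},
                            scoring="neg_log_loss", cv=5)
      search.fit(X, y)
      print("selected penalty C:", search.best_params_["C"])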

  4. Logistic chain modelling

    NARCIS (Netherlands)

    Slats, P.A.; Bhola, B.; Evers, J.J.M.; Dijkhuizen, G.

    1995-01-01

    Logistic chain modelling is very important in improving the overall performance of the total logistic chain. Logistic models provide support for a large range of applications, such as analysing bottlenecks, improving customer service, configuring new logistic chains and adapting existing chains to n

  5. Reproductive risk factors assessment for anaemia among pregnant women in India using a multinomial logistic regression model.

    Science.gov (United States)

    Perumal, Vanamail

    2014-07-01

    To assess reproductive risk factors for anaemia among pregnant women in urban and rural areas of India. The International Institute of Population Sciences, India, carried out the third National Family Health Survey in 2005-2006 to estimate key indicators from a sample of ever-married women in the reproductive age group 15-49 years. Data on various dimensions were collected using a structured questionnaire, and anaemia was measured using a portable HemoCue instrument. Anaemia prevalence among pregnant women was compared between rural and urban areas using the chi-square test and odds ratio. Multinomial logistic regression analysis was used to determine risk factors. Anaemia prevalence was assessed among 3355 pregnant women from rural areas and 1962 pregnant women from urban areas. Moderate-to-severe anaemia in rural areas (32.4%) is significantly more common than in urban areas (27.3%), with an excess risk of 30%. The gestational age specific prevalence of anaemia increases significantly in rural areas after 6 months. Pregnancy duration is a significant risk factor in both urban and rural areas. In rural areas, increasing age at marriage and mass media exposure are significant protective factors against anaemia. However, more births in the last five years, alcohol consumption and smoking habits are significant risk factors. In rural areas, various reproductive factors and lifestyle characteristics constitute significant risk factors for moderate-to-severe anaemia. Therefore, intensive health education on reproductive practices and the impact of lifestyle characteristics is warranted to reduce anaemia prevalence. © 2014 John Wiley & Sons Ltd.

  6. Determination of riverbank erosion probability using Locally Weighted Logistic Regression

    Science.gov (United States)

    Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos

    2015-04-01

    Riverbank erosion is a natural geomorphologic process that affects the fluvial environment. The most important issue concerning riverbank erosion is the identification of the vulnerable locations. An alternative to the usual hydrodynamic models to predict vulnerable locations is to quantify the probability of erosion occurrence. This can be achieved by identifying the underlying relations between riverbank erosion and the geomorphological or hydrological variables that prevent or stimulate erosion. Thus, riverbank erosion can be determined by a regression model using independent variables that are considered to affect the erosion process. The impact of such variables may vary spatially, therefore, a non-stationary regression model is preferred instead of a stationary equivalent. Locally Weighted Regression (LWR) is proposed as a suitable choice. This method can be extended to predict the binary presence or absence of erosion based on a series of independent local variables by using the logistic regression model. It is referred to as Locally Weighted Logistic Regression (LWLR). Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable (e.g. binary response) based on one or more predictor variables. The method can be combined with LWR to assign weights to local independent variables of the dependent one. LWR allows model parameters to vary over space in order to reflect spatial heterogeneity. The probabilities of the possible outcomes are modelled as a function of the independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. erosion presence or absence) for any value of the independent variables. The

  7. On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis

    Science.gov (United States)

    Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas

    2011-01-01

    The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…

  9. A Solution to Separation and Multicollinearity in Multiple Logistic Regression.

    Science.gov (United States)

    Shen, Jianzhao; Gao, Sujuan

    2008-10-01

    In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study.

  10. Prediction of siRNA potency using sparse logistic regression.

    Science.gov (United States)

    Hu, Wei; Hu, John

    2014-06-01

    RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy of the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.

  11. [Multiple imputation and complete case analysis in logistic regression models: a practical assessment of the impact of incomplete covariate data].

    Science.gov (United States)

    Camargos, Vitor Passos; César, Cibele Comini; Caiaffa, Waleska Teixeira; Xavier, Cesar Coelho; Proietti, Fernando Augusto

    2011-12-01

    Researchers in the health field often deal with the problem of incomplete databases. Complete Case Analysis (CCA), which restricts the analysis to subjects with complete data, reduces the sample size and may result in biased estimates. On statistical grounds, Multiple Imputation (MI) uses all collected data and is recommended as an alternative to CCA. Data from the Saúde em Beagá study, which included 4,048 adults from two of nine health districts in the city of Belo Horizonte, Minas Gerais State, Brazil, in 2008-2009, were used to evaluate CCA and different MI approaches in the context of logistic models with incomplete covariate data. Peculiarities of some variables in this study allowed us to analyze a situation in which the missing covariate data were later recovered, so that results before and after recovery could be compared. Based on the analysis, even the more simplistic MI approach performed better than CCA, since it was closer to the post-recovery results.
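
    A hedged sketch of the impute-then-pool workflow, using scikit-learn's IterativeImputer as a stand-in for the imputation model and Rubin's rules for pooling; it assumes numeric covariates with missing values and a 0/1 outcome, and is not the procedure used in the original study.

      import numpy as np
      import statsmodels.api as sm
      from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables IterativeImputer)
      from sklearn.impute import IterativeImputer

      def mi_logistic(X_missing, y, n_imputations=20, seed=0):
          """Multiple imputation for incomplete covariates: draw several completed data sets,
          fit the logistic model on each, and pool the estimates with Rubin's rules."""
          betas, variances = [], []
          for m in range(n_imputations):
              imputer = IterativeImputer(sample_posterior=True, random_state=seed + m)
              X_imp = sm.add_constant(imputer.fit_transform(X_missing))
              fit = sm.Logit(y, X_imp).fit(disp=0)
              betas.append(fit.params)
              variances.append(np.diag(fit.cov_params()))
          betas, variances = np.array(betas), np.array(variances)
          pooled_beta = betas.mean(axis=0)
          within, between = variances.mean(axis=0), betas.var(axis=0, ddof=1)
          pooled_se = np.sqrt(within + (1.0 + 1.0 / n_imputations) * between)
          return pooled_beta, pooled_se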

  12. Discriminating between adaptive and carcinogenic liver hypertrophy in rat studies using logistic ridge regression analysis of toxicogenomic data: The mode of action and predictive models.

    Science.gov (United States)

    Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu; Yoshinari, Kouichi; Honda, Hiroshi

    2017-03-01

    Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. A logistic regression estimating function for spatial Gibbs point processes

    DEFF Research Database (Denmark)

    Baddeley, Adrian; Coeurjolly, Jean-François; Rubak, Ege

    We propose a computationally efficient logistic regression estimating function for spatial Gibbs point processes. The sample points for the logistic regression consist of the observed point pattern together with a random pattern of dummy points. The estimating function is closely related...

  14. Spatial correlation in Bayesian logistic regression with misclassification

    DEFF Research Database (Denmark)

    Bihrmann, Kristine; Toft, Nils; Nielsen, Søren Saxmose

    2014-01-01

    Standard logistic regression assumes that the outcome is measured perfectly. In practice, this is often not the case, which could lead to biased estimates if not accounted for. This study presents Bayesian logistic regression with adjustment for misclassification of the outcome applied to data...

  15. Diagnostic profiles of acute abdominal pain with multinomial logistic regression

    Directory of Open Access Journals (Sweden)

    Ohmann, Christian

    2007-07-01

    Full Text Available Purpose: Application of multinomial logistic regression for diagnostic support of acute abdominal pain, a diagnostic problem with many differential diagnoses. Methods: The analysis is based on a prospective database of 2280 patients with acute abdominal pain, characterized by 87 variables from history and clinical examination and 12 differential diagnoses. Associations between single variables from history and clinical examination and the final diagnoses were investigated with multinomial logistic regression. Results: By way of example, the results are presented for the variable rigidity. A statistically significant association was observed for generalized rigidity and the diagnoses appendicitis, bowel obstruction, pancreatitis, perforated ulcer, multiple and other diagnoses, and for localized rigidity and appendicitis, diverticulitis, biliary disease and perforated ulcer. Diagnostic profiles were generated by summarizing the statistically significant associations. As an example, the diagnostic profile of acute appendicitis is presented. Conclusions: Compared to alternative approaches (e.g. independent Bayes, log-linear models), there are advantages for multinomial logistic regression in supporting complex differential diagnostic problems, provided potential traps are avoided (e.g. α-error, interpretation of odds ratios).

  16. Forest cover dynamics analysis and prediction modelling using logistic regression model (case study: forest cover at Indragiri Hulu Regency, Riau Province)

    Science.gov (United States)

    Nahib, Irmadi; Suryanta, Jaka

    2017-01-01

    Forest destruction, climate change and global warming could reduce an indirect forest benefit, because forests are the largest carbon sink and play a very important role in the global carbon cycle. To support the Reducing Emissions from Deforestation and Forest Degradation (REDD+) program, attention is paid to forest cover changes as the basis for calculating carbon stock changes. This study explores forest cover dynamics as well as a prediction model of forest cover in Indragiri Hulu Regency, Riau Province, Indonesia. The study aims to analyse various explanatory variables associated with forest conversion processes and to predict forest cover change using a logistic regression model (LRM). The main data used in this study are land use/cover maps (1990 – 2011). Performance of the developed model was assessed through a comparison of the predicted forest cover change and the actual forest cover in 2011. The analysis showed that forest cover decreased continuously between 1990 and 2011, up to a loss of 165,284.82 ha (35.19 %) of forest area. The LRM successfully predicted the forest cover for 2010 with reasonably high accuracy (ROC = 92.97 % and 70.26 %).

  17. Determining the Probability of Study Program Graduate Quality Using Logistic Regression

    Directory of Open Access Journals (Sweden)

    Maxsi Ary

    2016-03-01

    Full Text Available Abstract – Human resources (HR) are one of the success factors in the economic field, namely how to create qualified, skilled, and highly competitive human resources for global competition. The educational level of the labor force is still relatively low: the educational structure of the Indonesian workforce is still dominated by basic education, at about 63.2%. The problem addressed is to determine the probability that a study program is good or not by considering the ratio of the number of graduates to the number of students per cohort and the size of the class quota (large or small), using logistic regression models. Data were obtained from records of study program students and graduates in 2010, and data processing used SPSS. The analysis proceeds by assessing model fit, starting with the hypotheses for assessing fit, the -2LogL statistic, Cox and Snell's R Square, the Hosmer and Lemeshow goodness-of-fit test, and the classification table. The results of the analysis, using SPSS as a tool, aim at measuring the quality of graduates of study programs at a university, college, or academy, based on the ratio of the number of graduates to the class quota. Keywords: Class Quota, Probability, Logistic Regression

  18. Robust Logistic Regression to Static Geometric Representation of Ratios

    Directory of Open Access Journals (Sweden)

    Alireza Bahiraie

    2009-01-01

    Full Text Available Problem statement: Some methodological problems concerning financial ratios, such as non-proportionality, non-asymetricity and non-salacity, were addressed in this study, and we presented a complementary technique for empirical analysis of financial ratios and bankruptcy risk. This new method is intended as a general methodological guideline for work with financial data and bankruptcy risk. Approach: We proposed the use of a new measure of risk, the Share Risk (SR) measure. We provided evidence of the extent to which changes in the values of this index are associated with changes in each axis value and how this may alter our economic interpretation of changes in patterns and directions. Our simple methodology provided a geometric illustration of the newly proposed risk measure and its transformation behavior. This study also employed the robust logit method, which extends the logit model by accounting for outliers. Results: The new SR method obtained better numerical results than the common-ratios approach. With respect to accuracy, logistic and robust logistic regression analysis showed that the new transformation (SR) produced statistically more accurate predictions and can be used as an alternative to common ratios. Additionally, the robust logit model outperformed the logit model in both approaches and was substantially superior to the logit method in assessing out-of-sample forecast performance and regressions. Conclusion/Recommendations: This study presented a new perspective on the study of firm financial statements and bankruptcy. A new dimension to risk measurement and data representation was proposed with the advent of the Share Risk (SR) method. With respect to forecast results, the robust logit method was substantially superior to the logit method. The use of the SR methodology for ratio analysis is strongly suggested, as it provides a conceptual and complementary methodological solution to many problems associated with the

  19. Using occupancy modeling and logistic regression to assess the distribution of shrimp species in lowland streams, Costa Rica: Does regional groundwater create favorable habitat?

    Science.gov (United States)

    Snyder, Marcia; Freeman, Mary C.; Purucker, S. Thomas; Pringle, Catherine M.

    2016-01-01

    Freshwater shrimps are an important biotic component of tropical ecosystems. However, they can have a low probability of detection when abundances are low. We sampled 3 of the most common freshwater shrimp species, Macrobrachium olfersii, Macrobrachium carcinus, and Macrobrachium heterochirus, and used occupancy modeling and logistic regression models to improve our limited knowledge of distribution of these cryptic species by investigating both local- and landscape-scale effects at La Selva Biological Station in Costa Rica. Local-scale factors included substrate type and stream size, and landscape-scale factors included presence or absence of regional groundwater inputs. Capture rates for 2 of the sampled species (M. olfersii and M. carcinus) were sufficient to compare the fit of occupancy models. Occupancy models did not converge for M. heterochirus, but M. heterochirus had high enough occupancy rates that logistic regression could be used to model the relationship between occupancy rates and predictors. The best-supported models for M. olfersii and M. carcinus included conductivity, discharge, and substrate parameters. Stream size was positively correlated with occupancy rates of all 3 species. High stream conductivity, which reflects the quantity of regional groundwater input into the stream, was positively correlated with M. olfersii occupancy rates. Boulder substrates increased occupancy rate of M. carcinus and decreased the detection probability of M. olfersii. Our models suggest that shrimp distribution is driven by factors that function at local (substrate and discharge) and landscape (conductivity) scales.

  20. Comparison of Artificial Neural Network with Logistic Regression as Classification Models for Variable Selection for Prediction of Breast Cancer Patient Outcomes

    Directory of Open Access Journals (Sweden)

    Valérie Bourdès

    2010-01-01

    Full Text Available The aim of this study was to compare multilayer perceptron neural networks (NNs) with standard logistic regression (LR) to identify key covariates impacting on mortality from cancer causes, disease-free survival (DFS), and disease recurrence, using the Area Under the Receiver-Operating Characteristic curve (AUROC), in breast cancer patients. From 1996 to 2004, 2,535 patients diagnosed with primary breast cancer entered the study at a single French centre, where they received standard treatment. For specific mortality as well as DFS analysis, the areas under the ROC curves were greater with the NN models than with the LR model, with better sensitivity and specificity. Four predictive factors were retained by both approaches for mortality: clinical size stage, Scarff Bloom Richardson grade, number of invaded nodes, and progesterone receptor. The results enhance the relevance of using NN models in predictive analysis in oncology, which appeared to be more accurate in prediction in this French breast cancer cohort.
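
    A minimal sketch of the comparison the record reports, assuming scikit-learn: a multilayer perceptron and a logistic regression fitted to the same data and compared by AUROC. The built-in breast-cancer data set and the network size are placeholders for the French cohort and the authors' architecture.

```python
# Sketch only: comparing a multilayer perceptron with logistic regression by
# area under the ROC curve, on a stand-in data set.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=5000)),
    "neural network": make_pipeline(StandardScaler(),
                                    MLPClassifier(hidden_layer_sizes=(20,),
                                                  max_iter=2000, random_state=0)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC = {auc:.3f}")
```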

  1. The comparison of landslide ratio-based and general logistic regression landslide susceptibility models in the Chishan watershed after 2009 Typhoon Morakot

    Science.gov (United States)

    WU, Chunhung

    2015-04-01

    The research built the original logistic regression landslide susceptibility model (abbreviated as or-LRLSM) and the landslide ratio-based logistic regression landslide susceptibility model (abbreviated as lr-LRLSM), compared their performance, and explained the error sources of the two models. The research assumes that the performance of the logistic regression model can be better if the distribution of the landslide ratio and the weighted value of each variable are similar. The landslide ratio is the ratio of landslide area to total area in a specific area and a useful index for evaluating the seriousness of landslide disasters in Taiwan. The research adopted the landslide inventory induced by 2009 Typhoon Morakot in the Chishan watershed, which was the most serious disaster event of the last decade in Taiwan. The research adopted the 20 m grid as the basic unit in building the LRLSM, and six variables, including elevation, slope, aspect, geological formation, accumulated rainfall, and bank erosion, were included in the two models. The six variables were divided into continuous variables, including elevation, slope, and accumulated rainfall, and categorical variables, including aspect, geological formation and bank erosion, in building the or-LRLSM, while all variables, which were classified based on landslide ratio, were categorical variables in building the lr-LRLSM. Because the number of basic units in the whole Chishan watershed was too large to process with commercial software, the research used random sampling instead of the complete set of basic units. The research adopted equal proportions of landslide units and non-landslide units in the logistic regression analysis. The research took 10 random samples and selected the group with the best Cox & Snell R2 value and Nagelkerke R2 value as the database for the following analysis. Based on the best result from the 10 random sampling groups, the or-LRLSM (lr-LRLSM) is significant at the 1% level with Cox & Snell R2 = 0.190 (0.196) and Nagelkerke R2

  2. Classifying hospitals as mortality outliers: logistic versus hierarchical logistic models.

    Science.gov (United States)

    Alexandrescu, Roxana; Bottle, Alex; Jarman, Brian; Aylin, Paul

    2014-05-01

    The use of hierarchical logistic regression for provider profiling has been recommended due to the clustering of patients within hospitals, but has some associated difficulties. We assess changes in hospital outlier status based on standard logistic versus hierarchical logistic modelling of mortality. The study population consisted of all patients admitted to acute, non-specialist hospitals in England between 2007 and 2011 with a primary diagnosis of acute myocardial infarction, acute cerebrovascular disease or fracture of neck of femur, or a primary procedure of coronary artery bypass graft or repair of abdominal aortic aneurysm. We compared standardised mortality ratios (SMRs) from non-hierarchical models with SMRs from hierarchical models, without and with shrinkage estimates of the predicted probabilities (Model 1 and Model 2). The SMRs from standard logistic and hierarchical models were highly statistically significantly correlated (r > 0.91, p = 0.01). More outliers were recorded under standard logistic regression than under hierarchical modelling only when shrinkage estimates were used (Model 2): out of a cumulative 565 pairs of hospitals under study, 21 hospitals changed from low-outlier status and 8 hospitals from high-outlier status under logistic regression to non-outlier status under the shrinkage estimates. Both standard logistic and hierarchical modelling identified nearly the same hospitals as mortality outliers. The choice of methodological approach should, however, also consider whether the modelling aim is judgment or improvement, as shrinkage may be more appropriate for the former than the latter.
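
    A minimal sketch of the non-hierarchical half of this comparison, assuming scikit-learn and pandas: a patient-level logistic case-mix model yields expected deaths, and each hospital's standardised mortality ratio is observed over expected deaths. The hospitals, risk factors and data are simulated placeholders, and the hierarchical/shrinkage step is not shown.

```python
# Sketch only: standardised mortality ratios (observed / expected deaths) from
# a patient-level logistic case-mix model, on simulated placeholder data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5000
df = pd.DataFrame({
    "hospital": rng.integers(0, 20, size=n),        # 20 hypothetical hospitals
    "age": rng.normal(70, 10, size=n),
    "comorbidity": rng.integers(0, 2, size=n),
})
logit = -6 + 0.06 * df["age"] + 0.8 * df["comorbidity"]
df["died"] = rng.random(n) < 1 / (1 + np.exp(-logit))

# Case-mix model: risk factors only, no hospital terms.
X = df[["age", "comorbidity"]]
df["expected"] = LogisticRegression(max_iter=1000).fit(X, df["died"]).predict_proba(X)[:, 1]

# SMR per hospital = observed deaths / expected deaths.
agg = df.groupby("hospital")[["died", "expected"]].sum()
agg["SMR"] = agg["died"] / agg["expected"]
print(agg["SMR"].round(2))
```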

  3. Study on the interaction under logistic regression modeling

    Institute of Scientific and Technical Information of China (English)

    邱宏; 余德新; 王晓蓉; 付振明; 谢立亚

    2008-01-01

    In studies of epidemiological causation, logistic regression is commonly used to estimate the independent effects of risk factors, as well as to examine possible interactions among individual risk factors by adding one or more product terms to the regression model. In a logistic or Cox regression model, the regression coefficient of the product term estimates the interaction on a multiplicative scale, while its statistical significance indicates departure from multiplicativity. Rothman argues that when biologic interaction is examined, we need to focus on interaction as departure from additivity rather than departure from multiplicativity. He presents three indices to measure interaction on an additive scale, or departure from additivity, using logarithmic models such as logistic or Cox regression. In this paper, we use data from a case-control study of female lung cancer in Hong Kong to obtain the regression coefficients and covariance matrix of a logistic model in SPSS 15.0. We then introduce an Excel spreadsheet set up by Tomas Andersson to calculate the indices of interaction on an additive scale and the corresponding confidence intervals. The results can be used to evaluate additive interaction between factors and provide a basis for researchers analysing biological interaction. The method is quick and convenient, and the Excel spreadsheet can be downloaded online free of charge.
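
    A minimal worked example of the three additive-interaction indices mentioned above (RERI, AP and the synergy index S), computed from a logistic model with two binary exposures and their product term, assuming statsmodels and simulated data; the confidence-interval step (covariance matrix plus the Andersson spreadsheet or the delta method) is omitted for brevity.

```python
# Sketch only: Rothman's additive-interaction indices from a logistic model
# with two binary exposures A and B and their product term, simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
a = rng.integers(0, 2, size=n)              # exposure A (placeholder)
b = rng.integers(0, 2, size=n)              # exposure B (placeholder)
logit = -2 + 0.7 * a + 0.5 * b + 0.4 * a * b
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = sm.add_constant(np.column_stack([a, b, a * b]))
fit = sm.Logit(y, X).fit(disp=False)
b1, b2, b3 = fit.params[1:]

or10 = np.exp(b1)            # A only vs neither
or01 = np.exp(b2)            # B only vs neither
or11 = np.exp(b1 + b2 + b3)  # both vs neither

reri = or11 - or10 - or01 + 1                   # relative excess risk due to interaction
ap = reri / or11                                # attributable proportion
s = (or11 - 1) / ((or10 - 1) + (or01 - 1))      # synergy index
print(f"RERI={reri:.2f}  AP={ap:.2f}  S={s:.2f}")
```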

  4. An Original Stepwise Multilevel Logistic Regression Analysis of Discriminatory Accuracy

    DEFF Research Database (Denmark)

    Merlo, Juan; Wagner, Philippe; Ghith, Nermin

    2016-01-01

    BACKGROUND AND AIM: Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of associations (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach...

  5. Comparison of four prognostic models and a new Logistic regression model to predict short-term prognosis of acute-on-chronic hepatitis B liver failure

    Institute of Scientific and Technical Information of China (English)

    HE Wei-ping; HU Jin-hua; ZHAO Jun; TONG Jing-jing; DING Jin-biao; LIN Fang; WANG Hui-fen

    2012-01-01

    Background Acute-on-chronic hepatitis B liver failure (ACLF-HBV) is a clinically severe disease associated with major life-threatening complications including hepatic encephalopathy and hepatorenal syndrome. The aim of this study was to evaluate the short-term prognostic predictability of the model for end-stage liver disease (MELD), MELD-based indices, and their dynamic changes in patients with ACLF-HBV, and to establish a new model for predicting the prognosis of ACLF-HBV. Methods A total of 172 patients with ACLF-HBV who stayed in the hospital for more than 2 weeks were retrospectively recruited. The predictive accuracy of MELD, MELD-based indices, and their dynamic changes (△) were compared using the area under the receiver operating characteristic curve method. The associations between mortality and patient characteristics were studied by univariate and multivariate analyses. Results The 3-month mortality was 43.6%. The largest concordance (c) statistic predicting 3-month mortality was obtained for the MELD score at the end of 2 weeks of admission (0.8), followed by the MELD:sodium ratio (MESO) (0.796) and integrated MELD (iMELD) (0.758) scores, △MELD (0.752), △MESO (0.729), and MELD plus sodium (MELD-Na) (0.728) scores. In multivariate logistic regression analysis, the independent factors predicting prognosis were hepatic encephalopathy (OR=3.466), serum creatinine, international normalized ratio (INR), and total bilirubin at the end of 2 weeks of admission (OR=10.302, 6.063, 5.208, respectively), and cholinesterase on admission (OR=0.255). This regression model had a greater prognostic value (c=0.85, 95% CI 0.791-0.909) compared to the MELD score at the end of 2 weeks of admission (Z=4.9851, P=0.0256). Conclusions The MELD score at the end of 2 weeks of admission is a useful predictor of 3-month mortality in ACLF-HBV patients. Hepatic encephalopathy, serum creatinine, international normalized ratio, and total bilirubin at the end of 2 weeks of admission and cholinesterase on admission are

  6. Logistic regression in estimates of femoral neck fracture by fall

    Directory of Open Access Journals (Sweden)

    Jaroslava Wendlová

    2010-04-01

    Full Text Available Jaroslava Wendlová, Derer's University Hospital and Policlinic, Osteological Unit, Bratislava, Slovakia. Abstract: The latest methods for estimating the probability (absolute risk) of osteoporotic fractures include several logistic regression models, based on qualitative risk factors plus bone mineral density (BMD), and the probability estimate of fracture in the future. The Slovak logistic regression model, in contrast to other models, is created from quantitative variables of the proximal femur (in International System of Units) and estimates the probability of fracture by fall. Objectives: The first objective of this study was to order selected independent variables according to the intensity of their influence (statistical significance) upon the occurrence of values of the dependent variable, the femur strength index (FSI). The second objective was to determine, using logistic regression, whether the odds of FSI acquiring a pathological value (femoral neck fracture by fall) increased or declined if the value of the variables (T-score total hip, BMI, alpha angle, theta angle and HAL) was raised by one unit. Patients and methods: Bone densitometer measurements using dual energy X-ray absorptiometry (DXA; Prodigy, Primo, GE, USA) of the left proximal femur were obtained from 3 216 East Slovak women with primary or secondary osteoporosis or osteopenia, aged 20–89 years (mean age 58.9; 95% CI: 58.42–59.38). The following variables were measured: FSI, T-score total hip BMD, body mass index (BMI), as well as the geometrical variables of the proximal femur: alpha angle (α angle), theta angle (θ angle), and hip axis length (HAL). Statistical analysis: Logistic regression was used to measure the influence of the independent variables (T-score total hip, alpha angle, theta angle, HAL, BMI) upon the dependent variable (FSI). Results: The order of independent variables according to the intensity of their influence (greatest to least) upon the occurrence of values of the

  7. Application of GIS and logistic regression to fossil pollen data in modelling present and past spatial distribution of the Colombian savanna

    Energy Technology Data Exchange (ETDEWEB)

    Flantua, Suzette G.A.; Boxel, John H. van; Hooghiemstra, Henry; Smaalen, John van [University of Amsterdam, Faculty of Science, Institute for Biodiversity and Ecosystem Dynamics, Amsterdam (Netherlands)

    2007-12-15

    Climate changes affect the abundance, geographic extent, and floral composition of vegetation, which are reflected in the pollen rain. Sediment cores taken from lakes and peat bogs can be analysed for their pollen content. The fossil pollen records provide information on the temporal changes in climate and palaeo-environments. Although the complexity of the variables influencing vegetation distribution requires a multi-dimensional approach, only a few research projects have used GIS to analyse pollen data. This paper presents a new approach to palynological data analysis by combining GIS and spatial modelling. Eastern Colombia was chosen as a study area owing to the migration of the forest-savanna boundary since the last glacial maximum, and the availability of pollen records. Logistic regression has been used to identify the climatic variables that determine the distribution of savanna and forest in eastern Colombia. These variables were used to create a predictive land-cover model, which was subsequently implemented into a GIS to perform spatial analysis on the results. The palynological data from the study area were incorporated into the GIS. Reconstructed maps of past vegetation distribution by interpolation showed a new approach of regional multi-site data synthesis related to climatic parameters. The logistic regression model resulted in a map with 85.7% predictive accuracy, which is considered useful for the reconstruction of future and past land-cover distributions. The suitability of palynological GIS application depends on the number of pollen sites, the distribution of the pollen sites over the area of interest, and the degree of overlap of the age ranges of the pollen records. (orig.)

  8. Systematic Selection of Key Logistic Regression Variables for Risk Prediction Analyses: A Five-Factor Maximum Model.

    Science.gov (United States)

    Hewett, Timothy E; Webster, Kate E; Hurd, Wendy J

    2017-08-16

    The evolution of clinical practice and medical technology has yielded an increasing number of clinical measures and tests to assess a patient's progression and return to sport readiness after injury. The plethora of available tests may be burdensome to clinicians in the absence of evidence that demonstrates the utility of a given measurement. Thus, there is a critical need to identify a discrete number of metrics to capture during clinical assessment to effectively and concisely guide patient care. The data sources included Pubmed and PMC Pubmed Central articles on the topic. Therefore, we present a systematic approach to injury risk analyses and how this concept may be used in algorithms for risk analyses for primary anterior cruciate ligament (ACL) injury in healthy athletes and patients after ACL reconstruction. In this article, we present the five-factor maximum model, which states that in any predictive model, a maximum of 5 variables will contribute in a meaningful manner to any risk factor analysis. We demonstrate how this model already exists for prevention of primary ACL injury, how this model may guide development of the second ACL injury risk analysis, and how the five-factor maximum model may be applied across the injury spectrum for development of the injury risk analysis.

  9. A design of experiments approach to validation sampling for logistic regression modeling with error-prone medical records.

    Science.gov (United States)

    Ouyang, Liwen; Apley, Daniel W; Mehrotra, Sanjay

    2016-04-01

    Electronic medical record (EMR) databases offer significant potential for developing clinical hypotheses and identifying disease risk associations by fitting statistical models that capture the relationship between a binary response variable and a set of predictor variables that represent clinical, phenotypical, and demographic data for the patient. However, EMR response data may be error prone for a variety of reasons. Performing a manual chart review to validate data accuracy is time consuming, which limits the number of chart reviews in a large database. The authors' objective is to develop a new design-of-experiments-based systematic chart validation and review (DSCVR) approach that is more powerful than the random validation sampling used in existing approaches. The DSCVR approach judiciously and efficiently selects the cases to validate (i.e., validate whether the response values are correct for those cases) for maximum information content, based only on their predictor variable values. The final predictive model will be fit using only the validation sample, ignoring the remainder of the unvalidated and unreliable error-prone data. A Fisher information based D-optimality criterion is used, and an algorithm for optimizing it is developed. The authors' method is tested in a simulation comparison that is based on a sudden cardiac arrest case study with 23 041 patients' records. This DSCVR approach, using the Fisher information based D-optimality criterion, results in a fitted model with much better predictive performance, as measured by the receiver operating characteristic curve and the accuracy in predicting whether a patient will experience the event, than a model fitted using a random validation sample. The simulation comparisons demonstrate that this DSCVR approach can produce predictive models that are significantly better than those produced from random validation sampling, especially when the event rate is low. © The Author 2015. Published by Oxford
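
    A loose sketch of the idea behind a Fisher-information D-optimality criterion for choosing which records to validate, assuming NumPy and scikit-learn. It greedily adds the candidate that most increases log det(X'WX), with W = diag(p(1-p)) taken from a preliminary model fitted to the error-prone responses; the data, budget and greedy search are placeholders, not the authors' DSCVR algorithm.

```python
# Sketch only: greedy D-optimal selection of records for chart validation,
# using the logistic Fisher information X' W X on simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n, p = 1000, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y_noisy = rng.integers(0, 2, size=n)                 # error-prone responses (placeholder)

prelim = LogisticRegression(fit_intercept=False, max_iter=1000).fit(X, y_noisy)
prob = prelim.predict_proba(X)[:, 1]
w = prob * (1 - prob)                                # per-record information weight

budget = 50                                          # number of charts we can review
chosen, info = [], 1e-6 * np.eye(p)                  # small ridge keeps the determinant positive
for _ in range(budget):
    gains = []
    for i in range(n):
        if i in chosen:
            gains.append(-np.inf)
            continue
        cand = info + w[i] * np.outer(X[i], X[i])
        gains.append(np.linalg.slogdet(cand)[1])     # log-determinant after adding record i
    best = int(np.argmax(gains))
    chosen.append(best)
    info += w[best] * np.outer(X[best], X[best])

print("records selected for chart validation:", sorted(chosen)[:10], "...")
```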

  10. Cluster-localized sparse logistic regression for SNP data.

    Science.gov (United States)

    Binder, Harald; Müller, Tina; Schwender, Holger; Golka, Klaus; Steffens, Michael; Hengstler, Jan G; Ickstadt, Katja; Schumacher, Martin

    2012-08-14

    The task of analyzing high-dimensional single nucleotide polymorphism (SNP) data in a case-control design using multivariable techniques has only recently been tackled. While many available approaches investigate only main effects in a high-dimensional setting, we propose a more flexible technique, cluster-localized regression (CLR), based on localized logistic regression models, that allows different SNPs to have an effect for different groups of individuals. Separate multivariable regression models are fitted for the different groups of individuals by incorporating weights into componentwise boosting, which provides simultaneous variable selection, hence sparse fits. For model fitting, these groups of individuals are identified using a clustering approach, where each group may be defined via different SNPs. This allows for representing complex interaction patterns, such as compositional epistasis, that might not be detected by a single main effects model. In a simulation study, the CLR approach results in improved prediction performance, compared to the main effects approach, and identification of important SNPs in several scenarios. Improved prediction performance is also obtained for an application example considering urinary bladder cancer. Some of the identified SNPs are predictive for all individuals, while others are only relevant for a specific group. Together with the sets of SNPs that define the groups, potential interaction patterns are uncovered.
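
    A loose stand-in for the cluster-localized idea described above, assuming scikit-learn: individuals are first grouped (here by k-means), then a sparse L1-penalized logistic model is fitted within each group so that different SNPs can carry the signal in different groups. The original method uses weighted componentwise boosting; genotypes and outcomes below are simulated.

```python
# Sketch only: clustering individuals, then a sparse logistic model per cluster,
# on simulated 0/1/2 genotype data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n, n_snps = 600, 200
X = rng.integers(0, 3, size=(n, n_snps)).astype(float)   # genotype counts (placeholder)
y = rng.integers(0, 2, size=n)                            # case/control status (placeholder)

# Step 1: group individuals (the paper also defines groups via SNPs).
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: an L1-penalized logistic model within each group, giving a sparse
# SNP selection that may differ between groups of individuals.
for g in np.unique(groups):
    mask = groups == g
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X[mask], y[mask])
    selected = np.flatnonzero(clf.coef_[0])
    print(f"group {g}: {mask.sum()} individuals, {selected.size} SNPs selected")
```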

  11. Creating Cost Growth Models for the Engineering and Manufacturing Development Phase of Acquisition Using Logistic and Multiple Regression

    Science.gov (United States)

    2004-03-01

    [Abstract not recoverable: the record retains only fragments of logistic regression software output, namely a whole-model likelihood-ratio test, a lack-of-fit test, and parameter estimates (standard errors and chi-square p-values) for predictors such as "LRIP Planned?" and "# Product variants in this SAR".]

  12. What Are the Odds of that? A Primer on Understanding Logistic Regression

    Science.gov (United States)

    Huang, Francis L.; Moon, Tonya R.

    2013-01-01

    The purpose of this Methodological Brief is to present a brief primer on logistic regression, a commonly used technique when modeling dichotomous outcomes. Using data from the National Education Longitudinal Study of 1988 (NELS:88), logistic regression techniques were used to investigate student-level variables in eighth grade (i.e., enrolled in a…
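
    A minimal sketch of the odds and log-odds bookkeeping such a primer covers, assuming statsmodels: the fitted coefficient of a binary predictor exponentiates to an odds ratio, and inverting the logit gives predicted probabilities. The predictor and data are invented, not the NELS:88 variables.

```python
# Sketch only: odds ratio and predicted probabilities from a simple logistic
# model with one binary predictor, on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 1000
x = rng.integers(0, 2, size=n)                                # hypothetical binary predictor
y = (rng.random(n) < np.where(x == 1, 0.6, 0.4)).astype(int)  # outcome with higher risk when x=1

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=False)
intercept, slope = fit.params

print("odds ratio for x=1 vs x=0:", np.exp(slope))
# Predicted probabilities follow from inverting the logit.
for value in (0, 1):
    logit = intercept + slope * value
    print(f"P(y=1 | x={value}) = {1 / (1 + np.exp(-logit)):.3f}")
```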

  13. Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning

    Science.gov (United States)

    Li, Zhushan

    2014-01-01

    Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…

  16. Prediction of Foreign Object Debris/Damage type based in human factors for aeronautics using logistic regression model

    Science.gov (United States)

    Romo, David Ricardo

    Foreign Object Debris/Damage (FOD) has been an issue for military and commercial aircraft manufacturers since the early ages of aviation and aerospace. Currently, aerospace is growing rapidly and the chances of FOD presence are growing as well. One of the principal causes in manufacturing is human error. The cost associated with human error in commercial and military aircraft accounts for approximately 4 billion dollars per year. This problem is currently addressed with prevention programs, elimination techniques, designation of FOD areas, controlled access, restrictions on personal items entering designated areas, tool accountability, and the use of technology such as Radio Frequency Identification (RFID) tags. None of these efforts has shown a significant reduction in occurrence in manufacturing processes; on the contrary, a repetitive pattern of occurrence persists, and the associated cost has not declined in a significant manner. To address the problem, this thesis proposes a new approach using statistical analysis. The aim of this thesis is to create a predictive model using historical categorical data from an aircraft manufacturer, focusing only on human-error causes. Contingency tables, the natural logarithm of the odds, and a probability transformation are used to provide the predicted probabilities for each aircraft. A case study is presented to illustrate the applied methodology. As a result, this approach is able to predict the possible FOD outcomes for each workstation/area of interest, including monthly predictions per workstation. This thesis is intended to be the starting point of statistical data analysis regarding FOD in human factors. The purpose of this thesis is to identify the areas where human error is the primary cause of FOD occurrence in order to design and implement accurate solutions. The advantages of the proposed methodology can go from the reduction of cost

  17. Interaction between continuous variables in logistic regression models

    Institute of Scientific and Technical Information of China (English)

    邱宏; 余德新; 谢立亚; 王晓蓉; 付振明

    2010-01-01

    Rothman argued that interaction estimated as departure from additivity better reflects biological interaction. In a logistic regression model, the product term reflects interaction as departure from multiplicativity. So far, the literature on estimating interaction on an additive scale using logistic regression has focused only on two dichotomous factors. The objective of the present report was to provide a method to examine interaction as departure from additivity between two continuous variables, or between one continuous variable and one categorical variable. We used data from a lung cancer case-control study among males in Hong Kong as an example to illustrate the bootstrap re-sampling method for calculating the corresponding confidence intervals. The free software R (Version 2.8.1) was used to estimate interaction on the additive scale.
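
    A minimal sketch of a bootstrap confidence interval for RERI when one exposure is continuous, assuming statsmodels and NumPy. The exposures, the +10-unit contrast, the centring at age 50 and the data are placeholders; the record's own analysis was done in R on the Hong Kong case-control data.

```python
# Sketch only: percentile bootstrap CI for RERI with a continuous and a binary
# exposure, evaluated for a chosen +10-unit contrast of the continuous exposure.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 1500
age = rng.normal(50, 10, size=n)                 # continuous exposure (placeholder)
smoke = rng.integers(0, 2, size=n)               # binary exposure (placeholder)
logit = -4 + 0.03 * age + 0.6 * smoke + 0.02 * age * smoke
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
data = np.column_stack([age, smoke, y])

def reri(sample, delta=10.0):
    a = sample[:, 0] - 50.0                      # centre age so the reference level is age 50
    s, yy = sample[:, 1], sample[:, 2]
    X = sm.add_constant(np.column_stack([a, s, a * s]))
    b = sm.Logit(yy, X).fit(disp=False).params
    or10 = np.exp(b[1] * delta)                  # +delta years, non-smoker
    or01 = np.exp(b[2])                          # smoker, reference age
    or11 = np.exp(b[1] * delta + b[2] + b[3] * delta)   # +delta years and smoker
    return or11 - or10 - or01 + 1

point = reri(data)
boots = [reri(data[rng.integers(0, n, size=n)]) for _ in range(500)]
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"RERI = {point:.2f}, 95% bootstrap CI ({lo:.2f}, {hi:.2f})")
```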

  18. Spatial prediction of Lactarius deliciosus and Lactarius salmonicolor mushroom distribution with logistic regression models in the Kızılcasu Planning Unit, Turkey.

    Science.gov (United States)

    Mumcu Kucuker, Derya; Baskent, Emin Zeki

    2015-01-01

    Integration of non-wood forest products (NWFPs) into forest management planning has become an increasingly important issue in forestry over the last decade. Among NWFPs, mushrooms are valued due to their medicinal, commercial, high nutritional and recreational importance. Commercial mushroom harvesting also provides important income to local dwellers and contributes to the economic value of regional forests. Sustainable management of these products at the regional scale requires information on their locations in diverse forest settings and the ability to predict and map their spatial distributions over the landscape. This study focuses on modeling the spatial distribution of commercially harvested Lactarius deliciosus and L. salmonicolor mushrooms in the Kızılcasu Forest Planning Unit, Turkey. The best models were developed based on topographic, climatic and stand characteristics, separately through logistic regression analysis using SPSS™. The best topographic model provided better classification success (69.3 %) than the best climatic (65.4 %) and stand (65 %) models. However, the overall best model, with 73 % overall classification success, used a mix of several variables. The best models were integrated into an Arc/Info GIS program to create spatial distribution maps of L. deliciosus and L. salmonicolor in the planning area. Our approach may be useful to predict the occurrence and distribution of other NWFPs and provide a valuable tool for designing silvicultural prescriptions and preparing multiple-use forest management plans.

  19. Logistic Regression-HSMM-Based Heart Sound Segmentation.

    Science.gov (United States)

    Springer, David B; Tarassenko, Lionel; Clifford, Gari D

    2016-04-01

    The identification of the exact positions of the first and second heart sounds within a phonocardiogram (PCG), or heart sound segmentation, is an essential step in the automatic analysis of heart sound recordings, allowing for the classification of pathological events. While threshold-based segmentation methods have shown modest success, probabilistic models, such as hidden Markov models, have recently been shown to surpass the capabilities of previous methods. Segmentation performance is further improved when a priori information about the expected duration of the states is incorporated into the model, such as in a hidden semi-Markov model (HSMM). This paper addresses the problem of the accurate segmentation of the first and second heart sound within noisy real-world PCG recordings using an HSMM, extended with the use of logistic regression for emission probability estimation. In addition, we implement a modified Viterbi algorithm for decoding the most likely sequence of states, and evaluated this method on a large dataset of 10,172 s of PCG recorded from 112 patients (including 12,181 first and 11,627 second heart sounds). The proposed method achieved an average F1 score of 95.63 ± 0.85%, while the current state of the art achieved 86.28 ± 1.55% when evaluated on unseen test recordings. The greater discrimination between states afforded using logistic regression as opposed to the previous Gaussian distribution-based emission probability estimation as well as the use of an extended Viterbi algorithm allows this method to significantly outperform the current state-of-the-art method based on a two-sided paired t-test.

  20. A Forest Fire Danger Weather Level Model Based on Logistic Regression

    Institute of Scientific and Technical Information of China (English)

    张伟; 王峰; 郭艳芬; 郑煜

    2013-01-01

    With the fire records and meteorological data of the Daxing'an Mountain Area Forestry Bureau in Heilongjiang Province from 1975 to 2004, a forest fire danger weather level model was established by logistic regression using the best variable ratios, and was then examined against forest fire data. The model has a good application effect and can provide a reference for the local forestry department when formulating fire prevention strategy.

  1. Examination By Multinomial Logistic Regression Model Of The Factors Affecting The Types Of Domestic Violence Against Women A Case Of Turkey

    Directory of Open Access Journals (Sweden)

    Erkan Ari

    2015-08-01

    Full Text Available In this paper, the factors affecting the types of domestic violence against women were determined by a multinomial logistic regression model. In this context, we used the data of the Research on Domestic Violence against Women in Turkey, conducted by the Turkish Statistical Institute in 2008. In the study, the type of domestic violence against women, a variable with four levels, was used as the dependent variable. In addition, twelve independent variables were used, after removing irrelevant variables from the data set via the chi-square test of independence. The maximum likelihood estimates and odds ratios of the variables in the model were then obtained, and the validity of the model was tested by the likelihood ratio test. Finally, comparisons were made for three categories, based on the odds ratios relative to the selected reference category. In terms of odds ratios, the variables education level of the woman and husband's work sector were statistically significant only in comparison one; the variables agnation with the husband, education level of the husband, frequency of seeing the husband drunk, and frequency of the husband's gambling were statistically significant in both comparisons one and three; and the variables region, being deceived by the husband, and a common-law female partner of the husband were statistically significant in all comparisons.

  2. Shallow landslide susceptibility model for the Oria river basin, Gipuzkoa province (North of Spain). Application of the logistic regression and comparison with previous studies.

    Science.gov (United States)

    Bornaetxea, Txomin; Antigüedad, Iñaki; Ormaetxea, Orbange

    2016-04-01

    In the Oria river basin (885 km2), shallow landslides are very frequent and produce several roadblocks and damage to infrastructure and properties, causing large economic losses every year. Considering that zoning the territory into different landslide susceptibility levels provides a useful tool for territorial planning and natural risk management, this study has the objective of identifying the areas most prone to landslides by applying an objective and reproducible methodology. To do so, a quantitative multivariate method, logistic regression, has been used. Field-mapped landslide points and randomly selected stable points have been used, along with the independent variables Lithology, Land Use, Distance to the transport infrastructure, Altitude, Senoidal Slope and Normalized Difference Vegetation Index (NDVI), to produce a landslide susceptibility map. The model has been validated with the prediction and success rate curves and their corresponding area under the curve (AUC). In addition, the result has been compared to two landslide susceptibility models covering the study area that were previously produced at different scales, namely ELSUS1000 version 1 (2013) and the Landslide Susceptibility Map of Gipuzkoa (2007). Validation results show an excellent prediction capacity of the proposed model (AUC 0.962), and the comparisons highlight large differences with the previous studies.

  3. Ordinal logistic regression models: application in quality of life studies

    Directory of Open Access Journals (Sweden)

    Mery Natali Silva Abreu

    2008-01-01

    Full Text Available Quality of life has been increasingly emphasized in public health research in recent years. Typically, quality of life outcomes are measured by means of ordinal scales. In these situations, specific statistical methods are necessary, because procedures such as dichotomizing the response variable or disregarding its ordering lose information and may lead to incorrect inferences. Ordinal logistic regression models are appropriate in many of these situations. This article presents a review of the proportional odds model, partial proportional odds model, continuation ratio model, and stereotype model. The fit, statistical inference, and comparisons between models are illustrated with data from a study on quality of life in 273 patients with schizophrenia. All tested models showed good fit, but the proportional odds or partial proportional odds models proved to be the best choice due to the nature of the data and the ease of interpretation of the results. Ordinal logistic models perform differently depending on the categorization of the outcome, adequacy in relation to assumptions, goodness of fit, and parsimony.
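
    A minimal sketch of the proportional-odds (cumulative logit) model reviewed in the record, assuming the OrderedModel class in statsmodels and simulated ordinal quality-of-life scores; it is not the schizophrenia data set, and the partial proportional odds, continuation ratio and stereotype models are not shown.

```python
# Sketch only: a proportional-odds model on a simulated three-level ordinal outcome.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(8)
n = 400
age = rng.normal(40, 12, size=n)
employed = rng.integers(0, 2, size=n)
latent = -0.03 * (age - 40) + 0.8 * employed + rng.logistic(size=n)

# Ordinal quality-of-life outcome: poor < fair < good.
codes = np.digitize(latent, bins=[-0.5, 0.8])
qol = pd.Series(pd.Categorical.from_codes(codes, categories=["poor", "fair", "good"],
                                          ordered=True))

X = pd.DataFrame({"age": age, "employed": employed})
res = OrderedModel(qol, X, distr="logit").fit(method="bfgs", disp=False)
print(res.summary())

# Under proportional odds, each coefficient gives one odds ratio that applies
# to every cumulative split of the outcome.
print("odds ratios:", np.exp(res.params.iloc[:2]).round(2))
```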

  4. Regularized Logistic Regression Model in a Financial Early-warning System for Listed Companies

    Institute of Scientific and Technical Information of China (English)

    张恒; 秦宾; 许金凤

    2011-01-01

    An L1-norm penalized logistic regression model is proposed, based on the regularization technique of statistical learning theory; a plain logistic regression model and an L2-regularized logistic regression model are also established for comparison. Combining two years of financial data (years T-3 and T-2) from ST companies on the Shanghai and Shenzhen stock exchanges and their normal counterparts, a simulation experiment is conducted for financial early-warning analysis of listed companies. The results demonstrate the effectiveness of the L1-regularized logistic regression model, which improves interpretability while maintaining predictive accuracy.
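
    A minimal sketch of an L1-penalized logistic regression of the kind the record proposes, assuming scikit-learn: the penalty zeroes out uninformative financial ratios while cross-validation gauges predictive accuracy. The ratios and outcomes are simulated, not the ST-company data.

```python
# Sketch only: L1 ("lasso") penalized logistic regression on simulated
# financial ratios, with a sparse set of ratios retained.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(9)
n, p = 300, 20
X = rng.normal(size=(n, p))                       # financial ratios (placeholder)
true_beta = np.zeros(p)
true_beta[:4] = [1.2, -0.9, 0.8, -0.7]            # only four ratios truly matter here
y = (rng.random(n) < 1 / (1 + np.exp(-(X @ true_beta)))).astype(int)

model = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l1", solver="liblinear", C=0.3))
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))

model.fit(X, y)
coef = model.named_steps["logisticregression"].coef_[0]
print("ratios kept by the L1 penalty:", np.flatnonzero(coef))
```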

  5. Geo-Information Logistical Modeling

    Directory of Open Access Journals (Sweden)

    Nikolaj I. Kovalenko

    2014-11-01

    Full Text Available This paper examines geo-information logistical modeling. The author illustrates the similarities between geo-informatics and logistics in the area of spatial objectives; illustrates that applying geo-data expands the potential of logistics; brings to light geo-information modeling as the basis of logistical modeling; describes the types of geo-information logistical modeling; describes situational geo-information modeling as a variety of geo-information logistical modeling.

  6. Dynamic Simulation of Urban Expansion Based on Cellular Automata and Logistic Regression Model: Case Study of the Hyrcanian Region of Iran

    Directory of Open Access Journals (Sweden)

    Meisam Jafari

    2016-08-01

    Full Text Available The hypothesis addressed in this article is to determine the extent of selected land use categories with respect to their effect on urban expansion. A model that combines a logistic regression model, a Markov chain, and cellular automata based modeling is introduced here to simulate future urban growth and development in the Gilan Province, Iran. The model is calibrated on data from 1989 to 2013 and is applied in making predictions for the years 2025 and 2037, across 12 urban development criteria. The relative operating characteristic (ROC) validation shows a very high rate for urban development. The analyzed results indicate that the area of urban land has increased by more than 1.7%, that is, from 36,012.5 ha in 1989 to 59,754.8 ha in 2013, and the area of the Caspian Hyrcanian forestland has decreased by 31,628 ha. The simulation results indicate an alarming increase in the rate of urban development in the province by 2025 and 2037, that is, 0.82% and 1.3%, respectively. The development pattern is expected to be uneven and scattered, without following any particular direction. The development will occur close to the existing or newly-formed urban infrastructure and around major roads and commercial areas. If not controlled, this development trend will lead to the loss of 25,101 ha of Hyrcanian forest and, if continued, 21,774 ha of barren and open lands are expected to be destroyed by the year 2037. These results demonstrate the capacity of the integrated model in establishing comparisons with urban plans and their utility in explaining both the volume and constraints of urban growth. It is beneficial to apply the integrated approach in urban dynamic assessment through land use modeling with respect to spatio-temporal representation in distinct urban development formats.

  7. Exploring Public Perception of Paratransit Service Using Binomial Logistic Regression

    Directory of Open Access Journals (Sweden)

    Hisashi Kubota

    2007-01-01

    Full Text Available Knowledge of the market is a requirement for the successful provision of public transportation. This study aims to explore public perception of paratransit service, as represented by users and non-users of paratransit. The analysis has been conducted on the public's responses, by creating several binomial logistic regression models using the public perception of the quality of service, quality of car, quality of driver, and fare. These models illustrate the characteristics and important variables that establish whether the public will use more paratransit in the future once improvements have been made. Moreover, several models are developed to explore public perception in order to find out whether respondents agree to replacing paratransit with other types of transportation modes. All the models fit well. These models are able to explain the respondents' characteristics and to reveal their actual perception of the operation of paratransit. This study provides a useful tool for understanding the market in greater depth.

  8. Classification of endometrial lesions by nuclear morphometry features extracted from liquid-based cytology samples: a system based on logistic regression model.

    Science.gov (United States)

    Zygouris, Dimitrios; Pouliakis, Abraham; Margari, Niki; Chrelias, Charalampos; Terzakis, Emmanouil; Koureas, Nikolaos; Panayiotides, Ioannis; Karakitsos, Petros

    2014-08-01

    To investigate the potential of a computerized system for the discrimination of benign from malignant endometrial nuclei and lesions. A total of 228 histologically confirmed liquid-based cytological smears were collected: 117 within normal limits cases, 66 malignant cases, 37 hyperplasias without atypia, and 8 cases of hyperplasia with atypia. From each case we extracted nuclear morphometric features from about 100 nuclei using a custom image analysis system. Initially we performed feature selection, and subsequently we applied a logistic regression model that classified each nucleus as benign or malignant. Based on the results of the nucleus classification process, we constructed an algorithm to discriminate endometrium cases as benign or malignant. The proposed system had an overall accuracy for the classification of endometrial nuclei equal to 83.02%, specificity of 85.09%, and sensitivity of 77.01%. For the case classification the overall accuracy was 92.98%, specificity was 92.86%, and sensitivity was 93.24%. The proposed computerized system can be applied for the classification of endometrial nuclei and lesions as it outperformed the standard cytological diagnosis. This study highlights interesting diagnostic features of endometrial nuclear morphology, and the proposed method can be a useful tool in the everyday practice of the cytological laboratory.

  9. Determiners of enterprise risk management applications in Turkey: An empirical study with logistic regression model on the companies included in ISE (Istanbul Stock Exchange

    Directory of Open Access Journals (Sweden)

    Şerife Önder

    2012-10-01

    Full Text Available Enterprise risk management (ERM), which came along with the change in the understanding of risk management in companies, refers to evaluating all risks as a whole and managing them in line with the targets of the company. This study aims at determining the ERM application levels of the companies listed on the Istanbul Stock Exchange and the factors that affect these applications. The existence of ERM in the companies was related to having a senior manager in charge of risk management. In order to explain ERM applications in terms of profitability, leverage and company size, a logistic regression model was established. As a result of the analysis it was determined that about half of the financial sector companies within the ISE employed a chief risk officer (CRO), which means a culture of risk management has been founded within these companies. Moreover, it was determined that the profitability of the companies does not have any significance in ERM applications, while the most important factors that affect the applications were found to be leverage and company size.

  10. A Comparative Study of Cox Regression vs. Log-Logistic ...

    African Journals Online (AJOL)

    Journal of Medical and Biomedical Sciences ... using non-parametric Cox model and parametric Log-logistic model, factors influencing survival of ... colorectal cancer referred to Taleghani Medical and Training Center of Tehran between 2001 ...

  11. Traffic Accident Data Analysis of Third-class Urban Roadways Using Logistic Regression Models

    Institute of Scientific and Technical Information of China (English)

    邓瑶望; 李凌宇; 陈雨人

    2014-01-01

    According to the statistical data of Urumqi City from 2006 to 2010, nine different crash types of traffic accidents on urban roadways were selected as the dependent variables. Furthermore, nine factors were selected as the independent variables, covering road facilities and the road environment. Based on the binary logistic regression model, this paper established linear correlation models between crash types and the nine affecting factors, estimated the model parameters, analysed the reliability and goodness of fit of the models, and investigated the impact that different combinations of the independent variables have on the dependent variables. The paper also predicted the risk of each crash type under various road conditions using a multinomial logistic model, compared the predictions with the actual cases, and tested the fitting efficiency of the model used.

  12. Flexible survival regression modelling

    DEFF Research Database (Denmark)

    Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben

    2009-01-01

    Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varyi...

  13. Application of ordinal logistic regression analysis in determining risk factors of child malnutrition in Bangladesh

    OpenAIRE

    Das Sumonkanti; Rahman Rajwanur M

    2011-01-01

    Abstract. Background: The study attempts to develop an ordinal logistic regression (OLR) model to identify the determinants of child malnutrition, instead of the traditional binary logistic regression (BLR) model, using data from the Bangladesh Demographic and Health Survey 2004. Methods: Based on the weight-for-age anthropometric index (Z-score), child nutrition status is categorized into three groups: severely undernourished (< -3.0), moderately undernourished (-3.0 to -2.01) and nourished (≥-2.0...

  14. A Bayesian goodness of fit test and semiparametric generalization of logistic regression with measurement data.

    Science.gov (United States)

    Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E

    2013-06-01

    Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework.

  15. A Remote Sensing Based Approach for the Assessment of Debris Flow Hazards Using Artificial Neural Network and Binary Logistic Regression Modeling

    Science.gov (United States)

    El Kadiri, R.; Sultan, M.; Elbayoumi, T.; Sefry, S.

    2013-12-01

    Efforts to map the distribution of debris flows, to assess the factors controlling their development, and to identify the areas prone to their development are often hampered by the absence or paucity of appropriate monitoring systems and historical databases and by the inaccessibility of these areas in many parts of the world. We developed methodologies that rely heavily on readily available observations extracted from remote sensing datasets and successfully applied these techniques over the Jazan province, in the Red Sea hills of Saudi Arabia. We first identified debris flows (10,334 locations) from high spatial resolution satellite datasets (e.g., GeoEye, Orbview), and verified a subset of these occurrences in the field. We then constructed a GIS to host the identified debris flow locations together with co-registered relevant data (e.g., lithology, elevation) and derived products (e.g., slope, normalized difference vegetation index, etc.). Spatial analysis of the data sets in the GIS indicated various degrees of correspondence between the distribution of debris flows and various variables (e.g., stream power index, topographic position index, normalized difference vegetation index, distance to stream, flow accumulation, slope and soil weathering index, aspect, elevation), suggesting a causal effect. For example, debris flows were found in areas of high slope, low distance to low stream orders and low vegetation index. To evaluate the extent to which these factors control landslide distribution, we constructed and applied: (1) a stepwise input selection by testing all input combinations to make the final model more compact and effective, (2) a statistic-based binary logistic regression (BLR) model, and (3) a mathematical-based artificial neural network (ANN) model. Only 80% (8267 locations) of the data was used for the construction of each of the models and the remaining samples (2067 locations) were used for accuracy assessment purposes. Results
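
    A minimal sketch of the 80/20 split and binary logistic regression step described above (Python, scikit-learn assumed; the data here are synthetic stand-ins for the GIS-derived predictors such as slope, vegetation index and distance to stream):

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import accuracy_score

        rng = np.random.default_rng(0)
        n = 10334                                   # number of mapped locations, as in the abstract
        X = rng.normal(size=(n, 3))                 # synthetic stand-ins for slope, NDVI, distance to stream
        y = (X[:, 0] - X[:, 1] + rng.normal(size=n) > 0).astype(int)  # synthetic presence/absence

        # 80% (about 8267 locations) for model construction, 20% (about 2067) held out for accuracy assessment
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        blr = LogisticRegression().fit(X_tr, y_tr)
        print("held-out accuracy:", accuracy_score(y_te, blr.predict(X_te)))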

  16. Actigraphy-based scratch detection using logistic regression.

    Science.gov (United States)

    Petersen, Johanna; Austin, Daniel; Sack, Robert; Hayes, Tamara L

    2013-03-01

    Incessant scratching as a result of diseases such as atopic dermatitis causes skin break down, poor sleep quality, and reduced quality of life for affected individuals. In order to develop more effective therapies, there is a need for objective measures to detect scratching. Wrist actigraphy, which detects wrist movements over time using micro-accelerometers, has shown great promise in detecting scratch because it is lightweight, usable in the home environment, can record longitudinally, and does not require any wires. However, current actigraphy-based scratch-detection methods are limited in their ability to discriminate scratch from other nighttime activities. Our previous work demonstrated the separability of scratch from both walking and restless sleep using a clustering technique which employed four features derived from the actigraphic data: number of accelerations above 0.01 gs, epoch variance, peak frequency, and autocorrelation value at one lag. In this paper, we extended these results by employing these same features as independent variables in a logistic regression model. This allows us to directly estimate the conditional probability of scratching for each epoch. Our approach outperforms competing actigraphy-based approaches and has both high sensitivity (0.96) and specificity (0.92) for identifying scratch as validated on experimental data collected from 12 healthy subjects. The model must still be fully validated on clinical data, but shows promise for applications to clinical trials and longitudinal studies of scratch.
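
    The four features named above can be computed per epoch and fed to a logistic model roughly as follows (a Python sketch under assumptions: synthetic acceleration traces, peak frequency left as an FFT bin index rather than converted to Hz; this is not the authors' implementation):

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def epoch_features(acc):
            """acc: 1-D array of wrist acceleration magnitudes (g) for one epoch."""
            n_above = np.sum(np.abs(acc) > 0.01)                 # accelerations above 0.01 g
            variance = np.var(acc)                               # epoch variance
            spectrum = np.abs(np.fft.rfft(acc - acc.mean()))
            peak_freq = int(np.argmax(spectrum[1:]) + 1)         # peak frequency (bin index)
            ac1 = np.corrcoef(acc[:-1], acc[1:])[0, 1]           # autocorrelation at one lag
            return [n_above, variance, peak_freq, ac1]

        rng = np.random.default_rng(1)
        # synthetic "quiet" vs "scratch-like" epochs, just to make the sketch runnable
        X = np.array([epoch_features(rng.normal(scale=s, size=256)) for s in (0.005, 0.05) * 10])
        y = np.array([0, 1] * 10)
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        print(clf.predict_proba(X)[:4, 1])   # estimated conditional probability of scratch per epoch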

  17. Parameter Estimation for Improving Association Indicators in Binary Logistic Regression

    Directory of Open Access Journals (Sweden)

    Mahdi Bashiri

    2012-02-01

    Full Text Available The aim of this paper is the estimation of binary logistic regression parameters by maximizing the log-likelihood function, with improved association indicators. The paper explains the parameter estimation steps, introduces the measures of association and analyzes their calculation. Moreover, new related indicators based on membership degree level are proposed. Association measures reflect the number of success responses occurring against failures in a certain number of independent Bernoulli experiments. During parameter estimation, the values of the existing indicators are not sensitive to the parameter values, whereas the proposed indicators are sensitive to the estimated parameters during the iterative procedure. Therefore, proposing a new association indicator for binary logistic regression that is more sensitive to the estimated parameters while maximizing the log-likelihood in the iterative procedure is the innovation of this study.
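
    For reference, the log-likelihood being maximized in the iterative procedure is the standard binary logistic form (notation mine, not taken from the paper):

        \ell(\beta) = \sum_{i=1}^{n} \Big[ y_i \,\mathbf{x}_i^{\top}\beta - \log\big(1 + e^{\mathbf{x}_i^{\top}\beta}\big) \Big],
        \qquad
        \hat{p}_i = \frac{1}{1 + e^{-\mathbf{x}_i^{\top}\hat{\beta}}}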

  18. Basic Diagnosis and Prediction of Persistent Contrail Occurrence using High-resolution Numerical Weather Analyses/Forecasts and Logistic Regression. Part II: Evaluation of Sample Models

    Science.gov (United States)

    Duda, David P.; Minnis, Patrick

    2009-01-01

    Previous studies have shown that probabilistic forecasting may be a useful method for predicting persistent contrail formation. A probabilistic forecast to accurately predict contrail formation over the contiguous United States (CONUS) is created by using meteorological data based on hourly meteorological analyses from the Advanced Regional Prediction System (ARPS) and from the Rapid Update Cycle (RUC) as well as GOES water vapor channel measurements, combined with surface and satellite observations of contrails. Two groups of logistic models were created. The first group of models (SURFACE models) is based on surface-based contrail observations supplemented with satellite observations of contrail occurrence. The second group of models (OUTBREAK models) is derived from a selected subgroup of satellite-based observations of widespread persistent contrails. The mean accuracies for both the SURFACE and OUTBREAK models typically exceeded 75 percent when based on the RUC or ARPS analysis data, but decreased when the logistic models were derived from ARPS forecast data.

  19. Predictions of flood warning threshold exceedance computed with logistic regression

    Science.gov (United States)

    Diomede, Tommaso; Marsigli, Chiara; Stefania Tesini, Maria

    2017-04-01

    A method based on logistic regression is proposed for the prediction of river level threshold exceedance at different lead times (from +6 h up to +42 h). The aim of the study is to provide a valuable tool for the issue of warnings by the authority responsible for public safety in case of flood. The role of different precipitation periods as predictors for the exceedance of a fixed river level has been investigated, in order to derive significant information for flood forecasting. Based on catchment-averaged values, a separation of "antecedent" and "peak-triggering" rainfall amounts as independent variables is attempted. In particular, the following flood-related precipitation periods have been considered: (i) the period from 1 to n days before the forecast issue time, which may be relevant for soil saturation ("state of the catchment"), (ii) the last 24 hours, which may be relevant for the current water level in the river ("state of the river"), and (iii) the period from 0 to x hours ahead of the forecast issue time, when the flood-triggering precipitation generally occurs ("state of the atmosphere"). Several combinations and values of these predictors have been tested to optimise the method implementation. In particular, the period for the antecedent precipitation ranges between 5 and 45 days; the current "state of the river" can be represented by the last 24-h precipitation or, as an alternative, by the current river level. The flood-triggering precipitation has been accumulated over the next 18-42 hours, or the previous 6-12 h, according to the forecast lead time. The proposed approach requires a specific implementation of logistic regression for each river section and warning threshold. The method performance has been evaluated over several catchments in the Emilia-Romagna Region, northern Italy, whose areas range from 100 to 1000 km2. A statistical analysis in terms of false alarms, misses and related scores was carried out by using

  20. Saturated logistic avalanche model

    Science.gov (United States)

    Aielli, G.; Camarri, P.; Cardarelli, R.; Di Ciaccio, A.; Liberti, B.; Paoloni, A.; Santonico, R.

    2003-08-01

    The search for an adequate avalanche RPC working model showed that simple exponential growth can describe the electron multiplication in the gas with acceptable accuracy only as long as the external electric field is not perturbed by the growing avalanche. We present here a model in which the saturated growth induced by space charge effects is explained in a natural way by a constant-coefficient non-linear differential equation, the logistic equation, which was originally introduced to describe the evolution of a biological population in a limited-resources environment. The RPCs, due to their uniform and intense field, proved to be an ideal device to test the presented model experimentally.
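
    The logistic equation referred to above, written in its standard form (notation mine; n_sat denotes the saturation value imposed by the space charge), and its solution:

        \frac{dn}{dt} = \alpha\, n \left(1 - \frac{n}{n_{\mathrm{sat}}}\right),
        \qquad
        n(t) = \frac{n_{\mathrm{sat}}}{1 + \left(\frac{n_{\mathrm{sat}}}{n_0} - 1\right) e^{-\alpha t}}

    which reduces to simple exponential growth n(t) ≈ n_0 e^{α t} while n ≪ n_sat.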

  1. Using Historical Data and Quasi-Likelihood Logistic Regression Modeling to Test Spatial Patterns of Channel Response to Peak Flows in a Mountain Watershed

    Science.gov (United States)

    Faustini, J. M.; Jones, J. A.

    2001-12-01

    This study used an empirical modeling approach to explore landscape controls on spatial variations in reach-scale channel response to peak flows in a mountain watershed. We used historical cross-section surveys spanning 20 years at five sites on 2nd to 5th-order channels and stream gaging records spanning up to 50 years. We related the observed proportion of cross-sections at a site exhibiting detectable change between consecutive surveys to the recurrence interval of the largest peak flow during the corresponding period using a quasi-likelihood logistic regression model. Stream channel response was linearly related to flood size or return period through the logit function, but the shape of the response function varied according to basin size, bed material, and the presence or absence of large wood. At the watershed scale, we hypothesized that the spatial scale and frequency of channel adjustment should increase in the downstream direction as sediment supply increases relative to transport capacity, resulting in more transportable sediment in the channel and hence increased bed mobility. Consistent with this hypothesis, cross sections from the 4th and 5th-order main stem channels exhibit more frequent detectable changes than those at two steep third-order tributary sites. Peak flows able to mobilize bed material sufficiently to cause detectable changes in 50% of cross-section profiles had an estimated recurrence interval of 3 years for the 4th and 5th-order channels and 4 to 6 years for the 3rd-order sites. This difference increased for larger magnitude channel changes; peak flows with recurrence intervals of about 7 years produced changes in 90% of cross sections at the main stem sites, but flows able to produce the same level of response at tributary sites were three times less frequent. At finer scales, this trend of increasing bed mobility in the downstream direction is modified by variations in the degree of channel confinement by bedrock and landforms, the

  2. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches.

    Science.gov (United States)

    Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W

    2015-08-01

    Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
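
    The evaluation metrics listed above can be computed from any model's predicted probabilities along these lines (Python sketch with scikit-learn; the 0.5 classification threshold is an assumption, not taken from the paper):

        import numpy as np
        from sklearn.metrics import roc_auc_score, confusion_matrix

        def evaluate(y_true, p_hat, threshold=0.5):
            y_pred = (p_hat >= threshold).astype(int)
            tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
            sens = tp / (tp + fn)                      # sensitivity
            spec = tn / (tn + fp)                      # specificity
            return {"auc": roc_auc_score(y_true, p_hat),
                    "sensitivity": sens,
                    "specificity": spec,
                    "ppv": tp / (tp + fp),             # positive predictive value
                    "youden_j": sens + spec - 1}       # Youden's index

        print(evaluate(np.array([0, 1, 1, 0, 1, 0]), np.array([0.2, 0.8, 0.6, 0.4, 0.9, 0.1])))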

  3. Comparison and validation of Logistic Regression and Analytic Hierarchy Process models of landslide susceptibility in monoclinic regions. A case study in Moldavian Plateau, N-E Romania

    Science.gov (United States)

    Ciprian Margarint, Mihai; Niculita, Mihai

    2014-05-01

    Regions with a monoclinic geological structure are large portions of the earth's surface where the repetition of similar landform patterns is very distinct, the scarps of cuestas being characterized by similar values of morphometric variables. Landslides are associated with these cuesta scarps and, consequently, very high landslide susceptibility can be reported on their surfaces. In these regions, landslide susceptibility mapping can be carried out for the entire region, or for test areas with accurate, reliable, and available datasets concerning multi-temporal inventories and landslide predictors. Because of the similar geomorphology and landslide distribution, we think that if test areas have any relevance for extrapolating susceptibility models, these regions should be targeted first. This case study tries to establish how usable the landslide predictor influences obtained for a 90 km2 sample, located in the northern part of the Moldavian Plateau (N-E Romania), are in other areas of the same physio-geographic region. In a first phase, landslide susceptibility assessment was carried out and validated using a logistic regression (LR) approach and a multiple landslide inventory. This inventory was created using orthorectified aerial images from 1978 and 2005, both old and active landslides being considered for each period. The modeling strategy was based on a distinct inventory of the depletion areas of all landslides for the 1978 phase, and on 30 covariates extracted from topographical maps and aerial images (from both the 1978 and 2005 periods). The geomorphometric variables were computed from a Digital Elevation Model (DEM) obtained by interpolation from 1:5000 contour data (2.5 m equidistance), at 10x10 m resolution. Distance from the river network, distance from roads and land use were extracted from topographic maps and aerial images. By applying the Akaike Information Criterion (AIC), the covariates with significance under the 0.001 level

  4. Completing the Remedial Sequence and College-Level Credit-Bearing Math: Comparing Binary, Cumulative, and Continuation Ratio Logistic Regression Models

    Science.gov (United States)

    Davidson, J. Cody

    2016-01-01

    Mathematics is the most common subject area of remedial need and the majority of remedial math students never pass a college-level credit-bearing math class. The majorities of studies that investigate this phenomenon are conducted at community colleges and use some type of regression model; however, none have used a continuation ratio model. The…

  6. The severity of Minamata disease declined in 25 years: temporal profile of the neurological findings analyzed by multiple logistic regression model.

    Science.gov (United States)

    Uchino, Makoto; Hirano, Teruyuki; Satoh, Hiroshi; Arimura, Kimiyoshi; Nakagawa, Masanori; Wakamiya, Jyunji

    2005-01-01

    Minamata disease (MD) was caused by ingestion of seafood from methylmercury-contaminated areas. Although 50 years have passed since the discovery of MD, there have been only a few studies on the temporal profile of neurological findings in certified MD patients. We therefore evaluated changes in the neurological symptoms and signs of MD using discriminants obtained by multiple logistic regression analysis. The predictive index of severity declined over 25 years in most of the patients. Only a few patients showed aggravation of neurological findings, which was due to complications such as spino-cerebellar degeneration. Patients with chronic MD aged over 45 years had several concomitant diseases, so their clinical pictures were complicated. It was difficult to differentiate chronic MD using statistically established discriminants based on sensory disturbance alone. In conclusion, the severity of MD declined over 25 years, modified by age-related concomitant disorders.

  7. Semi-Supervised Additive Logistic Regression: A Gradient Descent Solution

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    This paper describes a semi-supervised regularized method for additive logistic regression. The graph regularization term of the combined functions is added to the original cost functional used in AdaBoost. This term constrains the learned function to be smooth on a graph. Then the gradient solution is computed with the advantage that the regularization parameter can be adaptively selected. Finally, the function step-size of each iteration can be computed using Newton-Raphson iteration. Experiments on benchmark data sets show that the algorithm gives better results than existing methods.

  8. Binary logistic regression-Instrument for assessing museum indoor air impact on exhibits.

    Science.gov (United States)

    Bucur, Elena; Danet, Andrei Florin; Lehr, Carol Blaziu; Lehr, Elena; Nita-Lazar, Mihai

    2017-04-01

    This paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The impact on the exhibits under certain pollution scenarios (environmental impact) was predicted by a mathematical model based on binary logistic regression; the model identifies, among a multitude of possible parameters, those environmental parameters with a significant impact on the exhibits and ranks them according to the severity of their effect. Air quality (NO2, SO2, O3 and PM2.5) and microclimate (temperature, humidity) monitoring data from a case study conducted within exhibition and storage spaces of the Romanian National Aviation Museum Bucharest were used for developing and validating the binary logistic regression method and the mathematical model. The logistic regression analysis was applied to 794 data combinations (715 to develop the model and 79 to validate it) using the Statistical Package for Social Sciences (SPSS 20.0). The results of the binary logistic regression analysis demonstrated that, of the six parameters taken into consideration, four have a significant effect upon exhibits, in the following order: O3>PM2.5>NO2>humidity, followed at a significant distance by the effects of SO2 and temperature. The mathematical model developed in this study correctly predicted 95.1% of the cumulated effect of the environmental parameters upon the exhibits. Moreover, this model could also be used in the decision process regarding the preventive preservation measures that should be implemented within the exhibition space.

  9. Use of Logistic Regression for Forecasting Short-Term Volcanic Activity

    Directory of Open Access Journals (Sweden)

    Mark T. Woods

    2012-08-01

    Full Text Available An algorithm that forecasts volcanic activity using an event tree decision making framework and logistic regression has been developed, characterized, and validated. The suite of empirical models that drive the system were derived from a sparse and geographically diverse dataset comprised of source modeling results, volcano monitoring data, and historic information from analog volcanoes. Bootstrapping techniques were applied to the training dataset to allow for the estimation of robust logistic model coefficients. Probabilities generated from the logistic models increase with positive modeling results, escalating seismicity, and rising eruption frequency. Cross validation yielded a series of receiver operating characteristic curves with areas ranging between 0.78 and 0.81, indicating that the algorithm has good forecasting capabilities. Our results suggest that the logistic models are highly transportable and can compete with, and in some cases outperform, non-transportable empirical models trained with site specific information.

  10. Supply and demand analysis for flood insurance by using logistic regression model: case study at Citarum watershed in South Bandung, West Java, Indonesia

    Science.gov (United States)

    Sidi, P.; Mamat, M.; Sukono; Supian, S.

    2017-01-01

    Floods have always occurred in the Citarum river basin. The adverse effects of floods can extend to all of the residents' property, including the destruction of houses, and the losses due to damage to residential buildings are usually not small. After each flood, the government and several social organizations provide funds to repair the buildings, but these donations are very limited and cannot cover the entire cost of the necessary repairs. The availability of insurance products for property damage caused by floods is therefore considered very important. However, is such insurance also considered necessary by the public? In this paper, the factors that affect the supply of and demand for insurance covering buildings damaged by floods are analyzed. The method used in this analysis is ordinal logistic regression. The analysis shows that the factors affecting the supply of and demand for this insurance include age, economic circumstances, family situation, insurance motivation, and lifestyle; considered simultaneously, these factors account for 65.7%.

  11. Detecting DIF in Polytomous Items Using MACS, IRT and Ordinal Logistic Regression

    Science.gov (United States)

    Elosua, Paula; Wells, Craig

    2013-01-01

    The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…

  12. Comparison of artificial neural networks with logistic regression for detection of obesity.

    Science.gov (United States)

    Heydari, Seyed Taghi; Ayatollahi, Seyed Mohammad Taghi; Zare, Najaf

    2012-08-01

    Obesity is a common problem in nutrition, in both developed and developing countries. The aim of this study was to classify obesity by artificial neural networks and logistic regression. This cross-sectional study comprised 414 healthy military personnel in southern Iran. All subjects completed questionnaires on their socio-economic status, and their anthropometric measures were taken by a trained nurse. Classification of obesity was done by artificial neural networks and logistic regression. The mean age±SD of participants was 34.4 ± 7.5 years. A total of 187 (45.2%) were obese. For logistic regression and neural networks, respectively, the percentages correctly classified were 80.2% and 81.2%, the sensitivities 80.2 and 79.7, and the specificities 81.9 and 83.7; the areas under the Receiver Operating Characteristic (ROC) curve were 0.888 and 0.884 and the Kappa statistics were 0.600 and 0.629. We conclude that neural networks and logistic regression were both good classifiers for obesity detection, but they were not significantly different in classification performance.

  13. Fuzzy multinomial logistic regression analysis: A multi-objective programming approach

    Science.gov (United States)

    Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan

    2017-05-01

    Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large, well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, especially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate the parameters of multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus the Maximum Likelihood (ML) approach. Results show that the new proposed model outperforms ML in the case of small datasets.

  14. Research on the tax inspection case-selection model based on logistic regression

    Institute of Scientific and Technical Information of China (English)

    王艳杰; 李清; 齐鑫鑫

    2012-01-01

    Traditional manual selection of tax inspection cases involves substantial human bias and lacks scientific rigor and accuracy. Building a case-selection model and using a computer to select cases automatically can avoid these drawbacks and improve work efficiency. Nine financial indicators were initially selected; after screening, eight indicators were finally used to build a logistic discriminant model for tax inspection case selection, and re-substituting the sample into the model gives an overall accuracy of 79.6%.

  15. Regression modeling methods, theory, and computation with SAS

    CERN Document Server

    Panik, Michael

    2009-01-01

    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,

  16. Propensity Score Estimation with Data Mining Techniques: Alternatives to Logistic Regression

    Science.gov (United States)

    Keller, Bryan S. B.; Kim, Jee-Seon; Steiner, Peter M.

    2013-01-01

    Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has…
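
    The core of propensity score estimation with logistic regression, as described above, can be sketched as follows (Python with scikit-learn; column names and values are illustrative, not from any particular study):

        import pandas as pd
        from sklearn.linear_model import LogisticRegression

        df = pd.DataFrame({
            "treated":  [1, 0, 1, 0, 1, 0, 0, 1],
            "age":      [34, 52, 41, 29, 47, 38, 56, 30],
            "baseline": [2.1, 3.4, 2.8, 1.9, 3.0, 2.5, 3.6, 2.0],
        })
        covariates = ["age", "baseline"]
        # the propensity score is the modeled probability of receiving treatment given the observed covariates
        ps_model = LogisticRegression().fit(df[covariates], df["treated"])
        df["propensity"] = ps_model.predict_proba(df[covariates])[:, 1]
        print(df)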

  17. Macrobenthic species response surfaces along estuarine gradients: prediction by logistic regression

    NARCIS (Netherlands)

    Ysebaert, T.; Meire, P.; Herman, P.M.J.; Verbeek, H.

    2002-01-01

    This study aims at contributing to the development of statistical models to predict macrobenthic species response to environmental conditions in estuarine ecosystems. Ecological response surfaces are derived for 10 estuarine macrobenthic species. Logistic regression is applied on a large data set, p

  18. Odds Ratio, Delta, ETS Classification, and Standardization Measures of DIF Magnitude for Binary Logistic Regression

    Science.gov (United States)

    Monahan, Patrick O.; McHorney, Colleen A.; Stump, Timothy E.; Perkins, Anthony J.

    2007-01-01

    Previous methodological and applied studies that used binary logistic regression (LR) for detection of differential item functioning (DIF) in dichotomously scored items either did not report an effect size or did not employ several useful measures of DIF magnitude derived from the LR model. Equations are provided for these effect size indices.…

  19. Multiple Logistic Regression Analysis of Cigarette Use among High School Students

    Science.gov (United States)

    Adwere-Boamah, Joseph

    2011-01-01

    A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…

  20. FINANCIAL EARLY-WARNING MODEL OF LISTED COMPANIES USING T-LOGISTIC REGRESSION

    Institute of Scientific and Technical Information of China (English)

    徐征; 刘遵雄

    2015-01-01

    Classic logistic regression carries a risk of over-fitting, which can be addressed with the regularization techniques of statistical learning theory, and it is easily disturbed by label noise in the samples. Optimizing a convex loss function ensures that the regularized risk minimization problem converges to the global optimum, but learning algorithms based on convex losses are susceptible to noise. T-logistic regression was therefore proposed as a remedy, introducing the t-distribution into logistic regression; its non-convex loss function makes up for this deficiency of convex loss functions. Because the non-convex loss is difficult to optimize directly, the objective function is log-transformed and convex multiplicative programming is used to solve for the parameters. The T-logistic regression model and its solution algorithm are analyzed, a T-logistic regression financial early-warning model is established, and an empirical study is carried out on the financial data of companies listed on the Shanghai and Shenzhen exchanges. The results show that the T-logistic regression model has good classification performance and robustness and is tolerant to label noise.

  1. The cross-validated AUC for MCP-logistic regression with high-dimensional data.

    Science.gov (United States)

    Jiang, Dingfeng; Huang, Jian; Zhang, Ying

    2013-10-01

    We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with the existing methods including the Akaike information criterion (AIC), Bayesian information criterion (BIC) or Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
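
    A rough illustration of choosing the tuning parameter by cross-validated AUC (Python with scikit-learn; since MCP is not available there, an L1 penalty stands in, so this only mirrors the selection criterion, not the MCP solver described in the paper):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import GridSearchCV

        # high-dimensional binary outcome data (synthetic)
        X, y = make_classification(n_samples=200, n_features=500, n_informative=10, random_state=0)

        grid = GridSearchCV(
            LogisticRegression(penalty="l1", solver="liblinear"),
            param_grid={"C": np.logspace(-2, 1, 8)},
            scoring="roc_auc",          # tuning parameter selected by cross-validated AUC
            cv=5,
        ).fit(X, y)
        print(grid.best_params_, round(grid.best_score_, 3))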

  2. No rationale for 1 variable per 10 events criterion for binary logistic regression analysis

    Directory of Open Access Journals (Sweden)

    Maarten van Smeden

    2016-11-01

    Full Text Available Abstract Background Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for the substantial differences between these extensive simulation studies. Methods The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and by a modified estimation procedure, known as Firth’s correction, are compared. Results The results show that, besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. Conclusions The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
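
    For concreteness, the quantity at issue can be computed as below (a trivial sketch; counting events from the smaller outcome class is the usual convention with this rule, stated here as an assumption rather than taken from the paper):

        def events_per_variable(n_events, n_candidate_predictors):
            """EPV = number of events (smaller outcome class) / number of candidate predictors."""
            return n_events / n_candidate_predictors

        print(events_per_variable(n_events=50, n_candidate_predictors=8))   # 6.25, below the often-cited 10 EPV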

  3. [Evaluation of ultrasonographic wall configuration in the diagnosis of small thyroid nodules using binary logistic regression].

    Science.gov (United States)

    Fu, Qiaomei; Wu, Pengxi; Ding, Yan

    2015-10-01

    To screen out the sonographic features useful for the differential diagnosis of benign and malignant small thyroid nodules (≤ 1.0 cm) by logistic regression analysis, to establish a binary logistic regression model with the sonographic features as independent variables, and to investigate the value of the ultrasonographic wall (margin) configuration of nodules in the differential diagnosis of benign and malignant small thyroid nodules. A total of 208 thyroid nodules ≤ 1.0 cm in diameter in 190 patients were evaluated. By postoperative pathological examination or fine-needle aspiration biopsy, 106 nodules were confirmed as benign and 102 as malignant. With the pathological diagnosis as the gold standard, the ultrasonic features of the thyroid nodules were evaluated for the differential diagnosis of benign and malignant small thyroid nodules, a logistic model was obtained, and the odds ratios of the variables were compared. The margin of a thyroid nodule was classified as regular or irregular, and irregular margins were further divided into four subtypes: strip, triangular, antler and papillary. The border was classified as clear, fuzzy or both. The periphery was classified as having normal or abnormal echo. Calcification was classified as no calcification, microcalcification or non-microcalcification. Four statistically significant features were finally obtained by logistic regression analysis: margin, border, periphery and calcification. A formula was constructed by binary logistic regression analysis: probability of malignancy = 1/(1 + e^(-z)), in which z = 5.026 × margin + 4.218 × border + 4.024 × periphery + 3.892 × calcification - 15.247. The odds ratio of margin was higher than those of the other independent variables. Logistic regression analysis indicates that the calcification, border, periphery, and especially the margin of thyroid nodules are significant features for differentiating benign and malignant thyroid nodules. The margin score was more intuitive for the differentiation of
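
    The published discriminant above, written out as code (Python; the inputs are the scored sonographic features coded as in the study):

        import math

        def malignancy_probability(margin, border, periphery, calcification):
            z = (5.026 * margin + 4.218 * border + 4.024 * periphery
                 + 3.892 * calcification - 15.247)
            return 1.0 / (1.0 + math.exp(-z))     # probability of malignancy = 1 / (1 + e^(-z))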

  4. Using Logistic Regression to Identify Risk Factors Causing Rollover Collisions

    Directory of Open Access Journals (Sweden)

    Essam Dabbour

    2012-12-01

    Full Text Available Rollover collisions are among the most serious collisions that usually result in severe injuries or fatalities. In 2009, there were 8,732 fatal rollover collisions in the United States of America that resulted in the death of 9,833 persons. Those numbers represent approximately 28% and 29% of the total numbers of fatal collisions and fatalities, respectively. The main objective of this paper is to examine the impact of different risk factors that may contribute to this type of serious collisions to help develop countermeasures that limit them. To avoid the bias that may be caused by interactions among different drivers, this analysis focuses on rollover related to single-vehicle collisions so that the behavior of the driver of the collided vehicle can be analyzed more effectively. Logistic regression technique is utilized to analyze single-vehicle rollover collisions that occurred on state and interstate highways in the states of Ohio and Washington in 2009. The results obtained from this analysis have the potential to help decision makers identify different strategies to limit the severity of this type of collisions.

  5. Electronic Commerce Data Mining using Rough Set and Logistic Regression

    Directory of Open Access Journals (Sweden)

    Xiuli Li

    2014-05-01

    Full Text Available Electronic commerce (E-commerce) has gradually become the mainstream of business. Unpredictable but frequent problems, such as delays in shipment and shipping errors caused by the low efficiency of E-commerce participants, will eventually have a negative impact on the participants' business. Correct evaluation of the efficiency of E-commerce is therefore an important way to improve operations. This paper introduces the knowledge discovery theory of data mining based on Rough Set Theory (RST) to deal with vague and inaccurate information about supplier evaluation and to mine the rule knowledge that links the input variables to adverse positions. The output of RST is then used as the feature set and delivered to Logistic Regression (LR) to rank the products of an electronic commerce website. The proposed approach, termed RST-LR, comprises discretization of attribute values, filtering of minimum attribute sets, rule evaluation, calculation of ranking accuracy, and the establishment of the evaluation system. We evaluated the proposed approach on a real-world dataset; the experimental results show that it achieves high accuracy and that the rules meet the requirements of the application.

  6. Modeling the potential distribution of shallow-seated landslides using the weights of evidence method and a logistic regression model:a case study of the Sabae Area, Japan

    Institute of Scientific and Technical Information of China (English)

    Ru-Hua SONG; Daimaru HIROMU; Abe KAZUTOKI; Kurokawa USIO; Matsuura SUMIO

    2008-01-01

    A number of statistical methods are typically used to effectively predict potential landslide distributions. In this study, two multivariate statistical analysis methods were used (weights of evidence and logistic regression) to predict the potential distribution of shallow-seated landslides in the Kamikawachi area of Sabae City, Fukui Prefecture, Japan. First, the dependent variable (shallow-seated landslides) was divided into presence and absence, and the independent variables (environmental factors such as slope and altitude) were categorized according to their characteristics. Then, using the weights of evidence (WE) method, the weights of pairs comprising presence (w+(i)) or absence (w-(i)), and the contrast values for each category of independent variable (evidence), were calculated. Using the method that integrated the weights of evidence method and a logistic regression model, score values were calculated for each category of independent variable. Based on these contrast values, three models were selected to sum the score values of every grid cell in the study area. According to a receiver operating characteristic (ROC) curve analysis, model 2 yielded the best fit for predicting the potential distribution of shallow-seated landslide hazards, with 89% correctness and a 54.5% hit ratio when the occurrence probability (OP) of landslides was 70%. The model was tested using data from an area close to the study region, and showed 94% correctness and a hit ratio of 45.7% when the OP of landslides was 70%. Finally, the potential distribution of shallow-seated landslides, based on the OP, was mapped using a geographical information system.
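
    For reference, the weights of evidence referred to above take the standard form below (notation mine; B denotes presence of an evidence class and L presence of a landslide), with the contrast as their difference:

        W^{+} = \ln \frac{P(B \mid L)}{P(B \mid \bar{L})},
        \qquad
        W^{-} = \ln \frac{P(\bar{B} \mid L)}{P(\bar{B} \mid \bar{L})},
        \qquad
        C = W^{+} - W^{-}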

  7. Construction of a Financial Crisis Prediction Model for Listed Companies Based on Logistic Regression

    Institute of Scientific and Technical Information of China (English)

    吴英

    2011-01-01

    Using a set of financial indicators and the financial data of Chinese listed companies, the paper constructs a financial crisis early-warning model for listed companies based on logistic regression analysis. Testing shows that the model has practical application value.

  8. Application of ordinal logistic regression analysis in determining risk factors of child malnutrition in Bangladesh

    Directory of Open Access Journals (Sweden)

    Das Sumonkanti

    2011-11-01

    Full Text Available Abstract Background The study attempts to develop an ordinal logistic regression (OLR) model to identify the determinants of child malnutrition, instead of the traditional binary logistic regression (BLR) model, using the data of the Bangladesh Demographic and Health Survey 2004. Methods Based on the weight-for-age anthropometric index (Z-score), child nutrition status is categorized into three groups: severely undernourished (< -3.0), moderately undernourished (-3.0 to -2.01) and nourished (≥ -2.0). Results All the models determine that age of child, birth interval, mothers' education, maternal nutrition, household wealth status, child feeding index, and incidence of fever, ARI and diarrhoea were significant predictors of child malnutrition; however, the results of the PPOM were more precise than those of the other models. Conclusion These findings clearly justify that OLR models (POM and PPOM) are appropriate for finding predictors of malnutrition instead of BLR models.

  9. TWO REGRESSION CREDIBILITY MODELS

    Directory of Open Access Journals (Sweden)

    Constanţa-Nicoleta BODEA

    2010-03-01

    Full Text Available In this communication we will discuss two regression credibility models from Non-Life Insurance Mathematics that can be solved by means of matrix theory. In the first regression credibility model, starting from a well-known representation formula of the inverse for a special class of matrices, a risk premium will be calculated for a contract with risk parameter θ. In the next regression credibility model, we will obtain a credibility solution in the form of a linear combination of the individual estimate (based on the data of a particular state) and the collective estimate (based on aggregate USA data). To illustrate the solution with the properties mentioned above, we shall need the well-known representation theorem for a special class of matrices, the properties of the trace of a square matrix, the scalar product of two vectors, the norm with respect to a positive definite matrix given in advance, and the more involved mathematical properties of conditional expectations and conditional covariances.

  10. A Novel Imbalanced Data Classification Approach Based on Logistic Regression and Fisher Discriminant

    Directory of Open Access Journals (Sweden)

    Baofeng Shi

    2015-01-01

    Full Text Available We introduce an imbalanced data classification approach based on logistic regression significant discriminant and Fisher discriminant. First, a key-indicator extraction model based on logistic regression significant discriminant and correlation analysis is derived to extract features for customer classification. Second, a customer scoring model is established on the basis of a linear weighting using the Fisher discriminant. Then, a customer rating model in which the number of customers across all rating grades follows a normal distribution is constructed. The performance of the proposed model and of the classical SVM classification method is evaluated in terms of their ability to correctly classify consumers as default or non-default customers. Empirical results using data on 2157 customers in financial engineering suggest that the proposed approach performs better than the SVM model in dealing with imbalanced data classification. Moreover, our approach helps locate qualified customers for banks and bond investors.

  11. Urban Expansion Prediction for Zhangzhou City Based on GIS and Spatiotemporal Logistic Regression Model

    Institute of Scientific and Technical Information of China (English)

    杨云龙; 周小成; 吴波

    2011-01-01

    This paper proposes a new method of predicting urban expansion with a spatiotemporal logistic regression model. A spatial logistic regression model is first built by adding a spatial autocorrelation structure to the traditional logistic regression model. Then, using roughly 20 years (1989-2009) of data for the urban area of Zhangzhou, several sub spatial logistic regression models Mi are built to simulate urban expansion in different periods, and single exponential smoothing is applied to combine these time-series models Mi into a spatiotemporal logistic regression prediction model that accounts for both spatial and temporal complexity. On the one hand, the new method overcomes the traditional logistic regression approach's dependence on driving-factor data for the prediction year, which are often difficult to obtain; on the other hand, because the model considers the long-time-series complexity of urban expansion, i.e. it integrates the differing driving factors of different periods, it comes closer to the actual process of urban expansion and the prediction accuracy therefore improves. Taking the urban area of Zhangzhou, Fujian Province, as an example, urban expansion in 2009 was predicted with three methods: the traditional logistic regression model, the spatial logistic regression model with only the spatial autocorrelation structure added, and the new method based on the spatiotemporal logistic regression model. The results show that the new spatiotemporal method outperforms both the traditional and the spatial logistic regression approaches: the overall prediction accuracies were 81.02%, 83.82% and 87.00%, respectively; the accuracy of predicted urban land rose from 63.59% to 67.35% and 73.34%; and the area under the ROC curve (AUC) rose from 0.826 to 0.883 and 0.924.

  12. Sample size determination for logistic regression on a logit-normal distribution.

    Science.gov (United States)

    Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance

    2017-06-01

    Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination (R²) of a covariate of interest with the other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution, which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for R² for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.

  13. Nowcasting of Low-Visibility Procedure States with Ordered Logistic Regression at Vienna International Airport

    Science.gov (United States)

    Kneringer, Philipp; Dietz, Sebastian; Mayr, Georg J.; Zeileis, Achim

    2017-04-01

    Low-visibility conditions have a large impact on aviation safety and economic efficiency of airports and airlines. To support decision makers, we develop a statistical probabilistic nowcasting tool for the occurrence of capacity-reducing operations related to low visibility. The probabilities of four different low visibility classes are predicted with an ordered logistic regression model based on time series of meteorological point measurements. Potential predictor variables for the statistical models are visibility, humidity, temperature and wind measurements at several measurement sites. A stepwise variable selection method indicates that visibility and humidity measurements are the most important model inputs. The forecasts are tested with a 30 minute forecast interval up to two hours, which is a sufficient time span for tactical planning at Vienna Airport. The ordered logistic regression models outperform persistence and are competitive with human forecasters.
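
    A minimal sketch of an ordered (proportional-odds) logit for several visibility classes, assuming statsmodels' OrderedModel is available; the predictors and values are illustrative, not the airport's data:

        import pandas as pd
        from statsmodels.miscmodels.ordinal_model import OrderedModel

        df = pd.DataFrame({
            "visibility_m":      [5000, 1200, 600, 250, 3000, 400, 900, 150],
            "relative_humidity": [70,   88,   97,  99,  80,   96,  93,  98],
            "lvp_class":         [0,    1,    2,   3,   0,    3,   1,   2],  # 0 = no restriction ... 3 = most restrictive
        })
        exog = df[["visibility_m", "relative_humidity"]]
        model = OrderedModel(df["lvp_class"], exog, distr="logit")
        res = model.fit(method="bfgs", disp=False)
        print(res.predict(exog))    # probability of each low-visibility class per observation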

  14. Logistic Regression for Prediction and Diagnosis of Bacterial Regrowth in Water Distribution System

    Institute of Scientific and Technical Information of China (English)

    DONG Lihua; ZHAO Xinhua; WU Qing; YANG You'an

    2009-01-01

    This paper focuses on the quantitative expression of bacterial regrowth in a water distribution system. Considering the public health risks of bacterial regrowth, an experiment was performed on the distribution system of a selected area. Physical, chemical, and microbiological parameters such as turbidity, temperature, residual chlorine and pH were measured over a three-month period and a correlation analysis was carried out. Combined with principal components analysis (PCA), a logistic regression model is developed to predict and diagnose bacterial regrowth and to locate the zones with high microbiological risk in the distribution system. The model gives the probability of bacterial regrowth with the number of heterotrophic plate counts as the binary response variable and three new principal component variables as the explanatory variables. The overall accuracy of the logistic regression model was 90%, which meets the precision requirement of the model.

  15. The effect of high leverage points on the logistic ridge regression estimator having multicollinearity

    Science.gov (United States)

    Ariffin, Syaiba Balqish; Midi, Habshah

    2014-06-01

    This article is concerned with the performance of the logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity may exist among the predictors and in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity, which causes the regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach in handling multicollinearity. The effect of high leverage points on the performance of the logistic ridge regression estimator is then investigated through a real data set and a simulation study. The findings signify that the logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.

  16. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    Directory of Open Access Journals (Sweden)

    Suduan Chen

    2014-01-01

    Full Text Available As fraudulent financial statements of enterprises become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study fits logistic regression, support vector machine, and decision tree models and compares their classification performance. Financial and nonfinancial variables are both used to assist in the establishment of the forecasting model. The research objects are companies in which fraudulent or nonfraudulent financial statements occurred between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree achieves the best classification performance, 85.71%.

  17. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

    Science.gov (United States)

    Henrard, S; Speybroeck, N; Hermans, C

    2015-11-01

    Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis that were previously published in this field are explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.

  18. Using probabilities of enterococci exceedance and logistic regression to evaluate long term weekly beach monitoring data.

    Science.gov (United States)

    Aranda, Diana; Lopez, Jose V; Solo-Gabriele, Helena M; Fleisher, Jay M

    2016-02-01

    Recreational water quality surveillance involves comparing bacterial levels to set threshold values to determine beach closure. Bacterial levels can be predicted through models which are traditionally based upon multiple linear regression. The objective of this study was to evaluate exceedance probabilities, as opposed to bacterial levels, as an alternate method to express beach risk. Data were incorporated into a logistic regression for the purpose of identifying environmental parameters most closely correlated with exceedance probabilities. The analysis was based on 7,422 historical sample data points from the years 2000-2010 for 15 South Florida beach sample sites. Probability analyses showed which beaches in the dataset were most susceptible to exceedances. No yearly trends were observed nor were any relationships apparent with monthly rainfall or hurricanes. Results from logistic regression analyses found that among the environmental parameters evaluated, tide was most closely associated with exceedances, with exceedances 2.475 times more likely to occur at high tide compared to low tide. The logistic regression methodology proved useful for predicting future exceedances at a beach location in terms of probability and modeling water quality environmental parameters with dependence on a binary response. This methodology can be used by beach managers for allocating resources when sampling more than one beach.
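
    The reported tide effect corresponds to the usual odds-ratio reading of a logistic coefficient for a binary predictor (a standard relation, not an equation quoted from the paper):

        \log\frac{p}{1-p} = \beta_0 + \beta_1\,\mathrm{tide},
        \qquad
        \mathrm{OR} = e^{\beta_1} \approx 2.475
        \;\Longrightarrow\;
        \beta_1 \approx \ln(2.475) \approx 0.906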

  19. Assessing the effects of different types of covariates for binary logistic regression

    Science.gov (United States)

    Hamid, Hamzah Abdul; Wah, Yap Bee; Xie, Xian-Jin; Rahman, Hezlin Aryani Abd

    2015-02-01

    It is well known that the type of data distribution in the independent variable(s) may affect many statistical procedures. This paper investigates and illustrates the effect of different types of covariates on the parameter estimation of a binary logistic regression model. A simulation study with different sample sizes and different types of covariates (uniform, normal, skewed) was carried out. Results showed that the parameter estimates of the binary logistic regression model are severely overestimated when the sample size is less than 150 for covariates with normal and uniform distributions, while the parameter is underestimated when the distribution of the covariate is skewed. Parameter estimation improves for all types of covariates when the sample size is large, that is, at least 500.
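
    A minimal Monte Carlo sketch of this kind of simulation, under assumed true coefficients and arbitrary distribution choices (not the paper's actual design), can be written with numpy and statsmodels:

    ```python
    # Illustrative simulation: average slope estimate in binary logistic regression
    # under different covariate distributions and sample sizes.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(42)
    true_beta = (-0.5, 1.0)                       # intercept, slope used to generate data

    def draw_covariate(kind, n):
        if kind == "normal":
            return rng.normal(0, 1, n)
        if kind == "uniform":
            return rng.uniform(-1.7, 1.7, n)      # roughly unit variance
        return rng.exponential(1.0, n) - 1.0      # "skewed": centred exponential

    def mean_slope_estimate(kind, n, reps=200):
        estimates = []
        for _ in range(reps):
            x = draw_covariate(kind, n)
            p = 1 / (1 + np.exp(-(true_beta[0] + true_beta[1] * x)))
            y = rng.binomial(1, p)
            try:
                fit = sm.Logit(y, sm.add_constant(x)).fit(disp=False)
                estimates.append(fit.params[1])
            except Exception:                     # skip non-converged replications
                continue
        return np.mean(estimates)

    for n in (50, 150, 500):
        for kind in ("normal", "uniform", "skewed"):
            print(n, kind, round(mean_slope_estimate(kind, n), 3))  # compare with 1.0
    ```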

  20. Rock-profile correlations through logistic regression; Correlacao rocha-perfil atraves de regressao logistica

    Energy Technology Data Exchange (ETDEWEB)

    Castro, Wagner Barbosa de Mello

    1998-02-01

    Logistic regression models were generated starting from lithofacies described in cores and in well logs for two wells of Campos Basin. The main objective was to verify the applicability of the technique in reservoir geology. The models were used to estimate the occurrence of reservoir facies in the wells. The results obtained were compared to the results of a previous discriminant analysis in order to determine the accuracy of the two techniques as tools for estimating reservoir facies. Although discriminant analysis proved more accurate in estimating reservoir facies, the use of logistic regression should not be discarded. Its independence from the normal distribution hypothesis makes this technique, at least in theory, more robust than discriminant analysis. (author)

  1. Predicting Student Success on the Texas Chemistry STAAR Test: A Logistic Regression Analysis

    Science.gov (United States)

    Johnson, William L.; Johnson, Annabel M.; Johnson, Jared

    2012-01-01

    Background: The context is the new Texas STAAR end-of-course testing program. Purpose: The authors developed a logistic regression model to predict who would pass or fail the new Texas chemistry STAAR end-of-course exam. Setting: Robert E. Lee High School (5A) with an enrollment of 2700 students, Tyler, Texas. Date of the study was the 2011-2012…

  2. Using latent variables in logistic regression to reduce multicollinearity, A case-control example: breast cancer risk factors

    Directory of Open Access Journals (Sweden)

    Mohamad Amin Pourhoseingholi

    2008-03-01

    Full Text Available

    Background: Logistic regression is one of the most widely used models to analyze the relation between one or more explanatory variables and a categorical response in the fields of epidemiology, health and medicine. When there is strong correlation among explanatory variables, i.e. multicollinearity, the efficiency of the model is considerably reduced. The objective of this research was to employ latent variables to reduce the effect of multicollinearity in the analysis of a case-control study on breast cancer risk factors.

    Methods: The data came from a case-control study in which 300 women with breast cancer were compared to the same number of controls. To assess the effect of multicollinearity, five highly correlated quantitative variables were selected. Ordinary logistic regression with the collinear data was compared to two models containing latent variables generated using either factor analysis or principal components analysis. The estimated standard errors of the parameters were used to compare the efficiency of the models. We also conducted a simulation study in order to compare the efficiency of models with and without latent factors. All analyses were carried out using S-plus.

    Results: Logistic regression based on the five primary variables showed unusual odds ratios for age at first pregnancy (OR=67960, 95% CI: 10184-453503) and for total length of breast feeding (OR=0). On the other hand, the parameters estimated for logistic regression on latent variables generated by both factor analysis and principal components analysis were statistically significant (P<0.003). Their standard errors were smaller than those of ordinary logistic regression on the original variables. The simulation showed that, in the case of normal error and 58% reliability, logistic regression based on latent variables is more efficient than the model based on the collinear variables.

    Conclusions: This research

  3. Appropriate assessment of neighborhood effects on individual health: integrating random and fixed effects in multilevel logistic regression

    DEFF Research Database (Denmark)

    Larsen, Klaus; Merlo, Juan

    2005-01-01

    The logistic regression model is frequently used in epidemiologic studies, yielding odds ratio or relative risk interpretations. Inspired by the theory of linear normal models, the logistic regression model has been extended to allow for correlated responses by introducing random effects. However, the model does not inherit the interpretational features of the normal model. In this paper, the authors argue that the existing measures are unsatisfactory (and some of them are even improper) when quantifying results from multilevel logistic regression analyses. The authors suggest a measure of heterogeneity, the median odds ratio, that quantifies cluster heterogeneity and facilitates a direct comparison between covariate effects and the magnitude of heterogeneity in terms of well-known odds ratios. Quantifying cluster-level covariates in a meaningful way is a challenge in multilevel logistic...

  4. Artificial neural network, genetic algorithm, and logistic regression applications for predicting renal colic in emergency settings.

    Science.gov (United States)

    Eken, Cenker; Bilge, Ugur; Kartal, Mutlu; Eray, Oktay

    2009-06-03

    Logistic regression is the most common statistical model for processing multivariate data in the medical literature. Artificial intelligence models such as artificial neural networks (ANN) and genetic algorithms (GA) may also be useful for interpreting medical data. The purpose of this study was to apply artificial intelligence models to a medical data set and compare them with logistic regression. ANN, GA, and logistic regression analyses were carried out on the data set of a previously published article regarding patients presenting to an emergency department with flank pain suspicious for renal colic. The study population was composed of 227 patients: 176 patients had a diagnosis of urinary stone, while 51 ultimately had no calculus. The GA found two decision rules for predicting urinary stones. Rule 1 consisted of being male, pain not spreading to the back, and no fever. In rule 2, pelvicaliceal dilatation on bedside ultrasonography replaced no fever. ANN, GA rule 1, GA rule 2, and logistic regression had a sensitivity of 94.9, 67.6, 56.8, and 95.5%, a specificity of 78.4, 76.47, 86.3, and 47.1%, a positive likelihood ratio of 4.4, 2.9, 4.1, and 1.8, and a negative likelihood ratio of 0.06, 0.42, 0.5, and 0.09, respectively. The area under the curve was found to be 0.867, 0.720, 0.715, and 0.713 for the four applications, respectively. Data mining techniques such as ANN and GA can be used for predicting renal colic in emergency settings and for constructing clinical decision rules. They may be an alternative to the conventional multivariate analysis applications used in biostatistics.
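
    The diagnostic metrics quoted above can be reproduced for any fitted classifier. The sketch below uses synthetic data and scikit-learn (not the study's dataset or software) to compute sensitivity, specificity, the likelihood ratios and the AUC for a logistic regression:

    ```python
    # Sketch: diagnostic metrics for a fitted logistic regression classifier.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, confusion_matrix

    rng = np.random.default_rng(1)
    X = rng.normal(size=(227, 3))                      # three illustrative predictors
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + X @ np.array([1.2, -0.8, 1.5])))))

    model = LogisticRegression().fit(X, y)
    pred = model.predict(X)                            # default 0.5 threshold
    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1 - specificity)      # positive likelihood ratio
    lr_negative = (1 - sensitivity) / specificity      # negative likelihood ratio
    auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
    print(sensitivity, specificity, lr_positive, lr_negative, auc)
    ```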

  5. Efficient methods for estimating constrained parameters with applications to lasso logistic regression.

    Science.gov (United States)

    Tian, Guo-Liang; Tang, Man-Lai; Fang, Hong-Bin; Tan, Ming

    2008-03-15

    Fitting logistic regression models is challenging when their parameters are restricted. In this article, we first develop a quadratic lower-bound (QLB) algorithm for optimization with box or linear inequality constraints and derive the fastest QLB algorithm corresponding to the smallest global majorization matrix. The proposed QLB algorithm is particularly suited to problems to which EM-type algorithms are not applicable (e.g., logistic, multinomial logistic, and Cox's proportional hazards models) while it retains the same EM ascent property and thus assures the monotonic convergence. Secondly, we generalize the QLB algorithm to penalized problems in which the penalty functions may not be totally differentiable. The proposed method thus provides an alternative algorithm for estimation in lasso logistic regression, where the convergence of the existing lasso algorithm is not generally ensured. Finally, by relaxing the ascent requirement, convergence speed can be further accelerated. We introduce a pseudo-Newton method that retains the simplicity of the QLB algorithm and the fast convergence of the Newton method. Theoretical justification and numerical examples show that the pseudo-Newton method is up to 71 (in terms of CPU time) or 107 (in terms of number of iterations) times faster than the fastest QLB algorithm and thus makes bootstrap variance estimation feasible. Simulations and comparisons are performed and three real examples (Down syndrome data, kyphosis data, and colon microarray data) are analyzed to illustrate the proposed methods.
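
    For readers who only need a working lasso logistic regression rather than the QLB or pseudo-Newton algorithms themselves, a standard off-the-shelf fit looks like the following sketch (scikit-learn's solvers, synthetic data; not the authors' method):

    ```python
    # Sketch of lasso (L1-penalized) logistic regression with coefficient selection.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                               random_state=0)
    lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    lasso_logit.fit(X, y)
    selected = np.flatnonzero(lasso_logit.coef_)       # features with nonzero coefficients
    print(f"{selected.size} of {X.shape[1]} coefficients are nonzero:", selected)
    ```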

  6. [Calculating Pearson residual in logistic regressions: a comparison between SPSS and SAS].

    Science.gov (United States)

    Xu, Hao; Zhang, Tao; Li, Xiao-song; Liu, Yuan-yuan

    2015-01-01

    To compare the results of Pearson residual calculations in logistic regression models using SPSS and SAS, we reviewed Pearson residual calculation methods and used two sets of data to test logistic models constructed by the two packages. One model contained a small number of covariates relative to the number of observations; the other contained a number of covariates similar to the number of observations. The two software packages produced similar Pearson residual estimates when the models contained a number of covariates similar to the number of observations, but the results differed when the number of observations was much greater than the number of covariates. The two software packages produce different Pearson residuals, especially when the models contain a small number of covariates. Further studies are warranted.
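
    The quantity being compared is straightforward to compute by hand. A sketch of the Pearson residual for a logistic model, checked against statsmodels' built-in value on synthetic data, is:

    ```python
    # Pearson residual for a logistic model: r_i = (y_i - p_i) / sqrt(p_i * (1 - p_i)).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    x = rng.normal(size=200)
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.9 * x))))

    fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()
    p_hat = fit.fittedvalues                            # fitted event probabilities
    pearson_manual = (y - p_hat) / np.sqrt(p_hat * (1 - p_hat))
    # statsmodels exposes the same quantity directly:
    print(np.allclose(pearson_manual, fit.resid_pearson))
    ```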

  7. Bayesian logistic regression in detection of gene–steroid interaction for cancer at PDLIM5 locus

    Indian Academy of Sciences (India)

    KE-SHENG WANG; DANIEL OWUSU; YUE PAN; CHANGCHUN XIE

    2016-06-01

    The PDZ and LIM domain 5 (PDLIM5) gene may play a role in cancer, bipolar disorder, major depression, alcohol dependence and schizophrenia; however, little is known about the interaction effect of steroid and the PDLIM5 gene on cancer. This study examined 47 single-nucleotide polymorphisms (SNPs) within the PDLIM5 gene in the Marshfield sample with 716 cancer patients (any diagnosed cancer, excluding minor skin cancer) and 2848 noncancer controls. A multiple logistic regression model in PLINK software was used to examine the association of each SNP with cancer. Bayesian logistic regression in PROC GENMOD in SAS statistical software, ver. 9.4, was used to detect gene-steroid interactions influencing cancer. Single marker analysis using PLINK identified 12 SNPs associated with cancer (P < 0.05); in particular, SNP rs6532496 revealed the strongest association with cancer (P = 6.84×10^{-3}), while the next best signal was rs951613 (P = 7.46×10^{-3}). Classic logistic regression in PROC GENMOD showed that both rs6532496 and rs951613 revealed strong gene-steroid interaction effects (OR = 2.18, 95% CI = 1.31-3.63 with P = 2.9×10^{-3} for rs6532496 and OR = 2.07, 95% CI = 1.24-3.45 with P = 5.43×10^{-3} for rs951613, respectively). Results from Bayesian logistic regression showed stronger interaction effects (OR = 2.26, 95% CI = 1.2-3.38 for rs6532496 and OR = 2.14, 95% CI = 1.14-3.2 for rs951613, respectively). All 12 SNPs associated with cancer revealed significant gene-steroid interaction effects (P < 0.05), whereas 13 SNPs showed gene-steroid interaction effects without a main effect on cancer. SNP rs4634230 revealed the strongest gene-steroid interaction effect (OR = 2.49, 95% CI = 1.5-4.13 with P = 4.0×10^{-4} based on the classic logistic regression and OR = 2.59, 95% CI = 1.4-3.97 from Bayesian logistic regression, respectively). This study provides evidence of common genetic variants within the PDLIM5 gene and interactions between the PDLIM5 gene

  8. Empirical Analysis of Logistics Demand Forecasting of Hebei Based on a Multiple Linear Regression Model

    Institute of Scientific and Technical Information of China (English)

    周晓娟; 景志英

    2013-01-01

    In this paper, we attempt to use a multiple linear regression model to forecast the logistics demand of Hebei Province. Drawing on previous studies, we selected suitable research indicators and, in accordance with strict statistical requirements, took the relevant indicators from the Hebei Statistical Yearbook for 1990-2009 as the data source. Stepwise regression was performed on the data to eliminate multicollinearity, and the resulting regression model passed the relevant tests, confirming that it is suitable for forecasting. Finally, noting that freight volume is the key to logistics demand forecasting, we put forward policy suggestions in three respects for speeding up the development of the logistics industry in Hebei Province.

  9. Predicting smear negative pulmonary tuberculosis with classification trees and logistic regression: a cross-sectional study

    Directory of Open Access Journals (Sweden)

    Kritski Afrânio

    2006-02-01

    Full Text Available Abstract Background Smear negative pulmonary tuberculosis (SNPT accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods The study enrolled 551 patients with clinical-radiological suspicion of SNPT, in Rio de Janeiro, Brazil. The original data was divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used for constructing logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operator characteristic curve, sensitivity, and specificity were used to evaluate the model's performance in both generation and validation samples. Results It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion The results suggest that those models might be useful as screening tools for estimating the risk of SNPT, optimizing the utilization of more expensive tests, and avoiding costs of unnecessary anti-tuberculosis treatment. Those models might be cost-effective tools in a health care network with hierarchical distribution of scarce resources.

  10. Position in formal structure, personal characteristics and choices of advisors in a law firm : A logistic regression model for dyadic network data

    NARCIS (Netherlands)

    Lazega, E; vanDuijn, M

    1997-01-01

    This paper presents a statistical model for the analysis of binary sociometric choice data, the p(2) model, which provides a flexible way for using explanatory variables to model network structure. It is applied to examine the influence of the formal structure of an organization on interactions amon

  11. Comparison of standard maximum likelihood classification and polytomous logistic regression used in remote sensing

    Science.gov (United States)

    John Hogland; Nedret Billor; Nathaniel Anderson

    2013-01-01

    Discriminant analysis, referred to as maximum likelihood classification within popular remote sensing software packages, is a common supervised technique used by analysts. Polytomous logistic regression (PLR), also referred to as multinomial logistic regression, is an alternative classification approach that is less restrictive, more flexible, and easy to interpret. To...

  12. glmnetLRC f/k/a lrc package: Logistic Regression Classification

    Energy Technology Data Exchange (ETDEWEB)

    2016-06-09

    Methods for fitting and predicting logistic regression classifiers (LRC) with an arbitrary loss function using elastic net or best subsets. This package adds additional model fitting features to the existing glmnet and bestglm R packages. This package was created to perform the analyses described in Amidan BG, Orton DJ, LaMarche BL, et al. 2014. Signatures for Mass Spectrometry Data Quality. Journal of Proteome Research. 13(4), 2215-2222. It makes the model fitting available in the glmnet and bestglm packages more general by identifying optimal model parameters via cross validation with a customizable loss function. It also identifies the optimal threshold for binary classification.
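
    The package itself is written for R; the following Python sketch only illustrates the underlying idea of tuning both the regularization strength and the classification threshold by cross-validation under a user-defined loss. The cost weights and parameter grid are illustrative assumptions, not the package's defaults:

    ```python
    # Sketch: pick regularization strength C and decision threshold by CV under a
    # custom, asymmetric misclassification loss.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold

    def custom_loss(y_true, y_pred, fn_cost=5.0, fp_cost=1.0):
        fn = np.sum((y_true == 1) & (y_pred == 0))      # missed positives cost more
        fp = np.sum((y_true == 0) & (y_pred == 1))
        return fn_cost * fn + fp_cost * fp

    X, y = make_classification(n_samples=400, n_features=20, random_state=0)
    best = None
    for C in (0.01, 0.1, 1.0, 10.0):
        for threshold in np.linspace(0.1, 0.9, 17):
            losses = []
            for train, test in StratifiedKFold(5, shuffle=True, random_state=0).split(X, y):
                prob = (LogisticRegression(C=C, max_iter=1000)
                        .fit(X[train], y[train])
                        .predict_proba(X[test])[:, 1])
                losses.append(custom_loss(y[test], (prob >= threshold).astype(int)))
            if best is None or np.mean(losses) < best[0]:
                best = (np.mean(losses), C, threshold)
    print("CV loss, C, threshold:", best)
    ```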

  13. Comparing the importance of prognostic factors in Cox and logistic regression using SAS.

    Science.gov (United States)

    Heinze, Georg; Schemper, Michael

    2003-06-01

    Two SAS macro programs are presented that evaluate the relative importance of prognostic factors in the proportional hazards regression model and in the logistic regression model. The importance of a prognostic factor is quantified by the proportion of variation in the outcome attributable to this factor. For proportional hazards regression, the program %RELIMPCR uses the recently proposed measure V to calculate the proportion of explained variation (PEV). For the logistic model, the R(2) measure based on squared raw residuals is used by the program %RELIMPLR. Both programs are able to compute marginal and partial PEV, and to compare PEVs of factors, of groups of factors, and even of different models. The programs use a bootstrap resampling scheme to test differences between the PEVs of different factors. Confidence limits for P-values are provided. The programs further allow the computation of PEV to be based on models with shrunken or bias-corrected parameter estimates. The SAS macros are freely available at www.akh-wien.ac.at/imc/biometrie/relimp

  14. Application of ordinal logistic regression analysis in determining risk factors of child malnutrition in Bangladesh.

    Science.gov (United States)

    Das, Sumonkanti; Rahman, Rajwanur M

    2011-11-14

    The study attempts to develop an ordinal logistic regression (OLR) model to identify the determinants of child malnutrition, instead of developing the traditional binary logistic regression (BLR) model, using the data of the Bangladesh Demographic and Health Survey 2004. Based on the weight-for-age anthropometric index (Z-score), child nutrition status is categorized into three groups - severely undernourished (malnutrition and severe malnutrition if the proportional odds assumption is satisfied. The assumption is satisfied with a low p-value (0.144) due to violation of the assumption for one covariate. So a partial proportional odds model (PPOM) and two BLR models have also been developed to check the applicability of the OLR model. A graphical test has also been adopted for checking the proportional odds assumption. All the models determine that age of child, birth interval, mother's education, maternal nutrition, household wealth status, child feeding index, and incidence of fever, ARI and diarrhoea were the significant predictors of child malnutrition; however, results of the PPOM were more precise than those of the other models. These findings clearly justify that OLR models (POM and PPOM) are appropriate for finding predictors of malnutrition instead of BLR models.
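
    A proportional odds (ordinal logistic) model of this kind can be fitted, for example, with statsmodels' OrderedModel (available in recent statsmodels releases). The variables below are illustrative stand-ins, not the survey's actual fields:

    ```python
    # Sketch of an ordinal (proportional odds) logistic regression on synthetic data.
    import numpy as np
    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    rng = np.random.default_rng(7)
    n = 1000
    df = pd.DataFrame({
        "child_age": rng.integers(0, 60, n),          # months (hypothetical covariate)
        "mother_edu": rng.integers(0, 3, n),          # 0 none, 1 primary, 2 secondary+
    })
    latent = -0.02 * df["child_age"] + 0.8 * df["mother_edu"] + rng.logistic(size=n)
    df["nutrition"] = pd.cut(latent, bins=[-np.inf, -1, 1, np.inf],
                             labels=["severe", "moderate", "normal"], ordered=True)

    model = OrderedModel(df["nutrition"], df[["child_age", "mother_edu"]], distr="logit")
    result = model.fit(method="bfgs", disp=False)
    print(result.summary())
    ```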

  15. A free-knot spline modeling framework for piecewise linear logistic regression in complex samples with body mass index and mortality as an example

    Directory of Open Access Journals (Sweden)

    Scott W. Keith

    2014-09-01

    Full Text Available This paper details the design, evaluation, and implementation of a framework for detecting and modeling nonlinearity between a binary outcome and a continuous predictor variable adjusted for covariates in complex samples. The framework provides familiar-looking parameterizations of output in terms of linear slope coefficients and odds ratios. Estimation methods focus on maximum likelihood optimization of piecewise linear free-knot splines formulated as B-splines. Correctly specifying the optimal number and positions of the knots improves the model, but is marked by computational intensity and numerical instability. Our inference methods utilize both parametric and nonparametric bootstrapping. Unlike other nonlinear modeling packages, this framework is designed to incorporate multistage survey sample designs common to nationally representative datasets. We illustrate the approach and evaluate its performance in specifying the correct number of knots under various conditions with an example using body mass index (BMI; kg/m2) and the complex multi-stage sampling design from the Third National Health and Nutrition Examination Survey to simulate binary mortality outcomes data having realistic nonlinear sample-weighted risk associations with BMI. BMI and mortality data provide a particularly apt example and area of application since BMI is commonly recorded in large health surveys with complex designs, often categorized for modeling, and nonlinearly related to mortality. When complex sample design considerations were ignored, our method was generally similar to or more accurate than two common model selection procedures, Schwarz's Bayesian Information Criterion (BIC) and Akaike's Information Criterion (AIC), in terms of selecting the correct number of knots. Our approach provided accurate knot selections when complex sampling weights were incorporated, while AIC and BIC were not effective under these conditions.

  16. Assessing Credit Default using Logistic Regression and Multiple Discriminant Analysis: Empirical Evidence from Bosnia and Herzegovina

    Directory of Open Access Journals (Sweden)

    Deni Memić

    2015-01-01

    Full Text Available This article aims to assess credit default prediction on the banking market in Bosnia and Herzegovina nationwide as well as in its constitutional entities (Federation of Bosnia and Herzegovina and Republika Srpska). The ability to classify companies into different predefined groups, or to find an appropriate tool that would replace human assessment in classifying companies into good and bad buckets, has been one of the main interests of risk management researchers for a long time. We investigated the possibility and accuracy of default prediction using the traditional statistical methods of logistic regression (logit) and multiple discriminant analysis (MDA) and compared their predictive abilities. The results show that the created models have high predictive ability. For the logit models, some variables are more influential on the default prediction than others. Return on assets (ROA) is statistically significant in all four periods prior to default, having very high regression coefficients, or a high impact on the model's ability to predict default. Similar results are obtained for the MDA models. It is also found that predictive ability differs between logistic regression and multiple discriminant analysis.

  17. Coordinate Descent Based Hierarchical Interactive Lasso Penalized Logistic Regression and Its Application to Classification Problems

    Directory of Open Access Journals (Sweden)

    Jin-Jia Wang

    2014-01-01

    Full Text Available We present a hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm, based on hierarchy theory and variable interactions. We define the interaction model based on geometric algebra and hierarchical constraint conditions and then use the coordinate descent algorithm to solve for the coefficients of the hierarchical interactive lasso model. We provide the results of experiments based on UCI datasets, the Madelon dataset from NIPS2003, and daily activities of the elderly. The experimental results show that variable interactions and hierarchy contribute significantly to the classification. The hierarchical interactive lasso has the advantages of both the lasso and the interactive lasso.

  18. A genetic algorithm for variable selection in logistic regression analysis of radiotherapy treatment outcomes.

    Science.gov (United States)

    Gayou, Olivier; Das, Shiva K; Zhou, Su-Min; Marks, Lawrence B; Parda, David S; Miften, Moyed

    2008-12-01

    A given outcome of radiotherapy treatment can be modeled by analyzing its correlation with a combination of dosimetric, physiological, biological, and clinical factors, through a logistic regression fit of a large patient population. The quality of the fit is measured by the combination of the predictive power of this particular set of factors and the statistical significance of the individual factors in the model. We developed a genetic algorithm (GA), in which a small sample of all the possible combinations of variables are fitted to the patient data. New models are derived from the best models, through crossover and mutation operations, and are in turn fitted. The process is repeated until the sample converges to the combination of factors that best predicts the outcome. The GA was tested on a data set that investigated the incidence of lung injury in NSCLC patients treated with 3DCRT. The GA identified a model with two variables as the best predictor of radiation pneumonitis: the V30 (p=0.048) and the ongoing use of tobacco at the time of referral (p=0.074). This two-variable model was confirmed as the best model by analyzing all possible combinations of factors. In conclusion, genetic algorithms provide a reliable and fast way to select significant factors in logistic regression analysis of large clinical studies.
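
    A compact sketch of the general idea, not the authors' implementation, is shown below: candidate variable subsets are encoded as binary chromosomes, each subset is scored by the AIC of a fitted logistic regression, and crossover and mutation produce new candidate subsets. The data, population size and rates are arbitrary assumptions:

    ```python
    # Sketch: genetic algorithm for variable selection in logistic regression.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n, k = 300, 10
    X = rng.normal(size=(n, k))
    y = rng.binomial(1, 1 / (1 + np.exp(-(1.2 * X[:, 0] - 0.9 * X[:, 3]))))

    def fitness(mask):                                   # lower AIC = better subset
        if not mask.any():
            return np.inf
        try:
            fit = sm.Logit(y, sm.add_constant(X[:, mask])).fit(disp=False)
            return fit.aic
        except Exception:                                # e.g. non-convergence
            return np.inf

    pop = rng.random((20, k)) < 0.5                      # initial random subsets
    for generation in range(30):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[:10]]           # keep the best half
        children = []
        for _ in range(10):
            a, b = parents[rng.integers(10)], parents[rng.integers(10)]
            cut = rng.integers(1, k)                     # single-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(k) < 0.1                 # mutation: random bit flips
            children.append(child)
        pop = np.vstack([parents, np.array(children)])

    best = pop[np.argmin([fitness(ind) for ind in pop])]
    print("selected variable indices:", np.flatnonzero(best))
    ```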

  19. Limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method for the parameter estimation on geographically weighted ordinal logistic regression model (GWOLR)

    Science.gov (United States)

    Saputro, Dewi Retno Sari; Widyaningsih, Purnami

    2017-08-01

    In general, parameter estimation for the GWOLR model uses the maximum likelihood method, but this leads to a system of nonlinear equations that is difficult to solve exactly, so an approximate solution is needed. There are two popular numerical approaches: Newton's method and Quasi-Newton (QN) methods. Newton's method requires considerable computation time because it involves the Jacobian matrix (derivatives). QN methods overcome this drawback by replacing the derivative computation with direct function evaluations. The QN approach uses a Hessian matrix approximation, such as the Davidon-Fletcher-Powell (DFP) formula. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is a QN method that shares the DFP property of maintaining a positive definite Hessian approximation. The BFGS method requires a large amount of memory, so another algorithm is needed to decrease memory usage, namely the limited memory BFGS (L-BFGS). The purpose of this research is to assess the efficiency of the L-BFGS method in the iterative and recursive computation of the Hessian matrix and its inverse for GWOLR parameter estimation. With reference to the research findings, we found that the BFGS and L-BFGS methods have arithmetic operation costs on the order of O(n^2) and O(nm), respectively.
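
    As a simple illustration of the optimizer itself (the GWOLR model adds ordinal categories and geographic weighting, which are omitted here), an ordinary logistic regression can be estimated with SciPy's L-BFGS-B routine by minimizing the negative log-likelihood:

    ```python
    # Sketch: logistic regression coefficients via L-BFGS-B on the negative log-likelihood.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
    true_beta = np.array([-0.3, 1.0, -0.7])
    y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))

    def neg_log_likelihood(beta):
        eta = X @ beta
        # Bernoulli/logistic log-likelihood, written in a numerically stable form
        return np.sum(np.logaddexp(0, eta) - y * eta)

    result = minimize(neg_log_likelihood, x0=np.zeros(3), method="L-BFGS-B")
    print("L-BFGS-B estimates:", np.round(result.x, 3))   # compare with true_beta
    ```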

  20. Landslide Susceptibility Zoning Study in Lanzhou City Based on GIS and Logistic Regression Model

    Institute of Scientific and Technical Information of China (English)

    方苗; 张金龙; 徐填

    2011-01-01

    In view of the fragile geological environment and frequent landslide disasters in Lanzhou City, a logistic regression model was applied with ArcGIS and SPSS software, taking lithology, fault structure, slope, geomorphology, vegetation cover, average precipitation from July to September, and roads (highways and railways) as the influencing factors of landslide hazard. Each influencing factor was first classified and an index value calculated for each class; the factor layers were then overlaid in ArcMap. Finally, logistic regression in SPSS was used to calculate the coefficient of each influencing factor and to build the logistic regression model. A landslide susceptibility zoning map of Lanzhou City was drawn in ArcMap from the model, and the zoning map agrees well with the actual distribution of landslides. The Kappa coefficient and the area under the ROC curve (AUC) of the model were 0.623 and 0.709, respectively; both tests indicate that the model performs well and can be applied to landslide susceptibility zoning studies in Lanzhou City.

  1. Allelic drop-out probabilities estimated by logistic regression

    DEFF Research Database (Denmark)

    Tvedebrink, Torben; Eriksen, Poul Svante; Asplund, Maria

    2012-01-01

    We discuss the model for estimating drop-out probabilities presented by Tvedebrink et al. [7] and the concerns that have been raised. The criticism of the model has demonstrated that the model is not perfect. However, the model is very useful for advanced forensic genetic work, where allelic drop...

  2. GIS-based logistic regression method for landslide susceptibility mapping in regional scale

    Institute of Scientific and Technical Information of China (English)

    ZHU Lei; HUANG Jing-feng

    2006-01-01

    Landslide susceptibility mapping is one of the study fields portraying the spatial distribution of future slope failure susceptibility. This paper reviews past methods for producing landslide susceptibility maps and divides these methods into three types. The logistic linear regression approach is further elaborated through the crosstabs method, which is used to analyze the relationship between the categorical or binary response variable and one or more continuous, categorical or binary explanatory variables derived from samples. It is an objective assignment of coefficients serving as weights of the various factors under consideration, whereas expert opinion makes a great difference in heuristic approaches. Unlike deterministic approaches, it is very applicable at the regional scale. In this study, double logistic regression is applied in the study area. The entire study area is first analyzed. The logistic regression equation showed that elevation and proximity to roads, rivers and residential areas are the main factors triggering landslide occurrence in this area. The prediction accuracy of the first landslide susceptibility map was shown to be 80%. Along the roads and residential areas, almost all areas fall in the high landslide susceptibility zone. Some non-landslide areas are incorrectly placed in the high and medium landslide susceptibility zones. To improve this, a second logistic regression was carried out in the high landslide susceptibility zone using landslide cells and non-landslide sample cells in this area. In the second logistic regression analysis, only engineering and geological conditions are important in these areas and are entered in the new logistic regression equation, indicating that only areas with unstable engineering and geological conditions are prone to landslides during large-scale engineering activity. Taking these two logistic regression results into account yields a new landslide susceptibility map. Double logistic regression analysis improved the non

  3. Binary outcome variables and logistic regression models

    Institute of Scientific and Technical Information of China (English)

    Xinhua LIU

    2011-01-01

    Biomedical researchers often study binary variables that indicate whether or not a specific event, such as remission of depression symptoms, occurs during the study period. The indicator variable Y takes two values, usually coded as one if the event (remission) is present and zero if the event is not present (non-remission). Let p be the probability that the event occurs (Y = 1); then 1 - p will be the probability that the event does not occur (Y = 0).
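
    The binary logistic regression model implied by this setup links p to covariates through the log-odds; the standard formulation is:

    ```latex
    % Standard binary logistic regression: the log-odds of the event are a linear
    % function of the covariates x_1, ..., x_k.
    \[
    \operatorname{logit}(p) = \log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
    \qquad
    p = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}
             {1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}.
    \]
    ```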

  4. Simultaneous confidence bands for log-logistic regression with applications in risk assessment.

    Science.gov (United States)

    Kerns, Lucy X

    2017-05-01

    In risk assessment, it is often desired to make inferences on the low dose levels at which a specific benchmark risk is attained. Applications of simultaneous hyperbolic confidence bands for low-dose risk estimation with quantal data under different dose-response models (multistage, Abbott-adjusted Weibull, and Abbott-adjusted log-logistic models) have appeared in the literature. The use of simultaneous three-segment bands under the multistage model has also been proposed recently. In this article, we present explicit formulas for constructing asymptotic one-sided simultaneous hyperbolic and three-segment bands for the simple log-logistic regression model. We use the simultaneous construction to estimate upper hyperbolic and three-segment confidence bands on extra risk and to obtain lower limits on the benchmark dose by inverting the upper bands on risk under the Abbott-adjusted log-logistic model. Monte Carlo simulations evaluate the characteristics of the simultaneous limits. An example is given to illustrate the use of the proposed methods and to compare the two types of simultaneous limits at very low dose levels. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Mining pharmacovigilance data using Bayesian logistic regression with James-Stein type shrinkage estimation.

    Science.gov (United States)

    An, Lihua; Fung, Karen Y; Krewski, Daniel

    2010-09-01

    Spontaneous adverse event reporting systems are widely used to identify adverse reactions to drugs following their introduction into the marketplace. In this article, a James-Stein type shrinkage estimation strategy was developed in a Bayesian logistic regression model to analyze pharmacovigilance data. This method is effective in detecting signals as it combines information and borrows strength across medically related adverse events. Computer simulation demonstrated that the shrinkage estimator is uniformly better than the maximum likelihood estimator in terms of mean squared error. This method was used to investigate the possible association of a series of diabetic drugs and the risk of cardiovascular events using data from the Canada Vigilance Online Database.

  6. Comparison of logistic regression and artificial neural network in low back pain prediction: second national health survey.

    Science.gov (United States)

    Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H

    2012-01-01

    The purpose of this investigation was to empirically compare the predictive ability of an artificial neural network with that of logistic regression in the prediction of low back pain. Data from the second national health survey were considered in this investigation. These data include information on low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 records and validated in a test set of 17295 records. The Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 input, 3 hidden and 1 output neurons was employed. The efficiency of the two models was compared by receiver operating characteristic analysis, root mean square and -2 log-likelihood criteria. The area under the ROC curve (SE), root mean square and -2 log-likelihood of the logistic regression were 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2 log-likelihood of the artificial neural network were 0.754 (0.004), 0.3770 and 14757.6, respectively. Based on these three criteria, the artificial neural network gives better performance than logistic regression. Although the difference is statistically significant, it does not seem to be clinically significant.

  7. Predicting students' success at pre-university studies using linear and logistic regressions

    Science.gov (United States)

    Suliman, Noor Azizah; Abidin, Basir; Manan, Norhafizah Abdul; Razali, Ahmad Mahir

    2014-09-01

    The study aimed to find the most suitable model for predicting students' success in the medical pre-university studies at the Centre for Foundation in Science, Languages and General Studies of Cyberjaya University College of Medical Sciences (CUCMS). The predictors under investigation were achievements in the national high school exit examination, Sijil Pelajaran Malaysia (SPM), in Biology, Chemistry, Physics, Additional Mathematics, Mathematics, English and Bahasa Malaysia, as well as gender and high school background. The outcomes showed a significant difference by gender in the final CGPA and in the Biology and Mathematics subjects at pre-university, and a difference by high school background for the Mathematics subject. In general, the correlation between academic achievement at high school and at medical pre-university is moderately significant at an α-level of 0.05, except for the language subjects. It was also found that logistic regression techniques gave better prediction models than the multiple linear regression technique for this data set. The developed logistic models were able to give probabilities that closely match the actual outcomes. Hence, they could be used to identify successful students who are qualified to enter the CUCMS medical faculty before accepting any students to its foundation program.

  8. Financial Early-warning for Real Estate Listed Companies Based on a Logistic Regression Model

    Institute of Scientific and Technical Information of China (English)

    许珂; 卢海

    2012-01-01

    In recent years housing prices have kept rising, and the Chinese government has adopted a series of tightening policies to control them, which has had a tremendous impact on real estate listed companies; some companies have fallen into difficulty because of broken funding chains. Financial early warning is therefore very important for real estate listed companies. Using non-parametric tests on 20 financial indicators reflecting different aspects of corporate capability, the paper screens out the indicators that significantly distinguish ST from non-ST listed companies, builds a logistic regression model, and studies financial crisis early warning for real estate listed companies using SPSS software. The results show that the early-warning model based on logistic regression performs well and has good predictive accuracy.

  9. Classification of Effective Soil Depth by Using Multinomial Logistic Regression Analysis

    Science.gov (United States)

    Chang, C. H.; Chan, H. C.; Chen, B. A.

    2016-12-01

    Classification of effective soil depth is a task in determining slopeland utilization limitations in Taiwan. The "Slopeland Conservation and Utilization Act" categorizes slopeland into agriculture and husbandry land, land suitable for forestry, and land for enhanced conservation according to factors including average slope, effective soil depth, soil erosion and parent rock. However, site investigation of the effective soil depth requires costly field work. This research aimed to classify the effective soil depth by using multinomial logistic regression with environmental factors. The Wen-Shui Watershed, located in central Taiwan, was selected as the study area. The multinomial logistic regression analysis was performed with the assistance of a Geographic Information System (GIS). The effective soil depth was categorized into four levels: deeper, deep, shallow and shallower. The environmental factors of slope, aspect, digital elevation model (DEM), curvature and normalized difference vegetation index (NDVI) were selected for classifying the soil depth. An error matrix was then used to assess the model accuracy. The results showed an overall accuracy of 75%. Finally, a map of effective soil depth was produced to help planners and decision makers in determining slopeland utilization limitations in the study area.
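
    A multinomial logistic regression of this kind can be sketched with scikit-learn; the covariates and class construction below are synthetic stand-ins for the study's GIS-derived factors, and the error matrix and overall accuracy mirror the assessment described above:

    ```python
    # Sketch: multinomial logistic regression for four soil-depth classes.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix, accuracy_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(5)
    n = 1200
    X = np.column_stack([rng.uniform(0, 45, n),       # slope (degrees)
                         rng.uniform(0, 360, n),      # aspect
                         rng.uniform(100, 2500, n),   # elevation (m)
                         rng.normal(0, 1, n),         # curvature
                         rng.uniform(0, 1, n)])       # NDVI
    score = 0.04 * X[:, 0] - 0.001 * X[:, 2] + 2.0 * X[:, 4] + rng.normal(0, 0.5, n)
    y = np.digitize(score, np.quantile(score, [0.25, 0.5, 0.75]))  # 4 depth classes

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
    print(confusion_matrix(y_te, clf.predict(X_te)))               # error matrix
    print("overall accuracy:", accuracy_score(y_te, clf.predict(X_te)))
    ```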

  10. Comparison of logistic regression methods and the discrete choice model in the selection of habitats

    Directory of Open Access Journals (Sweden)

    Sandra Vergara Cardozo

    2010-01-01

    Full Text Available Based on a review of the most recent data analyses on resource selection by animals, as well as on recent suggestions that indicate the lack of a unified statistical theory showing how resource selection can be detected and measured, the authors suggest that the concept of the resource selection function (RSF) can be the basis for the development of such a theory. The revision of discrete choice models (DCM) is suggested as an approach for estimating the RSF when the choice by an animal or groups of animals involves different sets of available resource units. The definition of the RSF requires that the resource being studied consists of discrete units. The statistical method most often used to estimate the RSF is logistic regression, but DCM can also be used. The theory of DCM has been well developed for the analysis of data sets involving choices of products by humans, but it is also applicable, with some modifications, to the choice of habitat by animals. The comparison of logistic regression with the DCM for one choice is made because the coefficient estimates of the logistic regression model include an intercept, which is not present in the DCM. The objective of this work was to compare the estimates of the RSF obtained by applying logistic regression and the DCM to a data set on habitat selection of the spotted owl (Strix occidentalis) in the northwest of the United States.

  11. Logistic and linear regression model documentation for statistical relations between continuous real-time and discrete water-quality constituents in the Kansas River, Kansas, July 2012 through June 2015

    Science.gov (United States)

    Foster, Guy M.; Graham, Jennifer L.

    2016-04-06

    The Kansas River is a primary source of drinking water for about 800,000 people in northeastern Kansas. Source-water supplies are treated by a combination of chemical and physical processes to remove contaminants before distribution. Advanced notification of changing water-quality conditions and cyanobacteria and associated toxin and taste-and-odor compounds provides drinking-water treatment facilities time to develop and implement adequate treatment strategies. The U.S. Geological Survey (USGS), in cooperation with the Kansas Water Office (funded in part through the Kansas State Water Plan Fund), and the City of Lawrence, the City of Topeka, the City of Olathe, and Johnson County Water One, began a study in July 2012 to develop statistical models at two Kansas River sites located upstream from drinking-water intakes. Continuous water-quality monitors have been operated and discrete-water quality samples have been collected on the Kansas River at Wamego (USGS site number 06887500) and De Soto (USGS site number 06892350) since July 2012. Continuous and discrete water-quality data collected during July 2012 through June 2015 were used to develop statistical models for constituents of interest at the Wamego and De Soto sites. Logistic models to continuously estimate the probability of occurrence above selected thresholds were developed for cyanobacteria, microcystin, and geosmin. Linear regression models to continuously estimate constituent concentrations were developed for major ions, dissolved solids, alkalinity, nutrients (nitrogen and phosphorus species), suspended sediment, indicator bacteria (Escherichia coli, fecal coliform, and enterococci), and actinomycetes bacteria. These models will be used to provide real-time estimates of the probability that cyanobacteria and associated compounds exceed thresholds and of the concentrations of other water-quality constituents in the Kansas River. The models documented in this report are useful for characterizing changes

  12. Logistic regression function for detection of suspicious performance during baseline evaluations using concussion vital signs.

    Science.gov (United States)

    Hill, Benjamin David; Womble, Melissa N; Rohling, Martin L

    2015-01-01

    This study utilized logistic regression to determine whether performance patterns on Concussion Vital Signs (CVS) could differentiate known groups with either genuine or feigned performance. For the embedded measure development group (n = 174), clinical patients and undergraduate students categorized as feigning obtained significantly lower scores on the overall test battery mean for the CVS, Shipley-2 composite score, and California Verbal Learning Test-Second Edition subtests than did genuinely performing individuals. The final full model of 3 predictor variables (Verbal Memory immediate hits, Verbal Memory immediate correct passes, and Stroop Test complex reaction time correct) was significant and correctly classified individuals in their known group 83% of the time (sensitivity = .65; specificity = .97) in a mixed sample of young-adult clinical cases and simulators. The CVS logistic regression function was applied to a separate undergraduate college group (n = 378) that was asked to perform genuinely and identified 5% as having possibly feigned performance indicating a low false-positive rate. The failure rate was 11% and 16% at baseline cognitive testing in samples of high school and college athletes, respectively. These findings have particular relevance given the increasing use of computerized test batteries for baseline cognitive testing and return-to-play decisions after concussion.

  13. Sparse Logistic Regression for Diagnosis of Liver Fibrosis in Rat by Using SCAD-Penalized Likelihood

    Directory of Open Access Journals (Sweden)

    Fang-Rong Yan

    2011-01-01

    Full Text Available The objective of the present study is to find the quantitative relationship between the progression of liver fibrosis and the levels of certain serum markers using a mathematical model. We provide a sparse logistic regression using the smoothly clipped absolute deviation (SCAD) penalty function to diagnose liver fibrosis in rats. Not only does it give a sparse solution with high accuracy, it also provides the users with the precise probabilities of classification along with the class information. In the simulated case and the experimental case, the proposed method is comparable to stepwise linear discriminant analysis (SLDA) and to sparse logistic regression with the least absolute shrinkage and selection operator (LASSO) penalty, using receiver operating characteristic (ROC) analysis with Bayesian bootstrap estimation of the area under the curve (AUC) diagnostic sensitivity for the selected variables. Results show that the new approach provides a good correlation between the serum marker levels and the liver fibrosis induced by thioacetamide (TAA) in rats. Meanwhile, this approach might also be used in predicting the development of liver cirrhosis.

  14. Comparing the Discrete and Continuous Logistic Models

    Science.gov (United States)

    Gordon, Sheldon P.

    2008-01-01

    The solutions of the discrete logistic growth model based on a difference equation and the continuous logistic growth model based on a differential equation are compared and contrasted. The investigation is conducted using a dynamic interactive spreadsheet. (Contains 5 figures.)
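
    The contrast between the two models can be reproduced numerically: the sketch below iterates the difference equation and integrates the corresponding differential equation with SciPy (parameter values are arbitrary):

    ```python
    # Sketch: discrete logistic growth (difference equation) vs. the continuous
    # logistic ODE dp/dt = r * p * (1 - p / K).
    import numpy as np
    from scipy.integrate import solve_ivp

    r, K, p0, T = 0.8, 100.0, 5.0, 20

    # Discrete model: p_{n+1} = p_n + r * p_n * (1 - p_n / K)
    discrete = [p0]
    for _ in range(T):
        p = discrete[-1]
        discrete.append(p + r * p * (1 - p / K))

    # Continuous model, integrated on the same time grid
    sol = solve_ivp(lambda t, p: r * p * (1 - p / K), (0, T), [p0],
                    t_eval=np.arange(T + 1))

    for n in (0, 5, 10, 20):
        print(n, round(discrete[n], 2), round(sol.y[0][n], 2))
    ```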

  16. A Theoretic Model of Transport Logistics Demand

    Directory of Open Access Journals (Sweden)

    Natalija Jolić

    2006-01-01

    Full Text Available Concerning transport logistics as relation between transportand integrated approaches to logistics, some transport and logisticsspecialists consider the tenn tautological. However,transport is one of the components of logistics, along with inventories,resources, warehousing, infonnation and goods handling.Transport logistics considers wider commercial and operationalframeworks within which the flow of goods is plannedand managed. The demand for transport logistics services canbe valorised as highly qualitative, differentiated and derived.While researching transport phenomenon the implementationof models is inevitable and demand models highly desirable. Asa contribution to transport modelling this paper improves decisionmaking and planning in the transport logistics field.

  17. Computational procedures for probing interactions in OLS and logistic regression: SPSS and SAS implementations.

    Science.gov (United States)

    Hayes, Andrew F; Matthes, Jörg

    2009-08-01

    Researchers often hypothesize moderated effects, in which the effect of an independent variable on an outcome variable depends on the value of a moderator variable. Such an effect reveals itself statistically as an interaction between the independent and moderator variables in a model of the outcome variable. When an interaction is found, it is important to probe the interaction, for theories and hypotheses often predict not just interaction but a specific pattern of effects of the focal independent variable as a function of the moderator. This article describes the familiar pick-a-point approach and the much less familiar Johnson-Neyman technique for probing interactions in linear models and introduces macros for SPSS and SAS to simplify the computations and facilitate the probing of interactions in ordinary least squares and logistic regression. A script version of the SPSS macro is also available for users who prefer a point-and-click user interface rather than command syntax.
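
    The macros themselves are for SPSS and SAS; the following Python sketch only illustrates the pick-a-point idea for a logistic model with an interaction, computing the conditional effect of the focal predictor (and its standard error) at chosen moderator values. The data and variable names are synthetic:

    ```python
    # Sketch: probing an X-by-M interaction in logistic regression at picked points of M.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(11)
    n = 800
    df = pd.DataFrame({"x": rng.normal(size=n), "m": rng.normal(size=n)})
    eta = -0.2 + 0.5 * df.x + 0.3 * df.m + 0.6 * df.x * df.m
    df["y"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))

    fit = smf.logit("y ~ x * m", data=df).fit(disp=False)
    b_x, b_xm = fit.params["x"], fit.params["x:m"]
    cov = fit.cov_params()
    for m_val in (-1.0, 0.0, 1.0):                     # "points" picked on the moderator
        slope = b_x + b_xm * m_val                     # conditional log-odds effect of x
        se = np.sqrt(cov.loc["x", "x"] + 2 * m_val * cov.loc["x", "x:m"]
                     + m_val ** 2 * cov.loc["x:m", "x:m"])
        print(f"m = {m_val:+.1f}: effect of x = {slope:.3f} (SE {se:.3f})")
    ```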

  18. Forecasting with Dynamic Regression Models

    CERN Document Server

    Pankratz, Alan

    2012-01-01

    One of the most widely used tools in statistical forecasting, the single equation regression model, is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the autocorrelation patterns of regression disturbances. It also includes six case studies.

  19. Modified Regression Correlation Coefficient for Poisson Regression Model

    Science.gov (United States)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study gives attention to indicators of the predictive power of the Generalized Linear Model (GLM), which are widely used but often subject to some restrictions. We are interested in the regression correlation coefficient for the Poisson regression model. This is a measure of predictive power defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables, E(Y|X), for the Poisson regression model, in which the dependent variable follows a Poisson distribution. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional one in terms of bias and root mean square error (RMSE).

  20. Risk factors for subclinical intramammary infection in dairy goats in two longitudinal field studies evaluated by Bayesian logistic regression

    DEFF Research Database (Denmark)

    Koop, Gerrit; Collar, Carol A.; Toft, Nils

    2013-01-01

    are imperfect tests, particularly lacking sensitivity, which leads to misclassification and thus to biased estimates of odds ratios in risk factor studies. The objective of this study was to evaluate risk factors for the true (latent) IMI status of major pathogens in dairy goats. We used Bayesian logistic......, caprine arthritis encephalitis-virus infection status, and kidding season), and uncontrollable risk factors (parity, lactation stage, milk yield, pregnancy status, and breed) were measured in the Dutch study, the Californian study or in both studies. Bayesian logistic regression models were constructed...... in which the true (but latent) infection status was linked to the joint test results, as functions of test sensitivity and specificity. The latent IMI status was the dependent variable in the logistic regression model with risk factors as independent variables and with random herd and goat effects...

  1. Predicting Risk of Prenatal Screening for Down Syndrome Using a Logistic Regression Model

    Institute of Scientific and Technical Information of China (English)

    卓仁杰; 李莺; 张莉娜; 沈其君

    2016-01-01

    Maternal serum markers from 64772 pregnant women screened at 12-20 weeks of gestation (64718 normal pregnancies and 54 Down's syndrome cases) were used to build risk estimation models, based on logistic regression, for a triple-marker and a double-marker screening protocol. The serum markers were converted to multiples of the median (MoM) adjusted for gestational age and weight, and the MoM values were further adjusted for twin pregnancy, smoking and diabetes in the logistic regression model. Risk values were calculated for all screened women under both protocols, and the two protocols were evaluated by the area under the receiver operating characteristic curve (AUC), detection rate and false positive rate. The results show that maternal age, alpha-fetoprotein (AFP), free human chorionic gonadotrophin (free hCG) and unconjugated estriol (uE3) were all statistically significant in the logistic regression model. The AUC of the triple-marker protocol was larger than that of the double-marker protocol, but the difference was not significant (z = 1.382, P = 0.1669). At the same cutoff value, the detection rate of the triple-marker protocol was higher, and its false positive rate lower, than those of the double-marker protocol, and the difference was statistically significant (t = -9.44, P < 0.001). Based on the Youden index, the optimal screening cutoff for the triple-marker protocol was 1:400, with a false positive rate of 4.3%.

  2. Ridge Regression for Interactive Models.

    Science.gov (United States)

    Tate, Richard L.

    1988-01-01

    An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are favorable to…

  3. Application of the Polytomous Logistic Regression Model to the Study of Tax Compliance of High-Income Earners

    Institute of Scientific and Technical Information of China (English)

    韩晓琴; 李哲

    2012-01-01

    Strengthening tax collection and promoting tax compliance are important tasks for all tax administrations. Using an ordered polytomous logistic regression model, the authors analyze a sample drawn from the 2010 individual income tax returns of taxpayers with annual income of 120,000 yuan or more in a city of Jiangsu Province. After removing records with missing values, the response variable is a five-category ordered grade of tax owed, and the six independent variables taken from the tax return are the taxpayer's age, taxable income, tax payable, sex, occupation category and industry category. The authors hope that this study can help tax officers administer high-income earners with annual income of 120,000 yuan or more and improve their tax compliance.
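
    A minimal sketch of an ordered (polytomous) logistic regression of the kind described here, assuming statsmodels' OrderedModel is available (statsmodels 0.13 or later); the variables and simulated data are placeholders, not the tax-return dataset.

```python
# Sketch: ordered polytomous logistic regression with a five-level response grade.
# Data and variable names are illustrative only.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel  # statsmodels >= 0.13

rng = np.random.default_rng(3)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(25, 65, n),
    "taxable_income": rng.lognormal(mean=12, sigma=0.4, size=n),
    "sex": rng.integers(0, 2, n),
})
df["log_income"] = np.log(df["taxable_income"])

# A latent score drives a five-category ordered "tax owed" grade.
score = 0.02 * df["age"] + 0.8 * df["log_income"] + rng.logistic(size=n)
df["owed_grade"] = pd.cut(score, bins=5, labels=["1", "2", "3", "4", "5"])  # ordered categorical

model = OrderedModel(df["owed_grade"], df[["age", "log_income", "sex"]], distr="logit")
res = model.fit(method="bfgs", disp=False)
print(res.summary())
```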

  4. Container Logistic Transport Planning Model

    Directory of Open Access Journals (Sweden)

    Xin Zhang

    2013-05-01

    Full Text Available The study proposes a stochastic method for container logistics transport in order to address the problem of unreasonable transportation and overcome two shortcomings of traditional models. Container transport has rapidly developed into a modern means of transportation because of its significant advantages, but this development has also aggravated existing flaws in transport. One of the most important problems is that invalid transport has not been reduced, owing to inherent imbalances in transportation, and container transport aggravates invalid transport through empty containers. Many efforts have been made to solve this problem, but without much progress. An analysis of previous management methods in container transport reveals two theoretical flaws: first, the inevitability of empty containers is taken as a given; second, the allocation of empty containers is not considered as a whole. To address the unreasonable transportation problem and overcome these two shortcomings, the study rebuilds the container transport planning model as a gravity model, gives a general algorithm and analyzes the final results of the model.

  5. Statistical learning techniques applied to epidemiology: a simulated case-control comparison study with logistic regression

    Directory of Open Access Journals (Sweden)

    Land Walker H

    2011-01-01

    Full Text Available Abstract Background When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL techniques with kernels are capable of addressing nonlinear problems without making parametric assumptions. However, these techniques do not produce findings relevant for epidemiologic interpretations. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with logistic regression (LR modeling representing a parametric approach. The SL technique was comprised of a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison. Results The SL approach is capable of generating odds ratios for main effects and risk factor interactions that better capture nonlinear relationships between exposure variables and outcome in comparison with LR. Conclusions The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.

  6. Stress Test on Real Estate Development Investment in China Based on a Logistic Regression Model

    Institute of Scientific and Technical Information of China (English)

    朱俊; 庄新田; 王倩蓉

    2012-01-01

    This paper carries out a stress test analysis of real estate development investment risk in China. Taking account of regional differences in real estate development investment, three representative regions are selected: Shanghai, Liaoning and Guizhou. The risk factors affecting real estate investment are analyzed first; a logistic regression model is then built to evaluate investment risk in each region and to identify the most important factors; extreme "abnormal" scenarios are designed for these factors; and finally the resilience of the real estate sector is tested. The study finds that the significant factors and their importance differ across the three regions: changes in real estate sales prices are most closely related to investment risk in Shanghai, fluctuations in the consumer price index have the strongest impact in Liaoning, and fluctuations in real estate investment itself directly affect the healthy operation of the real estate sector in Guizhou. Under extreme scenarios for these three key indicators, investment risk rises to varying degrees in all three regions. The paper concludes that investment in these regions, or in regions of a similar type, should fully consider actual macroeconomic conditions, especially movements in the significant indicators, so that investment becomes more reasonable and secure.

  7. Optimization of Game Formats in U-10 Soccer Using Logistic Regression Analysis

    Directory of Open Access Journals (Sweden)

    Amatria Mario

    2016-12-01

    Full Text Available Small-sided games provide young soccer players with better opportunities to develop their skills and progress as individual and team players. There is, however, little evidence on the effectiveness of different game formats in different age groups, and these formats can vary between and even within countries. The Royal Spanish Soccer Association replaced the traditional grassroots 7-a-side format (F-7) with the 8-a-side format (F-8) in the 2011-12 season, and the country's regional federations gradually followed suit. The aim of this observational methodology study was to investigate which of these formats best suited the learning needs of U-10 players transitioning from 5-a-side futsal. We built a multiple logistic regression model to predict the success of offensive moves depending on the game format and the area of the pitch in which the move was initiated, with success defined as a shot at the goal. We also built two simple logistic regression models to evaluate how the game format influenced the acquisition of technical-tactical skills. It was found that the probability of a shot at the goal was higher in F-7 than in F-8 for moves initiated in the Creation Sector-Own Half (0.08 vs 0.07) and the Creation Sector-Opponent's Half (0.18 vs 0.16), and the same (0.04) in the Safety Sector. Children also had more opportunities to control the ball and pass or take a shot in the F-7 format (0.24 vs 0.20), and these actions were also more likely to be successful in this format (0.28 vs 0.19).

  8. Introduction to the use of regression models in epidemiology.

    Science.gov (United States)

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
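
    Since this record maps the main regression families to outcome types, a short sketch may make the mapping concrete; it uses statsmodels on simulated data, and Cox regression for time-to-event data is omitted here because it lives in separate survival-analysis packages.

```python
# Sketch: the GLM-style models named in the record, fitted with statsmodels.
# (Cox regression for time-to-event data is omitted; see survival packages such as lifelines.)
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
exposure = rng.binomial(1, 0.4, n)
age = rng.normal(50, 10, n)
X = sm.add_constant(np.column_stack([exposure, age]))

# Continuous outcome -> linear regression
y_cont = 2.0 + 1.5 * exposure + 0.05 * age + rng.normal(size=n)
print(sm.OLS(y_cont, X).fit().params)

# Binary outcome -> logistic regression
p = 1 / (1 + np.exp(-(-3.0 + 0.8 * exposure + 0.03 * age)))
y_bin = rng.binomial(1, p)
print(sm.Logit(y_bin, X).fit(disp=False).params)

# Count outcome (frequencies, rates) -> Poisson regression
mu = np.exp(-1.0 + 0.5 * exposure + 0.02 * age)
y_count = rng.poisson(mu)
print(sm.GLM(y_count, X, family=sm.families.Poisson()).fit().params)
```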

  9. Prediction of Depression in Cancer Patients With Different Classification Criteria, Linear Discriminant Analysis versus Logistic Regression.

    Science.gov (United States)

    Shayan, Zahra; Mohammad Gholi Mezerji, Naser; Shayan, Leila; Naseri, Parisa

    2015-11-03

    Logistic regression (LR) and linear discriminant analysis (LDA) are two popular statistical models for prediction of group membership. Although they are very similar, LDA makes more assumptions about the data. When categorical and continuous variables are used simultaneously, the optimal choice between the two models is questionable. In most studies, classification error (CE) is used to discriminate between subjects in several groups, but this index is not suitable for predicting the accuracy of the outcome. The present study compared LR and LDA models using classification indices. This cross-sectional study selected 243 cancer patients. Sample sets of different sizes (n = 50, 100, 150, 200, 220) were randomly selected and the CE, B, and Q classification indices were calculated for the LR and LDA models. CE revealed a lack of superiority of one model over the other, but the results showed that LR performed better than LDA for the B and Q indices in all situations. No significant effect of sample size on CE was noted for the selection of an optimal model. Assessment of the accuracy of prediction on real data indicated that the B and Q indices are appropriate for selecting an optimal model. The results of this study showed that, based on CE, LR performs better in some cases and LDA in others. The CE index is not appropriate for classification, whereas the B and Q indices performed better and offered more efficient criteria for comparison and discrimination between groups.
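
    The B and Q indices mentioned in the record are not defined here, so the sketch below only illustrates the basic comparison of logistic regression and linear discriminant analysis by classification error on simulated data with scikit-learn; the data generation is an assumption.

```python
# Sketch: comparing logistic regression and LDA by classification error (CE) on held-out data.
# The study's B and Q indices are not reproduced here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=243, n_features=8, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("LR", LogisticRegression(max_iter=1000)),
                  ("LDA", LinearDiscriminantAnalysis())]:
    err = 1 - clf.fit(X_tr, y_tr).score(X_te, y_te)   # classification error
    print(f"{name}: CE = {err:.3f}")
```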

  10. Predicting research use in a public health policy environment: results of a logistic regression analysis.

    Science.gov (United States)

    Zardo, Pauline; Collie, Alex

    2014-10-09

    Use of research evidence in public health policy decision-making is affected by a range of contextual factors operating at the individual, organisational and external levels. Context-specific research is needed to target and tailor research translation intervention design and implementation to ensure that factors affecting research in a specific context are addressed. Whilst such research is increasing, there remain relatively few studies that have quantitatively assessed the factors that predict research use in specific public health policy environments. A quantitative survey was designed and implemented within two public health policy agencies in the Australian state of Victoria. Binary logistic regression analyses were conducted on survey data provided by 372 participants. Univariate logistic regression analyses of 49 factors revealed 26 factors that significantly predicted research use independently. The 26 factors were then tested in a single model and five factors emerged as significant predictors of research over and above all other factors. The five key factors that significantly predicted research use were the following: relevance of research to day-to-day decision-making, skills for research use, internal prompts for use of research, intention to use research within the next 12 months and the agency for which the individual worked. These findings suggest that individual- and organisational-level factors are the critical factors to target in the design of interventions aiming to increase research use in this context. In particular, relevance of research and skills for research use would be necessary to target. The likelihood for research use increased 11- and 4-fold for those who rated highly on these factors. This study builds on previous research and contributes to the currently limited number of quantitative studies that examine use of research evidence in a large sample of public health policy and program decision-makers within a specific context. The

  11. Driving Forces Analysis of Reservoir Wetland Evolution in Beijing Based on a Logistic Regression Model

    Institute of Scientific and Technical Information of China (English)

    李洪; 宫兆宁; 赵文吉; 宫辉力

    2012-01-01

    The reservoir wetlands of Beijing, the largest artificial wetlands in the city, constitute an important part of Beijing's ecological infrastructure. An index system of driving factors for reservoir wetland landscape evolution in the study area was built from two aspects, the natural environment and the socio-economy. Natural driving factors include precipitation, temperature, inflow water volume and groundwater depth; socio-economic driving factors include resident population, urbanization rate and per capita GDP. Using TM images from 1984 to 2010 to extract the spatial distribution of Beijing's reservoir wetlands, we analyzed how the reservoir wetland area changed over nearly 30 years, and explored the driving mechanism of wetland evolution in different periods with a logistic regression model. The results indicate that the driving factors and their influence on reservoir wetland evolution differed across phases. During 1984-1998, evolution was mainly affected by natural environmental factors: the leading drivers were annual average precipitation and inflow water volume, with logistic regression contribution rates of 5.78 and 3.50, respectively. From 1998 to 2004, the impact of human activities intensified and man-made reservoir wetland area decreased; the main drivers were resident population, groundwater depth and urbanization rate, with contribution rates of 9.41, 9.18 and 7.77, respectively. During 2004-2010, wetland evolution was shaped by both natural and socio-economic factors, the dominant drivers being urbanization rate and precipitation, with contribution rates of 6.62 and 4.22, respectively.

  12. Influencing Factor Study of Henoch-Schonlein Purpura in Children Based on the Logistic Regression Model

    Institute of Scientific and Technical Information of China (English)

    谢彪; 曲思杨; 相静; 罗潇; 王文佶; 刘美娜

    2015-01-01

    Objective The aim of this study was to assess factors associated with Henoch-Schonlein purpura (HSP) in Chinese children, which has important public health implications for developing HSP prevention strategies. Methods In this hospital-based case-control study, we recruited 353 HSP cases and 61 control participants between 2012 and 2015, and collected related information on the 414 children (cases and controls) through questionnaires. Student's t-test, Pearson's chi-square test and the Wilcoxon test were used to compare the case and control groups, and a logistic regression model was applied to analyze the factors associated with HSP in children. Results Univariate analysis revealed significant differences between the case and control groups in age and in mother's and father's level of education, as well as in diet regularity and in the consumption of cold, fried and spicy food, meat, drinks, milk and dairy products, and fruits and vegetables. Similarly, the differences in mode of birth, breast-feeding, mixed feeding and maternal contact with harmful substances before pregnancy were statistically significant. After adjustment for age, sex and mother's level of education, multivariate logistic regression showed that living in rural areas, sufficient sleep time, consumption frequencies of fruit, sweets and meat, and mother's level of education were protective factors against HSP in children, whereas mixed feeding, feeding after chewing, illness during pregnancy, and consumption frequencies of cold food, spicy food and nuts were risk factors. Conclusion Sufficient sleep time, less cold and spicy food, and plenty of meat and fruit can reduce the risk of HSP in children; advocating breast-feeding and avoiding feeding after chewing can also help prevent HSP.

  13. Application of fused lasso logistic regression to the study of corpus callosum thickness in early Alzheimer's disease.

    Science.gov (United States)

    Lee, Sang H; Yu, Donghyeon; Bachman, Alvin H; Lim, Johan; Ardekani, Babak A

    2014-01-15

    We propose a fused lasso logistic regression to analyze callosal thickness profiles. The fused lasso regression imposes penalties on both the l1-norm of the model coefficients and their successive differences, and finds only a small number of non-zero coefficients which are locally constant. An iterative method of solving logistic regression with fused lasso regularization is proposed to make this a practical procedure. In this study we analyzed callosal thickness profiles sampled at 100 equal intervals between the rostrum and the splenium. The method was applied to corpora callosa of elderly normal controls (NCs) and patients with very mild or mild Alzheimer's disease (AD) from the Open Access Series of Imaging Studies (OASIS) database. We found specific locations in the genu and splenium of AD patients that are proportionally thinner than those of NCs. Callosal thickness in these regions combined with the Mini Mental State Examination scores differentiated AD from NC with 84% accuracy.

  14. Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures

    Science.gov (United States)

    Atar, Burcu; Kamata, Akihito

    2011-01-01

    The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…

  15. Accuracy of Bayes and Logistic Regression Subscale Probabilities for Educational and Certification Tests

    Science.gov (United States)

    Rudner, Lawrence

    2016-01-01

    In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…

  16. Large Scale Identification and Categorization of Protein Sequences Using Structured Logistic Regression

    DEFF Research Database (Denmark)

    Pedersen, Bjørn Panella; Ifrim, Georgiana; Liboriussen, Poul

    2014-01-01

    Abstract Background Structured Logistic Regression (SLR) is a newly developed machine learning tool first proposed in the context of text categorization. Current availability of extensive protein sequence databases calls for an automated method to reliably classify sequences and SLR seems well...

  17. A Note on Three Statistical Tests in the Logistic Regression DIF Procedure

    Science.gov (United States)

    Paek, Insu

    2012-01-01

    Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…

  18. A note on Bayesian logistic regression for spatial exponential family Gibbs point processes

    OpenAIRE

    Rajala, Tuomas

    2014-01-01

    Recently, a very attractive logistic regression inference method for exponential family Gibbs spatial point processes was introduced. We combined it with the technique of quadratic tangential variational approximation and derived a new Bayesian technique for analysing spatial point patterns. The technique is described in detail, and demonstrated on numerical examples.

  19. A Comparison of Logistic Regression, Neural Networks, and Classification Trees Predicting Success of Actuarial Students

    Science.gov (United States)

    Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard

    2010-01-01

    The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…

  20. Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis

    Science.gov (United States)

    Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John

    2012-01-01

    Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…

  2. A Generalized Logistic Regression Procedure to Detect Differential Item Functioning among Multiple Groups

    Science.gov (United States)

    Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul

    2011-01-01

    We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…

  3. Predictors of Placement Stability at the State Level: The Use of Logistic Regression to Inform Practice

    Science.gov (United States)

    Courtney, Jon R.; Prophet, Retta

    2011-01-01

    Placement instability is often associated with a number of negative outcomes for children. To gain state level contextual knowledge of factors associated with placement stability/instability, logistic regression was applied to selected variables from the New Mexico Adoption and Foster Care Administrative Reporting System dataset. Predictors…

  8. Inferential Models for Linear Regression

    Directory of Open Access Journals (Sweden)

    Zuoyi Zhang

    2011-09-01

    Full Text Available Linear regression is arguably one of the most widely used statistical methods in applications.  However, important problems, especially variable selection, remain a challenge for classical modes of inference.  This paper develops a recently proposed framework of inferential models (IMs in the linear regression context.  In general, an IM is able to produce meaningful probabilistic summaries of the statistical evidence for and against assertions about the unknown parameter of interest and, moreover, these summaries are shown to be properly calibrated in a frequentist sense.  Here we demonstrate, using simple examples, that the IM framework is promising for linear regression analysis --- including model checking, variable selection, and prediction --- and for uncertain inference in general.

  9. Dynamic logistic regression model and population attributable fraction to investigate the association between adherence, missed visits and mortality: a study of HIV-infected adults surviving the first year of ART

    Science.gov (United States)

    2013-01-01

    Background Adherence is one of the most important determinants of viral suppression and drug resistance in HIV-infected people receiving antiretroviral therapy (ART). Methods We examined the association between long-term mortality and poor adherence to ART in DART trial participants in Uganda and Zimbabwe randomly assigned to receive laboratory and clinical monitoring (LCM), or clinically driven monitoring (CDM). Since over 50% of all deaths in the DART trial occurred during the first year on ART, we focussed on participants continuing ART for 12 months to investigate the implications of longer-term adherence to treatment on mortality. Participants’ ART adherence was assessed by pill counts and structured questionnaires at 4-weekly clinic visits. We studied the effect of recent adherence history on the risk of death at the individual level (odds ratios from dynamic logistic regression model), and on mortality at the population level (population attributable fraction based on this model). Analyses were conducted separately for both randomization groups, adjusted for relevant confounding factors. Adherence behaviour was also confounded by a partial factorial randomization comparing structured treatment interruptions (STI) with continuous ART (CT). Results In the CDM arm a significant association was found between poor adherence to ART in the previous 3-9 months with increased mortality risk. In the LCM arm the association was not significant. The odds ratios for mortality in participants with poor adherence against those with optimal adherence was 1.30 (95% CI 0.78,2.10) in the LCM arm and 2.18 (1.47,3.22) in the CDM arm. The estimated proportions of deaths that could have been avoided with optimal adherence (population attributable fraction) in the LCM and CDM groups during the 5 years follow-up period were 16.0% (95% CI 0.7%,31.6%) and 33.1% (20.5%,44.8%), correspondingly. Conclusions Recurrent poor adherence determined even through simple measures is associated

  10. Heteroscedasticity checks for regression models

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    For checking heteroscedasticity in regression models, a unified approach is proposed for constructing test statistics in parametric and nonparametric regression models. For nonparametric regression, the test is not sensitive to the choice of the smoothing parameters involved in estimating the nonparametric regression function, and the limiting null distribution of the test statistic remains the same over a wide range of smoothing parameters. When the covariate is one-dimensional, the tests are, under some conditions, asymptotically distribution-free. In the high-dimensional case, the validity of bootstrap approximations is investigated. It is shown that a variant of the wild bootstrap is consistent while the classical bootstrap is not in the general case, but the latter is applicable if an extra assumption on the conditional variance of the squared error is imposed. A simulation study is performed to provide evidence of how the tests work and to compare them with tests that have appeared in the literature. The approach may readily be extended to handle partial linear and linear autoregressive models.

  11. Comparison and applicability of landslide susceptibility models based on landslide ratio-based logistic regression, frequency ratio, weight of evidence, and instability index methods in an extreme rainfall event

    Science.gov (United States)

    Wu, Chunhung

    2016-04-01

    Few studies have discussed the applicability of statistical landslide susceptibility (LS) models to extreme rainfall-induced landslide events. This research compares the applicability of LS models based on four methods, namely landslide ratio-based logistic regression (LRBLR), frequency ratio (FR), weight of evidence (WOE), and instability index (II), in an extreme rainfall-induced landslide case. The landslide inventory of the Chishan river watershed, southwestern Taiwan, after 2009 Typhoon Morakot is the main material in this research. The Chishan river watershed is a tributary of the Kaoping river watershed, a landslide- and erosion-prone watershed with an annual average suspended load of 3.6×10⁷ MT/yr (ranking 11th in the world). Typhoon Morakot struck southern Taiwan from Aug. 6-10, 2009 and dumped nearly 2,000 mm of rainfall on the Chishan river watershed; the 24-hour, 48-hour and 72-hour accumulated rainfall all exceeded the 200-year return period values. A total of 2,389 landslide polygons were extracted from SPOT 5 images after the typhoon, with a total landslide area of around 33.5 km², equal to a landslide ratio of 4.1%. The main landslide types based on Varnes' (1978) classification are rotational and translational slides. Two characteristics of this extreme rainfall-induced landslide event are the dense landslide distribution and the large share of downslope landslide areas owing to headward erosion and bank erosion during flooding; the downslope landslide area in the Chishan river watershed after 2009 Typhoon Morakot is 3.2 times the upslope landslide area. The prediction accuracy of the LS models based on the LRBLR, FR, WOE, and II methods has been shown to exceed 70%. The model performance and applicability of four models in a landslide-prone watershed with dense distribution of rainfall

  12. Multivariate Logistic Model to estimate Effective Rainfall for an Event

    Science.gov (United States)

    Singh, S. K.; Patil, Sachin; Bárdossy, A.

    2009-04-01

    Multivariate logistic models are widely used in the biological, medical, and social sciences, but logistic models are seldom applied to hydrological problems. A logistic function behaves linearly in the mid range and tends to be non-linear as it approaches the extremes; hence it is more flexible than a linear function and capable of dealing with skew-distributed variables. Such models therefore seem to have good potential for handling asymmetrically distributed hydrological variables of extreme occurrence. In this study, a logistic regression approach is implemented to derive a multivariate logistic function for effective rainfall; in the process, the runoff coefficient is assumed to be a Bernoulli-distributed dependent variable. A backward stepwise logistic regression procedure was performed to derive the logistic transfer function between the runoff coefficient and catchment as well as event variables (e.g., drainage density, soil moisture, etc.). The investigation was carried out using a database of 244 rainfall-runoff events from 42 mesoscale catchments located in south-west Germany. The performance of the derived logistic transfer function was compared with that of the SCS method for estimation of effective rainfall.
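
    As a rough illustration of the backward stepwise logistic regression procedure this record describes (not the authors' actual predictor set), the following sketch repeatedly drops the predictor with the largest p-value until all remaining p-values fall below a threshold, using statsmodels on simulated data.

```python
# Sketch: backward stepwise logistic regression - drop the least significant predictor
# until all remaining p-values are below a threshold. Predictor names and data are assumed.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 244
X = pd.DataFrame({
    "drainage_density": rng.normal(size=n),
    "soil_moisture": rng.normal(size=n),
    "antecedent_precip": rng.normal(size=n),
    "noise": rng.normal(size=n),                 # an irrelevant candidate predictor
})
logit = -0.3 + 1.0 * X["soil_moisture"] + 0.8 * X["antecedent_precip"]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))     # 1 = "high runoff coefficient" event

cols, alpha = list(X.columns), 0.05
while cols:
    res = sm.Logit(y, sm.add_constant(X[cols])).fit(disp=False)
    pvals = res.pvalues.drop("const")
    if pvals.max() <= alpha:
        break
    cols.remove(pvals.idxmax())                   # drop the least significant predictor
print("selected predictors:", cols)
print(res.params)
```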

  13. Bias of using odds ratio estimates in multinomial logistic regressions to estimate relative risk or prevalence ratio and alternatives

    Directory of Open Access Journals (Sweden)

    Suzi Alves Camey

    2014-01-01

    Full Text Available Recent studies have emphasized that there is no justification for using the odds ratio (OR) as an approximation of the relative risk (RR) or prevalence ratio (PR). Erroneous interpretations of the OR as RR or PR must be avoided, as several studies have shown that the OR is not a good approximation of these measures when the outcome is common (> 10%). For multinomial outcomes, multinomial logistic regression is commonly used; in this context, no studies have shown the impact of approximating the RR or PR with the OR. This study aimed to present and discuss alternatives to multinomial logistic regression based on robust Poisson regression and the log-binomial model. The approaches were compared by simulating various possible scenarios. The results showed that the proposed models give more precise and accurate estimates of the RR or PR than multinomial logistic regression, as in the binary-outcome case. Thus, for multinomial outcomes as well, the OR should not be used as an approximation of the RR or PR, since this may lead to incorrect conclusions.
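
    As an illustration of the robust-Poisson alternative this record recommends for estimating prevalence ratios when a binary outcome is common, here is a hedged sketch using statsmodels (a Poisson GLM with a robust "sandwich" covariance); the data are simulated, and the multinomial extension discussed in the paper is not shown.

```python
# Sketch: estimating a prevalence ratio (PR) with robust Poisson regression,
# one of the alternatives to the OR discussed in the record. Simulated binary outcome.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 2000
exposed = rng.binomial(1, 0.5, n)
p = np.where(exposed == 1, 0.30, 0.20)            # common outcome (>10%), true PR = 1.5
y = rng.binomial(1, p)

X = sm.add_constant(exposed)
res = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")   # robust SEs
print("estimated PR:", np.exp(res.params[1]))
```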

  14. Quantitative Models for Reverse Logistics

    NARCIS (Netherlands)

    M. Fleischmann (Moritz)

    2000-01-01

    Economic, marketing, and legislative considerations are increasingly leading companies to take back and recover their products after use. From a logistics perspective, these initiatives give rise to new goods flows from the user back to the producer. The management of these goods flows o

  16. Use of binary logistic regression technique with MODIS data to estimate wild fire risk

    Science.gov (United States)

    Fan, Hong; Di, Liping; Yang, Wenli; Bonnlander, Brian; Li, Xiaoyan

    2007-11-01

    Many forest fires occur across the globe each year, destroying life and property and strongly impacting ecosystems. In recent years, wildland fires and altered fire disturbance regimes have become a significant management and science problem affecting ecosystems and the wildland/urban interface across the United States and globally. In this paper, we discuss the estimation of 504 probability models for forecasting fire risk for 14 fuel types, 12 months, and lead times of one day, week, or month, using 19 years of historical fire data in addition to meteorological and vegetation variables. MODIS land products are utilized as a major data source, and binary logistic regression was adopted to estimate fire occurrence probability. In order to better model the change of fire risk with the transition of seasons, several spatial and temporal stratification strategies were applied. To explore the possibility of real-time prediction, the Matlab distributed computing toolbox was used to accelerate the prediction. Finally, this study evaluates and validates the predictions against collected ground truth. Validation results indicate that these fire risk models achieve nearly 70% prediction accuracy and that MODIS data are a promising data source for implementing near-real-time fire risk prediction.

  17. Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models

    Science.gov (United States)

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…

  18. Updated logistic regression equations for the calculation of post-fire debris-flow likelihood in the western United States

    Science.gov (United States)

    Staley, Dennis M.; Negri, Jacquelyn A.; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.

    2016-06-30

    Wildfire can significantly alter the hydrologic response of a watershed to the extent that even modest rainstorms can generate dangerous flash floods and debris flows. To reduce public exposure to hazard, the U.S. Geological Survey produces post-fire debris-flow hazard assessments for select fires in the western United States. We use publicly available geospatial data describing basin morphology, burn severity, soil properties, and rainfall characteristics to estimate the statistical likelihood that debris flows will occur in response to a storm of a given rainfall intensity. Using an empirical database and refined geospatial analysis methods, we defined new equations for the prediction of debris-flow likelihood using logistic regression methods. We showed that the new logistic regression model outperformed previous models used to predict debris-flow likelihood.

  19. Lasso logistic regression, GSoft and the cyclic coordinate descent algorithm: application to gene expression data.

    Science.gov (United States)

    Garcia-Magariños, Manuel; Antoniadis, Anestis; Cao, Ricardo; González-Manteiga, Wenceslao

    2010-01-01

    Statistical methods generating sparse models are of great value in the gene expression field, where the number of covariates (genes) under study runs into the thousands while the sample sizes seldom reach a hundred individuals. For phenotype classification, we propose different lasso logistic regression approaches with specific penalizations for each gene. These methods are based on a generalized soft-threshold (GSoft) estimator. We also show that a recent algorithm for convex optimization, namely the cyclic coordinate descent (CCD) algorithm, provides a way to solve the optimization problem significantly faster than with other competing methods. Viewing GSoft as an iterative thresholding procedure allows us to obtain the asymptotic properties of the resulting estimates in a straightforward manner. Results are obtained for simulated and real data. The leukemia and colon datasets are commonly used to evaluate new statistical approaches, so they are useful for establishing comparisons with similar methods. Furthermore, biological meaning is extracted from the leukemia results and compared with previous studies. In summary, the approaches presented here give rise to sparse, interpretable models that are competitive with similar methods developed in the field.
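
    To make the cyclic coordinate descent (CCD) idea concrete, here is a simplified numpy sketch of one common variant for L1-penalized logistic regression: an IRLS quadratic approximation with coordinate-wise soft-thresholding. It is a generic illustration, not the paper's GSoft estimator, and it uses a single global penalty rather than gene-specific penalties.

```python
# Sketch: cyclic coordinate descent for L1-penalized (lasso) logistic regression.
# Outer loop: quadratic (IRLS) approximation; inner loop: coordinate-wise soft-thresholding.
# Generic illustration with a single penalty lam, not the paper's gene-specific GSoft penalties.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ccd_lasso_logistic(X, y, lam, n_outer=20, n_inner=50):
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_outer):
        eta = X @ beta
        prob = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(prob * (1.0 - prob), 1e-5, None)      # IRLS working weights
        z = eta + (y - prob) / w                           # working response
        for _ in range(n_inner):
            for j in range(p):                             # cycle over coordinates
                r_j = z - X @ beta + X[:, j] * beta[j]     # partial residual
                num = np.sum(w * X[:, j] * r_j)
                den = np.sum(w * X[:, j] ** 2)
                beta[j] = soft_threshold(num, n * lam) / den
    return beta

# Tiny demo on simulated data with a few informative "genes"
rng = np.random.default_rng(7)
n, p = 100, 50
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:3] = [1.5, -1.0, 0.8]
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))
beta_hat = ccd_lasso_logistic(X, y, lam=0.05)
print("non-zero coefficients at indices:", np.flatnonzero(beta_hat))
```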

  20. Ultrasonic Diagnosis of Breast Nodular Lesions by Logistic Regression

    Institute of Scientific and Technical Information of China (English)

    傅增顺

    2012-01-01

    Objective: To establish a logistic regression model based on ultrasonographic characteristics for the diagnosis of breast nodular lesions. Methods: The characteristics of gray-scale ultrasonography (US), color Doppler flow imaging (CDFI) and some clinical symptoms were evaluated retrospectively in 205 breast nodular lesions confirmed by surgical pathology, and a logistic model for predicting malignancy of the lesions was obtained from these ultrasonographic characteristics and clinical symptoms. A receiver operating characteristic (ROC) curve was used to assess the performance of the logistic model. Results: Nine ultrasonographic characteristics entered the initial logistic model: posterior echo change, mass mobility, color Doppler flow grade within the lesion, spicule sign, strong echo halo sign, micro-calcification, capsule, aspect ratio, and structural change of the axillary lymph nodes. After screening, the three significant factors (posterior echo change, mass mobility and color Doppler flow grade within the lesion) were entered into a further logistic regression analysis to improve the goodness of fit. The area under the ROC curve of the logistic regression model was 0.981. Conclusion: A logistic regression model based on ultrasonographic characteristics can help in the differential diagnosis of benign and malignant breast lesions.

  1. Achieving Both Valid and Secure Logistic Regression Analysis on Aggregated Data from Different Private Sources

    CERN Document Server

    Hall, Rob; Fienberg, Stephen

    2011-01-01

    Preserving the privacy of individual databases when carrying out statistical calculations has a long history in statistics and has been the focus of much recent attention in machine learning. In this paper, we present a protocol for computing logistic regression when the data are held by separate parties without actually combining information sources, by exploiting results from the literature on multi-party secure computation. We provide only the final result of the calculation, in contrast with other methods that share intermediate values and thus present an opportunity for compromise of values in the combined database. Our paper has two themes: (1) the development of a secure protocol for computing the logistic parameters and a demonstration of its performance in practice, and (2) an amended protocol that speeds up the computation of the logistic function. We illustrate the nature of the calculations and their accuracy using an extract of data from the Current Population Survey divided between two parties.

  3. Nowcasting sunshine number using logistic modeling

    Science.gov (United States)

    Brabec, Marek; Badescu, Viorel; Paulescu, Marius

    2013-04-01

    In this paper, we present a formalized approach to statistical modeling of the sunshine number, a binary indicator of whether the Sun is covered by clouds, introduced previously by Badescu (Theor Appl Climatol 72:127-136, 2002). Our statistical approach is based on a Markov chain and logistic regression and yields fully specified probability models whose unknown parameters are relatively easily identified and estimated from empirical data (observed sunshine number and sunshine stability number series). We discuss the general structure of the model and its advantages, demonstrate its performance on real data and compare its results with those of a classical ARIMA approach as a competitor. Since the model parameters have a clear interpretation, we also illustrate how, for example, their inter-seasonal stability can be tested. We conclude with an outlook on future developments oriented toward models allowing a practically desirable smooth transition between data observed at different frequencies, and with a short discussion of the technical problems that such a goal brings.
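
    To illustrate the Markov-chain-plus-logistic-regression idea in this record, the sketch below fits a first-order model in which the probability that the Sun is unobscured at time t depends only on the state at t-1; the data are simulated and the single lag is an assumption (the paper's full model is richer).

```python
# Sketch: first-order Markov logistic model for a binary "sunshine number" series,
# P(s_t = 1 | s_{t-1}) = logistic(b0 + b1 * s_{t-1}). Simulated data, single lag only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
T = 2000
s = np.zeros(T, dtype=int)
p_stay_sunny, p_become_sunny = 0.9, 0.2        # assumed transition probabilities
for t in range(1, T):
    p = p_stay_sunny if s[t - 1] == 1 else p_become_sunny
    s[t] = rng.binomial(1, p)

y, lag = s[1:], s[:-1]
res = sm.Logit(y, sm.add_constant(lag)).fit(disp=False)
probs = res.predict(sm.add_constant(np.array([0, 1])))
print("P(sunny | previously cloudy) =", round(probs[0], 3))   # ~0.2
print("P(sunny | previously sunny)  =", round(probs[1], 3))   # ~0.9
```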

  4. Determination of sex using cephalo-facial dimensions by discriminant function and logistic regression equations

    Directory of Open Access Journals (Sweden)

    Twisha Shah

    2016-06-01

    Full Text Available The aim is to bring together new anthropological techniques and knowledge about populations that are least known. The present study was performed on 901 healthy Gujarati volunteers (676 males, 225 females) within the age group of 21-50 years, with the aim of examining whether any correlation exists between sex and the cephalofacial measures maximum head length, maximum head breadth, bizygomatic breadth, bigonial diameter, morphological facial length, physiognomic facial length, biocular breadth and total cephalofacial height. Discriminant function and logistic regression methods were also compared to determine which gives the best accuracy for sex determination. Mean values of the cephalofacial dimensions were higher in males than in females. The most reliable results were obtained using logistic regression equations in males (92%) and the discriminant function in females (80.9%). Our study conclusively establishes a definite, statistically significant sexual dimorphism in the Gujarati population based on cephalo-facial dimensions.

  5. Comparison of Artificial Neural Networks and Logistic Regression Analysis in the Credit Risk Prediction

    Directory of Open Access Journals (Sweden)

    Hüseyin BUDAK

    2012-11-01

    Full Text Available Credit scoring is a vital topic for banks, since there is a need to use limited financial resources more effectively. Several credit scoring methods are used by banks; one of them is to estimate whether a credit-applying customer's repayments will be regular or not. In this study, artificial neural networks and logistic regression analysis were used to support banks' credit risk prediction and to estimate whether a credit-applying customer's repayments will be regular or not. The results of the study showed that the artificial neural network method is more reliable than logistic regression analysis for estimating a credit-applying customer's repayment behaviour.

  6. The use of logistic regression to enhance risk assessment and decision making by mental health administrators.

    Science.gov (United States)

    Menditto, Anthony A; Linhorst, Donald M; Coleman, James C; Beck, Niels C

    2006-04-01

    Development of policies and procedures to contend with the risks presented by elopement, aggression, and suicidal behaviors are long-standing challenges for mental health administrators. Guidance in making such judgments can be obtained through the use of a multivariate statistical technique known as logistic regression. This procedure can be used to develop a predictive equation that is mathematically formulated to use the best combination of predictors, rather than considering just one factor at a time. This paper presents an overview of logistic regression and its utility in mental health administrative decision making. A case example of its application is presented using data on elopements from Missouri's long-term state psychiatric hospitals. Ultimately, the use of statistical prediction analyses tempered with differential qualitative weighting of classification errors can augment decision-making processes in a manner that provides guidance and flexibility while wrestling with the complex problem of risk assessment and decision making.

  7. A logistic regression based approach for the prediction of flood warning threshold exceedance

    Science.gov (United States)

    Diomede, Tommaso; Trotter, Luca; Stefania Tesini, Maria; Marsigli, Chiara

    2016-04-01

    A method based on logistic regression is proposed for the prediction of river level threshold exceedance at short (+0-18h) and medium (+18-42h) lead times. The aim of the study is to provide a valuable tool for the issuing of warnings by the authority responsible for public safety in case of flood. The role of different precipitation periods as predictors of the exceedance of a fixed river level has been investigated, in order to derive significant information for flood forecasting. Based on catchment-averaged values, a separation of "antecedent" and "peak-triggering" rainfall amounts as independent variables is attempted. In particular, the following flood-related precipitation periods have been considered: (i) the period from 1 to n days before the forecast issue time, which may be relevant for soil saturation, (ii) the last 24 hours, which may be relevant for the current water level in the river, and (iii) the period from 0 to x hours ahead of the forecast issue time, when the flood-triggering precipitation generally occurs. Several combinations and values of these predictors have been tested to optimise the implementation of the method. In particular, the period for the antecedent precipitation ranges between 5 and 45 days; the state of the river can be represented by the last 24-h precipitation or, alternatively, by the current river level. The flood-triggering precipitation has been accumulated over the next 18 hours (for the short lead time) and 36-42 hours (for the medium lead time). The proposed approach requires a specific implementation of logistic regression for each river section and warning threshold. The method's performance has been evaluated over the Santerno river catchment (about 450 km²) in the Emilia-Romagna Region, northern Italy. A statistical analysis in terms of false alarms, misses and related scores was carried out using an 8-year database. The results are quite satisfactory, with slightly better performances

  8. Differential item functioning (DIF) analyses of health-related quality of life instruments using logistic regression

    DEFF Research Database (Denmark)

    Scott, Neil W; Fayers, Peter M; Aaronson, Neil K

    2010-01-01

    Differential item functioning (DIF) methods can be used to determine whether different subgroups respond differently to particular items within a health-related quality of life (HRQoL) subscale, after allowing for overall subgroup differences in that scale. This article reviews issues that arise when testing for DIF in HRQoL instruments. We focus on logistic regression methods, which are often used because of their efficiency, simplicity and ease of application.

  10. Regularized logistic regression with adjusted adaptive elastic net for gene selection in high dimensional cancer classification.

    Science.gov (United States)

    Algamal, Zakariya Yahya; Lee, Muhammad Hisyam

    2015-12-01

    Cancer classification and gene selection in high-dimensional data have been popular research topics in genetics and molecular biology. Recently, adaptive regularized logistic regression using the elastic net regularization, known as the adaptive elastic net, has been successfully applied in high-dimensional cancer classification to estimate gene coefficients and perform gene selection simultaneously. The adaptive elastic net originally used elastic net estimates as the initial weights; however, using these weights may not be preferable for two reasons: first, the elastic net estimator is biased in selecting genes, and second, it does not perform well when the pairwise correlations between variables are not high. Adjusted adaptive regularized logistic regression (AAElastic) is proposed to address these issues and to encourage grouping effects simultaneously. The real data results indicate that AAElastic is significantly more consistent in selecting genes than the other three competing regularization methods. Additionally, the classification performance of AAElastic is comparable to the adaptive elastic net and better than the other regularization methods. Thus, we can conclude that AAElastic is a reliable adaptive regularized logistic regression method for high-dimensional cancer classification.
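
    AAElastic itself is not part of standard libraries; as a hedged baseline, the sketch below shows the plain elastic-net regularized logistic regression that adaptive variants build on, fitted with scikit-learn's saga solver on simulated high-dimensional data (the paper's adaptive, gene-specific weighting step is not reproduced).

```python
# Sketch: plain elastic-net regularized logistic regression for gene selection,
# the baseline on which adaptive variants such as AAElastic build.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# p >> n situation typical of gene expression data (simulated stand-in)
X, y = make_classification(n_samples=80, n_features=2000, n_informative=10, random_state=0)

clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=0.1, max_iter=5000)
clf.fit(X, y)
selected = np.flatnonzero(clf.coef_[0])          # "genes" with non-zero coefficients
print("number of selected features:", selected.size)
```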

  11. Logistic Regression Analysis of Contrast-Enhanced Ultrasound and Conventional Ultrasound Characteristics of Sub-centimeter Thyroid Nodules.

    Science.gov (United States)

    Zhao, Rui-Na; Zhang, Bo; Yang, Xiao; Jiang, Yu-Xin; Lai, Xing-Jian; Zhang, Xiao-Yan

    2015-12-01

    The purpose of the study described here was to determine specific characteristics of thyroid microcarcinoma (TMC) and explore the value of contrast-enhanced ultrasound (CEUS) combined with conventional ultrasound (US) in the diagnosis of TMC. Characteristics of 63 patients with TMC and 39 with benign sub-centimeter thyroid nodules were retrospectively analyzed. Multivariate logistic regression analysis was performed to determine independent risk factors. Four variables were included in the logistic regression models: age, shape, blood flow distribution and enhancement pattern. The area under the receiver operating characteristic curve was 0.919. With 0.113 selected as the cutoff value, sensitivity, specificity, positive predictive value, negative predictive value and accuracy were 90.5%, 82.1%, 89.1%, 84.2% and 87.3%, respectively. Independent risk factors for TMC determined with the combination of CEUS and conventional US were age, shape, blood flow distribution and enhancement pattern. Age was negatively correlated with malignancy, whereas shape, blood flow distribution and enhancement pattern were positively correlated. The logistic regression model involving CEUS and conventional US was found to be effective in the diagnosis of sub-centimeter thyroid nodules.
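
    A rough sketch of the general workflow above (multivariate logistic regression, ROC analysis, and a probability cutoff) on simulated data; the variables mirror those named in the abstract, but the values and labels are invented, and the 0.113 cutoff is reused only to echo the reported analysis.

        # Sketch: multivariate logistic regression with ROC analysis and a probability cutoff.
        # Variable names follow the abstract; the data are simulated for illustration only.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(1)
        n = 102
        X = np.column_stack([
            rng.normal(45, 12, n),        # age
            rng.integers(0, 2, n),        # shape (0 = regular, 1 = irregular)
            rng.integers(0, 2, n),        # blood flow distribution
            rng.integers(0, 2, n),        # enhancement pattern
        ])
        y = rng.integers(0, 2, n)         # 1 = malignant (toy labels)

        model = LogisticRegression().fit(X, y)
        p = model.predict_proba(X)[:, 1]
        print("AUC:", round(roc_auc_score(y, p), 3))

        cutoff = 0.113                    # cutoff reported in the abstract, reused for illustration
        pred = (p >= cutoff).astype(int)
        sens = (pred[y == 1] == 1).mean()
        spec = (pred[y == 0] == 0).mean()
        print(f"sensitivity={sens:.3f}  specificity={spec:.3f}")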

  12. IDENTIFIKASI FAKTOR PREDIKSI DIAGNOSIS TINGKAT KEGANASAN KANKER PAYUDARA METODE STEPWISE BINARY LOGISTIC REGRESSION [Identification of predictive factors for diagnosing the malignancy grade of breast cancer using the stepwise binary logistic regression method]

    Directory of Open Access Journals (Sweden)

    Retno Aulia Vinarti

    2014-01-01

    Full Text Available The World Health Organization (WHO) reported that deaths caused by cancer worldwide have increased significantly over the last four years. This trend is also reflected in the rise in breast cancer cases. In Indonesia, these cases are also among the leading causes of adult female deaths. According to the Hospital Information System, breast cancer patients account for 28.7% of inpatient and outpatient cancer care. More than 40% of all cancers can be prevented with early detection. Information technology can contribute through data mining techniques that shorten diagnosis time, improve accuracy, and support the selection of factors for early detection of breast cancer. The stepwise binary logistic regression method has the advantage of adding and removing independent variables according to their level of significance in the model. Based on the analysis of the weighting method, the four variables that deserve the most attention are the area of the cancer (area), its fineness (smoothness), the number of concave points of the cancer nucleus (concave points), and its grayness level (texture). Thus, the accuracy and processing speed of diagnosing the severity of breast cancer can be improved with this method.

  13. Prediction of cannabis and cocaine use in adolescence using decision trees and logistic regression

    Directory of Open Access Journals (Sweden)

    Alfonso L. Palmer

    2010-01-01

    Full Text Available Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls) whose mean age was 15.59 years. Logistic regression and decision trees were carried out in order to model the consumption of cannabis and cocaine. The results show that the use of legal substances and committing fraudulence or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraudulence or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraudulence or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.

  14. Determining the Impact of Residential Neighbourhood Crime on Housing Investment Using Logistic Regression

    Directory of Open Access Journals (Sweden)

    Sunday Emmanuel Olajide

    2016-12-01

    Full Text Available This paper discusses the impact of criminal activities on residential property value. With regard to criminal activities, the paper emphasizes the contribution of each component of property crime. One thousand (1,000) sets of structured questionnaires were administered to the residents of residential estates within the South Western States of Nigeria, of which 467 were considered usable after the data screening. Purposive and systematic sampling techniques were used, while logistic regression was used to determine the impact of each of the components of residential property crime on housing investment. The results showed P-values of 0.000, 0.322, 0.335, 0.545 and 0.992 for violent crime, incivilities and street crime, burglary and theft, vandalism and robbery respectively. However, the R², which represents the generalisation of the impact of neighbourhood crime on housing investment, was 44% and the aggregate P-value was 0.000. Using the Hosmer and Lemeshow (H-L) goodness-of-fit test, the model had approximately 89% predictive probability, which is considered excellent. This indicates that the alternative hypothesis, that residential neighbourhood crime impacts residential property value, is upheld. The policy implication of this result is that no effort should be spared in combating residential neighbourhood crime in order to boost and encourage housing investment.

  15. Binary Logistic Regression Analysis of Foramen Magnum Dimensions for Sex Determination

    Science.gov (United States)

    Kamath, Venkatesh Gokuldas

    2015-01-01

    Purpose. The structural integrity of foramen magnum is usually preserved in fire accidents and explosions due to its resistant nature and secluded anatomical position and this study attempts to determine its sexing potential. Methods. The sagittal and transverse diameters and area of foramen magnum of seventy-two skulls (41 male and 31 female) from south Indian population were measured. The analysis was done using Student's t-test, linear correlation, histogram, Q-Q plot, and Binary Logistic Regression (BLR) to obtain a model for sex determination. The predicted probabilities of BLR were analysed using Receiver Operating Characteristic (ROC) curve. Result. BLR analysis and ROC curve revealed that the predictability of the dimensions in sexing the crania was 69.6% for sagittal diameter, 66.4% for transverse diameter, and 70.3% for area of foramen. Conclusion. The sexual dimorphism of foramen magnum dimensions is established. However, due to considerable overlapping of male and female values, it is unwise to singularly rely on the foramen measurements. However, considering the high sex predictability percentage of its dimensions in the present study and the studies preceding it, the foramen measurements can be used to supplement other sexing evidence available so as to precisely ascertain the sex of the skeleton. PMID:26346917

  16. Variable Selection for Functional Logistic Regression in fMRI Data Analysis

    Directory of Open Access Journals (Sweden)

    Nedret BILLOR

    2015-03-01

    Full Text Available This study was motivated by a classification problem in Functional Magnetic Resonance Imaging (fMRI), a noninvasive imaging technique which allows an experimenter to take images of a subject's brain over time. As fMRI studies usually have a small number of subjects and we assume that there is a smooth, underlying curve describing the observations in fMRI data, this results in incredibly high-dimensional datasets that are functional in nature. High dimensionality is one of the biggest problems in statistical analysis of fMRI data. There is also a need for the development of better classification methods. One of the best things about the fMRI technique is its noninvasiveness. If statistical classification methods are improved, it could aid the advancement of noninvasive diagnostic techniques for mental illness or even degenerative diseases such as Alzheimer's. In this paper, we develop a variable selection technique, which tackles high dimensionality and correlation problems in fMRI data, based on L1 regularization-group lasso for the functional logistic regression model where the response is binary and represents two separate classes; the predictors are functional. We assess our method with a simulation study and an application to a real fMRI dataset.
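
    The group lasso idea can be illustrated with a small proximal-gradient implementation in which each functional predictor contributes one group of basis coefficients that is kept or dropped as a block; this is a toy sketch under invented data, step size and penalty level, not the authors' estimator.

        # Sketch: group-lasso-penalized logistic regression via proximal gradient descent.
        # Each functional predictor is represented by a group of basis coefficients; the whole
        # group is retained or dropped together. Data and tuning values are illustrative.
        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def group_soft_threshold(v, t):
            norm = np.linalg.norm(v)
            return np.zeros_like(v) if norm <= t else (1 - t / norm) * v

        def group_lasso_logreg(X, y, groups, lam=0.1, step=0.5, n_iter=1000):
            """groups: list of index arrays, one per functional predictor."""
            beta = np.zeros(X.shape[1])
            for _ in range(n_iter):
                grad = X.T @ (sigmoid(X @ beta) - y) / len(y)   # gradient of mean logistic loss
                beta = beta - step * grad
                for g in groups:                                 # proximal step, group by group
                    beta[g] = group_soft_threshold(beta[g], step * lam * np.sqrt(len(g)))
            return beta

        rng = np.random.default_rng(0)
        X = rng.normal(size=(80, 12))
        groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]  # 3 predictors x 4 basis coefficients
        y = (X[:, :4].sum(axis=1) + 0.3 * rng.normal(size=80) > 0).astype(float)

        beta = group_lasso_logreg(X, y, groups)
        for i, g in enumerate(groups):
            print(f"predictor {i}: kept" if np.any(beta[g] != 0) else f"predictor {i}: dropped")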

  17. Logistics and Transport - a conceptual model

    DEFF Research Database (Denmark)

    Jespersen, Per Homann; Drewes, Lise

    2004-01-01

    This paper describes how the freight transport sector is influenced by logistical principles of production and distribution. It introduces new ways of understanding freight transport as an integrated part of the changing trends of mobility. By introducing a conceptual model for understanding...... the interaction between logistics and transport, it points to ways to overcome inherent methodological difficulties when studying this relation...

  18. Oral health-related risk behaviours and attitudes among Croatian adolescents--multiple logistic regression analysis.

    Science.gov (United States)

    Spalj, Stjepan; Spalj, Vedrana Tudor; Ivanković, Luida; Plancak, Darije

    2014-03-01

    The aim of this study was to explore the patterns of oral health-related risk behaviours in relation to dental status, attitudes, motivation and knowledge among Croatian adolescents. The assessment was conducted in a sample of 750 male subjects - military recruits aged 18-28 in Croatia - using a questionnaire and clinical examination. Mean number of decayed, missing and filled teeth (DMFT) and the Significant Caries Index (SiC) were calculated. Multiple logistic regression models were created for analysis. Although models of risk behaviours were statistically significant, their explanatory values were quite low. Five of them--rare toothbrushing, not using hygiene auxiliaries, rarely visiting the dentist, toothache as a primary reason to visit the dentist, and demand for tooth extraction due to toothache--had the highest explanatory values ranging from 21-29% and correctly classified 73-89% of subjects. Toothache as a primary reason to visit the dentist, extraction as preferable therapy when toothache occurs, not having brushing education in school and frequent gingival bleeding were significantly related to the population with high caries experience (DMFT ≥ 14 according to SiC), producing odds ratios of 1.6 (95% CI 1.07-2.46), 2.1 (95% CI 1.29-3.25), 1.8 (95% CI 1.21-2.74) and 2.4 (95% CI 1.21-2.74) respectively. The DMFT ≥ 14 model had a low explanatory value of 6.5% and correctly classified 83% of subjects. It can be concluded that oral health-related risk behaviours are interrelated. Poor association was seen between attitudes concerning oral health and oral health-related risk behaviours, indicating insufficient motivation to change lifestyle and habits. Self-reported oral hygiene habits were not strongly related to dental status.

  19. Shock index correlates with extravasation on angiographs of gastrointestinal hemorrhage: a logistic regression analysis.

    Science.gov (United States)

    Nakasone, Yutaka; Ikeda, Osamu; Yamashita, Yasuyuki; Kudoh, Kouichi; Shigematsu, Yoshinori; Harada, Kazunori

    2007-01-01

    We applied multivariate analysis to the clinical findings in patients with acute gastrointestinal (GI) hemorrhage and compared the relationship between these findings and angiographic evidence of extravasation. Our study population consisted of 46 patients with acute GI bleeding. They were divided into two groups. In group 1 we retrospectively analyzed 41 angiograms obtained in 29 patients (age range, 25-91 years; average, 71 years). Their clinical findings, including the shock index (SI), diastolic blood pressure, hemoglobin, platelet counts, and age, were quantitatively analyzed. In group 2, consisting of 17 patients (age range, 21-78 years; average, 60 years), we prospectively applied statistical analysis by a logistic regression model to their clinical findings and then assessed 21 angiograms obtained in these patients to determine whether our model was useful for predicting the presence of angiographic evidence of extravasation. On 18 of 41 (43.9%) angiograms in group 1 there was evidence of extravasation; in 3 patients it was demonstrated only by selective angiography. Factors significantly associated with angiographic visualization of extravasation were the SI and patient age. For differentiation between cases with and cases without angiographic evidence of extravasation, the maximum cutoff point was between 0.51 and 0.53. Of the 21 angiograms obtained in group 2, 13 (61.9%) showed evidence of extravasation; in 1 patient it was demonstrated only on selective angiograms. We found that in 90% of the cases, the prospective application of our model correctly predicted the angiographically confirmed presence or absence of extravasation. We conclude that in patients with GI hemorrhage, angiographic visualization of extravasation is associated with the pre-embolization SI. Patients with a high SI value should undergo study to facilitate optimal treatment planning.

  20. Logistic quantile regression provides improved estimates for bounded avian counts: A case study of California Spotted Owl fledgling production

    Science.gov (United States)

    Cade, Brian S.; Noon, Barry R.; Scherer, Rick D.; Keane, John J.

    2017-01-01

    Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical conditional distribution of a bounded discrete random variable. The logistic quantile regression model requires that counts are randomly jittered to a continuous random variable, logit transformed to bound them between specified lower and upper values, then estimated in conventional linear quantile regression, repeating the 3 steps and averaging estimates. Back-transformation to the original discrete scale relies on the fact that quantiles are equivariant to monotonic transformations. We demonstrate this statistical procedure by modeling 20 years of California Spotted Owl fledgling production (0−3 per territory) on the Lassen National Forest, California, USA, as related to climate, demographic, and landscape habitat characteristics at territories. Spotted Owl fledgling counts increased nonlinearly with decreasing precipitation in the early nesting period, in the winter prior to nesting, and in the prior growing season; with increasing minimum temperatures in the early nesting period; with adult compared to subadult parents; when there was no fledgling production in the prior year; and when percentage of the landscape surrounding nesting sites (202 ha) with trees ≥25 m height increased. Changes in production were primarily driven by changes in the proportion of territories with 2 or 3 fledglings. Average variances of the discrete cumulative distributions of the estimated fledgling counts indicated that temporal changes in climate and parent age class explained 18% of the annual variance in owl fledgling production, which was 34% of the total variance. Prior fledgling production explained as much of
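
    A compact sketch of the jitter / logit-transform / quantile-regression / average recipe described above, for a count bounded between 0 and 3; the single simulated predictor stands in for the climate and habitat covariates, and the bounds and quantile are illustrative.

        # Sketch of logistic quantile regression for a bounded count (0-3), via jittering,
        # a logit transform, repeated linear quantile regression, and averaging of estimates.
        import numpy as np
        import statsmodels.api as sm
        from statsmodels.regression.quantile_regression import QuantReg

        rng = np.random.default_rng(0)
        n = 300
        x = rng.normal(size=n)
        counts = np.clip(np.round(1.5 + 0.8 * x + rng.normal(size=n)), 0, 3)   # bounded counts 0..3

        lower, upper = 0.0, 4.0         # jittered values live strictly inside (0, 4)
        tau = 0.5                       # quantile of interest
        X = sm.add_constant(x)

        coefs = []
        for _ in range(100):            # repeat the jittering and average the estimates
            y_jit = counts + rng.uniform(1e-6, 1 - 1e-6, size=n)               # step 1: jitter
            z = np.log((y_jit - lower) / (upper - y_jit))                      # step 2: logit transform
            coefs.append(QuantReg(z, X).fit(q=tau).params)                     # step 3: linear quantile regression
        beta = np.mean(coefs, axis=0)

        # Back-transform a prediction to the bounded scale (quantiles are equivariant
        # to monotone transformations).
        x_new = np.array([1.0, 0.5])    # intercept + predictor value
        z_hat = x_new @ beta
        q_hat = (lower + upper * np.exp(z_hat)) / (1 + np.exp(z_hat))
        print(f"estimated {tau} quantile of the jittered count at x=0.5: {q_hat:.2f}")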

  1. Selected Logistics Models and Techniques.

    Science.gov (United States)

    1984-09-01

    ACCESS PROCEDURE: On-Line System (OLS), UNINET. RCA maintains proprietary control of this model, and the model is available only through a lease arrangement. • SPONSOR: ASD/ACCC

  2. Analysis of sparse data in logistic regression in medical research: A newer approach

    Directory of Open Access Journals (Sweden)

    S Devika

    2016-01-01

    Full Text Available Background and Objective: In the analysis of dichotomous type response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs) with very wide 95% confidence intervals (CIs) (OR: >999.999, 95% CI: 999.999). In this paper, we addressed this issue by using the penalized logistic regression (PLR) method. Materials and Methods: Data from a case-control study on hyponatremia and hiccups conducted in Christian Medical College, Vellore, Tamil Nadu, India were used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. A simulation dataset was created with different sample sizes and with a different number of covariates. Results: A total of 23 cases and 50 controls were used for the analysis of ordinary and PLR methods. The main exposure variable hyponatremia was present in nine (39.13%) of the cases and in four (8.0%) of the controls. Of the 23 hiccup cases, all were males and among the controls, 46 (92.0%) were males. Thus, the complete separation between gender and the disease group led to an infinite OR with 95% CI (OR: >999.999, 95% CI: 999.999), whereas there was a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48) using PLR. After adjusting for all the confounding variables, hyponatremia entailed 7.9 (95% CI: 2.06, 38.86) times higher risk for the development of hiccups as was found using PLR, whereas there was an overestimation of risk, OR: 10.76 (95% CI: 2.17, 53.41), using the conventional method. The simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates. Conclusions: PLR is almost equal to the ordinary logistic regression when the sample size is large and is superior in small cell
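
    A minimal Newton-Raphson sketch of Firth-type (Jeffreys-prior) penalized logistic regression, the kind of bias-reduced estimator PLR refers to, on toy data with complete separation where ordinary maximum likelihood would diverge; this is an illustrative implementation, not the one used in the paper.

        # Sketch: Firth-type (Jeffreys-prior) penalized logistic regression.
        # The penalty keeps coefficients finite even under complete separation.
        import numpy as np

        def firth_logistic(X, y, n_iter=50, tol=1e-8):
            beta = np.zeros(X.shape[1])
            for _ in range(n_iter):
                pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
                W = pi * (1.0 - pi)
                XtWX_inv = np.linalg.inv(X.T @ (X * W[:, None]))
                A = X * np.sqrt(W)[:, None]
                h = np.einsum("ij,jk,ik->i", A, XtWX_inv, A)        # leverages of the weighted hat matrix
                score = X.T @ (y - pi + h * (0.5 - pi))             # Firth-modified score
                step = XtWX_inv @ score
                beta += step
                if np.max(np.abs(step)) < tol:
                    break
            return beta

        # Toy data with complete separation: x perfectly predicts y, so ordinary maximum
        # likelihood would push the slope to infinity; the Firth estimate stays finite.
        x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
        y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
        X = np.column_stack([np.ones_like(x), x])
        print("Firth estimates (intercept, slope):", firth_logistic(X, y).round(3))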

  3. Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy)

    Science.gov (United States)

    Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele

    2015-11-01

    The aim of this work is to define reliable susceptibility models for shallow landslides using Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on October 1st 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructures. Landslides, mainly types such as earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials which cover the underlying metamorphic bedrock. The work has been carried out with the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with observation of high resolution aerial colour orthophoto; ii) identification of landslide source areas; iii) data preparation of landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview on existing relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on results and performance of the models; vii) evaluation of the predictive capabilities of the models using ROC curve, AUC and contingency tables; viii) comparison of model results and obtained susceptibility maps; and ix) analysis of temporal variation of landslide susceptibility related to input parameter changes. Models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. Land use and wildfire
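
    Steps v) to vii) above can be illustrated with a small comparison of the two classifiers on synthetic data using ROC/AUC; the generated features merely stand in for the landslide controlling factors.

        # Sketch: comparing Logistic Regression and Random Forests on a binary susceptibility
        # problem with AUC. Synthetic features stand in for slope, land use, lithology, etc.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=2000, n_features=8, n_informative=5, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

        for name, clf in [("Logistic Regression", LogisticRegression(max_iter=1000)),
                          ("Random Forest", RandomForestClassifier(n_estimators=300, random_state=0))]:
            clf.fit(X_tr, y_tr)
            auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
            print(f"{name}: AUC = {auc:.3f}")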

  4. Logistics Modeling for Lunar Exploration Systems

    Science.gov (United States)

    Andraschko, Mark R.; Merrill, R. Gabe; Earle, Kevin D.

    2008-01-01

    The extensive logistics required to support extended crewed operations in space make effective modeling of logistics requirements and deployment critical to predicting the behavior of human lunar exploration systems. This paper discusses the software that has been developed as part of the Campaign Manifest Analysis Tool in support of strategic analysis activities under the Constellation Architecture Team - Lunar. The described logistics module enables definition of logistics requirements across multiple surface locations and allows for the transfer of logistics between those locations. A key feature of the module is the loading algorithm that is used to efficiently load logistics by type into carriers and then onto landers. Attention is given to the capabilities and limitations of this loading algorithm, particularly with regard to surface transfers. These capabilities are described within the context of the object-oriented software implementation, with details provided on the applicability of using this approach to model other human exploration scenarios. Some challenges of incorporating probabilistics into this type of logistics analysis model are discussed at a high level.

  5. Cost Calculation Model for Logistics Service Providers

    Directory of Open Access Journals (Sweden)

    Zoltán Bokor

    2012-11-01

    Full Text Available The exact calculation of logistics costs has become a real challenge in logistics and supply chain management. It is essential to gain reliable and accurate costing information to attain efficient resource allocation within the logistics service provider companies. Traditional costing approaches, however, may not be sufficient to reach this aim in case of complex and heterogeneous logistics service structures. So this paper intends to explore the ways of improving the cost calculation regimes of logistics service providers and show how to adopt the multi-level full cost allocation technique in logistics practice. After determining the methodological framework, a sample cost calculation scheme is developed and tested by using estimated input data. Based on the theoretical findings and the experiences of the pilot project it can be concluded that the improved costing model contributes to making logistics costing more accurate and transparent. Moreover, the relations between costs and performances also become more visible, which enhances the effectiveness of logistics planning and controlling significantly

  6. Multinomial Logistic Regression Predicted Probability Map To Visualize The Influence Of Socio-Economic Factors On Breast Cancer Occurrence in Southern Karnataka

    Science.gov (United States)

    Madhu, B.; Ashok, N. C.; Balasubramanian, S.

    2014-11-01

    Multinomial logistic regression analysis was used to develop a statistical model that can predict the probability of breast cancer in Southern Karnataka using breast cancer occurrence data from 2007-2011. Independent socio-economic variables describing breast cancer occurrence, such as age, education, occupation, parity, type of family, health insurance coverage, residential locality and socioeconomic status, were obtained for each case. The models were developed as follows: i) spatial visualization of the urban-rural distribution of breast cancer cases that were obtained from the Bharat Hospital and Institute of Oncology; ii) socio-economic risk factors describing the breast cancer occurrences were compiled for each case. These data were then analysed using multinomial logistic regression analysis in SPSS statistical software, relations between the occurrence of breast cancer across the socio-economic status and the influence of other socio-economic variables were evaluated, and multinomial logistic regression models were constructed; iii) the model that best predicted the occurrence of breast cancer was identified. This multivariate logistic regression model has been entered into a geographic information system, and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka were created. This study demonstrates that multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer occurrence in Southern Karnataka.
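
    A hedged sketch of a multinomial logistic regression whose predicted class probabilities could be exported and joined to spatial units in a GIS; the outcome categories, predictors and data below are invented for illustration and are not the study's variables.

        # Sketch: multinomial logistic regression with categorical predictors; the predicted
        # class probabilities can be attached to map units in a GIS for probability mapping.
        import numpy as np
        import pandas as pd
        from sklearn.compose import ColumnTransformer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import OneHotEncoder

        rng = np.random.default_rng(0)
        n = 400
        df = pd.DataFrame({
            "age_group": rng.choice(["<40", "40-60", ">60"], n),
            "education": rng.choice(["none", "school", "college"], n),
            "residence": rng.choice(["urban", "rural"], n),
            "ses": rng.choice(["low", "middle", "high"], n),   # illustrative multi-category outcome
        })

        pre = ColumnTransformer([("cat", OneHotEncoder(), ["age_group", "education", "residence"])])
        model = make_pipeline(pre, LogisticRegression(max_iter=1000))
        model.fit(df[["age_group", "education", "residence"]], df["ses"])

        # Predicted probabilities per outcome category for the first few cases.
        probs = model.predict_proba(df[["age_group", "education", "residence"]].head())
        print(pd.DataFrame(probs, columns=model.classes_).round(3))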

  7. Linear Logistic Test Modeling with R

    Science.gov (United States)

    Baghaei, Purya; Kubinger, Klaus D.

    2015-01-01

    The present paper gives a general introduction to the linear logistic test model (Fischer, 1973), an extension of the Rasch model with linear constraints on item parameters, along with eRm (an R package to estimate different types of Rasch models; Mair, Hatzinger, & Mair, 2014) functions to estimate the model and interpret its parameters. The…

  8. Large scale identification and categorization of protein sequences using structured logistic regression.

    Directory of Open Access Journals (Sweden)

    Bjørn P Pedersen

    Full Text Available BACKGROUND: Structured Logistic Regression (SLR is a newly developed machine learning tool first proposed in the context of text categorization. Current availability of extensive protein sequence databases calls for an automated method to reliably classify sequences and SLR seems well-suited for this task. The classification of P-type ATPases, a large family of ATP-driven membrane pumps transporting essential cations, was selected as a test-case that would generate important biological information as well as provide a proof-of-concept for the application of SLR to a large scale bioinformatics problem. RESULTS: Using SLR, we have built classifiers to identify and automatically categorize P-type ATPases into one of 11 pre-defined classes. The SLR-classifiers are compared to a Hidden Markov Model approach and shown to be highly accurate and scalable. Representing the bulk of currently known sequences, we analysed 9.3 million sequences in the UniProtKB and attempted to classify a large number of P-type ATPases. To examine the distribution of pumps on organisms, we also applied SLR to 1,123 complete genomes from the Entrez genome database. Finally, we analysed the predicted membrane topology of the identified P-type ATPases. CONCLUSIONS: Using the SLR-based classification tool we are able to run a large scale study of P-type ATPases. This study provides proof-of-concept for the application of SLR to a bioinformatics problem and the analysis of P-type ATPases pinpoints new and interesting targets for further biochemical characterization and structural analysis.

  9. Accuracy of Bayes and Logistic Regression Subscale Probabilities for Educational and Certification Tests

    Directory of Open Access Journals (Sweden)

    Lawrence Rudner

    2016-07-01

    Full Text Available In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows that the conclusion also applies to the probabilities estimated from short subtests of mental abilities and that small samples can yield excellent accuracy. The calculated Bayes probabilities can be used to provide meaningful examinee feedback regardless of whether the test was originally designed to be unidimensional.
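
    The comparison can be sketched on simulated binary item responses for two mastery classes, scoring both classifiers on a common test set as the calibration sample grows; the class definitions and response probabilities below are illustrative, not those of the examinations studied.

        # Sketch: Naïve Bayes vs logistic regression accuracy as the calibration sample grows,
        # on simulated binary item responses for two latent mastery classes.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.naive_bayes import BernoulliNB

        rng = np.random.default_rng(0)

        def simulate(n_examinees, n_items=10):
            cls = rng.integers(0, 2, n_examinees)                     # 0 = non-master, 1 = master
            p_correct = np.where(cls[:, None] == 1, 0.75, 0.45)       # item-response probabilities
            X = (rng.random((n_examinees, n_items)) < p_correct).astype(int)
            return X, cls

        X_test, y_test = simulate(2000)
        for n in (50, 200, 1000):
            X_cal, y_cal = simulate(n)
            nb = BernoulliNB().fit(X_cal, y_cal)
            lr = LogisticRegression(max_iter=1000).fit(X_cal, y_cal)
            print(f"n={n:5d}  NB acc={nb.score(X_test, y_test):.3f}  LR acc={lr.score(X_test, y_test):.3f}")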

  10. Calculation of a Health Index of Oil-Paper Transformers Insulation with Binary Logistic Regression

    OpenAIRE

    Weijie Zuo; Haiwen Yuan; Yuwei Shang; Yingyi Liu; Tao Chen

    2016-01-01

    This paper presents a new method for calculating the insulation health index (HI) of oil-paper transformers rated under 110 kV to provide a snapshot of health condition using binary logistic regression. Oil breakdown voltage (BDV), total acidity of oil, 2-Furfuraldehyde content, and dissolved gas analysis (DGA) are singled out in this method as the input data for determining HI. A sample of transformers is used to test the proposed method. The results are compared with the results calculated ...

  11. Estimating the causes of traffic accidents using logistic regression and discriminant analysis.

    Science.gov (United States)

    Karacasu, Murat; Ergül, Barış; Altin Yavuz, Arzu

    2014-01-01

    Factors that affect traffic accidents have been analysed in various ways. In this study, we use the methods of logistic regression and discriminant analysis to determine the damages due to injury and non-injury accidents in the Eskisehir Province. Data were obtained from the accident reports of the General Directorate of Security in Eskisehir; 2552 traffic accidents between January and December 2009 were investigated regarding whether they resulted in injury. According to the results, the effects of traffic accidents were reflected in the variables. These results provide a wealth of information that may aid future measures toward the prevention of undesired results.

  12. Calculation of a Health Index of Oil-Paper Transformers Insulation with Binary Logistic Regression

    Directory of Open Access Journals (Sweden)

    Weijie Zuo

    2016-01-01

    Full Text Available This paper presents a new method for calculating the insulation health index (HI) of oil-paper transformers rated under 110 kV to provide a snapshot of health condition using binary logistic regression. Oil breakdown voltage (BDV), total acidity of oil, 2-Furfuraldehyde content, and dissolved gas analysis (DGA) are singled out in this method as the input data for determining HI. A sample of transformers is used to test the proposed method. The results are compared with the results calculated for the same set of transformers using fuzzy logic. The comparison results show that the proposed method is reliable and effective in evaluating transformer health condition.

  13. Utilização de estratificação e modelo de regressão logística na análise de dados de estudos caso-controle [Use of stratification and the logistic regression model in the analysis of data from case-control studies]

    Directory of Open Access Journals (Sweden)

    Suely Godoy Agostinho Gimeno

    1995-08-01

    Full Text Available Data from a case-control study of esophageal cancer were used as an example of the use of multivariate analysis with stratification and logistic regression. Eighty-five cases and 292 controls were classified according to sex, age, and smoking and drinking habits. The point estimates of the odds ratios were similar, and the two techniques were considered complementary.

  14. A dynamic distribution model for combat logistics

    OpenAIRE

    Gue, Kevin R.

    1999-01-01

    New warfare doctrine for the U.S. Marine Corps emphasizes small, highly mobile forces supported from the sea, rather than from large, land based supply points. The goal of logistics planners is to support these forces with as little inventory on land as possible. We show how to configure the land based distribution system over time to support a given battle plan with minimum inventory. Logistics planners could use the model to support tactical or operational decision making.

  15. Controlling Type I Error Rates in Assessing DIF for Logistic Regression Method Combined with SIBTEST Regression Correction Procedure and DIF-Free-Then-DIF Strategy

    Science.gov (United States)

    Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung

    2014-01-01

    The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…

  17. Applicability of the Ricketts' posteroanterior cephalometry for sex determination using logistic regression analysis in Hispano American Peruvians

    Directory of Open Access Journals (Sweden)

    Ivan Perez

    2016-01-01

    Full Text Available Background: The Ricketts' posteroanterior (PA) cephalometry seems to be the most widely used, and it has not been tested by multivariate statistics for sex determination. Objective: The objective was to determine the applicability of Ricketts' PA cephalometry for sex determination using logistic regression analysis. Materials and Methods: The logistic models were estimated at distinct age cutoffs (all ages, 11 years, 13 years, and 15 years) in a database from 1,296 Hispano American Peruvians between 5 years and 44 years of age. Results: The logistic models were composed of six cephalometric measurements; the accuracy achieved by resubstitution varied between 60% and 70%, and all the variables, with one exception, exhibited a direct relationship with the probability of being classified as male; the nasal width exhibited an indirect relationship. Conclusion: The maxillary and facial widths were present in all models and may represent a sexual dimorphism indicator. The accuracy found was lower than that reported in the literature, and Ricketts' PA cephalometry may not be adequate for sex determination. The indirect relationship of the nasal width in models with data from patients of 12 years of age or less may be a trait related to age or a characteristic of the studied population, which could be better studied and confirmed.

  18. Forecasting and Analysis of Agricultural Product Logistics Demand in Tibet Based on Combination Forecasting Model

    Institute of Scientific and Technical Information of China (English)

    Wenfeng; YANG

    2015-01-01

    Over the years, the development of logistics in Tibet has fallen behind transport. Since the opening of the Qinghai-Tibet Railway in 2006, the opportunity for development of modern logistics has been brought to Tibet. Logistics demand analysis and forecasting is a prerequisite for regional logistics planning. By establishing an indicator system for the logistics demand of agricultural products, an agricultural product logistics principal component regression model, a gray forecasting model, and a BP neural network forecasting model are built. Because of the limitations of any single model, a quadratic-linear programming model is used to build a combination forecasting model to predict the logistics demand scale of agricultural products in Tibet over the next five years. The empirical analysis results show that the combination forecasting model is superior to the single forecasting models and has higher precision, so the combination forecasting model will have much wider application prospects and development potential in the field of logistics.

  19. Parameter identification in the logistic STAR model

    DEFF Research Database (Denmark)

    Ekner, Line Elvstrøm; Nejstgaard, Emil

    We propose a new and simple parametrization of the so-called speed of transition parameter of the logistic smooth transition autoregressive (LSTAR) model. The new parametrization highlights that a consequence of the well-known identification problem of the speed of transition parameter is that ...

  20. Education-Based Gaps in eHealth: A Weighted Logistic Regression Approach.

    Science.gov (United States)

    Amo, Laura

    2016-10-12

    Persons with a college degree are more likely to engage in eHealth behaviors than persons without a college degree, compounding the health disadvantages of undereducated groups in the United States. However, the extent to which quality of recent eHealth experience reduces the education-based eHealth gap is unexplored. The goal of this study was to examine how eHealth information search experience moderates the relationship between college education and eHealth behaviors. Based on a nationally representative sample of adults who reported using the Internet to conduct the most recent health information search (n=1458), I evaluated eHealth search experience in relation to the likelihood of engaging in different eHealth behaviors. I examined whether Internet health information search experience reduces the eHealth behavior gaps among college-educated and noncollege-educated adults. Weighted logistic regression models were used to estimate the probability of different eHealth behaviors. College education was significantly positively related to the likelihood of 4 eHealth behaviors. In general, eHealth search experience was negatively associated with health care behaviors, health information-seeking behaviors, and user-generated or content sharing behaviors after accounting for other covariates. Whereas Internet health information search experience has narrowed the education gap in terms of likelihood of using email or Internet to communicate with a doctor or health care provider and likelihood of using a website to manage diet, weight, or health, it has widened the education gap in the instances of searching for health information for oneself, searching for health information for someone else, and downloading health information on a mobile device. The relationship between college education and eHealth behaviors is moderated by Internet health information search experience in different ways depending on the type of eHealth behavior. After controlling for college

  1. A Two-Stage Penalized Logistic Regression Approach to Case-Control Genome-Wide Association Studies

    Directory of Open Access Journals (Sweden)

    Jingyuan Zhao

    2012-01-01

    Full Text Available We propose a two-stage penalized logistic regression approach to case-control genome-wide association studies. This approach consists of a screening stage and a selection stage. In the screening stage, main-effect and interaction-effect features are screened by using L1-penalized logistic likelihoods. In the selection stage, the retained features are ranked by the logistic likelihood with the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) and Jeffreys prior penalty (Firth, 1993); a sequence of nested candidate models is formed, and the models are assessed by a family of extended Bayesian information criteria (J. Chen and Z. Chen, 2008). The proposed approach is applied to the analysis of the prostate cancer data of the Cancer Genetic Markers of Susceptibility (CGEMS) project in the National Cancer Institute, USA. Simulation studies are carried out to compare the approach with the pair-wise multiple testing approach (Marchini et al., 2005) and the LASSO-patternsearch algorithm (Shi et al., 2007).
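
    A sketch of the screening stage only, using an L1-penalized logistic regression to retain a small set of candidate SNP main effects; the SCAD/Jeffreys-prior ranking and EBIC-based selection of the second stage are not reproduced, and the simulated genotypes are illustrative.

        # Sketch: L1-penalized logistic regression as a screening step for SNP main effects.
        # Genotypes are simulated with 0/1/2 minor-allele counts; values are illustrative.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n, p = 500, 2000
        X = rng.integers(0, 3, size=(n, p)).astype(float)      # toy minor-allele counts
        beta = np.zeros(p)
        beta[[10, 200, 1500]] = 0.8                            # three causal SNPs
        eta = X @ beta
        y = (eta - eta.mean() + rng.logistic(size=n) > 0).astype(int)

        screen = LogisticRegression(penalty="l1", solver="liblinear", C=0.05, max_iter=5000)
        screen.fit(X, y)
        kept = np.flatnonzero(screen.coef_.ravel() != 0)
        print(f"{len(kept)} SNPs retained for the selection stage, e.g.: {kept[:10]}")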

  2. MULTIVARIATE STEPWISE LOGISTIC REGRESSION ANALYSIS ON RISK FACTORS OF VENTILATOR-ASSOCIATED PNEUMONIA IN COMPREHENSIVE ICU

    Institute of Scientific and Technical Information of China (English)

    管军; 杨兴易; 赵良; 林兆奋; 郭昌星; 李文放

    2003-01-01

    Objective To investigate the incidence, crude mortality and independent risk factors of ventilator-associated pneumonia (VAP) in a comprehensive ICU in China. Methods The clinical and microbiological data of all 97 patients receiving mechanical ventilation (>48 h) in our comprehensive ICU from January 1999 to December 2000 were retrospectively collected and analysed. Firstly, several statistically significant risk factors were screened out with univariate analysis; then independent risk factors were determined with multivariate stepwise logistic regression analysis. Results The incidence of VAP was 54.64% (15.60 cases per 1000 ventilation days) and the crude mortality 47.42%. The interval between the establishment of an artificial airway and diagnosis of VAP was 6.9 ± 4.3 d. Univariate analysis suggested that indwelling naso-gastric tube, corticosteroid, acid inhibitor, third-generation cephalosporin/imipenem, non-infectious lung disease, and extrapulmonary infection were the statistically significant risk factors of

  3. Logistic Regression Analysis on Factors Affecting Adoption of RiceFish Farming in North Iran

    Institute of Scientific and Technical Information of China (English)

    Seyyed Ali NOORHOSSEINI-NIYAKI; Mohammad Sadegh ALLAHYARI

    2012-01-01

    We evaluated the factors influencing the adoption of rice-fish farming in the Tavalesh region near the Caspian Sea in northern Iran. We conducted a survey with open-ended questions. Data were collected from 184 respondents (61 adopters and 123 non-adopters) randomly sampled from selected villages and analyzed using logistic regression and multiresponse analysis. Family size, number of contacts with an extension agent, participation in extension-education activities, membership in social institutions and the presence of farm workers were the most important socioeconomic factors for the adoption of the rice-fish farming system. In addition, economic problems were the most common issue reported by adopters. Other issues such as lack of access to appropriate fish food, losses of fish, lack of access to high quality fish fingerlings and dehydration and poor water quality were also important to a number of farmers.

  4. Peripheral vascular trauma in children: related factors by the logistic regression method

    Directory of Open Access Journals (Sweden)

    Raquel Nogueira Avelar Silva

    2014-03-01

    Full Text Available The objective of the present study was to identify the factors related to “peripheral vascular trauma” in children aged six months to 12 years. This prospective cohort study included children whose peripheral vein was punctured for the first time per side, and excluded those with high/complete healing of trauma signs after removing the catheter. Daily clinical evaluations were performed at intervals shorter than 24 hours. Data were treated according to Pearson’s test and the logistic regression method. Among the 14 variables considered as intervening, four were statistically associated with the occurrence of trauma: dirtiness and humidity in the catheter insertion site, catheter caliber, and age. A causal relationship was found between the intervening variables and the outcome, “peripheral vascular trauma”, thus contributing to knowledge about peripheral venous puncture in children aged six months to 12 years. Descriptors: Child; Nursing Diagnosis; Veins; Injuries.

  5. A simple and efficient algorithm for gene selection using sparse logistic regression.

    Science.gov (United States)

    Shevade, S K; Keerthi, S S

    2003-11-22

    This paper gives a new and efficient algorithm for the sparse logistic regression problem. The proposed algorithm is based on the Gauss-Seidel method and is asymptotically convergent. It is simple and extremely easy to implement; it neither uses any sophisticated mathematical programming software nor needs any matrix operations. It can be applied to a variety of real-world problems like identifying marker genes and building a classifier in the context of cancer diagnosis using microarray data. The gene selection method suggested in this paper is demonstrated on two real-world data sets and the results were found to be consistent with the literature. The implementation of this algorithm is available at the site http://guppy.mpe.nus.edu.sg/~mpessk/SparseLOGREG.shtml Supplementary material is available at the site http://guppy.mpe.nus.edu.sg/~mpessk/SparseLOGREG.shtml

  6. Logistic regression analysis of the risk factors of acute renal failure complicating limb war injuries

    Directory of Open Access Journals (Sweden)

    Chang-zhi CHENG

    2011-06-01

    Full Text Available Objective To explore the risk factors for acute renal failure (ARF) complicating war injuries of the limbs. Methods The clinical data of 352 patients with limb injuries admitted to the 303 Hospital of PLA from 1968 to 2002 were retrospectively analyzed. The patients were divided into an ARF group (n=9) and a non-ARF group (n=343) according to the occurrence of ARF, and a case-control study was carried out. Ten factors which might lead to death were analyzed by logistic regression to screen the risk factors for ARF, including cause of trauma, shock after injury, time of admission to hospital after injury, injured sites, combined trauma, number of surgical procedures, presence of foreign matter, features of fractures, amputation, and tourniquet time. Results Fifteen of the 352 patients died (4.3%); among them 7 patients (46.7%) died of ARF, 3 (20.0%) of pulmonary embolism, 3 (20.0%) of gas gangrene, and 2 (13.3%) of multiple organ failure. Univariate analysis revealed that shock, time before admission to hospital, amputation and tourniquet time were risk factors for ARF in the wounded with limb injuries, while the logistic regression analysis showed that only amputation was a risk factor for ARF (P < 0.05). Conclusion ARF is the primary cause of death in the wounded with limb injury. Prompt and accurate treatment and an optimal time for amputation may be beneficial in decreasing the incidence and mortality of ARF in the wounded with severe limb injury and ischemic necrosis.

  7. CREDIT SCORING MODELING WITH STATE-DEPENDENT SAMPLE SELECTION: A COMPARISON STUDY WITH THE USUAL LOGISTIC MODELING

    Directory of Open Access Journals (Sweden)

    Paulo H. Ferreira

    2015-04-01

    Full Text Available Statistical methods have been widely employed to assess the capabilities of credit scoring classification models in order to reduce the risk of wrong decisions when granting credit facilities to clients. The predictive quality of a classification model can be evaluated based on measures such as sensitivity, specificity, predictive values, accuracy, correlation coefficients and information theoretical measures, such as relative entropy and mutual information. In this paper we analyze the performance of a naive logistic regression model, a logistic regression with state-dependent sample selection model and a bounded logistic regression model via a large simulation study. Also, as a case study, the methodology is illustrated on a data set extracted from a Brazilian retail bank portfolio. Our simulation results so far revealed that there is no statistically significant difference in terms of predictive capacity among the naive logistic regression models, the logistic regression with state-dependent sample selection models and the bounded logistic regression models. However, there is a difference between the distributions of the estimated default probabilities from these three statistical modeling techniques, with the naive logistic regression models and the bounded logistic regression models always underestimating such probabilities, particularly in the presence of balanced samples, which are common in practice.

  8. Applying waste logistics modeling to regional planning

    Energy Technology Data Exchange (ETDEWEB)

    Holter, G.M.; Khawaja, A.; Shaver, S.R.; Peterson, K.L.

    1995-05-01

    Waste logistics modeling is a powerful analytical technique that can be used for effective planning of future solid waste storage, treatment, and disposal activities. Proper waste management is essential for preventing unacceptable environmental degradation from ongoing operations, and is also a critical part of any environmental remediation activity. Logistics modeling allows for analysis of alternate scenarios for future waste flowrates and routings, facility schedules, and processing or handling capacities. Such analyses provide an increased understanding of the critical needs for waste storage, treatment, transport, and disposal while there is still adequate lead time to plan accordingly. They also provide a basis for determining the sensitivity of these critical needs to the various system parameters. This paper discusses the application of waste logistics modeling concepts to regional planning. In addition to ongoing efforts to aid in planning for a large industrial complex, the Pacific Northwest Laboratory (PNL) is currently involved in implementing waste logistics modeling as part of the planning process for material recovery and recycling within a multi-city region in the western US.

  9. Logistics Chains in Freight Transport Modelling

    NARCIS (Netherlands)

    Davydenko, I.Y.

    2015-01-01

    The flow of trade is not equal to transport flows, mainly due to the fact that warehouses and distribution facilities are used as intermediary stops on the way from production locations to the points of consumption or further rework of goods. This thesis proposes a logistics chain model, which estim

  11. Planning model of purchasing logistics in outsourcing

    Directory of Open Access Journals (Sweden)

    Igor JAKOMIN

    2014-03-01

    Full Text Available It is often the case that when preparing their offers, potential outsourcers of logistic activities do not thoroughly research all the activities that have an influence on the process of logistics. Consequently, they prepare relatively expensive offers (that can later lead to greater unexpected costs) which, in many cases, business partners decide against and persist with their own existing methods of doing business. The original contribution to science in this article is a model that will aid better understanding of dealing with problems and will, in practice, serve as a tool for the successful execution of business offers by outsourcers. Following research we have discovered, and are able to confirm, that despite the high start-up costs of the outsourcing, in the long term the company can reduce logistic costs. The model presented serves as an in-depth analysis of the company which enables the definition of favourable and optimal offers for outsourcing. The model shown helps to minimise the influence of mistrust and emphasises the importance of reducing the logistic costs with outsourcing.

  12. Linear Logistic Test Modeling with R

    Directory of Open Access Journals (Sweden)

    Purya Baghaei

    2014-01-01

    Full Text Available The present paper gives a general introduction to the linear logistic test model (Fischer, 1973), an extension of the Rasch model with linear constraints on item parameters, along with eRm (an R package to estimate different types of Rasch models; Mair, Hatzinger, & Mair, 2014) functions to estimate the model and interpret its parameters. The applications of the model in test validation, hypothesis testing, cross-cultural studies of test bias, rule-based item generation, and investigating construct-irrelevant factors which contribute to item difficulty are explained. The model is applied to an English as a foreign language reading comprehension test and the results are discussed.

  13. Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.

    Science.gov (United States)

    Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H

    2006-01-01

    Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.

  14. Least Square Support Vector Machine Classifier vs a Logistic Regression Classifier on the Recognition of Numeric Digits

    Directory of Open Access Journals (Sweden)

    Danilo A. López-Sarmiento

    2013-11-01

    Full Text Available This paper compares the performance of a multi-class least squares support vector machine (LS-SVM) against a multi-class logistic regression classifier on the problem of recognizing handwritten numeric digits (0-9). The comparison used a data set consisting of 5,000 images of handwritten numeric digits (500 images for each number from 0-9), each image of 20 x 20 pixels. The inputs to each of the systems were vectors of 400 dimensions corresponding to each image (no feature extraction was performed). Both classifiers used a OneVsAll strategy to enable multi-classification and a random cross-validation function for the process of minimizing the cost function. The metrics of comparison were precision and training time under the same computational conditions. Both techniques evaluated showed a precision above 95%, with LS-SVM slightly more accurate. In terms of computational cost, however, we found a marked difference: LS-SVM training requires 16.42% less time than that required by the logistic regression model under the same low computational conditions.
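
    A rough sketch of the comparison using scikit-learn's built-in 8x8 digits in place of the 20x20 images, an explicit one-vs-all logistic regression, and an ordinary kernel SVC standing in for the least squares SVM (which scikit-learn does not provide); accuracy and timing figures will of course differ from those reported.

        # Sketch: one-vs-all logistic regression vs a kernel SVC on handwritten digits.
        # The 8x8 scikit-learn digits stand in for the study's 20x20 images.
        import time
        from sklearn.datasets import load_digits
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.multiclass import OneVsRestClassifier
        from sklearn.svm import SVC

        X, y = load_digits(return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

        for name, clf in [("one-vs-all logistic regression",
                           OneVsRestClassifier(LogisticRegression(max_iter=5000))),
                          ("SVM (RBF kernel)", SVC(kernel="rbf", gamma="scale"))]:
            t0 = time.perf_counter()
            clf.fit(X_tr, y_tr)
            elapsed = time.perf_counter() - t0
            print(f"{name}: accuracy={clf.score(X_te, y_te):.3f}, training time={elapsed:.2f}s")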

  15. Assessing change with the extended logistic model.

    Science.gov (United States)

    Cristante, Francesca; Robusto, Egidio

    2007-11-01

    The purpose of this article is to define a method for the assessment of change. A reinterpretation of the extended logistic model is proposed. The extended logistic model for the assessment of change (ELMAC) allows the definition of a time parameter which is supposed to identify whether change occurs during a period of time, given a specific event or phenomenon. The assessment of a trend of change through time, on the basis of the time parameter which is estimated at different successive occasions during a period of time, is also considered. In addition, a dispersion parameter is calculated which identifies whether change is consistent at each time point. The issue of independence is taken into account both in relation to the time parameter and the dispersion parameter. An application of the ELMAC in a learning process is presented. The interpretation of the model parameters and the model fit statistics is consistent with expectations.

  16. A Logistic Regression Analysis of Turkey's 15-Year-Olds' Scoring above the OECD Average on the PISA'09 Reading Assessment

    Science.gov (United States)

    Kasapoglu, Koray

    2014-01-01

    This study aims to investigate which factors are associated with Turkey's 15-year-olds' scoring above the OECD average (493) on the PISA'09 reading assessment. Collected from a total of 4,996 15-year-old students from Turkey, data were analyzed by logistic regression analysis in order to model the data of students who were split into two: (1)…

  17. Improved Testing and Specifications of Smooth Transition Regression Models

    OpenAIRE

    Escribano, Álvaro; Jordá, Óscar

    1997-01-01

    This paper extends previous work in Escribano and Jordá (1997) and introduces new LM specification procedures to choose between Logistic and Exponential Smooth Transition Regression (STR) Models. These procedures are simpler, consistent and more powerful than those previously available in the literature. An analysis of the properties of Taylor approximations around the transition function of STR models permits one to understand why these procedures work better and it suggests ways to improve te...

  18. Potential misinterpretation of treatment effects due to use of odds ratios and logistic regression in randomized controlled trials.

    Directory of Open Access Journals (Sweden)

    Mirjam J Knol

    Full Text Available BACKGROUND: In randomized controlled trials (RCTs), the odds ratio (OR) can substantially overestimate the risk ratio (RR) if the incidence of the outcome is over 10%. This study determined the frequency of use of ORs, the frequency of overestimation of the OR as compared with its accompanying RR in published RCTs, and we assessed how often regression models that calculate RRs were used. METHODS: We included 288 RCTs published in 2008 in five major general medical journals (Annals of Internal Medicine, British Medical Journal, Journal of the American Medical Association, Lancet, New England Journal of Medicine). If an OR was reported, we calculated the corresponding RR and the percentage by which the OR overestimated the RR on the log scale. RESULTS: Of 193 RCTs with a dichotomous primary outcome, 24 (12.4%) presented a crude and/or adjusted OR for the primary outcome. In five RCTs (2.6%), the OR differed more than 100% from its accompanying RR on the log scale. Forty-one of all included RCTs (n = 288; 14.2%) presented ORs for other outcomes, or for subgroup analyses. Nineteen of these RCTs (6.6%) had at least one OR that deviated more than 100% from its accompanying RR on the log scale. Of 53 RCTs that adjusted for baseline variables, 15 used logistic regression. Alternative methods to estimate RRs were only used in four RCTs. CONCLUSION: ORs and logistic regression are often used in RCTs and in many articles the OR did not approximate the RR. Although the authors did not explicitly misinterpret these ORs as RRs, misinterpretation by readers can seriously affect treatment decisions and policy making.
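    The gap between the two measures is easy to check from a 2 x 2 table. The short sketch below uses made-up counts (not data from the study) and one plausible reading of the log-scale comparison described above, namely the relative difference between ln(OR) and ln(RR).

```python
# Sketch: how far an odds ratio (OR) drifts from the risk ratio (RR) when the
# outcome is common. Counts are hypothetical, not taken from the reviewed RCTs.
import math

def or_rr(events_t, n_t, events_c, n_c):
    risk_t, risk_c = events_t / n_t, events_c / n_c
    rr = risk_t / risk_c
    odds_ratio = (risk_t / (1 - risk_t)) / (risk_c / (1 - risk_c))
    # one reading of "percentage difference on the log scale"
    overest = 100 * (math.log(odds_ratio) - math.log(rr)) / math.log(rr)
    return rr, odds_ratio, overest

print(or_rr(40, 100, 20, 100))   # common outcome: RR = 2.0, OR about 2.67
print(or_rr(4, 100, 2, 100))     # rare outcome:   OR stays close to RR
```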

  19. Short-Run Asset Selection using a Logistic Model

    Directory of Open Access Journals (Sweden)

    Walter Gonçalves Junior

    2011-06-01

    Full Text Available Investors constantly look for significant predictors and accurate models to forecast future results, whose occasional efficacy ends up being neutralized by market efficiency. Regardless, such predictors are widely used in the search for better (and more unique) perceptions. This paper aims to investigate to what extent some of the most notorious indicators have discriminatory power to select stocks, and whether it is feasible to build models with such variables that could anticipate the stocks with good performance. To that end, logistic regressions were conducted with stocks traded at Bovespa, using the selected indicators as explanatory variables. Among the variables investigated in this study, returns relative to the Bovespa Index, liquidity, the Sharpe ratio, ROE, MB, size and age proved to be significant predictors. Half-year logistic models were also examined and adjusted in order to check whether their discriminatory power was acceptable for asset selection.

  20. Delivery Time Reliability Model of Logistics Network

    OpenAIRE

    Liusan Wu; Qingmei Tan; Yuehui Zhang

    2013-01-01

    Natural disasters like earthquake and flood will surely destroy the existing traffic network, usually accompanied by delivery delay or even network collapse. A logistics-network-related delivery time reliability model defined by a shortest-time entropy is proposed as a means to estimate the actual delivery time reliability. The less the entropy is, the stronger the delivery time reliability remains, and vice versa. The shortest delivery time is computed separately based on two different assum...

  1. A robust optimization model for green regional logistics network design with uncertainty in future logistics demand

    Directory of Open Access Journals (Sweden)

    Dezhi Zhang

    2015-12-01

    Full Text Available This article proposes a new model to address the design problem of a sustainable regional logistics network with uncertainty in future logistics demand. In the proposed model, the future logistics demand is assumed to be a random variable with a given probability distribution. A set of chance constraints with regard to logistics service capacity and environmental impacts is incorporated to consider the sustainability of logistics network design. The proposed model is formulated as a two-stage robust optimization problem. The first-stage problem, before the realization of future logistics demand, aims to minimize a risk-averse objective by determining the optimal location and size of logistics parks, taking CO2 emission taxes into consideration. The second stage, after the uncertain logistics demand has been determined, is a scenario-based stochastic logistics service route choice equilibrium problem. A heuristic solution algorithm, which is a combination of the penalty function method, a genetic algorithm, and a Gauss–Seidel decomposition approach, is developed to solve the proposed model. An illustrative example is given to show the application of the proposed model and solution algorithm. The findings show that the total social welfare of the logistics system depends very much on the level of uncertainty in future logistics demand, the capital budget for logistics parks, and the confidence levels of the chance constraints.

  2. The implementation of rare events logistic regression to predict the distribution of mesophotic hard corals across the main Hawaiian Islands

    Directory of Open Access Journals (Sweden)

    Lindsay M. Veazey

    2016-07-01

    Full Text Available Predictive habitat suitability models are powerful tools for cost-effective, statistically robust assessment of the environmental drivers of species distributions. The aim of this study was to develop predictive habitat suitability models for two genera of scleractinian corals (Leptoseris and Montipora) found within the mesophotic zone across the main Hawaiian Islands. The mesophotic zone (30–180 m) is challenging to reach, and therefore historically understudied, because it falls between the maximum limit of SCUBA divers and the minimum typical working depth of submersible vehicles. Here, we implement a logistic regression with rare events corrections to account for the scarcity of presence observations within the dataset. These corrections reduced the coefficient error and improved overall prediction success (73.6% and 74.3%) for both original regression models. The final models included depth, rugosity, slope, mean current velocity, and wave height as the best environmental covariates for predicting the occurrence of the two genera in the mesophotic zone. Using an objectively selected theta ("presence") threshold, the predicted presence probability values (average of 0.051 for Leptoseris and 0.040 for Montipora) were translated to spatially-explicit habitat suitability maps of the main Hawaiian Islands at 25 m grid cell resolution. Our maps are the first of their kind to use extant presence and absence data to examine the habitat preferences of these two dominant mesophotic coral genera across Hawai‘i.
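    One widely used rare-events correction, King and Zeng's prior correction of the intercept, can be sketched as below. The data, covariates and the assumed population presence fraction are synthetic stand-ins, not the Hawaiian survey data, and the full rare-events procedure also corrects the slope estimates for small-sample bias, which this sketch omits.

```python
# Sketch of the prior-correction form of rare-events logistic regression
# (King & Zeng, 2001): fit on a case-control-style subsample, then shift the
# intercept back toward the true (rare) event fraction tau.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
N = 50_000
X = rng.normal(size=(N, 3))                      # e.g. depth, rugosity, slope (standardized)
p = 1 / (1 + np.exp(-(-4.0 + X @ np.array([0.8, 0.5, -0.3]))))
y = rng.binomial(1, p)                           # rare presences (a few percent)
tau = y.mean()                                   # "true" presence fraction in the population

# Case-control style subsample: keep every presence, thin the absences heavily.
keep = (y == 1) | (rng.random(N) < 0.05)
Xs, ys = X[keep], y[keep]

fit = sm.Logit(ys, sm.add_constant(Xs)).fit(disp=0)
ybar = ys.mean()                                 # presence fraction in the selected sample
beta = fit.params.copy()
beta[0] -= np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))   # intercept prior correction
print("sample presence rate:", round(ybar, 3), "population rate:", round(tau, 3))
print("uncorrected intercept:", round(fit.params[0], 2), "corrected:", round(beta[0], 2))
```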

  3. The implementation of rare events logistic regression to predict the distribution of mesophotic hard corals across the main Hawaiian Islands.

    Science.gov (United States)

    Veazey, Lindsay M; Franklin, Erik C; Kelley, Christopher; Rooney, John; Frazer, L Neil; Toonen, Robert J

    2016-01-01

    Predictive habitat suitability models are powerful tools for cost-effective, statistically robust assessment of the environmental drivers of species distributions. The aim of this study was to develop predictive habitat suitability models for two genera of scleractinian corals (Leptoseris and Montipora) found within the mesophotic zone across the main Hawaiian Islands. The mesophotic zone (30-180 m) is challenging to reach, and therefore historically understudied, because it falls between the maximum limit of SCUBA divers and the minimum typical working depth of submersible vehicles. Here, we implement a logistic regression with rare events corrections to account for the scarcity of presence observations within the dataset. These corrections reduced the coefficient error and improved overall prediction success (73.6% and 74.3%) for both original regression models. The final models included depth, rugosity, slope, mean current velocity, and wave height as the best environmental covariates for predicting the occurrence of the two genera in the mesophotic zone. Using an objectively selected theta ("presence") threshold, the predicted presence probability values (average of 0.051 for Leptoseris and 0.040 for Montipora) were translated to spatially-explicit habitat suitability maps of the main Hawaiian Islands at 25 m grid cell resolution. Our maps are the first of their kind to use extant presence and absence data to examine the habitat preferences of these two dominant mesophotic coral genera across Hawai'i.

  4. Semiparametric Regression and Model Refining

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    This paper presents a semiparametric adjustment method suitable for general cases. Assuming that the regularizer matrix is positive definite, the calculation method is discussed and the corresponding formulae are presented. Finally, a simulated adjustment problem is constructed to explain the method given in this paper. The results from the semiparametric model and the G-M model are compared. The results demonstrate that the model errors or the systematic errors of the observations can be detected correctly with the semiparametric estimation method.

  5. Identification of the security threshold by logistic regression applied to fuel under accident conditions

    Energy Technology Data Exchange (ETDEWEB)

    Gomes, Daniel de Souza; Baptista Filho, Benedito; Oliveira, Fabio Branco de, E-mail: dsgomes@ipen.br, E-mail: bdbfilho@ipen.br, E-mail: fabio@ipen.br [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil); Giovedi, Claudia, E-mail: claudia.giovedi@labrisco.usp.br [Universidade de Sao Paulo (POLI/USP), Sao Paulo, SP (Brazil). Lab. de Analise, Avaliacao e Gerenciamento de Risco

    2015-07-01

    A reactivity-initiated accident (RIA) is a severe failure that occurs because of an unexpected rise in the fission rate and reactor power. This sudden increase in reactor power may activate processes that can lead to the failure of the fuel cladding; in severe accidents, fuel disruption and core melting can occur. The purpose of the present research is to study the patterns of such accidents using exploratory data analysis techniques. A study based on applied statistics was used for the simulations. Peak enthalpy, pulse width, burnup, fission gas release, and zirconium oxidation were chosen as input parameters, and the safety boundary conditions were set. This new approach includes logistic regression, with which the present research also aims to identify the conditions and the probability of failure. Zirconium-based alloys with 1% niobium, used to fabricate the cladding of the fuel rod elements, were analyzed for high burnup limits at 65 MWd/kgU. The data are based on six decades of investigations from experimental programs: tests performed in American reactors such as the Transient Reactor Test facility (TREAT) and the Power Burst Facility (PBF), experiments from the Japanese program at the Nuclear Safety Research Reactor (NSRR), and tests in Kazakhstan at the Impulse Graphite Reactor (IGR). The database obtained from these tests served as the support for our study. (author)

  6. Reducing a spatial database to its effective dimensionality for logistic-regression analysis of incidence of livestock disease.

    Science.gov (United States)

    Duchateau, L; Kruska, R L; Perry, B D

    1997-10-01

    Large databases with multiple variables, selected because they are available and might provide an insight into establishing causal relationships, are often difficult to analyse and interpret because of multicollinearity. The objective of this study was to reduce the dimensionality of a multivariable spatial database of Zimbabwe, containing many environmental variables that were collected to predict the distribution of outbreaks of theileriosis (the tick-borne infection of cattle caused by Theileria parva and transmitted by the brown ear tick). Principal-component analysis and varimax rotation of the principal components were first used to select a reduced number of variables. The logistic-regression model was evaluated by appropriate goodness-of-fit tests.
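    The reduce-then-regress idea is straightforward to prototype. The sketch below compresses a block of synthetic, strongly collinear covariates with PCA before fitting a logistic model; it illustrates the workflow only, and does not reproduce the Zimbabwe theileriosis database or the varimax-rotated solution used in the study.

```python
# Sketch: principal-component reduction of collinear covariates followed by
# logistic regression, evaluated with cross-validation. Data are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 500
latent = rng.normal(size=(n, 3))                                   # 3 underlying factors
X = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(n, 20))  # 20 collinear covariates
y = (latent[:, 0] + 0.5 * latent[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

model = make_pipeline(StandardScaler(), PCA(n_components=3), LogisticRegression())
print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())
```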

  7. Regression modeling of ground-water flow

    Science.gov (United States)

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  8. [From clinical judgment to linear regression model].

    Science.gov (United States)

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

    When we think about mathematical models, such as the linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated another way, the regression is used to predict a measure based on the knowledge of at least one other variable. The first objective of linear regression is to determine the slope or inclination of the regression line: Y = a + bX, where "a" is the intercept or regression constant and is the value of "Y" when "X" equals 0, and "b" (also called the slope) indicates the increase or decrease that occurs when the variable "X" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R(2)) indicates the importance of the independent variables in the outcome.
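    A minimal numerical sketch of the line Y = a + bX described above, with made-up data, shows how the intercept a, the slope b and the coefficient of determination R^2 are obtained in practice.

```python
# Sketch: fit Y = a + bX by least squares and compute R^2. Data are invented.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

b, a = np.polyfit(x, y, 1)          # slope (regression coefficient) and intercept
y_hat = a + b * x
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"Y = {a:.2f} + {b:.2f}X, R^2 = {r2:.3f}")
```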

  9. Consumer Choice Prediction: Artificial Neural Networks versus Logistic Models

    Directory of Open Access Journals (Sweden)

    Christopher Gan

    2005-01-01

    Full Text Available Conventional econometric models, such as discriminant analysis and logistic regression, have been used to predict consumer choice. However, in recent years, there has been a growing interest in applying artificial neural networks (ANN) to analyse consumer behaviour and to model the consumer decision-making process. The purpose of this paper is to empirically compare the predictive power of the probabilistic neural network (PNN), a special class of neural networks, and a multilayer feed-forward network (MLFN) with a logistic model on consumers' choices between electronic banking and non-electronic banking. Data for this analysis were obtained through a mail survey sent to 1,960 New Zealand households. The questionnaire gathered information on the factors consumers use to decide between electronic banking and non-electronic banking. The factors include service quality dimensions, perceived risk factors, user input factors, price factors, service product characteristics and individual factors. In addition, demographic variables including age, gender, marital status, ethnic background, educational qualification, employment, income and area of residence are considered in the analysis. Empirical results showed that both ANN models (MLFN and PNN) exhibit a higher overall percentage correct on consumer choice predictions than the logistic model. Furthermore, the PNN proved to be the best predictive model since it has the highest overall percentage correct and very low Type I and Type II error rates.

  10. A Theoretic Model of Transport Logistics Demand

    OpenAIRE

    Natalija Jolić; Nikolina Brnjac; Ivica Oreb

    2006-01-01

    Concerning transport logistics as the relation between transport and integrated approaches to logistics, some transport and logistics specialists consider the term tautological. However, transport is one of the components of logistics, along with inventories, resources, warehousing, information and goods handling. Transport logistics considers wider commercial and operational frameworks within which the flow of goods is planned and managed. The demand for transport logistics services can be valorised as ...

  11. Bias-reduced and separation-proof conditional logistic regression with small or sparse data sets.

    Science.gov (United States)

    Heinze, Georg; Puhr, Rainer

    2010-03-30

    Conditional logistic regression is used for the analysis of binary outcomes when subjects are stratified into several subsets, e.g. matched pairs or blocks. Log odds ratio estimates are usually found by maximizing the conditional likelihood. This approach eliminates all strata-specific parameters by conditioning on the number of events within each stratum. However, in the analyses of both an animal experiment and a lung cancer case-control study, conditional maximum likelihood (CML) resulted in infinite odds ratio estimates and monotone likelihood. Estimation can be improved by using Cytel Inc.'s well-known LogXact software, which provides a median unbiased estimate and exact or mid-p confidence intervals. Here, we suggest and outline point and interval estimation based on maximization of a penalized conditional likelihood in the spirit of Firth's (Biometrika 1993; 80:27-38) bias correction method (CFL). We present comparative analyses of both studies, demonstrating some advantages of CFL over competitors. We report on a small-sample simulation study where CFL log odds ratio estimates were almost unbiased, whereas LogXact estimates showed some bias and CML estimates exhibited serious bias. Confidence intervals and tests based on the penalized conditional likelihood had close-to-nominal coverage rates and yielded highest power among all methods compared, respectively. Therefore, we propose CFL as an attractive solution to the stratified analysis of binary data, irrespective of the occurrence of monotone likelihood. A SAS program implementing CFL is available at: http://www.muw.ac.at/msi/biometrie/programs.
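    To convey the idea of Firth-type penalization, the sketch below implements the bias-reducing modified score for ordinary (unconditional) logistic regression on a deliberately separated toy data set. The article applies the same penalty to the conditional likelihood of stratified data, which this simple code does not attempt; production use would rely on the authors' SAS program or the R package logistf.

```python
# Sketch of Firth's penalized-likelihood logistic regression: Newton steps on
# the modified score U*(beta) = X'(y - p + h(0.5 - p)), where h are leverages
# of the weighted hat matrix. Data are synthetic and separable on purpose, so
# ordinary maximum likelihood would diverge while the Firth estimate stays finite.
import numpy as np

def firth_logit(X, y, n_iter=100, tol=1e-8):
    X = np.column_stack([np.ones(len(y)), X])          # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        cov = np.linalg.inv(X.T @ (W[:, None] * X))
        h = W * np.einsum("ij,jk,ik->i", X, cov, X)    # leverages of the weighted hat matrix
        score = X.T @ (y - p + h * (0.5 - p))          # Firth-modified score
        step = cov @ score
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])                 # perfectly separated outcome
print("Firth estimates (intercept, slope):", firth_logit(x[:, None], y))
```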

  12. BLOOD PRESSURE AWARENESS AMONG GENERAL POPULATION: A RURAL WEST BENGAL EXPERIENCE WITH LOGISTIC REGRESSION

    Directory of Open Access Journals (Sweden)

    Sanjoy Kumar Sadhukhan

    2012-02-01

    Full Text Available Objectives: The study was conducted with the objective of finding out the awareness of one's own blood pressure in a rural community of West Bengal and the factors associated with it. Methods: A community-based cross-sectional study on self BP awareness among adults (≥18 years) was carried out in a rural community of West Bengal through house-to-house visits. The total number of study subjects was 1201 (male=598; female=603), of which 132 (11%) were hypertensive. Results: Only 17.2% of all study subjects were aware of their own BP readings, with no male-female difference. This awareness was significantly associated with age, education, economic status and hypertension, and remained significant even after multiple logistic regression. Even among hypertensives, only 38% were aware of their own BP. About 67.11% of the study subjects had no knowledge of the complications of hypertension, and 86.92% were ignorant about the lifestyle changes required to prevent hypertension. Regarding hypertension control/treatment, 72.85% of study subjects were unaware. In general, males had better knowledge than females, although the difference was not always statistically significant. Conclusion: Self BP awareness among this study population was very poor, even among the hypertensives, leading to a high risk of cerebrovascular accidents and coronary heart disease. Interpersonal communication in medical facilities as well as other strategies such as group discussions (general and focal), mass media and the general education system can be utilized to improve the situation. [National J of Med Res 2012; 2(1): 55-58]

  13. Research on the Comparison of the RBF Neural Network with Logistic Regression

    Institute of Scientific and Technical Information of China (English)

    姚应水; 叶明全

    2011-01-01

    Objective The RBF neural network is an important classification model in data mining. This study explores the application of the RBF neural network to medical discriminant analysis by comparing it with logistic regression. Methods The prediction results of the RBF neural network and the logistic regression model were compared on an example data set using several statistical indexes. Results The resubstitution fit and the generalization ability of the RBF neural network were clearly superior to those of the logistic regression model for these data. Conclusion The RBF neural network has good prospects for application in medical statistics.

  14. Regression Model With Elliptically Contoured Errors

    CERN Document Server

    Arashi, M; Tabatabaey, S M M

    2012-01-01

    For the regression model where the errors follow the elliptically contoured distribution (ECD), we consider the least squares (LS), restricted LS (RLS), preliminary test (PT), Stein-type shrinkage (S) and positive-rule shrinkage (PRS) estimators for the regression parameters. We compare the quadratic risks of the estimators to determine the relative dominance properties of the five estimators.

  15. The Infinite Hierarchical Factor Regression Model

    CERN Document Server

    Rai, Piyush

    2009-01-01

    We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.

  16. The Overall Odds Ratio as an Intuitive Effect Size Index for Multiple Logistic Regression: Examination of Further Refinements

    Science.gov (United States)

    Le, Huy; Marcus, Justin

    2012-01-01

    This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…

  17. Establishing the change in antibiotic resistance of Enterococcus faecium strains isolated from Dutch broilers by logistic regression and survival analysis

    NARCIS (Netherlands)

    Stegeman, J.A.; Vernooij, J.C.M.; Khalifa, O.A.; Broek, van den J.; Mevius, D.J.

    2006-01-01

    In this study, we investigated the change in the resistance of Enterococcus faecium strains isolated from Dutch broilers against erythromycin and virginiamycin in 1998, 1999 and 2001 by logistic regression analysis and survival analysis. The E. faecium strains were isolated from caecal samples that

  19. Occurrence probability assessment of earthquake-triggered landslides with Newmark displacement values and logistic regression: The Wenchuan earthquake, China

    Science.gov (United States)

    Wang, Ying; Song, Chongzhen; Lin, Qigen; Li, Juan

    2016-04-01

    The Newmark displacement model has been used to predict earthquake-triggered landslides. Logistic regression (LR) is also a common landslide hazard assessment method. We combined the Newmark displacement model and LR and applied them to Wenchuan County and Beichuan County in China, which were affected by the Ms. 8.0 Wenchuan earthquake on May 12th, 2008, to develop a mechanism-based landslide occurrence probability model and improve the predictive accuracy. A total of 1904 landslide sites in Wenchuan County and 3800 random non-landslide sites were selected as the training dataset. We applied the Newmark model and obtained the distribution of permanent displacement (Dn) for a 30 × 30 m grid. Four factors (Dn, topographic relief, and distances to drainages and roads) were used as independent variables for LR. Then, a combined model was obtained, with an AUC (area under the curve) value of 0.797 for Wenchuan County. A total of 617 landslide sites and non-landslide sites in Beichuan County were used as a validation dataset with AUC = 0.753. The proposed method may also be applied to earthquake-induced landslides in other regions.
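    A compact illustration of the combined workflow (a logistic model whose covariates include the Newmark displacement Dn, evaluated by AUC on held-out data) is sketched below with simulated values; the study's 30 x 30 m grids, covariates and fitted coefficients are not reproduced here.

```python
# Sketch: logistic regression with a Newmark-displacement covariate plus two
# other predictors, scored by AUC on a held-out portion. All values simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
n = 4000
dn = rng.lognormal(mean=0.0, sigma=1.0, size=n)        # Newmark displacement (cm), synthetic
relief = rng.normal(size=n)                            # topographic relief, standardized
dist_road = rng.exponential(scale=1.0, size=n)         # distance to roads, arbitrary units
logit = -2.5 + 1.2 * np.log1p(dn) + 0.6 * relief - 0.4 * dist_road
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))          # landslide / no landslide

X = np.column_stack([np.log1p(dn), relief, dist_road])
train, test = slice(0, 3000), slice(3000, None)
clf = LogisticRegression().fit(X[train], y[train])
print("validation AUC:", round(roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1]), 3))
```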

  20. Delivery Time Reliability Model of Logistics Network

    Directory of Open Access Journals (Sweden)

    Liusan Wu

    2013-01-01

    Full Text Available Natural disasters like earthquake and flood will surely destroy the existing traffic network, usually accompanied by delivery delay or even network collapse. A logistics-network-related delivery time reliability model defined by a shortest-time entropy is proposed as a means to estimate the actual delivery time reliability. The less the entropy is, the stronger the delivery time reliability remains, and vice versa. The shortest delivery time is computed separately based on two different assumptions. If a path is concerned without capacity restriction, the shortest delivery time is positively related to the length of the shortest path, and if a path is concerned with capacity restriction, a minimax programming model is built to figure up the shortest delivery time. Finally, an example is utilized to confirm the validity and practicality of the proposed approach.

  1. Role of artificial neural network and logistic regression models in predicting the effect of extracorporeal shock wave treatment for upper urinary tract calculi

    Institute of Scientific and Technical Information of China (English)

    蒋杰宏; 姚聪; 陈健芬; 徐乐

    2016-01-01

    Objective To explore the role of artificial neural network and logistic regression models in predicting the effect of extracorporeal shock wave lithotripsy (ESWL) for upper urinary tract calculi. Methods From January 2010 to January 2015, 340 patients with renal calculi were treated by ESWL at our hospital. Ten pre-treatment items (age, body mass index, disease course, sex, urethral irritation symptoms, haematuria, renal colic, stone position, affected side, and stone size) were taken as predictive parameters, and artificial neural network and logistic regression models were built on these parameters to predict the clinical effect of ESWL for calculi of the upper urinary tract. Results The five most important parameters in the artificial neural network were stone size, disease course, haematuria, stone position, and body mass index, with statistically significant differences (P<0.05). The most important parameters in the logistic regression model were disease course, haematuria, and stone position, with statistically significant differences (P<0.05). Conclusions Artificial neural network and logistic regression models predict the success rate of ESWL for upper urinary tract calculi with good accuracy and can be widely applied in clinical practice.

  2. An Original Stepwise Multilevel Logistic Regression Analysis of Discriminatory Accuracy: The Case of Neighbourhoods and Health.

    Directory of Open Access Journals (Sweden)

    Juan Merlo

    Full Text Available Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of associations (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach that distinguishes between "specific" (measures of association) and "general" (measures of variance) contextual effects. Performing two empirical examples we illustrate the methodology, interpret the results and discuss the implications of this kind of analysis in public health. We analyse 43,291 individuals residing in 218 neighbourhoods in the city of Malmö, Sweden in 2006. We study two individual outcomes (psychotropic drug use and choice of private vs. public general practitioner, GP) for which the relative importance of neighbourhood as a source of individual variation differs substantially. In Step 1 of the analysis, we evaluate the OR and the area under the receiver operating characteristic (AUC) curve for individual-level covariates (i.e., age, sex and individual low income). In Step 2, we assess general contextual effects using the AUC. Finally, in Step 3 the OR for a specific neighbourhood characteristic (i.e., neighbourhood income) is interpreted jointly with the proportional change in variance (i.e., PCV) and the proportion of ORs in the opposite direction (POOR) statistics. For both outcomes, information on individual characteristics (Step 1) provides a low discriminatory accuracy (AUC = 0.616 for psychotropic drugs; 0.600 for choosing a private GP). Accounting for neighbourhood of residence (Step 2) only improved the AUC for choosing a private GP (+0.295 units). High neighbourhood income (Step 3) was strongly associated with choosing a private GP (OR = 3.50) but the PCV was only 11% and the POOR 33%. Applying an innovative stepwise multilevel analysis, we observed that, in Malmö, the neighbourhood context per se had a negligible influence on individual use of psychotropic drugs, but

  3. Hospital-level associations with 30-day patient mortality after cardiac surgery: a tutorial on the application and interpretation of marginal and multilevel logistic regression

    Directory of Open Access Journals (Sweden)

    Sanagou Masoumeh

    2012-03-01

    Full Text Available Abstract Background Marginal and multilevel logistic regression methods can estimate associations between hospital-level factors and patient-level 30-day mortality outcomes after cardiac surgery. However, it is not widely understood how the interpretation of hospital-level effects differs between these methods. Methods The Australasian Society of Cardiac and Thoracic Surgeons (ASCTS) registry provided data on 32,354 patients undergoing cardiac surgery in 18 hospitals from 2001 to 2009. The logistic regression methods related 30-day mortality after surgery to hospital characteristics with concurrent adjustment for patient characteristics. Results Hospital-level mortality rates varied from 1.0% to 4.1% of patients. Ordinary, marginal and multilevel regression methods differed with regard to point estimates and conclusions on statistical significance for hospital-level risk factors; ordinary logistic regression giving inappropriately narrow confidence intervals. The median odds ratio, MOR, from the multilevel model was 1.2 whereas ORs for most patient-level characteristics were of greater magnitude, suggesting that unexplained between-hospital variation was not as relevant as patient-level characteristics for understanding mortality rates. For hospital-level characteristics in the multilevel model, 80% interval ORs, IOR-80%, supplemented the usual ORs from the logistic regression. The IOR-80% was (0.8 to 1.8) for academic affiliation and (0.6 to 1.3) for the median annual number of cardiac surgery procedures. The width of these intervals reflected the unexplained variation between hospitals in mortality rates; the inclusion of one in each interval suggested an inability to add meaningfully to explaining variation in mortality rates. Conclusions Marginal and multilevel models take different approaches to account for correlation between patients within hospitals and they lead to different interpretations for hospital-level odds ratios.
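    The two cluster-level summaries discussed above are simple functions of the between-hospital variance on the logit scale and of a hospital-level log-odds ratio. The sketch below uses assumed values chosen only to reproduce the order of magnitude quoted in the abstract (MOR near 1.2, IOR-80% roughly 0.8 to 1.8); they are not the ASCTS estimates.

```python
# Sketch: median odds ratio (MOR) and 80% interval odds ratio (IOR-80%) from a
# random-intercept logistic model, following Larsen & Merlo's formulas.
from math import exp, sqrt
from scipy.stats import norm

sigma2 = 0.04        # assumed between-hospital variance on the logit scale
beta = 0.18          # assumed log-odds ratio for a hospital-level covariate

mor = exp(sqrt(2 * sigma2) * norm.ppf(0.75))
ior_low = exp(beta + sqrt(2 * sigma2) * norm.ppf(0.10))
ior_high = exp(beta + sqrt(2 * sigma2) * norm.ppf(0.90))
print(f"MOR = {mor:.2f}, IOR-80% = ({ior_low:.2f}, {ior_high:.2f})")
```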

  4. LOGISTIC REGRESSION ANALYSIS ON RELATIONSHIP OF SERUM HOMOCYSTEINE, FOLIC ACID, AND VITAMIN B12 WITH CORONARY ARTERIOPATHY

    Institute of Scientific and Technical Information of China (English)

    王真; 郭静宣; 毛节明; 王天成; 赵一呜

    2001-01-01

    Objective To investigate the relationship of serum homocysteine (Hcy), folic acid and vitamin B12 with coronary arteriopathy. Methods In a cross-sectional study, serum Hcy levels of 210 cases with coronary heart disease (CHD), 115 non-CHD subjects from a consecutive series of subjects with chest pain or myocardial infarction (MI) undergoing diagnostic coronary angiography, and 63 subjects undergoing health examination were measured using high-performance liquid chromatography (HPLC) with fluorescence detection. Serum folic acid and vitamin B12 levels were measured by radioimmunoassay. Serum cholesterol and lipoproteins were also measured. Information on conventional risk factors was collected by interviews. Results Coronary arteriopathy was correspondingly related to male sex, smoking, diabetes, folic acid, vitamin B12, ApoA1, and the Hcy level. The mean serum Hcy level was significantly higher in CHD patients than in non-CHD patients (19.01±10.36 μmol/L, n=210 vs 11.5±4.97 μmol/L, n=115; P<0.01). The mean serum folic acid and vitamin B12 levels were significantly lower in CHD patients (4.5±1.5 ng/ml and 414.6±142.3 pg/ml) than in non-CHD patients (5.6±1.4 ng/ml and 537.7±136.6 pg/ml), P<0.01. There was no difference in the mean serum Hcy level between non-CHD cases and the healthy subjects. The mean serum ApoA1 level (1188.8±206.1 mmol/L vs 1262.1±201.4 mmol/L) was significantly lower in CHD patients than in non-CHD patients, P<0.05. CHD patients were older and had higher rates of smoking and diabetes than non-CHD patients. By multivariate logistic regression, the ORs of Hcy, age, male sex and diabetes were all ≥1 (P<0.01), which means all these factors are independent risk factors. With the forward method, when folic acid, vitamin B12 and Hcy entered the regression model, the coefficient of Hcy changed greatly, showing multicollinearity in the logistic regression. Conclusion The results of our study showed that Hcy, male sex, older age and diabetes were all independent risk

  5. The Application of the Weighted Logistic Regression Model in Prediction of Porphyry Copper Deposits: the Zharma-Sawur Metallogenic Belt on the China-Kazakhstan Border as an Example

    Institute of Scientific and Technical Information of China (English)

    努丽曼古·阿不都克力木; 张晓帆; 陈川; 徐仕琪; 赵同阳

    2012-01-01

    Weighted logistic regression is one of the main GIS-based methods for mineral potential mapping, and its model differs from a linear model. Because of its powerful spatial analysis capability, strong adaptability, freedom from independence assumptions and more reliable prediction results, weighted logistic regression has been favoured by many geologists in mineral resources assessment. Based on mineral deposit models and metallogenic theory, the application of weighted logistic regression to mineral prediction consists of three parts: (1) establishment and application of the weighted logistic regression model; (2) comprehensive evaluation of metallogenic favourability; (3) delineation of mineral prospecting areas. Taking porphyry copper deposits of the Zharma-Sawur metallogenic belt, which crosses the border region of China and Kazakhstan, as an example, this paper discusses the application of the GIS-based weighted logistic regression model to mineral potential mapping.

  6. Applied Regression Modeling A Business Approach

    CERN Document Server

    Pardoe, Iain

    2012-01-01

    An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculus. Regression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a

  7. A new bivariate negative binomial regression model

    Science.gov (United States)

    Faroughi, Pouya; Ismail, Noriszura

    2014-12-01

    This paper introduces a new form of bivariate negative binomial (BNB-1) regression which can be fitted to bivariate and correlated count data with covariates. The BNB regression discussed in this study can be fitted to bivariate and overdispersed count data with positive, zero or negative correlations. The joint p.m.f. of the BNB-1 distribution is derived from the product of two negative binomial marginals with a multiplicative factor parameter. Several testing methods were used to check overdispersion and goodness-of-fit of the model. Application of BNB-1 regression is illustrated on a Malaysian motor insurance dataset. The results indicated that BNB-1 regression has a better fit than the bivariate Poisson and BNB-2 models with regard to the Akaike information criterion.

  8. Analysis of the Provincial Per Capita Net Income of Rural Households in China Based on a Logistic Smooth Transition Regression (LSTAR) Model

    Institute of Scientific and Technical Information of China (English)

    朱鸿婷

    2012-01-01

    In order to analyze differences in the development of the per capita net income of rural households across the provinces of China, an LSTAR model was used to analyze changes in the per capita net income of rural households in 31 provinces since 1978, and the inflection point and transition speed for each province were studied. The results indicate that, when the transition speeds are classified into five groups, the fastest and slowest groups are closely related to economic development, while the middle groups are a mixture of eastern, central and western provinces. In terms of the inflection point, two stages can be distinguished: the inflection point of most provinces was around 1996, while that of some resource-rich central and western provinces was around 2000, and the transition speed of these provinces was slow.
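    At the core of an LSTAR model is a logistic transition function G(s; gamma, c) that moves the series smoothly between two regimes: gamma is the transition (conversion) speed and c the threshold, corresponding to the inflection point discussed above. A small illustrative sketch, with values chosen arbitrarily rather than estimated for any province:

```python
# Sketch of the LSTAR transition function G(s) = 1 / (1 + exp(-gamma * (s - c))).
# gamma controls how fast the regime switch happens; c marks the inflection point.
import numpy as np

def lstar_transition(s, gamma, c):
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

years = np.arange(1978, 2011)
for gamma in (0.2, 1.0, 5.0):                     # slow, moderate, fast regime change
    g = lstar_transition(years, gamma, c=1996.0)  # threshold placed around 1996
    print(f"gamma={gamma}: G(1990)={g[years == 1990][0]:.2f}, G(2000)={g[years == 2000][0]:.2f}")
```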

  9. Alternative regression models to assess increase in childhood BMI

    Directory of Open Access Journals (Sweden)

    Mansmann Ulrich

    2008-09-01

    Full Text Available Abstract Background Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI were compared by goodness-of-fit measures and means of interpretation, including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results GAMLSS showed a much better fit regarding the estimation of risk factor effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely, significantly associated with body composition in every model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely, significantly associated with body composition in GLM models, but they were in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
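    The quantile-regression part of the comparison is easy to sketch with statsmodels: model, say, the 50th and 90th BMI percentiles as functions of one risk factor. The data below are simulated and the single covariate is a stand-in; the actual analysis used several covariates and also fitted GAMLSS models.

```python
# Sketch: quantile regression of a skewed BMI-like outcome on TV watching time,
# fitted at the median and the 90th percentile. Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 1000
tv_hours = rng.uniform(0, 4, n)
# skewed outcome whose upper tail grows faster with TV time
bmi = 15 + 0.4 * tv_hours + rng.gamma(shape=2.0, scale=0.5 + 0.3 * tv_hours)
df = pd.DataFrame({"bmi": bmi, "tv_hours": tv_hours})

for q in (0.5, 0.9):
    fit = smf.quantreg("bmi ~ tv_hours", df).fit(q=q)
    print(f"quantile {q}: slope = {fit.params['tv_hours']:.3f}")
```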

  10. Analysis of Jingdong Mall Logistics Distribution Model

    Science.gov (United States)

    Shao, Kang; Cheng, Feng

    In recent years, the development of electronic commerce in China has accelerated. The role of logistics has been highlighted, and more and more electronic commerce enterprises are beginning to realize the importance of logistics to the success or failure of the enterprise. In this paper, the author takes Jingdong Mall as an example, performs a SWOT analysis of the current situation of its self-built logistics system, identifies the problems existing in Jingdong Mall's current logistics distribution and gives appropriate recommendations.

  11. Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data.

    Science.gov (United States)

    Held, Elizabeth; Cape, Joshua; Tintle, Nathan

    2016-01-01

    Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.

  12. A Spline Regression Model for Latent Variables

    Science.gov (United States)

    Harring, Jeffrey R.

    2014-01-01

    Spline (or piecewise) regression models have been used in the past to account for patterns in observed data that exhibit distinct phases. The changepoint or knot marking the shift from one phase to the other, in many applications, is an unknown parameter to be estimated. As an extension of this framework, this research considers modeling the…
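    A two-phase (piecewise linear) regression with a single knot, the kind of changepoint model described above, reduces to an ordinary least-squares fit once the knot is fixed. In the sketch below the knot is assumed known and the data are synthetic; treating the knot as a free parameter, as the record discusses, requires a nonlinear or profile fit.

```python
# Sketch: piecewise linear regression y = b0 + b1*x + b2*max(x - knot, 0)
# with a fixed knot, estimated by least squares on simulated data.
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 200)
knot_true = 4.0
y = 1.0 + 0.5 * x + 1.5 * np.maximum(x - knot_true, 0) + rng.normal(scale=0.3, size=x.size)

knot = 4.0                                       # assumed known for this sketch
X = np.column_stack([np.ones_like(x), x, np.maximum(x - knot, 0)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, slope before knot, change in slope after knot:", np.round(coef, 2))
```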

  13. Assessing Lake Trophic Status: A Proportional Odds Logistic Regression Model

    Science.gov (United States)

    Lake trophic state classifications are good predictors of ecosystem condition and are indicative of both ecosystem services (e.g., recreation and aesthetics), and disservices (e.g., harmful algal blooms). Methods for classifying trophic state are based off the foundational work o...
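    A proportional odds model for ordered trophic classes can be sketched with statsmodels' OrderedModel. Everything below (class labels, cut points, the chlorophyll and nitrogen stand-in covariates) is simulated for illustration and does not reproduce the report's data, covariates or fitted model.

```python
# Sketch: proportional-odds (ordinal) logistic regression for ordered trophic
# classes generated from a latent index. Common slopes are shared across thresholds.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(3)
n = 800
chl = rng.normal(size=n)                  # standardized chlorophyll-a (stand-in)
tn = rng.normal(size=n)                   # standardized total nitrogen (stand-in)
latent = 1.2 * chl + 0.8 * tn + rng.logistic(size=n)
classes = pd.cut(latent, bins=[-np.inf, -1, 1, 3, np.inf],
                 labels=["oligotrophic", "mesotrophic", "eutrophic", "hypereutrophic"])
y = pd.Series(classes)                    # ordered categorical outcome

X = pd.DataFrame({"chl": chl, "tn": tn})
fit = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
print(fit.params[["chl", "tn"]])          # slopes common to all ordered thresholds
```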

  14. Analyzing Interactive QA Dialogues Using Logistic Regression Models

    Science.gov (United States)

    Kirschner, Manuel; Bernardi, Raffaella; Baroni, Marco; Dinh, Le Thanh

    With traditional Question Answering (QA) systems having reached nearly satisfactory performance, an emerging challenge is the development of successful Interactive Question Answering (IQA) systems. Important IQA subtasks are the identification of a dialogue-dependent typology of Follow Up Questions (FU Qs), automatic detection of the identified types, and the development of different context fusion strategies for each type. In this paper, we show how a system relying on shallow cues to similarity between utterances in a narrow dialogue context and other simple information sources, embedded in a machine learning framework, can improve FU Q answering performance by implicitly detecting different FU Q types and learning different context fusion strategies to help re-ranking their candidate answers.

  15. Fitting Additive Binomial Regression Models with the R Package blm

    Directory of Open Access Journals (Sweden)

    Stephanie Kovalchik

    2013-09-01

    Full Text Available The R package blm provides functions for fitting a family of additive regression models to binary data. The included models are the binomial linear model, in which all covariates have additive effects, and the linear-expit (lexpit) model, which allows some covariates to have additive effects and other covariates to have logistic effects. Additive binomial regression is a model of event probability, and the coefficients of linear terms estimate covariate-adjusted risk differences. Thus, in contrast to logistic regression, additive binomial regression puts focus on absolute risk and risk differences. In this paper, we give an overview of the methodology we have developed to fit the binomial linear and lexpit models to binary outcomes from cohort and population-based case-control studies. We illustrate the blm package's methods for additive model estimation, diagnostics, and inference with risk association analyses of a bladder cancer nested case-control study in the NIH-AARP Diet and Health Study.

  16. A Proposed Logistics Career Development Model.

    Science.gov (United States)

    1986-09-01

    Both Professor Peppers and Professor Demidovich called for pioneers in logistics dynamics. They stated that it was part of every logistics manager's...Force Institute of Technology (AU), Wright-Patterson AFB OH, August 1967 (AD-A824 956). 12. Demidovich, John W. and Jerome G. Peppers, Jr. "The

  17. Accounting for Slipping and Other False Negatives in Logistic Models of Student Learning

    Science.gov (United States)

    MacLellan, Christopher J.; Liu, Ran; Koedinger, Kenneth R.

    2015-01-01

    Additive Factors Model (AFM) and Performance Factors Analysis (PFA) are two popular models of student learning that employ logistic regression to estimate parameters and predict performance. This is in contrast to Bayesian Knowledge Tracing (BKT) which uses a Hidden Markov Model formalism. While all three models tend to make similar predictions,…

  18. Susceptibility mapping of shallow landslides using kernel-based Gaussian process, support vector machines and logistic regression

    Science.gov (United States)

    Colkesen, Ismail; Sahin, Emrehan Kutlug; Kavzoglu, Taskin

    2016-06-01

    Identification of landslide prone areas and production of accurate landslide susceptibility zonation maps have been crucial topics for hazard management studies. Since the prediction of susceptibility is one of the main processing steps in landslide susceptibility analysis, selection of a suitable prediction method plays an important role in the success of the susceptibility zonation process. Although simple statistical algorithms (e.g. logistic regression) have been widely used in the literature, the use of advanced non-parametric algorithms in landslide susceptibility zonation has recently become an active research topic. The main purpose of this study is to investigate the possible application of kernel-based Gaussian process regression (GPR) and support vector regression (SVR) for producing landslide susceptibility map of Tonya district of Trabzon, Turkey. Results of these two regression methods were compared with logistic regression (LR) method that is regarded as a benchmark method. Results showed that while kernel-based GPR and SVR methods generally produced similar results (90.46% and 90.37%, respectively), they outperformed the conventional LR method by about 18%. While confirming the superiority of the GPR method, statistical tests based on ROC statistics, success rate and prediction rate curves revealed the significant improvement in susceptibility map accuracy by applying kernel-based GPR and SVR methods.

  19. Characteristics of a Logistics-Based Business Model

    OpenAIRE

    Sandberg, Erik; Kihlén, Tobias; Abrahamsson, Mats

    2011-01-01

    In companies where excellence in logistics is decisive for the outperformance of competitors and logistics has an outspoken role for the strategy of the firm, there is present what we refer to here as a “logistics-based business model.” Based on a multiple case study of three Nordic retail companies, the purpose of this article is to explore the characteristics of such a logistics-based business model. As such, this research helps to provide structure to logistics-based business models and id...

  20. Area Logistics System Based on System Dynamics Model

    Institute of Scientific and Technical Information of China (English)

    GUI Shouping; ZHU Qiang; LU Lifang

    2005-01-01

    At present, there are few effective ways to analyze area logistics systems. This paper uses system dynamics to analyze the area logistics system and establishes a system dynamics model for the area logistics system based on the characteristics of area logistics systems and of system dynamics. Numerical simulations with the system dynamics model were used to analyze a logistics system. Analysis of the Guangzhou economy shows that the model can reflect the actual state of the system objectively and can be used to make policy and harmonize the environment.

  1. Constrained regression models for optimization and forecasting

    Directory of Open Access Journals (Sweden)

    P.J.S. Bruwer

    2003-12-01

    Full Text Available Linear regression models and the interpretation of such models are investigated. In practice problems often arise with the interpretation and use of a given regression model in spite of the fact that researchers may be quite "satisfied" with the model. In this article methods are proposed which overcome these problems. This is achieved by constructing a model where the "area of experience" of the researcher is taken into account. This area of experience is represented as a convex hull of available data points. With the aid of a linear programming model it is shown how conclusions can be formed in a practical way regarding aspects such as optimal levels of decision variables and forecasting.

  2. A Mathematical Tool for Inference in Logistic Regression with Small-Sized Data Sets: A Practical Application on ISW-Ridge Relationships

    Directory of Open Access Journals (Sweden)

    Cheng-Wu Chen

    2008-11-01

    Full Text Available The general approach to modeling binary data for the purpose of estimating the propagation of an internal solitary wave (ISW) is based on the maximum likelihood estimate (MLE) method. In cases where the number of observations in the data is small, any inferences made based on the asymptotic distribution of changes in the deviance may be unreliable for binary data (the model's lack of fit is described in terms of a quantity known as the deviance; the deviance for binary data is given by D. Collett (2003)). Logistic regression shows that the P-values for the likelihood ratio test and the score test are both <0.05. However, the null hypothesis is not rejected in the Wald test. The seeming discrepancies in P-values obtained between the Wald test and the other two tests are a sign that the large-sample approximation is not stable. We find that the parameters and the odds ratio estimates obtained via conditional exact logistic regression are different from those obtained via unconditional asymptotic logistic regression. Using exact results is a good idea when the sample size is small and the approximate P-values are <0.10. Thus, in this study exact analysis is more appropriate.

  3. Research on a financial crisis pre-warning model for listed companies based on PCA and logistic regression

    Institute of Scientific and Technical Information of China (English)

    朱永忠; 姚烨; 张艳

    2012-01-01

    Using the financial data of listed companies, a financial crisis pre-warning model is constructed in this paper by combining principal component analysis (PCA) with logistic regression, with the help of SPSS and MATLAB software. The results show that the model has a good predictive effect. The model established on the training samples was tested, and its predictions are basically consistent with the actual data, with acceptable errors. This shows that the financial pre-warning results agree with the companies' actual financial status, and that the model has good forecast accuracy and practical applicability.

  4. Comparison of logistic regression and neural network classifiers in the detection of hard exudates in retinal images.

    Science.gov (United States)

    Garcia, Maria; Valverde, Carmen; Lopez, Maria I; Poza, Jesus; Hornero, Roberto

    2013-01-01

    Diabetic Retinopathy (DR) is a common cause of visual impairment in industrialized countries. Automatic recognition of DR lesions in retinal images can contribute to the diagnosis and screening of this disease. The aim of this study is to automatically detect one of these lesions: hard exudates (EXs). Based on their properties, we extracted a set of features from image regions and selected the subset that best discriminated between EXs and the retinal background using logistic regression (LR). The LR model obtained, a multilayer perceptron (MLP) classifier and a radial basis function (RBF) classifier were subsequently used to obtain the final segmentation of EXs. Our database contained 130 images with variable color, brightness, and quality. Fifty of them were used to obtain the training examples. The remaining 80 images were used to test the performance of the method. The highest statistics were achieved for MLP or RBF. Using a lesion based criterion, our results reached a mean sensitivity of 95.9% (MLP) and a mean positive predictive value of 85.7% (RBF). With an image-based criterion, we achieved a 100% mean sensitivity, 87.5% mean specificity and 93.8% mean accuracy (MLP and RBF).

  5. Seismic features and automatic discrimination of deep and shallow induced-microearthquakes using neural network and logistic regression

    Science.gov (United States)

    Mousavi, S. Mostafa; Horton, Stephen, P.; Langston, Charles A.; Samei, Borhan

    2016-07-01

    We develop an automated strategy for discriminating deep microseismic events from shallow ones on the basis of the waveforms recorded on a limited number of surface receivers. Machine-learning techniques are employed to explore the relationship between event hypocenters and seismic features of the recorded signals in the time, frequency, and time-frequency domains. We applied the technique to 440 induced microearthquakes and trained the system to discriminate between deep and shallow events based on the knowledge gained from existing patterns. The cross-validation test showed that events with depth shallower than 250 m can be discriminated from events with hypocentral depth between 1000 and 2000 m with 88% and 90.7% accuracy using logistic regression (LR) and artificial neural network (ANN) models, respectively. Similar results were obtained using single-station seismograms. The results show that the spectral features have the highest correlation to source depth. Spectral centroids and 2D cross-correlations in the time-frequency domain are two new seismic features used in this study that proved to be promising measures for seismic event classification. The machine-learning techniques used here are applicable to efficient automatic classification of low-energy signals recorded at one or more seismic stations.

  6. Seismic features and automatic discrimination of deep and shallow induced-microearthquakes using neural network and logistic regression

    Science.gov (United States)

    Mousavi, S. Mostafa; Horton, Stephen P.; Langston, Charles A.; Samei, Borhan

    2016-10-01

    We develop an automated strategy for discriminating deep microseismic events from shallow ones on the basis of the waveforms recorded on a limited number of surface receivers. Machine-learning techniques are employed to explore the relationship between event hypocentres and seismic features of the recorded signals in time, frequency and time-frequency domains. We applied the technique to 440 induced microearthquakes and trained the system to discriminate between deep and shallow events based on the knowledge gained from existing patterns. The cross-validation test showed that events with depth shallower than 250 m can be discriminated from events with hypocentral depth between 1000 and 2000 m with 88 per cent and 90.7 per cent accuracy using logistic regression and artificial neural network models, respectively. Similar results were obtained using single-station seismograms. The results show that the spectral features have the highest correlation to source depth. Spectral centroids and 2-D cross-correlations in the time-frequency domain are two new seismic features used in this study that proved to be promising measures for seismic event classification. The machine-learning techniques used here are applicable to efficient automatic classification of low energy signals recorded at one or more seismic stations.

  7. To resuscitate or not to resuscitate: a logistic regression analysis of physician-related variables influencing the decision.

    Science.gov (United States)

    Einav, Sharon; Alon, Gady; Kaufman, Nechama; Braunstein, Rony; Carmel, Sara; Varon, Joseph; Hersch, Moshe

    2012-09-01

    To determine whether variables in physicians' backgrounds influenced their decision to forego resuscitating a patient they did not previously know. Questionnaire survey of a convenience sample of 204 physicians working in the departments of internal medicine, anaesthesiology and cardiology in 11 hospitals in Israel. Twenty per cent of the participants had elected to forego resuscitating a patient they did not previously know without additional consultation. Physicians who had more frequently elected to forego resuscitation had practised medicine for more than 5 years (p=0.013), estimated the number of resuscitations they had performed as being higher (p=0.009), and perceived their experience in resuscitation as sufficient (p=0.001). The variable that predicted the outcome of always performing resuscitation in the logistic regression model was less than 5 years of experience in medicine (OR 0.227, 95% CI 0.065 to 0.793; p=0.02). Physicians' level of experience may affect the probability of a patient's receiving resuscitation, whereas the physicians' personal beliefs and values did not seem to affect this outcome.

  8. Logistics-based Competition : A Business Model Approach

    OpenAIRE

    Kihlén, Tobias

    2007-01-01

    Logistics is increasingly becoming recognised as a source of competitive advantage, both in practice and in academia. The possible strategic impact of logistics makes it important to gain deeper insight into the role of logistics in the strategy of the firm. There is however a considerable research gap between the quite abstract strategy theory and logistics research. A possible tool to use in bridging this gap is identified in business model research. Therefore, the purpose of this dissertat...

  9. Developing a Referral Protocol for Community-Based Occupational Therapy Services in Taiwan: A Logistic Regression Analysis.

    Science.gov (United States)

    Mao, Hui-Fen; Chang, Ling-Hui; Tsai, Athena Yi-Jung; Huang, Wen-Ni; Wang, Jye

    2016-01-01

    Because resources for long-term care services are limited, timely and appropriate referral for rehabilitation services is critical for optimizing clients' functions and successfully integrating them into the community. We investigated which client characteristics are most relevant in predicting Taiwan's community-based occupational therapy (OT) service referral based on experts' beliefs. Data were collected in face-to-face interviews using the Multidimensional Assessment Instrument (MDAI). Community-dwelling participants (n = 221) ≥ 18 years old who reported disabilities in the previous National Survey of Long-term Care Needs in Taiwan were enrolled. The standard for referral was the judgment and agreement of two experienced occupational therapists who reviewed the results of the MDAI. Logistic regressions and Generalized Additive Models were used for analysis. Two predictive models were proposed, one using basic activities of daily living (BADLs) and one using instrumental ADLs (IADLs). Dementia, psychiatric disorders, cognitive impairment, joint range-of-motion limitations, fear of falling, behavioral or emotional problems, expressive deficits (in the BADL-based model), and limitations in IADLs or BADLs were significantly correlated with the need for referral. Both models showed high area under the curve (AUC) values on receiver operating curve testing (AUC = 0.977 and 0.972, respectively). The probability of being referred for community OT services was calculated using the referral algorithm. The referral protocol facilitated communication between healthcare professionals to make appropriate decisions for OT referrals. The methods and findings should be useful for developing referral protocols for other long-term care services.

  10. A Skew-Normal Mixture Regression Model

    Science.gov (United States)

    Liu, Min; Lin, Tsung-I

    2014-01-01

    A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…

  11. Modeling confounding by half-sibling regression

    DEFF Research Database (Denmark)

    Schölkopf, Bernhard; Hogg, David W; Wang, Dun

    2016-01-01

    We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both...

  12. Patterns and trends in occupational attainment of first jobs in the Netherlands, 1930–1995 : ordinary least squares regression versus conditional multinomial logistic regression

    NARCIS (Netherlands)

    Dessens, Jos A. G.; Jansen, Wim; Ganzeboom, Harry B. G.; Heijden, Peter G. M. van der

    2003-01-01

    This paper brings together the virtues of linear regression models for status attainment models formulated by second-generation social mobility researchers and the strengths of log-linear models formulated by third-generation researchers, into fourth-generation social mobility models, by using conditional multinomial logistic regression.

  13. Modeling of Robust Design of Remanufacturing Logistics Networks

    Institute of Scientific and Technical Information of China (English)

    XIA Shou-chang; XI Li-feng

    2005-01-01

    Uncertainty in the time, quantity and quality of recycled products leads to poor stability and flexibility in remanufacturing logistics networks, while conventional design only covers minimizing logistics cost, so a robust design is presented to address this. The mathematical model of remanufacturing logistics networks is built on the stochastic distribution of uncontrollable factors, and robust objectives are presented. The basic elements of robust design of remanufacturing logistics are redefined, and each part of the mathematical model is explained in detail. In essence, robust design of remanufacturing logistics networks is a multi-objective optimization problem.

  14. Tropical Forest Fire Susceptibility Mapping at the Cat Ba National Park Area, Hai Phong City, Vietnam, Using GIS-Based Kernel Logistic Regression

    Directory of Open Access Journals (Sweden)

    Dieu Tien Bui

    2016-04-01

    Full Text Available The Cat Ba National Park area (Vietnam), with its tropical forest, is recognized as part of world biodiversity conservation by the United Nations Educational, Scientific and Cultural Organization (UNESCO) and is a well-known destination for tourists, with around 500,000 travelers per year. This area has been the site of many research projects; however, no project has been carried out for forest fire susceptibility assessment. Thus, protection of the forest, including fire prevention, is one of the main concerns of the local authorities. This work aims to produce a tropical forest fire susceptibility map for the Cat Ba National Park area, which may be helpful for the local authorities in forest fire protection management. To achieve this, historical forest fires and related factors were first collected from various sources to construct a GIS database. Then, a forest fire susceptibility model was developed using kernel logistic regression. The quality of the model was assessed using the Receiver Operating Characteristic (ROC) curve, the area under the ROC curve (AUC), and five statistical evaluation measures. The usability of the resulting model was further compared with a benchmark model, the support vector machine (SVM). The results show that the kernel logistic regression model has a high level of performance in both the training and validation datasets, with a prediction capability of 92.2%. Since the kernel logistic regression model outperforms the benchmark model, we conclude that the proposed model is a promising alternative tool that should also be considered for forest fire susceptibility mapping in other areas. The results of this study are useful for the local authorities in forest planning and management.
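
    scikit-learn does not ship a kernel logistic regression estimator, but one common approximation combines an explicit kernel feature map with ordinary logistic regression. The sketch below illustrates that idea on synthetic two-class data standing in for fire / no-fire grid cells; the Nystroem map, RBF kernel and hyperparameters are illustrative assumptions, not the settings used in the study.

```python
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Non-linearly separable toy data standing in for fire / no-fire grid cells.
X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# An explicit RBF kernel map followed by logistic regression approximates kernel logistic regression.
klr = make_pipeline(
    Nystroem(kernel="rbf", gamma=1.0, n_components=100, random_state=0),
    LogisticRegression(max_iter=1000),
)
klr.fit(X_tr, y_tr)
print("validation AUC:", roc_auc_score(y_te, klr.predict_proba(X_te)[:, 1]))
```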

  15. Confirming the validity of the CONUT system for early detection and monitoring of clinical undernutrition: comparison with two logistic regression models developed using SGA as the gold standard Confirmando la validez del sistema CONUT para la detección precoz de la desnutrición clínica: Comparación con dos modelos de regresión logística desarrollados usando el SGA como gold standard

    OpenAIRE

    A. González-Madroño; A. Mancha; F. J. Rodríguez; J. Culebras; J. I. de Ulibarri

    2012-01-01

    Aim: To ratify previous validations of the CONUT nutritional screening tool by developing two probabilistic models using the parameters included in the CONUT, to see whether the CONUT's effectiveness could be improved. Methods: This is a two-step prospective study. In Step 1, 101 patients were randomly selected, and SGA and CONUT assessments were performed. With the data obtained, an unconditional logistic regression model was developed, and two variants of CONUT were constructed: Model 1 was made by a method of l...

  16. Modified Logistic Regression Approaches to Eliminating the Impact of Response Styles on DIF Detection in Likert-Type Scales.

    Science.gov (United States)

    Chen, Hui-Fang; Jin, Kuan-Yu; Wang, Wen-Chung

    2017-01-01

    Extreme response styles (ERS) are prevalent in Likert- or rating-type data, but previous research has not adequately addressed their impact on differential item functioning (DIF) assessments. This study aimed to fill that knowledge gap and examined their influence on the performance of logistic regression (LR) approaches in DIF detection, including ordinal logistic regression (OLR) and logistic discriminant function analysis (LDFA). Results indicated that both the standard OLR and LDFA yielded severely inflated false positive rates as the magnitude of the difference in ERS between the two groups increased. This study proposed a class of modified LR approaches to eliminate the ERS effect on DIF assessment. The proposed modifications showed satisfactory control of false positive rates when no DIF items existed and yielded better control of false positive rates and more accurate true positive rates under DIF conditions than the conventional LR approaches did. In conclusion, the proposed modifications are recommended in survey research involving multiple groups or cultural groups.
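
    As a rough illustration of the standard (unmodified) OLR approach to uniform DIF screening, the sketch below regresses a simulated ordinal item response on a trait score and a group indicator with statsmodels' OrderedModel (assuming a reasonably recent statsmodels release); a significant group coefficient would flag DIF. The response-style corrections proposed in the paper are not reproduced here.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 600
group = rng.binomial(1, 0.5, n)                 # 0 = reference group, 1 = focal group
trait = rng.normal(size=n)                      # matching/trait variable
latent = trait + 0.4 * group + rng.logistic(size=n)
# Four ordered response categories, as in a Likert item.
item = pd.Series(pd.cut(latent, bins=[-np.inf, -1.0, 0.0, 1.0, np.inf]))

exog = pd.DataFrame({"trait": trait, "group": group})
fit = OrderedModel(item, exog, distr="logit").fit(method="bfgs", disp=False)
print(fit.summary())   # a significant 'group' coefficient suggests uniform DIF
```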

  17. A Mathematical Model to Improve the Performance of Logistics Network

    Directory of Open Access Journals (Sweden)

    Muhammad Izman Herdiansyah

    2012-01-01

    Full Text Available The role of logistics nowadays is expanding from just providing transportation and warehousing to offering total integrated logistics. To remain competitive in the global market environment, business enterprises need to improve their logistics operations performance. The improvement is achieved when a comprehensive analysis is provided and the network performance is optimized. In this paper, a mixed integer linear model for optimizing logistics network performance is developed. It provides a single-product, multi-period, multi-facility model, as well as a multi-product extension. The problem is modeled as a network flow problem with the main objective of minimizing total logistics cost. The problem can be solved using a commercial linear programming package such as CPLEX or LINDO. For small cases, the solver in Excel may also be used. Keywords: logistics network, integrated model, mathematical programming, network optimization
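
    The full formulation is a multi-period, multi-facility mixed integer program solved with packages such as CPLEX or LINDO; the sketch below is only a stripped-down, single-period continuous network-flow relaxation, solved with scipy's linear programming routine, to illustrate the cost-minimising flow structure. All plants, customers, costs and capacities are made up.

```python
import numpy as np
from scipy.optimize import linprog

# Toy single-product network: 2 plants ship to 3 customers; minimise total shipping cost.
# Decision variables x[i, j] = flow from plant i to customer j, flattened row-major.
cost = np.array([[4.0, 6.0, 9.0],
                 [5.0, 4.0, 7.0]])
supply = np.array([60.0, 80.0])        # plant capacities
demand = np.array([40.0, 50.0, 50.0])  # customer demands

c = cost.ravel()
A_ub = np.zeros((2, 6))                # each plant ships at most its capacity
for i in range(2):
    A_ub[i, i * 3:(i + 1) * 3] = 1.0
A_eq = np.zeros((3, 6))                # each customer receives exactly its demand
for j in range(3):
    A_eq[j, [j, j + 3]] = 1.0

res = linprog(c, A_ub=A_ub, b_ub=supply, A_eq=A_eq, b_eq=demand, bounds=(0, None))
print("minimum cost:", res.fun)
print("flows:\n", res.x.reshape(2, 3))
```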

  18. A Mathematical Model to Improve the Performance of Logistics Network

    Directory of Open Access Journals (Sweden)

    Muhammad Izman Herdiansyah

    2012-01-01

    Full Text Available The role of logistics nowadays is expanding from just providing transportation and warehousing to offering total integrated logistics. To remain competitive in the global market environment, business enterprises need to improve their logistics operations performance. The improvement is achieved when a comprehensive analysis is provided and the network performance is optimized. In this paper, a mixed integer linear model for optimizing logistics network performance is developed. It provides a single-product, multi-period, multi-facility model, as well as a multi-product extension. The problem is modeled as a network flow problem with the main objective of minimizing total logistics cost. The problem can be solved using a commercial linear programming package such as CPLEX or LINDO. For small cases, the solver in Excel may also be used. Keywords: logistics network, integrated model, mathematical programming, network optimization

  19. Bayesian multimodel inference for geostatistical regression models.

    Directory of Open Access Journals (Sweden)

    Devin S Johnson

    Full Text Available The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC) method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs). The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC). The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set which has been previously analyzed by other researchers investigating model selection methods. Our results confirmed the previous analysis suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned pollution tolerant fish abundance in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index. Abundance is negatively related to percent watershed disturbance.

  20. Construction of hazard maps of Hantavirus contagion using Remote Sensing, logistic regression and Artificial Neural Networks: case Araucanía Region, Chile

    CERN Document Server

    Alvarez, G; Salinas, R

    2016-01-01

    In this research, methods and computational results based on statistical analysis, mathematical modelling, and in situ data collection are presented in order to produce a hazard map of Hantavirus infection in the Araucanía region of Chile. The work involves several elements such as Landsat satellite images, biological information regarding Hantavirus seropositivity, and information concerning positive cases of infection detected in the region. All this information has been processed to find a function that models the danger of contagion in the region, through logistic regression analysis and Artificial Neural Networks.

  1. Logistics chains in freight transport modelling

    NARCIS (Netherlands)

    Davydenko, I.

    2015-01-01

    The research presented in this PhD thesis has been motivated by the fact that the Netherlands, and the Randstad region in particular, are affected by the large transport flows and extensive operations of the logistics sector. These operations create welfare for those people who work in the sector, w

  2. Correlated noise in a logistic growth model

    Science.gov (United States)

    Ai, Bao-Quan; Wang, Xian-Ju; Liu, Guo-Tao; Liu, Liang-Gang

    2003-02-01

    The logistic differential equation is used to analyze cancer cell population, in the presence of a correlated Gaussian white noise. We study the steady state properties of tumor cell growth and discuss the effects of the correlated noise. It is found that the degree of correlation of the noise can cause tumor cell extinction.
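
    A numerical feel for this kind of model can be obtained with a simple Euler-Maruyama simulation. The sketch below integrates a logistic growth equation with a single multiplicative Gaussian white noise term; the cross-correlated noise sources analysed in the paper are not reproduced, and all parameter values are illustrative.

```python
import numpy as np

# Euler-Maruyama integration of dx = r*x*(1 - x/K)*dt + sigma*x*dW (illustrative parameters).
rng = np.random.default_rng(1)
r, K, sigma = 1.0, 100.0, 0.2
dt, steps = 0.01, 5000
x = np.empty(steps)
x[0] = 10.0
for t in range(1, steps):
    dW = rng.normal(0.0, np.sqrt(dt))
    x[t] = x[t - 1] + r * x[t - 1] * (1.0 - x[t - 1] / K) * dt + sigma * x[t - 1] * dW
    x[t] = max(x[t], 0.0)   # a cell population cannot be negative
print("final population:", x[-1])
```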

  3. Logistics chains in freight transport modelling

    NARCIS (Netherlands)

    Davydenko, I.

    2015-01-01

    The research presented in this PhD thesis has been motivated by the fact that the Netherlands, and the Randstad region in particular, are affected by the large transport flows and extensive operations of the logistics sector. These operations create welfare for those people who work in the sector,

  4. An Alternative Flight Software Trigger Paradigm: Applying Multivariate Logistic Regression to Sense Trigger Conditions Using Inaccurate or Scarce Information

    Science.gov (United States)

    Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.

    2013-01-01

    In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
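
    A toy version of the idea, with made-up numbers rather than Orion flight parameters: label each simulated state as inside or outside the desired deploy conditions, train a logistic regression on deliberately noisy altitude and velocity estimates, and fire the trigger only when the predicted probability exceeds a threshold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative deploy-condition classifier; all thresholds and noise levels are invented.
rng = np.random.default_rng(0)
n = 2000
true_alt = rng.uniform(2000.0, 12000.0, n)            # metres
true_vel = rng.uniform(50.0, 200.0, n)                # m/s, planet-relative
label = ((true_alt < 7000.0) & (true_vel < 150.0)).astype(int)
meas_alt = true_alt + rng.normal(0.0, 800.0, n)       # deliberately inaccurate measurements
meas_vel = true_vel + rng.normal(0.0, 10.0, n)

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(np.column_stack([meas_alt, meas_vel]), label)

def trigger(alt_meas, vel_meas, threshold=0.9):
    """Fire the deploy sequence only when the classifier is sufficiently confident."""
    p = clf.predict_proba([[alt_meas, vel_meas]])[0, 1]
    return p >= threshold

print(trigger(6500.0, 120.0), trigger(11000.0, 180.0))
```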

  5. An Alternative Flight Software Paradigm: Applying Multivariate Logistic Regression to Sense Trigger Conditions using Inaccurate or Scarce Information

    Science.gov (United States)

    Smith, Kelly; Gay, Robert; Stachowiak, Susan

    2013-01-01

    In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles

  6. Regression Models for Count Data in R

    Directory of Open Access Journals (Sweden)

    Christian Kleiber

    2008-06-01

    Full Text Available The classical Poisson, geometric and negative binomial regression models for count data belong to the family of generalized linear models and are available at the core of the statistics toolbox in the R system for statistical computing. After reviewing the conceptual and computational features of these methods, a new implementation of hurdle and zero-inflated regression models in the functions hurdle() and zeroinfl() from the package pscl is introduced. It re-uses the design and functionality of the basic R functions, just as the underlying conceptual tools extend the classical models. Both hurdle and zero-inflated models are able to incorporate over-dispersion and excess zeros, two problems that typically occur in count data sets in economics and the social sciences, better than their classical counterparts. Using cross-section data on the demand for medical care, it is illustrated how the classical as well as the zero-augmented models can be fitted, inspected and tested in practice.
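
    The hurdle() and zeroinfl() functions discussed here belong to the R package pscl. For readers working in Python, a rough analogue of the zero-inflated Poisson model is available in statsmodels, as sketched below on simulated counts with excess zeros; the hurdle variant is not shown, and the data are not the medical-care demand set.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

# Simulated counts with excess zeros (a stand-in for the medical-care demand data).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
counts = rng.poisson(np.exp(0.3 + 0.5 * x))
counts[rng.random(n) < 0.3] = 0          # inflate the zero class

X = sm.add_constant(x)
model = ZeroInflatedPoisson(counts, X, exog_infl=np.ones((n, 1)), inflation="logit")
result = model.fit(maxiter=200, disp=False)
print(result.summary())
```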

  7. Parametric Regression Models Using Reversed Hazard Rates

    Directory of Open Access Journals (Sweden)

    Asokan Mulayath Variyath

    2014-01-01

    Full Text Available Proportional hazard regression models are widely used in survival analysis to understand and exploit the relationship between survival time and covariates. For left censored survival times, reversed hazard rate functions are more appropriate. In this paper, we develop a parametric proportional hazard rates model using an inverted Weibull distribution. The estimation and construction of confidence intervals for the parameters are discussed. We assess the performance of the proposed procedure based on a large number of Monte Carlo simulations. We illustrate the proposed method using a real case example.

  8. Bayesian model selection in Gaussian regression

    CERN Document Server

    Abramovich, Felix

    2009-01-01

    We consider a Bayesian approach to model selection in Gaussian linear regression, where the number of predictors might be much larger than the number of observations. From a frequentist view, the proposed procedure results in the penalized least squares estimation with a complexity penalty associated with a prior on the model size. We investigate the optimality properties of the resulting estimator. We establish the oracle inequality and specify conditions on the prior that imply its asymptotic minimaxity within a wide range of sparse and dense settings for "nearly-orthogonal" and "multicollinear" designs.

  9. Bayesian Inference of a Multivariate Regression Model

    Directory of Open Access Journals (Sweden)

    Marick S. Sinay

    2014-01-01

    Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.

  10. General regression and representation model for classification.

    Directory of Open Access Journals (Sweden)

    Jianjun Qian

    Full Text Available Recently, the regularized coding-based classification methods (e.g., SRC and CRC) have shown great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR) for classification. GRR not only has the advantages of CRC, but also makes full use of the prior information (e.g., the correlations between representation residuals and representation coefficients) and the specific information (the weight matrix of image pixels) to enhance classification performance. GRR uses generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: the basic general regression and representation classifier (B-GRR) and the robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of the proposed methods over state-of-the-art algorithms.

  11. Adaptive regression for modeling nonlinear relationships

    CERN Document Server

    Knafl, George J

    2016-01-01

    This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...

  12. Allelic drop-out probabilities estimated by logistic regression--Further considerations and practical implementation

    DEFF Research Database (Denmark)

    Tvedebrink, Torben; Eriksen, Poul Svante; Asplund, Maria

    2012-01-01

    We discuss the model for estimating drop-out probabilities presented by Tvedebrink et al. [7] and the concerns that have been raised. The criticism of the model has demonstrated that the model is not perfect. However, the model is very useful for advanced forensic genetic work, where allelic dro...

  13. An Alternative Three-Parameter Logistic Item Response Model.

    Science.gov (United States)

    Pashley, Peter J.

    Birnbaum's three-parameter logistic function has become a common basis for item response theory modeling, especially within situations where significant guessing behavior is evident. This model is formed through a linear transformation of the two-parameter logistic function in order to facilitate a lower asymptote. This paper discusses an…

  14. City Logistics Modeling Efforts: Trends and Gaps - A Review

    NARCIS (Netherlands)

    Anand, N.R.; Quak, H.J.; Van Duin, J.H.R.; Tavasszy, L.A.

    2012-01-01

    In this paper, we present a review of city logistics modeling efforts reported in the literature for urban freight analysis. The review framework takes into account the diversity and complexity found in the present-day city logistics practice. Next, it covers the different aspects in the modeling se

  15. Performance comparison between Logistic regression, decision trees, and multilayer perceptron in predicting peripheral neuropathy in type 2 diabetes mellitus

    Institute of Scientific and Technical Information of China (English)

    LI Chang-ping; ZHI Xin-yue; MA Jun; CUI Zhuang; ZHU Zi-long; ZHANG Cui; HU Liang-ping

    2012-01-01

    Background Various methods can be applied to build predictive models for clinical data with a binary outcome variable. This research aims to explore the process of constructing common predictive models, Logistic regression (LR), decision tree (DT) and multilayer perceptron (MLP), as well as focus on specific details when applying the methods mentioned above: what preconditions should be satisfied, how to set the parameters of the model, how to screen variables and build accurate models quickly and efficiently, and how to assess the generalization ability (that is, prediction performance) reliably by the Monte Carlo method in the case of a small sample size. Methods All 274 patients (including 137 with type 2 diabetes mellitus and diabetic peripheral neuropathy and 137 with type 2 diabetes mellitus without diabetic peripheral neuropathy) from the Metabolic Disease Hospital in Tianjin participated in the study. There were 30 variables such as sex, age, glycosylated hemoglobin, etc. On account of the small sample size, the classification and regression tree (CART) and the chi-squared automatic interaction detector tree (CHAID) were combined by means of 100 repetitions of 5-7 fold stratified cross-validation to build the DT. The MLP was constructed using the Schwarz Bayes Criterion to choose the number of hidden layers and hidden layer units, along with the Levenberg-Marquardt (L-M) optimization algorithm, weight decay and a preliminary training method. Subsequently, LR was applied with the best subset method and the Akaike Information Criterion (AIC) to make the best use of information and avoid overfitting. Eventually, a 10 to 100 times 3-10 fold stratified cross-validation method was used to compare the generalization ability of DT, MLP and LR in terms of the areas under the receiver operating characteristic (ROC) curves (AUC). Results The AUC of DT, MLP and LR were 0.8863, 0.8536 and 0.8802, respectively. As the larger the AUC of a specific prediction model is, the higher the diagnostic ability it presents, MLP performed optimally, and then
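
    A scikit-learn sketch of the same comparison on synthetic data (274 cases, 30 predictors, mimicking only the sample dimensions, not the clinical variables): logistic regression, a depth-limited decision tree and a small multilayer perceptron are compared by stratified cross-validated AUC. The specific model-building details of the paper (CART/CHAID combination, Schwarz Bayes Criterion, best-subset AIC selection) are not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with the same dimensions as the clinical sample: 274 cases, 30 predictors.
X, y = make_classification(n_samples=274, n_features=30, n_informative=10, random_state=0)

models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "DT": DecisionTreeClassifier(max_depth=4, random_state=0),
    "MLP": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: mean cross-validated AUC = {auc.mean():.3f}")
```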

  16. Using ordinal logistic regression to evaluate the performance of laser-Doppler predictions of burn-healing time

    Directory of Open Access Journals (Sweden)

    Pape Sarah A

    2009-02-01

    Full Text Available Abstract Background Laser-Doppler imaging (LDI) of cutaneous blood flow is beginning to be used by burn surgeons to predict the healing time of burn wounds; predicted healing time is used to determine wound treatment as either dressings or surgery. In this paper, we do a statistical analysis of the performance of the technique. Methods We used data from a study carried out by five burn centers: LDI was done once between days 2 to 5 post burn, and healing was assessed at both 14 days and 21 days post burn. Random-effects ordinal logistic regression and other models such as the continuation ratio model were used to model healing-time as a function of the LDI data, and of demographic and wound history variables. Statistical methods were also used to study the false-color palette, which enables the laser-Doppler imager to be used by clinicians as a decision-support tool. Results Overall performance is that diagnoses are over 90% correct. Related questions addressed were what was the best blood flow summary statistic and whether, given the blood flow measurements, demographic and observational variables had any additional predictive power (age, sex, race, % total body surface area burned (%TBSA), site and cause of burn, day of LDI scan, burn center). It was found that mean laser-Doppler flux over a wound area was the best statistic, and that, given the same mean flux, women recover slightly more slowly than men. Further, the likely degradation in predictive performance on moving to a patient group with larger %TBSA than those in the data sample was studied, and shown to be small. Conclusion Modeling healing time is a complex statistical problem, with random effects due to multiple burn areas per individual, and censoring caused by patients missing hospital visits and undergoing surgery. This analysis applies state-of-the-art statistical methods such as the bootstrap and permutation tests to a medical problem of topical interest. New medical findings are

  17. Using Logistic Regression and Random Forests multivariate statistical methods for landslide spatial probability assessment in North-Est Sicily, Italy

    Science.gov (United States)

    Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele

    2015-04-01

    The first phase of the work aimed to identify the spatial relationships between landslide locations and the 13 related factors by using the Frequency Ratio bivariate statistical method. The analysis was then carried out by adopting a multivariate statistical approach, using the Logistic Regression technique and the Random Forests technique, which gave the best results in terms of AUC. The models were run and evaluated with different sample sizes, also taking into account the temporal variation of input variables such as areas burned by wildfire. The most significant outcomes of this work are the relevant influence of the sample size on the model results and the strong importance of some environmental factors (e.g. land use and wildfires) for the identification of the depletion zones of extremely rapid shallow landslides.

  18. Hierarchical linear regression models for conditional quantiles

    Institute of Scientific and Technical Information of China (English)

    TIAN Maozai; CHEN Gemai

    2006-01-01

    Quantile regression has several useful features and is therefore gradually developing into a comprehensive approach to the statistical analysis of linear and nonlinear response models, but it cannot deal effectively with data that have a hierarchical structure. In practice, the existence of such data hierarchies is neither accidental nor ignorable; it is a common phenomenon. Ignoring this hierarchical data structure risks overlooking the importance of group effects, and may also render many of the traditional statistical analysis techniques used for studying data relationships invalid. On the other hand, hierarchical models take the hierarchical data structure into account and have many applications in statistics, ranging from overdispersion to constructing min-max estimators. However, hierarchical models are essentially mean regression models; therefore, they cannot be used to characterize the entire conditional distribution of a dependent variable given high-dimensional covariates. Furthermore, the estimated coefficient vector (marginal effects) is sensitive to outlier observations on the dependent variable. In this article, a new approach, based on the Gauss-Seidel iteration and taking full advantage of quantile regression and hierarchical models, is developed. On the theoretical front, we also consider the asymptotic properties of the new method, obtaining simple conditions for n^(1/2)-convergence and asymptotic normality. We also illustrate the use of the technique with real educational data, which are hierarchical, and explain how the results can be interpreted.
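
    The flat (non-hierarchical) building block of this approach, quantile regression itself, can be illustrated with statsmodels. The sketch below fits several conditional quantiles to heteroscedastic simulated data; the Gauss-Seidel hierarchical extension developed in the article is not implemented here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Heteroscedastic simulated data: the spread of y grows with x, so different quantiles
# have visibly different slopes.
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(0.0, 10.0, n)
y = 2.0 + 0.8 * x + rng.normal(scale=0.5 + 0.3 * x, size=n)
df = pd.DataFrame({"x": x, "y": y})

for q in (0.25, 0.50, 0.75):
    fit = smf.quantreg("y ~ x", df).fit(q=q)
    print(f"q={q}: intercept={fit.params['Intercept']:.2f}, slope={fit.params['x']:.2f}")
```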

  19. Regression Models For Saffron Yields in Iran

    Science.gov (United States)

    S. H, Sanaeinejad; S. N, Hosseini

    Saffron is an important crop in social and economic terms in Khorassan Province (northeast of Iran). In this research we tried to evaluate trends in saffron yield in recent years and to study the relationship between saffron yield and climate change. A regression analysis was used to predict saffron yield based on 20 years of yield data in the cities of Birjand, Ghaen and Ferdows. Climatological data for the same periods were provided by the database of the Khorassan Climatology Center. The climatological data included temperature, rainfall, relative humidity and sunshine hours for Model I, and temperature and rainfall for Model II. The results showed the coefficients of determination for Birjand, Ferdows and Ghaen for Model I were 0.69, 0.50 and 0.81 respectively. The coefficients of determination for the same cities for Model II were 0.53, 0.50 and 0.72 respectively. Multiple regression analysis indicated that among the weather variables, temperature was the key parameter for variation in saffron yield. It was concluded that increasing spring temperature was the main cause of the decline in saffron yield during recent years across the province. Finally, the yield trend was predicted for the last 5 years using time series analysis.

  20. Evaluating risk factors for endemic human Salmonella Enteritidis infections with different phage types in Ontario, Canada using multinomial logistic regression and a case-case study approach

    Directory of Open Access Journals (Sweden)

    Varga Csaba

    2012-10-01

    Full Text Available Abstract Background Identifying risk factors for Salmonella Enteritidis (SE) infections in Ontario will assist public health authorities to design effective control and prevention programs to reduce the burden of SE infections. Our research objective was to identify risk factors for acquiring SE infections with various phage types (PT) in Ontario, Canada. We hypothesized that certain PTs (e.g., PT8 and PT13a) have specific risk factors for infection. Methods Our study included endemic SE cases with various PTs whose isolates were submitted to the Public Health Laboratory-Toronto from January 20th to August 12th, 2011. Cases were interviewed using a standardized questionnaire that included questions pertaining to demographics, travel history, clinical symptoms, contact with animals, and food exposures. A multinomial logistic regression method using the Generalized Linear Latent and Mixed Model procedure and a case-case study design were used to identify risk factors for acquiring SE infections with various PTs in Ontario, Canada. In the multinomial logistic regression model, the outcome variable had three categories representing human infections caused by SE PT8, PT13a, and all other SE PTs (i.e., non-PT8/non-PT13a) as a referent category to which the other two categories were compared. Results In the multivariable model, SE PT8 was positively associated with contact with dogs (OR=2.17, 95% CI 1.01-4.68) and negatively associated with pepper consumption (OR=0.35, 95% CI 0.13-0.94), after adjusting for age categories and gender, and using exposure periods and health regions as random effects to account for clustering. Conclusions Our study findings offer interesting hypotheses about the role of phage type-specific risk factors. Multinomial logistic regression analysis and the case-case study approach are novel methodologies to evaluate associations among SE infections with different PTs and various risk factors.
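
    A minimal sketch of a multinomial logit with a referent outcome category, in the spirit of the case-case design, using simulated exposures rather than the Ontario data (the paper's random effects for exposure period and health region are omitted):

```python
import numpy as np
import statsmodels.api as sm

# Simulated case-case data: outcome 0 = other PTs (referent), 1 = "PT8-like", 2 = "PT13a-like";
# two binary exposures loosely inspired by the paper (dog contact, pepper consumption).
rng = np.random.default_rng(0)
n = 300
dog_contact = rng.binomial(1, 0.4, n)
pepper = rng.binomial(1, 0.5, n)
logits = np.column_stack([np.zeros(n),
                          0.8 * dog_contact - 1.0 * pepper,   # category 1 vs referent
                          0.2 * dog_contact])                 # category 2 vs referent
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
outcome = np.array([rng.choice(3, p=p) for p in probs])

X = sm.add_constant(np.column_stack([dog_contact, pepper]))
fit = sm.MNLogit(outcome, X).fit(disp=False)
print(np.exp(fit.params))   # odds ratios for each category relative to the referent
```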

  1. Predicting macrofaunal species distribution in estuarine gradients using logistic regression and classification systems

    NARCIS (Netherlands)

    Ellis, J.; Ysebaert, T.; Hume, T.; Norkko, A.; Bult, T.; Herman, P.M.J.; Thrush, S.; Oldman, J.

    2006-01-01

    There is a growing need to predict ecological responses to long-term habitat change. However, statistical models for marine soft-substratum ecosystems are limited, and consequently there is a need for the development of such models. In order to assess the utility of statistical modelling approaches

  2. Allelic drop-out probabilities estimated by logistic regression--further considerations and practical implementation.

    Science.gov (United States)

    Tvedebrink, Torben; Eriksen, Poul Svante; Asplund, Maria; Mogensen, Helle Smidt; Morling, Niels

    2012-03-01

    We discuss the model for estimating drop-out probabilities presented by Tvedebrink et al. [7] and the concerns that have been raised. The criticism of the model has demonstrated that the model is not perfect. However, the model is very useful for advanced forensic genetic work, where allelic drop-out is occurring. With this discussion, we hope to improve the drop-out model, so that it can be used for practical forensic genetics and stimulate further discussions. We discuss how to estimate drop-out probabilities when using a varying number of PCR cycles and other experimental conditions. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  3. Logistic Regression Analysis of Risk Factors for Stroke%脑卒中危险因素的Logistic分析

    Institute of Scientific and Technical Information of China (English)

    黄炜

    2014-01-01

    Objective To analyze the risk factors for stroke in patients at our hospital, in order to prevent and control the occurrence of stroke. Methods Single-factor analysis and multivariate non-conditional Logistic regression were performed to compare 133 stroke patients with 87 healthy controls. Results Multivariate Logistic regression analysis (forward selection) showed that stroke in our hospital was associated with hypertension, diabetes, history of TIA, abnormal blood lipids and BMI. Conclusion Primary prevention targeting these stroke risk factors can reduce the incidence of stroke.

  4. A Time Scheduling Model of Logistics Service Supply Chain with Mass Customized Logistics Service

    Directory of Open Access Journals (Sweden)

    Weihua Liu

    2012-01-01

    Full Text Available With the increasing demand for customized logistics services in the manufacturing industry, the key factor in realizing the competitiveness of a logistics service supply chain (LSSC) is whether it can meet specific requirements with the cost of mass service. In this case, in-depth research on the time-scheduling of LSSC is required. Setting the total cost, completion time, and the satisfaction of functional logistics service providers (FLSPs) as optimal targets, this paper establishes a time scheduling model of LSSC, which is constrained by the service order time requirement. Numerical analysis is conducted by using Matlab 7.0 software. The effects of the relationship cost coefficient and the time delay coefficient on the comprehensive performance of LSSC are discussed. The results demonstrate that with the time scheduling model in a mass-customized logistics services (MCLS) environment, the logistics service integrator (LSI) can complete the order earlier or later than scheduled. With the increase of the relationship cost coefficient and the time delay coefficient, the comprehensive performance of LSSC also increases and tends towards stability. In addition, the time delay coefficient has a better effect in increasing the LSSC’s comprehensive performance than the relationship cost coefficient does.

  5. Analyzing the Administration Perception of the Teachers by Means of Logistic Regression According to Values

    Science.gov (United States)

    Ugurlu, Celal Teyyar

    2017-01-01

    This study aims to analyze teachers' perception of administration according to values, in line with certain parameters. The research uses a relational screening model. Scales were administered to a population of 470 teachers working in 25 secondary schools in the center of Sivas. The 317 questionnaires that were returned have been…

  6. The logistic model-generated carrying capacities for wild herbivores ...

    African Journals Online (AJOL)

    Jesse

    Modelled as discrete-time logistic equations with fixed carrying capacities, the model captures the wildlife herbivore population dynamics. Time series data, covering a period ...

  7. Model and Calculation of Container Port Logistics Enterprises Efficiency Indexes

    Directory of Open Access Journals (Sweden)

    Xiao Hong

    2013-04-01

    Full Text Available The throughput of China's container ports is growing fast, but the earnings of inland port enterprises are not so good. First, the initial efficiency evaluation indexes of port logistics are reduced and screened with a rough set model, and then the weights of the logistics performance indexes are assigned by the rough totalitarian calculation method. The indexes are then ranked and the key indexes identified in combination with the ABC management method. In this way, port logistics enterprises can monitor the key indexes to reduce cost and improve the efficiency of logistics operations.

  8. Inferring gene regression networks with model trees

    Directory of Open Access Journals (Sweden)

    Aguilar-Ruiz Jesus S

    2010-10-01

    Full Text Available Abstract Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes by building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity, but they do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph of all the relationships among output and input genes is built, taking into account whether each pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: a Saccharomyces cerevisiae and an E. coli data set. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth- and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear
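
    A miniature version of the per-gene idea, assuming simulated expression data: for each target gene, fit a tree on all other genes and keep the predictors the tree actually uses. scikit-learn has no model-tree (linear-models-in-the-leaves) implementation, so a plain regression tree and a crude importance threshold stand in for REGNET's model trees and FDR control.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Simulated expression matrix with a couple of planted regulatory links into gene 3.
rng = np.random.default_rng(0)
n_samples, n_genes = 100, 8
expr = rng.normal(size=(n_samples, n_genes))
expr[:, 3] = 0.9 * expr[:, 0] - 0.5 * expr[:, 5] + rng.normal(0.0, 0.3, n_samples)

edges = []
for target in range(n_genes):
    predictors = [g for g in range(n_genes) if g != target]
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(expr[:, predictors], expr[:, target])
    for idx, importance in enumerate(tree.feature_importances_):
        if importance > 0.1:   # crude threshold in place of REGNET's FDR-controlled test
            edges.append((predictors[idx], target, round(float(importance), 2)))
print(edges)
```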

  9. Building vulnerability to hydro-geomorphic hazards: Estimating damage probability from qualitative vulnerability assessment using logistic regression

    Science.gov (United States)

    Ettinger, Susanne; Mounaud, Loïc; Magill, Christina; Yao-Lafourcade, Anne-Françoise; Thouret, Jean-Claude; Manville, Vern; Negulescu, Caterina; Zuccaro, Giulio; De Gregorio, Daniela; Nardone, Stefano; Uchuchoque, Juan Alexis Luque; Arguedas, Anita; Macedo, Luisa; Manrique Llerena, Nélida

    2016-10-01

    bivariate analyses were applied to better characterize each vulnerability parameter. Multiple correspondence analyses revealed strong relationships between the "Distance to channel or bridges", "Structural building type", "Building footprint" and the observed damage. Logistic regression enabled quantification of the contribution of each explanatory parameter to potential damage, and determination of the significant parameters that express the damage susceptibility of a building. The model was applied 200 times to different calibration and validation data sets in order to examine performance. Results show that 90% of these tests have a success rate of more than 67%. Probabilities (at building scale) of experiencing different damage levels during a future event similar to the 8 February 2013 flash flood are the major outcomes of this study.

  10. Firm’s Characteristics as Predictors of Corporate Failure: Evidence from UK Companies using Logistic Regression

    OpenAIRE

    Hu, Cheng

    2012-01-01

    Analysis of credit risk and increased competition in financial markets have strengthened the motivation for a prediction model of corporate failure. However, there is relatively little research based on UK companies. This paper builds a prediction model of corporate failure by investigating firms' characteristics. These characteristics include financial ratios and firm size. All financial data used in this study are collected from UK manufacturing companies over the five years from 2003 to 2007. Based on the...

  11. 基于"3S"技术和LR-WSVM模型的广州市花都区滑坡灾害风险评估与区划%Landslide Hazard Risk Assessment and Zoning of Huadu District of Guangzhou Based on "3S" Technique and Logistic Regress-Weighted SVM Model

    Institute of Scientific and Technical Information of China (English)

    张春慧; 陈美招; 郑荣宝

    2015-01-01

    Landslide is one of the three major natural hazards in China; it is therefore very important to study how to perform landslide risk assessment and zoning, as such studies can provide policymakers with a theoretical basis for formulating disaster prevention and mitigation policies. On the basis of field surveys and relevant existing research, 11 factors, such as terrain, lithology, vegetation, land use, precipitation, fault zones and human activities, were used as evaluation indexes in performing landslide risk assessment for Huadu District with the aid of a Logistic regression-weighted support vector machine (LR-WSVM) model. Landslide risks in the region were sorted into 5 grades, and the fitting accuracy of the model was verified with the ROC curve. Results show that the risk is quite high in a large portion of Timian Town and in parts of Huadong and Chini towns, which is spatially consistent with the distribution of the 65 landslide disaster inventory records; the areas rated very low, low, moderate, high and very high in risk account for 28.19%, 31.31%, 25.54%, 11.73% and 3.24% of the district, respectively. The accuracy verification with the receiver operating characteristic (ROC) curve shows that the Logistic regression-weighted support vector machine model can effectively evaluate landslide risk in this region, with good evaluation accuracy, classification ability and objectivity.

  12. 加权Logistic回归模型在火山岩型铜矿预测中的应用:以宁芜盆地中段为例%Application of the Weighted Logistic Regression Model in Prediction of Volcanic Rock-Hosted Copper Deposits-Taking the Middle Part of Ning-Wu Basin as an Example

    Institute of Scientific and Technical Information of China (English)

    赵增玉; 陈火根; 潘懋; 贾根; 李向前; 徐士银; 郭刚; 张祥云

    2016-01-01

    Application of the weighted Logistic regression model to the prediction of volcanic rock-hosted copper deposits in the middle part of the Ning-Wu Basin is studied. First, the geological setting of the ore-forming processes is analyzed. Three kinds of factors, including geological bodies, structures and wall rock alteration, are extracted from the geologic map based on the spatial distribution of copper deposits. Then, the spatial relationships between copper mineral occurrences and each evidence factor are analyzed. It is suggested that the Niangniangshan and Gushan volcanic edifices play an important role in the spatial distribution of volcanic rock-hosted copper deposits. Ten evidence raster layers are selected, including the Longwangshan Formation, the Gushan Formation, the trachyte porphyry of the Gushan volcanic edifice, the monzonite porphyry of the Niangniangshan volcanic edifice, buffers of NE-, NW- and EW-trending structural lines, and the alteration areas of chalcopyrite, silicification and limonite. Finally, metallogenic probabilities are calculated using the weighted Logistic regression model. Four ore-forming prospects, P1, P2, P3 and P4, are indicated based on the geological conditions of metallogenesis and the model results. Among these prospecting areas, P1, P2 and P3, which are controlled by the Niangniangshan and Gushan volcanic edifices, extend in the northeast direction, while P4 extends in the east-west direction and is controlled by the Longwangshan volcanic edifice. Copper ore bodies have already been found in these prospecting areas, suggesting that the results are generally reliable.

  13. Meteorological Factor Analysis of Freezing Injury to Overwintering Tea Based on Logistic Regression

    Institute of Scientific and Technical Information of China (English)

    段永春

    2015-01-01

    Thirty-one meteorological factors that might contribute to heavy freezing injury of tea trees were chosen as candidate independent variables from 45 years of meteorological data for the three major tea-producing areas of Qingdao, Rizhao and Linyi in Shandong Province, with occurrence or non-occurrence of heavy freezing injury during overwintering as the dependent variable. Single-factor logistic regression analysis was conducted first, and the nine meteorological factors with statistical significance were then entered into a multivariate logistic regression analysis, from which a logistic model for the occurrence of heavy freezing injury to overwintering tea trees was established and evaluated. The results showed that the mean air temperature in January, the mean air temperature in July of the previous year, the rainfall in November of the previous year, the mean air temperature in November of the previous year, and the relative air humidity in February were the main determinants of heavy freezing injury to overwintering tea trees, with the January mean air temperature being the dominant factor.
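
    The two-step screen described above, single-factor logistic regressions to shortlist factors followed by one multivariable model, is straightforward to sketch with statsmodels. The illustration below assumes a data frame with a binary outcome column and numeric meteorological-factor columns; all names are hypothetical.

```python
# Sketch: univariate screening followed by a multivariable logistic model.
# Assumes a DataFrame with a binary "freeze" column (heavy freezing injury: 1/0)
# and numeric meteorological-factor columns; all names are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("tea_freeze_factors.csv")
outcome = "freeze"
factors = [c for c in df.columns if c != outcome]

# Step 1: one logistic regression per factor; keep those with p < 0.05.
selected = []
for f in factors:
    X = sm.add_constant(df[[f]])
    fit = sm.Logit(df[outcome], X).fit(disp=0)
    if fit.pvalues[f] < 0.05:
        selected.append(f)

# Step 2: multivariable logistic regression on the screened factors.
X = sm.add_constant(df[selected])
final = sm.Logit(df[outcome], X).fit(disp=0)
print(final.summary())
```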

  14. A brief conceptual tutorial of multilevel analysis in social epidemiology: using measures of clustering in multilevel logistic regression to investigate contextual phenomena

    DEFF Research Database (Denmark)

    Merlo, J; Chaix, B; Ohlsson, H

    2006-01-01

    STUDY OBJECTIVE: In social epidemiology, it is easy to compute and interpret measures of variation in multilevel linear regression, but technical difficulties exist in the case of logistic regression. The aim of this study was to present measures of variation appropriate for the logistic case in a didactic rather than a mathematical way. DESIGN AND PARTICIPANTS: Data were used from the health survey conducted in 2000 in the county of Scania, Sweden, that comprised 10 723 persons aged 18-80 years living in 60 areas. Conducting multilevel logistic regression, different techniques were applied ... propensity areas with the area educational level. The sorting out index was equal to 82%. CONCLUSION: Measures of variation in logistic regression should be promoted in social epidemiological and public health research as efficient means of quantifying the importance of the context of residence ...
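
    One measure of area-level clustering often used in this literature is the median odds ratio (MOR), which maps the between-area variance of a random-intercept logistic model onto the odds-ratio scale, MOR = exp(sqrt(2 * Va) * 0.6745). A short sketch of that conversion is below; the variance value is purely illustrative.

```python
# Sketch: median odds ratio (MOR) from the area-level variance of a
# random-intercept multilevel logistic model. The variance below is illustrative.
from math import exp, sqrt
from scipy.stats import norm

sigma2_area = 0.25                        # between-area variance on the logit scale (example value)
mor = exp(sqrt(2 * sigma2_area) * norm.ppf(0.75))
print(f"MOR = {mor:.2f}")                 # MOR = 1 would mean no area-level variation
```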

  16. The causes of death and risk factors in patients with war wounds and trauma of the extremities analyzed by a logistic regression model

    Institute of Scientific and Technical Information of China (English)

    程昌志; 赵东海; 李全岳; 曲海燕; 陈伯成; 林舟丹

    2008-01-01

    Objective: To explore the causes of death and the risk factors for in-hospital death in patients with war wounds and trauma of the extremities. Methods: This retrospective study involved 352 patients with war wounds and trauma of the extremities admitted to the 303rd Hospital of the People's Liberation Army between 1968 and 2002. All records were reviewed, the causes of death of the 15 patients who died were analyzed by autopsy, and a logistic regression model was used to identify the risk factors for death. Results: Fifteen of the 352 patients died, a mortality of 4.3%. The causes of death were acute renal failure (ARF) (46.7%, 7/15), pulmonary embolism (20.0%, 3/15), gas gangrene (clostridial myonecrosis) (20.0%, 3/15) and multiple organ system failure (MOSF) (13.3%, 2/15); one case of gas gangrene was complicated by ARF and one by MOSF. In the univariate analysis, the risk of death was increased by shock, time to hospital admission, amputation, tourniquet time, and associated injuries of the head, chest, abdomen or blood vessels (P < 0.05). In the logistic regression analysis, shock and amputation were the two factors most strongly associated with death (χ² = 93.589 and 144.716, respectively; both P < 0.05). Conclusion: ARF was the leading cause of death in patients with war wounds and trauma of the extremities; prompt, correct management of shock and proper timing of amputation should help reduce mortality.
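
    The multivariable step reported above comes down to reading odds ratios and confidence intervals off a fitted logistic model. A minimal sketch of that step with statsmodels follows, assuming a data frame with a binary death outcome and risk-factor columns such as shock and amputation; all names are hypothetical.

```python
# Sketch: odds ratios with 95% confidence intervals from a multivariable
# logistic regression of in-hospital death. Column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("extremity_trauma.csv")
predictors = ["shock", "amputation", "tourniquet_time", "associated_injury"]

X = sm.add_constant(df[predictors])
fit = sm.Logit(df["death"], X).fit(disp=0)

odds_ratios = np.exp(fit.params)                     # exponentiated coefficients
ci = np.exp(fit.conf_int())                          # 95% CI on the odds-ratio scale
print(pd.concat([odds_ratios.rename("OR"), ci], axis=1))
```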

  17. PREDICTION OF CORPORATE BANKRUPTCY IN ROMANIA THROUGH THE USE OF LOGISTIC REGRESSION

    Directory of Open Access Journals (Sweden)

    Brindescu-Olariu Daniel

    2013-07-01

    As a theoretical contribution, the research shows that the companies that filed for bankruptcy during the crisis period were already showing signs of weakness before the crisis began. Financial ratios relevant to the prediction of corporate bankruptcy at the local level have been identified, and their correlation with bankruptcy probability has been evaluated. The model is expected to maintain its accuracy, with minimal or no additional calibration, for companies from the entire Romanian economy that fit the profile of the target population.

  18. Analysis of Regionally Ecological Land Use and Its Influencing Factors Based on a Logistic Regression Model in the Beijing-Tianjin-Hebei Region, China

    Institute of Scientific and Technical Information of China (English)

    谢花林

    2011-01-01

    Land use and cover change (LUCC) has been considered a very important factor in the study of environmental and ecological protection and of global environmental change. Ecological land is an essential component and resource for human beings. Protecting ecological land, improving currently damaged ecological zones, and restoring natural ecological land are important for enhancing and balancing regional ecological conditions, and they are also meaningful for sustainable development and for harmony between humans and nature. This study quantified the factors and mechanisms driving ecological land use change in the region, which should help in devising strategies for improving regional ecological security. The Beijing-Tianjin-Hebei region is China's third largest economic zone after the Yangtze River Delta and the Pearl River Delta. Many factors, such as economic expansion and population growth, have degraded the environment, and ecological functions are impaired in some places. Owing to inefficient land use (e.g., urbanization, overexploitation of forests, and reclamation of land from lakes), most forests and wetlands in the Beijing-Tianjin-Hebei region do not function properly, which has a great impact on the ecological system. The objectives of this study were to discuss how to build a logistic regression model describing land use change in the Beijing-Tianjin-Hebei region and to explore to what extent such a model can distinguish the factors influencing land use change. The study examines the important variables of regional land cover change by establishing logistic regression models for different periods in the Beijing-Tianjin-Hebei region. The main conclusions are as follows. For forest land cover change, the important variables include "soil organic matter (SOM) content", "slope gradient (<5°)", "distance to the nearest village", "distance to the nearest highway", and
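
    The modelling step described above is, in essence, a grid-cell binary logit: the dependent variable marks whether a cell changed out of forest (or wetland) cover, and the explanatory variables are the listed site and accessibility factors. A minimal sketch under those assumptions follows; all column names are hypothetical.

```python
# Sketch: binary logistic regression of ecological-land change on site and
# accessibility factors, one row per grid cell. All names are hypothetical.
import pandas as pd
import statsmodels.api as sm

cells = pd.read_csv("bth_grid_cells.csv")
drivers = ["som_content", "slope_lt_5deg", "dist_to_village",
           "dist_to_highway", "population_density"]

X = sm.add_constant(cells[drivers])
y = cells["forest_loss"]                  # 1 if the cell left forest cover, else 0

fit = sm.Logit(y, X).fit(disp=0)
print(fit.summary())                      # sign and size of each driver's coefficient
```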

  19. Understanding data in clinical research: a simple graphical display for plotting data (up to four independent variables) after binary logistic regression analysis.

    Science.gov (United States)

    Mesa, José Luis

    2004-01-01

    In clinical research, suitable techniques for visualizing data after statistical analysis are crucial for researchers' and physicians' understanding. Common statistical techniques for analyzing data in clinical research are logistic regression models. Among these, the application of binary logistic regression analysis (LRA) has greatly increased in recent years, owing to its diagnostic accuracy and because scientists often want to analyze in a dichotomous way whether some event will occur or not. Such an analysis lacks a suitable, understandable, and widely used graphical display; instead it provides a logit function based on a linear model for the natural logarithm of the odds in favor of the occurrence of the dependent variable, Y. By a simple exponential transformation, such a logit equation can be transformed into a logistic function, yielding the predicted probability that the dependent variable is present, P(Y=1|X). This model can be used to generate a simple graphical display for binary LRA. For the case of a single predictor or explanatory (independent) variable, X, a plot can be generated with X on the abscissa (horizontal axis) and P(Y=1|X) on the ordinate (vertical axis). For multiple-predictor models, I propose here a relief 3D surface graphic in order to plot up to four independent variables (two continuous and two discrete). By using this technique, any researcher or physician can transform a less understandable logit function into an easier-to-grasp figure, leading to better knowledge and interpretation of data in clinical research. A sophisticated statistical package is not necessary for this, because the graphical display may be generated with any 2D or 3D surface plotter.
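
    The transformation underlying the proposed display is the inverse logit, P(Y=1|X) = 1 / (1 + exp(-(b0 + b1*X))). A minimal sketch of the single-predictor plot follows; the coefficient values are purely illustrative.

```python
# Sketch: plotting predicted probabilities from a single-predictor binary
# logistic regression. The intercept and slope below are illustrative only.
import numpy as np
import matplotlib.pyplot as plt

b0, b1 = -4.0, 0.08                       # example coefficients from a fitted logit
x = np.linspace(0, 100, 200)              # range of the predictor X
p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))  # inverse logit: P(Y=1|X)

plt.plot(x, p)
plt.xlabel("X (predictor)")
plt.ylabel("P(Y=1|X)")
plt.title("Predicted probability from binary logistic regression")
plt.show()
```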

  20. Estimation of the Relationship between Personal Health Expenditure and Economic Growth in China Based on the Logistic Smooth Transition Regression Model

    Institute of Scientific and Technical Information of China (English)

    唐波; 闫彬彬

    2014-01-01

    Objective: To characterize the non-linear relationship between out-of-pocket (OOP) health payments and economic growth in China. Methods: A logistic smooth transition regression (LSTR) model was built to analyze how government health expenditure shapes the non-linear relationship between OOP payments and economic growth. Results: The impact of economic growth on OOP health expenditure falls into three stages: a high-regime state from 1978 to 1996, a regime-transition period from 1997 to 2008, and a low-regime state from 2009 onward. Conclusion: The reform of the medical and health system should be deepened further to reduce the burden of personal health expenditure.
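
    In an LSTR specification the regression coefficients shift smoothly between two regimes through a logistic transition function, G(s; gamma, c) = 1 / (1 + exp(-gamma * (s - c))). The sketch below fits a toy two-regime LSTR by nonlinear least squares; the data are synthetic and all names are illustrative, so this is only a schematic of the technique, not the paper's estimation.

```python
# Sketch: a two-regime logistic smooth transition regression (LSTR),
#   y_t = a0 + a1*x_t + (b0 + b1*x_t) * G(s_t; gamma, c) + e_t,
# fitted by nonlinear least squares. Data and series names are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def lstr(X, a0, a1, b0, b1, gamma, c):
    x, s = X                                          # regressor and transition variable
    G = 1.0 / (1.0 + np.exp(-gamma * (s - c)))        # logistic transition function
    return a0 + a1 * x + (b0 + b1 * x) * G

# Illustrative synthetic data: s is the transition variable (e.g., a time trend).
rng = np.random.default_rng(0)
s = np.linspace(0, 30, 120)
x = rng.normal(size=120)
y = lstr((x, s), 1.0, 0.5, -0.5, 0.8, 1.2, 15.0) + rng.normal(scale=0.1, size=120)

params, _ = curve_fit(lstr, (x, s), y, p0=[0, 0, 0, 0, 1.0, np.median(s)])
print(dict(zip(["a0", "a1", "b0", "b1", "gamma", "c"], params.round(3))))
```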