WorldWideScience

Sample records for binomial regression models

  1. [Evaluation of estimation of prevalence ratio using Bayesian log-binomial regression model].

    Science.gov (United States)

    Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L

    2017-03-10

    To evaluate the estimation of the prevalence ratio (PR) using a Bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence with respect to caregivers' recognition of risk signs of diarrhea in their infants, using a Bayesian log-binomial regression model in OpenBUGS. The results showed that caregivers' recognition of their infants' risk signs of diarrhea was significantly associated with a 13% increase in medical care-seeking. We also compared the point and interval estimates of the PR, and the convergence, of three models (model 1: not adjusting for covariates; model 2: adjusting for the duration of caregivers' education; model 3: adjusting for the distance between village and township and for child age in months, in addition to model 2) between the Bayesian and the conventional log-binomial regression models. All three Bayesian log-binomial regression models converged, with estimated PRs of 1.130 (95% CI: 1.005-1.265), 1.128 (95% CI: 1.001-1.264) and 1.132 (95% CI: 1.004-1.267), respectively. Conventional log-binomial regression models 1 and 2 converged, with PRs of 1.130 (95% CI: 1.055-1.206) and 1.126 (95% CI: 1.051-1.203), respectively, but model 3 failed to converge, so the COPY method was used to estimate its PR, which was 1.125 (95% CI: 1.051-1.200). The point and interval estimates of the PRs from the three Bayesian models differed only slightly from those of the conventional models, showing good consistency. Therefore, the Bayesian log-binomial regression model can effectively estimate the PR with fewer convergence failures, and has advantages in application over the conventional log-binomial regression model.
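As a concrete, purely illustrative example of the quantity being estimated, a prevalence ratio and its large-sample confidence interval can be computed directly from a 2x2 table; the counts below are hypothetical and not from the study:

```python
import math

# Hypothetical 2x2 counts (illustrative only, not the study's data):
# exposure = caregiver recognizes risk signs of diarrhea,
# outcome  = medical care was sought for the infant.
a, b = 130, 70    # recognized:     care sought / not sought
c, d = 100, 100   # not recognized: care sought / not sought

p1 = a / (a + b)           # prevalence among the exposed
p0 = c / (c + d)           # prevalence among the unexposed
pr = p1 / p0               # prevalence ratio

# Large-sample 95% CI for log(PR) via the delta method
se = math.sqrt((1 - p1) / a + (1 - p0) / c)
ci = (math.exp(math.log(pr) - 1.96 * se),
      math.exp(math.log(pr) + 1.96 * se))
```

A regression model is needed once covariates enter, but for a single binary exposure the model-based PR reduces to exactly this ratio of proportions.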

  2. Beta-binomial regression and bimodal utilization.

    Science.gov (United States)

    Liu, Chuan-Fen; Burgess, James F; Manning, Willard G; Maciejewski, Matthew L

    2013-10-01

    To illustrate how bimodal, U-shaped distributed utilization can be modeled with beta-binomial regression, which is rarely used in health services research. Veterans Affairs (VA) administrative data and Medicare claims in 2001-2004 for 11,123 Medicare-eligible VA primary care users in 2000. We compared the means and distributions of VA reliance (the proportion of all VA/Medicare primary care visits occurring in VA) predicted from beta-binomial, binomial, and ordinary least-squares (OLS) models. The beta-binomial model fits the bimodal distribution of VA reliance better than the binomial and OLS models because it does not depend on normality and offers greater flexibility in its shape parameters. Increased awareness of beta-binomial regression may help analysts apply appropriate methods to outcomes with bimodal or U-shaped distributions. © Health Research and Educational Trust.
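The U-shape the authors exploit can be seen in the beta-binomial mass function itself; in this stdlib-only sketch (hypothetical parameters, not fitted to VA data), shape parameters below 1 produce a bimodal pmf:

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    # log of the Beta function via log-gamma
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binom_pmf(k, n, a, b):
    # P(X = k) = C(n, k) * B(k + a, n - k + b) / B(a, b)
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

n = 10
# Shape parameters below 1 push mass toward the extremes, giving the
# U-shaped (bimodal) distribution the paper fits to VA reliance.
pmf = [beta_binom_pmf(k, n, 0.4, 0.4) for k in range(n + 1)]
```

A binomial or OLS model has no parameterization that can place modes at both ends of the range, which is why the beta-binomial fits such reliance data better.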

  3. Modeling Tetanus Neonatorum case using the regression of negative binomial and zero-inflated negative binomial

    Science.gov (United States)

    Amaliana, Luthfatul; Sa'adah, Umu; Wayan Surya Wardhani, Ni

    2017-12-01

    Tetanus Neonatorum is an infectious disease that can be prevented by immunization. As of 2015, East Java Province had the highest number of Tetanus Neonatorum cases in Indonesia. The Tetanus Neonatorum data exhibit overdispersion and a substantial proportion of zeros. Negative Binomial (NB) regression is an alternative method when overdispersion occurs in Poisson regression. However, data containing both overdispersion and zero inflation are more appropriately analyzed using Zero-Inflated Negative Binomial (ZINB) regression. The purposes of this study are: (1) to model Tetanus Neonatorum cases in East Java Province, where 71.05 percent of observations are zeros, using NB and ZINB regression, and (2) to obtain the best model. The results indicate that ZINB regression is better than NB regression, with a smaller AIC.
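A minimal sketch of the ZINB mass function this kind of study fits, with hypothetical parameters (the 0.71 zero-inflation probability below merely echoes the 71.05% zero proportion reported above; the NB parameters are made up):

```python
from math import exp, lgamma

def nb_pmf(k, r, p):
    # Negative binomial: P(k failures before the r-th success)
    return exp(lgamma(k + r) - lgamma(r) - lgamma(k + 1)) * p**r * (1 - p)**k

def zinb_pmf(k, pi, r, p):
    # Zero-inflated NB: a structural zero occurs with probability pi,
    # otherwise the count is drawn from the NB component.
    base = (1 - pi) * nb_pmf(k, r, p)
    return pi + base if k == 0 else base

# With heavy zero inflation, zeros dominate the distribution:
probs = [zinb_pmf(k, 0.71, 1.5, 0.6) for k in range(50)]
```

Note that the zero probability exceeds the inflation parameter itself, because the NB component also produces some zeros; that is exactly the surplus the plain NB model struggles to match.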

  4. Application of Negative Binomial Regression for Assessing Public ...

    African Journals Online (AJOL)

    Because the variance was nearly two times greater than the mean, the negative binomial regression model provided an improved fit to the data and accounted better for overdispersion than the Poisson regression model, which assumed that the mean and variance are the same. The level of education and race were found
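The variance-versus-mean check described above can be sketched in a few lines; the counts here are made up for illustration, not the study's data:

```python
# Hypothetical overdispersed counts (illustrative only)
counts = [0, 0, 1, 0, 2, 5, 0, 3, 0, 7, 1, 0, 2, 9, 0, 1]
n = len(counts)
mean = sum(counts) / n
var = sum((x - mean) ** 2 for x in counts) / (n - 1)

# A ratio well above 1 signals overdispersion, favouring the
# negative binomial over the Poisson (which assumes ratio = 1).
dispersion = var / mean
```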

  5. Censored Hurdle Negative Binomial Regression (Case Study: Neonatorum Tetanus Case in Indonesia)

    Science.gov (United States)

    Yuli Rusdiana, Riza; Zain, Ismaini; Wulan Purnami, Santi

    2017-06-01

    Hurdle negative binomial regression is a method that can be used for a discrete dependent variable with excess zeros and under- or overdispersion. It uses a two-part approach: the first part, the zero hurdle model, estimates the zero elements of the dependent variable, and the second part, called the truncated negative binomial model, estimates the non-zero elements (positive integers). In such cases the discrete dependent variable may also be censored for some values; the type of censoring studied in this research is right censoring. This study aims to obtain the parameter estimator of hurdle negative binomial regression for a right-censored dependent variable, using maximum likelihood estimation (MLE), and to obtain the corresponding test statistic for the censored hurdle negative binomial model. The model is applied to the number of tetanus neonatorum cases in Indonesia, count data that contain zero values in some observations and a variety of other values. Based on the regression results, the factors that influence tetanus neonatorum cases in Indonesia are the percentage of baby health care coverage and the percentage of neonatal visits.
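The two-part structure described above can be sketched directly as a mass function (hypothetical parameters; the censoring mechanism is omitted for brevity):

```python
from math import exp, lgamma

def nb_pmf(k, r, p):
    # Negative binomial pmf: k failures before the r-th success
    return exp(lgamma(k + r) - lgamma(r) - lgamma(k + 1)) * p**r * (1 - p)**k

def hurdle_nb_pmf(k, p0, r, p):
    # Part 1: a zero occurs with probability p0 (the "hurdle").
    # Part 2: positives follow a zero-truncated negative binomial,
    # i.e. the NB pmf renormalized over k >= 1.
    if k == 0:
        return p0
    return (1 - p0) * nb_pmf(k, r, p) / (1 - nb_pmf(0, r, p))

probs = [hurdle_nb_pmf(k, 0.6, 2.0, 0.5) for k in range(60)]
```

Unlike zero inflation, the hurdle model has a single source of zeros, which is why its two parts can be estimated separately.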

  6. [Application of negative binomial regression and modified Poisson regression in the research of risk factors for injury frequency].

    Science.gov (United States)

    Cao, Qingqing; Wu, Zhenqiang; Sun, Ying; Wang, Tiezhu; Han, Tengwei; Gu, Chaomei; Sun, Yehuan

    2011-11-01

    To explore the application of negative binomial regression and modified Poisson regression in analyzing the influential factors for injury frequency and the risk factors leading to increased injury frequency. 2917 primary and secondary school students were selected from Hefei by the cluster random sampling method and surveyed by questionnaire. The count data on injury events were used to fit modified Poisson regression and negative binomial regression models. The risk factors associated with increased frequency of unintentional injury among juvenile students were explored, so as to probe the efficiency of these two models in studying the influential factors for injury frequency. The Poisson model showed over-dispersion (P<0.05), so the modified Poisson regression and negative binomial regression models were fitted instead. Both showed that male gender, younger age, the father working outside of the hometown, the guardian's education level being above junior high school, and smoking might be associated with higher injury frequencies. For clustered frequency data on injury events, both modified Poisson regression analysis and negative binomial regression analysis can be used. However, based on our data, the modified Poisson regression fitted better, and this model could give a more accurate interpretation of the relevant factors affecting the frequency of injury.

  7. Marginalized zero-inflated negative binomial regression with application to dental caries.

    Science.gov (United States)

    Preisser, John S; Das, Kalyan; Long, D Leann; Divaris, Kimon

    2016-05-10

    The zero-inflated negative binomial regression model (ZINB) is often employed in diverse fields such as dentistry, health care utilization, highway safety, and medicine to examine relationships between exposures of interest and overdispersed count outcomes exhibiting many zeros. The regression coefficients of ZINB have latent class interpretations for a susceptible subpopulation at risk for the disease/condition under study with counts generated from a negative binomial distribution and for a non-susceptible subpopulation that provides only zero counts. The ZINB parameters, however, are not well-suited for estimating overall exposure effects, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. In this paper, a marginalized zero-inflated negative binomial regression (MZINB) model for independent responses is proposed to model the population marginal mean count directly, providing straightforward inference for overall exposure effects based on maximum likelihood estimation. Through simulation studies, the finite sample performance of MZINB is compared with marginalized zero-inflated Poisson, Poisson, and negative binomial regression. The MZINB model is applied in the evaluation of a school-based fluoride mouthrinse program on dental caries in 677 children. Copyright © 2015 John Wiley & Sons, Ltd.

  8. Reducing Monte Carlo error in the Bayesian estimation of risk ratios using log-binomial regression models.

    Science.gov (United States)

    Salmerón, Diego; Cano, Juan A; Chirlaque, María D

    2015-08-30

    In cohort studies, binary outcomes are very often analyzed by logistic regression. However, it is well known that when the goal is to estimate a risk ratio, the logistic regression is inappropriate if the outcome is common. In these cases, a log-binomial regression model is preferable. On the other hand, the estimation of the regression coefficients of the log-binomial model is difficult owing to the constraints that must be imposed on these coefficients. Bayesian methods allow a straightforward approach for log-binomial regression models and produce smaller mean squared errors in the estimation of risk ratios than the frequentist methods, and the posterior inferences can be obtained using the software WinBUGS. However, Markov chain Monte Carlo methods implemented in WinBUGS can lead to large Monte Carlo errors in the approximations to the posterior inferences because they produce correlated simulations, and the accuracy of the approximations are inversely related to this correlation. To reduce correlation and to improve accuracy, we propose a reparameterization based on a Poisson model and a sampling algorithm coded in R. Copyright © 2015 John Wiley & Sons, Ltd.

  9. Longitudinal beta-binomial modeling using GEE for overdispersed binomial data.

    Science.gov (United States)

    Wu, Hongqian; Zhang, Ying; Long, Jeffrey D

    2017-03-15

    Longitudinal binomial data are frequently generated from multiple questionnaires and assessments in various scientific settings, and such binomial data are often overdispersed. The standard generalized linear mixed effects model may severely underestimate the standard errors of the estimated regression parameters in such cases and hence potentially bias the statistical inference. In this paper, we propose a longitudinal beta-binomial model for overdispersed binomial data and estimate the regression parameters under a probit model using the generalized estimating equation (GEE) method. A hybrid algorithm combining Fisher scoring and the method of moments is implemented for the computation. Extensive simulation studies are conducted to justify the validity of the proposed method. Finally, the proposed method is applied to analyze functional impairment in subjects who are at risk of Huntington disease, using data from a multisite observational study of prodromal Huntington disease. Copyright © 2016 John Wiley & Sons, Ltd.
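The method-of-moments component of such a hybrid algorithm can be illustrated with the standard moment estimator of the beta-binomial overdispersion (intra-cluster correlation); the data below are hypothetical, not from the Huntington study:

```python
# Hypothetical clustered binomial data: x successes out of n = 20
# trials per subject, deliberately bimodal (strong overdispersion).
x = [2, 19, 18, 1, 20, 0, 17, 3, 19, 2]
n, m = 20, len(x)

p = sum(x) / (m * n)                               # pooled proportion
s2 = sum((xi - n * p) ** 2 for xi in x) / (m - 1)  # sample variance

# Beta-binomial: Var(X) = n p (1 - p) (1 + (n - 1) rho),
# so the method-of-moments overdispersion estimate is:
rho = (s2 / (n * p * (1 - p)) - 1) / (n - 1)
```

Under a plain binomial model rho would be 0; values near 1 indicate that responses within a subject are almost perfectly correlated, which is the situation in which naive standard errors are most misleading.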

  10. Zero inflated Poisson and negative binomial regression models: application in education.

    Science.gov (United States)

    Salehi, Masoud; Roudbari, Masoud

    2015-01-01

    The numbers of failed courses and failed semesters are indicators of students' performance, and these counts have zero-inflated (ZI) distributions. Using ZI Poisson and ZI negative binomial distributions, we can model such count data to find the associated factors and estimate the parameters. This study aims to investigate the important factors related to the educational performance of students. This cross-sectional study was performed in 2008-2009 at Iran University of Medical Sciences (IUMS), with a population of almost 6000 students; 670 students were selected using stratified random sampling. The educational and demographic data were collected from university records. The study design was approved at IUMS and the students' data were kept confidential. Descriptive statistics and ZI Poisson and ZI negative binomial regressions were used to analyze the data, using STATA. For the number of failed semesters, in both the ZI Poisson and ZI negative binomial models, students' total average and the quota system played the largest roles. For the number of failed courses, the total average and being at the undergraduate or master's level had the greatest effects in both models. In all models the total average had the greatest effect on the number of failed courses or semesters; the next most important factors were the quota system for failed semesters and the undergraduate or master's level for failed courses. The total average therefore has an important inverse effect on the numbers of failed courses and semesters.

  11. Analysis of hypoglycemic events using negative binomial models.

    Science.gov (United States)

    Luo, Junxiang; Qu, Yongming

    2013-01-01

    Negative binomial regression is a standard model for analyzing hypoglycemic events in diabetes clinical trials. Adjusting for baseline covariates could potentially increase the estimation efficiency of negative binomial regression. However, adjusting for covariates raises concerns about model misspecification, to which negative binomial regression is not robust because of its strong model assumptions. Some literature suggests correcting the standard error of the maximum likelihood estimator by introducing an overdispersion parameter, which can be estimated by the deviance or Pearson chi-square statistic. We propose conducting negative binomial regression using sandwich estimation to calculate the covariance matrix of the parameter estimates, together with the Pearson overdispersion correction (denoted NBSP). In this research, we compared several commonly used negative binomial model options with our proposed NBSP. Simulations and real data analyses showed that NBSP is the most robust to model misspecification, and that estimation efficiency is improved by adjusting for baseline hypoglycemia. Copyright © 2013 John Wiley & Sons, Ltd.

  12. Geographically weighted negative binomial regression applied to zonal level safety performance models.

    Science.gov (United States)

    Gomes, Marcos José Timbó Lima; Cunto, Flávio; da Silva, Alan Ricardo

    2017-09-01

    Generalized Linear Models (GLM) with a negative binomial error distribution have been widely used to estimate safety at the transportation planning level. The limited ability of this technique to take spatial effects into account can be overcome through the use of local models from spatial regression techniques, such as Geographically Weighted Poisson Regression (GWPR). Although GWPR deals with spatial dependency and heterogeneity and has already been used in some road safety studies at the planning level, it fails to account for the possible overdispersion found in observations of road-traffic crashes. Two approaches were adopted for the Geographically Weighted Negative Binomial Regression (GWNBR) model to allow discrete data to be modeled in a non-stationary form while accounting for the overdispersion of the data: the first assumes a constant overdispersion for all the traffic zones, and the second includes an overdispersion variable for each spatial unit. This research conducts a comparative analysis between non-spatial global crash prediction models and local spatial GWPR and GWNBR models at the traffic-zone level in Fortaleza, Brazil. A geographic database of 126 traffic zones was compiled from the available data on exposure, network characteristics, socioeconomic factors and land use. The models were calibrated using the frequency of injury crashes as the dependent variable, and the results showed that GWPR and GWNBR outperformed GLM on average residuals and likelihood, as well as reducing the spatial autocorrelation of the residuals; the GWNBR model was better able to capture the spatial heterogeneity of crash frequency. Copyright © 2017 Elsevier Ltd. All rights reserved.
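A minimal sketch of the geographic weighting that distinguishes GWPR/GWNBR from a global GLM, using the commonly used Gaussian kernel (the bandwidth and distances below are illustrative assumptions, not values from the paper):

```python
import math

def gwr_weight(d, bandwidth):
    # Gaussian kernel used in geographically weighted regression:
    # zones near the calibration point get weight near 1,
    # distant zones get weight near 0.
    return math.exp(-0.5 * (d / bandwidth) ** 2)

# Each traffic zone gets its own locally weighted fit, with the other
# zones down-weighted by distance (distances in km, illustrative):
weights = [gwr_weight(d, bandwidth=5.0) for d in (0.0, 2.5, 5.0, 10.0)]
```

Refitting the regression at every zone with these weights is what lets the coefficients, and in GWNBR the overdispersion, vary over space.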

  13. Comparison of beta-binomial regression model approaches to analyze health-related quality of life data.

    Science.gov (United States)

    Najera-Zuloaga, Josu; Lee, Dae-Jin; Arostegui, Inmaculada

    2017-01-01

    Health-related quality of life has become an increasingly important indicator of health status in clinical trials and epidemiological research. Moreover, the study of the relationship of health-related quality of life with patient and disease characteristics has become one of the primary aims of many health-related quality of life studies. Health-related quality of life scores are usually assumed to be distributed as binomial random variables and are often highly skewed. The use of the beta-binomial distribution in the regression context has been proposed to model such data; however, beta-binomial regression has been performed by means of two different approaches in the literature: (i) the beta-binomial distribution with a logistic link; and (ii) hierarchical generalized linear models. None of the existing literature on the analysis of health-related quality of life survey data has compared the two approaches in terms of adequacy and regression parameter interpretation. This paper is motivated by the analysis of a real data application of health-related quality of life outcomes in patients with Chronic Obstructive Pulmonary Disease, where the two approaches yield contradictory results in terms of the significance of covariate effects and, consequently, the interpretation of the most relevant factors in health-related quality of life. We explain the results of both methodologies through a simulation study and address the need for practitioners to apply the proper approach in the analysis of health-related quality of life survey data, providing an R package.

  14. Two-part zero-inflated negative binomial regression model for quantitative trait loci mapping with count trait.

    Science.gov (United States)

    Moghimbeigi, Abbas

    2015-05-07

    Poisson regression models provide a standard framework for quantitative trait locus (QTL) mapping of count traits. In practice, however, count traits are often over-dispersed relative to the Poisson distribution. In these situations, zero-inflated Poisson (ZIP), zero-inflated generalized Poisson (ZIGP) and zero-inflated negative binomial (ZINB) regression may be useful for QTL mapping of count traits. Genetic variables added to the negative binomial component may also affect the extra zeros. In this study, to overcome these challenges, I apply a two-part ZINB model. An EM algorithm, with a Newton-Raphson method in the M-step, is used for estimating the parameters. An application of the two-part ZINB model for QTL mapping is considered, detecting associations between gallstone formation and marker genotypes. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Estimation of adjusted rate differences using additive negative binomial regression.

    Science.gov (United States)

    Donoghoe, Mark W; Marschner, Ian C

    2016-08-15

    Rate differences are an important effect measure in biostatistics and provide an alternative perspective to rate ratios. When the data are event counts observed during an exposure period, adjusted rate differences may be estimated using an identity-link Poisson generalised linear model, also known as additive Poisson regression. A problem with this approach is that the assumption of equality of mean and variance rarely holds in real data, which often show overdispersion. An additive negative binomial model is the natural alternative to account for this; however, standard model-fitting methods are often unable to cope with the constrained parameter space arising from the non-negativity restrictions of the additive model. In this paper, we propose a novel solution to this problem using a variant of the expectation-conditional maximisation-either algorithm. Our method provides a reliable way to fit an additive negative binomial regression model and also permits flexible generalisations using semi-parametric regression functions. We illustrate the method using a placebo-controlled clinical trial of fenofibrate treatment in patients with type II diabetes, where the outcome is the number of laser therapy courses administered to treat diabetic retinopathy. An R package is available that implements the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.

  16. Predicting Cumulative Incidence Probability by Direct Binomial Regression

    DEFF Research Database (Denmark)

    Scheike, Thomas H.; Zhang, Mei-Jie

    Binomial modelling; cumulative incidence probability; cause-specific hazards; subdistribution hazard

  17. Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

    Science.gov (United States)

    Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

    2009-01-01

    Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....

  18. [Using log-binomial model for estimating the prevalence ratio].

    Science.gov (United States)

    Ye, Rong; Gao, Yan-hui; Yang, Yi; Chen, Yue

    2010-05-01

    To estimate prevalence ratios using a log-binomial model with or without continuous covariates. Prevalence ratios for individuals' attitudes towards smoking-ban legislation, associated with smoking status and estimated using a log-binomial model, were compared with odds ratios estimated by a logistic regression model. In the log-binomial modeling, the maximum likelihood method was used when there were no continuous covariates, and the COPY approach was used if the model did not converge, for example due to the presence of continuous covariates. We examined the association between individuals' attitudes towards smoking-ban legislation and smoking status in men and women. Prevalence ratio and odds ratio estimation provided similar results for the association in women, since smoking was uncommon. In men, however, the odds ratio estimates were markedly larger than the prevalence ratios, due to the higher prevalence of the outcome. The log-binomial model did not converge when age was included as a continuous covariate, and the COPY method was used to deal with this situation. All analyses were performed in SAS. The prevalence ratio seems to measure the association better than the odds ratio when the prevalence is high. SAS programs are provided to calculate prevalence ratios with or without continuous covariates in log-binomial regression analysis.
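The divergence between the odds ratio and the prevalence ratio for a common outcome, which motivates the log-binomial model, can be shown with a hypothetical 2x2 table (not the smoking-ban survey data):

```python
# Hypothetical 2x2 table with a common outcome (prevalence 50-75%)
a, b = 150, 50    # exposed:   outcome present / absent
c, d = 100, 100   # unexposed: outcome present / absent

pr = (a / (a + b)) / (c / (c + d))   # prevalence ratio
odds_ratio = (a * d) / (b * c)       # odds ratio

# With a common outcome, the OR (3.0) greatly overstates the PR (1.5);
# with a rare outcome the two would nearly coincide.
```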

  19. Chain binomial models and binomial autoregressive processes.

    Science.gov (United States)

    Weiss, Christian H; Pollett, Philip K

    2012-09-01

    We establish a connection between a class of chain-binomial models of use in ecology and epidemiology and binomial autoregressive (AR) processes. New results are obtained for the latter, including expressions for the lag-conditional distribution and related quantities. We focus on two types of chain-binomial model, extinction-colonization and colonization-extinction models, and present two approaches to parameter estimation. The asymptotic distributions of the resulting estimators are studied, as well as their finite-sample performance, and we give an application to real data. A connection is made with standard AR models, which also has implications for parameter estimation. © 2011, The International Biometric Society.
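An extinction-colonization chain-binomial model of the kind discussed can be simulated as a binomial AR(1) process; the parameter values below are illustrative assumptions, not estimates from the paper:

```python
import random

def binomial_ar1(n, alpha, beta, steps, x0, seed=1):
    # Binomial AR(1): X_t = Bin(X_{t-1}, alpha) + Bin(n - X_{t-1}, beta).
    # Ecological reading: of n patches, each occupied patch stays occupied
    # with probability alpha; each empty patch is colonized with probability beta.
    rng = random.Random(seed)
    binom = lambda m, p: sum(rng.random() < p for _ in range(m))
    path = [x0]
    for _ in range(steps):
        path.append(binom(path[-1], alpha) + binom(n - path[-1], beta))
    return path

path = binomial_ar1(n=20, alpha=0.8, beta=0.1, steps=200, x0=10)
```

The process is bounded in {0, ..., n} by construction, which is the structural feature linking these chain-binomial models to binomial AR processes.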

  20. Simulation on Poisson and negative binomial models of count road accident modeling

    Science.gov (United States)

    Sapuan, M. S.; Razali, A. M.; Zamzuri, Z. H.; Ibrahim, K.

    2016-11-01

    Accident count data have often been shown to exhibit overdispersion; on the other hand, the data might contain excess zero counts. A simulation study was conducted to create scenarios in which accidents happen at a T-junction, with the dependent variable of the generated data following a given distribution, namely the Poisson or negative binomial distribution, for sample sizes from n=30 to n=500. The study objective was accomplished by fitting Poisson regression, negative binomial regression and hurdle negative binomial models to the simulated data. Model fits were compared, and the simulation results show that, for each sample size, not every model fits the data well, even when the data are generated from that model's own distribution, especially when the sample size is larger. Furthermore, larger sample sizes produce more zero accident counts in the dataset.
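One standard way to generate the negative binomial counts used in such a simulation is as a Poisson-gamma mixture; this sketch uses only the standard library, with illustrative parameter values:

```python
import math
import random

def sim_negative_binomial(mu, theta, size, seed=7):
    # NB as a Poisson-gamma mixture: draw a gamma rate with mean mu and
    # shape theta, then a Poisson count; Var = mu + mu**2 / theta,
    # so smaller theta means stronger overdispersion.
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        lam = rng.gammavariate(theta, mu / theta)
        # Knuth's product-of-uniforms Poisson sampler (fine for small lam)
        limit, k, prod = math.exp(-lam), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        draws.append(k)
    return draws

draws = sim_negative_binomial(mu=3.0, theta=0.8, size=2000)
```

The empirical variance of such draws should comfortably exceed the empirical mean, reproducing the overdispersion the simulation study targets.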

  1. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    Science.gov (United States)

    Mi, Gu; Di, Yanming; Schafer, Daniel W

    2015-01-01

    This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.

  2. Standardized binomial models for risk or prevalence ratios and differences.

    Science.gov (United States)

    Richardson, David B; Kinlaw, Alan C; MacLehose, Richard F; Cole, Stephen R

    2015-10-01

    Epidemiologists often analyse binary outcomes in cohort and cross-sectional studies using multivariable logistic regression models, yielding estimates of adjusted odds ratios. It is widely known that the odds ratio closely approximates the risk or prevalence ratio when the outcome is rare, and it does not do so when the outcome is common. Consequently, investigators may decide to directly estimate the risk or prevalence ratio using a log binomial regression model. We describe the use of a marginal structural binomial regression model to estimate standardized risk or prevalence ratios and differences. We illustrate the proposed approach using data from a cohort study of coronary heart disease status in Evans County, Georgia, USA. The approach reduces problems with model convergence typical of log binomial regression by shifting all explanatory variables except the exposures of primary interest from the linear predictor of the outcome regression model to a model for the standardization weights. The approach also facilitates evaluation of departures from additivity in the joint effects of two exposures. Epidemiologists should consider reporting standardized risk or prevalence ratios and differences in cohort and cross-sectional studies. These are readily-obtained using the SAS, Stata and R statistical software packages. The proposed approach estimates the exposure effect in the total population. © The Author 2015; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association.

  3. The Validation of a Beta-Binomial Model for Overdispersed Binomial Data.

    Science.gov (United States)

    Kim, Jongphil; Lee, Ji-Hyun

    2017-01-01

    The beta-binomial model has been widely used as an analytically tractable alternative that captures the overdispersion of an intra-correlated binomial random variable, X. However, model validation for X has rarely been investigated. As a beta-binomial mass function takes on a few different shapes, the model validation is examined for each of the classified shapes in this paper. Further, the mean square error (MSE) is illustrated for each shape for the maximum likelihood estimator (MLE) based on a beta-binomial model approach and for the method of moments estimator (MME), in order to gauge when and how much the MLE is biased.
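A sketch of the method of moments estimator (MME) mentioned above: match the sample mean and variance of hypothetical overdispersed counts to the beta-binomial moments and solve for the shape parameters (the data are made up for illustration):

```python
# Method-of-moments fit of a beta-binomial to hypothetical counts:
# x successes out of n = 15 trials each (not data from the paper).
x = [3, 2, 4, 12, 13, 3, 11, 2, 14, 4]
n, m = 15, len(x)

p = sum(x) / (m * n)                               # pooled proportion
s2 = sum((xi - n * p) ** 2 for xi in x) / (m - 1)  # sample variance

# Var(X) = n p (1 - p) (1 + (n - 1) rho), with rho = 1 / (a + b + 1);
# invert for rho, then split a + b according to the mean.
rho = (s2 / (n * p * (1 - p)) - 1) / (n - 1)
total = 1 / rho - 1            # a + b
a_hat, b_hat = p * total, (1 - p) * total
```

Here both fitted shape parameters come out below 1, i.e. the MME classifies these data into the U-shaped regime, one of the shape classes the paper examines.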

  4. Estimation Parameters And Modelling Zero Inflated Negative Binomial

    Directory of Open Access Journals (Sweden)

    Cindy Cahyaning Astuti

    2016-11-01

    Regression analysis is used to determine the relationship between one or several response variables (Y) and one or several predictor variables (X). A regression model between predictor variables and a Poisson-distributed response variable is called a Poisson regression model. Since Poisson regression requires equality of the mean and the variance, it is not appropriate for overdispersed data (variance higher than the mean). Poisson regression is commonly used to analyze count data, but with count data one often encounters observations with zero values making up a large proportion of the response variable (zero inflation), a problem that Poisson regression cannot solve. An alternative model that is more suitable for overdispersed data and can handle excess zero values in the response variable is the Zero-Inflated Negative Binomial (ZINB). In this research, ZINB is applied to the case of Tetanus Neonatorum in East Java. The aims of this research are to examine the likelihood function, to form an algorithm to estimate the parameters of ZINB, and to apply the ZINB model to the case of Tetanus Neonatorum in East Java. The Maximum Likelihood Estimation (MLE) method is used to estimate the parameters of ZINB, and the likelihood function is maximized using the Expectation Maximization (EM) algorithm. The results of the ZINB regression model showed that the predictor variables with a partially significant effect in the negative binomial component are the percentage of pregnant women's visits and the percentage of deliveries assisted by health personnel, while the predictor variable with a partially significant effect in the zero-inflation component is the percentage of neonatal visits.

  5. Using a Negative Binomial Regression Model for Early Warning at the Start of a Hand Foot Mouth Disease Epidemic in Dalian, Liaoning Province, China.

    Science.gov (United States)

    An, Qingyu; Wu, Jun; Fan, Xuesong; Pan, Liyang; Sun, Wei

    2016-01-01

    Hand, foot and mouth disease (HFMD) is a human syndrome caused by intestinal viruses such as coxsackievirus A16 and enterovirus 71, and it readily develops into outbreaks in kindergartens and schools. Scientific and accurate early detection of the start of an HFMD epidemic is key to planning control measures and minimizing the impact of HFMD. The objective of this study was to establish a reliable model for early detection of the start of HFMD epidemics in Dalian and to evaluate the model's performance by analyzing its detection sensitivity. A negative binomial regression model was used to estimate the weekly baseline case number of HFMD, and the optimal alerting threshold was identified from candidate threshold values for epidemic and non-epidemic years. The circular distribution method was used to calculate the gold-standard start timing of the HFMD epidemic. From 2009 to 2014, a total of 62022 HFMD cases were reported (36879 males and 25143 females) in Dalian, Liaoning Province, China, including 15 fatal cases. The median age of the patients was 3 years. The incidence rate in epidemic years ranged from 137.54 to 231.44 per 100,000 population; the incidence rate in non-epidemic years was lower than 112 per 100,000 population. The negative binomial regression model with an AIC value of 147.28 was finally selected to construct the baseline level. A threshold value of 100 for epidemic years and 50 for non-epidemic years had the highest sensitivity (100%) in both retrospective and prospective early warning, with a detection time of 2 weeks before the actual start of the HFMD epidemic. The negative binomial regression model can provide early warning of the start of an HFMD epidemic with good sensitivity and appropriate detection time in Dalian.
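The alerting step can be illustrated with a minimal sketch: compare each week's observed count against the model-based baseline plus a fixed threshold. The function below is hypothetical (the abstract does not spell out the exact alerting rule), but it captures the baseline-plus-threshold idea:

```python
def detect_epidemic_start(observed, baseline, threshold):
    """Flag the first week whose observed case count exceeds the
    model-based weekly baseline by more than `threshold`; return its
    index, or None if no week triggers the alert. Illustrative only."""
    for week, (obs, base) in enumerate(zip(observed, baseline)):
        if obs - base > threshold:
            return week
    return None
```

In the study, the baseline would come from the fitted negative binomial regression and the threshold from the sensitivity analysis (100 in epidemic years, 50 otherwise).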

  6. Robust inference in the negative binomial regression model with an application to falls data.

    Science.gov (United States)

    Aeberhard, William H; Cantoni, Eva; Heritier, Stephane

    2014-12-01

    A popular way to model overdispersed count data, such as the number of falls reported during intervention studies, is by means of the negative binomial (NB) distribution. Classical estimation methods are well known to be sensitive to model misspecification, which in such intervention studies using the NB regression model can take the form of patients falling much more often than expected. We extend in this article two approaches for building robust M-estimators of the regression parameters in the class of generalized linear models to the NB distribution. The first approach achieves robustness in the response by applying a bounded function to the Pearson residuals arising in the maximum likelihood estimating equations, while the second approach achieves robustness by bounding the unscaled deviance components. For both approaches, we explore different choices for the bounding functions. Through a unified notation, we show how close these approaches may actually be as long as the bounding functions are chosen and tuned appropriately, and provide the asymptotic distributions of the resulting estimators. Moreover, we introduce a robust weighted maximum likelihood estimator for the overdispersion parameter, specific to the NB distribution. Simulations under various settings show that redescending bounding functions yield estimates with smaller biases under contamination while keeping high efficiency at the assumed model, for both approaches. We present an application to a recent randomized controlled trial measuring the effectiveness of an exercise program at reducing the number of falls among people suffering from Parkinson's disease, to illustrate the diagnostic use of such robust procedures and the need for reliable inference. © 2014, The International Biometric Society.
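The first approach's key ingredient, bounding the Pearson residuals, can be sketched generically. The Huber function below is one common bounded choice (the authors also study redescending alternatives such as Tukey's biweight), and the NB variance formula mu + mu^2/theta is standard; none of this is the authors' own code:

```python
def huber_psi(r, c=1.345):
    """Huber's bounded psi function: identity on [-c, c], clipped outside,
    so large residuals get bounded influence on the estimating equations."""
    return max(-c, min(c, r))

def pearson_residual(y, mu, theta):
    """Pearson residual under NB(mu, theta), whose variance is mu + mu^2/theta."""
    return (y - mu) / (mu + mu**2 / theta) ** 0.5
```

A robust estimating equation then sums `huber_psi(pearson_residual(...))`-weighted score contributions instead of the raw residuals.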

  7. Log-binomial models: exploring failed convergence.

    Science.gov (United States)

    Williamson, Tyler; Eliasziw, Misha; Fick, Gordon Hilton

    2013-12-13

    Relative risk is a summary metric that is commonly used in epidemiological investigations. Increasingly, epidemiologists are using log-binomial models to study the impact of a set of predictor variables on a single binary outcome, as they naturally offer relative risks. However, standard statistical software may report failed convergence when attempting to fit log-binomial models in certain settings. The methods that have been proposed in the literature for dealing with failed convergence use approximate solutions to avoid the issue. This research looks directly at the log-likelihood function for the simplest log-binomial model where failed convergence has been observed, a model with a single linear predictor with three levels. The possible causes of failed convergence are explored and potential solutions are presented for some cases. Among the principal causes is a failure of the fitting algorithm to converge despite the log-likelihood function having a single finite maximum. Despite these limitations, log-binomial models are a viable option for epidemiologists wishing to describe the relationship between a set of predictors and a binary outcome where relative risk is the desired summary measure. Epidemiologists are encouraged to continue to use log-binomial models and advocate for improvements to the fitting algorithms to promote the widespread use of log-binomial models.
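The convergence difficulty has a simple source: the log link does not automatically keep fitted probabilities below 1, so the log-likelihood is only defined on a constrained region of the parameter space. A minimal sketch (illustrative, not the authors' code; the hard boundary check is a simplification):

```python
from math import exp, log

def log_binomial_loglik(beta, X, y):
    """Log-likelihood of a log-binomial model: P(Y=1 | x) = exp(x . beta).
    Returns -inf whenever a fitted probability reaches 1 -- the parameter
    boundary near which standard fitting algorithms can fail to converge."""
    ll = 0.0
    for xi, yi in zip(X, y):
        p = exp(sum(b * v for b, v in zip(beta, xi)))
        if p >= 1.0:          # outside (or on) the admissible region
            return float("-inf")
        ll += log(p) if yi == 1 else log(1 - p)
    return ll
```

When the maximum lies on or near this boundary, iterative fitters can step into the inadmissible region, which is one of the failure modes the paper examines.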

  8. Spatial modeling of rat bites and prediction of rat infestation in Peshawar valley using binomial kriging with logistic regression.

    Science.gov (United States)

    Ali, Asad; Zaidi, Farrah; Fatima, Syeda Hira; Adnan, Muhammad; Ullah, Saleem

    2018-03-24

    In this study, we propose a geostatistical computational framework to model the distribution of rat bite infestation, which has reached epidemic proportions in Peshawar valley, Pakistan. Two species, Rattus norvegicus and Rattus rattus, are suspected to spread the infestation. The framework combines the strengths of the maximum entropy algorithm and binomial kriging with logistic regression to spatially model the distribution of infestation and to determine the individual roles of environmental predictors in the distribution trends. Our results demonstrate the significance of a number of social and environmental factors in rat infestations, such as (I) high human population density; (II) greater dispersal ability of rodents due to the availability of connectivity routes such as roads; and (III) temperature and precipitation, which influence rodent fecundity and life cycle.

  9. A comparison of observation-level random effect and Beta-Binomial models for modelling overdispersion in Binomial data in ecology & evolution.

    Science.gov (United States)

    Harrison, Xavier A

    2015-01-01

    Overdispersion is a common feature of models of biological data, but researchers often fail to model the excess variation driving the overdispersion, resulting in biased parameter estimates and standard errors. Quantifying and modeling overdispersion when it is present is therefore critical for robust biological inference. One means to account for overdispersion is to add an observation-level random effect (OLRE) to a model, where each data point receives a unique level of a random effect that can absorb the extra-parametric variation in the data. Although some studies have investigated the utility of OLRE to model overdispersion in Poisson count data, studies doing so for Binomial proportion data are scarce. Here I use a simulation approach to investigate the ability of both OLRE models and Beta-Binomial models to recover unbiased parameter estimates in mixed effects models of Binomial data under various degrees of overdispersion. In addition, as ecologists often fit random intercept terms to models when the random effect sample size is low (<5 levels), I investigate the performance of both model types under a range of random effect sample sizes when overdispersion is present. Simulation results revealed that the efficacy of OLRE depends on the process that generated the overdispersion; OLRE failed to cope with overdispersion generated from a Beta-Binomial mixture model, leading to biased slope and intercept estimates, but performed well for overdispersion generated by adding random noise to the linear predictor. Comparison of parameter estimates from an OLRE model with those from its corresponding Beta-Binomial model readily identified when OLRE were performing poorly due to disagreement between effect sizes, and this strategy should be employed whenever OLRE are used for Binomial data to assess their reliability. Beta-Binomial models performed well across all contexts, but showed a tendency to underestimate effect sizes when modelling non-Beta-Binomial data. Finally, both OLRE and Beta-Binomial models performed
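The Beta-Binomial mixture at the heart of this comparison can be sketched directly. The pmf below is the standard closed form (parameter names a and b are mine); its variance exceeds the plain binomial's, which is exactly the overdispersion being modelled:

```python
from math import comb, exp, lgamma

def beta_binomial_pmf(k, n, a, b):
    """Beta-Binomial pmf: Binomial(n, p) with p ~ Beta(a, b).
    Computed via log Beta functions for numerical stability."""
    log_beta = lambda x, y: lgamma(x) + lgamma(y) - lgamma(x + y)
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))
```

For n = 10, a = 2, b = 3 the mean is 4 (like Binomial(10, 0.4)) but the variance is 6 rather than 2.4 — the extra-binomial variation an OLRE may or may not be able to absorb.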

  10. Semiparametric Allelic Tests for Mapping Multiple Phenotypes: Binomial Regression and Mahalanobis Distance.

    Science.gov (United States)

    Majumdar, Arunabha; Witte, John S; Ghosh, Saurabh

    2015-12-01

    Binary phenotypes commonly arise due to multiple underlying quantitative precursors, and genetic variants may impact multiple traits in a pleiotropic manner. Hence, simultaneously analyzing such correlated traits may be more powerful than analyzing individual traits. Various genotype-level methods, e.g., MultiPhen (O'Reilly et al. []), have been developed to identify genetic factors underlying a multivariate phenotype. For univariate phenotypes, the usefulness and applicability of allele-level tests have been investigated. The test of allele frequency difference between cases and controls is commonly used for mapping case-control association. However, allelic methods for multivariate association mapping have not been studied much. In this article, we explore two allelic tests of multivariate association: one using a binomial regression model based on inverted regression of genotype on phenotype (Binomial regression-based Association of Multivariate Phenotypes [BAMP]), and the other employing the Mahalanobis distance between the two sample means of the multivariate phenotype vector for the two alleles at a single-nucleotide polymorphism (Distance-based Association of Multivariate Phenotypes [DAMP]). These methods can incorporate both discrete and continuous phenotypes. Some theoretical properties of BAMP are studied. Using simulations, the power of the methods for detecting multivariate association is compared with that of the genotype-level test MultiPhen. The allelic tests yield marginally higher power than MultiPhen for multivariate phenotypes. For one or two binary traits under a recessive mode of inheritance, the allelic tests are found to be substantially more powerful. All three tests are applied to two real datasets, and the results offer some support for the simulation study. We propose a hybrid approach for testing multivariate association that implements MultiPhen when Hardy-Weinberg Equilibrium (HWE) is violated and BAMP otherwise, because the allelic approaches assume HWE.
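The distance underlying DAMP can be sketched for a bivariate phenotype; this is a generic illustration of the Mahalanobis distance between two mean vectors under a shared covariance matrix (inverted in closed form for the 2x2 case), not the authors' implementation:

```python
def mahalanobis_2d(mean1, mean2, cov):
    """Mahalanobis distance between two 2-dimensional mean vectors,
    using a shared 2x2 covariance matrix."""
    d = (mean1[0] - mean2[0], mean1[1] - mean2[1])
    (a, b), (c, e) = cov
    det = a * e - b * c                       # 2x2 determinant
    inv = ((e / det, -b / det), (-c / det, a / det))
    q = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
         + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    return q ** 0.5
```

With the identity covariance this reduces to the Euclidean distance; correlated phenotypes rescale the distance, which is what makes it suitable for multivariate traits.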

  11. Dynamic prediction of cumulative incidence functions by direct binomial regression.

    Science.gov (United States)

    Grand, Mia K; de Witte, Theo J M; Putter, Hein

    2018-03-25

    In recent years there have been a series of advances in the field of dynamic prediction. Among them is the development of methods for dynamic prediction of the cumulative incidence function in a competing risks setting. These models enable predictions to be updated as time progresses and more information becomes available; for example, when a patient comes back for a follow-up visit after completing a year of treatment, the risks of death and adverse events may have changed since treatment initiation. One approach to modelling the cumulative incidence function in competing risks is direct binomial regression, where right censoring of the event times is handled by inverse probability of censoring weights. We extend this approach by combining it with landmarking to enable dynamic prediction of the cumulative incidence function. The proposed models are very flexible, as they allow the covariates to have complex time-varying effects, and we illustrate how to investigate possible time-varying structures using Wald tests. The models are fitted using generalized estimating equations. The method is applied to bone marrow transplant data and its performance is investigated in a simulation study. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. A comparison of observation-level random effect and Beta-Binomial models for modelling overdispersion in Binomial data in ecology & evolution

    Directory of Open Access Journals (Sweden)

    Xavier A. Harrison

    2015-07-01

    Full Text Available Overdispersion is a common feature of models of biological data, but researchers often fail to model the excess variation driving the overdispersion, resulting in biased parameter estimates and standard errors. Quantifying and modeling overdispersion when it is present is therefore critical for robust biological inference. One means to account for overdispersion is to add an observation-level random effect (OLRE) to a model, where each data point receives a unique level of a random effect that can absorb the extra-parametric variation in the data. Although some studies have investigated the utility of OLRE to model overdispersion in Poisson count data, studies doing so for Binomial proportion data are scarce. Here I use a simulation approach to investigate the ability of both OLRE models and Beta-Binomial models to recover unbiased parameter estimates in mixed effects models of Binomial data under various degrees of overdispersion. In addition, as ecologists often fit random intercept terms to models when the random effect sample size is low (<5 levels), I investigate the performance of both model types under a range of random effect sample sizes when overdispersion is present. Simulation results revealed that the efficacy of OLRE depends on the process that generated the overdispersion; OLRE failed to cope with overdispersion generated from a Beta-Binomial mixture model, leading to biased slope and intercept estimates, but performed well for overdispersion generated by adding random noise to the linear predictor. Comparison of parameter estimates from an OLRE model with those from its corresponding Beta-Binomial model readily identified when OLRE were performing poorly due to disagreement between effect sizes, and this strategy should be employed whenever OLRE are used for Binomial data to assess their reliability. Beta-Binomial models performed well across all contexts, but showed a tendency to underestimate effect sizes when modelling non-Beta-Binomial data.

  13. Model-based Quantile Regression for Discrete Data

    KAUST Repository

    Padellini, Tullia

    2018-04-10

    Quantile regression is a class of methods devoted to the modelling of conditional quantiles. In a Bayesian framework, quantile regression has typically been carried out exploiting the Asymmetric Laplace Distribution as a working likelihood. Despite the fact that this leads to a proper posterior for the regression coefficients, the resulting posterior variance is affected by an unidentifiable parameter, hence any inferential procedure besides point estimation is unreliable. We propose a model-based approach for quantile regression that considers quantiles of the generating distribution directly, and thus allows for proper uncertainty quantification. We then create a link between quantile regression and generalised linear models by mapping the quantiles to the parameter of the response variable, and we exploit it to fit the model with R-INLA. We also extend the approach to discrete responses, where there is no one-to-one relationship between quantiles and the distribution's parameters, by introducing continuous generalisations of the most common discrete variables (Poisson, Binomial and Negative Binomial) to be exploited in the fitting.

  14. Comparison of linear and zero-inflated negative binomial regression models for appraisal of risk factors associated with dental caries.

    Science.gov (United States)

    Batra, Manu; Shah, Aasim Farooq; Rajput, Prashant; Shah, Ishrat Aasim

    2016-01-01

    Dental caries among children has been described as a pandemic disease with a multifactorial nature. Various sociodemographic factors and oral hygiene practices are commonly tested for their influence on dental caries. In recent years, a statistical model that allows for covariate adjustment has been developed, commonly referred to as the zero-inflated negative binomial (ZINB) model. The objective was to compare the fit of two models, the conventional linear regression (LR) model and the ZINB model, in assessing the risk factors associated with dental caries. A cross-sectional survey was conducted on 1138 12-year-old school children in Moradabad Town, Uttar Pradesh between February and August 2014. Selected participants were interviewed using a questionnaire. Dental caries was assessed by recording the decayed, missing, or filled teeth (DMFT) index. To assess the risk factors associated with dental caries in children, two approaches were applied: the LR model and the ZINB model. The prevalence of caries-free subjects was 24.1%, and the mean DMFT was 3.4 ± 1.8. In the LR model, all the variables were statistically significant. In the ZINB model, the negative binomial part showed place of residence, father's education level, tooth-brushing frequency, and dental visits to be statistically significant, implying that the likelihood of being caries-free (DMFT = 0) increases for children who live in urban areas, whose fathers have a university education, who brush twice a day, and who have ever visited a dentist. The current study reports that the LR model fits poorly and may lead to spurious conclusions, whereas the ZINB model shows better goodness of fit (Akaike information criterion values - LR: 3.94; ZINB: 2.39) and can be preferred when high variance and an excess of zeros are present.

  15. Distinguishing between Rural and Urban Road Segment Traffic Safety Based on Zero-Inflated Negative Binomial Regression Models

    Directory of Open Access Journals (Sweden)

    Xuedong Yan

    2012-01-01

    Full Text Available In this study, the traffic crash rate, total crash frequency, and injury and fatal crash frequency were taken into consideration for distinguishing between rural and urban road segment safety. GIS-based crash data covering four and a half years in the Pikes Peak Area, US were used for the analyses. The comparative statistical results show that crash rates in rural segments are consistently lower than in urban segments. Further, the regression results based on Zero-Inflated Negative Binomial (ZINB) regression models indicate that urban areas have a higher crash risk in terms of both total crash frequency and injury and fatal crash frequency, compared to rural areas. Additionally, it is found that crash frequencies increase as traffic volume and segment length increase, although higher traffic volume lowers the likelihood of severe crash occurrence; compared to 2-lane roads, 4-lane roads have lower crash frequencies but a higher probability of severe crash occurrence; and better road facilities with higher free-flow speed benefit from high-standard design features, resulting in a lower total crash frequency, but they cannot mitigate the severe crash risk.

  16. Comparison of robustness to outliers between robust poisson models and log-binomial models when estimating relative risks for common binary outcomes: a simulation study.

    Science.gov (United States)

    Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P

    2014-06-26

    To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher-order term, the robust Poisson models consistently outperformed the log-binomial models, even when the level of contamination was low. The robust Poisson models are more robust (or less sensitive) to outliers than the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.

  17. Comparison: Binomial model and Black Scholes model

    Directory of Open Access Journals (Sweden)

    Amir Ahmad Dar

    2018-03-01

    Full Text Available The binomial model and the Black-Scholes model are popular methods used to solve option pricing problems. The binomial model is a simple statistical method, while the Black-Scholes model requires the solution of a stochastic differential equation. Pricing European call and put options is a central task for actuaries. The main goal of this study is to compare the binomial model and the Black-Scholes model using two statistical procedures, the t-test and Tukey's test, at one period. The results showed no significant difference between the mean European option prices obtained from the two models.
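The two pricing approaches compared in the study can be sketched side by side; below are textbook versions of the Cox-Ross-Rubinstein binomial tree and the Black-Scholes closed form for a European call (generic illustrations, not the authors' code). With enough steps, the binomial price converges to the Black-Scholes price, which is consistent with the study finding no significant difference between the two:

```python
from math import comb, erf, exp, log, sqrt

def binomial_call(S, K, r, sigma, T, n):
    """European call price from an n-step Cox-Ross-Rubinstein binomial tree."""
    dt = T / n
    u = exp(sigma * sqrt(dt))          # up factor
    d = 1 / u                          # down factor
    p = (exp(r * dt) - d) / (u - d)    # risk-neutral up probability
    # discounted risk-neutral expectation of the terminal payoff
    price = sum(comb(n, j) * p**j * (1 - p)**(n - j)
                * max(S * u**j * d**(n - j) - K, 0.0)
                for j in range(n + 1))
    return exp(-r * T) * price

def black_scholes_call(S, K, r, sigma, T):
    """Closed-form Black-Scholes price of a European call."""
    N = lambda x: 0.5 * (1 + erf(x / sqrt(2)))  # standard normal CDF
    d1 = (log(S / K) + (r + sigma**2 / 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * N(d1) - K * exp(-r * T) * N(d2)
```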

  18. Predictability and interpretability of hybrid link-level crash frequency models for urban arterials compared to cluster-based and general negative binomial regression models.

    Science.gov (United States)

    Najaf, Pooya; Duddu, Venkata R; Pulugurtha, Srinivas S

    2018-03-01

    Machine learning (ML) techniques have higher prediction accuracy than conventional statistical methods for crash frequency modelling. However, their black-box nature limits interpretability. The objective of this research is to combine ML and statistical methods to develop hybrid link-level crash frequency models with high predictability and interpretability. For this purpose, the M5' model trees method (M5') is introduced and applied to classify the crash data, and a model is then calibrated for each homogeneous class. Data for 1134 and 345 randomly selected links on urban arterials in the city of Charlotte, North Carolina were used to develop and validate the models, respectively. The outputs from the hybrid approach are compared with the outputs from cluster-based negative binomial regression (NBR) and general NBR models. Findings indicate that M5' has high predictability and is very reliable for interpreting the role of different attributes on crash frequency compared to the other developed models.

  19. Application of Negative Binomial Regression to Overcome Overdispersion in Poisson Regression

    Directory of Open Access Journals (Sweden)

    PUTU SUSAN PRADAWATI

    2013-09-01

    Full Text Available Poisson regression is used to analyze count data that are Poisson distributed. Poisson regression analysis requires equidispersion, in which the mean of the response variable equals its variance. However, deviations occur in which the variance of the response variable is greater than the mean; this is called overdispersion. If overdispersion is present and Poisson regression analysis is used anyway, the standard errors will be underestimated. Negative binomial regression can handle overdispersion because it contains a dispersion parameter. On simulated data exhibiting overdispersion under the Poisson regression model, negative binomial regression was found to perform better than the Poisson regression model.

  20. Variability in results from negative binomial models for Lyme disease measured at different spatial scales.

    Science.gov (United States)

    Tran, Phoebe; Waller, Lance

    2015-01-01

    Lyme disease has been the subject of many studies due to increasing incidence rates year after year and the severe complications that can arise in later stages of the disease. Negative binomial models have been used to model Lyme disease in the past with some success. However, there has been little focus on the reliability and consistency of these models when they are used to study Lyme disease at multiple spatial scales. This study seeks to explore how sensitive/consistent negative binomial models are when they are used to study Lyme disease at different spatial scales (at the regional and sub-regional levels). The study area includes the thirteen states in the Northeastern United States with the highest Lyme disease incidence during the 2002-2006 period. Lyme disease incidence at county level for the period of 2002-2006 was linked with several previously identified key landscape and climatic variables in a negative binomial regression model for the Northeastern region and two smaller sub-regions (the New England sub-region and the Mid-Atlantic sub-region). This study found that negative binomial models, indeed, were sensitive/inconsistent when used at different spatial scales. We discuss various plausible explanations for such behavior of negative binomial models. Further investigation of the inconsistency and sensitivity of negative binomial models when used at different spatial scales is important for not only future Lyme disease studies and Lyme disease risk assessment/management but any study that requires use of this model type in a spatial context. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. Comparison of Negative Binomial Regression and Conway-Maxwell-Poisson Regression in Overcoming Overdispersion in Poisson Regression

    Directory of Open Access Journals (Sweden)

    Lusi Eka Afri

    2017-03-01

    Full Text Available Negative binomial regression and Conway-Maxwell-Poisson regression are solutions for overcoming overdispersion in Poisson regression. Both models are extensions of the Poisson regression model. According to Hinde and Demetrio (2007), overdispersion in Poisson regression can arise in several ways: variation among individual observations as a component not explained by the model, correlation among individual responses, clustering within the population, and omission of observed variables. As a consequence, standard errors may be estimated too low, producing downward-biased (underestimated) parameter estimates. This study aims to compare the negative binomial regression model and the Conway-Maxwell-Poisson (COM-Poisson) regression model in overcoming overdispersion in Poisson-distributed data, based on the deviance test statistic. Two data sources are used: simulated data and an applied case study. The simulated data were obtained by generating overdispersed Poisson-distributed data in the R programming language, based on data characteristics including the probability of zero values (p) and the sample size (n). The generated data are used to obtain estimates of the parameter coefficients for negative binomial and COM-Poisson regression. Keywords: overdispersion, negative binomial regression, Conway-Maxwell-Poisson regression
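The COM-Poisson pmf at the center of this comparison can be sketched directly; nu is the dispersion parameter (nu = 1 recovers Poisson, nu < 1 gives overdispersion, nu > 1 underdispersion), and the normalizing constant is truncated numerically. This is a standard treatment, not the authors' code:

```python
from math import exp, lgamma, log

def com_poisson_pmf(y, lam, nu, terms=200):
    """Conway-Maxwell-Poisson pmf P(Y=y) = lam^y / (y!)^nu / Z(lam, nu).
    The normalizing constant Z is an infinite series, truncated here at
    `terms` summands; log-space arithmetic avoids factorial overflow."""
    z = sum(exp(j * log(lam) - nu * lgamma(j + 1)) for j in range(terms))
    return exp(y * log(lam) - nu * lgamma(y + 1)) / z
```

The intractable normalizing constant Z is what makes COM-Poisson regression computationally heavier than negative binomial regression, despite its flexibility in handling both over- and underdispersion.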

  2. Correcting for binomial measurement error in predictors in regression with application to analysis of DNA methylation rates by bisulfite sequencing.

    Science.gov (United States)

    Buonaccorsi, John; Prochenka, Agnieszka; Thoresen, Magne; Ploski, Rafal

    2016-09-30

    Motivated by a genetic application, this paper addresses the problem of fitting regression models when the predictor is a proportion measured with error. While the problem of dealing with additive measurement error in fitting regression models has been extensively studied, the problem where the additive error is of a binomial nature has not been addressed. The measurement errors here are heteroscedastic for two reasons: dependence on the underlying true value and changing sampling effort over observations. While some of the previously developed methods for treating additive measurement error with heteroscedasticity can be used in this setting, other methods need modification. A new version of simulation extrapolation is developed, and we also explore a variation on the standard regression calibration method that uses a beta-binomial model, based on the fact that the true value is a proportion. Although most of the methods introduced here can be used for fitting non-linear models, this paper focuses primarily on their use in fitting a linear model. While previous work has focused mainly on estimation of the coefficients, we will, with motivation from our example, also examine estimation of the variance around the regression line. In addressing these problems, we also discuss the appropriate manner in which to bootstrap for both inferences and bias assessment. The various methods are compared via simulation, and the results are illustrated using our motivating data, for which the goal is to relate the methylation rate of a blood sample to the age of the individual providing the sample. Copyright © 2016 John Wiley & Sons, Ltd.

  3. [Application of detecting and taking overdispersion into account in Poisson regression model].

    Science.gov (United States)

    Bouche, G; Lepage, B; Migeot, V; Ingrand, P

    2009-08-01

    Researchers often use the Poisson regression model to analyze count data. Overdispersion can occur when a Poisson regression model is used, resulting in an underestimation of the variance of the regression model parameters. Our objective was to take overdispersion into account and assess its impact, with an illustration based on the data of a study investigating the relationship between use of the Internet to seek health information and the number of primary care consultations. Three methods, an overdispersed Poisson model, a robust estimator, and negative binomial regression, were used to take overdispersion into account in explaining variation in the number (Y) of primary care consultations. We tested for overdispersion in the Poisson regression model using the ratio of the sum of squared Pearson residuals over the number of degrees of freedom (chi(2)/df). We then fitted the three models and compared the parameter estimates to those given by the Poisson regression model. The variance of the number of primary care consultations (Var[Y]=21.03) was greater than the mean (E[Y]=5.93) and the chi(2)/df ratio was 3.26, which confirmed overdispersion. Standard errors of the parameters varied greatly between the Poisson regression model and the three other regression models. Interpretation of estimates from two variables (using the Internet to seek health information and single-parent family) would have changed according to the model retained, with significance levels of 0.06 and 0.002 (Poisson), 0.29 and 0.09 (overdispersed Poisson), 0.29 and 0.13 (robust estimator) and 0.45 and 0.13 (negative binomial), respectively. Different methods exist to solve the problem of underestimating variance in the Poisson regression model when overdispersion is present. The negative binomial regression model seems particularly accurate because of its theoretical distribution; in addition, this regression is easy to perform with ordinary statistical software packages.
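The overdispersion test described (the chi(2)/df ratio of summed squared Pearson residuals to residual degrees of freedom) can be sketched in a few lines; function and argument names are mine:

```python
def overdispersion_ratio(y, mu, n_params):
    """Sum of squared Pearson residuals under a Poisson working model
    (variance = mean), divided by the residual degrees of freedom.
    Values well above 1 suggest overdispersion."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    df = len(y) - n_params
    return chi2 / df
```

In the study this ratio was 3.26, well above 1, confirming overdispersion before refitting with the three alternative models.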

  4. Negative binomial multiplicity distribution from binomial cluster production

    International Nuclear Information System (INIS)

    Iso, C.; Mori, K.

    1990-01-01

    A two-step interpretation of the negative binomial multiplicity distribution as a compound of binomial cluster production and a negative-binomial-like cluster decay distribution is proposed. In this model the average multiplicity for cluster production can be expected to increase with increasing energy, unlike in a compound Poisson-logarithmic distribution. (orig.)

  5. Penggunaan Model Binomial Pada Penentuan Harga Opsi Saham Karyawan

    Directory of Open Access Journals (Sweden)

    Dara Puspita Anggraeni

    2015-11-01

    Full Text Available Binomial Model for Valuing Employee Stock Options. Employee Stock Options (ESO) differ from standard exchange-traded options. A valuation model for employee stock options must account for three main differences: the vesting period, the exit rate, and non-transferability. In this thesis, a model for valuing employee stock options is discussed and implemented with a generalized binomial model.
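
    For context, a minimal Cox-Ross-Rubinstein binomial tree for a plain European call, in Python; all parameter values are illustrative, and the thesis's ESO model adds vesting periods, exit rates, and non-transferability on top of a tree like this:

```python
import math

def crr_call_price(S, K, r, sigma, T, steps):
    """European call on a Cox-Ross-Rubinstein binomial tree.

    A plain-vanilla sketch; an employee-stock-option model would layer
    vesting, exit rates, and non-transferability on top of this.
    """
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))    # up factor
    d = 1 / u                              # down factor
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    disc = math.exp(-r * T)
    price = 0.0
    for j in range(steps + 1):
        prob = math.comb(steps, j) * p**j * (1 - p)**(steps - j)
        payoff = max(S * u**j * d**(steps - j) - K, 0.0)
        price += prob * payoff
    return disc * price

price = crr_call_price(S=100, K=100, r=0.05, sigma=0.2, T=1.0, steps=200)
```

With enough steps the tree price converges to the Black-Scholes value for the same parameters.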

  6. Tutorial on Using Regression Models with Count Outcomes Using R

    Directory of Open Access Journals (Sweden)

    A. Alexander Beaujean

    2016-02-01

    Full Text Available Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers who study these variables use typical regression methods (i.e., ordinary least-squares) either with or without transforming the count variables. In either case, using typical regression for count data can produce parameter estimates that are biased, thus diminishing any inferences made from such data. As count-variable regression models are seldom taught in training programs, we present a tutorial to help educational researchers use such methods in their own research. We demonstrate analyzing and interpreting count data using Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regression models. The count regression methods are introduced through an example using the number of times students skipped class. The data for this example are freely available and the R syntax used to run the example analyses is included in the Appendix.
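
    The excess-zeros idea behind the zero-inflated Poisson model covered in the tutorial can be sketched in a few lines of Python; the parameter values below are illustrative, not taken from the tutorial's data:

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the count is a
    'structural' zero; otherwise it is Poisson(lam)."""
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson

lam, pi = 2.5, 0.3           # illustrative parameters
p0_zip = zip_pmf(0, lam, pi)  # zero probability under the ZIP model
p0_pois = math.exp(-lam)      # zero probability under plain Poisson
total = sum(zip_pmf(k, lam, pi) for k in range(100))
```

The zero-inflated model places strictly more mass at zero than the plain Poisson with the same rate, which is exactly what data with "too many zeros" call for.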

  7. Gaussian Process Regression Model in Spatial Logistic Regression

    Science.gov (United States)

    Sofro, A.; Oktaviarina, A.

    2018-01-01

    Spatial analysis has developed very quickly in the last decade. One of the favorite approaches is based on the neighbourhood of the region. Unfortunately, there are some limitations such as difficulty in prediction. Therefore, we offer Gaussian process regression (GPR) to accommodate the issue. In this paper, we will focus on spatial modeling with GPR for binomial data with logit link function. The performance of the model will be investigated. We will discuss the inference of how to estimate the parameters and hyper-parameters and to predict as well. Furthermore, simulation studies will be explained in the last section.

  8. Using beta-binomial regression for high-precision differential methylation analysis in multifactor whole-genome bisulfite sequencing experiments

    Science.gov (United States)

    2014-01-01

    Background Whole-genome bisulfite sequencing currently provides the highest-precision view of the epigenome, with quantitative information about populations of cells down to single nucleotide resolution. Several studies have demonstrated the value of this precision: meaningful features that correlate strongly with biological functions can be found associated with only a few CpG sites. Understanding the role of DNA methylation, and more broadly the role of DNA accessibility, requires that methylation differences between populations of cells are identified with extreme precision and in complex experimental designs. Results In this work we investigated the use of beta-binomial regression as a general approach for modeling whole-genome bisulfite data to identify differentially methylated sites and genomic intervals. Conclusions The regression-based analysis can handle medium- and large-scale experiments where it becomes critical to accurately model variation in methylation levels between replicates and account for the influence of various experimental factors such as cell types or batch effects. PMID:24962134
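
    A minimal sketch of the beta-binomial building block used here, in Python; the read depth and shape parameters below are illustrative. The methylated-read count at a site is binomial given the methylation level, and the level itself varies between replicates as a Beta:

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binom_pmf(k, n, alpha, beta):
    """P(K = k) for a beta-binomial: a binomial whose success probability
    is itself Beta(alpha, beta), giving extra between-replicate variance."""
    return math.exp(
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    )

# Illustrative: methylated-read count at one CpG site covered by n reads.
n, alpha, beta = 30, 2.0, 5.0
pmf = [beta_binom_pmf(k, n, alpha, beta) for k in range(n + 1)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
p = alpha / (alpha + beta)
binom_var = n * p * (1 - p)   # variance if there were no overdispersion
```

The variance exceeds the plain binomial variance at the same mean, which is the extra between-replicate variation the regression model has to capture.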

  9. Risk indicators of oral health status among young adults aged 18 years analyzed by negative binomial regression.

    Science.gov (United States)

    Lu, Hai-Xia; Wong, May Chun Mei; Lo, Edward Chin Man; McGrath, Colman

    2013-08-19

    Limited information is available on the oral health status of young adults aged 18 years, and no data exist for Hong Kong. The aims of this study were to investigate the oral health status and its risk indicators among young adults in Hong Kong using negative binomial regression. A survey was conducted in a representative sample of Hong Kong young adults aged 18 years. Clinical examinations were conducted to assess oral health status using the DMFT index and the Community Periodontal Index (CPI) according to WHO criteria. Negative binomial regressions for the DMFT score and the number of sextants with healthy gums were performed to identify the risk indicators of oral health status. A total of 324 young adults were examined. The prevalence of dental caries experience among the subjects was 59% and the overall mean DMFT score was 1.4. Most subjects (95%) had a score of 2 as their highest CPI score. Negative binomial regression analyses revealed that subjects who had a dental visit within 3 years had significantly higher DMFT scores (IRR = 1.68, p < 0.001). Subjects who brushed their teeth more frequently (IRR = 1.93, p < 0.001) and those with better dental knowledge (IRR = 1.09, p = 0.002) had significantly more sextants with healthy gums. The dental caries experience of 18-year-old young adults in Hong Kong was not high, but their periodontal condition was unsatisfactory. Their oral health status was related to their dental visit behavior, oral hygiene habits, and oral health knowledge.

  10. Direct modeling of regression effects for transition probabilities in the progressive illness-death model

    DEFF Research Database (Denmark)

    Azarang, Leyla; Scheike, Thomas; de Uña-Álvarez, Jacobo

    2017-01-01

    In this work, we present direct regression analysis for the transition probabilities in the possibly non-Markov progressive illness–death model. The method is based on binomial regression, where the response is the indicator of the occupancy for the given state along time. Randomly weighted score...

  11. Analysis of dental caries using generalized linear and count regression models

    Directory of Open Access Journals (Sweden)

    Javali M. Phil

    2013-11-01

    Full Text Available Generalized linear models (GLM) are generalizations of linear regression models, which allow fitting regression models to response data in all the sciences, especially the medical and dental sciences, that follow a general exponential family. They are a flexible and widely used class of models that can accommodate various types of response variables. Count data are frequently characterized by overdispersion and excess zeros. Zero-inflated count models provide a parsimonious yet powerful way to model this type of situation. Such models assume that the data are a mixture of two separate data generation processes: one generates only zeros, and the other is either a Poisson or a negative binomial data-generating process. Zero-inflated count regression models such as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) regression models have been used to handle dental caries count data with many zeros. We present an evaluation framework for assessing the suitability of applying the GLM, Poisson, NB, ZIP and ZINB to a dental caries data set where the count data may exhibit evidence of many zeros and over-dispersion. Estimation of the model parameters using the method of maximum likelihood is provided. Based on the Vuong test statistic and the goodness-of-fit measure for the dental caries data, the NB and ZINB regression models perform better than the other count regression models.
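
    A sketch of the negative binomial probability mass function underlying the NB and ZINB models above, in Python; mu and theta are illustrative values, not estimates fitted to the caries data. The ZIP and ZINB variants add a point mass at zero on top of a pmf like this:

```python
import math

def nb_pmf(k, mu, theta):
    """Negative binomial parameterized by mean mu and dispersion theta;
    the variance is mu + mu**2 / theta, exceeding the Poisson variance mu."""
    return math.exp(
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )

mu, theta = 3.0, 1.5   # illustrative values
pmf = [nb_pmf(k, mu, theta) for k in range(500)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
```

Summing the pmf numerically recovers the mean mu and the overdispersed variance mu + mu**2/theta, which is the feature that lets the NB family absorb the extra variation plain Poisson regression cannot.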

  12. Speech-discrimination scores modeled as a binomial variable.

    Science.gov (United States)

    Thornton, A R; Raffin, M J

    1978-09-01

    Many studies have reported variability data for tests of speech discrimination, and the disparate results of these studies have not been given a simple explanation. Arguments over the relative merits of 25- vs 50-word tests have ignored the basic mathematical properties inherent in the use of percentage scores. The present study models performance on clinical tests of speech discrimination as a binomial variable. A binomial model was developed, and some of its characteristics were tested against data from 4120 scores obtained on the CID Auditory Test W-22. A table for determining significant deviations between scores was generated and compared to observed differences in half-list scores for the W-22 tests. Good agreement was found between predicted and observed values. Implications of the binomial characteristics of speech-discrimination scores are discussed.
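
    The binomial model's key practical consequence, that the precision of a percent-correct score depends on list length, can be sketched in Python; the 92% score is illustrative, not a figure from the study:

```python
import math

def score_se(p, n):
    """Standard error of a percent-correct score modeled as binomial:
    sqrt(p * (1 - p) / n), where n is the number of test words."""
    return math.sqrt(p * (1 - p) / n)

# The same 92% score is measured far less precisely on a 25-word list
# than on a 50-word list.
se_25 = score_se(0.92, 25)
se_50 = score_se(0.92, 50)
```

Tables of significant score differences like the one in the study follow from critical values of this binomial sampling distribution rather than from a normal approximation alone.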

  13. The analysis of incontinence episodes and other count data in patients with overactive bladder by Poisson and negative binomial regression.

    Science.gov (United States)

    Martina, R; Kay, R; van Maanen, R; Ridder, A

    2015-01-01

    Clinical studies in overactive bladder have traditionally used analysis of covariance or nonparametric methods to analyse the number of incontinence episodes and other count data. It is known that if the underlying distributional assumptions of a particular parametric method do not hold, an alternative parametric method may be more efficient than a nonparametric one, which makes no assumptions regarding the underlying distribution of the data. Therefore, there are advantages in using methods based on the Poisson distribution or extensions of that method, which incorporate specific features that provide a modelling framework for count data. One challenge with count data is overdispersion, but methods are available that can account for this through the introduction of random effect terms in the modelling, and it is this modelling framework that leads to the negative binomial distribution. These models can also provide clinicians with a clearer and more appropriate interpretation of treatment effects in terms of rate ratios. In this paper, the previously used parametric and non-parametric approaches are contrasted with those based on Poisson regression and various extensions in trials evaluating solifenacin and mirabegron in patients with overactive bladder. In these applications, negative binomial models are seen to fit the data well. Copyright © 2014 John Wiley & Sons, Ltd.

  14. A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.

    Science.gov (United States)

    Ferrari, Alberto; Comelli, Mario

    2016-12-01

    In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. These clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and the sample size is small. A number of more advanced methods are available, but they are often technically challenging and a comparative assessment of their performance in behavioral setups has not been performed. We studied the performance of several methods applicable to the analysis of proportions; namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers, and we describe results from the application of these methods to data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude by providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. A flexible mixed-effect negative binomial regression model for detecting unusual increases in MRI lesion counts in individual multiple sclerosis patients.

    Science.gov (United States)

    Kondo, Yumi; Zhao, Yinshan; Petkau, John

    2015-06-15

    We develop a new modeling approach to enhance a recently proposed method to detect increases of contrast-enhancing lesions (CELs) on repeated magnetic resonance imaging, which have been used as an indicator for potential adverse events in multiple sclerosis clinical trials. The method signals patients with unusual increases in CEL activity by estimating the probability of observing CEL counts as large as those observed on a patient's recent scans conditional on the patient's CEL counts on previous scans. This conditional probability index (CPI), computed based on a mixed-effect negative binomial regression model, can vary substantially depending on the choice of distribution for the patient-specific random effects. Therefore, we relax this parametric assumption to model the random effects with an infinite mixture of beta distributions, using the Dirichlet process, which effectively allows any form of distribution. To our knowledge, no previous literature considers a mixed-effect regression for longitudinal count variables where the random effect is modeled with a Dirichlet process mixture. As our inference is in the Bayesian framework, we adopt a meta-analytic approach to develop an informative prior based on previous clinical trials. This is particularly helpful at the early stages of trials when less data are available. Our enhanced method is illustrated with CEL data from 10 previous multiple sclerosis clinical trials. Our simulation study shows that our procedure estimates the CPI more accurately than parametric alternatives when the patient-specific random effect distribution is misspecified and that an informative prior improves the accuracy of the CPI estimates. Copyright © 2015 John Wiley & Sons, Ltd.

  16. On a Fractional Binomial Process

    Science.gov (United States)

    Cahoy, Dexter O.; Polito, Federico

    2012-02-01

    The classical binomial process has been studied by Jakeman (J. Phys. A 23:2815-2825, 1990) (and the references therein) and has been used to characterize a series of radiation states in quantum optics. In particular, he studied a classical birth-death process where the chance of birth is proportional to the difference between a larger fixed number and the number of individuals present. It is shown that at large times, an equilibrium is reached which follows a binomial process. In this paper, the classical binomial process is generalized using the techniques of fractional calculus and is called the fractional binomial process. The fractional binomial process is shown to preserve the binomial limit at large times while expanding the class of models that include non-binomial fluctuations (non-Markovian) at regular and small times. As a direct consequence, the generality of the fractional binomial model makes the proposed model more desirable than its classical counterpart in describing real physical processes. More statistical properties are also derived.

  17. Meta-analysis of studies with bivariate binary outcomes: a marginal beta-binomial model approach.

    Science.gov (United States)

    Chen, Yong; Hong, Chuan; Ning, Yang; Su, Xiao

    2016-01-15

    When conducting a meta-analysis of studies with bivariate binary outcomes, challenges arise when the within-study correlation and between-study heterogeneity should be taken into account. In this paper, we propose a marginal beta-binomial model for the meta-analysis of studies with binary outcomes. This model is based on the composite likelihood approach and has several attractive features compared with the existing models such as bivariate generalized linear mixed model (Chu and Cole, 2006) and Sarmanov beta-binomial model (Chen et al., 2012). The advantages of the proposed marginal model include modeling the probabilities in the original scale, not requiring any transformation of probabilities or any link function, having closed-form expression of likelihood function, and no constraints on the correlation parameter. More importantly, because the marginal beta-binomial model is only based on the marginal distributions, it does not suffer from potential misspecification of the joint distribution of bivariate study-specific probabilities. Such misspecification is difficult to detect and can lead to biased inference using current methods. We compare the performance of the marginal beta-binomial model with the bivariate generalized linear mixed model and the Sarmanov beta-binomial model by simulation studies. Interestingly, the results show that the marginal beta-binomial model performs better than the Sarmanov beta-binomial model, whether or not the true model is Sarmanov beta-binomial, and the marginal beta-binomial model is more robust than the bivariate generalized linear mixed model under model misspecifications. Two meta-analyses of diagnostic accuracy studies and a meta-analysis of case-control studies are conducted for illustration. Copyright © 2015 John Wiley & Sons, Ltd.

  18. INVESTIGATION OF E-MAIL TRAFFIC BY USING ZERO-INFLATED REGRESSION MODELS

    Directory of Open Access Journals (Sweden)

    Yılmaz KAYA

    2012-06-01

    Full Text Available In count data, the number of observed zeros may be greater than anticipated. Such data sets should be analyzed with regression methods that take zero inflation into account. Zero-Inflated Poisson (ZIP), Zero-Inflated negative binomial (ZINB), Poisson Hurdle (PH), and negative binomial Hurdle (NBH) are common approaches for modeling dependent variables possessing more zero values than expected. In the present study, the e-mail traffic of Yüzüncü Yıl University in the 2009 spring semester was investigated. The ZIP, ZINB, PH and NBH regression methods were applied to the data set because more zero counts (78.9%) were found in the data set than expected. The ZINB and NBH regressions, which account for both excess zeros and overdispersion, gave more accurate results because both features were present in the e-mail counts. ZINB was determined to be the best model according to the Vuong statistic and information criteria.

  19. Patterns of medicinal plant use: an examination of the Ecuadorian Shuar medicinal flora using contingency table and binomial analyses.

    Science.gov (United States)

    Bennett, Bradley C; Husby, Chad E

    2008-03-28

    Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly-employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet the requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R(2) from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
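
    A minimal sketch of the binomial test for over-representation of a single family, in Python; the family size and flora-wide proportion below are hypothetical, not the Shuar figures:

```python
import math

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided probability of
    seeing at least k medicinal species in a family of n species when
    each species is medicinal independently with probability p."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k, n + 1))

# Illustrative numbers: a family with 40 species, 15 of them medicinal,
# in a flora where 20% of all species are medicinal.
p_value = binom_tail(15, 40, 0.20)
```

A small tail probability indicates the family contributes more medicinal species than the flora-wide rate predicts; repeating the test per family (with a multiple-testing correction) mirrors the per-family analysis described above.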

  20. Binomial test models and item difficulty

    NARCIS (Netherlands)

    van der Linden, Willem J.

    1979-01-01

    In choosing a binomial test model, it is important to know exactly what conditions are imposed on item difficulty. In this paper these conditions are examined for both a deterministic and a stochastic conception of item responses. It appears that they are more restrictive than is generally

  1. An efficient binomial model-based measure for sequence comparison and its application.

    Science.gov (United States)

    Liu, Xiaoqing; Dai, Qi; Li, Lihua; He, Zerong

    2011-04-01

    Sequence comparison is one of the major tasks in bioinformatics, as it can serve as evidence of structural and functional conservation, as well as of evolutionary relations. There are several similarity/dissimilarity measures for sequence comparison, but challenges remain. This paper presented a binomial model-based measure to analyze biological sequences. With the help of a random indicator, the occurrence of a word at any position of a sequence can be regarded as a Bernoulli random variable, and the distribution of the sum of word occurrences is well known to be binomial. Using a recursive formula, we computed the binomial probability of the word count and proposed a binomial model-based measure based on the relative entropy. The proposed measure was tested by extensive experiments, including classification of HEV genotypes and phylogenetic analysis, and further compared with alignment-based and alignment-free measures. The results demonstrate that the proposed binomial model-based measure is more efficient.
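
    A simplified, alignment-free sketch in the spirit of this measure, in Python: compare the k-word frequency distributions of two sequences with a symmetrized relative entropy. This omits the paper's recursive binomial probability computation and uses a small smoothing constant for words absent from one sequence:

```python
import math
from collections import Counter

def kmer_freqs(seq, k):
    """Relative frequencies of overlapping k-words in a sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sym_relative_entropy(p, q, eps=1e-9):
    """Symmetrized Kullback-Leibler divergence between two k-word
    distributions; zero when the distributions coincide."""
    words = set(p) | set(q)
    kl_pq = sum(p.get(w, eps) * math.log(p.get(w, eps) / q.get(w, eps))
                for w in words)
    kl_qp = sum(q.get(w, eps) * math.log(q.get(w, eps) / p.get(w, eps))
                for w in words)
    return kl_pq + kl_qp

s1 = "ATGCGATACGCTTAGGCTA"   # toy sequences, not real genotype data
s2 = "ATGCGATACGCTTAGGCTT"
s3 = "TTTTTTTTTTTTTTTTTTT"
d_close = sym_relative_entropy(kmer_freqs(s1, 2), kmer_freqs(s2, 2))
d_far = sym_relative_entropy(kmer_freqs(s1, 2), kmer_freqs(s3, 2))
```

Similar sequences yield a much smaller dissimilarity than unrelated ones, which is the property a phylogenetic distance measure needs.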

  2. A mixed-binomial model for Likert-type personality measures.

    Science.gov (United States)

    Allik, Jüri

    2014-01-01

    Personality measurement is based on the idea that values on an unobservable latent variable determine the distribution of answers on a manifest response scale. Typically, it is assumed in Item Response Theory (IRT) that latent variables are related to the observed responses through continuous normal or logistic functions, determining the probability with which one of the ordered response alternatives on a Likert-scale item is chosen. Based on an analysis of 1731 self- and other-rated responses on the 240 NEO PI-3 questionnaire items, it was proposed that a viable alternative is a finite number of latent events which are related to manifest responses through a binomial function with only one parameter: the probability with which a given statement is approved. For the majority of items, the best fit was obtained with a mixed-binomial distribution, which assumes two different subpopulations who endorse items with two different probabilities. It was shown that the fit of the binomial IRT model can be improved by assuming that about 10% of random noise is contained in the answers and by taking into account response biases toward one of the response categories. It was concluded that the binomial response model for the measurement of personality traits may be a workable alternative to the more habitual normal and logistic IRT models.
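
    The two-subpopulation idea can be sketched in Python; all parameter values are illustrative. A mixed-binomial item response places extra mass at both ends of the response scale:

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def mixed_binom_pmf(k, n, w, p1, p2):
    """Two-component binomial mixture: a fraction w of respondents
    endorse the item with probability p1, the rest with p2."""
    return w * binom_pmf(k, n, p1) + (1 - w) * binom_pmf(k, n, p2)

# Illustrative: a 0-4 response scale (n = 4 latent events) answered by
# two subpopulations with low and high endorsement probabilities.
n, w, p1, p2 = 4, 0.4, 0.2, 0.8
pmf = [mixed_binom_pmf(k, n, w, p1, p2) for k in range(n + 1)]
```

With well-separated component probabilities the mixture is bimodal, matching response distributions that pile up at the two ends of a Likert scale.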

  3. Correlated binomial models and correlation structures

    International Nuclear Information System (INIS)

    Hisakado, Masato; Kitsukawa, Kenji; Mori, Shintaro

    2006-01-01

    We discuss a general method to construct correlated binomial distributions by imposing several consistent relations on the joint probability function. We obtain self-consistency relations for the conditional correlations and conditional probabilities. The beta-binomial distribution is derived by a strong symmetric assumption on the conditional correlations. Our derivation clarifies the 'correlation' structure of the beta-binomial distribution. It is also possible to study the correlation structures of other probability distributions of exchangeable (homogeneous) correlated Bernoulli random variables. We study some distribution functions and discuss their behaviours in terms of their correlation structures

  4. Microbial comparative pan-genomics using binomial mixture models

    Directory of Open Access Journals (Sweden)

    Ussery David W

    2009-08-01

    Full Text Available Abstract Background The size of the core- and pan-genome of bacterial species is a topic of increasing interest due to the growing number of sequenced prokaryote genomes, many from the same species. Attempts to estimate these quantities have been made, using regression methods or mixture models. We extend the latter approach by using statistical ideas developed for capture-recapture problems in ecology and epidemiology. Results We estimate core- and pan-genome sizes for 16 different bacterial species. The results reveal a complex dependency structure for most species, manifested as heterogeneous detection probabilities. Estimated pan-genome sizes range from small (around 2600 gene families) in Buchnera aphidicola to large (around 43000 gene families) in Escherichia coli. Results for Escherichia coli show that as more data become available, a larger diversity is estimated, indicating an extensive pool of rarely occurring genes in the population. Conclusion Analyzing pan-genomics data with binomial mixture models is a way to handle dependencies between genomes, which we find are always present. A bottleneck in the estimation procedure is the annotation of rarely occurring genes.

  5. Quantum Theory for the Binomial Model in Finance Theory

    OpenAIRE

    Chen, Zeqian

    2001-01-01

    In this paper, a quantum model for the binomial market in finance is proposed. We show that its risk-neutral world exhibits an intriguing structure as a disk in the unit ball of ${\\bf R}^3,$ whose radius is a function of the risk-free interest rate with two thresholds which prevent arbitrage opportunities from this quantum market. Furthermore, from the quantum mechanical point of view we re-deduce the Cox-Ross-Rubinstein binomial option pricing formula by considering Maxwell-Boltzmann statist...

  6. Pooling overdispersed binomial data to estimate event rate.

    Science.gov (United States)

    Young-Xu, Yinong; Chan, K Arnold

    2008-08-19

    The beta-binomial model is one of the methods that can be used to validly combine event rates from overdispersed binomial data. Our objective is to provide a full description of this method and to update and broaden its applications in clinical and public health research. We describe the statistical theories behind the beta-binomial model and the associated estimation methods. We supply information about statistical software that can provide beta-binomial estimations. Using a published example, we illustrate the application of the beta-binomial model when pooling overdispersed binomial data. In an example regarding the safety of oral antifungal treatments, we had 41 treatment arms with event rates varying from 0% to 13.89%. Using the beta-binomial model, we obtained a summary event rate of 3.44% with a standard error of 0.59%. The parameters of the beta-binomial model took the values of 1.24 for alpha and 34.73 for beta. The beta-binomial model can provide a robust estimate for the summary event rate by pooling overdispersed binomial data from different studies. The explanation of the method and the demonstration of its applications should help researchers incorporate the beta-binomial method as they aggregate probabilities of events from heterogeneous studies.
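
    The reported summary event rate can be recovered directly from the fitted beta parameters, since the mean of a Beta(alpha, beta) distribution is alpha/(alpha + beta); the standard error of 0.59% comes from the model fitting itself and is not reproduced here:

```python
# Beta parameters reported in the abstract.
alpha, beta = 1.24, 34.73
summary_rate = alpha / (alpha + beta)  # mean of a Beta(alpha, beta)
# Close to the reported summary event rate of 3.44% (difference is
# rounding in the reported parameters).
```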

  7. Pooling overdispersed binomial data to estimate event rate

    Directory of Open Access Journals (Sweden)

    Chan K Arnold

    2008-08-01

    Full Text Available Abstract Background The beta-binomial model is one of the methods that can be used to validly combine event rates from overdispersed binomial data. Our objective is to provide a full description of this method and to update and broaden its applications in clinical and public health research. Methods We describe the statistical theories behind the beta-binomial model and the associated estimation methods. We supply information about statistical software that can provide beta-binomial estimations. Using a published example, we illustrate the application of the beta-binomial model when pooling overdispersed binomial data. Results In an example regarding the safety of oral antifungal treatments, we had 41 treatment arms with event rates varying from 0% to 13.89%. Using the beta-binomial model, we obtained a summary event rate of 3.44% with a standard error of 0.59%. The parameters of the beta-binomial model took the values of 1.24 for alpha and 34.73 for beta. Conclusion The beta-binomial model can provide a robust estimate for the summary event rate by pooling overdispersed binomial data from different studies. The explanation of the method and the demonstration of its applications should help researchers incorporate the beta-binomial method as they aggregate probabilities of events from heterogeneous studies.

  8. A Mechanistic Beta-Binomial Probability Model for mRNA Sequencing Data.

    Science.gov (United States)

    Smith, Gregory R; Birtwistle, Marc R

    2016-01-01

    A main application for mRNA sequencing (mRNAseq) is determining lists of differentially-expressed genes (DEGs) between two or more conditions. Several software packages exist to produce DEGs from mRNAseq data, but they typically yield different DEGs, sometimes markedly so. The underlying probability model used to describe mRNAseq data is central to deriving DEGs, and not surprisingly most software packages use different models and assumptions to analyze mRNAseq data. Here, we propose a mechanistic justification to model mRNAseq as a binomial process, with data from technical replicates given by a binomial distribution, and data from biological replicates well-described by a beta-binomial distribution. We demonstrate good agreement of this model with two large datasets. We show that an emergent feature of the beta-binomial distribution, given parameter regimes typical for mRNAseq experiments, is the well-known quadratic polynomial scaling of variance with the mean. The so-called dispersion parameter controls this scaling, and our analysis suggests that the dispersion parameter is a continually decreasing function of the mean, as opposed to current approaches that impose an asymptotic value on the dispersion parameter at moderate mean read counts. We show how this leads to current approaches overestimating variance for moderately to highly expressed genes, which inflates false negative rates. Describing mRNAseq data with a beta-binomial distribution thus may be preferred, since its parameters are relatable to the mechanistic underpinnings of the technique and may improve the consistency of DEG analysis across software packages, particularly for moderately to highly expressed genes.
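
    The quadratic variance-mean scaling noted above follows from the closed-form beta-binomial moments; a sketch in Python, where alpha and beta are illustrative dispersion settings, not fitted values:

```python
def beta_binom_moments(n, alpha, beta):
    """Mean and variance of a beta-binomial count with n trials."""
    p = alpha / (alpha + beta)
    rho = 1 / (alpha + beta + 1)  # intra-class correlation
    mean = n * p
    var = n * p * (1 - p) * (1 + (n - 1) * rho)
    return mean, var

# With alpha and beta fixed, var/mean**2 settles toward a constant as the
# number of reads n grows: the variance scales quadratically in the mean.
alpha, beta = 2.0, 50.0
ratios = []
for n in (100, 1000, 10000):
    mean, var = beta_binom_moments(n, alpha, beta)
    ratios.append(var / mean**2)
```

The ratio var/mean**2 decreases toward the constant (1 - p) * rho / p, so for large means the variance is dominated by the quadratic term rather than the binomial (Poisson-like) term.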

  9. An empirical tool to evaluate the safety of cyclists: Community based, macro-level collision prediction models using negative binomial regression.

    Science.gov (United States)

    Wei, Feng; Lovegrove, Gordon

    2013-12-01

Today, North American governments are more willing to consider compact neighborhoods with increased use of sustainable transportation modes. Bicycling, one of the most effective modes for short trips (distances of less than 5 km), is being encouraged. However, as vulnerable road users (VRUs), cyclists are more likely to be injured when involved in collisions. In order to create a safe road environment for them, it is necessary to evaluate cyclists' road safety at a macro level in a proactive way. In this paper, different generalized linear regression methods for collision prediction model (CPM) development are reviewed and previous studies on micro-level and macro-level bicycle-related CPMs are summarized. On the basis of insights gained in the exploration stage, this paper also reports on efforts to develop negative binomial models for bicycle-auto collisions at a community-based, macro level. Data came from the Central Okanagan Regional District (CORD) of British Columbia, Canada. The model results revealed two types of statistical associations between collisions and each explanatory variable: (1) an increase in bicycle-auto collisions is associated with an increase in total lane kilometers (TLKM), bicycle lane kilometers (BLKM), bus stops (BS), traffic signals (SIG), intersection density (INTD), and arterial-local intersection percentage (IALP); (2) a decrease in bicycle collisions was found to be associated with an increase in the number of drive commuters (DRIVE) and in the percentage of drive commuters (DRP). These results support our hypothesis that in North America, with its current low levels of bicycle use (…) macro-level CPMs. Copyright © 2012. Published by Elsevier Ltd.

  10. On extinction time of a generalized endemic chain-binomial model.

    Science.gov (United States)

    Aydogmus, Ozgur

    2016-09-01

We consider a chain-binomial epidemic model that does not confer immunity after infection. The mean-field dynamics of the model are analyzed and conditions for the existence of a stable endemic equilibrium are determined. The behavior of the chain-binomial process is probabilistically linked to the mean-field equation. As a result of this link, we are able to show that the mean extinction time of the epidemic increases at least exponentially as the population size grows. We also present simulation results for the process to validate our analytical findings. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. Discrimination of numerical proportions: A comparison of binomial and Gaussian models.

    Science.gov (United States)

    Raidvee, Aire; Lember, Jüri; Allik, Jüri

    2017-01-01

Observers discriminated the numerical proportion of two sets of elements (N = 9, 13, 33, and 65) that differed either by color or orientation. According to the standard Thurstonian approach, the accuracy of proportion discrimination is determined by irreducible noise in the nervous system that stochastically transforms the number of presented visual elements onto a continuum of psychological states representing numerosity. As an alternative to this customary approach, we propose a Thurstonian-binomial model, which assumes discrete perceptual states, each of which is associated with a certain visual element. It is shown that the probability β with which each visual element can be noticed and registered by the perceptual system can explain data on numerical proportion discrimination at least as well as the continuous Thurstonian-Gaussian model, and better, if the greater parsimony of the Thurstonian-binomial model is taken into account using AIC model selection. We conclude that the Gaussian and binomial models represent two different fundamental principles, internal noise versus the use of only a fraction of the available information, both of which are plausible descriptions of visual perception.
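A minimal simulation of the Thurstonian-binomial idea: each element is independently registered with probability β, and the observer reports the type with the larger registered count, guessing on ties. The display size, majority split, and β value below are assumed for illustration only, not fitted to the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed):
N = 13                  # total elements in the display
n_A = 8                 # elements of the majority type
beta = 0.7              # registration probability per element
trials = 100_000

# Each element is registered independently with probability beta; the
# observer compares the two registered counts and guesses on ties.
k_A = rng.binomial(n_A, beta, trials)
k_B = rng.binomial(N - n_A, beta, trials)
correct = np.where(k_A > k_B, 1.0, np.where(k_A == k_B, 0.5, 0.0))
print(correct.mean())   # proportion correct predicted by the binomial model
```

Sweeping β traces out the model's predicted psychometric function: smaller β (less information used) lowers discrimination accuracy even with zero internal noise.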

  12. Statistical inference involving binomial and negative binomial parameters.

    Science.gov (United States)

    García-Pérez, Miguel A; Núñez-Antón, Vicente

    2009-05-01

    Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.

  13. A non-linear beta-binomial regression model for mapping EORTC QLQ-C30 to the EQ-5D-3L in lung cancer patients: a comparison with existing approaches.

    Science.gov (United States)

    Khan, Iftekhar; Morris, Stephen

    2014-11-12

The performance of the beta-binomial (BB) model is compared with several existing models for mapping the EORTC QLQ-C30 (QLQ-C30) onto the EQ-5D-3L using data from lung cancer trials. Data from two separate non-small-cell lung cancer clinical trials (TOPICAL and SOCCAR) are used to develop and validate the BB model. Comparisons with Linear, TOBIT, Quantile, Quadratic and CLAD models are carried out. The mean prediction error, R², proportion predicted outside the valid range, clinical interpretation of coefficients, model fit and estimation of Quality Adjusted Life Years (QALYs) are reported and compared. Monte Carlo simulation is also used. The beta-binomial regression model performed 'best' among all models. For the TOPICAL and SOCCAR trials, respectively, the residual mean square error (RMSE) was 0.09 and 0.11; R² was 0.75 and 0.71; observed vs. predicted means were 0.612 vs. 0.608 and 0.750 vs. 0.749. Mean differences in QALYs (observed vs. predicted) were 0.051 vs. 0.053 and 0.164 vs. 0.162 for TOPICAL and SOCCAR, respectively. When the models were tested on independent data, simulated 95% confidence intervals from the BB model contained the observed mean more often (77% and 59% for TOPICAL and SOCCAR, respectively) than those from the other models. All algorithms over-predict at poorer health states, but the BB model was relatively better, particularly for the SOCCAR data. The BB model may offer superior predictive properties amongst the mapping algorithms considered and may be more useful when predicting EQ-5D-3L at poorer health states. We recommend the algorithm derived from the TOPICAL data due to its better predictive properties and lower uncertainty.

  14. Modeling and Predistortion of Envelope Tracking Power Amplifiers using a Memory Binomial Model

    DEFF Research Database (Denmark)

    Tafuri, Felice Francesco; Sira, Daniel; Larsen, Torben

    2013-01-01

The model definition is based on binomial series, hence the name memory binomial model (MBM). The MBM is here applied to measured data sets acquired from an ET measurement set-up. When used as a PA model, the MBM showed an NMSE (Normalized Mean Squared Error) as low as −40 dB and an ACEPR (Adjacent Channel Error Power Ratio) below −51 dB. The simulated predistortion results showed that the MBM can improve the compensation of distortion in the adjacent channel by 5.8 dB and 5.7 dB compared to a memory polynomial predistorter (MPPD). The predistortion performance in the time domain showed an NMSE…

  15. Selecting Tools to Model Integer and Binomial Multiplication

    Science.gov (United States)

    Pratt, Sarah Smitherman; Eddy, Colleen M.

    2017-01-01

    Mathematics teachers frequently provide concrete manipulatives to students during instruction; however, the rationale for using certain manipulatives in conjunction with concepts may not be explored. This article focuses on area models that are currently used in classrooms to provide concrete examples of integer and binomial multiplication. The…

  16. A comparison of different ways of including baseline counts in negative binomial models for data from falls prevention trials.

    Science.gov (United States)

    Zheng, Han; Kimber, Alan; Goodwin, Victoria A; Pickering, Ruth M

    2018-01-01

    A common design for a falls prevention trial is to assess falling at baseline, randomize participants into an intervention or control group, and ask them to record the number of falls they experience during a follow-up period of time. This paper addresses how best to include the baseline count in the analysis of the follow-up count of falls in negative binomial (NB) regression. We examine the performance of various approaches in simulated datasets where both counts are generated from a mixed Poisson distribution with shared random subject effect. Including the baseline count after log-transformation as a regressor in NB regression (NB-logged) or as an offset (NB-offset) resulted in greater power than including the untransformed baseline count (NB-unlogged). Cook and Wei's conditional negative binomial (CNB) model replicates the underlying process generating the data. In our motivating dataset, a statistically significant intervention effect resulted from the NB-logged, NB-offset, and CNB models, but not from NB-unlogged, and large, outlying baseline counts were overly influential in NB-unlogged but not in NB-logged. We conclude that there is little to lose by including the log-transformed baseline count in standard NB regression compared to CNB for moderate to larger sized datasets. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
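The data-generating process used in the paper's simulations, two counts drawn from a mixed Poisson distribution with a shared random subject effect, can be sketched directly; the gamma frailty and the rate values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of the simulation set-up described above (all values assumed):
n = 20_000
u = rng.gamma(shape=2.0, scale=0.5, size=n)   # shared subject frailty, mean 1
base_rate, fup_rate = 2.0, 1.5                # falls per period
y_base = rng.poisson(base_rate * u)           # baseline count
y_fup = rng.poisson(fup_rate * u)             # follow-up count

# The shared frailty induces overdispersion (variance > mean) in each count
# and a positive baseline/follow-up correlation -- the rationale for
# adjusting for the baseline count in a negative binomial regression.
print(y_fup.mean(), y_fup.var())
print(np.corrcoef(y_base, y_fup)[0, 1])
```

Marginally each count is negative binomial (a gamma mixture of Poissons), which is why NB regression with the baseline included as a log-transformed regressor or offset is the natural analysis here.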

  17. A Bayesian Approach to Functional Mixed Effect Modeling for Longitudinal Data with Binomial Outcomes

    Science.gov (United States)

    Kliethermes, Stephanie; Oleson, Jacob

    2014-01-01

    Longitudinal growth patterns are routinely seen in medical studies where individual and population growth is followed over a period of time. Many current methods for modeling growth presuppose a parametric relationship between the outcome and time (e.g., linear, quadratic); however, these relationships may not accurately capture growth over time. Functional mixed effects (FME) models provide flexibility in handling longitudinal data with nonparametric temporal trends. Although FME methods are well-developed for continuous, normally distributed outcome measures, nonparametric methods for handling categorical outcomes are limited. We consider the situation with binomially distributed longitudinal outcomes. Although percent correct data can be modeled assuming normality, estimates outside the parameter space are possible and thus estimated curves can be unrealistic. We propose a binomial FME model using Bayesian methodology to account for growth curves with binomial (percentage) outcomes. The usefulness of our methods is demonstrated using a longitudinal study of speech perception outcomes from cochlear implant users where we successfully model both the population and individual growth trajectories. Simulation studies also advocate the usefulness of the binomial model particularly when outcomes occur near the boundary of the probability parameter space and in situations with a small number of trials. PMID:24723495

  18. Beta-binomial model for meta-analysis of odds ratios.

    Science.gov (United States)

    Bakbergenuly, Ilyas; Kulinskaya, Elena

    2017-05-20

In meta-analysis of odds ratios (ORs), heterogeneity between the studies is usually modelled via the additive random effects model (REM). An alternative, multiplicative REM for ORs uses overdispersion. The multiplicative factor in this overdispersion model (ODM) can be interpreted as an intra-class correlation (ICC) parameter. This model arises naturally when the probabilities of an event in one or both arms of a comparative study are themselves beta-distributed, resulting in beta-binomial distributions. We propose two new estimators of the ICC for meta-analysis in this setting. One is based on the inverted Breslow-Day test, and the other on the improved gamma approximation by Kulinskaya and Dollinger (2015, p. 26) to the distribution of Cochran's Q. The performance of these and several other estimators of the ICC, in terms of bias and coverage, is studied by simulation. Additionally, the Mantel-Haenszel approach to estimation of ORs is extended to the beta-binomial model, and we study the performance of various ICC estimators when used in the Mantel-Haenszel or the inverse-variance method to combine ORs in meta-analysis. The results of the simulations show that the improved gamma-based estimator of the ICC is superior for small sample sizes, and the Breslow-Day-based estimator is the best for n ⩾ 100. The Mantel-Haenszel-based estimator of the OR is very biased and is not recommended. The inverse-variance approach is also somewhat biased for ORs ≠ 1, but this bias is not very large in practical settings. The developed methods and R programs, provided in the Web Appendix, make the beta-binomial model a feasible alternative to the standard REM for meta-analysis of ORs. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
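The ICC interpretation of the beta-binomial set-up can be verified numerically: if the event probability in an arm is Beta(a, b), the within-study intra-class correlation is rho = 1/(a + b + 1), and the beta-binomial variance equals n·p·(1−p)·(1 + (n−1)·rho). The values of n, a and b below are arbitrary illustrations, not parameters from the paper.

```python
from scipy.stats import betabinom

# Arbitrary illustrative values (assumed):
n, a, b = 30, 2.0, 6.0
p = a / (a + b)                 # marginal event probability
rho = 1.0 / (a + b + 1.0)       # intra-class correlation (ICC)

# Beta-binomial variance vs. the ICC-inflated binomial variance:
var_bb = betabinom(n, a, b).var()
print(var_bb, n * p * (1 - p) * (1 + (n - 1) * rho))
```

The two expressions agree exactly, which is the sense in which the multiplicative overdispersion factor of the ODM "is" an ICC parameter.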

  19. A Bayesian approach to functional mixed-effects modeling for longitudinal data with binomial outcomes.

    Science.gov (United States)

    Kliethermes, Stephanie; Oleson, Jacob

    2014-08-15

    Longitudinal growth patterns are routinely seen in medical studies where individual growth and population growth are followed up over a period of time. Many current methods for modeling growth presuppose a parametric relationship between the outcome and time (e.g., linear and quadratic); however, these relationships may not accurately capture growth over time. Functional mixed-effects (FME) models provide flexibility in handling longitudinal data with nonparametric temporal trends. Although FME methods are well developed for continuous, normally distributed outcome measures, nonparametric methods for handling categorical outcomes are limited. We consider the situation with binomially distributed longitudinal outcomes. Although percent correct data can be modeled assuming normality, estimates outside the parameter space are possible, and thus, estimated curves can be unrealistic. We propose a binomial FME model using Bayesian methodology to account for growth curves with binomial (percentage) outcomes. The usefulness of our methods is demonstrated using a longitudinal study of speech perception outcomes from cochlear implant users where we successfully model both the population and individual growth trajectories. Simulation studies also advocate the usefulness of the binomial model particularly when outcomes occur near the boundary of the probability parameter space and in situations with a small number of trials. Copyright © 2014 John Wiley & Sons, Ltd.

  20. Jumps in binomial AR(1) processes

    OpenAIRE

    Weiß , Christian H.

    2009-01-01

We consider the binomial AR(1) model for serially dependent processes of binomial counts. After a review of its definition and known properties, we investigate marginal and serial properties of jumps in such processes. Based on these results, we propose a jumps control chart for monitoring a binomial AR(1) process. We show how to evaluate the performance of this control chart and give design recommendations.

  1. Performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data.

    Science.gov (United States)

    Yelland, Lisa N; Salter, Amy B; Ryan, Philip

    2011-10-15

Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown both analytically and by simulation that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted using example datasets from two large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.
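For independent data, the core of the modified Poisson approach, a log-link Poisson fit with a robust sandwich variance, can be sketched in a few lines of numpy. The GEE extension for clustered data is not shown here, and the simulated baseline risk, relative risk, and sample size are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated binary outcomes with a true relative risk of 1.5 (assumed):
n = 50_000
x = rng.integers(0, 2, n)            # exposure indicator
p = 0.2 * 1.5 ** x                   # risk: 0.2 unexposed, 0.3 exposed
y = rng.binomial(1, p)

# Poisson regression with log link, fitted by IRLS:
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)
    z = X @ beta + (y - mu) / mu     # working response
    beta = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (mu * z))

# Robust (sandwich) variance corrects for the mis-specified Poisson variance
# when the outcome is actually binary:
mu = np.exp(X @ beta)
A = X.T @ (mu[:, None] * X)
B = X.T @ (((y - mu) ** 2)[:, None] * X)
V = np.linalg.inv(A) @ B @ np.linalg.inv(A)
print(np.exp(beta[1]), np.sqrt(V[1, 1]))   # estimated RR and robust SE of log RR
```

exp(beta[1]) recovers the relative risk directly, which is the practical appeal of this approach over odds-ratio-producing logistic regression.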

  2. Correlation Structures of Correlated Binomial Models and Implied Default Distribution

    OpenAIRE

    S. Mori; K. Kitsukawa; M. Hisakado

    2006-01-01

    We show how to analyze and interpret the correlation structures, the conditional expectation values and correlation coefficients of exchangeable Bernoulli random variables. We study implied default distributions for the iTraxx-CJ tranches and some popular probabilistic models, including the Gaussian copula model, Beta binomial distribution model and long-range Ising model. We interpret the differences in their profiles in terms of the correlation structures. The implied default distribution h...

  3. Poissonian and binomial models in radionuclide metrology by liquid scintillation counting

    International Nuclear Information System (INIS)

    Grau Malonda, A.

    1990-01-01

Binomial and Poissonian models developed for calculating the counting efficiency from a free parameter are analysed in this paper. These models have been applied to liquid scintillation counting systems with two or three photomultipliers. It is mathematically demonstrated that both models are equivalent and that the counting efficiencies calculated from either model are identical. (Author)

  4. Penalized estimation for competing risks regression with applications to high-dimensional covariates

    DEFF Research Database (Denmark)

    Ambrogi, Federico; Scheike, Thomas H.

    2016-01-01

… Research 19: (1), 29-51), the research regarding competing risks is less developed (Binder and others, 2009. Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics 25: (7), 890-896). The aim of this work is to consider how to do penalized regression in the presence of competing events. The direct binomial regression model of Scheike and others (2008. Predicting cumulative incidence probability by direct binomial regression. Biometrika 95: (1), 205-220) is reformulated in a penalized framework to possibly fit a sparse regression model. The developed approach is easily implementable using existing high-performance software to do penalized regression. Results from simulation studies are presented together with an application to genomic data when the endpoint is progression-free survival. An R function is provided to perform regularized competing risks regression according…

  5. Bipartite binomial heaps

    DEFF Research Database (Denmark)

    Elmasry, Amr; Jensen, Claus; Katajainen, Jyrki

    2017-01-01

… the (total) number of elements stored in the data structure(s) prior to the operation. As the resulting data structure consists of two components that are different variants of binomial heaps, we call it a bipartite binomial heap. Compared to its counterpart, a multipartite binomial heap, the new structure…

  6. Detecting non-binomial sex allocation when developmental mortality operates.

    Science.gov (United States)

    Wilkinson, Richard D; Kapranas, Apostolos; Hardy, Ian C W

    2016-11-07

    Optimal sex allocation theory is one of the most intricately developed areas of evolutionary ecology. Under a range of conditions, particularly under population sub-division, selection favours sex being allocated to offspring non-randomly, generating non-binomial variances of offspring group sex ratios. Detecting non-binomial sex allocation is complicated by stochastic developmental mortality, as offspring sex can often only be identified on maturity with the sex of non-maturing offspring remaining unknown. We show that current approaches for detecting non-binomiality have limited ability to detect non-binomial sex allocation when developmental mortality has occurred. We present a new procedure using an explicit model of sex allocation and mortality and develop a Bayesian model selection approach (available as an R package). We use the double and multiplicative binomial distributions to model over- and under-dispersed sex allocation and show how to calculate Bayes factors for comparing these alternative models to the null hypothesis of binomial sex allocation. The ability to detect non-binomial sex allocation is greatly increased, particularly in cases where mortality is common. The use of Bayesian methods allows for the quantification of the evidence in favour of each hypothesis, and our modelling approach provides an improved descriptive capability over existing approaches. We use a simulation study to demonstrate substantial improvements in power for detecting non-binomial sex allocation in situations where current methods fail, and we illustrate the approach in real scenarios using empirically obtained datasets on the sexual composition of groups of gregarious parasitoid wasps. Copyright © 2016 Elsevier Ltd. All rights reserved.
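The masking effect of developmental mortality can be seen in a small simulation: broods with a fixed (zero-variance, strongly under-dispersed) primary sex allocation acquire sex-ratio variance once offspring die at random, pulling the surviving counts toward the binomial null. The brood size, allocation, and survival probability below are assumed for illustration, not taken from the paper's datasets.

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed scenario: every mother allocates exactly 5 males in a brood of 10
# (primary sex-ratio variance is zero); each offspring then survives
# independently with probability 0.7.
n_broods, males, females, surv = 200_000, 5, 5, 0.7
m_surv = rng.binomial(males, surv, n_broods)     # surviving males
f_surv = rng.binomial(females, surv, n_broods)   # surviving females
t = m_surv + f_surv                              # brood size at maturity

# Among broods with exactly 7 survivors, compare the variance of the male
# count with the binomial variance 7 * 0.5 * 0.5 = 1.75. Mortality has
# created variance where the primary allocation had none, partially
# masking the under-dispersion.
m7 = m_surv[t == 7]
print(m7.var(), 7 * 0.25)
```

Conditional on 7 survivors the male count is hypergeometric, so its variance sits strictly between zero (the primary allocation) and the binomial value, which is exactly why tests ignoring mortality lose power.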

  7. Negative binomial models for abundance estimation of multiple closed populations

    Science.gov (United States)

    Boyce, Mark S.; MacKenzie, Darry I.; Manly, Bryan F.J.; Haroldson, Mark A.; Moody, David W.

    2001-01-01

Counts of uniquely identified individuals in a population offer opportunities to estimate abundance. However, for various reasons such counts may be burdened by heterogeneity in the probability of being detected. Theoretical arguments and empirical evidence demonstrate that the negative binomial distribution (NBD) is a useful characterization for counts from biological populations with heterogeneity. We propose a method that focuses on estimating multiple populations by simultaneously using a suite of models derived from the NBD. We used this approach to estimate the number of female grizzly bears (Ursus arctos) with cubs-of-the-year in the Yellowstone ecosystem for each year, 1986-1998. Akaike's Information Criterion (AIC) indicated that a negative binomial model with a constant level of heterogeneity across all years was best for characterizing the sighting frequencies of female grizzly bears. A lack-of-fit test indicated the model adequately described the collected data. Bootstrap techniques were used to estimate standard errors and 95% confidence intervals. We provide a Monte Carlo technique, which confirms that the Yellowstone ecosystem grizzly bear population increased during the period 1986-1998.
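Maximum-likelihood fitting of a negative binomial to heterogeneous count data, the core step of the approach above, can be sketched with scipy; the data here are simulated with assumed parameters, not the grizzly bear sighting frequencies.

```python
import numpy as np
from scipy.stats import nbinom
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Simulated heterogeneous counts (true parameters assumed for illustration):
true_r, true_p = 3.0, 0.4
counts = nbinom.rvs(true_r, true_p, size=20_000, random_state=rng)

def nll(params):
    """Negative log-likelihood of the NBD for the observed counts."""
    r, p = params
    return -nbinom.logpmf(counts, r, p).sum()

res = minimize(nll, x0=[1.0, 0.5],
               bounds=[(1e-6, None), (1e-6, 1 - 1e-6)])
r_hat, p_hat = res.x
print(r_hat, p_hat)
```

In the paper's setting, AIC comparisons between such fits (e.g., shared vs. year-specific heterogeneity parameters) drive the model selection across years.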

  8. Joint Analysis of Binomial and Continuous Traits with a Recursive Model

    DEFF Research Database (Denmark)

    Varona, Louis; Sorensen, Daniel

    2014-01-01

This work presents a model for the joint analysis of a binomial and a Gaussian trait using a recursive parametrization that leads to a computationally efficient implementation. The model is illustrated in an analysis of mortality and litter size in two breeds of Danish pigs, Landrace and Yorkshire.

  9. A binomial random sum of present value models in investment analysis

    OpenAIRE

    Βουδούρη, Αγγελική; Ντζιαχρήστος, Ευάγγελος

    1997-01-01

    Stochastic present value models have been widely adopted in financial theory and practice and play a very important role in capital budgeting and profit planning. The purpose of this paper is to introduce a binomial random sum of stochastic present value models and offer an application in investment analysis.

  10. Distinguishing between Binomial, Hypergeometric and Negative Binomial Distributions

    Science.gov (United States)

    Wroughton, Jacqueline; Cole, Tarah

    2013-01-01

    Recognizing the differences between three discrete distributions (Binomial, Hypergeometric and Negative Binomial) can be challenging for students. We present an activity designed to help students differentiate among these distributions. In addition, we present assessment results in the form of pre- and post-tests that were designed to assess the…

  11. Modelling infant mortality rate in Central Java, Indonesia use generalized poisson regression method

    Science.gov (United States)

    Prahutama, Alan; Sudarno

    2018-05-01

The infant mortality rate is the number of deaths under one year of age occurring among the live births in a given geographical area during a given year, per 1,000 live births occurring among the population of that area during the same year. This problem needs to be addressed because it is an important element of a country's economic development: a high infant mortality rate will disrupt the stability of a country, as it relates to the sustainability of the country's population. One regression model that can be used to analyze the relationship between a dependent variable Y in the form of discrete data and independent variables X is the Poisson regression model. Regression models used for discrete dependent variables include, among others, Poisson regression, negative binomial regression, and generalized Poisson regression. In this research, generalized Poisson regression modeling gives a better AIC value than Poisson regression. The most significant variable is the number of health facilities (X1), while the variable with the greatest influence on the infant mortality rate is average breastfeeding (X9).
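The generalized Poisson distribution underlying this model adds a parameter λ that controls overdispersion. Below is a sketch of its pmf in Consul's parametrization, with θ and λ values assumed for illustration, together with a numerical check of its mean θ/(1−λ) and variance θ/(1−λ)³.

```python
import numpy as np
from math import lgamma, exp, log

def gpois_pmf(y, theta, lam):
    """Generalized Poisson pmf (Consul): theta > 0, 0 <= lam < 1."""
    return exp(log(theta) + (y - 1) * log(theta + lam * y)
               - theta - lam * y - lgamma(y + 1))

theta, lam = 2.0, 0.3          # illustrative values (assumed)
ys = np.arange(0, 200)         # support truncated where mass is negligible
pmf = np.array([gpois_pmf(y, theta, lam) for y in ys])

mean = (ys * pmf).sum()
var = ((ys - mean) ** 2 * pmf).sum()
# mean = theta / (1 - lam); var = theta / (1 - lam)**3, so var/mean > 1
print(pmf.sum(), mean, var)
```

With λ = 0 the distribution reduces to the ordinary Poisson; λ > 0 inflates the variance relative to the mean, which is why generalized Poisson regression can out-perform Poisson regression (e.g., on AIC) for overdispersed counts like these.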

  12. Analysis of railroad tank car releases using a generalized binomial model.

    Science.gov (United States)

    Liu, Xiang; Hong, Yili

    2015-11-01

    The United States is experiencing an unprecedented boom in shale oil production, leading to a dramatic growth in petroleum crude oil traffic by rail. In 2014, U.S. railroads carried over 500,000 tank carloads of petroleum crude oil, up from 9500 in 2008 (a 5300% increase). In light of continual growth in crude oil by rail, there is an urgent national need to manage this emerging risk. This need has been underscored in the wake of several recent crude oil release incidents. In contrast to highway transport, which usually involves a tank trailer, a crude oil train can carry a large number of tank cars, having the potential for a large, multiple-tank-car release incident. Previous studies exclusively assumed that railroad tank car releases in the same train accident are mutually independent, thereby estimating the number of tank cars releasing given the total number of tank cars derailed based on a binomial model. This paper specifically accounts for dependent tank car releases within a train accident. We estimate the number of tank cars releasing given the number of tank cars derailed based on a generalized binomial model. The generalized binomial model provides a significantly better description for the empirical tank car accident data through our numerical case study. This research aims to provide a new methodology and new insights regarding the further development of risk management strategies for improving railroad crude oil transportation safety. Copyright © 2015 Elsevier Ltd. All rights reserved.
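The authors' generalized binomial model is not reproduced here; as a hedged stand-in, the sketch below uses a beta-binomial (one simple model of positively dependent Bernoulli outcomes) to show why dependence matters: with the same marginal release probability, dependence shifts probability mass toward incidents in which all derailed cars release. All parameter values are assumed for illustration.

```python
from scipy.stats import binom, betabinom

# Assumed illustrative values: k derailed tank cars, each releasing with
# marginal probability p.
k, p = 10, 0.2
phi = 5.0                       # precision of the mixing Beta (assumed)
a, b = p * phi, (1 - p) * phi   # Beta(a, b) has mean p

# Independent-release model vs. a dependent (beta-binomial) alternative:
# both have the same expected number of releases, k * p.
print(binom(k, p).pmf(k), betabinom(k, a, b).pmf(k))
```

The probability that all 10 derailed cars release is orders of magnitude larger under positive dependence, mirroring the paper's point that assuming independent releases understates the risk of large multiple-tank-car incidents.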

  13. Boosted beta regression.

    Directory of Open Access Journals (Sweden)

    Matthias Schmid

Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fitting a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established, yet unstable, approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
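Classical (non-boosted) beta regression with a logit link and constant precision φ, the baseline the authors improve on, can be fitted by maximum likelihood in a short scipy sketch; the data, coefficients, and precision below are simulated assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

rng = np.random.default_rng(4)

# Simulated percentage responses in (0,1) (all parameters assumed):
n = 5_000
x = rng.normal(size=n)
mu = expit(0.5 + 1.0 * x)                 # logit-linear mean
phi = 20.0                                # constant precision
y = rng.beta(mu * phi, (1 - mu) * phi)

def nll(params):
    """Negative log-likelihood of the mean-precision beta regression."""
    b0, b1, log_phi = params
    m = expit(b0 + b1 * x)
    ph = np.exp(log_phi)
    a, b = m * ph, (1 - m) * ph
    return -(gammaln(ph) - gammaln(a) - gammaln(b)
             + (a - 1) * np.log(y) + (b - 1) * np.log1p(-y)).sum()

res = minimize(nll, x0=[0.0, 0.0, 0.0], method="BFGS")
print(res.x)   # approximately [0.5, 1.0, log(20)]
```

Boosted beta regression replaces this single joint optimization with componentwise gradient boosting, which is what allows simultaneous variable selection and nonlinear effects on both the mean and the precision.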

  14. Data analysis using the Binomial Failure Rate common cause model

    International Nuclear Information System (INIS)

    Atwood, C.L.

    1983-09-01

    This report explains how to use the Binomial Failure Rate (BFR) method to estimate common cause failure rates. The entire method is described, beginning with the conceptual model, and covering practical issues of data preparation, treatment of variation in the failure rates, Bayesian estimation of the quantities of interest, checking the model assumptions for lack of fit to the data, and the ultimate application of the answers

  15. Sample Size Estimation for Negative Binomial Regression Comparing Rates of Recurrent Events with Unequal Follow-Up Time.

    Science.gov (United States)

    Tang, Yongqiang

    2015-01-01

    A sample size formula is derived for negative binomial regression for the analysis of recurrent events, in which subjects can have unequal follow-up time. We obtain sharp lower and upper bounds on the required size, which is easy to compute. The upper bound is generally only slightly larger than the required size, and hence can be used to approximate the sample size. The lower and upper size bounds can be decomposed into two terms. The first term relies on the mean number of events in each group, and the second term depends on two factors that measure, respectively, the extent of between-subject variability in event rates, and follow-up time. Simulation studies are conducted to assess the performance of the proposed method. An application of our formulae to a multiple sclerosis trial is provided.

  16. Stochastic background of negative binomial distribution

    International Nuclear Information System (INIS)

    Suzuki, N.; Biyajima, M.; Wilk, G.

    1991-01-01

A branching equation of the birth process with immigration is taken as a model for the particle production process. Using it, we investigate cases in which its solution becomes the negative binomial distribution. Furthermore, we compare our approach with the modified negative binomial distribution proposed recently by Chliapnikov and Tchikilev and use it to analyse the observed multiplicity distributions. (orig.)

  17. Modeling the frequency of opposing left-turn conflicts at signalized intersections using generalized linear regression models.

    Science.gov (United States)

    Zhang, Xin; Liu, Pan; Chen, Yuguang; Bai, Lu; Wang, Wei

    2014-01-01

The primary objective of this study was to identify whether the frequency of traffic conflicts at signalized intersections can be modeled. The opposing left-turn conflicts were selected for the development of conflict predictive models. Using data collected at 30 approaches at 20 signalized intersections, the underlying distributions of the conflicts under different traffic conditions were examined. Different conflict-predictive models were developed to relate the frequency of opposing left-turn conflicts to various explanatory variables. The models considered include a linear regression model, a negative binomial model, and separate models developed for four traffic scenarios. The prediction performance of the different models was compared. The frequency of traffic conflicts follows a negative binomial distribution. The linear regression model is not appropriate for the conflict frequency data. In addition, drivers behaved differently under different traffic conditions. Accordingly, the effects of conflicting traffic volumes on conflict frequency vary across different traffic conditions. The occurrence of traffic conflicts at signalized intersections can be modeled using generalized linear regression models. The use of conflict predictive models has the potential to expand the uses of surrogate safety measures in safety estimation and evaluation.

  18. Logistic regression models for polymorphic and antagonistic pleiotropic gene action on human aging and longevity

    DEFF Research Database (Denmark)

    Tan, Qihua; Bathum, L; Christiansen, L

    2003-01-01

    In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...... situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing is minimized. A binomial logistic regression model with fractional polynomials is used to capture...... the age-dependent or antagonistic pleiotropic effects. The models are applied to HFE genotype data to assess the effects on human longevity by different alleles and to detect if an age-dependent effect exists. Application has shown that these methods can serve as useful tools in searching for important...

  19. Correlation Structures of Correlated Binomial Models and Implied Default Distribution

    Science.gov (United States)

    Mori, Shintaro; Kitsukawa, Kenji; Hisakado, Masato

    2008-11-01

    We show how to analyze and interpret the correlation structures, the conditional expectation values and correlation coefficients of exchangeable Bernoulli random variables. We study implied default distributions for the iTraxx-CJ tranches and some popular probabilistic models, including the Gaussian copula model, Beta binomial distribution model and long-range Ising model. We interpret the differences in their profiles in terms of the correlation structures. The implied default distribution has singular correlation structures, reflecting the credit market implications. We point out two possible origins of the singular behavior.
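A small, self-contained sketch of the kind of correlation structure discussed above: for exchangeable Bernoulli variables generated by a de Finetti mixture over a Beta(a, b) success probability (the beta-binomial construction mentioned in the abstract), the pairwise correlation has the closed form 1/(a + b + 1). The function name and parameter choices below are illustrative, not from the paper.

```python
from fractions import Fraction

def beta_bernoulli_corr(a, b):
    """Pairwise correlation of exchangeable Bernoulli variables whose
    common success probability p is drawn from Beta(a, b)."""
    a, b = Fraction(a), Fraction(b)
    m = a / (a + b)                              # E[X_i] = E[p]
    m2 = a * (a + 1) / ((a + b) * (a + b + 1))   # E[X_i X_j] = E[p^2]
    return (m2 - m * m) / (m * (1 - m))          # Cov / Var for Bernoulli margins

# The mixture induces correlation 1/(a + b + 1) for every pair:
for a, b in [(1, 1), (2, 5), (3, 7)]:
    assert beta_bernoulli_corr(a, b) == Fraction(1, a + b + 1)
```

Exact rational arithmetic via `fractions` makes the identity check an equality rather than a floating-point tolerance.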

  20. Bayesian analysis of overdispersed chromosome aberration data with the negative binomial model

    International Nuclear Information System (INIS)

    Brame, R.S.; Groer, P.G.

    2002-01-01

    The usual assumption of a Poisson model for the number of chromosome aberrations in controlled calibration experiments implies variance equal to the mean. However, it is known that chromosome aberration data from experiments involving high linear energy transfer radiations can be overdispersed, i.e. the variance is greater than the mean. Present methods for dealing with overdispersed chromosome data rely on frequentist statistical techniques. In this paper, the problem of overdispersion is considered from a Bayesian standpoint. The Bayes factor is used to compare Poisson and negative binomial models for two previously published calibration data sets describing the induction of dicentric chromosome aberrations by high doses of neutrons. Posterior densities for the model parameters, which characterise dose response and overdispersion, are calculated and graphed. Calibrative densities are derived for unknown neutron doses from hypothetical radiation accident data to determine the impact of different model assumptions on dose estimates. The main conclusion is that an initial assumption of a negative binomial model is the conservative approach to chromosome dosimetry for high LET radiations. (author)
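The paper computes exact Bayes factors; a crude but common stand-in is the Schwarz (BIC) approximation to the log Bayes factor. The sketch below, with simulated overdispersed counts rather than the paper's calibration data, shows how BIC-based comparison favours the negative binomial once variance exceeds the mean. All names and parameter values are illustrative assumptions.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(42)
# Simulate overdispersed "aberration" counts (mean 8, variance 40).
counts = stats.nbinom.rvs(n=2.0, p=0.2, size=200, random_state=rng)

# Poisson MLE is the sample mean.
lam = counts.mean()
ll_pois = stats.poisson.logpmf(counts, lam).sum()

# Negative binomial MLE via optimization over (log r, logit p).
def nb_negll(theta):
    r = np.exp(theta[0])
    p = 1.0 / (1.0 + np.exp(-theta[1]))
    return -stats.nbinom.logpmf(counts, r, p).sum()

res = optimize.minimize(nb_negll, x0=[0.0, 0.0], method="Nelder-Mead")
ll_nb = -res.fun

# BIC = -2 * loglik + (number of parameters) * log(sample size);
# the BIC difference approximates -2 * log Bayes factor.
n = counts.size
bic_pois = -2 * ll_pois + 1 * np.log(n)
bic_nb = -2 * ll_nb + 2 * np.log(n)
print(bic_pois, bic_nb)  # overdispersion should favour the NB model
```

With strongly overdispersed data the BIC gap is large, echoing the paper's conclusion that the negative binomial is the safer default for high-LET data.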

  1. Time series regression model for infectious disease and weather.

    Science.gov (United States)

    Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro

    2015-10-01

    Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  2. Understanding poisson regression.

    Science.gov (United States)

    Hayat, Matthew J; Higgins, Melinda

    2014-04-01

    Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes. Copyright 2014, SLACK Incorporated.
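A minimal sketch of the workflow the abstract describes: fit a log-linear Poisson regression (here by hand, with iteratively reweighted least squares rather than any particular package) and then check the Pearson dispersion statistic, whose values well above 1 signal the overdispersion that motivates a quasi-Poisson or negative binomial alternative. The data are simulated; all names are illustrative.

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Fit log-linear Poisson regression by iteratively reweighted
    least squares (Fisher scoring)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        W = mu                               # Poisson working weights
        z = X @ beta + (y - mu) / mu         # working response
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=500)
y = rng.poisson(np.exp(0.5 + 0.8 * x))       # true intercept 0.5, slope 0.8
X = np.column_stack([np.ones_like(x), x])
beta = poisson_irls(X, y)

# Pearson dispersion statistic: values well above 1 suggest overdispersion.
mu = np.exp(X @ beta)
dispersion = np.sum((y - mu) ** 2 / mu) / (len(y) - X.shape[1])
print(beta, dispersion)
```

Because the simulated counts really are Poisson, the dispersion estimate here sits near 1; with overdispersed data it would exceed 1 and point toward the alternatives the article discusses.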

  3. The importance of distribution-choice in modeling substance use data: a comparison of negative binomial, beta binomial, and zero-inflated distributions.

    Science.gov (United States)

    Wagner, Brandie; Riggs, Paula; Mikulich-Gilbertson, Susan

    2015-01-01

    It is important to correctly understand the associations among addiction to multiple drugs and between co-occurring substance use and psychiatric disorders. Substance-specific outcomes (e.g. number of days used cannabis) have distributional characteristics which range widely depending on the substance and the sample being evaluated. We recommend a four-part strategy for determining the appropriate distribution for modeling substance use data. We demonstrate this strategy by comparing the model fit and resulting inferences from applying four different distributions to model use of substances that range greatly in the prevalence and frequency of their use. Using Timeline Followback (TLFB) data from a previously-published study, we used negative binomial, beta-binomial and their zero-inflated counterparts to model proportion of days during treatment of cannabis, cigarettes, alcohol, and opioid use. The fit for each distribution was evaluated with statistical model selection criteria, visual plots and a comparison of the resulting inferences. We demonstrate the feasibility and utility of modeling each substance individually and show that no single distribution provides the best fit for all substances. Inferences regarding use of each substance and associations with important clinical variables were not consistent across models and differed by substance. Thus, the distribution chosen for modeling substance use must be carefully selected and evaluated because it may impact the resulting conclusions. Furthermore, the common procedure of aggregating use across different substances may not be ideal.
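A sketch of the distribution-comparison step described above: bounded "days used" counts are fitted with both a plain binomial and a beta-binomial, and the fits are compared by AIC. The window length, sample, and beta parameters are hypothetical, not the study's TLFB data.

```python
import numpy as np
from scipy import stats, optimize

N_DAYS = 56  # hypothetical length of the assessment window
rng = np.random.default_rng(7)
# Overdispersed "days used" counts: person-level rates drawn from a beta.
p_i = rng.beta(0.8, 1.6, size=150)
y = rng.binomial(N_DAYS, p_i)

# Binomial fit: MLE of p is the pooled proportion (1 parameter).
p_hat = y.mean() / N_DAYS
ll_binom = stats.binom.logpmf(y, N_DAYS, p_hat).sum()
aic_binom = -2 * ll_binom + 2 * 1

# Beta-binomial fit via optimization over (log a, log b) (2 parameters).
def bb_negll(theta):
    a, b = np.exp(theta)
    return -stats.betabinom.logpmf(y, N_DAYS, a, b).sum()

res = optimize.minimize(bb_negll, x0=[0.0, 0.0], method="Nelder-Mead")
aic_betabinom = -2 * (-res.fun) + 2 * 2
print(aic_binom, aic_betabinom)  # beta-binomial should fit far better here
```

Swapping in negative binomial or zero-inflated candidates and repeating the AIC comparison substance-by-substance mirrors the strategy the abstract recommends.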

  4. Binomial Rings: Axiomatisation, Transfer and Classification

    OpenAIRE

    Xantcha, Qimh Richey

    2011-01-01

    Hall's binomial rings, rings with binomial coefficients, are given an axiomatisation and proved identical to the numerical rings studied by Ekedahl. The Binomial Transfer Principle is established, enabling combinatorial proofs of algebraic identities. The finitely generated binomial rings are completely classified. An application to modules over binomial rings is given.

  5. Partitioning detectability components in populations subject to within-season temporary emigration using binomial mixture models.

    Science.gov (United States)

    O'Donnell, Katherine M; Thompson, Frank R; Semlitsch, Raymond D

    2015-01-01

    Detectability of individual animals is highly variable and nearly always less than one; we developed binomial mixture models to account for multiple sources of variation in detectability. The state process of the hierarchical model describes ecological mechanisms that generate spatial and temporal patterns in abundance, while the observation model accounts for the imperfect nature of counting individuals due to temporary emigration and false absences. We illustrate our model's potential advantages, including the allowance of temporary emigration between sampling periods, with a case study of southern red-backed salamanders Plethodon serratus. We fit our model and a standard binomial mixture model to counts of terrestrial salamanders surveyed at 40 sites during 3-5 surveys each spring and fall 2010-2012. Our models generated similar parameter estimates to standard binomial mixture models. Aspect was the best predictor of salamander abundance in our case study; abundance increased as aspect became more northeasterly. Increased time-since-rainfall strongly decreased salamander surface activity (i.e. availability for sampling), while higher amounts of woody cover objects and rocks increased conditional detection probability (i.e. probability of capture, given an animal is exposed to sampling). By explicitly accounting for both components of detectability, we increased congruence between our statistical modeling and our ecological understanding of the system. We stress the importance of choosing survey locations and protocols that maximize species availability and conditional detection probability to increase population parameter estimate reliability.
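The core of a binomial mixture (N-mixture) model is the site likelihood: latent abundance N follows a Poisson, and each repeat count is binomial given N, so the likelihood sums over the possible N. A minimal single-site sketch, with hypothetical counts and parameters:

```python
import numpy as np
from scipy import stats

def nmixture_site_likelihood(y, lam, p, n_max=200):
    """Likelihood of repeated counts y at one site under a binomial
    N-mixture model: N ~ Poisson(lam), y_t | N ~ Binomial(N, p)."""
    Ns = np.arange(max(y), n_max + 1)  # abundance cannot be below the max count
    prior = stats.poisson.pmf(Ns, lam)
    obs = np.prod([stats.binom.pmf(yt, Ns, p) for yt in y], axis=0)
    return float(np.sum(prior * obs))

y = [3, 5, 4]  # three repeat counts at one hypothetical site
L = nmixture_site_likelihood(y, lam=8.0, p=0.5)
print(L)
```

The full-model likelihood is the product of such terms over sites; the authors' extension adds an availability component for temporary emigration on top of this conditional detection probability.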

  6. Analyzing hospitalization data: potential limitations of Poisson regression.

    Science.gov (United States)

    Weaver, Colin G; Ravani, Pietro; Oliver, Matthew J; Austin, Peter C; Quinn, Robert R

    2015-08-01

    Poisson regression is commonly used to analyze hospitalization data when outcomes are expressed as counts (e.g. number of days in hospital). However, data often violate the assumptions on which Poisson regression is based. More appropriate extensions of this model, while available, are rarely used. We compared hospitalization data between 206 patients treated with hemodialysis (HD) and 107 treated with peritoneal dialysis (PD) using Poisson regression and compared results from standard Poisson regression with those obtained using three other approaches for modeling count data: negative binomial (NB) regression, zero-inflated Poisson (ZIP) regression and zero-inflated negative binomial (ZINB) regression. We examined the appropriateness of each model and compared the results obtained with each approach. During a mean 1.9 years of follow-up, 183 of 313 patients (58%) were never hospitalized (indicating an excess of 'zeros'). The data also displayed overdispersion (variance greater than mean), violating another assumption of the Poisson model. Using four criteria, we determined that the NB and ZINB models performed best. According to these two models, patients treated with HD experienced similar hospitalization rates as those receiving PD {NB rate ratio (RR): 1.04 [bootstrapped 95% confidence interval (CI): 0.49-2.20]; ZINB summary RR: 1.21 (bootstrapped 95% CI 0.60-2.46)}. Poisson and ZIP models fit the data poorly and had much larger point estimates than the NB and ZINB models [Poisson RR: 1.93 (bootstrapped 95% CI 0.88-4.23); ZIP summary RR: 1.84 (bootstrapped 95% CI 0.88-3.84)]. We found substantially different results when modeling hospitalization data, depending on the approach used. Our results argue strongly for a sound model selection process and improved reporting around statistical methods used for modeling count data. © The Author 2015. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
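A back-of-the-envelope illustration of the "excess zeros" point above: given an observed mean count, a Poisson predicts exp(-mean) zeros, which can fall far short of the observed 58%; a zero-inflated Poisson absorbs the gap with a structural-zero probability. The mean count and at-risk rate below are hypothetical, not the study's estimates.

```python
import math

mean_count = 1.5          # hypothetical mean hospitalization count
obs_zero_frac = 0.58      # 183 of 313 patients never hospitalized

# A Poisson with this mean predicts far fewer zeros than observed.
poisson_zero = math.exp(-mean_count)

# A zero-inflated Poisson mixes a point mass at zero (probability pi)
# with a Poisson for the rest: P(0) = pi + (1 - pi) * exp(-lam).
def zip_zero_prob(pi, lam):
    return pi + (1 - pi) * math.exp(-lam)

lam = 3.0  # hypothetical rate in the non-structural-zero subpopulation
# Choose pi so the ZIP reproduces the observed zero fraction exactly.
pi = (obs_zero_frac - math.exp(-lam)) / (1 - math.exp(-lam))
print(poisson_zero, zip_zero_prob(pi, lam))
```

The same mixing logic with a negative binomial count part gives the ZINB model that, together with the NB, fitted these data best.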

  7. Negative binomial mixed models for analyzing microbiome count data.

    Science.gov (United States)

    Zhang, Xinyan; Mallick, Himel; Tang, Zaixiang; Zhang, Lei; Cui, Xiangqin; Benson, Andrew K; Yi, Nengjun

    2017-01-03

    Recent advances in next-generation sequencing (NGS) technology enable researchers to collect a large volume of metagenomic sequencing data. These data provide valuable resources for investigating interactions between the microbiome and host environmental/clinical factors. In addition to the well-known properties of microbiome count measurements, for example, varied total sequence reads across samples, over-dispersion and zero-inflation, microbiome studies usually collect samples with hierarchical structures, which introduce correlation among the samples and thus further complicate the analysis and interpretation of microbiome count data. In this article, we propose negative binomial mixed models (NBMMs) for detecting the association between the microbiome and host environmental/clinical factors for correlated microbiome count data. Although they do not deal with zero-inflation, the proposed mixed-effects models account for correlation among the samples by incorporating random effects into the commonly used fixed-effects negative binomial model, and can efficiently handle over-dispersion and varying total reads. We have developed a flexible and efficient IWLS (Iterative Weighted Least Squares) algorithm to fit the proposed NBMMs by taking advantage of the standard procedure for fitting the linear mixed models. We evaluate and demonstrate the proposed method via extensive simulation studies and the application to mouse gut microbiome data. The results show that the proposed method has desirable properties and outperforms the previously used methods in terms of both empirical power and Type I error. The method has been incorporated into the freely available R package BhGLM ( http://www.ssg.uab.edu/bhglm/ and http://github.com/abbyyan3/BhGLM ), providing a useful tool for analyzing microbiome data.

  8. Regular exercise and related factors in patients with Parkinson's disease: Applying zero-inflated negative binomial modeling of exercise count data.

    Science.gov (United States)

    Lee, JuHee; Park, Chang Gi; Choi, Moonki

    2016-05-01

    This study was conducted to identify risk factors that influence regular exercise among patients with Parkinson's disease in Korea. Parkinson's disease is prevalent in the elderly, and may lead to a sedentary lifestyle. Exercise can enhance physical and psychological health. However, patients with Parkinson's disease are less likely to exercise than are other populations due to physical disability. A secondary data analysis and cross-sectional descriptive study were conducted. A convenience sample of 106 patients with Parkinson's disease was recruited at an outpatient neurology clinic of a tertiary hospital in Korea. Demographic characteristics, disease-related characteristics (including disease duration and motor symptoms), self-efficacy for exercise, balance, and exercise level were investigated. Negative binomial regression and zero-inflated negative binomial regression for exercise count data were utilized to determine factors involved in exercise. The mean age of participants was 65.85 ± 8.77 years, and the mean duration of Parkinson's disease was 7.23 ± 6.02 years. Most participants indicated that they engaged in regular exercise (80.19%). Approximately half of participants exercised at least 5 days per week for 30 min, as recommended (51.9%). Motor symptoms were a significant predictor of exercise in the count model, and self-efficacy for exercise was a significant predictor of exercise in the zero model. Severity of motor symptoms was related to frequency of exercise. Self-efficacy contributed to the probability of exercise. Symptom management and improvement of self-efficacy for exercise are important to encourage regular exercise in patients with Parkinson's disease. Copyright © 2015 Elsevier Inc. All rights reserved.
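The two-part interpretation in the abstract (a zero model for whether a patient exercises at all, a count model for how often) comes directly from the ZINB probability mass function, sketched below with illustrative parameters rather than the study's estimates.

```python
import numpy as np
from scipy import stats

def zinb_pmf(k, pi, r, p):
    """Zero-inflated negative binomial pmf: with probability pi the count
    is a structural zero (the "zero model" part); otherwise it follows
    NB(r, p) (the "count model" part)."""
    base = stats.nbinom.pmf(k, r, p)
    return np.where(k == 0, pi + (1 - pi) * base, (1 - pi) * base)

k = np.arange(0, 500)
pmf = zinb_pmf(k, pi=0.2, r=3.0, p=0.4)
print(pmf[:4], pmf.sum())  # mass at zero is inflated; total mass ~ 1
```

In a regression setting, pi is typically modeled with a logit link on predictors such as self-efficacy, and the NB mean with a log link on predictors such as motor symptoms, matching the zero-model/count-model split reported above.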

  9. Binomial collisions and near collisions

    OpenAIRE

    Blokhuis, Aart; Brouwer, Andries; de Weger, Benne

    2017-01-01

    We describe efficient algorithms to search for cases in which binomial coefficients are equal or almost equal, give a conjecturally complete list of all cases where two binomial coefficients differ by 1, and give some identities for binomial coefficients that seem to be new.
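A brute-force miniature of the search described above: enumerate "interior" binomial coefficients C(n, k) with 2 <= k <= n/2 up to a bound and group equal values. This naive dictionary approach illustrates the problem; the paper's algorithms are far more efficient.

```python
from math import comb

# Group equal interior binomial coefficients C(n, k), 2 <= k <= n/2, n < 80.
values = {}
for n in range(4, 80):
    for k in range(2, n // 2 + 1):
        values.setdefault(comb(n, k), []).append((n, k))

collisions = {v: ps for v, ps in values.items() if len(ps) > 1}
# Classical small examples: 120 = C(10,3) = C(16,2), 210 = C(10,4) = C(21,2),
# and the triple 3003 = C(14,6) = C(15,5) = C(78,2).
print(sorted(collisions)[:5])
```

Restricting to k >= 2 discards the trivial collisions C(m, 1) = C(m, m-1) = m that every integer admits.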

  10. Computational results on the compound binomial risk model with nonhomogeneous claim occurrences

    NARCIS (Netherlands)

    Tuncel, A.; Tank, F.

    2013-01-01

    The aim of this paper is to give a recursive formula for the non-ruin (survival) probability when the claim occurrences are nonhomogeneous in the compound binomial risk model. We give recursive formulas for the non-ruin (survival) probability and for the distribution of the total number of claims under the nonhomogeneous claim occurrence assumption.
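For orientation, here is a minimal sketch of the recursion idea in the simpler homogeneous compound binomial model (the paper's contribution is the nonhomogeneous case): each period the insurer collects premium 1, a claim occurs with probability q with an integer size distribution, and the finite-horizon non-ruin probability satisfies a one-step recursion. The claim-size pmf below is hypothetical.

```python
from functools import lru_cache

q = 0.3                          # per-period claim probability
f = {1: 0.5, 2: 0.3, 3: 0.2}     # hypothetical integer claim-size pmf

@lru_cache(maxsize=None)
def non_ruin(u, horizon):
    """P(surplus stays >= 0 for `horizon` periods | initial surplus u),
    with premium 1 collected at the start of each period."""
    if u < 0:
        return 0.0
    if horizon == 0:
        return 1.0
    prob = (1 - q) * non_ruin(u + 1, horizon - 1)   # no claim this period
    for x, fx in f.items():                          # claim of size x
        prob += q * fx * non_ruin(u + 1 - x, horizon - 1)
    return prob

print(non_ruin(0, 20), non_ruin(5, 20))
```

The nonhomogeneous version replaces the constant q (and possibly f) with period-dependent values inside the same recursion.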

  11. Adjusting for overdispersion in piecewise exponential regression models to estimate excess mortality rate in population-based research.

    Science.gov (United States)

    Luque-Fernandez, Miguel Angel; Belot, Aurélien; Quaresma, Manuela; Maringe, Camille; Coleman, Michel P; Rachet, Bernard

    2016-10-01

    In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer using the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates x i are equal is strong and may fail to account for overdispersion given the variability of the rate parameter (the variance exceeds the mean). Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion, including a quasi-likelihood, robust standard errors estimation, negative binomial regression and flexible piecewise modelling. All piecewise exponential regression models showed the presence of statistically significant inherent overdispersion. Flexible piecewise regression modelling, with either a quasi-likelihood or robust standard errors, was the best approach, as it deals with both overdispersion due to model misspecification and true or inherent overdispersion.

  12. Using the beta-binomial distribution to characterize forest health

    Energy Technology Data Exchange (ETDEWEB)

    Zarnoch, S. J. [USDA Forest Service, Southern Research Station, Athens, GA (United States); Anderson, R.L.; Sheffield, R. M. [USDA Forest Service, Southern Research Station, Asheville, NC (United States)

    1995-03-01

    Forest health monitoring programs often use base variables which are dichotomous (i.e. alive/dead, damaged/undamaged) to describe the health of trees. Typical sampling designs usually consist of randomly or systematically chosen clusters of trees for observation. It was claimed that contagiousness of diseases, for example, may result in non-uniformity of affected trees, so that the distribution of the proportions, rather than simply the mean proportion, becomes important. The use of the beta-binomial model was suggested for such cases, and its application in forest health analyses was described. Data on dogwood anthracnose (caused by Discula destructiva), a disease of flowering dogwood (Cornus florida L.), were used to illustrate the utility of the model. The beta-binomial model allowed the detection of different distributional patterns of dogwood anthracnose over time and space. Results led to further speculation regarding the cause of the patterns. Traditional proportion analyses like ANOVA would not have detected the trends found using the beta-binomial model until more distinct patterns had evolved at a later date. The model was said to be flexible and to require no special weighting or transformations of data. Another advantage claimed was its ability to handle unequal sample sizes.
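The reason the beta-binomial captures clustered damage where a plain binomial cannot: for the same mean proportion of affected trees, mixing the success probability over a beta distribution inflates the variance. A small numeric sketch with illustrative cluster size and parameters:

```python
from scipy import stats

# Compare a binomial and a beta-binomial with the same mean proportion.
n, p = 20, 0.3          # cluster of 20 trees, 30% affected on average
a, b = 3.0, 7.0         # Beta(3, 7) also has mean 0.3

mean_bin, var_bin = stats.binom.stats(n, p, moments="mv")
mean_bb, var_bb = stats.betabinom.stats(n, a, b, moments="mv")
print(mean_bin, mean_bb)    # identical means...
print(var_bin, var_bb)      # ...but the beta-binomial variance is larger
```

This extra-binomial variance is exactly what contagious disease spread induces within clusters, and what an ANOVA on mean proportions alone would miss.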

  13. Zero-truncated negative binomial - Erlang distribution

    Science.gov (United States)

    Bodhisuwan, Winai; Pudprommarat, Chookait; Bodhisuwan, Rujira; Saothayanun, Luckhana

    2017-11-01

    The zero-truncated negative binomial-Erlang distribution is introduced. It is developed from the negative binomial-Erlang distribution. In this work, the probability mass function is derived and some properties are included. The parameters of the zero-truncated negative binomial-Erlang distribution are estimated by using maximum likelihood estimation. Finally, the proposed distribution is applied to real data on methamphetamine counts in Bangkok, Thailand. Based on the results, the zero-truncated negative binomial-Erlang distribution provided a better fit than the zero-truncated Poisson, zero-truncated negative binomial, zero-truncated generalized negative binomial and zero-truncated Poisson-Lindley distributions for these data.
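Zero truncation itself is a simple renormalization: remove the mass at zero and rescale the remaining probabilities. A sketch for the plain negative binomial case (the paper's distribution additionally mixes the NB over an Erlang, which is not reproduced here):

```python
import numpy as np
from scipy import stats

def zt_nbinom_pmf(k, r, p):
    """Zero-truncated negative binomial pmf: NB(r, p) renormalized on k >= 1."""
    k = np.asarray(k)
    p0 = stats.nbinom.pmf(0, r, p)
    return np.where(k >= 1, stats.nbinom.pmf(k, r, p) / (1.0 - p0), 0.0)

k = np.arange(0, 400)
pmf = zt_nbinom_pmf(k, r=2.0, p=0.3)
print(pmf[0], pmf.sum())  # no mass at zero; total mass ~ 1
```

Maximum likelihood for any zero-truncated family proceeds with this renormalized pmf in the log-likelihood.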

  14. Confidence Intervals for Weighted Composite Scores under the Compound Binomial Error Model

    Science.gov (United States)

    Kim, Kyung Yong; Lee, Won-Chan

    2018-01-01

    Reporting confidence intervals with test scores helps test users make important decisions about examinees by providing information about the precision of test scores. Although a variety of estimation procedures based on the binomial error model are available for computing intervals for test scores, these procedures assume that items are randomly…

  15. Comparison of the Binomial Method and the Black-Scholes Method in Option Pricing

    Directory of Open Access Journals (Sweden)

    Surya Amami Pramuditya

    2016-04-01

    An option is a contract between a holder (buyer) and a writer (seller) in which the writer gives the holder the right (not the obligation) to buy or sell an asset at a specified price (the strike or exercise price) and at a specified time in the future (the expiry date or maturity time). There are several ways to determine the price of an option, including the Black-Scholes method and the binomial method. The binomial method is based on a model of stock price movement that divides the time interval [0, T] into n subintervals of equal length, while the Black-Scholes method models the stock price movement as a stochastic process. As the number of time partitions n in the binomial method grows, the binomial option value converges to the Black-Scholes option value. Key words: options, binomial, Black-Scholes.
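The convergence claimed above can be checked directly: price a European call with a Cox-Ross-Rubinstein binomial tree for increasing n and compare against the Black-Scholes closed form. A minimal sketch with illustrative market parameters:

```python
import math

def black_scholes_call(S, K, T, r, sigma):
    """Closed-form Black-Scholes price of a European call."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * Phi(d1) - K * math.exp(-r * T) * Phi(d2)

def crr_binomial_call(S, K, T, r, sigma, n):
    """Cox-Ross-Rubinstein binomial price of the same European call."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    price = 0.0
    for j in range(n + 1):
        payoff = max(S * u**j * d**(n - j) - K, 0.0)
        price += math.comb(n, j) * q**j * (1 - q)**(n - j) * payoff
    return math.exp(-r * T) * price

bs = black_scholes_call(100, 100, 1.0, 0.05, 0.2)
for n in (10, 100, 1000):
    print(n, crr_binomial_call(100, 100, 1.0, 0.05, 0.2, n) - bs)
```

The pricing error shrinks roughly like 1/n, illustrating the convergence of the binomial value to the Black-Scholes value.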

  16. Modelling noise in second generation sequencing forensic genetics STR data using a one-inflated (zero-truncated) negative binomial model

    DEFF Research Database (Denmark)

    Vilsen, Søren B.; Tvedebrink, Torben; Mogensen, Helle Smidt

    2015-01-01

    We present a model fitting the distribution of non-systematic errors in STR second generation sequencing (SGS) analysis. The model fits the distribution of non-systematic errors, i.e. the noise, using a one-inflated, zero-truncated negative binomial model. The model is a two-component model...

  17. Distribution-free Inference of Zero-inflated Binomial Data for Longitudinal Studies.

    Science.gov (United States)

    He, H; Wang, W J; Hu, J; Gallop, R; Crits-Christoph, P; Xia, Y L

    2015-10-01

    Count responses with structural zeros are very common in medical and psychosocial research, especially in alcohol and HIV research, and the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models are widely used for modeling such outcomes. However, as alcohol drinking outcomes such as days of drinking are counts within a given period, their distributions are bounded above by an upper limit (total days in the period) and thus inherently follow a binomial or zero-inflated binomial (ZIB) distribution, rather than a Poisson or ZIP distribution, in the presence of structural zeros. In this paper, we develop a new semiparametric approach for modeling ZIB-like count responses for cross-sectional as well as longitudinal data. We illustrate this approach with both simulated and real study data.

  18. Sample size calculation for comparing two negative binomial rates.

    Science.gov (United States)

    Zhu, Haiyuan; Lakkis, Hassan

    2014-02-10

    The negative binomial model has been increasingly used to model count data in recent clinical trials. It is frequently chosen over the Poisson model in cases of overdispersed count data that are commonly seen in clinical trials. One of the challenges of applying the negative binomial model in clinical trial design is sample size estimation. In practice, simulation methods have been frequently used for sample size estimation. In this paper, an explicit formula is developed to calculate sample size based on the negative binomial model. Depending on different approaches to estimate the variance under the null hypothesis, three variations of the sample size formula are proposed and discussed. Important characteristics of the formula include its accuracy and its ability to explicitly incorporate the dispersion parameter and exposure time. The performance of the formula with each variation is assessed using simulations. Copyright © 2013 John Wiley & Sons, Ltd.
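To make the shape of such a formula concrete, here is one common Wald-type variant for comparing two negative binomial rates with equal allocation, where the per-subject variance of the log rate is taken as 1/(t*r) + k. This is a generic sketch under stated assumptions, not necessarily any of the paper's three variations.

```python
import math
from statistics import NormalDist

def nb_sample_size(r0, r1, t, k, alpha=0.05, power=0.8):
    """Per-group sample size for testing H0: r1/r0 = 1 with negative
    binomial counts; t is exposure time, k the dispersion parameter.
    Wald-type sketch: Var(log rate) per subject taken as 1/(t*r) + k."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    var = (1 / (t * r0) + k) + (1 / (t * r1) + k)
    n = (z_a + z_b) ** 2 * var / math.log(r1 / r0) ** 2
    return math.ceil(n)

# Illustrative: control rate 1.0/year, 30% rate reduction, dispersion 0.5.
print(nb_sample_size(1.0, 0.7, t=1.0, k=0.5))
```

As expected, the required n grows with the dispersion k and shrinks as the rate ratio moves away from 1, the two dependencies the paper highlights.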

  19. Number-Phase Wigner Representation and Entropic Uncertainty Relations for Binomial and Negative Binomial States

    International Nuclear Information System (INIS)

    Amitabh, J.; Vaccaro, J.A.; Hill, K.E.

    1998-01-01

    We study the recently defined number-phase Wigner function S_NP(n,θ) for a single-mode field considered to be in binomial and negative binomial states. These states interpolate between Fock and coherent states and between coherent and quasi-thermal states, respectively, and thus provide a set of states with properties ranging from uncertain phase and sharp photon number to sharp phase and uncertain photon number. The distribution function S_NP(n,θ) gives a graphical representation of the complementary nature of the number and phase properties of these states. We highlight important differences between Wigner's quasiprobability function, which is associated with the position and momentum observables, and S_NP(n,θ), which is associated directly with the photon number and phase observables. We also discuss the number-phase entropic uncertainty relation for the binomial and negative binomial states, and we show that negative binomial states give a lower phase entropy than states which minimize the phase variance.

  20. Evaluation of logistic regression models and effect of covariates for case-control study in RNA-Seq analysis.

    Science.gov (United States)

    Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L

    2017-02-06

    Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.

  1. Logistic regression for dichotomized counts.

    Science.gov (United States)

    Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W

    2016-12-01

    Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
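The hurdle construction described above has a transparent pmf: the zero state gets its own probability (in the paper, modeled by the logistic part), and positive counts follow a zero-truncated count distribution scaled by the remaining mass. A minimal sketch with a Poisson count part and illustrative parameters:

```python
import math

def hurdle_poisson_pmf(k, pi, lam):
    """Hurdle-Poisson pmf: P(Y=0) = pi (the logistic part models 1 - pi,
    the probability of clearing the hurdle); positive counts follow a
    zero-truncated Poisson scaled by 1 - pi."""
    if k == 0:
        return pi
    pois = math.exp(-lam) * lam**k / math.factorial(k)
    return (1 - pi) * pois / (1 - math.exp(-lam))

pmf = [hurdle_poisson_pmf(k, pi=0.4, lam=2.0) for k in range(60)]
print(pmf[:4], sum(pmf))  # P(0) is exactly pi; total mass ~ 1
```

Replacing the Poisson part with a negative binomial gives the alternative ancillary model the paper also considers; the logistic part carrying the marginal odds ratios of interest is unchanged.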

  2. Accounting for Zero Inflation of Mussel Parasite Counts Using Discrete Regression Models

    Directory of Open Access Journals (Sweden)

    Emel Çankaya

    2017-06-01

    Full Text Available In many ecological applications, the absence of species is inevitable, due either to detection faults in samples or to uninhabitable conditions, resulting in a high number of zero counts. The usual practice of regression modelling of log(abundance + 1) is well known to be inadequate for prediction purposes. Newer discrete models that account for zero abundances, namely zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) regression, Hurdle-Poisson (HP) and Hurdle-Negative Binomial (HNB), amongst others, are widely preferred to the classical regression models. Because mussels are one of the economically most important aquatic products of Turkey, the purpose of this study is to examine the performance of these four models in determining the significant biotic and abiotic factors for the occurrence of the parasite Nematopsis legeri, which harms Mediterranean mussels (Mytilus galloprovincialis L.). The data, collected from three coastal regions of Sinop city in Turkey, showed that more than 50% of parasite counts on average are zero-valued, and model comparisons were based on information criteria. The results showed that the probability of occurrence of this parasite is best formulated by the ZINB or HNB models, and the influential factors of the models were found to correspond with ecological differences between the regions.
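
As a rough illustration of the zero-inflated Poisson (ZIP) model mentioned above, and assuming only NumPy/SciPy rather than any package used in the study, ZIP can be fitted by direct maximum likelihood; all parameter values are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

rng = np.random.default_rng(1)

# Simulate zero-inflated Poisson counts: structural zero with probability pi,
# otherwise an ordinary Poisson(lam) draw.
pi_true, lam_true, n = 0.4, 3.0, 4000
structural = rng.random(n) < pi_true
y = np.where(structural, 0, rng.poisson(lam_true, n))

def zip_negloglik(theta, y):
    """Negative ZIP log-likelihood; theta = (logit pi, log lam) keeps the
    optimisation unconstrained."""
    pi, lam = expit(theta[0]), np.exp(theta[1])
    zero = y == 0
    ll = zero.sum() * np.log(pi + (1 - pi) * np.exp(-lam))   # zeros: mixture of both sources
    yp = y[~zero]
    ll += np.sum(np.log(1 - pi) + yp * np.log(lam) - lam - gammaln(yp + 1))
    return -ll

res = minimize(zip_negloglik, x0=np.zeros(2), args=(y,), method="Nelder-Mead")
pi_hat, lam_hat = expit(res.x[0]), np.exp(res.x[1])
```

The same skeleton extends to ZINB by replacing the Poisson log-pmf with the negative binomial one and adding a dispersion parameter.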

  3. Expansion around half-integer values, binomial sums, and inverse binomial sums

    International Nuclear Information System (INIS)

    Weinzierl, Stefan

    2004-01-01

    I consider the expansion of transcendental functions in a small parameter around rational numbers. This includes in particular the expansion around half-integer values. I present algorithms which are suitable for an implementation within a symbolic computer algebra system. The method is an extension of the technique of nested sums. The algorithms additionally allow the evaluation of binomial sums, inverse binomial sums, and generalizations thereof.

  4. Genomic breeding value estimation using nonparametric additive regression models

    Directory of Open Access Journals (Sweden)

    Solberg Trygve

    2009-01-01

    Full Text Available Abstract Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled separately for each marker or pair of flanking markers (i.e. the predictors). The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped only) were predicted using data from the preceding generation (genotyped and phenotyped). The results show moderate to high accuracies of the predicted breeding values. Determining a predictor-specific degree of smoothing increased the accuracy.

  5. NBLDA: negative binomial linear discriminant analysis for RNA-Seq data.

    Science.gov (United States)

    Dong, Kai; Zhao, Hongyu; Tong, Tiejun; Wan, Xiang

    2016-09-13

    RNA-sequencing (RNA-Seq) has become a powerful technology to characterize gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods that have been developed for microarray data can be applied to RNA-Seq data, they are not ideal due to the discrete nature of RNA-Seq data. The Poisson distribution and negative binomial distribution are commonly used to model count data. Recently, Witten (Annals Appl Stat 5:2493-2518, 2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may not be as appropriate as the negative binomial distribution when biological replicates are available and in the presence of overdispersion (i.e., when the variance is larger than the mean). However, it is more complicated to model negative binomial variables because they involve a dispersion parameter that needs to be estimated. In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. By Bayes' rule, we construct the classifier by fitting a negative binomial model, and propose some plug-in rules to estimate the unknown parameters in the classifier. The relationship between the negative binomial classifier and the Poisson classifier is explored, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze two real RNA-Seq data sets to demonstrate the advantages of our method in real-world applications. We have developed a new classifier using the negative binomial model for RNA-Seq data classification. Our simulation results show that our proposed classifier has a better performance than existing works. The proposed classifier can serve as an effective tool for classifying RNA-Seq data. Based on the comparison results, we have provided some guidelines for scientists to decide which method should be used in the discriminant analysis of RNA-Seq data.
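
The classifier's core idea, Bayes' rule with a per-class negative binomial likelihood, can be sketched with known (hypothetical) parameters; the paper's plug-in estimation of means and dispersion is omitted here:

```python
import numpy as np
from scipy.stats import nbinom

# Toy negative binomial classifier: two classes, G genes, class-specific
# means mu and a shared (known) dispersion r; NBLDA estimates these from data.
G, r = 20, 5.0
mu = np.array([np.full(G, 5.0), np.full(G, 9.0)])   # rows: class 0 and class 1 means
p = r / (r + mu)   # scipy's nbinom(n=r, p) has mean r(1-p)/p, so p = r/(r+mu)

def nb_discriminant(x):
    """Bayes classifier with equal priors: pick the class with the larger
    summed negative binomial log-pmf over genes."""
    scores = nbinom.logpmf(x, r, p).sum(axis=1)
    return int(scores.argmax())

# Expression profiles near each class mean should be assigned to that class.
label_low = nb_discriminant(np.full(G, 5))
label_high = nb_discriminant(np.full(G, 9))
```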

  6. Chromosome aberration analysis based on a beta-binomial distribution

    International Nuclear Information System (INIS)

    Otake, Masanori; Prentice, R.L.

    1983-10-01

    The analyses carried out here generalize earlier studies of chromosomal aberrations in the populations of Hiroshima and Nagasaki by allowing extra-binomial variation in aberrant cell counts, corresponding to within-subject correlations in cell aberrations. Strong within-subject correlations were detected, with corresponding standard errors for the average number of aberrant cells that were often substantially larger than previously assumed. The extra-binomial variation is accommodated in the analysis in the present report, as described in the section on dose-response models, by using a beta-binomial (B-B) variance structure. There is generally satisfactory agreement between the observed and the B-B fitted frequencies by city-dose category. The chromosomal aberration data considered here are not extensive enough to allow a precise discrimination between competing dose-response models; a quadratic gamma-ray and linear neutron model, however, most closely fits the chromosome data. (author)

  7. Review and Recommendations for Zero-inflated Count Regression Modeling of Dental Caries Indices in Epidemiological Studies

    Science.gov (United States)

    Stamm, John W.; Long, D. Leann; Kincade, Megan E.

    2012-01-01

    Over the past five to ten years, zero-inflated count regression models have been increasingly applied to the analysis of dental caries indices (e.g., DMFT, dfms). The main reason is the broad decline in children's caries experience, such that dmf and DMF indices more frequently generate low or even zero counts. This article reviews the application of zero-inflated Poisson and zero-inflated negative binomial regression models to dental caries, with emphasis on the description of the models and the interpretation of fitted model results given the study goals. The review finds that interpretations provided in the published caries research are often imprecise or inadvertently misleading, particularly with respect to failing to discriminate between inference for the class of susceptible persons defined by such models and inference for the sampled population in terms of overall exposure effects. Recommendations are provided to enhance the use as well as the interpretation and reporting of results of count regression models when applied to epidemiological studies of dental caries. PMID:22710271

  8. Meta-analysis for diagnostic accuracy studies: a new statistical model using beta-binomial distributions and bivariate copulas.

    Science.gov (United States)

    Kuss, Oliver; Hoyer, Annika; Solms, Alexander

    2014-01-15

    There are still challenges when meta-analyzing data from studies on diagnostic accuracy. This is mainly due to the bivariate nature of the response, where information on sensitivity and specificity must be summarized while accounting for their correlation within a single trial. In this paper, we propose a new statistical model for the meta-analysis of diagnostic accuracy studies. This model uses beta-binomial distributions for the marginal numbers of true positives and true negatives and links these margins by a bivariate copula distribution. The new model comes with all the features of the current standard model, a bivariate logistic regression model with random effects, but has the additional advantages of a closed likelihood function and greater flexibility for the correlation structure of sensitivity and specificity. In a simulation study comparing three copula models and two implementations of the standard model, the Plackett and Gauss copulas rarely perform worse, and frequently perform better, than the standard model. For illustration, we use an example from a meta-analysis judging the diagnostic accuracy of telomerase (a urinary tumor marker) for the diagnosis of primary bladder cancer. Copyright © 2013 John Wiley & Sons, Ltd.

  9. Partitioning detectability components in populations subject to within-season temporary emigration using binomial mixture models.

    Directory of Open Access Journals (Sweden)

    Katherine M O'Donnell

    Full Text Available Detectability of individual animals is highly variable and nearly always < 1; imperfect detection must be accounted for to reliably estimate population sizes and trends. Hierarchical models can simultaneously estimate abundance and effective detection probability, but there are several different mechanisms that cause variation in detectability. Neglecting temporary emigration can lead to biased population estimates because availability and conditional detection probability are confounded. In this study, we extend previous hierarchical binomial mixture models to account for multiple sources of variation in detectability. The state process of the hierarchical model describes ecological mechanisms that generate spatial and temporal patterns in abundance, while the observation model accounts for the imperfect nature of counting individuals due to temporary emigration and false absences. We illustrate our model's potential advantages, including the allowance of temporary emigration between sampling periods, with a case study of southern red-backed salamanders Plethodon serratus. We fit our model and a standard binomial mixture model to counts of terrestrial salamanders surveyed at 40 sites during 3-5 surveys each spring and fall 2010-2012. Our models generated similar parameter estimates to standard binomial mixture models. Aspect was the best predictor of salamander abundance in our case study; abundance increased as aspect became more northeasterly. Increased time-since-rainfall strongly decreased salamander surface activity (i.e. availability for sampling, while higher amounts of woody cover objects and rocks increased conditional detection probability (i.e. probability of capture, given an animal is exposed to sampling. By explicitly accounting for both components of detectability, we increased congruence between our statistical modeling and our ecological understanding of the system. We stress the importance of choosing survey locations and
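
The state-observation structure underlying binomial mixture (N-mixture) models can be illustrated numerically. A useful sanity check, under the simplifying assumption of a single survey with constant detection probability, is that binomial thinning of a Poisson abundance yields a Poisson count with rate λp:

```python
import numpy as np
from scipy.stats import poisson, binom

lam, p = 10.0, 0.4   # latent abundance rate; per-individual detection probability

def marginal_pmf(y, lam, p, n_max=200):
    """P(Y = y) = sum over N >= y of Poisson(N; lam) * Binomial(y; N, p),
    truncated at n_max (the Poisson tail beyond 200 is negligible here)."""
    N = np.arange(y, n_max + 1)
    return float(np.sum(poisson.pmf(N, lam) * binom.pmf(y, N, p)))

# Binomial thinning of a Poisson count is again Poisson, with rate lam * p.
ys = np.arange(15)
mixture = np.array([marginal_pmf(y, lam, p) for y in ys])
thinned = poisson.pmf(ys, lam * p)
```

In the full hierarchical model, λ and p are of course modelled with covariates (here aspect, rainfall, cover), and repeated surveys break the confounding between abundance and detection.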

  10. Detection of progression of glaucomatous visual field damage using the point-wise method with the binomial test.

    Science.gov (United States)

    Karakawa, Ayako; Murata, Hiroshi; Hirasawa, Hiroyo; Mayama, Chihiro; Asaoka, Ryo

    2013-01-01

    To compare the performance of the newly proposed point-wise linear regression (PLR) with the binomial test (binomial PLR) against mean deviation (MD) trend analysis and permutation analyses of PLR (PoPLR), in detecting global visual field (VF) progression in glaucoma. 15 VFs (Humphrey Field Analyzer, SITA standard, 24-2) were collected from 96 eyes of 59 open angle glaucoma patients (6.0 ± 1.5 [mean ± standard deviation] years). Using the total deviation of each point on the 2nd to 16th VFs (VF2-16), linear regression analysis was carried out. The numbers of VF test points with a significant trend at various probability levels (p < 0.01 to p < 0.05) were counted and assessed with the binomial test (one-sided). A VF series was defined as "significant" if the median p-value from the binomial test was < 0.05. The proportion of series flagged as significant by the binomial PLR method (0.14 to 0.86) was significantly higher than with MD trend analysis (0.04 to 0.89) and PoPLR (0.09 to 0.93). The PIS of the proposed method (0.0 to 0.17) was significantly lower than that of the MD approach (0.0 to 0.67) and PoPLR (0.07 to 0.33). The PBNS of the three approaches were not significantly different. The binomial PLR method gives more consistent results than MD trend analysis and PoPLR, hence it will be helpful as a tool to 'flag' possible VF deterioration.
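
The binomial test step can be reproduced with SciPy: count how many of the 52 test points of a 24-2 field show a significant trend at the per-point level α = 0.05, and compare that count against the binomial distribution (the count of significant points below is hypothetical):

```python
from scipy.stats import binomtest, binom

n_points = 52     # test points in a 24-2 visual field, blind-spot points excluded
alpha = 0.05      # per-point significance level expected under no progression
k_significant = 8 # hypothetical number of points with a significant negative trend

# One-sided binomial test: is 8/52 more than chance would produce at alpha = 0.05?
result = binomtest(k_significant, n_points, p=alpha, alternative="greater")

# The same tail probability computed directly from the binomial distribution
tail = 1.0 - binom.cdf(k_significant - 1, n_points, alpha)
```

With 8 of 52 points significant, the one-sided p-value is well below 0.05, so this hypothetical series would be flagged as progressing.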

  11. Study on Emission Measurement of Vehicle on Road Based on Binomial Logit Model

    OpenAIRE

    Aly, Sumarni Hamid; Selintung, Mary; Ramli, Muhammad Isran; Sumi, Tomonori

    2011-01-01

    This research attempts to evaluate emission measurement of on-road vehicles. In this regard, it develops a failure probability model of the vehicle emission test for passenger cars, utilizing a binomial logit model. The model takes failure of the CO and HC emission tests for gasoline cars and of the opacity emission test for diesel-fuel cars as dependent variables, while vehicle age, engine size, and car brand and type serve as independent variables. In order to imp...

  12. The negative binomial distribution as a model for external corrosion defect counts in buried pipelines

    International Nuclear Information System (INIS)

    Valor, Alma; Alfonso, Lester; Caleyo, Francisco; Vidal, Julio; Perez-Baruch, Eloy; Hallen, José M.

    2015-01-01

    Highlights: • Observed external-corrosion defects in underground pipelines revealed a tendency to cluster. • The Poisson distribution is unable to fit extensive count data for this type of defect. • In contrast, the negative binomial distribution provides a suitable count model for them. • Two spatial stochastic processes lead to the negative binomial distribution for defect counts. • They are the Gamma-Poisson mixed process and the compound Poisson process. • A Roger's process also arises as a plausible temporal stochastic process leading to corrosion defect clustering and to negative binomially distributed defect counts. - Abstract: The spatial distribution of external corrosion defects in buried pipelines is usually described as a Poisson process, which leads to corrosion defects being randomly distributed along the pipeline. However, in real operating conditions, the spatial distribution of defects departs considerably from Poisson statistics due to the aggregation of defects in groups or clusters. In this work, the statistical analysis of real corrosion data from underground pipelines operating in southern Mexico leads to the conclusion that the negative binomial distribution provides a better description of defect counts. The origin of this distribution from several processes is discussed. The analysed processes are the mixed Gamma-Poisson, compound Poisson, and Roger's processes. The physical reasons behind them are discussed for the specific case of soil corrosion.
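
The Gamma-Poisson mixture mentioned in the highlights can be verified numerically: mixing a Poisson over a Gamma-distributed rate reproduces the negative binomial pmf exactly. A short check with illustrative parameters:

```python
import numpy as np
from scipy import integrate
from scipy.stats import gamma, nbinom, poisson

# Gamma-Poisson mixture: lam ~ Gamma(shape=r, scale=(1-p)/p), Y | lam ~ Poisson(lam)
# marginalises to the negative binomial NB(r, p).
r, p = 3.0, 0.4

def mixed_pmf(y):
    """Numerically integrate the Poisson pmf against the Gamma mixing density."""
    integrand = lambda lam: poisson.pmf(y, lam) * gamma.pdf(lam, r, scale=(1 - p) / p)
    val, _ = integrate.quad(integrand, 0, np.inf)
    return val

ys = np.arange(10)
mixture = np.array([mixed_pmf(y) for y in ys])
direct = nbinom.pmf(ys, r, p)
```

This is the heterogeneity mechanism in the abstract: defect counts that are Poisson given a local corrosivity rate, with that rate varying along the pipeline, are negative binomial overall.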

  13. Low reheating temperatures in monomial and binomial inflationary models

    International Nuclear Information System (INIS)

    Rehagen, Thomas; Gelmini, Graciela B.

    2015-01-01

    We investigate the allowed range of reheating temperature values in light of the Planck 2015 results and the recent joint analysis of Cosmic Microwave Background (CMB) data from the BICEP2/Keck Array and Planck experiments, using monomial and binomial inflationary potentials. While the well-studied φ^2 inflationary potential is no longer favored by current CMB data, as well as φ^p with p > 2, a φ^1 potential and canonical reheating (w_re = 0) provide a good fit to the CMB measurements. In this last case, we find that the Planck 2015 68% confidence limit upper bound on the spectral index, n_s, implies an upper bound on the reheating temperature of T_re ≲ 6×10^10 GeV, and excludes instantaneous reheating. The low reheating temperatures allowed by this model open the possibility that dark matter could be produced during the reheating period instead of when the Universe is radiation dominated, which could lead to very different predictions for the relic density and momentum distribution of WIMPs, sterile neutrinos, and axions. We also study binomial inflationary potentials and show the effects of a small departure from a φ^1 potential. We find that as a subdominant φ^2 term in the potential increases, first instantaneous reheating becomes allowed, and then the lowest possible reheating temperature of T_re = 4 MeV is excluded by the Planck 2015 68% confidence limit

  14. The option to expand a project: its assessment with the binomial options pricing model

    Directory of Open Access Journals (Sweden)

    Salvador Cruz Rambaud

    Full Text Available Traditional methods of investment appraisal, like the Net Present Value, are not able to include the value of the operational flexibility of the project. In this paper, real options, and more specifically the option to expand, are assumed to be included in the project information in addition to the expected cash flows. Thus, to calculate the total value of the project, we are going to apply the methodology of the Net Present Value to the different scenarios derived from the existence of the real option to expand. Taking into account the analogy between real and financial options, the value of including an option to expand is explored by using the binomial options pricing model. In this way, estimating the value of the option to expand is a tool which facilitates the control of the uncertainty element implicit in the project. Keywords: Real options, Option to expand, Binomial options pricing model, Investment project appraisal
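
For reference, the Cox-Ross-Rubinstein binomial tree underlying the valuation can be sketched for a plain European call (an option to expand is valued analogously on the project's scenario tree); all market parameters below are hypothetical, and Black-Scholes is included only as a convergence check:

```python
import numpy as np
from scipy.stats import norm

def crr_call(S, K, T, r, sigma, n):
    """European call priced on a Cox-Ross-Rubinstein binomial tree with n steps."""
    dt = T / n
    u = np.exp(sigma * np.sqrt(dt))        # up factor
    d = 1.0 / u                            # down factor
    q = (np.exp(r * dt) - d) / (u - d)     # risk-neutral up probability
    disc = np.exp(-r * dt)
    j = np.arange(n + 1)                   # j = number of up moves at expiry
    values = np.maximum(S * u**j * d**(n - j) - K, 0.0)   # terminal payoffs
    for _ in range(n):                     # backward induction through the tree
        values = disc * (q * values[1:] + (1 - q) * values[:-1])
    return float(values[0])

def bs_call(S, K, T, r, sigma):
    """Black-Scholes value, used only to check convergence of the tree."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

tree_price = crr_call(100.0, 100.0, 1.0, 0.05, 0.2, 500)
bs_price = bs_call(100.0, 100.0, 1.0, 0.05, 0.2)
```

For a real option to expand, the terminal payoff line is replaced by the expanded-scenario NPVs, and the same backward induction gives the option's contribution to total project value.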

  15. Disease Mapping and Regression with Count Data in the Presence of Overdispersion and Spatial Autocorrelation: A Bayesian Model Averaging Approach

    Science.gov (United States)

    Mohebbi, Mohammadreza; Wolfe, Rory; Forbes, Andrew

    2014-01-01

    This paper applies the generalised linear model for modelling geographical variation to esophageal cancer incidence data in the Caspian region of Iran. The data have a complex and hierarchical structure that makes them suitable for hierarchical analysis using Bayesian techniques, but with care required to deal with problems arising from counts of events observed in small geographical areas when overdispersion and residual spatial autocorrelation are present. These considerations lead to nine regression models derived from using three probability distributions for count data: Poisson, generalised Poisson and negative binomial, and three different autocorrelation structures. We employ the framework of Bayesian variable selection and a Gibbs sampling based technique to identify significant cancer risk factors. The framework deals with situations where the number of possible models based on different combinations of candidate explanatory variables is large enough such that calculation of posterior probabilities for all models is difficult or infeasible. The evidence from applying the modelling methodology suggests that modelling strategies based on the use of generalised Poisson and negative binomial with spatial autocorrelation work well and provide a robust basis for inference. PMID:24413702

  16. Reliability of environmental sampling culture results using the negative binomial intraclass correlation coefficient.

    Science.gov (United States)

    Aly, Sharif S; Zhao, Jianyang; Li, Ben; Jiang, Jiming

    2014-01-01

    The Intraclass Correlation Coefficient (ICC) is commonly used to estimate the similarity between quantitative measures obtained from different sources. Overdispersed data are traditionally transformed so that a linear mixed model (LMM) based ICC can be estimated; a common transformation is the natural logarithm. The reliability of environmental sampling of fecal slurry on freestall pens has been estimated for Mycobacterium avium subsp. paratuberculosis using natural-logarithm-transformed culture results. Recently, the negative binomial ICC was defined based on a generalized linear mixed model for negative binomial distributed data. The current study reports a negative binomial ICC estimate, including fixed effects, using culture results of environmental samples. Simulations over a wide variety of inputs and negative binomial distribution parameters (r; p) showed better performance of the new negative binomial ICC compared to the LMM-based ICC, even when the negative binomial data were log- and square-root transformed. A second comparison targeting a wider range of ICC values showed that the mean of the estimated ICC closely approximated the true ICC.

  17. Beta-Binomial Model for the Detection of Rare Mutations in Pooled Next-Generation Sequencing Experiments.

    Science.gov (United States)

    Jakaitiene, Audrone; Avino, Mariano; Guarracino, Mario Rosario

    2017-04-01

    Despite diminishing costs, next-generation sequencing (NGS) remains expensive for studies with a large number of individuals. As a cost-saving measure, pools containing multiple samples may be sequenced. Many software tools are currently available for the detection of single-nucleotide polymorphisms (SNPs); sensitivity and specificity depend on the model used and the data analyzed, indicating that all of them have room for improvement. We use a beta-binomial model to detect rare mutations in untagged pooled NGS experiments, and propose a multireference framework for pooled data that remains specific for pools containing up to two patients affected by neuromuscular disorders (NMD). We assessed the results by comparison with The Genome Analysis Toolkit (GATK), CRISP, SNVer, and FreeBayes. Our results show that the multireference approach applying the beta-binomial model is accurate in predicting rare mutations at a 0.01 fraction. Finally, we explored the concordance of mutations between the model and the other software, checking their involvement in any NMD-related gene. We detected seven novel SNPs, for which the functional analysis produced enriched terms related to locomotion and musculature.
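
The extra-binomial variation that motivates the beta-binomial choice is easy to exhibit: at a matched mean, the beta-binomial variance strictly exceeds the binomial variance. A minimal check with illustrative parameters:

```python
from scipy.stats import betabinom, binom

# Beta-binomial: the per-read success probability is drawn from Beta(a, b)
# for each pool, giving overdispersion relative to a plain binomial.
n = 30
a, b = 2.0, 6.0
p_match = a / (a + b)          # binomial with the same mean (here 0.25)

bb_mean, bb_var = betabinom.stats(n, a, b, moments="mv")
bi_mean, bi_var = binom.stats(n, p_match, moments="mv")
```

In a pooled variant-calling setting, n plays the role of read depth at a site and the Beta component absorbs pool-to-pool variability in the true allele fraction.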

  18. On pricing futures options on random binomial tree

    International Nuclear Information System (INIS)

    Bayram, Kamola; Ganikhodjaev, Nasir

    2013-01-01

    The discrete-time approach to real option valuation has typically been implemented in the finance literature using a binomial tree framework. Instead, we develop a new model by randomizing the environment, and call such a model a random binomial tree. Whereas the usual model has only one environment (u, d), where the price of the underlying asset can move up by a factor u or down by a factor d, with the pair (u, d) constant over the life of the underlying asset, in our new model the underlying security moves in two environments, (u1, d1) and (u2, d2). Thus we obtain two volatilities, σ1 and σ2. This new approach enables calculations that better reflect the real market, since it considers two states of the market, normal and extraordinary. In this paper we define and study futures options for such models.

  19. A likelihood-ratio test to discriminate between the Poisson, Binomial, and Negative Binomial distributions.

    OpenAIRE

    López Martínez, Laura Elizabeth

    2010-01-01

    In this work, statistical inference is carried out for the Generalized Negative Binomial (GNB) distribution and the models it nests, namely the Binomial, Negative Binomial, and Poisson. The problem of parameter estimation in the GNB distribution is addressed, and a generalized likelihood-ratio test is proposed to discern whether a data set fits, in particular, the Binomial, Negative Binomial, or Poisson model. In addition, the powers and sizes of the test are studied...

  20. Modelling binary data

    CERN Document Server

    Collett, David

    2002-01-01

    INTRODUCTION: Some Examples; The Scope of this Book; Use of Statistical Software. STATISTICAL INFERENCE FOR BINARY DATA: The Binomial Distribution; Inference about the Success Probability; Comparison of Two Proportions; Comparison of Two or More Proportions. MODELS FOR BINARY AND BINOMIAL DATA: Statistical Modelling; Linear Models; Methods of Estimation; Fitting Linear Models to Binomial Data; Models for Binomial Response Data; The Linear Logistic Model; Fitting the Linear Logistic Model to Binomial Data; Goodness of Fit of a Linear Logistic Model; Comparing Linear Logistic Models; Linear Trend in Proportions; Comparing Stimulus-Response Relationships; Non-Convergence and Overfitting; Some other Goodness of Fit Statistics; Strategy for Model Selection; Predicting a Binary Response Probability. BIOASSAY AND SOME OTHER APPLICATIONS: The Tolerance Distribution; Estimating an Effective Dose; Relative Potency; Natural Response; Non-Linear Logistic Regression Models; Applications of the Complementary Log-Log Model. MODEL CHECKING: Definition of Re...

  1. SEPARATION PHENOMENA IN LOGISTIC REGRESSION

    Directory of Open Access Journals (Sweden)

    Ikaro Daniel de Carvalho Barreto

    2014-03-01

    Full Text Available This paper applies concepts from maximum likelihood estimation of the binomial logistic regression model to the separation phenomenon, which introduces bias into the estimation, yields different interpretations of the estimates under the different statistical tests (Wald, likelihood ratio, and score), and produces different estimates under different iterative methods (Newton-Raphson and Fisher scoring). It also presents an example demonstrating the direct implications for the validation of the model and of its variables, and for the estimates of odds ratios and confidence intervals generated from the Wald statistic. Furthermore, we briefly present Firth's correction for circumventing the separation phenomenon.
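
The separation phenomenon can be demonstrated directly: with completely separated data, the binomial logistic log-likelihood increases monotonically in the slope, so no finite maximum likelihood estimate exists, which is the motivation for Firth's penalized correction. A small sketch:

```python
import numpy as np

# Perfectly separated data: y = 1 exactly when x > 0
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = (x > 0).astype(float)

def loglik(beta):
    """Binomial logistic log-likelihood with slope beta (intercept fixed at 0)."""
    eta = beta * x
    return np.sum(y * eta - np.log1p(np.exp(eta)))

# The likelihood keeps climbing as the slope grows: Newton-Raphson diverges.
lls = [loglik(b) for b in (1.0, 5.0, 25.0, 125.0)]
```

Each increase of the slope raises the log-likelihood toward (but never reaching) zero, which is exactly why iterative fitting reports ever-larger coefficients and unstable Wald statistics under separation.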

  2. Understanding logistic regression analysis

    OpenAIRE

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratios in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is that confounding effects are avoided by analysing the association of all variables together. In this article, we explain the logistic regression procedure using ex...
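
As a sketch of the procedure (simulated data, hypothetical effect size), a one-predictor logistic model can be fitted by Newton-Raphson and the coefficient exponentiated to obtain the odds ratio:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate: log-odds of the outcome = -1 + 0.7 * exposure (true OR = e^0.7 ≈ 2.01)
n = 20000
xvar = rng.integers(0, 2, n).astype(float)   # binary exposure
eta = -1.0 + 0.7 * xvar
yvar = (rng.random(n) < 1 / (1 + np.exp(-eta))).astype(float)

X = np.column_stack([np.ones(n), xvar])      # intercept + exposure
beta = np.zeros(2)
for _ in range(25):                          # Newton-Raphson (IRLS) iterations
    mu = 1 / (1 + np.exp(-X @ beta))         # fitted probabilities
    W = mu * (1 - mu)                        # working weights
    grad = X.T @ (yvar - mu)
    H = X.T @ (X * W[:, None])               # observed information
    beta = beta + np.linalg.solve(H, grad)

odds_ratio = np.exp(beta[1])                 # exp(coefficient) = odds ratio
```

Exponentiating the exposure coefficient recovers an odds ratio close to the simulated e^0.7; with several covariates, each exponentiated coefficient is the odds ratio adjusted for the others.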

  3. Predicting expressway crash frequency using a random effect negative binomial model: A case study in China.

    Science.gov (United States)

    Ma, Zhuanglin; Zhang, Honglu; Chien, Steven I-Jy; Wang, Jin; Dong, Chunjiao

    2017-01-01

    To investigate the relationship between crash frequency and potential influence factors, accident data for events occurring on a 50 km long expressway in China, comprising 567 crash records (2006-2008), were collected and analyzed. Both the fixed-length and the homogeneous-longitudinal-grade methods were applied to divide the study expressway section into segments. A negative binomial (NB) model and a random effect negative binomial (RENB) model were developed to predict crash frequency. The parameters of both models were determined using the maximum likelihood (ML) method, and a mixed stepwise procedure was applied to examine the significance of explanatory variables. Three explanatory variables, including longitudinal grade, road width, and the ratio of longitudinal grade to curve radius (RGR), were found to significantly affect crash frequency. The marginal effects of the significant explanatory variables on crash frequency were analyzed. Model performance was assessed by the relative prediction error and the cumulative standardized residual. The results show that the RENB model outperforms the NB model. It was also found that model performance with the fixed-length segment method is superior to that with the homogeneous-longitudinal-grade segment method. Copyright © 2016. Published by Elsevier Ltd.

  4. System-Reliability Cumulative-Binomial Program

    Science.gov (United States)

    Scheuer, Ernest M.; Bowerman, Paul N.

    1989-01-01

    Cumulative-binomial computer program, NEWTONP, one of a set of three programs, calculates cumulative binomial probability distributions for arbitrary inputs. NEWTONP, CUMBIN (NPO-17555), and CROSSER (NPO-17557) are used independently of one another. The program finds the probability required to yield a given system reliability. Used by statisticians and users of statistical procedures, test planners, designers, and numerical analysts. Program written in C.

  5. Multilevel binomial logistic prediction model for malignant pulmonary nodules based on texture features of CT image

    International Nuclear Information System (INIS)

    Wang Huan; Guo Xiuhua; Jia Zhongwei; Li Hongkai; Liang Zhigang; Li Kuncheng; He Qian

    2010-01-01

    Purpose: To introduce a computer-aided diagnostic (CAD) method for small solitary pulmonary nodules (SPNs), based on a multilevel binomial logistic prediction model that combines patient characteristics with textural features of the CT image. Materials and methods: Fourteen gray-level co-occurrence matrix textural features were obtained from 2171 benign and malignant small solitary pulmonary nodules belonging to 185 patients. A multilevel binomial logistic model was applied to gain initial insights. Results: Five texture features (Inertia, Entropy, Correlation, Difference-mean, and Sum-Entropy) together with patient age show an aggregating character at the patient level and differ statistically (P < 0.05) between benign and malignant small solitary pulmonary nodules. Conclusion: Some gray-level co-occurrence matrix textural features are efficient descriptive features of the CT image of small solitary pulmonary nodules and, combined with patient-level characteristics, can to some extent benefit the diagnosis of early-stage lung cancer.

  6. Extra-binomial variation approach for analysis of pooled DNA sequencing data

    Science.gov (United States)

    Wallace, Chris

    2012-01-01

    Motivation: The invention of next-generation sequencing technology has made it possible to study the rare variants that are more likely to pinpoint causal disease genes. To make such experiments financially viable, DNA samples from several subjects are often pooled before sequencing. This induces large between-pool variation which, together with other sources of experimental error, creates over-dispersed data. Statistical analysis of pooled sequencing data needs to appropriately model this additional variance to avoid inflating the false-positive rate. Results: We propose a new statistical method based on an extra-binomial model to address the over-dispersion and apply it to pooled case-control data. We demonstrate that our model provides a better fit to the data than either a standard binomial model or a traditional extra-binomial model proposed by Williams and can analyse both rare and common variants with lower or more variable pool depths compared to the other methods. Availability: Package ‘extraBinomial’ is on http://cran.r-project.org/ Contact: chris.wallace@cimr.cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics Online. PMID:22976083

  7. Common-Reliability Cumulative-Binomial Program

    Science.gov (United States)

    Scheuer, Ernest M.; Bowerman, Paul N.

    1989-01-01

    Cumulative-binomial computer program, CROSSER, one of set of three programs, calculates cumulative binomial probability distributions for arbitrary inputs. CROSSER, CUMBIN (NPO-17555), and NEWTONP (NPO-17556), used independently of one another. Point of equality between reliability of system and common reliability of components found. Used by statisticians and users of statistical procedures, test planners, designers, and numerical analysts. Program written in C.
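
    CROSSER's source is not reproduced in this record; a minimal sketch of the crossing-point idea it describes — finding the component reliability at which a k-out-of-n system's reliability equals the common component reliability — could use a cumulative binomial plus bisection (all names below are hypothetical):

```python
import math

def kofn_reliability(p, k, n):
    """P(at least k of n independent components work), each with reliability p."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def crossing_point(k, n, lo=1e-9, hi=1 - 1e-9):
    """Bisection for the interior p with R_sys(p) = p (analogous to CROSSER)."""
    f = lambda p: kofn_reliability(p, k, n) - p
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

p_star = crossing_point(2, 3)   # 2-out-of-3 majority system crosses at p = 0.5
```

    For the 2-out-of-3 system, R(p) = 3p² − 2p³, whose interior fixed point is p = 1/2: below it the system is less reliable than a single component, above it more reliable.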

  8. A Mixed-Effects Heterogeneous Negative Binomial Model for Postfire Conifer Regeneration in Northeastern California, USA

    Science.gov (United States)

    Justin S. Crotteau; Martin W. Ritchie; J. Morgan Varner

    2014-01-01

    Many western USA fire regimes are typified by mixed-severity fire, which compounds the variability inherent to natural regeneration densities in associated forests. Tree regeneration data are often discrete and nonnegative; accordingly, we fit a series of Poisson and negative binomial variation models to conifer seedling counts across four distinct burn severities and...

  9. Negative Binomial Distribution and the multiplicity moments at the LHC

    International Nuclear Information System (INIS)

    Praszalowicz, Michal

    2011-01-01

    In this work we show that the latest LHC data on multiplicity moments C2-C5 are well described by a two-step model in the form of a convolution of the Poisson distribution with an energy-dependent source function. For the source function we take the Γ distribution, which yields the Negative Binomial Distribution. No unexpected behavior of the Negative Binomial Distribution parameter k is found. We also give predictions for the higher energies of 10 and 14 TeV.
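
    The two-step structure can be checked numerically: integrating a Poisson distribution against a Γ source with shape k recovers the negative binomial pmf. A sketch, assuming the standard Poisson-Gamma mixture parameterization rather than anything specific to this paper:

```python
import math

def nb_pmf(n, k, mean):
    """Negative binomial pmf with shape k and the given mean."""
    p = mean / (mean + k)
    return math.gamma(n + k) / (math.gamma(k) * math.factorial(n)) * p**n * (1 - p)**k

def poisson_gamma_mix(n, k, mean, steps=20000, lam_max=200.0):
    """Numerically convolve a Poisson with a Gamma(k, theta) source, theta = mean/k."""
    theta = mean / k
    h = lam_max / steps
    total = 0.0
    for i in range(1, steps + 1):
        lam = (i - 0.5) * h   # midpoint rule
        gamma_pdf = lam**(k - 1) * math.exp(-lam / theta) / (math.gamma(k) * theta**k)
        poisson = math.exp(-lam) * lam**n / math.factorial(n)
        total += gamma_pdf * poisson * h
    return total
```

    With shape k = 3 and mean 10, the numerically mixed probability at any multiplicity agrees with the closed-form negative binomial value, which is the convolution identity the abstract relies on.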

  10. QNB: differential RNA methylation analysis for count-based small-sample sequencing data with a quad-negative binomial model.

    Science.gov (United States)

    Liu, Lian; Zhang, Shao-Wu; Huang, Yufei; Meng, Jia

    2017-08-31

    As a newly emerged research area, RNA epigenetics has drawn increasing attention recently because RNA methylation and other modifications participate in a number of crucial biological processes. Thanks to high-throughput sequencing techniques such as MeRIP-Seq, transcriptome-wide RNA methylation profiles are now available in the form of count-based data, with which it is often of interest to study dynamics at the epitranscriptomic layer. However, the sample size of an RNA methylation experiment is usually very small due to its cost, and there usually exist a large number of genes whose methylation level cannot be accurately estimated due to their low expression level, making differential RNA methylation analysis a difficult task. We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Compared with previous approaches such as the DRME model, whose statistical test covers the IP samples only with 2 negative binomial distributions, QNB is based on 4 independent negative binomial distributions with their variances and means linked by local regressions, so the input control samples are also properly taken into account. In addition, unlike the DRME approach, which relies on the input control sample alone for estimating the background, QNB uses a more robust estimator of gene expression that combines information from both input and IP samples, which can largely improve testing performance for very lowly expressed genes. QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. The QNB model is also applicable to other datasets related to RNA modifications, including but not limited to RNA bisulfite sequencing, m1A-Seq, Par-CLIP, RIP-Seq, etc.

  11. Calculating Cumulative Binomial-Distribution Probabilities

    Science.gov (United States)

    Scheuer, Ernest M.; Bowerman, Paul N.

    1989-01-01

    Cumulative-binomial computer program, CUMBIN, one of set of three programs, calculates cumulative binomial probability distributions for arbitrary inputs. CUMBIN, NEWTONP (NPO-17556), and CROSSER (NPO-17557), used independently of one another. Reliabilities and availabilities of k-out-of-n systems analyzed. Used by statisticians and users of statistical procedures, test planners, designers, and numerical analysts. Used for calculations of reliability and availability. Program written in C.

  12. Modeling random telegraph signal noise in CMOS image sensor under low light based on binomial distribution

    International Nuclear Information System (INIS)

    Zhang Yu; Wang Guangyi; Lu Xinmiao; Hu Yongcai; Xu Jiangtao

    2016-01-01

    The random telegraph signal noise in the pixel source-follower MOSFET is the principal component of the noise in a CMOS image sensor under low light. In this paper, a physical and statistical model of the random telegraph signal noise in the pixel source follower, based on the binomial distribution, is set up. The number of electrons captured or released by the oxide traps per unit time is described by random variables that obey the binomial distribution. As a result, the output states and the corresponding probabilities of the first and second samples of the correlated double sampling circuit are acquired, and the standard deviation of the output states after the correlated double sampling circuit can be obtained accordingly. In the simulation section, one hundred thousand samples of the source-follower MOSFET were simulated, and the results show that the proposed model has statistical characteristics similar to those of existing models under the effect of channel length and oxide-trap density. Moreover, the noise histogram of the proposed model has been evaluated at different environmental temperatures. (paper)

  13. Modified Regression Correlation Coefficient for Poisson Regression Model

    Science.gov (United States)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study gives attention to indicators of the predictive power of the Generalized Linear Model (GLM), which are widely used but often subject to some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This measure of predictive power is defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables, E(Y|X), for the Poisson regression model, in which the dependent variable follows a Poisson distribution. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables and of multicollinearity among the independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional one in terms of bias and root mean square error (RMSE).

  14. Estimating negative binomial parameters from occurrence data with detection times.

    Science.gov (United States)

    Hwang, Wen-Han; Huggins, Richard; Stoklosa, Jakub

    2016-11-01

    The negative binomial distribution is a common model for the analysis of count data in biology and ecology. In many applications, we may not observe the complete frequency count in a quadrat but only that a species occurred in the quadrat. If only occurrence data are available, then the two parameters of the negative binomial distribution, the aggregation index and the mean, are not identifiable. This can be overcome by data augmentation or through modeling the dependence between quadrat occupancies. Here, we propose to record the (first) detection time while collecting occurrence data in a quadrat. We show that under what we call proportionate sampling, where the time to survey a region is proportional to its area, both negative binomial parameters are estimable. When the mean parameter is larger than two, our proposed approach is more efficient than the data augmentation method developed by Solow and Smith (Am. Nat. 176, 96-98), and in general is cheaper to conduct. We also investigate the effect of misidentification when collecting negative binomially distributed data, and conclude that, in general, the effect can be simply adjusted for, provided that the mean and variance of the misidentification probabilities are known. The results are demonstrated in a simulation study and illustrated in several real examples. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. [The reentrant binomial model of nuclear anomaly growth in rhabdomyosarcoma RA-23 cell populations under increasing doses of sparsely ionizing radiation].

    Science.gov (United States)

    Alekseeva, N P; Alekseev, A O; Vakhtin, Iu B; Kravtsov, V Iu; Kuzovatov, S N; Skorikova, T I

    2008-01-01

    Distributions of nuclear morphology anomalies in transplantable rhabdomyosarcoma RA-23 cell populations were investigated under the effect of ionizing radiation from 0 to 45 Gy. Internuclear bridges, nuclear protrusions, and dumbbell-shaped nuclei were counted as morphological anomalies, using empirical distributions of the number of anomalies per 100 nuclei. An adequate model, the reentrant binomial distribution, was found: the sum of binomial random variables with a binomially distributed number of summands has such a distribution. The averages of these random variables were named, accordingly, the internal and external average reentrant components, and their maximum likelihood estimates were derived. The statistical properties of these estimates were investigated by means of statistical modeling. It was found that, with equally significant correlation between the radiation dose and the average number of nuclear anomalies, in cell populations two to three cell cycles after irradiation in vivo the radiation dose correlates significantly with the internal average reentrant component, whereas in remote descendants of cell transplants irradiated in vitro it correlates with the external one.

  16. Problems on Divisibility of Binomial Coefficients

    Science.gov (United States)

    Osler, Thomas J.; Smoak, James

    2004-01-01

    Twelve unusual problems involving divisibility of the binomial coefficients are represented in this article. The problems are listed in "The Problems" section. All twelve problems have short solutions which are listed in "The Solutions" section. These problems could be assigned to students in any course in which the binomial theorem and Pascal's…

  17. Squeezing and other non-classical features in k-photon anharmonic oscillator in binomial and negative binomial states of the field

    International Nuclear Information System (INIS)

    Joshi, A.; Lawande, S.V.

    1990-01-01

    A systematic study is presented of squeezing obtained from a k-photon anharmonic oscillator (with interaction Hamiltonian of the form (a†)^k, k ≥ 2) interacting with light whose statistics can be varied from sub-Poissonian to Poissonian via the binomial state of the field, and from super-Poissonian to Poissonian via the negative binomial state of the field. The authors predict that for all values of k there is a tendency toward increased squeezing with increased sub-Poissonian character of the field, while the reverse is true for a super-Poissonian field. They also present the non-classical behavior of the first-order coherence function explicitly for the k = 2 case (i.e., for the two-photon anharmonic oscillator model used for a Kerr-like medium) with variation in the statistics of the input light.

  18. Properties of parameter estimation techniques for a beta-binomial failure model. Final technical report

    International Nuclear Information System (INIS)

    Shultis, J.K.; Buranapan, W.; Eckhoff, N.D.

    1981-12-01

    Of considerable importance in the safety analysis of nuclear power plants are methods to estimate the probability of failure-on-demand, p, of a plant component that normally is inactive and that may fail when activated or stressed. Properties of five methods for estimating, from failure-on-demand data, the parameters of the beta prior distribution in a compound beta-binomial probability model are examined. Simulated failure data generated from a known beta-binomial marginal distribution are used to estimate values of the beta parameters by (1) matching moments of the prior distribution to those of the data, (2) the maximum likelihood method based on the prior distribution, (3) a weighted marginal matching moments method, (4) an unweighted marginal matching moments method, and (5) the maximum likelihood method based on the marginal distribution. For small sample sizes (N ≤ 10) with data typical of low-failure-probability components, it was found that the simple prior matching moments method is often superior (e.g., smallest bias and mean squared error), while for larger sample sizes the marginal maximum likelihood estimators appear to be best.
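
    As an illustration of the moment-matching idea behind methods of this kind, the beta-binomial mean and variance equations can be inverted in closed form; the helper names below are hypothetical and the sketch works from exact moments rather than data:

```python
def beta_binomial_moments(n, a, b):
    """Mean and variance of a beta-binomial(n, a, b) marginal distribution."""
    p = a / (a + b)
    mean = n * p
    var = n * p * (1 - p) * (a + b + n) / (a + b + 1)
    return mean, var

def match_moments(n, mean, var):
    """Recover (a, b) from the marginal mean and variance (method of moments)."""
    p = mean / n
    bvar = n * p * (1 - p)               # binomial variance at the same mean
    s = (bvar * n - var) / (var - bvar)  # s = a + b
    return p * s, (1 - p) * s

m, v = beta_binomial_moments(10, 2.0, 3.0)   # (4.0, 6.0)
a_hat, b_hat = match_moments(10, m, v)       # recovers (2.0, 3.0) exactly
```

    With sample moments in place of (m, v), the same inversion gives the matching-moments estimator; its sampling properties are what the report evaluates.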

  19. Multifractal structure of multiplicity distributions and negative binomials

    International Nuclear Information System (INIS)

    Malik, S.; Delhi, Univ.

    1997-01-01

    The paper presents experimental results of a multifractal structure analysis in proton-emulsion interactions at 800 GeV. The multiplicity moments have a power-law dependence on the mean multiplicity in varying bin sizes of pseudorapidity, and the values of the generalised dimensions are calculated from the slope values. The multifractal characteristics are also examined in the light of negative binomials. The observed multiplicity moments and those derived from the negative-binomial fits agree well with each other. Also, the values of D_q, both observed and derived from the negative-binomial fits, not only decrease with q, typifying multifractality, but also agree well with each other, showing consistency with the negative-binomial form.

  20. Use of the Beta-Binomial Model for Central Statistical Monitoring of Multicenter Clinical Trials

    OpenAIRE

    Desmet, Lieven; Venet, David; Doffagne, Erik; Timmermans, Catherine; Legrand, Catherine; Burzykowski, Tomasz; Buyse, Marc

    2017-01-01

    As part of central statistical monitoring of multicenter clinical trial data, we propose a procedure based on the beta-binomial distribution for the detection of centers with atypical values for the probability of some event. The procedure makes no assumptions about the typical event proportion and uses the event counts from all centers to derive a reference model. The procedure is shown through simulations to have high sensitivity and high specificity if the contamination rate is small and t...

  1. Forecasting asthma-related hospital admissions in London using negative binomial models.

    Science.gov (United States)

    Soyiri, Ireneous N; Reidpath, Daniel D; Sarran, Christophe

    2013-05-01

    Health forecasting can improve health service provision and individual patient outcomes. Environmental factors are known to impact chronic respiratory conditions such as asthma, but little is known about the extent to which these factors can be used for forecasting. Using weather, air quality and hospital asthma admission data for London (2005-2006), two related negative binomial models were developed and compared with a naive seasonal model. In the first approach, predictive forecasting models were fitted with 7-day averages of each potential predictor, and a subsequent multivariable model was constructed. In the second strategy, an exhaustive search was conducted for the best-fitting models among possible combinations of lags (0-14 days) of all the environmental effects on asthma admissions. Three models were considered: a base model (seasonal effects), contrasted with a 7-day average model and a selected-lags model (weather and air quality effects). Season is the best predictor of asthma admissions. The 7-day average and seasonal models were trivial to implement. The selected-lags model was computationally intensive, but of no real value over the much more easily implemented models. Seasonal factors can predict daily hospital asthma admissions in London, and there is little evidence that additional weather and air quality information would add to forecast accuracy.

  2. Understanding logistic regression analysis.

    Science.gov (United States)

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratios in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is the avoidance of confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After a definition of the technique, the basic interpretation of the results is highlighted, and then some special issues are discussed.
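
    As a sketch of the procedure the article explains, a one-predictor logistic regression can be fitted by gradient ascent on the log-likelihood, with the odds ratio read off as exp(coefficient). The toy data and the gradient-ascent fit (rather than the usual IRLS/Newton fit) are both illustrative assumptions:

```python
import math

def fit_logistic(xs, ys, lr=0.5, iters=5000):
    """One-predictor logistic regression via gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))   # fitted probability
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# invented toy data: the outcome is more common among the exposed (x = 1)
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)   # odds of the event, exposed vs unexposed
```

    Here the empirical odds are 1/3 in the unexposed group and 3 in the exposed group, so the fitted odds ratio converges to 9, matching the direct cross-tabulation.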

  3. Performance of models for estimating absolute risk difference in multicenter trials with binary outcome

    Directory of Open Access Journals (Sweden)

    Claudia Pedroza

    2016-08-01

    Background: Reporting of the absolute risk difference (RD) is recommended for clinical and epidemiological prospective studies. In analyses of multicenter studies, adjustment for center is necessary when randomization is stratified by center or when there is large variation in patient outcomes across centers. While regression methods are used to estimate RD adjusted for baseline predictors and clustering, no formal evaluation of their performance has previously been conducted. Methods: We performed a simulation study to evaluate 6 regression methods fitted under a generalized estimating equation framework: binomial identity, Poisson identity, Normal identity, log binomial, log Poisson, and logistic regression models. We compared the model estimates to unadjusted estimates. We varied the true response function (identity or log), number of subjects per center, true risk difference, control outcome rate, effect of the baseline predictor, and intracenter correlation. We compared the models in terms of convergence, absolute bias, and coverage of 95% confidence intervals for RD. Results: The 6 models performed very similarly to each other for the majority of scenarios. However, the log binomial model did not converge for a large portion of the scenarios including a baseline predictor. In scenarios with an outcome rate close to the parameter boundary, the binomial and Poisson identity models had the best performance, but differences from the other models were negligible. The unadjusted method introduced little bias into the RD estimates, but its coverage was larger than the nominal value in some scenarios with an identity response. Under the log response, coverage from the unadjusted method was well below the nominal value (<80% for some scenarios). Conclusions: We recommend the use of a binomial or Poisson GEE model with identity link to estimate RD for correlated binary outcome data. If these models fail to run, then either a logistic regression, log Poisson
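
    As a baseline for the unadjusted method discussed above, the plain risk difference with a Wald interval is straightforward; this is an illustrative sketch with invented counts, not the paper's GEE models:

```python
import math

def risk_difference(events_t, n_t, events_c, n_c, z=1.96):
    """Unadjusted risk difference (treatment minus control) with a Wald 95% CI."""
    p_t, p_c = events_t / n_t, events_c / n_c
    rd = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return rd, (rd - z * se, rd + z * se)

rd, ci = risk_difference(30, 100, 45, 100)   # RD = -0.15
```

    The adjusted identity-link models recommended in the conclusions target this same absolute-scale quantity while accounting for baseline predictors and clustering by center.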

  4. Abstract knowledge versus direct experience in processing of binomial expressions.

    Science.gov (United States)

    Morgan, Emily; Levy, Roger

    2016-12-01

    We ask whether word order preferences for binomial expressions of the form A and B (e.g. bread and butter) are driven by abstract linguistic knowledge of ordering constraints referencing the semantic, phonological, and lexical properties of the constituent words, or by prior direct experience with the specific items in question. Using forced-choice and self-paced reading tasks, we demonstrate that online processing of never-before-seen binomials is influenced by abstract knowledge of ordering constraints, which we estimate with a probabilistic model. In contrast, online processing of highly frequent binomials is primarily driven by direct experience, which we estimate from corpus frequency counts. We propose a trade-off wherein processing of novel expressions relies upon abstract knowledge, while reliance upon direct experience increases with increased exposure to an expression. Our findings support theories of language processing in which both compositional generation and direct, holistic reuse of multi-word expressions play crucial roles. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  5. Zero inflated negative binomial-Sushila distribution and its application

    Science.gov (United States)

    Yamrubboon, Darika; Thongteeraparp, Ampai; Bodhisuwan, Winai; Jampachaisri, Katechan

    2017-11-01

    A new zero-inflated distribution is proposed in this work, namely the zero-inflated negative binomial-Sushila distribution. The new distribution, a mixture of the Bernoulli and negative binomial-Sushila distributions, is an alternative for modeling excessive zero counts and over-dispersion. Some characteristics of the proposed distribution are derived, including the probability mass function, mean and variance. Parameter estimation for the zero-inflated negative binomial-Sushila distribution is implemented by the maximum likelihood method. In application, the proposed distribution can provide a better fit than the traditional zero-inflated Poisson and zero-inflated negative binomial distributions.
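
    The negative binomial-Sushila kernel is specific to this paper; the generic zero-inflation construction it builds on can be sketched with a Poisson kernel instead (an assumption for illustration only):

```python
import math

def zip_pmf(y, pi0, lam):
    """Zero-inflated Poisson: with probability pi0 emit a structural zero,
    otherwise draw from Poisson(lam)."""
    base = math.exp(-lam) * lam**y / math.factorial(y)
    return pi0 * (y == 0) + (1 - pi0) * base

pi0, lam = 0.3, 2.0
p_zero = zip_pmf(0, pi0, lam)    # exceeds the plain Poisson zero mass exp(-lam)
mean = (1 - pi0) * lam           # mixture mean, here 1.4
```

    Replacing the Poisson kernel by the negative binomial-Sushila distribution yields the paper's model, handling over-dispersion in the count component as well as the excess zeros.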

  6. Estimating spatial and temporal components of variation in count data using negative binomial mixed models

    Science.gov (United States)

    Irwin, Brian J.; Wagner, Tyler; Bence, James R.; Kepler, Megan V.; Liu, Weihai; Hayes, Daniel B.

    2013-01-01

    Partitioning total variability into its component temporal and spatial sources is a powerful way to better understand time series and elucidate trends. The data available for such analyses of fish and other populations are usually nonnegative integer counts of the number of organisms, often dominated by many low values with few observations of relatively high abundance. These characteristics are not well approximated by the Gaussian distribution. We present a detailed description of a negative binomial mixed-model framework that can be used to model count data and quantify temporal and spatial variability. We applied these models to data from four fishery-independent surveys of Walleyes Sander vitreus across the Great Lakes basin. Specifically, we fitted models to gill-net catches from Wisconsin waters of Lake Superior; Oneida Lake, New York; Saginaw Bay in Lake Huron, Michigan; and Ohio waters of Lake Erie. These long-term monitoring surveys varied in overall sampling intensity, the total catch of Walleyes, and the proportion of zero catches. Parameter estimation included the negative binomial scaling parameter, and we quantified the random effects as the variations among gill-net sampling sites, the variations among sampled years, and site × year interactions. This framework (i.e., the application of a mixed model appropriate for count data in a variance-partitioning context) represents a flexible approach that has implications for monitoring programs (e.g., trend detection) and for examining the potential of individual variance components to serve as response metrics to large-scale anthropogenic perturbations or ecological changes.

  7. The Binomial Coefficient for Negative Arguments

    OpenAIRE

    Kronenburg, M. J.

    2011-01-01

    The definition of the binomial coefficient in terms of gamma functions also allows non-integer arguments. For nonnegative integer arguments the gamma functions reduce to factorials, leading to the well-known Pascal triangle. Using a symmetry formula for the gamma function, this definition is extended to negative integer arguments, making the symmetry identity for binomial coefficients valid for all integer arguments. The agreement of this definition with some other identities and with the bin...
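
    For nonnegative integer k, the gamma-function definition reduces to the falling-factorial product, which stays well-defined when the upper argument is a negative integer; a sketch using exact rational arithmetic (covering only nonnegative integer k, not the paper's full symmetry extension):

```python
from fractions import Fraction

def gbinom(n, k):
    """Binomial coefficient C(n, k) for any integer n (possibly negative) and
    nonnegative integer k, via n (n-1) ... (n-k+1) / k!."""
    out = Fraction(1)
    for i in range(k):
        out *= Fraction(n - i, i + 1)
    return out

pascal = gbinom(5, 2)   # 10, the familiar Pascal-triangle value
neg = gbinom(-2, 3)     # (-2)(-3)(-4)/3! = -4
```

    Values like C(-1, k) = (-1)^k are the ones that make identities such as the binomial series for negative exponents come out consistently.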

  8. Discovering Binomial Identities with PascGaloisJE

    Science.gov (United States)

    Evans, Tyler J.

    2008-01-01

    We describe exercises in which students use PascGaloisJE to formulate conjectures about certain binomial identities which hold when the binomial coefficients are interpreted as elements in the cyclic group Z[subscript p] of integers modulo a prime integer "p". In addition to having an appealing visual component, these exercises are open-ended and…

  9. Binomial vs Poisson statistics in radiation studies

    International Nuclear Information System (INIS)

    Foster, J.; Kouris, K.; Spyrou, N.M.; Matthews, I.P.; Welsh National School of Medicine, Cardiff

    1983-01-01

    The processes of radioactive decay, decay and growth of radioactive species in a radioactive chain, prompt emission(s) from nuclear reactions, conventional activation and cyclic activation are discussed with respect to their underlying statistical density function. By considering the transformation(s) that each nucleus may undergo it is shown that all these processes are fundamentally binomial. Formally, when the number of experiments N is large and the probability of success p is close to zero, the binomial is closely approximated by the Poisson density function. In radiation and nuclear physics, N is always large: each experiment can be conceived of as the observation of the fate of each of the N nuclei initially present. Whether p, the probability that a given nucleus undergoes a prescribed transformation, is close to zero depends on the process and nuclide(s) concerned. Hence, although a binomial description is always valid, the Poisson approximation is not always adequate. Therefore further clarification is provided as to when the binomial distribution must be used in the statistical treatment of detected events. (orig.)
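
    The adequacy of the Poisson approximation can be checked numerically: with N large and p near zero the two pmfs nearly coincide, while at the same mean with moderate p they do not. A sketch with invented parameter values:

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

# large N, small p: Poisson closely approximates the binomial
err_small_p = max(abs(binom_pmf(k, 10000, 0.0005) - poisson_pmf(k, 5.0))
                  for k in range(20))
# same mean (lam = 5) but p far from zero: the approximation degrades
err_large_p = max(abs(binom_pmf(k, 10, 0.5) - poisson_pmf(k, 5.0))
                  for k in range(11))
```

    This mirrors the abstract's point: the binomial description is always valid, but the Poisson shortcut is adequate only when the per-nucleus transformation probability is small.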

  10. Tomography of binomial states of the radiation field

    NARCIS (Netherlands)

    Bazrafkan, M. R.; Man'ko

    2004-01-01

    The symplectic, optical, and photon-number tomographic symbols of binomial states of the radiation field are studied. Explicit relations for all tomograms of the binomial states are obtained. Two measures for nonclassical properties of these states are discussed.

  11. Application of binomial-edited CPMG to shale characterization.

    Science.gov (United States)

    Washburn, Kathryn E; Birdwell, Justin E

    2014-09-01

    Unconventional shale resources may contain a significant amount of hydrogen in organic solids such as kerogen, but it is not possible to directly detect these solids with many NMR systems. Binomial-edited pulse sequences capitalize on magnetization transfer between solids, semi-solids, and liquids to provide an indirect method of detecting solid organic materials in shales. When the organic solids can be directly measured, binomial-editing helps distinguish between different phases. We applied a binomial-edited CPMG pulse sequence to a range of natural and experimentally-altered shale samples. The most substantial signal loss is seen in shales rich in organic solids while fluids associated with inorganic pores seem essentially unaffected. This suggests that binomial-editing is a potential method for determining fluid locations, solid organic content, and kerogen-bitumen discrimination. Copyright © 2014 Elsevier Inc. All rights reserved.

  12. A class of orthogonal nonrecursive binomial filters.

    Science.gov (United States)

    Haddad, R. A.

    1971-01-01

    The time- and frequency-domain properties of the orthogonal binomial sequences are presented. It is shown that these sequences, or digital filters based on them, can be generated using adders and delay elements only. The frequency-domain behavior of these nonrecursive binomial filters suggests a number of applications as low-pass Gaussian filters or as inexpensive bandpass filters.
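
    The filter taps are the normalized rows of Pascal's triangle, and the magnitude response of the order-n binomial filter is cos(ω/2)^n, confirming the low-pass Gaussian-like behavior; a sketch:

```python
import math

def binomial_coeffs(n):
    """Row n of Pascal's triangle, normalized to unit DC gain (sum = 2**n)."""
    row = [math.comb(n, k) for k in range(n + 1)]
    s = float(sum(row))
    return [c / s for c in row]

def magnitude(coeffs, omega):
    """|H(e^{j omega})| of the FIR filter with the given taps."""
    re = sum(c * math.cos(omega * k) for k, c in enumerate(coeffs))
    im = -sum(c * math.sin(omega * k) for k, c in enumerate(coeffs))
    return math.hypot(re, im)

h = binomial_coeffs(4)   # [1, 4, 6, 4, 1] / 16
# |H| falls monotonically from 1 at DC to 0 at the Nyquist frequency
```

    Because the taps are integers over a power-of-two sum, the filter really does need only adders and delays (with bit shifts for the normalization), as the abstract notes.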

  13. A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data

    Science.gov (United States)

    Lea, Amanda J.

    2015-01-01

    Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage across sites and individual samples, and because of the computational challenges of controlling for genetic covariance in count data. To address these challenges, we present a binomial mixed model and an efficient, sampling-based algorithm (MACAU: Mixed model association for count data via data augmentation) for approximate parameter estimation and p-value computation. This framework allows us to simultaneously account for both the over-dispersed, count-based nature of bisulfite sequencing data, as well as genetic relatedness among individuals. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that our method provides well-calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and are enriched near genes that are differentially expressed with age in the same population. Taken together, our results indicate that MACAU is an efficient, effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. MACAU is freely available at www.xzlab.org/software.html. PMID:26599596

  14. Currency lookback options and observation frequency: A binomial approach

    NARCIS (Netherlands)

    T.H.F. Cheuk; A.C.F. Vorst (Ton)

    1997-01-01

    In the last decade, interest in exotic options has been growing, especially in the over-the-counter currency market. In this paper we consider lookback currency options, which are path-dependent. We show that a one-state-variable binomial model for currency lookback options can
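
    The Cheuk-Vorst one-state-variable tree itself is not reproduced in this record; as a brute-force illustration of the path dependence and of the observation-frequency effect in the title, a small CRR tree can be priced by enumerating all paths (all parameter values invented):

```python
import math
from itertools import product

def lookback_put_crr(s0, r, sigma, T, n_steps, obs_dates):
    """Price a floating-strike lookback put, payoff max(S at monitored dates) - S_T,
    by enumerating all paths of a CRR binomial tree (feasible for small trees)."""
    dt = T / n_steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    price = 0.0
    for path in product((u, d), repeat=n_steps):
        s, levels = s0, [s0]
        for step in path:
            s *= step
            levels.append(s)
        m = max(levels[i] for i in obs_dates)   # maximum over monitored dates only
        prob = math.prod(q if x == u else 1 - q for x in path)
        price += prob * max(m - s, 0.0)
    return math.exp(-r * T) * price

dense = lookback_put_crr(100, 0.05, 0.2, 0.5, 10, obs_dates=range(11))
sparse = lookback_put_crr(100, 0.05, 0.2, 0.5, 10, obs_dates=[0, 5, 10])
```

    Monitoring fewer dates can only lower the running maximum on every path, so the sparse-observation price is never above the densely monitored one; the one-state-variable tree achieves the same prices without the exponential path enumeration.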

  15. A New Extension of the Binomial Error Model for Responses to Items of Varying Difficulty in Educational Testing and Attitude Surveys.

    Directory of Open Access Journals (Sweden)

    James A Wiley

    We put forward a new item response model which is an extension of the binomial error model first introduced by Keats and Lord. Like the binomial error model, the basic latent variable can be interpreted as a probability of responding in a certain way to an arbitrarily specified item. For a set of dichotomous items, this model gives predictions that are similar to other single-parameter IRT models (such as the Rasch model) but has certain advantages in more complex cases. The first is that in specifying a flexible two-parameter Beta distribution for the latent variable, it is easy to formulate models for randomized experiments in which there is no reason to believe that either the latent variable or its distribution vary over randomly composed experimental groups. Second, the elementary response function is such that extensions to more complex cases (e.g., polychotomous responses, unfolding scales) are straightforward. Third, the probability metric of the latent trait allows tractable extensions to cover a wide variety of stochastic response processes.

  16. Statistical inference for a class of multivariate negative binomial distributions

    DEFF Research Database (Denmark)

    Rubak, Ege Holger; Møller, Jesper; McCullagh, Peter

    This paper considers statistical inference procedures for a class of models for positively correlated count variables called α-permanental random fields, which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been...

  17. Antibiotic Resistances in Livestock: A Comparative Approach to Identify an Appropriate Regression Model for Count Data

    Directory of Open Access Journals (Sweden)

    Anke Hüls

    2017-05-01

    Antimicrobial resistance in livestock is a matter of general concern. To develop hygiene measures and methods for resistance prevention and control, epidemiological studies on a population level are needed to detect factors associated with antimicrobial resistance in livestock holdings. In general, regression models are used to describe these relationships between environmental factors and resistance outcome. Besides the study design, the correlation structures of the different outcomes of antibiotic resistance and structural zero measurements on the resistance outcome as well as on the exposure side are challenges for the epidemiological model-building process. The use of appropriate regression models that acknowledge these complexities is essential to assure valid epidemiological interpretations. The aims of this paper are (i) to explain the model-building process by comparing several competing models for count data (negative binomial model, quasi-Poisson model, zero-inflated model, and hurdle model) and (ii) to compare these models using data from a cross-sectional study on antibiotic resistance in animal husbandry. These goals are essential to evaluate which model is most suitable to identify potential prevention measures. The dataset used as an example in our analyses was generated initially to study the prevalence and associated factors for the appearance of cefotaxime-resistant Escherichia coli in 48 German fattening pig farms. For each farm, the outcome was the count of samples with resistant bacteria. There was almost no overdispersion and only moderate evidence of excess zeros in the data. Our analyses show that it is essential to evaluate regression models in studies analyzing the relationship between environmental factors and antibiotic resistances in livestock. After model comparison based on evaluation of model predictions, Akaike information criterion, and Pearson residuals, here the hurdle model was judged to be the most appropriate
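
    The model-comparison step described above can be sketched numerically. Below is a minimal, self-contained illustration (not the paper's analysis; the data, sample size, and parameters are invented) of fitting a Poisson and a negative binomial model to over-dispersed counts and comparing them by AIC:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
# Simulated over-dispersed counts: NB with mean 4 and size r = 2
y = rng.negative_binomial(n=2, p=2 / (2 + 4), size=500)

# Poisson MLE: lambda-hat is simply the sample mean (1 parameter)
lam = y.mean()
ll_pois = stats.poisson.logpmf(y, lam).sum()

# Negative binomial MLE over (r, m) by numerical optimisation (2 parameters)
def nb_negll(params):
    r, m = params
    return -stats.nbinom.logpmf(y, r, r / (r + m)).sum()

res = optimize.minimize(nb_negll, x0=[1.0, y.mean()],
                        bounds=[(1e-6, None), (1e-6, None)])
ll_nb = -res.fun

aic_pois = 2 * 1 - 2 * ll_pois
aic_nb = 2 * 2 - 2 * ll_nb
print(f"AIC Poisson: {aic_pois:.1f}, AIC neg. binomial: {aic_nb:.1f}")
```

    With genuinely over-dispersed data the negative binomial's extra dispersion parameter pays for itself and its AIC is lower; with near-equidispersed data (as the pig-farm study reports) the gap shrinks and simpler or zero-focused models become competitive.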

  18. Applied Prevalence Ratio estimation with different Regression models: An example from a cross-national study on substance use research.

    Science.gov (United States)

    Espelt, Albert; Marí-Dell'Olmo, Marc; Penelo, Eva; Bosque-Prous, Marina

    2016-06-14

    To examine the differences between the Prevalence Ratio (PR) and the Odds Ratio (OR) in a cross-sectional study and to provide tools to calculate the PR using two statistical packages widely used in substance use research (STATA and R). We used cross-sectional data from 41,263 participants of 16 European countries participating in the Survey on Health, Ageing and Retirement in Europe (SHARE). The dependent variable, hazardous drinking, was calculated using the Alcohol Use Disorders Identification Test - Consumption (AUDIT-C). The main independent variable was gender. Other variables used were: age, educational level and country of residence. The PR of hazardous drinking in men relative to women was estimated using the Mantel-Haenszel method, log-binomial regression models and Poisson regression models with robust variance. These estimates were compared to the OR calculated using logistic regression models. The prevalence of hazardous drinkers varied among countries. Generally, men had a higher prevalence of hazardous drinking than women [PR=1.43 (1.38-1.47)]. The estimated PR was identical regardless of the method and the statistical package used. However, the OR overestimated the PR, depending on the prevalence of hazardous drinking in the country. In cross-sectional studies, where comparisons between countries with differences in the prevalence of the disease or condition are made, it is advisable to use the PR instead of the OR.
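
    The abstract's central point — that the OR approximates the PR only when the outcome is rare — is plain arithmetic, sketched here with invented prevalences (not the SHARE estimates):

```python
def pr_and_or(p_exposed, p_unexposed):
    """Prevalence ratio and odds ratio for two group prevalences."""
    pr = p_exposed / p_unexposed
    odds = lambda p: p / (1 - p)
    return pr, odds(p_exposed) / odds(p_unexposed)

# Rare outcome: OR and PR nearly coincide
pr_rare, or_rare = pr_and_or(0.02, 0.01)
# Common outcome (e.g. hazardous drinking): OR drifts away from PR
pr_common, or_common = pr_and_or(0.40, 0.28)
print(pr_rare, or_rare)      # both close to 2
print(pr_common, or_common)  # OR noticeably larger than PR
```

    Since the OR/PR gap grows with prevalence, country comparisons based on the OR are distorted exactly when prevalences differ across countries, which is the paper's argument for reporting the PR.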

  19. Validity of the negative binomial distribution in particle production

    International Nuclear Information System (INIS)

    Cugnon, J.; Harouna, O.

    1987-01-01

    Some aspects of the clan picture for particle production in nuclear and in high-energy processes are examined. In particular, it is shown that the requirement of having logarithmic distribution for the number of particles within a clan in order to generate a negative binomial should not be taken strictly. Large departures are allowed without distorting too much the negative binomial. The question of the undetected particles is also studied. It is shown that, under reasonable circumstances, the latter do not affect the negative binomial character of the multiplicity distribution

  20. The Compound Binomial Risk Model with Randomly Charging Premiums and Paying Dividends to Shareholders

    Directory of Open Access Journals (Sweden)

    Xiong Wang

    2013-01-01

    Based on characteristics of the nonlife joint-stock insurance company, this paper presents a compound binomial risk model that randomizes the premium income on unit time and sets a threshold for paying dividends to shareholders. In this model, the insurance company obtains the insurance policy in unit time with probability and pays dividends to shareholders with probability when the surplus is no less than . We then derive recursive formulas for the expected discounted penalty function and an asymptotic estimate for it, as well as recursive formulas and asymptotic estimates for the ruin probability and the distribution function of the deficit at ruin. Numerical examples illustrate the accuracy of the asymptotic estimations.
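
    The recursions themselves are not reproduced in this record, but the flavour of the setting can be conveyed with a Monte-Carlo sketch of the *standard* compound binomial risk model (unit premium per period, a claim with probability q, geometric claim sizes) — a simplification of the randomized-premium, dividend-paying model above; all parameter values are invented:

```python
import numpy as np

def ruin_probability(u, q=0.3, claim_p=0.5, horizon=80, n_paths=3000, seed=1):
    """Fraction of simulated surplus paths that drop below zero."""
    rng = np.random.default_rng(seed)
    ruined = 0
    for _ in range(n_paths):
        surplus = u
        for _t in range(horizon):
            surplus += 1                           # unit premium each period
            if rng.random() < q:                   # a claim occurs this period
                surplus -= rng.geometric(claim_p)  # integer claim size >= 1
            if surplus < 0:
                ruined += 1
                break
    return ruined / n_paths

p0, p5 = ruin_probability(0), ruin_probability(5)
print(p0, p5)   # more initial capital -> lower ruin probability
```

    Here the expected claim outgo per period (0.3 × 2 = 0.6) is below the unit premium, so the surplus has positive drift and the finite-horizon ruin probability decreases in the initial capital u, which is the qualitative behaviour the paper's recursions quantify exactly.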

  1. Using the β-binomial distribution to characterize forest health

    Science.gov (United States)

    S.J. Zarnoch; R.L. Anderson; R.M. Sheffield

    1995-01-01

    The β-binomial distribution is suggested as a model for describing and analyzing the dichotomous data obtained from programs monitoring the health of forests in the United States. Maximum likelihood estimation of the parameters is given as well as asymptotic likelihood ratio tests. The procedure is illustrated with data on dogwood anthracnose infection (caused...
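
    The maximum-likelihood step mentioned above can be sketched directly with the β-binomial log-likelihood. The plot counts below are fabricated for illustration (not the dogwood anthracnose data):

```python
import numpy as np
from scipy import optimize, special

def betabinom_negll(params, k, n):
    """Negative log-likelihood of the beta-binomial: C(n,k) B(k+a, n-k+b) / B(a,b)."""
    a, b = params
    ll = (special.gammaln(n + 1) - special.gammaln(k + 1)
          - special.gammaln(n - k + 1)
          + special.betaln(k + a, n - k + b) - special.betaln(a, b))
    return -ll.sum()

k = np.array([0, 1, 1, 2, 4, 5, 7, 9, 3, 0])   # e.g. diseased stems per plot
n = np.full_like(k, 10)                        # stems examined per plot

res = optimize.minimize(betabinom_negll, x0=[1.0, 1.0], args=(k, n),
                        bounds=[(1e-6, None), (1e-6, None)])
a_hat, b_hat = res.x
p_mean = a_hat / (a_hat + b_hat)
print(f"alpha = {a_hat:.2f}, beta = {b_hat:.2f}, mean infection rate = {p_mean:.2f}")
```

    The fitted mean a/(a+b) tracks the overall infection proportion, while the magnitude of a+b captures the extra-binomial variation between plots that makes the β-binomial preferable to a plain binomial for forest-health data.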

  2. A methodology to design heuristics for model selection based on the characteristics of data: Application to investigate when the Negative Binomial Lindley (NB-L) is preferred over the Negative Binomial (NB).

    Science.gov (United States)

    Shirazi, Mohammadali; Dhavala, Soma Sekhar; Lord, Dominique; Geedipally, Srinivas Reddy

    2017-10-01

    Safety analysts usually use post-modeling methods, such as Goodness-of-Fit statistics or the Likelihood Ratio Test, to decide between two or more competing distributions or models. Such metrics require all competing distributions to be fitted to the data before any comparisons can be accomplished. Given the continuous growth in the introduction of new statistical distributions, choosing the best one using such post-modeling methods is not a trivial task, in addition to all the theoretical or numerical issues the analyst may face during the analysis. Furthermore, and most importantly, these measures or tests do not provide any intuition into why a specific distribution (or model) is preferred over another (Goodness-of-Logic). This paper addresses these issues by proposing a methodology to design heuristics for model selection based on the characteristics of data, in terms of descriptive summary statistics, before fitting the models. The proposed methodology employs two analytic tools: (1) Monte-Carlo simulations and (2) machine learning classifiers, to design easy heuristics to predict the label of the 'most-likely-true' distribution for analyzing data. The proposed methodology was applied to investigate when the recently introduced Negative Binomial Lindley (NB-L) distribution is preferred over the Negative Binomial (NB) distribution. Heuristics were designed to select the 'most-likely-true' distribution between these two distributions, given a set of prescribed summary statistics of data. The proposed heuristics were successfully compared against classical tests for several real or observed datasets. Not only are they easy to use without requiring any post-modeling inputs, but the analyst can also attain useful information about why the NB-L is preferred over the NB (or vice versa) when modeling data. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. Generalization of Binomial Coefficients to Numbers on the Nodes of Graphs

    NARCIS (Netherlands)

    Khmelnitskaya, A.; van der Laan, G.; Talman, Dolf

    2016-01-01

    The triangular array of binomial coefficients, or Pascal's triangle, is formed by starting with an apex of 1. Every row of Pascal's triangle can be seen as a line-graph, to each node of which the corresponding binomial coefficient is assigned. We show that the binomial coefficient of a node is equal

  5. Comparison of multinomial and binomial proportion methods for analysis of multinomial count data.

    Science.gov (United States)

    Galyean, M L; Wester, D B

    2010-10-01

    Simulation methods were used to generate 1,000 experiments, each with 3 treatments and 10 experimental units/treatment, in completely randomized (CRD) and randomized complete block designs. Data were counts in 3 ordered or 4 nominal categories from multinomial distributions. For the 3-category analyses, category probabilities were 0.6, 0.3, and 0.1, respectively, for 2 of the treatments, and 0.5, 0.35, and 0.15 for the third treatment. In the 4-category analysis (CRD only), probabilities were 0.3, 0.3, 0.2, and 0.2 for treatments 1 and 2 vs. 0.4, 0.4, 0.1, and 0.1 for treatment 3. The 3-category data were analyzed with generalized linear mixed models as an ordered multinomial distribution with a cumulative logit link or by regrouping the data (e.g., counts in 1 category/sum of counts in all categories), followed by analysis of single categories as binomial proportions. Similarly, the 4-category data were analyzed as a nominal multinomial distribution with a glogit link or by grouping data as binomial proportions. For the 3-category CRD analyses, empirically determined type I error rates based on pairwise comparisons (F- and Wald χ² tests) did not differ between multinomial and individual binomial category analyses with 10 (P = 0.38 to 0.60) or 50 (P = 0.19 to 0.67) sampling units/experimental unit. When analyzed as binomial proportions, power estimates varied among categories, with analysis of the category with the greatest counts yielding power similar to the multinomial analysis. Agreement between methods (percentage of experiments with the same results for the overall test for treatment effects) varied considerably among categories analyzed and sampling unit scenarios for the 3-category CRD analyses. Power (F-test) was 24.3, 49.1, 66.9, 83.5, 86.8, and 99.7% for 10, 20, 30, 40, 50, and 100 sampling units/experimental unit for the 3-category multinomial CRD analyses. Results with randomized complete block design simulations were similar to those with the CRD

  6. Assessing outcomes of large-scale public health interventions in the absence of baseline data using a mixture of Cox and binomial regressions

    Science.gov (United States)

    2014-01-01

    Background Large-scale public health interventions with rapid scale-up are increasingly being implemented worldwide. Such implementation allows for a large target population to be reached in a short period of time. But when the time comes to investigate the effectiveness of these interventions, the rapid scale-up creates several methodological challenges, such as the lack of baseline data and the absence of control groups. One example of such an intervention is Avahan, the India HIV/AIDS initiative of the Bill & Melinda Gates Foundation. One question of interest is the effect of Avahan on condom use by female sex workers with their clients. By retrospectively reconstructing condom use and sex work history from survey data, it is possible to estimate how condom use rates evolve over time. However formal inference about how this rate changes at a given point in calendar time remains challenging. Methods We propose a new statistical procedure based on a mixture of binomial regression and Cox regression. We compare this new method to an existing approach based on generalized estimating equations through simulations and application to Indian data. Results Both methods are unbiased, but the proposed method is more powerful than the existing method, especially when initial condom use is high. When applied to the Indian data, the new method mostly agrees with the existing method, but seems to have corrected some implausible results of the latter in a few districts. We also show how the new method can be used to analyze the data of all districts combined. Conclusions The use of both methods can be recommended for exploratory data analysis. However for formal statistical inference, the new method has better power. PMID:24397563

  7. Smoothness in Binomial Edge Ideals

    Directory of Open Access Journals (Sweden)

    Hamid Damadi

    2016-06-01

    In this paper we study some geometric properties of the algebraic set associated to the binomial edge ideal of a graph, in particular its singularity and smoothness. Some of these algebraic sets are irreducible and some are reducible. If every irreducible component of the algebraic set is smooth, we call the graph edge smooth; otherwise it is called edge singular. We show that complete graphs are edge smooth and introduce two conditions such that a graph G is edge singular if and only if it satisfies these conditions. It is then shown that cycles and most trees are edge singular. In addition, it is proved that complete bipartite graphs are edge smooth.

  8. Linking the Negative Binomial and Logarithmic Series Distributions via their Associated Series

    OpenAIRE

    SADINLE, MAURICIO

    2008-01-01

    The negative binomial distribution is associated with the series obtained by taking derivatives of the logarithmic series. Conversely, the logarithmic series distribution is associated with the series found by integrating the series associated with the negative binomial distribution. The number-of-failures parameter of the negative binomial distribution is the number of derivatives needed to obtain the negative binomial series from the logarithmic series. The reasoning in this article could ...
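
    The derivative link described above is the classical identity below (a sketch in standard notation, not necessarily the article's own):

```latex
% Logarithmic series underlying the logarithmic series distribution:
-\ln(1-p) \;=\; \sum_{k=1}^{\infty} \frac{p^k}{k}, \qquad 0<p<1 .

% Differentiating r times with respect to p:
\frac{d^{r}}{dp^{r}}\bigl[-\ln(1-p)\bigr]
  \;=\; \frac{(r-1)!}{(1-p)^{r}}
  \;=\; \sum_{k=0}^{\infty} \frac{(k+r-1)!}{k!}\, p^{k},

% so, after normalization, the terms give the negative binomial pmf
% with r failures:
P(X=k) \;=\; \binom{k+r-1}{k}\,(1-p)^{r}\, p^{k}, \qquad k=0,1,2,\dots
```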

  9. Measured PET Data Characterization with the Negative Binomial Distribution Model.

    Science.gov (United States)

    Santarelli, Maria Filomena; Positano, Vincenzo; Landini, Luigi

    2017-01-01

    An accurate statistical model of PET measurements is a prerequisite for correct image reconstruction when using statistical image reconstruction algorithms, or when pre-filtering operations must be performed. Although radioactive decay follows a Poisson distribution, deviations from Poisson statistics occur in projection data prior to reconstruction due to physical effects, measurement errors, and correction of scatter and random coincidences. Modelling projection data can aid in understanding the statistical nature of the data, in order to develop efficient processing methods and to reduce noise. This paper outlines the statistical behaviour of measured emission data by evaluating the goodness of fit of the negative binomial (NB) distribution model to PET data for a wide range of emission activity values. An NB distribution model is characterized by the mean of the data and the dispersion parameter α, which describes the deviation from Poisson statistics. Monte Carlo simulations were performed to evaluate (a) the performance of the dispersion parameter α estimator and (b) the goodness of fit of the NB model for a wide range of activity values. We focused on the effect produced by correction for random and scatter events in the projection (sinogram) domain, given their importance in quantitative analysis of PET data. The analysis developed herein allowed us to assess the accuracy of the NB distribution model in fitting corrected sinogram data, and to evaluate the sensitivity of the dispersion parameter α for quantifying deviation from Poisson statistics. Sinogram ROI-based analysis demonstrated that deviation of the measured data from Poisson statistics can be quantitatively characterized by the dispersion parameter α under any noise conditions and corrections.
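
    In the NB2 parameterization the abstract alludes to, the variance is mean + α·mean². A quick method-of-moments sketch (simulated counts with invented values, not PET data) shows how α quantifies the departure from Poisson (α = 0):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, alpha_true = 50.0, 0.04
r = 1.0 / alpha_true                       # NB "size" parameter is 1/alpha
counts = rng.negative_binomial(r, r / (r + mu), size=100_000)

m, v = counts.mean(), counts.var()
alpha_hat = (v - m) / m**2                 # method of moments; 0 => pure Poisson
print(f"mean={m:.1f}  var={v:.1f}  alpha_hat={alpha_hat:.3f}")
```

    For the parameters above the theoretical variance is 50 + 0.04·50² = 150, triple the Poisson value, and the moment estimate recovers α ≈ 0.04. (The paper's own α estimator is likelihood-based; this is only the simplest consistent alternative.)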

  10. Predictors of the number of under-five malnourished children in Bangladesh: application of the generalized poisson regression model.

    Science.gov (United States)

    Islam, Mohammad Mafijul; Alam, Morshed; Tariquzaman, Md; Kabir, Mohammad Alamgir; Pervin, Rokhsona; Begum, Munni; Khan, Md Mobarak Hossain

    2013-01-08

    Malnutrition is one of the principal causes of child mortality in developing countries, including Bangladesh. To our knowledge, most available studies that addressed malnutrition among under-five children considered categorical (dichotomous/polychotomous) outcome variables and applied logistic regression (binary/multinomial) to find their predictors. In this study the malnutrition outcome is defined as the number of under-five malnourished children in a family, a non-negative count variable. The purposes of the study are (i) to demonstrate the applicability of the generalized Poisson regression (GPR) model as an alternative to other statistical methods and (ii) to find predictors of this outcome variable. The data are extracted from the Bangladesh Demographic and Health Survey (BDHS) 2007. Briefly, this survey employs a nationally representative sample based on a two-stage stratified sample of households. A total of 4,460 under-five children were analysed using the Chi-square test and the GPR model. The GPR model (as compared to standard Poisson regression and negative binomial regression) was found to be justified for this outcome variable because of its under-dispersion (variance less than the mean). Significant predictors included mother's education, father's education, wealth index, sanitation status, source of drinking water, and the total number of children ever born to a woman. The consistency of our findings with many other studies suggests that the GPR model is an ideal alternative to other statistical models for analysing the number of under-five malnourished children in a family. Strategies based on significant predictors may improve the nutritional status of children in Bangladesh.
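
    The under-dispersion that motivates the GPR model can be seen directly from the (Consul) generalized Poisson pmf, where a negative second parameter pulls the variance below the mean. A small sketch with invented parameter values (the formula is the standard Consul form, not taken from this paper):

```python
import math

def gen_poisson_pmf(x, theta, lam):
    """Consul's generalized Poisson: theta*(theta+lam*x)**(x-1)*exp(-theta-lam*x)/x!"""
    if theta + lam * x <= 0:       # support is truncated when lam < 0
        return 0.0
    return (theta * (theta + lam * x) ** (x - 1)
            * math.exp(-theta - lam * x) / math.factorial(x))

theta, lam = 2.0, -0.2             # lam < 0  ->  under-dispersion
xs = range(50)
probs = [gen_poisson_pmf(x, theta, lam) for x in xs]
mean = sum(x * p for x, p in zip(xs, probs))
var = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))
print(f"mean={mean:.3f}  var={var:.3f}")   # variance below the mean
```

    With λ < 0 the theoretical mean is θ/(1−λ) and the variance θ/(1−λ)³, so variance < mean — exactly the data feature that rules out the standard Poisson (equidispersed) and negative binomial (overdispersed) models here.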

  11. The Binomial Distribution in Shooting

    Science.gov (United States)

    Chalikias, Miltiadis S.

    2009-01-01

    The binomial distribution is used to predict the winner of the 49th International Shooting Sport Federation World Championship in double trap shooting held in 2006 in Zagreb, Croatia. The outcome of the competition was definitely unexpected.
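
    The kind of calculation the article describes reduces to cumulative binomial probabilities. A toy version with invented hit probabilities (not the 2006 championship data):

```python
from scipy.stats import binom

n = 150                                  # e.g. a 150-target double-trap event
p_favourite, p_rival = 0.92, 0.90        # hypothetical hit probabilities

p_fav = binom.sf(139, n, p_favourite)    # P(at least 140 hits)
p_riv = binom.sf(139, n, p_rival)
print(p_fav, p_riv)
```

    A 2-point difference in hit probability translates into a markedly different chance of clearing a high score threshold, which is why the binomial favourite can still lose on the day — the "definitely unexpected" outcome the abstract alludes to.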

  12. Application of a random effects negative binomial model to examine tram-involved crash frequency on route sections in Melbourne, Australia.

    Science.gov (United States)

    Naznin, Farhana; Currie, Graham; Logan, David; Sarvi, Majid

    2016-07-01

    Safety is a key concern in the design, operation and development of light rail systems, including trams or streetcars, as they impose crash risks on road users in terms of crash frequency and severity. The aim of this study is to identify key traffic, transit and route factors that influence tram-involved crash frequencies along tram route sections in Melbourne. A random effects negative binomial (RENB) regression model was developed to analyze crash frequency data obtained from Yarra Trams, the tram operator in Melbourne. The RENB modelling approach can account for spatial and temporal variations within observation groups in panel count data structures by assuming that group-specific effects are randomly distributed across locations. The results identify many significant factors affecting tram-involved crash frequency, including tram service frequency (2.71), tram stop spacing (-0.42), tram route section length (0.31), tram signal priority (-0.25), general traffic volume (0.18), tram lane priority (-0.15) and ratio of platform tram stops (-0.09). Findings provide useful insights on route-section-level tram-involved crashes in an urban tram or streetcar operating environment. The method described represents a useful planning tool for transit agencies hoping to improve safety performance. Copyright © 2016 Elsevier Ltd. All rights reserved.

  13. Comparison and Field Validation of Binomial Sampling Plans for Oligonychus perseae (Acari: Tetranychidae) on Hass Avocado in Southern California.

    Science.gov (United States)

    Lara, Jesus R; Hoddle, Mark S

    2015-08-01

    Oligonychus perseae Tuttle, Baker, & Abatiello is a foliar pest of 'Hass' avocados [Persea americana Miller (Lauraceae)]. The recommended action threshold is 50-100 motile mites per leaf, but this count range and other ecological factors associated with O. perseae infestations limit the application of enumerative sampling plans in the field. Consequently, a comprehensive modeling approach was implemented to compare the practical application of various binomial sampling models for decision-making of O. perseae in California. An initial set of sequential binomial sampling models were developed using three mean-proportion modeling techniques (i.e., Taylor's power law, maximum likelihood, and an empirical model) in combination with two-leaf infestation tally thresholds of either one or two mites. Model performance was evaluated using a robust mite count database consisting of >20,000 Hass avocado leaves infested with varying densities of O. perseae and collected from multiple locations. Operating characteristic and average sample number results for sequential binomial models were used as the basis to develop and validate a standardized fixed-size binomial sampling model with guidelines on sample tree and leaf selection within blocks of avocado trees. This final validated model requires a leaf sampling cost of 30 leaves and takes into account the spatial dynamics of O. perseae to make reliable mite density classifications for a 50-mite action threshold. Recommendations for implementing this fixed-size binomial sampling plan to assess densities of O. perseae in commercial California avocado orchards are discussed. © The Authors 2015. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  14. CUMBIN - CUMULATIVE BINOMIAL PROGRAMS

    Science.gov (United States)

    Bowerman, P. N.

    1994-01-01

    The cumulative binomial program, CUMBIN, is one of a set of three programs which calculate cumulative binomial probability distributions for arbitrary inputs. The three programs, CUMBIN, NEWTONP (NPO-17556), and CROSSER (NPO-17557), can be used independently of one another. CUMBIN can be used by statisticians and users of statistical procedures, test planners, designers, and numerical analysts. The program has been used for reliability/availability calculations. CUMBIN calculates the probability that a system of n components has at least k operating if the probability that any one is operating is p and the components are independent. Equivalently, this is the reliability of a k-out-of-n system having independent components with common reliability p. CUMBIN can evaluate the incomplete beta distribution for two positive integer arguments. CUMBIN can also evaluate the cumulative F distribution and the negative binomial distribution, and can determine the sample size in a test design. CUMBIN is designed to work well with all integer values 0 < k <= n. To run the program, the user simply runs the executable version and inputs the information requested by the program. The program is not designed to weed out incorrect inputs, so the user must take care to make sure the inputs are correct. Once all input has been entered, the program calculates and lists the result. The CUMBIN program is written in C. It was developed on an IBM AT with a numeric co-processor using Microsoft C 5.0. Because the source code is written using standard C structures and functions, it should compile correctly with most C compilers. The program format is interactive. It has been implemented under DOS 3.2 and has a memory requirement of 26K. CUMBIN was developed in 1988.
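
    The core quantity CUMBIN computes can be sketched in a few lines of modern scipy (this is an independent illustration, not the original C program):

```python
from scipy.stats import binom

def k_out_of_n_reliability(k, n, p):
    """P(at least k of n independent components operate), each with reliability p."""
    return binom.sf(k - 1, n, p)   # P(X >= k) for X ~ Binomial(n, p)

# e.g. a 2-out-of-3 system with component reliability 0.9:
# P = 3 * 0.9^2 * 0.1 + 0.9^3 = 0.972
print(k_out_of_n_reliability(2, 3, 0.9))
```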

  15. A comparison of various modelling approaches applied to Cholera ...

    African Journals Online (AJOL)

    The analyses are demonstrated on data collected from Beira, Mozambique. Dynamic regression was found to be the preferred forecasting method for this data set. Keywords: Cholera, modelling, signal processing, dynamic regression, negative binomial regression, wavelet analysis, cross-wavelet analysis. ORiON Vol.

  16. Regression modeling of ground-water flow

    Science.gov (United States)

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  17. Spiritual and ceremonial plants in North America: an assessment of Moerman's ethnobotanical database comparing Residual, Binomial, Bayesian and Imprecise Dirichlet Model (IDM) analysis.

    Science.gov (United States)

    Turi, Christina E; Murch, Susan J

    2013-07-09

    Ethnobotanical research and the study of plants used for rituals, ceremonies and to connect with the spirit world have led to the discovery of many novel psychoactive compounds such as nicotine, caffeine, and cocaine. In North America, spiritual and ceremonial uses of plants are well documented and can be accessed online via the University of Michigan's Native American Ethnobotany Database. The objective of the study was to compare Residual, Bayesian, Binomial and Imprecise Dirichlet Model (IDM) analyses of ritual, ceremonial and spiritual plants in Moerman's ethnobotanical database and to identify genera that may be good candidates for the discovery of novel psychoactive compounds. The database was queried with the following format "Family Name AND Ceremonial OR Spiritual" for 263 North American botanical families. Spiritual and ceremonial flora consisted of 86 families with 517 species belonging to 292 genera. Spiritual taxa were then grouped further into ceremonial medicines and items categories. Residual, Bayesian, Binomial and IDM analyses were performed to identify over- and under-utilized families. The 4 statistical approaches were in good agreement when identifying under-utilized families, but large families (>393 species) were underemphasized by the Binomial, Bayesian and IDM approaches for over-utilization. Residual, Binomial, and IDM analysis identified similar families as over-utilized in the medium (92-392 species) and small (<92 species) classes. The families Apiaceae, Asteraceae, Ericaceae, Pinaceae and Salicaceae were identified as significantly over-utilized as ceremonial medicines in medium and large sized families. Analysis of genera within the Apiaceae and Asteraceae suggests that the genera Ligusticum and Artemisia are good candidates for facilitating the discovery of novel psychoactive compounds. The 4 statistical approaches were not consistent in the selection of over-utilization of flora. Residual analysis revealed overall trends that were supported

  18. Hits per trial: Basic analysis of binomial data

    International Nuclear Information System (INIS)

    Atwood, C.L.

    1994-09-01

    This report presents simple statistical methods for analyzing binomial data, such as the number of failures in some number of demands. It gives point estimates, confidence intervals, and Bayesian intervals for the failure probability. It shows how to compare subsets of the data, both graphically and by statistical tests, and how to look for trends in time. It presents a compound model when the failure probability varies randomly. Examples and SAS programs are given
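
    The report's basic quantities — a point estimate and a confidence interval for a failure probability from "k failures in n demands" — can be sketched with the exact (Clopper-Pearson) interval; the numbers below are illustrative, not from the report:

```python
from scipy.stats import beta

def failure_prob_ci(k, n, conf=0.95):
    """Point estimate and exact (Clopper-Pearson) CI for k failures in n demands."""
    a = (1 - conf) / 2
    lo = beta.ppf(a, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - a, k + 1, n - k) if k < n else 1.0
    return k / n, (lo, hi)

est, (lo, hi) = failure_prob_ci(3, 100)
print(f"p-hat = {est:.3f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

    The same beta quantiles, with the shape parameters shifted by a prior, give the Bayesian intervals the report also discusses (e.g. a Jeffreys prior uses Beta(k + 0.5, n − k + 0.5)).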

  20. Regression Models for Market-Shares

    DEFF Research Database (Denmark)

    Birch, Kristina; Olsen, Jørgen Kai; Tjur, Tue

    2005-01-01

    On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put on the interpretation of the parameters in relation to models for the total sales based on discrete choice models. Key words and phrases: MCI model, discrete choice model, market-shares, price elasticity, regression model.

  1. Wigner Function of Density Operator for Negative Binomial Distribution

    International Nuclear Information System (INIS)

    Xu Xinglei; Li Hongqi

    2008-01-01

    By using the technique of integration within an ordered product (IWOP) of operators, we derive the Wigner function of the density operator for the negative binomial distribution of the radiation field in the mixed-state case; we then derive the Wigner function of the squeezed number state, which yields the negative binomial distribution, by virtue of the entangled state representation and the entangled Wigner operator.

  2. Use of the negative binomial-truncated Poisson distribution in thunderstorm prediction

    Science.gov (United States)

    Cohen, A. C.

    1971-01-01

    A probability model is presented for the distribution of thunderstorms over a small area given that thunderstorm events (1 or more thunderstorms) are occurring over a larger area. The model incorporates the negative binomial and truncated Poisson distributions. Probability tables for Cape Kennedy for spring, summer, and fall months and seasons are presented. The computer program used to compute these probabilities is appended.
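
The two building blocks of the model can be sketched as follows. This is a generic illustration of the negative binomial and zero-truncated Poisson pmfs, not the paper's combined model or its Cape Kennedy estimates; the parameter values are hypothetical:

```python
from scipy.stats import nbinom, poisson

# Negative binomial pmf: P(K = k) with r successes, success probability q
def nb_pmf(k, r, q):
    return nbinom.pmf(k, r, q)

# Zero-truncated Poisson pmf: Poisson conditioned on at least one event
def trunc_poisson_pmf(k, mu):
    if k < 1:
        return 0.0
    return poisson.pmf(k, mu) / (1.0 - poisson.pmf(0, mu))

# The truncated pmf sums to 1 over k >= 1
probs = [trunc_poisson_pmf(k, 1.8) for k in range(1, 50)]
print(sum(probs))
```
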

  3. Interrelationships Between Receiver/Relative Operating Characteristics Display, Binomial, Logit, and Bayes' Rule Probability of Detection Methodologies

    Science.gov (United States)

    Generazio, Edward R.

    2014-01-01

    Unknown risks are introduced into failure critical systems when probability of detection (POD) capabilities are accepted without a complete understanding of the statistical method applied and the interpretation of the statistical results. The presence of this risk in the nondestructive evaluation (NDE) community is revealed in common statements about POD. These statements are often interpreted in a variety of ways and therefore, the very existence of the statements identifies the need for a more comprehensive understanding of POD methodologies. Statistical methodologies have data requirements to be met, procedures to be followed, and requirements for validation or demonstration of adequacy of the POD estimates. Risks are further enhanced due to the wide range of statistical methodologies used for determining the POD capability. Receiver/Relative Operating Characteristics (ROC) Display, simple binomial, logistic regression, and Bayes' rule POD methodologies are widely used in determining POD capability. This work focuses on Hit-Miss data to reveal the framework of the interrelationships between Receiver/Relative Operating Characteristics Display, simple binomial, logistic regression, and Bayes' Rule methodologies as they are applied to POD. Knowledge of these interrelationships leads to an intuitive and global understanding of the statistical data, procedural and validation requirements for establishing credible POD estimates.

  4. e+e- hadronic multiplicity distributions: negative binomial or Poisson

    International Nuclear Information System (INIS)

    Carruthers, P.; Shih, C.C.

    1986-01-01

    On the basis of fits to the multiplicity distributions for variable rapidity windows and the forward-backward correlation for the two-jet subset of e+e- data, it is impossible to distinguish between a global negative binomial and its generalization, the partially coherent distribution. It is suggested that intensity interferometry, especially the Bose-Einstein correlation, gives information which will discriminate among dynamical models. 16 refs

  5. Newton Binomial Formulas in Schubert Calculus

    OpenAIRE

    Cordovez, Jorge; Gatto, Letterio; Santiago, Taise

    2008-01-01

    We prove Newton's binomial formulas for Schubert Calculus to determine numbers of base point free linear series on the projective line with prescribed ramification divisor supported at given distinct points.

  6. A Seemingly Unrelated Poisson Regression Model

    OpenAIRE

    King, Gary

    1989-01-01

    This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.

  7. Genome-enabled predictions for binomial traits in sugar beet populations.

    Science.gov (United States)

    Biscarini, Filippo; Stevanato, Piergiorgio; Broccanello, Chiara; Stella, Alessandra; Saccomani, Massimo

    2014-07-22

    Genomic information can be used to predict not only continuous but also categorical (e.g. binomial) traits. Several traits of interest in human medicine and agriculture present a discrete distribution of phenotypes (e.g. disease status). Root vigor in sugar beet (B. vulgaris) is an example of a binomial trait of agronomic importance. In this paper, a panel of 192 SNPs (single nucleotide polymorphisms) was used to genotype 124 individual sugar beet plants from 18 lines, and to classify them as showing "high" or "low" root vigor. A threshold model was used to fit the relationship between binomial root vigor and SNP genotypes, through the matrix of genomic relationships between individuals in a genomic BLUP (G-BLUP) approach. From a 5-fold cross-validation scheme, 500 testing subsets were generated. The estimated average cross-validation error rate was 0.000731 (0.073%). Only 9 out of 12326 test observations (500 replicates for an average test set size of 24.65) were misclassified. The estimated prediction accuracy was quite high. Such accurate predictions may be related to the high estimated heritability for root vigor (0.783) and to the few genes with large effect underlying the trait. Despite the sparse SNP panel, there was sufficient within-scaffold LD where SNPs with large effect on root vigor were located to allow for genome-enabled predictions to work.

  8. Logistic quantile regression provides improved estimates for bounded avian counts: a case study of California Spotted Owl fledgling production

    Science.gov (United States)

    Brian S. Cade; Barry R. Noon; Rick D. Scherer; John J. Keane

    2017-01-01

    Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical...

  9. On some binomial [Formula: see text]-difference sequence spaces.

    Science.gov (United States)

    Meng, Jian; Song, Meimei

    2017-01-01

    In this paper, we introduce the binomial sequence spaces [Formula: see text], [Formula: see text] and [Formula: see text] by combining the binomial transformation and the difference operator. We prove the BK-property and some inclusion relations. Furthermore, we obtain Schauder bases and compute the α-, β- and γ-duals of these sequence spaces. Finally, we characterize matrix transformations on the sequence space [Formula: see text].

  10. A spatially-explicit count data regression for modeling the density of forest cockchafer (Melolontha hippocastani) larvae in the Hessian Ried (Germany)

    Directory of Open Access Journals (Sweden)

    Matthias Schmidt

    2014-10-01

    Full Text Available Background In this paper, a regression model for predicting the spatial distribution of forest cockchafer larvae in the Hessian Ried region (Germany) is presented. The forest cockchafer, a native biotic pest, is a major cause of damage in forests in this region, particularly during the regeneration phase. The model developed in this study is based on a systematic sample inventory of forest cockchafer larvae by excavation across the Hessian Ried. These forest cockchafer larvae data were characterized by excess zeros and overdispersion. Methods Using specific generalized additive regression models, different discrete distributions, including the Poisson, negative binomial and zero-inflated Poisson distributions, were compared. The methodology employed allowed the simultaneous estimation of non-linear model effects of causal covariates and, to account for spatial autocorrelation, of a 2-dimensional spatial trend function. In the validation of the models, both the Akaike information criterion (AIC) and more detailed graphical procedures based on randomized quantile residuals were used. Results The negative binomial distribution was superior to the Poisson and the zero-inflated Poisson distributions, providing a near perfect fit to the data, which was proven in an extensive validation process. The causal predictors found to affect the density of larvae significantly were distance to water table and percentage of pure clay layer in the soil to a depth of 1 m. Model predictions showed that larva density increased with an increase in distance to the water table up to almost 4 m, after which it remained constant, and with a reduction in the percentage of pure clay layer. However this latter correlation was weak and requires further investigation. The 2-dimensional trend function indicated a strong spatial effect, and thus explained by far the highest proportion of variation in larva density. Conclusions As such the model can be used to support forest

  11. On the revival of the negative binomial distribution in multiparticle production

    International Nuclear Information System (INIS)

    Ekspong, G.

    1990-01-01

    This paper is based on published and some unpublished material pertaining to the revival of interest in and success of applying the negative binomial distribution to multiparticle production since 1983. After a historically oriented introduction going farther back in time, the main part of the paper is devoted to an unpublished derivation of the negative binomial distribution based on empirical observations of forward-backward multiplicity correlations. Some physical processes leading to the negative binomial distribution are mentioned and some comments made on published criticisms

  12. Panel Smooth Transition Regression Models

    DEFF Research Database (Denmark)

    González, Andrés; Terasvirta, Timo; Dijk, Dick van

    We introduce the panel smooth transition regression model. This new model is intended for characterizing heterogeneous panels, allowing the regression coefficients to vary both across individuals and over time. Specifically, heterogeneity is allowed for by assuming that these coefficients are bou...

  13. A fast algorithm for computing binomial coefficients modulo powers of two.

    Science.gov (United States)

    Andreica, Mugurel Ionut

    2013-01-01

    I present a new algorithm for computing binomial coefficients modulo 2^N. The proposed method has an O(N^3·Multiplication(N)+N^4) preprocessing time, after which a binomial coefficient C(P, Q) with 0≤Q≤P≤2^N-1 can be computed modulo 2^N in O(N^2·log(N)·Multiplication(N)) time. Multiplication(N) denotes the time complexity of multiplying two N-bit numbers, which can range from O(N^2) to O(N·log(N)·log(log(N))) or better. Thus, the overall time complexity for evaluating M binomial coefficients C(P, Q) modulo 2^N with 0≤Q≤P≤2^N-1 is O((N^3+M·N^2·log(N))·Multiplication(N)+N^4). After preprocessing, we can actually compute binomial coefficients modulo any 2^R with R≤N. For larger values of P and Q, variations of Lucas' theorem must be used first in order to reduce the computation to the evaluation of multiple (O(log(P))) binomial coefficients C(P', Q') (or restricted types of factorials P'!) modulo 2^N with 0≤Q'≤P'≤2^N-1.
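
For contrast, a naive baseline (not the paper's algorithm) shows why the problem is non-trivial: since 2^N is not prime, the usual factorial-division formula fails, but Pascal's triangle needs only additions modulo 2^N:

```python
# Naive baseline: build row P of Pascal's triangle modulo 2^N.
# Addition-only, so it sidesteps division by non-invertible factors of 2;
# it runs in O(P^2) additions, which the paper's method improves upon.
def binom_mod_pow2(P, Q, N):
    mod = 1 << N
    row = [1]  # row 0
    for _ in range(P):
        row = [1] + [(row[i] + row[i + 1]) % mod for i in range(len(row) - 1)] + [1]
    return row[Q]

print(binom_mod_pow2(10, 3, 8))  # C(10,3) = 120, and 120 mod 256 = 120
```
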

  14. Self-similarity of the negative binomial multiplicity distributions

    International Nuclear Information System (INIS)

    Calucci, G.; Treleani, D.

    1998-01-01

    The negative binomial distribution is self-similar: If the spectrum over the whole rapidity range gives rise to a negative binomial, in the absence of correlation and if the source is unique, also a partial range in rapidity gives rise to the same distribution. The property is not seen in experimental data, which are rather consistent with the presence of a number of independent sources. When multiplicities are very large, self-similarity might be used to isolate individual sources in a complex production process. copyright 1997 The American Physical Society

  15. Hadronic multiplicity distributions: the negative binomial and its alternatives

    International Nuclear Information System (INIS)

    Carruthers, P.

    1986-01-01

    We review properties of the negative binomial distribution, along with its many possible statistical or dynamical origins. Considering the relation of the multiplicity distribution to the density matrix for boson systems, we re-introduce the partially coherent laser distribution, which allows for coherent as well as incoherent hadronic emission from the k fundamental cells, and provides equally good phenomenological fits to existing data. The broadening of non-single diffractive hadron-hadron distributions can be equally well due to the decrease of coherence with increasing energy as to the large (and rapidly decreasing) values of k deduced from negative binomial fits. Similarly the narrowness of the e+e- multiplicity distribution is due to nearly coherent (therefore nearly Poissonian) emission from a small number of jets, in contrast to the negative binomial with enormous values of k. 31 refs

  16. Hadronic multiplicity distributions: the negative binomial and its alternatives

    International Nuclear Information System (INIS)

    Carruthers, P.

    1986-01-01

    We review properties of the negative binomial distribution, along with its many possible statistical or dynamical origins. Considering the relation of the multiplicity distribution to the density matrix for boson systems, we re-introduce the partially coherent laser distribution, which allows for coherent as well as incoherent hadronic emission from the k fundamental cells, and provides equally good phenomenological fits to existing data. The broadening of non-single diffractive hadron-hadron distributions can be equally well due to the decrease of coherence with increasing energy as to the large (and rapidly decreasing) values of k deduced from negative binomial fits. Similarly the narrowness of the e+e- multiplicity distribution is due to nearly coherent (therefore nearly Poissonian) emission from a small number of jets, in contrast to the negative binomial with enormous values of k. 31 refs

  17. Interpretation of commonly used statistical regression models.

    Science.gov (United States)

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
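
As a minimal illustration of coefficient interpretation in one of the model types reviewed: in logistic regression, exponentiating a coefficient gives an odds ratio. The coefficient value below is hypothetical, not from the cited study:

```python
from math import exp

# Hypothetical fitted logistic-regression coefficient for a binary exposure
beta_exposure = 0.405

# exp(beta) is the multiplicative change in the odds of the outcome
# per one-unit increase in the predictor
odds_ratio = exp(beta_exposure)
print(round(odds_ratio, 2))  # ≈ 1.5, i.e. odds roughly 50% higher when exposed
```
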

  18. Entanglement of Generalized Two-Mode Binomial States and Teleportation

    International Nuclear Information System (INIS)

    Wang Dongmei; Yu Youhong

    2009-01-01

    The entanglement of the generalized two-mode binomial states in the phase damping channel is studied by making use of the relative entropy of entanglement. It is shown that the factors q and p play crucial roles in controlling the relative entropy of entanglement. Furthermore, we propose a scheme for teleporting an unknown state via the generalized two-mode binomial states, and calculate the mean fidelity of the scheme. (general)

  19. Negative binomial properties and clan structure in multiplicity distributions

    International Nuclear Information System (INIS)

    Giovannini, A.; Van Hove, L.

    1988-01-01

    We review the negative binomial properties measured recently for many multiplicity distributions of high-energy hadronic and semi-leptonic reactions in selected rapidity intervals. We analyse them in terms of the "clan" structure which can be defined for any negative binomial distribution. By comparing reactions we exhibit a number of regularities for the average number N̄ of clans and the average charged multiplicity n̄_c per clan. 22 refs., 6 figs. (author)

  20. Optimized Binomial Quantum States of Complex Oscillators with Real Spectrum

    International Nuclear Information System (INIS)

    Zelaya, K D; Rosas-Ortiz, O

    2016-01-01

    Classical and nonclassical states of quantum complex oscillators with real spectrum are presented. Such states are bi-orthonormal superpositions of n +1 energy eigenvectors of the system with binomial-like coefficients. For large values of n these optimized binomial states behave as photon added coherent states when the imaginary part of the potential is cancelled. (paper)

  1. The multi-class binomial failure rate model for the treatment of common-cause failures

    International Nuclear Information System (INIS)

    Hauptmanns, U.

    1995-01-01

    The impact of common cause failures (CCF) on PSA results for NPPs is in sharp contrast with the limited quality which can be achieved in their assessment. This is due to the dearth of observations and cannot be remedied in the short run. Therefore the methods employed for calculating failure rates should be devised such as to make the best use of the few available observations on CCF. The Multi-Class Binomial Failure Rate (MCBFR) Model achieves this by assigning observed failures to different classes according to their technical characteristics and applying the BFR formalism to each of these. The results are hence determined by a superposition of BFR type expressions for each class, each of them with its own coupling factor. The model thus obtained flexibly reproduces the dependence of CCF rates on failure multiplicity insinuated by the observed failure multiplicities. This is demonstrated by evaluating CCFs observed for combined impulse pilot valves in German NPPs. (orig.) [de
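
The superposition the model describes can be sketched as follows. This is a hedged illustration of BFR-type rates summed over failure classes, with hypothetical shock rates and coupling factors; it is not the evaluation from the paper:

```python
from math import comb

def bfr_rates(mu, p, m):
    """Rate of common-cause events failing exactly k of m components
    under the Binomial Failure Rate model: mu * C(m,k) * p^k * (1-p)^(m-k)."""
    return {k: mu * comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(1, m + 1)}

def mcbfr_rates(classes, m):
    """Multi-Class BFR: superpose per-class BFR terms, each class with its
    own shock rate mu and coupling factor p (hypothetical (mu, p) pairs)."""
    total = {k: 0.0 for k in range(1, m + 1)}
    for mu, p in classes:
        for k, rate in bfr_rates(mu, p, m).items():
            total[k] += rate
    return total

rates = mcbfr_rates([(1e-3, 0.2), (5e-4, 0.8)], m=4)
print(rates)
```

Summing a single class over k = 1..m recovers mu * (1 - (1-p)^m), the rate of shocks that fail at least one component.
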

  2. Road Fatality Model Based on Over-Dispersion Data Along Federal Route F0050

    Directory of Open Access Journals (Sweden)

    Musa Wan Zahidah

    2017-01-01

    Full Text Available The World Health Rankings 2011 placed Malaysia 20th in its list of countries with the most deaths caused by road accidents. Road accidents have also been identified as a prime cause of death in Malaysia, after heart disease, stroke, influenza and pneumonia. To date, research from the Malaysian Institute of Road Safety Research (MIROS) has reported that on average 18 people are killed on Malaysian roads daily. Many kinds of models have been developed for modelling the circumstances of accidents. The most widely applied are the Poisson and Negative Binomial regression models, while the Zero-inflated Poisson and Zero-inflated Negative Binomial are modifications of them. This study focuses on road F0050, as statistics from the Royal Malaysian Police 2014 list F0050 as one of the high-accident roads in Malaysia, from kilometre 0 to kilometre 58. R was chosen to analyse the relationship between road fatality and its factors (annual average daily traffic (AADT), speed, shoulder width, lane width). The Negative Binomial and Zero-inflated Negative Binomial (ZINB) were shown to be the preferred modelling methods for this study. Significant positive relationships were also identified between road fatality and annual average daily traffic (AADT) and lane width. These relationships can support decision making in accident management for road F0050.

  3. Estimation of component failure probability from masked binomial system testing data

    International Nuclear Information System (INIS)

    Tan Zhibin

    2005-01-01

    The component failure probability estimates from analysis of binomial system testing data are very useful because they reflect the operational failure probability of components in the field which is similar to the test environment. In practice, this type of analysis is often confounded by the problem of data masking: the status of tested components is unknown. Methods in considering this type of uncertainty are usually computationally intensive and not practical to solve the problem for complex systems. In this paper, we consider masked binomial system testing data and develop a probabilistic model to efficiently estimate component failure probabilities. In the model, all system tests are classified into test categories based on component coverage. Component coverage of test categories is modeled by a bipartite graph. Test category failure probabilities conditional on the status of covered components are defined. An EM algorithm to estimate component failure probabilities is developed based on a simple but powerful concept: equivalent failures and tests. By simulation we not only demonstrate the convergence and accuracy of the algorithm but also show that the probabilistic model is capable of analyzing systems in series, parallel and any other user defined structures. A case study illustrates an application in test case prioritization

  4. Development of planning level transportation safety tools using Geographically Weighted Poisson Regression.

    Science.gov (United States)

    Hadayeghi, Alireza; Shalaby, Amer S; Persaud, Bhagwant N

    2010-03-01

    A common technique used for the calibration of collision prediction models is the Generalized Linear Modeling (GLM) procedure with the assumption of Negative Binomial or Poisson error distribution. In this technique, fixed coefficients that represent the average relationship between the dependent variable and each explanatory variable are estimated. However, the stationary relationship assumed may hide some important spatial factors of the number of collisions at a particular traffic analysis zone. Consequently, the accuracy of such models for explaining the relationship between the dependent variable and the explanatory variables may be suspected since collision frequency is likely influenced by many spatially defined factors such as land use, demographic characteristics, and traffic volume patterns. The primary objective of this study is to investigate the spatial variations in the relationship between the number of zonal collisions and potential transportation planning predictors, using the Geographically Weighted Poisson Regression modeling technique. The secondary objective is to build on knowledge comparing the accuracy of Geographically Weighted Poisson Regression models to that of Generalized Linear Models. The results show that the Geographically Weighted Poisson Regression models are useful for capturing spatially dependent relationships and generally perform better than the conventional Generalized Linear Models. Copyright 2009 Elsevier Ltd. All rights reserved.

  5. Integer Solutions of Binomial Coefficients

    Science.gov (United States)

    Gilbertson, Nicholas J.

    2016-01-01

    A good formula is like a good story, rich in description, powerful in communication, and eye-opening to readers. The formula presented in this article for determining the coefficients of the binomial expansion of (x + y)n is one such "good read." The beauty of this formula is in its simplicity--both describing a quantitative situation…
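
The coefficients in question are simply row n of Pascal's triangle. A quick sketch using the standard binomial-coefficient function (not the article's own formula), with a numeric check of the expansion:

```python
from math import comb

def expansion_coeffs(n):
    """Coefficients of (x + y)^n, i.e. row n of Pascal's triangle."""
    return [comb(n, k) for k in range(n + 1)]

# Verify by evaluating the expansion at x = 2, y = 3: (2 + 3)^5 = 3125
n, x, y = 5, 2.0, 3.0
value = sum(c * x ** (n - k) * y**k for k, c in enumerate(expansion_coeffs(n)))
print(expansion_coeffs(n), value)
```
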

  6. Predicting company growth using logistic regression and neural networks

    Directory of Open Access Journals (Sweden)

    Marijana Zekić-Sušac

    2016-12-01

    Full Text Available The paper aims to establish an efficient model for predicting company growth by leveraging the strengths of logistic regression and neural networks. A real dataset of Croatian companies was used which described the relevant industry sector, financial ratios, income, and assets in the input space, with a dependent binomial variable indicating whether a company was high-growth, defined as annualized growth in assets of more than 20% a year over a three-year period. Due to the large number of input variables, factor analysis was performed in the pre-processing stage in order to extract the most important input components. Building an efficient model with a high classification rate and explanatory ability required application of two data mining methods: logistic regression as a parametric and neural networks as a non-parametric method. The methods were tested on the models with and without variable reduction. The classification accuracy of the models was compared using statistical tests and ROC curves. The results showed that neural networks produce a significantly higher classification accuracy in the model when incorporating all available variables. The paper further discusses the advantages and disadvantages of both approaches, i.e. logistic regression and neural networks, in modelling company growth. The suggested model is potentially of benefit to investors and economic policy makers as it provides support for recognizing companies with growth potential, especially during times of economic downturn.

  7. Poisson Mixture Regression Models for Heart Disease Prediction.

    Science.gov (United States)

    Mufudza, Chipo; Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise within each cluster. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression models.

  8. Poisson Mixture Regression Models for Heart Disease Prediction

    Science.gov (United States)

    Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise within each cluster. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression models. PMID:27999611

  9. Water-selective excitation of short T2 species with binomial pulses.

    Science.gov (United States)

    Deligianni, Xeni; Bär, Peter; Scheffler, Klaus; Trattnig, Siegfried; Bieri, Oliver

    2014-09-01

    For imaging of fibrous musculoskeletal components, ultra-short echo time methods are often combined with fat suppression. Due to the increased chemical shift, spectral excitation of water might become a favorable option at ultra-high fields. Thus, this study aims to compare and explore short binomial excitation schemes for spectrally selective imaging of fibrous tissue components with short transverse relaxation time (T2). Water-selective 1-1 binomial excitation is compared with nonselective imaging using a sub-millisecond spoiled gradient echo technique for in vivo imaging of fibrous tissue at 3T and 7T. Simulations indicate a maximum signal loss from binomial excitation of approximately 30% in the limit of very short T2 (0.1 ms), as compared to nonselective imaging; the loss decreases rapidly with increasing field strength and increasing T2, e.g., to 19% at 3T and 10% at 7T for a T2 of 1 ms. In agreement with simulations, a binomial phase close to 90° yielded minimum signal loss: approximately 6% at 3T and close to 0% at 7T for menisci, and 9% and 13%, respectively, for ligaments. Overall, for imaging of short-lived T2 components, short 1-1 binomial excitation schemes prove to offer marginal signal loss, especially at ultra-high fields, with overall improved scanning efficiency. Copyright © 2013 Wiley Periodicals, Inc.

  10. Binomial model for measuring expected credit losses from trade receivables in non-financial sector entities

    Directory of Open Access Journals (Sweden)

    Branka Remenarić

    2018-01-01

    Full Text Available In July 2014, the International Accounting Standards Board (IASB) published International Financial Reporting Standard 9 Financial Instruments (IFRS 9). This standard introduces an expected credit loss (ECL) impairment model that applies to financial instruments, including trade and lease receivables. IFRS 9 applies to annual periods beginning on or after 1 January 2018 in the European Union member states. While the main reason for amending the current model was to require major banks to recognize losses in advance of a credit event occurring, this new model also applies to all receivables, including trade receivables, lease receivables and related party loan receivables in non-financial sector entities. The new impairment model is intended to result in earlier recognition of credit losses. The previous model, described in International Accounting Standard 39 Financial Instruments (IAS 39), was based on incurred losses. One of the major questions now is what models to use to predict expected credit losses in non-financial sector entities. The purpose of this paper is to research the application of the current impairment model, the extent to which the current impairment model can be modified to satisfy the new impairment model requirements, and the applicability of the binomial model for measuring expected credit losses from accounts receivable.

  11. Mixture of Regression Models with Single-Index

    OpenAIRE

    Xiang, Sijia; Yao, Weixin

    2016-01-01

    In this article, we propose a class of semiparametric mixture regression models with single-index. We argue that many recently proposed semiparametric/nonparametric mixture regression models can be considered special cases of the proposed model. However, unlike existing semiparametric mixture regression models, the new proposed model can easily incorporate multivariate predictors into the nonparametric components. Backfitting estimates and the corresponding algorithms have been proposed for...

  12. Assessing the Option to Abandon an Investment Project by the Binomial Options Pricing Model

    Directory of Open Access Journals (Sweden)

    Salvador Cruz Rambaud

    2016-01-01

    Full Text Available Usually, traditional methods for investment project appraisal, such as the net present value (hereinafter NPV), do not incorporate in their values the operational flexibility offered by a real option included in the project. In this paper, real options, and more specifically the option to abandon, are analysed as a complement to the cash flow sequence which quantifies the project. In this way, by exploiting the analogy with financial options, a mathematical expression is derived using the binomial options pricing model. This methodology provides the value of the option to abandon the project within one, two, and in general n periods. Therefore, this paper aims to be a useful tool in determining the value of the option to abandon according to its residual value, thus making the control of the uncertainty element within the project easier.
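
A minimal sketch of the backward induction involved, under hypothetical lattice parameters (up/down factors, per-period rate, salvage value): at each node the holder keeps the larger of the continuation value and the salvage (abandonment) value.

```python
def abandonment_value(V0, salvage, u, d, r, n):
    """Value of a project with an option to abandon for a fixed salvage
    value, on a CRR-style binomial lattice over n periods.
    V0: initial project value; u, d: up/down factors; r: per-period rate."""
    q = (1 + r - d) / (u - d)  # risk-neutral probability of an up move
    # terminal nodes: project value floored at the salvage value
    vals = [max(V0 * u**j * d ** (n - j), salvage) for j in range(n + 1)]
    # roll back: at each node, max(abandon now, discounted continuation)
    for step in range(n, 0, -1):
        vals = [
            max(salvage, (q * vals[j + 1] + (1 - q) * vals[j]) / (1 + r))
            for j in range(step)
        ]
    return vals[0]

with_option = abandonment_value(100.0, 90.0, 1.25, 0.8, 0.05, 3)
print(with_option)
```

With salvage set to zero the lattice value collapses back to V0, so the premium over V0 is the value of the option to abandon.
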

  13. Determining order-up-to levels under periodic review for compound binomial (intermittent) demand

    NARCIS (Netherlands)

    Teunter, R. H.; Syntetos, A. A.; Babai, M. Z.

    2010-01-01

    We propose a new method for determining order-up-to levels for intermittent demand items in a periodic review system. Contrary to existing methods, we exploit the intermittent character of demand by modelling lead time demand as a compound binomial process. In an extensive numerical study using...

  14. An Additive-Multiplicative Cox-Aalen Regression Model

    DEFF Research Database (Denmark)

    Scheike, Thomas H.; Zhang, Mei-Jie

    2002-01-01

    Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects

  15. Marginal likelihood estimation of negative binomial parameters with applications to RNA-seq data.

    Science.gov (United States)

    León-Novelo, Luis; Fuentes, Claudio; Emerson, Sarah

    2017-10-01

    RNA-Seq data characteristically exhibits large variances, which need to be appropriately accounted for in any proposed model. We first explore the effects of this variability on the maximum likelihood estimator (MLE) of the dispersion parameter of the negative binomial distribution, and propose instead to use an estimator obtained via maximization of the marginal likelihood in a conjugate Bayesian framework. We show, via simulation studies, that the marginal MLE can better control this variation and produce a more stable and reliable estimator. We then formulate a conjugate Bayesian hierarchical model, and use this new estimator to propose a Bayesian hypothesis test to detect differentially expressed genes in RNA-Seq data. We use numerical studies to show that our much simpler approach is competitive with other negative binomial based procedures, and we use a real data set to illustrate the implementation and flexibility of the procedure. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
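
    As a concrete (and deliberately naive) illustration of the estimation problem this abstract describes, the sketch below fits the dispersion parameter r of a negative binomial distribution by a crude grid search over the profile likelihood. It is a stand-in for the plain MLE whose instability the authors study, not their marginal-likelihood estimator; all names and grid values are ours.

```python
from math import lgamma, log

def nb_loglik(counts, mu, r):
    """Negative binomial log-likelihood with mean mu and dispersion r
    (variance mu + mu**2 / r), the parameterization usual in RNA-seq work."""
    return sum(lgamma(k + r) - lgamma(r) - lgamma(k + 1)
               + r * log(r / (r + mu)) + k * log(mu / (r + mu))
               for k in counts)

def grid_mle_dispersion(counts):
    """Crude profile-likelihood grid search for r, with mu fixed at the
    sample mean -- illustrative only."""
    mu = sum(counts) / len(counts)
    grid = [0.1 * i for i in range(1, 500)]      # r in 0.1 .. 49.9
    return max(grid, key=lambda r: nb_loglik(counts, mu, r))
```

    Counts with little spread push the estimate of r toward the top of the grid (the Poisson limit), while strongly overdispersed counts pull it toward zero, which is the sensitivity the paper sets out to tame.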

  16. [From clinical judgment to linear regression model].

    Science.gov (United States)

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

    When we think about mathematical models, such as the linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated another way, regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as its first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and is equivalent to the value of "Y" when "X" equals 0, and "b" (also called the slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the importance of the independent variables in the outcome.
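
    The quantities named in this abstract — the slope "b", the intercept "a", and R² — can be computed directly from the least squares formulas. A minimal sketch (variable names and data are ours):

```python
def linear_regression(xs, ys):
    """Ordinary least squares for Y = a + b*x, returning (a, b, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x                  # intercept: value of Y at X = 0
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

a, b, r2 = linear_regression([1, 2, 3, 4], [3, 5, 7, 9])  # data on y = 1 + 2x
```
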

  17. Regression models of reactor diagnostic signals

    International Nuclear Information System (INIS)

    Vavrin, J.

    1989-01-01

    The application of an autoregression model, as the simplest regression model of diagnostic signals, is described for the experimental analysis of diagnostic systems and for in-service monitoring and diagnostics of normal and anomalous conditions. The method of diagnostics using a regression-type diagnostic data base and regression spectral diagnostics is described. The diagnostics of neutron noise signals from anomalous modes in the experimental fuel assembly of a reactor is presented. (author)

  18. Binomial distribution for the charge asymmetry parameter

    International Nuclear Information System (INIS)

    Chou, T.T.; Yang, C.N.

    1984-01-01

    It is suggested that for high energy collisions the distribution with respect to the charge asymmetry z = nsub(F) - nsub(B) is binomial, where nsub(F) and nsub(B) are the forward and backward charge multiplicities. (orig.)

  19. Application of Binomial Model and Market Asset Declaimer Methodology for Valuation of Abandon and Expand Options. The Case Study

    Directory of Open Access Journals (Sweden)

    Paweł Mielcarz

    2007-06-01

    Full Text Available The article presents a case study of the valuation of real options included in an investment project. The main goal of the article is to present the calculation and methodological issues of applying the methodology for real option valuation. To this end, the binomial model and the Market Asset Declaimer methodology are used. The project presented in the article concerns the introduction of a radio station to a new market. It includes two valuable real options: to abandon the project and to expand.

  20. CROSSER - CUMULATIVE BINOMIAL PROGRAMS

    Science.gov (United States)

    Bowerman, P. N.

    1994-01-01

    The cumulative binomial program, CROSSER, is one of a set of three programs which calculate cumulative binomial probability distributions for arbitrary inputs. The three programs, CROSSER, CUMBIN (NPO-17555), and NEWTONP (NPO-17556), can be used independently of one another. CROSSER can be used by statisticians and users of statistical procedures, test planners, designers, and numerical analysts. The program has been used for reliability/availability calculations. CROSSER calculates the point at which the reliability of a k-out-of-n system equals the common reliability of the n components. It is designed to work well with all integer values 0 < k <= n. To run the program, the user simply runs the executable version and inputs the information requested by the program. The program is not designed to weed out incorrect inputs, so the user must take care to make sure the inputs are correct. Once all input has been entered, the program calculates and lists the result. It also lists the number of iterations of Newton's method required to calculate the answer within the given error. The CROSSER program is written in C. It was developed on an IBM AT with a numeric co-processor using Microsoft C 5.0. Because the source code is written using standard C structures and functions, it should compile correctly with most C compilers. The program format is interactive. It has been implemented under DOS 3.2 and has a memory requirement of 26K. CROSSER was developed in 1988.
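
    The quantity CROSSER computes — the component reliability p at which a k-out-of-n system's reliability equals p — can be reproduced in a few lines. The sketch below solves the same crossing equation but uses bisection rather than the program's Newton iteration, for robustness; it is an illustration, not the original C code.

```python
from math import comb

def kofn_reliability(p, k, n):
    """Reliability of a k-out-of-n system whose n components each work
    independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def crossing_point(k, n, tol=1e-12):
    """The p in (0, 1) at which system reliability equals the common
    component reliability p.  Assumes 2 <= k <= n - 1 so an interior
    crossing exists; f(p) = R(p) - p is negative near 0 and positive
    near 1 on that range."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kofn_reliability(mid, k, n) < mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

    For majority systems such as 2-out-of-3 or 3-out-of-5 the crossing point is exactly 0.5 by symmetry, which makes a convenient sanity check.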

  1. Forecasting with Dynamic Regression Models

    CERN Document Server

    Pankratz, Alan

    2012-01-01

    One of the most widely used tools in statistical forecasting, single equation regression models is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the auto correlation patterns of regression disturbance. It also includes six case studies.

  2. Stochastic analysis of complex reaction networks using binomial moment equations.

    Science.gov (United States)

    Barzel, Baruch; Biham, Ofer

    2012-09-01

    The stochastic analysis of complex reaction networks is a difficult problem because the number of microscopic states in such systems increases exponentially with the number of reactive species. Direct integration of the master equation is thus infeasible and is most often replaced by Monte Carlo simulations. While Monte Carlo simulations are a highly effective tool, equation-based formulations are more amenable to analytical treatment and may provide deeper insight into the dynamics of the network. Here, we present a highly efficient equation-based method for the analysis of stochastic reaction networks. The method is based on the recently introduced binomial moment equations [Barzel and Biham, Phys. Rev. Lett. 106, 150602 (2011)]. The binomial moments are linear combinations of the ordinary moments of the probability distribution function of the population sizes of the interacting species. They capture the essential combinatorics of the reaction processes reflecting their stoichiometric structure. This leads to a simple and transparent form of the equations, and allows a highly efficient and surprisingly simple truncation scheme. Unlike ordinary moment equations, in which the inclusion of high order moments is prohibitively complicated, the binomial moment equations can be easily constructed up to any desired order. The result is a set of equations that enables the stochastic analysis of complex reaction networks under a broad range of conditions. The number of equations is dramatically reduced from the exponential proliferation of the master equation to a polynomial (and often quadratic) dependence on the number of reactive species in the binomial moment equations. The aim of this paper is twofold: to present a complete derivation of the binomial moment equations; to demonstrate the applicability of the moment equations for a representative set of example networks, in which stochastic effects play an important role.

  3. Poisson and negative binomial item count techniques for surveys with sensitive question.

    Science.gov (United States)

    Tian, Guo-Liang; Tang, Man-Lai; Wu, Qin; Liu, Yin

    2017-04-01

    Although the item count technique is useful in surveys with sensitive questions, the privacy of those respondents who possess the sensitive characteristic of interest may not be well protected due to a defect in its original design. In this article, we propose two new survey designs (namely the Poisson item count technique and the negative binomial item count technique) which replace the several independent Bernoulli random variables required by the original item count technique with a single Poisson or negative binomial random variable, respectively. The proposed models not only provide a closed-form variance estimate and a confidence interval within [0, 1] for the sensitive proportion, but also simplify the survey design of the original item count technique. Most importantly, the new designs do not leak respondents' privacy. Empirical results show that the proposed techniques perform satisfactorily in the sense that they yield accurate parameter estimates and confidence intervals.
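
    The core identification idea — that the sensitive proportion is recovered from the difference between the treatment-group and control-group means — can be checked with a small simulation. This is our own illustrative sketch under assumed parameter values, not the authors' estimator with its closed-form variance.

```python
import math
import random

def simulate_poisson_ict(n, lam, pi, seed=1):
    """Monte-Carlo sketch of the Poisson item count technique.

    Control respondents report a Poisson(lam) innocuous count; treatment
    respondents add 1 if they carry the sensitive trait (probability pi).
    The sensitive proportion is estimated by the difference in group means.
    All parameter values here are illustrative assumptions."""
    rng = random.Random(seed)

    def poisson(l):
        # Knuth's multiplication method (fine for small lam)
        target, k, prod = math.exp(-l), 0, 1.0
        while True:
            prod *= rng.random()
            if prod <= target:
                return k
            k += 1

    control = [poisson(lam) for _ in range(n)]
    treatment = [poisson(lam) + (1 if rng.random() < pi else 0)
                 for _ in range(n)]
    return sum(treatment) / n - sum(control) / n

est = simulate_poisson_ict(n=20000, lam=2.0, pi=0.3)
```
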

  4. Categorical regression dose-response modeling

    Science.gov (United States)

    The goal of this training is to provide participants with training on the use of the U.S. EPA’s Categorical Regression soft¬ware (CatReg) and its application to risk assessment. Categorical regression fits mathematical models to toxicity data that have been assigned ord...

  5. Biased binomial assessment of cross-validated estimation of classification accuracies illustrated in diagnosis predictions.

    Science.gov (United States)

    Noirhomme, Quentin; Lesenfants, Damien; Gomez, Francisco; Soddu, Andrea; Schrouff, Jessica; Garraux, Gaëtan; Luxen, André; Phillips, Christophe; Laureys, Steven

    2014-01-01

    Multivariate classification is used in neuroimaging studies to infer brain activation or in medical applications to infer a diagnosis. The results are often assessed through either a binomial or a permutation test. Here, we simulated classification results of generated random data to assess the influence of the cross-validation scheme on the significance of results. Distributions built from classification of random data with cross-validation did not follow the binomial distribution. The binomial test is therefore not adapted. On the contrary, the permutation test was unaffected by the cross-validation scheme. The influence of cross-validation was further illustrated on real data from a brain-computer interface experiment in patients with disorders of consciousness and from an fMRI study on patients with Parkinson's disease. Three out of 16 patients with disorders of consciousness had significant accuracy on binomial testing, but only one showed significant accuracy using permutation testing. In the fMRI experiment, the mental imagery of gait could discriminate significantly between idiopathic Parkinson's disease patients and healthy subjects according to the permutation test but not according to the binomial test. Hence, binomial testing could lead to biased estimation of significance and false positive or negative results. In our view, permutation testing is thus recommended for clinical applications of classification with cross-validation.
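
    A minimal version of the permutation test recommended here can be sketched as follows. Labels are shuffled against fixed predictions for simplicity; a full analysis of a cross-validated classifier would ideally re-run the entire cross-validation for every permutation, which is the point the authors make. Function and variable names are ours.

```python
import random

def permutation_test_accuracy(y_true, y_pred, n_perm=2000, seed=0):
    """Permutation p-value for an observed classification accuracy:
    the fraction of label shufflings that reach at least the observed
    accuracy, with an add-one correction to avoid a zero p-value."""
    rng = random.Random(seed)

    def accuracy(a, b):
        return sum(x == y for x, y in zip(a, b)) / len(a)

    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if accuracy(shuffled, y_pred) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

labels = [0] * 10 + [1] * 10
p_perfect = permutation_test_accuracy(labels, list(labels))  # small p-value
p_chance = permutation_test_accuracy(labels, [0, 1] * 10)    # large p-value
```
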

  6. Mixed-effects regression models in linguistics

    CERN Document Server

    Heylen, Kris; Geeraerts, Dirk

    2018-01-01

    When data consist of grouped observations or clusters, and there is a risk that measurements within the same group are not independent, group-specific random effects can be added to a regression model in order to account for such within-group associations. Regression models that contain such group-specific random effects are called mixed-effects regression models, or simply mixed models. Mixed models are a versatile tool that can handle both balanced and unbalanced datasets and that can also be applied when several layers of grouping are present in the data; these layers can either be nested or crossed.  In linguistics, as in many other fields, the use of mixed models has gained ground rapidly over the last decade. This methodological evolution enables us to build more sophisticated and arguably more realistic models, but, due to its technical complexity, also introduces new challenges. This volume brings together a number of promising new evolutions in the use of mixed models in linguistics, but also addres...

  7. Moderation analysis using a two-level regression model.

    Science.gov (United States)

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.

  8. Analysis of generalized negative binomial distributions attached to hyperbolic Landau levels

    International Nuclear Information System (INIS)

    Chhaiba, Hassan; Demni, Nizar; Mouayn, Zouhair

    2016-01-01

    To each hyperbolic Landau level of the Poincaré disc is attached a generalized negative binomial distribution. In this paper, we compute the moment generating function of this distribution and supply its atomic decomposition as a perturbation of the negative binomial distribution by a finitely supported measure. Using the Mandel parameter, we also discuss the nonclassical nature of the associated coherent states. Next, we derive a Lévy-Khintchine-type representation of its characteristic function when the latter does not vanish and deduce that it is quasi-infinitely divisible except for the lowest hyperbolic Landau level corresponding to the negative binomial distribution. By considering the total variation of the obtained quasi-Lévy measure, we introduce a new infinitely divisible distribution for which we derive the characteristic function.

  9. Analysis of generalized negative binomial distributions attached to hyperbolic Landau levels

    Energy Technology Data Exchange (ETDEWEB)

    Chhaiba, Hassan, E-mail: chhaiba.hassan@gmail.com [Department of Mathematics, Faculty of Sciences, Ibn Tofail University, P.O. Box 133, Kénitra (Morocco); Demni, Nizar, E-mail: nizar.demni@univ-rennes1.fr [IRMAR, Université de Rennes 1, Campus de Beaulieu, 35042 Rennes Cedex (France); Mouayn, Zouhair, E-mail: mouayn@fstbm.ac.ma [Department of Mathematics, Faculty of Sciences and Technics (M’Ghila), Sultan Moulay Slimane, P.O. Box 523, Béni Mellal (Morocco)

    2016-07-15

    To each hyperbolic Landau level of the Poincaré disc is attached a generalized negative binomial distribution. In this paper, we compute the moment generating function of this distribution and supply its atomic decomposition as a perturbation of the negative binomial distribution by a finitely supported measure. Using the Mandel parameter, we also discuss the nonclassical nature of the associated coherent states. Next, we derive a Lévy-Khintchine-type representation of its characteristic function when the latter does not vanish and deduce that it is quasi-infinitely divisible except for the lowest hyperbolic Landau level corresponding to the negative binomial distribution. By considering the total variation of the obtained quasi-Lévy measure, we introduce a new infinitely divisible distribution for which we derive the characteristic function.

  10. Relacionando las distribuciones binomial negativa y logarítmica vía sus series asociadas

    OpenAIRE

    Sadinle, Mauricio

    2011-01-01

    The negative binomial distribution is associated with the series obtained by differentiating the logarithmic series. Conversely, the logarithmic distribution is associated with the series obtained by integrating the series associated with the negative binomial distribution. The number-of-failures parameter of the negative binomial distribution is the number of derivatives needed to obtain the negative binomial series from the logarithmic series. The reasoning presented here can be employed as an alternative method to pro...

  11. Modelling the Frequency of Operational Risk Losses under the Basel II Capital Accord: A Comparative study of Poisson and Negative Binomial Distributions

    OpenAIRE

    Silver, Toni O.

    2013-01-01

    2013 dissertation for MSc in Finance and Risk Management. Selected by academic staff as a good example of a master's level dissertation. This study investigated the two major methods of modelling the frequency of operational losses under the BCBS Accord of 1998, known as the Basel II Capital Accord. It compared the Poisson method of modelling the frequency of losses to that of the negative binomial. The frequency of operational losses was investigated using a cross section of se...

  12. Metaprop: a Stata command to perform meta-analysis of binomial data.

    Science.gov (United States)

    Nyaga, Victoria N; Arbyn, Marc; Aerts, Marc

    2014-01-01

    Meta-analyses have become an essential tool in synthesizing evidence on clinical and epidemiological questions derived from a multitude of similar studies assessing the particular issue. Appropriate and accessible statistical software is needed to produce the summary statistic of interest. Metaprop is a statistical program implemented to perform meta-analyses of proportions in Stata. It builds further on the existing Stata procedure metan which is typically used to pool effects (risk ratios, odds ratios, differences of risks or means) but which is also used to pool proportions. Metaprop implements procedures which are specific to binomial data and allows computation of exact binomial and score test-based confidence intervals. It provides appropriate methods for dealing with proportions close to or at the margins where the normal approximation procedures often break down, by use of the binomial distribution to model the within-study variability or by allowing Freeman-Tukey double arcsine transformation to stabilize the variances. Metaprop was applied on two published meta-analyses: 1) prevalence of HPV-infection in women with a Pap smear showing ASC-US; 2) cure rate after treatment for cervical precancer using cold coagulation. The first meta-analysis showed a pooled HPV-prevalence of 43% (95% CI: 38%-48%). In the second meta-analysis, the pooled percentage of cured women was 94% (95% CI: 86%-97%). By using metaprop, no studies with 0% or 100% proportions were excluded from the meta-analysis. Furthermore, study specific and pooled confidence intervals always were within admissible values, contrary to the original publication, where metan was used.
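
    One of the interval types mentioned — the score test-based confidence interval — is easy to sketch. Metaprop itself is a Stata command, so the Python below is only an illustration of why such intervals behave well at the margins where the normal approximation breaks down.

```python
from math import sqrt

def wilson_ci(x, n, z=1.96):
    """Wilson score interval for a binomial proportion x/n.  Unlike the
    Wald interval it never leaves [0, 1], even for x = 0 or x = n."""
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

lo, hi = wilson_ci(0, 10)   # a 0% observed proportion still gets a sensible CI
```

    A study with 0 events out of 10 would get a degenerate Wald interval of [0, 0]; the score interval instead spans roughly 0 to 0.28, which is why studies with 0% or 100% proportions need not be excluded from a pooled analysis.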

  13. Statistical Inference for a Class of Multivariate Negative Binomial Distributions

    DEFF Research Database (Denmark)

    Rubak, Ege H.; Møller, Jesper; McCullagh, Peter

    This paper considers statistical inference procedures for a class of models for positively correlated count variables called α-permanental random fields, which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been studied in the literature, while this is the first statistical paper on α-permanental random fields. The focus is on maximum likelihood estimation, maximum quasi-likelihood estimation and on maximum composite likelihood estimation based on uni- and bivariate distributions. Furthermore, new results...

  14. Variable selection and model choice in geoadditive regression models.

    Science.gov (United States)

    Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard

    2009-06-01

    Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.

  15. Typification of the binomial Sedum boissierianum Haussknecht (Crassulaceae)

    Directory of Open Access Journals (Sweden)

    Valentin Bârcă

    2016-06-01

    Full Text Available The genus Sedum, described by Linne in 1753, a polyphyletic genus of flowering plants in the family Crassulaceae, has been extensively studied both by scholars and by succulent-plant enthusiasts. Despite the great attention it has received over time, many binomials lack types and consistent taxonomic treatments. My ongoing comprehensive revision of the genus Sedum in Romania and the Balkans called for the clarification of the taxonomic status and the typification of several basionyms described from plants collected in the Balkans and the region around the Black Sea. Almost a century and a half ago, Haussknecht intensively studied the flora of the Near East and assigned several binomials to plants of the genus Sedum, some of which were neglected and forgotten although they represent interesting taxa. The binomial Sedum boissierianum Haussknecht first appeared in schedae as a nomen nudum around 1892, for specimens originating from the Near East, but was later validated by Froederstroem in 1932. Following an extensive revision of the relevant literature and herbarium specimens, I located original material for this taxon in several herbaria in Europe (BUCF, LD, P, Z). I hereby designate the lectotype and isolectotypes for this basionym and furthermore provide pictures of the herbarium specimens and pinpoint on a map the original collection sites for the designated types.

  16. The MIDAS Touch: Mixed Data Sampling Regression Models

    OpenAIRE

    Ghysels, Eric; Santa-Clara, Pedro; Valkanov, Rossen

    2004-01-01

    We introduce Mixed Data Sampling (henceforth MIDAS) regression models. The regressions involve time series data sampled at different frequencies. Technically speaking, MIDAS models specify conditional expectations as a distributed lag of regressors recorded at some higher sampling frequencies. We examine the asymptotic properties of MIDAS regression estimation and compare it with traditional distributed lag models. MIDAS regressions have wide applicability in macroeconomics and finance.

  17. Biased binomial assessment of cross-validated estimation of classification accuracies illustrated in diagnosis predictions

    Directory of Open Access Journals (Sweden)

    Quentin Noirhomme

    2014-01-01

    Full Text Available Multivariate classification is used in neuroimaging studies to infer brain activation or in medical applications to infer a diagnosis. The results are often assessed through either a binomial or a permutation test. Here, we simulated classification results of generated random data to assess the influence of the cross-validation scheme on the significance of results. Distributions built from classification of random data with cross-validation did not follow the binomial distribution. The binomial test is therefore not adapted. On the contrary, the permutation test was unaffected by the cross-validation scheme. The influence of cross-validation was further illustrated on real data from a brain–computer interface experiment in patients with disorders of consciousness and from an fMRI study on patients with Parkinson disease. Three out of 16 patients with disorders of consciousness had significant accuracy on binomial testing, but only one showed significant accuracy using permutation testing. In the fMRI experiment, the mental imagery of gait could discriminate significantly between idiopathic Parkinson's disease patients and healthy subjects according to the permutation test but not according to the binomial test. Hence, binomial testing could lead to biased estimation of significance and false positive or negative results. In our view, permutation testing is thus recommended for clinical applications of classification with cross-validation.

  18. Comparison of multiplicity distributions to the negative binomial distribution in muon-proton scattering

    International Nuclear Information System (INIS)

    Arneodo, M.; Ferrero, M.I.; Peroni, C.; Bee, C.P.; Bird, I.; Coughlan, J.; Sloan, T.; Braun, H.; Brueck, H.; Drees, J.; Edwards, A.; Krueger, J.; Montgomery, H.E.; Peschel, H.; Pietrzyk, U.; Poetsch, M.; Schneider, A.; Dreyer, T.; Ernst, T.; Haas, J.; Kabuss, E.M.; Landgraf, U.; Mohr, W.; Rith, K.; Schlagboehmer, A.; Schroeder, T.; Stier, H.E.; Wallucks, W.

    1987-01-01

    The multiplicity distributions of charged hadrons produced in deep inelastic muon-proton scattering at 280 GeV are analysed in various rapidity intervals, as a function of the total hadronic centre of mass energy W ranging from 4-20 GeV. Multiplicity distributions for the backward and forward hemispheres are also analysed separately. The data can be well parameterized by negative binomial distributions, extending their range of applicability to the case of lepton-proton scattering. The energy and rapidity dependence of the parameters is presented, and a smooth transition from the negative binomial distribution via Poissonian to the ordinary binomial is observed. (orig.)

  19. Covariate Imbalance and Adjustment for Logistic Regression Analysis of Clinical Trial Data

    Science.gov (United States)

    Ciolino, Jody D.; Martin, Reneé H.; Zhao, Wenle; Jauch, Edward C.; Hill, Michael D.; Palesch, Yuko Y.

    2014-01-01

    In logistic regression analysis of binary clinical trial data, adjusted treatment effect estimates are often not equivalent to unadjusted estimates in the presence of influential covariates. This paper uses simulation to quantify the benefit of covariate adjustment in logistic regression. However, International Conference on Harmonization guidelines suggest that covariate adjustment be pre-specified; unplanned adjusted analyses should be considered secondary. Results suggest that if adjustment is not possible or unplanned in a logistic setting, balance in continuous covariates can alleviate some (but never all) of the shortcomings of unadjusted analyses. The case of log binomial regression is also explored. PMID:24138438

  20. Adaptive estimation of binomial probabilities under misclassification

    NARCIS (Netherlands)

    Albers, Willem/Wim; Veldman, H.J.

    1984-01-01

    If misclassification occurs the standard binomial estimator is usually seriously biased. It is known that an improvement can be achieved by using more than one observer in classifying the sample elements. Here it will be investigated which number of observers is optimal given the total number of

  1. Estimating the burden of malaria in Senegal: Bayesian zero-inflated binomial geostatistical modeling of the MIS 2008 data.

    Directory of Open Access Journals (Sweden)

    Federica Giardina

    Full Text Available The Research Center for Human Development in Dakar (CRDH), with the technical assistance of ICF Macro and the National Malaria Control Programme (NMCP), conducted in 2008/2009 the Senegal Malaria Indicator Survey (SMIS), the first nationally representative household survey collecting parasitological data and malaria-related indicators. In this paper, we present spatially explicit parasitaemia risk estimates and the number of infected children below 5 years. Geostatistical zero-inflated binomial (ZIB) models were developed to take into account the large number of zero-prevalence survey locations (70%) in the data. Bayesian variable selection methods were incorporated within a geostatistical framework in order to choose the best set of environmental and climatic covariates associated with parasitaemia risk. Model validation confirmed that the ZIB model had better predictive ability than the standard binomial analogue. Markov chain Monte Carlo (MCMC) methods were used for inference. Several insecticide treated net (ITN) coverage indicators were calculated to assess the effectiveness of interventions. After adjusting for climatic and socio-economic factors, the presence of at least one ITN per every two household members and living in urban areas reduced the odds of parasitaemia by 86% and 81%, respectively. Posterior estimates of the ORs related to the wealth index show a decreasing trend with the quintiles. Infection odds appear to increase with age. The population-adjusted prevalence ranges from 0.12% in Thillé-Boubacar to 13.1% in Dabo. Tambacounda has the highest population-adjusted predicted prevalence (8.08%), whereas the region with the highest estimated number of infected children under the age of 5 years is Kolda (13,940). The contemporary map and estimates of malaria burden identify the priority areas for future control interventions and provide baseline information for monitoring and evaluation. Zero-inflated formulations are more appropriate...

  2. Binomial and enumerative sampling of Tetranychus urticae (Acari: Tetranychidae) on peppermint in California.

    Science.gov (United States)

    Tollerup, Kris E; Marcum, Daniel; Wilson, Rob; Godfrey, Larry

    2013-08-01

    The two-spotted spider mite, Tetranychus urticae Koch, is an economic pest on peppermint [Mentha x piperita (L.), 'Black Mitcham'] grown in California. A sampling plan for T. urticae was developed under Pacific Northwest conditions in the early 1980s and has been used by California growers since approximately 1998. This sampling plan, however, is cumbersome and a poor predictor of T. urticae densities in California. Between June and August, the numbers of immature and adult T. urticae were counted on leaves at three commercial peppermint fields (sites) in 2010 and a single field in 2011. In each of seven locations per site, 45 leaves were sampled, that is, nine leaves on each of five stems. Leaf samples were stratified by collecting three leaves from the top, middle, and bottom strata per stem. The on-plant distribution of T. urticae did not significantly differ among the stem strata through the growing season. Binomial and enumerative sampling plans were developed using generic Taylor's power law coefficient values. The best fit to our data for binomial sampling occurred using a tally threshold of T = 0. The optimum number of leaves required for T. urticae at the critical density of five mites per leaf was 20 for the binomial and 23 for the enumerative sampling plan. Sampling models were validated using Resampling for Validation of Sampling Plan Software.
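    As a sketch of how an enumerative plan of this kind is sized, Green's fixed-precision formula can be combined with Taylor's power law (variance = a·m^b). The coefficients a and b below are illustrative placeholders, not the values fitted in this study.

```python
import math

# Enumerative sample size from Taylor's power law (variance = a * m**b),
# via Green's fixed-precision formula: n = (z/D)^2 * a * m^(b-2),
# where m is the mean density per leaf and D the desired relative precision.
# The coefficients a and b are illustrative, not the study's fitted values.

def enumerative_sample_size(m, a, b, D=0.25, z=1.96):
    """Number of leaves to sample for relative precision D at mean density m."""
    return math.ceil((z / D) ** 2 * a * m ** (b - 2))

n_leaves = enumerative_sample_size(m=5.0, a=2.0, b=1.4, D=0.25)
```

    Relaxing the precision requirement (larger D) shrinks the required sample quadratically.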

  3. Introduction to the use of regression models in epidemiology.

    Science.gov (United States)

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
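    As a minimal illustration of one of these methods, the sketch below fits a logistic regression for a binary outcome on a single exposure by Newton-Raphson, using synthetic data (no real study is implied).

```python
import numpy as np

# Minimal sketch: logistic regression (binary outcome on one exposure)
# fitted by Newton-Raphson, the workhorse model for binary outcomes in
# analytical epidemiology. Data are synthetic.

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                    # exposure
X = np.column_stack([np.ones(n), x])      # design matrix with intercept
true_beta = np.array([-1.0, 0.8])
p = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = rng.binomial(1, p)                    # binary response

beta = np.zeros(2)
for _ in range(25):                       # Newton-Raphson iterations
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - mu)
    hess = X.T @ (X * (mu * (1.0 - mu))[:, None])
    beta = beta + np.linalg.solve(hess, grad)

odds_ratio = np.exp(beta[1])              # effect estimate per unit exposure
```

    The exponentiated slope is the odds ratio per unit of exposure; adding further columns to the design matrix yields confounder-adjusted estimates in the same way.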

  4. Real estate value prediction using multivariate regression models

    Science.gov (United States)

    Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

    2017-11-01

    The real estate market is one of the most competitive in terms of pricing, and prices vary significantly based on many factors; it is therefore a prime field in which to apply machine learning concepts to optimize and predict prices with high accuracy. In this paper, we present the most important features to use when predicting housing prices with good accuracy. We describe regression models that use various features to achieve a lower residual sum of squares error. When using features in a regression model, some feature engineering is required for better prediction. Often a set of features (multiple regression) or polynomial regression (raising the features to various powers) is used to achieve a better model fit. Because these models are susceptible to overfitting, ridge regression is used to reduce it. This paper thus points to the best application of regression models, in addition to other techniques, to optimize the result.
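    The overfitting point above can be sketched with a closed-form ridge fit on polynomial features; the data are synthetic, standing in for noisy price observations.

```python
import numpy as np

# Sketch: a high-degree polynomial fit overfits noisy data, and ridge
# regression (an L2 penalty) reduces this by shrinking coefficients.
# Closed-form solution; lam = 0 recovers ordinary least squares.

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
y = 2.0 * x + rng.normal(scale=0.1, size=x.size)   # underlying linear trend

degree = 9
X = np.vander(x, degree + 1, increasing=True)      # polynomial features

def ridge_fit(X, y, lam):
    # beta = (X'X + lam * I)^-1 X'y
    I = np.eye(X.shape[1])
    return np.linalg.solve(X.T @ X + lam * I, X.T @ y)

beta_ols = ridge_fit(X, y, 0.0)
beta_ridge = ridge_fit(X, y, 1e-2)

# Ridge shrinks the coefficient vector, trading variance for bias.
shrinkage = np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols)
```

    The coefficient norm decreases monotonically in the penalty, which is the stabilizing effect the abstract appeals to.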

  5. Improved binomial charts for high-quality processes

    NARCIS (Netherlands)

    Albers, Willem/Wim

    For processes concerning attribute data with (very) small failure rate p, often negative binomial control charts are used. The decision whether to stop or continue is made each time r failures have occurred, for some r≥1. Finding the optimal r for detecting a given increase of p first requires

  6. Constructing Binomial Trees Via Random Maps for Analysis of Financial Assets

    Directory of Open Access Journals (Sweden)

    Antonio Airton Carneiro de Freitas

    2010-04-01

    Full Text Available Random maps can be constructed from a priori knowledge of the financial assets. The reverse problem is also addressed, i.e. from an empirical stationary probability density function we set up a random map that naturally leads to an implied binomial tree, allowing the adjustment of models, including the ability to incorporate jumps. An application related to the options market is presented. It is emphasized that the quality with which the model incorporates a priori knowledge of the financial asset may be affected, for example, by the skewed vision of the analyst.
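    For context, a standard (non-implied) binomial tree of the kind such constructions generalize is the Cox-Ross-Rubinstein tree; the sketch below prices a European call and checks it against the Black-Scholes closed form. This is the textbook construction, not the paper's random-map method.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Cox-Ross-Rubinstein binomial tree for a European call, with the
# Black-Scholes formula as a convergence check. Illustrative parameters.

def crr_call(S, K, r, sigma, T, steps):
    dt = T / steps
    u = exp(sigma * sqrt(dt))            # up factor
    d = 1.0 / u                          # down factor
    q = (exp(r * dt) - d) / (u - d)      # risk-neutral up probability
    disc = exp(-r * dt)
    # terminal payoffs, then backward induction through the tree
    values = [max(S * u**j * d**(steps - j) - K, 0.0) for j in range(steps + 1)]
    for _ in range(steps):
        values = [disc * (q * values[j + 1] + (1 - q) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]

def black_scholes_call(S, K, r, sigma, T):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    N = NormalDist().cdf
    return S * N(d1) - K * exp(-r * T) * N(d2)

tree_price = crr_call(100, 100, 0.05, 0.2, 1.0, 500)
bs_price = black_scholes_call(100, 100, 0.05, 0.2, 1.0)
```

    An implied tree replaces the constant u, d, q with state-dependent values backed out of observed option prices or an empirical density.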

  7. Binomial tree method for pricing a regime-switching volatility stock loans

    Science.gov (United States)

    Putri, Endah R. M.; Zamani, Muhammad S.; Utomo, Daryono B.

    2018-03-01

    A binomial model with regime switching may represent the price of a stock loan, which follows a stochastic process. A stock loan is an alternative that appeals to investors seeking liquidity without selling their stock. The stock loan mechanism resembles that of an American call option, in that the holder can exercise at any time during the contract period. Because of this resemblance, the price of a stock loan can be determined from the model of an American call option. The simulation results show the behavior of the stock loan price under regime-switching volatility with respect to various interest rates and maturities.

  8. Interpretations and implications of negative binomial distributions of multiparticle productions

    International Nuclear Information System (INIS)

    Arisawa, Tetsuo

    2006-01-01

    The number of particles produced in high energy experiments is approximated by a negative binomial distribution. Deriving a representation of the distribution from a stochastic equation, conditions for the process to satisfy the distribution are clarified. Based on them, it is proposed that multiparticle production consists of spontaneous and induced production. The rate of the induced production is proportional to the number of existing particles. The ratio of the two production rates remains constant during the process. The ''NBD space'' is also defined, in which the number of particles produced in its subspaces follows negative binomial distributions with different parameters.
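    The spontaneous-plus-induced mechanism can be checked by simulation: constant-rate ("spontaneous") production at rate a plus per-particle ("induced") production at rate b is an immigration-plus-linear-birth process, whose count at time t is negative binomial with k = a/b and mean (a/b)(e^{bt} - 1). The rates below are illustrative choices, not values from the paper.

```python
import numpy as np

# Gillespie-style simulation of spontaneous (rate a) plus induced
# (rate b per existing particle) production. The resulting count is
# negative binomial with k = a/b and mean m = (a/b) * (exp(b*t) - 1).

rng = np.random.default_rng(42)

def simulate(a, b, t_end):
    t, n = 0.0, 0
    while True:
        rate = a + b * n                 # total event rate
        t += rng.exponential(1.0 / rate)
        if t > t_end:
            return n
        n += 1                           # one particle produced

a, b, t_end, runs = 2.0, 1.0, 1.0, 20000
counts = np.array([simulate(a, b, t_end) for _ in range(runs)])

k = a / b
m = k * (np.exp(b * t_end) - 1.0)        # theoretical NBD mean
v = m + m * m / k                        # theoretical NBD variance
```

    The empirical variance exceeds the mean (overdispersion), matching the negative binomial prediction.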

  9. Factores determinantes de la utilización de instrumentos públicos para la gestión del riesgo en la industria vitivinícola chilena: un modelo logit binomial

    Directory of Open Access Journals (Sweden)

    Germán Lobos

    2008-12-01

    Full Text Available The main objective of this research was to identify the determining factors of the use of public instruments to manage risk in the Chilean wine industry. A binomial logistic regression model was proposed. Based on a survey of 104 viticulture and winemaking companies, a database was constructed between January and October 2007. The model was fitted using maximum likelihood estimation. The variables that turned out to be statistically significant were: risk of the wine price, availability of external consultancy and number of permanent workers. From the Public Management point of view, the main conclusion suggests that the use of public instruments could be increased if viticulturists and winemakers had more external counseling.

  10. Variable importance in latent variable regression models

    NARCIS (Netherlands)

    Kvalheim, O.M.; Arneberg, R.; Bleie, O.; Rajalahti, T.; Smilde, A.K.; Westerhuis, J.A.

    2014-01-01

    The quality and practical usefulness of a regression model are a function of both interpretability and prediction performance. This work presents some new graphical tools for improved interpretation of latent variable regression models that can also assist in improved algorithms for variable

  11. Pythagoras, Binomial, and de Moivre Revisited Through Differential Equations

    OpenAIRE

    Singh, Jitender; Bajaj, Renu

    2018-01-01

    The classical Pythagoras theorem, binomial theorem, de Moivre's formula, and numerous other deductions are made using the uniqueness theorem for the initial value problems in linear ordinary differential equations.

  12. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    Science.gov (United States)

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
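    The logistic-regression recipe above can be sketched numerically: place two equal groups whose log-odds differ by beta times twice the SD of the covariate, then apply a standard two-proportion sample-size formula. Placing the groups symmetrically on the logit scale only approximately preserves the overall event probability, and all inputs below are illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Sketch of the equivalent two-sample approach for logistic regression:
# group log-odds differ by beta * 2 * sd(x); a classical two-proportion
# formula then gives the total sample size. Illustrative inputs.

def logistic_sample_size(beta, sd_x, p_overall, alpha=0.05, power=0.8):
    delta = beta * 2.0 * sd_x                 # log-odds difference
    lo = log(p_overall / (1 - p_overall))     # overall log-odds
    p1 = 1 / (1 + exp(-(lo - delta / 2)))     # group probabilities placed
    p2 = 1 / (1 + exp(-(lo + delta / 2)))     # symmetrically on logit scale
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    pbar = (p1 + p2) / 2
    n_per_group = ((z_a * sqrt(2 * pbar * (1 - pbar))
                    + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
                   / (p1 - p2) ** 2)
    return 2 * n_per_group                    # total sample size

n_total = logistic_sample_size(beta=0.4, sd_x=1.0, p_overall=0.3)
```

    Smaller slopes require larger samples, as the two-sample analogy makes explicit.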

  13. Regression modeling methods, theory, and computation with SAS

    CERN Document Server

    Panik, Michael

    2009-01-01

    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs. The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,

  14. Regression Models For Multivariate Count Data.

    Science.gov (United States)

    Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei

    2017-01-01

    Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data.

  15. Parameter estimation of the zero inflated negative binomial beta exponential distribution

    Science.gov (United States)

    Sirichantra, Chutima; Bodhisuwan, Winai

    2017-11-01

    The zero inflated negative binomial-beta exponential (ZINB-BE) distribution is developed as an alternative distribution for excessive zero counts with overdispersion. The ZINB-BE distribution is a mixture of two distributions, the Bernoulli and the negative binomial-beta exponential distributions. In this work, some characteristics of the proposed distribution, such as the mean and variance, are presented. Maximum likelihood estimation is applied to parameter estimation of the proposed distribution. Finally, the results of a Monte Carlo simulation study suggest that estimation is highly efficient when the sample size is large.
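    The estimation strategy can be illustrated on a simpler member of the same family: a plain zero-inflated negative binomial (ZINB), fitted by maximum likelihood. The richer ZINB-BE mixing distribution is not reproduced here; the data and starting values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

# Maximum likelihood fit of a zero-inflated negative binomial (ZINB),
# a simpler stand-in for the ZINB-BE model. Structural-zero probability
# pi, NB successes r, and success probability p are estimated on
# unconstrained scales via sigmoid/exp transforms.

rng = np.random.default_rng(7)
n = 3000
pi_true, r_true, p_true = 0.3, 2.0, 0.4
is_zero = rng.random(n) < pi_true                  # structural zeros
y = np.where(is_zero, 0, rng.negative_binomial(r_true, p_true, size=n))

def neg_loglik(theta):
    pi = 1 / (1 + np.exp(-theta[0]))
    r = np.exp(theta[1])
    p = 1 / (1 + np.exp(-theta[2]))
    # NB(y; r, p) pmf on the log scale
    log_nb = (gammaln(y + r) - gammaln(r) - gammaln(y + 1)
              + r * np.log(p) + y * np.log1p(-p))
    ll = np.where(y == 0,
                  np.log(pi + (1 - pi) * p ** r),  # zero from either component
                  np.log1p(-pi) + log_nb)
    return -ll.sum()

res = minimize(neg_loglik, x0=np.zeros(3), method="Nelder-Mead",
               options={"maxiter": 4000, "xatol": 1e-7, "fatol": 1e-9})
pi_hat = 1 / (1 + np.exp(-res.x[0]))
```

    With a large sample the estimated zero-inflation probability recovers the true value, matching the abstract's observation that efficiency is high for large n.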

  16. Some normed binomial difference sequence spaces related to the [Formula: see text] spaces.

    Science.gov (United States)

    Song, Meimei; Meng, Jian

    2017-01-01

    The aim of this paper is to introduce the normed binomial sequence spaces [Formula: see text] by combining the binomial transformation and difference operator, where [Formula: see text]. We prove that these spaces are linearly isomorphic to the spaces [Formula: see text] and [Formula: see text], respectively. Furthermore, we compute Schauder bases and the α-, β- and γ-duals of these sequence spaces.

  17. Possibility and Challenges of Conversion of Current Virus Species Names to Linnaean Binomials

    Energy Technology Data Exchange (ETDEWEB)

    Postler, Thomas S.; Clawson, Anna N.; Amarasinghe, Gaya K.; Basler, Christopher F.; Bavari, Sbina; Benkő, Mária; Blasdell, Kim R.; Briese, Thomas; Buchmeier, Michael J.; Bukreyev, Alexander; Calisher, Charles H.; Chandran, Kartik; Charrel, Rémi; Clegg, Christopher S.; Collins, Peter L.; Juan Carlos, De La Torre; Derisi, Joseph L.; Dietzgen, Ralf G.; Dolnik, Olga; Dürrwald, Ralf; Dye, John M.; Easton, Andrew J.; Emonet, Sébastian; Formenty, Pierre; Fouchier, Ron A. M.; Ghedin, Elodie; Gonzalez, Jean-Paul; Harrach, Balázs; Hewson, Roger; Horie, Masayuki; Jiāng, Dàohóng; Kobinger, Gary; Kondo, Hideki; Kropinski, Andrew M.; Krupovic, Mart; Kurath, Gael; Lamb, Robert A.; Leroy, Eric M.; Lukashevich, Igor S.; Maisner, Andrea; Mushegian, Arcady R.; Netesov, Sergey V.; Nowotny, Norbert; Patterson, Jean L.; Payne, Susan L.; PaWeska, Janusz T.; Peters, Clarence J.; Radoshitzky, Sheli R.; Rima, Bertus K.; Romanowski, Victor; Rubbenstroth, Dennis; Sabanadzovic, Sead; Sanfaçon, Hélène; Salvato, Maria S.; Schwemmle, Martin; Smither, Sophie J.; Stenglein, Mark D.; Stone, David M.; Takada, Ayato; Tesh, Robert B.; Tomonaga, Keizo; Tordo, Noël; Towner, Jonathan S.; Vasilakis, Nikos; Volchkov, Viktor E.; Wahl-Jensen, Victoria; Walker, Peter J.; Wang, Lin-Fa; Varsani, Arvind; Whitfield, Anna E.; Zerbini, F. Murilo; Kuhn, Jens H.

    2016-10-22

    Botanical, mycological, zoological, and prokaryotic species names follow the Linnaean format, consisting of an italicized Latinized binomen with a capitalized genus name and a lower case species epithet (e.g., Homo sapiens). Virus species names, however, do not follow a uniform format, and, even when binomial, are not Linnaean in style. In this thought exercise, we attempted to convert all currently official names of species included in the virus family Arenaviridae and the virus order Mononegavirales to Linnaean binomials, and to identify and address associated challenges and concerns. Surprisingly, this endeavor was not as complicated or time-consuming as even the authors of this article expected when conceiving the experiment. [Arenaviridae; binomials; ICTV; International Committee on Taxonomy of Viruses; Mononegavirales; virus nomenclature; virus taxonomy.]

  18. Nonparametric Mixture of Regression Models.

    Science.gov (United States)

    Huang, Mian; Li, Runze; Wang, Shaoli

    2013-07-01

    Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We study the identifiability issue of the proposed models, and develop an estimation procedure by employing kernel regression. We further systematically study the sampling properties of the proposed estimators, and establish their asymptotic normality. A modified EM algorithm is proposed to carry out the estimation procedure. We show that our algorithm preserves the ascent property of the EM algorithm in an asymptotic sense. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedure. An empirical analysis of the US house price index data is illustrated for the proposed methodology.

  19. Revealing Word Order: Using Serial Position in Binomials to Predict Properties of the Speaker

    Science.gov (United States)

    Iliev, Rumen; Smirnova, Anastasia

    2016-01-01

    Three studies test the link between word order in binomials and psychological and demographic characteristics of a speaker. While linguists have already suggested that psychological, cultural and societal factors are important in choosing word order in binomials, the vast majority of relevant research was focused on general factors and on broadly…

  20. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants.

    Science.gov (United States)

    Sauzet, Odile; Peacock, Janet L

    2017-07-20

    The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept models and generalised estimating equations were compared. The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and the parameter for any percentage of twins is the generalised estimating equations. This study has shown that the number of covariates or the level two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.
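    The point about clustering can be sketched directly: an ordinary logistic fit on twin data understates uncertainty, while a cluster-robust (sandwich) variance, which is the variance used by GEE with an independence working correlation, corrects the standard errors. The data below are simulated twin pairs, not the paper's preterm-infant dataset.

```python
import numpy as np

# Logistic MLE on simulated twin pairs, comparing naive (model-based)
# standard errors with cluster-robust sandwich standard errors.

rng = np.random.default_rng(3)
pairs = 1000
u = rng.normal(scale=2.0, size=pairs)             # shared within-pair effect
x = rng.normal(size=(pairs, 2))                   # one covariate per infant
logit = -0.5 + 0.7 * x + u[:, None]               # same u for both twins
y = (rng.random((pairs, 2)) < 1 / (1 + np.exp(-logit))).astype(float)

X = np.column_stack([np.ones(2 * pairs), x.ravel()])
yv = y.ravel()
cluster = np.repeat(np.arange(pairs), 2)          # pair index per infant

beta = np.zeros(2)
for _ in range(25):                               # logistic MLE (Newton-Raphson)
    mu = 1 / (1 + np.exp(-X @ beta))
    H = X.T @ (X * (mu * (1 - mu))[:, None])
    beta += np.linalg.solve(H, X.T @ (yv - mu))

mu = 1 / (1 + np.exp(-X @ beta))
bread = np.linalg.inv(X.T @ (X * (mu * (1 - mu))[:, None]))
scores = X * (yv - mu)[:, None]
meat = np.zeros((2, 2))
for g in range(pairs):                            # per-cluster score sums
    s = scores[cluster == g].sum(axis=0)
    meat += np.outer(s, s)
se_naive = np.sqrt(np.diag(bread))                # ignores twin dependence
se_robust = np.sqrt(np.diag(bread @ meat @ bread))
```

    The robust intercept standard error exceeds the naive one, reflecting the positive within-pair correlation the naive fit ignores.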

  1. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants

    Directory of Open Access Journals (Sweden)

    Odile Sauzet

    2017-07-01

    Full Text Available Abstract Background The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Methods Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept models and generalised estimating equations were compared. Results The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and the parameter for any percentage of twins is the generalised estimating equations. Conclusions This study has shown that the number of covariates or the level two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.

  2. Binomial Distribution Sample Confidence Intervals Estimation 1. Sampling and Medical Key Parameters Calculation

    Directory of Open Access Journals (Sweden)

    Tudor DRUGAN

    2003-08-01

    Full Text Available The aim of the paper was to present the usefulness of the binomial distribution in the study of contingency tables and the problems of approximating the binomial distribution to normality (the limits, advantages, and disadvantages). Classifying the medical key parameters reported in the medical literature, and expressing them in contingency table units based on their mathematical expressions, restricts the discussion of the confidence intervals from 34 parameters to 9 mathematical expressions. The problem of obtaining different information starting from the computed confidence interval for a specified method, such as the confidence interval boundaries, the percentages of experimental errors, the standard deviation of the experimental errors, and the deviation relative to the significance level, was solved through the implementation of original algorithms in the PHP programming language. The cases of expressions containing two binomial variables were treated separately. An original method of computing the confidence interval for two-variable expressions was proposed and implemented. The graphical representation of expressions of two binomial variables, in which the variation domain of one variable depends on the other, was a real problem because most software uses interpolation in graphical representation, so the surface maps were quadratic instead of triangular. Based on an original algorithm, a module was implemented in PHP in order to represent the triangular surface plots graphically. All of the implementations described above were used in computing the confidence intervals and estimating their performance for binomial distribution sample sizes and variables.
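    The paper's PHP implementation is not reproduced here, but two classical confidence intervals for a binomial proportion, the kind of quantity computed for contingency-table parameters, can be sketched in a few lines.

```python
from math import sqrt
from statistics import NormalDist

# Two classical confidence intervals for a binomial proportion x/n:
# the simple Wald interval and the better-behaved Wilson (score) interval.

def wald_ci(x, n, conf=0.95):
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def wilson_ci(x, n, conf=0.95):
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p = x / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

    At the extremes the Wald interval degenerates (zero width at x = 0), while the Wilson interval remains informative, which is one of the approximation limits the abstract alludes to.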

  3. Regression and regression analysis time series prediction modeling on climate data of quetta, pakistan

    International Nuclear Information System (INIS)

    Jafri, Y.Z.; Kamal, L.

    2007-01-01

    Various statistical techniques were used on five-year data (1998-2002) of average humidity, rainfall, and maximum and minimum temperatures. Relationships for regression analysis time series (RATS) were developed to determine the overall trend of these climate parameters, on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination, as a measure of goodness of fit, for our polynomial regression analysis time series (PRATS). Multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) correlations were also developed for deciphering the interdependence of weather parameters. Spearman's rank correlation and the Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, respectively, which yielded homoscedasticity. We also employed Bartlett's test for homogeneity of variances on the five-year data of rainfall and humidity, which showed that the variances in the rainfall data were not homogeneous while those of the humidity data were. Our results on regression and regression analysis time series show the best fit for prediction modeling on climatic data of Quetta, Pakistan. (author)
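    The polynomial-trend-plus-R² idea can be sketched on a synthetic monthly series standing in for the Quetta data; the series and degrees below are illustrative.

```python
import numpy as np

# Sketch of the PRATS idea: fit polynomial trends to a monthly climate
# series and compare the coefficient of determination R^2 across degrees.
# The series is synthetic (a quadratic trend plus noise).

rng = np.random.default_rng(5)
t = np.arange(60)                                   # five years, monthly
series = 25 - 0.01 * (t - 30) ** 2 + rng.normal(scale=1.0, size=60)

def poly_r2(t, y, degree):
    coeffs = np.polyfit(t, y, degree)
    fitted = np.polyval(coeffs, t)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot                      # coefficient of determination

r2_lin = poly_r2(t, series, 1)                      # linear trend misses curvature
r2_quad = poly_r2(t, series, 2)                     # quadratic captures it
```

    A jump in R² between degrees signals curvature that the lower-degree trend cannot represent.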

  4. Predicting Success in Product Development: The Application of Principal Component Analysis to Categorical Data and Binomial Logistic Regression

    Directory of Open Access Journals (Sweden)

    Glauco H.S. Mendes

    2013-09-01

    Full Text Available Critical success factors in new product development (NPD) in Brazilian small and medium enterprises (SMEs) are identified and analyzed. Critical success factors are best practices that can be used to improve NPD management and performance in a company. However, these factors are traditionally identified through survey methods, and the collected data are then reduced through traditional multivariate analysis. The objective of this work is to develop a logistic regression model for predicting the success or failure of new product development. This model allows for an evaluation and prioritization of resource commitments. The results will be helpful for guiding management actions as a way to improve NPD performance in these industries.

  5. Robust mislabel logistic regression without modeling mislabel probabilities.

    Science.gov (United States)

    Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun

    2018-03-01

    Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.

  6. Mixed Frequency Data Sampling Regression Models: The R Package midasr

    Directory of Open Access Journals (Sweden)

    Eric Ghysels

    2016-08-01

    Full Text Available When modeling economic relationships it is increasingly common to encounter data sampled at different frequencies. We introduce the R package midasr which enables estimating regression models with variables sampled at different frequencies within a MIDAS regression framework put forward in work by Ghysels, Santa-Clara, and Valkanov (2002). In this article we define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface and estimated using various optimization methods chosen by the researcher. We discuss how to check the validity of the estimated model both in terms of numerical convergence and statistical adequacy of a chosen regression specification, how to perform model selection based on an information criterion, how to assess the forecasting accuracy of the MIDAS regression model and how to obtain a forecast aggregation of different MIDAS regression models. We illustrate the capabilities of the package with a simulated MIDAS regression model and give two empirical examples of application of MIDAS regression.

  7. Adjusted Wald Confidence Interval for a Difference of Binomial Proportions Based on Paired Data

    Science.gov (United States)

    Bonett, Douglas G.; Price, Robert M.

    2012-01-01

    Adjusted Wald intervals for binomial proportions in one-sample and two-sample designs have been shown to perform about as well as the best available methods. The adjusted Wald intervals are easy to compute and have been incorporated into introductory statistics courses. An adjusted Wald interval for paired binomial proportions is proposed here and…
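    An adjusted Wald interval of this general kind can be sketched as follows. For concreteness, the code uses the related Agresti-Min "add 0.5 to each cell" adjustment on the paired 2×2 table; the Bonett-Price adjustment proposed in the abstract differs in its details, and the counts are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Adjusted Wald interval for a difference of paired binomial proportions,
# using an Agresti-Min-style adjustment (0.5 added to each cell of the
# paired 2x2 table). n12 and n21 are the discordant cells.

def paired_diff_ci(n11, n12, n21, n22, conf=0.95):
    a = [n11 + 0.5, n12 + 0.5, n21 + 0.5, n22 + 0.5]   # adjusted counts
    n = sum(a)
    p12, p21 = a[1] / n, a[2] / n
    d = p12 - p21                                      # estimate of p1 - p2
    se = sqrt((p12 + p21 - d * d) / n)                 # Wald SE for paired data
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return d - z * se, d + z * se

lo, hi = paired_diff_ci(40, 12, 5, 43)                 # illustrative table
```

    The adjustment keeps the interval well-behaved when discordant counts are small, which is where the unadjusted Wald interval performs worst.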

  8. Jump-and-return sandwiches: A new family of binomial-like selective inversion sequences with improved performance

    Science.gov (United States)

    Brenner, Tom; Chen, Johnny; Stait-Gardner, Tim; Zheng, Gang; Matsukawa, Shingo; Price, William S.

    2018-03-01

    A new family of binomial-like inversion sequences, named jump-and-return sandwiches (JRS), has been developed by inserting a binomial-like sequence into a standard jump-and-return sequence, discovered through use of a stochastic Genetic Algorithm optimisation. Compared to currently used binomial-like inversion sequences (e.g., 3-9-19 and W5), the new sequences afford wider inversion bands and narrower non-inversion bands with an equal number of pulses. As an example, two jump-and-return sandwich 10-pulse sequences achieved 95% inversion at offsets corresponding to 9.4% and 10.3% of the non-inversion band spacing, compared to 14.7% for the binomial-like W5 inversion sequence, i.e., they afforded non-inversion bands about two thirds the width of the W5 non-inversion band.

  9. Impact of multicollinearity on small sample hydrologic regression models

    Science.gov (United States)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely in model predictions, it is recommended that OLS be employed, since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
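    A minimal illustration of the multicollinearity problem and the PCR remedy: with two nearly collinear predictors, OLS coefficients are unstable, while regressing on the leading principal component gives stable (if biased) estimates. PLS is omitted for brevity, and the data are synthetic.

```python
import numpy as np

# Principal component regression (PCR) with one component versus OLS
# on two nearly collinear predictors. Only the well-identified direction
# (roughly x1 + x2) is retained by PCR.

rng = np.random.default_rng(11)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)           # near-perfect collinearity
y = x1 + x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)                            # center predictors
yc = y - y.mean()

beta_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

# PCR: project onto the leading right singular vector, regress, map back
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores1 = Xc @ Vt[0]                               # first PC scores
gamma = (scores1 @ yc) / (scores1 @ scores1)
beta_pcr = gamma * Vt[0]                           # coefficients in x-space
```

    The sum of coefficients (the identifiable quantity) is stable for both methods, but PCR also yields nearly equal individual coefficients instead of the wildly offsetting values OLS can produce.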

  10. Improved binomial charts for monitoring high-quality processes

    NARCIS (Netherlands)

    Albers, Willem/Wim

    2009-01-01

    For processes concerning attribute data with (very) small failure rate p, negative binomial control charts are often used. The decision whether to stop or continue is made each time r failures have occurred, for some r≥1. Finding the optimal r for detecting a given increase of p first requires

  11. Fat suppression in MR imaging with binomial pulse sequences

    International Nuclear Information System (INIS)

    Baudovin, C.J.; Bryant, D.J.; Bydder, G.M.; Young, I.R.

    1989-01-01

    This paper reports on a study to develop pulse sequences allowing suppression of fat signal on MR images without eliminating signal from other tissues with short T1. The authors have developed such a technique, involving selective excitation of protons in water, based on a binomial pulse sequence. Imaging is performed at 0.15 T. Careful shimming is performed to maximize separation of fat and water peaks. A spin-echo 1,500/80 sequence is used, employing a 90 degrees pulse with transmit frequency optimized for water and null excitation at 20 Hz offset, followed by a section-selective 180 degrees pulse. With use of the binomial sequence for imaging, reduction in fat signal is seen on images of the pelvis and legs of volunteers. Patient studies show dramatic improvement in visualization of prostatic carcinoma compared with standard sequences

  12. Zero inflated negative binomial-generalized exponential distribution and its applications

    Directory of Open Access Journals (Sweden)

    Sirinapa Aryuyuen

    2014-08-01

    Full Text Available In this paper, we propose a new zero inflated distribution, namely, the zero inflated negative binomial-generalized exponential (ZINB-GE) distribution. The new distribution is used for count data with extra zeros and is an alternative for data analysis with over-dispersed count data. Some characteristics of the distribution are given, such as mean, variance, skewness, and kurtosis. Parameter estimation of the ZINB-GE distribution uses the maximum likelihood estimation (MLE) method. Simulated and observed data are employed to examine this distribution. The results show that the MLE method seems to have high efficiency for large sample sizes. Moreover, the mean square error of parameter estimation increases when the zero proportion is higher. For the real data sets, this new zero inflated distribution provides a better fit than the zero inflated Poisson and zero inflated negative binomial distributions.
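    A hedged sketch of the baseline zero inflated negative binomial (the simpler competitor mentioned at the end, not the paper's ZINB-GE mixture) shows how the extra zero mass enters the pmf; all parameter values below are illustrative:

```python
import math

def nb_pmf(y, r, p):
    """Negative binomial pmf: y failures before the r-th success."""
    return math.comb(y + r - 1, y) * p**r * (1 - p)**y

def zinb_pmf(y, pi, r, p):
    """Zero inflation adds an extra point mass pi at zero."""
    base = (1 - pi) * nb_pmf(y, r, p)
    return pi + base if y == 0 else base

# the inflated pmf still sums to one over the support
total = sum(zinb_pmf(y, 0.3, 5, 0.6) for y in range(200))
print(round(total, 10))  # 1.0
```

    The ZINB-GE distribution of the paper replaces the plain negative binomial component with a negative binomial-generalized exponential mixture, but the zero-inflation mechanism is the same.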

  13. Applied Regression Modeling A Business Approach

    CERN Document Server

    Pardoe, Iain

    2012-01-01

    An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculus. Regression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a

  14. Testing homogeneity in Weibull-regression models.

    Science.gov (United States)

    Bolfarine, Heleno; Valença, Dione M

    2005-10-01

    In survival studies with families or geographical units it may be of interest to test whether such groups are homogeneous for given explanatory variables. In this paper we consider score type tests for group homogeneity based on a mixing model in which the group effect is modelled as a random variable. As opposed to hazard-based frailty models, this model presents survival times that, conditioned on the random effect, have an accelerated failure time representation. The test statistic requires only estimation of the conventional regression model without the random effect and does not require specifying the distribution of the random effect. The tests are derived for a Weibull regression model, and in the uncensored situation a closed form is obtained for the test statistic. A simulation study is used for comparing the power of the tests. The proposed tests are applied to real data sets with censored data.

  15. Model-based Quantile Regression for Discrete Data

    KAUST Repository

    Padellini, Tullia; Rue, Haavard

    2018-01-01

    Quantile regression is a class of methods devoted to the modelling of conditional quantiles. In a Bayesian framework quantile regression has typically been carried out exploiting the Asymmetric Laplace Distribution as a working likelihood. Despite

  16. Detection of epistatic effects with logic regression and a classical linear regression model.

    Science.gov (United States)

    Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata

    2014-02-01

    To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, methods such as Cockerham's model are usually applied. Within this framework, interactions are understood as the part of the joint effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (such as disease) is caused by Boolean combinations of genotypes of several QTLs, Cockerham's approach is often not capable of identifying them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though with the logic regression approach a larger number of models has to be considered (requiring more stringent multiple testing correction), the efficient representation of higher order logic interactions in logic regression models leads to a significant increase of power to detect such interactions as compared to Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with a simulation study and real data analysis.
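    The gain from modelling Boolean structure directly can be seen in a small simulation (illustrative only, not the authors' setup): a trait driven by "A OR B" is captured fully by a single logic term but only partially by additive main effects.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
A = rng.integers(0, 2, n)           # binary genotype indicators
B = rng.integers(0, 2, n)
logic = (A | B).astype(float)       # Boolean "A OR B" drives the trait
y = 2.0 * logic + rng.normal(scale=0.5, size=n)

def r2(y, X):
    """R-squared of an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_additive = r2(y, np.column_stack([A, B]))   # main effects only
r2_logic = r2(y, logic)                        # single logic term
print(r2_additive, r2_logic)  # the logic term explains noticeably more
```

    Since A OR B equals A + B - A*B, the additive model misses the -A*B component, which is exactly the interaction a logic regression tree can represent with one term.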

  17. AN APPLICATION OF FUNCTIONAL MULTIVARIATE REGRESSION MODEL TO MULTICLASS CLASSIFICATION

    OpenAIRE

    Krzyśko, Mirosław; Smaga, Łukasz

    2017-01-01

    In this paper, the scalar response functional multivariate regression model is considered. By using the basis functions representation of functional predictors and regression coefficients, this model is rewritten as a multivariate regression model. This representation of the functional multivariate regression model is used for multiclass classification for multivariate functional data. Computational experiments performed on real labelled data sets demonstrate the effectiveness of the proposed ...

  18. A Neutrosophic Binomial Factorial Theorem with their Refrains

    Directory of Open Access Journals (Sweden)

    Huda E. Khalid

    2016-12-01

    Full Text Available The Neutrosophic Precalculus and the Neutrosophic Calculus can be developed in many ways, depending on the types of indeterminacy one has and on the method used to deal with such indeterminacy. This article is innovative since the form of neutrosophic binomial factorial theorem was constructed in addition to its refrains.

  19. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    Science.gov (United States)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model represents the relationship between independent variables and a dependent variable. In the logistic regression model the dependent variable is categorical and is used to calculate odds. When the dependent variable has ordered levels, the model is an ordinal logistic regression. The GWOLR model is an ordinal logistic regression model influenced by the geographical location of the observation site. Parameter estimation in the model is needed to determine the value of a population based on a sample. The purpose of this research is parameter estimation of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units used are 144 villages in Semarang City. The results of the research give a local GWOLR model for each village and the probability of each category of the number of dengue fever patients.

  20. NEWTONP - CUMULATIVE BINOMIAL PROGRAMS

    Science.gov (United States)

    Bowerman, P. N.

    1994-01-01

    The cumulative binomial program, NEWTONP, is one of a set of three programs which calculate cumulative binomial probability distributions for arbitrary inputs. The three programs, NEWTONP, CUMBIN (NPO-17555), and CROSSER (NPO-17557), can be used independently of one another. NEWTONP can be used by statisticians and users of statistical procedures, test planners, designers, and numerical analysts. The program has been used for reliability/availability calculations. NEWTONP calculates the probability p required to yield a given system reliability V for a k-out-of-n system. It can also be used to determine the Clopper-Pearson confidence limits (either one-sided or two-sided) for the parameter p of a Bernoulli distribution. NEWTONP can determine Bayesian probability limits for a proportion (if the beta prior has positive integer parameters). It can determine the percentiles of incomplete beta distributions with positive integer parameters. It can also determine the percentiles of F distributions and the median plotting positions in probability plotting. NEWTONP is designed to work well with all integer values 0 < k <= n. To run the program, the user simply runs the executable version and inputs the information requested by the program. NEWTONP is not designed to weed out incorrect inputs, so the user must take care to make sure the inputs are correct. Once all input has been entered, the program calculates and lists the result. It also lists the number of iterations of Newton's method required to calculate the answer within the given error. The NEWTONP program is written in C. It was developed on an IBM AT with a numeric co-processor using Microsoft C 5.0. Because the source code is written using standard C structures and functions, it should compile correctly with most C compilers. The program format is interactive. It has been implemented under DOS 3.2 and has a memory requirement of 26K. NEWTONP was developed in 1988.
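    The core computation can be sketched as follows (a reconstruction from the description above, not NASA's C source): Newton's method applied to the binomial tail that defines k-out-of-n system reliability.

```python
import math

def reliability(p, k, n):
    """P(at least k of n components work), each with reliability p."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def solve_p(V, k, n, tol=1e-12):
    """Newton's method for the component reliability p that gives
    system reliability V for a k-out-of-n system."""
    p = 0.5
    for _ in range(100):
        f = reliability(p, k, n) - V
        # derivative of the binomial tail, via the incomplete-beta identity
        fp = k * math.comb(n, k) * p**(k - 1) * (1 - p)**(n - k)
        step = f / fp
        p = min(max(p - step, 1e-12), 1 - 1e-12)
        if abs(step) < tol:
            break
    return p

p = solve_p(0.99, 2, 4)  # 2-out-of-4 system, target reliability 0.99
print(p, reliability(p, 2, 4))
```

    The derivative uses the fact that the binomial tail equals a regularized incomplete beta function, whose derivative in p is available in closed form; this is presumably why the original program also exposes incomplete beta percentiles.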

  1. Determination of finite-difference weights using scaled binomial windows

    KAUST Repository

    Chu, Chunlei; Stoffa, Paul L.

    2012-01-01

    The finite-difference method evaluates a derivative through a weighted summation of function values from neighboring grid nodes. Conventional finite-difference weights can be calculated either from Taylor series expansions or by Lagrange interpolation polynomials. The finite-difference method can be interpreted as a truncated convolutional counterpart of the pseudospectral method in the space domain. For this reason, we also can derive finite-difference operators by truncating the convolution series of the pseudospectral method. Various truncation windows can be employed for this purpose and they result in finite-difference operators with different dispersion properties. We found that there exists two families of scaled binomial windows that can be used to derive conventional finite-difference operators analytically. With a minor change, these scaled binomial windows can also be used to derive optimized finite-difference operators with enhanced dispersion properties. © 2012 Society of Exploration Geophysicists.
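    The Taylor-series route mentioned above can be made concrete. The sketch below (illustrative, not the authors' code) solves the Taylor moment conditions for the weights and reproduces the classic 5-point second-derivative stencil:

```python
import math
import numpy as np

def fd_weights(offsets, order):
    """Finite-difference weights for the `order`-th derivative at 0,
    from function values at the given grid offsets (unit spacing).
    Solves the Taylor-series conditions sum_j w_j x_j^m = m! [m == order]."""
    offsets = np.asarray(offsets, dtype=float)
    m = len(offsets)
    A = np.vander(offsets, m, increasing=True).T   # row i holds offsets**i
    b = np.zeros(m)
    b[order] = math.factorial(order)
    return np.linalg.solve(A, b)

# classic 5-point second-derivative stencil: [-1/12, 4/3, -5/2, 4/3, -1/12]
w = fd_weights([-2, -1, 0, 1, 2], 2)
print(w)
```

    The window-based derivation described in the abstract would instead multiply the pseudospectral convolution coefficients by a (scaled binomial) taper; for the conventional weights the two routes agree.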

  2. Determination of finite-difference weights using scaled binomial windows

    KAUST Repository

    Chu, Chunlei

    2012-05-01

    The finite-difference method evaluates a derivative through a weighted summation of function values from neighboring grid nodes. Conventional finite-difference weights can be calculated either from Taylor series expansions or by Lagrange interpolation polynomials. The finite-difference method can be interpreted as a truncated convolutional counterpart of the pseudospectral method in the space domain. For this reason, we also can derive finite-difference operators by truncating the convolution series of the pseudospectral method. Various truncation windows can be employed for this purpose and they result in finite-difference operators with different dispersion properties. We found that there exists two families of scaled binomial windows that can be used to derive conventional finite-difference operators analytically. With a minor change, these scaled binomial windows can also be used to derive optimized finite-difference operators with enhanced dispersion properties. © 2012 Society of Exploration Geophysicists.

  3. Alternative regression models to assess increase in childhood BMI.

    Science.gov (United States)

    Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M

    2008-09-08

    Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Different regression approaches to predict childhood BMI were compared by goodness-of-fit measures and means of interpretation, including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
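    The quantile regression used here rests on the check (pinball) loss. A minimal numerical illustration (not from the study) shows that minimising this loss over a constant recovers the corresponding sample quantile, which is the mechanism that lets the method target, e.g., the 90th BMI percentile directly:

```python
import numpy as np

def pinball(u, tau):
    """Check (pinball) loss of residuals u at quantile level tau."""
    return np.mean(np.maximum(tau * u, (tau - 1) * u))

rng = np.random.default_rng(2)
y = rng.normal(size=10_000)

# Minimising the pinball loss over a constant recovers the sample quantile:
tau = 0.9
grid = np.linspace(-3, 3, 601)
best = grid[np.argmin([pinball(y - c, tau) for c in grid])]
print(best, np.quantile(y, tau))  # both close to the N(0,1) 0.9-quantile
```

    Full quantile regression replaces the constant with a linear predictor and minimises the same loss over its coefficients.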

  4. Alternative regression models to assess increase in childhood BMI

    OpenAIRE

    Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M

    2008-01-01

    Abstract Background Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 childre...

  5. Microbial comparative pan-genomics using binomial mixture models

    DEFF Research Database (Denmark)

    Ussery, David; Snipen, L; Almøy, T

    2009-01-01

    The size of the core- and pan-genome of bacterial species is a topic of increasing interest due to the growing number of sequenced prokaryote genomes, many from the same species. Attempts to estimate these quantities have been made, using regression methods or mixture models. We extend the latter...... approach by using statistical ideas developed for capture-recapture problems in ecology and epidemiology. RESULTS: We estimate core- and pan-genome sizes for 16 different bacterial species. The results reveal a complex dependency structure for most species, manifested as heterogeneous detection...... probabilities. Estimated pan-genome sizes range from small (around 2600 gene families) in Buchnera aphidicola to large (around 43000 gene families) in Escherichia coli. Results for Escherichia coli show that as more data become available, a larger diversity is estimated, indicating an extensive pool of rarely...

  6. Thermal Efficiency Degradation Diagnosis Method Using Regression Model

    International Nuclear Information System (INIS)

    Jee, Chang Hyun; Heo, Gyun Young; Jang, Seok Won; Lee, In Cheol

    2011-01-01

    This paper proposes an idea for thermal efficiency degradation diagnosis in turbine cycles, which is based on turbine cycle simulation under abnormal conditions and a linear regression model. The correlation between the inputs representing degradation conditions (normally unmeasured but intrinsic states) and the simulation outputs (normally measured but superficial states) was analyzed with the linear regression model. The regression models can inversely yield the intrinsic state associated with a superficial state observed from a power plant. The diagnosis method proposed herein comprises three processes: 1) simulations for degradation conditions to get measured states (referred to as the what-if method), 2) development of the linear model correlating intrinsic and superficial states, and 3) determination of an intrinsic state using the superficial states of the current plant and the linear regression model (referred to as the inverse what-if method). The what-if method generates the outputs for inputs including various root causes and/or boundary conditions, whereas the inverse what-if method is the process of calculating the inverse matrix with the given superficial states, that is, component degradation modes. The method suggested in this paper was validated using the turbine cycle model for an operating power plant
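    In matrix form, the inverse what-if step amounts to inverting the fitted linear map from intrinsic to superficial states. A toy sketch (all states, dimensions and values hypothetical, not the plant model):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical linear map from intrinsic degradation states x (unmeasured)
# to superficial measured states y, as learned from what-if simulations: y = A x + e
A = rng.normal(size=(6, 3))            # 6 measurements, 3 degradation modes
x_true = np.array([0.2, 0.0, 0.5])     # e.g. fouling levels of three components
y_obs = A @ x_true + 1e-3 * rng.normal(size=6)

# "Inverse what-if": recover the intrinsic state with the pseudo-inverse
x_hat = np.linalg.pinv(A) @ y_obs
print(x_hat)  # close to x_true
```

    With more measurements than degradation modes the pseudo-inverse gives the least-squares estimate of the intrinsic state, which is what the paper's inverse-matrix calculation accomplishes.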

  7. Calculation of generalized secant integral using binomial coefficients

    International Nuclear Information System (INIS)

    Guseinov, I.I.; Mamedov, B.A.

    2004-01-01

    A single series expansion relation is derived for the generalized secant (GS) integral in terms of binomial coefficients, exponential integrals and incomplete gamma functions. The convergence of the series is tested by the concrete cases of parameters. The formulas given in this study for the evaluation of GS integral show good rate of convergence and numerical stability

  8. Random regression models for detection of gene by environment interaction

    Directory of Open Access Journals (Sweden)

    Meuwissen Theo HE

    2007-02-01

    Full Text Available Abstract Two random regression models, where the effect of a putative QTL was regressed on an environmental gradient, are described. The first model estimates the correlation between intercept and slope of the random regression, while the other model restricts this correlation to 1 or -1, which is expected under a bi-allelic QTL model. The random regression models were compared to a model assuming no gene by environment interactions. The comparison was done with regard to the models' ability to detect QTL, to position them accurately and to detect possible QTL by environment interactions. A simulation study based on a granddaughter design was conducted, and QTL were assumed, either by assigning an effect independent of the environment or as a linear function of a simulated environmental gradient. It was concluded that the random regression models were suitable for detection of QTL effects, in the presence and absence of interactions with environmental gradients. Fixing the correlation between intercept and slope of the random regression had a positive effect on power when the QTL effects re-ranked between environments.

  9. Alternative regression models to assess increase in childhood BMI

    Directory of Open Access Journals (Sweden)

    Mansmann Ulrich

    2008-09-01

    Full Text Available Abstract Background Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.

  10. The microcomputer scientific software series 2: general linear model--regression.

    Science.gov (United States)

    Harold M. Rauscher

    1983-01-01

    The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...

  11. Wavelet regression model in forecasting crude oil price

    Science.gov (United States)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), the autoregressive integrated moving average (ARIMA) model and generalized autoregressive conditional heteroscedasticity (GARCH) using root mean square error (RMSE) and mean absolute error (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting techniques tested in this study.
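    A one-level Haar transform illustrates the decomposition step (a simplified stand-in for the DWT used in the study; real WMLR work would typically use a wavelet library and deeper decomposition levels):

```python
import numpy as np

def haar_level1(x):
    """One-level Haar DWT: approximation and detail coefficients."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation (smooth) part
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail (fluctuation) part
    return a, d

def haar_inverse(a, d):
    """Exact inverse of the one-level Haar transform."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

rng = np.random.default_rng(4)
price = np.cumsum(rng.normal(size=512))   # stand-in for a crude oil price series
a, d = haar_level1(price)
print(np.allclose(haar_inverse(a, d), price))  # True: the transform is invertible
# In a WMLR-style model, lagged a and d sub-series (and deeper levels)
# would enter an ordinary multiple linear regression as separate predictors.
```

    Because the transform is orthonormal, no information is lost in the decomposition; the regression simply sees the smooth and fluctuating components as separate inputs.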

  12. Predicting and Modelling of Survival Data when Cox's Regression Model does not hold

    DEFF Research Database (Denmark)

    Scheike, Thomas H.; Zhang, Mei-Jie

    2002-01-01

    Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects

  13. Spatial stochastic regression modelling of urban land use

    International Nuclear Information System (INIS)

    Arshad, S H M; Jaafar, J; Abiden, M Z Z; Latif, Z A; Rasam, A R A

    2014-01-01

    Urbanization is very closely linked to industrialization, commercialization and overall economic growth and development. It brings innumerable benefits for the quantity and quality of the urban environment and lifestyle, but on the other hand it contributes to unbounded development, urban sprawl, overcrowding and a decreasing standard of living. Regulation and observation of urban development activities is crucial. An understanding of the urban systems that promote urban growth is also essential for the purpose of policy making, formulating development strategies and development plan preparation. This study aims to compare two different stochastic regression modeling techniques for spatial structure models of urban growth in the same specific study area. Both techniques utilize the same datasets and their results are analyzed. The work starts by producing an urban growth model using two stochastic regression modeling techniques, namely Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR). Comparing the two techniques, GWR is found to be the more significant stochastic regression model: it gives a smaller AICc (corrected Akaike Information Criterion) value and its output is more spatially explainable

  14. Physics constrained nonlinear regression models for time series

    International Nuclear Information System (INIS)

    Majda, Andrew J; Harlim, John

    2013-01-01

    A central issue in contemporary science is the development of data driven statistical nonlinear dynamical models for time series of partial observations of nature or a complex physical model. It has been established recently that ad hoc quadratic multi-level regression (MLR) models can have finite-time blow up of statistical solutions and/or pathological behaviour of their invariant measure. Here a new class of physics constrained multi-level quadratic regression models are introduced, analysed and applied to build reduced stochastic models from data of nonlinear systems. These models have the advantages of incorporating memory effects in time as well as the nonlinear noise from energy conserving nonlinear interactions. The mathematical guidelines for the performance and behaviour of these physics constrained MLR models as well as filtering algorithms for their implementation are developed here. Data driven applications of these new multi-level nonlinear regression models are developed for test models involving a nonlinear oscillator with memory effects and the difficult test case of the truncated Burgers–Hopf model. These new physics constrained quadratic MLR models are proposed here as process models for Bayesian estimation through Markov chain Monte Carlo algorithms of low frequency behaviour in complex physical data. (paper)

  15. Logistic regression modelling: procedures and pitfalls in developing and interpreting prediction models

    Directory of Open Access Journals (Sweden)

    Nataša Šarlija

    2017-01-01

    Full Text Available This study sheds light on the most common issues related to applying logistic regression in prediction models for company growth. The purpose of the paper is (1) to provide a detailed demonstration of the steps in developing a growth prediction model based on logistic regression analysis, (2) to discuss common pitfalls and methodological errors in developing a model, and (3) to provide solutions and possible ways of overcoming these issues. Special attention is devoted to the questions of satisfying logistic regression assumptions, selecting and defining dependent and independent variables, using classification tables and ROC curves for reporting model strength, interpreting odds ratios as effect measures and evaluating performance of the prediction model. Development of a logistic regression model in this paper focuses on a prediction model of company growth. The analysis is based on predominantly financial data from a sample of 1471 small and medium-sized Croatian companies active between 2009 and 2014. The financial data is presented in the form of financial ratios divided into nine main groups depicting the following areas of business: liquidity, leverage, activity, profitability, research and development, investing and export. The growth prediction model indicates aspects of a business critical for achieving high growth. In that respect, the contribution of this paper is twofold: first, methodological, in terms of pointing out pitfalls and potential solutions in logistic regression modelling, and second, theoretical, in terms of identifying factors responsible for the high growth of small and medium-sized companies.
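    The fitting and odds-ratio interpretation discussed above can be sketched with a self-contained Newton-Raphson (IRLS) logistic fit on simulated data; the variable name and coefficients below are invented for the example, not taken from the study:

```python
import numpy as np

def logit_fit(X, y, iters=25):
    """Logistic regression by Newton-Raphson (iteratively reweighted LS)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)                     # IRLS weights
        grad = X.T @ (y - p)
        H = (X * W[:, None]).T @ X          # observed information
        beta += np.linalg.solve(H, grad)
    return beta

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)                      # e.g. a standardised liquidity ratio
p = 1 / (1 + np.exp(-(-1.0 + 0.8 * x)))     # true model: intercept -1, slope 0.8
y = (rng.uniform(size=n) < p).astype(float)

beta = logit_fit(x[:, None], y)
print(beta, np.exp(beta[1]))  # exp(slope) is the odds ratio per unit increase
```

    The exponentiated slope is the odds ratio: here each one-unit increase in the (hypothetical) ratio multiplies the odds of growth by roughly exp(0.8) ≈ 2.2.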

  16. Regression Models for Repairable Systems

    Czech Academy of Sciences Publication Activity Database

    Novák, Petr

    2015-01-01

    Roč. 17, č. 4 (2015), s. 963-972 ISSN 1387-5841 Institutional support: RVO:67985556 Keywords : Reliability analysis * Repair models * Regression Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.782, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/novak-0450902.pdf

  17. Hybrid data mining-regression for infrastructure risk assessment based on zero-inflated data

    International Nuclear Information System (INIS)

    Guikema, S.D.; Quiring, S.M.

    2012-01-01

    Infrastructure disaster risk assessment seeks to estimate the probability of a given customer or area losing service during a disaster, sometimes in conjunction with estimating the duration of each outage. This is often done on the basis of past data about the effects of similar events impacting the same or similar systems. In many situations this past performance data from infrastructure systems is zero-inflated; it has more zeros than can be appropriately modeled with standard probability distributions. The data are also often non-linear and exhibit threshold effects due to the complexities of infrastructure system performance. Standard zero-inflated statistical models such as zero-inflated Poisson and zero-inflated negative binomial regression models do not adequately capture these complexities. In this paper we develop a novel method that is a hybrid classification tree/regression method for complex, zero-inflated data sets. We investigate its predictive accuracy based on a large number of simulated data sets and then demonstrate its practical usefulness with an application to hurricane power outage risk assessment for a large utility based on actual data from the utility. While formulated for infrastructure disaster risk assessment, this method is promising for data-driven analysis for other situations with zero-inflated, complex data exhibiting response thresholds.
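    The hybrid idea, classify first, then regress where activity is predicted, can be sketched on synthetic zero-inflated data with a threshold effect. This is a stylised toy (a single depth-1 stump plus OLS), not the paper's classification tree/regression method:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4000
wind = rng.uniform(0, 30, n)   # hypothetical peak gust speed (m/s)
# Threshold effect: essentially no outages below ~15 m/s (zero inflation),
# roughly linear growth above it.
outages = np.where(wind > 15, 2.0 * (wind - 15) + rng.normal(scale=1.0, size=n), 0.0)
outages = np.maximum(outages, 0.0)

# Stage 1: a depth-1 "classification tree" (stump) separating zero vs non-zero.
def best_stump(x, is_pos):
    cuts = np.linspace(x.min(), x.max(), 200)
    errs = [np.mean((x > c) != is_pos) for c in cuts]
    return cuts[int(np.argmin(errs))]

cut = best_stump(wind, outages > 0)

# Stage 2: ordinary regression fitted only where the stump predicts activity.
mask = wind > cut
A = np.column_stack([np.ones(mask.sum()), wind[mask]])
coef, *_ = np.linalg.lstsq(A, outages[mask], rcond=None)
print(cut, coef)  # recovered threshold near 15; slope near 2
```

    A plain linear (or even zero-inflated Poisson) fit to all of the data would smear the threshold; the two-stage structure is what lets tree-based hybrids capture it.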

  18. Binomial confidence intervals for testing non-inferiority or superiority: a practitioner's dilemma.

    Science.gov (United States)

    Pradhan, Vivek; Evans, John C; Banerjee, Tathagata

    2016-08-01

    In testing for non-inferiority or superiority in a single arm study, the confidence interval of a single binomial proportion is frequently used. A number of such intervals are proposed in the literature and implemented in standard software packages. Unfortunately, use of different intervals leads to conflicting conclusions. Practitioners thus face a serious dilemma in deciding which one to depend on. Is there a way to resolve this dilemma? We address this question by investigating the performances of ten commonly used intervals of a single binomial proportion, in the light of two criteria, viz., coverage and expected length of the interval. © The Author(s) 2013.
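    The coverage criterion is easy to probe by simulation. The sketch below compares two of the commonly used intervals, Wald and Wilson (the formulas are standard; the parameter choices are illustrative of a small-p regime where the intervals disagree most):

```python
import math
import random

def wald_ci(x, n, z=1.96):
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def wilson_ci(x, n, z=1.96):
    p = x / n
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / (1 + z * z / n)
    return centre - half, centre + half

def coverage(ci, p, n, reps=20000, seed=7):
    """Monte Carlo estimate of P(true p falls inside the interval)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))
        lo, hi = ci(x, n)
        hits += lo <= p <= hi
    return hits / reps

p, n = 0.05, 40
c_wald = coverage(wald_ci, p, n)
c_wilson = coverage(wilson_ci, p, n)
print(c_wald, c_wilson)  # Wald under-covers badly here; Wilson stays near 0.95
```

    Exactly this kind of discrepancy is what creates the practitioner's dilemma the paper examines: the same data can pass a non-inferiority margin under one interval and fail under another.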

  19. Geographically weighted regression model on poverty indicator

    Science.gov (United States)

    Slamet, I.; Nugroho, N. F. T. A.; Muslich

    2017-12-01

In this research, we applied geographically weighted regression (GWR) to analyze poverty in Central Java, using a Gaussian kernel as the weighting function. The GWR uses a diagonal weight matrix, obtained by evaluating the Gaussian kernel function, in the regression model. The kernel weights handle spatial effects in the data so that a model can be obtained for each location. The purpose of this paper is to model poverty percentage data in Central Java province using GWR with a Gaussian kernel weighting function and to determine the influencing factors in each regency/city in the province. We obtained a geographically weighted regression model with a Gaussian kernel weighting function for the poverty percentage data and found that the percentage of the population working as farmers, the population growth rate, the percentage of households with regular sanitation, and the number of BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. The coefficient of determination R² is 68.64%. The regencies/cities fall into two categories, each influenced by a different set of significant factors.
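
The mechanics of GWR with a Gaussian kernel reduce to a weighted least-squares fit at each location. The sketch below uses a simulated 1-D "geography" with a spatially drifting coefficient (not the Central Java data); the kernel bandwidth and all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D geography: the slope on the predictor drifts with location s,
# mimicking spatially varying poverty drivers.
n = 200
s = rng.uniform(0, 1, n)             # district locations
x = rng.normal(0, 1, n)              # predictor (e.g., % farmers)
beta_true = 1.0 + 2.0 * s            # spatially varying coefficient
y = beta_true * x + rng.normal(0, 0.2, n)

def gwr_slope(s0, bandwidth=0.15):
    """Local weighted least squares at location s0 with a Gaussian kernel."""
    w = np.exp(-0.5 * ((s - s0) / bandwidth) ** 2)  # diagonal weight matrix
    X = np.column_stack([np.ones(n), x])
    XtW = X.T * w                     # equivalent to X.T @ diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[1]

# The fitted local slope tracks the true spatially varying coefficient.
b_low, b_high = gwr_slope(0.1), gwr_slope(0.9)
```

Fitting at two ends of the geography recovers clearly different slopes, which is how GWR produces a separate model for each location.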

  20. On a Robust MaxEnt Process Regression Model with Sample-Selection

    Directory of Open Access Journals (Sweden)

    Hea-Jung Kim

    2018-04-01

    Full Text Available In a regression analysis, a sample-selection bias arises when a dependent variable is partially observed as a result of the sample selection. This study introduces a Maximum Entropy (MaxEnt process regression model that assumes a MaxEnt prior distribution for its nonparametric regression function and finds that the MaxEnt process regression model includes the well-known Gaussian process regression (GPR model as a special case. Then, this special MaxEnt process regression model, i.e., the GPR model, is generalized to obtain a robust sample-selection Gaussian process regression (RSGPR model that deals with non-normal data in the sample selection. Various properties of the RSGPR model are established, including the stochastic representation, distributional hierarchy, and magnitude of the sample-selection bias. These properties are used in the paper to develop a hierarchical Bayesian methodology to estimate the model. This involves a simple and computationally feasible Markov chain Monte Carlo algorithm that avoids analytical or numerical derivatives of the log-likelihood function of the model. The performance of the RSGPR model in terms of the sample-selection bias correction, robustness to non-normality, and prediction, is demonstrated through results in simulations that attest to its good finite-sample performance.
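
For the GPR special case noted above, the posterior mean is a single linear solve against the kernel matrix. This is a minimal sketch of plain GPR with an RBF kernel, not the full RSGPR model; the kernel choice and data are illustrative.

```python
import numpy as np

# Posterior mean of near-noiseless GPR with an RBF kernel.
def rbf(a, b, ell=1.0, sig=1.0):
    d = a[:, None] - b[None, :]
    return sig**2 * np.exp(-0.5 * (d / ell) ** 2)

x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_test = np.array([0.5])

K = rbf(x_train, x_train) + 1e-8 * np.eye(len(x_train))  # jitter for stability
k_star = rbf(x_test, x_train)
post_mean = k_star @ np.linalg.solve(K, y_train)
# post_mean[0] closely tracks sin(0.5) between the training points.
```

The sample-selection and robustness machinery of the RSGPR model is built on top of exactly this kind of kernel computation.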

  1. On concurvity in nonlinear and nonparametric regression models

    Directory of Open Access Journals (Sweden)

    Sonia Amodio

    2014-12-01

Full Text Available When data are affected by multicollinearity in the linear regression framework, concurvity will be present in fitting a generalized additive model (GAM). The term concurvity describes nonlinear dependencies among the predictor variables. Just as collinearity inflates the variance of the estimated regression coefficients in linear regression, the presence of concurvity leads to instability of the estimated coefficients in GAMs. Although the backfitting algorithm always converges to a solution, in the case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern, as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing a type I error. We compare the existing approaches to detecting concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper provides a general criterion to detect concurvity in nonlinear and nonparametric regression models.

  2. Difference of Sums Containing Products of Binomial Coefficients and Their Logarithms

    National Research Council Canada - National Science Library

    Miller, Allen R; Moskowitz, Ira S

    2005-01-01

    Properties of the difference of two sums containing products of binomial coefficients and their logarithms which arise in the application of Shannon's information theory to a certain class of covert channels are deduced...

  3. Difference of Sums Containing Products of Binomial Coefficients and their Logarithms

    National Research Council Canada - National Science Library

    Miller, Allen R; Moskowitz, Ira S

    2004-01-01

    Properties of the difference of two sums containing products of binomial coefficients and their logarithms which arise in the application of Shannon's information theory to a certain class of covert channels are deduced...

  4. Semiparametric Mixtures of Regressions with Single-index for Model Based Clustering

    OpenAIRE

    Xiang, Sijia; Yao, Weixin

    2017-01-01

    In this article, we propose two classes of semiparametric mixture regression models with single-index for model based clustering. Unlike many semiparametric/nonparametric mixture regression models that can only be applied to low dimensional predictors, the new semiparametric models can easily incorporate high dimensional predictors into the nonparametric components. The proposed models are very general, and many of the recently proposed semiparametric/nonparametric mixture regression models a...

  5. Use of a negative binomial distribution to describe the presence of Sphyrion laevigatum in Genypterus blacodes

    Directory of Open Access Journals (Sweden)

    Patricio Peña-Rehbein

    Full Text Available This paper describes the frequency and number of Sphyrion laevigatum in the skin of Genypterus blacodes, an important economic resource in Chile. The analysis of a spatial distribution model indicated that the parasites tended to cluster. Variations in the number of parasites per host could be described by a negative binomial distribution. The maximum number of parasites observed per host was two.
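
The clustering diagnosis in this record rests on the variance exceeding the mean. A method-of-moments sketch, using illustrative counts (not the actual S. laevigatum data) that respect the observed maximum of two parasites per host:

```python
# Method-of-moments check for over-dispersion and a negative binomial fit.
counts = [0] * 70 + [1] * 15 + [2] * 15   # illustrative parasite counts

n = len(counts)
mean = sum(counts) / n
var = sum((c - mean) ** 2 for c in counts) / (n - 1)

# Poisson would force var == mean; var > mean indicates clustering.
# Matching NB(r, p) moments (mean = r(1-p)/p, var = r(1-p)/p^2) gives:
r = mean * mean / (var - mean)
p = mean / var
```

Here the sample variance exceeds the mean, so the Poisson is rejected in favor of a negative binomial, mirroring the paper's conclusion.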

  6. Short-term electricity prices forecasting based on support vector regression and Auto-regressive integrated moving average modeling

    International Nuclear Information System (INIS)

    Che Jinxing; Wang Jianzhou

    2010-01-01

In this paper, we present the use of different mathematical models to forecast electricity prices in deregulated power markets. A successful prediction tool for electricity prices can help both power producers and consumers plan their bidding strategies. Motivated by the fact that the support vector regression (SVR) model, with its ε-insensitive loss function, tolerates residuals within the boundary values of the ε-tube, we propose a hybrid model, called SVRARIMA, that combines SVR and auto-regressive integrated moving average (ARIMA) models to exploit their respective strengths in nonlinear and linear modeling. A nonlinear analysis of the time series indicates the suitability of nonlinear modeling, so SVR is applied to capture the nonlinear patterns, while ARIMA models, which have been successfully applied to residual regression estimation, model the remaining linear structure. The experimental results demonstrate that the proposed model outperforms existing neural-network approaches, traditional ARIMA models, and other hybrid models in terms of root mean square error and mean absolute percentage error.

  7. Modeling oil production based on symbolic regression

    International Nuclear Information System (INIS)

    Yang, Guangfei; Li, Xianneng; Wang, Jianliang; Lian, Lian; Ma, Tieju

    2015-01-01

    Numerous models have been proposed to forecast the future trends of oil production and almost all of them are based on some predefined assumptions with various uncertainties. In this study, we propose a novel data-driven approach that uses symbolic regression to model oil production. We validate our approach on both synthetic and real data, and the results prove that symbolic regression could effectively identify the true models beneath the oil production data and also make reliable predictions. Symbolic regression indicates that world oil production will peak in 2021, which broadly agrees with other techniques used by researchers. Our results also show that the rate of decline after the peak is almost half the rate of increase before the peak, and it takes nearly 12 years to drop 4% from the peak. These predictions are more optimistic than those in several other reports, and the smoother decline will provide the world, especially the developing countries, with more time to orchestrate mitigation plans. -- Highlights: •A data-driven approach has been shown to be effective at modeling the oil production. •The Hubbert model could be discovered automatically from data. •The peak of world oil production is predicted to appear in 2021. •The decline rate after peak is half of the increase rate before peak. •Oil production projected to decline 4% post-peak

  8. Generation of the reciprocal-binomial state for optical fields

    International Nuclear Information System (INIS)

    Valverde, C.; Avelar, A.T.; Baseia, B.; Malbouisson, J.M.C.

    2003-01-01

    We compare the efficiencies of two interesting schemes to generate truncated states of the light field in running modes, namely the 'quantum scissors' and the 'beam-splitter array' schemes. The latter is applied to create the reciprocal-binomial state as a travelling wave, required to implement recent experimental proposals of phase-distribution determination and of quantum lithography

  9. A binomial modeling approach for upscaling colloid transport under unfavorable conditions: emergent prediction of extended tailing

    Science.gov (United States)

    Hilpert, Markus; Rasmuson, Anna; Johnson, William

    2017-04-01

    Transport of colloids in saturated porous media is significantly influenced by colloidal interactions with grain surfaces. Near-surface fluid domain colloids experience relatively low fluid drag and relatively strong colloidal forces that slow their down-gradient translation relative to colloids in bulk fluid. Near surface fluid domain colloids may re-enter into the bulk fluid via diffusion (nanoparticles) or expulsion at rear flow stagnation zones, they may immobilize (attach) via strong primary minimum interactions, or they may move along a grain-to-grain contact to the near surface fluid domain of an adjacent grain. We introduce a simple model that accounts for all possible permutations of mass transfer within a dual pore and grain network. The primary phenomena thereby represented in the model are mass transfer of colloids between the bulk and near-surface fluid domains and immobilization onto grain surfaces. Colloid movement is described by a sequence of trials in a series of unit cells, and the binomial distribution is used to calculate the probabilities of each possible sequence. Pore-scale simulations provide mechanistically-determined likelihoods and timescales associated with the above pore-scale colloid mass transfer processes, whereas the network-scale model employs pore and grain topology to determine probabilities of transfer from up-gradient bulk and near-surface fluid domains to down-gradient bulk and near-surface fluid domains. Inter-grain transport of colloids in the near surface fluid domain can cause extended tailing.

  10. Model performance analysis and model validation in logistic regression

    Directory of Open Access Journals (Sweden)

    Rosa Arboretti Giancristofaro

    2007-10-01

Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. First, we give a brief review of different model validation techniques. Next, we define a number of properties required for a model to be considered "good", along with a number of quantitative performance measures. Lastly, we describe a methodology for assessing the performance of a given model using an example taken from a management study.

  11. A Bayesian non-inferiority test for two independent binomial proportions.

    Science.gov (United States)

    Kawasaki, Yohei; Miyaoka, Etsuo

    2013-01-01

In drug development, non-inferiority tests are often employed to determine the difference between two independent binomial proportions. Many test statistics for non-inferiority are based on the frequentist framework. However, research on non-inferiority in the Bayesian framework is limited. In this paper, we suggest a new Bayesian index τ = P(π₁ > π₂ − Δ₀ | X₁, X₂), where X₁ and X₂ denote binomial random variables with numbers of trials n₁ and n₂ and parameters π₁ and π₂, respectively, and the non-inferiority margin is Δ₀ > 0. We show two calculation methods for τ, an approximate method that uses a normal approximation and an exact method that uses the exact posterior PDF. We compare the approximate probability with the exact probability for τ. Finally, we present the results of actual clinical trials to show the utility of index τ. Copyright © 2013 John Wiley & Sons, Ltd.
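
An index of this form is straightforward to approximate by Monte Carlo: under independent uniform Beta(1, 1) priors the posteriors are Beta distributions, and τ is the fraction of joint draws satisfying the non-inferiority inequality. The trial counts and margin below are illustrative, not the paper's examples.

```python
import random

random.seed(1)

# Monte Carlo sketch of tau = P(pi1 > pi2 - Delta0 | X1, X2)
# using independent Beta posteriors under Beta(1, 1) priors.
x1, n1 = 40, 50    # successes / trials, test treatment
x2, n2 = 42, 50    # successes / trials, reference treatment
delta0 = 0.10      # non-inferiority margin

draws = 100_000
hits = 0
for _ in range(draws):
    p1 = random.betavariate(1 + x1, 1 + n1 - x1)
    p2 = random.betavariate(1 + x2, 1 + n2 - x2)
    hits += p1 > p2 - delta0
tau = hits / draws
# Non-inferiority would be declared when tau exceeds a threshold such as 0.95.
```

This sampling route sits between the paper's two analytic options: cruder than the exact posterior PDF, but free of the normal approximation.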

  12. Model building strategy for logistic regression: purposeful selection.

    Science.gov (United States)

    Zhang, Zhongheng

    2016-03-01

Logistic regression is one of the models most commonly used to account for confounders in the medical literature. This article introduces how to perform the purposeful selection model building strategy with R. I stress the use of the likelihood ratio test to see whether deleting a variable has a significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of the remaining covariates. Interactions should be checked to disentangle complex relationships between covariates and their synergistic effects on the response variable. The model should be checked for goodness-of-fit (GOF), in other words, how well the fitted model reflects the real data. The Hosmer-Lemeshow GOF test is the most widely used for logistic regression models.
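
The likelihood ratio test at the heart of purposeful selection can be sketched from scratch: fit the full and reduced logistic models, compare maximized log-likelihoods, and refer twice the difference to a chi-square distribution. The data are simulated (the abstract's workflow uses R; this is an illustrative hand-rolled version).

```python
import math
import numpy as np

rng = np.random.default_rng(7)

# Simulated data: the outcome depends on x1 but not x2, so the likelihood
# ratio test should find that deleting x2 barely changes model fit.
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
prob = 1 / (1 + np.exp(-(-0.5 + 1.2 * x1)))
y = (rng.uniform(size=n) < prob).astype(float)

def max_loglik(X):
    """Fit a logistic regression by Newton-Raphson; return max log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(25):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    p = 1 / (1 + np.exp(-X @ beta))
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

ones = np.ones(n)
ll_full = max_loglik(np.column_stack([ones, x1, x2]))
ll_reduced = max_loglik(np.column_stack([ones, x1]))

lr = max(0.0, 2 * (ll_full - ll_reduced))     # LR statistic, 1 df
p_value = math.erfc(math.sqrt(lr / 2))        # chi-square(1) tail area
# A large p_value supports deleting x2 from the model.
```

In R the same comparison is a one-liner with `anova(reduced, full, test = "LRT")`; the sketch just makes the nested-likelihood arithmetic explicit.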

  13. The APT model as reduced-rank regression

    NARCIS (Netherlands)

    Bekker, P.A.; Dobbelstein, P.; Wansbeek, T.J.

    Integrating the two steps of an arbitrage pricing theory (APT) model leads to a reduced-rank regression (RRR) model. So the results on RRR can be used to estimate APT models, making estimation very simple. We give a succinct derivation of estimation of RRR, derive the asymptotic variance of RRR

  14. Modelling subject-specific childhood growth using linear mixed-effect models with cubic regression splines.

    Science.gov (United States)

    Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William

    2016-01-01

Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compare cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 when using linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept and the slope across children; residual autocorrelation was modeled with a first order continuous autoregressive error term, as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19
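
The fixed-effects backbone of such a model is a cubic spline basis. Below is a minimal truncated-power-basis sketch fit by least squares; the knot locations and the toy growth curve are illustrative, not the Peruvian cohort data, and a mixed-effect model would add subject-specific random intercepts and slopes on top of this basis.

```python
import numpy as np

# Truncated-power cubic spline basis: 1, x, x^2, x^3, plus (x - k)^3_+ per knot.
def cubic_spline_basis(x, knots):
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

age = np.linspace(0, 4, 50)                  # years
height = 50 + 20 * np.log1p(age)             # smooth toy growth curve (cm)

X = cubic_spline_basis(age, knots=[1.0, 2.0])
coef, *_ = np.linalg.lstsq(X, height, rcond=None)
max_err = float(np.max(np.abs(X @ coef - height)))
# The spline fit tracks the nonlinear growth curve closely.
```

Growth velocity and acceleration then follow by differentiating the fitted basis, which is why cubic (rather than linear piecewise) splines are attractive here.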

  15. Influence diagnostics in meta-regression model.

    Science.gov (United States)

    Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua

    2017-09-01

    This paper studies the influence diagnostics in meta-regression model including case deletion diagnostic and local influence analysis. We derive the subset deletion formulae for the estimation of regression coefficient and heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are considered, respectively, to derive the results. Internal and external residual and leverage measure are defined. The local influence analysis based on case-weights perturbation scheme, responses perturbation scheme, covariate perturbation scheme, and within-variance perturbation scheme are explored. We introduce a method by simultaneous perturbing responses, covariate, and within-variance to obtain the local influence measure, which has an advantage of capable to compare the influence magnitude of influential studies from different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.
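
Since the paper derives its diagnostics for the DerSimonian and Laird estimation, it may help to see that estimator concretely. A sketch of the DL between-study variance and the random-effects pooled estimate, with illustrative effect sizes and variances:

```python
# DerSimonian-Laird estimate of the between-study variance tau^2 in a
# random-effects meta-analysis; all study values are illustrative.
effects = [0.1, 0.5, 0.9, 0.2, 0.7]
variances = [0.02, 0.03, 0.02, 0.04, 0.03]   # within-study variances
k = len(effects)

w = [1.0 / v for v in variances]
mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
Q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))
c = sum(w) - sum(wi * wi for wi in w) / sum(w)
tau2 = max(0.0, (Q - (k - 1)) / c)

# Random-effects weights and pooled estimate:
w_re = [1.0 / (v + tau2) for v in variances]
mu_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
```

Case-deletion diagnostics of the kind studied in the paper recompute tau2 and mu_re leaving out one study at a time and measure how much they move.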

  16. Stability of a Model Explaining Selected Extramusical Influences on Solo and Small-Ensemble Festival Ratings

    Science.gov (United States)

    Bergee, Martin J.; Westfall, Claude R.

    2005-01-01

This is the third study in a line of inquiry whose purpose has been to develop a theoretical model of selected extramusical variables' influence on solo and small-ensemble festival ratings. Authors of the second of these (Bergee & McWhirter, 2005) had used binomial logistic regression as the basis for their model-formulation strategy. Their…

  17. Logistic Regression Modeling of Diminishing Manufacturing Sources for Integrated Circuits

    National Research Council Canada - National Science Library

    Gravier, Michael

    1999-01-01

    .... The research identified logistic regression as a powerful tool for analysis of DMSMS and further developed twenty models attempting to identify the "best" way to model and predict DMSMS using logistic regression...

  18. Binomial probability distribution model-based protein identification algorithm for tandem mass spectrometry utilizing peak intensity information.

    Science.gov (United States)

    Xiao, Chuan-Le; Chen, Xiao-Zhou; Du, Yang-Li; Sun, Xuesong; Zhang, Gong; He, Qing-Yu

    2013-01-04

    Mass spectrometry has become one of the most important technologies in proteomic analysis. Tandem mass spectrometry (LC-MS/MS) is a major tool for the analysis of peptide mixtures from protein samples. The key step of MS data processing is the identification of peptides from experimental spectra by searching public sequence databases. Although a number of algorithms to identify peptides from MS/MS data have been already proposed, e.g. Sequest, OMSSA, X!Tandem, Mascot, etc., they are mainly based on statistical models considering only peak-matches between experimental and theoretical spectra, but not peak intensity information. Moreover, different algorithms gave different results from the same MS data, implying their probable incompleteness and questionable reproducibility. We developed a novel peptide identification algorithm, ProVerB, based on a binomial probability distribution model of protein tandem mass spectrometry combined with a new scoring function, making full use of peak intensity information and, thus, enhancing the ability of identification. Compared with Mascot, Sequest, and SQID, ProVerB identified significantly more peptides from LC-MS/MS data sets than the current algorithms at 1% False Discovery Rate (FDR) and provided more confident peptide identifications. ProVerB is also compatible with various platforms and experimental data sets, showing its robustness and versatility. The open-source program ProVerB is available at http://bioinformatics.jnu.edu.cn/software/proverb/ .
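
The binomial idea behind such scoring can be sketched simply: if a random peptide matches any given fragment peak with probability p, then observing k or more matches out of n peaks has a binomial tail probability, and smaller tails mean more confident identifications. This is illustrative of the general principle, not ProVerB's actual scoring function, and all parameter values are assumptions.

```python
import math

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

p_match, n_peaks = 0.05, 40   # assumed per-peak random match probability
chance_id = binom_tail(n_peaks, 4, p_match)    # few matches: plausible by chance
strong_id = binom_tail(n_peaks, 15, p_match)   # many matches: essentially never random
score = -math.log10(strong_id)                 # larger score = more confident
```

Weighting each match by peak intensity, as ProVerB does, refines this basic tail-probability idea rather than replacing it.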

  19. AIRLINE ACTIVITY FORECASTING BY REGRESSION MODELS

    Directory of Open Access Journals (Sweden)

    Н. Білак

    2012-04-01

Full Text Available We propose linear and nonlinear regression models that incorporate a trend equation and seasonality indices to analyze and reconstruct passenger traffic volumes over past periods and to predict them for future years, together with an algorithm for forming these models based on statistical analysis of data over several years. The resulting model is the first step toward the synthesis of more complex models that will enable forecasting of an airline's passenger (and income) levels with greater accuracy and timeliness.

  20. A semiparametric negative binomial generalized linear model for modeling over-dispersed count data with a heavy tail: Characteristics and applications to crash data.

    Science.gov (United States)

    Shirazi, Mohammadali; Lord, Dominique; Dhavala, Soma Sekhar; Geedipally, Srinivas Reddy

    2016-06-01

    Crash data can often be characterized by over-dispersion, heavy (long) tail and many observations with the value zero. Over the last few years, a small number of researchers have started developing and applying novel and innovative multi-parameter models to analyze such data. These multi-parameter models have been proposed for overcoming the limitations of the traditional negative binomial (NB) model, which cannot handle this kind of data efficiently. The research documented in this paper continues the work related to multi-parameter models. The objective of this paper is to document the development and application of a flexible NB generalized linear model with randomly distributed mixed effects characterized by the Dirichlet process (NB-DP) to model crash data. The objective of the study was accomplished using two datasets. The new model was compared to the NB and the recently introduced model based on the mixture of the NB and Lindley (NB-L) distributions. Overall, the research study shows that the NB-DP model offers a better performance than the NB model once data are over-dispersed and have a heavy tail. The NB-DP performed better than the NB-L when the dataset has a heavy tail, but a smaller percentage of zeros. However, both models performed similarly when the dataset contained a large amount of zeros. In addition to a greater flexibility, the NB-DP provides a clustering by-product that allows the safety analyst to better understand the characteristics of the data, such as the identification of outliers and sources of dispersion. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. Variable Selection for Regression Models of Percentile Flows

    Science.gov (United States)

    Fouad, G.

    2017-12-01

    Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. Variables suffered from a high

  2. Prediction of province-level outbreaks of foot-and-mouth disease in Iran using a zero-inflated negative binomial model.

    Science.gov (United States)

    Jafarzadeh, S Reza; Norris, Michelle; Thurmond, Mark C

    2014-08-01

    To identify events that could predict province-level frequency of foot-and-mouth disease (FMD) outbreaks in Iran, 5707 outbreaks reported from April 1995 to March 2002 were studied. A zero-inflated negative binomial model was used to estimate the probability of a 'no-outbreak' status and the number of outbreaks in a province, using the number of previous occurrences of FMD for the same or adjacent provinces and season as covariates. For each province, the probability of observing no outbreak was negatively associated with the number of outbreaks in the same province in the previous month (odds ratio [OR]=0.06, 95% confidence interval [CI]: 0.01, 0.30) and in 'the second previous month' (OR=0.10, 95% CI: 0.02, 0.51), the total number of outbreaks in the second previous month in adjacent provinces (OR=0.57, 95% CI: 0.36, 0.91) and the season (winter [OR=0.18, 95% CI: 0.06, 0.55] and spring [OR=0.27, 95% CI: 0.09, 0.81], compared with summer). The expected number of outbreaks in a province was positively associated with number of outbreaks in the same province in previous month (coefficient [coef]=0.74, 95% CI: 0.66, 0.82) and in the second previous month (coef=0.23, 95% CI: 0.16, 0.31), total number of outbreaks in adjacent provinces in the previous month (coef=0.32, 95% CI: 0.22, 0.41) and season (fall [coef=0.20, 95% CI: 0.07, 0.33] and spring [coef=0.18, 95% CI: 0.05, 0.31], compared to summer); however, number of outbreaks was negatively associated with the total number of outbreaks in adjacent provinces in the second previous month (coef=-0.19, 95% CI: -0.28, -0.09). The findings indicate that the probability of an outbreak (and the expected number of outbreaks if any) may be predicted based on previous province information, which could help decision-makers allocate resources more efficiently for province-level disease control measures. Further, the study illustrates use of zero inflated negative binomial model to study diseases occurrence where disease is
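
The two-part structure of the zero-inflated negative binomial model used here can be written out directly: with probability π the province is in the structural "no-outbreak" state, otherwise counts follow a negative binomial. Parameter values below are illustrative, not the fitted Iranian FMD values.

```python
import math

def nb_pmf(k, r, p):
    """NB pmf: k failures before the r-th success, success probability p."""
    return math.comb(k + r - 1, k) * p**r * (1 - p) ** k

def zinb_pmf(k, pi, r, p):
    base = (1 - pi) * nb_pmf(k, r, p)
    return pi + base if k == 0 else base

pi, r, p = 0.3, 2, 0.5
p0 = zinb_pmf(0, pi, r, p)       # inflated zero probability
p0_nb = nb_pmf(0, r, p)          # zero probability without inflation
total = sum(zinb_pmf(k, pi, r, p) for k in range(200))  # sums to ~1
```

In the study, covariates such as previous-month outbreak counts enter both parts: through the odds of the structural-zero state and through the expected NB count.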

  3. An examination of sources of sensitivity of consumer surplus estimates in travel cost models.

    Science.gov (United States)

    Blaine, Thomas W; Lichtkoppler, Frank R; Bader, Timothy J; Hartman, Travis J; Lucente, Joseph E

    2015-03-15

We examine sensitivity of estimates of recreation demand using the Travel Cost Method (TCM) to four factors. Three of the four have been routinely and widely discussed in the TCM literature: a) Poisson versus negative binomial regression; b) application of the Englin correction to account for endogenous stratification; c) truncation of the data set to eliminate outliers. A fourth issue we address has not been widely modeled: the potential effect on recreation demand of the interaction between income and travel cost. We provide a straightforward comparison of all four factors, analyzing the impact of each on regression parameters and consumer surplus estimates. Truncation has a modest effect on estimates obtained from the Poisson models but a radical effect on the estimates obtained by way of the negative binomial. Inclusion of an income-travel cost interaction term generally produces a more conservative but not a statistically significantly different estimate of consumer surplus in both Poisson and negative binomial models. It also generates broader confidence intervals. Application of truncation, the Englin correction and the income-travel cost interaction produced the most conservative estimates of consumer surplus and eliminated the statistical difference between the Poisson and the negative binomial. Use of the income-travel cost interaction term reveals that for visitors who face relatively low travel costs, the relationship between income and travel demand is negative, while it is positive for those who face high travel costs. This provides an explanation of the ambiguous findings regarding the role of income widely observed in the TCM literature. Our results suggest that policies that reduce access to publicly owned resources inordinately impact local low income recreationists and are contrary to environmental justice. Copyright © 2014 Elsevier Ltd. All rights reserved.
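
The link between the regression choice and the consumer surplus estimate is simple arithmetic: in a semi-log count model, ln E[trips] = a + b_tc · travel_cost + …, per-trip consumer surplus is −1/b_tc. The coefficients below are hypothetical, chosen only to show why a Poisson versus negative binomial fit moves the surplus estimate.

```python
# Hypothetical travel-cost coefficients from two count-model fits:
b_tc_poisson = -0.042     # Poisson fit (assumed value)
b_tc_negbin = -0.035      # negative binomial fit (assumed value)

# Per-trip consumer surplus in a semi-log travel cost model is -1 / b_tc.
cs_poisson = -1.0 / b_tc_poisson
cs_negbin = -1.0 / b_tc_negbin
# A flatter (less negative) coefficient implies a larger surplus estimate,
# so even modest shifts in b_tc propagate directly into welfare estimates.
```

This is why the paper's truncation and interaction choices, which perturb b_tc, matter so much for the final welfare numbers.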

  4. Geographically Weighted Logistic Regression Applied to Credit Scoring Models

    Directory of Open Access Journals (Sweden)

    Pedro Henrique Melo Albuquerque

    Full Text Available Abstract This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC, granted to clients residing in the Distrito Federal (DF, to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters, with it being concluded that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.

  5. Modeling maximum daily temperature using a varying coefficient regression model

    Science.gov (United States)

    Han Li; Xinwei Deng; Dong-Yum Kim; Eric P. Smith

    2014-01-01

    Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature...

  6. A binomial modeling approach for upscaling colloid transport under unfavorable conditions: Emergent prediction of extended tailing

    Science.gov (United States)

    Hilpert, Markus; Rasmuson, Anna; Johnson, William P.

    2017-07-01

    Colloid transport in saturated porous media is significantly influenced by colloidal interactions with grain surfaces. Near-surface fluid domain colloids experience relatively low fluid drag and relatively strong colloidal forces that slow their downgradient translation relative to colloids in bulk fluid. Near-surface fluid domain colloids may reenter the bulk fluid via diffusion (nanoparticles) or expulsion at rear flow stagnation zones, they may immobilize (attach) via primary minimum interactions, or they may move along a grain-to-grain contact to the near-surface fluid domain of an adjacent grain. We introduce a simple model that accounts for all possible permutations of mass transfer within a dual pore and grain network. The primary phenomena thereby represented in the model are mass transfer of colloids between the bulk and near-surface fluid domains and immobilization. Colloid movement is described by a Markov chain, i.e., a sequence of trials in a 1-D network of unit cells, which contain a pore and a grain. Using combinatorial analysis, which utilizes the binomial coefficient, we derive the residence time distribution, i.e., an inventory of the discrete colloid travel times through the network and of their probabilities to occur. To parameterize the network model, we performed mechanistic pore-scale simulations in a single unit cell that determined the likelihoods and timescales associated with the above colloid mass transfer processes. We found that intergrain transport of colloids in the near-surface fluid domain can cause extended tailing, which has traditionally been attributed to hydrodynamic dispersion emanating from flow tortuosity of solute trajectories.
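
    The combinatorial core described above can be sketched in a few lines. Under the simplifying assumption that a colloid is delayed in the near-surface domain of each unit cell independently with a fixed probability, the travel-time inventory is a binomially weighted set of discrete times; all names and parameter values below are hypothetical, not those fitted in the paper.

```python
from math import comb

def residence_time_distribution(n_cells, p_slow, t_bulk, t_slow):
    """Discrete travel-time inventory for a 1-D network of unit cells.

    Hypothetical parameterization: in each of n_cells a colloid is delayed
    in the near-surface domain with probability p_slow; k delayed cells give
    travel time n_cells*t_bulk + k*t_slow, with binomial weight C(n, k).
    """
    dist = {}
    for k in range(n_cells + 1):
        prob = comb(n_cells, k) * p_slow**k * (1 - p_slow)**(n_cells - k)
        dist[n_cells * t_bulk + k * t_slow] = prob
    return dist

dist = residence_time_distribution(n_cells=10, p_slow=0.2, t_bulk=1.0, t_slow=5.0)
```

    The long travel times carrying small but nonzero weight are the discrete analogue of the extended tailing the abstract describes.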

  7. Weighted profile likelihood-based confidence interval for the difference between two proportions with paired binomial data.

    Science.gov (United States)

    Pradhan, Vivek; Saha, Krishna K; Banerjee, Tathagata; Evans, John C

    2014-07-30

    Inference on the difference between two binomial proportions in the paired binomial setting is often an important problem in many biomedical investigations. Tang et al. (2010, Statistics in Medicine) discussed six methods to construct confidence intervals (henceforth abbreviated as CIs) for the difference between two proportions in the paired binomial setting using the method of variance estimates recovery. In this article, we propose weighted profile likelihood-based CIs for the difference between proportions of a paired binomial distribution. However, instead of the usual likelihood, we use a weighted likelihood that essentially makes adjustments to the cell frequencies of a 2 × 2 table in the spirit of Agresti and Min (2005, Statistics in Medicine). We then conduct numerical studies to compare the performance of the proposed CIs with that of Tang et al. and Agresti and Min in terms of coverage probabilities and expected lengths. Our numerical study clearly indicates that the weighted profile likelihood-based intervals and the Jeffreys interval (cf. Tang et al.) are superior in terms of achieving the nominal level, and in terms of expected lengths they are competitive. Finally, we illustrate the use of the proposed CIs with real-life examples. Copyright © 2014 John Wiley & Sons, Ltd.
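
    For orientation, one of the competitors discussed above, an adjusted Wald interval in the spirit of Agresti and Min, can be sketched as follows. This is not the weighted profile likelihood interval proposed in the paper, and the 0.5-per-cell adjustment is our reading of that adjusted-count approach.

```python
from math import sqrt

def paired_diff_ci(n11, n12, n21, n22, z=1.96, adj=0.5):
    """Adjusted Wald CI for p1 - p2 with paired binomial data.

    Sketch: add `adj` to each cell of the paired 2x2 table (n11..n22),
    estimate the difference as (b - c)/n, and use the usual Wald standard
    error for the difference of correlated proportions.
    """
    a, b, c, d = (x + adj for x in (n11, n12, n21, n22))
    n = a + b + c + d
    diff = (b - c) / n
    se = sqrt((b + c) - (b - c) ** 2 / n) / n
    return diff - z * se, diff + z * se

# Hypothetical data: 5 discordant pairs in each direction.
lo, hi = paired_diff_ci(20, 5, 5, 20)
```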

  8. Structured Additive Regression Models: An R Interface to BayesX

    Directory of Open Access Journals (Sweden)

    Nikolaus Umlauf

    2015-02-01

    Full Text Available Structured additive regression (STAR) models provide a flexible framework for modeling possible nonlinear effects of covariates: They contain the well-established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is a standalone software package for fitting a general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using R's formula language (with some extended terms), fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.

  9. Multiple regression and beyond an introduction to multiple regression and structural equation modeling

    CERN Document Server

    Keith, Timothy Z

    2014-01-01

    Multiple Regression and Beyond offers a conceptually oriented introduction to multiple regression (MR) analysis and structural equation modeling (SEM), along with analyses that flow naturally from those methods. By focusing on the concepts and purposes of MR and related methods, rather than the derivation and calculation of formulae, this book introduces material to students more clearly, and in a less threatening way. In addition to illuminating content necessary for coursework, the accessibility of this approach means students are more likely to be able to conduct research using MR or SEM--and more likely to use the methods wisely. · Covers both MR and SEM, while explaining their relevance to one another · Also includes path analysis, confirmatory factor analysis, and latent growth modeling · Figures and tables throughout provide examples and illustrate key concepts and techniques · For additional resources, please visit: http://tzkeith.com/.

  10. Linear regression crash prediction models : issues and proposed solutions.

    Science.gov (United States)

    2010-05-01

    The paper develops a linear regression model approach that can be applied to crash data to predict vehicle crashes. The proposed approach involves novel data aggregation to satisfy linear regression assumptions; namely error structure normality ...

  11. The effect of the negative binomial distribution on the line-width of the micromaser cavity field

    International Nuclear Information System (INIS)

    Kremid, A. M.

    2009-01-01

    The influence of the negative binomial distribution (NBD) on the line-width of the micromaser is considered. The threshold of the micromaser is shifted towards higher values of the pumping parameter q. Moreover, the line-width exhibits sharp dips 'resonances' when the cavity temperature is reduced to a very low value. These dips are very clear evidence for the occurrence of the so-called trapping-states regime in the micromaser. The NBD statistics prevents the appearance of these trapping states: by increasing the negative binomial parameter q, the dips wash out and the line-width broadens. For small values of the parameter q the line-width at large values of q randomly oscillates around its transition line; as q becomes large, this oscillatory behavior occurs only rarely. (author)

  12. Identification of Influential Points in a Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Jan Grosz

    2011-03-01

    Full Text Available The article deals with the detection and identification of influential points in the linear regression model. Three methods of detection of outliers and leverage points are described. These procedures can also be used for one-sample (independent) datasets. This paper briefly describes theoretical aspects of several robust methods as well. Robust statistics is a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. A simulation model of the simple linear regression is presented.
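
    Leverage-point screening of the kind described above can be illustrated with hat-matrix diagonals. The 2p/n cutoff below is one common rule of thumb, not necessarily the criterion used in the article; the function name is ours.

```python
import numpy as np

def leverage_points(X, threshold_factor=2.0):
    """Flag high-leverage observations via hat-matrix diagonals h_ii.

    X is the n x p design matrix (including the intercept column). A common
    rule of thumb flags observations with h_ii > 2 * p / n.
    """
    n, p = X.shape
    H = X @ np.linalg.pinv(X.T @ X) @ X.T  # hat matrix H = X (X'X)^-1 X'
    h = np.diag(H)
    return np.where(h > threshold_factor * p / n)[0]

# Hypothetical demo: one extreme x-value among eleven observations.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100], dtype=float)
X = np.column_stack([np.ones(len(x)), x])
flagged = leverage_points(X)
```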

  13. Possibility and challenges of conversion of current virus species names to Linnaean binomials

    Science.gov (United States)

    Postler, Thomas; Clawson, Anna N.; Amarasinghe, Gaya K.; Basler, Christopher F.; Bavari, Sina; Benko, Maria; Blasdell, Kim R.; Briese, Thomas; Buchmeier, Michael J.; Bukreyev, Alexander; Calisher, Charles H.; Chandran, Kartik; Charrel, Remi; Clegg, Christopher S.; Collins, Peter L.; De la Torre, Juan Carlos; DeRisi, Joseph L.; Dietzgen, Ralf G.; Dolnik, Olga; Durrwald, Ralf; Dye, John M.; Easton, Andrew J.; Emonet, Sebastian; Formenty, Pierre; Fouchier, Ron A. M.; Ghedin, Elodie; Gonzalez, Jean-Paul; Harrach, Balazs; Hewson, Roger; Horie, Masayuki; Jiang, Daohong; Kobinger, Gary P.; Kondo, Hideki; Kropinski, Andrew; Krupovic, Mart; Kurath, Gael; Lamb, Robert A.; Leroy, Eric M.; Lukashevich, Igor S.; Maisner, Andrea; Mushegian, Arcady; Netesov, Sergey V.; Nowotny, Norbert; Patterson, Jean L.; Payne, Susan L.; Paweska, Janusz T.; Peters, C.J.; Radoshitzky, Sheli; Rima, Bertus K.; Romanowski, Victor; Rubbenstroth, Dennis; Sabanadzovic, Sead; Sanfacon, Helene; Salvato, Maria; Schwemmle, Martin; Smither, Sophie J.; Stenglein, Mark; Stone, D.M.; Takada, Ayato; Tesh, Robert B.; Tomonaga, Keizo; Tordo, N.; Towner, Jonathan S.; Vasilakis, Nikos; Volchkov, Victor E.; Jensen, Victoria; Walker, Peter J.; Wang, Lin-Fa; Varsani, Arvind; Whitfield, Anna E.; Zerbini, Francisco Murilo; Kuhn, Jens H.

    2017-01-01

    Botanical, mycological, zoological, and prokaryotic species names follow the Linnaean format, consisting of an italicized Latinized binomen with a capitalized genus name and a lower case species epithet (e.g., Homo sapiens). Virus species names, however, do not follow a uniform format, and, even when binomial, are not Linnaean in style. In this thought exercise, we attempted to convert all currently official names of species included in the virus family Arenaviridae and the virus order Mononegavirales to Linnaean binomials, and to identify and address associated challenges and concerns. Surprisingly, this endeavor was not as complicated or time-consuming as even the authors of this article expected when conceiving the experiment.

  14. Time evolution of negative binomial optical field in a diffusion channel

    International Nuclear Information System (INIS)

    Liu Tang-Kun; Wu Pan-Pan; Shan Chuan-Jia; Liu Ji-Bing; Fan Hong-Yi

    2015-01-01

    We find the time evolution law of a negative binomial optical field in a diffusion channel. We reveal that by adjusting the diffusion parameter, the photon number can be controlled. Therefore, the diffusion process can be considered a quantum controlling scheme through photon addition. (paper)

  15. Adaptive multiresolution Hermite-Binomial filters for image edge and texture analysis

    NARCIS (Netherlands)

    Gu, Y.H.; Katsaggelos, A.K.

    1994-01-01

    A new multiresolution image analysis approach using adaptive Hermite-Binomial filters is presented in this paper. According to the local image structural and textural properties, the analysis filter kernels are made adaptive both in their scales and orders. Applications of such an adaptive filtering

  16. The art of regression modeling in road safety

    CERN Document Server

    Hauer, Ezra

    2015-01-01

    This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...

  17. Support Vector Regression Model Based on Empirical Mode Decomposition and Auto Regression for Electric Load Forecasting

    Directory of Open Access Journals (Sweden)

    Hong-Juan Li

    2013-04-01

    Full Text Available Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by the strong non-linear learning capability of support vector regression (SVR), this paper presents an SVR model hybridized with the empirical mode decomposition (EMD) method and auto regression (AR) for electric load forecasting. The electric load data of the New South Wales (Australia) market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.

  18. Robust geographically weighted regression of modeling the Air Polluter Standard Index (APSI)

    Science.gov (United States)

    Warsito, Budi; Yasin, Hasbi; Ispriyanti, Dwi; Hoyyi, Abdul

    2018-05-01

    The Geographically Weighted Regression (GWR) model has been widely applied in many practical fields for exploring the spatial heterogeneity of a regression model. However, this method is inherently not robust to outliers. Outliers commonly exist in data sets and may lead to a distorted estimate of the underlying regression model. One solution for handling outliers in the regression model is to use robust models, called Robust Geographically Weighted Regression (RGWR). This research aims to aid the government in the policy-making process related to air pollution mitigation by developing a standard index model for air polluters (Air Polluter Standard Index - APSI) based on the RGWR approach. In this research, we also consider seven variables that are directly related to the air pollution level: the traffic velocity, the population density, the business center aspect, the air humidity, the wind velocity, the air temperature, and the area size of the urban forest. The best model is determined by the smallest AIC value. There are significant differences between Regression and RGWR in this case, but basic GWR using the Gaussian kernel is the best model for modeling APSI because it has the smallest AIC.
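
    The local fitting step that GWR performs at each location can be sketched as follows, assuming a Gaussian kernel and ordinary (non-robust) weighted least squares; RGWR would replace the least-squares step with a robust estimator. All names and values are illustrative.

```python
import numpy as np

def gwr_fit_at(u, coords, X, y, bandwidth):
    """Local regression coefficients at location u (a GWR-style sketch).

    Weights each observation i by the Gaussian kernel
    w_i = exp(-0.5 * (d_i / bandwidth)**2), where d_i is the distance from
    u to coords[i], then solves the weighted normal equations.
    """
    d = np.linalg.norm(coords - u, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    Xw = X * w[:, None]                       # rows scaled by kernel weights
    beta, *_ = np.linalg.lstsq(Xw.T @ X, Xw.T @ y, rcond=None)
    return beta

# Hypothetical demo: noise-free linear data is recovered at any location.
rng = np.random.default_rng(0)
coords = rng.random((50, 2))
x = rng.random(50)
X = np.column_stack([np.ones(50), x])
y = 1.0 + 2.0 * x
beta = gwr_fit_at(np.array([0.5, 0.5]), coords, X, y, bandwidth=0.3)
```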

  19. Negative binomial distribution for multiplicity distributions in e⁺e⁻ annihilation

    International Nuclear Information System (INIS)

    Chew, C.K.; Lim, Y.K.

    1986-01-01

    The authors show that the negative binomial distribution fits the available charged-particle multiplicity distributions of e⁺e⁻ annihilation into hadrons excellently at three different energies √s = 14, 22 and 34 GeV
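
    A minimal way to obtain negative binomial parameters from a multiplicity sample is moment matching in the (n̄, k) parameterization commonly used in multiplicity studies, where var = n̄ + n̄²/k. This is a sketch with made-up data; the cited analysis fits binned multiplicity distributions directly.

```python
import numpy as np

def nbd_moment_fit(counts):
    """Moment-matched NBD parameters (mean n_bar, shape k) for count data.

    Uses the relation var = n_bar + n_bar**2 / k, i.e.
    k = n_bar**2 / (var - n_bar); requires overdispersion (var > mean).
    """
    counts = np.asarray(counts, dtype=float)
    mean, var = counts.mean(), counts.var()
    if var <= mean:
        raise ValueError("no overdispersion: data not NBD-like")
    k = mean**2 / (var - mean)
    return mean, k

# Hypothetical multiplicity sample with mean 1 and variance 1.5.
n_bar, k = nbd_moment_fit([0, 0, 1, 3])
```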

  20. Fermat’s Little Theorem via Divisibility of Newton’s Binomial

    Directory of Open Access Journals (Sweden)

    Ziobro Rafał

    2015-09-01

    Full Text Available Solving equations in integers is an important part of number theory [29]. In many cases it can be conducted by the factorization of an equation's elements, such as Newton's binomial. The article introduces several simple formulas, which may facilitate this process. Some of them are taken from relevant books [28], [14].
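
    The divisibility fact behind this route to Fermat's little theorem, p | C(p, k) for 0 < k < p and p prime, is easy to check numerically: the cross terms of the binomial expansion of (a+b)^p vanish mod p. The function name below is ours, for illustration only.

```python
from math import comb

def freshman_dream_mod(a, b, p):
    """Check (a+b)^p == a^p + b^p (mod p) for a prime p.

    First verifies that p divides every inner binomial coefficient C(p, k),
    0 < k < p, which is exactly why the cross terms of Newton's binomial
    expansion vanish modulo p; iterating this congruence yields Fermat's
    little theorem a^p == a (mod p).
    """
    assert all(comb(p, k) % p == 0 for k in range(1, p))
    return pow(a + b, p, p) == (pow(a, p, p) + pow(b, p, p)) % p

assert freshman_dream_mod(5, 7, 13)
```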

  1. Flexible competing risks regression modeling and goodness-of-fit

    DEFF Research Database (Denmark)

    Scheike, Thomas; Zhang, Mei-Jie

    2008-01-01

    In this paper we consider different approaches for estimation and assessment of covariate effects for the cumulative incidence curve in the competing risks model. The classic approach is to model all cause-specific hazards and then estimate the cumulative incidence curve based on these cause...... models that is easy to fit and contains the Fine-Gray model as a special case. One advantage of this approach is that our regression modeling allows for non-proportional hazards. This leads to a new simple goodness-of-fit procedure for the proportional subdistribution hazards assumption that is very easy...... of the flexible regression models to analyze competing risks data when non-proportionality is present in the data....

  2. Efficient and robust estimation for longitudinal mixed models for binary data

    DEFF Research Database (Denmark)

    Holst, René

    2009-01-01

    This paper proposes a longitudinal mixed model for binary data. The model extends the classical Poisson trick, in which a binomial regression is fitted by switching to a Poisson framework. A recent estimating equations method for generalized linear longitudinal mixed models, called GEEP, is used...... as a vehicle for fitting the conditional Poisson regressions, given a latent process of serial correlated Tweedie variables. The regression parameters are estimated using a quasi-score method, whereas the dispersion and correlation parameters are estimated by use of bias-corrected Pearson-type estimating...... equations, using second moments only. Random effects are predicted by BLUPs. The method provides a computationally efficient and robust approach to the estimation of longitudinal clustered binary data and accommodates linear and non-linear models. A simulation study is used for validation and finally...

  3. Learning Binomial Probability Concepts with Simulation, Random Numbers and a Spreadsheet

    Science.gov (United States)

    Rochowicz, John A., Jr.

    2005-01-01

    This paper introduces the reader to the concepts of binomial probability and simulation. A spreadsheet is used to illustrate these concepts. Random number generators are great technological tools for demonstrating the concepts of probability. Ideas of approximation, estimation, and mathematical usefulness provide numerous ways of learning…
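
    The spreadsheet exercise described here translates directly into a short simulation: sum Bernoulli draws to build an empirical binomial distribution and compare it with the exact probabilities. The parameters below are illustrative, not taken from the paper.

```python
import random
from math import comb

def simulate_binomial(n, p, trials, seed=42):
    """Estimate P(X = k) for X ~ Binomial(n, p) from repeated experiments.

    Each trial sums n Bernoulli(p) draws produced by a random number
    generator, mirroring a spreadsheet column of RAND()-based cells.
    """
    rng = random.Random(seed)
    freq = [0] * (n + 1)
    for _ in range(trials):
        successes = sum(rng.random() < p for _ in range(n))
        freq[successes] += 1
    return [f / trials for f in freq]

exact = [comb(10, k) * 0.5**10 for k in range(11)]   # theoretical pmf
approx = simulate_binomial(10, 0.5, trials=20000)    # simulated pmf
```

    With 20,000 trials the simulated frequencies typically agree with the exact probabilities to within a couple of percentage points, which is the approximation idea the paper emphasizes.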

  4. Binomial mitotic segregation of MYCN-carrying double minutes in neuroblastoma illustrates the role of randomness in oncogene amplification.

    Directory of Open Access Journals (Sweden)

    Gisela Lundberg

    2008-08-01

    Full Text Available Amplification of the oncogene MYCN in double minutes (DMs) is a common finding in neuroblastoma (NB). Because DMs lack centromeric sequences it has been unclear how NB cells retain and amplify extrachromosomal MYCN copies during tumour development. We show that MYCN-carrying DMs in NB cells translocate from the nuclear interior to the periphery of the condensing chromatin at transition from interphase to prophase and are preferentially located adjacent to the telomere repeat sequences of the chromosomes throughout cell division. However, DM segregation was not affected by disruption of the telosome nucleoprotein complex and DMs readily migrated from human to murine chromatin in human/mouse cell hybrids, indicating that they do not bind to specific positional elements in human chromosomes. Scoring DM copy-numbers in ana/telophase cells revealed that DM segregation could be closely approximated by a binomial random distribution. Colony-forming assay demonstrated a strong growth-advantage for NB cells with high DM (MYCN) copy-numbers, compared to NB cells with lower copy-numbers. In fact, the overall distribution of DMs in growing NB cell populations could be readily reproduced by a mathematical model assuming binomial segregation at cell division combined with a proliferative advantage for cells with high DM copy-numbers. Binomial segregation at cell division explains the high degree of MYCN copy-number variability in NB. Our findings also provide a proof-of-principle for oncogene amplification through creation of genetic diversity by random events followed by Darwinian selection.
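
    The segregation mechanism can be mimicked in a few lines: after replication a cell carries 2N DM copies, and each copy is assigned to a given daughter independently with probability 1/2, so each daughter's copy number is Binomial(2N, 1/2). This sketch omits the proliferative-advantage (selection) component of the authors' full model, and the numbers are hypothetical.

```python
import random

def segregate(n_dms, rng):
    """One mitosis under binomial segregation of double minutes.

    The n_dms DMs replicate to 2*n_dms copies; each copy then goes to
    daughter 1 with probability 1/2, the remainder to daughter 2.
    """
    to_daughter_1 = sum(rng.random() < 0.5 for _ in range(2 * n_dms))
    return to_daughter_1, 2 * n_dms - to_daughter_1

# Hypothetical cell with 50 DMs before replication.
rng = random.Random(1)
d1, d2 = segregate(50, rng)
```

    Iterating this step over a growing population, and letting division rate increase with copy number, reproduces the copy-number variability described in the abstract.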

  5. Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.

    Science.gov (United States)

    Chatzis, Sotirios P; Andreou, Andreas S

    2015-11-01

    Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows better handling of uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.

  6. Modelling fourier regression for time series data- a case study: modelling inflation in foods sector in Indonesia

    Science.gov (United States)

    Prahutama, Alan; Suparti; Wahyu Utami, Tiani

    2018-03-01

    Regression analysis models the relationship between response variables and predictor variables. The parametric approach to regression is very strict in its assumptions, whereas a nonparametric regression model requires no assumptions about the model's form. Time series data are observations of a variable recorded over time, so if time series data are to be modeled by regression, the response and predictor variables must be determined first. The response variable is the value at time t (yt), while the predictor variables are significant lags. In nonparametric regression modeling, one developing approach is the Fourier series approach. One advantage of nonparametric regression using a Fourier series is its ability to handle data with a trigonometric pattern. Modeling with a Fourier series requires a parameter K, and the number of terms K can be determined with the Generalized Cross Validation (GCV) method. In inflation modeling for the transportation sector, communication and financial services, the Fourier series yields an optimal K of 120 parameters with an R-square of 99%, whereas multiple linear regression yields an R-square of 90%.
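
    A minimal version of Fourier-series regression with GCV-based selection of K might look like the sketch below. The basis (intercept, trend, K sine/cosine pairs) and the GCV form are common textbook choices, not necessarily the exact specification used in the study.

```python
import numpy as np

def fourier_design(t, K):
    """Fourier basis design matrix: intercept, trend, K sine/cosine pairs."""
    cols = [np.ones_like(t), t]
    for j in range(1, K + 1):
        cols += [np.cos(j * t), np.sin(j * t)]
    return np.column_stack(cols)

def gcv_score(t, y, K):
    """Generalized Cross Validation score for K Fourier terms:
    GCV(K) = n * RSS / (n - trace(H))**2, H = X (X'X)^-1 X'.
    The K with the smallest score is selected."""
    X = fourier_design(t, K)
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    resid = y - H @ y
    n, tr = len(y), np.trace(H)
    return n * (resid @ resid) / (n - tr) ** 2

# Hypothetical series that is exactly representable with K = 2 terms.
t = np.linspace(0, 6 * np.pi, 100)
y = 2 + np.cos(t) + 0.5 * np.sin(2 * t)
```

    On this toy series the score at K = 2 is far smaller than at K = 0, so GCV picks the correct number of terms.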

  7. Bayesian Inference of a Multivariate Regression Model

    Directory of Open Access Journals (Sweden)

    Marick S. Sinay

    2014-01-01

    Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.

  8. General regression and representation model for classification.

    Directory of Open Access Journals (Sweden)

    Jianjun Qian

    Full Text Available Recently, the regularized coding-based classification methods (e.g., SRC and CRC) show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR) for classification. GRR not only has the advantages of CRC, but also makes full use of the prior information (e.g., the correlations between representation residuals and representation coefficients) and the specific information (the weight matrix of image pixels) to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: the basic general regression and representation classifier (B-GRR) and the robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of the proposed methods over state-of-the-art algorithms.

  9. Binomial Coefficients Modulo a Prime--A Visualization Approach to Undergraduate Research

    Science.gov (United States)

    Bardzell, Michael; Poimenidou, Eirini

    2011-01-01

    In this article we present, as a case study, results of undergraduate research involving binomial coefficients modulo a prime "p." We will discuss how undergraduates were involved in the project, even with a minimal mathematical background beforehand. There are two main avenues of exploration described to discover these binomial…
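
    One standard tool for such explorations is Lucas' theorem, which reduces C(n, k) mod p to a product of binomial coefficients of the base-p digits of n and k, and explains the self-similar (Sierpinski-like) patterns students can visualize. A sketch, assuming p is prime:

```python
from math import comb

def binom_mod_p(n, k, p):
    """C(n, k) mod a prime p via Lucas' theorem.

    Multiply C(n_i, k_i) mod p over the base-p digits n_i, k_i of n and k;
    a digitwise factor is 0 whenever k_i > n_i.
    """
    result = 1
    while n or k:
        n, ni = divmod(n, p)
        k, ki = divmod(k, p)
        result = result * comb(ni, ki) % p
    return result

# Agrees with direct computation even for large arguments.
assert binom_mod_p(1000, 300, 7) == comb(1000, 300) % 7
```

    Row 4 of Pascal's triangle mod 2, for instance, comes out as 1, 0, 0, 0, 1, the start of the Sierpinski pattern.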

  10. A test for the parameters of multiple linear regression models ...

    African Journals Online (AJOL)

    A test for the parameters of multiple linear regression models is developed for conducting tests simultaneously on all the parameters of multiple linear regression models. The test is robust relative to the assumptions of homogeneity of variances and absence of serial correlation of the classical F-test. Under certain null and ...

  11. Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

    Science.gov (United States)

    Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

    2016-03-01

    In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
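
    The hold-out protocol described above is easy to reproduce for the regression half of the comparison. The numpy-only sketch below fits OLS on a random half of the data and scores R² on the other half; the variable names, the half/half split, and the R² criterion are our choices for illustration.

```python
import numpy as np

def holdout_r2(X, y, seed=0):
    """Fit OLS on a random half of the data, report R^2 on the held-out half.

    X is a 1-D predictor array here; an intercept column is added before
    fitting. Mirrors the estimation/hold-out evaluation design above.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    train, test = idx[: len(y) // 2], idx[len(y) // 2 :]
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)
    pred = Xd[test] @ beta
    ss_res = np.sum((y[test] - pred) ** 2)
    ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
    return 1 - ss_res / ss_tot

# Hypothetical noise-free linear data: hold-out R^2 should be ~1.
rng = np.random.default_rng(1)
x_demo = rng.random(100)
r2 = holdout_r2(x_demo, 3 + 2 * x_demo)
```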

  12. Negative binomial distribution to describe the presence of Trifur tortuosus (Crustacea: Copepoda) in Merluccius gayi (Osteichthyes: Gadiformes)

    Directory of Open Access Journals (Sweden)

    Giselle Garcia-Sepulveda

    2017-06-01

    Full Text Available This paper describes the frequency and number of Trifur tortuosus in the skin of Merluccius gayi, an important economic resource in Chile. Analysis of a spatial distribution model indicated that the parasites tended to cluster. Variations in the number of parasites per host can be described by a negative binomial distribution. The maximum number of parasites observed per host was one; similar patterns were described for other parasites in Chilean marine fishes.

  13. Using the Binomial Series to Prove the Arithmetic Mean-Geometric Mean Inequality

    Science.gov (United States)

    Persky, Ronald L.

    2003-01-01

    In 1968, Leon Gerber compared (1 + x)^a to its kth partial sum as a binomial series. His result is stated and, as an application of this result, a proof of the arithmetic mean-geometric mean inequality is presented.
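
    The flavor of Gerber's comparison can be seen in its simplest case, k = 1, which is Bernoulli's inequality (stated here for orientation; the full result covers general k and real exponents):

```latex
(1+x)^{a} \;\ge\; 1 + ax \qquad \text{for } a \ge 1,\ x \ge -1,
```

    with equality at x = 0; a classical argument derives the arithmetic mean-geometric mean inequality from this bound.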

  14. Logistic regression models

    CERN Document Server

    Hilbe, Joseph M

    2009-01-01

    This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect-great clarity.The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … .-Annette J. Dobson, Biometric...

  15. Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model

    DEFF Research Database (Denmark)

    Møller, Niels Framroze

    This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail how the theoretical model and its … ; furthermore, it is demonstrated how other controversial hypotheses such as Rational Expectations can be formulated directly as restrictions on the CVAR-parameters. A simple example of a "Neoclassical synthetic" AS-AD model is also formulated. Finally, the partial-general equilibrium distinction is related to the CVAR as well. Further fundamental extensions and advances to more sophisticated theory models, such as those related to dynamics and expectations (in the structural relations), are left for future papers.

  16. Multiple Response Regression for Gaussian Mixture Models with Known Labels.

    Science.gov (United States)

    Lee, Wonyul; Du, Ying; Sun, Wei; Hayes, D Neil; Liu, Yufeng

    2012-12-01

    Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes.

  17. Extending the linear model with R generalized linear, mixed effects and nonparametric regression models

    CERN Document Server

    Faraway, Julian J

    2005-01-01

    Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway's critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author's treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...

  18. Use of negative binomial distribution to describe the presence of Anisakis in Thyrsites atun.

    Science.gov (United States)

    Peña-Rehbein, Patricio; De los Ríos-Escalante, Patricio

    2012-01-01

    Nematodes of the genus Anisakis have marine fishes as intermediate hosts. One of these hosts is Thyrsites atun, an important fishery resource in Chile between 38 and 41° S. This paper describes the frequency and number of Anisakis nematodes in the internal organs of Thyrsites atun. An analysis based on spatial distribution models showed that the parasites tend to be clustered. The variation in the number of parasites per host could be described by the negative binomial distribution. The maximum observed number of parasites was nine parasites per host. The environmental and zoonotic aspects of the study are also discussed.

  19. Bayesian approach to errors-in-variables in regression models

    Science.gov (United States)

    Rozliman, Nur Aainaa; Ibrahim, Adriana Irawati Nur; Yunus, Rossita Mohammad

    2017-05-01

    In many applications and experiments, data sets are often contaminated with error or mismeasured covariates. When at least one of the covariates in a model is measured with error, Errors-in-Variables (EIV) model can be used. Measurement error, when not corrected, would cause misleading statistical inferences and analysis. Therefore, our goal is to examine the relationship of the outcome variable and the unobserved exposure variable given the observed mismeasured surrogate by applying the Bayesian formulation to the EIV model. We shall extend the flexible parametric method proposed by Hossain and Gustafson (2009) to another nonlinear regression model which is the Poisson regression model. We shall then illustrate the application of this approach via a simulation study using Markov chain Monte Carlo sampling methods.
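
A small simulation makes concrete why uncorrected measurement error is misleading, which is the motivation for the EIV model above. This sketch is not the paper's Bayesian approach; it only shows the classical attenuation of a regression slope when a covariate is observed through a noisy surrogate (all values are synthetic).

```python
import random

random.seed(42)

# y depends on the true covariate x, but we only observe w = x + u.
# Regressing y on w attenuates the slope by roughly var(x)/(var(x)+var(u)).
n = 5000
beta = 2.0
x = [random.gauss(0.0, 1.0) for _ in range(n)]        # true covariate
u = [random.gauss(0.0, 1.0) for _ in range(n)]        # measurement error
w = [xi + ui for xi, ui in zip(x, u)]                 # observed surrogate
y = [beta * xi + random.gauss(0.0, 0.5) for xi in x]  # outcome

def ols_slope(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

slope_true = ols_slope(x, y)   # close to 2.0
slope_naive = ols_slope(w, y)  # attenuated toward 2.0 * 1/(1+1) = 1.0
```

A Bayesian EIV formulation, as in the abstract, recovers the x-to-y relationship by modeling the measurement process instead of ignoring it.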

  20. Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression.

    Science.gov (United States)

    Song, Chao; Kwan, Mei-Po; Zhu, Jiping

    2017-04-08

    An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrate spatial and temporal effects, with global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicates that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.
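
The core mechanic of GWR, local regressions with distance-decaying kernel weights, can be sketched in a few lines. This toy one-predictor example (locations, data, and bandwidth are all invented) shows how the fitted slope is allowed to vary over space, which a global LM cannot do.

```python
import math

# Weighted least-squares slope at regression point u0, with Gaussian
# kernel weights on the (1-D) locations of the observations.
def gwr_slope(u0, locs, x, y, bandwidth):
    w = [math.exp(-((u - u0) ** 2) / (2 * bandwidth ** 2)) for u in locs]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return num / den

# Toy data: the slope of y on x drifts from 1 (west, u=0) to 3 (east, u=10).
locs = [i * 0.1 for i in range(101)]
x = [((7 * i) % 11) / 11 for i in range(101)]     # deterministic spread
y = [(1 + 0.2 * u) * xi for u, xi in zip(locs, x)]

west = gwr_slope(0.0, locs, x, y, bandwidth=1.0)  # local slope near 1
east = gwr_slope(10.0, locs, x, y, bandwidth=1.0) # local slope near 3
```

A GTWR model extends the same idea by also letting the kernel decay in time.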

  1. Direction of Effects in Multiple Linear Regression Models.

    Science.gov (United States)

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
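
The residual-skewness idea can be illustrated with a deliberately simple bivariate sketch (the paper's contribution is the multiple-regression extension and the inference procedures, which are not reproduced here). When the true predictor is non-normal and the error is normal, residuals of the correctly specified model are roughly symmetric, while residuals of the reversed model inherit skewness; all data below are synthetic.

```python
import random

random.seed(3)

n = 5000
x = [random.expovariate(1.0) for _ in range(n)]   # skewed predictor
y = [xi + random.gauss(0.0, 1.0) for xi in x]     # true direction: x -> y

def residuals(pred, resp):
    mp = sum(pred) / len(pred)
    mr = sum(resp) / len(resp)
    slope = (sum((a - mp) * (b - mr) for a, b in zip(pred, resp))
             / sum((a - mp) ** 2 for a in pred))
    return [b - mr - slope * (a - mp) for a, b in zip(pred, resp)]

def third_central_moment(r):
    m = sum(r) / len(r)
    return sum((e - m) ** 3 for e in r) / len(r)

m3_correct = third_central_moment(residuals(x, y))  # near 0
m3_reverse = third_central_moment(residuals(y, x))  # clearly nonzero
```
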

  2. Sample size determination for a three-arm equivalence trial of Poisson and negative binomial responses.

    Science.gov (United States)

    Chang, Yu-Wei; Tsong, Yi; Zhao, Zhigen

    2017-01-01

    Assessing equivalence or similarity has drawn much attention recently as many drug products have lost or will lose their patents in the next few years, especially certain best-selling biologics. To claim equivalence between the test treatment and the reference treatment when assay sensitivity is well established from historical data, one has to demonstrate both superiority of the test treatment over placebo and equivalence between the test treatment and the reference treatment. Thus, there is urgency for practitioners to derive a practical way to calculate sample size for a three-arm equivalence trial. The primary endpoints of a clinical trial may not always be continuous, but may be discrete. In this paper, the authors derive power function and discuss sample size requirement for a three-arm equivalence trial with Poisson and negative binomial clinical endpoints. In addition, the authors examine the effect of the dispersion parameter on the power and the sample size by varying its coefficient from small to large. In extensive numerical studies, the authors demonstrate that required sample size heavily depends on the dispersion parameter. Therefore, misusing a Poisson model for negative binomial data can easily cost up to 20% of power, depending on the value of the dispersion parameter.
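
The dispersion effect on sample size can be made concrete with a back-of-the-envelope two-sample calculation. This is not the authors' three-arm power function; it is only a textbook normal-approximation formula, with hypothetical rates and dispersion, showing how the negative binomial variance term lambda + lambda**2/k inflates the required n relative to Poisson.

```python
from math import ceil

def n_per_arm(lam0, lam1, var0, var1, z_alpha=1.96, z_beta=0.84):
    """Normal-approximation sample size per arm for detecting the rate
    difference lam1 - lam0, given the response variances (illustrative)."""
    delta = lam1 - lam0
    return ceil((z_alpha + z_beta) ** 2 * (var0 + var1) / delta ** 2)

lam0, lam1, k = 2.0, 3.0, 1.5   # hypothetical rates and dispersion k

# Poisson: Var = lambda.  Negative binomial: Var = lambda + lambda**2 / k.
n_pois = n_per_arm(lam0, lam1, lam0, lam1)
n_nb = n_per_arm(lam0, lam1, lam0 + lam0 ** 2 / k, lam1 + lam1 ** 2 / k)
```

With these numbers the negative binomial design needs well over twice the Poisson sample size, which mirrors the paper's point about the cost of ignoring dispersion.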

  3. A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design.

    Science.gov (United States)

    Meaney, Christopher; Moineddin, Rahim

    2014-01-24

    In biomedical research, response variables are often encountered which have bounded support on the open unit interval, (0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to those of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25) linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the
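
The simulation setup described above rests on the mean-dispersion parameterization of the beta distribution used in beta regression: shape parameters a = mu*phi and b = (1-mu)*phi. A minimal two-sample sketch (sample sizes, means, and dispersion are invented, not the paper's design):

```python
import random

random.seed(11)

# Draw beta responses with mean mu and dispersion (precision) phi.
def beta_sample(mu, phi, n):
    return [random.betavariate(mu * phi, (1 - mu) * phi) for _ in range(n)]

n = 2000
group0 = beta_sample(0.30, 10.0, n)   # e.g. control-arm proportions
group1 = beta_sample(0.50, 10.0, n)   # e.g. treatment-arm proportions

# Average proportion difference, the estimand in the two-sample design.
diff = sum(group1) / n - sum(group0) / n   # near 0.20
```

With equal phi in both groups all the compared estimators are unbiased for this difference; the paper's interesting cases arise when phi differs between samples.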

  4. Statistical approach for selection of regression model during validation of bioanalytical method

    Directory of Open Access Journals (Sweden)

    Natalija Nakov

    2014-06-01

    Full Text Available The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during bioanalytical method validation. Given the wide concentration range frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of the adequate curve fit using nonparametric statistical tests: the one sample rank test and the Wilcoxon signed rank test for two independent groups of samples. The results obtained with the one sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models, because only slight differences in the error (presented through the relative residuals, RR) were obtained. Estimation of the significance of the differences in the RR was achieved using the Wilcoxon signed rank test, where the linear and quadratic regression models were treated as two independent groups. The application of this simple non-parametric statistical test provides statistical confirmation of the choice of an adequate regression model.

  5. Linear regression models for quantitative assessment of left ...

    African Journals Online (AJOL)

    Changes in left ventricular structures and function have been reported in cardiomyopathies. No prediction models have been established in this environment. This study established regression models for prediction of left ventricular structures in normal subjects. A sample of normal subjects was drawn from a large urban ...

  6. Reduced Rank Regression

    DEFF Research Database (Denmark)

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...

  7. Poisson regression for modeling count and frequency outcomes in trauma research.

    Science.gov (United States)

    Gagnon, David R; Doron-LaMarca, Susan; Bell, Margret; O'Farrell, Timothy J; Taft, Casey T

    2008-10-01

    The authors describe how the Poisson regression method for analyzing count or frequency outcome variables can be applied in trauma studies. The outcome of interest in trauma research may represent a count of the number of incidents of behavior occurring in a given time interval, such as acts of physical aggression or substance abuse. Traditional linear regression approaches assume a normally distributed outcome variable with equal variances over the range of predictor variables, and may not be optimal for modeling count outcomes. An application of Poisson regression is presented using data from a study of intimate partner aggression among male patients in an alcohol treatment program and their female partners. Results of Poisson regression and linear regression models are compared.
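
The Poisson regression described above can be sketched from first principles: a log link, the score equations, and a Newton-Raphson fit. The data here are synthetic (a single predictor with hypothetical coefficients), not the partner-aggression dataset from the study.

```python
import math
import random

random.seed(1)

n = 2000
x = [random.uniform(-1.0, 1.0) for _ in range(n)]

def rpois(mu):
    # Knuth's multiplicative sampler; adequate for small mu.
    limit, k, prod = math.exp(-mu), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1

# True model: log(mu) = 0.5 + 1.2 * x, so counts rise with the predictor.
y = [rpois(math.exp(0.5 + 1.2 * xi)) for xi in x]

b0, b1 = math.log(sum(y) / n), 0.0    # start at the intercept-only fit
for _ in range(25):
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    g0 = sum(yi - mi for yi, mi in zip(y, mu))                # score: intercept
    g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))  # score: slope
    h00 = sum(mu)                                  # Fisher information matrix
    h01 = sum(mi * xi for mi, xi in zip(mu, x))
    h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det              # Newton-Raphson update
    b1 += (h00 * g1 - h01 * g0) / det
```

Unlike ordinary linear regression, the implied variance equals the mean at each predictor value, which is exactly the property that makes the model suitable for count outcomes.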

  8. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    Directory of Open Access Journals (Sweden)

    Minh Vu Trieu

    2017-03-01

    Full Text Available This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2) of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.

  9. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    Science.gov (United States)

    Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

    2017-03-01

    This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2) of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.

  10. Deep ensemble learning of sparse regression models for brain disease diagnosis.

    Science.gov (United States)

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2017-04-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. A computational approach to compare regression modelling strategies in prediction research.

    Science.gov (United States)

    Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H

    2016-08-25

    It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
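
The Brier score used above to assess the final models is just the mean squared difference between predicted probabilities and binary outcomes. A minimal sketch comparing two hypothetical strategies (all numbers invented):

```python
# Brier score: lower is better; 0.25 is the score of always predicting 0.5.
def brier(probs, outcomes):
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

y = [1, 0, 1, 1, 0, 0, 1, 0]
strategy_a = [0.9, 0.2, 0.7, 0.8, 0.1, 0.3, 0.6, 0.2]  # well calibrated
strategy_b = [0.6, 0.5, 0.5, 0.6, 0.4, 0.5, 0.5, 0.4]  # weak discrimination

score_a = brier(strategy_a, y)   # 0.06
score_b = brier(strategy_b, y)   # 0.205
```

As the abstract stresses, such scores should be computed on external or held-out data; in-sample Brier scores flatter overfitted strategies.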

  12. Topology of unitary groups and the prime orders of binomial coefficients

    Science.gov (United States)

    Duan, HaiBao; Lin, XianZu

    2017-09-01

    Let $c:SU(n)\rightarrow PSU(n)=SU(n)/\mathbb{Z}_{n}$ be the quotient map of the special unitary group $SU(n)$ by its center subgroup $\mathbb{Z}_{n}$. We determine the induced homomorphism $c^{\ast}\colon H^{\ast}(PSU(n))\rightarrow H^{\ast}(SU(n))$ on cohomologies by computing with the prime orders of binomial coefficients.

  13. Raw and Central Moments of Binomial Random Variables via Stirling Numbers

    Science.gov (United States)

    Griffiths, Martin

    2013-01-01

    We consider here the problem of calculating the moments of binomial random variables. It is shown how formulae for both the raw and the central moments of such random variables may be obtained in a recursive manner utilizing Stirling numbers of the first kind. Suggestions are also provided as to how students might be encouraged to explore this…
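
The flavor of the result can be checked numerically. The sketch below uses the classical closed form via Stirling numbers of the second kind and falling factorials (the article's recursion is a related route to the same moments) and verifies it against a brute-force sum over the binomial pmf.

```python
from math import comb

# E[X^k] for X ~ Binomial(n, p):  sum_j S(k, j) * n_(j) * p**j,
# where S(k, j) is a Stirling number of the second kind and
# n_(j) = n(n-1)...(n-j+1) is the falling factorial.

def stirling2(k, j):
    if k == j:
        return 1
    if j == 0 or j > k:
        return 0
    return j * stirling2(k - 1, j) + stirling2(k - 1, j - 1)

def falling(n, j):
    out = 1
    for i in range(j):
        out *= n - i
    return out

def raw_moment(n, p, k):
    return sum(stirling2(k, j) * falling(n, j) * p ** j for j in range(k + 1))

def raw_moment_brute(n, p, k):
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) * x ** k
               for x in range(n + 1))
```

For n = 10, p = 0.3 this gives E[X] = 3 and E[X^2] = 11.1, consistent with Var(X) = np(1 - p) = 2.1.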

  14. Entanglement properties between two atoms in the binomial optical field interacting with two entangled atoms

    International Nuclear Information System (INIS)

    Liu Tang-Kun; Zhang Kang-Long; Tao Yu; Shan Chuan-Jia; Liu Ji-Bing

    2016-01-01

    The temporal evolution of the degree of entanglement between two atoms in a system of the binomial optical field interacting with two arbitrary entangled atoms is investigated. The influence of the strength of the dipole–dipole interaction between two atoms, probabilities of the Bernoulli trial, and particle number of the binomial optical field on the temporal evolution of the atomic entanglement are discussed. The result shows that the two atoms are always in the entanglement state. Moreover, if and only if the two atoms are initially in the maximally entangled state, the entanglement evolution is not affected by the parameters, and the degree of entanglement is always kept as 1. (paper)

  15. Forward selection two sample binomial test

    Science.gov (United States)

    Wong, Kam-Fai; Wong, Weng-Kee; Lin, Miao-Shan

    2016-01-01

    Fisher’s exact test (FET) is a conditional method that is frequently used to analyze data in a 2 × 2 table for small samples. This test is conservative and attempts have been made to modify the test to make it less conservative. For example, Crans and Shuster (2008) proposed adding more points in the rejection region to make the test more powerful. We provide another way to modify the test to make it less conservative by using two independent binomial distributions as the reference distribution for the test statistic. We compare our new test with several methods and show that our test has advantages over existing methods in terms of control of the type 1 and type 2 errors. We reanalyze results from an oncology trial using our proposed method and our software which is freely available to the reader. PMID:27335577
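
For reference, the conditional test that the proposed method is compared against can be computed directly from the hypergeometric distribution. This is a minimal one-sided sketch, not the authors' forward selection procedure or software.

```python
from math import comb

# One-sided (greater) Fisher's exact test for a 2x2 table [[a, b], [c, d]]:
# conditional on the margins, the count a is hypergeometric, and the p-value
# sums the upper tail of that distribution starting at the observed a.
def fisher_one_sided(a, b, c, d):
    n = a + b + c + d
    row1, col1 = a + b, a + c
    hi = min(row1, col1)
    denom = comb(n, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, hi + 1)) / denom

p = fisher_one_sided(3, 1, 1, 3)   # = 17/70, about 0.243
```

The discreteness of this tail sum is what makes FET conservative; the unconditional two-binomial reference distribution in the abstract is one way to mitigate it.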

  16. On bounds in Poisson approximation for distributions of independent negative-binomial distributed random variables.

    Science.gov (United States)

    Hung, Tran Loc; Giang, Le Truong

    2016-01-01

    Using the Stein-Chen method some upper bounds in Poisson approximation for distributions of row-wise triangular arrays of independent negative-binomial distributed random variables are established in this note.

  17. A generalized right truncated bivariate Poisson regression model with applications to health data.

    Science.gov (United States)

    Islam, M Ataharul; Chowdhury, Rafiqul I

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over- or under-dispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using a marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on the number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model.

  18. The arcsine is asinine: the analysis of proportions in ecology.

    Science.gov (United States)

    Warton, David I; Hui, Francis K C

    2011-01-01

    The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data. However, it is important to check the data for additional unexplained variation, i.e., overdispersion, and to account for it via the inclusion of random effects in the model if found. For non-binomial data, the arcsine transform is undesirable on the grounds of interpretability, and because it can produce nonsensical predictions. The logit transformation is proposed as an alternative approach to address these issues. Examples are presented in both cases to illustrate these advantages, comparing various methods of analyzing proportions including untransformed, arcsine- and logit-transformed linear models and logistic regression (with or without random effects). Simulations demonstrate that logistic regression usually provides a gain in power over other methods.
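
The "nonsensical predictions" point can be demonstrated numerically: the arcsine square-root transform maps proportions to [0, pi/2], so a linear predictor that extrapolates past pi/2 folds back non-monotonically under the inverse sin(t)**2, whereas the inverse logit is monotone and stays inside (0, 1) for any real input. Values below are arbitrary.

```python
import math

def inv_arcsine(t):
    return math.sin(t) ** 2           # inverse of asin(sqrt(p))

def inv_logit(t):
    return 1.0 / (1.0 + math.exp(-t))  # always in (0, 1)

at_boundary = inv_arcsine(math.pi / 2)         # 1.0
extrapolated = inv_arcsine(math.pi / 2 + 0.3)  # folds back BELOW 1.0

lo, hi = inv_logit(2.0), inv_logit(5.0)        # monotone, inside (0, 1)
```
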

  19. Linear regression metamodeling as a tool to summarize and present simulation model results.

    Science.gov (United States)

    Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M

    2013-10-01

    Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
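
The central trick, regressing a simulation outcome on standardized inputs so that the intercept recovers the base-case outcome and the coefficients rank parameter influence, can be sketched on a toy "decision model". Everything below is invented; with independent standardized inputs, per-input slopes approximate the multiple-regression coefficients, which keeps the sketch free of matrix algebra.

```python
import random

random.seed(7)

n = 4000
z1 = [random.gauss(0.0, 1.0) for _ in range(n)]  # standardized input 1
z2 = [random.gauss(0.0, 1.0) for _ in range(n)]  # standardized input 2

# Toy PSA outcome: input 1 matters three times more than input 2.
outcome = [10.0 + 3.0 * a + 1.0 * b + random.gauss(0.0, 0.5)
           for a, b in zip(z1, z2)]

def slope(z, y):
    mz = sum(z) / len(z)
    my = sum(y) / len(y)
    num = sum((a - mz) * (b - my) for a, b in zip(z, y))
    den = sum((a - mz) ** 2 for a in z)
    return num / den

b1, b2 = slope(z1, outcome), slope(z2, outcome)
intercept = (sum(outcome) / n
             - b1 * (sum(z1) / n) - b2 * (sum(z2) / n))  # base-case estimate
```

Reading off the metamodel: the intercept is near the base case of 10, and b1 > b2 reproduces the inputs' relative importance, the same information a tornado diagram conveys.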

  20. Confidence Intervals for Asbestos Fiber Counts: Approximate Negative Binomial Distribution.

    Science.gov (United States)

    Bartley, David; Slaven, James; Harper, Martin

    2017-03-01

    The negative binomial distribution is adopted for analyzing asbestos fiber counts so as to account for both the sampling errors in capturing only a finite number of fibers and the inevitable human variation in identifying and counting sampled fibers. A simple approximation to this distribution is developed for the derivation of quantiles and approximate confidence limits. The success of the approximation depends critically on the use of Stirling's expansion to sufficient order, on exact normalization of the approximating distribution, on reasonable perturbation of quantities from the normal distribution, and on accurately approximating sums by inverse-trapezoidal integration. Accuracy of the approximation developed is checked through simulation and also by comparison to traditional approximate confidence intervals in the specific case that the negative binomial distribution approaches the Poisson distribution. The resulting statistics are shown to relate directly to early research into the accuracy of asbestos sampling and analysis. Uncertainty in estimating mean asbestos fiber concentrations given only a single count is derived. Decision limits (limits of detection) and detection limits are considered for controlling false-positive and false-negative detection assertions and are compared to traditional limits computed assuming normal distributions. Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2017.

  1. Hierarchical Neural Regression Models for Customer Churn Prediction

    Directory of Open Access Journals (Sweden)

    Golshan Mohammadi

    2013-01-01

    Full Text Available As customers are the main assets of each industry, customer churn prediction is becoming a major task for companies to remain competitive. In the literature, the better applicability and efficiency of hierarchical data mining techniques has been reported. This paper considers three hierarchical models by combining four different data mining techniques for churn prediction: backpropagation artificial neural networks (ANN), self-organizing maps (SOM), alpha-cut fuzzy c-means (α-FCM), and the Cox proportional hazards regression model. The hierarchical models are ANN + ANN + Cox, SOM + ANN + Cox, and α-FCM + ANN + Cox. In particular, the first component of each model clusters the data into churner and nonchurner groups and also filters out unrepresentative data or outliers. The clustered data are then used by the second technique to assign customers to churner and nonchurner groups. Finally, the correctly classified data are used to build the Cox proportional hazards model. To evaluate the performance of the hierarchical models, an Iranian mobile dataset is considered. The experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Type I and II errors, RMSE, and MAD metrics. In addition, the α-FCM + ANN + Cox model performs significantly better than the two other hierarchical models.

  2. LINEAR REGRESSION MODEL ESTIMATION FOR RIGHT CENSORED DATA

    Directory of Open Access Journals (Sweden)

    Ersin Yılmaz

    2016-05-01

    Full Text Available In this study we first define right-censored data: briefly, a response is right-censored when values above a known threshold are recorded only as that threshold, which may be caused, for example, by the limits of a measuring device. We then estimate a linear regression model in which the response variable is subject to right censoring while the explanatory variables are fully observed. Because of the censoring, Kaplan-Meier weights are used in estimating the model; with these weights the regression estimator is consistent and unbiased. A semiparametric regression method is also available for censored data and likewise gives useful results. This study may also be useful for health research, since censored data arise frequently in medical studies.
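A common form of the Kaplan-Meier-weighted least squares estimator the abstract refers to uses Stute-type weights: censored responses get zero weight, and each uncensored response gets the jump of the Kaplan-Meier estimator at its value. The sketch below applies it to synthetic data; the data-generating process, the censoring distribution, and all numbers are illustrative assumptions.

```python
# Kaplan-Meier-weighted least squares for a right-censored response
# (Stute-type weights), on synthetic data with a known true line y = 1 + 2x.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 2, n)
y_true = 1.0 + 2.0 * x + rng.normal(0, 0.3, n)   # latent response
c = rng.uniform(1.0, 7.0, n)                      # right-censoring thresholds
y = np.minimum(y_true, c)
delta = (y_true <= c).astype(float)               # 1 = observed, 0 = censored

# Sort by observed response; compute Kaplan-Meier jumps (no ties assumed):
# w_i = (delta_i / (n - i + 1)) * prod_{j<i} ((n - j) / (n - j + 1))^delta_j
order = np.argsort(y)
y_s, d_s, x_s = y[order], delta[order], x[order]
i = np.arange(1, n + 1)
surv_factor = np.cumprod(((n - i) / (n - i + 1.0)) ** d_s)
w = d_s / (n - i + 1.0)
w[1:] *= surv_factor[:-1]

# Weighted least squares via sqrt-weight scaling; censored rows drop out.
A = np.column_stack([np.ones(n), x_s])
sw = np.sqrt(w)
beta = np.linalg.lstsq(A * sw[:, None], y_s * sw, rcond=None)[0]
print(beta)   # should be close to (1, 2)
```

Plain OLS on the censored `y` would be biased toward flat slopes, because large responses are systematically truncated; the Kaplan-Meier weights restore consistency by reweighting the uncensored observations.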

  3. Developing and testing a global-scale regression model to quantify mean annual streamflow

    Science.gov (United States)

    Barbarossa, Valerio; Huijbregts, Mark A. J.; Hendriks, A. Jan; Beusen, Arthur H. W.; Clavreul, Julie; King, Henry; Schipper, Aafke M.

    2017-01-01

    Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, with catchment areas between 2 and 10⁶ km². In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29-0.38 compared to 0.49-0.57) and the modified index of agreement (d) was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.
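The kind of empirical model described here is typically log-linear in the catchment descriptors. The sketch below fits that model form to synthetic catchments; the exponents, descriptor ranges, and noise level are invented (they are not the study's fitted coefficients), but the fitting and variance-explained computation mirror the approach.

```python
# Log-linear regression of mean annual flow (MAF) on catchment descriptors,
# of the form MAF = c * Area^a * Precip^b * exp(g * Temp). Synthetic data
# with made-up exponents, just to show the model form and its recovery.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
area = 10 ** rng.uniform(0.3, 6, n)       # catchment area, km^2
precip = rng.uniform(300, 2500, n)        # mean annual precipitation, mm/yr
temp = rng.uniform(-5, 25, n)             # mean annual air temperature, deg C

# Synthetic "truth": exponents 0.95 and 1.5, temperature effect -0.02.
log_maf = (-6.0 + 0.95 * np.log(area) + 1.5 * np.log(precip)
           - 0.02 * temp + rng.normal(0, 0.2, n))

# Ordinary least squares on the log scale.
X = np.column_stack([np.ones(n), np.log(area), np.log(precip), temp])
coef, *_ = np.linalg.lstsq(X, log_maf, rcond=None)
resid = log_maf - X @ coef
r2 = 1 - (resid ** 2).sum() / ((log_maf - log_maf.mean()) ** 2).sum()
print(coef, r2)
```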

  4. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    Science.gov (United States)

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1 μm) may have adverse health effects. We developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R²=0.58 vs. 0.55) or a cross-validation procedure (R²=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.

  5. Adaptive regression for modeling nonlinear relationships

    CERN Document Server

    Knafl, George J

    2016-01-01

    This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...

  6. Confidence bands for inverse regression models

    International Nuclear Information System (INIS)

    Birke, Melanie; Bissantz, Nicolai; Holzmann, Hajo

    2010-01-01

    We construct uniform confidence bands for the regression function in inverse, homoscedastic regression models with convolution-type operators. Here, the convolution is between two non-periodic functions on the whole real line rather than between two periodic functions on a compact interval, since the former situation arguably arises more often in applications. First, following Bickel and Rosenblatt (1973 Ann. Stat. 1 1071–95) we construct asymptotic confidence bands which are based on strong approximations and on a limit theorem for the supremum of a stationary Gaussian process. Further, we propose bootstrap confidence bands based on the residual bootstrap and prove consistency of the bootstrap procedure. A simulation study shows that the bootstrap confidence bands perform reasonably well for moderate sample sizes. Finally, we apply our method to data from a gel electrophoresis experiment with genetically engineered neuronal receptor subunits incubated with rat brain extract

  7. Computation of Clebsch-Gordan and Gaunt coefficients using binomial coefficients

    International Nuclear Information System (INIS)

    Guseinov, I.I.; Oezmen, A.; Atav, Ue

    1995-01-01

    Using binomial coefficients, the Clebsch-Gordan and Gaunt coefficients were calculated for extremely large quantum numbers. The main advantage of this approach is that it calculates these coefficients directly, instead of using recursion relations. Accuracy of the results is quite high for quantum numbers l₁ and l₂ up to 100. Despite the direct calculation, the CPU times are found comparable with those given in the related literature. 11 refs., 1 fig., 2 tabs

  8. Electricity consumption forecasting in Italy using linear regression models

    Energy Technology Data Exchange (ETDEWEB)

    Bianco, Vincenzo; Manca, Oronzio; Nardini, Sergio [DIAM, Seconda Universita degli Studi di Napoli, Via Roma 29, 81031 Aversa (CE) (Italy)

    2009-09-15

    The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of ±1% for the best case and ±11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account. (author)

  9. Electricity consumption forecasting in Italy using linear regression models

    International Nuclear Information System (INIS)

    Bianco, Vincenzo; Manca, Oronzio; Nardini, Sergio

    2009-01-01

    The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of ±1% for the best case and ±11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account. (author)

  10. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    Directory of Open Access Journals (Sweden)

    Drzewiecki Wojciech

    2016-12-01

    Full Text Available In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas, considering both the accuracy of imperviousness coverage evaluation at individual points in time and the accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques.

  11. Augmented Beta rectangular regression models: A Bayesian perspective.

    Science.gov (United States)

    Wang, Jue; Luo, Sheng

    2016-01-01

    Mixed effects Beta regression models based on Beta distributions have been widely used to analyze longitudinal percentage or proportional data ranging between zero and one. However, Beta distributions are not robust to extreme outliers or excessive events around tail areas, and they do not account for the presence of the boundary values zero and one because these values are not in the support of the Beta distribution. To address these issues, we propose a mixed effects model using the Beta rectangular distribution and augment it with the probabilities of zero and one. We conduct extensive simulation studies to assess the performance of mixed effects models based on both the Beta and Beta rectangular distributions under various scenarios. The simulation studies suggest that the regression models based on Beta rectangular distributions improve the accuracy of parameter estimates in the presence of outliers and heavy tails. The proposed models are applied to the motivating Neuroprotection Exploratory Trials in Parkinson's Disease (PD) Long-term Study-1 (LS-1 study, n = 1741), developed by The National Institute of Neurological Disorders and Stroke Exploratory Trials in Parkinson's Disease (NINDS NET-PD) network. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Bayesian semiparametric regression models to characterize molecular evolution

    Directory of Open Access Journals (Sweden)

    Datta Saheli

    2012-10-01

    Full Text Available Abstract Background Statistical models and methods that associate changes in the physicochemical properties of amino acids with natural selection at the molecular level typically do not take into account the correlations between such properties. We propose a Bayesian hierarchical regression model with a generalization of the Dirichlet process prior on the distribution of the regression coefficients that describes the relationship between the changes in amino acid distances and natural selection in protein-coding DNA sequence alignments. Results The Bayesian semiparametric approach is illustrated with simulated data and the abalone lysin sperm data. Our method identifies groups of properties which, for this particular dataset, have a similar effect on evolution. The model also provides nonparametric site-specific estimates for the strength of conservation of these properties. Conclusions The model described here is distinguished by its ability to handle a large number of amino acid properties simultaneously, while taking into account that such data can be correlated. The multi-level clustering ability of the model allows for appealing interpretations of the results in terms of properties that are roughly equivalent from the standpoint of molecular evolution.

  13. Regression analysis of a chemical reaction fouling model

    International Nuclear Information System (INIS)

    Vasak, F.; Epstein, N.

    1996-01-01

    A previously reported mathematical model for the initial chemical reaction fouling of a heated tube is critically examined in the light of the experimental data for which it was developed. A regression analysis of the model with respect to that data shows that the reference point upon which the two adjustable parameters of the model were originally based was well chosen, albeit fortuitously. (author). 3 refs., 2 tabs., 2 figs

  14. Correlation-regression model for physico-chemical quality of ...

    African Journals Online (AJOL)

    abusaad

    areas, suggesting that groundwater quality in urban areas is closely related with land use ... a correlation and regression model for the physico-chemical quality of the groundwater is also presented.

  15. Multiple linear regression and regression with time series error models in forecasting PM10 concentrations in Peninsular Malaysia.

    Science.gov (United States)

    Ng, Kar Yong; Awang, Norhashidah

    2018-01-06

    Frequent haze occurrences in Malaysia have made the management of PM10 (particulate matter with aerodynamic diameter less than 10 μm) pollution a critical task. This requires knowledge of the factors associated with PM10 variation and good forecasts of PM10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built: a multiple linear regression (MLR) model with lagged predictor variables (MLR1), an MLR model with lagged predictor variables and PM10 concentrations (MLR2) and a regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM10 variation in Peninsular Malaysia. Comparison among the three models showed that the MLR2 model performed on par with the RTSE model in terms of forecasting accuracy, while the MLR1 model was the worst.

  16. Multivariate Frequency-Severity Regression Models in Insurance

    Directory of Open Access Journals (Sweden)

    Edward W. Frees

    2016-02-01

    Full Text Available In insurance and related industries including healthcare, it is common to have several outcome measures that the analyst wishes to understand using explanatory variables. For example, in automobile insurance, an accident may result in payments for damage to one’s own vehicle, damage to another party’s vehicle, or personal injury. It is also common to be interested in the frequency of accidents in addition to the severity of the claim amounts. This paper synthesizes and extends the literature on multivariate frequency-severity regression modeling with a focus on insurance industry applications. Regression models for understanding the distribution of each outcome continue to be developed yet there now exists a solid body of literature for the marginal outcomes. This paper contributes to this body of literature by focusing on the use of a copula for modeling the dependence among these outcomes; a major advantage of this tool is that it preserves the body of work established for marginal models. We illustrate this approach using data from the Wisconsin Local Government Property Insurance Fund. This fund offers insurance protection for (i) property; (ii) motor vehicle; and (iii) contractors’ equipment claims. In addition to several claim types and frequency-severity components, outcomes can be further categorized by time and space, requiring complex dependency modeling. We find significant dependencies for these data; specifically, we find that dependencies among lines are stronger than the dependencies between the frequency and average severity within each line.

  17. The binomial distribution as a tool for solving genetics problems

    OpenAIRE

    Ron Pedreira, Antonio Miguel de; Martínez, Ana María

    1999-01-01

    The binomial distribution has a wide range of fields of application because everyday situations very frequently involve some event based on two different, alternative, mutually exclusive outcomes whose probabilities sum to 1 (one hundred percent), that is, a certain event. As for genetics, one can find settings that allow the binomial distribution to be applied in molecular genetics, Mendelian genetics, quantitative genetics and the genetics of po...

  18. A Binomial Modeling Approach for Upscaling Colloid Transport Under Unfavorable Attachment Conditions: Emergent Prediction of Nonmonotonic Retention Profiles

    Science.gov (United States)

    Hilpert, Markus; Johnson, William P.

    2018-01-01

    We used a recently developed simple mathematical network model to upscale pore-scale colloid transport information determined under unfavorable attachment conditions. Classical log-linear and nonmonotonic retention profiles, well-reported under favorable and unfavorable attachment conditions, respectively, emerged from our upscaling. The primary attribute of the network is colloid transfer between bulk pore fluid, the near-surface fluid domain (NSFD), and attachment (treated as irreversible). The network model accounts for colloid transfer to the NSFD of downgradient grains and for reentrainment to bulk pore fluid via diffusion or via expulsion at rear flow stagnation zones (RFSZs). The model describes colloid transport by a sequence of random trials in a one-dimensional (1-D) network of Happel cells, which contain a grain and a pore. Using combinatorial analysis that capitalizes on the binomial coefficient, we derived from the pore-scale information the theoretical residence time distribution of colloids in the network. The transition from log-linear to nonmonotonic retention profiles occurs when the conditions underlying classical filtration theory are not fulfilled, i.e., when an NSFD colloid population is maintained. Then, nonmonotonic retention profiles result potentially both for attached and NSFD colloids. The concentration maxima shift downgradient depending on specific parameter choice. The concentration maxima were also shown to shift downgradient temporally (with continued elution) under conditions where attachment is negligible, explaining experimentally observed downgradient transport of retained concentration maxima of adhesion-deficient bacteria. For the case of zero reentrainment, we develop closed-form, analytical expressions for the shape and the maximum of the colloid retention profile.

  19. Quantile Regression Methods

    DEFF Research Database (Denmark)

    Fitzenberger, Bernd; Wilke, Ralf Andreas

    2015-01-01

    Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression.

  20. Application of a binomial cusum control chart to monitor one drinking water indicator

    Directory of Open Access Journals (Sweden)

    Elisa Henning

    2014-02-01

    Full Text Available The aim of this study is to analyze the use of a binomial cumulative sum (CUSUM) chart to monitor the presence of total coliforms, biological indicators of quality of water supplies in water treatment processes. The samples were taken monthly from a water treatment plant and analyzed from 2007 to 2009. The statistical treatment of the data was performed using GNU R, and routines were created for the approximation of the upper limit of the binomial CUSUM chart. Furthermore, a comparative study was conducted to investigate whether there is a significant difference in sensitivity between the CUSUM chart and the traditional Shewhart chart, the most commonly used chart in process monitoring. The results demonstrate the importance of choosing the appropriate chart for the statistical monitoring of this process.
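A binomial CUSUM of the general kind described here can be sketched in a few lines. The monthly sample size, the in-control and out-of-control rates, and the decision interval below are invented for the illustration (they are not the plant's values); the reference value `k` comes from the standard sequential-probability-ratio-test formula.

```python
# Illustrative binomial CUSUM for detecting a shift in a proportion
# nonconforming from an in-control rate p0 to an out-of-control rate p1.
# All parameter values (m, p0, p1, h) are invented for the sketch.
import math, random

def binomial_cusum(counts, m, p0, p1, h):
    # Reference value k from the sequential probability ratio test.
    num = math.log((1 - p0) / (1 - p1))
    den = math.log(p1 * (1 - p0) / (p0 * (1 - p1)))
    k = m * num / den
    s, signals = 0.0, []
    for t, x in enumerate(counts):
        s = max(0.0, s + x - k)     # accumulate evidence above k
        if s > h:                   # decision interval exceeded
            signals.append(t)
            s = 0.0                 # restart after a signal
    return k, signals

random.seed(3)
m, p0, p1, h = 50, 0.05, 0.15, 5.0
# 30 in-control months followed by 30 months after an upward shift.
in_control = [sum(random.random() < p0 for _ in range(m)) for _ in range(30)]
shifted = [sum(random.random() < p1 for _ in range(m)) for _ in range(30)]
k, signals = binomial_cusum(in_control + shifted, m, p0, p1, h)
print(k, signals)
```

Because the CUSUM accumulates small exceedances of `k` over time, it typically flags a sustained moderate shift within a few samples, where a Shewhart chart would need a single extreme count to signal.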

  1. Application of random regression models to the genetic evaluation ...

    African Journals Online (AJOL)

    The model included fixed regression on AM (range from 30 to 138 mo) and the effect of herd-measurement date concatenation. Random parts of the model were RRM coefficients for additive and permanent environmental effects, while residual effects were modelled to account for heterogeneity of variance by AY. Estimates ...

  2. Multitask Quantile Regression under the Transnormal Model.

    Science.gov (United States)

    Fan, Jianqing; Xue, Lingzhou; Zou, Hui

    2016-01-01

    We consider estimating multi-task quantile regression under the transnormal model, with a focus on the high-dimensional setting. We derive a surprisingly simple closed-form solution through rank-based covariance regularization. In particular, we propose the rank-based ℓ₁ penalization with positive definite constraints for estimating sparse covariance matrices, and the rank-based banded Cholesky decomposition regularization for estimating banded precision matrices. By taking advantage of the alternating direction method of multipliers, a nearest correlation matrix projection is introduced that inherits the sampling properties of the unprojected one. Our work combines strengths of quantile regression and rank-based covariance regularization to simultaneously deal with nonlinearity and nonnormality for high-dimensional regression. Furthermore, the proposed method strikes a good balance between robustness and efficiency, achieves the "oracle"-like convergence rate, and provides the provable prediction interval under the high-dimensional setting. The finite-sample performance of the proposed method is also examined. The performance of our proposed rank-based method is demonstrated in a real application to analyze the protein mass spectroscopy data.
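The core rank-based idea can be shown in isolation: under a transnormal (nonparanormal) model, the latent Pearson correlation is recoverable from Kendall's tau via sin(π/2 · τ), so no marginal transformations need to be estimated. The synthetic marginal transforms and the true correlation below are illustrative choices; the penalization and projection steps of the full method are not reproduced here.

```python
# Rank-based recovery of a latent correlation under a transnormal model:
# observed data are unknown monotone transforms of Gaussian variables, yet
# sin(pi/2 * Kendall's tau) still estimates the latent correlation rho.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(11)
n, rho = 2000, 0.6
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)

# Apply unknown monotone marginal transforms -- the data are non-Gaussian.
x, y = np.exp(z[:, 0]), z[:, 1] ** 3

tau, _ = kendalltau(x, y)                 # invariant to monotone transforms
rho_hat = np.sin(np.pi / 2 * tau)         # rank-based estimate of rho
print(tau, rho_hat)
```

This closed-form bridge from ranks to correlations is what makes the covariance-regularization step "rank-based": the sample Kendall matrix is transformed entrywise and then regularized.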

  3. Use of negative binomial distribution to describe the presence of Anisakis in Thyrsites atun

    Directory of Open Access Journals (Sweden)

    Patricio Peña-Rehbein

    2012-03-01

    Full Text Available Nematodes of the genus Anisakis have marine fishes as intermediate hosts. One of these hosts is Thyrsites atun, an important fishery resource in Chile between 38 and 41° S. This paper describes the frequency and number of Anisakis nematodes in the internal organs of Thyrsites atun. An analysis based on spatial distribution models showed that the parasites tend to be clustered. The variation in the number of parasites per host could be described by the negative binomial distribution. The maximum observed number of parasites was nine parasites per host. The environmental and zoonotic aspects of the study are also discussed.
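The aggregation (clustering) diagnosis described here follows from overdispersion: for a negative binomial, variance exceeds the mean by mean²/k. The sketch below fits k by the method of moments to an invented count table (the paper's only stated specific is the maximum of nine parasites per host; the frequencies are not the study's data).

```python
# Method-of-moments fit of a negative binomial to parasite counts per host.
# Overdispersion (variance > mean) indicates clustering; the aggregation
# parameter k follows from var = mean + mean^2 / k. Counts are invented.
import numpy as np
from scipy import stats

counts = np.array([0]*40 + [1]*20 + [2]*12 + [3]*8 +
                  [4]*5 + [5]*3 + [6]*2 + [9]*1)
mean, var = counts.mean(), counts.var(ddof=1)

# Small k means strong aggregation; k -> infinity recovers the Poisson.
k = mean**2 / (var - mean)

# Fitted probabilities under scipy's (n, p) parameterization, for comparing
# against observed frequencies (a formal test would pool sparse cells).
p = k / (k + mean)
fitted = stats.nbinom.pmf(np.arange(10), k, p) * len(counts)
print(mean, var, k)
```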

  4. Approximating prediction uncertainty for random forest regression models

    Science.gov (United States)

    John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne

    2016-01-01

    Machine learning approaches such as random forest have increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches there is no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...

  5. What do you do when the binomial cannot value real options? The LSM model

    Directory of Open Access Journals (Sweden)

    S. Alonso

    2014-12-01

    Full Text Available The Least-Squares Monte Carlo model (LSM model has emerged as the derivative valuation technique with the greatest impact in current practice. As with other options valuation models, the LSM algorithm was initially posited in the field of financial derivatives and its extension to the realm of real options requires considering certain questions which might hinder understanding of the algorithm and which the present paper seeks to address. The implementation of the LSM model combines Monte Carlo simulation, dynamic programming and statistical regression in a flexible procedure suitable for application to valuing nearly all types of corporate investments. The goal of this paper is to show how the LSM algorithm is applied in the context of a corporate investment, thus contributing to the understanding of the principles of its operation.
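The LSM algorithm is easiest to see in its original financial setting; swapping the option payoff for a project's cash flows gives the real-options analogue the paper discusses. The sketch below prices an American-style put with the standard Longstaff-Schwartz recursion (simulate paths, then at each step regress discounted continuation values on basis functions of the state for in-the-money paths); all market parameters and the quadratic basis are illustrative choices.

```python
# Least-Squares Monte Carlo (LSM) for an American-style put: Monte Carlo
# simulation + backward dynamic programming + regression, as in the abstract.
import numpy as np

def lsm_american_put(s0=36.0, strike=40.0, r=0.06, sigma=0.2,
                     T=1.0, steps=50, paths=20_000, seed=7):
    rng = np.random.default_rng(seed)
    dt = T / steps
    # Simulate GBM paths (antithetic variates for variance reduction).
    z = rng.standard_normal((paths // 2, steps))
    z = np.vstack([z, -z])
    increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    s = s0 * np.exp(np.cumsum(increments, axis=1))

    cash = np.maximum(strike - s[:, -1], 0.0)        # exercise at maturity
    for t in range(steps - 2, -1, -1):
        cash *= np.exp(-r * dt)                      # discount one step back
        itm = strike - s[:, t] > 0                   # in-the-money paths only
        if itm.sum() < 3:
            continue
        # Regress discounted continuation values on a quadratic basis of S_t.
        x = s[itm, t]
        A = np.column_stack([np.ones_like(x), x, x**2])
        coef, *_ = np.linalg.lstsq(A, cash[itm], rcond=None)
        continuation = A @ coef
        exercise = strike - x
        ex_now = exercise > continuation             # early-exercise decision
        cash[np.flatnonzero(itm)[ex_now]] = exercise[ex_now]
    return np.exp(-r * dt) * cash.mean()

price = lsm_american_put()
print(price)
```

The regression is the key step: it estimates the continuation value from cross-sectional information, avoiding the nested simulations that would otherwise make backward induction on simulated paths intractable.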

  6. Profile-driven regression for modeling and runtime optimization of mobile networks

    DEFF Research Database (Denmark)

    McClary, Dan; Syrotiuk, Violet; Kulahci, Murat

    2010-01-01

    Computer networks often display nonlinear behavior when examined over a wide range of operating conditions. There are few strategies available for modeling such behavior and optimizing such systems as they run. Profile-driven regression is developed and applied to modeling and runtime optimization...... of throughput in a mobile ad hoc network, a self-organizing collection of mobile wireless nodes without any fixed infrastructure. The intermediate models generated in profile-driven regression are used to fit an overall model of throughput, and are also used to optimize controllable factors at runtime. Unlike...

  7. Negative-binomial multiplicity distributions in the interaction of light ions with ¹²C at 4.2 GeV/c

    International Nuclear Information System (INIS)

    Tucholski, A.; Bogdanowicz, J.; Moroz, Z.; Wojtkowska, J.

    1989-01-01

    Multiplicity distributions of singly-charged particles in the interactions of p, d, α and ¹²C projectiles with C target nuclei at 4.2 GeV/c were analysed in terms of the negative binomial distribution. The experimental data obtained by the Dubna Propane Bubble Chamber Group were used. It is shown that the experimental distributions are satisfactorily described by the negative-binomial distribution. Values of the parameters of these distributions are discussed. (orig.)

  8. Accounting for measurement error in log regression models with applications to accelerated testing.

    Science.gov (United States)

    Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M

    2018-01-01

    In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.
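
    The weighted-regression estimation step can be illustrated with a generic iteratively re-weighted least squares loop. The weights here (inverse squared fitted values, a common choice when the error scale grows with the mean) are an assumption for illustration, not the weights derived from the paper's measurement-error approximation.

```python
import numpy as np

def irls(x, y, n_iter=20):
    """Generic IRLS sketch: refit a weighted linear model, updating the
    weights from the current fitted values on each pass."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    beta = np.zeros(2)
    for _ in range(n_iter):
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # weighted LS step
        mu = X @ beta
        w = 1.0 / np.maximum(mu, 1e-6) ** 2               # update weights
    return beta

# synthetic data with multiplicative (mean-proportional) noise
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, 500)
mu = 2.0 + 3.0 * x
y = mu * (1.0 + 0.05 * rng.standard_normal(500))
beta = irls(x, y)
```

    The first pass is ordinary least squares (unit weights); later passes down-weight observations whose fitted mean, and hence error scale, is large.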

  9. Accounting for measurement error in log regression models with applications to accelerated testing.

    Directory of Open Access Journals (Sweden)

    Robert Richardson

    Full Text Available In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.

  10. OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

    Science.gov (United States)

    Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

    2012-01-01

    The objective of the present study was to assess the comparable applicability of the orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in investigating the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation in the first week of admission and again six months later. All data were primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single-vessel involvement, as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.

  11. A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

    Science.gov (United States)

    Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

    2018-04-01

    In this paper, we propose a hybrid model which combines a multiple linear regression model with the fuzzy c-means method. This research involved the relationship between 20 topsoil variables, analyzed prior to planting, and paddy yield at standard fertilizer rates. The data used were from the multi-location trials for rice carried out by MARDI at major paddy granaries in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model and a combination of the multiple linear regression model and the fuzzy c-means method. Analyses of normality and multicollinearity indicate that the data are normally distributed without multicollinearity among the independent variables. Fuzzy c-means analysis clusters the paddy yield into two clusters before the multiple linear regression model is applied. The comparison between the two methods indicates that the hybrid of the multiple linear regression model and the fuzzy c-means method outperforms the multiple linear regression model, with a lower mean square error.
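
    A minimal sketch of the hybrid idea, under the assumption that membership comes from standard fuzzy c-means on the response and a separate linear regression is fitted per hard-assigned cluster; the synthetic two-regime data below stand in for the paddy-yield clusters, not the MARDI trial data.

```python
import numpy as np

def fuzzy_c_means(y, c=2, m=2.0, n_iter=100):
    """Standard fuzzy c-means on a 1-D response: alternate membership and
    centre updates; m is the fuzzifier."""
    centres = np.linspace(y.min(), y.max(), c)
    u = np.full((len(y), c), 1.0 / c)
    for _ in range(n_iter):
        d = np.abs(y[:, None] - centres[None, :]) + 1e-12
        inv = d ** (-2.0 / (m - 1))
        u = inv / inv.sum(axis=1, keepdims=True)          # membership update
        um = u ** m
        centres = (um * y[:, None]).sum(axis=0) / um.sum(axis=0)
    return u, centres

# synthetic data with two linear regimes in the response
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 300)
y = np.where(x < 5, 2.0 + x, 20.0 - x) + 0.3 * rng.standard_normal(300)

u, centres = fuzzy_c_means(y, c=2)
labels = u.argmax(axis=1)
slopes = []
for kk in (0, 1):                       # one linear regression per cluster
    xs, ys = x[labels == kk], y[labels == kk]
    X = np.column_stack([np.ones_like(xs), xs])
    slopes.append(np.linalg.lstsq(X, ys, rcond=None)[0][1])
```

    Clustering first lets each regression capture one regime's slope, which is where the hybrid's lower mean square error comes from.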

  12. SPSS macros to compare any two fitted values from a regression model.

    Science.gov (United States)

    Weaver, Bruce; Dubois, Sacha

    2012-12-01

    In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests-particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.
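
    The matrix-algebra method the macros implement reduces to a linear contrast of the coefficient vector: for covariate rows x1 and x2, the difference in fitted values is d'b with d = x1 − x2, and its standard error is sqrt(d' Cov(b) d). A numpy sketch on hypothetical data (not SPSS output):

```python
import numpy as np

def compare_fitted(X, y, x1, x2):
    """OLS fit, then the difference between two fitted values with its
    standard error and a 95% confidence interval."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    sigma2 = resid @ resid / (n - p)          # residual variance estimate
    cov_b = sigma2 * XtX_inv                  # Cov(b)
    d = np.asarray(x1, float) - np.asarray(x2, float)
    diff = d @ b
    se = np.sqrt(d @ cov_b @ d)
    return diff, se, (diff - 1.96 * se, diff + 1.96 * se)

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
X = np.column_stack([np.ones(200), x, x**2])   # model with a squared term
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.standard_normal(200)
# fitted Y at x=4 vs x=1 -- a contrast no single coefficient reports
diff, se, ci = compare_fitted(X, y, [1, 4, 16], [1, 1, 1])
```

    With the squared term, no regression coefficient equals this comparison, which is exactly the situation the macros address; the true difference here is 13.5.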

  13. Parameter estimation and statistical test of geographically weighted bivariate Poisson inverse Gaussian regression models

    Science.gov (United States)

    Amalia, Junita; Purhadi, Otok, Bambang Widjanarko

    2017-11-01

    The Poisson distribution is a discrete distribution for count data with a single parameter that defines both the mean and the variance. Poisson regression assumes that the mean and variance are the same (equidispersion). Nonetheless, some count data do not satisfy this assumption because the variance exceeds the mean (over-dispersion). Ignoring over-dispersion leads to underestimated standard errors and, in turn, to incorrect decisions in statistical tests. Paired count data are correlated and follow a bivariate Poisson distribution. If there is over-dispersion, simple bivariate Poisson regression is not sufficient for modeling paired count data. The Bivariate Poisson Inverse Gaussian Regression (BPIGR) model is a mixed Poisson regression for modeling paired count data with over-dispersion. The BPIGR model produces a global model for all locations. On the other hand, each location has different geographic, social, cultural and economic conditions, so that Geographically Weighted Regression (GWR) is needed. The weighting function of each location in GWR generates a different local model. The Geographically Weighted Bivariate Poisson Inverse Gaussian Regression (GWBPIGR) model is used to handle over-dispersion and to generate local models. Parameter estimates of the GWBPIGR model are obtained by the Maximum Likelihood Estimation (MLE) method, while hypothesis testing of the GWBPIGR model is carried out by the Maximum Likelihood Ratio Test (MLRT) method.
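
    The equidispersion assumption and its failure can be demonstrated directly: simulated negative-binomial counts with mean μ and dispersion k have variance μ + μ²/k, so their variance-to-mean ratio exceeds the value of 1 that Poisson regression assumes. The parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Poisson counts: variance equals the mean (equidispersion)
pois = rng.poisson(lam=4.0, size=20000)

# negative-binomial counts with the same mean but extra variance:
# mean = n(1-p)/p = 4, var = mean + mean^2/n = 4 + 16/2 = 12
nb = rng.negative_binomial(n=2, p=1.0 / 3.0, size=20000)

pois_ratio = pois.var() / pois.mean()   # close to 1
nb_ratio = nb.var() / nb.mean()         # close to 3 (over-dispersed)
```

    A variance-to-mean ratio well above 1 in the sample is the usual diagnostic that a Poisson model will understate the standard errors.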

  14. Can We Use Regression Modeling to Quantify Mean Annual Streamflow at a Global-Scale?

    Science.gov (United States)

    Barbarossa, V.; Huijbregts, M. A. J.; Hendriks, J. A.; Beusen, A.; Clavreul, J.; King, H.; Schipper, A.

    2016-12-01

    Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for a number of applications, including assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF using observations of discharge and catchment characteristics from 1,885 catchments worldwide, ranging from 2 to 10^6 km2 in size. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB [van Beek et al., 2011] by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area, mean annual precipitation and air temperature, average slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error values were lower (0.29 - 0.38 compared to 0.49 - 0.57) and the modified index of agreement was higher (0.80 - 0.83 compared to 0.72 - 0.75). Our regression model can be applied globally at any point of the river network, provided that the input parameters are within the range of values employed in the calibration of the model. The performance is reduced for water scarce regions and further research should focus on improving such an aspect for regression-based global hydrological models.

  15. Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression

    Science.gov (United States)

    Khikmah, L.; Wijayanto, H.; Syafitri, U. D.

    2017-04-01

    A problem often encountered in logistic regression modeling is multicollinearity. When there is multicollinearity between explanatory variables, the parameter estimates are biased and errors in the classification result. In general, stepwise regression is used to overcome multicollinearity in regression. There is also another method, one that retains all variables for prediction: Principal Component Analysis (PCA). However, classical PCA is only for numeric data; when the data are categorical, one method to solve the problem is Categorical Principal Component Analysis (CATPCA). The data used in this research were part of the Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristics of women using contraceptive methods. Classification results were evaluated using Area Under Curve (AUC) values; the higher the AUC value, the better. Based on AUC values, the classification of the contraceptive method using the stepwise method (58.66%) is better than the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the results of logistic regression using sensitivity shows the opposite, where the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and stepwise (92.05%). Since this study focuses on the major class classification (using a contraceptive method), the selected model is CATPCA because it raises the accuracy for the major class.

  16. Time series modeling by a regression approach based on a latent process.

    Science.gov (United States)

    Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice

    2009-01-01

    Time series are used in many domains including finance, engineering, economics and bioinformatics, generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process that allows different polynomial regression models to be activated smoothly or abruptly. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.

  17. Generalized harmonic, cyclotomic, and binomial sums, their polylogarithms and special numbers

    Energy Technology Data Exchange (ETDEWEB)

    Ablinger, J.; Schneider, C. [Johannes Kepler Univ., Linz (Austria). Research Inst. for Symbolic Computation (RISC); Bluemlein, J. [Deutsches Elektronen-Synchrotron (DESY), Zeuthen (Germany)

    2013-10-15

    A survey is given on mathematical structures which emerge in multi-loop Feynman diagrams. These are multiply nested sums, and, associated to them by an inverse Mellin transform, specific iterated integrals. Both classes lead to sets of special numbers. Starting with harmonic sums and polylogarithms we discuss recent extensions of these quantities as cyclotomic, generalized (cyclotomic), and binomially weighted sums, associated iterated integrals and special constants and their relations.

  18. Generalized harmonic, cyclotomic, and binomial sums, their polylogarithms and special numbers

    International Nuclear Information System (INIS)

    Ablinger, J.; Schneider, C.; Bluemlein, J.

    2013-10-01

    A survey is given on mathematical structures which emerge in multi-loop Feynman diagrams. These are multiply nested sums, and, associated to them by an inverse Mellin transform, specific iterated integrals. Both classes lead to sets of special numbers. Starting with harmonic sums and polylogarithms we discuss recent extensions of these quantities as cyclotomic, generalized (cyclotomic), and binomially weighted sums, associated iterated integrals and special constants and their relations.

  19. Regression to Causality : Regression-style presentation influences causal attribution

    DEFF Research Database (Denmark)

    Bordacconi, Mats Joe; Larsen, Martin Vinæs

    2014-01-01

    of equivalent results presented as either regression models or as a test of two sample means. Our experiment shows that the subjects who were presented with results as estimates from a regression model were more inclined to interpret these results causally. Our experiment implies that scholars using regression...... models – one of the primary vehicles for analyzing statistical results in political science – encourage causal interpretation. Specifically, we demonstrate that presenting observational results in a regression model, rather than as a simple comparison of means, makes causal interpretation of the results...... more likely. Our experiment drew on a sample of 235 university students from three different social science degree programs (political science, sociology and economics), all of whom had received substantial training in statistics. The subjects were asked to compare and evaluate the validity...

  20. Validity of the negative binomial multiplicity distribution in case of ultra-relativistic nucleus-nucleus interaction in different azimuthal bins

    International Nuclear Information System (INIS)

    Ghosh, D.; Deb, A.; Haldar, P.K.; Sahoo, S.R.; Maity, D.

    2004-01-01

    This work studies the validity of the negative binomial distribution in the multiplicity distribution of charged secondaries in 16O and 32S interactions with AgBr at 60 GeV/c per nucleon and 200 GeV/c per nucleon, respectively. The validity of the negative binomial distribution (NBD) is studied in different azimuthal phase spaces. It is observed that the data can be well parameterized in terms of the NBD law for different azimuthal phase spaces. (authors)

  1. Predicting Cumulative Incidence Probability: Marginal and Cause-Specific Modelling

    DEFF Research Database (Denmark)

    Scheike, Thomas H.; Zhang, Mei-Jie

    2005-01-01

    cumulative incidence probability; cause-specific hazards; subdistribution hazard; binomial modelling

  2. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    Science.gov (United States)

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  3. Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.

    Science.gov (United States)

    Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H

    2016-01-01

    Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias and confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite the better performance of disease risk score methods than logistic regression and propensity score models in small events-per-coefficient settings, bias and coverage still deviated from nominal.
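
    As an illustration of one of the compared adjustment methods, a stabilized inverse-probability-weighting estimate can be sketched on synthetic single-confounder data; the paper's simulation design is considerably more elaborate.

```python
import numpy as np

def ipw_estimate(x, treat, y):
    """Stabilized IPW sketch: fit the propensity score by a small
    Newton-Raphson logistic regression, then take a weighted difference
    of outcome means."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(25):                          # Newton-Raphson updates
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (treat - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    ps = 1.0 / (1.0 + np.exp(-X @ beta))
    # stabilized weights: marginal treatment probability over propensity
    pt = treat.mean()
    w = np.where(treat == 1, pt / ps, (1 - pt) / (1 - ps))
    y1 = np.sum(w * treat * y) / np.sum(w * treat)
    y0 = np.sum(w * (1 - treat) * y) / np.sum(w * (1 - treat))
    return y1 - y0

rng = np.random.default_rng(7)
x = rng.standard_normal(5000)                            # confounder
treat = (rng.random(5000) < 1 / (1 + np.exp(-x))).astype(float)
y = 2.0 * treat + 1.5 * x + rng.standard_normal(5000)    # true effect = 2
naive = y[treat == 1].mean() - y[treat == 0].mean()      # confounded
effect = ipw_estimate(x, treat, y)
```

    The naive difference in means is biased upward because treated subjects have higher x; weighting removes that imbalance.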

  4. Regression Model to Predict Global Solar Irradiance in Malaysia

    Directory of Open Access Journals (Sweden)

    Hairuniza Ahmed Kutty

    2015-01-01

    Full Text Available A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitate, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE), mean bias error (MBE), and the coefficient of determination (R2) with other models available from literature studies. Seven models based on single parameters (PM1 to PM7) and five multiple-parameter models (PM7 to PM12) are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R2 ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included in multiple-parameter models, although it performs fairly well in single-parameter prediction models.

  5. Single-vehicle crashes along rural mountainous highways in Malaysia: An application of random parameters negative binomial model.

    Science.gov (United States)

    Rusli, Rusdi; Haque, Md Mazharul; King, Mark; Voon, Wong Shaw

    2017-05-01

    Mountainous highways are generally associated with a complex driving environment because of constrained road geometries, limited cross-section elements, inappropriate roadside features, and adverse weather conditions. As a result, single-vehicle (SV) crashes are overrepresented along mountainous roads, particularly in developing countries, yet little is known about the roadway geometric, traffic and weather factors contributing to these SV crashes. As such, the main objective of the present study is to investigate SV crashes using detailed data obtained from a rigorous site survey and existing databases. The final dataset included a total of 56 variables representing road geometries, including horizontal and vertical alignment, traffic characteristics, real-time weather conditions, cross-sectional elements, roadside features, and spatial characteristics. To account for structured heterogeneities resulting from multiple observations within a site and other unobserved heterogeneities, the study applied a random parameters negative binomial model. Results suggest that rainfall during the crash is positively associated with SV crashes, while real-time visibility is negatively associated. The presence of a road shoulder, particularly a bitumen or wider shoulder, along mountainous highways is associated with fewer SV crashes. While speeding along downgrade slopes increases the likelihood of SV crashes, proper delineation decreases it. Findings of this study have significant implications for designing safer highways in mountainous areas, particularly in the context of a developing country. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Polynomial regression analysis and significance test of the regression function

    International Nuclear Information System (INIS)

    Gao Zhengming; Zhao Juan; He Shengping

    2012-01-01

    In order to analyze the decay heating power of a certain radioactive isotope per kilogram with the polynomial regression method, the paper first demonstrates the broad usage of the polynomial function and deduces its parameters with the ordinary least squares estimate. Then the significance test method for the polynomial regression function is derived, considering the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and a significance test of the polynomial function are applied to the decay heating power of the isotope per kilogram, in accordance with the authors' real work. (authors)
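
    The significance test described parallels the overall F test of multiple linear regression, since a degree-p polynomial fit is linear in its coefficients. A sketch with synthetic data (not the isotope decay-heat measurements):

```python
import numpy as np

def poly_fit_ftest(x, y, degree):
    """Fit a polynomial by ordinary least squares and compute the overall
    F statistic of the regression against an intercept-only model."""
    X = np.vander(x, degree + 1)            # columns x^degree ... x^0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    n, p = len(y), degree                   # p regression df (excl. intercept)
    ss_reg = np.sum((fitted - y.mean()) ** 2)
    ss_res = np.sum((y - fitted) ** 2)
    f_stat = (ss_reg / p) / (ss_res / (n - p - 1))
    return beta, f_stat

rng = np.random.default_rng(3)
x = rng.uniform(0, 2, 100)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.1 * rng.standard_normal(100)
beta, f_stat = poly_fit_ftest(x, y, degree=2)
# compare f_stat against F(degree, n - degree - 1) critical values
```

    A large F statistic relative to the F(2, 97) critical value indicates that the polynomial terms jointly explain the response.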

  7. Logistic regression for risk factor modelling in stuttering research.

    Science.gov (United States)

    Reed, Phil; Wu, Yaqionq

    2013-06-01

    To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
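
    The core computation behind such risk-factor analyses is a logistic regression whose exponentiated coefficients are the odds ratios usually reported. A minimal Newton-Raphson fit on hypothetical data (the variable here is a generic covariate, not one of the article's stuttering risk factors):

```python
import numpy as np

def logistic_fit(X, y, n_iter=30):
    """Logistic regression by Newton-Raphson (equivalently IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)                              # working weights
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

# hypothetical binary outcome generated from a known model
rng = np.random.default_rng(11)
x = rng.standard_normal(2000)
X = np.column_stack([np.ones(2000), x])
true_beta = np.array([-0.5, 0.8])
p = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = (rng.random(2000) < p).astype(float)

beta = logistic_fit(X, y)
odds_ratio = np.exp(beta[1])   # multiplicative change in odds per unit of x
```

    An odds ratio above 1 marks the covariate as a risk factor; below 1, as protective, which is the interpretation step the article walks through.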

  8. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2011-01-01

    In this paper, two non-parametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a more viable alternative to existing kernel-based approaches. The second estimator

  9. Advanced statistics: linear regression, part I: simple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
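
    The method of least squares the article reviews has a closed form in the single-predictor case; a minimal numeric check with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 50)
y = 3.0 + 1.5 * x + rng.standard_normal(50)   # known line plus noise

# closed-form least-squares estimates for slope and intercept
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
```

    The slope is the covariance of x and y over the variance of x, and the fitted line always passes through the point of means.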

  10. Advanced statistics: linear regression, part II: multiple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.

  11. Crime Modeling using Spatial Regression Approach

    Science.gov (United States)

    Saleh Ahmar, Ansari; Adiatma; Kasim Aidid, M.

    2018-01-01

    Acts of criminality in Indonesia increase in both variety and quantity every year, including murder, rape, assault, vandalism, theft, fraud, fencing, and other cases that make people feel unsafe. The risk of society being exposed to crime is measured by the number of cases reported to the police institution; the higher the number of reports to the police institution, the higher the crime in the region. In this research, criminality in South Sulawesi, Indonesia is modeled with society's exposure to the risk of crime as the dependent variable. Modelling is done with an area-based approach using the Spatial Autoregressive (SAR) and Spatial Error Model (SEM) methods. The independent variables used are population density, the number of poor residents, GDP per capita, unemployment and the human development index (HDI). The analysis using spatial regression shows that there is no spatial dependency, in either lag or error, in South Sulawesi.

  12. Constrained Dynamic Optimality and Binomial Terminal Wealth

    DEFF Research Database (Denmark)

    Pedersen, J. L.; Peskir, G.

    2018-01-01

    with interest rate $r \\in {R}$). Letting $P_{t,x}$ denote a probability measure under which $X^u$ takes value $x$ at time $t,$ we study the dynamic version of the nonlinear optimal control problem $\\inf_u\\, Var{t,X_t^u}(X_T^u)$ where the infimum is taken over admissible controls $u$ subject to $X_t^u \\ge e...... a martingale method combined with Lagrange multipliers, we derive the dynamically optimal control $u_*^d$ in closed form and prove that the dynamically optimal terminal wealth $X_T^d$ can only take two values $g$ and $\\beta$. This binomial nature of the dynamically optimal strategy stands in sharp contrast...... with other known portfolio selection strategies encountered in the literature. A direct comparison shows that the dynamically optimal (time-consistent) strategy outperforms the statically optimal (time-inconsistent) strategy in the problem....

  13. Determining factors influencing survival of breast cancer by fuzzy logistic regression model.

    Science.gov (United States)

    Nikbakht, Roya; Bahrampour, Abbas

    2017-01-01

    Fuzzy logistic regression models can be used for determining the influential factors of a disease. This study explores the important predictive survival factors of breast cancer patients. We used breast cancer data collected by the cancer registry of Kerman University of Medical Sciences during the period 2000-2007. Variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were entered into the fuzzy logistic regression model. Performance of the model was determined in terms of the mean degree of membership (MDM). The study results showed that almost 41% of patients were in the neoplasm and malignant group and more than two-thirds of them were still alive after a 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criterion shows that the fuzzy logistic regression has a good fit to the data (MDM = 0.86). The fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy for the survival of patients with breast cancer. In addition, another ability of this model is calculating the possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research, and since few studies have applied fuzzy logistic models, we recommend using this model in various research areas.

  14. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2009-01-01

    In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By

  15. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2010-01-01

    In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By

  16. Using the classical linear regression model in analysis of the dependences of conveyor belt life

    Directory of Open Access Journals (Sweden)

    Miriam Andrejiová

    2013-12-01

    Full Text Available The paper deals with a classical linear regression model of the dependence of conveyor belt life on selected parameters: thickness of the paint layer, width and length of the belt, conveyor speed, and quantity of transported material. The first part of the article covers regression model design, point and interval estimation of the parameters, and verification of the statistical significance of the model and of the parameters of the proposed regression model. The second part deals with the identification of influential and extreme values that can affect the estimation of the regression model parameters. The third part focuses on the assumptions of the classical regression model, i.e. on verifying the independence, normality and homoscedasticity of the residuals.
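
The workflow described above, fit the model and then run diagnostics on the residuals, can be sketched for a single predictor. The belt-life numbers below are made up for illustration, not the paper's data:

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: paint-layer thickness (mm) vs. belt life (months).
thickness = [2.0, 3.0, 4.0, 5.0, 6.0]
life = [10.1, 12.0, 13.9, 16.1, 17.9]
a, b = ols_fit(thickness, life)
res = [yi - (a + b * xi) for xi, yi in zip(thickness, life)]
# The residuals are what the assumption checks (independence, normality,
# homoscedasticity) are run on; by construction they sum to zero.
print(round(a, 2), round(b, 2))  # → 6.12 1.97
```

The interval estimates and significance tests mentioned in the abstract are then built from these same quantities (slope, intercept, and residual variance).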

  17. Use of empirical likelihood to calibrate auxiliary information in partly linear monotone regression models.

    Science.gov (United States)

    Chen, Baojiang; Qin, Jing

    2014-05-10

    In statistical analysis, a regression model is needed if one is interested in the relationship between a response variable and covariates. The response may depend on a covariate through some unknown function of that covariate. If one has no knowledge of this functional form but expects it to be monotonically increasing or decreasing, then an isotonic regression model is preferable. Estimation of the parameters of isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), in which the monotonicity constraints are built in. With missing data, one often employs the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model the PAVA then no longer works, as the monotonicity constraints are violated. In this paper, we develop an empirical-likelihood-based method for the isotonic regression model that incorporates the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
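
The pool-adjacent-violators algorithm that the abstract relies on is short enough to sketch directly. This is a generic least-squares version for a non-decreasing fit, not the authors' augmented-estimation code:

```python
def pava(y):
    """Pool-adjacent-violators: isotonic (non-decreasing) least-squares fit.

    Adjacent blocks whose means violate monotonicity are merged and
    replaced by their pooled mean.
    """
    blocks = []  # each block is [sum, count]
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds this one's.
        while len(blocks) > 1 and \
                blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

print(pava([1.0, 3.0, 2.0, 4.0]))  # → [1.0, 2.5, 2.5, 4.0]
```

The violating pair (3, 2) is pooled to its mean 2.5, leaving a monotone sequence, which is exactly the constraint the paper's empirical-likelihood calibration is designed to preserve.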

  18. Predictive market segmentation model: An application of logistic regression model and CHAID procedure

    Directory of Open Access Journals (Sweden)

    Soldić-Aleksić Jasna

    2009-01-01

    Full Text Available Market segmentation is one of the key concepts of modern marketing. Its main goal is to create groups (segments) of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of a concrete product/service. Companies can create a specific marketing plan for each of these segments and thereby gain a short- or long-term competitive advantage on the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of a logistic regression model and CHAID analysis. The logistic regression model was used to select, from an initial pool of eleven variables, those that are statistically significant for explaining the dependent variable. The selected variables were afterwards included in the CHAID procedure, which generated the predictive market segmentation model. The model results are presented on a concrete empirical example in the following form: summary model results, CHAID tree, gain chart, index chart, and risk and classification tables.

  19. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    Science.gov (United States)

    Ulbrich, Norbert Manfred

    2013-01-01

    A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.

  20. Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

    Science.gov (United States)

    Madarang, Krish J; Kang, Joo-Hyon

    2014-06-01

    Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables such as pollutant loads and concentrations. However, whether ADD is an important variable in predicting stormwater discharge characteristics has remained controversial across studies. In this study, we examined the accuracy of general linear regression models in predicting the discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments located in Gwangju, Korea. Data from the monitoring were used to calibrate the United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was run for 55 storm events, and the resulting total suspended solids (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. The R(2) and p-values of the regressions on ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMCs in the multiple regression models. Regression may not capture the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.

  1. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    Science.gov (United States)

    Drzewiecki, Wojciech

    2016-12-01

    In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas, both for the accuracy of imperviousness coverage evaluation at individual points in time and for the accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results showed that in the case of sub-pixel evaluation the most accurate prediction of change is not necessarily based on the most accurate individual assessments. When single methods are considered, the obtained results suggest the Cubist algorithm for Landsat-based mapping of imperviousness at single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal: it gave lower accuracies for individual assessments, but better prediction of change, owing to more strongly correlated errors of its individual predictions. Heterogeneous model ensembles performed at least as well as the best individual models for individual time-point assessments, and in the case of imperviousness change assessment the ensembles always outperformed single-model approaches. This means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.
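
The key observation in the abstract, that a model with worse per-date accuracy can give a better change estimate when its errors are correlated across dates, can be illustrated with toy numbers (all values below are hypothetical, not the study's results):

```python
# Two imperviousness estimates of the same pixel at two dates; the true
# coverages and the model errors are invented for illustration.
truth_t1, truth_t2 = 0.30, 0.40

# Model A: small but uncorrelated errors at the two dates.
a_t1, a_t2 = truth_t1 + 0.02, truth_t2 - 0.02
# Model B: larger but identical (fully correlated) errors.
b_t1, b_t2 = truth_t1 + 0.05, truth_t2 + 0.05

true_change = truth_t2 - truth_t1
err_a = abs((a_t2 - a_t1) - true_change)  # opposite-sign errors add up
err_b = abs((b_t2 - b_t1) - true_change)  # identical errors cancel exactly
print(err_a > err_b)  # → True: B is worse per date but better for change
```

This is the mechanism behind the Random Forest result: correlated per-date errors subtract out when the change (the difference of two estimates) is what matters.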

  2. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu; Pourahmadi, Mohsen; Maadooliat, Mehdi

    2014-01-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both

  3. Evaluation of a multiple linear regression model and SARIMA model in forecasting heat demand for district heating system

    International Nuclear Information System (INIS)

    Fang, Tingting; Lahdelma, Risto

    2016-01-01

    Highlights: • A social factor is considered for the linear regression models besides the weather file. • All coefficients of the linear regression models are optimized simultaneously. • SARIMA combined with linear regression is used to forecast the heat demand. • The accuracy of both the linear regression and time series models is evaluated. - Abstract: Forecasting heat demand is necessary for production and operation planning of district heating (DH) systems. In this study we first propose a simple regression model in which hourly outdoor temperature and wind speed forecast the heat demand. The weekly rhythm of heat consumption, as a social component, is added to the model and significantly improves the accuracy. The other type of model is the seasonal autoregressive integrated moving average (SARIMA) model with exogenous variables, which takes the weather factors as exogenous inputs and the historical heat consumption data as the dependent variable. One outstanding advantage of this model is that it pursues high accuracy for both long-term and short-term forecasts by considering both exogenous factors and the time series. The forecasting performance of both the linear regression models and the time series model is evaluated on real-life heat demand data for the city of Espoo in Finland by out-of-sample tests over the last 20 full weeks of the year. The results indicate that the proposed linear regression model (T168h), which uses a 168-h demand pattern with midweek holidays classified as Saturdays or Sundays, gives the highest accuracy and strong robustness among all the tested models on the tested forecasting horizon and corresponding data. Considering the parsimony of the input, the ease of use and the high accuracy, the proposed T168h model is the best in practice. The heat demand forecasting model can also be developed for individual buildings if automated meter reading customer measurements are available. This would allow forecasting the heat demand based on more accurate heat consumption
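
The social component of the T168h model, a 168-hour weekly rhythm with midweek holidays mapped onto weekend profiles, can be sketched as follows. The demand series is synthetic, and the actual model also regresses on outdoor temperature and wind speed:

```python
def weekly_profile(demand, holiday_hours=frozenset()):
    """Average demand per hour-of-week (0..167, week assumed to start at
    midnight Monday); hours flagged as midweek holidays are pooled with
    the corresponding Sunday hour (144..167) instead."""
    sums = [0.0] * 168
    counts = [0] * 168
    for t, d in enumerate(demand):
        how = t % 168  # hour of week
        if t in holiday_hours:
            how = 144 + t % 24  # same hour of day, but on Sunday's profile
        sums[how] += d
        counts[how] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Two synthetic weeks: weekday hours at 100 MW, weekend hours at 60 MW.
demand = [(60.0 if (t % 168) >= 120 else 100.0) for t in range(336)]
profile = weekly_profile(demand)
print(profile[0], profile[150])  # → 100.0 60.0 (weekday vs. weekend hour)
```

In the paper's full model this weekly profile enters the regression alongside the weather terms, with all coefficients optimized simultaneously.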

  4. Validation of regression models for nitrate concentrations in the upper groundwater in sandy soils

    International Nuclear Information System (INIS)

    Sonneveld, M.P.W.; Brus, D.J.; Roelsma, J.

    2010-01-01

    For Dutch sandy regions, linear regression models have been developed that predict nitrate concentrations in the upper groundwater on the basis of residual nitrate contents in the soil in autumn. The objective of our study was to validate these regression models for one particular sandy region dominated by dairy farming. No data from this area were used for calibrating the regression models. The model was validated by additional probability sampling. This sample was used to estimate errors in 1) the predicted areal fractions where the EU standard of 50 mg l⁻¹ is exceeded for farms with low N surpluses (ALT) and farms with higher N surpluses (REF); 2) the predicted cumulative frequency distributions of nitrate concentration for both groups of farms. Both the errors in the predicted areal fractions and the errors in the predicted cumulative frequency distributions indicate that the regression models are invalid for the sandy soils of this study area. - This study indicates that linear regression models that predict nitrate concentrations in the upper groundwater using residual soil N contents should be applied with care.

  5. Bivariate least squares linear regression: Towards a unified analytic formalism. I. Functional models

    Science.gov (United States)

    Caimmi, R.

    2011-08-01

    Concerning bivariate least squares linear regression, the classical approach pursued for functional models in earlier attempts (York, 1966, 1969) is reviewed using a new formalism in terms of deviation (matrix) traces which, for unweighted data, reduce to usual quantities leaving aside an unessential (but dimensional) multiplicative factor. Within the framework of classical error models, the dependent variable relates to the independent variable according to the usual additive model. The classes of linear models considered are regression lines in the general case of correlated errors in X and in Y for weighted data, and in the opposite limiting situations of (i) uncorrelated errors in X and in Y, and (ii) completely correlated errors in X and in Y. The special case of (C) generalized orthogonal regression is considered in detail together with well known subcases, namely: (Y) errors in X negligible (ideally null) with respect to errors in Y; (X) errors in Y negligible (ideally null) with respect to errors in X; (O) genuine orthogonal regression; (R) reduced major-axis regression. In the limit of unweighted data, the results determined for functional models are compared with their counterparts related to extreme structural models, i.e. the instrumental scatter is negligible (ideally null) with respect to the intrinsic scatter (Isobe et al., 1990; Feigelson and Babu, 1992). While regression line slope and intercept estimators for functional and structural models necessarily coincide, the contrary holds for related variance estimators even if the residuals obey a Gaussian distribution, with the exception of Y models. An example of astronomical application is considered, concerning the [O/H]-[Fe/H] empirical relations deduced from five samples related to different stars and/or different methods of oxygen abundance determination. For selected samples and assigned methods, different regression models yield consistent results within the errors (±σ) for both

  6. Modeling and prediction of flotation performance using support vector regression

    Directory of Open Access Journals (Sweden)

    Despotović Vladimir

    2017-01-01

    Full Text Available Continuous efforts have been made in recent years to improve the process of paper recycling, as it is of critical importance for saving wood, water and energy resources. Flotation deinking is considered to be one of the key methods for the separation of ink particles from the cellulose fibres. Attempts to model the flotation deinking process have often resulted in complex models that are difficult to implement and use. In this paper a model for the prediction of flotation performance based on Support Vector Regression (SVR) is presented. Representative data samples were created in the laboratory under a variety of practical control variables for the flotation deinking process, including different reagents, pH values and flotation residence times. A predictive model trained on these data samples was created, and the flotation performance was assessed, showing that Support Vector Regression is a promising method even when the dataset used for training the model is limited.

  7. Forecasting daily meteorological time series using ARIMA and regression models

    Science.gov (United States)

    Murat, Małgorzata; Malinowska, Iwona; Gos, Magdalena; Krzyszczak, Jaromir

    2018-04-01

    The daily air temperature and precipitation time series recorded between January 1, 1980 and December 31, 2010 at four European sites (Jokioinen, Dikopshof, Lleida and Lublin) from different climatic zones were modeled and forecasted. In our forecasting we used the Box-Jenkins and Holt-Winters methods, the seasonal autoregressive integrated moving-average (SARIMA) model, the autoregressive integrated moving-average model with external regressors in the form of Fourier terms, and time series regression including trend and seasonality components, with R software. It was demonstrated that the obtained models are able to capture the dynamics of the time series data and to produce sensible forecasts.

  8. Replica analysis of overfitting in regression models for time-to-event data

    Science.gov (United States)

    Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.

    2017-09-01

    Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in the literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.

  9. Cross-validation pitfalls when selecting and assessing regression and classification models.

    Science.gov (United States)

    Krstajic, Damjan; Buturovic, Ljubomir J; Leahy, David E; Thomas, Simon

    2014-03-29

    We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing which enables routine use of previously infeasible approaches. We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error.
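
The core point above, that a single V-fold split is itself a random choice that perturbs the estimated error, can be demonstrated with a tiny pure-Python repeated cross-validation of a trivial training-mean predictor (the data and the number of repeats are hypothetical):

```python
import random

def kfold_mse(data, k, seed):
    """Mean squared error of a constant (training-mean) predictor
    estimated from one V-fold split drawn with the given seed."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    total, n = 0.0, 0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        mean = sum(train) / len(train)
        total += sum((data[i] - mean) ** 2 for i in fold)
        n += len(fold)
    return total / n

data = [1.0, 2.0, 2.5, 3.0, 4.5, 5.0, 6.5, 7.0, 8.0, 9.5]
# Repeating the split with different seeds reveals the split-to-split
# variance of the CV estimate; averaging over repeats stabilises it.
scores = [kfold_mse(data, k=5, seed=s) for s in range(20)]
print(round(min(scores), 2), round(max(scores), 2),
      round(sum(scores) / len(scores), 2))
```

The spread between `min(scores)` and `max(scores)` is exactly the variance the paper argues must be accounted for when a single cross-validation run is used to pick or assess a model.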

  10. A LATENT CLASS POISSON REGRESSION-MODEL FOR HETEROGENEOUS COUNT DATA

    NARCIS (Netherlands)

    WEDEL, M; DESARBO, WS; BULT; RAMASWAMY

    1993-01-01

    In this paper an approach is developed that accommodates heterogeneity in Poisson regression models for count data. The model developed assumes that heterogeneity arises from a distribution of both the intercept and the coefficients of the explanatory variables. We assume that the mixing

  11. Continuous validation of ASTEC containment models and regression testing

    International Nuclear Information System (INIS)

    Nowack, Holger; Reinke, Nils; Sonnenkalb, Martin

    2014-01-01

    The focus of the ASTEC (Accident Source Term Evaluation Code) development at GRS is primarily on the containment module CPA (Containment Part of ASTEC), whose modelling is to a large extent based on the GRS containment code COCOSYS (COntainment COde SYStem). Validation is usually understood as the approval of the modelling capabilities by calculations of appropriate experiments done by external users different from the code developers. During the development process of ASTEC CPA, bugs and unintended side effects may occur, which leads to changes in the results of the initially conducted validation. Due to the involvement of a considerable number of developers in the coding of ASTEC modules, validation of the code alone, even if executed repeatedly, is not sufficient. Therefore, a regression testing procedure has been implemented in order to ensure that the initially obtained validation results remain valid in succeeding code versions. Within the regression testing procedure, calculations of experiments and plant sequences are performed with the same input deck but applying two different code versions. For every test-case the up-to-date code version is compared to the preceding one on the basis of physical parameters deemed to be characteristic for the test-case under consideration. In the case of post-calculations of experiments, a comparison to experimental data is also carried out. Three validation cases from the regression testing procedure are presented within this paper. The very good post-calculation of the HDR E11.1 experiment shows the high quality of the thermal-hydraulics modelling in ASTEC CPA. Aerosol behaviour is validated on the BMC VANAM M3 experiment, and the results also show very good agreement with experimental data. Finally, iodine behaviour is checked in the validation test-case of the THAI IOD-11 experiment. Within this test-case, the comparison of the ASTEC versions V2.0r1 and V2.0r2 shows how an error was detected by the regression testing.

  12. Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

    Science.gov (United States)

    McKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

    2005-01-01

    Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of the regression variables: 30-minute forward-averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield the best performance and avoid model discontinuity over day/night data boundaries.

  13. Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling.

    Science.gov (United States)

    Kawashima, Issaku; Kumano, Hiroaki

    2017-01-01

    Mind-wandering (MW), task-unrelated thought, has been examined by researchers in an increasing number of articles using models that predict whether subjects are in MW from numerous physiological variables. However, these models are not applicable in general situations, and they output only a binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought-sampling probe that inquired about the focus of attention. We calculated power and coherence values, prepared 35 patterns of variable combinations, and applied Support Vector machine Regression (SVR) to them. Finally, we chose four SVR models: two non-linear models and two linear models; two of the four models are composed of a limited number of electrodes to keep the models practical to use. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single-electrode EEG variables. Furthermore, in the limited-electrode condition, the non-linear SVR model showed significantly better precision than the linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with high-temporal-resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our finding that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.

  14. Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling

    Directory of Open Access Journals (Sweden)

    Issaku Kawashima

    2017-07-01

    Full Text Available Mind-wandering (MW), task-unrelated thought, has been examined by researchers in an increasing number of articles using models that predict whether subjects are in MW from numerous physiological variables. However, these models are not applicable in general situations, and they output only a binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought-sampling probe that inquired about the focus of attention. We calculated power and coherence values, prepared 35 patterns of variable combinations, and applied Support Vector machine Regression (SVR) to them. Finally, we chose four SVR models: two non-linear models and two linear models; two of the four models are composed of a limited number of electrodes to keep the models practical to use. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single-electrode EEG variables. Furthermore, in the limited-electrode condition, the non-linear SVR model showed significantly better precision than the linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with high-temporal-resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our finding that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.

  15. Reconstruction of missing daily streamflow data using dynamic regression models

    Science.gov (United States)

    Tencaliec, Patricia; Favre, Anne-Catherine; Prieur, Clémentine; Mathevet, Thibault

    2015-12-01

    River discharge is one of the most important quantities in hydrology. It provides fundamental records for water resources management and climate change monitoring. Even very short data gaps in this information can cause extremely different analysis outputs. Therefore, reconstructing missing data in incomplete data sets is an important step for the performance of environmental models, engineering, and research applications, and it presents a great challenge. The objective of this paper is to introduce an effective technique for reconstructing missing daily discharge data when one has access to only daily streamflow data. The proposed procedure uses a combination of regression and autoregressive integrated moving average (ARIMA) models called a dynamic regression model. This model uses the linear relationship between neighboring, correlated stations and then adjusts the residual term by fitting an ARIMA structure. Application of the model to eight daily streamflow series for the Durance river watershed showed that the model yields reliable estimates for the missing data in the time series. Simulation studies were also conducted to evaluate the performance of the procedure.
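
A stripped-down version of the dynamic regression idea is: regress the target station on a correlated neighbour, then model the residuals with an autoregressive term. The sketch below uses a plain AR(1) residual in place of the paper's full ARIMA structure, and both discharge series are synthetic:

```python
def fit_dynamic_regression(x, y):
    """Return (a, b, phi) for y_t ≈ a + b*x_t + e_t with e_t ≈ phi*e_{t-1}."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Lag-1 regression of the residuals on themselves (AR(1) coefficient).
    phi = sum(e[t] * e[t - 1] for t in range(1, n)) \
        / sum(v * v for v in e[:-1])
    return a, b, phi

def reconstruct(x_t, e_prev, a, b, phi):
    """Fill one missing y value from the neighbour's x and the last residual."""
    return a + b * x_t + phi * e_prev

# Synthetic neighbour (x) and target (y) discharges, m^3/s, with
# slowly varying residuals around y ≈ 2x + 1.
x = [10.0, 12.0, 11.0, 13.0, 14.0, 12.5, 15.0, 16.0]
y = [21.0, 25.1, 23.2, 27.2, 29.1, 26.0, 30.9, 32.8]
a, b, phi = fit_dynamic_regression(x, y)
print(round(reconstruct(15.5, 0.1, a, b, phi), 1))
```

The AR(1) correction `phi * e_prev` is what distinguishes this from an ordinary regression fill: it propagates the persistence of the residual series into the reconstructed value.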

  16. Nested (inverse) binomial sums and new iterated integrals for massive Feynman diagrams

    International Nuclear Information System (INIS)

    Ablinger, Jakob; Schneider, Carsten; Bluemlein, Johannes; Raab, Clemens G.

    2014-07-01

    Nested sums containing binomial coefficients occur in the computation of massive operator matrix elements. Their associated iterated integrals lead to alphabets including radicals, for which we determined a suitable basis. We discuss algorithms for converting between sum and integral representations, mainly relying on the Mellin transform. To aid the conversion we worked out dedicated rewrite rules, from which some general patterns emerging in the process can also be obtained.
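
    A classical, much simpler example of the kind of (inverse) binomial sum treated in this line of work is the inverse central binomial sum, which has the closed form π²/18. The sums in the paper are far more general (nested, with harmonic-sum insertions), but a quick numerical check of this special case illustrates the rapid 4⁻ⁿ convergence:

```python
from math import comb, pi

# Inverse central binomial sum: sum_{n>=1} 1/(n^2 * C(2n, n)) = pi^2 / 18
s = sum(1 / (n**2 * comb(2 * n, n)) for n in range(1, 60))
print(s, pi**2 / 18)
```

    Sixty terms already agree with the closed form to machine precision, because the summand decays like 4⁻ⁿ.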

  17. Nested (inverse) binomial sums and new iterated integrals for massive Feynman diagrams

    Energy Technology Data Exchange (ETDEWEB)

    Ablinger, Jakob; Schneider, Carsten [Johannes Kepler Univ., Linz (Austria). Research Inst. for Symbolic Computation (RISC); Bluemlein, Johannes; Raab, Clemens G. [Deutsches Elektronen-Synchrotron (DESY), Zeuthen (Germany)

    2014-07-15

    Nested sums containing binomial coefficients occur in the computation of massive operator matrix elements. Their associated iterated integrals lead to alphabets including radicals, for which we determined a suitable basis. We discuss algorithms for converting between sum and integral representations, mainly relying on the Mellin transform. To aid the conversion we worked out dedicated rewrite rules, from which some general patterns emerging in the process can also be obtained.

  18. Detection of Outliers in Regression Model for Medical Data

    Directory of Open Access Journals (Sweden)

    Stephen Raj S

    2017-07-01

    Full Text Available In regression analysis, an outlier is an observation whose residual is large in magnitude compared with the other observations in the data set. The detection of outliers and influential points is an important step of regression analysis. Outlier detection methods have been used to detect and remove anomalous values from data. In this paper, we detect the presence of outliers in simple linear regression models for a medical data set. Chatterjee and Hadi noted that ordinary residuals are not appropriate for diagnostic purposes and that a transformed version of them is preferable. First, we investigate the presence of outliers based on existing procedures using residuals and standardized residuals. Next, we use a new approach based on standardized scores, which detects outliers without the use of predicted values. The performance of the new approach was verified with real-life data.
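
    The standard transformed residuals the abstract alludes to are the internally studentized residuals, which divide each raw residual by its own standard error. A minimal sketch on synthetic data (the cutoff of 3 and the injected outlier are illustrative choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 1.5 * x + rng.normal(0, 1, 50)
y[20] += 8.0                      # inject one gross outlier

# Fit simple linear regression.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Internally studentized residuals: r_i = e_i / (s * sqrt(1 - h_ii))
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
s2 = resid @ resid / (len(x) - 2)
r = resid / np.sqrt(s2 * (1 - h))

outliers = np.where(np.abs(r) > 3)[0]
print("flagged indices:", outliers)
```

    The leverage term 1 − hᵢᵢ matters because raw residuals at high-leverage points are artificially small, which is exactly why ordinary residuals are poor diagnostics.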

  19. Estimasi Model Seemingly Unrelated Regression (SUR dengan Metode Generalized Least Square (GLS

    Directory of Open Access Journals (Sweden)

    Ade Widyaningsih

    2015-04-01

    Full Text Available Regression analysis is a statistical tool used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the others. A method that can be used to obtain good estimates in regression analysis is the ordinary least squares (OLS) method. Least squares estimates the parameters of one or more regression equations, but it does not account for relationships among the errors of the different equations. One way to overcome this problem is the Seemingly Unrelated Regression (SUR) model, in which the parameters are estimated using Generalized Least Squares (GLS). In this study, the author applies the SUR model with the GLS method to world gasoline demand data. The author finds that SUR using GLS is better than OLS because SUR produces smaller errors than OLS.
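
    A feasible-GLS estimation of a two-equation SUR system can be sketched directly in numpy: equation-by-equation OLS gives residuals, the residuals estimate the cross-equation error covariance, and GLS with that covariance re-estimates the stacked system. The data and coefficients below are synthetic, not the paper's gasoline demand data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Two regression equations with contemporaneously correlated errors.
x1 = rng.normal(size=(n, 1)); x2 = rng.normal(size=(n, 1))
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
errs = rng.multivariate_normal([0, 0], Sigma, size=n)
y1 = 1.0 + 2.0 * x1[:, 0] + errs[:, 0]
y2 = -0.5 + 3.0 * x2[:, 0] + errs[:, 1]

# Stack the system: y = X beta + u, with X block-diagonal.
X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([np.ones(n), x2])
X = np.block([[X1, np.zeros_like(X2)], [np.zeros_like(X1), X2]])
y = np.concatenate([y1, y2])

# Step 1: equation-by-equation OLS to estimate the error covariance S.
b1, *_ = np.linalg.lstsq(X1, y1, rcond=None)
b2, *_ = np.linalg.lstsq(X2, y2, rcond=None)
E = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
S = E.T @ E / n

# Step 2: feasible GLS with Omega = S kron I_n.
Omega_inv = np.kron(np.linalg.inv(S), np.eye(n))
beta = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
print(beta)   # close to [1, 2, -0.5, 3]
```

    The efficiency gain of SUR over OLS comes entirely from the off-diagonal 0.8 in the error covariance; with uncorrelated errors the two estimators coincide.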

  20. Estimasi Model Seemingly Unrelated Regression (SUR dengan Metode Generalized Least Square (GLS

    Directory of Open Access Journals (Sweden)

    Ade Widyaningsih

    2014-06-01

    Full Text Available Regression analysis is a statistical tool used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the others. A method that can be used to obtain good estimates in regression analysis is the ordinary least squares (OLS) method. Least squares estimates the parameters of one or more regression equations, but it does not account for relationships among the errors of the different equations. One way to overcome this problem is the Seemingly Unrelated Regression (SUR) model, in which the parameters are estimated using Generalized Least Squares (GLS). In this study, the author applies the SUR model with the GLS method to world gasoline demand data. The author finds that SUR using GLS is better than OLS because SUR produces smaller errors than OLS.

  1. Factors related to the use of antenatal care services in Ethiopia: Application of the zero-inflated negative binomial model.

    Science.gov (United States)

    Assefa, Enyew; Tadesse, Mekonnen

    2017-08-01

    The major causes of poor health in developing countries are inadequate access to, and under-use of, modern health care services. The objective of this study was to identify and examine factors related to the use of antenatal care services using the 2011 Ethiopia Demographic and Health Survey data. The number of antenatal care visits during the last pregnancy by mothers aged 15 to 49 years (n = 7,737) was analyzed. More than 55% of the mothers did not use antenatal care (ANC) services, while more than 22% of the women used antenatal care services less than four times. More than half of the women (52%) who had access to health services had at least four antenatal care visits. The zero-inflated negative binomial model was found to be more appropriate for analyzing the data. Place of residence, age of mothers, woman's educational level, employment status, mass media exposure, religion, and access to health services were significantly associated with the use of antenatal care services. Accordingly, there should be progress toward a health-education program that enables more women to utilize ANC services, with the program targeting women in rural areas, uneducated women, and mothers of higher birth order through appropriate media.
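
    The zero-inflated negative binomial (ZINB) model is appropriate here because over half of the mothers report zero visits, which is far more zeros than a plain count model implies. A small simulation (with invented parameters, not the survey's estimates) shows the mixture mechanism: a structural-zero component plus a negative binomial count component.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 10_000

# Zero-inflated negative binomial: with probability pi_zero the count is a
# structural zero; otherwise it is drawn from NB(r, p).
pi_zero, r, p = 0.45, 2.0, 0.4          # NB mean = r(1-p)/p = 3 visits
structural = rng.random(n) < pi_zero
counts = np.where(structural, 0,
                  stats.nbinom.rvs(r, p, size=n, random_state=rng))

frac_zero = (counts == 0).mean()
expected = pi_zero + (1 - pi_zero) * stats.nbinom.pmf(0, r, p)
print(frac_zero, expected)
```

    The observed zero fraction matches pi_zero + (1 − pi_zero)·P_NB(0), the quantity a ZINB likelihood fits directly; a Poisson with the same mean would put far less mass at zero.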

  2. On pseudo-values for regression analysis in competing risks models

    DEFF Research Database (Denmark)

    Graw, F; Gerds, Thomas Alexander; Schumacher, M

    2009-01-01

    For regression on state and transition probabilities in multi-state models Andersen et al. (Biometrika 90:15-27, 2003) propose a technique based on jackknife pseudo-values. In this article we analyze the pseudo-values suggested for competing risks models and prove some conjectures regarding their...
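
    The jackknife pseudo-value construction behind this approach is simple to state: for an estimator θ̂ of a probability, the i-th pseudo-value is θᵢ = n·θ̂ − (n−1)·θ̂₍₋ᵢ₎, and these pseudo-values can then serve as regression responses. A minimal sketch for a tail probability (the functional and data are illustrative, not the competing-risks setting of the paper):

```python
import numpy as np

rng = np.random.default_rng(14)
x = rng.exponential(2.0, 200)

# Jackknife pseudo-values for the functional theta = P(X > 3):
# theta_i = n * theta_hat - (n - 1) * theta_hat_{-i}
n = x.size
theta_hat = (x > 3).mean()
loo = ((x > 3).sum() - (x > 3)) / (n - 1)      # leave-one-out estimates
pseudo = n * theta_hat - (n - 1) * loo

# The pseudo-values average back to the estimator exactly, and each one can
# be regressed on covariates as if it were an individual response.
print(theta_hat, pseudo.mean())
```

    In the competing-risks application, θ̂ is replaced by the Aalen-Johansen estimate of a cumulative incidence at a fixed time, and the pseudo-values enter a generalized estimating equation.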

  3. Modeling and prediction of Turkey's electricity consumption using Support Vector Regression

    International Nuclear Information System (INIS)

    Kavaklioglu, Kadir

    2011-01-01

    Support Vector Regression (SVR) methodology is used to model and predict Turkey's electricity consumption. Among various SVR formalisms, ε-SVR method was used since the training pattern set was relatively small. Electricity consumption is modeled as a function of socio-economic indicators such as population, Gross National Product, imports and exports. In order to facilitate future predictions of electricity consumption, a separate SVR model was created for each of the input variables using their current and past values; and these models were combined to yield consumption prediction values. A grid search for the model parameters was performed to find the best ε-SVR model for each variable based on Root Mean Square Error. Electricity consumption of Turkey is predicted until 2026 using data from 1975 to 2006. The results show that electricity consumption can be modeled using Support Vector Regression and the models can be used to predict future electricity consumption. (author)
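
    The modeling recipe in this record (ε-SVR plus a grid search over hyperparameters scored by root mean square error) can be sketched with scikit-learn. The data below are synthetic stand-ins for the socio-economic indicators; the grid values are illustrative, not the paper's:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)

# Synthetic "consumption vs indicator" data (stand-in for population, GNP,
# imports and exports in the paper).
X = rng.uniform(0, 10, size=(120, 1))
y = 5 + 2 * X[:, 0] + 0.3 * X[:, 0] ** 2 + rng.normal(0, 1, 120)

# Grid search over epsilon-SVR hyperparameters, scored by RMSE.
grid = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.1),
    {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
    scoring="neg_root_mean_squared_error", cv=5,
)
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)
```

    ε-SVR is a sensible choice for small training sets, as the abstract notes, because the ε-insensitive loss and the C penalty control complexity explicitly rather than relying on large-sample behaviour.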

  4. Cluster regression model and level fluctuation features of Van Lake, Turkey

    Directory of Open Access Journals (Sweden)

    Z. Şen

    1999-02-01

    Full Text Available Lake water levels change under the influences of natural and/or anthropogenic environmental conditions. Among these influences are the climate change, greenhouse effects and ozone layer depletions which are reflected in the hydrological cycle features over the lake drainage basins. Lake levels are among the most significant hydrological variables that are influenced by different atmospheric and environmental conditions. Consequently, lake level time series in many parts of the world include nonstationarity components such as shifts in the mean value, apparent or hidden periodicities. On the other hand, many lake level modeling techniques have a stationarity assumption. The main purpose of this work is to develop a cluster regression model for dealing with nonstationarity especially in the form of shifting means. The basis of this model is the combination of transition probability and classical regression technique. Both parts of the model are applied to monthly level fluctuations of Lake Van in eastern Turkey. It is observed that the cluster regression procedure does preserve the statistical properties and the transitional probabilities that are indistinguishable from the original data.

    Key words. Hydrology (hydrologic budget; stochastic processes) · Meteorology and atmospheric dynamics (ocean-atmosphere interactions)
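
    The two ingredients named in the abstract, transition probabilities between level clusters and a classical regression within each cluster, can be sketched on synthetic monthly levels with a shifted mean. The two-state clustering and AR-style regression below are illustrative simplifications of the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic monthly lake levels with a shift in the mean (nonstationarity).
n = 480
levels = np.concatenate([rng.normal(10, 1, n // 2), rng.normal(13, 1, n // 2)])
levels = 0.6 * np.roll(levels, 1) + 0.4 * levels   # add persistence
levels[0] = 10.0

# Cluster the levels into low/high states; estimate transition probabilities.
states = (levels > np.median(levels)).astype(int)
T = np.zeros((2, 2))
for a, b in zip(states[:-1], states[1:]):
    T[a, b] += 1
T /= T.sum(axis=1, keepdims=True)

# Classical regression of level on previous level, fitted within each cluster.
for s in (0, 1):
    idx = np.where(states[:-1] == s)[0]
    slope, intercept = np.polyfit(levels[idx], levels[idx + 1], 1)
    print(f"cluster {s}: level[t+1] = {intercept:.2f} + {slope:.2f} * level[t]")
print("transition matrix:\n", T)
```

    The large diagonal of the transition matrix captures the shifting-mean nonstationarity: once the lake is in the high regime it tends to stay there, which a single stationary regression would miss.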

  5. Cluster regression model and level fluctuation features of Van Lake, Turkey

    Directory of Open Access Journals (Sweden)

    Z. Şen

    Full Text Available Lake water levels change under the influences of natural and/or anthropogenic environmental conditions. Among these influences are the climate change, greenhouse effects and ozone layer depletions which are reflected in the hydrological cycle features over the lake drainage basins. Lake levels are among the most significant hydrological variables that are influenced by different atmospheric and environmental conditions. Consequently, lake level time series in many parts of the world include nonstationarity components such as shifts in the mean value, apparent or hidden periodicities. On the other hand, many lake level modeling techniques have a stationarity assumption. The main purpose of this work is to develop a cluster regression model for dealing with nonstationarity especially in the form of shifting means. The basis of this model is the combination of transition probability and classical regression technique. Both parts of the model are applied to monthly level fluctuations of Lake Van in eastern Turkey. It is observed that the cluster regression procedure does preserve the statistical properties and the transitional probabilities that are indistinguishable from the original data.

    Key words. Hydrology (hydrologic budget; stochastic processes) · Meteorology and atmospheric dynamics (ocean-atmosphere interactions)

  6. Identifiability in N-mixture models: a large-scale screening test with bird data.

    Science.gov (United States)

    Kéry, Marc

    2018-02-01

    Binomial N-mixture models have proven very useful in ecology, conservation, and monitoring: they allow estimation and modeling of abundance separately from detection probability using simple counts. Recently, doubts about parameter identifiability have been voiced. I conducted a large-scale screening test with 137 bird data sets from 2,037 sites. I found virtually no identifiability problems for Poisson and zero-inflated Poisson (ZIP) binomial N-mixture models, but negative-binomial (NB) models had problems in 25% of all data sets. The corresponding multinomial N-mixture models had no problems. Parameter estimates under Poisson and ZIP binomial and multinomial N-mixture models were extremely similar. Identifiability problems became a little more frequent with smaller sample sizes (267 and 50 sites), but were unaffected by whether the models did or did not include covariates. Hence, binomial N-mixture model parameters with Poisson and ZIP mixtures typically appeared identifiable. In contrast, NB mixtures were often unidentifiable, which is worrying since these were often selected by Akaike's information criterion. Identifiability of binomial N-mixture models should always be checked. If problems are found, simpler models, integrated models that combine different observation models, or the use of external information via informative priors or penalized likelihoods may help. © 2017 by the Ecological Society of America.
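
    The Poisson binomial N-mixture likelihood can be written down and maximized directly: latent abundance Nᵢ is Poisson(λ), repeated counts yᵢₜ are Binomial(Nᵢ, p), and the marginal likelihood sums over the unobserved Nᵢ up to a truncation point. The simulation parameters below are invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson, binom

rng = np.random.default_rng(11)
R, T, lam_true, p_true = 200, 3, 4.0, 0.5

# Simulate: latent abundance N_i ~ Poisson(lam); counts y_it ~ Binomial(N_i, p).
N = rng.poisson(lam_true, R)
y = rng.binomial(N[:, None], p_true, size=(R, T))

K = 50  # truncation point for the infinite sum over N

def nll(theta):
    lam, p = np.exp(theta[0]), 1 / (1 + np.exp(-theta[1]))
    ll = 0.0
    for i in range(R):
        Ns = np.arange(y[i].max(), K + 1)
        # marginal likelihood: sum_N Pois(N | lam) * prod_t Bin(y_it | N, p)
        terms = poisson.pmf(Ns, lam)
        for t in range(T):
            terms = terms * binom.pmf(y[i, t], Ns, p)
        ll += np.log(terms.sum())
    return -ll

res = minimize(nll, x0=[0.0, 0.0], method="Nelder-Mead")
lam_hat, p_hat = np.exp(res.x[0]), 1 / (1 + np.exp(-res.x[1]))
print(lam_hat, p_hat)
```

    With a Poisson mixture the two parameters separate cleanly, which is consistent with the screening result; the identifiability troubles arise when the extra NB dispersion parameter competes with p for the same information in the counts.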

  7. Improved model of the retardance in citric acid coated ferrofluids using stepwise regression

    Science.gov (United States)

    Lin, J. F.; Qiu, X. R.

    2017-06-01

    Citric acid (CA) coated Fe3O4 ferrofluids (FFs) have been developed for biomedical applications. The magneto-optical retardance of CA-coated FFs was measured with a Stokes polarimeter. Optimization and multiple regression of the retardance had previously been carried out with the Taguchi method and Microsoft Excel, and the F value of the regression model was large enough; however, the model built in Excel was not systematic. Instead, we adopted stepwise regression to model the retardance of CA-coated FFs. From the results of stepwise regression in MATLAB, the developed model had high predictive ability, with an F value of 2.55897e+7 and a correlation coefficient of one. The average absolute error of the predicted retardances relative to the measured retardances was just 0.0044%. Using the genetic algorithm (GA) in MATLAB, the optimal parameter combination was determined as [4.709 0.12 39.998 70.006], corresponding to the pH of the suspension, the molar ratio of CA to Fe3O4, the CA volume, and the coating temperature. The maximum retardance was found to be 31.712°, close to that obtained by the evolutionary solver in Excel, with a relative error of -0.013%. In summary, stepwise regression was successfully used to model the retardance of CA-coated FFs, and the maximum global retardance was determined using the GA.
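
    Forward stepwise regression of the kind used here adds one predictor at a time, keeping it only if its partial F statistic clears a threshold. A self-contained sketch on synthetic data (the six predictors, coefficients, and F cutoff are invented; they do not correspond to the ferrofluid factors):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 6
X = rng.normal(size=(n, k))
# Only predictors 0 and 3 actually matter.
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 0.5, n)

def rss(cols):
    """Residual sum of squares of OLS on an intercept plus the given columns."""
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ b
    return e @ e

# Forward stepwise selection using partial F-tests.
selected, remaining = [], list(range(k))
while remaining:
    best = min(remaining, key=lambda c: rss(selected + [c]))
    rss_old, rss_new = rss(selected), rss(selected + [best])
    df = n - len(selected) - 2
    F = (rss_old - rss_new) / (rss_new / df)
    if F < 10.0:         # entry threshold (illustrative, deliberately strict)
        break
    selected.append(best)
    remaining.remove(best)

print("selected predictors:", sorted(selected))
```

    The procedure stops as soon as the best remaining candidate fails the F test, so noise predictors are excluded while both true predictors enter.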

  8. Accounting for spatial effects in land use regression for urban air pollution modeling.

    Science.gov (United States)

    Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G

    2015-01-01

    In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R(2) values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
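
    The GWR part of the approach fits a separate weighted least squares regression at every location, with weights from a distance kernel, so coefficients can vary over space. A minimal sketch with one invented predictor (a traffic proxy) and a Gaussian kernel; the bandwidth and the west-to-east drift in the true slope are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300

# Monitoring sites with a spatially varying pollution-traffic relationship.
coords = rng.uniform(0, 10, size=(n, 2))
traffic = rng.uniform(0, 1, n)
beta_local = 1.0 + 0.3 * coords[:, 0]        # slope drifts west-to-east
no2 = 20 + beta_local * traffic * 10 + rng.normal(0, 1, n)

def gwr_at(pt, bandwidth=2.0):
    """Weighted least squares at one location with a Gaussian kernel."""
    d2 = ((coords - pt) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * bandwidth**2))
    A = np.column_stack([np.ones(n), traffic])
    W = np.diag(w)
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ no2)

west = gwr_at(np.array([1.0, 5.0]))
east = gwr_at(np.array([9.0, 5.0]))
print(west[1], east[1])   # local slopes differ across space
```

    Recovering different local slopes at the two ends of the domain is exactly the non-stationarity that a single global LUR coefficient would average away.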

  9. Combinatorial interpretations of binomial coefficient analogues related to Lucas sequences

    OpenAIRE

    Sagan, Bruce; Savage, Carla

    2009-01-01

    Let s and t be variables. Define polynomials {n} in s, t by {0}=0, {1}=1, and {n}=s{n-1}+t{n-2} for n >= 2. If s, t are integers then the corresponding sequence of integers is called a Lucas sequence. Define an analogue of the binomial coefficients by C{n,k}={n}!/({k}!{n-k}!) where {n}!={1}{2}...{n}. It is easy to see that C{n,k} is a polynomial in s and t. The purpose of this note is to give two combinatorial interpretations for this polynomial in terms of statistics on integer partitions in...
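
    The construction in this abstract is concrete enough to compute. For integer s, t the analogues C{n,k} are integers; with s = t = 1 the sequence {n} is the Fibonacci numbers and C{n,k} are the classical fibonomial coefficients. A small sketch (pure Python, exact integer arithmetic):

```python
def lucas_seq(n, s=1, t=1):
    """{n} at integers s, t: {0}=0, {1}=1, {n}=s*{n-1}+t*{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, s * b + t * a
    return a

def lucas_binomial(n, k, s=1, t=1):
    """C{n,k} = {n}!/({k}!{n-k}!) with {m}! = {1}{2}...{m}."""
    num = den = 1
    for i in range(1, k + 1):
        num *= lucas_seq(n - k + i, s, t)
        den *= lucas_seq(i, s, t)
    return num // den  # exact: the quotient is an integer for integer s, t

# s = t = 1 gives the Fibonacci numbers and the "fibonomial" coefficients.
print([lucas_binomial(6, k) for k in range(7)])
```

    The row for n = 6 is 1, 8, 40, 60, 40, 8, 1, symmetric like an ordinary binomial row; the note's combinatorial interpretations explain why these quotients are always polynomials in s and t.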

  10. “Micro” Phraseology in Action: A Look at Fixed Binomials

    Directory of Open Access Journals (Sweden)

    Dušan Gabrovšek

    2011-05-01

    Full Text Available Multiword items in English are a motley crew: they are not only numerous but also structurally, semantically, and functionally diverse. The paper offers a fresh look at fixed binomials, an intriguing and unexpectedly heterogeneous phraseological type prototypically consisting of two lexical components joined by the coordinating conjunction and – less commonly but, or, (n)either/(n)or – as e.g. in body and soul, slowly but surely, sooner or later, neither fish nor fowl. In particular, their idiomaticity and lexicographical significance are highlighted, while the cross-linguistic perspective is only outlined.

  11. Bayesian inference for disease prevalence using negative binomial group testing

    Science.gov (United States)

    Pritchard, Nicholas A.; Tebbs, Joshua M.

    2011-01-01

    Group testing, also known as pooled testing, and inverse sampling are both widely used methods of data collection when the goal is to estimate a small proportion. Taking a Bayesian approach, we consider the new problem of estimating disease prevalence from group testing when inverse (negative binomial) sampling is used. Using different distributions to incorporate prior knowledge of disease incidence and different loss functions, we derive closed form expressions for posterior distributions and resulting point and credible interval estimators. We then evaluate our new estimators, on Bayesian and classical grounds, and apply our methods to a West Nile Virus data set. PMID:21259308
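
    Under inverse (negative binomial) sampling of pools, testing continues until r positive pools are seen; a pool of size k tests positive with probability θ = 1 − (1 − p)ᵏ, so the number of negative pools is negative binomial in θ. A grid-posterior sketch under a uniform prior (the prior, pool size, and stopping rule below are illustrative choices, not the paper's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# Inverse sampling: test pools of size k until r positive pools are observed;
# record the number m of negative pools seen along the way.
p_true, k, r = 0.05, 10, 5
theta_true = 1 - (1 - p_true) ** k       # P(a pool tests positive)
m = stats.nbinom.rvs(r, theta_true, random_state=rng)

# Grid posterior for the prevalence p under a uniform prior on (0, 0.5).
p = np.linspace(1e-4, 0.5, 2000)
theta = 1 - (1 - p) ** k
log_lik = r * np.log(theta) + m * np.log(1 - theta)
dp = p[1] - p[0]
post = np.exp(log_lik - log_lik.max())
post /= post.sum() * dp                  # normalize the density on the grid

post_mean = (p * post).sum() * dp
print("posterior mean prevalence:", post_mean)
```

    The paper derives closed-form posteriors for particular priors and loss functions; the grid version above is a generic numerical stand-in that works for any prior on p.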

  12. Regression analysis understanding and building business and economic models using Excel

    CERN Document Server

    Wilson, J Holton

    2012-01-01

    The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe

  13. Extending the Binomial Checkpointing Technique for Resilience

    Energy Technology Data Exchange (ETDEWEB)

    Walther, Andrea; Narayanan, Sri Hari Krishna

    2016-10-10

    In terms of computing time, adjoint methods offer a very attractive alternative to compute gradient information, required, e.g., for optimization purposes. However, together with this very favorable temporal complexity result comes a memory requirement that is in essence proportional to the operation count of the underlying function, e.g., if algorithmic differentiation is used to provide the adjoints. For this reason, checkpointing approaches in many variants have become popular. This paper analyzes an extension of the so-called binomial approach to cover also possible failures of the computing systems. Such a measure of precaution is of special interest for massive parallel simulations and adjoint calculations where the mean time between failure of the large scale computing system is smaller than the time needed to complete the calculation of the adjoint information. We describe the extensions of standard checkpointing approaches required for such resilience, provide a corresponding implementation and discuss numerical results.
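
    The "binomial" in binomial checkpointing refers to the classical capacity result of Griewank and Walther's revolve schedule: with s checkpoints and at most t forward sweeps, the longest reversible time-step chain has length C(s + t, s). A one-function sketch of that combinatorial bound (the resilience extension in the paper builds on top of this):

```python
from math import comb

def max_steps(snapshots, sweeps):
    """Maximum number of time steps whose adjoint can be computed with
    `snapshots` checkpoints and at most `sweeps` forward traversals:
    eta(s, t) = C(s + t, s), the binomial checkpointing bound."""
    return comb(snapshots + sweeps, snapshots)

# A few snapshots go a long way: growth in t is binomial, not linear.
for s in (2, 5, 10):
    print(s, [max_steps(s, t) for t in (1, 2, 3, 5)])
```

    For example, 10 checkpoints and 5 sweeps cover C(15, 10) = 3003 steps, which is why checkpointing turns an intractable memory bill into a modest recomputation overhead.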

  14. The prediction of intelligence in preschool children using alternative models to regression.

    Science.gov (United States)

    Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E

    2011-12-01

    Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children are used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than does the more traditional regression approach. Implications of these results are discussed.
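
    The advantage of regression trees over linear regression appears whenever the outcome depends on predictor interactions that a linear model cannot represent. A small synthetic comparison (the interaction-driven "IQ-like" outcome below is invented purely to exhibit the effect, not modeled on the study's data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(6)
n = 1000

# Predictors with a strongly non-linear, interaction-driven outcome.
X = rng.uniform(-2, 2, size=(n, 3))
y = np.where(X[:, 0] * X[:, 1] > 0, 110.0, 90.0) + rng.normal(0, 3, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

lin = LinearRegression().fit(X_tr, y_tr)
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, y_tr)

mse_lin = mean_squared_error(y_te, lin.predict(X_te))
mse_tree = mean_squared_error(y_te, tree.predict(X_te))
print(mse_lin, mse_tree)
```

    The tree can split on the sign pattern of the two interacting predictors, while the linear model is blind to it; when relationships are genuinely additive and linear, the comparison reverses, which is why such alternatives complement rather than replace regression.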

  15. A primer for biomedical scientists on how to execute model II linear regression analysis.

    Science.gov (United States)

    Ludbrook, John

    2012-04-01

    1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
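
    The ordinary least products (OLP) point estimates this primer discusses can be computed with a hand calculator or a few lines of code: the slope is sign(r)·sd(y)/sd(x) and the intercept follows from the means (the 95% CIs are the part that needs bootstrapping or dedicated software, as the author warns). A sketch on synthetic Model II data, where both variables carry measurement error:

```python
import numpy as np

rng = np.random.default_rng(8)

# Model II setting: both variables are measured with error.
true_x = rng.uniform(0, 10, 100)
x = true_x + rng.normal(0, 0.5, 100)
y = 2.0 + 1.2 * true_x + rng.normal(0, 0.5, 100)

# Ordinary least products (reduced major axis) regression:
# slope = sign(r) * sd(y) / sd(x); intercept from the means.
r = np.corrcoef(x, y)[0, 1]
slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
intercept = y.mean() - slope * x.mean()
print(slope, intercept)
```

    Unlike ordinary least squares of y on x, the OLP slope treats the two variables symmetrically, which is the point of Model II regression when x is not fixed by the experimenter.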

  16. Applied logistic regression

    CERN Document Server

    Hosmer, David W; Sturdivant, Rodney X

    2013-01-01

     A new edition of the definitive guide to logistic regression modeling for health science and other applications This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-

  17. Evaluation of Logistic Regression and Multivariate Adaptive Regression Spline Models for Groundwater Potential Mapping Using R and GIS

    Directory of Open Access Journals (Sweden)

    Soyoung Park

    2017-07-01

    Full Text Available This study mapped and analyzed groundwater potential using two different models, logistic regression (LR) and multivariate adaptive regression splines (MARS), and compared the results. A spatial database was constructed for groundwater well data and groundwater influence factors. Groundwater well data with a high potential yield of ≥70 m3/d were extracted, and 859 locations (70%) were used for model training, whereas the other 365 locations (30%) were used for model validation. We analyzed 16 groundwater influence factors including altitude, slope degree, slope aspect, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport index, distance from drainage, drainage density, lithology, distance from fault, fault density, distance from lineament, lineament density, and land cover. Groundwater potential maps (GPMs) were constructed using LR and MARS models and tested using a receiver operating characteristics curve. Based on this analysis, the area under the curve (AUC) for the success rate curve of GPMs created using the MARS and LR models was 0.867 and 0.838, and the AUC for the prediction rate curve was 0.836 and 0.801, respectively. This implies that the MARS model is useful and effective for groundwater potential analysis in the study area.
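
    The AUC values used to compare the two models here can be computed directly from ranks via the Mann-Whitney identity, without tracing the ROC curve point by point. A self-contained sketch (the labels and scores are synthetic, not the study's well data):

```python
import numpy as np

def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(10)
labels = rng.integers(0, 2, 500)
# A useful score: higher on average for high-yield wells (label 1).
scores = labels + rng.normal(0, 1.0, 500)
print(auc(labels, scores))
```

    The AUC is exactly the probability that a randomly chosen positive site outranks a randomly chosen negative one, which is what makes 0.867 vs 0.838 a directly interpretable gap between MARS and LR.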

  18. Sample size calculation to externally validate scoring systems based on logistic regression models.

    Directory of Open Access Journals (Sweden)

    Antonio Palazón-Bru

    Full Text Available A sample size containing at least 100 events and 100 non-events has been suggested to validate a predictive model, regardless of the model being validated, even though certain factors (discrimination, parameterization and incidence) can influence the calibration of the predictive model. Scoring systems based on binary logistic regression models are a specific type of predictive model. The aim of this study was to develop an algorithm to determine the sample size for validating a scoring system based on a binary logistic regression model, and to apply it to a case study. The algorithm was based on bootstrap samples in which the area under the ROC curve, the observed event probabilities through smooth curves, and a measure of the lack of calibration (the estimated calibration index) were calculated. To illustrate its use for interested researchers, the algorithm was applied to a scoring system, based on a binary logistic regression model, for determining mortality in intensive care units. In the case study provided, the algorithm obtained a sample size with 69 events, which is lower than the value suggested in the literature. An algorithm is thus provided for finding the appropriate sample size to validate scoring systems based on binary logistic regression models, and it could be applied in other similar cases.
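
    The core of such a bootstrap-based sample-size algorithm is easy to illustrate for the discrimination component: resample the candidate validation set, recompute the AUC, and watch the confidence interval narrow as the number of events grows. This sketch covers only the AUC part, not the calibration-index part of the paper's algorithm, and the event rates and sample sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(12)

def auc(labels, scores):
    """Rank-based (Mann-Whitney) area under the ROC curve."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n1, n0 = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

widths = {}
for n_events in (30, 120):
    # Hypothetical validation sample: 1 event per 4 non-events, scores from
    # a model with moderate discrimination.
    lab = np.r_[np.ones(n_events, int), np.zeros(4 * n_events, int)]
    sc = lab + rng.normal(0, 1, lab.size)
    boots = []
    for _ in range(500):
        idx = rng.integers(0, lab.size, lab.size)
        if 0 < lab[idx].sum() < lab.size:   # need both classes present
            boots.append(auc(lab[idx], sc[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5])
    widths[n_events] = hi - lo
print(widths)
```

    A full algorithm would iterate over candidate event counts until both the AUC interval and the calibration measure meet pre-specified precision targets, which is how a requirement below the blanket "100 events" rule can emerge.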

  19. Detection of Cutting Tool Wear using Statistical Analysis and Regression Model

    Science.gov (United States)

    Ghani, Jaharah A.; Rizal, Muhammad; Nuawi, Mohd Zaki; Haron, Che Hassan Che; Ramli, Rizauddin

    2010-10-01

    This study presents a new method for detecting cutting tool wear based on measured cutting force signals. A statistical method, the Integrated Kurtosis-based Algorithm for Z-Filter technique (I-kaz), was used to develop a regression model and a 3D graphical presentation of the I-kaz 3D coefficient during the machining process. The machining tests were carried out on a Colchester Master Tornado T4 CNC turning machine under dry cutting conditions. A Kistler 9255B dynamometer was used to measure the cutting force signals, which were transmitted, analyzed, and displayed in the DasyLab software. Various force signals from the machining operation were analyzed, each having its own I-kaz 3D coefficient. This coefficient was examined and its relationship with the flank wear land (VB) was determined. A regression model was developed from this relationship, and its results show that the I-kaz 3D coefficient decreases as tool wear increases. The result can then be used for real-time tool wear monitoring.

  20. Using the Logistic Regression model in supporting decisions of establishing marketing strategies

    Directory of Open Access Journals (Sweden)

    Cristinel CONSTANTIN

    2015-12-01

    Full Text Available This paper is about an instrumental research regarding the using of Logistic Regression model for data analysis in marketing research. The decision makers inside different organisation need relevant information to support their decisions regarding the marketing strategies. The data provided by marketing research could be computed in various ways but the multivariate data analysis models can enhance the utility of the information. Among these models we can find the Logistic Regression model, which is used for dichotomous variables. Our research is based on explanation the utility of this model and interpretation of the resulted information in order to help practitioners and researchers to use it in their future investigations