WorldWideScience

Sample records for binary regression model

  1. A binary logistic regression model with complex sampling design of ...

    African Journals Online (AJOL)

    2017-09-03

    Sep 3, 2017 ... Bi-variable and multi-variable binary logistic regression model with complex sampling design was fitted. .... Data was entered into STATA-12 and analyzed using. SPSS-21. .... lack of access/too far or costs too much. 35. 1.2.

  2. A computer tool for a minimax criterion in binary response and heteroscedastic simple linear regression models.

    Science.gov (United States)

    Casero-Alonso, V; López-Fidalgo, J; Torsney, B

    2017-01-01

    Binary response models are used in many real applications. For these models the Fisher information matrix (FIM) is proportional to the FIM of a weighted simple linear regression model. The same is also true when the weight function has a finite integral. Thus, optimal designs for one binary model are also optimal for the corresponding weighted linear regression model. The main objective of this paper is to provide a tool for the construction of MV-optimal designs, minimizing the maximum of the variances of the estimates, for a general design space. MV-optimality is a potentially difficult criterion because of its nondifferentiability at equal variance designs. A methodology for obtaining MV-optimal designs where the design space is a compact interval [a, b] will be given for several standard weight functions. The methodology will allow us to build a user-friendly computer tool based on Mathematica to compute MV-optimal designs. Some illustrative examples will show a representation of MV-optimal designs in the Euclidean plane, taking a and b as the axes. The applet will be explained using two relevant models. In the first one the case of a weighted linear regression model is considered, where the weight function is directly chosen from a typical family. In the second example a binary response model is assumed, where the probability of the outcome is given by a typical probability distribution. Practitioners can use the provided applet to identify the solution and to know the exact support points and design weights. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  3. Modelling of binary logistic regression for obesity among secondary students in a rural area of Kedah

    Science.gov (United States)

    Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.

    2014-07-01

    Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.

  4. Bayesian binary regression model: an application to in-hospital death after AMI prediction

    Directory of Open Access Journals (Sweden)

    Aparecida D. P. Souza

    2004-08-01

    Full Text Available A Bayesian binary regression model is developed to predict death of patients after acute myocardial infarction (AMI. Markov Chain Monte Carlo (MCMC methods are used to make inference and to evaluate Bayesian binary regression models. A model building strategy based on Bayes factor is proposed and aspects of model validation are extensively discussed in the paper, including the posterior distribution for the c-index and the analysis of residuals. Risk assessment, based on variables easily available within minutes of the patients' arrival at the hospital, is very important to decide the course of the treatment. The identified model reveals itself strongly reliable and accurate, with a rate of correct classification of 88% and a concordance index of 83%.Um modelo bayesiano de regressão binária é desenvolvido para predizer óbito hospitalar em pacientes acometidos por infarto agudo do miocárdio. Métodos de Monte Carlo via Cadeias de Markov (MCMC são usados para fazer inferência e validação. Uma estratégia para construção de modelos, baseada no uso do fator de Bayes, é proposta e aspectos de validação são extensivamente discutidos neste artigo, incluindo a distribuição a posteriori para o índice de concordância e análise de resíduos. A determinação de fatores de risco, baseados em variáveis disponíveis na chegada do paciente ao hospital, é muito importante para a tomada de decisão sobre o curso do tratamento. O modelo identificado se revela fortemente confiável e acurado, com uma taxa de classificação correta de 88% e um índice de concordância de 83%.

  5. Modelling Status Food Security Households Disease Sufferers Pulmonary Tuberculosis Uses the Method Regression Logistics Binary

    Science.gov (United States)

    Wulandari, S. P.; Salamah, M.; Rositawati, A. F. D.

    2018-04-01

    Food security is the condition where the food fulfilment is managed well for the country till the individual. Indonesia is one of the country which has the commitment to create the food security becomes main priority. However, the food necessity becomes common thing means that it doesn’t care about nutrient standard and the health condition of family member, so in the fulfilment of food necessity also has to consider the disease suffered by the family member, one of them is pulmonary tuberculosa. From that reasons, this research is conducted to know the factors which influence on household food security status which suffered from pulmonary tuberculosis in the coastal area of Surabaya by using binary logistic regression method. The analysis result by using binary logistic regression shows that the variables wife latest education, house density and spacious house ventilation significantly affect on household food security status which suffered from pulmonary tuberculosis in the coastal area of Surabaya, where the wife education level is University/equivalent, the house density is eligible or 8 m2/person and spacious house ventilation 10% of the floor area has the opportunity to become food secure households amounted to 0.911089. While the chance of becoming food insecure households amounted to 0.088911. The model household food security status which suffered from pulmonary tuberculosis in the coastal area of Surabaya has been conformable, and the overall percentages of those classifications are at 71.8%.

  6. Binary logistic regression modelling: Measuring the probability of relapse cases among drug addict

    Science.gov (United States)

    Ismail, Mohd Tahir; Alias, Siti Nor Shadila

    2014-07-01

    For many years Malaysia faced the drug addiction issues. The most serious case is relapse phenomenon among treated drug addict (drug addict who have under gone the rehabilitation programme at Narcotic Addiction Rehabilitation Centre, PUSPEN). Thus, the main objective of this study is to find the most significant factor that contributes to relapse to happen. The binary logistic regression analysis was employed to model the relationship between independent variables (predictors) and dependent variable. The dependent variable is the status of the drug addict either relapse, (Yes coded as 1) or not, (No coded as 0). Meanwhile the predictors involved are age, age at first taking drug, family history, education level, family crisis, community support and self motivation. The total of the sample is 200 which the data are provided by AADK (National Antidrug Agency). The finding of the study revealed that age and self motivation are statistically significant towards the relapse cases..

  7. Risk of Recurrence in Operated Parasagittal Meningiomas: A Logistic Binary Regression Model.

    Science.gov (United States)

    Escribano Mesa, José Alberto; Alonso Morillejo, Enrique; Parrón Carreño, Tesifón; Huete Allut, Antonio; Narro Donate, José María; Méndez Román, Paddy; Contreras Jiménez, Ascensión; Pedrero García, Francisco; Masegosa González, José

    2018-02-01

    Parasagittal meningiomas arise from the arachnoid cells of the angle formed between the superior sagittal sinus (SSS) and the brain convexity. In this retrospective study, we focused on factors that predict early recurrence and recurrence times. We reviewed 125 patients with parasagittal meningiomas operated from 1985 to 2014. We studied the following variables: age, sex, location, laterality, histology, surgeons, invasion of the SSS, Simpson removal grade, follow-up time, angiography, embolization, radiotherapy, recurrence and recurrence time, reoperation, neurologic deficit, degree of dependency, and patient status at the end of follow-up. Patients ranged in age from 26 to 81 years (mean 57.86 years; median 60 years). There were 44 men (35.2%) and 81 women (64.8%). There were 57 patients with neurologic deficits (45.2%). The most common presenting symptom was motor deficit. World Health Organization grade I tumors were identified in 104 patients (84.6%), and the majority were the meningothelial type. Recurrence was detected in 34 cases. Time of recurrence was 9 to 336 months (mean: 84.4 months; median: 79.5 months). Male sex was identified as an independent risk for recurrence with relative risk 2.7 (95% confidence interval 1.21-6.15), P = 0.014. Kaplan-Meier curves for recurrence had statistically significant differences depending on sex, age, histologic type, and World Health Organization histologic grade. A binary logistic regression was made with the Hosmer-Lemeshow test with P > 0.05; sex, tumor size, and histologic type were used in this model. Male sex is an independent risk factor for recurrence that, associated with other factors such tumor size and histologic type, explains 74.5% of all cases in a binary regression model. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. Predicting the "graduate on time (GOT)" of PhD students using binary logistics regression model

    Science.gov (United States)

    Shariff, S. Sarifah Radiah; Rodzi, Nur Atiqah Mohd; Rahman, Kahartini Abdul; Zahari, Siti Meriam; Deni, Sayang Mohd

    2016-10-01

    Malaysian government has recently set a new goal to produce 60,000 Malaysian PhD holders by the year 2023. As a Malaysia's largest institution of higher learning in terms of size and population which offers more than 500 academic programmes in a conducive and vibrant environment, UiTM has taken several initiatives to fill up the gap. Strategies to increase the numbers of graduates with PhD are a process that is challenging. In many occasions, many have already identified that the struggle to get into the target set is even more daunting, and that implementation is far too ideal. This has further being progressing slowly as the attrition rate increases. This study aims to apply the proposed models that incorporates several factors in predicting the number PhD students that will complete their PhD studies on time. Binary Logistic Regression model is proposed and used on the set of data to determine the number. The results show that only 6.8% of the 2014 PhD students are predicted to graduate on time and the results are compared wih the actual number for validation purpose.

  9. The likelihood of achieving quantified road safety targets: a binary logistic regression model for possible factors.

    Science.gov (United States)

    Sze, N N; Wong, S C; Lee, C Y

    2014-12-01

    In past several decades, many countries have set quantified road safety targets to motivate transport authorities to develop systematic road safety strategies and measures and facilitate the achievement of continuous road safety improvement. Studies have been conducted to evaluate the association between the setting of quantified road safety targets and road fatality reduction, in both the short and long run, by comparing road fatalities before and after the implementation of a quantified road safety target. However, not much work has been done to evaluate whether the quantified road safety targets are actually achieved. In this study, we used a binary logistic regression model to examine the factors - including vehicle ownership, fatality rate, and national income, in addition to level of ambition and duration of target - that contribute to a target's success. We analyzed 55 quantified road safety targets set by 29 countries from 1981 to 2009, and the results indicate that targets that are in progress and with lower level of ambitions had a higher likelihood of eventually being achieved. Moreover, possible interaction effects on the association between level of ambition and the likelihood of success are also revealed. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. RBSURFpred: Modeling protein accessible surface area in real and binary space using regularized and optimized regression.

    Science.gov (United States)

    Tarafder, Sumit; Toukir Ahmed, Md; Iqbal, Sumaiya; Tamjidul Hoque, Md; Sohel Rahman, M

    2018-03-14

    Accessible surface area (ASA) of a protein residue is an effective feature for protein structure prediction, binding region identification, fold recognition problems etc. Improving the prediction of ASA by the application of effective feature variables is a challenging but explorable task to consider, specially in the field of machine learning. Among the existing predictors of ASA, REGAd 3 p is a highly accurate ASA predictor which is based on regularized exact regression with polynomial kernel of degree 3. In this work, we present a new predictor RBSURFpred, which extends REGAd 3 p on several dimensions by incorporating 58 physicochemical, evolutionary and structural properties into 9-tuple peptides via Chou's general PseAAC, which allowed us to obtain higher accuracies in predicting both real-valued and binary ASA. We have compared RBSURFpred for both real and binary space predictions with state-of-the-art predictors, such as REGAd 3 p and SPIDER2. We also have carried out a rigorous analysis of the performance of RBSURFpred in terms of different amino acids and their properties, and also with biologically relevant case-studies. The performance of RBSURFpred establishes itself as a useful tool for the community. Copyright © 2018 Elsevier Ltd. All rights reserved.

  11. Predicting Social Trust with Binary Logistic Regression

    Science.gov (United States)

    Adwere-Boamah, Joseph; Hufstedler, Shirley

    2015-01-01

    This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…

  12. Face Alignment via Regressing Local Binary Features.

    Science.gov (United States)

    Ren, Shaoqing; Cao, Xudong; Wei, Yichen; Sun, Jian

    2016-03-01

    This paper presents a highly efficient and accurate regression approach for face alignment. Our approach has two novel components: 1) a set of local binary features and 2) a locality principle for learning those features. The locality principle guides us to learn a set of highly discriminative local binary features for each facial landmark independently. The obtained local binary features are used to jointly learn a linear regression for the final output. This approach achieves the state-of-the-art results when tested on the most challenging benchmarks to date. Furthermore, because extracting and regressing local binary features are computationally very cheap, our system is much faster than previous methods. It achieves over 3000 frames per second (FPS) on a desktop or 300 FPS on a mobile phone for locating a few dozens of landmarks. We also study a key issue that is important but has received little attention in the previous research, which is the face detector used to initialize alignment. We investigate several face detectors and perform quantitative evaluation on how they affect alignment accuracy. We find that an alignment friendly detector can further greatly boost the accuracy of our alignment method, reducing the error up to 16% relatively. To facilitate practical usage of face detection/alignment methods, we also propose a convenient metric to measure how good a detector is for alignment initialization.

  13. Modelling binary data

    CERN Document Server

    Collett, David

    2002-01-01

    INTRODUCTION Some Examples The Scope of this Book Use of Statistical Software STATISTICAL INFERENCE FOR BINARY DATA The Binomial Distribution Inference about the Success Probability Comparison of Two Proportions Comparison of Two or More Proportions MODELS FOR BINARY AND BINOMIAL DATA Statistical Modelling Linear Models Methods of Estimation Fitting Linear Models to Binomial Data Models for Binomial Response Data The Linear Logistic Model Fitting the Linear Logistic Model to Binomial Data Goodness of Fit of a Linear Logistic Model Comparing Linear Logistic Models Linear Trend in Proportions Comparing Stimulus-Response Relationships Non-Convergence and Overfitting Some other Goodness of Fit Statistics Strategy for Model Selection Predicting a Binary Response Probability BIOASSAY AND SOME OTHER APPLICATIONS The Tolerance Distribution Estimating an Effective Dose Relative Potency Natural Response Non-Linear Logistic Regression Models Applications of the Complementary Log-Log Model MODEL CHECKING Definition of Re...

  14. A binary logistic regression model with complex sampling design of unmet need for family planning among all women aged (15-49) in Ethiopia.

    Science.gov (United States)

    Workie, Demeke Lakew; Zike, Dereje Tesfaye; Fenta, Haile Mekonnen; Mekonnen, Mulusew Admasu

    2017-09-01

    Unintended pregnancy related to unmet need is a worldwide problem that affects societies. The main objective of this study was to identify the prevalence and determinants of unmet need for family planning among women aged (15-49) in Ethiopia. The Performance Monitoring and Accountability2020/Ethiopia was conducted in April 2016 at round-4 from 7494 women with two-stage-stratified sampling. Bi-variable and multi-variable binary logistic regression model with complex sampling design was fitted. The prevalence of unmet-need for family planning was 16.2% in Ethiopia. Women between the age range of 15-24 years were 2.266 times more likely to have unmet need family planning compared to above 35 years. Women who were currently married were about 8 times more likely to have unmet need family planning compared to never married women. Women who had no under-five child were 0.125 times less likely to have unmet need family planning compared to those who had more than two-under-5. The key determinants of unmet need family planning in Ethiopia were residence, age, marital-status, education, household members, birth-events and number of under-5 children. Thus the Government of Ethiopia would take immediate steps to address the causes of high unmet need for family planning among women.

  15. Flexible link functions in nonparametric binary regression with Gaussian process priors.

    Science.gov (United States)

    Li, Dan; Wang, Xia; Lin, Lizhen; Dey, Dipak K

    2016-09-01

    In many scientific fields, it is a common practice to collect a sequence of 0-1 binary responses from a subject across time, space, or a collection of covariates. Researchers are interested in finding out how the expected binary outcome is related to covariates, and aim at better prediction in the future 0-1 outcomes. Gaussian processes have been widely used to model nonlinear systems; in particular to model the latent structure in a binary regression model allowing nonlinear functional relationship between covariates and the expectation of binary outcomes. A critical issue in modeling binary response data is the appropriate choice of link functions. Commonly adopted link functions such as probit or logit links have fixed skewness and lack the flexibility to allow the data to determine the degree of the skewness. To address this limitation, we propose a flexible binary regression model which combines a generalized extreme value link function with a Gaussian process prior on the latent structure. Bayesian computation is employed in model estimation. Posterior consistency of the resulting posterior distribution is demonstrated. The flexibility and gains of the proposed model are illustrated through detailed simulation studies and two real data examples. Empirical results show that the proposed model outperforms a set of alternative models, which only have either a Gaussian process prior on the latent regression function or a Dirichlet prior on the link function. © 2015, The International Biometric Society.

  16. Parameter Estimation for Improving Association Indicators in Binary Logistic Regression

    Directory of Open Access Journals (Sweden)

    Mahdi Bashiri

    2012-02-01

    Full Text Available The aim of this paper is estimation of Binary logistic regression parameters for maximizing the log-likelihood function with improved association indicators. In this paper the parameter estimation steps have been explained and then measures of association have been introduced and their calculations have been analyzed. Moreover a new related indicators based on membership degree level have been expressed. Indeed association measures demonstrate the number of success responses occurred in front of failure in certain number of Bernoulli independent experiments. In parameter estimation, existing indicators values is not sensitive to the parameter values, whereas the proposed indicators are sensitive to the estimated parameters during the iterative procedure. Therefore, proposing a new association indicator of binary logistic regression with more sensitivity to the estimated parameters in maximizing the log- likelihood in iterative procedure is innovation of this study.

  17. Binary logistic regression-Instrument for assessing museum indoor air impact on exhibits.

    Science.gov (United States)

    Bucur, Elena; Danet, Andrei Florin; Lehr, Carol Blaziu; Lehr, Elena; Nita-Lazar, Mihai

    2017-04-01

    This paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The prediction of the impact on the exhibits during certain pollution scenarios (environmental impact) was calculated by a mathematical model based on the binary logistic regression; it allows the identification of those environmental parameters from a multitude of possible parameters with a significant impact on exhibitions and ranks them according to their severity effect. Air quality (NO 2 , SO 2 , O 3 and PM 2.5 ) and microclimate parameters (temperature, humidity) monitoring data from a case study conducted within exhibition and storage spaces of the Romanian National Aviation Museum Bucharest have been used for developing and validating the binary logistic regression method and the mathematical model. The logistic regression analysis was used on 794 data combinations (715 to develop of the model and 79 to validate it) by a Statistical Package for Social Sciences (SPSS 20.0). The results from the binary logistic regression analysis demonstrated that from six parameters taken into consideration, four of them present a significant effect upon exhibits in the following order: O 3 >PM 2.5 >NO 2 >humidity followed at a significant distance by the effects of SO 2 and temperature. The mathematical model, developed in this study, correctly predicted 95.1 % of the cumulated effect of the environmental parameters upon the exhibits. Moreover, this model could also be used in the decisional process regarding the preventive preservation measures that should be implemented within the exhibition space. The paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The mathematical model developed on the environmental parameters analyzed by the binary logistic regression method could be useful in a decision-making process establishing the best measures for pollution reduction and preventive

  18. Completing the Remedial Sequence and College-Level Credit-Bearing Math: Comparing Binary, Cumulative, and Continuation Ratio Logistic Regression Models

    Science.gov (United States)

    Davidson, J. Cody

    2016-01-01

    Mathematics is the most common subject area of remedial need and the majority of remedial math students never pass a college-level credit-bearing math class. The majorities of studies that investigate this phenomenon are conducted at community colleges and use some type of regression model; however, none have used a continuation ratio model. The…

  19. Sunspot Cycle Prediction Using Multivariate Regression and Binary ...

    Indian Academy of Sciences (India)

    49

    Multivariate regression model has been derived based on the available cycles 1 .... The flare index correlates well with various parameters of the solar activity. ...... 32) Sabarinath A and Anilkumar A K 2011 A stochastic prediction model for the.

  20. Mesoscopic model for binary fluids

    Science.gov (United States)

    Echeverria, C.; Tucci, K.; Alvarez-Llamoza, O.; Orozco-Guillén, E. E.; Morales, M.; Cosenza, M. G.

    2017-10-01

    We propose a model for studying binary fluids based on the mesoscopic molecular simulation technique known as multiparticle collision, where the space and state variables are continuous, and time is discrete. We include a repulsion rule to simulate segregation processes that does not require calculation of the interaction forces between particles, so binary fluids can be described on a mesoscopic scale. The model is conceptually simple and computationally efficient; it maintains Galilean invariance and conserves the mass and energy in the system at the micro- and macro-scale, whereas momentum is conserved globally. For a wide range of temperatures and densities, the model yields results in good agreement with the known properties of binary fluids, such as the density profile, interface width, phase separation, and phase growth. We also apply the model to the study of binary fluids in crowded environments with consistent results.

  1. Logistic regression models

    CERN Document Server

    Hilbe, Joseph M

    2009-01-01

    This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect-great clarity.The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … .-Annette J. Dobson, Biometric...

  2. No rationale for 1 variable per 10 events criterion for binary logistic regression analysis

    Directory of Open Access Journals (Sweden)

    Maarten van Smeden

    2016-11-01

    Full Text Available Abstract Background Ten events per variable (EPV is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. Methods The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. Results The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’. We reveal that different approaches for identifying and handling separation leads to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. Conclusions The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.

  3. No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.

    Science.gov (United States)

    van Smeden, Maarten; de Groot, Joris A H; Moons, Karel G M; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus J C; Reitsma, Johannes B

    2016-11-24

    Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation leads to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.

  4. (Non) linear regression modelling

    NARCIS (Netherlands)

    Cizek, P.; Gentle, J.E.; Hardle, W.K.; Mori, Y.

    2012-01-01

    We will study causal relationships of a known form between random variables. Given a model, we distinguish one or more dependent (endogenous) variables Y = (Y1,…,Yl), l ∈ N, which are explained by a model, and independent (exogenous, explanatory) variables X = (X1,…,Xp),p ∈ N, which explain or

  5. Panel Smooth Transition Regression Models

    DEFF Research Database (Denmark)

    González, Andrés; Terasvirta, Timo; Dijk, Dick van

    We introduce the panel smooth transition regression model. This new model is intended for characterizing heterogeneous panels, allowing the regression coefficients to vary both across individuals and over time. Specifically, heterogeneity is allowed for by assuming that these coefficients are bou...

  6. Misclassification in binary choice models

    Czech Academy of Sciences Publication Activity Database

    Meyer, B. D.; Mittag, Nikolas

    2017-01-01

    Roč. 200, č. 2 (2017), s. 295-311 ISSN 0304-4076 R&D Projects: GA ČR(CZ) GJ16-07603Y Institutional support: Progres-Q24 Keywords : measurement error * binary choice models * program take-up Subject RIV: AH - Economics OBOR OECD: Economic Theory Impact factor: 1.633, year: 2016

  7. Misclassification in binary choice models

    Czech Academy of Sciences Publication Activity Database

    Meyer, B. D.; Mittag, Nikolas

    2017-01-01

    Roč. 200, č. 2 (2017), s. 295-311 ISSN 0304-4076 Institutional support: RVO:67985998 Keywords : measurement error * binary choice models * program take-up Subject RIV: AH - Economics OBOR OECD: Economic Theory Impact factor: 1.633, year: 2016

  8. Performance and separation occurrence of binary probit regression estimator using maximum likelihood method and Firths approach under different sample size

    Science.gov (United States)

    Lusiana, Evellin Dewi

    2017-12-01

    The parameters of binary probit regression model are commonly estimated by using Maximum Likelihood Estimation (MLE) method. However, MLE method has limitation if the binary data contains separation. Separation is the condition where there are one or several independent variables that exactly grouped the categories in binary response. It will result the estimators of MLE method become non-convergent, so that they cannot be used in modeling. One of the effort to resolve the separation is using Firths approach instead. This research has two aims. First, to identify the chance of separation occurrence in binary probit regression model between MLE method and Firths approach. Second, to compare the performance of binary probit regression model estimator that obtained by MLE method and Firths approach using RMSE criteria. Those are performed using simulation method and under different sample size. The results showed that the chance of separation occurrence in MLE method for small sample size is higher than Firths approach. On the other hand, for larger sample size, the probability decreased and relatively identic between MLE method and Firths approach. Meanwhile, Firths estimators have smaller RMSE than MLEs especially for smaller sample sizes. But for larger sample sizes, the RMSEs are not much different. It means that Firths estimators outperformed MLE estimator.

  9. Introduction to the use of regression models in epidemiology.

    Science.gov (United States)

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.

  10. Forecasting with Dynamic Regression Models

    CERN Document Server

    Pankratz, Alan

    2012-01-01

    One of the most widely used tools in statistical forecasting, single equation regression models is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the auto correlation patterns of regression disturbance. It also includes six case studies.

  11. Modified Regression Correlation Coefficient for Poisson Regression Model

    Science.gov (United States)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).

  12. Classifying machinery condition using oil samples and binary logistic regression

    Science.gov (United States)

    Phillips, J.; Cripps, E.; Lau, John W.; Hodkiewicz, M. R.

    2015-08-01

    The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding and hence some trust in the classification scheme for those who use the analysis to initiate maintenance tasks. Typically "black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) can be difficult to provide ease of interpretability. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight to the drivers of the human classification process and to the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on predictive performance of logistic regression, ANN and SVM. A real world oil analysis data set from engines on mining trucks is presented and using cross-validation we demonstrate that logistic regression out-performs the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.

  13. A novel hybrid method of beta-turn identification in protein using binary logistic regression and neural network.

    Science.gov (United States)

    Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz

    2012-01-01

    From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was initially used for the first time to select significant sequence parameters in identification of β-turns due to a re-substitution test procedure. Sequence parameters were consisted of 80 amino acid positional occurrences and 20 amino acid percentages in sequence. Among these parameters, the most significant ones which were selected by binary logistic regression model, were percentages of Gly, Ser and the occurrence of Asn in position i+2, respectively, in sequence. These significant parameters have the highest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed by the parameters selected by binary logistic regression to build a hybrid predictor. The networks have been trained and tested on a non-homologous dataset of 565 protein chains. With applying a nine fold cross-validation test on the dataset, the network reached an overall accuracy (Qtotal) of 74, which is comparable with results of the other β-turn prediction methods. In conclusion, this study proves that the parameter selection ability of binary logistic regression together with the prediction capability of neural networks lead to the development of more precise models for identifying β-turns in proteins.

  14. Nonparametric Mixture of Regression Models.

    Science.gov (United States)

    Huang, Mian; Li, Runze; Wang, Shaoli

    2013-07-01

    Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We study the identifiability issue of the proposed models, and develop an estimation procedure by employing kernel regression. We further systematically study the sampling properties of the proposed estimators, and establish their asymptotic normality. A modified EM algorithm is proposed to carry out the estimation procedure. We show that our algorithm preserves the ascent property of the EM algorithm in an asymptotic sense. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedure. An empirical analysis of the US house price index data is illustrated for the proposed methodology.

  15. MEMPREDIKSI FINANCIAL DISTRESS DENGAN BINARY LOGIT REGRESSION PERUSAHAAN TELEKOMUNIKASI

    Directory of Open Access Journals (Sweden)

    Tiara Widya Antikasari

    2017-04-01

    Full Text Available In this globalization era, sub–sector telecommunication industry has rapid development as time goes by with the number of customers’ growth. However, its growth is not balanced with operational revenue development. Therefore, it is important to analyze the financial distress in telecommunication companies in order to avoid bankruptcy. This research aimed to investigate the effect of financial ratios to predict probability of financial distress. Financial ratios indicator used profitability ratio, liquidity ratio, activity ratio, and leverage ratio. The population in this research was telecommunication companies listed in the Indonesia Stock Exchange periods 2009-2016. Based on purposive sampling method, the criteria of financial distress in this study was measured by using net operation negative two years, while statistic analysis used was logistic regression with a significance level of 10%. The result was that liquidity ratio (current ratio and activity ratio (total asset turnover ratio had a negative significant value, and profitability ratio(return on asset and leverage ratio (debt to total asset had positive significant value to predict financial distress.

  16. Regression Models for Repairable Systems

    Czech Academy of Sciences Publication Activity Database

    Novák, Petr

    2015-01-01

    Roč. 17, č. 4 (2015), s. 963-972 ISSN 1387-5841 Institutional support: RVO:67985556 Keywords : Reliability analysis * Repair models * Regression Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.782, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/novak-0450902.pdf

  17. Interpretation of commonly used statistical regression models.

    Science.gov (United States)

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  18. A Seemingly Unrelated Poisson Regression Model

    OpenAIRE

    King, Gary

    1989-01-01

    This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.

  19. Gaussian Process Regression Model in Spatial Logistic Regression

    Science.gov (United States)

    Sofro, A.; Oktaviarina, A.

    2018-01-01

    Spatial analysis has developed very quickly in the last decade. One of the favorite approaches is based on the neighbourhood of the region. Unfortunately, there are some limitations such as difficulty in prediction. Therefore, we offer Gaussian process regression (GPR) to accommodate the issue. In this paper, we will focus on spatial modeling with GPR for binomial data with logit link function. The performance of the model will be investigated. We will discuss the inference of how to estimate the parameters and hyper-parameters and to predict as well. Furthermore, simulation studies will be explained in the last section.

  20. Unobserved Heterogeneity in the Binary Logit Model with Cross-Sectional Data and Short Panels

    DEFF Research Database (Denmark)

    Holm, Anders; Jæger, Mads Meier; Pedersen, Morten

    This paper proposes a new approach to dealing with unobserved heterogeneity in applied research using the binary logit model with cross-sectional data and short panels. Unobserved heterogeneity is particularly important in non-linear regression models such as the binary logit model because, unlike...... in linear regression models, estimates of the effects of observed independent variables are biased even when omitted independent variables are uncorrelated with the observed independent variables. We propose an extension of the binary logit model based on a finite mixture approach in which we conceptualize...

  1. A simple model for binary star evolution

    International Nuclear Information System (INIS)

    Whyte, C.A.; Eggleton, P.P.

    1985-01-01

    A simple model for calculating the evolution of binary stars is presented. Detailed stellar evolution calculations of stars undergoing mass and energy transfer at various rates are reported and used to identify the dominant physical processes which determine the type of evolution. These detailed calculations are used to calibrate the simple model and a comparison of calculations using the detailed stellar evolution equations and the simple model is made. Results of the evolution of a few binary systems are reported and compared with previously published calculations using normal stellar evolution programs. (author)

  2. Regression models of reactor diagnostic signals

    International Nuclear Information System (INIS)

    Vavrin, J.

    1989-01-01

    The application is described of an autoregression model as the simplest regression model of diagnostic signals in experimental analysis of diagnostic systems, in in-service monitoring of normal and anomalous conditions and their diagnostics. The method of diagnostics is described using a regression type diagnostic data base and regression spectral diagnostics. The diagnostics is described of neutron noise signals from anomalous modes in the experimental fuel assembly of a reactor. (author)

  3. Variable importance in latent variable regression models

    NARCIS (Netherlands)

    Kvalheim, O.M.; Arneberg, R.; Bleie, O.; Rajalahti, T.; Smilde, A.K.; Westerhuis, J.A.

    2014-01-01

    The quality and practical usefulness of a regression model are a function of both interpretability and prediction performance. This work presents some new graphical tools for improved interpretation of latent variable regression models that can also assist in improved algorithms for variable

  4. Regression modeling of ground-water flow

    Science.gov (United States)

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  5. [From clinical judgment to linear regression model.

    Science.gov (United States)

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

    When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R 2 ) indicates the importance of independent variables in the outcome.

  6. Regression Models for Market-Shares

    DEFF Research Database (Denmark)

    Birch, Kristina; Olsen, Jørgen Kai; Tjur, Tue

    2005-01-01

    On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put on the interpretat......On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put...... on the interpretation of the parameters in relation to models for the total sales based on discrete choice models.Key words and phrases. MCI model, discrete choice model, market-shares, price elasitcity, regression model....

  7. Categorical regression dose-response modeling

    Science.gov (United States)

    The goal of this training is to provide participants with training on the use of the U.S. EPA’s Categorical Regression soft¬ware (CatReg) and its application to risk assessment. Categorical regression fits mathematical models to toxicity data that have been assigned ord...

  8. Applied Regression Modeling A Business Approach

    CERN Document Server

    Pardoe, Iain

    2012-01-01

    An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculusRegression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a

  9. A Bayesian model for binary Markov chains

    Directory of Open Access Journals (Sweden)

    Belkheir Essebbar

    2004-02-01

    Full Text Available This note is concerned with Bayesian estimation of the transition probabilities of a binary Markov chain observed from heterogeneous individuals. The model is founded on the Jeffreys' prior which allows for transition probabilities to be correlated. The Bayesian estimator is approximated by means of Monte Carlo Markov chain (MCMC techniques. The performance of the Bayesian estimates is illustrated by analyzing a small simulated data set.

  10. Regression Models For Multivariate Count Data.

    Science.gov (United States)

    Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei

    2017-01-01

    Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data.

  11. Testing homogeneity in Weibull-regression models.

    Science.gov (United States)

    Bolfarine, Heleno; Valença, Dione M

    2005-10-01

    In survival studies with families or geographical units it may be of interest testing whether such groups are homogeneous for given explanatory variables. In this paper we consider score type tests for group homogeneity based on a mixing model in which the group effect is modelled as a random variable. As opposed to hazard-based frailty models, this model presents survival times that conditioned on the random effect, has an accelerated failure time representation. The test statistics requires only estimation of the conventional regression model without the random effect and does not require specifying the distribution of the random effect. The tests are derived for a Weibull regression model and in the uncensored situation, a closed form is obtained for the test statistic. A simulation study is used for comparing the power of the tests. The proposed tests are applied to real data sets with censored data.

  12. Mixed-effects regression models in linguistics

    CERN Document Server

    Heylen, Kris; Geeraerts, Dirk

    2018-01-01

    When data consist of grouped observations or clusters, and there is a risk that measurements within the same group are not independent, group-specific random effects can be added to a regression model in order to account for such within-group associations. Regression models that contain such group-specific random effects are called mixed-effects regression models, or simply mixed models. Mixed models are a versatile tool that can handle both balanced and unbalanced datasets and that can also be applied when several layers of grouping are present in the data; these layers can either be nested or crossed.  In linguistics, as in many other fields, the use of mixed models has gained ground rapidly over the last decade. This methodological evolution enables us to build more sophisticated and arguably more realistic models, but, due to its technical complexity, also introduces new challenges. This volume brings together a number of promising new evolutions in the use of mixed models in linguistics, but also addres...

  13. Regression modeling methods, theory, and computation with SAS

    CERN Document Server

    Panik, Michael

    2009-01-01

    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,

  14. Influence diagnostics in meta-regression model.

    Science.gov (United States)

    Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua

    2017-09-01

    This paper studies the influence diagnostics in meta-regression model including case deletion diagnostic and local influence analysis. We derive the subset deletion formulae for the estimation of regression coefficient and heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are considered, respectively, to derive the results. Internal and external residual and leverage measure are defined. The local influence analysis based on case-weights perturbation scheme, responses perturbation scheme, covariate perturbation scheme, and within-variance perturbation scheme are explored. We introduce a method by simultaneous perturbing responses, covariate, and within-variance to obtain the local influence measure, which has an advantage of capable to compare the influence magnitude of influential studies from different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.

  15. AIRLINE ACTIVITY FORECASTING BY REGRESSION MODELS

    Directory of Open Access Journals (Sweden)

    Н. Білак

    2012-04-01

    Full Text Available Proposed linear and nonlinear regression models, which take into account the equation of trend and seasonality indices for the analysis and restore the volume of passenger traffic over the past period of time and its prediction for future years, as well as the algorithm of formation of these models based on statistical analysis over the years. The desired model is the first step for the synthesis of more complex models, which will enable forecasting of passenger (income level airline with the highest accuracy and time urgency.

  16. ACOUSTIC EFFECTS ON BINARY AEROELASTICITY MODEL

    Directory of Open Access Journals (Sweden)

    Kok Hwa Yu

    2011-10-01

    Full Text Available Acoustics is the science concerned with the study of sound. The effects of sound on structures attract overwhelm interests and numerous studies were carried out in this particular area. Many of the preliminary investigations show that acoustic pressure produces significant influences on structures such as thin plate, membrane and also high-impedance medium like water (and other similar fluids. Thus, it is useful to investigate the structure response with the presence of acoustics on aircraft, especially on aircraft wings, tails and control surfaces which are vulnerable to flutter phenomena. The present paper describes the modeling of structural-acoustic interactions to simulate the external acoustic effect on binary flutter model. Here, the binary flutter model which illustrated as a rectangular wing is constructed using strip theory with simplified unsteady aerodynamics involving flap and pitch degree of freedom terms. The external acoustic excitation, on the other hand, is modeled using four-node quadrilateral isoparametric element via finite element approach. Both equations then carefully coupled and solved using eigenvalue solution. The mentioned approach is implemented in MATLAB and the outcome of the simulated result are later described, analyzed and illustrated in this paper.

  17. Modeling oil production based on symbolic regression

    International Nuclear Information System (INIS)

    Yang, Guangfei; Li, Xianneng; Wang, Jianliang; Lian, Lian; Ma, Tieju

    2015-01-01

    Numerous models have been proposed to forecast the future trends of oil production and almost all of them are based on some predefined assumptions with various uncertainties. In this study, we propose a novel data-driven approach that uses symbolic regression to model oil production. We validate our approach on both synthetic and real data, and the results prove that symbolic regression could effectively identify the true models beneath the oil production data and also make reliable predictions. Symbolic regression indicates that world oil production will peak in 2021, which broadly agrees with other techniques used by researchers. Our results also show that the rate of decline after the peak is almost half the rate of increase before the peak, and it takes nearly 12 years to drop 4% from the peak. These predictions are more optimistic than those in several other reports, and the smoother decline will provide the world, especially the developing countries, with more time to orchestrate mitigation plans. -- Highlights: •A data-driven approach has been shown to be effective at modeling the oil production. •The Hubbert model could be discovered automatically from data. •The peak of world oil production is predicted to appear in 2021. •The decline rate after peak is half of the increase rate before peak. •Oil production projected to decline 4% post-peak

  18. Sample size calculation to externally validate scoring systems based on logistic regression models.

    Directory of Open Access Journals (Sweden)

    Antonio Palazón-Bru

    Full Text Available A sample size containing at least 100 events and 100 non-events has been suggested to validate a predictive model, regardless of the model being validated and that certain factors can influence calibration of the predictive model (discrimination, parameterization and incidence. Scoring systems based on binary logistic regression models are a specific type of predictive model.The aim of this study was to develop an algorithm to determine the sample size for validating a scoring system based on a binary logistic regression model and to apply it to a case study.The algorithm was based on bootstrap samples in which the area under the ROC curve, the observed event probabilities through smooth curves, and a measure to determine the lack of calibration (estimated calibration index were calculated. To illustrate its use for interested researchers, the algorithm was applied to a scoring system, based on a binary logistic regression model, to determine mortality in intensive care units.In the case study provided, the algorithm obtained a sample size with 69 events, which is lower than the value suggested in the literature.An algorithm is provided for finding the appropriate sample size to validate scoring systems based on binary logistic regression models. This could be applied to determine the sample size in other similar cases.

  19. Geographically weighted regression model on poverty indicator

    Science.gov (United States)

    Slamet, I.; Nugroho, N. F. T. A.; Muslich

    2017-12-01

    In this research, we applied geographically weighted regression (GWR) for analyzing the poverty in Central Java. We consider Gaussian Kernel as weighted function. The GWR uses the diagonal matrix resulted from calculating kernel Gaussian function as a weighted function in the regression model. The kernel weights is used to handle spatial effects on the data so that a model can be obtained for each location. The purpose of this paper is to model of poverty percentage data in Central Java province using GWR with Gaussian kernel weighted function and to determine the influencing factors in each regency/city in Central Java province. Based on the research, we obtained geographically weighted regression model with Gaussian kernel weighted function on poverty percentage data in Central Java province. We found that percentage of population working as farmers, population growth rate, percentage of households with regular sanitation, and BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. In this research, we found the determination coefficient R2 are 68.64%. There are two categories of district which are influenced by different of significance factors.

  20. A binary neutron star GRB model

    International Nuclear Information System (INIS)

    Wilson, J.R.; Salmonson, J.D.; Wilson, J.R.; Mathews, G.J.

    1998-01-01

    In this paper we present the preliminary results of a model for the production of gamma-ray bursts (GRBs) through the compressional heating of binary neutron stars near their last stable orbit prior to merger. Recent numerical studies of the general relativistic (GR) hydrodynamics in three spatial dimensions of close neutron star binaries (NSBs) have uncovered evidence for the compression and heating of the individual neutron stars (NSs) prior to merger 12. This effect will have significant effect on the production of gravitational waves, neutrinos and, ultimately, energetic photons. The study of the production of these photons in close NSBs and, in particular, its correspondence to observed GRBs is the subject of this paper. The gamma-rays arise as follows. Compressional heating causes the neutron stars to emit neutrino pairs which, in turn, annihilate to produce a hot electron-positron pair plasma. This pair-photon plasma expands rapidly until it becomes optically thin, at which point the photons are released. We show that this process can indeed satisfy three basic requirements of a model for cosmological gamma-ray bursts: (1) sufficient gamma-ray energy release (>10 51 ergs) to produce observed fluxes, (2) a time-scale of the primary burst duration consistent with that of a 'classical' GRB (∼10 seconds), and (3) the peak of the photon number spectrum matches that of 'classical' GRB (∼300 keV). copyright 1998 American Institute of Physics

  1. Modeling and analysis of advanced binary cycles

    Energy Technology Data Exchange (ETDEWEB)

    Gawlik, K.

    1997-12-31

    A computer model (Cycle Analysis Simulation Tool, CAST) and a methodology have been developed to perform value analysis for small, low- to moderate-temperature binary geothermal power plants. The value analysis method allows for incremental changes in the levelized electricity cost (LEC) to be determined between a baseline plant and a modified plant. Thermodynamic cycle analyses and component sizing are carried out in the model followed by economic analysis which provides LEC results. The emphasis of the present work is on evaluating the effect of mixed working fluids instead of pure fluids on the LEC of a geothermal binary plant that uses a simple Organic Rankine Cycle. Four resources were studied spanning the range of 265{degrees}F to 375{degrees}F. A variety of isobutane and propane based mixtures, in addition to pure fluids, were used as working fluids. This study shows that the use of propane mixtures at a 265{degrees}F resource can reduce the LEC by 24% when compared to a base case value that utilizes commercial isobutane as its working fluid. The cost savings drop to 6% for a 375{degrees}F resource, where an isobutane mixture is favored. Supercritical cycles were found to have the lowest cost at all resources.

  2. Adaptive regression for modeling nonlinear relationships

    CERN Document Server

    Knafl, George J

    2016-01-01

    This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...

  3. Bayesian Inference of a Multivariate Regression Model

    Directory of Open Access Journals (Sweden)

    Marick S. Sinay

    2014-01-01

    Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.

  4. General regression and representation model for classification.

    Directory of Open Access Journals (Sweden)

    Jianjun Qian

    Full Text Available Recently, the regularized coding-based classification methods (e.g. SRC and CRC show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR for classification. GRR not only has advantages of CRC, but also takes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients and the specific information (weight matrix of image pixels to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel weights of the test sample. With the proposed model as a platform, we design two classifiers: basic general regression and representation classifier (B-GRR and robust general regression and representation classifier (R-GRR. The experimental results demonstrate the performance advantages of proposed methods over state-of-the-art algorithms.

  5. Confidence bands for inverse regression models

    International Nuclear Information System (INIS)

    Birke, Melanie; Bissantz, Nicolai; Holzmann, Hajo

    2010-01-01

    We construct uniform confidence bands for the regression function in inverse, homoscedastic regression models with convolution-type operators. Here, the convolution is between two non-periodic functions on the whole real line rather than between two periodic functions on a compact interval, since the former situation arguably arises more often in applications. First, following Bickel and Rosenblatt (1973 Ann. Stat. 1 1071–95) we construct asymptotic confidence bands which are based on strong approximations and on a limit theorem for the supremum of a stationary Gaussian process. Further, we propose bootstrap confidence bands based on the residual bootstrap and prove consistency of the bootstrap procedure. A simulation study shows that the bootstrap confidence bands perform reasonably well for moderate sample sizes. Finally, we apply our method to data from a gel electrophoresis experiment with genetically engineered neuronal receptor subunits incubated with rat brain extract

  6. Multitask Quantile Regression under the Transnormal Model.

    Science.gov (United States)

    Fan, Jianqing; Xue, Lingzhou; Zou, Hui

    2016-01-01

    We consider estimating multi-task quantile regression under the transnormal model, with focus on high-dimensional setting. We derive a surprisingly simple closed-form solution through rank-based covariance regularization. In particular, we propose the rank-based ℓ 1 penalization with positive definite constraints for estimating sparse covariance matrices, and the rank-based banded Cholesky decomposition regularization for estimating banded precision matrices. By taking advantage of alternating direction method of multipliers, nearest correlation matrix projection is introduced that inherits sampling properties of the unprojected one. Our work combines strengths of quantile regression and rank-based covariance regularization to simultaneously deal with nonlinearity and nonnormality for high-dimensional regression. Furthermore, the proposed method strikes a good balance between robustness and efficiency, achieves the "oracle"-like convergence rate, and provides the provable prediction interval under the high-dimensional setting. The finite-sample performance of the proposed method is also examined. The performance of our proposed rank-based method is demonstrated in a real application to analyze the protein mass spectroscopy data.

  7. Crime Modeling using Spatial Regression Approach

    Science.gov (United States)

    Saleh Ahmar, Ansari; Adiatma; Kasim Aidid, M.

    2018-01-01

    Act of criminality in Indonesia increased both variety and quantity every year. As murder, rape, assault, vandalism, theft, fraud, fencing, and other cases that make people feel unsafe. Risk of society exposed to crime is the number of reported cases in the police institution. The higher of the number of reporter to the police institution then the number of crime in the region is increasing. In this research, modeling criminality in South Sulawesi, Indonesia with the dependent variable used is the society exposed to the risk of crime. Modelling done by area approach is the using Spatial Autoregressive (SAR) and Spatial Error Model (SEM) methods. The independent variable used is the population density, the number of poor population, GDP per capita, unemployment and the human development index (HDI). Based on the analysis using spatial regression can be shown that there are no dependencies spatial both lag or errors in South Sulawesi.

  8. AN APPLICATION OF FUNCTIONAL MULTIVARIATE REGRESSION MODEL TO MULTICLASS CLASSIFICATION

    OpenAIRE

    Krzyśko, Mirosław; Smaga, Łukasz

    2017-01-01

    In this paper, the scale response functional multivariate regression model is considered. By using the basis functions representation of functional predictors and regression coefficients, this model is rewritten as a multivariate regression model. This representation of the functional multivariate regression model is used for multiclass classification for multivariate functional data. Computational experiments performed on real labelled data sets demonstrate the effectiveness of the proposed ...

  9. Entrepreneurial intention modeling using hierarchical multiple regression

    Directory of Open Access Journals (Sweden)

    Marina Jeger

    2014-12-01

    Full Text Available The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.

  10. An Additive-Multiplicative Cox-Aalen Regression Model

    DEFF Research Database (Denmark)

    Scheike, Thomas H.; Zhang, Mei-Jie

    2002-01-01

    Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects......Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects...

  11. Accuracy of binary black hole waveform models for aligned-spin binaries

    Science.gov (United States)

    Kumar, Prayush; Chu, Tony; Fong, Heather; Pfeiffer, Harald P.; Boyle, Michael; Hemberger, Daniel A.; Kidder, Lawrence E.; Scheel, Mark A.; Szilagyi, Bela

    2016-05-01

    Coalescing binary black holes are among the primary science targets for second generation ground-based gravitational wave detectors. Reliable gravitational waveform models are central to detection of such systems and subsequent parameter estimation. This paper performs a comprehensive analysis of the accuracy of recent waveform models for binary black holes with aligned spins, utilizing a new set of 84 high-accuracy numerical relativity simulations. Our analysis covers comparable mass binaries (mass-ratio 1 ≤q ≤3 ), and samples independently both black hole spins up to a dimensionless spin magnitude of 0.9 for equal-mass binaries and 0.85 for unequal mass binaries. Furthermore, we focus on the high-mass regime (total mass ≳50 M⊙ ). The two most recent waveform models considered (PhenomD and SEOBNRv2) both perform very well for signal detection, losing less than 0.5% of the recoverable signal-to-noise ratio ρ , except that SEOBNRv2's efficiency drops slightly for both black hole spins aligned at large magnitude. For parameter estimation, modeling inaccuracies of the SEOBNRv2 model are found to be smaller than systematic uncertainties for moderately strong GW events up to roughly ρ ≲15 . PhenomD's modeling errors are found to be smaller than SEOBNRv2's, and are generally irrelevant for ρ ≲20 . Both models' accuracy deteriorates with increased mass ratio, and when at least one black hole spin is large and aligned. The SEOBNRv2 model shows a pronounced disagreement with the numerical relativity simulation in the merger phase, for unequal masses and simultaneously both black hole spins very large and aligned. Two older waveform models (PhenomC and SEOBNRv1) are found to be distinctly less accurate than the more recent PhenomD and SEOBNRv2 models. Finally, we quantify the bias expected from all four waveform models during parameter estimation for several recovered binary parameters: chirp mass, mass ratio, and effective spin.

  12. Variable selection and model choice in geoadditive regression models.

    Science.gov (United States)

    Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard

    2009-06-01

    Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.

  13. Efficient and robust estimation for longitudinal mixed models for binary data

    DEFF Research Database (Denmark)

    Holst, René

    2009-01-01

    This paper proposes a longitudinal mixed model for binary data. The model extends the classical Poisson trick, in which a binomial regression is fitted by switching to a Poisson framework. A recent estimating equations method for generalized linear longitudinal mixed models, called GEEP, is used...... as a vehicle for fitting the conditional Poisson regressions, given a latent process of serial correlated Tweedie variables. The regression parameters are estimated using a quasi-score method, whereas the dispersion and correlation parameters are estimated by use of bias-corrected Pearson-type estimating...... equations, using second moments only. Random effects are predicted by BLUPs. The method provides a computationally efficient and robust approach to the estimation of longitudinal clustered binary data and accommodates linear and non-linear models. A simulation study is used for validation and finally...

  14. Hierarchical regression analysis in structural Equation Modeling

    NARCIS (Netherlands)

    de Jong, P.F.

    1999-01-01

    In a hierarchical or fixed-order regression analysis, the independent variables are entered into the regression equation in a prespecified order. Such an analysis is often performed when the extra amount of variance accounted for in a dependent variable by a specific independent variable is the main

  15. Poisson Mixture Regression Models for Heart Disease Prediction.

    Science.gov (United States)

    Mufudza, Chipo; Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.

  16. Star formation history: Modeling of visual binaries

    Science.gov (United States)

    Gebrehiwot, Y. M.; Tessema, S. B.; Malkov, O. Yu.; Kovaleva, D. A.; Sytov, A. Yu.; Tutukov, A. V.

    2018-05-01

    Most stars form in binary or multiple systems. Their evolution is defined by masses of components, orbital separation and eccentricity. In order to understand star formation and evolutionary processes, it is vital to find distributions of physical parameters of binaries. We have carried out Monte Carlo simulations in which we simulate different pairing scenarios: random pairing, primary-constrained pairing, split-core pairing, and total and primary pairing in order to get distributions of binaries over physical parameters at birth. Next, for comparison with observations, we account for stellar evolution and selection effects. Brightness, radius, temperature, and other parameters of components are assigned or calculated according to approximate relations for stars in different evolutionary stages (main-sequence stars, red giants, white dwarfs, relativistic objects). Evolutionary stage is defined as a function of system age and component masses. We compare our results with the observed IMF, binarity rate, and binary mass-ratio distributions for field visual binaries to find initial distributions and pairing scenarios that produce observed distributions.

  17. Modeling maximum daily temperature using a varying coefficient regression model

    Science.gov (United States)

    Han Li; Xinwei Deng; Dong-Yum Kim; Eric P. Smith

    2014-01-01

    Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature...

  18. Model performance analysis and model validation in logistic regression

    Directory of Open Access Journals (Sweden)

    Rosa Arboretti Giancristofaro

    2007-10-01

    Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.

  19. Logistic Regression Modeling of Diminishing Manufacturing Sources for Integrated Circuits

    National Research Council Canada - National Science Library

    Gravier, Michael

    1999-01-01

    .... The research identified logistic regression as a powerful tool for analysis of DMSMS and further developed twenty models attempting to identify the "best" way to model and predict DMSMS using logistic regression...

  20. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    Science.gov (United States)

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.

  1. The MIDAS Touch: Mixed Data Sampling Regression Models

    OpenAIRE

    Ghysels, Eric; Santa-Clara, Pedro; Valkanov, Rossen

    2004-01-01

    We introduce Mixed Data Sampling (henceforth MIDAS) regression models. The regressions involve time series data sampled at different frequencies. Technically speaking MIDAS models specify conditional expectations as a distributed lag of regressors recorded at some higher sampling frequencies. We examine the asymptotic properties of MIDAS regression estimation and compare it with traditional distributed lag models. MIDAS regressions have wide applicability in macroeconomics and �nance.

  2. Model selection in kernel ridge regression

    DEFF Research Database (Denmark)

    Exterkate, Peter

    2013-01-01

    Kernel ridge regression is a technique to perform ridge regression with a potentially infinite number of nonlinear transformations of the independent variables as regressors. This method is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts....... The influence of the choice of kernel and the setting of tuning parameters on forecast accuracy is investigated. Several popular kernels are reviewed, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. The latter two kernels are interpreted in terms of their smoothing properties......, and the tuning parameters associated to all these kernels are related to smoothness measures of the prediction function and to the signal-to-noise ratio. Based on these interpretations, guidelines are provided for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study...

  3. Physics Of Eclipsing Binaries. II. Towards the Increased Model Fidelity

    OpenAIRE

    Prša, Andrej; Conroy, Kyle E.; Horvat, Martin; Pablo, Herbert; Kochoska, Angela; Bloemen, Steven; Giammarco, Joseph; Hambleton, Kelly M.; Degroote, Pieter

    2016-01-01

    The precision of photometric and spectroscopic observations has been systematically improved in the last decade, mostly thanks to space-borne photometric missions and ground-based spectrographs dedicated to finding exoplanets. The field of eclipsing binary stars strongly benefited from this development. Eclipsing binaries serve as critical tools for determining fundamental stellar properties (masses, radii, temperatures and luminosities), yet the models are not capable of reproducing observed...

  4. The Effect of Latent Binary Variables on the Uncertainty of the Prediction of a Dichotomous Outcome Using Logistic Regression Based Propensity Score Matching.

    Science.gov (United States)

    Szekér, Szabolcs; Vathy-Fogarassy, Ágnes

    2018-01-01

    Logistic regression based propensity score matching is a widely used method in case-control studies to select the individuals of the control group. This method creates a suitable control group if all factors affecting the output variable are known. However, if relevant latent variables exist as well, which are not taken into account during the calculations, the quality of the control group is uncertain. In this paper, we present a statistics-based research in which we try to determine the relationship between the accuracy of the logistic regression model and the uncertainty of the dependent variable of the control group defined by propensity score matching. Our analyses show that there is a linear correlation between the fit of the logistic regression model and the uncertainty of the output variable. In certain cases, a latent binary explanatory variable can result in a relative error of up to 70% in the prediction of the outcome variable. The observed phenomenon calls the attention of analysts to an important point, which must be taken into account when deducting conclusions.

  5. Mixture of Regression Models with Single-Index

    OpenAIRE

    Xiang, Sijia; Yao, Weixin

    2016-01-01

    In this article, we propose a class of semiparametric mixture regression models with single-index. We argue that many recently proposed semiparametric/nonparametric mixture regression models can be considered special cases of the proposed model. However, unlike existing semiparametric mixture regression models, the new pro- posed model can easily incorporate multivariate predictors into the nonparametric components. Backfitting estimates and the corresponding algorithms have been proposed for...

  6. Linear regression crash prediction models : issues and proposed solutions.

    Science.gov (United States)

    2010-05-01

    The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...

  7. Model-based Quantile Regression for Discrete Data

    KAUST Repository

    Padellini, Tullia; Rue, Haavard

    2018-01-01

    Quantile regression is a class of methods voted to the modelling of conditional quantiles. In a Bayesian framework quantile regression has typically been carried out exploiting the Asymmetric Laplace Distribution as a working likelihood. Despite

  8. Forecasting Ebola with a regression transmission model

    Directory of Open Access Journals (Sweden)

    Jason Asher

    2018-03-01

    Full Text Available We describe a relatively simple stochastic model of Ebola transmission that was used to produce forecasts with the lowest mean absolute error among Ebola Forecasting Challenge participants. The model enabled prediction of peak incidence, the timing of this peak, and final size of the outbreak. The underlying discrete-time compartmental model used a time-varying reproductive rate modeled as a multiplicative random walk driven by the number of infectious individuals. This structure generalizes traditional Susceptible-Infected-Recovered (SIR disease modeling approaches and allows for the flexible consideration of outbreaks with complex trajectories of disease dynamics. Keywords: Ebola, Forecasting, Mathematical modeling, Bayesian inference

  9. Forecasting Ebola with a regression transmission model

    OpenAIRE

    Asher, Jason

    2017-01-01

    We describe a relatively simple stochastic model of Ebola transmission that was used to produce forecasts with the lowest mean absolute error among Ebola Forecasting Challenge participants. The model enabled prediction of peak incidence, the timing of this peak, and final size of the outbreak. The underlying discrete-time compartmental model used a time-varying reproductive rate modeled as a multiplicative random walk driven by the number of infectious individuals. This structure generalizes ...

  10. Model Selection in Kernel Ridge Regression

    DEFF Research Database (Denmark)

    Exterkate, Peter

    Kernel ridge regression is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. This paper investigates the influence of the choice of kernel and the setting of tuning parameters on forecast accuracy. We review several popular kernels......, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. We interpret the latter two kernels in terms of their smoothing properties, and we relate the tuning parameters associated to all these kernels to smoothness measures of the prediction function and to the signal-to-noise ratio. Based...... on these interpretations, we provide guidelines for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study confirms the practical usefulness of these rules of thumb. Finally, the flexible and smooth functional forms provided by the Gaussian and Sinc kernels makes them widely...

  11. Regression and regression analysis time series prediction modeling on climate data of quetta, pakistan

    International Nuclear Information System (INIS)

    Jafri, Y.Z.; Kamal, L.

    2007-01-01

    Various statistical techniques was used on five-year data from 1998-2002 of average humidity, rainfall, maximum and minimum temperatures, respectively. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit, to our polynomial regression analysis time series (PRATS). The correlation to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rand correlation and Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, respectively which yielded homoscedasticity. We also employed Bartlett's test for homogeneity of variances on a five-year data of rainfall and humidity, respectively which showed that the variances in rainfall data were not homogenous while in case of humidity, were homogenous. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)

  12. Corporate prediction models, ratios or regression analysis?

    NARCIS (Netherlands)

    Bijnen, E.J.; Wijn, M.F.C.M.

    1994-01-01

    The models developed in the literature with respect to the prediction of a company s failure are based on ratios. It has been shown before that these models should be rejected on theoretical grounds. Our study of industrial companies in the Netherlands shows that the ratios which are used in

  13. STREAMFLOW AND WATER QUALITY REGRESSION MODELING ...

    African Journals Online (AJOL)

    ... downstream Obigbo station show: consistent time-trends in degree of contamination; linear and non-linear relationships for water quality models against total dissolved solids (TDS), total suspended sediment (TSS), chloride, pH and sulphate; and non-linear relationship for streamflow and water quality transport models.

  14. Radio emission from symbiotic stars: a binary model

    International Nuclear Information System (INIS)

    Taylor, A.R.; Seaquist, E.R.

    1985-01-01

    The authors examine a binary model for symbiotic stars to account for their radio properties. The system is comprised of a cool, mass-losing star and a hot companion. Radio emission arises in the portion of the stellar wind photo-ionized by the hot star. Computer simulations for the case of uniform mass loss at constant velocity show that when less than half the wind is ionized, optically thick spectral indices greater than +0.6 are produced. Model fits to radio spectra allow the binary separation, wind density and ionizing photon luminosity to be calculated. They apply the model to the symbiotic star H1-36. (orig.)

  15. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    Science.gov (United States)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model is the representation of relationship between independent variable and dependent variable. The dependent variable has categories used in the logistic regression model to calculate odds on. The logistic regression model for dependent variable has levels in the logistics regression model is ordinal. GWOLR model is an ordinal logistic regression model influenced the geographical location of the observation site. Parameters estimation in the model needed to determine the value of a population based on sample. The purpose of this research is to parameters estimation of GWOLR model using R software. Parameter estimation uses the data amount of dengue fever patients in Semarang City. Observation units used are 144 villages in Semarang City. The results of research get GWOLR model locally for each village and to know probability of number dengue fever patient categories.

  16. Multiattribute shopping models and ridge regression analysis

    NARCIS (Netherlands)

    Timmermans, H.J.P.

    1981-01-01

    Policy decisions regarding retailing facilities essentially involve multiple attributes of shopping centres. If mathematical shopping models are to contribute to these decision processes, their structure should reflect the multiattribute character of retailing planning. Examination of existing

  17. Linear Regression Models for Estimating True Subsurface ...

    Indian Academy of Sciences (India)

    47

    The objective is to minimize the processing time and computer memory required. 10 to carry out inversion .... to the mainland by two long bridges. .... term. In this approach, the model converges when the squared sum of the differences. 143.

  18. Moderation analysis using a two-level regression model.

    Science.gov (United States)

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.

  19. Alternative regression models to assess increase in childhood BMI

    OpenAIRE

    Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M

    2008-01-01

    Abstract Background Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 childre...

  20. Support Vector Regression Model Based on Empirical Mode Decomposition and Auto Regression for Electric Load Forecasting

    Directory of Open Access Journals (Sweden)

    Hong-Juan Li

    2013-04-01

    Full Text Available Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by strong non-linear learning capability of support vector regression (SVR, this paper presents a SVR model hybridized with the empirical mode decomposition (EMD method and auto regression (AR for electric load forecasting. The electric load data of the New South Wales (Australia market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.

  1. Statistical power to detect violation of the proportional hazards assumption when using the Cox regression model.

    Science.gov (United States)

    Austin, Peter C

    2018-01-01

    The use of the Cox proportional hazards regression model is widespread. A key assumption of the model is that of proportional hazards. Analysts frequently test the validity of this assumption using statistical significance testing. However, the statistical power of such assessments is frequently unknown. We used Monte Carlo simulations to estimate the statistical power of two different methods for detecting violations of this assumption. When the covariate was binary, we found that a model-based method had greater power than a method based on cumulative sums of martingale residuals. Furthermore, the parametric nature of the distribution of event times had an impact on power when the covariate was binary. Statistical power to detect a strong violation of the proportional hazards assumption was low to moderate even when the number of observed events was high. In many data sets, power to detect a violation of this assumption is likely to be low to modest.

  2. Modeling the effects of binary mixtures on survival in time.

    NARCIS (Netherlands)

    Baas, J.; van Houte, B.P.P.; van Gestel, C.A.M.; Kooijman, S.A.L.M.

    2007-01-01

    In general, effects of mixtures are difficult to describe, and most of the models in use are descriptive in nature and lack a strong mechanistic basis. The aim of this experiment was to develop a process-based model for the interpretation of mixture toxicity measurements, with effects of binary

  3. An appraisal of convergence failures in the application of logistic regression model in published manuscripts.

    Science.gov (United States)

    Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A

    2014-09-01

    Logistic regression model is widely used in health research for description and predictive purposes. Unfortunately, most researchers are sometimes not aware that the underlying principles of the techniques have failed when the algorithm for maximum likelihood does not converge. Young researchers particularly postgraduate students may not know why separation problem whether quasi or complete occurs, how to identify it and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in an African Journal of Medicine and medical sciences between 2004 and 2013. Problems of quasi or complete separation were described and were illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles was reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of logistic regression model in the methodology while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 that used the logistic regression model, the problem of convergence occurred in 6 (15.0%) of the articles. Logistic regression tends to be poorly reported in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers since very few described the process in their reports and may be totally unaware of the problem of convergence or how to deal with it.

  4. Poisson Mixture Regression Models for Heart Disease Prediction

    Science.gov (United States)

    Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611

  5. Factors associated with trait anger level of juvenile offenders in Hubei province: A binary logistic regression analysis.

    Science.gov (United States)

    Tang, Li-Na; Ye, Xiao-Zhou; Yan, Qiu-Ge; Chang, Hong-Juan; Ma, Yu-Qiao; Liu, De-Bin; Li, Zhi-Gen; Yu, Yi-Zhen

    2017-02-01

    The risk factors of high trait anger of juvenile offenders were explored through questionnaire study in a youth correctional facility of Hubei province, China. A total of 1090 juvenile offenders in Hubei province were investigated by self-compiled social-demographic questionnaire, Childhood Trauma Questionnaire (CTQ), and State-Trait Anger Expression Inventory-II (STAXI-II). The risk factors were analyzed by chi-square tests, correlation analysis, and binary logistic regression analysis with SPSS 19.0. A total of 1082 copies of valid questionnaires were collected. High trait anger group (n=316) was defined as those who scored in the upper 27th percentile of STAXI-II trait anger scale (TAS), and the rest were defined as low trait anger group (n=766). The risk factors associated with high level of trait anger included: childhood emotional abuse, childhood sexual abuse, step family, frequent drug abuse, and frequent internet using (P0.05). It was suggested that traumatic experience in childhood and unhealthy life style may significantly increase the level of trait anger in adulthood. The risk factors of high trait anger and their effects should be taken into consideration seriously.

  6. Evolutionary models of early-type contact binary SV Centauri

    Energy Technology Data Exchange (ETDEWEB)

    Nakamura, Y; Saio, H [Tohoku Univ., Sendai (Japan). Faculty of Science; Sugimoto, Daiichiro

    1978-12-01

    Models of the early-type contact binary system SV Centauri are computed with a binary-star evolution program. The effects of mass exchange, i.e., the effects of mass acceptance as well as mass loss, are properly included. With the initial masses of the component stars as 12.4 and 8.0 M sub(solar mass), the following observed configurations are well reproduced; the component stars are definitely in contact and the rate of mass exchange is 4 x 10/sup -4/ M sub(solar mass)yr/sup -1/. The more massive component is less luminous and has a lower effective temperature. Such features are also reproduced quantitatively. Agreement of the computed models with observation indicates that the binary system SV Cen is actually in the phase of rapid mass exchange preceding the mass-ratio reversal.

  7. Accuracy of Binary Black Hole Waveform Models for Advanced LIGO

    Science.gov (United States)

    Kumar, Prayush; Fong, Heather; Barkett, Kevin; Bhagwat, Swetha; Afshari, Nousha; Chu, Tony; Brown, Duncan; Lovelace, Geoffrey; Pfeiffer, Harald; Scheel, Mark; Szilagyi, Bela; Simulating Extreme Spacetimes (SXS) Team

    2016-03-01

    Coalescing binaries of compact objects, such as black holes and neutron stars, are the primary targets for gravitational-wave (GW) detection with Advanced LIGO. Accurate modeling of the emitted GWs is required to extract information about the binary source. The most accurate solution to the general relativistic two-body problem is available in numerical relativity (NR), which is however limited in application due to computational cost. Current searches use semi-analytic models that are based in post-Newtonian (PN) theory and calibrated to NR. In this talk, I will present comparisons between contemporary models and high-accuracy numerical simulations performed using the Spectral Einstein Code (SpEC), focusing at the questions: (i) How well do models capture binary's late-inspiral where they lack a-priori accurate information from PN or NR, and (ii) How accurately do they model binaries with parameters outside their range of calibration. These results guide the choice of templates for future GW searches, and motivate future modeling efforts.

  8. A test for the parameters of multiple linear regression models ...

    African Journals Online (AJOL)

    A test for the parameters of multiple linear regression models is developed for conducting tests simultaneously on all the parameters of multiple linear regression models. The test is robust relative to the assumptions of homogeneity of variances and absence of serial correlation of the classical F-test. Under certain null and ...

  9. Mixed Frequency Data Sampling Regression Models: The R Package midasr

    Directory of Open Access Journals (Sweden)

    Eric Ghysels

    2016-08-01

    Full Text Available When modeling economic relationships it is increasingly common to encounter data sampled at different frequencies. We introduce the R package midasr which enables estimating regression models with variables sampled at different frequencies within a MIDAS regression framework put forward in work by Ghysels, Santa-Clara, and Valkanov (2002. In this article we define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface and estimated using various optimization methods chosen by the researcher. We discuss how to check the validity of the estimated model both in terms of numerical convergence and statistical adequacy of a chosen regression specification, how to perform model selection based on a information criterion, how to assess forecasting accuracy of the MIDAS regression model and how to obtain a forecast aggregation of different MIDAS regression models. We illustrate the capabilities of the package with a simulated MIDAS regression model and give two empirical examples of application of MIDAS regression.

  10. Impact of multicollinearity on small sample hydrologic regression models

    Science.gov (United States)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, is it recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.

  11. Evolutionary model of the subdwarf binary system LB3459

    International Nuclear Information System (INIS)

    Paczynski, B.; Dearborn, D.S.

    1980-01-01

    An evolutionary model is proposed for the eclipsing binary system LB 3459 (=CPD-60 0 389 = HDE 269696). The two stars are hot subdwarfs with degenerate helium cores, hydrogen burning shell sources and low mass hydrogen rich envelopes. The system probably evolved through two common envelope phases. After the first such phase it might look like the semi-detached binary AS Eri. Soon after the second common envelope phase the system might look like UU Sge, an eclipsing binary nucleus of a planetary nebula. The present mass of the optical (spectroscopic) primary is probably close to 0.24 solar mass, and the predicted radial velocity amplitude of the primary is about 150 km/s. The optical secondary should be hotter and bolometrically brighter, with a mass of 0.32 solar mass. The primary eclipse is an occultation. (author)

  12. PHYSICS OF ECLIPSING BINARIES. II. TOWARD THE INCREASED MODEL FIDELITY

    Energy Technology Data Exchange (ETDEWEB)

    Prša, A.; Conroy, K. E.; Horvat, M.; Kochoska, A.; Hambleton, K. M. [Villanova University, Dept. of Astrophysics and Planetary Sciences, 800 E Lancaster Avenue, Villanova PA 19085 (United States); Pablo, H. [Université de Montréal, Pavillon Roger-Gaudry, 2900, boul. Édouard-Montpetit Montréal QC H3T 1J4 (Canada); Bloemen, S. [Radboud University Nijmegen, Department of Astrophysics, IMAPP, P.O. Box 9010, 6500 GL, Nijmegen (Netherlands); Giammarco, J. [Eastern University, Dept. of Astronomy and Physics, 1300 Eagle Road, St. Davids, PA 19087 (United States); Degroote, P. [KU Leuven, Instituut voor Sterrenkunde, Celestijnenlaan 200D, B-3001 Heverlee (Belgium)

    2016-12-01

    The precision of photometric and spectroscopic observations has been systematically improved in the last decade, mostly thanks to space-borne photometric missions and ground-based spectrographs dedicated to finding exoplanets. The field of eclipsing binary stars strongly benefited from this development. Eclipsing binaries serve as critical tools for determining fundamental stellar properties (masses, radii, temperatures, and luminosities), yet the models are not capable of reproducing observed data well, either because of the missing physics or because of insufficient precision. This led to a predicament where radiative and dynamical effects, insofar buried in noise, started showing up routinely in the data, but were not accounted for in the models. PHOEBE (PHysics Of Eclipsing BinariEs; http://phoebe-project.org) is an open source modeling code for computing theoretical light and radial velocity curves that addresses both problems by incorporating missing physics and by increasing the computational fidelity. In particular, we discuss triangulation as a superior surface discretization algorithm, meshing of rotating single stars, light travel time effects, advanced phase computation, volume conservation in eccentric orbits, and improved computation of local intensity across the stellar surfaces that includes the photon-weighted mode, the enhanced limb darkening treatment, the better reflection treatment, and Doppler boosting. Here we present the concepts on which PHOEBE is built and proofs of concept that demonstrate the increased model fidelity.

  13. Binary Logistic Regression Versus Boosted Regression Trees in Assessing Landslide Susceptibility for Multiple-Occurring Regional Landslide Events: Application to the 2009 Storm Event in Messina (Sicily, southern Italy).

    Science.gov (United States)

    Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.

    2014-12-01

    This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams both stretching for few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on the 1st October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of debris flows and debris avalanches types involving the weathered layer of a low to high grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; besides, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-folds tests so to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT-models reached a higher prediction performance with respect to BLR-models, for RP based modelling, whilst for the SP-based models the difference in predictive skills between the two methods dropped drastically, converging to an analogous excellent performance. However, when looking at the precision of the probability estimates, BLR demonstrated to produce more robust

  14. A generalized multivariate regression model for modelling ocean wave heights

    Science.gov (United States)

    Wang, X. L.; Feng, Y.; Swail, V. R.

    2012-04-01

    In this study, a generalized multivariate linear regression model is developed to represent the relationship between 6-hourly ocean significant wave heights (Hs) and the corresponding 6-hourly mean sea level pressure (MSLP) fields. The model is calibrated using the ERA-Interim reanalysis of Hs and MSLP fields for 1981-2000, and is validated using the ERA-Interim reanalysis for 2001-2010 and ERA40 reanalysis of Hs and MSLP for 1958-2001. The performance of the fitted model is evaluated in terms of Pierce skill score, frequency bias index, and correlation skill score. Being not normally distributed, wave heights are subjected to a data adaptive Box-Cox transformation before being used in the model fitting. Also, since 6-hourly data are being modelled, lag-1 autocorrelation must be and is accounted for. The models with and without Box-Cox transformation, and with and without accounting for autocorrelation, are inter-compared in terms of their prediction skills. The fitted MSLP-Hs relationship is then used to reconstruct historical wave height climate from the 6-hourly MSLP fields taken from the Twentieth Century Reanalysis (20CR, Compo et al. 2011), and to project possible future wave height climates using CMIP5 model simulations of MSLP fields. The reconstructed and projected wave heights, both seasonal means and maxima, are subject to a trend analysis that allows for non-linear (polynomial) trends.

  15. Latent Classification Models for Binary Data

    DEFF Research Database (Denmark)

    Langseth, Helge; Nielsen, Thomas Dyhre

    2009-01-01

    One of the simplest, and yet most consistently well-performing set of classifiers is the naive Bayes models (a special class of Bayesian network models). However, these models rely on the (naive) assumption that all the attributes used to describe an instance are conditionally independent given t...

  16. Identification of Influential Points in a Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Jan Grosz

    2011-03-01

    Full Text Available The article deals with the detection and identification of influential points in the linear regression model. Three methods of detection of outliers and leverage points are described. These procedures can also be used for one-sample (independentdatasets. This paper briefly describes theoretical aspects of several robust methods as well. Robust statistics is a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. A simulation model of the simple linear regression is presented.

  17. Detection of epistatic effects with logic regression and a classical linear regression model.

    Science.gov (United States)

    Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata

    2014-02-01

    To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, usually methods as the Cockerham's model are applied. Within this framework, interactions are understood as the part of the joined effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (as disease) is caused by Boolean combinations of genotypes of several QTLs, this Cockerham's approach is often not capable to identify them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though with the logic regression approach a larger number of models has to be considered (requiring more stringent multiple testing correction) the efficient representation of higher order logic interactions in logic regression models leads to a significant increase of power to detect such interactions as compared to a Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with simulation study and real data analysis.

  18. A complete waveform model for compact binaries on eccentric orbits

    Science.gov (United States)

    George, Daniel; Huerta, Eliu; Kumar, Prayush; Agarwal, Bhanu; Schive, Hsi-Yu; Pfeiffer, Harald; Chu, Tony; Boyle, Michael; Hemberger, Daniel; Kidder, Lawrence; Scheel, Mark; Szilagyi, Bela

    2017-01-01

    We present a time domain waveform model that describes the inspiral, merger and ringdown of compact binary systems whose components are non-spinning, and which evolve on orbits with low to moderate eccentricity. We show that this inspiral-merger-ringdown waveform model reproduces the effective-one-body model for black hole binaries with mass-ratios between 1 to 15 in the zero eccentricity limit over a wide range of the parameter space under consideration. We use this model to show that the gravitational wave transients GW150914 and GW151226 can be effectively recovered with template banks of quasicircular, spin-aligned waveforms if the eccentricity e0 of these systems when they enter the aLIGO band at a gravitational wave frequency of 14 Hz satisfies e0GW 150914 <= 0 . 15 and e0GW 151226 <= 0 . 1 .

  19. Modelling binary black-hole coalescence

    International Nuclear Information System (INIS)

    Baker, John

    2003-01-01

    The final burst of radiation from the coalescence of two supermassive black holes produces extraordinary gravitational wave luminosity making these events visible to LISA even out to large redshift. Interpreting such observations will require detailed theoretical models, based on general relativity. The effort to construct these models is just beginning to produce results. I describe the Lazarus approach to modelling these radiation bursts, and present recent results which indicate that the system loses, in the last few wave cycles, about 3% of its mass-energy as strongly polarized gravitational radiation

  20. Structured Additive Regression Models: An R Interface to BayesX

    Directory of Open Access Journals (Sweden)

    Nikolaus Umlauf

    2015-02-01

    Full Text Available Structured additive regression (STAR models provide a flexible framework for model- ing possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is standalone software package providing software for fitting general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using Rs formula language (with some extended terms, fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.

  1. Using a binary logistic regression method and GIS for evaluating and mapping the groundwater spring potential in the Sultan Mountains (Aksehir, Turkey)

    Science.gov (United States)

    Ozdemir, Adnan

    2011-07-01

    SummaryThe purpose of this study is to produce a groundwater spring potential map of the Sultan Mountains in central Turkey, based on a logistic regression method within a Geographic Information System (GIS) environment. Using field surveys, the locations of the springs (440 springs) were determined in the study area. In this study, 17 spring-related factors were used in the analysis: geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transport capacity index, distance to drainage, distance to fault, drainage density, and fault density map. The coefficients of the predictor variables were estimated using binary logistic regression analysis and were used to calculate the groundwater spring potential for the entire study area. The accuracy of the final spring potential map was evaluated based on the observed springs. The accuracy of the model was evaluated by calculating the relative operating characteristics. The area value of the relative operating characteristic curve model was found to be 0.82. These results indicate that the model is a good estimator of the spring potential in the study area. The spring potential map shows that the areas of very low, low, moderate and high groundwater spring potential classes are 105.586 km 2 (28.99%), 74.271 km 2 (19.906%), 101.203 km 2 (27.14%), and 90.05 km 2 (24.671%), respectively. The interpretations of the potential map showed that stream power index, relative permeability of lithologies, geology, elevation, aspect, wetness index, plan curvature, and drainage density play major roles in spring occurrence and distribution in the Sultan Mountains. The logistic regression approach has not yet been used to delineate groundwater potential zones. In this study, the logistic regression method was used to locate potential zones for groundwater springs in the Sultan Mountains. The evolved model

  2. Random regression models for detection of gene by environment interaction

    Directory of Open Access Journals (Sweden)

    Meuwissen Theo HE

    2007-02-01

    Full Text Available Abstract Two random regression models, where the effect of a putative QTL was regressed on an environmental gradient, are described. The first model estimates the correlation between intercept and slope of the random regression, while the other model restricts this correlation to 1 or -1, which is expected under a bi-allelic QTL model. The random regression models were compared to a model assuming no gene by environment interactions. The comparison was done with regards to the models ability to detect QTL, to position them accurately and to detect possible QTL by environment interactions. A simulation study based on a granddaughter design was conducted, and QTL were assumed, either by assigning an effect independent of the environment or as a linear function of a simulated environmental gradient. It was concluded that the random regression models were suitable for detection of QTL effects, in the presence and absence of interactions with environmental gradients. Fixing the correlation between intercept and slope of the random regression had a positive effect on power when the QTL effects re-ranked between environments.

  3. Confounding of three binary-variables counterfactual model

    OpenAIRE

    Liu, Jingwei; Hu, Shuang

    2011-01-01

    Confounding of three binary-variables counterfactual model is discussed in this paper. According to the effect between the control variable and the covariate variable, we investigate three counterfactual models: the control variable is independent of the covariate variable, the control variable has the effect on the covariate variable and the covariate variable affects the control variable. Using the ancillary information based on conditional independence hypotheses, the sufficient conditions...

  4. Multiple regression and beyond an introduction to multiple regression and structural equation modeling

    CERN Document Server

    Keith, Timothy Z

    2014-01-01

    Multiple Regression and Beyond offers a conceptually oriented introduction to multiple regression (MR) analysis and structural equation modeling (SEM), along with analyses that flow naturally from those methods. By focusing on the concepts and purposes of MR and related methods, rather than the derivation and calculation of formulae, this book introduces material to students more clearly, and in a less threatening way. In addition to illuminating content necessary for coursework, the accessibility of this approach means students are more likely to be able to conduct research using MR or SEM--and more likely to use the methods wisely. Covers both MR and SEM, while explaining their relevance to one another Also includes path analysis, confirmatory factor analysis, and latent growth modeling Figures and tables throughout provide examples and illustrate key concepts and techniques For additional resources, please visit: http://tzkeith.com/.

  5. Tutorial on Using Regression Models with Count Outcomes Using R

    Directory of Open Access Journals (Sweden)

    A. Alexander Beaujean

    2016-02-01

    Full Text Available Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares either with or without transforming the count variables. In either case, using typical regression for count data can produce parameter estimates that are biased, thus diminishing any inferences made from such data. As count-variable regression models are seldom taught in training programs, we present a tutorial to help educational researchers use such methods in their own research. We demonstrate analyzing and interpreting count data using Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regression models. The count regression methods are introduced through an example using the number of times students skipped class. The data for this example are freely available and the R syntax used run the example analyses are included in the Appendix.

  6. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu; Pourahmadi, Mohsen; Maadooliat, Mehdi

    2014-01-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both

  7. Correlation-regression model for physico-chemical quality of ...

    African Journals Online (AJOL)

    abusaad

    areas, suggesting that groundwater quality in urban areas is closely related with land use ... the ground water, with correlation and regression model is also presented. ...... WHO (World Health Organization) (1985). Health hazards from nitrates.

  8. Binary model for the coma cluster of galaxies

    International Nuclear Information System (INIS)

    Valtonen, M.J.; Byrd, G.G.

    1979-01-01

    We study the dynamics of galaxies in the Coma cluster and find that the cluster is probably dominated by a central binary of galaxies NGC 4874--NGC4889. We estimate their total mass to be about 3 x 10 14 M/sub sun/ by two independent methods (assuming in Hubble constant of 100 km s -1 Mpc -1 ). This binary is efficient in dynamically ejecting smaller galaxies, some of of which are seen in projection against the inner 3 0 radius of the cluster and which, if erroneously considered as bound members, cause a serious overestimate of the mass of the entire cluster. Taking account of the ejected galaxies, we estimate the total cluster mass to be 4--9 x 10 14 M/sub sun/, with a corresponding mass-to-light ratio for a typical galaxy in the range of 20--120 solar units. The origin of the secondary maximum observed in the radial surface density profile is studied. We consider it to be a remnant of a shell of galaxies which formed around the central binary. This shell expanded, then collapsed into the binary, and is now reexpanding. This is supported by the coincidence of the minimum in the cluster eccentricity and radical velocity dispersion at the same radial distance as the secondary maximum. Numerical simulations of a cluster model with a massive central binary and a spherical shell of test particles are performed, and they reproduce the observed shape, galaxy density, and radial velocity distributions in the Coma cluster fairly well. Consequences of extending the model to other clusters are discussed

  9. The Binary System Laboratory Activities Based on Students Mental Model

    Science.gov (United States)

    Albaiti, A.; Liliasari, S.; Sumarna, O.; Martoprawiro, M. A.

    2017-09-01

    Generic science skills (GSS) are required to develop student conception in learning binary system. The aim of this research was to know the improvement of students GSS through the binary system labotoratory activities based on their mental model using hypothetical-deductive learning cycle. It was a mixed methods embedded experimental model research design. This research involved 15 students of a university in Papua, Indonesia. Essay test of 7 items was used to analyze the improvement of students GSS. Each items was designed to interconnect macroscopic, sub-microscopic and symbolic levels. Students worksheet was used to explore students mental model during investigation in laboratory. The increase of students GSS could be seen in their N-Gain of each GSS indicators. The results were then analyzed descriptively. Students mental model and GSS have been improved from this study. They were interconnect macroscopic and symbolic levels to explain binary systems phenomena. Furthermore, they reconstructed their mental model with interconnecting the three levels of representation in Physical Chemistry. It necessary to integrate the Physical Chemistry Laboratory into a Physical Chemistry course for effectiveness and efficiency.

  10. Complete waveform model for compact binaries on eccentric orbits

    Science.gov (United States)

    Huerta, E. A.; Kumar, Prayush; Agarwal, Bhanu; George, Daniel; Schive, Hsi-Yu; Pfeiffer, Harald P.; Haas, Roland; Ren, Wei; Chu, Tony; Boyle, Michael; Hemberger, Daniel A.; Kidder, Lawrence E.; Scheel, Mark A.; Szilagyi, Bela

    2017-01-01

    We present a time domain waveform model that describes the inspiral, merger and ringdown of compact binary systems whose components are nonspinning, and which evolve on orbits with low to moderate eccentricity. The inspiral evolution is described using third-order post-Newtonian equations both for the equations of motion of the binary, and its far-zone radiation field. This latter component also includes instantaneous, tails and tails-of-tails contributions, and a contribution due to nonlinear memory. This framework reduces to the post-Newtonian approximant TaylorT4 at third post-Newtonian order in the zero-eccentricity limit. To improve phase accuracy, we also incorporate higher-order post-Newtonian corrections for the energy flux of quasicircular binaries and gravitational self-force corrections to the binding energy of compact binaries. This enhanced prescription for the inspiral evolution is combined with a fully analytical prescription for the merger-ringdown evolution constructed using a catalog of numerical relativity simulations. We show that this inspiral-merger-ringdown waveform model reproduces the effective-one-body model of Ref. [Y. Pan et al., Phys. Rev. D 89, 061501 (2014)., 10.1103/PhysRevD.89.061501] for quasicircular black hole binaries with mass ratios between 1 to 15 in the zero-eccentricity limit over a wide range of the parameter space under consideration. Using a set of eccentric numerical relativity simulations, not used during calibration, we show that our new eccentric model reproduces the true features of eccentric compact binary coalescence throughout merger. We use this model to show that the gravitational-wave transients GW150914 and GW151226 can be effectively recovered with template banks of quasicircular, spin-aligned waveforms if the eccentricity e0 of these systems when they enter the aLIGO band at a gravitational-wave frequency of 14 Hz satisfies e0GW 150914≤0.15 and e0GW 151226≤0.1 . We also find that varying the spin

  11. Wavelet regression model in forecasting crude oil price

    Science.gov (United States)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. WMLR model was developed by integrating the discrete wavelet transform (DWT) and multiple linear regression (MLR) model. The original time series was decomposed to sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of WMLR model were also compared with regular multiple linear regression (MLR), Autoregressive Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting technique tested in this study.

  12. Real estate value prediction using multivariate regression models

    Science.gov (United States)

    Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

    2017-11-01

    The real estate market is one of the most competitive in terms of pricing and the same tends to vary significantly based on a lot of factors, hence it becomes one of the prime fields to apply the concepts of machine learning to optimize and predict the prices with high accuracy. Therefore in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to have lower Residual Sum of Squares error. While using features in a regression model some feature engineering is required for better prediction. Often a set of features (multiple regressions) or polynomial regression (applying a various set of powers in the features) is used for making better model fit. For these models are expected to be susceptible towards over fitting ridge regression is used to reduce it. This paper thus directs to the best application of regression models in addition to other techniques to optimize the result.

  13. Application of random regression models to the genetic evaluation ...

    African Journals Online (AJOL)

    The model included fixed regression on AM (range from 30 to 138 mo) and the effect of herd-measurement date concatenation. Random parts of the model were RRM coefficients for additive and permanent environmental effects, while residual effects were modelled to account for heterogeneity of variance by AY. Estimates ...

  14. The APT model as reduced-rank regression

    NARCIS (Netherlands)

    Bekker, P.A.; Dobbelstein, P.; Wansbeek, T.J.

    Integrating the two steps of an arbitrage pricing theory (APT) model leads to a reduced-rank regression (RRR) model. So the results on RRR can be used to estimate APT models, making estimation very simple. We give a succinct derivation of estimation of RRR, derive the asymptotic variance of RRR

  15. Models for the formation of binary and millisecond radio pulsars

    International Nuclear Information System (INIS)

    van den Heuvel, E.P.J.

    1984-01-01

    The peculiar combination of a relatively short pulse period and a relatively weak surface dipole magnetic field strength of binary radio pulsars finds a consistent explanation in terms of: (i) decay of the surface dipole component of neutron star magnetic fields on a timescale of (2-5).10 6 yrs, in combination with: (ii) spin up of the rotation of the neutron star during a subsequent mass-transfer phase. The two observed classes of binary radio pulsars (very close and very wide systems, respectively) are expected to have been formed by the later evolution of binaries consisting of a neutron star and a normal companion star, in which the companion was (considerably) more massive than the neutron star, or less massive than the neutron star, respectively. In the first case the companion of the neutron star in the final system will be a fairly massive white dwarf, in a circular orbit, or a neutron star in an eccentric orbit. In the second case the final companion to the neutron star will be a low-mass (approx. 0.3 Msub solar) helium white dwarf in a wide and nearly circular orbit. In systems of the second type the neutron star was most probably formed by the accretion-induced collapse of a white dwarf. This explains why PSR 1953+29 has a millisecond rotation period and why PSR 0820+02 has not. Binary coalescence models for the formation of the 1.5 millisecond pulsar appear to be viable. The companion to the neutron star may have been a low-mass red dwarf, a neutron star, or a massive (> 0.7 Msub solar) white dwarf. In the red-dwarf case the progenitor system probably was a CV binary in which the white dwarf collapsed by accretion. 66 references, 6 figures, 1 table

  16. Percolation of binary disk systems: Modeling and theory

    International Nuclear Information System (INIS)

    Meeks, Kelsey; Pantoya, Michelle L.

    2017-01-01

    The dispersion and connectivity of particles with a high degree of polydispersity is relevant to problems involving composite material properties and reaction decomposition prediction and has been the subject of much study in the literature. This paper utilizes Monte Carlo models to predict percolation thresholds for a two-dimensional systems containing disks of two different radii. Monte Carlo simulations and spanning probability are used to extend prior models into regions of higher polydispersity than those previously considered. A correlation to predict the percolation threshold for binary disk systems is proposed based on the extended dataset presented in this work and compared to previously published correlations. Finally, a set of boundary conditions necessary for a good fit is presented, and a condition for maximizing percolation threshold for binary disk systems is suggested.

  17. Alternative regression models to assess increase in childhood BMI

    Directory of Open Access Journals (Sweden)

    Mansmann Ulrich

    2008-09-01

    Full Text Available Abstract Background Body mass index (BMI data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs, quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS. We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.

  18. Alternative regression models to assess increase in childhood BMI.

    Science.gov (United States)

    Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M

    2008-09-08

    Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.

  19. Robust mislabel logistic regression without modeling mislabel probabilities.

    Science.gov (United States)

    Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun

    2018-03-01

    Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes into consideration of mislabeled responses. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.

  20. Simple model of surface roughness for binary collision sputtering simulations

    Energy Technology Data Exchange (ETDEWEB)

    Lindsey, Sloan J. [Institute of Solid-State Electronics, TU Wien, Floragasse 7, A-1040 Wien (Austria); Hobler, Gerhard, E-mail: gerhard.hobler@tuwien.ac.at [Institute of Solid-State Electronics, TU Wien, Floragasse 7, A-1040 Wien (Austria); Maciążek, Dawid; Postawa, Zbigniew [Institute of Physics, Jagiellonian University, ul. Lojasiewicza 11, 30348 Kraków (Poland)

    2017-02-15

    Highlights: • A simple model of surface roughness is proposed. • Its key feature is a linearly varying target density at the surface. • The model can be used in 1D/2D/3D Monte Carlo binary collision simulations. • The model fits well experimental glancing incidence sputtering yield data. - Abstract: It has been shown that surface roughness can strongly influence the sputtering yield – especially at glancing incidence angles where the inclusion of surface roughness leads to an increase in sputtering yields. In this work, we propose a simple one-parameter model (the “density gradient model”) which imitates surface roughness effects. In the model, the target’s atomic density is assumed to vary linearly between the actual material density and zero. The layer width is the sole model parameter. The model has been implemented in the binary collision simulator IMSIL and has been evaluated against various geometric surface models for 5 keV Ga ions impinging an amorphous Si target. To aid the construction of a realistic rough surface topography, we have performed MD simulations of sequential 5 keV Ga impacts on an initially crystalline Si target. We show that our new model effectively reproduces the sputtering yield, with only minor variations in the energy and angular distributions of sputtered particles. The success of the density gradient model is attributed to a reduction of the reflection coefficient – leading to increased sputtering yields, similar in effect to surface roughness.

  1. Simple model of surface roughness for binary collision sputtering simulations

    International Nuclear Information System (INIS)

    Lindsey, Sloan J.; Hobler, Gerhard; Maciążek, Dawid; Postawa, Zbigniew

    2017-01-01

    Highlights: • A simple model of surface roughness is proposed. • Its key feature is a linearly varying target density at the surface. • The model can be used in 1D/2D/3D Monte Carlo binary collision simulations. • The model fits well experimental glancing incidence sputtering yield data. - Abstract: It has been shown that surface roughness can strongly influence the sputtering yield – especially at glancing incidence angles where the inclusion of surface roughness leads to an increase in sputtering yields. In this work, we propose a simple one-parameter model (the “density gradient model”) which imitates surface roughness effects. In the model, the target’s atomic density is assumed to vary linearly between the actual material density and zero. The layer width is the sole model parameter. The model has been implemented in the binary collision simulator IMSIL and has been evaluated against various geometric surface models for 5 keV Ga ions impinging an amorphous Si target. To aid the construction of a realistic rough surface topography, we have performed MD simulations of sequential 5 keV Ga impacts on an initially crystalline Si target. We show that our new model effectively reproduces the sputtering yield, with only minor variations in the energy and angular distributions of sputtered particles. The success of the density gradient model is attributed to a reduction of the reflection coefficient – leading to increased sputtering yields, similar in effect to surface roughness.

  2. Modeling binary correlated responses using SAS, SPSS and R

    CERN Document Server

    Wilson, Jeffrey R

    2015-01-01

    Statistical tools to analyze correlated binary data are spread out in the existing literature. This book makes these tools accessible to practitioners in a single volume. Chapters cover recently developed statistical tools and statistical packages that are tailored to analyzing correlated binary data. The authors showcase both traditional and new methods for application to health-related research. Data and computer programs will be publicly available in order for readers to replicate model development, but learning a new statistical language is not necessary with this book. The inclusion of code for R, SAS, and SPSS allows for easy implementation by readers. For readers interested in learning more about the languages, though, there are short tutorials in the appendix. Accompanying data sets are available for download through the book s website. Data analysis presented in each chapter will provide step-by-step instructions so these new methods can be readily applied to projects.  Researchers and graduate stu...

  3. A model for the massive binary V340 Muscae

    Science.gov (United States)

    Hauck, Norbert

    2016-02-01

    A synthetic light curve has been fitted to photometric data from the ASAS-3 database. The parameters of the best solution are well consistent with those derived from stellar models for both components for an initial metallicity Z=0.020 and a common age of 5 Myr. Therefore, we can reliably estimate the absolute dimensions of this close eclipsing binary system. Apparently, the O-type primary star has a mass of about 22.65 Msun and a radius of 10.35 Rsun. For the secondary star, likely a late B-type dwarf, we obtain about 3.1 Msun and 2.1 Rsun. Their mass ratio of about 0.138 might be the lowest found so far in O-type binaries. [English and German online-version of this paper available under www.bav-astro.eu/rb/rb2016-2/1.html].

  4. Linear regression models for quantitative assessment of left ...

    African Journals Online (AJOL)

    Changes in left ventricular structures and function have been reported in cardiomyopathies. No prediction models have been established in this environment. This study established regression models for prediction of left ventricular structures in normal subjects. A sample of normal subjects was drawn from a large urban ...

  5. Geographically Weighted Logistic Regression Applied to Credit Scoring Models

    Directory of Open Access Journals (Sweden)

    Pedro Henrique Melo Albuquerque

    Full Text Available Abstract This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC, granted to clients residing in the Distrito Federal (DF, to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters, with it being concluded that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.

  6. Physics constrained nonlinear regression models for time series

    International Nuclear Information System (INIS)

    Majda, Andrew J; Harlim, John

    2013-01-01

    A central issue in contemporary science is the development of data driven statistical nonlinear dynamical models for time series of partial observations of nature or a complex physical model. It has been established recently that ad hoc quadratic multi-level regression (MLR) models can have finite-time blow up of statistical solutions and/or pathological behaviour of their invariant measure. Here a new class of physics constrained multi-level quadratic regression models are introduced, analysed and applied to build reduced stochastic models from data of nonlinear systems. These models have the advantages of incorporating memory effects in time as well as the nonlinear noise from energy conserving nonlinear interactions. The mathematical guidelines for the performance and behaviour of these physics constrained MLR models as well as filtering algorithms for their implementation are developed here. Data driven applications of these new multi-level nonlinear regression models are developed for test models involving a nonlinear oscillator with memory effects and the difficult test case of the truncated Burgers–Hopf model. These new physics constrained quadratic MLR models are proposed here as process models for Bayesian estimation through Markov chain Monte Carlo algorithms of low frequency behaviour in complex physical data. (paper)

  7. Marginal and Random Intercepts Models for Longitudinal Binary Data with Examples from Criminology

    Science.gov (United States)

    Long, Jeffrey D.; Loeber, Rolf; Farrington, David P.

    2009-01-01

    Two models for the analysis of longitudinal binary data are discussed: the marginal model and the random intercepts model. In contrast to the linear mixed model (LMM), the two models for binary data are not subsumed under a single hierarchical model. The marginal model provides group-level information whereas the random intercepts model provides…

  8. Model-based Quantile Regression for Discrete Data

    KAUST Repository

    Padellini, Tullia

    2018-04-10

    Quantile regression is a class of methods voted to the modelling of conditional quantiles. In a Bayesian framework quantile regression has typically been carried out exploiting the Asymmetric Laplace Distribution as a working likelihood. Despite the fact that this leads to a proper posterior for the regression coefficients, the resulting posterior variance is however affected by an unidentifiable parameter, hence any inferential procedure beside point estimation is unreliable. We propose a model-based approach for quantile regression that considers quantiles of the generating distribution directly, and thus allows for a proper uncertainty quantification. We then create a link between quantile regression and generalised linear models by mapping the quantiles to the parameter of the response variable, and we exploit it to fit the model with R-INLA. We extend it also in the case of discrete responses, where there is no 1-to-1 relationship between quantiles and distribution\\'s parameter, by introducing continuous generalisations of the most common discrete variables (Poisson, Binomial and Negative Binomial) to be exploited in the fitting.

  9. Binary versus non-binary information in real time series: empirical results and maximum-entropy matrix models

    Science.gov (United States)

    Almog, Assaf; Garlaschelli, Diego

    2014-09-01

    The dynamics of complex systems, from financial markets to the brain, can be monitored in terms of multiple time series of activity of the constituent units, such as stocks or neurons, respectively. While the main focus of time series analysis is on the magnitude of temporal increments, a significant piece of information is encoded into the binary projection (i.e. the sign) of such increments. In this paper we provide further evidence of this by showing strong nonlinear relations between binary and non-binary properties of financial time series. These relations are a novel quantification of the fact that extreme price increments occur more often when most stocks move in the same direction. We then introduce an information-theoretic approach to the analysis of the binary signature of single and multiple time series. Through the definition of maximum-entropy ensembles of binary matrices and their mapping to spin models in statistical physics, we quantify the information encoded into the simplest binary properties of real time series and identify the most informative property given a set of measurements. Our formalism is able to accurately replicate, and mathematically characterize, the observed binary/non-binary relations. We also obtain a phase diagram allowing us to identify, based only on the instantaneous aggregate return of a set of multiple time series, a regime where the so-called ‘market mode’ has an optimal interpretation in terms of collective (endogenous) effects, a regime where it is parsimoniously explained by pure noise, and a regime where it can be regarded as a combination of endogenous and exogenous factors. Our approach allows us to connect spin models, simple stochastic processes, and ensembles of time series inferred from partial information.

  10. Binary versus non-binary information in real time series: empirical results and maximum-entropy matrix models

    International Nuclear Information System (INIS)

    Almog, Assaf; Garlaschelli, Diego

    2014-01-01

    The dynamics of complex systems, from financial markets to the brain, can be monitored in terms of multiple time series of activity of the constituent units, such as stocks or neurons, respectively. While the main focus of time series analysis is on the magnitude of temporal increments, a significant piece of information is encoded into the binary projection (i.e. the sign) of such increments. In this paper we provide further evidence of this by showing strong nonlinear relations between binary and non-binary properties of financial time series. These relations are a novel quantification of the fact that extreme price increments occur more often when most stocks move in the same direction. We then introduce an information-theoretic approach to the analysis of the binary signature of single and multiple time series. Through the definition of maximum-entropy ensembles of binary matrices and their mapping to spin models in statistical physics, we quantify the information encoded into the simplest binary properties of real time series and identify the most informative property given a set of measurements. Our formalism is able to accurately replicate, and mathematically characterize, the observed binary/non-binary relations. We also obtain a phase diagram allowing us to identify, based only on the instantaneous aggregate return of a set of multiple time series, a regime where the so-called ‘market mode’ has an optimal interpretation in terms of collective (endogenous) effects, a regime where it is parsimoniously explained by pure noise, and a regime where it can be regarded as a combination of endogenous and exogenous factors. Our approach allows us to connect spin models, simple stochastic processes, and ensembles of time series inferred from partial information. (paper)

  11. Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.

    Science.gov (United States)

    Chatzis, Sotirios P; Andreou, Andreas S

    2015-11-01

    Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows of better handling uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.

  12. Performance of models for estimating absolute risk difference in multicenter trials with binary outcome

    Directory of Open Access Journals (Sweden)

    Claudia Pedroza

    2016-08-01

    Full Text Available Abstract Background Reporting of absolute risk difference (RD is recommended for clinical and epidemiological prospective studies. In analyses of multicenter studies, adjustment for center is necessary when randomization is stratified by center or when there is large variation in patients outcomes across centers. While regression methods are used to estimate RD adjusted for baseline predictors and clustering, no formal evaluation of their performance has been previously conducted. Methods We performed a simulation study to evaluate 6 regression methods fitted under a generalized estimating equation framework: binomial identity, Poisson identity, Normal identity, log binomial, log Poisson, and logistic regression model. We compared the model estimates to unadjusted estimates. We varied the true response function (identity or log, number of subjects per center, true risk difference, control outcome rate, effect of baseline predictor, and intracenter correlation. We compared the models in terms of convergence, absolute bias and coverage of 95 % confidence intervals for RD. Results The 6 models performed very similar to each other for the majority of scenarios. However, the log binomial model did not converge for a large portion of the scenarios including a baseline predictor. In scenarios with outcome rate close to the parameter boundary, the binomial and Poisson identity models had the best performance, but differences from other models were negligible. The unadjusted method introduced little bias to the RD estimates, but its coverage was larger than the nominal value in some scenarios with an identity response. Under the log response, coverage from the unadjusted method was well below the nominal value (<80 % for some scenarios. Conclusions We recommend the use of a binomial or Poisson GEE model with identity link to estimate RD for correlated binary outcome data. If these models fail to run, then either a logistic regression, log Poisson

  13. Forecasting daily meteorological time series using ARIMA and regression models

    Science.gov (United States)

    Murat, Małgorzata; Malinowska, Iwona; Gos, Magdalena; Krzyszczak, Jaromir

    2018-04-01

    The daily air temperature and precipitation time series recorded between January 1, 1980 and December 31, 2010 in four European sites (Jokioinen, Dikopshof, Lleida and Lublin) from different climatic zones were modeled and forecasted. In our forecasting we used the methods of the Box-Jenkins and Holt- Winters seasonal auto regressive integrated moving-average, the autoregressive integrated moving-average with external regressors in the form of Fourier terms and the time series regression, including trend and seasonality components methodology with R software. It was demonstrated that obtained models are able to capture the dynamics of the time series data and to produce sensible forecasts.

  14. Multiple Response Regression for Gaussian Mixture Models with Known Labels.

    Science.gov (United States)

    Lee, Wonyul; Du, Ying; Sun, Wei; Hayes, D Neil; Liu, Yufeng

    2012-12-01

    Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes.

  15. Thermal Efficiency Degradation Diagnosis Method Using Regression Model

    International Nuclear Information System (INIS)

    Jee, Chang Hyun; Heo, Gyun Young; Jang, Seok Won; Lee, In Cheol

    2011-01-01

    This paper proposes an idea for thermal efficiency degradation diagnosis in turbine cycles, which is based on turbine cycle simulation under abnormal conditions and a linear regression model. The correlation between the inputs for representing degradation conditions (normally unmeasured but intrinsic states) and the simulation outputs (normally measured but superficial states) was analyzed with the linear regression model. The regression models can inversely response an associated intrinsic state for a superficial state observed from a power plant. The diagnosis method proposed herein is classified into three processes, 1) simulations for degradation conditions to get measured states (referred as what-if method), 2) development of the linear model correlating intrinsic and superficial states, and 3) determination of an intrinsic state using the superficial states of current plant and the linear regression model (referred as inverse what-if method). The what-if method is to generate the outputs for the inputs including various root causes and/or boundary conditions whereas the inverse what-if method is the process of calculating the inverse matrix with the given superficial states, that is, component degradation modes. The method suggested in this paper was validated using the turbine cycle model for an operating power plant

  16. Regression modeling strategies with applications to linear models, logistic and ordinal regression, and survival analysis

    CERN Document Server

    Harrell , Jr , Frank E

    2015-01-01

    This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap.  The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes.  This text realistically...

  17. Flexible competing risks regression modeling and goodness-of-fit

    DEFF Research Database (Denmark)

    Scheike, Thomas; Zhang, Mei-Jie

    2008-01-01

    In this paper we consider different approaches for estimation and assessment of covariate effects for the cumulative incidence curve in the competing risks model. The classic approach is to model all cause-specific hazards and then estimate the cumulative incidence curve based on these cause...... models that is easy to fit and contains the Fine-Gray model as a special case. One advantage of this approach is that our regression modeling allows for non-proportional hazards. This leads to a new simple goodness-of-fit procedure for the proportional subdistribution hazards assumption that is very easy...... of the flexible regression models to analyze competing risks data when non-proportionality is present in the data....

  18. The art of regression modeling in road safety

    CERN Document Server

    Hauer, Ezra

    2015-01-01

    This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...

  19. Model building strategy for logistic regression: purposeful selection.

    Science.gov (United States)

    Zhang, Zhongheng

    2016-03-01

    Logistic regression is one of the most commonly used models to account for confounders in medical literature. The article introduces how to perform purposeful selection model building strategy with R. I stress on the use of likelihood ratio test to see whether deleting a variable will have significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of remaining covariates. Interaction should be checked to disentangle complex relationship between covariates and their synergistic effect on response variable. Model should be checked for the goodness-of-fit (GOF). In other words, how the fitted model reflects the real data. Hosmer-Lemeshow GOF test is the most widely used for logistic regression model.

  20. Regression analysis of a chemical reaction fouling model

    International Nuclear Information System (INIS)

    Vasak, F.; Epstein, N.

    1996-01-01

    A previously reported mathematical model for the initial chemical reaction fouling of a heated tube is critically examined in the light of the experimental data for which it was developed. A regression analysis of the model with respect to that data shows that the reference point upon which the two adjustable parameters of the model were originally based was well chosen, albeit fortuitously. (author). 3 refs., 2 tabs., 2 figs

  1. Spatial stochastic regression modelling of urban land use

    International Nuclear Information System (INIS)

    Arshad, S H M; Jaafar, J; Abiden, M Z Z; Latif, Z A; Rasam, A R A

    2014-01-01

    Urbanization is very closely linked to industrialization, commercialization or overall economic growth and development. This results in innumerable benefits of the quantity and quality of the urban environment and lifestyle but on the other hand contributes to unbounded development, urban sprawl, overcrowding and decreasing standard of living. Regulation and observation of urban development activities is crucial. The understanding of urban systems that promotes urban growth are also essential for the purpose of policy making, formulating development strategies as well as development plan preparation. This study aims to compare two different stochastic regression modeling techniques for spatial structure models of urban growth in the same specific study area. Both techniques will utilize the same datasets and their results will be analyzed. The work starts by producing an urban growth model by using stochastic regression modeling techniques namely the Ordinary Least Square (OLS) and Geographically Weighted Regression (GWR). The two techniques are compared to and it is found that, GWR seems to be a more significant stochastic regression model compared to OLS, it gives a smaller AICc (Akaike's Information Corrected Criterion) value and its output is more spatially explainable

  2. Direction of Effects in Multiple Linear Regression Models.

    Science.gov (United States)

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.

  3. Logistic regression for risk factor modelling in stuttering research.

    Science.gov (United States)

    Reed, Phil; Wu, Yaqionq

    2013-06-01

    To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed are demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.

  4. ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.

    Science.gov (United States)

    Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won

    2016-07-01

    In our previous study, our input data set consisted of 78 rats, the blood loss in percent as a dependent variable, and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and new index (lactate concentration/perfusion)). The machine learning methods for multicategory classification were applied to a rat model in acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage in our previous study. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying ATLS hypovolaemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of support vector regression and MLR models with relative values by predicting blood loss in percent were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% of the direct multicategory classification using the support vector machine one-versus-one model in our previous study for the same validation data set. Moreover, the simple MLR models with both absolute and relative values could provide possibility of the future clinical decision support system for ATLS classification. The perfusion index and new index were more appropriate with relative changes than absolute values.

  5. Modeling and prediction of flotation performance using support vector regression

    Directory of Open Access Journals (Sweden)

    Despotović Vladimir

    2017-01-01

    Full Text Available Continuous efforts have been made in recent year to improve the process of paper recycling, as it is of critical importance for saving the wood, water and energy resources. Flotation deinking is considered to be one of the key methods for separation of ink particles from the cellulose fibres. Attempts to model the flotation deinking process have often resulted in complex models that are difficult to implement and use. In this paper a model for prediction of flotation performance based on Support Vector Regression (SVR, is presented. Representative data samples were created in laboratory, under a variety of practical control variables for the flotation deinking process, including different reagents, pH values and flotation residence time. Predictive model was created that was trained on these data samples, and the flotation performance was assessed showing that Support Vector Regression is a promising method even when dataset used for training the model is limited.

  6. Bayesian approach to errors-in-variables in regression models

    Science.gov (United States)

    Rozliman, Nur Aainaa; Ibrahim, Adriana Irawati Nur; Yunus, Rossita Mohammad

    2017-05-01

    In many applications and experiments, data sets are often contaminated with error or mismeasured covariates. When at least one of the covariates in a model is measured with error, Errors-in-Variables (EIV) model can be used. Measurement error, when not corrected, would cause misleading statistical inferences and analysis. Therefore, our goal is to examine the relationship of the outcome variable and the unobserved exposure variable given the observed mismeasured surrogate by applying the Bayesian formulation to the EIV model. We shall extend the flexible parametric method proposed by Hossain and Gustafson (2009) to another nonlinear regression model which is the Poisson regression model. We shall then illustrate the application of this approach via a simulation study using Markov chain Monte Carlo sampling methods.

  7. The mechanical properties of high speed GTAW weld and factors of nonlinear multiple regression model under external transverse magnetic field

    Science.gov (United States)

    Lu, Lin; Chang, Yunlong; Li, Yingmin; He, Youyou

    2013-05-01

    A transverse magnetic field was introduced to the arc plasma in the process of welding stainless steel tubes by high-speed Tungsten Inert Gas Arc Welding (TIG for short) without filler wire. The influence of external magnetic field on welding quality was investigated. 9 sets of parameters were designed by the means of orthogonal experiment. The welding joint tensile strength and form factor of weld were regarded as the main standards of welding quality. A binary quadratic nonlinear regression equation was established with the conditions of magnetic induction and flow rate of Ar gas. The residual standard deviation was calculated to adjust the accuracy of regression model. The results showed that, the regression model was correct and effective in calculating the tensile strength and aspect ratio of weld. Two 3D regression models were designed respectively, and then the impact law of magnetic induction on welding quality was researched.

  8. Time series regression model for infectious disease and weather.

    Science.gov (United States)

    Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro

    2015-10-01

    Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  9. Variable Selection for Regression Models of Percentile Flows

    Science.gov (United States)

    Fouad, G.

    2017-12-01

    Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. Variables suffered from a high

  10. Linearity and Misspecification Tests for Vector Smooth Transition Regression Models

    DEFF Research Database (Denmark)

    Teräsvirta, Timo; Yang, Yukai

    The purpose of the paper is to derive Lagrange multiplier and Lagrange multiplier type specification and misspecification tests for vector smooth transition regression models. We report results from simulation studies in which the size and power properties of the proposed asymptotic tests in small...

  11. Application of multilinear regression analysis in modeling of soil ...

    African Journals Online (AJOL)

    The application of Multi-Linear Regression Analysis (MLRA) model for predicting soil properties in Calabar South offers a technical guide and solution in foundation designs problems in the area. Forty-five soil samples were collected from fifteen different boreholes at a different depth and 270 tests were carried out for CBR, ...

  12. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2009-01-01

    In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By

  13. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2010-01-01

    In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By

  14. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2011-01-01

    In this paper, two non-parametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a more viable alternative to existing kernel-based approaches. The second estimator

  15. Transpiration of glasshouse rose crops: evaluation of regression models

    NARCIS (Netherlands)

    Baas, R.; Rijssel, van E.

    2006-01-01

    Regression models of transpiration (T) based on global radiation inside the greenhouse (G), with or without energy input from heating pipes (Eh) and/or vapor pressure deficit (VPD) were parameterized. Therefore, data on T, G, temperatures from air, canopy and heating pipes, and VPD from both a

  16. Structural classification and a binary structure model for superconductors

    Institute of Scientific and Technical Information of China (English)

    Dong Cheng

    2006-01-01

    Based on structural and bonding features, a new classification scheme of superconductors is proposed to classify conductors can be partitioned into two parts, a superconducting active component and a supplementary component.Partially metallic covalent bonding is found to be a common feature in all superconducting active components, and the electron states of the atoms in the active components usually make a dominant contribution to the energy band near the Fermi surface. Possible directions to explore new superconductors are discussed based on the structural classification and the binary structure model.

  17. Determination of osteoporosis risk factors using a multiple logistic regression model in postmenopausal Turkish women.

    Science.gov (United States)

    Akkus, Zeki; Camdeviren, Handan; Celik, Fatma; Gur, Ali; Nas, Kemal

    2005-09-01

    To determine the risk factors of osteoporosis using a multiple binary logistic regression method and to assess the risk variables for osteoporosis, which is a major and growing health problem in many countries. We presented a case-control study, consisting of 126 postmenopausal healthy women as control group and 225 postmenopausal osteoporotic women as the case group. The study was carried out in the Department of Physical Medicine and Rehabilitation, Dicle University, Diyarbakir, Turkey between 1999-2002. The data from the 351 participants were collected using a standard questionnaire that contains 43 variables. A multiple logistic regression model was then used to evaluate the data and to find the best regression model. We classified 80.1% (281/351) of the participants using the regression model. Furthermore, the specificity value of the model was 67% (84/126) of the control group while the sensitivity value was 88% (197/225) of the case group. We found the distribution of residual values standardized for final model to be exponential using the Kolmogorow-Smirnow test (p=0.193). The receiver operating characteristic curve was found successful to predict patients with risk for osteoporosis. This study suggests that low levels of dietary calcium intake, physical activity, education, and longer duration of menopause are independent predictors of the risk of low bone density in our population. Adequate dietary calcium intake in combination with maintaining a daily physical activity, increasing educational level, decreasing birth rate, and duration of breast-feeding may contribute to healthy bones and play a role in practical prevention of osteoporosis in Southeast Anatolia. In addition, the findings of the present study indicate that the use of multivariate statistical method as a multiple logistic regression in osteoporosis, which maybe influenced by many variables, is better than univariate statistical evaluation.

  18. Approximating prediction uncertainty for random forest regression models

    Science.gov (United States)

    John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne

    2016-01-01

    Machine learning approaches such as random forest have increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches there is no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...

  19. CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model

    DEFF Research Database (Denmark)

    Dyrholm, Mads; Hansen, Lars Kai

    2004-01-01

    We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least square...... estimation. We demonstrate the method on synthetic data and finally separate speech and music in a real room recording....

  20. On concurvity in nonlinear and nonparametric regression models

    Directory of Open Access Journals (Sweden)

    Sonia Amodio

    2014-12-01

    Full Text Available When data are affected by multicollinearity in the linear regression framework, then concurvity will be present in fitting a generalized additive model (GAM. The term concurvity describes nonlinear dependencies among the predictor variables. As collinearity results in inflated variance of the estimated regression coefficients in the linear regression model, the result of the presence of concurvity leads to instability of the estimated coefficients in GAMs. Even if the backfitting algorithm will always converge to a solution, in case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper will provide a general criterion to detect concurvity in nonlinear and non parametric regression models.

  1. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    Directory of Open Access Journals (Sweden)

    Minh Vu Trieu

    2017-03-01

    Full Text Available This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS, Brazilian tensile strength (BTS, rock brittleness index (BI, the distance between planes of weakness (DPW, and the alpha angle (Alpha between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP. Four (4 statistical regression models (two linear and two nonlinear are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2 of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.

  2. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    Science.gov (United States)

    Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

    2017-03-01

    This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2) of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.

  3. Detection of Outliers in Regression Model for Medical Data

    Directory of Open Access Journals (Sweden)

    Stephen Raj S

    2017-07-01

    Full Text Available In regression analysis, an outlier is an observation for which the residual is large in magnitude compared to other observations in the data set. The detection of outliers and influential points is an important step of the regression analysis. Outlier detection methods have been used to detect and remove anomalous values from data. In this paper, we detect the presence of outliers in simple linear regression models for medical data set. Chatterjee and Hadi mentioned that the ordinary residuals are not appropriate for diagnostic purposes; a transformed version of them is preferable. First, we investigate the presence of outliers based on existing procedures of residuals and standardized residuals. Next, we have used the new approach of standardized scores for detecting outliers without the use of predicted values. The performance of the new approach was verified with the real-life data.

  4. Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling.

    Science.gov (United States)

    Kawashima, Issaku; Kumano, Hiroaki

    2017-01-01

    Mind-wandering (MW), task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR) to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.

  5. Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling

    Directory of Open Access Journals (Sweden)

    Issaku Kawashima

    2017-07-01

    Full Text Available Mind-wandering (MW, task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.

  6. Hierarchical Neural Regression Models for Customer Churn Prediction

    Directory of Open Access Journals (Sweden)

    Golshan Mohammadi

    2013-01-01

    Full Text Available As customers are the main assets of each industry, customer churn prediction is becoming a major task for companies to remain in competition with competitors. In the literature, the better applicability and efficiency of hierarchical data mining techniques has been reported. This paper considers three hierarchical models by combining four different data mining techniques for churn prediction, which are backpropagation artificial neural networks (ANN, self-organizing maps (SOM, alpha-cut fuzzy c-means (α-FCM, and Cox proportional hazards regression model. The hierarchical models are ANN + ANN + Cox, SOM + ANN + Cox, and α-FCM + ANN + Cox. In particular, the first component of the models aims to cluster data in two churner and nonchurner groups and also filter out unrepresentative data or outliers. Then, the clustered data as the outputs are used to assign customers to churner and nonchurner groups by the second technique. Finally, the correctly classified data are used to create Cox proportional hazards model. To evaluate the performance of the hierarchical models, an Iranian mobile dataset is considered. The experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Types I and II errors, RMSE, and MAD metrics. In addition, the α-FCM + ANN + Cox model significantly performs better than the two other hierarchical models.

  7. Electricity consumption forecasting in Italy using linear regression models

    Energy Technology Data Exchange (ETDEWEB)

    Bianco, Vincenzo; Manca, Oronzio; Nardini, Sergio [DIAM, Seconda Universita degli Studi di Napoli, Via Roma 29, 81031 Aversa (CE) (Italy)

    2009-09-15

    The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of {+-}1% for the best case and {+-}11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account. (author)

  8. Electricity consumption forecasting in Italy using linear regression models

    International Nuclear Information System (INIS)

    Bianco, Vincenzo; Manca, Oronzio; Nardini, Sergio

    2009-01-01

    The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of ±1% for the best case and ±11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account. (author)

  9. Regression Model to Predict Global Solar Irradiance in Malaysia

    Directory of Open Access Journals (Sweden)

    Hairuniza Ahmed Kutty

    2015-01-01

    Full Text Available A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitate, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE, mean bias error (MBE, and the coefficient of determination (R2 with other models available from literature studies. Seven models based on single parameters (PM1 to PM7 and five multiple-parameter models (PM7 to PM12 are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R2 ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included into multiple-parameter models although it performs fairly well in single-parameter prediction models.

  10. Modeling adsorption of binary and ternary mixtures on microporous media

    DEFF Research Database (Denmark)

    Monsalvo, Matias Alfonso; Shapiro, Alexander

    2007-01-01

    it possible using the same equation of state to describe the thermodynamic properties of the segregated and the bulk phases. For comparison, we also used the ideal adsorbed solution theory (IAST) to describe adsorption equilibria. The main advantage of these two models is their capabilities to predict......The goal of this work is to analyze the adsorption of binary and ternary mixtures on the basis of the multicomponent potential theory of adsorption (MPTA). In the MPTA, the adsorbate is considered as a segregated mixture in the external potential field emitted by the solid adsorbent. This makes...... multicomponent adsorption equilibria on the basis of single-component adsorption data. We compare the MPTA and IAST models to a large set of experimental data, obtaining reasonable good agreement with experimental data and high degree of predictability. Some limitations of both models are also discussed....

  11. Two-step variable selection in quantile regression models

    Directory of Open Access Journals (Sweden)

    FAN Yali

    2015-06-01

    Full Text Available We propose a two-step variable selection procedure for high dimensional quantile regressions, in which the dimension of the covariates, pn is much larger than the sample size n. In the first step, we perform ℓ1 penalty, and we demonstrate that the first step penalized estimator with the LASSO penalty can reduce the model from an ultra-high dimensional to a model whose size has the same order as that of the true model, and the selected model can cover the true model. The second step excludes the remained irrelevant covariates by applying the adaptive LASSO penalty to the reduced model obtained from the first step. Under some regularity conditions, we show that our procedure enjoys the model selection consistency. We conduct a simulation study and a real data analysis to evaluate the finite sample performance of the proposed approach.

  12. New robust statistical procedures for the polytomous logistic regression models.

    Science.gov (United States)

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  13. THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE

    OpenAIRE

    Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan

    2015-01-01

    Background: The purpose of this study was to drawing a regression model of organizational climate of central libraries of Iran?s universities. Methods: This study is an applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran?s public universities selected among the 117 universities affiliated to the Ministry of Health by Stratified Sampling method (510 people). Climate Qual localized questionnaire was used as research tools. For pr...

  14. Online Statistical Modeling (Regression Analysis) for Independent Responses

    Science.gov (United States)

    Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

    2017-06-01

    Regression analysis (statistical analmodelling) are among statistical methods which are frequently needed in analyzing quantitative data, especially to model relationship between response and explanatory variables. Nowadays, statistical models have been developed into various directions to model various type and complex relationship of data. Rich varieties of advanced and recent statistical modelling are mostly available on open source software (one of them is R). However, these advanced statistical modelling, are not very friendly to novice R users, since they are based on programming script or command line interface. Our research aims to developed web interface (based on R and shiny), so that most recent and advanced statistical modelling are readily available, accessible and applicable on web. We have previously made interface in the form of e-tutorial for several modern and advanced statistical modelling on R especially for independent responses (including linear models/LM, generalized linier models/GLM, generalized additive model/GAM and generalized additive model for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including model using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/ MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web (interface) make the statistical modeling becomes easier to apply and easier to compare them in order to find the most appropriate model for the data.

  15. Reconstruction of missing daily streamflow data using dynamic regression models

    Science.gov (United States)

    Tencaliec, Patricia; Favre, Anne-Catherine; Prieur, Clémentine; Mathevet, Thibault

    2015-12-01

    River discharge is one of the most important quantities in hydrology. It provides fundamental records for water resources management and climate change monitoring. Even very short data-gaps in this information can cause extremely different analysis outputs. Therefore, reconstructing missing data of incomplete data sets is an important step regarding the performance of the environmental models, engineering, and research applications, thus it presents a great challenge. The objective of this paper is to introduce an effective technique for reconstructing missing daily discharge data when one has access to only daily streamflow data. The proposed procedure uses a combination of regression and autoregressive integrated moving average models (ARIMA) called dynamic regression model. This model uses the linear relationship between neighbor and correlated stations and then adjusts the residual term by fitting an ARIMA structure. Application of the model to eight daily streamflow data for the Durance river watershed showed that the model yields reliable estimates for the missing data in the time series. Simulation studies were also conducted to evaluate the performance of the procedure.

  16. Predicting and Modelling of Survival Data when Cox's Regression Model does not hold

    DEFF Research Database (Denmark)

    Scheike, Thomas H.; Zhang, Mei-Jie

    2002-01-01

    Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects......Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects...

  17. Extended cox regression model: The choice of timefunction

    Science.gov (United States)

    Isik, Hatice; Tutkun, Nihal Ata; Karasoy, Durdu

    2017-07-01

    Cox regression model (CRM), which takes into account the effect of censored observations, is one the most applicative and usedmodels in survival analysis to evaluate the effects of covariates. Proportional hazard (PH), requires a constant hazard ratio over time, is the assumptionofCRM. Using extended CRM provides the test of including a time dependent covariate to assess the PH assumption or an alternative model in case of nonproportional hazards. In this study, the different types of real data sets are used to choose the time function and the differences between time functions are analyzed and discussed.

  18. Modelling long-term fire occurrence factors in Spain by accounting for local variations with geographically weighted regression

    Science.gov (United States)

    Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.

    2013-02-01

    Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. The results from GWR indicated a significant spatial variation in the local

  19. A test of inflated zeros for Poisson regression models.

    Science.gov (United States)

    He, Hua; Zhang, Hui; Ye, Peng; Tang, Wan

    2017-01-01

    Excessive zeros are common in practice and may cause overdispersion and invalidate inference when fitting Poisson regression models. There is a large body of literature on zero-inflated Poisson models. However, methods for testing whether there are excessive zeros are less well developed. The Vuong test comparing a Poisson and a zero-inflated Poisson model is commonly applied in practice. However, the type I error of the test often deviates seriously from the nominal level, rendering serious doubts on the validity of the test in such applications. In this paper, we develop a new approach for testing inflated zeros under the Poisson model. Unlike the Vuong test for inflated zeros, our method does not require a zero-inflated Poisson model to perform the test. Simulation studies show that when compared with the Vuong test our approach not only better at controlling type I error rate, but also yield more power.

  20. Simple model of surface roughness for binary collision sputtering simulations

    Science.gov (United States)

    Lindsey, Sloan J.; Hobler, Gerhard; Maciążek, Dawid; Postawa, Zbigniew

    2017-02-01

    It has been shown that surface roughness can strongly influence the sputtering yield - especially at glancing incidence angles where the inclusion of surface roughness leads to an increase in sputtering yields. In this work, we propose a simple one-parameter model (the "density gradient model") which imitates surface roughness effects. In the model, the target's atomic density is assumed to vary linearly between the actual material density and zero. The layer width is the sole model parameter. The model has been implemented in the binary collision simulator IMSIL and has been evaluated against various geometric surface models for 5 keV Ga ions impinging an amorphous Si target. To aid the construction of a realistic rough surface topography, we have performed MD simulations of sequential 5 keV Ga impacts on an initially crystalline Si target. We show that our new model effectively reproduces the sputtering yield, with only minor variations in the energy and angular distributions of sputtered particles. The success of the density gradient model is attributed to a reduction of the reflection coefficient - leading to increased sputtering yields, similar in effect to surface roughness.

  1. The role of multicollinearity in landslide susceptibility assessment by means of Binary Logistic Regression: comparison between VIF and AIC stepwise selection

    Science.gov (United States)

    Cama, Mariaelena; Cristi Nicu, Ionut; Conoscenti, Christian; Quénéhervé, Geraldine; Maerker, Michael

    2016-04-01

    Landslide susceptibility can be defined as the likelihood of a landslide occurring in a given area on the basis of local terrain conditions. In the last decades many research focused on its evaluation by means of stochastic approaches under the assumption that 'the past is the key to the future' which means that if a model is able to reproduce a known landslide spatial distribution, it will be able to predict the future locations of new (i.e. unknown) slope failures. Among the various stochastic approaches, Binary Logistic Regression (BLR) is one of the most used because it calculates the susceptibility in probabilistic terms and its results are easily interpretable from a geomorphological point of view. However, very often not much importance is given to multicollinearity assessment whose effect is that the coefficient estimates are unstable, with opposite sign and therefore difficult to interpret. Therefore, it should be evaluated every time in order to make a model whose results are geomorphologically correct. In this study the effects of multicollinearity in the predictive performance and robustness of landslide susceptibility models are analyzed. In particular, the multicollinearity is estimated by means of Variation Inflation Index (VIF) which is also used as selection criterion for the independent variables (VIF Stepwise Selection) and compared to the more commonly used AIC Stepwise Selection. The robustness of the results is evaluated through 100 replicates of the dataset. The study area selected to perform this analysis is the Moldavian Plateau where landslides are among the most frequent geomorphological processes. This area has an increasing trend of urbanization and a very high potential regarding the cultural heritage, being the place of discovery of the largest settlement belonging to the Cucuteni Culture from Eastern Europe (that led to the development of the great complex Cucuteni-Tripyllia). Therefore, identifying the areas susceptible to

  2. Multivariate Frequency-Severity Regression Models in Insurance

    Directory of Open Access Journals (Sweden)

    Edward W. Frees

    2016-02-01

    Full Text Available In insurance and related industries including healthcare, it is common to have several outcome measures that the analyst wishes to understand using explanatory variables. For example, in automobile insurance, an accident may result in payments for damage to one’s own vehicle, damage to another party’s vehicle, or personal injury. It is also common to be interested in the frequency of accidents in addition to the severity of the claim amounts. This paper synthesizes and extends the literature on multivariate frequency-severity regression modeling with a focus on insurance industry applications. Regression models for understanding the distribution of each outcome continue to be developed yet there now exists a solid body of literature for the marginal outcomes. This paper contributes to this body of literature by focusing on the use of a copula for modeling the dependence among these outcomes; a major advantage of this tool is that it preserves the body of work established for marginal models. We illustrate this approach using data from the Wisconsin Local Government Property Insurance Fund. This fund offers insurance protection for (i property; (ii motor vehicle; and (iii contractors’ equipment claims. In addition to several claim types and frequency-severity components, outcomes can be further categorized by time and space, requiring complex dependency modeling. We find significant dependencies for these data; specifically, we find that dependencies among lines are stronger than the dependencies between the frequency and average severity within each line.

  3. Augmented Beta rectangular regression models: A Bayesian perspective.

    Science.gov (United States)

    Wang, Jue; Luo, Sheng

    2016-01-01

    Mixed effects Beta regression models based on Beta distributions have been widely used to analyze longitudinal percentage or proportional data ranging between zero and one. However, Beta distributions are not flexible to extreme outliers or excessive events around tail areas, and they do not account for the presence of the boundary values zeros and ones because these values are not in the support of the Beta distributions. To address these issues, we propose a mixed effects model using Beta rectangular distribution and augment it with the probabilities of zero and one. We conduct extensive simulation studies to assess the performance of mixed effects models based on both the Beta and Beta rectangular distributions under various scenarios. The simulation studies suggest that the regression models based on Beta rectangular distributions improve the accuracy of parameter estimates in the presence of outliers and heavy tails. The proposed models are applied to the motivating Neuroprotection Exploratory Trials in Parkinson's Disease (PD) Long-term Study-1 (LS-1 study, n = 1741), developed by The National Institute of Neurological Disorders and Stroke Exploratory Trials in Parkinson's Disease (NINDS NET-PD) network. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Bayesian semiparametric regression models to characterize molecular evolution

    Directory of Open Access Journals (Sweden)

    Datta Saheli

    2012-10-01

    Full Text Available Abstract Background Statistical models and methods that associate changes in the physicochemical properties of amino acids with natural selection at the molecular level typically do not take into account the correlations between such properties. We propose a Bayesian hierarchical regression model with a generalization of the Dirichlet process prior on the distribution of the regression coefficients that describes the relationship between the changes in amino acid distances and natural selection in protein-coding DNA sequence alignments. Results The Bayesian semiparametric approach is illustrated with simulated data and the abalone lysin sperm data. Our method identifies groups of properties which, for this particular dataset, have a similar effect on evolution. The model also provides nonparametric site-specific estimates for the strength of conservation of these properties. Conclusions The model described here is distinguished by its ability to handle a large number of amino acid properties simultaneously, while taking into account that such data can be correlated. The multi-level clustering ability of the model allows for appealing interpretations of the results in terms of properties that are roughly equivalent from the standpoint of molecular evolution.

  5. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu

    2014-06-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.

  6. Modeling the number of car theft using Poisson regression

    Science.gov (United States)

    Zulkifli, Malina; Ling, Agnes Beh Yen; Kasim, Maznah Mat; Ismail, Noriszura

    2016-10-01

    Regression analysis is the most popular statistical methods used to express the relationship between the variables of response with the covariates. The aim of this paper is to evaluate the factors that influence the number of car theft using Poisson regression model. This paper will focus on the number of car thefts that occurred in districts in Peninsular Malaysia. There are two groups of factor that have been considered, namely district descriptive factors and socio and demographic factors. The result of the study showed that Bumiputera composition, Chinese composition, Other ethnic composition, foreign migration, number of residence with the age between 25 to 64, number of employed person and number of unemployed person are the most influence factors that affect the car theft cases. These information are very useful for the law enforcement department, insurance company and car owners in order to reduce and limiting the car theft cases in Peninsular Malaysia.

  7. Thermodynamic modeling of saturated liquid compositions and densities for asymmetric binary systems composed of carbon dioxide, alkanes and alkanols

    International Nuclear Information System (INIS)

    Bayestehparvin, Bita; Nourozieh, Hossein; Kariznovi, Mohammad; Abedi, Jalal

    2015-01-01

    Highlights: • Phase behavior of the binary systems containing largely different components. • Equation of state modeling of binary polar and non-polar systems by utilizing different mixing rules. • Three different mixing rules (one-parameter, two-parameters and Wong–Sandler) coupled with Peng–Robinson equation of state. • Two-parameter mixing rule shows promoting results compared to one-parameter mixing rule. • Wong–Sandler mixing rule is unable to predict saturated liquid densities with sufficient accuracy. - Abstract: The present study mainly focuses on the phase behavior modeling of asymmetric binary mixtures. Capability of different mixing rules and volume shift in the prediction of solubility and saturated liquid density has been investigated. Different binary systems of (alkane + alkanol), (alkane + alkane), (carbon dioxide + alkanol), and (carbon dioxide + alkane) are considered. The composition and the density of saturated liquid phase at equilibrium condition are the properties of interest. Considering composition and saturated liquid density of different binary systems, three main objectives are investigated. First, three different mixing rules (one-parameter, two parameters and Wong–Sandler) coupled with Peng–Robinson equation of state were used to predict the equilibrium properties. The Wong–Sandler mixing rule was utilized with the non-random two-liquid (NRTL) model. Binary interaction coefficients and NRTL model parameters were optimized using the Levenberg–Marquardt algorithm. Second, to improve the density prediction, the volume translation technique was applied. Finally, Two different approaches were considered to tune the equation of state; regression of experimental equilibrium compositions and densities separately and spontaneously. The modeling results show that there is no superior mixing rule which can predict the equilibrium properties for different systems. Two-parameter and Wong–Sandler mixing rule show promoting

  8. Dynamic Regression Intervention Modeling for the Malaysian Daily Load

    Directory of Open Access Journals (Sweden)

    Fadhilah Abdrazak

    2014-05-01

    Full Text Available Malaysia is a unique country due to having both fixed and moving holidays.  These moving holidays may overlap with other fixed holidays and therefore, increase the complexity of the load forecasting activities. The errors due to holidays’ effects in the load forecasting are known to be higher than other factors.  If these effects can be estimated and removed, the behavior of the series could be better viewed.  Thus, the aim of this paper is to improve the forecasting errors by using a dynamic regression model with intervention analysis.   Based on the linear transfer function method, a daily load model consists of either peak or average is developed.  The developed model outperformed the seasonal ARIMA model in estimating the fixed and moving holidays’ effects and achieved a smaller Mean Absolute Percentage Error (MAPE in load forecast.

  9. Learning Supervised Topic Models for Classification and Regression from Crowds.

    Science.gov (United States)

    Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete; Pereira, Francisco C

    2017-12-01

    The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages of the proposed model over state-of-the-art approaches.

  10. Guidance for the utility of linear models in meta-analysis of genetic association studies of binary phenotypes.

    Science.gov (United States)

    Cook, James P; Mahajan, Anubha; Morris, Andrew P

    2017-02-01

    Linear mixed models are increasingly used for the analysis of genome-wide association studies (GWAS) of binary phenotypes because they can efficiently and robustly account for population stratification and relatedness through inclusion of random effects for a genetic relationship matrix. However, the utility of linear (mixed) models in the context of meta-analysis of GWAS of binary phenotypes has not been previously explored. In this investigation, we present simulations to compare the performance of linear and logistic regression models under alternative weighting schemes in a fixed-effects meta-analysis framework, considering designs that incorporate variable case-control imbalance, confounding factors and population stratification. Our results demonstrate that linear models can be used for meta-analysis of GWAS of binary phenotypes, without loss of power, even in the presence of extreme case-control imbalance, provided that one of the following schemes is used: (i) effective sample size weighting of Z-scores or (ii) inverse-variance weighting of allelic effect sizes after conversion onto the log-odds scale. Our conclusions thus provide essential recommendations for the development of robust protocols for meta-analysis of binary phenotypes with linear models.

  11. Continuous validation of ASTEC containment models and regression testing

    International Nuclear Information System (INIS)

    Nowack, Holger; Reinke, Nils; Sonnenkalb, Martin

    2014-01-01

    The focus of the ASTEC (Accident Source Term Evaluation Code) development at GRS is primarily on the containment module CPA (Containment Part of ASTEC), whose modelling is to a large extent based on the GRS containment code COCOSYS (COntainment COde SYStem). Validation is usually understood as the approval of the modelling capabilities by calculations of appropriate experiments done by external users different from the code developers. During the development process of ASTEC CPA, bugs and unintended side effects may occur, which leads to changes in the results of the initially conducted validation. Due to the involvement of a considerable number of developers in the coding of ASTEC modules, validation of the code alone, even if executed repeatedly, is not sufficient. Therefore, a regression testing procedure has been implemented in order to ensure that the initially obtained validation results are still valid with succeeding code versions. Within the regression testing procedure, calculations of experiments and plant sequences are performed with the same input deck but applying two different code versions. For every test-case the up-to-date code version is compared to the preceding one on the basis of physical parameters deemed to be characteristic for the test-case under consideration. In the case of post-calculations of experiments also a comparison to experimental data is carried out. Three validation cases from the regression testing procedure are presented within this paper. The very good post-calculation of the HDR E11.1 experiment shows the high quality modelling of thermal-hydraulics in ASTEC CPA. Aerosol behaviour is validated on the BMC VANAM M3 experiment, and the results show also a very good agreement with experimental data. Finally, iodine behaviour is checked in the validation test-case of the THAI IOD-11 experiment. Within this test-case, the comparison of the ASTEC versions V2.0r1 and V2.0r2 shows how an error was detected by the regression testing

  12. Modeling of the Monthly Rainfall-Runoff Process Through Regressions

    Directory of Open Access Journals (Sweden)

    Campos-Aranda Daniel Francisco

    2014-10-01

    Full Text Available To solve the problems associated with the assessment of water resources of a river, the modeling of the rainfall-runoff process (RRP allows the deduction of runoff missing data and to extend its record, since generally the information available on precipitation is larger. It also enables the estimation of inputs to reservoirs, when their building led to the suppression of the gauging station. The simplest mathematical model that can be set for the RRP is the linear regression or curve on a monthly basis. Such a model is described in detail and is calibrated with the simultaneous record of monthly rainfall and runoff in Ballesmi hydrometric station, which covers 35 years. Since the runoff of this station has an important contribution from the spring discharge, the record is corrected first by removing that contribution. In order to do this a procedure was developed based either on the monthly average regional runoff coefficients or on nearby and similar watershed; in this case the Tancuilín gauging station was used. Both stations belong to the Partial Hydrologic Region No. 26 (Lower Rio Panuco and are located within the state of San Luis Potosi, México. The study performed indicates that the monthly regression model, due to its conceptual approach, faithfully reproduces monthly average runoff volumes and achieves an excellent approximation in relation to the dispersion, proved by calculation of the means and standard deviations.

  13. Genetic evaluation of European quails by random regression models

    Directory of Open Access Journals (Sweden)

    Flaviana Miranda Gonçalves

    2012-09-01

    Full Text Available The objective of this study was to compare different random regression models, defined from different classes of heterogeneity of variance combined with different Legendre polynomial orders for the estimate of (covariance of quails. The data came from 28,076 observations of 4,507 female meat quails of the LF1 lineage. Quail body weights were determined at birth and 1, 14, 21, 28, 35 and 42 days of age. Six different classes of residual variance were fitted to Legendre polynomial functions (orders ranging from 2 to 6 to determine which model had the best fit to describe the (covariance structures as a function of time. According to the evaluated criteria (AIC, BIC and LRT, the model with six classes of residual variances and of sixth-order Legendre polynomial was the best fit. The estimated additive genetic variance increased from birth to 28 days of age, and dropped slightly from 35 to 42 days. The heritability estimates decreased along the growth curve and changed from 0.51 (1 day to 0.16 (42 days. Animal genetic and permanent environmental correlation estimates between weights and age classes were always high and positive, except for birth weight. The sixth order Legendre polynomial, along with the residual variance divided into six classes was the best fit for the growth rate curve of meat quails; therefore, they should be considered for breeding evaluation processes by random regression models.

  14. Numerical model for dendritic solidification of binary alloys

    Science.gov (United States)

    Felicelli, S. D.; Heinrich, J. C.; Poirier, D. R.

    1993-01-01

    A finite element model capable of simulating solidification of binary alloys and the formation of freckles is presented. It uses a single system of equations to deal with the all-liquid region, the dendritic region, and the all-solid region. The dendritic region is treated as an anisotropic porous medium. The algorithm uses the bilinear isoparametric element, with a penalty function approximation and a Petrov-Galerkin formulation. Numerical simulations are shown in which an NH4Cl-H2O mixture and a Pb-Sn alloy melt are cooled. The solidification process is followed in time. Instabilities in the process can be clearly observed and the final compositions obtained.

  15. Interpreting parameters in the logistic regression model with random effects

    DEFF Research Database (Denmark)

    Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben

    2000-01-01

    interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects......interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects...

  16. Learning Supervised Topic Models for Classification and Regression from Crowds

    DEFF Research Database (Denmark)

    Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete

    2017-01-01

    problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages...... annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression...

  17. Preference learning with evolutionary Multivariate Adaptive Regression Spline model

    DEFF Research Database (Denmark)

    Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll

    2015-01-01

    This paper introduces a novel approach for pairwise preference learning through combining an evolutionary method with Multivariate Adaptive Regression Spline (MARS). Collecting users' feedback through pairwise preferences is recommended over other ranking approaches as this method is more appealing...... for function approximation as well as being relatively easy to interpret. MARS models are evolved based on their efficiency in learning pairwise data. The method is tested on two datasets that collectively provide pairwise preference data of five cognitive states expressed by users. The method is analysed...

  18. Predicting Performance on MOOC Assessments using Multi-Regression Models

    OpenAIRE

    Ren, Zhiyun; Rangwala, Huzefa; Johri, Aditya

    2016-01-01

    The past few years has seen the rapid growth of data min- ing approaches for the analysis of data obtained from Mas- sive Open Online Courses (MOOCs). The objectives of this study are to develop approaches to predict the scores a stu- dent may achieve on a given grade-related assessment based on information, considered as prior performance or prior ac- tivity in the course. We develop a personalized linear mul- tiple regression (PLMR) model to predict the grade for a student, prior to attempt...

  19. Modeling of columnar and equiaxed solidification of binary mixtures

    International Nuclear Information System (INIS)

    Roux, P.

    2005-12-01

    This work deals with the modelling of dendritic solidification in binary mixtures. Large scale phenomena are represented by volume averaging of the local conservation equations. This method allows to rigorously derive the partial differential equations of averaged fields and the closure problems associated to the deviations. Such problems can be resolved numerically on periodic cells, representative of dendritic structures, in order to give a precise evaluation of macroscopic transfer coefficients (Drag coefficients, exchange coefficients, diffusion-dispersion tensors...). The method had already been applied for a model of columnar dendritic mushy zone and it is extended to the case of equiaxed dendritic solidification, where solid grains can move. The two-phase flow is modelled with an Eulerian-Eulerian approach and the novelty is to account for the dispersion of solid velocity through the kinetic agitation of the particles. A coupling of the two models is proposed thanks to an original adaptation of the columnar model, allowing for undercooling calculation: a solid-liquid interfacial area density is introduced and calculated. At last, direct numerical simulations of crystal growth are proposed with a diffuse interface method for a representation of local phenomena. (author)

  20. Analytical and regression models of glass rod drawing process

    Science.gov (United States)

    Alekseeva, L. B.

    2018-03-01

    The process of drawing glass rods (light guides) is being studied. The parameters of the process affecting the quality of the light guide have been determined. To solve the problem, mathematical models based on general equations of continuum mechanics are used. The conditions for the stable flow of the drawing process have been found, which are determined by the stability of the motion of the glass mass in the formation zone to small uncontrolled perturbations. The sensitivity of the formation zone to perturbations of the drawing speed and viscosity is estimated. Experimental models of the drawing process, based on the regression analysis methods, have been obtained. These models make it possible to customize a specific production process to obtain light guides of the required quality. They allow one to find the optimum combination of process parameters in the chosen area and to determine the required accuracy of maintaining them at a specified level.

  1. Modeling of formation of binary-phase hollow nanospheres from metallic solid nanospheres

    International Nuclear Information System (INIS)

    Svoboda, J.; Fischer, F.D.; Vollath, D.

    2009-01-01

    Spontaneous formation of binary-phase hollow nanospheres by reaction of a metallic nanosphere with a non-metallic component in the surrounding atmosphere is observed for many systems. The kinetic model describing this phenomenon is derived by application of the thermodynamic extremal principle. The necessary condition of formation of the binary-phase hollow nanospheres is that the diffusion coefficient of the metallic component in the binary phase is higher than that of the non-metallic component (Kirkendall effect occurs in the correct direction). The model predictions of the time to formation of the binary-phase hollow nanospheres agree with the experimental observations

  2. A joint logistic regression and covariate-adjusted continuous-time Markov chain model.

    Science.gov (United States)

    Rubin, Maria Laura; Chan, Wenyaw; Yamal, Jose-Miguel; Robertson, Claudia Sue

    2017-12-10

    The use of longitudinal measurements to predict a categorical outcome is an increasingly common goal in research studies. Joint models are commonly used to describe two or more models simultaneously by considering the correlated nature of their outcomes and the random error present in the longitudinal measurements. However, there is limited research on joint models with longitudinal predictors and categorical cross-sectional outcomes. Perhaps the most challenging task is how to model the longitudinal predictor process such that it represents the true biological mechanism that dictates the association with the categorical response. We propose a joint logistic regression and Markov chain model to describe a binary cross-sectional response, where the unobserved transition rates of a two-state continuous-time Markov chain are included as covariates. We use the method of maximum likelihood to estimate the parameters of our model. In a simulation study, coverage probabilities of about 95%, standard deviations close to standard errors, and low biases for the parameter values show that our estimation method is adequate. We apply the proposed joint model to a dataset of patients with traumatic brain injury to describe and predict a 6-month outcome based on physiological data collected post-injury and admission characteristics. Our analysis indicates that the information provided by physiological changes over time may help improve prediction of long-term functional status of these severely ill subjects. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  3. Regression Models for Predicting Force Coefficients of Aerofoils

    Directory of Open Access Journals (Sweden)

    Mohammed ABDUL AKBAR

    2015-09-01

    Full Text Available Renewable sources of energy are attractive and advantageous in a lot of different ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, Vertical axis wind turbines (VAWTs have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of lift and drag coefficients which is a function of shape of the aerofoil, ‘angle of attack’ of wind and Reynolds’s number of flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for the Reynolds number in the range of those experienced by VAWTs. The volume of experimental data thus obtained is huge. The current paper discusses three Regression analysis models developed wherein lift and drag coefficients can be found out using simple formula without having to deal with the bulk of the data. Drag coefficients and Lift coefficients were being successfully estimated by regression models with R2 values as high as 0.98.

  4. A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

    Science.gov (United States)

    Karabatsos, George

    2017-02-01

    Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected

  5. Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks

    Science.gov (United States)

    Kanevski, Mikhail

    2015-04-01

    The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools both for spatial and temporal data and are based on nonparametric kernel methods closely related to classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can be also applied for features selection tasks when working with high dimensional data [1,3]. In the present research Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into 13 dimensional space constructed by geographical coordinates and geo-features calculated from digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in case of wind fields N=(2^13 -1)=8191] and rank them according to the cross-validation error. In both cases training were carried out applying leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN with their ability to select features and efficient modelling of complex high dimensional data can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press

  6. Conditional Monte Carlo randomization tests for regression models.

    Science.gov (United States)

    Parhat, Parwen; Rosenberger, William F; Diao, Guoqing

    2014-08-15

    We discuss the computation of randomization tests for clinical trials of two treatments when the primary outcome is based on a regression model. We begin by revisiting the seminal paper of Gail, Tan, and Piantadosi (1988), and then describe a method based on Monte Carlo generation of randomization sequences. The tests based on this Monte Carlo procedure are design based, in that they incorporate the particular randomization procedure used. We discuss permuted block designs, complete randomization, and biased coin designs. We also use a new technique by Plamadeala and Rosenberger (2012) for simple computation of conditional randomization tests. Like Gail, Tan, and Piantadosi, we focus on residuals from generalized linear models and martingale residuals from survival models. Such techniques do not apply to longitudinal data analysis, and we introduce a method for computation of randomization tests based on the predicted rate of change from a generalized linear mixed model when outcomes are longitudinal. We show, by simulation, that these randomization tests preserve the size and power well under model misspecification. Copyright © 2014 John Wiley & Sons, Ltd.

  7. Genomic breeding value estimation using nonparametric additive regression models

    Directory of Open Access Journals (Sweden)

    Solberg Trygve

    2009-01-01

    Full Text Available Abstract Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped was predicted using data from the next last generation (genotyped and phenotyped. The results show moderate to high accuracies of the predicted breeding values. A determination of predictor specific degree of smoothing increased the accuracy.

  8. Global Land Use Regression Model for Nitrogen Dioxide Air Pollution.

    Science.gov (United States)

    Larkin, Andrew; Geddes, Jeffrey A; Martin, Randall V; Xiao, Qingyang; Liu, Yang; Marshall, Julian D; Brauer, Michael; Hystad, Perry

    2017-06-20

    Nitrogen dioxide is a common air pollutant with growing evidence of health impacts independent of other common pollutants such as ozone and particulate matter. However, the worldwide distribution of NO 2 exposure and associated impacts on health is still largely uncertain. To advance global exposure estimates we created a global nitrogen dioxide (NO 2 ) land use regression model for 2011 using annual measurements from 5,220 air monitors in 58 countries. The model captured 54% of global NO 2 variation, with a mean absolute error of 3.7 ppb. Regional performance varied from R 2 = 0.42 (Africa) to 0.67 (South America). Repeated 10% cross-validation using bootstrap sampling (n = 10,000) demonstrated a robust performance with respect to air monitor sampling in North America, Europe, and Asia (adjusted R 2 within 2%) but not for Africa and Oceania (adjusted R 2 within 11%) where NO 2 monitoring data are sparse. The final model included 10 variables that captured both between and within-city spatial gradients in NO 2 concentrations. Variable contributions differed between continental regions, but major roads within 100 m and satellite-derived NO 2 were consistently the strongest predictors. The resulting model can be used for global risk assessments and health studies, particularly in countries without existing NO 2 monitoring data or models.

  9. Drought Patterns Forecasting using an Auto-Regressive Logistic Model

    Science.gov (United States)

    del Jesus, M.; Sheffield, J.; Méndez Incera, F. J.; Losada, I. J.; Espejo, A.

    2014-12-01

    Drought is characterized by a water deficit that may manifest across a large range of spatial and temporal scales. Drought may create important socio-economic consequences, many times of catastrophic dimensions. A quantifiable definition of drought is elusive because depending on its impacts, consequences and generation mechanism, different water deficit periods may be identified as a drought by virtue of some definitions but not by others. Droughts are linked to the water cycle and, although a climate change signal may not have emerged yet, they are also intimately linked to climate.In this work we develop an auto-regressive logistic model for drought prediction at different temporal scales that makes use of a spatially explicit framework. Our model allows to include covariates, continuous or categorical, to improve the performance of the auto-regressive component.Our approach makes use of dimensionality reduction (principal component analysis) and classification techniques (K-Means and maximum dissimilarity) to simplify the representation of complex climatic patterns, such as sea surface temperature (SST) and sea level pressure (SLP), while including information on their spatial structure, i.e. considering their spatial patterns. This procedure allows us to include in the analysis multivariate representation of complex climatic phenomena, as the El Niño-Southern Oscillation. We also explore the impact of other climate-related variables such as sun spots. The model allows to quantify the uncertainty of the forecasts and can be easily adapted to make predictions under future climatic scenarios. The framework herein presented may be extended to other applications such as flash flood analysis, or risk assessment of natural hazards.

  10. A Gompertz regression model for fern spores germination

    Directory of Open Access Journals (Sweden)

    Gabriel y Galán, Jose María

    2015-06-01

    Full Text Available Germination is one of the most important biological processes for both seed and spore plants, also for fungi. At present, mathematical models of germination have been developed in fungi, bryophytes and several plant species. However, ferns are the only group whose germination has never been modelled. In this work we develop a regression model of the germination of fern spores. We have found that for Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei and Polypodium feuillei species the Gompertz growth model describe satisfactorily cumulative germination. An important result is that regression parameters are independent of fern species and the model is not affected by intraspecific variation. Our results show that the Gompertz curve represents a general germination model for all the non-green spore leptosporangiate ferns, including in the paper a discussion about the physiological and ecological meaning of the model.La germinación es uno de los procesos biológicos más relevantes tanto para las plantas con esporas, como para las plantas con semillas y los hongos. Hasta el momento, se han desarrollado modelos de germinación para hongos, briofitos y diversas especies de espermatófitos. Los helechos son el único grupo de plantas cuya germinación nunca ha sido modelizada. En este trabajo se desarrolla un modelo de regresión para explicar la germinación de las esporas de helechos. Observamos que para las especies Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei y Polypodium feuillei el modelo de crecimiento de Gompertz describe satisfactoriamente la germinación acumulativa. Un importante resultado es que los parámetros de la regresión son independientes de la especie y que el modelo no está afectado por variación intraespecífica. Por lo tanto, los resultados del trabajo muestran que la curva de Gompertz puede representar un modelo general para todos los helechos leptosporangiados

  11. Collision prediction models using multivariate Poisson-lognormal regression.

    Science.gov (United States)

    El-Basyouny, Karim; Sayed, Tarek

    2009-07-01

    This paper advocates the use of multivariate Poisson-lognormal (MVPLN) regression to develop models for collision count data. The MVPLN approach presents an opportunity to incorporate the correlations across collision severity levels and their influence on safety analyses. The paper introduces a new multivariate hazardous location identification technique, which generalizes the univariate posterior probability of excess that has been commonly proposed and applied in the literature. In addition, the paper presents an alternative approach for quantifying the effect of the multivariate structure on the precision of expected collision frequency. The MVPLN approach is compared with the independent (separate) univariate Poisson-lognormal (PLN) models with respect to model inference, goodness-of-fit, identification of hot spots and precision of expected collision frequency. The MVPLN is modeled using the WinBUGS platform which facilitates computation of posterior distributions as well as providing a goodness-of-fit measure for model comparisons. The results indicate that the estimates of the extra Poisson variation parameters were considerably smaller under MVPLN leading to higher precision. The improvement in precision is due mainly to the fact that MVPLN accounts for the correlation between the latent variables representing property damage only (PDO) and injuries plus fatalities (I+F). This correlation was estimated at 0.758, which is highly significant, suggesting that higher PDO rates are associated with higher I+F rates, as the collision likelihood for both types is likely to rise due to similar deficiencies in roadway design and/or other unobserved factors. In terms of goodness-of-fit, the MVPLN model provided a superior fit than the independent univariate models. The multivariate hazardous location identification results demonstrated that some hazardous locations could be overlooked if the analysis was restricted to the univariate models.

  12. THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE.

    Science.gov (United States)

    Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan

    2015-10-01

    The purpose of this study was to drawing a regression model of organizational climate of central libraries of Iran's universities. This study is an applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran's public universities selected among the 117 universities affiliated to the Ministry of Health by Stratified Sampling method (510 people). Climate Qual localized questionnaire was used as research tools. For predicting the organizational climate pattern of the libraries is used from the multivariate linear regression and track diagram. of the 9 variables affecting organizational climate, 5 variables of innovation, teamwork, customer service, psychological safety and deep diversity play a major role in prediction of the organizational climate of Iran's libraries. The results also indicate that each of these variables with different coefficient have the power to predict organizational climate but the climate score of psychological safety (0.94) plays a very crucial role in predicting the organizational climate. Track diagram showed that five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly effects on the organizational climate variable that contribution of the team work from this influence is more than any other variables. Of the indicator of the organizational climate of climateQual, the contribution of the team work from this influence is more than any other variables that reinforcement of teamwork in academic libraries can be more effective in improving the organizational climate of this type libraries.

  13. Observational properties of models of semidetached close binaries. Pt. 2

    International Nuclear Information System (INIS)

    Giannone, P.; Giannuzzi, M.A.; Pucillo, M.

    1975-01-01

    Binaries of Cases A and B with intermediate and small masses have been studied. Synthetic light curves are shown to be affected mainly by the assumption concerning the shape of the components. The comparison between synthetic light curves and observed data can give further information on the reliability of the hypotheses assumed in the computations of binary star evolution. The calculated properties lead to useful indications about the evolutionary stages of observed binaries. The detection of systems evolving according to Case A appears to be favoured in comparison with that of systems of Case B. Systems with undersize subgiants result comparatively difficult to observe. (orig./BJ) [de

  14. Sparse Representation Based Binary Hypothesis Model for Hyperspectral Image Classification

    Directory of Open Access Journals (Sweden)

    Yidong Tang

    2016-01-01

    Full Text Available The sparse representation based classifier (SRC and its kernel version (KSRC have been employed for hyperspectral image (HSI classification. However, the state-of-the-art SRC often aims at extended surface objects with linear mixture in smooth scene and assumes that the number of classes is given. Considering the small target with complex background, a sparse representation based binary hypothesis (SRBBH model is established in this paper. In this model, a query pixel is represented in two ways, which are, respectively, by background dictionary and by union dictionary. The background dictionary is composed of samples selected from the local dual concentric window centered at the query pixel. Thus, for each pixel the classification issue becomes an adaptive multiclass classification problem, where only the number of desired classes is required. Furthermore, the kernel method is employed to improve the interclass separability. In kernel space, the coding vector is obtained by using kernel-based orthogonal matching pursuit (KOMP algorithm. Then the query pixel can be labeled by the characteristics of the coding vectors. Instead of directly using the reconstruction residuals, the different impacts the background dictionary and union dictionary have on reconstruction are used for validation and classification. It enhances the discrimination and hence improves the performance.

  15. Symbiotic stars - a binary model with super-critical accretion

    Energy Technology Data Exchange (ETDEWEB)

    Bath, G T [National Radio Astronomy Observatory, Charlottesville, Va. (USA)

    1977-01-01

    The structure of symbiotic variables is discussed in terms of a binary model. Disc accretion by a main sequence star or white dwarf at rates close to the Eddington limit produces an ultraviolet continuum source near the accreting star surface. This generates a variable, radiatively-driven, out-flowing wind. The wind is optically thick and the disc luminosity is absorbed and scattered and thus degraded into the optical region. Variations in the rate of mass loss in the wind lead to optical eruptions through shifts in the position of, and conditions in, the last scattering surface. The behaviour of Z And determined by Boyarchuk is shown to be in agreement with such a model. The conditions in the out-flowing wind are discussed. Limits on the mass loss rate are derived from conditions at the surface of the accreting star. It is suggested that variable out-flow in the wind is generated by fluctuations in disc luminosity produced by changes in the giant companions rate of mass transfer. The relation between symbiotic variables and classical and dwarf novae is discussed.

  16. Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing

    NARCIS (Netherlands)

    Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.

    2006-01-01

    The subject of this paper is a new approach to Symbolic Regression.Other publications on Symbolic Regression use Genetic Programming.This paper describes an alternative method based on Pareto Simulated Annealing.Our method is based on linear regression for the estimation of constants.Interval

  17. Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.

    Science.gov (United States)

    Ferrari, Alberto

    2017-01-01

    Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, distributional properties of information entropy as a random variable have seldom been the object of study, leading to researchers mainly using linear models or simulation-based analytical approach to assess differences in information content, when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results coming from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set coming from a real experiment on animal communication.

  18. Variable selection in Logistic regression model with genetic algorithm.

    Science.gov (United States)

    Zhang, Zhongheng; Trevino, Victor; Hoseini, Sayed Shahabuddin; Belciug, Smaranda; Boopathi, Arumugam Manivanna; Zhang, Ping; Gorunescu, Florin; Subha, Velappan; Dai, Songshi

    2018-02-01

    Variable or feature selection is one of the most important steps in model specification. Especially in the case of medical-decision making, the direct use of a medical database, without a previous analysis and preprocessing step, is often counterproductive. In this way, the variable selection represents the method of choosing the most relevant attributes from the database in order to build a robust learning models and, thus, to improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables, while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none of them is without limitations. For example, the stepwise approach, which is highly used, adds the best variable in each cycle generally producing an acceptable set of variables. Nevertheless, it is limited by the fact that it commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large with tens to hundreds of variables, which is the case in nowadays clinical data. Genetic algorithms (GA) are heuristic optimization approaches and can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.

  19. Electricity prices forecasting by automatic dynamic harmonic regression models

    International Nuclear Information System (INIS)

    Pedregal, Diego J.; Trapero, Juan R.

    2007-01-01

    The changes experienced by electricity markets in recent years have created the necessity for more accurate forecast tools of electricity prices, both for producers and consumers. Many methodologies have been applied to this aim, but in the view of the authors, state space models are not yet fully exploited. The present paper proposes a univariate dynamic harmonic regression model set up in a state space framework for forecasting prices in these markets. The advantages of the approach are threefold. Firstly, a fast automatic identification and estimation procedure is proposed based on the frequency domain. Secondly, the recursive algorithms applied offer adaptive predictions that compare favourably with respect to other techniques. Finally, since the method is based on unobserved components models, explicit information about trend, seasonal and irregular behaviours of the series can be extracted. This information is of great value to the electricity companies' managers in order to improve their strategies, i.e. it provides management innovations. The good forecast performance and the rapid adaptability of the model to changes in the data are illustrated with actual prices taken from the PJM interconnection in the US and for the Spanish market for the year 2002. (author)

  20. Characteristics and Properties of a Simple Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Kowal Robert

    2016-12-01

    Full Text Available A simple linear regression model is one of the pillars of classic econometrics. Despite the passage of time, it continues to raise interest both from the theoretical side as well as from the application side. One of the many fundamental questions in the model concerns determining derivative characteristics and studying the properties existing in their scope, referring to the first of these aspects. The literature of the subject provides several classic solutions in that regard. In the paper, a completely new design is proposed, based on the direct application of variance and its properties, resulting from the non-correlation of certain estimators with the mean, within the scope of which some fundamental dependencies of the model characteristics are obtained in a much more compact manner. The apparatus allows for a simple and uniform demonstration of multiple dependencies and fundamental properties in the model, and it does it in an intuitive manner. The results were obtained in a classic, traditional area, where everything, as it might seem, has already been thoroughly studied and discovered.

  1. Bayesian Regression of Thermodynamic Models of Redox Active Materials

    Energy Technology Data Exchange (ETDEWEB)

    Johnston, Katherine [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-09-01

    Finding a suitable functional redox material is a critical challenge to achieving scalable, economically viable technologies for storing concentrated solar energy in the form of a defected oxide. Demonstrating e ectiveness for thermal storage or solar fuel is largely accomplished by using a thermodynamic model derived from experimental data. The purpose of this project is to test the accuracy of our regression model on representative data sets. Determining the accuracy of the model includes parameter tting the model to the data, comparing the model using di erent numbers of param- eters, and analyzing the entropy and enthalpy calculated from the model. Three data sets were considered in this project: two demonstrating materials for solar fuels by wa- ter splitting and the other of a material for thermal storage. Using Bayesian Inference and Markov Chain Monte Carlo (MCMC), parameter estimation was preformed on the three data sets. Good results were achieved, except some there was some deviations on the edges of the data input ranges. The evidence values were then calculated in a variety of ways and used to compare models with di erent number of parameters. It was believed that at least one of the parameters was unnecessary and comparing evidence values demonstrated that the parameter was need on one data set and not signi cantly helpful on another. The entropy was calculated by taking the derivative in one variable and integrating over another. and its uncertainty was also calculated by evaluating the entropy over multiple MCMC samples. Afterwards, all the parts were written up as a tutorial for the Uncertainty Quanti cation Toolkit (UQTk).

  2. Convergence diagnostics for Eigenvalue problems with linear regression model

    International Nuclear Information System (INIS)

    Shi, Bo; Petrovic, Bojan

    2011-01-01

    Although the Monte Carlo method has been extensively used for criticality/Eigenvalue problems, a reliable, robust, and efficient convergence diagnostics method is still desired. Most methods are based on integral parameters (multiplication factor, entropy) and either condense the local distribution information into a single value (e.g., entropy) or even disregard it. We propose to employ the detailed cycle-by-cycle local flux evolution obtained by using mesh tally mechanism to assess the source and flux convergence. By applying a linear regression model to each individual mesh in a mesh tally for convergence diagnostics, a global convergence criterion can be obtained. We exemplify this method on two problems and obtain promising diagnostics results. (author)

  3. The R Package threg to Implement Threshold Regression Models

    Directory of Open Access Journals (Sweden)

    Tao Xiao

    2015-08-01

    This new package includes four functions: threg, and the methods hr, predict and plot for threg objects returned by threg. The threg function is the model-fitting function which is used to calculate regression coefficient estimates, asymptotic standard errors and p values. The hr method for threg objects is the hazard-ratio calculation function which provides the estimates of hazard ratios at selected time points for specified scenarios (based on given categories or value settings of covariates. The predict method for threg objects is used for prediction. And the plot method for threg objects provides plots for curves of estimated hazard functions, survival functions and probability density functions of the first-hitting-time; function curves corresponding to different scenarios can be overlaid in the same plot for comparison to give additional research insights.

  4. Modeling the rate of HIV testing from repeated binary data amidst potential never-testers.

    Science.gov (United States)

    Rice, John D; Johnson, Brent A; Strawderman, Robert L

    2018-01-04

    Many longitudinal studies with a binary outcome measure involve a fraction of subjects with a homogeneous response profile. In our motivating data set, a study on the rate of human immunodeficiency virus (HIV) self-testing in a population of men who have sex with men (MSM), a substantial proportion of the subjects did not self-test during the follow-up study. The observed data in this context consist of a binary sequence for each subject indicating whether or not that subject experienced any events between consecutive observation time points, so subjects who never self-tested were observed to have a response vector consisting entirely of zeros. Conventional longitudinal analysis is not equipped to handle questions regarding the rate of events (as opposed to the odds, as in the classical logistic regression model). With the exception of discrete mixture models, such methods are also not equipped to handle settings in which there may exist a group of subjects for whom no events will ever occur, i.e. a so-called "never-responder" group. In this article, we model the observed data assuming that events occur according to some unobserved continuous-time stochastic process. In particular, we consider the underlying subject-specific processes to be Poisson conditional on some unobserved frailty, leading to a natural focus on modeling event rates. Specifically, we propose to use the power variance function (PVF) family of frailty distributions, which contains both the gamma and inverse Gaussian distributions as special cases and allows for the existence of a class of subjects having zero frailty. We generalize a computational algorithm developed for a log-gamma random intercept model (Conaway, 1990. A random effects model for binary data. Biometrics46, 317-328) to compute the exact marginal likelihood, which is then maximized to obtain estimates of model parameters. We conduct simulation studies, exploring the performance of the proposed method in comparison with

  5. Multiple linear regression and regression with time series error models in forecasting PM10 concentrations in Peninsular Malaysia.

    Science.gov (United States)

    Ng, Kar Yong; Awang, Norhashidah

    2018-01-06

    Frequent haze occurrences in Malaysia have made the management of PM 10 (particulate matter with aerodynamic less than 10 μm) pollution a critical task. This requires knowledge on factors associating with PM 10 variation and good forecast of PM 10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM 10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built. They were multiple linear regression (MLR) model with lagged predictor variables (MLR1), MLR model with lagged predictor variables and PM 10 concentrations (MLR2) and regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM 10 variation in Peninsular Malaysia. Comparison among the three models showed that MLR2 model was on a same level with RTSE model in terms of forecasting accuracy, while MLR1 model was the worst.

  6. Modeling and analysis of periodic orbits around a contact binary asteroid

    NARCIS (Netherlands)

    Feng, J.; Noomen, R.; Visser, P.N.A.M.; Yuan, J.

    2015-01-01

    The existence and characteristics of periodic orbits (POs) in the vicinity of a contact binary asteroid are investigated with an averaged spherical harmonics model. A contact binary asteroid consists of two components connected to each other, resulting in a highly bifurcated shape. Here, it is

  7. Ultracentrifuge separative power modeling with multivariate regression using covariance matrix

    International Nuclear Information System (INIS)

    Migliavacca, Elder

    2004-01-01

    In this work, the least-squares methodology with covariance matrix is applied to determine a data curve fitting to obtain a performance function for the separative power δU of a ultracentrifuge as a function of variables that are experimentally controlled. The experimental data refer to 460 experiments on the ultracentrifugation process for uranium isotope separation. The experimental uncertainties related with these independent variables are considered in the calculation of the experimental separative power values, determining an experimental data input covariance matrix. The process variables, which significantly influence the δU values are chosen in order to give information on the ultracentrifuge behaviour when submitted to several levels of feed flow rate F, cut θ and product line pressure P p . After the model goodness-of-fit validation, a residual analysis is carried out to verify the assumed basis concerning its randomness and independence and mainly the existence of residual heteroscedasticity with any explained regression model variable. The surface curves are made relating the separative power with the control variables F, θ and P p to compare the fitted model with the experimental data and finally to calculate their optimized values. (author)

  8. Modeling Pan Evaporation for Kuwait by Multiple Linear Regression

    Science.gov (United States)

    Almedeij, Jaber

    2012-01-01

    Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values. PMID:23226984

  9. Reducing Monte Carlo error in the Bayesian estimation of risk ratios using log-binomial regression models.

    Science.gov (United States)

    Salmerón, Diego; Cano, Juan A; Chirlaque, María D

    2015-08-30

    In cohort studies, binary outcomes are very often analyzed by logistic regression. However, it is well known that when the goal is to estimate a risk ratio, the logistic regression is inappropriate if the outcome is common. In these cases, a log-binomial regression model is preferable. On the other hand, the estimation of the regression coefficients of the log-binomial model is difficult owing to the constraints that must be imposed on these coefficients. Bayesian methods allow a straightforward approach for log-binomial regression models and produce smaller mean squared errors in the estimation of risk ratios than the frequentist methods, and the posterior inferences can be obtained using the software WinBUGS. However, Markov chain Monte Carlo methods implemented in WinBUGS can lead to large Monte Carlo errors in the approximations to the posterior inferences because they produce correlated simulations, and the accuracy of the approximations are inversely related to this correlation. To reduce correlation and to improve accuracy, we propose a reparameterization based on a Poisson model and a sampling algorithm coded in R. Copyright © 2015 John Wiley & Sons, Ltd.

  10. Short-term electricity prices forecasting based on support vector regression and Auto-regressive integrated moving average modeling

    International Nuclear Information System (INIS)

    Che Jinxing; Wang Jianzhou

    2010-01-01

    In this paper, we present the use of different mathematical models to forecast electricity price under deregulated power. A successful prediction tool of electricity price can help both power producers and consumers plan their bidding strategies. Inspired by that the support vector regression (SVR) model, with the ε-insensitive loss function, admits of the residual within the boundary values of ε-tube, we propose a hybrid model that combines both SVR and Auto-regressive integrated moving average (ARIMA) models to take advantage of the unique strength of SVR and ARIMA models in nonlinear and linear modeling, which is called SVRARIMA. A nonlinear analysis of the time-series indicates the convenience of nonlinear modeling, the SVR is applied to capture the nonlinear patterns. ARIMA models have been successfully applied in solving the residuals regression estimation problems. The experimental results demonstrate that the model proposed outperforms the existing neural-network approaches, the traditional ARIMA models and other hybrid models based on the root mean square error and mean absolute percentage error.

  11. An Ordered Regression Model to Predict Transit Passengers’ Behavioural Intentions

    Energy Technology Data Exchange (ETDEWEB)

    Oña, J. de; Oña, R. de; Eboli, L.; Forciniti, C.; Mazzulla, G.

    2016-07-01

    Passengers’ behavioural intentions after experiencing transit services can be viewed as signals that show if a customer continues to utilise a company’s service. Users’ behavioural intentions can depend on a series of aspects that are difficult to measure directly. More recently, transit passengers’ behavioural intentions have been just considered together with the concepts of service quality and customer satisfaction. Due to the characteristics of the ways for evaluating passengers’ behavioural intentions, service quality and customer satisfaction, we retain that this kind of issue could be analysed also by applying ordered regression models. This work aims to propose just an ordered probit model for analysing service quality factors that can influence passengers’ behavioural intentions towards the use of transit services. The case study is the LRT of Seville (Spain), where a survey was conducted in order to collect the opinions of the passengers about the existing transit service, and to have a measure of the aspects that can influence the intentions of the users to continue using the transit service in the future. (Author)

  12. Heterogeneous Breast Phantom Development for Microwave Imaging Using Regression Models

    Directory of Open Access Journals (Sweden)

    Camerin Hahn

    2012-01-01

    Full Text Available As new algorithms for microwave imaging emerge, it is important to have standard accurate benchmarking tests. Currently, most researchers use homogeneous phantoms for testing new algorithms. These simple structures lack the heterogeneity of the dielectric properties of human tissue and are inadequate for testing these algorithms for medical imaging. To adequately test breast microwave imaging algorithms, the phantom has to resemble different breast tissues physically and in terms of dielectric properties. We propose a systematic approach in designing phantoms that not only have dielectric properties close to breast tissues but also can be easily shaped to realistic physical models. The approach is based on regression model to match phantom's dielectric properties with the breast tissue dielectric properties found in Lazebnik et al. (2007. However, the methodology proposed here can be used to create phantoms for any tissue type as long as ex vivo, in vitro, or in vivo tissue dielectric properties are measured and available. Therefore, using this method, accurate benchmarking phantoms for testing emerging microwave imaging algorithms can be developed.

  13. application of multilinear regression analysis in modeling of soil

    African Journals Online (AJOL)

    Windows User

    Accordingly [1, 3] in their work, they applied linear regression ... (MLRA) is a statistical technique that uses several explanatory ... order to check this, they adopted bivariate correlation analysis .... groups, namely A-1 through A-7, based on their relative expected ..... Multivariate Regression in Gorgan Province North of Iran” ...

  14. Wheat flour dough Alveograph characteristics predicted by Mixolab regression models.

    Science.gov (United States)

    Codină, Georgiana Gabriela; Mironeasa, Silvia; Mironeasa, Costel; Popa, Ciprian N; Tamba-Berehoiu, Radiana

    2012-02-01

    In Romania, the Alveograph is the most used device to evaluate the rheological properties of wheat flour dough, but lately the Mixolab device has begun to play an important role in the breadmaking industry. These two instruments are based on different principles but there are some correlations that can be found between the parameters determined by the Mixolab and the rheological properties of wheat dough measured with the Alveograph. Statistical analysis on 80 wheat flour samples using the backward stepwise multiple regression method showed that Mixolab values using the ‘Chopin S’ protocol (40 samples) and ‘Chopin + ’ protocol (40 samples) can be used to elaborate predictive models for estimating the value of the rheological properties of wheat dough: baking strength (W), dough tenacity (P) and extensibility (L). The correlation analysis confirmed significant findings (P 0.70 for P, R²(adjusted) > 0.70 for W and R²(adjusted) > 0.38 for L, at a 95% confidence interval. Copyright © 2011 Society of Chemical Industry.

  15. Application of regression model on stream water quality parameters

    International Nuclear Information System (INIS)

    Suleman, M.; Maqbool, F.; Malik, A.H.; Bhatti, Z.A.

    2012-01-01

    Statistical analysis was conducted to evaluate the effect of solid waste leachate from the open solid waste dumping site of Salhad on the stream water quality. Five sites were selected along the stream. Two sites were selected prior to mixing of leachate with the surface water. One was of leachate and other two sites were affected with leachate. Samples were analyzed for pH, water temperature, electrical conductivity (EC), total dissolved solids (TDS), Biological oxygen demand (BOD), chemical oxygen demand (COD), dissolved oxygen (DO) and total bacterial load (TBL). In this study correlation coefficient r among different water quality parameters of various sites were calculated by using Pearson model and then average of each correlation between two parameters were also calculated, which shows TDS and EC and pH and BOD have significantly increasing r value, while temperature and TDS, temp and EC, DO and BL, DO and COD have decreasing r value. Single factor ANOVA at 5% level of significance was used which shows EC, TDS, TCL and COD were significantly differ among various sites. By the application of these two statistical approaches TDS and EC shows strongly positive correlation because the ions from the dissolved solids in water influence the ability of that water to conduct an electrical current. These two parameters significantly vary among 5 sites which are further confirmed by using linear regression. (author)

  16. The microcomputer scientific software series 2: general linear model--regression.

    Science.gov (United States)

    Harold M. Rauscher

    1983-01-01

    The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...

  17. MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION

    Science.gov (United States)

    Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...

  18. Methods of Detecting Outliers in A Regression Analysis Model ...

    African Journals Online (AJOL)

    PROF. O. E. OSUAGWU

    2013-06-01

    Jun 1, 2013 ... especially true in observational studies .... Simple linear regression and multiple ... The simple linear ..... Grubbs,F.E (1950): Sample Criteria for Testing Outlying observations: Annals of ... In experimental design, the Relative.

  19. 231 Using Multiple Regression Analysis in Modelling the Role of ...

    African Journals Online (AJOL)

    User

    of Internal Revenue, Tourism Bureau and hotel records. The multiple regression .... additional guest facilities such as restaurant, a swimming pool or child care and social function ... and provide good quality service to the public. Conclusion.

  20. Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression.

    Science.gov (United States)

    Song, Chao; Kwan, Mei-Po; Zhu, Jiping

    2017-04-08

    An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.

  1. Modeling AGN outbursts from supermassive black hole binaries

    Directory of Open Access Journals (Sweden)

    Tanaka T.

    2012-12-01

    Full Text Available When galaxies merge to assemble more massive galaxies, their nuclear supermassive black holes (SMBHs should form bound binaries. As these interact with their stellar and gaseous environments, they will become increasingly compact, culminating in inspiral and coalescence through the emission of gravitational radiation. Because galaxy mergers and interactions are also thought to fuel star formation and nuclear black hole activity, it is plausible that such binaries would lie in gas-rich environments and power active galactic nuclei (AGN. The primary difference is that these binaries have gravitational potentials that vary – through their orbital motion as well as their orbital evolution – on humanly tractable timescales, and are thus excellent candidates to give rise to coherent AGN variability in the form of outbursts and recurrent transients. Although such electromagnetic signatures would be ideally observed concomitantly with the binary’s gravitational-wave signatures, they are also likely to be discovered serendipitously in wide-field, high-cadence surveys; some may even be confused for stellar tidal disruption events. I discuss several types of possible “smoking gun” AGN signatures caused by the peculiar geometry predicted for accretion disks around SMBH binaries.

  2. Direct modeling of regression effects for transition probabilities in the progressive illness-death model

    DEFF Research Database (Denmark)

    Azarang, Leyla; Scheike, Thomas; de Uña-Álvarez, Jacobo

    2017-01-01

    In this work, we present direct regression analysis for the transition probabilities in the possibly non-Markov progressive illness–death model. The method is based on binomial regression, where the response is the indicator of the occupancy for the given state along time. Randomly weighted score...

  3. A logistic regression model for Ghana National Health Insurance claims

    Directory of Open Access Journals (Sweden)

    Samuel Antwi

    2013-07-01

    Full Text Available In August 2003, the Ghanaian Government made history by implementing the first National Health Insurance System (NHIS in Sub-Saharan Africa. Within three years, over half of the country’s population had voluntarily enrolled into the National Health Insurance Scheme. This study had three objectives: 1 To estimate the risk factors that influences the Ghana national health insurance claims. 2 To estimate the magnitude of each of the risk factors in relation to the Ghana national health insurance claims. In this work, data was collected from the policyholders of the Ghana National Health Insurance Scheme with the help of the National Health Insurance database and the patients’ attendance register of the Koforidua Regional Hospital, from 1st January to 31st December 2011. Quantitative analysis was done using the generalized linear regression (GLR models. The results indicate that risk factors such as sex, age, marital status, distance and length of stay at the hospital were important predictors of health insurance claims. However, it was found that the risk factors; health status, billed charges and income level are not good predictors of national health insurance claim. The outcome of the study shows that sex, age, marital status, distance and length of stay at the hospital are statistically significant in the determination of the Ghana National health insurance premiums since they considerably influence claims. We recommended, among other things that, the National Health Insurance Authority should facilitate the institutionalization of the collection of appropriate data on a continuous basis to help in the determination of future premiums.

  4. Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression.

    Science.gov (United States)

    Candel, Math J J M; Van Breukelen, Gerard J P

    2010-06-30

    Adjustments of sample size formulas are given for varying cluster sizes in cluster randomized trials with a binary outcome when testing the treatment effect with mixed effects logistic regression using second-order penalized quasi-likelihood estimation (PQL). Starting from first-order marginal quasi-likelihood (MQL) estimation of the treatment effect, the asymptotic relative efficiency of unequal versus equal cluster sizes is derived. A Monte Carlo simulation study shows this asymptotic relative efficiency to be rather accurate for realistic sample sizes, when employing second-order PQL. An approximate, simpler formula is presented to estimate the efficiency loss due to varying cluster sizes when planning a trial. In many cases sampling 14 per cent more clusters is sufficient to repair the efficiency loss due to varying cluster sizes. Since current closed-form formulas for sample size calculation are based on first-order MQL, planning a trial also requires a conversion factor to obtain the variance of the second-order PQL estimator. In a second Monte Carlo study, this conversion factor turned out to be 1.25 at most. (c) 2010 John Wiley & Sons, Ltd.

  5. A model of two-stream non-radial accretion for binary X-ray pulsars

    International Nuclear Information System (INIS)

    Lipunov, V.M.

    1982-01-01

    The general case of non-radial accretion is assumed to occur in real binary systems containing X-ray pulsars. The structure and the stability of the magnetosphere, the interaction between the magnetosphere and accreted matter, as well as evolution of neutron star in close binary system are examined within the framework of the two-stream model of nonradial accretion onto a magnetized neutron star. Observable parameters of X-ray pulsars are explained in terms of the model considered. (orig.)

  6. [Logistic regression model of noninvasive prediction for portal hypertensive gastropathy in patients with hepatitis B associated cirrhosis].

    Science.gov (United States)

    Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo

    2015-05-12

    To explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and establish a Logistic regression model of noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG while the independent variables were screened by binary Logistic analysis. Multivariate Logistic regression was used for further analysis of significant noninvasive independent variables. Logistic regression model was established and odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of model were evaluated by the curve of receiver operating characteristic (ROC). According to univariate Logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major occurring factors for PHG. The established regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4. The accuracy of model for PHG was 79.1% with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG and the noninvasive predicted Logistic regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4.

  7. An epidemiological survey on road traffic crashes in Iran: application of the two logistic regression models.

    Science.gov (United States)

    Bakhtiyari, Mahmood; Mehmandar, Mohammad Reza; Mirbagheri, Babak; Hariri, Gholam Reza; Delpisheh, Ali; Soori, Hamid

    2014-01-01

    Risk factors of human-related traffic crashes are the most important and preventable challenges for community health due to their noteworthy burden in developing countries in particular. The present study aims to investigate the role of human risk factors of road traffic crashes in Iran. Through a cross-sectional study using the COM 114 data collection forms, the police records of almost 600,000 crashes occurred in 2010 are investigated. The binary logistic regression and proportional odds regression models are used. The odds ratio for each risk factor is calculated. These models are adjusted for known confounding factors including age, sex and driving time. The traffic crash reports of 537,688 men (90.8%) and 54,480 women (9.2%) are analysed. The mean age is 34.1 ± 14 years. Not maintaining eyes on the road (53.7%) and losing control of the vehicle (21.4%) are the main causes of drivers' deaths in traffic crashes within cities. Not maintaining eyes on the road is also the most frequent human risk factor for road traffic crashes out of cities. Sudden lane excursion (OR = 9.9, 95% CI: 8.2-11.9) and seat belt non-compliance (OR = 8.7, CI: 6.7-10.1), exceeding authorised speed (OR = 17.9, CI: 12.7-25.1) and exceeding safe speed (OR = 9.7, CI: 7.2-13.2) are the most significant human risk factors for traffic crashes in Iran. The high mortality rate of 39 people for every 100,000 population emphasises on the importance of traffic crashes in Iran. Considering the important role of human risk factors in traffic crashes, struggling efforts are required to control dangerous driving behaviours such as exceeding speed, illegal overtaking and not maintaining eyes on the road.

  8. A generalized additive regression model for survival times

    DEFF Research Database (Denmark)

    Scheike, Thomas H.

    2001-01-01

    Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models......Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models...

  9. Extending the linear model with R generalized linear, mixed effects and nonparametric regression models

    CERN Document Server

    Faraway, Julian J

    2005-01-01

    Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway''s critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author''s treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...

  10. A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs

    Science.gov (United States)

    Karabatsos, George; Walker, Stephen G.

    2013-01-01

    The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…

  11. Parametric vs. Nonparametric Regression Modelling within Clinical Decision Support

    Czech Academy of Sciences Publication Activity Database

    Kalina, Jan; Zvárová, Jana

    2017-01-01

    Roč. 5, č. 1 (2017), s. 21-27 ISSN 1805-8698 R&D Projects: GA ČR GA17-01251S Institutional support: RVO:67985807 Keywords : decision support systems * decision rules * statistical analysis * nonparametric regression Subject RIV: IN - Informatics, Computer Science OBOR OECD: Statistics and probability

  12. Logistic regression modelling: procedures and pitfalls in developing and interpreting prediction models

    Directory of Open Access Journals (Sweden)

    Nataša Šarlija

    2017-01-01

    Full Text Available This study sheds light on the most common issues related to applying logistic regression in prediction models for company growth. The purpose of the paper is 1 to provide a detailed demonstration of the steps in developing a growth prediction model based on logistic regression analysis, 2 to discuss common pitfalls and methodological errors in developing a model, and 3 to provide solutions and possible ways of overcoming these issues. Special attention is devoted to the question of satisfying logistic regression assumptions, selecting and defining dependent and independent variables, using classification tables and ROC curves, for reporting model strength, interpreting odds ratios as effect measures and evaluating performance of the prediction model. Development of a logistic regression model in this paper focuses on a prediction model of company growth. The analysis is based on predominantly financial data from a sample of 1471 small and medium-sized Croatian companies active between 2009 and 2014. The financial data is presented in the form of financial ratios divided into nine main groups depicting following areas of business: liquidity, leverage, activity, profitability, research and development, investing and export. The growth prediction model indicates aspects of a business critical for achieving high growth. In that respect, the contribution of this paper is twofold. First, methodological, in terms of pointing out pitfalls and potential solutions in logistic regression modelling, and secondly, theoretical, in terms of identifying factors responsible for high growth of small and medium-sized companies.

  13. A template bank to search for gravitational waves from inspiralling compact binaries: I. Physical models

    International Nuclear Information System (INIS)

    Babak, S; Balasubramanian, R; Churches, D; Cokelaer, T; Sathyaprakash, B S

    2006-01-01

    Gravitational waves from coalescing compact binaries are searched for using the matched filtering technique. As the model waveform depends on a number of parameters, it is necessary to filter the data through a template bank covering the astrophysically interesting region of the parameter space. The choice of templates is defined by the maximum allowed drop in signal-to-noise ratio due to the discreteness of the template bank. In this paper we describe the template-bank algorithm that was used in the analysis of data from the Laser Interferometer Gravitational Wave Observatory (LIGO) and GEO 600 detectors to search for signals from binaries consisting of non-spinning compact objects. Using Monte Carlo simulations, we study the efficiency of the bank and show that its performance is satisfactory for the design sensitivity curves of ground-based interferometric gravitational wave detectors GEO 600, initial LIGO, advanced LIGO and Virgo. The bank is efficient in searching for various compact binaries such as binary primordial black holes, binary neutron stars, binary black holes, as well as a mixed binary consisting of a non-spinning black hole and a neutron star

  14. Modelling subject-specific childhood growth using linear mixed-effect models with cubic regression splines.

    Science.gov (United States)

    Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William

    2016-01-01

    Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19

  15. Comparison of robustness to outliers between robust poisson models and log-binomial models when estimating relative risks for common binary outcomes: a simulation study.

    Science.gov (United States)

    Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P

    2014-06-26

    To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that the log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher order term, the robust Poisson models consistently outperformed the log-binomial models even when the level of contamination is low. The robust Poisson models are more robust (or less sensitive) to outliers compared to the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.

  16. Evaluation of Logistic Regression and Multivariate Adaptive Regression Spline Models for Groundwater Potential Mapping Using R and GIS

    Directory of Open Access Journals (Sweden)

    Soyoung Park

    2017-07-01

    Full Text Available This study mapped and analyzed groundwater potential using two different models, logistic regression (LR and multivariate adaptive regression splines (MARS, and compared the results. A spatial database was constructed for groundwater well data and groundwater influence factors. Groundwater well data with a high potential yield of ≥70 m3/d were extracted, and 859 locations (70% were used for model training, whereas the other 365 locations (30% were used for model validation. We analyzed 16 groundwater influence factors including altitude, slope degree, slope aspect, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport index, distance from drainage, drainage density, lithology, distance from fault, fault density, distance from lineament, lineament density, and land cover. Groundwater potential maps (GPMs were constructed using LR and MARS models and tested using a receiver operating characteristics curve. Based on this analysis, the area under the curve (AUC for the success rate curve of GPMs created using the MARS and LR models was 0.867 and 0.838, and the AUC for the prediction rate curve was 0.836 and 0.801, respectively. This implies that the MARS model is useful and effective for groundwater potential analysis in the study area.

  17. Spatio-kinematic modelling: Testing the link between planetary nebulae and close binaries

    OpenAIRE

    Jones, David; Tyndall, Amy A.; Huckvale, Leo; Prouse, Barnabas; Lloyd, Myfanwy

    2011-01-01

    It is widely believed that central star binarity plays an important role in the formation and evolution of aspherical planetary nebulae, however observational support for this hypothesis is lacking. Here, we present the most recent results of a continuing programme to model the morphologies of all planetary nebulae known to host a close binary central star. Initially, this programme allows us to compare the inclination of the nebular symmetry axis to that of the binary plane, testing the theo...

  18. Physics of Eclipsing Binaries: Modelling in the new era of ultra-high precision photometry

    OpenAIRE

    Pavlovski, K.; Bloemen, S.; Degroote, P.; Conroy, K.; Hambleton, Kelly; Giammarco, J.M.; Pablo, H.; Prša, A.; Tkachenko, A.; Torres, G.

    2013-01-01

    Recent ultra-high precision observations of eclipsing binaries, especially data acquired by the Kepler satellite, have made accurate light curve modelling increasingly challenging but also more rewarding. In this contribution, we discuss low-amplitude signals in light curves that can now be used to derive physical information about eclipsing binaries but that were unaccessible before the Kepler era. A notable example is the detection of Doppler beaming, which leads to an increase in flux when...

  19. Physics of Eclipsing Binaries: Motivation for the New-Age Modeling Suite

    OpenAIRE

    Pavlovski, K.; Prša, A.; Degroote, P.; Conroy, K.; Bloemen, S.; Hambleton, Kelly; Giammarco, J.; Pablo, H.; Tkachenko, A.; Torres, G.

    2013-01-01

    Recent ultra-high precision observations of eclipsing binaries, especially data acquired by the Kepler satellite, have made accurate light curve modelling increasingly challenging but also more rewarding. In this contribution, we discuss low-amplitude signals in light curves that can now be used to derive physical information about eclipsing binaries but that were unaccessible before the Kepler era. A notable example is the detection of Doppler beaming, which leads to an increase in flux when...

  20. Semiparametric Mixtures of Regressions with Single-index for Model Based Clustering

    OpenAIRE

    Xiang, Sijia; Yao, Weixin

    2017-01-01

    In this article, we propose two classes of semiparametric mixture regression models with single-index for model based clustering. Unlike many semiparametric/nonparametric mixture regression models that can only be applied to low dimensional predictors, the new semiparametric models can easily incorporate high dimensional predictors into the nonparametric components. The proposed models are very general, and many of the recently proposed semiparametric/nonparametric mixture regression models a...

  1. Semiparametric nonlinear quantile regression model for financial returns

    Czech Academy of Sciences Publication Activity Database

    Avdulaj, Krenar; Baruník, Jozef

    2017-01-01

    Roč. 21, č. 1 (2017), s. 81-97 ISSN 1081-1826 R&D Projects: GA ČR(CZ) GBP402/12/G097 Institutional support: RVO:67985556 Keywords : copula quantile regression * realized volatility * value-at-risk Subject RIV: AH - Economic s OBOR OECD: Applied Economic s, Econometrics Impact factor: 0.649, year: 2016 http://library.utia.cas.cz/separaty/2017/E/avdulaj-0472346.pdf

  2. A new model for predicting thermodynamic properties of ternary metallic solution from binary components

    International Nuclear Information System (INIS)

    Fang Zheng; Zhang Quanru

    2006-01-01

    A model has been derived to predict thermodynamic properties of ternary metallic systems from those of its three binaries. In the model, the excess Gibbs free energies and the interaction parameter ω 123 for three components of a ternary are expressed as a simple sum of those of the three sub-binaries, and the mole fractions of the components of the ternary are identical with the sub-binaries. This model is greatly simplified compared with the current symmetrical and asymmetrical models. It is able to overcome some shortcomings of the current models, such as the arrangement of the components in the Gibbs triangle, the conversion of mole fractions between ternary and corresponding binaries, and some necessary processes for optimizing the various parameters of these models. Two ternary systems, Mg-Cu-Ni and Cd-Bi-Pb are recalculated to demonstrate the validity and precision of the present model. The calculated results on the Mg-Cu-Ni system are better than those in the literature. New parameters in the Margules equations expressing the excess Gibbs free energies of three binary systems of the Cd-Bi-Pb ternary system are also given

  3. Use of multiple linear regression and logistic regression models to investigate changes in birthweight for term singleton infants in Scotland.

    Science.gov (United States)

    Bonellie, Sandra R

    2012-10-01

    To illustrate the use of regression and logistic regression models to investigate changes over time in size of babies particularly in relation to social deprivation, age of the mother and smoking. Mean birthweight has been found to be increasing in many countries in recent years, but there are still a group of babies who are born with low birthweights. Population-based retrospective cohort study. Multiple linear regression and logistic regression models are used to analyse data on term 'singleton births' from Scottish hospitals between 1994-2003. Mothers who smoke are shown to give birth to lighter babies on average, a difference of approximately 0.57 Standard deviations lower (95% confidence interval. 0.55-0.58) when adjusted for sex and parity. These mothers are also more likely to have babies that are low birthweight (odds ratio 3.46, 95% confidence interval 3.30-3.63) compared with non-smokers. Low birthweight is 30% more likely where the mother lives in the most deprived areas compared with the least deprived, (odds ratio 1.30, 95% confidence interval 1.21-1.40). Smoking during pregnancy is shown to have a detrimental effect on the size of infants at birth. This effect explains some, though not all, of the observed socioeconomic birthweight. It also explains much of the observed birthweight differences by the age of the mother.   Identifying mothers at greater risk of having a low birthweight baby as important implications for the care and advice this group receives. © 2012 Blackwell Publishing Ltd.

  4. Multiple logistic regression model of signalling practices of drivers on urban highways

    Science.gov (United States)

    Puan, Othman Che; Ibrahim, Muttaka Na'iya; Zakaria, Rozana

    2015-05-01

    Giving signal is a way of informing other road users, especially to the conflicting drivers, the intention of a driver to change his/her movement course. Other users are exposed to hazard situation and risks of accident if the driver who changes his/her course failed to give signal as required. This paper describes the application of logistic regression model for the analysis of driver's signalling practices on multilane highways based on possible factors affecting driver's decision such as driver's gender, vehicle's type, vehicle's speed and traffic flow intensity. Data pertaining to the analysis of such factors were collected manually. More than 2000 drivers who have performed a lane changing manoeuvre while driving on two sections of multilane highways were observed. Finding from the study shows that relatively a large proportion of drivers failed to give any signals when changing lane. The result of the analysis indicates that although the proportion of the drivers who failed to provide signal prior to lane changing manoeuvre is high, the degree of compliances of the female drivers is better than the male drivers. A binary logistic model was developed to represent the probability of a driver to provide signal indication prior to lane changing manoeuvre. The model indicates that driver's gender, type of vehicle's driven, speed of vehicle and traffic volume influence the driver's decision to provide a signal indication prior to a lane changing manoeuvre on a multilane urban highway. In terms of types of vehicles driven, about 97% of motorcyclists failed to comply with the signal indication requirement. The proportion of non-compliance drivers under stable traffic flow conditions is much higher than when the flow is relatively heavy. This is consistent with the data which indicates a high degree of non-compliances when the average speed of the traffic stream is relatively high.

  5. Photometric Analysis and Modeling of Five Mass-Transferring Binary Systems

    Science.gov (United States)

    Geist, Emily; Beaky, Matthew; Jamison, Kate

    2018-01-01

    In overcontact eclipsing binary systems, both stellar components have overfilled their Roche lobes, resulting in a dumbbell-shaped shared envelope. Mass transfer is common in overcontact binaries, which can be observed as a slow change on the rotation period of the system.We studied five overcontact eclipsing binary systems with evidence of period change, and thus likely mass transfer between the components, identified by Nelson (2014): V0579 Lyr, KN Vul, V0406 Lyr, V2240 Cyg, and MS Her. We used the 31-inch NURO telescope at Lowell Observatory in Flagstaff, Arizona to obtain images in B,V,R, and I filters for V0579 Lyr, and the 16-inch Meade LX200GPS telescope with attached SBIG ST-8XME CCD camera at Juniata College in Huntingdon, Pennsylvania to image KN Vul, V0406 Lyr, V2240 Cyg, and MS Her, also in B,V,R, and I.After data reduction, we created light curves for each of the systems and modeled the eclipsing binaries using the BinaryMaker3 and PHOEBE programs to determine their fundamental physical parameters for the first time. Complete light curves and preliminary models for each of these neglected eclipsing binary systems will be presented.

  6. Waveform model for an eccentric binary black hole based on the effective-one-body-numerical-relativity formalism

    Science.gov (United States)

    Cao, Zhoujian; Han, Wen-Biao

    2017-08-01

    Binary black hole systems are among the most important sources for gravitational wave detection. They are also good objects for theoretical research for general relativity. A gravitational waveform template is important to data analysis. An effective-one-body-numerical-relativity (EOBNR) model has played an essential role in the LIGO data analysis. For future space-based gravitational wave detection, many binary systems will admit a somewhat orbit eccentricity. At the same time, the eccentric binary is also an interesting topic for theoretical study in general relativity. In this paper, we construct the first eccentric binary waveform model based on an effective-one-body-numerical-relativity framework. Our basic assumption in the model construction is that the involved eccentricity is small. We have compared our eccentric EOBNR model to the circular one used in the LIGO data analysis. We have also tested our eccentric EOBNR model against another recently proposed eccentric binary waveform model; against numerical relativity simulation results; and against perturbation approximation results for extreme mass ratio binary systems. Compared to numerical relativity simulations with an eccentricity as large as about 0.2, the overlap factor for our eccentric EOBNR model is better than 0.98 for all tested cases, including spinless binary and spinning binary, equal mass binary, and unequal mass binary. Hopefully, our eccentric model can be the starting point to develop a faithful template for future space-based gravitational wave detectors.

  7. First Higher-Multipole Model of Gravitational Waves from Spinning and Coalescing Black-Hole Binaries.

    Science.gov (United States)

    London, Lionel; Khan, Sebastian; Fauchon-Jones, Edward; García, Cecilio; Hannam, Mark; Husa, Sascha; Jiménez-Forteza, Xisco; Kalaghatgi, Chinmay; Ohme, Frank; Pannarale, Francesco

    2018-04-20

    Gravitational-wave observations of binary black holes currently rely on theoretical models that predict the dominant multipoles (ℓ=2,|m|=2) of the radiation during inspiral, merger, and ringdown. We introduce a simple method to include the subdominant multipoles to binary black hole gravitational waveforms, given a frequency-domain model for the dominant multipoles. The amplitude and phase of the original model are appropriately stretched and rescaled using post-Newtonian results (for the inspiral), perturbation theory (for the ringdown), and a smooth transition between the two. No additional tuning to numerical-relativity simulations is required. We apply a variant of this method to the nonprecessing PhenomD model. The result, PhenomHM, constitutes the first higher-multipole model of spinning and coalescing black-hole binaries, and currently includes the (ℓ,|m|)=(2,2),(3,3),(4,4),(2,1),(3,2),(4,3) radiative moments. Comparisons with numerical-relativity waveforms demonstrate that PhenomHM is more accurate than dominant-multipole-only models for all binary configurations, and typically improves the measurement of binary properties.

  8. First Higher-Multipole Model of Gravitational Waves from Spinning and Coalescing Black-Hole Binaries

    Science.gov (United States)

    London, Lionel; Khan, Sebastian; Fauchon-Jones, Edward; García, Cecilio; Hannam, Mark; Husa, Sascha; Jiménez-Forteza, Xisco; Kalaghatgi, Chinmay; Ohme, Frank; Pannarale, Francesco

    2018-04-01

    Gravitational-wave observations of binary black holes currently rely on theoretical models that predict the dominant multipoles (ℓ=2 ,|m |=2 ) of the radiation during inspiral, merger, and ringdown. We introduce a simple method to include the subdominant multipoles to binary black hole gravitational waveforms, given a frequency-domain model for the dominant multipoles. The amplitude and phase of the original model are appropriately stretched and rescaled using post-Newtonian results (for the inspiral), perturbation theory (for the ringdown), and a smooth transition between the two. No additional tuning to numerical-relativity simulations is required. We apply a variant of this method to the nonprecessing PhenomD model. The result, PhenomHM, constitutes the first higher-multipole model of spinning and coalescing black-hole binaries, and currently includes the (ℓ,|m |)=(2 ,2 ),(3 ,3 ),(4 ,4 ),(2 ,1 ),(3 ,2 ),(4 ,3 ) radiative moments. Comparisons with numerical-relativity waveforms demonstrate that PhenomHM is more accurate than dominant-multipole-only models for all binary configurations, and typically improves the measurement of binary properties.

  9. Beta Regression Finite Mixture Models of Polarization and Priming

    Science.gov (United States)

    Smithson, Michael; Merkle, Edgar C.; Verkuilen, Jay

    2011-01-01

    This paper describes the application of finite-mixture general linear models based on the beta distribution to modeling response styles, polarization, anchoring, and priming effects in probability judgments. These models, in turn, enhance our capacity for explicitly testing models and theories regarding the aforementioned phenomena. The mixture…

  10. PREDICTION OF THE MIXING ENTHALPIES OF BINARY LIQUID ALLOYS BY MOLECULAR INTERACTION VOLUME MODEL

    Institute of Scientific and Technical Information of China (English)

    H.W.Yang; D.P.Tao; Z.H.Zhou

    2008-01-01

    The mixing enthalpies of 23 binary liquid alloys are calculated by molecular interaction volume model (MIVM), which is a two-parameter model with the partial molar infinite dilute mixing enthalpies. The predicted values are in agreement with the experimental data and then indicate that the model is reliable and convenient.

  11. A globally accurate theory for a class of binary mixture models

    Science.gov (United States)

    Dickman, Adriana G.; Stell, G.

    The self-consistent Ornstein-Zernike approximation results for the 3D Ising model are used to obtain phase diagrams for binary mixtures described by decorated models, yielding the plait point, binodals, and closed-loop coexistence curves for the models proposed by Widom, Clark, Neece, and Wheeler. The results are in good agreement with series expansions and experiments.

  12. Modeling the dynamics of urban growth using multinomial logistic regression: a case study of Jiayu County, Hubei Province, China

    Science.gov (United States)

    Nong, Yu; Du, Qingyun; Wang, Kun; Miao, Lei; Zhang, Weiwei

    2008-10-01

    Urban growth modeling, one of the most important aspects of land use and land cover change study, has attracted substantial attention because it helps to comprehend the mechanisms of land use change thus helps relevant policies made. This study applied multinomial logistic regression to model urban growth in the Jiayu county of Hubei province, China to discover the relationship between urban growth and the driving forces of which biophysical and social-economic factors are selected as independent variables. This type of regression is similar to binary logistic regression, but it is more general because the dependent variable is not restricted to two categories, as those previous studies did. The multinomial one can simulate the process of multiple land use competition between urban land, bare land, cultivated land and orchard land. Taking the land use type of Urban as reference category, parameters could be estimated with odds ratio. A probability map is generated from the model to predict where urban growth will occur as a result of the computation.

  13. A generalized exponential time series regression model for electricity prices

    DEFF Research Database (Denmark)

    Haldrup, Niels; Knapik, Oskar; Proietti, Tomasso

    on the estimated model, the best linear predictor is constructed. Our modeling approach provides good fit within sample and outperforms competing benchmark predictors in terms of forecasting accuracy. We also find that building separate models for each hour of the day and averaging the forecasts is a better...

  14. Forecast Model of Urban Stagnant Water Based on Logistic Regression

    Directory of Open Access Journals (Sweden)

    Liu Pan

    2017-01-01

    Full Text Available With the development of information technology, the construction of water resource system has been gradually carried out. In the background of big data, the work of water information needs to carry out the process of quantitative to qualitative change. Analyzing the correlation of data and exploring the deep value of data which are the key of water information’s research. On the basis of the research on the water big data and the traditional data warehouse architecture, we try to find out the connection of different data source. According to the temporal and spatial correlation of stagnant water and rainfall, we use spatial interpolation to integrate data of stagnant water and rainfall which are from different data source and different sensors, then use logistic regression to find out the relationship between them.

  15. Parental Vaccine Acceptance: A Logistic Regression Model Using Previsit Decisions.

    Science.gov (United States)

    Lee, Sara; Riley-Behringer, Maureen; Rose, Jeanmarie C; Meropol, Sharon B; Lazebnik, Rina

    2017-07-01

    This study explores how parents' intentions regarding vaccination prior to their children's visit were associated with actual vaccine acceptance. A convenience sample of parents accompanying 6-week-old to 17-year-old children completed a written survey at 2 pediatric practices. Using hierarchical logistic regression, for hospital-based participants (n = 216), vaccine refusal history ( P < .01) and vaccine decision made before the visit ( P < .05) explained 87% of vaccine refusals. In community-based participants (n = 100), vaccine refusal history ( P < .01) explained 81% of refusals. Over 1 in 5 parents changed their minds about vaccination during the visit. Thirty parents who were previous vaccine refusers accepted current vaccines, and 37 who had intended not to vaccinate choose vaccination. Twenty-nine parents without a refusal history declined vaccines, and 32 who did not intend to refuse before the visit declined vaccination. Future research should identify key factors to nudge parent decision making in favor of vaccination.

  16. A wide low-mass binary model for the origin of axially symmetric non-thermal radio sources

    International Nuclear Information System (INIS)

    Kool, M. de; Heuvel, E.P.J. van den

    1985-01-01

    An accreting binary model has been proposed by recent workers to account for the origin of the axially symmetric non-thermal radio sources. The authors show that the only type of binary system that can produce the observed structural properties, is a relatively wide neutron star binary, in which the companion of the neutron star is a low-mass giant. Binaries of this type are expected to resemble closely the eight brightest galactic bulge X-ray sources as well as the progenitors of the two wide radio pulsar binaries. (U.K.)

  17. Better Autologistic Regression

    Directory of Open Access Journals (Sweden)

    Mark A. Wolters

    2017-11-01

    Full Text Available Autologistic regression is an important probability model for dichotomous random variables observed along with covariate information. It has been used in various fields for analyzing binary data possessing spatial or network structure. The model can be viewed as an extension of the autologistic model (also known as the Ising model, quadratic exponential binary distribution, or Boltzmann machine to include covariates. It can also be viewed as an extension of logistic regression to handle responses that are not independent. Not all authors use exactly the same form of the autologistic regression model. Variations of the model differ in two respects. First, the variable coding—the two numbers used to represent the two possible states of the variables—might differ. Common coding choices are (zero, one and (minus one, plus one. Second, the model might appear in either of two algebraic forms: a standard form, or a recently proposed centered form. Little attention has been paid to the effect of these differences, and the literature shows ambiguity about their importance. It is shown here that changes to either coding or centering in fact produce distinct, non-nested probability models. Theoretical results, numerical studies, and analysis of an ecological data set all show that the differences among the models can be large and practically significant. Understanding the nature of the differences and making appropriate modeling choices can lead to significantly improved autologistic regression analyses. The results strongly suggest that the standard model with plus/minus coding, which we call the symmetric autologistic model, is the most natural choice among the autologistic variants.

  18. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    Directory of Open Access Journals (Sweden)

    Drzewiecki Wojciech

    2016-12-01

    Full Text Available In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques.

  19. Model for competitive binary and ternary ion-molecule reactions

    International Nuclear Information System (INIS)

    Herbst, E.

    1985-01-01

    A mechanism by which competitive binary and ternary ion-molecule reactions can occur is proposed. Calculations are undertaken for the specific system CH3(+) + NH3 + He which has been studied in the laboratory by Smith and Adams (1978), and the potential surface of which has been studied theoretically by Nobes and Radom (1983). It is shown that a potential-energy barrier in the exit channel prevents the rapid dissociation of collision complexes with large amounts of angular momentum and thereby allows collisional stabilization of the complexes. The calculated ternary-reaction rate coefficient is in good agreement with the experimental value, but a plot of the effective two-body rate coefficient of the ternary channel vs helium density does not quite show the observed saturation. 21 references

  20. Additive Intensity Regression Models in Corporate Default Analysis

    DEFF Research Database (Denmark)

    Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo

    2013-01-01

    We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of mo...

  1. Misspecified poisson regression models for large-scale registry data

    DEFF Research Database (Denmark)

    Grøn, Randi; Gerds, Thomas A.; Andersen, Per K.

    2016-01-01

    working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods...

  2. Logistic regression model for detecting radon prone areas in Ireland.

    Science.gov (United States)

    Elío, J; Crowley, Q; Scanlon, R; Hodgson, J; Long, S

    2017-12-01

    A new high spatial resolution radon risk map of Ireland has been developed, based on a combination of indoor radon measurements (n=31,910) and relevant geological information (i.e. Bedrock Geology, Quaternary Geology, soil permeability and aquifer type). Logistic regression was used to predict the probability of having an indoor radon concentration above the national reference level of 200Bqm -3 in Ireland. The four geological datasets evaluated were found to be statistically significant, and, based on combinations of these four variables, the predicted probabilities ranged from 0.57% to 75.5%. Results show that the Republic of Ireland may be divided in three main radon risk categories: High (HR), Medium (MR) and Low (LR). The probability of having an indoor radon concentration above 200Bqm -3 in each area was found to be 19%, 8% and 3%; respectively. In the Republic of Ireland, the population affected by radon concentrations above 200Bqm -3 is estimated at ca. 460k (about 10% of the total population). Of these, 57% (265k), 35% (160k) and 8% (35k) are in High, Medium and Low Risk Areas, respectively. Our results provide a high spatial resolution utility which permit customised radon-awareness information to be targeted at specific geographic areas. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

    Science.gov (United States)

    Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

    2016-03-01

    In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Predictors of the number of under-five malnourished children in Bangladesh: application of the generalized poisson regression model.

    Science.gov (United States)

    Islam, Mohammad Mafijul; Alam, Morshed; Tariquzaman, Md; Kabir, Mohammad Alamgir; Pervin, Rokhsona; Begum, Munni; Khan, Md Mobarak Hossain

    2013-01-08

    Malnutrition is one of the principal causes of child mortality in developing countries including Bangladesh. According to our knowledge, most of the available studies, that addressed the issue of malnutrition among under-five children, considered the categorical (dichotomous/polychotomous) outcome variables and applied logistic regression (binary/multinomial) to find their predictors. In this study malnutrition variable (i.e. outcome) is defined as the number of under-five malnourished children in a family, which is a non-negative count variable. The purposes of the study are (i) to demonstrate the applicability of the generalized Poisson regression (GPR) model as an alternative of other statistical methods and (ii) to find some predictors of this outcome variable. The data is extracted from the Bangladesh Demographic and Health Survey (BDHS) 2007. Briefly, this survey employs a nationally representative sample which is based on a two-stage stratified sample of households. A total of 4,460 under-five children is analysed using various statistical techniques namely Chi-square test and GPR model. The GPR model (as compared to the standard Poisson regression and negative Binomial regression) is found to be justified to study the above-mentioned outcome variable because of its under-dispersion (variance variable namely mother's education, father's education, wealth index, sanitation status, source of drinking water, and total number of children ever born to a woman. Consistencies of our findings in light of many other studies suggest that the GPR model is an ideal alternative of other statistical models to analyse the number of under-five malnourished children in a family. Strategies based on significant predictors may improve the nutritional status of children in Bangladesh.

  5. Modeling diffusion coefficients in binary mixtures of polar and non-polar compounds

    DEFF Research Database (Denmark)

    Medvedev, Oleg; Shapiro, Alexander

    2005-01-01

    The theory of transport coefficients in liquids, developed previously, is tested on a description of the diffusion coefficients in binary polar/non-polar mixtures, by applying advanced thermodynamic models. Comparison to a large set of experimental data shows good performance of the model. Only f...

  6. Logistic Regression Modeling of Diminishing Manufacturing Sources for Integrated Circuits

    National Research Council Canada - National Science Library

    Gravier, Michael

    1999-01-01

    .... This thesis draws on available data from the electronics integrated circuit industry to attempt to assess whether statistical modeling offers a viable method for predicting the presence of DMSMS...

  7. Data to support "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations & Biological Condition"

    Data.gov (United States)

    U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...

  8. Martingale Regressions for a Continuous Time Model of Exchange Rates

    OpenAIRE

    Guo, Zi-Yi

    2017-01-01

    One of the daunting problems in international finance is the weak explanatory power of existing theories of the nominal exchange rates, the so-called “foreign exchange rate determination puzzle”. We propose a continuous-time model to study the impact of order flow on foreign exchange rates. The model is estimated by a newly developed econometric tool based on a time-change sampling from calendar to volatility time. The estimation results indicate that the effect of order flow on exchange rate...

  9. Focused information criterion and model averaging based on weighted composite quantile regression

    KAUST Repository

    Xu, Ganggang; Wang, Suojin; Huang, Jianhua Z.

    2013-01-01

    We study the focused information criterion and frequentist model averaging and their application to post-model-selection inference for weighted composite quantile regression (WCQR) in the context of the additive partial linear models. With the non

  10. Numerical simulation of binary collisions using a modified surface tension model with particle method

    International Nuclear Information System (INIS)

    Sun Zhongguo; Xi Guang; Chen Xi

    2009-01-01

    The binary collision of liquid droplets is of both practical importance and fundamental value in computational fluid mechanics. We present a modified surface tension model within the moving particle semi-implicit (MPS) method, and carry out two-dimensional simulations to investigate the mechanisms of coalescence and separation of the droplets during binary collision. The modified surface tension model improves accuracy and convergence. A mechanism map is established for various possible deformation pathways encountered during binary collision, as the impact speed is varied; a new pathway is reported when the collision speed is critical. In addition, eccentric collisions are simulated and the effect of the rotation of coalesced particle is explored. The results qualitatively agree with experiments and the numerical protocol may find applications in studying free surface flows and interface deformation

  11. Interfacing modeling suite Physics Of Eclipsing Binaries 2.0 with a Virtual Reality Platform

    Science.gov (United States)

    Harriett, Edward; Conroy, Kyle; Prša, Andrej; Klassner, Frank

    2018-01-01

    To explore alternate methods for modeling eclipsing binary stars, we extrapolate upon PHOEBE’s (PHysics Of Eclipsing BinariEs) capabilities in a virtual reality (VR) environment to create an immersive and interactive experience for users. The application used is Vizard, a python-scripted VR development platform for environments such as Cave Automatic Virtual Environment (CAVE) and other off-the-shelf VR headsets. Vizard allows the freedom for all modeling to be precompiled without compromising functionality or usage on its part. The system requires five arguments to be precomputed using PHOEBE’s python front-end: the effective temperature, flux, relative intensity, vertex coordinates, and orbits; the user can opt to implement other features from PHOEBE to be accessed within the simulation as well. Here we present the method for making the data observables accessible in real time. An Occulus Rift will be available for a live showcase of various cases of VR rendering of PHOEBE binary systems including detached and contact binary stars.

  12. Cox's regression model for dynamics of grouped unemployment data

    Czech Academy of Sciences Publication Activity Database

    Volf, Petr

    2003-01-01

    Roč. 10, č. 19 (2003), s. 151-162 ISSN 1212-074X R&D Projects: GA ČR GA402/01/0539 Institutional research plan: CEZ:AV0Z1075907 Keywords : mathematical statistics * survival analysis * Cox's model Subject RIV: BB - Applied Statistics, Operational Research

  13. Multiple Linear Regression Model for Estimating the Price of a ...

    African Journals Online (AJOL)

    Ghana Mining Journal ... In the modeling, the Ordinary Least Squares (OLS) normality assumption which could introduce errors in the statistical analyses was dealt with by log transformation of the data, ensuring the data is normally ... The resultant MLRM is: Ŷi MLRM = (X'X)-1X'Y(xi') where X is the sample data matrix.

  14. Inflation, Forecast Intervals and Long Memory Regression Models

    NARCIS (Netherlands)

    C.S. Bos (Charles); Ph.H.B.F. Franses (Philip Hans); M. Ooms (Marius)

    2001-01-01

    textabstractWe examine recursive out-of-sample forecasting of monthly postwar U.S. core inflation and log price levels. We use the autoregressive fractionally integrated moving average model with explanatory variables (ARFIMAX). Our analysis suggests a significant explanatory power of leading

  15. Inflation, Forecast Intervals and Long Memory Regression Models

    NARCIS (Netherlands)

    Ooms, M.; Bos, C.S.; Franses, P.H.

    2003-01-01

    We examine recursive out-of-sample forecasting of monthly postwar US core inflation and log price levels. We use the autoregressive fractionally integrated moving average model with explanatory variables (ARFIMAX). Our analysis suggests a significant explanatory power of leading indicators

  16. Data-driven modelling of LTI systems using symbolic regression

    NARCIS (Netherlands)

    Khandelwal, D.; Toth, R.; Van den Hof, P.M.J.

    2017-01-01

    The aim of this project is to automate the task of data-driven identification of dynamical systems. The underlying goal is to develop an identification tool that models a physical system without distinguishing between classes of systems such as linear, nonlinear or possibly even hybrid systems. Such

  17. Nonparametric Estimation of Regression Parameters in Measurement Error Models

    Czech Academy of Sciences Publication Activity Database

    Ehsanes Saleh, A.K.M.D.; Picek, J.; Kalina, Jan

    2009-01-01

    Roč. 67, č. 2 (2009), s. 177-200 ISSN 0026-1424 Grant - others:GA AV ČR(CZ) IAA101120801; GA MŠk(CZ) LC06024 Institutional research plan: CEZ:AV0Z10300504 Keywords : asymptotic relative efficiency(ARE) * asymptotic theory * emaculate mode * Me model * R-estimation * Reliabilty ratio(RR) Subject RIV: BB - Applied Statistics, Operational Research

  18. Binary classification of dyslipidemia from the waist-to-hip ratio and body mass index: a comparison of linear, logistic, and CART models

    Directory of Open Access Journals (Sweden)

    Paccaud Fred

    2004-04-01

    Full Text Available Abstract Background We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. Methods Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the total serum cholesterol to high density lipoprotein cholesterol ratio. The other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i linear regression; (ii logistic classification; (iii regression trees; (iv classification trees (iii and iv are collectively known as "CART". Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. Results Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60–80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models. Conclusions There were no striking differences between either the algebraic (i, ii vs. non-algebraic (iii, iv, or the regression (i, iii vs. classification (ii, iv modeling approaches. Anticipated advantages of the CART vs. simple additive linear and logistic models were less than expected in this particular application with a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions between larger sets of predictor variables.

  19. Financial performance monitoring of the technical efficiency of critical access hospitals: a data envelopment analysis and logistic regression modeling approach.

    Science.gov (United States)

    Wilson, Asa B; Kerr, Bernard J; Bastian, Nathaniel D; Fulton, Lawrence V

    2012-01-01

    From 1980 to 1999, rural designated hospitals closed at a disproportionally high rate. In response to this emergent threat to healthcare access in rural settings, the Balanced Budget Act of 1997 made provisions for the creation of a new rural hospital--the critical access hospital (CAH). The conversion to CAH and the associated cost-based reimbursement scheme significantly slowed the closure rate of rural hospitals. This work investigates which methods can ensure the long-term viability of small hospitals. This article uses a two-step design to focus on a hypothesized relationship between technical efficiency of CAHs and a recently developed set of financial monitors for these entities. The goal is to identify the financial performance measures associated with efficiency. The first step uses data envelopment analysis (DEA) to differentiate efficient from inefficient facilities within a data set of 183 CAHs. Determining DEA efficiency is an a priori categorization of hospitals in the data set as efficient or inefficient. In the second step, DEA efficiency is the categorical dependent variable (efficient = 0, inefficient = 1) in the subsequent binary logistic regression (LR) model. A set of six financial monitors selected from the array of 20 measures were the LR independent variables. We use a binary LR to test the null hypothesis that recently developed CAH financial indicators had no predictive value for categorizing a CAH as efficient or inefficient, (i.e., there is no relationship between DEA efficiency and fiscal performance).

  20. Shaofu Zhuyu Decoction Regresses Endometriotic Lesions in a Rat Model

    Directory of Open Access Journals (Sweden)

    Guanghui Zhu

    2018-01-01

    Full Text Available The current therapies for endometriosis are restricted by various side effects and treatment outcome has been less than satisfactory. Shaofu Zhuyu Decoction (SZD, a classic traditional Chinese medicinal (TCM prescription for dysmenorrhea, has been widely used in clinical practice by TCM doctors to relieve symptoms of endometriosis. The present study aimed to investigate the effects of SZD on a rat model of endometriosis. Forty-eight female Sprague-Dawley rats with regular estrous cycles went through autotransplantation operation to establish endometriosis model. Then 38 rats with successful ectopic implants were randomized into two groups: vehicle- and SZD-treated groups. The latter were administered SZD through oral gavage for 4 weeks. By the end of the treatment period, the volume of the endometriotic lesions was measured, the histopathological properties of the ectopic endometrium were evaluated, and levels of proliferating cell nuclear antigen (PCNA, CD34, and hypoxia inducible factor- (HIF- 1α in the ectopic endometrium were detected with immunohistochemistry. Furthermore, apoptosis was assessed using the terminal deoxynucleotidyl transferase (TdT deoxyuridine 5′-triphosphate (dUTP nick-end labeling (TUNEL assay. In this study, SZD significantly reduced the size of ectopic lesions in rats with endometriosis, inhibited cell proliferation, increased cell apoptosis, and reduced microvessel density and HIF-1α expression. It suggested that SZD could be an effective therapy for the treatment and prevention of endometriosis recurrence.

  1. [Application of detecting and taking overdispersion into account in Poisson regression model].

    Science.gov (United States)

    Bouche, G; Lepage, B; Migeot, V; Ingrand, P

    2009-08-01

    Researchers often use the Poisson regression model to analyze count data. Overdispersion can occur when a Poisson regression model is used, resulting in an underestimation of variance of the regression model parameters. Our objective was to take overdispersion into account and assess its impact with an illustration based on the data of a study investigating the relationship between use of the Internet to seek health information and number of primary care consultations. Three methods, overdispersed Poisson, a robust estimator, and negative binomial regression, were performed to take overdispersion into account in explaining variation in the number (Y) of primary care consultations. We tested overdispersion in the Poisson regression model using the ratio of the sum of Pearson residuals over the number of degrees of freedom (chi(2)/df). We then fitted the three models and compared parameter estimation to the estimations given by Poisson regression model. Variance of the number of primary care consultations (Var[Y]=21.03) was greater than the mean (E[Y]=5.93) and the chi(2)/df ratio was 3.26, which confirmed overdispersion. Standard errors of the parameters varied greatly between the Poisson regression model and the three other regression models. Interpretation of estimates from two variables (using the Internet to seek health information and single parent family) would have changed according to the model retained, with significant levels of 0.06 and 0.002 (Poisson), 0.29 and 0.09 (overdispersed Poisson), 0.29 and 0.13 (use of a robust estimator) and 0.45 and 0.13 (negative binomial) respectively. Different methods exist to solve the problem of underestimating variance in the Poisson regression model when overdispersion is present. The negative binomial regression model seems to be particularly accurate because of its theorical distribution ; in addition this regression is easy to perform with ordinary statistical software packages.

  2. Trojan Binaries

    Science.gov (United States)

    Noll, K. S.

    2017-12-01

    The Jupiter Trojans, in the context of giant planet migration models, can be thought of as an extension of the small body populations found beyond Neptune in the Kuiper Belt. Binaries are a distinctive feature of small body populations in the Kuiper Belt with an especially high fraction apparent among the brightest Cold Classicals. The binary fraction, relative sizes, and separations in the dynamically excited populations (Scattered, Resonant) reflects processes that may have eroded a more abundant initial population. This trend continues in the Centaurs and Trojans where few binaries have been found. We review new evidence including a third resolved Trojan binary and lightcurve studies to understand how the Trojans are related to the small body populations that originated in the outer protoplanetary disk.

  3. Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model

    DEFF Research Database (Denmark)

    Møller, Niels Framroze

    This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its stru....... Further fundamental extensions and advances to more sophisticated theory models, such as those related to dynamics and expectations (in the structural relations) are left for future papers......This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its......, it is demonstrated how other controversial hypotheses such as Rational Expectations can be formulated directly as restrictions on the CVAR-parameters. A simple example of a "Neoclassical synthetic" AS-AD model is also formulated. Finally, the partial- general equilibrium distinction is related to the CVAR as well...

  4. Solving and interpreting binary classification problems in marketing with SVMs

    NARCIS (Netherlands)

    J.C. Bioch (Cor); P.J.F. Groenen (Patrick); G.I. Nalbantov (Georgi)

    2005-01-01

    textabstractMarketing problems often involve inary classification of customers into ``buyers'' versus ``non-buyers'' or ``prefers brand A'' versus ``prefers brand B''. These cases require binary classification models such as logistic regression, linear, and quadratic discriminant analysis. A

  5. Computation of infinite dilute activity coefficients of binary liquid alloys using complex formation model

    Energy Technology Data Exchange (ETDEWEB)

    Awe, O.E., E-mail: draweoe2004@yahoo.com; Oshakuade, O.M.

    2016-04-15

    A new method for calculating Infinite Dilute Activity Coefficients (γ{sup ∞}s) of binary liquid alloys has been developed. This method is basically computing γ{sup ∞}s from experimental thermodynamic integral free energy of mixing data using Complex formation model. The new method was first used to theoretically compute the γ{sup ∞}s of 10 binary alloys whose γ{sup ∞}s have been determined by experiments. The significant agreement between the computed values and the available experimental values served as impetus for applying the new method to 22 selected binary liquid alloys whose γ{sup ∞}s are either nonexistent or incomplete. In order to verify the reliability of the computed γ{sup ∞}s of the 22 selected alloys, we recomputed the γ{sup ∞}s using three other existing methods of computing or estimating γ{sup ∞}s and then used the γ{sup ∞}s obtained from each of the four methods (the new method inclusive) to compute thermodynamic activities of components of each of the binary systems. The computed activities were compared with available experimental activities. It is observed that the results from the method being proposed, in most of the selected alloys, showed better agreement with experimental activity data. Thus, the new method is an alternative and in certain instances, more reliable approach of computing γ{sup ∞}s of binary liquid alloys.

  6. Using the classical linear regression model in analysis of the dependences of conveyor belt life

    Directory of Open Access Journals (Sweden)

    Miriam Andrejiová

    2013-12-01

    Full Text Available The paper deals with the classical linear regression model of the dependence of conveyor belt life on some selected parameters: thickness of paint layer, width and length of the belt, conveyor speed and quantity of transported material. The first part of the article is about regression model design, point and interval estimation of parameters, verification of statistical significance of the model, and about the parameters of the proposed regression model. The second part of the article deals with identification of influential and extreme values that can have an impact on estimation of regression model parameters. The third part focuses on assumptions of the classical regression model, i.e. on verification of independence assumptions, normality and homoscedasticity of residuals.

  7. Fractional Gaussian noise-enhanced information capacity of a nonlinear neuron model with binary signal input

    Science.gov (United States)

    Gao, Feng-Yin; Kang, Yan-Mei; Chen, Xi; Chen, Guanrong

    2018-05-01

    This paper reveals the effect of fractional Gaussian noise with Hurst exponent H ∈(1 /2 ,1 ) on the information capacity of a general nonlinear neuron model with binary signal input. The fGn and its corresponding fractional Brownian motion exhibit long-range, strong-dependent increments. It extends standard Brownian motion to many types of fractional processes found in nature, such as the synaptic noise. In the paper, for the subthreshold binary signal, sufficient conditions are given based on the "forbidden interval" theorem to guarantee the occurrence of stochastic resonance, while for the suprathreshold binary signal, the simulated results show that additive fGn with Hurst exponent H ∈(1 /2 ,1 ) could increase the mutual information or bits count. The investigation indicated that the synaptic noise with the characters of long-range dependence and self-similarity might be the driving factor for the efficient encoding and decoding of the nervous system.

  8. Two levels ARIMAX and regression models for forecasting time series data with calendar variation effects

    Science.gov (United States)

    Suhartono, Lee, Muhammad Hisyam; Prastyo, Dedy Dwi

    2015-12-01

    The aim of this research is to develop a calendar variation model for forecasting retail sales data with the Eid ul-Fitr effect. The proposed model is based on two methods, namely two levels ARIMAX and regression methods. Two levels ARIMAX and regression models are built by using ARIMAX for the first level and regression for the second level. Monthly men's jeans and women's trousers sales in a retail company for the period January 2002 to September 2009 are used as case study. In general, two levels of calendar variation model yields two models, namely the first model to reconstruct the sales pattern that already occurred, and the second model to forecast the effect of increasing sales due to Eid ul-Fitr that affected sales at the same and the previous months. The results show that the proposed two level calendar variation model based on ARIMAX and regression methods yields better forecast compared to the seasonal ARIMA model and Neural Networks.

  9. Optimization of binary breeder reactor. 1. Sodium void reactivity and Doppler effect in a new model

    International Nuclear Information System (INIS)

    Nascimento, J.A. do; Dias, A.F.; Ishiguro, Y.

    1985-01-01

    A model for the Binary Breeder Reactor (BBR) is examined for the inherent safety characteristics, sodium void reactivity and Doppler effect in the beginning of cycle and a hypothetical end of cycle. In addition to the standard fueling mode of the BBR, two others are considered: U 238 /U 233 -alternate fueling, and U 238 /PU-normal fueling of LMFBRs. (Author) [pt

  10. Theoretical model of the density of states of random binary alloys

    International Nuclear Information System (INIS)

    Zekri, N.; Brezini, A.

    1991-09-01

    A theoretical formulation of the density of states for random binary alloys is examined based on a mean field treatment. The present model includes both diagonal and off-diagonal disorder and also short-range order. Extensive results are reported for various concentrations and compared to other calculations. (author). 22 refs, 6 figs

  11. Economic modeling using evolutionary algorithms : the effect of binary encoding of strategies

    NARCIS (Netherlands)

    Waltman, L.R.; Eck, van N.J.; Dekker, Rommert; Kaymak, U.

    2011-01-01

    We are concerned with evolutionary algorithms that are employed for economic modeling purposes. We focus in particular on evolutionary algorithms that use a binary encoding of strategies. These algorithms, commonly referred to as genetic algorithms, are popular in agent-based computational economics

  12. Soot modeling of counterflow diffusion flames of ethylene-based binary mixture fuels

    KAUST Repository

    Wang, Yu; Raj, Abhijeet Dhayal; Chung, Suk-Ho

    2015-01-01

    of ethylene and its binary mixtures with methane, ethane and propane based on the method of moments. The soot model has 36 soot nucleation reactions from 8 PAH molecules including pyrene and larger PAHs. Soot surface growth reactions were based on a modified

  13. A Statistical Model for Misreported Binary Outcomes in Clustered RCTs of Education Interventions

    Science.gov (United States)

    Schochet, Peter Z.

    2013-01-01

    In education randomized control trials (RCTs), the misreporting of student outcome data could lead to biased estimates of average treatment effects (ATEs) and their standard errors. This article discusses a statistical model that adjusts for misreported binary outcomes for two-level, school-based RCTs, where it is assumed that misreporting could…

  14. (Liquid plus liquid) equilibria of binary polymer solutions using a free-volume UNIQUAC-NRF model

    DEFF Research Database (Denmark)

    Radfarnia, H.R.; Ghotbi, C.; Taghikhani, V.

    2006-01-01

    + liquid) equilibria (LLE) for a number of binary polymer solutions at various temperatures. The values for the binary characteristic energy parameters for the proposed model and the FV-UNIQUAC model along with their average relative deviations from the experimental data were reported. It should be stated...

  15. Statistical approach for selection of regression model during validation of bioanalytical method

    Directory of Open Access Journals (Sweden)

    Natalija Nakov

    2014-06-01

    Full Text Available The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during the bionalytical method validation. Given the wide concentration range, frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of the adequate curve fit using nonparametric statistical tests: One sample rank test and Wilcoxon signed rank test for two independent groups of samples. The results obtained with One sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models because slight differences between the error (presented through the relative residuals were obtained. Estimation of the significance of the differences in the RR was achieved using Wilcoxon signed rank test, where linear and quadratic regression models were treated as two independent groups. The application of this simple non-parametric statistical test provides statistical confirmation of the choice of an adequate regression model.

  16. On a Robust MaxEnt Process Regression Model with Sample-Selection

    Directory of Open Access Journals (Sweden)

    Hea-Jung Kim

    2018-04-01

    Full Text Available In a regression analysis, a sample-selection bias arises when a dependent variable is partially observed as a result of the sample selection. This study introduces a Maximum Entropy (MaxEnt process regression model that assumes a MaxEnt prior distribution for its nonparametric regression function and finds that the MaxEnt process regression model includes the well-known Gaussian process regression (GPR model as a special case. Then, this special MaxEnt process regression model, i.e., the GPR model, is generalized to obtain a robust sample-selection Gaussian process regression (RSGPR model that deals with non-normal data in the sample selection. Various properties of the RSGPR model are established, including the stochastic representation, distributional hierarchy, and magnitude of the sample-selection bias. These properties are used in the paper to develop a hierarchical Bayesian methodology to estimate the model. This involves a simple and computationally feasible Markov chain Monte Carlo algorithm that avoids analytical or numerical derivatives of the log-likelihood function of the model. The performance of the RSGPR model in terms of the sample-selection bias correction, robustness to non-normality, and prediction, is demonstrated through results in simulations that attest to its good finite-sample performance.

  17. Two-dimensional model of laser alloying of binary alloy powder with interval of melting temperature

    Science.gov (United States)

    Knyzeva, A. G.; Sharkeev, Yu. P.

    2017-10-01

    The paper contains two-dimensional model of laser beam melting of powders from binary alloy. The model takes into consideration the melting of alloy in some temperature interval between solidus and liquidus temperatures. The external source corresponds to laser beam with energy density distributed by Gauss law. The source moves along the treated surface according to given trajectory. The model allows investigating the temperature distribution and thickness of powder layer depending on technological parameters.

  18. Predicting Antitumor Activity of Peptides by Consensus of Regression Models Trained on a Small Data Sample

    Directory of Open Access Journals (Sweden)

    Ivanka Jerić

    2011-11-01

    Full Text Available Predicting antitumor activity of compounds using regression models trained on a small number of compounds with measured biological activity is an ill-posed inverse problem. Yet, it occurs very often within the academic community. To counteract, up to some extent, overfitting problems caused by a small training data, we propose to use consensus of six regression models for prediction of biological activity of virtual library of compounds. The QSAR descriptors of 22 compounds related to the opioid growth factor (OGF, Tyr-Gly-Gly-Phe-Met with known antitumor activity were used to train regression models: the feed-forward artificial neural network, the k-nearest neighbor, sparseness constrained linear regression, the linear and nonlinear (with polynomial and Gaussian kernel support vector machine. Regression models were applied on a virtual library of 429 compounds that resulted in six lists with candidate compounds ranked by predicted antitumor activity. The highly ranked candidate compounds were synthesized, characterized and tested for an antiproliferative activity. Some of prepared peptides showed more pronounced activity compared with the native OGF; however, they were less active than highly ranked compounds selected previously by the radial basis function support vector machine (RBF SVM regression model. The ill-posedness of the related inverse problem causes unstable behavior of trained regression models on test data. These results point to high complexity of prediction based on the regression models trained on a small data sample.

  19. A generalized right truncated bivariate Poisson regression model with applications to health data.

    Science.gov (United States)

    Islam, M Ataharul; Chowdhury, Rafiqul I

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over or under dispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model.

  20. Testing for constant nonparametric effects in general semiparametric regression models with interactions

    KAUST Repository

    Wei, Jiawei; Carroll, Raymond J.; Maity, Arnab

    2011-01-01

    We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work

  1. A binary genetic programing model for teleconnection identification between global sea surface temperature and local maximum monthly rainfall events

    Science.gov (United States)

    Danandeh Mehr, Ali; Nourani, Vahid; Hrnjica, Bahrudin; Molajou, Amir

    2017-12-01

    The effectiveness of genetic programming (GP) for solving regression problems in hydrology has been recognized in recent studies. However, its capability to solve classification problems has not been sufficiently explored so far. This study develops and applies a novel classification-forecasting model, namely Binary GP (BGP), for teleconnection studies between sea surface temperature (SST) variations and maximum monthly rainfall (MMR) events. The BGP integrates certain types of data pre-processing and post-processing methods with conventional GP engine to enhance its ability to solve both regression and classification problems simultaneously. The model was trained and tested using SST series of Black Sea, Mediterranean Sea, and Red Sea as potential predictors as well as classified MMR events at two locations in Iran as predictand. Skill of the model was measured in regard to different rainfall thresholds and SST lags and compared to that of the hybrid decision tree-association rule (DTAR) model available in the literature. The results indicated that the proposed model can identify potential teleconnection signals of surrounding seas beneficial to long-term forecasting of the occurrence of the classified MMR events.

  2. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    Science.gov (United States)

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  3. Analysis of dental caries using generalized linear and count regression models

    Directory of Open Access Journals (Sweden)

    Javali M. Phil

    2013-11-01

    Full Text Available Generalized linear models (GLM are generalization of linear regression models, which allow fitting regression models to response data in all the sciences especially medical and dental sciences that follow a general exponential family. These are flexible and widely used class of such models that can accommodate response variables. Count data are frequently characterized by overdispersion and excess zeros. Zero-inflated count models provide a parsimonious yet powerful way to model this type of situation. Such models assume that the data are a mixture of two separate data generation processes: one generates only zeros, and the other is either a Poisson or a negative binomial data-generating process. Zero inflated count regression models such as the zero-inflated Poisson (ZIP, zero-inflated negative binomial (ZINB regression models have been used to handle dental caries count data with many zeros. We present an evaluation framework to the suitability of applying the GLM, Poisson, NB, ZIP and ZINB to dental caries data set where the count data may exhibit evidence of many zeros and over-dispersion. Estimation of the model parameters using the method of maximum likelihood is provided. Based on the Vuong test statistic and the goodness of fit measure for dental caries data, the NB and ZINB regression models perform better than other count regression models.

  4. Accounting for measurement error in log regression models with applications to accelerated testing.

    Directory of Open Access Journals (Sweden)

    Robert Richardson

    Full Text Available In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.

  5. Accounting for measurement error in log regression models with applications to accelerated testing.

    Science.gov (United States)

    Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M

    2018-01-01

    In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.

  6. Generic global regression models for growth prediction of Salmonella in ground pork and pork cuts

    DEFF Research Database (Denmark)

    Buschhardt, Tasja; Hansen, Tina Beck; Bahl, Martin Iain

    2017-01-01

    Introduction and Objectives Models for the prediction of bacterial growth in fresh pork are primarily developed using two-step regression (i.e. primary models followed by secondary models). These models are also generally based on experiments in liquids or ground meat and neglect surface growth....... It has been shown that one-step global regressions can result in more accurate models and that bacterial growth on intact surfaces can substantially differ from growth in liquid culture. Material and Methods We used a global-regression approach to develop predictive models for the growth of Salmonella....... One part of obtained logtransformed cell counts was used for model development and another for model validation. The Ratkowsky square root model and the relative lag time (RLT) model were integrated into the logistic model with delay. Fitted parameter estimates were compared to investigate the effect...

  7. Predictive market segmentation model: An application of logistic regression model and CHAID procedure

    Directory of Open Access Journals (Sweden)

    Soldić-Aleksić Jasna

    2009-01-01

    Full Text Available Market segmentation presents one of the key concepts of the modern marketing. The main goal of market segmentation is focused on creating groups (segments of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of concrete product/service. Companies can create specific marketing plan for each of these segments and therefore gain short or long term competitive advantage on the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of logistic regression model and CHAID analysis. The logistic regression model was used for the purpose of variables selection (from the initial pool of eleven variables which are statistically significant for explaining the dependent variable. Selected variables were afterwards included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented on the concrete empirical example in the following form: summary model results, CHAID tree, Gain chart, Index chart, risk and classification tables.

  8. Functionally unidimensional item response models for multivariate binary data

    DEFF Research Database (Denmark)

    Ip, Edward; Molenberghs, Geert; Chen, Shyh-Huei

    2013-01-01

    The problem of fitting unidimensional item response models to potentially multidimensional data has been extensively studied. The focus of this article is on response data that have a strong dimension but also contain minor nuisance dimensions. Fitting a unidimensional model to such multidimensio......The problem of fitting unidimensional item response models to potentially multidimensional data has been extensively studied. The focus of this article is on response data that have a strong dimension but also contain minor nuisance dimensions. Fitting a unidimensional model...... to such multidimensional data is believed to result in ability estimates that represent a combination of the major and minor dimensions. We conjecture that the underlying dimension for the fitted unidimensional model, which we call the functional dimension, represents a nonlinear projection. In this article we investigate...... tool. An example regarding a construct of desire for physical competency is used to illustrate the functional unidimensional approach....

  9. MESA models of the evolutionary state of the interacting binary epsilon Aurigae

    Science.gov (United States)

    Gibson, Justus L.; Stencel, Robert E.

    2018-06-01

    Using MESA code (Modules for Experiments in Stellar Astrophysics, version 9575), an evaluation was made of the evolutionary state of the epsilon Aurigae binary system (HD 31964, F0Iap + disc). We sought to satisfy several observational constraints: (1) requiring evolutionary tracks to pass close to the current temperature and luminosity of the primary star; (2) obtaining a period near the observed value of 27.1 years; (3) matching a mass function of 3.0; (4) concurrent Roche lobe overflow and mass transfer; (5) an isotopic ratio 12C/13C = 5 and, (6) matching the interferometrically determined angular diameter. A MESA model starting with binary masses of 9.85 + 4.5 M⊙, with a 100 d initial period, produces a 1.2 + 10.6 M⊙ result having a 547 d period, and a single digit 12C/13C ratio. These values were reached near an age of 20 Myr, when the donor star comes close to the observed luminosity and temperature for epsilon Aurigae A, as a post-RGB/pre-AGB star. Contemporaneously, the accretor then appears as an upper main-sequence, early B-type star. This benchmark model can provide a basis for further exploration of this interacting binary, and other long-period binary stars.

  10. Modeling long correlation times using additive binary Markov chains: Applications to wind generation time series.

    Science.gov (United States)

    Weber, Juliane; Zachow, Christopher; Witthaut, Dirk

    2018-03-01

    Wind power generation exhibits a strong temporal variability, which is crucial for system integration in highly renewable power systems. Different methods exist to simulate wind power generation but they often cannot represent the crucial temporal fluctuations properly. We apply the concept of additive binary Markov chains to model a wind generation time series consisting of two states: periods of high and low wind generation. The only input parameter for this model is the empirical autocorrelation function. The two-state model is readily extended to stochastically reproduce the actual generation per period. To evaluate the additive binary Markov chain method, we introduce a coarse model of the electric power system to derive backup and storage needs. We find that the temporal correlations of wind power generation, the backup need as a function of the storage capacity, and the resting time distribution of high and low wind events for different shares of wind generation can be reconstructed.

  11. Modeling long correlation times using additive binary Markov chains: Applications to wind generation time series

    Science.gov (United States)

    Weber, Juliane; Zachow, Christopher; Witthaut, Dirk

    2018-03-01

    Wind power generation exhibits a strong temporal variability, which is crucial for system integration in highly renewable power systems. Different methods exist to simulate wind power generation but they often cannot represent the crucial temporal fluctuations properly. We apply the concept of additive binary Markov chains to model a wind generation time series consisting of two states: periods of high and low wind generation. The only input parameter for this model is the empirical autocorrelation function. The two-state model is readily extended to stochastically reproduce the actual generation per period. To evaluate the additive binary Markov chain method, we introduce a coarse model of the electric power system to derive backup and storage needs. We find that the temporal correlations of wind power generation, the backup need as a function of the storage capacity, and the resting time distribution of high and low wind events for different shares of wind generation can be reconstructed.

  12. Frequency-domain reduced order models for gravitational waves from aligned-spin compact binaries

    International Nuclear Information System (INIS)

    Pürrer, Michael

    2014-01-01

    Black-hole binary coalescences are one of the most promising sources for the first detection of gravitational waves. Fast and accurate theoretical models of the gravitational radiation emitted from these coalescences are highly important for the detection and extraction of physical parameters. Spinning effective-one-body models for binaries with aligned-spins have been shown to be highly faithful, but are slow to generate and thus have not yet been used for parameter estimation (PE) studies. I provide a frequency-domain singular value decomposition-based surrogate reduced order model that is thousands of times faster for typical system masses and has a faithfulness mismatch of better than ∼0.1% with the original SEOBNRv1 model for advanced LIGO detectors. This model enables PE studies up to signal-to-noise ratios (SNRs) of 20 and even up to 50 for total masses below 50 M ⊙ . This paper discusses various choices for approximations and interpolation over the parameter space that can be made for reduced order models of spinning compact binaries, provides a detailed discussion of errors arising in the construction and assesses the fidelity of such models. (paper)

  13. Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

    Science.gov (United States)

    MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

    2005-01-01

    Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.

  14. Robust geographically weighted regression of modeling the Air Polluter Standard Index (APSI)

    Science.gov (United States)

    Warsito, Budi; Yasin, Hasbi; Ispriyanti, Dwi; Hoyyi, Abdul

    2018-05-01

    The Geographically Weighted Regression (GWR) model has been widely applied to many practical fields for exploring spatial heterogenity of a regression model. However, this method is inherently not robust to outliers. Outliers commonly exist in data sets and may lead to a distorted estimate of the underlying regression model. One of solution to handle the outliers in the regression model is to use the robust models. So this model was called Robust Geographically Weighted Regression (RGWR). This research aims to aid the government in the policy making process related to air pollution mitigation by developing a standard index model for air polluter (Air Polluter Standard Index - APSI) based on the RGWR approach. In this research, we also consider seven variables that are directly related to the air pollution level, which are the traffic velocity, the population density, the business center aspect, the air humidity, the wind velocity, the air temperature, and the area size of the urban forest. The best model is determined by the smallest AIC value. There are significance differences between Regression and RGWR in this case, but Basic GWR using the Gaussian kernel is the best model to modeling APSI because it has smallest AIC.

  15. Parameter estimation and statistical test of geographically weighted bivariate Poisson inverse Gaussian regression models

    Science.gov (United States)

    Amalia, Junita; Purhadi, Otok, Bambang Widjanarko

    2017-11-01

    Poisson distribution is a discrete distribution with count data as the random variables and it has one parameter defines both mean and variance. Poisson regression assumes mean and variance should be same (equidispersion). Nonetheless, some case of the count data unsatisfied this assumption because variance exceeds mean (over-dispersion). The ignorance of over-dispersion causes underestimates in standard error. Furthermore, it causes incorrect decision in the statistical test. Previously, paired count data has a correlation and it has bivariate Poisson distribution. If there is over-dispersion, modeling paired count data is not sufficient with simple bivariate Poisson regression. Bivariate Poisson Inverse Gaussian Regression (BPIGR) model is mix Poisson regression for modeling paired count data within over-dispersion. BPIGR model produces a global model for all locations. In another hand, each location has different geographic conditions, social, cultural and economic so that Geographically Weighted Regression (GWR) is needed. The weighting function of each location in GWR generates a different local model. Geographically Weighted Bivariate Poisson Inverse Gaussian Regression (GWBPIGR) model is used to solve over-dispersion and to generate local models. Parameter estimation of GWBPIGR model obtained by Maximum Likelihood Estimation (MLE) method. Meanwhile, hypothesis testing of GWBPIGR model acquired by Maximum Likelihood Ratio Test (MLRT) method.

  16. Can We Use Regression Modeling to Quantify Mean Annual Streamflow at a Global-Scale?

    Science.gov (United States)

    Barbarossa, V.; Huijbregts, M. A. J.; Hendriks, J. A.; Beusen, A.; Clavreul, J.; King, H.; Schipper, A.

    2016-12-01

    Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for a number of applications, including assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF using observations of discharge and catchment characteristics from 1,885 catchments worldwide, ranging from 2 to 106 km2 in size. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB [van Beek et al., 2011] by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area, mean annual precipitation and air temperature, average slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error values were lower (0.29 - 0.38 compared to 0.49 - 0.57) and the modified index of agreement was higher (0.80 - 0.83 compared to 0.72 - 0.75). Our regression model can be applied globally at any point of the river network, provided that the input parameters are within the range of values employed in the calibration of the model. The performance is reduced for water scarce regions and further research should focus on improving such an aspect for regression-based global hydrological models.

  17. ANALYSIS OF THE FINANCIAL PERFORMANCES OF THE FIRM, BY USING THE MULTIPLE REGRESSION MODEL

    Directory of Open Access Journals (Sweden)

    Constantin Anghelache

    2011-11-01

    Full Text Available The information achieved through the use of simple linear regression are not always enough to characterize the evolution of an economic phenomenon and, furthermore, to identify its possible future evolution. To remedy these drawbacks, the special literature includes multiple regression models, in which the evolution of the dependant variable is defined depending on two or more factorial variables.

  18. Deriving Genomic Breeding Values for Residual Feed Intake from Covariance Functions of Random Regression Models

    DEFF Research Database (Denmark)

    Strathe, Anders B; Mark, Thomas; Nielsen, Bjarne

    2014-01-01

    Random regression models were used to estimate covariance functions between cumulated feed intake (CFI) and body weight (BW) in 8424 Danish Duroc pigs. Random regressions on second order Legendre polynomials of age were used to describe genetic and permanent environmental curves in BW and CFI...

  19. Modelling infant mortality rate in Central Java, Indonesia use generalized poisson regression method

    Science.gov (United States)

    Prahutama, Alan; Sudarno

    2018-05-01

    The infant mortality rate is the number of deaths under one year of age occurring among the live births in a given geographical area during a given year, per 1,000 live births occurring among the population of the given geographical area during the same year. This problem needs to be addressed because it is an important element of a country’s economic development. High infant mortality rate will disrupt the stability of a country as it relates to the sustainability of the population in the country. One of regression model that can be used to analyze the relationship between dependent variable Y in the form of discrete data and independent variable X is Poisson regression model. Recently The regression modeling used for data with dependent variable is discrete, among others, poisson regression, negative binomial regression and generalized poisson regression. In this research, generalized poisson regression modeling gives better AIC value than poisson regression. The most significant variable is the Number of health facilities (X1), while the variable that gives the most influence to infant mortality rate is the average breastfeeding (X9).

  20. The Application of Classical and Neural Regression Models for the Valuation of Residential Real Estate

    Directory of Open Access Journals (Sweden)

    Mach Łukasz

    2017-06-01

    Full Text Available The research process aimed at building regression models, which helps to valuate residential real estate, is presented in the following article. Two widely used computational tools i.e. the classical multiple regression and regression models of artificial neural networks were used in order to build models. An attempt to define the utilitarian usefulness of the above-mentioned tools and comparative analysis of them is the aim of the conducted research. Data used for conducting analyses refers to the secondary transactional residential real estate market.

  1. Linear regression metamodeling as a tool to summarize and present simulation model results.

    Science.gov (United States)

    Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M

    2013-10-01

    Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.

  2. MESA models for the evolutionary status of the epsilon Aurigae disk-eclipsed binary system

    Science.gov (United States)

    Stencel, Robert E.; Gibson, Justus

    2018-06-01

    The brightest member of the class of disk-eclipsed binary stars is the Algol-like long-period binary, epsilon Aurigae (HD 31964, F0Iap + disk, http://adsabs.harvard.edu/abs/2016SPIE.9907E..17S ). Using MESA (Modules for Experiments in Stellar Astrophysics, version 9575), we have made an evaluation of its evolutionary state. We sought to satisfy several observational constraints, including: (1) requiring evolutionary tracks to pass close to the current temperature and luminosity of the primary star; (2) obtaining a period near the observed value of 27.1 years; (3) matching a mass function of 3.0; (4) concurrent Roche lobe overflow and mass transfer; (5) an isotopic ratio 12C / 13C = 5 and, (6) matching the interferometrically determined angular diameter. A MESA model starting with binary masses of 9.85 + 4.5 solar masses, with a 100 day initial period, produces a 1.2 + 10.6 solar masses result having a 547 day period, plus a single digit 12C / 13C ratio. These values were reached near an age of 20 Myr, when the donor star comes close to the observed luminosity and temperature for epsilon Aurigae A, as a post-RGB/pre-AGB star. Contemporaneously, the accretor then appears as an upper main sequence, early B-type star. This benchmark model can provide a basis for further exploration of this interacting binary, and other long period binary stars. This report has been submitted to MNRAS, along with a parallel investigation of mass transfer stream and disk sub-structure. The authors are grateful to the estate of William Herschel Womble for the support of astronomy at the University of Denver.

  3. [Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].

    Science.gov (United States)

    Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L

    2017-03-10

    To evaluate the estimation of prevalence ratio ( PR ) by using bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea in their infants by using bayesian log-binomial regression model in Openbugs software. The results showed that caregivers' recognition of infant' s risk signs of diarrhea was associated significantly with a 13% increase of medical care-seeking. Meanwhile, we compared the differences in PR 's point estimation and its interval estimation of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea and convergence of three models (model 1: not adjusting for the covariates; model 2: adjusting for duration of caregivers' education, model 3: adjusting for distance between village and township and child month-age based on model 2) between bayesian log-binomial regression model and conventional log-binomial regression model. The results showed that all three bayesian log-binomial regression models were convergence and the estimated PRs were 1.130(95 %CI : 1.005-1.265), 1.128(95 %CI : 1.001-1.264) and 1.132(95 %CI : 1.004-1.267), respectively. Conventional log-binomial regression model 1 and model 2 were convergence and their PRs were 1.130(95 % CI : 1.055-1.206) and 1.126(95 % CI : 1.051-1.203), respectively, but the model 3 was misconvergence, so COPY method was used to estimate PR , which was 1.125 (95 %CI : 1.051-1.200). In addition, the point estimation and interval estimation of PRs from three bayesian log-binomial regression models differed slightly from those of PRs from conventional log-binomial regression model, but they had a good consistency in estimating PR . Therefore, bayesian log-binomial regression model can effectively estimate PR with less misconvergence and have more advantages in application compared with conventional log-binomial regression model.

  4. AN X-RAY AND OPTICAL LIGHT CURVE MODEL OF THE ECLIPSING SYMBIOTIC BINARY SMC3

    International Nuclear Information System (INIS)

    Kato, Mariko; Hachisu, Izumi; Mikołajewska, Joanna

    2013-01-01

    Some binary evolution scenarios for Type Ia supernovae (SNe Ia) include long-period binaries that evolve to symbiotic supersoft X-ray sources in their late stage of evolution. However, symbiotic stars with steady hydrogen burning on the white dwarf's (WD) surface are very rare, and the X-ray characteristics are not well known. SMC3 is one such rare example and a key object for understanding the evolution of symbiotic stars to SNe Ia. SMC3 is an eclipsing symbiotic binary, consisting of a massive WD and red giant (RG), with an orbital period of 4.5 years in the Small Magellanic Cloud. The long-term V light curve variations are reproduced as orbital variations in the irradiated RG, whose atmosphere fills its Roche lobe, thus supporting the idea that the RG supplies matter to the WD at rates high enough to maintain steady hydrogen burning on the WD. We also present an eclipse model in which an X-ray-emitting region around the WD is almost totally occulted by the RG swelling over the Roche lobe on the trailing side, although it is always partly obscured by a long spiral tail of neutral hydrogen surrounding the binary in the orbital plane.

  5. 3D Modeling of Accretion Disks and Circumbinary Envelopes in Close Binaries

    Science.gov (United States)

    Bisikalo, D.

    2010-12-01

    A number of observations prove the complex flow structure in close binary stars. The gas dynamic structure of the flow is governed by the stream of matter from the inner Lagrange point, the accretion disk, the circum-disk halo, and the circumbinary envelope. Observations reflect the current state of a binary system and for their interpretation one should consider the gas dynamics of flow patterns. Three-dimensional numerical gasdynamical modeling is used to study the gaseous flow structure and dynamics in close binaries. It is shown that the periodic variations of the positions of the disk and the bow shock formed when the inner parts of the circumbinary envelope flow around the disk result in variations in both the rate of angular-momentum transfer to the disk and the flow structure near the Lagrange point L3. All these factors lead to periodic ejections of matter from the accretion disk and circum-disk halo into the outer layers of the circumbinary envelope. The results of simulations are used to estimate the physical parameters of the circumbinary envelope, including 3D matter distribution in it, and the matter-flow configuration and dynamics. The envelope becomes optically thick for systems with high mass-exchange rates, M⊙=10-8 Msun/year, and has a significant influence on the binary's observed features. The uneven phase distributions of the matter and density variations due to periodic injections of matter into the envelope are important for interpretations of observations of CBSs.

  6. Effect of binary stars on the dynamical evolution of stellar clusters. II. Analytic evolutionary models

    International Nuclear Information System (INIS)

    Hills, J.G.

    1975-01-01

    We use analytic models to compute the evolution of the core of a stellar system due simultaneously to stellar evaporation which causes the system (core) to contract and to its binaries which cause it to expand by progressively decreasing its binding energy. The evolution of the system is determined by two parameters: the initial number of stars in the system N 0 , and the fraction f/subb/ of its stars which are binaries. For a fixed f/subb/, stellar evaporation initially dominates the dynamical evolution if N 0 is sufficiently large due to the fact that the rate of evaporation is determined chiefly by long-range encounters which increase in importance as the number of stars in the system increases. If stellar evaporation initially dominates, the system first contracts, but as N/subc/, the number of remaining stars in the system, decreases by evaporation, the system reaches a minimum radius and a maximum density and then it expands monotonically as N/subc/ decreases further. Open clusters expand monotonically from the beginning if they have anything approaching average Population I binary frequencies. Globular clusters are highly deficient in binaries in order to have formed and retained the high-density stellar cores observed in most of them. We estimate that for these system f/subb/ < or = 0.15

  7. Finite-element solidification modelling of metals and binary alloys

    International Nuclear Information System (INIS)

    Mathew, P.M.

    1986-12-01

    In the Canadian Nuclear Fuel Waste Management Program, cast metals and alloys are being evaluated for their ability to support a metallic fuel waste container shell under disposal vault conditions and to determine their performance as an additional barrier to radionuclide release. These materials would be cast to fill residual free space inside the container and allowed to solidify without major voids. To model their solidification characteristics following casting, a finite-element model, FAXMOD-3, was adopted. Input parameters were modified to account for the latent heat of fusion of the metals and alloys considered. This report describes the development of the solidification model and its theoretical verification. To model the solidification of pure metals and alloys that melt at a distinct temperature, the latent heat of fusion was incorporated as a double-ramp function in the specific heat-temperature relationship, within an interval of +- 1 K around the solidification temperature. Comparison of calculated results for lead, tin and lead-tin eutectic melts, unidirectionally cooled with and without superheat, showed good agreement with an alternative technique called the integral profile method. To model the solidification of alloys that melt over a temperature interval, the fraction of solid in the solid-liquid region, as calculated from the Scheil equation, was used to determine the fraction of latent heat to be liberated over a temperature interval within the solid-liquid zone. Comparison of calculated results for unidirectionally cooled aluminum-4 wt.% copper melt, with and without superheat, showed good agreement with alternative finite-difference techniques

  8. Fault diagnosis and comparing risk for the steel coil manufacturing process using statistical models for binary data

    International Nuclear Information System (INIS)

    Debón, A.; Carlos Garcia-Díaz, J.

    2012-01-01

    Advanced statistical models can help industry to design more economical and rational investment plans. Fault detection and diagnosis is an important problem in continuous hot dip galvanizing. Increasingly stringent quality requirements in the automotive industry also require ongoing efforts in process control to make processes more robust. Robust methods for estimating the quality of galvanized steel coils are an important tool for the comprehensive monitoring of the performance of the manufacturing process. This study applies different statistical regression models: generalized linear models, generalized additive models and classification trees to estimate the quality of galvanized steel coils on the basis of short time histories. The data, consisting of 48 galvanized steel coils, was divided into sets of conforming and nonconforming coils. Five variables were selected for monitoring the process: steel strip velocity and four bath temperatures. The present paper reports a comparative evaluation of statistical models for binary data using Receiver Operating Characteristic (ROC) curves. A ROC curve is a graph or a technique for visualizing, organizing and selecting classifiers based on their performance. The purpose of this paper is to examine their use in research to obtain the best model to predict defective steel coil probability. In relation to the work of other authors who only propose goodness of fit statistics, we should highlight one distinctive feature of the methodology presented here, which is the possibility of comparing the different models with ROC graphs which are based on model classification performance. Finally, the results are validated by bootstrap procedures.

  9. A Conditional Curie-Weiss Model for Stylized Multi-group Binary Choice with Social Interaction

    Science.gov (United States)

    Opoku, Alex Akwasi; Edusei, Kwame Owusu; Ansah, Richard Kwame

    2018-04-01

    This paper proposes a conditional Curie-Weiss model as a model for decision making in a stylized society made up of binary decision makers that face a particular dichotomous choice between two options. Following Brock and Durlauf (Discrete choice with social interaction I: theory, 1955), we set-up both socio-economic and statistical mechanical models for the choice problem. We point out when both the socio-economic and statistical mechanical models give rise to the same self-consistent equilibrium mean choice level(s). Phase diagram of the associated statistical mechanical model and its socio-economic implications are discussed.

  10. Modeling Tetanus Neonatorum case using the regression of negative binomial and zero-inflated negative binomial

    Science.gov (United States)

    Amaliana, Luthfatul; Sa'adah, Umu; Wayan Surya Wardhani, Ni

    2017-12-01

    Tetanus Neonatorum is an infectious disease that can be prevented by immunization. The number of Tetanus Neonatorum cases in East Java Province is the highest in Indonesia until 2015. Tetanus Neonatorum data contain over dispersion and big enough proportion of zero-inflation. Negative Binomial (NB) regression is an alternative method when over dispersion happens in Poisson regression. However, the data containing over dispersion and zero-inflation are more appropriately analyzed by using Zero-Inflated Negative Binomial (ZINB) regression. The purpose of this study are: (1) to model Tetanus Neonatorum cases in East Java Province with 71.05 percent proportion of zero-inflation by using NB and ZINB regression, (2) to obtain the best model. The result of this study indicates that ZINB is better than NB regression with smaller AIC.

  11. Poisson regression for modeling count and frequency outcomes in trauma research.

    Science.gov (United States)

    Gagnon, David R; Doron-LaMarca, Susan; Bell, Margret; O'Farrell, Timothy J; Taft, Casey T

    2008-10-01

    The authors describe how the Poisson regression method for analyzing count or frequency outcome variables can be applied in trauma studies. The outcome of interest in trauma research may represent a count of the number of incidents of behavior occurring in a given time interval, such as acts of physical aggression or substance abuse. Traditional linear regression approaches assume a normally distributed outcome variable with equal variances over the range of predictor variables, and may not be optimal for modeling count outcomes. An application of Poisson regression is presented using data from a study of intimate partner aggression among male patients in an alcohol treatment program and their female partners. Results of Poisson regression and linear regression models are compared.

  12. Observational constraints from models of close binary evolution

    International Nuclear Information System (INIS)

    Greve, J.P. de; Packet, W.

    1984-01-01

    The evolution of a system of 9 solar masses + 5.4 solar masses is computed from Zero Age Main Sequence through an early case B of mass exchange, up to the second phase of mass transfer after core helium burning. Both components are calculated simultaneously. The evolution is divided into several physically different phases. The characteristics of the models in each of these phases are transformed into corresponding 'observable' quantities. The outlook of the system for photometric observations is discussed, for an idealized case. The influence of the mass of the loser and the initial mass ratio is considered. (Auth.)

  13. A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

    Science.gov (United States)

    Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

    2018-04-01

    In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.

  14. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    Science.gov (United States)

    Drzewiecki, Wojciech

    2016-12-01

    In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.

  15. SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

    Science.gov (United States)

    As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...

  16. Electricity demand loads modeling using AutoRegressive Moving Average (ARMA) models

    Energy Technology Data Exchange (ETDEWEB)

    Pappas, S.S. [Department of Information and Communication Systems Engineering, University of the Aegean, Karlovassi, 83 200 Samos (Greece); Ekonomou, L.; Chatzarakis, G.E. [Department of Electrical Engineering Educators, ASPETE - School of Pedagogical and Technological Education, N. Heraklion, 141 21 Athens (Greece); Karamousantas, D.C. [Technological Educational Institute of Kalamata, Antikalamos, 24100 Kalamata (Greece); Katsikas, S.K. [Department of Technology Education and Digital Systems, University of Piraeus, 150 Androutsou Srt., 18 532 Piraeus (Greece); Liatsis, P. [Division of Electrical Electronic and Information Engineering, School of Engineering and Mathematical Sciences, Information and Biomedical Engineering Centre, City University, Northampton Square, London EC1V 0HB (United Kingdom)

    2008-09-15

    This study addresses the problem of modeling the electricity demand loads in Greece. The provided actual load data is deseasonilized and an AutoRegressive Moving Average (ARMA) model is fitted on the data off-line, using the Akaike Corrected Information Criterion (AICC). The developed model fits the data in a successful manner. Difficulties occur when the provided data includes noise or errors and also when an on-line/adaptive modeling is required. In both cases and under the assumption that the provided data can be represented by an ARMA model, simultaneous order and parameter estimation of ARMA models under the presence of noise are performed. The produced results indicate that the proposed method, which is based on the multi-model partitioning theory, tackles successfully the studied problem. For validation purposes the produced results are compared with three other established order selection criteria, namely AICC, Akaike's Information Criterion (AIC) and Schwarz's Bayesian Information Criterion (BIC). The developed model could be useful in the studies that concern electricity consumption and electricity prices forecasts. (author)

  17. Developing and testing a global-scale regression model to quantify mean annual streamflow

    Science.gov (United States)

    Barbarossa, Valerio; Huijbregts, Mark A. J.; Hendriks, A. Jan; Beusen, Arthur H. W.; Clavreul, Julie; King, Henry; Schipper, Aafke M.

    2017-01-01

    Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, measuring between 2 and 106 km2. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29-0.38 compared to 0.49-0.57) and the modified index of agreement (d) was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.

  18. Binary collisions in popovici’s photogravitational model

    Directory of Open Access Journals (Sweden)

    Mioc V.

    2002-01-01

    Full Text Available The dynamics of bodies under the combined action of the gravitational attraction and the radiative repelling force has large and deep implications in astronomy. In the 1920s, the Romanian astronomer Constantin Popovici proposed a modified photogravitational law (considered by other scientists too. This paper deals with the collisions of the two-body problem associated with Popovici’s model. Resorting to McGehee-type transformations of the second kind, we obtain regular equations of motion and define the collision manifold. The flow on this boundary manifold is wholly described. This allows to point out some important qualitative features of the collisional motion: existence of the black-hole effect, gradientlikeness of the flow on the collision manifold, regularizability of collisions under certain conditions. Some questions, coming from the comparison of Levi-Civita’s regularizing transformations and McGehee’s ones, are formulated.

  19. Structural models for amorphous transition metal binary alloys

    International Nuclear Information System (INIS)

    Ching, W.Y.; Lin, C.C.

    1976-01-01

    A dense random packing of 445 hard spheres with two different diameters in a concentration ratio of 3 : 1 was hand-built to simulate the structure of amorphous transition metal-metalloid alloys. By introducing appropriate pair potentials of the Lennard-Jones type, the structure is dynamically relaxed by minimizing the total energy. The radial distribution functions (RDF) for amorphous Fe 0 . 75 P 0 . 25 , Ni 0 . 75 P 0 . 25 , Co 0 . 75 P 0 . 25 are obtained and compared with the experimental data. The calculated RDF's are resolved into their partial components. The results indicate that such dynamically constructed models are capable of accounting for some subtle features in the RDF of amorphous transition metal-metalloid alloys

  20. No tension between assembly models of super massive black hole binaries and pulsar observations.

    Science.gov (United States)

    Middleton, Hannah; Chen, Siyuan; Del Pozzo, Walter; Sesana, Alberto; Vecchio, Alberto

    2018-02-08

    Pulsar timing arrays are presently the only means to search for the gravitational wave stochastic background from super massive black hole binary populations, considered to be within the grasp of current or near-future observations. The stringent upper limit from the Parkes Pulsar Timing Array has been interpreted as excluding (>90% confidence) the current paradigm of binary assembly through galaxy mergers and hardening via stellar interaction, suggesting evolution is accelerated or stalled. Using Bayesian hierarchical modelling we consider implications of this upper limit for a range of astrophysical scenarios, without invoking stalling, nor more exotic physical processes. All scenarios are fully consistent with the upper limit, but (weak) bounds on population parameters can be inferred. Recent upward revisions of the black hole-galaxy bulge mass relation are disfavoured at 1.6σ against lighter models. Once sensitivity improves by an order of magnitude, a non-detection will disfavour the most optimistic scenarios at 3.9σ.

  1. Reflexion on linear regression trip production modelling method for ensuring good model quality

    Science.gov (United States)

    Suprayitno, Hitapriya; Ratnasari, Vita

    2017-11-01

    Transport Modelling is important. For certain cases, the conventional model still has to be used, in which having a good trip production model is capital. A good model can only be obtained from a good sample. Two of the basic principles of a good sampling is having a sample capable to represent the population characteristics and capable to produce an acceptable error at a certain confidence level. It seems that this principle is not yet quite understood and used in trip production modeling. Therefore, investigating the Trip Production Modelling practice in Indonesia and try to formulate a better modeling method for ensuring the Model Quality is necessary. This research result is presented as follows. Statistics knows a method to calculate span of prediction value at a certain confidence level for linear regression, which is called Confidence Interval of Predicted Value. The common modeling practice uses R2 as the principal quality measure, the sampling practice varies and not always conform to the sampling principles. An experiment indicates that small sample is already capable to give excellent R2 value and sample composition can significantly change the model. Hence, good R2 value, in fact, does not always mean good model quality. These lead to three basic ideas for ensuring good model quality, i.e. reformulating quality measure, calculation procedure, and sampling method. A quality measure is defined as having a good R2 value and a good Confidence Interval of Predicted Value. Calculation procedure must incorporate statistical calculation method and appropriate statistical tests needed. A good sampling method must incorporate random well distributed stratified sampling with a certain minimum number of samples. These three ideas need to be more developed and tested.

  2. Using the Logistic Regression model in supporting decisions of establishing marketing strategies

    Directory of Open Access Journals (Sweden)

    Cristinel CONSTANTIN

    2015-12-01

    Full Text Available This paper is about an instrumental research regarding the using of Logistic Regression model for data analysis in marketing research. The decision makers inside different organisation need relevant information to support their decisions regarding the marketing strategies. The data provided by marketing research could be computed in various ways but the multivariate data analysis models can enhance the utility of the information. Among these models we can find the Logistic Regression model, which is used for dichotomous variables. Our research is based on explanation the utility of this model and interpretation of the resulted information in order to help practitioners and researchers to use it in their future investigations

  3. OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

    Science.gov (United States)

    Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

    2012-01-01

    The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.

  4. Use of empirical likelihood to calibrate auxiliary information in partly linear monotone regression models.

    Science.gov (United States)

    Chen, Baojiang; Qin, Jing

    2014-05-10

    In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on the covariate, then it may also depend on the function of this covariate. If one has no knowledge of this functional form but expect for monotonic increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.

  5. The use of continuous data versus binary data in MTC models: a case study in rheumatoid arthritis.

    Science.gov (United States)

    Schmitz, Susanne; Adams, Roisin; Walsh, Cathal

    2012-11-06

    Estimates of relative efficacy between alternative treatments are crucial for decision making in health care. When sufficient head to head evidence is not available Bayesian mixed treatment comparison models provide a powerful methodology to obtain such estimates. While models can be fit to a broad range of efficacy measures, this paper illustrates the advantages of using continuous outcome measures compared to binary outcome measures. Using a case study in rheumatoid arthritis a Bayesian mixed treatment comparison model is fit to estimate the relative efficacy of five anti-TNF agents currently licensed in Europe. The model is fit for the continuous HAQ improvement outcome measure and a binary version thereof as well as for the binary ACR response measure and the underlying continuous effect. Results are compared regarding their power to detect differences between treatments. Sixteen randomized controlled trials were included for the analysis. For both analyses, based on the HAQ improvement as well as based on the ACR response, differences between treatments detected by the binary outcome measures are subsets of the differences detected by the underlying continuous effects. The information lost when transforming continuous data into a binary response measure translates into a loss of power to detect differences between treatments in mixed treatment comparison models. Binary outcome measures are therefore less sensitive to change than continuous measures. Furthermore the choice of cut-off point to construct the binary measure also impacts the relative efficacy estimates.

  6. The use of continuous data versus binary data in MTC models: A case study in rheumatoid arthritis

    Directory of Open Access Journals (Sweden)

    Schmitz Susanne

    2012-11-01

    Full Text Available Abstract Background Estimates of relative efficacy between alternative treatments are crucial for decision making in health care. When sufficient head to head evidence is not available Bayesian mixed treatment comparison models provide a powerful methodology to obtain such estimates. While models can be fit to a broad range of efficacy measures, this paper illustrates the advantages of using continuous outcome measures compared to binary outcome measures. Methods Using a case study in rheumatoid arthritis a Bayesian mixed treatment comparison model is fit to estimate the relative efficacy of five anti-TNF agents currently licensed in Europe. The model is fit for the continuous HAQ improvement outcome measure and a binary version thereof as well as for the binary ACR response measure and the underlying continuous effect. Results are compared regarding their power to detect differences between treatments. Results Sixteen randomized controlled trials were included for the analysis. For both analyses, based on the HAQ improvement as well as based on the ACR response, differences between treatments detected by the binary outcome measures are subsets of the differences detected by the underlying continuous effects. Conclusions The information lost when transforming continuous data into a binary response measure translates into a loss of power to detect differences between treatments in mixed treatment comparison models. Binary outcome measures are therefore less sensitive to change than continuous measures. Furthermore the choice of cut-off point to construct the binary measure also impacts the relative efficacy estimates.

  7. A 3D dynamical model of the colliding winds in binary systems

    Science.gov (United States)

    Parkin, E. R.; Pittard, J. M.

    2008-08-01

    We present a three-dimensional (3D) dynamical model of the orbital-induced curvature of the wind-wind collision region in binary star systems. Momentum balance equations are used to determine the position and shape of the contact discontinuity between the stars, while further downstream the gas is assumed to behave ballistically. An Archimedean spiral structure is formed by the motion of the stars, with clear resemblance to high-resolution images of the so-called `pinwheel nebulae'. A key advantage of this approach over grid or smoothed particle hydrodynamic models is its significantly reduced computational cost, while it also allows the study of the structure obtained in an eccentric orbit. The model is relevant to symbiotic systems and γ-ray binaries, as well as systems with O-type and Wolf-Rayet stars. As an example application, we simulate the X-ray emission from hypothetical O+O and WR+O star binaries, and describe a method of ray tracing through the 3D spiral structure to account for absorption by the circumstellar material in the system. Such calculations may be easily adapted to study observations at wavelengths ranging from the radio to γ-ray.

  8. Development of an empirical model of turbine efficiency using the Taylor expansion and regression analysis

    International Nuclear Information System (INIS)

    Fang, Xiande; Xu, Yu

    2011-01-01

    The empirical model of turbine efficiency is necessary for the control- and/or diagnosis-oriented simulation and useful for the simulation and analysis of dynamic performances of the turbine equipment and systems, such as air cycle refrigeration systems, power plants, turbine engines, and turbochargers. Existing empirical models of turbine efficiency are insufficient because there is no suitable form available for air cycle refrigeration turbines. This work performs a critical review of empirical models (called mean value models in some literature) of turbine efficiency and develops an empirical model in the desired form for air cycle refrigeration, the dominant cooling approach in aircraft environmental control systems. The Taylor series and regression analysis are used to build the model, with the Taylor series being used to expand functions with the polytropic exponent and the regression analysis to finalize the model. The measured data of a turbocharger turbine and two air cycle refrigeration turbines are used for the regression analysis. The proposed model is compact and able to present the turbine efficiency map. Its predictions agree with the measured data very well, with the corrected coefficient of determination R c 2 ≥ 0.96 and the mean absolute percentage deviation = 1.19% for the three turbines. -- Highlights: → Performed a critical review of empirical models of turbine efficiency. → Developed an empirical model in the desired form for air cycle refrigeration, using the Taylor expansion and regression analysis. → Verified the method for developing the empirical model. → Verified the model.

  9. Comparing Regression Coefficients between Nested Linear Models for Clustered Data with Generalized Estimating Equations

    Science.gov (United States)

    Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer

    2013-01-01

    Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…

  10. Time variability of X-ray binaries: observations with INTEGRAL. Modeling

    International Nuclear Information System (INIS)

    Cabanac, Clement

    2007-01-01

    The exact origin of the observed X and Gamma ray variability in X-ray binaries is still an open debate in high energy astrophysics. Among others, these objects are showing aperiodic and quasi-periodic luminosity variations on timescales as small as the millisecond. This erratic behavior must put constraints on the proposed emission processes occurring in the vicinity of the neutrons star or the stellar mass black-hole held by these objects. We propose here to study their behavior following 3 different ways: first we examine the evolution of a particular X-ray source discovered by INTEGRAL, IGR J19140+0951. Using timing and spectral data given by different instruments, we show that the source type is plausibly consistent with a High Mass X-ray Binary hosting a neutrons star. Subsequently, we propose a new method dedicated to the study of timing data coming from coded mask aperture instruments. Using it on INTEGRAL/ISGRI real data, we detect the presence of periodic and quasi-periodic features in some pulsars and micro-quasars at energies as high as a hundred keV. Finally, we suggest a model designed to describe the low frequency variability of X-ray binaries in their hardest state. This model is based on thermal comptonization of soft photons by a warm corona in which a pressure wave is propagating in cylindrical geometry. By computing both numerical simulations and analytical solution, we show that this model should be suitable to describe some of the typical features observed in X-ray binaries power spectra in their hard state and their evolution such as aperiodic noise and low frequency quasi-periodic oscillations. (author) [fr

  11. Use of linear regression for the processing of curves of differential potentiometric titration of a binary mixture of heterovalent ions using precipitation reactions

    International Nuclear Information System (INIS)

    Mar'yanov, B.M.; Zarubin, A.G.; Shumar, S.V.

    2003-01-01

    A method is proposed for the computer processing of curve of differential potentiometric titration of a binary mixture of heterovalent ions using precipitation reactions. The method is based on the transformation of the titration curve to segment-line characteristics, whose parameters (within the accuracy of the least-squares method) determine the sequence of the equivalence points and solubility products of the resulting precipitation. The method is applied to the titration of Ag(I)-Cd)II), Hg(II)-Te(IV), and Cd(II)-Te(IV) mixtures by a sodium diethyldithiocarbamate solution with membrane sulfide and glassy carbon indicator electrodes. For 4 to 11 mg of the analyte in 50 ml of the solution, RSD varies from 1 to 9% [ru

  12. The Norwegian Healthier Goats program--modeling lactation curves using a multilevel cubic spline regression model.

    Science.gov (United States)

    Nagel-Alne, G E; Krontveit, R; Bohlin, J; Valle, P S; Skjerve, E; Sølverød, L S

    2014-07-01

    In 2001, the Norwegian Goat Health Service initiated the Healthier Goats program (HG), with the aim of eradicating caprine arthritis encephalitis, caseous lymphadenitis, and Johne's disease (caprine paratuberculosis) in Norwegian goat herds. The aim of the present study was to explore how control and eradication of the above-mentioned diseases by enrolling in HG affected milk yield by comparison with herds not enrolled in HG. Lactation curves were modeled using a multilevel cubic spline regression model where farm, goat, and lactation were included as random effect parameters. The data material contained 135,446 registrations of daily milk yield from 28,829 lactations in 43 herds. The multilevel cubic spline regression model was applied to 4 categories of data: enrolled early, control early, enrolled late, and control late. For enrolled herds, the early and late notations refer to the situation before and after enrolling in HG; for nonenrolled herds (controls), they refer to development over time, independent of HG. Total milk yield increased in the enrolled herds after eradication: the total milk yields in the fourth lactation were 634.2 and 873.3 kg in enrolled early and enrolled late herds, respectively, and 613.2 and 701.4 kg in the control early and control late herds, respectively. Day of peak yield differed between enrolled and control herds. The day of peak yield came on d 6 of lactation for the control early category for parities 2, 3, and 4, indicating an inability of the goats to further increase their milk yield from the initial level. For enrolled herds, on the other hand, peak yield came between d 49 and 56, indicating a gradual increase in milk yield after kidding. Our results indicate that enrollment in the HG disease eradication program improved the milk yield of dairy goats considerably, and that the multilevel cubic spline regression was a suitable model for exploring effects of disease control and eradication on milk yield. Copyright © 2014

  13. Profile-driven regression for modeling and runtime optimization of mobile networks

    DEFF Research Database (Denmark)

    McClary, Dan; Syrotiuk, Violet; Kulahci, Murat

    2010-01-01

    Computer networks often display nonlinear behavior when examined over a wide range of operating conditions. There are few strategies available for modeling such behavior and optimizing such systems as they run. Profile-driven regression is developed and applied to modeling and runtime optimization...... of throughput in a mobile ad hoc network, a self-organizing collection of mobile wireless nodes without any fixed infrastructure. The intermediate models generated in profile-driven regression are used to fit an overall model of throughput, and are also used to optimize controllable factors at runtime. Unlike...

  14. Regression models for interval censored survival data: Application to HIV infection in Danish homosexual men

    DEFF Research Database (Denmark)

    Carstensen, Bendix

    1996-01-01

    This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men.......This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men....

  15. The Relationship between Economic Growth and Money Laundering – a Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Daniel Rece

    2009-09-01

    Full Text Available This study provides an overview of the relationship between economic growth and money laundering modeled by a least squares function. The report analyzes statistically data collected from USA, Russia, Romania and other eleven European countries, rendering a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering is “explained” by the linear regression model. In our opinion, this model will provide critical auxiliary judgment and decision support for anti-money laundering service systems.

  16. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    Science.gov (United States)

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  17. A primer for biomedical scientists on how to execute model II linear regression analysis.

    Science.gov (United States)

    Ludbrook, John

    2012-04-01

    1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.

  18. Logistic regression models for polymorphic and antagonistic pleiotropic gene action on human aging and longevity

    DEFF Research Database (Denmark)

    Tan, Qihua; Bathum, L; Christiansen, L

    2003-01-01

    In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...... situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing minimized. A binomial logistic regression model with fractional polynomials is used to capture...... the age-dependent or antagonistic pleiotropic effects. The models are applied to HFE genotype data to assess the effects on human longevity by different alleles and to detect if an age-dependent effect exists. Application has shown that these methods can serve as useful tools in searching for important...

  19. Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression

    Science.gov (United States)

    Khikmah, L.; Wijayanto, H.; Syafitri, U. D.

    2017-04-01

    The problem often encounters in logistic regression modeling are multicollinearity problems. Data that have multicollinearity between explanatory variables with the result in the estimation of parameters to be bias. Besides, the multicollinearity will result in error in the classification. In general, to overcome multicollinearity in regression used stepwise regression. They are also another method to overcome multicollinearity which involves all variable for prediction. That is Principal Component Analysis (PCA). However, classical PCA in only for numeric data. Its data are categorical, one method to solve the problems is Categorical Principal Component Analysis (CATPCA). Data were used in this research were a part of data Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristic of women of using the contraceptive methods. Classification results evaluated using Area Under Curve (AUC) values. The higher the AUC value, the better. Based on AUC values, the classification of the contraceptive method using stepwise method (58.66%) is better than the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the results of logistic regression using sensitivity, shows the opposite where CATPCA method (99.79%) is better than logistic regression method (92.43%) and stepwise (92.05%). Therefore in this study focuses on major class classification (using a contraceptive method), then the selected model is CATPCA because it can raise the level of the major class model accuracy.

  20. A re-examination of thermodynamic modelling of U-Ru binary phase diagram

    Energy Technology Data Exchange (ETDEWEB)

    Wang, L.C.; Kaye, M.H., E-mail: matthew.kaye@uoit.ca [University of Ontario Institute of Technology, Oshawa, ON (Canada)

    2015-07-01

    Ruthenium (Ru) is one of the more abundant fission products (FPs) both in fast breeder reactors and thermal reactors. Post irradiation examinations (PIE) show that both 'the white metallic phase' (MoTc-Ru-Rh-Pd) and 'the other metallic phase' (U(Pd-Rh-Ru)3) are present in spent nuclear fuels. To describe this quaternary system, binary subsystems of uranium (U) with Pd, Rh, and Ru are necessary. Presently, only the U-Ru system has been thermodynamically described but with some problems. As part of research on U-Ru-Rh-Pd quaternary system, an improved consistent thermodynamic model describing the U-Ru binary phase diagram has been obtained. (author)

  1. Measurement and modelling of hydrogen bonding in 1-alkanol plus n-alkane binary mixtures

    DEFF Research Database (Denmark)

    von Solms, Nicolas; Jensen, Lars; Kofod, Jonas L.

    2007-01-01

    Two equations of state (simplified PC-SAFT and CPA) are used to predict the monomer fraction of 1-alkanols in binary mixtures with n-alkanes. It is found that the choice of parameters and association schemes significantly affects the ability of a model to predict hydrogen bonding in mixtures, eve...... studies, which is clarified in the present work. New hydrogen bonding data based on infrared spectroscopy are reported for seven binary mixtures of alcohols and alkanes. (C) 2007 Elsevier B.V. All rights reserved....... though pure-component liquid densities and vapour pressures are predicted equally accurately for the associating compound. As was the case in the study of pure components, there exists some confusion in the literature about the correct interpretation and comparison of experimental data and theoretical...

  2. Quantifying relative importance: Computing standardized effects in models with binary outcomes

    Science.gov (United States)

    Grace, James B.; Johnson, Darren; Lefcheck, Jonathan S.; Byrnes, Jarrett E.K.

    2018-01-01

    Scientists commonly ask questions about the relative importances of processes, and then turn to statistical models for answers. Standardized coefficients are typically used in such situations, with the goal being to compare effects on a common scale. Traditional approaches to obtaining standardized coefficients were developed with idealized Gaussian variables in mind. When responses are binary, complications arise that impact standardization methods. In this paper, we review, evaluate, and propose new methods for standardizing coefficients from models that contain binary outcomes. We first consider the interpretability of unstandardized coefficients and then examine two main approaches to standardization. One approach, which we refer to as the Latent-Theoretical or LT method, assumes that underlying binary observations there exists a latent, continuous propensity linearly related to the coefficients. A second approach, which we refer to as the Observed-Empirical or OE method, assumes responses are purely discrete and estimates error variance empirically via reference to a classical R2 estimator. We also evaluate the standard formula for calculating standardized coefficients based on standard deviations. Criticisms of this practice have been persistent, leading us to propose an alternative formula that is based on user-defined “relevant ranges”. Finally, we implement all of the above in an open-source package for the statistical software R.

  3. Regression analysis understanding and building business and economic models using Excel

    CERN Document Server

    Wilson, J Holton

    2012-01-01

    The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe

  4. Estimasi Model Seemingly Unrelated Regression (SUR dengan Metode Generalized Least Square (GLS

    Directory of Open Access Journals (Sweden)

    Ade Widyaningsih

    2015-04-01

    Full Text Available Regression analysis is a statistical tool that is used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the other variables. A method that can used to obtain a good estimation in the regression analysis is ordinary least squares method. The least squares method is used to estimate the parameters of one or more regression but relationships among the errors in the response of other estimators are not allowed. One way to overcome this problem is Seemingly Unrelated Regression model (SUR in which parameters are estimated using Generalized Least Square (GLS. In this study, the author applies SUR model using GLS method on world gasoline demand data. The author obtains that SUR using GLS is better than OLS because SUR produce smaller errors than the OLS.

  5. Estimasi Model Seemingly Unrelated Regression (SUR dengan Metode Generalized Least Square (GLS

    Directory of Open Access Journals (Sweden)

    Ade Widyaningsih

    2014-06-01

    Full Text Available Regression analysis is a statistical tool that is used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the other variables. A method that can used to obtain a good estimation in the regression analysis is ordinary least squares method. The least squares method is used to estimate the parameters of one or more regression but relationships among the errors in the response of other estimators are not allowed. One way to overcome this problem is Seemingly Unrelated Regression model (SUR in which parameters are estimated using Generalized Least Square (GLS. In this study, the author applies SUR model using GLS method on world gasoline demand data. The author obtains that SUR using GLS is better than OLS because SUR produce smaller errors than the OLS.

  6. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    Science.gov (United States)

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.

  7. Validation of regression models for nitrate concentrations in the upper groundwater in sandy soils

    International Nuclear Information System (INIS)

    Sonneveld, M.P.W.; Brus, D.J.; Roelsma, J.

    2010-01-01

    For Dutch sandy regions, linear regression models have been developed that predict nitrate concentrations in the upper groundwater on the basis of residual nitrate contents in the soil in autumn. The objective of our study was to validate these regression models for one particular sandy region dominated by dairy farming. No data from this area were used for calibrating the regression models. The model was validated by additional probability sampling. This sample was used to estimate errors in 1) the predicted areal fractions where the EU standard of 50 mg l -1 is exceeded for farms with low N surpluses (ALT) and farms with higher N surpluses (REF); 2) predicted cumulative frequency distributions of nitrate concentration for both groups of farms. Both the errors in the predicted areal fractions as well as the errors in the predicted cumulative frequency distributions indicate that the regression models are invalid for the sandy soils of this study area. - This study indicates that linear regression models that predict nitrate concentrations in the upper groundwater using residual soil N contents should be applied with care.

  8. A brief introduction to regression designs and mixed-effects modelling by a recent convert

    OpenAIRE

    Balling, Laura Winther

    2008-01-01

    This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable sele...

  9. A computational approach to compare regression modelling strategies in prediction research.

    Science.gov (United States)

    Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H

    2016-08-25

    It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.

  10. Modelling fourier regression for time series data- a case study: modelling inflation in foods sector in Indonesia

    Science.gov (United States)

    Prahutama, Alan; Suparti; Wahyu Utami, Tiani

    2018-03-01

    Regression analysis is an analysis to model the relationship between response variables and predictor variables. The parametric approach to the regression model is very strict with the assumption, but nonparametric regression model isn’t need assumption of model. Time series data is the data of a variable that is observed based on a certain time, so if the time series data wanted to be modeled by regression, then we should determined the response and predictor variables first. Determination of the response variable in time series is variable in t-th (yt), while the predictor variable is a significant lag. In nonparametric regression modeling, one developing approach is to use the Fourier series approach. One of the advantages of nonparametric regression approach using Fourier series is able to overcome data having trigonometric distribution. In modeling using Fourier series needs parameter of K. To determine the number of K can be used Generalized Cross Validation method. In inflation modeling for the transportation sector, communication and financial services using Fourier series yields an optimal K of 120 parameters with R-square 99%. Whereas if it was modeled by multiple linear regression yield R-square 90%.

  11. Meta-analysis of studies with bivariate binary outcomes: a marginal beta-binomial model approach.

    Science.gov (United States)

    Chen, Yong; Hong, Chuan; Ning, Yang; Su, Xiao

    2016-01-15

    When conducting a meta-analysis of studies with bivariate binary outcomes, challenges arise when the within-study correlation and between-study heterogeneity should be taken into account. In this paper, we propose a marginal beta-binomial model for the meta-analysis of studies with binary outcomes. This model is based on the composite likelihood approach and has several attractive features compared with the existing models such as bivariate generalized linear mixed model (Chu and Cole, 2006) and Sarmanov beta-binomial model (Chen et al., 2012). The advantages of the proposed marginal model include modeling the probabilities in the original scale, not requiring any transformation of probabilities or any link function, having closed-form expression of likelihood function, and no constraints on the correlation parameter. More importantly, because the marginal beta-binomial model is only based on the marginal distributions, it does not suffer from potential misspecification of the joint distribution of bivariate study-specific probabilities. Such misspecification is difficult to detect and can lead to biased inference using currents methods. We compare the performance of the marginal beta-binomial model with the bivariate generalized linear mixed model and the Sarmanov beta-binomial model by simulation studies. Interestingly, the results show that the marginal beta-binomial model performs better than the Sarmanov beta-binomial model, whether or not the true model is Sarmanov beta-binomial, and the marginal beta-binomial model is more robust than the bivariate generalized linear mixed model under model misspecifications. Two meta-analyses of diagnostic accuracy studies and a meta-analysis of case-control studies are conducted for illustration. Copyright © 2015 John Wiley & Sons, Ltd.

  12. truncSP: An R Package for Estimation of Semi-Parametric Truncated Linear Regression Models

    Directory of Open Access Journals (Sweden)

    Maria Karlsson

    2014-05-01

    Full Text Available Problems with truncated data occur in many areas, complicating estimation and inference. Regarding linear regression models, the ordinary least squares estimator is inconsistent and biased for these types of data and is therefore unsuitable for use. Alternative estimators, designed for the estimation of truncated regression models, have been developed. This paper presents the R package truncSP. The package contains functions for the estimation of semi-parametric truncated linear regression models using three different estimators: the symmetrically trimmed least squares, quadratic mode, and left truncated estimators, all of which have been shown to have good asymptotic and ?nite sample properties. The package also provides functions for the analysis of the estimated models. Data from the environmental sciences are used to illustrate the functions in the package.

  13. Modeling and prediction of Turkey's electricity consumption using Support Vector Regression

    International Nuclear Information System (INIS)

    Kavaklioglu, Kadir

    2011-01-01

    Support Vector Regression (SVR) methodology is used to model and predict Turkey's electricity consumption. Among various SVR formalisms, ε-SVR method was used since the training pattern set was relatively small. Electricity consumption is modeled as a function of socio-economic indicators such as population, Gross National Product, imports and exports. In order to facilitate future predictions of electricity consumption, a separate SVR model was created for each of the input variables using their current and past values; and these models were combined to yield consumption prediction values. A grid search for the model parameters was performed to find the best ε-SVR model for each variable based on Root Mean Square Error. Electricity consumption of Turkey is predicted until 2026 using data from 1975 to 2006. The results show that electricity consumption can be modeled using Support Vector Regression and the models can be used to predict future electricity consumption. (author)

  14. Improved model of the retardance in citric acid coated ferrofluids using stepwise regression

    Science.gov (United States)

    Lin, J. F.; Qiu, X. R.

    2017-06-01

    Citric acid (CA) coated Fe3O4 ferrofluids (FFs) have been conducted for biomedical application. The magneto-optical retardance of CA coated FFs was measured by a Stokes polarimeter. Optimization and multiple regression of retardance in FFs were executed by Taguchi method and Microsoft Excel previously, and the F value of regression model was large enough. However, the model executed by Excel was not systematic. Instead we adopted the stepwise regression to model the retardance of CA coated FFs. From the results of stepwise regression by MATLAB, the developed model had highly predictable ability owing to F of 2.55897e+7 and correlation coefficient of one. The average absolute error of predicted retardances to measured retardances was just 0.0044%. Using the genetic algorithm (GA) in MATLAB, the optimized parametric combination was determined as [4.709 0.12 39.998 70.006] corresponding to the pH of suspension, molar ratio of CA to Fe3O4, CA volume, and coating temperature. The maximum retardance was found as 31.712°, close to that obtained by evolutionary solver in Excel and a relative error of -0.013%. Above all, the stepwise regression method was successfully used to model the retardance of CA coated FFs, and the maximum global retardance was determined by the use of GA.

  15. A metallic solution model with adjustable parameter for describing ternary thermodynamic properties from its binary constituents

    International Nuclear Information System (INIS)

    Fang Zheng; Qiu Guanzhou

    2007-01-01

    A metallic solution model with adjustable parameter k has been developed to predict thermodynamic properties of ternary systems from those of its constituent three binaries. In the present model, the excess Gibbs free energy for a ternary mixture is expressed as a weighted probability sum of those of binaries and the k value is determined based on an assumption that the ternary interaction generally strengthens the mixing effects for metallic solutions with weak interaction, making the Gibbs free energy of mixing of the ternary system more negative than that before considering the interaction. This point is never considered in the models currently reported, where the only difference in a geometrical definition of molar values of components is considered that do not involve thermodynamic principles but are completely empirical. The current model describes the results of experiments very well, and by adjusting the k value also agrees with those from models used widely in the literature. Three ternary systems, Mg-Cu-Ni, Zn-In-Cd, and Cd-Bi-Pb are recalculated to demonstrate the method of determining k and the precision of the model. The results of the calculations, especially those in Mg-Cu-Ni system, are better than those predicted by the current models in the literature

  16. Blind Separation of Acoustic Signals Combining SIMO-Model-Based Independent Component Analysis and Binary Masking

    Directory of Open Access Journals (Sweden)

    Hiekata Takashi

    2006-01-01

    Full Text Available A new two-stage blind source separation (BSS method for convolutive mixtures of speech is proposed, in which a single-input multiple-output (SIMO-model-based independent component analysis (ICA and a new SIMO-model-based binary masking are combined. SIMO-model-based ICA enables us to separate the mixed signals, not into monaural source signals but into SIMO-model-based signals from independent sources in their original form at the microphones. Thus, the separated signals of SIMO-model-based ICA can maintain the spatial qualities of each sound source. Owing to this attractive property, our novel SIMO-model-based binary masking can be applied to efficiently remove the residual interference components after SIMO-model-based ICA. The experimental results reveal that the separation performance can be considerably improved by the proposed method compared with that achieved by conventional BSS methods. In addition, the real-time implementation of the proposed BSS is illustrated.

  17. New approach in modeling Cr(VI) sorption onto biomass from metal binary mixtures solutions

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Chang [College of Environmental Science and Engineering, Anhui Normal University, South Jiuhua Road, 189, 241002 Wuhu (China); Chemical Engineering Department, Escola Politècnica Superior, Universitat de Girona, Ma Aurèlia Capmany, 61, 17071 Girona (Spain); Fiol, Núria [Chemical Engineering Department, Escola Politècnica Superior, Universitat de Girona, Ma Aurèlia Capmany, 61, 17071 Girona (Spain); Villaescusa, Isabel, E-mail: Isabel.Villaescusa@udg.edu [Chemical Engineering Department, Escola Politècnica Superior, Universitat de Girona, Ma Aurèlia Capmany, 61, 17071 Girona (Spain); Poch, Jordi [Applied Mathematics Department, Escola Politècnica Superior, Universitat de Girona, Ma Aurèlia Capmany, 61, 17071 Girona (Spain)

    2016-01-15

    In the last decades Cr(VI) sorption equilibrium and kinetic studies have been carried out using several types of biomasses. However there are few researchers that consider all the simultaneous processes that take place during Cr(VI) sorption (i.e., sorption/reduction of Cr(VI) and simultaneous formation and binding of reduced Cr(III)) when formulating a model that describes the overall sorption process. On the other hand Cr(VI) scarcely exists alone in wastewaters, it is usually found in mixtures with divalent metals. Therefore, the simultaneous removal of Cr(VI) and divalent metals in binary mixtures and the interactive mechanism governing Cr(VI) elimination have gained more and more attention. In the present work, kinetics of Cr(VI) sorption onto exhausted coffee from Cr(VI)–Cu(II) binary mixtures has been studied in a stirred batch reactor. A model including Cr(VI) sorption and reduction, Cr(III) sorption and the effect of the presence of Cu(II) in these processes has been developed and validated. This study constitutes an important advance in modeling Cr(VI) sorption kinetics especially when chromium sorption is in part based on the sorbent capacity of reducing hexavalent chromium and a metal cation is present in the binary mixture. - Highlights: • A kinetic model including Cr(VI) reduction, Cr(VI) and Cr(III) sorption/desorption • Synergistic effect of Cu(II) on Cr(VI) elimination included in the modelModel validation by checking it against independent sets of data.

  18. New approach in modeling Cr(VI) sorption onto biomass from metal binary mixtures solutions

    International Nuclear Information System (INIS)

    Liu, Chang; Fiol, Núria; Villaescusa, Isabel; Poch, Jordi

    2016-01-01

    In the last decades Cr(VI) sorption equilibrium and kinetic studies have been carried out using several types of biomasses. However there are few researchers that consider all the simultaneous processes that take place during Cr(VI) sorption (i.e., sorption/reduction of Cr(VI) and simultaneous formation and binding of reduced Cr(III)) when formulating a model that describes the overall sorption process. On the other hand Cr(VI) scarcely exists alone in wastewaters, it is usually found in mixtures with divalent metals. Therefore, the simultaneous removal of Cr(VI) and divalent metals in binary mixtures and the interactive mechanism governing Cr(VI) elimination have gained more and more attention. In the present work, kinetics of Cr(VI) sorption onto exhausted coffee from Cr(VI)–Cu(II) binary mixtures has been studied in a stirred batch reactor. A model including Cr(VI) sorption and reduction, Cr(III) sorption and the effect of the presence of Cu(II) in these processes has been developed and validated. This study constitutes an important advance in modeling Cr(VI) sorption kinetics especially when chromium sorption is in part based on the sorbent capacity of reducing hexavalent chromium and a metal cation is present in the binary mixture. - Highlights: • A kinetic model including Cr(VI) reduction, Cr(VI) and Cr(III) sorption/desorption • Synergistic effect of Cu(II) on Cr(VI) elimination included in the modelModel validation by checking it against independent sets of data

  19. On pseudo-values for regression analysis in competing risks models

    DEFF Research Database (Denmark)

    Graw, F; Gerds, Thomas Alexander; Schumacher, M

    2009-01-01

    For regression on state and transition probabilities in multi-state models Andersen et al. (Biometrika 90:15-27, 2003) propose a technique based on jackknife pseudo-values. In this article we analyze the pseudo-values suggested for competing risks models and prove some conjectures regarding their...

  20. A Predictive Logistic Regression Model of World Conflict Using Open Source Data

    Science.gov (United States)

    2015-03-26

    No correlation between the error terms and the independent variables 9. Absence of perfect multicollinearity (Menard, 2001) When assumptions are...some of the variables before initial model building. Multicollinearity , or near-linear dependence among the variables will cause problems in the...model. High multicollinearity tends to produce unreasonably high logistic regression coefficients and can result in coefficients that are not

  1. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

    Science.gov (United States)

    Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.

    2006-01-01

    Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…

  2. Fitting multistate transition models with autoregressive logistic regression : Supervised exercise in intermittent claudication

    NARCIS (Netherlands)

    de Vries, S O; Fidler, Vaclav; Kuipers, Wietze D; Hunink, Maria G M

    1998-01-01

    The purpose of this study was to develop a model that predicts the outcome of supervised exercise for intermittent claudication. The authors present an example of the use of autoregressive logistic regression for modeling observed longitudinal data. Data were collected from 329 participants in a

  3. Endogenous glucose production from infancy to adulthood: a non-linear regression model

    NARCIS (Netherlands)

    Huidekoper, Hidde H.; Ackermans, Mariëtte T.; Ruiter, An F. C.; Sauerwein, Hans P.; Wijburg, Frits A.

    2014-01-01

    To construct a regression model for endogenous glucose production (EGP) as a function of age, and compare this with glucose supplementation using commonly used dextrose-based saline solutions at fluid maintenance rate in children. A model was constructed based on EGP data, as quantified by

  4. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

    Science.gov (United States)

    Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...

  5. Sensitivity analysis and optimization of system dynamics models : Regression analysis and statistical design of experiments

    NARCIS (Netherlands)

    Kleijnen, J.P.C.

    1995-01-01

    This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for

  6. Genomic prediction based on data from three layer lines using non-linear regression models

    NARCIS (Netherlands)

    Huang, H.; Windig, J.J.; Vereijken, A.; Calus, M.P.L.

    2014-01-01

    Background - Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. Methods - In an attempt to alleviate

  7. Logistic regression models of factors influencing the location of bioenergy and biofuels plants

    Science.gov (United States)

    T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu

    2011-01-01

    Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7m2/ha...

  8. Determining factors influencing survival of breast cancer by fuzzy logistic regression model.

    Science.gov (United States)

    Nikbakht, Roya; Bahrampour, Abbas

    2017-01-01

    Fuzzy logistic regression model can be used for determining influential factors of disease. This study explores the important factors of actual predictive survival factors of breast cancer's patients. We used breast cancer data which collected by cancer registry of Kerman University of Medical Sciences during the period of 2000-2007. The variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were applied in the fuzzy logistic regression model. Performance of model was determined in terms of mean degree of membership (MDM). The study results showed that almost 41% of patients were in neoplasm and malignant group and more than two-third of them were still alive after 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criteria show that the fuzzy logistic regression have a good fit on the data (MDM = 0.86). Fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy in survival of patients with breast cancer. In addition, another ability of this model is calculating possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research. Furthermore, there are few studies which applied the fuzzy logistic models. Furthermore, we recommend using this model in various research areas.

  9. Photovoltaic Array Condition Monitoring Based on Online Regression of Performance Model

    DEFF Research Database (Denmark)

    Spataru, Sergiu; Sera, Dezso; Kerekes, Tamas

    2013-01-01

    regression modeling, from PV array production, plane-of-array irradiance, and module temperature measurements, acquired during an initial learning phase of the system. After the model has been parameterized automatically, the condition monitoring system enters the normal operation phase, where...

  10. The use of logistic regression in modelling the distributions of bird ...

    African Journals Online (AJOL)

    The method of logistic regression was used to model the observed geographical distribution patterns of bird species in Swaziland in relation to a set of environmental variables. Reporting rates derived from bird atlas data are used as an index of population densities. This is justified in part by the success of the modelling ...

  11. Time series modeling by a regression approach based on a latent process.

    Science.gov (United States)

    Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice

    2009-01-01

    Time series are used in many domains including finance, engineering, economics and bioinformatics generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process allowing for activating smoothly or abruptly different polynomial regression models. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.

  12. A LATENT CLASS POISSON REGRESSION-MODEL FOR HETEROGENEOUS COUNT DATA

    NARCIS (Netherlands)

    WEDEL, M; DESARBO, WS; BULT, [No Value; RAMASWAMY, [No Value

    1993-01-01

    In this paper an approach is developed that accommodates heterogeneity in Poisson regression models for count data. The model developed assumes that heterogeneity arises from a distribution of both the intercept and the coefficients of the explanatory variables. We assume that the mixing

  13. The limiting behavior of the estimated parameters in a misspecified random field regression model

    DEFF Research Database (Denmark)

    Dahl, Christian Møller; Qin, Yu

    This paper examines the limiting properties of the estimated parameters in the random field regression model recently proposed by Hamilton (Econometrica, 2001). Though the model is parametric, it enjoys the flexibility of the nonparametric approach since it can approximate a large collection of n...

  14. Deep ensemble learning of sparse regression models for brain disease diagnosis.

    Science.gov (United States)

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2017-04-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Bias and Uncertainty in Regression-Calibrated Models of Groundwater Flow in Heterogeneous Media

    DEFF Research Database (Denmark)

    Cooley, R.L.; Christensen, Steen

    2006-01-01

    small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate θ* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical nonlinear...... are reduced in magnitude. Biases, correction factors, and confidence and prediction intervals were obtained for a test problem for which model error is large to test robustness of the methodology. Numerical results conform with the theoretical analysis....

  16. Hadronic model for the non-thermal radiation from the binary system AR Scorpii

    Science.gov (United States)

    Bednarek, W.

    2018-05-01

    AR Scorpii is a close binary system containing a rotation powered white dwarf and a low-mass M type companion star. This system shows non-thermal emission extending up to the X-ray energy range. We consider hybrid (lepto-hadronic) and pure hadronic models for the high energy non-thermal processes in this binary system. Relativistic electrons and hadrons are assumed to be accelerated in a strongly magnetised, turbulent region formed in collision of a rotating white dwarf magnetosphere and a magnetosphere/dense atmosphere of the M-dwarf star. We propose that the non-thermal X-ray emission is produced either by the primary electrons or the secondary e± pairs from decay of charged pions created in collisions of hadrons with the companion star atmosphere. We show that the accompanying γ-ray emission from decay of neutral pions, which are produced by these same protons, is expected to be on the detectability level of the present and/or the future satellite and Cherenkov telescopes. The γ-ray observations of the binary system AR Sco should allow us to constrain the efficiency of hadron and electron acceleration and also the details of the radiation processes.

  17. Comparison of many bodied and binary collision cascade models up to 1 keV

    International Nuclear Information System (INIS)

    Schwartz, D.M.; Schiffgens, J.D.; Doran, D.G.; Odette, G.R.; Ariyasu, R.G.

    1976-01-01

    A quasi-dynamical code ADDES has been developed to model displacement cascades in copper for primary knockon atom energies up to several keV. ADDES is like a dynamical code in that it employs a many body treatment, yet similar to a binary collision code in that it incorporates the basic assumption that energy transfers below several eV can be ignored in describing cascade evolution. This paper is primarily concerned with (1) a continuing effort to validate the assumptions and specific parameters in the code by the comparison of ADDES results with experiment and with results from a dynamical code, and (2) comparisons of ADDES results with those from a binary collision code. The directional dependence of the displacement threshold is in reasonable agreement with the measurements of Jung et al. The behavior of focused replacement sequences is very similar to that obtained with the dynamical codes GRAPE and COMENT. Qualitative agreement was found between ADDES and COMENT for a higher energy (500 eV) defocused event while differences, still under study, are apparent in a 250 eV high index event. Comparisons of ADDES with the binary collision code MARLOWE show surprisingly good agreement in the 250 to 1000 eV range for both number and separation of Frenkel pairs. A preliminary observation, perhaps significant to displacement calculations utilizing the concept of a mean displacement energy, is the dissipation of 300 to 400 eV in a replacement sequence producing a single interstitial

  18. Longitudinal beta regression models for analyzing health-related quality of life scores over time

    Directory of Open Access Journals (Sweden)

    Hunger Matthias

    2012-09-01

    Full Text Available Abstract Background Health-related quality of life (HRQL has become an increasingly important outcome parameter in clinical trials and epidemiological research. HRQL scores are typically bounded at both ends of the scale and often highly skewed. Several regression techniques have been proposed to model such data in cross-sectional studies, however, methods applicable in longitudinal research are less well researched. This study examined the use of beta regression models for analyzing longitudinal HRQL data using two empirical examples with distributional features typically encountered in practice. Methods We used SF-6D utility data from a German older age cohort study and stroke-specific HRQL data from a randomized controlled trial. We described the conceptual differences between mixed and marginal beta regression models and compared both models to the commonly used linear mixed model in terms of overall fit and predictive accuracy. Results At any measurement time, the beta distribution fitted the SF-6D utility data and stroke-specific HRQL data better than the normal distribution. The mixed beta model showed better likelihood-based fit statistics than the linear mixed model and respected the boundedness of the outcome variable. However, it tended to underestimate the true mean at the upper part of the distribution. Adjusted group means from marginal beta model and linear mixed model were nearly identical but differences could be observed with respect to standard errors. Conclusions Understanding the conceptual differences between mixed and marginal beta regression models is important for their proper use in the analysis of longitudinal HRQL data. Beta regression fits the typical distribution of HRQL data better than linear mixed models, however, if focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.

  19. Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

    Science.gov (United States)

    Madarang, Krish J; Kang, Joo-Hyon

    2014-06-01

    Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.

  20. Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.

    Science.gov (United States)

    Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H

    2016-01-01

    Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias, and coverage still deviated from nominal.

  1. Bivariate least squares linear regression: Towards a unified analytic formalism. I. Functional models

    Science.gov (United States)

    Caimmi, R.

    2011-08-01

    Concerning bivariate least squares linear regression, the classical approach pursued for functional models in earlier attempts ( York, 1966, 1969) is reviewed using a new formalism in terms of deviation (matrix) traces which, for unweighted data, reduce to usual quantities leaving aside an unessential (but dimensional) multiplicative factor. Within the framework of classical error models, the dependent variable relates to the independent variable according to the usual additive model. The classes of linear models considered are regression lines in the general case of correlated errors in X and in Y for weighted data, and in the opposite limiting situations of (i) uncorrelated errors in X and in Y, and (ii) completely correlated errors in X and in Y. The special case of (C) generalized orthogonal regression is considered in detail together with well known subcases, namely: (Y) errors in X negligible (ideally null) with respect to errors in Y; (X) errors in Y negligible (ideally null) with respect to errors in X; (O) genuine orthogonal regression; (R) reduced major-axis regression. In the limit of unweighted data, the results determined for functional models are compared with their counterparts related to extreme structural models i.e. the instrumental scatter is negligible (ideally null) with respect to the intrinsic scatter ( Isobe et al., 1990; Feigelson and Babu, 1992). While regression line slope and intercept estimators for functional and structural models necessarily coincide, the contrary holds for related variance estimators even if the residuals obey a Gaussian distribution, with the exception of Y models. An example of astronomical application is considered, concerning the [O/H]-[Fe/H] empirical relations deduced from five samples related to different stars and/or different methods of oxygen abundance determination. For selected samples and assigned methods, different regression models yield consistent results within the errors (∓ σ) for both

  2. An odor interaction model of binary odorant mixtures by a partial differential equation method.

    Science.gov (United States)

    Yan, Luchun; Liu, Jiemin; Wang, Guihua; Wu, Chuandong

    2014-07-09

    A novel odor interaction model was proposed for binary mixtures of benzene and substituted benzenes by a partial differential equation (PDE) method. Based on the measurement method (tangent-intercept method) of partial molar volume, original parameters of corresponding formulas were reasonably displaced by perceptual measures. By these substitutions, it was possible to relate a mixture's odor intensity to the individual odorant's relative odor activity value (OAV). Several binary mixtures of benzene and substituted benzenes were respectively tested to establish the PDE models. The obtained results showed that the PDE model provided an easily interpretable method relating individual components to their joint odor intensity. Besides, both predictive performance and feasibility of the PDE model were proved well through a series of odor intensity matching tests. If combining the PDE model with portable gas detectors or on-line monitoring systems, olfactory evaluation of odor intensity will be achieved by instruments instead of odor assessors. Many disadvantages (e.g., expense on a fixed number of odor assessors) also will be successfully avoided. Thus, the PDE model is predicted to be helpful to the monitoring and management of odor pollutions.

  3. Reconstruction of binary geological images using analytical edge and object models

    Science.gov (United States)

    Abdollahifard, Mohammad J.; Ahmadi, Sadegh

    2016-04-01

    Reconstruction of fields using partial measurements is of vital importance in different applications in geosciences. Solving such an ill-posed problem requires a well-chosen model. In recent years, training images (TI) are widely employed as strong prior models for solving these problems. However, in the absence of enough evidence it is difficult to find an adequate TI which is capable of describing the field behavior properly. In this paper a very simple and general model is introduced which is applicable to a fairly wide range of binary images without any modifications. The model is motivated by the fact that nearly all binary images are composed of simple linear edges in micro-scale. The analytic essence of this model allows us to formulate the template matching problem as a convex optimization problem having efficient and fast solutions. The model has the potential to incorporate the qualitative and quantitative information provided by geologists. The image reconstruction problem is also formulated as an optimization problem and solved using an iterative greedy approach. The proposed method is capable of recovering the image unknown values with accuracies about 90% given samples representing as few as 2% of the original image.

  4. An Odor Interaction Model of Binary Odorant Mixtures by a Partial Differential Equation Method

    Directory of Open Access Journals (Sweden)

    Luchun Yan

    2014-07-01

    Full Text Available A novel odor interaction model was proposed for binary mixtures of benzene and substituted benzenes by a partial differential equation (PDE method. Based on the measurement method (tangent-intercept method of partial molar volume, original parameters of corresponding formulas were reasonably displaced by perceptual measures. By these substitutions, it was possible to relate a mixture’s odor intensity to the individual odorant’s relative odor activity value (OAV. Several binary mixtures of benzene and substituted benzenes were respectively tested to establish the PDE models. The obtained results showed that the PDE model provided an easily interpretable method relating individual components to their joint odor intensity. Besides, both predictive performance and feasibility of the PDE model were proved well through a series of odor intensity matching tests. If combining the PDE model with portable gas detectors or on-line monitoring systems, olfactory evaluation of odor intensity will be achieved by instruments instead of odor assessors. Many disadvantages (e.g., expense on a fixed number of odor assessors also will be successfully avoided. Thus, the PDE model is predicted to be helpful to the monitoring and management of odor pollutions.

  5. Fast and Accurate Prediction of Numerical Relativity Waveforms from Binary Black Hole Coalescences Using Surrogate Models.

    Science.gov (United States)

    Blackman, Jonathan; Field, Scott E; Galley, Chad R; Szilágyi, Béla; Scheel, Mark A; Tiglio, Manuel; Hemberger, Daniel A

    2015-09-18

    Simulating a binary black hole coalescence by solving Einstein's equations is computationally expensive, requiring days to months of supercomputing time. Using reduced order modeling techniques, we construct an accurate surrogate model, which is evaluated in a millisecond to a second, for numerical relativity (NR) waveforms from nonspinning binary black hole coalescences with mass ratios in [1, 10] and durations corresponding to about 15 orbits before merger. We assess the model's uncertainty and show that our modeling strategy predicts NR waveforms not used for the surrogate's training with errors nearly as small as the numerical error of the NR code. Our model includes all spherical-harmonic _{-2}Y_{ℓm} waveform modes resolved by the NR code up to ℓ=8. We compare our surrogate model to effective one body waveforms from 50M_{⊙} to 300M_{⊙} for advanced LIGO detectors and find that the surrogate is always more faithful (by at least an order of magnitude in most cases).

  6. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    Science.gov (United States)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multiple stepwise regressions is proposed for General Expression of Nonlinear Autoregressive model which converts the model order problem into the variable selection of multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study the improvements of both the new introduced and originally existed variables for the model characteristics, which are adopted to determine the model variables to retain or eliminate. So the optimal model is obtained through data fitting effect measurement or significance test. The simulation and classic time-series data experiment results show that the method proposed is simple, reliable and can be applied to practical engineering.

  7. Construction of risk prediction model of type 2 diabetes mellitus based on logistic regression

    Directory of Open Access Journals (Sweden)

    Li Jian

    2017-01-01

    Full Text Available Objective: to construct multi factor prediction model for the individual risk of T2DM, and to explore new ideas for early warning, prevention and personalized health services for T2DM. Methods: using logistic regression techniques to screen the risk factors for T2DM and construct the risk prediction model of T2DM. Results: Male’s risk prediction model logistic regression equation: logit(P=BMI × 0.735+ vegetables × (−0.671 + age × 0.838+ diastolic pressure × 0.296+ physical activity× (−2.287 + sleep ×(−0.009 +smoking ×0.214; Female’s risk prediction model logistic regression equation: logit(P=BMI ×1.979+ vegetables× (−0.292 + age × 1.355+ diastolic pressure× 0.522+ physical activity × (−2.287 + sleep × (−0.010.The area under the ROC curve of male was 0.83, the sensitivity was 0.72, the specificity was 0.86, the area under the ROC curve of female was 0.84, the sensitivity was 0.75, the specificity was 0.90. Conclusion: This study model data is from a compared study of nested case, the risk prediction model has been established by using the more mature logistic regression techniques, and the model is higher predictive sensitivity, specificity and stability.

  8. Model-based bootstrapping when correcting for measurement error with application to logistic regression.

    Science.gov (United States)

    Buonaccorsi, John P; Romeo, Giovanni; Thoresen, Magne

    2018-03-01

    When fitting regression models, measurement error in any of the predictors typically leads to biased coefficients and incorrect inferences. A plethora of methods have been proposed to correct for this. Obtaining standard errors and confidence intervals using the corrected estimators can be challenging and, in addition, there is concern about remaining bias in the corrected estimators. The bootstrap, which is one option to address these problems, has received limited attention in this context. It has usually been employed by simply resampling observations, which, while suitable in some situations, is not always formally justified. In addition, the simple bootstrap does not allow for estimating bias in non-linear models, including logistic regression. Model-based bootstrapping, which can potentially estimate bias in addition to being robust to the original sampling or whether the measurement error variance is constant or not, has received limited attention. However, it faces challenges that are not present in handling regression models with no measurement error. This article develops new methods for model-based bootstrapping when correcting for measurement error in logistic regression with replicate measures. The methodology is illustrated using two examples, and a series of simulations are carried out to assess and compare the simple and model-based bootstrap methods, as well as other standard methods. While not always perfect, the model-based approaches offer some distinct improvements over the other methods. © 2017, The International Biometric Society.

  9. Multiple regression models for energy use in air-conditioned office buildings in different climates

    International Nuclear Information System (INIS)

    Lam, Joseph C.; Wan, Kevin K.W.; Liu Dalong; Tsang, C.L.

    2010-01-01

    An attempt was made to develop multiple regression models for office buildings in the five major climates in China - severe cold, cold, hot summer and cold winter, mild, and hot summer and warm winter. A total of 12 key building design variables were identified through parametric and sensitivity analysis, and considered as inputs in the regression models. The coefficient of determination R 2 varies from 0.89 in Harbin to 0.97 in Kunming, indicating that 89-97% of the variations in annual building energy use can be explained by the changes in the 12 parameters. A pseudo-random number generator based on three simple multiplicative congruential generators was employed to generate random designs for evaluation of the regression models. The difference between regression-predicted and DOE-simulated annual building energy use are largely within 10%. It is envisaged that the regression models developed can be used to estimate the likely energy savings/penalty during the initial design stage when different building schemes and design concepts are being considered.

  10. SPSS macros to compare any two fitted values from a regression model.

    Science.gov (United States)

    Weaver, Bruce; Dubois, Sacha

    2012-12-01

    In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests-particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.

  11. Testing and Modeling Fuel Regression Rate in a Miniature Hybrid Burner

    Directory of Open Access Journals (Sweden)

    Luciano Fanton

    2012-01-01

    Full Text Available Ballistic characterization of an extended group of innovative HTPB-based solid fuel formulations for hybrid rocket propulsion was performed in a lab-scale burner. An optical time-resolved technique was used to assess the quasisteady regression history of single perforation, cylindrical samples. The effects of metalized additives and radiant heat transfer on the regression rate of such formulations were assessed. Under the investigated operating conditions and based on phenomenological models from the literature, analyses of the collected experimental data show an appreciable influence of the radiant heat flux from burnt gases and soot for both unloaded and loaded fuel formulations. Pure HTPB regression rate data are satisfactorily reproduced, while the impressive initial regression rates of metalized formulations require further assessment.

  12. Regression analysis of informative current status data with the additive hazards model.

    Science.gov (United States)

    Zhao, Shishun; Hu, Tao; Ma, Ling; Wang, Peijie; Sun, Jianguo

    2015-04-01

    This paper discusses regression analysis of current status failure time data arising from the additive hazards model in the presence of informative censoring. Many methods have been developed for regression analysis of current status data under various regression models if the censoring is noninformative, and also there exists a large literature on parametric analysis of informative current status data in the context of tumorgenicity experiments. In this paper, a semiparametric maximum likelihood estimation procedure is presented and in the method, the copula model is employed to describe the relationship between the failure time of interest and the censoring time. Furthermore, I-splines are used to approximate the nonparametric functions involved and the asymptotic consistency and normality of the proposed estimators are established. A simulation study is conducted and indicates that the proposed approach works well for practical situations. An illustrative example is also provided.

  13. LINEAR REGRESSION MODEL ESTİMATİON FOR RIGHT CENSORED DATA

    Directory of Open Access Journals (Sweden)

    Ersin Yılmaz

    2016-05-01

    Full Text Available In this study, firstly we will define a right censored data. If we say shortly right-censored data is censoring values that above the exact line. This may be related with scaling device. And then  we will use response variable acquainted from right-censored explanatory variables. Then the linear regression model will be estimated. For censored data’s existence, Kaplan-Meier weights will be used for  the estimation of the model. With the weights regression model  will be consistent and unbiased with that.   And also there is a method for the censored data that is a semi parametric regression and this method also give  useful results  for censored data too. This study also might be useful for the health studies because of the censored data used in medical issues generally.

  14. Soft-sensing model of temperature for aluminum reduction cell on improved twin support vector regression

    Science.gov (United States)

    Li, Tao

    2018-06-01

    The complexity of aluminum electrolysis process leads the temperature for aluminum reduction cells hard to measure directly. However, temperature is the control center of aluminum production. To solve this problem, combining some aluminum plant's practice data, this paper presents a Soft-sensing model of temperature for aluminum electrolysis process on Improved Twin Support Vector Regression (ITSVR). ITSVR eliminates the slow learning speed of Support Vector Regression (SVR) and the over-fit risk of Twin Support Vector Regression (TSVR) by introducing a regularization term into the objective function of TSVR, which ensures the structural risk minimization principle and lower computational complexity. Finally, the model with some other parameters as auxiliary variable, predicts the temperature by ITSVR. The simulation result shows Soft-sensing model based on ITSVR has short time-consuming and better generalization.

  15. Theoretical Models of Protostellar Binary and Multiple Systems with AMR Simulations

    Science.gov (United States)

    Matsumoto, Tomoaki; Tokuda, Kazuki; Onishi, Toshikazu; Inutsuka, Shu-ichiro; Saigo, Kazuya; Takakuwa, Shigehisa

    2017-05-01

    We present theoretical models for protostellar binary and multiple systems based on the high-resolution numerical simulation with an adaptive mesh refinement (AMR) code, SFUMATO. The recent ALMA observations have revealed early phases of the binary and multiple star formation with high spatial resolutions. These observations should be compared with theoretical models with high spatial resolutions. We present two theoretical models for (1) a high density molecular cloud core, MC27/L1521F, and (2) a protobinary system, L1551 NE. For the model for MC27, we performed numerical simulations for gravitational collapse of a turbulent cloud core. The cloud core exhibits fragmentation during the collapse, and dynamical interaction between the fragments produces an arc-like structure, which is one of the prominent structures observed by ALMA. For the model for L1551 NE, we performed numerical simulations of gas accretion onto protobinary. The simulations exhibit asymmetry of a circumbinary disk. Such asymmetry has been also observed by ALMA in the circumbinary disk of L1551 NE.

  16. Combination of supervised and semi-supervised regression models for improved unbiased estimation

    DEFF Research Database (Denmark)

    Arenas-Garía, Jeronimo; Moriana-Varo, Carlos; Larsen, Jan

    2010-01-01

    In this paper we investigate the steady-state performance of semisupervised regression models adjusted using a modified RLS-like algorithm, identifying the situations where the new algorithm is expected to outperform standard RLS. By using an adaptive combination of the supervised and semisupervi......In this paper we investigate the steady-state performance of semisupervised regression models adjusted using a modified RLS-like algorithm, identifying the situations where the new algorithm is expected to outperform standard RLS. By using an adaptive combination of the supervised...

  17. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    Science.gov (United States)

    Ulbrich, Norbert Manfred

    2013-01-01

    A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.

  18. Multiple regression analysis in modelling of carbon dioxide emissions by energy consumption use in Malaysia

    Science.gov (United States)

    Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat

    2015-04-01

    Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue considered as a great and international concern that primary attributed from different fossil fuels. In this paper, regression model is used for analyzing the causal relationship among CO2 emissions based on the energy consumption in Malaysia using time series data for the period of 1980-2010. The equations were developed using regression model based on the eight major sources that contribute to the CO2 emissions such as non energy, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil and motor petrol. The related data partly used for predict the regression model (1980-2000) and partly used for validate the regression model (2001-2010). The results of the prediction model with the measured data showed a high correlation coefficient (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning of the population to comply with air quality standards.

  19. Using exploratory regression to identify optimal driving factors for cellular automaton modeling of land use change.

    Science.gov (United States)

    Feng, Yongjiu; Tong, Xiaohua

    2017-09-22

    Defining transition rules is an important issue in cellular automaton (CA)-based land use modeling because these models incorporate highly correlated driving factors. Multicollinearity among correlated driving factors may produce negative effects that must be eliminated from the modeling. Using exploratory regression under pre-defined criteria, we identified all possible combinations of factors from the candidate factors affecting land use change. Three combinations that incorporate five driving factors meeting pre-defined criteria were assessed. With the selected combinations of factors, three logistic regression-based CA models were built to simulate dynamic land use change in Shanghai, China, from 2000 to 2015. For comparative purposes, a CA model with all candidate factors was also applied to simulate the land use change. Simulations using three CA models with multicollinearity eliminated performed better (with accuracy improvements about 3.6%) than the model incorporating all candidate factors. Our results showed that not all candidate factors are necessary for accurate CA modeling and the simulations were not sensitive to changes in statistically non-significant driving factors. We conclude that exploratory regression is an effective method to search for the optimal combinations of driving factors, leading to better land use change models that are devoid of multicollinearity. We suggest identification of dominant factors and elimination of multicollinearity before building land change models, making it possible to simulate more realistic outcomes.

  20. Linear and evolutionary polynomial regression models to forecast coastal dynamics: Comparison and reliability assessment

    Science.gov (United States)

    Bruno, Delia Evelina; Barca, Emanuele; Goncalves, Rodrigo Mikosz; de Araujo Queiroz, Heithor Alexandre; Berardi, Luigi; Passarella, Giuseppe

    2018-01-01

    In this paper, the Evolutionary Polynomial Regression data modelling strategy has been applied to study small scale, short-term coastal morphodynamics, given its capability for treating a wide database of known information, non-linearly. Simple linear and multilinear regression models were also applied to achieve a balance between the computational load and reliability of estimations of the three models. In fact, even though it is easy to imagine that the more complex the model, the more the prediction improves, sometimes a "slight" worsening of estimations can be accepted in exchange for the time saved in data organization and computational load. The models' outcomes were validated through a detailed statistical, error analysis, which revealed a slightly better estimation of the polynomial model with respect to the multilinear model, as expected. On the other hand, even though the data organization was identical for the two models, the multilinear one required a simpler simulation setting and a faster run time. Finally, the most reliable evolutionary polynomial regression model was used in order to make some conjecture about the uncertainty increase with the extension of extrapolation time of the estimation. The overlapping rate between the confidence band of the mean of the known coast position and the prediction band of the estimated position can be a good index of the weakness in producing reliable estimations when the extrapolation time increases too much. The proposed models and tests have been applied to a coastal sector located nearby Torre Colimena in the Apulia region, south Italy.

  1. Accounting for spatial effects in land use regression for urban air pollution modeling.

    Science.gov (United States)

    Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G

    2015-01-01

    In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R(2) values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  2. Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression.

    Science.gov (United States)

    Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris

    2016-09-01

    Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have

  3. Harmonic regression of Landsat time series for modeling attributes from national forest inventory data

    Science.gov (United States)

    Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.

    2018-03-01

    Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study, using Minnesota, USA during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.

  4. Accounting for Zero Inflation of Mussel Parasite Counts Using Discrete Regression Models

    Directory of Open Access Journals (Sweden)

    Emel Çankaya

    2017-06-01

    Full Text Available In many ecological applications, the absences of species are inevitable due to either detection faults in samples or uninhabitable conditions for their existence, resulting in high number of zero counts or abundance. Usual practice for modelling such data is regression modelling of log(abundance+1 and it is well know that resulting model is inadequate for prediction purposes. New discrete models accounting for zero abundances, namely zero-inflated regression (ZIP and ZINB, Hurdle-Poisson (HP and Hurdle-Negative Binomial (HNB amongst others are widely preferred to the classical regression models. Due to the fact that mussels are one of the economically most important aquatic products of Turkey, the purpose of this study is therefore to examine the performances of these four models in determination of the significant biotic and abiotic factors on the occurrences of Nematopsis legeri parasite harming the existence of Mediterranean mussels (Mytilus galloprovincialis L.. The data collected from the three coastal regions of Sinop city in Turkey showed more than 50% of parasite counts on the average are zero-valued and model comparisons were based on information criterion. The results showed that the probability of the occurrence of this parasite is here best formulated by ZINB or HNB models and influential factors of models were found to be correspondent with ecological differences of the regions.

  5. Analysis of the Influence of Quantile Regression Model on Mainland Tourists' Service Satisfaction Performance

    Science.gov (United States)

    Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen

    2014-01-01

    It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916

  6. Analysis of the influence of quantile regression model on mainland tourists' service satisfaction performance.

    Science.gov (United States)

    Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen

    2014-01-01

    It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.

  7. Regression models in the determination of the absorbed dose with extrapolation chamber for ophthalmological applicators

    International Nuclear Information System (INIS)

    Alvarez R, J.T.; Morales P, R.

    1992-06-01

    The absorbed dose for equivalent soft tissue is determined,it is imparted by ophthalmologic applicators, ( 90 Sr/ 90 Y, 1850 MBq) using an extrapolation chamber of variable electrodes; when estimating the slope of the extrapolation curve using a simple lineal regression model is observed that the dose values are underestimated from 17.7 percent up to a 20.4 percent in relation to the estimate of this dose by means of a regression model polynomial two grade, at the same time are observed an improvement in the standard error for the quadratic model until in 50%. Finally the global uncertainty of the dose is presented, taking into account the reproducibility of the experimental arrangement. As conclusion it can infers that in experimental arrangements where the source is to contact with the extrapolation chamber, it was recommended to substitute the lineal regression model by the quadratic regression model, in the determination of the slope of the extrapolation curve, for more exact and accurate measurements of the absorbed dose. (Author)

  8. Analysis of the Influence of Quantile Regression Model on Mainland Tourists’ Service Satisfaction Performance

    Directory of Open Access Journals (Sweden)

    Wen-Cheng Wang

    2014-01-01

    Full Text Available It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.

  9. Replica analysis of overfitting in regression models for time-to-event data

    Science.gov (United States)

    Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.

    2017-09-01

    Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.

  10. New limb-darkening coefficients for modeling binary star light curves

    Science.gov (United States)

    Van Hamme, W.

    1993-01-01

    We present monochromatic, passband-specific, and bolometric limb-darkening coefficients for a linear as well as nonlinear logarithmic and square root limb-darkening laws. These coefficients, including the bolometric ones, are needed when modeling binary star light curves with the latest version of the Wilson-Devinney light curve progam. We base our calculations on the most recent ATLAS stellar atmosphere models for solar chemical composition stars with a wide range of effective temperatures and surface gravitites. We examine how well various limb-darkening approximations represent the variation of the emerging specific intensity across a stellar surface as computed according to the model. For binary star light curve modeling purposes, we propose the use of a logarithmic or a square root law. We design our tables in such a manner that the relative quality of either law with respect to another can be easily compared. Since the computation of bolometric limb-darkening coefficients first requires monochromatic coefficients, we also offer tables of these coefficients (at 1221 wavelength values between 9.09 nm and 160 micrometer) and tables of passband-specific coefficients for commonly used photometric filters.

  11. Predictive thermodynamic models for liquid--liquid extraction of single, binary and ternary lanthanides and actinides

    International Nuclear Information System (INIS)

    Hoh, Y.C.

    1977-03-01

    Chemically based thermodynamic models to predict the distribution coefficients and the separation factors for the liquid--liquid extraction of lanthanides-organophosphorus compounds were developed by assuming that the quotient of the activity coefficients of each species varies slightly with its concentrations, by using aqueous lanthanide or actinide complexes stoichiometric stability constants expressed as its degrees of formation, by making use of the extraction mechanism and the equilibrium constant for the extraction reaction. For a single component system, the thermodynamic model equations which predict the distribution coefficients, are dependent on the free organic concentration, the equilibrated ligand and hydrogen ion concentrations, the degree of formation, and on the extraction mechanism. For a binary component system, the thermodynamic model equation which predicts the separation factors is the same for all cases. This model equation is dependent on the degrees of formation of each species in their binary system and can be used in a ternary component system to predict the separation factors for the solutes relative to each other

  12. Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors

    Directory of Open Access Journals (Sweden)

    Xibin Zhang

    2016-04-01

    Full Text Available This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to nonparametric regression model of the Australian All Ordinaries returns and the kernel density estimation of gross domestic product (GDP growth rates among the organisation for economic co-operation and development (OECD and non-OECD countries.

  13. INVESTIGATION OF E-MAIL TRAFFIC BY USING ZERO-INFLATED REGRESSION MODELS

    Directory of Open Access Journals (Sweden)

    Yılmaz KAYA

    2012-06-01

    Full Text Available Based on count data obtained with a value of zero may be greater than anticipated. These types of data sets should be used to analyze by regression methods taking into account zero values. Zero- Inflated Poisson (ZIP, Zero-Inflated negative binomial (ZINB, Poisson Hurdle (PH, negative binomial Hurdle (NBH are more common approaches in modeling more zero value possessing dependent variables than expected. In the present study, the e-mail traffic of Yüzüncü Yıl University in 2009 spring semester was investigated. ZIP and ZINB, PH and NBH regression methods were applied on the data set because more zeros counting (78.9% were found in data set than expected. ZINB and NBH regression considered zero dispersion and overdispersion were found to be more accurate results due to overdispersion and zero dispersion in sending e-mail. ZINB is determined to be best model accordingto Vuong statistics and information criteria.

  14. Antibiotic Resistances in Livestock: A Comparative Approach to Identify an Appropriate Regression Model for Count Data

    Directory of Open Access Journals (Sweden)

    Anke Hüls

    2017-05-01

    Full Text Available Antimicrobial resistance in livestock is a matter of general concern. To develop hygiene measures and methods for resistance prevention and control, epidemiological studies on a population level are needed to detect factors associated with antimicrobial resistance in livestock holdings. In general, regression models are used to describe these relationships between environmental factors and resistance outcome. Besides the study design, the correlation structures of the different outcomes of antibiotic resistance and structural zero measurements on the resistance outcome as well as on the exposure side are challenges for the epidemiological model building process. The use of appropriate regression models that acknowledge these complexities is essential to assure valid epidemiological interpretations. The aims of this paper are (i to explain the model building process comparing several competing models for count data (negative binomial model, quasi-Poisson model, zero-inflated model, and hurdle model and (ii to compare these models using data from a cross-sectional study on antibiotic resistance in animal husbandry. These goals are essential to evaluate which model is most suitable to identify potential prevention measures. The dataset used as an example in our analyses was generated initially to study the prevalence and associated factors for the appearance of cefotaxime-resistant Escherichia coli in 48 German fattening pig farms. For each farm, the outcome was the count of samples with resistant bacteria. There was almost no overdispersion and only moderate evidence of excess zeros in the data. Our analyses show that it is essential to evaluate regression models in studies analyzing the relationship between environmental factors and antibiotic resistances in livestock. After model comparison based on evaluation of model predictions, Akaike information criterion, and Pearson residuals, here the hurdle model was judged to be the most appropriate

  15. Bone marrow endothelial progenitors augment atherosclerotic plaque regression in a mouse model of plasma lipid lowering

    Science.gov (United States)

    Yao, Longbiao; Heuser-Baker, Janet; Herlea-Pana, Oana; Iida, Ryuji; Wang, Qilong; Zou, Ming-Hui; Barlic-Dicen, Jana

    2012-01-01

    The major event initiating atherosclerosis is hypercholesterolemia-induced disruption of vascular endothelium integrity. In settings of endothelial damage, endothelial progenitor cells (EPCs) are mobilized from bone marrow into circulation and home to sites of vascular injury where they aid endothelial regeneration. Given the beneficial effects of EPCs in vascular repair, we hypothesized that these cells play a pivotal role in atherosclerosis regression. We tested our hypothesis in the atherosclerosis-prone mouse model in which hypercholesterolemia, one of the main factors affecting EPC homeostasis, is reversible (Reversa mice). In these mice normalization of plasma lipids decreased atherosclerotic burden; however, plaque regression was incomplete. To explore whether endothelial progenitors contribute to atherosclerosis regression, bone marrow EPCs from a transgenic strain expressing green fluorescent protein under the control of endothelial cell-specific Tie2 promoter (Tie2-GFP+) were isolated. These cells were then adoptively transferred into atheroregressing Reversa recipients where they augmented plaque regression induced by reversal of hypercholesterolemia. Advanced plaque regression correlated with engraftment of Tie2-GFP+ EPCs into endothelium and resulted in an increase in atheroprotective nitric oxide and improved vascular relaxation. Similarly augmented plaque regression was also detected in regressing Reversa mice treated with the stem cell mobilizer AMD3100 which also mobilizes EPCs to peripheral blood. We conclude that correction of hypercholesterolemia in Reversa mice leads to partial plaque regression that can be augmented by AMD3100 treatment or by adoptive transfer of EPCs. This suggests that direct cell therapy or indirect progenitor cell mobilization therapy may be used in combination with statins to treat atherosclerosis. PMID:23081735

  16. Goal-oriented error estimation for Cahn-Hilliard models of binary phase transition

    KAUST Repository

    van der Zee, Kristoffer G.

    2010-10-27

    A posteriori estimates of errors in quantities of interest are developed for the nonlinear system of evolution equations embodied in the Cahn-Hilliard model of binary phase transition. These involve the analysis of wellposedness of dual backward-in-time problems and the calculation of residuals. Mixed finite element approximations are developed and used to deliver numerical solutions of representative problems in one- and two-dimensional domains. Estimated errors are shown to be quite accurate in these numerical examples. © 2010 Wiley Periodicals, Inc.

  17. The potential energy landscape in the Lennard-Jones binary mixture model

    International Nuclear Information System (INIS)

    Sampoli, M; Benassi, P; Eramo, R; Angelani, L; Ruocco, G

    2003-01-01

    The potential energy landscape in the Kob-Andersen Lennard-Jones binary mixture model has been studied carefully from the liquid down to the supercooled regime, from T = 2 down to 0.46. One thousand independent configurations along the time evolution locus have been examined at each temperature investigated. From the starting configuration, we searched for the nearest saddle (or quasi-saddle) and minimum of the potential energy. The vibrational densities of states for the starting and the two derived configurations have been evaluated. Besides the number of negative eigenvalues of the saddles other quantities show some signature of the approach of the dynamical arrest temperature

  18. Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

    Science.gov (United States)

    Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

    2015-01-01

    This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict what patients are at risk to be readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex to understand among the hospital practitioners. Explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaning decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of

  19. Application of Soft Computing Techniques and Multiple Regression Models for CBR prediction of Soils

    Directory of Open Access Journals (Sweden)

    Fatimah Khaleel Ibrahim

    2017-08-01

    Full Text Available The techniques of soft computing technique such as Artificial Neutral Network (ANN have improved the predicting capability and have actually discovered application in Geotechnical engineering. The aim of this research is to utilize the soft computing technique and Multiple Regression Models (MLR for forecasting the California bearing ratio CBR( of soil from its index properties. The indicator of CBR for soil could be predicted from various soils characterizing parameters with the assist of MLR and ANN methods. The data base that collected from the laboratory by conducting tests on 86 soil samples that gathered from different projects in Basrah districts. Data gained from the experimental result were used in the regression models and soft computing techniques by using artificial neural network. The liquid limit, plastic index , modified compaction test and the CBR test have been determined. In this work, different ANN and MLR models were formulated with the different collection of inputs to be able to recognize their significance in the prediction of CBR. The strengths of the models that were developed been examined in terms of regression coefficient (R2, relative error (RE% and mean square error (MSE values. From the results of this paper, it absolutely was noticed that all the proposed ANN models perform better than that of MLR model. In a specific ANN model with all input parameters reveals better outcomes than other ANN models.

  20. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    Science.gov (United States)

    Mi, Gu; Di, Yanming; Schafer, Daniel W

    2015-01-01

    This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.

  1. Coupled variable selection for regression modeling of complex treatment patterns in a clinical cancer registry.

    Science.gov (United States)

    Schmidtmann, I; Elsäßer, A; Weinmann, A; Binder, H

    2014-12-30

    For determining a manageable set of covariates potentially influential with respect to a time-to-event endpoint, Cox proportional hazards models can be combined with variable selection techniques, such as stepwise forward selection or backward elimination based on p-values, or regularized regression techniques such as component-wise boosting. Cox regression models have also been adapted for dealing with more complex event patterns, for example, for competing risks settings with separate, cause-specific hazard models for each event type, or for determining the prognostic effect pattern of a variable over different landmark times, with one conditional survival model for each landmark. Motivated by a clinical cancer registry application, where complex event patterns have to be dealt with and variable selection is needed at the same time, we propose a general approach for linking variable selection between several Cox models. Specifically, we combine score statistics for each covariate across models by Fisher's method as a basis for variable selection. This principle is implemented for a stepwise forward selection approach as well as for a regularized regression technique. In an application to data from hepatocellular carcinoma patients, the coupled stepwise approach is seen to facilitate joint interpretation of the different cause-specific Cox models. In conditional survival models at landmark times, which address updates of prediction as time progresses and both treatment and other potential explanatory variables may change, the coupled regularized regression approach identifies potentially important, stably selected covariates together with their effect time pattern, despite having only a small number of events. These results highlight the promise of the proposed approach for coupling variable selection between Cox models, which is particularly relevant for modeling for clinical cancer registries with their complex event patterns. Copyright © 2014 John Wiley & Sons

  2. Significance tests to determine the direction of effects in linear regression models.

    Science.gov (United States)

    Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander

    2015-02-01

    Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice. © 2014 The British Psychological Society.

  3. Predicting recovery of cognitive function soon after stroke: differential modeling of logarithmic and linear regression.

    Science.gov (United States)

    Suzuki, Makoto; Sugimura, Yuko; Yamada, Sumio; Omori, Yoshitsugu; Miyamoto, Masaaki; Yamamoto, Jun-ichi

    2013-01-01

    Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm and linear of time. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R(2) = 0.676, PLogarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.

  4. The Reflection Effect on the Eclipsing Binary by the Wilson and Devinney's Model and Russell and Merrill's Model

    Directory of Open Access Journals (Sweden)

    Seong Hee Choea

    1992-06-01

    Full Text Available The reflection effect on three types of eclipsing binaries has been analyzed Wilson and Devinney's model and Russell and Merrill's model. The reflection effect was displayed on the theoretical light curves for the various conditions using the Wilson and Devinney's light curve program. Two models were compared after the rectifing the theoretical light curves including the reflection effect with the Russell and Merrill's method. The result shows that two models have an agreement on the reflection effect just in cases of the small difference in temperature and albedo between two stars in the system.

  5. Climate Impacts on Chinese Corn Yields: A Fractional Polynomial Regression Model

    NARCIS (Netherlands)

    Kooten, van G.C.; Sun, Baojing

    2012-01-01

    In this study, we examine the effect of climate on corn yields in northern China using data from ten districts in Inner Mongolia and two in Shaanxi province. A regression model with a flexible functional form is specified, with explanatory variables that include seasonal growing degree days,

  6. Estimation of Panel Data Regression Models with Two-Sided Censoring or Truncation

    DEFF Research Database (Denmark)

    Alan, Sule; Honore, Bo E.; Hu, Luojia

    2014-01-01

    This paper constructs estimators for panel data regression models with individual speci…fic heterogeneity and two–sided censoring and truncation. Following Powell (1986) the estimation strategy is based on moment conditions constructed from re–censored or re–truncated residuals. While these moment...

  7. Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

    Science.gov (United States)

    Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

    2009-01-01

    Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....

  8. A random regression model in analysis of litter size in pigs | Lukovi& ...

    African Journals Online (AJOL)

    Dispersion parameters for number of piglets born alive (NBA) were estimated using a random regression model (RRM). Two data sets of litter records from the Nemščak farm in Slovenia were used for analyses. The first dataset (DS1) included records from the first to the sixth parity. The second dataset (DS2) was extended ...

  9. Modeling geochemical datasets for source apportionment: Comparison of least square regression and inversion approaches.

    Digital Repository Service at National Institute of Oceanography (India)

    Tripathy, G.R.; Das, Anirban.

    used methods, the Least Square Regression (LSR) and Inverse Modeling (IM), to determine the contributions of (i) solutes from different sources to global river water, and (ii) various rocks to a glacial till. The purpose of this exercise is to compare...

  10. [Prediction model of health workforce and beds in county hospitals of Hunan by multiple linear regression].

    Science.gov (United States)

    Ling, Ru; Liu, Jiawang

    2011-12-01

    To construct prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires,and multiple linear regression analysis with 20 quotas selected by literature view was done. Independent variables in the multiple linear regression model on medical personnels in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the the population of aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory and fitting, and may be used for short- and mid-term forecasting.

  11. Clinical trials: odds ratios and multiple regression models--why and how to assess them

    NARCIS (Netherlands)

    Sobh, Mohamad; Cleophas, Ton J.; Hadj-Chaib, Amel; Zwinderman, Aeilko H.

    2008-01-01

    Odds ratios (ORs), unlike chi2 tests, provide direct insight into the strength of the relationship between treatment modalities and treatment effects. Multiple regression models can reduce the data spread due to certain patient characteristics and thus improve the precision of the treatment

  12. Using ROC curves to compare neural networks and logistic regression for modeling individual noncatastrophic tree mortality

    Science.gov (United States)

    Susan L. King

    2003-01-01

    The performance of two classifiers, logistic regression and neural networks, are compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...

  13. Strategies for Testing Statistical and Practical Significance in Detecting DIF with Logistic Regression Models

    Science.gov (United States)

    Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza

    2014-01-01

    This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…

  14. Reduction of the number of parameters needed for a polynomial random regression test-day model

    NARCIS (Netherlands)

    Pool, M.H.; Meuwissen, T.H.E.

    2000-01-01

    Legendre polynomials were used to describe the (co)variance matrix within a random regression test day model. The goodness of fit depended on the polynomial order of fit, i.e., number of parameters to be estimated per animal but is limited by computing capacity. Two aspects: incomplete lactation

  15. Testing Mediation Using Multiple Regression and Structural Equation Modeling Analyses in Secondary Data

    Science.gov (United States)

    Li, Spencer D.

    2011-01-01

    Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…

  16. Bayesian linear regression : different conjugate models and their (in)sensitivity to prior-data conflict

    NARCIS (Netherlands)

    Walter, G.M.; Augustin, Th.; Kneib, Thomas; Tutz, Gerhard

    2010-01-01

    The paper is concerned with Bayesian analysis under prior-data conflict, i.e. the situation when observed data are rather unexpected under the prior (and the sample size is not large enough to eliminate the influence of the prior). Two approaches for Bayesian linear regression modeling based on

  17. Semi-parametric estimation of random effects in a logistic regression model using conditional inference

    DEFF Research Database (Denmark)

    Petersen, Jørgen Holm

    2016-01-01

    This paper describes a new approach to the estimation in a logistic regression model with two crossed random effects where special interest is in estimating the variance of one of the effects while not making distributional assumptions about the other effect. A composite likelihood is studied...

  18. The Development and Demonstration of Multiple Regression Models for Operant Conditioning Questions.

    Science.gov (United States)

    Fanning, Fred; Newman, Isadore

    Based on the assumption that inferential statistics can make the operant conditioner more sensitive to possible significant relationships, regressions models were developed to test the statistical significance between slopes and Y intercepts of the experimental and control group subjects. These results were then compared to the traditional operant…

  19. Pivotal statistics for testing subsets of structural parameters in the IV Regression Model

    NARCIS (Netherlands)

    Kleibergen, F.R.

    2000-01-01

    We construct a novel statistic to test hypothezes on subsets of the structural parameters in anInstrumental Variables (IV) regression model. We derive the chi squared limiting distribution of thestatistic and show that it has a degrees of freedom parameter that is equal to the number ofstructural

  20. The prediction of intelligence in preschool children using alternative models to regression.

    Science.gov (United States)

    Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E

    2011-12-01

    Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children is used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than does the more traditional regression approach. Implications of these results are discussed.